Three cool things you can do with Scalameta
While working on scalafmt, I've gotten some exposure to scala.meta. In the spirit of Buzzfeed list articles, here are some cool things you can do with the library.
Setup
All the following examples are run from the Ammonite-REPL. Once installed, run amm and execute the following lines:
@ load.ivy("org.scalameta" %% "scalameta" % "0.0.5-M1")
@ import scala.meta._
import scala.meta._
@ def file(filename: String) = new java.io.File(filename)
1. Check if two Scala source files are equal (ignoring comments and whitespace)
Assume we have two programs:
$ cat A.scala
object Hello extends App {
  // This is A.scala
  println("Hello world!")
}
$ cat B.scala
// This is B.scala
object Hello extends App { println("Hello world!") }
The programs are clearly equivalent: their abstract syntax trees (ASTs) are the same. However, it might be trickier to tell if the files were longer.
Let's use scala.meta's show[Structure] to programmatically compare the two ASTs serialized as strings.
@ def structure(filename: String) = file(filename).parse[Source].show[Structure]
@ structure("A.scala")
res1: String = """
Source(Seq(
Defn.Object(
Nil,
Term.Name("Hello"),
Template(Nil, Seq(Ctor.Ref.Name("App")),
Term.Param(Nil, Name.Anonymous(), None, None),
Some(Seq(
Term.Apply(Term.Name("println"), Seq(Lit("Hello world!")))
))
))
))
"""
@ structure("A.scala") == structure("B.scala")
res2: Boolean = true
Indeed, the files have identical ASTs. This can be useful in many cases, for example:
- Code formatter: I'm writing scalafmt and I want to be sure it doesn't mess with the semantics of the code it formats. I can assert in my tests that the AST of the formatted output is identical to the AST of the original file.
- Plagiarism detection: students submit identical code except with new variable names and different comments. To reduce false negatives, it's possible to give terms fresh names, remove optional type annotations and then compare structures. Of course, plagiarism detection is a research topic on its own and a human eye must make the final judgement. You get the point.
Note. Two programs with different ASTs can still be observationally equivalent.
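The formatter use case above boils down to one property: formatting must not change the structure string. Here is a minimal sketch of that check. FormatterCheck, preservesStructure and both function parameters are hypothetical names for illustration, not part of scala.meta or scalafmt. The structure argument stands in for something like the structure method defined earlier, adapted to take source text instead of a filename.

```scala
object FormatterCheck {
  /** Returns true when formatting `code` leaves its structural rendering unchanged.
    *
    * `structure` is assumed to turn source text into a comparable string
    * (for example, an AST serialized with show[Structure]).
    * `format` is the formatter under test. Both are hypothetical parameters.
    */
  def preservesStructure(
      structure: String => String,
      format: String => String
  )(code: String): Boolean =
    structure(format(code)) == structure(code)
}
```

With a toy structure function that merely strips whitespace, a "formatter" that only inserts line breaks passes the check, while one that renames Hello to something else fails it. In a real test suite, structure would parse the code and serialize the AST instead.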
2. Count the number of statements, a more meaningful metric than lines of code
Lines of code: the metric we use so much but that means so little. How many lines of Scala code have you written?
It doesn't say much, really. Maybe you pack your function arguments together on one line while I put each argument on a separate line. Maybe I write lots of multiline strings with testing data while you write the few carefully designed lines of code that are being tested.
I think we can use scala.meta to improve the LOC metric a wee bit and count the number of statements instead.
For simplicity, I consider a statement to be any expression such that if I had infinite column-width I would put it into a single line. For example, a class/def declaration, a case in pattern matching or a statement inside a { ... } block.
@ def numberOfStatements(tree: Tree): Int =
    tree.collect {
      case source"..$stats" =>
        stats.length
      case q"package $ref { ..$stats }" =>
        stats.length
      case q"$expr match { ..case $cases }" =>
        cases.length
      case q"{ ..case $cases }" => // Partial function.
        cases.length
      case q"{ ..$stats }" => // Block
        stats.length
      case template"""{ ..$earlyInitializers } with ..$ctorcalls
                      { $param => ..$stats }""" =>
        earlyInitializers.length + stats.length
    }.sum
defined function numberOfStatements
@ numberOfStatements(file("A.scala").parse[Source])
res30: Int = 2
@ numberOfStatements(file("B.scala").parse[Source])
res31: Int = 2
Sweet, our method can tell that our previous identical programs have equally many statements. Understandably, wc -l is not as smart.
$ wc -l A.scala B.scala
4 A.scala
2 B.scala
6 total
The more advanced cloc (count lines of code) believes A.scala has three times as much code as B.scala. That's even worse than wc -l!
$ cloc A.scala B.scala
-------------------------------------------------------------------------------
File                             files          blank        comment           code
-------------------------------------------------------------------------------
A.scala                              1              0              1              3
B.scala                              1              0              1              1
-------------------------------------------------------------------------------
Beware, for simplicity I kept the implementation of numberOfStatements naive. I am sure it is missing some important cases.
I ran numberOfStatements on the largest file in scala-js, compiler/GenJSCode.scala. The file contains 4665 lines, cloc says it has 3133 LOC and numberOfStatements says it has 1736 statements.
3. Find correct matching parentheses
Scala allows identifiers like `{`, as in:
class `{` extends Token {
  def code = "{"
}
Although using backquoted symbols as identifiers is not encouraged, it can be handy sometimes. For example, the token classes in scala.meta make good use of it, in my opinion.
With scala.meta's tokenizer, you can easily tell apart the three opening curly braces ({) above.
@ val tokens = """
class `{` extends Token {
def code = "{"
}""".tokens
tokens: Tokens = Tokens(
...
`{` (7..10),
...
{ (25..26),
...
"{" (40..43),
...
)
The three tokens actually have different types:
- `{` (7..10) has the type Ident (identifier), because it's the name of the class.
- { (25..26) has the type `{`, a curly brace opening a block.
- "{" (40..43) has the type Literal.String.
Let's write a method to build a map from opening parentheses, brackets and curly braces to their respective closing delimiters.
@ import scala.meta.tokens.Token._
import scala.meta.tokens.Token._
@ def matching(tokens: Tokens): Map[Token, Token] = {
    val result = scala.collection.mutable.Map.empty[Token, Token]
    var stack = List.empty[Token]
    tokens.foreach {
      case open@( _: `{` | _: `[` | _: `(`) =>
        stack = open :: stack
      case close@( _: `}` | _: `]` | _: `)`) =>
        val open = stack.head
        result += open -> close
        stack = stack.tail
      case _ =>
    }
    result.toMap
  }
defined function matching
@ matching(tokens)
res17: Map[Token, Token] = Map({ (25..26) -> } (44..45))
Great! This can be useful, for example in:
- code formatters: sometimes I need to know how big a { ... } block is and I only have the opening curly brace. I can find the matching closing curly brace and use token.position.start.offset to see how many characters are between the two.
- text editors: I use vim and in normal mode I frequently press % over a {[( token to find its matching delimiter. Unfortunately, when my buffer contains `{` I notice a bug in % because it thinks there are unbalanced curly braces.
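One caveat about matching as written: stack.head throws if a closing delimiter shows up with nothing open, which can easily happen in a half-edited buffer. Below is a sketch of a more forgiving variant that also measures block size from offsets. It is plain Scala and does not depend on scala.meta: Tok is a hypothetical stand-in for a real token (just its text and start offset), and DelimiterSketch, matching and blockLength are illustrative names, not library API.

```scala
object DelimiterSketch {
  // Hypothetical stand-in for a scala.meta token: its source text and start offset.
  final case class Tok(text: String, start: Int)

  private val pairs = Map("{" -> "}", "[" -> "]", "(" -> ")")
  private val closers = pairs.values.toSet

  /** Maps each opening delimiter to its matching closing delimiter.
    * Stray or mismatched closing delimiters are ignored instead of throwing.
    */
  def matching(tokens: Seq[Tok]): Map[Tok, Tok] = {
    val zero = (Map.empty[Tok, Tok], List.empty[Tok])
    val (result, _) = tokens.foldLeft(zero) {
      // Opening delimiter: push it on the stack.
      case ((acc, stack), tok) if pairs.contains(tok.text) =>
        (acc, tok :: stack)
      // Closing delimiter that matches the innermost open one: record the pair.
      case ((acc, open :: rest), tok)
          if closers(tok.text) && pairs(open.text) == tok.text =>
        (acc + (open -> tok), rest)
      // Anything else (identifiers, literals, stray closers) leaves the state alone.
      case (state, _) =>
        state
    }
    result
  }

  /** Characters from the opening delimiter up to and including the closing one. */
  def blockLength(open: Tok, close: Tok): Int =
    close.start + close.text.length - open.start
}
```

Fed the four relevant tokens from the example above (the identifier at offset 7, the brace at 25, the string literal at 40 and the closing brace at 44), this yields the single pair starting at 25 and 44, with a block length of 20 characters. The identifier and the string literal are skipped because their text is not a bare delimiter, which mirrors what the token types do in the scala.meta version.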
Summary
That's all folks. I highly recommend you give scala.meta a try. However, the API can be a bit daunting at times. If you're ever stuck, don't hesitate to send a message to the friendly people in the scalameta Gitter channel.
Thanks @xeno-by for reading over this.