Three cool things you can do with Scalameta

While working on scalafmt, I've gotten some exposure to scala.meta. In the spirit of Buzzfeed list articles, here are some cool things you can do with the library.

Setup

All of the following examples are run from the Ammonite REPL. Once it is installed, run amm and execute the following lines:

@ load.ivy("org.scalameta" %% "scalameta" % "0.0.5-M1")
@ import scala.meta._
import scala.meta._
@ def file(filename: String) = new java.io.File(filename)

1. Check if two Scala source files are equal (ignoring comments and whitespace)

Assume we have two programs:

$ cat A.scala
object Hello extends App {
  // This is A.scala
  println("Hello world!")
}
$ cat B.scala
// This is B.scala
object Hello extends App { println("Hello world!") }

The programs are clearly equivalent: their abstract syntax trees (ASTs) are the same. However, this would be trickier to tell if the files were longer.

Let's use scala.meta's show[Structure] to programmatically compare the two ASTs serialized as strings.

@ def structure(filename: String) = file(filename).parse[Source].show[Structure]
@ structure("A.scala")
res1: String = """
Source(Seq(
    Defn.Object(
        Nil,
        Term.Name("Hello"),
        Template(Nil, Seq(Ctor.Ref.Name("App")),
        Term.Param(Nil, Name.Anonymous(), None, None),
        Some(Seq(
            Term.Apply(Term.Name("println"), Seq(Lit("Hello world!")))
        ))
    ))
))
"""
@ structure("A.scala") == structure("B.scala")
res2: Boolean = true

Indeed, the files have identical ASTs. This can be useful in many cases, for example:

Code formatter: I'm writing scalafmt and I want to be sure it doesn't mess with the semantics of the code it formats. I can assert in my tests that the AST of the formatted output is identical to the AST of the original file.
Plagiarism detection: students submit identical code except with new variable names and different comments. To reduce false negatives, it's possible to give terms fresh names, remove optional type annotations and then compare structures. Of course, plagiarism detection is a research topic of its own and a human eye must make the final judgement. You get the point.

Note. Two programs with different ASTs can still be observationally equivalent.
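
To make the fresh-names idea concrete, here is a minimal sketch on a toy AST. The types Ref, Str, Call, ValDef and Stmts and the normalize function are hypothetical and are not scalameta's tree types. The sketch also assumes a single flat scope, so it ignores shadowing and nested scopes.

```scala
// Toy AST for illustration only; these are NOT scalameta's tree types.
object Normalize {
  sealed trait Expr
  final case class Ref(name: String) extends Expr                 // reference to a name
  final case class Str(value: String) extends Expr                // string literal
  final case class Call(fn: Expr, args: List[Expr]) extends Expr  // fn(args)
  final case class ValDef(name: String, rhs: Expr) extends Expr   // val name = rhs
  final case class Stmts(stats: List[Expr]) extends Expr          // sequence of statements

  /** Renames every name bound by a ValDef to x0, x1, ... in order of definition.
    * Names that are never bound (such as println) are left untouched.
    * Assumption: one flat scope, no shadowing. */
  def normalize(tree: Expr): Expr = {
    val fresh = scala.collection.mutable.Map.empty[String, String]
    var counter = 0
    def loop(e: Expr): Expr = e match {
      case Ref(n)         => Ref(fresh.getOrElse(n, n))
      case s: Str         => s
      case Call(fn, args) => Call(loop(fn), args.map(loop))
      case ValDef(n, rhs) =>
        val newRhs = loop(rhs) // rename the right-hand side before binding n
        val name = s"x$counter"
        counter += 1
        fresh(n) = name
        ValDef(name, newRhs)
      case Stmts(stats)   => Stmts(stats.map(loop))
    }
    loop(tree)
  }
}
```

Two toy programs that differ only in the name of a val are unequal as written but compare equal after normalize. With real scalameta trees the same idea would need to handle scoping, parameters and type annotations, which this sketch leaves out.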

2. Count the number of statements, a more meaningful metric than lines of code

Lines of code: the metric we use so much that means so little. How many lines of Scala code have you written?

It doesn't say much, really. Maybe you pack your function arguments together while I put each argument on a separate line. Maybe I write lots of multiline strings with testing data while you write the few carefully designed lines of code that are being tested.

I think we can use scala.meta to improve the LOC metric a wee bit and count the number of statements instead.

For simplicity, I consider a statement to be any expression that I would put on a single line if I had infinite column width: for example, a class or def declaration, a case in a pattern match, or a statement inside a { ... } block.

@ def numberOfStatements(tree: Tree): Int =
    tree.collect {
      case source"..$stats" =>
        stats.length
      case q"package $ref { ..$stats }" =>
        stats.length
      case q"$expr match { ..case $cases }" =>
        cases.length
      case q"{ ..case $cases }" => // Partial function.
        cases.length
      case q"{ ..$stats }" => // Block
        stats.length
      case template"""{ ..$earlyInitializers } with ..$ctorcalls
                      { $param => ..$stats }""" =>
        earlyInitializers.length + stats.length
    }.sum
defined function numberOfStatements
@ numberOfStatements(file("A.scala").parse[Source])
res30: Int = 2
@ numberOfStatements(file("B.scala").parse[Source])
res31: Int = 2

Sweet, our method can tell that the two equivalent programs from before have the same number of statements. Understandably, wc -l is not as smart.

$ wc -l A.scala B.scala
    4 A.scala
    2 B.scala
    6 total

The more advanced cloc (count lines of code) believes A.scala has three times as much code as B.scala. That's even worse than wc -l!

$ cloc A.scala B.scala
-------------------------------------------------------------------------------
File                         files          blank        comment           code
-------------------------------------------------------------------------------
A.scala                          1              0              1              3
B.scala                          1              0              1              1
-------------------------------------------------------------------------------

Beware, for simplicity I kept the implementation of numberOfStatements naive. I am sure it misses some important cases.

I ran numberOfStatements on the largest file in scala-js, compiler/GenJSCode.scala. The file contains 4665 lines, cloc says it has 3133 LOC and numberOfStatements says it has 1736 statements.
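
One consequence of the collect-then-sum approach is easy to miss, so here is a sketch of the same idea on a hypothetical toy tree. The Node, Stat and Braces types and this collect helper are made up for illustration; scalameta's real Tree has many more node types, and I am assuming its collect visits every descendant in a similar way.

```scala
// Hypothetical toy tree; scalameta's real Tree has many more node types.
object ToyCount {
  sealed trait Node { def children: List[Node] }
  final case class Stat(text: String) extends Node {
    def children: List[Node] = Nil
  }
  final case class Braces(stats: List[Node]) extends Node {
    def children: List[Node] = stats
  }

  /** Applies pf to the node and to all of its descendants (pre-order),
    * keeping the results wherever pf is defined. */
  def collect[A](node: Node)(pf: PartialFunction[Node, A]): List[A] =
    pf.lift(node).toList ++ node.children.flatMap(child => collect(child)(pf))

  /** Every block contributes the number of statements directly inside it. */
  def numberOfStatements(node: Node): Int =
    collect(node) { case Braces(stats) => stats.length }.sum
}
```

In this sketch a nested block is counted once as a statement of its parent, and its own statements are then counted again. Whether that is the right behaviour depends on what you want the metric to express.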

3. Find correct matching parentheses

Scala allows backquoted identifiers like `{`, as in:

class `{` extends Token {
  def code = "{"
}

Although using backquoted keywords or symbols as identifiers is not encouraged, it can be handy sometimes. For example, the token classes in scala.meta make good use of it in my opinion.

With scala.meta's tokenizer, you can easily tell apart the three opening curly braces ({) above.

@ val tokens = """
class `{` extends Token {
  def code = "{"
}""".tokens
tokens: Tokens = Tokens(
...
  `{` (7..10),
...
  { (25..26),
...
  "{" (40..43),
...
)

The three tokens actually have different types:

The first, `{` (7..10), has the type Ident (identifier), because it's the name of the class.
The second, { (25..26), has the type `{`, a curly brace opening a block.
The third, "{" (40..43), has the type Literal.String.

Let's write a method to build a map from opening parentheses, brackets and curly braces to their respective closing delimiters.

@ import scala.meta.tokens.Token._
import scala.meta.tokens.Token._
@ def matching(tokens: Tokens): Map[Token, Token] = {
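    val result = scala.collection.mutable.Map.empty[Token, Token]
    // Opening delimiters that are still unclosed, innermost first.
    var stack = List.empty[Token]
    tokens.foreach {
      case open@( _: `{` | _: `[` | _: `(`) =>
        stack = open :: stack
      case close@( _: `}` | _: `]` | _: `)`) =>
        // Assumes balanced input: stack.head throws on an unmatched closing delimiter.
        val open = stack.head
        result += open -> close
        stack = stack.tail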
      case _ =>
    }
    result.toMap
  }
defined function matching
@ matching(tokens)
res17: Map[Token, Token] = Map({ (25..26) -> } (44..45))

Great! This can be useful, for example in:

code formatters: sometimes I need to know how big a { ... } block is and I only have the opening curly brace. I can find the matching closing curly brace and use token.position.start.offset to see how many characters are between the two.
text editors: I use vim and in normal mode I frequently press % over a {[( token to find its matching delimiter. Unfortunately, when my buffer contains `{` I notice a bug in % because it thinks there are unbalanced curly braces.
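
The matching method above calls stack.head, so it assumes the delimiters are balanced and pairs whatever is on top of the stack with the next closing token. For an editor buffer that is being typed into, that assumption may not hold. Below is a sketch of a more defensive variant that also computes the size of a block. It uses a hypothetical Tok case class with plain offsets instead of scalameta's Token classes, and it assumes the tokenizer has already told `{` and "{" apart from a real {, which shows up here as different text values.

```scala
// Hypothetical token type for illustration; scalameta's Token classes differ.
object SafeMatching {
  final case class Tok(text: String, start: Int, end: Int)

  private val closerFor = Map("{" -> "}", "[" -> "]", "(" -> ")")
  private val closers = closerFor.values.toSet

  /** Pairs each opening delimiter with its closing delimiter.
    * Returns Left with a message instead of throwing on unbalanced input. */
  def matching(tokens: List[Tok]): Either[String, Map[Tok, Tok]] = {
    @annotation.tailrec
    def loop(rest: List[Tok], stack: List[Tok],
             acc: Map[Tok, Tok]): Either[String, Map[Tok, Tok]] =
      rest match {
        case Nil =>
          stack.headOption match {
            case Some(open) => Left(s"unclosed '${open.text}' at offset ${open.start}")
            case None       => Right(acc)
          }
        case tok :: tail if closerFor.contains(tok.text) =>
          loop(tail, tok :: stack, acc) // remember the opening delimiter
        case tok :: tail if closers(tok.text) =>
          stack match {
            case open :: below if closerFor(open.text) == tok.text =>
              loop(tail, below, acc + (open -> tok))
            case _ =>
              Left(s"unexpected '${tok.text}' at offset ${tok.start}")
          }
        case _ :: tail =>
          loop(tail, stack, acc) // not a delimiter, skip it
      }
    loop(tokens, Nil, Map.empty)
  }

  /** Number of characters from the start of open to the end of close. */
  def span(open: Tok, close: Tok): Int = close.end - open.start
}
```

Returning an Either is one possible design choice. A formatter that only ever sees code that already parsed could reasonably keep the simpler version and let it throw.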

Summary

That's all, folks. I highly recommend you give scala.meta a try. However, the API can be a bit daunting at times. If you're ever stuck, don't hesitate to send a message to the friendly people in the scalameta Gitter channel.

Thanks @xeno-by for reading over this.
