Writing native CLI applications in Scala with GraalVM

We've always been told that writing CLIs in Scala is not a good idea: memory consumption, slow startup, JIT warm-up and the prerequisite of having a JRE installed made the idea unappealing.

I believe that the development of GraalVM Native Image changed that drastically. It removed the initial overhead of starting a JVM while also significantly reducing the memory footprint of running executables. Moreover, it enabled the distribution model known from Go: releasing a single standalone binary. It's safe to say that ahead-of-time compilation overcomes the drawbacks listed above for short-lived programs on the JVM.

While Native Image made writing small native CLIs possible in the JVM world, it has not, by itself, made it a productive option. To develop small CLIs quickly and productively you also need libraries that support that.

Let's think about what we would expect from the ecosystem of the language to create CLIs effectively. We need to:

work with files and paths efficiently. Provided by os-lib
work with subprocesses efficiently. Also provided by os-lib
parse command line arguments, show proper error messages and show help messages. Provided by decline
use ANSI coloring of strings. Provided by fansi
have a fully automated release process. It should produce standalone binaries for macOS, Linux and Windows and upload them to the GitHub release page. Powered by GraalVM Native Image and Travis CI
propagate the release version automatically all the way down, so that the binary contains it. Provided by sbt-buildinfo and sbt-git

Hypothesis

I hypothesize that Scala might be a productive language for writing small CLIs. I keep on repeating "small CLI" but what do I mean by that? I mean glue code traditionally written in Python or Go in data science, bioinformatics and devops, among others.

Example application

For the sake of this article, I decided to implement a tiny tool for quickly switching between directories. The goal is that you can type tp goto x instead of cd /a/lot/of/directories/leading/to/x. It is modeled one-to-one on teleport, a project written in Haskell.

Here's a short asciinema animation presenting the tool:

All of the code is available on GitHub as teleport-scala.

IOApp

You may think that the starting point of any Scala application is def main, but it's not present in teleport-scala. I used cats.effect.IOApp instead. Initially I refrained from using it, thinking that, similarly to scala.App, it does not solve any problem and just makes things less obvious. That's not the case with IOApp though.

It models your program explicitly as IO[ExitCode], helps you with cancellation and safe resource release and brings Timer[IO] into scope. Read more here.
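As a minimal sketch of what this looks like, an IOApp-based entry point could be written as below. HelloCli and its program method are hypothetical names that do not come from teleport-scala, and cats-effect is assumed to be on the classpath.

```scala
import cats.effect.{ExitCode, IO, IOApp}

// Hypothetical example app, not the actual teleport-scala entry point.
object HelloCli extends IOApp {

  // The program is a plain value of type IO[ExitCode];
  // nothing is executed until IOApp runs it.
  def program(args: List[String]): IO[ExitCode] =
    args match {
      case name :: Nil =>
        IO(println(s"Hello, $name")).map(_ => ExitCode.Success)
      case _ =>
        IO(System.err.println("usage: hello-cli <name>")).map(_ => ExitCode.Error)
    }

  // IOApp supplies the real main method and turns the ExitCode
  // into the exit status of the process.
  override def run(args: List[String]): IO[ExitCode] = program(args)
}
```

Keeping the logic in a separate method that returns IO[ExitCode] makes it possible to exercise the program from tests without going through main.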

Parsing arguments with decline

There are a few options for parsing command line arguments and I picked decline. Instead of explaining its API, I would like to study the anatomy of one of teleport-scala's commands: add. Here's an example invocation:

teleport-scala --no-colors add notes some/dir

Using decline's terminology we can say that:

--no-colors is a flag
add is a subcommand
notes and some/dir are positional arguments

Since --no-colors is a flag that can be applied to any subcommand, I think of it as a "global flag". I gathered all global flags together in their own type:

final case class GlobalFlags(colors: Boolean, headers: Boolean)

This is what the definition of the GlobalFlags parser looks like:

val flags: Opts[GlobalFlags] = {
  val nocolorsOpt  = booleanFlag("no-colors", help = "Disable ANSI color codes")
  val noheadersOpt = booleanFlag("no-headers", help = "Disable printing headers for tabular data")
  (nocolorsOpt, noheadersOpt).mapN((noColors, noHeaders) => GlobalFlags(!noColors, !noHeaders))
}

The add subcommand is defined in the following way:

val nameOpt = Opts.argument[String]("NAME")
val add =
  Command(
    name = "add",
    header = "add a teleport point"
  )((nameOpt, Opts.argument[String]("FOLDERPATH").orNone).mapN(AddCmdOptions))

I extracted Opts.argument[String]("NAME") to its own value because it will be used in a few other places.

The other four subcommands are defined in the same declarative way (details omitted for conciseness):

val list = Command(...)
val remove = Command(...)
val goto = Command(...)
val version = Command(...)

We chain subcommands together:

val subcommands = Opts
  .subcommand(add)
  .orElse(Opts.subcommand(list))
  ...

We combine global flags with subcommands:

val appCmd: Opts[(GlobalFlags, CmdOptions)] = (flags, subcommands).tupled

And... that's all! We didn't have to write any parsing code explicitly; the whole specification is written in a purely declarative way. Also, we provided names and headers for all the commands, so decline has enough data to generate help messages. Here is an example of the help message for tp add:

> tp add --help
Usage: teleport-scala add <NAME> [<FOLDERPATH>]

add a teleport point

Options and flags:
    --help
        Display this help text.
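To see how the pieces above fit together, here is a self-contained sketch that can be parsed end to end. It is an approximation under stated assumptions, not the teleport-scala source: the case classes are simplified stand-ins, only two subcommands are included, the header texts for list and for the top-level command are invented, and booleanFlag is assumed to behave like decline's Opts.flag(...).orFalse.

```scala
import cats.implicits._
import com.monovore.decline.{Command, Opts}

object ParsingSketch {
  // Simplified, hypothetical stand-ins for the types used in teleport-scala.
  final case class GlobalFlags(colors: Boolean, headers: Boolean)
  sealed trait CmdOptions
  final case class AddCmdOptions(name: String, folderPath: Option[String]) extends CmdOptions
  case object ListCmdOptions extends CmdOptions

  // Assumed behaviour of the booleanFlag helper:
  // true when the flag is present, false otherwise.
  def booleanFlag(long: String, help: String): Opts[Boolean] =
    Opts.flag(long, help = help).orFalse

  val flags: Opts[GlobalFlags] =
    (
      booleanFlag("no-colors", help = "Disable ANSI color codes"),
      booleanFlag("no-headers", help = "Disable printing headers for tabular data")
    ).mapN((noColors, noHeaders) => GlobalFlags(!noColors, !noHeaders))

  val add: Command[CmdOptions] =
    Command(name = "add", header = "add a teleport point")(
      (Opts.argument[String]("NAME"), Opts.argument[String]("FOLDERPATH").orNone)
        .mapN[CmdOptions]((name, path) => AddCmdOptions(name, path))
    )

  val list: Command[CmdOptions] =
    Command(name = "list", header = "list all teleport points")(
      Opts.unit.map[CmdOptions](_ => ListCmdOptions)
    )

  val subcommands: Opts[CmdOptions] =
    Opts.subcommand(add).orElse(Opts.subcommand(list))

  // Top-level command: global flags first, then exactly one subcommand.
  val app: Command[(GlobalFlags, CmdOptions)] =
    Command(name = "teleport-scala", header = "teleport between directories")(
      (flags, subcommands).tupled
    )
}
```

Calling ParsingSketch.app.parse(Seq("--no-colors", "add", "notes", "some/dir")) returns an Either: a Left carries the Help text to print when the arguments are invalid, and a Right carries the parsed (GlobalFlags, CmdOptions) pair.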

Business logic

In the previous section we parsed command line arguments into (GlobalFlags, CmdOptions). Now, we need to dispatch the command to the proper handling code:

def dispatchCmd(globalFlags: GlobalFlags, cmd: CmdOptions, handler: Handler)(
      implicit style: Style): IO[ExitCode] =
    cmd match {
      case cmd: AddCmdOptions =>
        handler.add(cmd).map {
          case Right(tpPoint) =>
            println(s"Creating teleport point: ${style.emphasis(tpPoint.name)}")
            ExitCode.Success
          case Left(err) =>
            println(err.fansi)
            ExitCode.Error
        }
  ...

As I spent most of my programming life writing servers, I like to think of it as routing code. This code has two responsibilities: dispatching the "request" to the proper "handler" and then presenting the result in the proper form. In the case of a CLI the result is not HTML or JSON but simply text.

The actual business logic lies in Handler. I don't want to focus too much on it though, as it's not particularly relevant to the main point of the article. In a nutshell: we persist teleport points as TeleportState in the file $HOME/.teleport-data. It's a JSON file, and circe is used for working with JSON.

Even though the code of the handler is straightforward, there's one interesting ingredient involved: os-lib, used for working with paths and files. os-lib has a unique philosophy behind it - it tries to use Scala more as a scripting language. Let's take a look at the type of os.read.apply, which reads a file into a String:

def apply(arg: ReadablePath): String

Being a Scala developer, you might be surprised by the simplicity of the result type. It's not IO[InputStream] or Try[InputStream] - it's just a String. Is this good? As always, it depends on the use case. If you write a CLI tool for power users, then maybe just throwing an exception with the filename is enough for them to figure out what went wrong? And if you can predict the size of a file, maybe you don't need to stream it?

Even if you decide you need a more powerful tool for working with files, you may still be interested in using os-lib for its capabilities in working with paths and subprocesses.
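To give a feeling for that scripting-like style, here is a small sketch using os-lib for paths, files and a subprocess. OsLibSketch, roundTrip and echo are hypothetical names, os-lib is assumed to be on the classpath, and the echo example assumes a Unix-like system where echo is available as an executable.

```scala
object OsLibSketch {
  // Writes content to a file in a fresh temporary directory and reads it back.
  def roundTrip(content: String): String = {
    val dir: os.Path  = os.temp.dir()          // fresh temporary directory
    val file: os.Path = dir / "teleport-data"  // paths are composed with `/`
    os.write(file, content)                    // throws if the file already exists
    os.read(file)                              // returns a plain String
  }

  // Runs a subprocess and captures its standard output.
  // call() throws if the process exits with a non-zero code.
  def echo(word: String): String =
    os.proc("echo", word).call().out.text().trim
}
```

Errors surface as exceptions rather than as values, which matches the trade-off described above: less ceremony, at the cost of less explicit error handling.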

In the rest of this article I will walk you through aspects of developing CLIs that I find important.

Coloring

We will use fansi for ANSI coloring. With fansi, string coloring boils down to:

fansi.Color.Red("Hello World Ansi!")

We don't want to use it directly though, mostly so that there is a single place that controls the color palette. Therefore, a trait Style is defined:

trait Style {
  def emphasis(input: String): fansi.Str
  def error(input: String): fansi.Str
}

Everywhere we want to use colors, we require an instance of Style to be provided.

Interoperability with UNIX tools

Let's take a look at tp list output:

[Screenshot: colored output of tp list]

Coloring makes the output pleasant to read and helps to emphasize vital points - imagine how hard it would be to read the results of unit tests without colors. However, keep in mind that it's not always a human who reads the output; quite often the output of a program is processed by scripts. There's a great article by Marius Eriksen on that.

Let's take a look at the raw output:

teleport points: [94m(total 1)[39m
oss [94m/home/michal/teleport-demo/code/scala-oss[39m

It does not look right because of the ANSI escape codes. The first line, being a header, also makes some processing much more difficult. That's why we introduced the --no-colors and --no-headers flags. Thanks to Style being a trait we can define NoColorsStyle:

object NoColorsStyle extends Style {
  override def emphasis(input: String): Str = Str(input)
  override def error(input: String): Str = Str(input)
}
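For completeness, a colored counterpart and the selection between the two styles could look roughly like the sketch below. ColorsStyle, styleFor and the chosen palette are assumptions made for illustration; the Style trait and NoColorsStyle are repeated only to keep the example self-contained, and fansi is assumed to be on the classpath.

```scala
import fansi.Str

object StyleSketch {
  trait Style {
    def emphasis(input: String): Str
    def error(input: String): Str
  }

  // Hypothetical colored implementation; the palette is an assumption.
  object ColorsStyle extends Style {
    override def emphasis(input: String): Str = fansi.Color.LightBlue(input)
    override def error(input: String): Str    = fansi.Color.Red(input)
  }

  object NoColorsStyle extends Style {
    override def emphasis(input: String): Str = Str(input)
    override def error(input: String): Str    = Str(input)
  }

  // Picks the style once, based on the parsed --no-colors flag.
  def styleFor(colors: Boolean): Style =
    if (colors) ColorsStyle else NoColorsStyle
}
```

The rest of the program only ever sees a Style, so turning colors off is a single decision made right after argument parsing.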

Keeping version in sync

To keep it simple, I decided to make Git the only source of truth with regard to the project's version. If you want to release a new version, you have to create and push a Git tag. That triggers a build on Travis CI. Since we used sbt-git and enabled GitVersioning, sbt uses the Git tag as the project version. Moreover, we configured sbt-buildinfo in the following way:

buildInfoKeys := Seq[BuildInfoKey](name, version, scalaVersion, sbtVersion, git.baseVersion, git.gitHeadCommit),
buildInfoPackage := "pl.msitko.teleport",
buildInfoUsePackageAsPath := true,

That means that all version-related data will be present in the generated pl.msitko.teleport.BuildInfo object. We can import that object and use it in the code handling the version subcommand:

case VersionCmdOptions =>
  IO(println(BuildInfo.version)) *> IO(ExitCode.Success)

That way we don't need to hardcode version in the code and we guarantee it will always be in sync with Git.
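For readers who have not used these plugins, the wiring might look roughly like the following sbt fragment. This is a sketch rather than the project's actual build: the plugin versions are illustrative assumptions, and only a subset of the settings shown earlier is repeated.

```scala
// project/plugins.sbt (plugin versions are illustrative assumptions)
addSbtPlugin("com.eed3si9n"     % "sbt-buildinfo" % "0.9.0")
addSbtPlugin("com.typesafe.sbt" % "sbt-git"       % "1.0.0")

// build.sbt
lazy val root = (project in file("."))
  // BuildInfoPlugin generates the BuildInfo object;
  // GitVersioning derives the version setting from Git.
  .enablePlugins(BuildInfoPlugin, GitVersioning)
  .settings(
    name := "teleport-scala",
    buildInfoKeys := Seq[BuildInfoKey](name, version),
    buildInfoPackage := "pl.msitko.teleport"
  )
```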

Building binary with Native Image

In theory building binary with Native Image should be as simple as (omitting some options for the sake of brevity):

native-image --verbose --no-fallback --static -jar teleport-scala.jar teleport-scala

I produced a fat JAR using sbt-assembly, ran the above native-image command, and this is what I got:

Error: Unsupported features in 3 methods
Detailed message:
Error: com.oracle.graal.pointsto.constraints.UnsupportedFeatureException: Invoke with MethodHandle argument could not be reduced to at most a single call or single field access. The method handle must be a compile time constant, e.g., be loaded from a `static final` field. Method that contains the method handle invocation: java.lang.invoke.MethodHandle.invokeBasic()

This problem is described in a GitHub issue. I did what was advised there: I switched to the Java 11 based GraalVM and added a proper native-image.properties, after which the code compiled just fine. Keep that in mind - debugging this gotcha was really frustrating.

Another important thing - you should always use native-image with --no-fallback. Without that option, in the case described above, native-image would print a few warnings but exit with code 0 and generate an image that requires a JDK for execution - something you definitely don't want when using native-image.

Limitations of Native Image

Even though using Native Image was not as easy as it initially appeared, we are quite lucky with teleport-scala anyway. Native Image has a number of limitations. A lot of them are related to features like runtime reflection or dynamic class loading, which are not widely used in Scala libraries. You can see, however, how much hassle it is to run Akka or Netty from an executable built with Native Image. It's not that problematic if someone has already described how to annotate the library you are trying to run, but since GraalVM is not widely adopted, it's likely you will have to figure out some parts yourself.

When choosing Native Image you should be aware that you are dealing with immature and evolving software - new options are added with each version, and defaults for old options change. That's the price you need to pay for surmounting the traditional limitations of the JVM.

CI build

As I mentioned in the introduction, we want to release executables for all 3 major operating systems. GraalVM Native Image does not, and probably will not, support cross compilation. The only exception is targeting Linux - its executables can be built on any platform, since you can run Native Image in a Docker container. Nevertheless, there's no escape from running a build on a few platforms in our case. I decided to use Travis CI as it provides environments for Windows, macOS and Linux.

Coming up with the Linux script was easy, but problems appeared when I started working on the Windows build. Unfortunately, the Native Image documentation for Windows is simply not comprehensive enough, and thus I had to go through a couple of posts found here and there to understand how Native Image is supposed to be used on Windows. The process was painful enough that I've written another blog post about it.

You can see the eventual build definition supporting all major operating systems here.

Testing

Analogously to server applications, you can choose from a few testing strategies - unit tests, integration tests, end to end tests, and everything in between them.

In teleport-scala I took an unusual approach. Since there's not that much logic in the program itself, and since it relies heavily on the filesystem, it has hardly any unit tests. Instead, an executable built with Native Image is called from a dockerized Ammonite script. That script contains some testing code written with utest. This approach has two advantages:

since Ammonite is dockerized, it works on a separate filesystem and does not touch the host filesystem at all
it verifies the actual artifact shipped to the users, as opposed to testing merely the Scala code. That way we eliminate the possibility of overlooking errors introduced in the process of creating the artifact out of the code

To get a feeling of how those tests are defined take a look at smoke-test.sc.
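The core of such a test is invoking the built binary as a subprocess and asserting on its exit code and output. The sketch below shows that idea with os-lib alone. It is not the code from smoke-test.sc: it deliberately swaps Docker isolation for a temporary HOME directory, SmokeTestSketch, Run and runIsolated are hypothetical names, and it assumes the program under test locates its data through the HOME environment variable.

```scala
object SmokeTestSketch {
  final case class Run(exitCode: Int, stdout: String)

  // Runs cmd with HOME pointed at a fresh temporary directory,
  // so that any state the program writes stays out of the real home directory.
  def runIsolated(cmd: Seq[String]): Run = {
    val home = os.temp.dir()
    val result = os
      .proc(cmd)
      .call(
        env = Map("HOME" -> home.toString),
        check = false // do not throw on a non-zero exit code, report it instead
      )
    Run(result.exitCode, result.out.text())
  }
}
```

With the real artifact this could be called as runIsolated(Seq("./teleport-scala", "list")), with the returned exit code and output checked inside a utest test.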

Performance

We went a long way to get a native executable, so the question arises - was it worth it? The size of the binary is around 12 MB, which is a decent size given that the fat JAR teleport-scala.jar weighs 20 MB and the binary is completely self-contained - no JVM needed. We could make it even smaller if we had not used --static, but that would contradict the idea of a standalone executable.

How about execution time? Let's check out the performance of native binary:

time ./teleport-scala list
./teleport-scala list  0,01s user 0,02s system 99% cpu 0,029 total

This is how JAR run with java performs:

time java -jar teleport-scala.jar list
java -jar teleport-scala.jar list  1,23s user 0,10s system 191% cpu 0,698 total

It's not a benchmark, but the difference is clear and as expected.

Conclusion

Thanks to libraries like decline, os-lib and fansi I found the development process to be productive and enjoyable. Native Image produces fast and small binaries. Scala as a language and an ecosystem lays stress on reliability and maintainability. I think all those factors combined make for a good choice for writing CLI tools.

There were some serious setbacks, like the missing native-image.properties or setting up the build on Windows. Yet I believe they are mostly one-off issues which, once solved, should not reappear in subsequent projects.

Of course, no amount of blog posts will make a real difference - we will see whether Scala becomes an attractive choice for CLIs by the number and quality of projects. Time will tell, but I feel a bright future might be ahead of us.

References and interesting links

Source code of teleport-scala (i.e. code described in this blog post)
How to work with Files in Scala and How to work with Subprocesses in Scala are must reads if you want to use os-lib
Picocli - a modern framework for building powerful, user-friendly, GraalVM-enabled command line apps. It's written in Java and its API, from what I've seen, is not something we would consider functional. However, it is amazingly rich with features. Some that caught my eye are:

an annotation processor that automatically Graal-enables your JAR during compilation. Look for picocli-codegen

tab autocompletion

generating manpage
And if you want to see a Scala application using Picocli then follow Creating CLI tools with Scala, Picocli and GraalVM - published just a day before I released this blog post
There are some good alternatives to decline: case-app and scopt
pureapp - a library for writing referentially transparent and stack-safe sequential programs. Its scope is much bigger than what I was interested in. If you care about managing the state of your CLI app in a purely functional manner, check it out.
Hints for writing UNIX tools - language agnostic guideline on how to create composable command line tools
Building Windows executables with GraalVM on Travis CI
Updates on Class Initialization in GraalVM Native Image Generation explains static initialization in Native Image
Instant Netty Startup ... is a good read to understand better limitations of Native Image
A few other articles about building Scala code with Native Image were published: one focusing on building lightweight Docker images and another one focusing on http4s
Presentation by Francois Farquet: Run Programs Faster With GraalVM
There's a GraalVM Native Image Plugin available as part of sbt-native-packager. I haven't used it here just because I was learning native-image and I preferred to use it directly