This is a "today I learned" kind of post. The code I want to show may appear boring by itself, as it just loads a file from resources. I found it interesting, though, because that code works when run from tests (e.g. with sbt test) but fails once packaged as a JAR. Realizing that was the beginning of an engaging investigation.

I am using Scala in this post, but the essence remains the same for any code targeting the JVM.

The case

Let's say we want to read a CSV file using the scala-csv library. CSVReader has an open method that accepts an argument of type java.io.File. Thus, provided we want to read a file from the filesystem, we can write something like this:

import java.io.File
import com.github.tototoshi.csv._

def readFromFilesystem: List[List[String]] = {
  CSVReader.open(new File("sample.csv")).all()
}

However, the case I want to focus on in this post is reading from a resource. We can start with the following code:

import java.nio.file.Paths
import com.github.tototoshi.csv._

def readAsResource: List[List[String]] = {
  val classloader = Thread.currentThread.getContextClassLoader
  val url = classloader.getResource("resource.csv")
  val file = Paths.get(url.toURI).toFile
  CSVReader.open(file).all()
}

It is slightly more involved, and that toURI looks a bit dubious, but let's give it a try. We will also write a test, so any potential problem should be caught by it.

We can call both methods from the main method:

object Main {
  def main(args: Array[String]): Unit = {
    println(s"readFromFilesystem: ${Reader.readFromFilesystem}")
    println(s"readAsResource: ${Reader.readAsResource}")
  }
}

Then we run it with sbt reStart, which produces the following result:

readFromFilesystem: List(List(a, b, c), List(d, e, f))
readAsResource: List(List(g, h, i), List(j, k, l))

This is exactly what is expected.

If we create a test, it passes as well:

[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Everything looks fine. Time to deploy, then?

> sbt assembly
...
> java --show-version -jar target/scala-2.13/read-resource-assembly-1.0.jar
openjdk 11.0.2 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
readFromFilesystem: List(List(a, b, c), List(d, e, f))
Exception in thread "main" java.nio.file.FileSystemNotFoundException
  at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:169)
  at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:155)
  at java.base/java.nio.file.Path.of(Path.java:208)
  at java.base/java.nio.file.Paths.get(Paths.java:97)
  at pl.msitko.Reader$.readAsResource(Reader.scala:21)
  at pl.msitko.Main$.main(Main.scala:10)
  at pl.msitko.Main.main(Main.scala)

Oops, that does not look good. Let's see what went wrong.

Diving in

If we print out classloader.getResource("resource.csv") in the packaged application, we will see:

jar:file:/path/to/the/project/target/scala-2.13/read-resource-assembly-1.0.jar!/resource.csv

By the way, if we print out the same URL during tests, the result is file:/path/to/the/project/target/scala-2.13/classes/resource.csv. That explains why the code worked when run as a test: during tests, the resource's URL points to the local file system.
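To see the two URL shapes side by side without sbt or an assembly JAR, here is a minimal, self-contained sketch. It assumes only the JDK and the Scala 2.13 standard library. It writes a resource.csv into a temporary directory (standing in for target/scala-2.13/classes) and into a temporary JAR (standing in for the assembly JAR), then asks a URLClassLoader for the resource in each location. The object and method names are hypothetical and come from neither the original project nor any library.

```scala
import java.net.{URL, URLClassLoader}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.jar.{JarEntry, JarOutputStream}

// Hypothetical demo object; not part of the original project
object ResourceUrls {
  private val content = "g,h,i\nj,k,l\n".getBytes(StandardCharsets.UTF_8)

  // Exploded layout, like target/scala-2.13/classes during `sbt test`
  def makeClassesDir(): Path = {
    val dir = Files.createTempDirectory("classes")
    Files.write(dir.resolve("resource.csv"), content)
    dir
  }

  // Packaged layout, like the JAR produced by `sbt assembly`
  def makeJar(): Path = {
    val jar = Files.createTempFile("app", ".jar")
    val out = new JarOutputStream(Files.newOutputStream(jar))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write(content)
      out.closeEntry()
    } finally out.close()
    jar
  }

  // Resolve "resource.csv" against a single classpath root.
  // Parent = null, so the application's own classpath is not consulted.
  def resourceUrl(root: Path): URL =
    new URLClassLoader(Array(root.toUri.toURL), null).getResource("resource.csv")

  def main(args: Array[String]): Unit = {
    println(resourceUrl(makeClassesDir())) // a file: URL
    println(resourceUrl(makeJar()))        // a jar:file:...!/resource.csv URL
  }
}
```

Only the first URL can be turned into a local path by the default file system. The second has the same jar:file:...!/resource.csv shape as the URL printed by the packaged application.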

The stack trace mentions ZipFileSystemProvider. After taking a look at its code and some legacy docs, we may try the following:

import java.net.URI
import java.nio.file.Paths
import java.nio.file.spi.FileSystemProvider
import scala.jdk.CollectionConverters._
import com.github.tototoshi.csv._

def readAsResource: List[List[String]] = {
  val classloader = Thread.currentThread.getContextClassLoader
  val url = classloader.getResource("resource.csv")

  // the next three lines are new compared to the previous code
  val jarProvider = FileSystemProvider.installedProviders.asScala.find(_.getScheme == "jar").get
  val jarUrl = new URI("jar:file:/path/to/the/project/target/scala-2.13/read-resource-assembly-1.0.jar")
  jarProvider.newFileSystem(jarUrl, Map.empty[String, Any].asJava)

  val file = Paths.get(url.toURI).toFile
  CSVReader.open(file).all()
}

That code is quite naive and assumes we know the location of the JAR file beforehand, but we are just playing around here. It yields:

readFromFilesystem: List(List(a, b, c), List(d, e, f))
Exception in thread "main" java.lang.UnsupportedOperationException
  at jdk.zipfs/jdk.nio.zipfs.ZipPath.toFile(ZipPath.java:661)
  at pl.msitko.Reader$.readAsResource(Reader.scala:25)
  at pl.msitko.Main$.main(Main.scala:10)
  at pl.msitko.Main.main(Main.scala)

There is some progress: instead of the previous FileSystemNotFoundException, we now get an UnsupportedOperationException. After looking at the ZipPath.toFile implementation, the culprit seems obvious:

@Override
public final File toFile() {
  throw new UnsupportedOperationException();
}

That implementation makes sense, considering that java.io.File is meant to model local files. There is simply no local path for a collection of bytes inside a ZIP file (a JAR is technically a ZIP file). To conclude: when the application runs from a JAR, the URL returned by ClassLoader.getResource cannot be converted to a java.io.File, because a resource inside a JAR cannot be expressed as a java.io.File.
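Both failures can be reproduced in isolation. The sketch below uses the JDK and Scala 2.13 standard library only, and its names are hypothetical. It packs resource.csv into a temporary JAR and records which exception each step throws. The first step calls Paths.get on the jar: URI before any zip file system is open. The second calls Path.toFile after a zip file system has been opened. FileSystems.newFileSystem is used here as a shortcut. It replaces looking up the "jar" provider by hand, as the code above does, and it takes the resource's own URI instead of a hardcoded JAR location.

```scala
import java.net.URLClassLoader
import java.nio.charset.StandardCharsets
import java.nio.file.{FileSystems, Files, Path, Paths}
import java.util.Collections
import java.util.jar.{JarEntry, JarOutputStream}

// Hypothetical demo object; not part of the original project
object JarResourceAsFile {
  def makeJar(): Path = {
    val jar = Files.createTempFile("app", ".jar")
    val out = new JarOutputStream(Files.newOutputStream(jar))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write("g,h,i\nj,k,l\n".getBytes(StandardCharsets.UTF_8))
      out.closeEntry()
    } finally out.close()
    jar
  }

  // Runs `body` and reports "ok" or the simple name of the exception it threw
  def attempt(body: => Any): String =
    try { body; "ok" } catch { case e: Exception => e.getClass.getSimpleName }

  def run(): (String, String) = {
    val loader = new URLClassLoader(Array(makeJar().toUri.toURL), null)
    val uri = loader.getResource("resource.csv").toURI // jar:file:...!/resource.csv

    // Step 1: no zip file system has been opened for this JAR yet
    val beforeOpen = attempt(Paths.get(uri))

    // Step 2: open one; Paths.get now resolves, but a zip Path has no java.io.File form
    val fs = FileSystems.newFileSystem(uri, Collections.emptyMap[String, Any]())
    val afterOpen =
      try attempt(Paths.get(uri).toFile)
      finally fs.close()

    (beforeOpen, afterOpen)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Running it should report FileSystemNotFoundException for the first step and UnsupportedOperationException for the second. These match the two stack traces above.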

Back to the initial task

With that conclusion, we can go back to the initial scala-csv example. Another ClassLoader method for working with resources is getResourceAsStream. We cannot use it directly, as CSVReader has no API entry that accepts an InputStream. Fortunately, among the numerous overloaded CSVReader.open methods there is one that takes a java.io.Reader as an argument. So we can wrap the stream in an InputStreamReader and rewrite the code that loads the CSV from a resource:

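import java.io.InputStreamReader
import java.nio.charset.StandardCharsets
import scala.util.Using
import com.github.tototoshi.csv._

def readResourceUsingReader: List[List[String]] = {
  val classloader = Thread.currentThread.getContextClassLoader
  val stream = classloader.getResourceAsStream("resource.csv") // null if the resource is missing
  // Using.resource closes the reader (and the underlying stream) once all rows are read
  Using.resource(new InputStreamReader(stream, StandardCharsets.UTF_8)) { reader =>
    CSVReader.open(reader).all()
  }
}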

By using getResourceAsStream, we avoid java.io.File altogether. The same code works whether the resource sits in a directory (as during tests) or inside a JAR.
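The stream-based approach can be checked without scala-csv as well. The sketch below makes several substitutions. It uses scala.io.Source and a naive split on commas (no quoting support) instead of CSVReader, and a temporary JAR instead of the assembly JAR. All names are hypothetical. It only shows that getResourceAsStream reads a resource packed inside a JAR without ever touching java.io.File.

```scala
import java.net.URLClassLoader
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.jar.{JarEntry, JarOutputStream}
import scala.io.Source
import scala.util.Using

// Hypothetical demo object; not part of the original project
object ReadViaStream {
  def makeJar(): Path = {
    val jar = Files.createTempFile("app", ".jar")
    val out = new JarOutputStream(Files.newOutputStream(jar))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write("g,h,i\nj,k,l\n".getBytes(StandardCharsets.UTF_8))
      out.closeEntry()
    } finally out.close()
    jar
  }

  // Naive CSV parsing (split on commas) stands in for CSVReader here
  def readRows(loader: ClassLoader, name: String): List[List[String]] = {
    // getResourceAsStream returns null, rather than throwing, when the resource is missing
    val stream = Option(loader.getResourceAsStream(name))
      .getOrElse(throw new IllegalArgumentException(s"Resource not found: $name"))
    Using.resource(Source.fromInputStream(stream, "UTF-8")) { source =>
      // Materialize with toList before the source is closed
      source.getLines().map(_.split(",", -1).toList).toList
    }
  }

  def run(): List[List[String]] =
    readRows(new URLClassLoader(Array(makeJar().toUri.toURL), null), "resource.csv")

  def main(args: Array[String]): Unit = println(run())
}
```

With the sample content above, run() returns List(List(g, h, i), List(j, k, l)), the same rows readAsResource produced under sbt.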

More on Zip File System Provider

Since Java SE 7, a Zip File System Provider has been included with the JDK. Using newFileSystem, we managed to make it work and resolve the resource URL into a Path. Thanks to that, we can use any API that accepts a Path. For example, we can read all bytes of the resource with Files.readAllBytes.

That being said, that code is quite hacky, and I would consider it a last-resort solution.
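If a Path is really needed, the hardcoded JAR location can at least be avoided. The jar: URI returned for the resource already identifies the JAR, and FileSystems.newFileSystem accepts that URI directly. Below is a rough sketch under the same assumptions as before: JDK and Scala 2.13 standard library only, a temporary JAR, and hypothetical names. It falls back to the default file system for file: URLs such as the ones seen during tests.

```scala
import java.net.URLClassLoader
import java.nio.charset.StandardCharsets
import java.nio.file.{FileSystems, Files, Path, Paths}
import java.util.Collections
import java.util.jar.{JarEntry, JarOutputStream}

// Hypothetical demo object; not part of the original project
object ReadViaZipFs {
  def makeJar(): Path = {
    val jar = Files.createTempFile("app", ".jar")
    val out = new JarOutputStream(Files.newOutputStream(jar))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write("g,h,i\nj,k,l\n".getBytes(StandardCharsets.UTF_8))
      out.closeEntry()
    } finally out.close()
    jar
  }

  // Reads a classpath resource through java.nio, whether it sits in a directory or a JAR.
  // Assumes the resource exists (getResource returns null otherwise).
  def readResourceBytes(loader: ClassLoader, name: String): Array[Byte] = {
    val uri = loader.getResource(name).toURI
    if (uri.getScheme == "jar") {
      // Throws FileSystemAlreadyExistsException if this JAR's zip file system is already open
      val fs = FileSystems.newFileSystem(uri, Collections.emptyMap[String, Any]())
      try Files.readAllBytes(Paths.get(uri))
      finally fs.close()
    } else Files.readAllBytes(Paths.get(uri))
  }

  def run(): String = {
    val loader = new URLClassLoader(Array(makeJar().toUri.toURL), null)
    new String(readResourceBytes(loader, "resource.csv"), StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = print(run())
}
```

Closing the file system right after reading keeps the sketch simple. A concurrent call for the same JAR could still fail with FileSystemAlreadyExistsException, so real code would need to handle that case.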

Key takeaways

  1. As a library developer, you should provide alternatives to APIs that take a java.io.File. java.nio.file.Path is probably a good choice, as it is more general: it can also point into non-default file systems, such as the zip one.
  2. Keep in mind that if your application is packaged as a JAR, there is no resource file at runtime. There is only a single JAR file and a classloader that knows how to resolve a resource path within it. While this may sound obvious to many readers, it can be counterintuitive, because developers spend most of their time developing and running code locally. At development time, the simple association resource = file holds, but at runtime it no longer does.
  3. As a consequence of the above, be cautious with java.lang.ClassLoader.getResource: for a resource inside a JAR, it returns a URL that cannot be converted to a java.io.File. What is worse, you will only learn about it after packaging and running the code.
  4. Be mindful of the differences between the environment in which you run tests and the production environment. The example described here is just one of several differences between running JVM code inside your build tool and from within a JAR.
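To illustrate the first takeaway, here is a hypothetical, deliberately naive CSV API. It splits on commas without quoting support and accepts a java.io.Reader or a java.nio.file.Path rather than a java.io.File. This is not scala-csv's API. It only shows that a Path-based entry point keeps working for a Path inside a JAR, where a File-based one cannot be called at all.

```scala
import java.io.{BufferedReader, Reader, StringReader}
import java.net.URI
import java.nio.charset.StandardCharsets
import java.nio.file.{FileSystems, Files, Path}
import java.util.Collections
import java.util.jar.{JarEntry, JarOutputStream}
import scala.util.Using

// Hypothetical library API (not scala-csv): naive CSV, no quoting or escaping
object TinyCsv {
  // Most general entry point: any character source
  def read(reader: Reader): List[List[String]] =
    Using.resource(new BufferedReader(reader)) { in =>
      Iterator.continually(in.readLine()).takeWhile(_ != null).map(_.split(",", -1).toList).toList
    }

  // A Path works with any file system provider, including the zip one
  def read(path: Path): List[List[String]] =
    read(Files.newBufferedReader(path, StandardCharsets.UTF_8))
}

// Hypothetical usage demo
object TinyCsvDemo {
  def makeJar(): Path = {
    val jar = Files.createTempFile("app", ".jar")
    val out = new JarOutputStream(Files.newOutputStream(jar))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write("g,h,i\nj,k,l\n".getBytes(StandardCharsets.UTF_8))
      out.closeEntry()
    } finally out.close()
    jar
  }

  def fromString(): List[List[String]] = TinyCsv.read(new StringReader("a,b,c\nd,e,f"))

  def fromJarEntry(): List[List[String]] = {
    // Open the JAR itself as a file system and address the entry by its path inside it
    val jarUri = URI.create("jar:" + makeJar().toUri)
    val fs = FileSystems.newFileSystem(jarUri, Collections.emptyMap[String, Any]())
    try TinyCsv.read(fs.getPath("/resource.csv"))
    finally fs.close()
  }
}
```

The Reader overload serves callers who only have a stream, such as getResourceAsStream. The Path overload serves callers who have a Path from any provider.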

GitHub repository

Repository with code used in this article