Jekyll2021-01-10T12:38:47+01:00http://msitko.pl/blog/feed.xmlMichał Sitko blogRandom thoughts on Software DevelopmentA few problems with Helm (that don’t exist in Dhall)2021-01-07T06:49:27+01:002021-01-07T06:49:27+01:00http://msitko.pl/blog/2021/01/07/a-few-problems-with-helm<p>A few weeks ago I had my first opportunity to work with Helm. As part of my assignment, I had to adjust some parts of Helm templates. It looked like a relatively small task, so I decided to rely on a combination of using existing templates as a reference and reading documentation just in time. A common practice in days of an abundance of tools, I believe.</p>
<p>If you have a lot of experience using Helm, you may sum up observations in the rest of this post with a sigh of “of course it works like that, it’s Helm 101”. Yet, I think it’s valuable to gather this kind of outsider feedback because insiders already know all gotchas and tend not to notice them anymore. And, <a href="https://twitter.com/danluu/status/917241999006294016/photo/1">contrary to what some people claim</a>, I don’t think gotchas are facts of nature but quite often stem from the wrong underlying model.</p>
<p>I could have limited myself to writing a few sentences of conclusions and the post wouldn’t lose too much of its purely technical substance but providing some narrative enables you to see how I came to those conclusions.</p>
<h3 id="use-case">Use case</h3>
<p>In short - I had to create a duplicate of some preexisting Helm-defined job. That duplicate was supposed to have a slightly different configuration from the original one.</p>
<p>Let’s say the original configuration looked like this (a fragment of <code class="highlighter-rouge">values.yaml</code>):</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">application_config</span><span class="pi">:</span>
<span class="na">host</span><span class="pi">:</span> <span class="s2">"</span><span class="s">localhost"</span>
<span class="na">port</span><span class="pi">:</span> <span class="m">1234</span>
</code></pre></div></div>
<p>In the new job I wanted to use the same <code class="highlighter-rouge">application_config</code> with <code class="highlighter-rouge">port</code> overridden to <code class="highlighter-rouge">9876</code>. After a quick lookup I found <code class="highlighter-rouge">mergeOverwrite</code>. Therefore, my first attempt to define the <code class="highlighter-rouge">ConfigMaps</code> was as follows (<code class="highlighter-rouge">templates/configmaps.yaml</code>):</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">{{</span> <span class="nv">/* The new job's configuration with slightly modified `data` */</span> <span class="pi">}}</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">port_overriden</span>
<span class="na">data</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">(mergeOverwrite .Values.application_config (dict "port" 9876)) | toYaml | nindent 2</span> <span class="pi">}}</span>
<span class="nn">---</span>
<span class="pi">{{</span> <span class="nv">/* The original job's configuration */</span> <span class="pi">}}</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">original</span>
<span class="na">data</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">.Values.application_config | toYaml | nindent 2</span> <span class="pi">}}</span>
</code></pre></div></div>
<h3 id="problem-1-mergeoverwrite-mutates-source-dictionary">Problem 1: mergeOverwrite mutates source dictionary</h3>
<p><a href="https://helm.sh/docs/chart_template_guide/function_list/#mergeoverwrite-mustmergeoverwrite">The documentation</a> of <code class="highlighter-rouge">mergeOverwrite</code> mentions that:</p>
<blockquote>
<p>Nested objects that are merged are the same instance on both dicts. If you want a deep copy along with the merge than use the deepCopy function along with merging.</p>
</blockquote>
<p>Since it was my first time using Helm I was not sure if I understood its terminology correctly. Seeing “deep copy” being mentioned I got a vague feeling that Helm allows mutability. To determine if that’s true I ran <code class="highlighter-rouge">helm template .</code> which yielded the following result:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Source: mychart/templates/configmaps.yaml</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">port_overriden</span>
<span class="na">data</span><span class="pi">:</span>
<span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
<span class="na">port</span><span class="pi">:</span> <span class="m">9876</span> <span class="c1"># That's fine</span>
<span class="nn">---</span>
<span class="c1"># Source: mychart/templates/configmaps.yaml</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">original</span>
<span class="na">data</span><span class="pi">:</span>
<span class="na">host</span><span class="pi">:</span> <span class="s">localhost</span>
<span class="na">port</span><span class="pi">:</span> <span class="m">9876</span> <span class="c1"># I intended it to stay 1234!</span>
</code></pre></div></div>
<p>The result speaks for itself - <code class="highlighter-rouge">mergeOverwrite</code> overrode the original dictionary! I can easily imagine someone not checking the documentation and expecting <code class="highlighter-rouge">mergeOverwrite</code> to create a new dictionary with the chosen values overridden. Apparently I was not the first one to find this behavior problematic - there is a <a href="https://github.com/Masterminds/sprig/issues/188">GitHub issue</a> for that. I must admit that things have improved: the sentence I quoted above was added as the resolution of that issue, and it helped me spot the problem right away.</p>
<p>What puzzles me the most here is not <code class="highlighter-rouge">mergeOverwrite</code> itself but the very fact that a configuration language allows mutability at all. I cannot imagine any use case in which you need to model mutability using a configuration language. It stands in contrast to general-purpose languages - most of them have to support mutability in some way because people are supposed to write long-running, stateful programs in them.</p>
<p>However, Helm is clearly not a general-purpose programming language in which you would write a web scraper or web service. <code class="highlighter-rouge">helm template</code> should just resolve templates, which is a single atomic operation with no notion of time. There’s only input and output.</p>
<p>Introducing mutability into templating brings a whole class of issues without solving any problem on its own. Have I mentioned that in the example above, if I change the ordering of <code class="highlighter-rouge">ConfigMaps</code> so <code class="highlighter-rouge">original</code> comes before <code class="highlighter-rouge">port_overriden</code>, then I would get the intended result? The thing is that I don’t want the semantics of resolution to depend on the ordering of fragments. It’s something that Terraform gets right - it builds a graph of dependencies so you don’t have to worry about the ordering of things.</p>
<p>Knowing that problem, we can fix it easily - add <code class="highlighter-rouge">deepCopy</code> like this:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">data</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">(mergeOverwrite (deepCopy .Values.application_config) (dict "port" 9876)) | toYaml | nindent 2</span> <span class="pi">}}</span>
</code></pre></div></div>
<h3 id="problem-2-ternary-evaluates-parameters-greedily">Problem 2: ternary evaluates parameters greedily</h3>
<p>To appreciate what a Pandora’s box mutability opens, let’s take a look at <a href="https://helm.sh/docs/chart_template_guide/function_list/#ternary"><code class="highlighter-rouge">ternary</code></a>. This function does what the ternary operator does in many languages - picks one of two expressions depending on a predicate.</p>
<p>Let’s imagine we want to override <code class="highlighter-rouge">port</code> in the configuration based on some predicate:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">port_overriden</span>
<span class="na">data</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">ternary ((mergeOverwrite .Values.application_config (dict "port" 9876)) | toYaml | nindent 2) .Values.application_config predicate</span> <span class="pi">}}</span>
</code></pre></div></div>
<p>This overrides <code class="highlighter-rouge">port</code> to <code class="highlighter-rouge">9876</code> if <code class="highlighter-rouge">predicate</code> is true and uses the original <code class="highlighter-rouge">application_config</code> otherwise. How about side effects though? Will <code class="highlighter-rouge">application_config</code> be touched if <code class="highlighter-rouge">predicate</code> is false? It took me one <code class="highlighter-rouge">helm template</code> run to understand that <code class="highlighter-rouge">ternary</code> evaluates its parameters greedily. It means that regardless of the value of <code class="highlighter-rouge">predicate</code> the original <code class="highlighter-rouge">application_config</code> will be overridden. In this particular case it defeats the whole point of using <code class="highlighter-rouge">ternary</code> because <code class="highlighter-rouge">port</code> will be set to <code class="highlighter-rouge">9876</code> regardless of the predicate.</p>
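<p>One way to sidestep the greedy evaluation is an <code class="highlighter-rouge">if</code>/<code class="highlighter-rouge">else</code> action, because the template engine executes only the branch that is taken. The following is a minimal sketch, not a tested chart: it assumes the predicate lives in a hypothetical <code class="highlighter-rouge">.Values.override_port</code> value, and it adds <code class="highlighter-rouge">deepCopy</code> so that the original dictionary stays intact even when the override branch runs:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>metadata:
  name: port_overriden
data:
{{- if .Values.override_port }}
  {{- /* Only evaluated when the (hypothetical) override_port value is truthy */ -}}
  {{- mergeOverwrite (deepCopy .Values.application_config) (dict "port" 9876) | toYaml | nindent 2 }}
{{- else }}
  {{- .Values.application_config | toYaml | nindent 2 }}
{{- end }}
</code></pre></div></div>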
<p>Again, I am not assessing the specific design of Helm here but trying to make a larger point. I don’t even know Go, in which Helm is written, so I cannot tell if lazy evaluation would be actually possible here.</p>
<h3 id="a-few-other-problems">A few other problems</h3>
<p>Since Helm templates are embedded into YAML, whitespace is significant. It introduces additional complexity, especially with conditionals - you need to be very careful to distinguish between <code class="highlighter-rouge">{{</code> and <code class="highlighter-rouge">{{-</code>.</p>
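<p>As a small illustration of the difference: <code class="highlighter-rouge">{{-</code> strips all whitespace, newlines included, immediately before the action, and <code class="highlighter-rouge">-}}</code> does the same after it. In the sketch below <code class="highlighter-rouge">.Values.debug</code>, <code class="highlighter-rouge">log_level</code> and <code class="highlighter-rouge">log_format</code> are made-up names used only for this example:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data:
  host: localhost
  # Plain delimiters: the lines holding `if` and `end` stay behind as whitespace-only lines
  {{ if .Values.debug }}
  log_level: debug
  {{ end }}
  # Trimming delimiters: the lines holding `if` and `end` vanish from the output
  {{- if .Values.debug }}
  log_format: json
  {{- end }}
</code></pre></div></div>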
<p>Another curious Helm limitation is that conditionals and local variables <a href="https://stackoverflow.com/questions/57600772/is-it-possible-to-define-variables-use-if-else-condition-in-helm-chart">do not easily compose</a>.</p>
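<p>The underlying issue is scoping: a variable declared with <code class="highlighter-rouge">:=</code> inside an <code class="highlighter-rouge">if</code> block is gone once the block ends. A common workaround, sketched here with a hypothetical <code class="highlighter-rouge">.Values.override_port</code> predicate, is to declare the variable up front and reassign it with <code class="highlighter-rouge">=</code>, which Go templates support since Go 1.11:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{{- /* Declared inside the `if`, $port would go out of scope at `end` */ -}}
{{- $port := 1234 }}
{{- if .Values.override_port }}
  {{- /* `=` reassigns the outer variable, `:=` would shadow it */ -}}
  {{- $port = 9876 }}
{{- end }}
port: {{ $port }}
</code></pre></div></div>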
<h3 id="how-does-it-compare-to-dhall">How does it compare to Dhall?</h3>
<p><a href="https://dhall-lang.org/">Dhall</a> is a configuration language. You can think of it as a better YAML. It offers variables, functions and types, and it embraces immutability.
I wrote about Dhall <a href="https://msitko.pl/blog/2019/03/13/replace-json-with-dhall.html">a long time ago</a> so I am not going to dwell on it too much here, but I want to point out that none of the problems described in this post exist in Dhall.</p>
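<p>For a rough idea of what the use case from this post could look like, here is a minimal Dhall sketch. The record and field names are my own choice for illustration; <code class="highlighter-rouge">//</code> is Dhall’s right-biased record merge, and since every value is immutable it produces a new record and leaves <code class="highlighter-rouge">applicationConfig</code> as it was:</p>
<div class="language-dhall highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let applicationConfig = { host = "localhost", port = 1234 }

in  { original = applicationConfig
    , port_overriden = applicationConfig // { port = 9876 }
    }
</code></pre></div></div>
<p>The order of the two fields makes no difference here: <code class="highlighter-rouge">original.port</code> stays <code class="highlighter-rouge">1234</code> either way.</p>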
<p>I am aware that comparing Helm to Dhall is not comparing apples to apples. What I am doing here is rather showing very concrete problems with Helm that would not appear in a hypothetical, yet-to-be-written tool based on Dhall. And I am not making these problems up - I stumbled upon all of them in just a few hours of work.</p>
A beginners step by step guide to Alloy2020-05-24T10:40:00+02:002020-05-24T10:40:00+02:00http://msitko.pl/blog/2020/05/24/guide-to-alloy
<p>This post assumes:</p>
<ul>
<li>you already know that you want to learn some Alloy. That means I will not try to convince you that formal methods are useful or that you should study Alloy. Examples I will use are small and easy to follow as opposed to demonstrating the full power of the tool</li>
<li>you haven’t written a single line of Alloy and don’t have anything installed. You start from zero, and therefore I will cover installation and the basics</li>
</ul>
<p>There are a few Alloy tutorials on the web, but they do not sufficiently explain the basics. For example, all of them skip the installation step. Out of my frustration at the lack of entry-level material, I decided to write this tutorial.</p>
<p>The core parts of this tutorial are videos. I included installation instructions and all code snippets in the text itself to make it easier for you to copy-paste them. At the end of the article you can find links to interesting resources.</p>
<p>Why video? Even though, as a recipient, I prefer to study technical content in text form, I think in the case of Alloy it might be easier to explain things with video. The first reason is that Alloy Analyzer, the only way to use Alloy apart from the Java API, is a GUI program. Another reason is that the result of running an Alloy specification is usually a sequence of diagrams. You need to learn how to interpret those to be able to work with Alloy effectively. I believe video can be useful in that regard.</p>
<p>I am an Alloy beginner myself. Thus the code I write might not be idiomatic and my explanations might be superficial. On the other hand, it puts me in a good position for writing a beginner’s tutorial as I know what I was struggling with.</p>
<h3 id="installation">Installation</h3>
<ol>
<li>Go to https://github.com/AlloyTools/org.alloytools.alloy/releases and download the latest release. At the time of writing this guide it’s 5.1.0, which is the version I used for recording.</li>
<li>To start Alloy Analyzer GUI run: <code class="highlighter-rouge">java -jar org.alloytools.alloy.dist.jar</code></li>
</ol>
<h3 id="part-1-alloy-basics">Part 1: Alloy basics</h3>
<p><strong>Link to the video: <a href="https://www.youtube.com/watch?v=Sf8iWVvkWQ4&list=PLBogxgC0FgFq8ntU93oe4NHQn3blumTbw&index=2&t=0s">video</a></strong></p>
<p><strong>Link to the code: <a href="https://gist.github.com/note/66ff8c760e3d15ff658560f1873d5cb4">code</a></strong></p>
<h3 id="part-2-static-modelling-with-alloy">Part 2: Static modelling with Alloy</h3>
<p><strong>Link to the video: <a href="https://www.youtube.com/watch?v=UMViSWiFwKE&list=PLBogxgC0FgFq8ntU93oe4NHQn3blumTbw&index=2">video</a></strong></p>
<p><strong>Link to the code: <a href="https://gist.github.com/note/e2c79ab44d445faf3fc6f1977e4ce397">code</a></strong></p>
<p>Initial code for Einstein’s Riddle: <a href="https://gist.github.com/note/6198d7d3e01fd8220cd683b1f4320d5d">code</a>. It contains some basic definitions but you need to encode all the clues as predicates to find the solution.</p>
<h3 id="next-parts---to-be-done">Next parts - to be done?</h3>
<p>Recording the first two parts was a time-consuming endeavour, so I decided to stop right here, at least for now. I may record the next parts one day.</p>
<h3 id="where-to-go-next">Where to go next</h3>
<p>If you are interested in learning more than was covered in the videos, you have a few options:</p>
<ul>
<li><a href="http://alloytools.org/tutorials/online/index.html">Tutorial for Alloy Analyzer 4.0</a> - it covers similar material to the videos, although it does not explain how to install Alloy Tools, work with it and interpret generated examples. If you went through my videos you should be able to follow all the examples easily</li>
<li><a href="https://mitpress.mit.edu/books/software-abstractions-revised-edition">Software Abstractions</a> book by Alloy’s creator - Daniel Jackson. When buying it, check whether you are getting the revised edition from 2012. Unfortunately, the revised edition is available only in hardcover and its availability may depend on your location. I therefore ended up with the Kindle version, which is the original 2006 edition. <br /> Regardless of those logistical issues, if you become serious about learning Alloy you will certainly need to read this book. It’s very well written and explains design considerations in depth. Like the other resources, though, it does not explain much about using the GUI itself</li>
<li><a href="https://alloy.readthedocs.io/en/latest/">AlloyDocs</a> by Hillel Wayne. This resource was published quite recently and, as <a href="https://alloy.readthedocs.io/en/latest/intro.html#about-this-guide">it states</a>, it aims to be a reference as opposed to a tutorial</li>
</ul>
<h3 id="links-and-references">Links and References</h3>
<h4 id="motivational-resources">Motivational resources</h4>
<p>Those may interest you if you wonder why you should even care:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=_9B__0S21y8">Tackling Concurrency Bugs with TLA+</a> by Hillel Wayne</li>
<li><a href="https://www.youtube.com/watch?v=FvNRlE4E9QQ">Finding bugs without running or even looking at code</a> by Jay Parlar</li>
</ul>
<p>These videos made me realize that formal methods do not need to revolve around costly code verification. Instead, with a small initial investment, you can do design verification and use formal methods as a vehicle for exploring your domain. Something similar in spirit to DDD, UML, or working on a problem at a whiteboard with colleagues.</p>
<p><em>By the way, Hillel Wayne also writes a fascinating <a href="https://hillelwayne.com/post/">blog</a> in which, among other things, he writes on Alloy, TLA+ and empirical Software Engineering.</em></p>
<h4 id="real-world-applications-of-modeling-with-alloy">Real-world applications of modeling with Alloy</h4>
<ul>
<li><a href="https://www.semanticscholar.org/paper/Using-lightweight-modeling-to-understand-chord-Zave/fe129dbf40ae69a028960df0328d85d2e2808d41">Using Lightweight Modeling To Understand Chord</a> - a classic paper in which Pamela Zave shows that the Chord protocol does not uphold the guarantees it claims to provide</li>
<li><a href="https://www.semanticscholar.org/paper/A-practical-comparison-of-Alloy-and-Spin-Zave/d742a097402008d4097ad58ecc60a7a95d438ad7">A Practical Comparison of Alloy and Spin</a> by Pamela Zave</li>
<li><a href="http://aosabook.org/en/500L/the-same-origin-policy.html">The Same-Origin Policy</a> - Eunsuk Kang et al., a chapter of <a href="http://aosabook.org/en/index.html">The Architecture of Open Source Applications</a></li>
</ul>
<h4 id="other-interesting-links">Other interesting links</h4>
<ul>
<li><a href="https://lamport.azurewebsites.net/tla/formal-methods-amazon.pdf">Use of Formal Methods at Amazon Web Services</a> by Chris Newcombe et al.</li>
<li><a href="https://www.semanticscholar.org/paper/An-Empirical-Study-on-the-Correctness-of-Formally-Fonseca-Zhang/2817df10c4ffe29482928cb97b8ee89d8560b4cd">An Empirical Study on the Correctness of Formally Verified Distributed Systems</a> by Pedro Fonseca et al. It’s an amazing paper, one of the very few empirical studies on the correctness of verified systems and it helps to understand the place of formal methods in the bigger picture of delivering software</li>
<li><a href="https://www.semanticscholar.org/paper/Alloy-meets-TLA%2B%3A-An-exploratory-study-Macedo-Cunha/476dfe02abcb0f824df56b30dd360d444dd3f26b">Alloy meets TLA+: An exploratory study</a> by Nuno Macedo, Alcino Cunha</li>
</ul>
Isolated environments with nix-shell and zsh2020-04-22T10:40:00+02:002020-04-22T10:40:00+02:00http://msitko.pl/blog/2020/04/22/isolated-ennvironments-with-nix-shell-and-zsh
<p><em>If you’ve heard about Nix before and want to try it out for a simple use case, you can skip directly to the “Before continuing” section</em></p>
<p>If you’re an occasional user of NPM or PIP you have probably experienced the highly frustrating process of going through a series of StackOverflow-guided <code class="highlighter-rouge">pip --upgrade</code>, <code class="highlighter-rouge">pip uninstall this</code>, <code class="highlighter-rouge">sudo rm -rf that</code> just to install a tiny package you need to run only once. After 2 hours the best option seems to be wiping out PIP altogether and installing it afresh… Eventually, after reinstalling PIP, you manage to run the command you needed. You are quite sure you broke a few other things along the way, but well, at least you finished your task. Tomorrow will be the day to tidy your system up.</p>
<p>I went through the above process more times than I would like. Although I don’t have hard data on this, everyone I talked to had this kind of problem a few times as well. The popularity of <a href="https://stackoverflow.com/questions/49836676/error-after-upgrading-pip-cannot-import-name-main">StackOverflow threads</a> related to corrupted PIP and NPM states and <a href="https://twitter.com/li_haoyi/status/1242715533805375490">twitter</a> <a href="https://twitter.com/li_haoyi/status/1242588785109397505">mentions</a> suggests it is a widespread issue indeed.</p>
<p>I’ve heard a lot of good things about Nix and experimented with it a bit but I was missing a useful enough, and at the same time easy enough, use case to put it into practice. I got a new laptop some time ago and for the sake of “stepping out of comfort zone” I decided not to install any packages using PIP or NPM on that machine. Instead, I resolved to install as much as I can using Nix.</p>
<p>Nix is a package manager, so you can think of it as a replacement for, say, apt-get or brew. In this blog post I want to focus on nix-shell, which uses Nix to create isolated, project-specific environments - something similar in spirit to virtualenv.</p>
<p>Keep in mind it’s just one of many use cases of Nix, yet it is perfect for educational purposes. It solves a real problem with only a small investment of effort in learning Nix. Also, there’s no risk of corrupting any system-wide package as Nix keeps all its files in the <code class="highlighter-rouge">/nix</code> directory.</p>
<h3 id="why-nix">Why Nix</h3>
<p>It’s common in the field of Software Engineering that a new tool promises to solve issues of its predecessors without providing any rationale, in-depth analysis or even basic research into prior work. The result of such an approach is a multitude of tools that do not differ that much from each other - all of them sharing similar strengths and weaknesses. That’s not the case with Nix though.</p>
<p>Nix’s origin traces back to Eelco Dolstra’s PhD <a href="https://www.semanticscholar.org/paper/The-purely-functional-software-deployment-model-Dolstra/7c9d53d567c4db2034d8019ff11e0eb623fe2142">dissertation</a> in which he rethought package management from the ground up. He envisioned Nix as a purely functional package manager built with full reproducibility in mind. I don’t want to elaborate too much on Nix design here, especially given that there are a lot of resources on that. A good starting point might be <a href="https://nixos.org/nix/manual/#ch-about-nix">the first chapter</a> of the Nix manual or the <a href="https://vimeo.com/showcase/4676161/video/223525975">presentation</a> by Joël Franusic.</p>
<h3 id="before-continuing">Before continuing</h3>
<p>In the remainder of this article I assume you have Nix installed. I recommend completing the <a href="https://nixos.org/nix/manual/#chap-quick-start">Quick Start</a> - it should not take more than 10 minutes and it will walk you through the installation and the very basics.</p>
<h3 id="example-1---awscli">Example 1 - awscli</h3>
<p>In the first example, we will install AWS CLI version 1. I used to install it with PIP as described <a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv1.html#install-tool-pip">here</a> but I wanted to use Nix this time.</p>
<p>First I verified I don’t have a system-wide AWS CLI installed:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> which aws
aws not found
</code></pre></div></div>
<p>My task is well defined now - it’s to bring <code class="highlighter-rouge">aws</code> into scope. Let’s start by creating a directory for the project:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir </span>aws-cli
<span class="nb">cd </span>aws-cli
</code></pre></div></div>
<p><em>zsh trivia: with oh-my-zsh you can accomplish the same with a single command: <code class="highlighter-rouge">take aws-cli</code></em></p>
<p>To install it within the newly created directory I saved a file named <code class="highlighter-rouge">default.nix</code> with the following content:</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">with</span> <span class="kr">import</span> <span class="o"><</span><span class="nv">nixpkgs</span><span class="o">></span> <span class="p">{};</span>
<span class="nv">stdenv</span><span class="o">.</span><span class="nv">mkDerivation</span> <span class="kr">rec</span> <span class="p">{</span>
<span class="nv">name</span> <span class="o">=</span> <span class="s2">"aws-cli"</span><span class="p">;</span>
<span class="nv">buildInputs</span> <span class="o">=</span> <span class="p">[</span> <span class="nv">awscli</span> <span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
<p>With this file in the current directory I can run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> nix-shell
</code></pre></div></div>
<p>It will trigger building a new environment, namely downloading all declared artifacts and their dependencies from <code class="highlighter-rouge">cache.nixos.org</code> and/or building them from sources. It may take some time the first time, but eventually you should end up in another shell. Let’s verify whether AWS CLI was installed there:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> which aws
/nix/store/kc8s3h40kbzlwa9al2yhnwy0gvjxcslf-awscli-1.17.13/bin/aws
</code></pre></div></div>
<p>It was enough to create an isolated environment with AWS CLI available. Let’s step back and understand what happened:</p>
<ul>
<li>If you wonder what those 5 lines of Nix code actually do, please read <a href="https://www.sam.today/blog/environments-with-nix-shell-learning-nix-pt-1/">this short article</a>, which focuses mostly on that. From a pragmatic point of view, merely to be able to use nix-shell for simple use cases, you can copy-paste that file content and just edit <code class="highlighter-rouge">buildInputs</code> according to your needs.</li>
<li><code class="highlighter-rouge">name</code>, set to <code class="highlighter-rouge">aws-cli</code> above, serves only informational purposes in the case of nix-shell. We will display it as part of the shell prompt later on.</li>
<li>How did I know that I should use <code class="highlighter-rouge">awscli</code> in <code class="highlighter-rouge">buildInputs</code>? I found it out using <a href="https://nixos.org/nixos/packages.html?channel=nixpkgs-unstable">nixpkgs browser</a>.</li>
<li>What is nixpkgs then? It’s an official, community curated Nix channel. <a href="https://nixos.wiki/wiki/Nix_channels">Nix channels</a> is a mechanism for sharing Nix packages, in a way similar to e.g. <a href="https://help.ubuntu.com/community/Repositories/CommandLine">apt’s repositories</a>. When you want to install something nixpkgs will be the first place to look into.</li>
<li>Nix stores artifacts in <code class="highlighter-rouge">/nix</code>, which allows having multiple versions of the same package. Packages and their versions are resolved for each nix-shell environment using symlinks. <a href="https://ariya.io/2016/06/isolated-development-environment-using-nix">That short article</a> presents how easy it is to use different python versions in different projects.</li>
<li>That also means that none of the system directories were touched. If you didn’t have AWS CLI installed system-wide before running <code class="highlighter-rouge">nix-shell</code>, you don’t have it afterwards either
<ul>
<li>Bonus point - you don’t need <code class="highlighter-rouge">sudo</code> to use Nix</li>
</ul>
</li>
<li><code class="highlighter-rouge">nix-shell</code> accepts the name of a file as a parameter. We didn’t pass one in our example because, when no file is given, <code class="highlighter-rouge">nix-shell</code> looks for <code class="highlighter-rouge">shell.nix</code> first and falls back to <code class="highlighter-rouge">default.nix</code>.</li>
</ul>
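<p>A few related invocations, shown as a sketch rather than an exhaustive reference (outputs are omitted on purpose, as they depend on your machine):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Same as plain `nix-shell` in this directory, with the file named explicitly:
&gt; nix-shell default.nix
# Run one command inside the environment and return to the original shell:
&gt; nix-shell --run "aws --version"
# Ad hoc environment without any file, here with awscli only:
&gt; nix-shell -p awscli
</code></pre></div></div>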
<p>A particularly interesting option of nix-shell is the <code class="highlighter-rouge">--pure</code> flag. With that option nix-shell will have access <strong>only to packages explicitly defined in the Nix build</strong>, in our case in the <code class="highlighter-rouge">default.nix</code> file.</p>
<p>To demonstrate that I will try to execute command <code class="highlighter-rouge">git</code>, first in my standard shell and then in <code class="highlighter-rouge">nix-shell</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Default system-wide console:</span>
<span class="o">></span> git <span class="nt">--version</span>
git version 2.20.1
<span class="o">></span> nix-shell <span class="nt">--pure</span>
<span class="c"># Then within loaded nix-shell:</span>
<span class="o">></span> git <span class="nt">--version</span>
bash: git: <span class="nb">command </span>not found
</code></pre></div></div>
<p>As you see, even though I have <code class="highlighter-rouge">git</code> installed system-wide, it failed from within <code class="highlighter-rouge">nix-shell --pure</code> because I haven’t declared <code class="highlighter-rouge">git</code> in <code class="highlighter-rouge">default.nix</code>.</p>
<p><code class="highlighter-rouge">--pure</code>, while not very useful for day-to-day work, is invaluable for checking that the definition of your build will run successfully on another machine. <code class="highlighter-rouge">--pure</code> gives you that certainty because a package that you just happen to have installed in your system cannot leak into the nix-shell environment.</p>
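<p>If the project did need <code class="highlighter-rouge">git</code> inside the pure environment, the fix would be to declare it. A minimal sketch, assuming the nixpkgs attribute is called <code class="highlighter-rouge">git</code> (you can check the name in the nixpkgs browser):</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code>with import &lt;nixpkgs&gt; {};
stdenv.mkDerivation rec {
  name = "aws-cli";
  # With --pure, only the packages listed here are available in the shell
  buildInputs = [ awscli git ];
}
</code></pre></div></div>
<p>After re-entering <code class="highlighter-rouge">nix-shell --pure</code> with this file, <code class="highlighter-rouge">git</code> should resolve to a path under <code class="highlighter-rouge">/nix/store</code> instead of the system-wide installation.</p>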
<h3 id="refining-example-1---zsh-support-and-direnv">Refining example 1 - zsh support and direnv</h3>
<h4 id="zsh-support">zsh support</h4>
<p>There’s a serious inconvenience with the solution in its current shape, but it’s not apparent from looking at code snippets. However, it’s easy to illustrate it with pictures.</p>
<p>Native <code class="highlighter-rouge">zsh</code> console lacking <code class="highlighter-rouge">aws</code> command:</p>
<p><img src="/blog/assets/nix-shell/zsh-no-aws.png" alt="Native zsh console lacking aws command" /></p>
<p><code class="highlighter-rouge">nix-shell</code> console has <code class="highlighter-rouge">aws</code> installed but otherwise looks ugly:</p>
<p><img src="/blog/assets/nix-shell/nix-aws.png" alt="nix-shell console with aws command available" /></p>
<p>The starkest difference is the lack of colorful prompts and thus no information about the git branch, time, exit code of the previous command, and so forth. The shell in that form turns out to be even less practical once you try to use it - that’s the moment you realize how much you, and your muscle memory, rely on aliases and zsh plugins.</p>
<p>A naive <code class="highlighter-rouge">nix-shell</code> like this is good enough if you want to run a single command but cannot replace a full-blown, configured zsh. As a consequence, I found myself constantly switching between a nixified console for <code class="highlighter-rouge">aws</code> and a normal, non-nixified console for anything else. That lasted until I got to know about <a href="https://github.com/chisui/zsh-nix-shell#oh-my-zsh">zsh-nix-shell</a>.</p>
<p>As an oh-my-zsh user, I followed the installation instructions from <a href="https://github.com/chisui/zsh-nix-shell#oh-my-zsh">here</a>. Result:</p>
<p><img src="/blog/assets/nix-shell/nix-zsh.png" alt="nix-shell with zsh" /></p>
<p>It works nicely - nix-shell looks identical to my normal shell and has all of its precious functionality. However, in practice, it’s very useful to have some graphical indicator to distinguish nix-shell from a standard shell. We can configure it with <a href="https://github.com/chisui/zsh-nix-shell#environment-info">this zsh-nix-shell feature</a>. To set it up for the <a href="https://github.com/Powerlevel9k/powerlevel9k">powerlevel9k theme</a>, I made the following two changes:</p>
<p>The first change: in <code class="highlighter-rouge">.zshrc</code> I edited <code class="highlighter-rouge">POWERLEVEL9K_LEFT_PROMPT_ELEMENTS</code> so it looks like this:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">POWERLEVEL9K_LEFT_PROMPT_ELEMENTS</span><span class="o">=(</span>nix_shell <span class="nb">dir </span>rbenv vcs<span class="o">)</span>
</code></pre></div></div>
<p>The second change is a bit more involved, but I mostly copied it from <a href="https://gist.github.com/chisui/0d12bd51a5fd8e6bb52e6e6a43d31d5e#file-agnoster-nix-zsh-theme">that gist</a> (mentioned <a href="https://github.com/chisui/zsh-nix-shell#environment-info">here</a>). In <code class="highlighter-rouge">~/.oh-my-zsh/custom/themes/powerlevel9k/powerlevel9k.zsh-theme</code> (or whatever theme you use) I added:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Mostly copied from https://gist.github.com/chisui/0d12bd51a5fd8e6bb52e6e6a43d31d5e#file-agnoster-nix-zsh-theme</span>
prompt_nix_shell<span class="o">()</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$IN_NIX_SHELL</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
if</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="nv">$NIX_SHELL_PACKAGES</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
</span><span class="nb">local </span><span class="nv">package_names</span><span class="o">=</span><span class="s2">""</span>
<span class="nb">local </span><span class="nv">packages</span><span class="o">=(</span><span class="nv">$NIX_SHELL_PACKAGES</span><span class="o">)</span>
<span class="k">for </span>package <span class="k">in</span> <span class="nv">$packages</span><span class="p">;</span> <span class="k">do
</span>package_names+<span class="o">=</span><span class="s2">" </span><span class="k">${</span><span class="nv">package</span><span class="p">##*.</span><span class="k">}</span><span class="s2">"</span>
<span class="k">done</span>
<span class="nv">$1_prompt_segment</span> <span class="s2">"</span><span class="nv">$0</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span> black yellow <span class="s2">"{</span><span class="nv">$package_names</span><span class="s2"> }"</span>
<span class="k">elif</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="nv">$name</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
</span><span class="nb">local </span><span class="nv">cleanName</span><span class="o">=</span><span class="k">${</span><span class="nv">name</span><span class="p">#interactive-</span><span class="k">}</span>
<span class="nv">cleanName</span><span class="o">=</span><span class="k">${</span><span class="nv">cleanName</span><span class="p">%-environment</span><span class="k">}</span>
<span class="nv">$1_prompt_segment</span> <span class="s2">"</span><span class="nv">$0</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span> black yellow <span class="s2">"{ </span><span class="nv">$cleanName</span><span class="s2"> }"</span>
<span class="k">else</span> <span class="c"># This case is only reached if the nix-shell plugin isn't installed or failed in some way</span>
<span class="nv">$1_prompt_segment</span> <span class="s2">"</span><span class="nv">$0</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span> black yellow <span class="s2">"nix-shell {}"</span>
<span class="k">fi
fi</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The result:</p>
<p><img src="/blog/assets/nix-shell/nix-prompt.png" alt="nix-shell with zsh and prompt" /></p>
<p>It’s a very comfortable setup. Please note that all the configuration made in this section to make nix-shell play nicely with zsh is a one-time job and you don’t need to repeat it for every project.</p>
<h4 id="automatically-open-nix-shell-with-direnv">Automatically open nix-shell with direnv</h4>
<p>There’s still one slightly bothersome thing - you have to run <code class="highlighter-rouge">nix-shell</code> manually in the directory with <code class="highlighter-rouge">default.nix</code>. In many cases it might be desirable to automate that too. Fortunately, there’s a tool for executing commands on entering a directory: <a href="https://github.com/direnv/direnv">direnv</a> (<a href="https://github.com/direnv/direnv/blob/master/docs/installation.md#installation">installation instructions</a>).</p>
<p>Once you have <a href="https://github.com/direnv/direnv/blob/master/docs/installation.md#installation">direnv installed</a>, configuring it in your project is a matter of:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Execute in a directory containing `default.nix`:</span>
<span class="nb">echo</span> <span class="s2">"use_nix"</span> <span class="o">></span> .envrc
direnv allow <span class="nb">.</span>
</code></pre></div></div>
<p>Result:</p>
<p><img src="/blog/assets/nix-shell/nix-direnv.png" alt="nix-shell with zsh and prompt" /></p>
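<p>If you set this up in many projects, the two commands above can be wrapped in a small shell function. The following is only a sketch under my own assumptions: the name <code class="highlighter-rouge">enable_nix_direnv</code> is made up, and the check for <code class="highlighter-rouge">default.nix</code> or <code class="highlighter-rouge">shell.nix</code> is just a guard against enabling direnv in a directory that has no Nix definition.</p>

```shell
# Hypothetical helper: write .envrc and approve it, but only where a Nix file exists.
enable_nix_direnv() {
  local dir="${1:-.}"
  if [ ! -f "$dir/default.nix" ]; then
    if [ ! -f "$dir/shell.nix" ]; then
      echo "no default.nix or shell.nix in $dir"
      return 1
    fi
  fi
  # Same content as in the manual steps above.
  echo "use_nix" > "$dir/.envrc"
  # Approve the new .envrc only when direnv is actually installed.
  if command -v direnv > /dev/null; then
    direnv allow "$dir"
  fi
}
```

<p>Called without arguments, it works on the current directory, which matches the manual steps.</p>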
<h3 id="example-2---wscat-npm-package">Example 2 - wscat (npm package)</h3>
<p>Say you need a websocket client. After a quick Google search, you might have decided to use <a href="https://www.npmjs.com/package/wscat">wscat</a>, which is an npm package. It’s <a href="https://nixos.org/nixos/packages.html?channel=nixos-19.09&query=wscat">not available</a> in nixpkgs though, so using it will not be as easy as it was in the case of the AWS CLI.</p>
<p>Let’s begin with creating a new directory for that project:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir </span>wscat
<span class="nb">cd </span>wscat
</code></pre></div></div>
<p>The initial approach might be to install <code class="highlighter-rouge">npm</code> using the same technique we have used so far:</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">with</span> <span class="kr">import</span> <span class="o"><</span><span class="nv">nixpkgs</span><span class="o">></span> <span class="p">{};</span>
<span class="nv">stdenv</span><span class="o">.</span><span class="nv">mkDerivation</span> <span class="kr">rec</span> <span class="p">{</span>
<span class="nv">name</span> <span class="o">=</span> <span class="s2">"npm"</span><span class="p">;</span>
<span class="nv">buildInputs</span> <span class="o">=</span> <span class="p">[</span> <span class="nv">nodejs</span> <span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
<p>That way <code class="highlighter-rouge">npm</code> will be available in <code class="highlighter-rouge">nix-shell</code>. Then we can create the following <code class="highlighter-rouge">package.json</code>:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"devDependencies"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"wscat"</span><span class="p">:</span><span class="w"> </span><span class="s2">"latest"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Then:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> nix-shell
<span class="c"># nix-shell opens and then:</span>
<span class="o">></span> npm <span class="nb">install</span>
<span class="c"># With install completed:</span>
<span class="o">></span> ./node_modules/wscat/bin/wscat
</code></pre></div></div>
<p>Although it works, it feels against the spirit of nix-shell. One thing you can easily notice is that we need to run <code class="highlighter-rouge">./node_modules/wscat/bin/wscat</code> instead of just <code class="highlighter-rouge">wscat</code>. That’s not a mere inconvenience but a symptom of a bigger issue. Note that <code class="highlighter-rouge">npm install</code> created the directory <code class="highlighter-rouge">node_modules</code> and all npm artifacts are stored there. That circumvents the idea of Nix, which we expect to store all artifacts in <code class="highlighter-rouge">/nix</code> and reuse them between different environments.</p>
<p>It’s where <a href="https://github.com/svanderburg/node2nix">node2nix</a> comes into play.</p>
<h3 id="refining-example-2---wscat-with-node2nix">Refining example 2 - wscat with node2nix</h3>
<p>As stated on the <a href="https://github.com/svanderburg/node2nix">node2nix GitHub page</a>, it “generates Nix expressions to build NPM packages”. We don’t want to build an NPM package, but it is still relevant - we can describe <code class="highlighter-rouge">wscat</code> as a development dependency and node2nix will take care of generating proper Nix expressions containing node, npm and any other dependencies.</p>
<p>Create a new directory with 2 files in it:</p>
<p><code class="highlighter-rouge">node2nix.nix</code>:</p>
<div class="language-nix highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">with</span> <span class="kr">import</span> <span class="o"><</span><span class="nv">nixpkgs</span><span class="o">></span> <span class="p">{};</span>
<span class="nv">stdenv</span><span class="o">.</span><span class="nv">mkDerivation</span> <span class="kr">rec</span> <span class="p">{</span>
<span class="nv">name</span> <span class="o">=</span> <span class="s2">"node2nix"</span><span class="p">;</span>
<span class="nv">buildInputs</span> <span class="o">=</span> <span class="p">[</span> <span class="nv">nodePackages</span><span class="o">.</span><span class="nv">node2nix</span> <span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="highlighter-rouge">package.json</code>:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"wscat"</span><span class="p">,</span><span class="w">
</span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"latest"</span><span class="p">,</span><span class="w">
</span><span class="nl">"devDependencies"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"wscat"</span><span class="p">:</span><span class="w"> </span><span class="s2">"latest"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Then:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nix-shell node2nix.nix <span class="nt">--run</span> <span class="s2">"node2nix --development"</span>
</code></pre></div></div>
<ul>
<li><code class="highlighter-rouge">--development</code> flag is crucial since we defined <code class="highlighter-rouge">wscat</code> as <code class="highlighter-rouge">devDependency</code></li>
</ul>
<p>That generates <code class="highlighter-rouge">node-env.nix</code>, <code class="highlighter-rouge">node-packages.nix</code> and <code class="highlighter-rouge">default.nix</code>. Having <code class="highlighter-rouge">default.nix</code> means that we can run <code class="highlighter-rouge">nix-shell</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> nix-shell
error: nix-shell requires a single derivation
Try <span class="s1">'nix-shell --help'</span> <span class="k">for </span>more information.
</code></pre></div></div>
<p>That’s because the generated <code class="highlighter-rouge">default.nix</code> contains a few derivations. We need to specify an attribute path using the <code class="highlighter-rouge">-A</code> option. The node2nix documentation <a href="https://github.com/svanderburg/node2nix#deploying-a-development-environment-of-a-nodejs-development-project">mentions</a> that you should use the attribute <code class="highlighter-rouge">shell</code>, as in:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> nix-shell <span class="nt">-A</span> shell
<span class="c"># and in a new console:</span>
<span class="o">></span> wscat
Usage: wscat <span class="o">[</span>options] <span class="o">(</span><span class="nt">--listen</span> &lt;port&gt; | <span class="nt">--connect</span> &lt;url&gt;<span class="o">)</span>
...
</code></pre></div></div>
<p>As you can see, it worked well. Although the whole process is simple, it’s repetitive and contains a few gotchas, like the <code class="highlighter-rouge">--development</code> flag or <code class="highlighter-rouge">-A shell</code>, which you need to keep in mind. Therefore, I created a simple <a href="https://gist.github.com/note/49005aa90466a289bc936e7d3c434298">script</a> to automate that process. It might be a bit simplistic and it’s definitely not battle-tested, but it serves as a good starting point for your own. With that script aliased as <code class="highlighter-rouge">npminstall</code>, creating a nix-shell environment with <code class="highlighter-rouge">wscat</code> boils down to a single command:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> npminstall wscat
</code></pre></div></div>
<p>It will create a synthetic <code class="highlighter-rouge">package.json</code>, call <code class="highlighter-rouge">node2nix</code> and eventually create a <code class="highlighter-rouge">default.nix</code> that you can use with a plain <code class="highlighter-rouge">nix-shell</code>.</p>
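<p>To give a feel for what such automation involves, here is a minimal sketch of the steps described in this section. It is not the script from the gist: the function names, the <code class="highlighter-rouge">-env</code> suffix, the version string and the <code class="highlighter-rouge">shell.nix</code> wrapper are assumptions made for illustration, and it uses <code class="highlighter-rouge">nix-shell -p</code> instead of a separate <code class="highlighter-rouge">node2nix.nix</code> file.</p>

```shell
# Hypothetical sketch, not the script from the gist.

# Print a synthetic package.json declaring one npm package as a devDependency.
synthetic_package_json() {
  printf '{\n  "name": "%s-env",\n  "version": "0.0.1",\n  "devDependencies": {\n    "%s": "latest"\n  }\n}\n' "$1" "$1"
}

# Create ./PACKAGE containing package.json, the node2nix output and a shell.nix wrapper.
npminstall() {
  local pkg="$1"
  mkdir -p "$pkg" || return 1
  synthetic_package_json "$pkg" > "$pkg/package.json"
  # Assumption: the generated default.nix exposes a `shell` attribute (see -A shell above).
  # nix-shell prefers shell.nix over default.nix, so a plain `nix-shell` will pick it up.
  printf '(import ./default.nix {}).shell\n' > "$pkg/shell.nix"
  # -p provides node2nix ad hoc; --development is needed because of devDependencies.
  (cd "$pkg" || exit 1; nix-shell -p nodePackages.node2nix --run "node2nix --development")
}
```

<p>After sourcing it, <code class="highlighter-rouge">npminstall wscat</code> followed by <code class="highlighter-rouge">cd wscat</code> and a plain <code class="highlighter-rouge">nix-shell</code> should put <code class="highlighter-rouge">wscat</code> on the path, provided the assumptions above hold for your node2nix version.</p>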
<h3 id="other-considerations">Other considerations</h3>
<h4 id="this-to-nix-that-to-nix-everything-to-nix">this-to-nix, that-to-nix, everything-to-nix</h4>
<p>As we saw in the previous section, it’s very convenient to have a tool that translates “traditional builds” to Nix. Although not strictly necessary, such tools provide first-class support for packages created with traditional package managers. That’s the reason for the <a href="https://github.com/nix-community/bundix">multitude</a> <a href="https://github.com/moretea/yarn2nix">of</a> <a href="https://github.com/justinwoo/spago2nix">X-to-nix</a> <a href="https://github.com/johbo/pip2nix">projects</a>.</p>
<h4 id="why-not-docker">Why not Docker?</h4>
<p>The scenario I consider here sounds like a use case for Docker (or containers in general): I want to run a single binary without caring about its dependencies. It’s true that Nix and Docker overlap in that regard; however, Docker does more than just provide a package. Most importantly, it runs the binary in a container. The distinguishing trait of containers is process isolation - containers have limited access to the filesystem, networking, CPU, and memory. In my case, though, I don’t need process isolation - I just want to have <code class="highlighter-rouge">aws-cli</code> or <code class="highlighter-rouge">wscat</code> or <code class="highlighter-rouge">jekyll</code> in my console, and I want them to access all resources without the additional ceremony that containers involve. Of course, there’s no single answer, and sometimes process isolation is exactly what you need, but not in the case discussed here.</p>
<h4 id="lorri-and-cachix">Lorri and cachix</h4>
<p><a href="https://github.com/target/lorri">Lorri</a> is supposed to be a nix-shell replacement for project development. It fixes a few inconveniences of nix-shell; you can read more <a href="https://www.tweag.io/posts/2019-03-28-introducing-lorri.html">here</a>. I have not tried out <a href="https://www.tweag.io/posts/2019-03-28-introducing-lorri.html">lorri</a> yet, so I cannot say more about it.</p>
<p>It’s good to be aware of <a href="https://domenkozar.com/2018/06/01/announcing-cachix-binary-cache-as-a-service/">cachix</a> too.</p>
<h3 id="where-to-go-next">Where to go next</h3>
<p>If you are interested in Nix, I think it’s a good idea to incorporate more and more of nix-shell into your daily work. It’s a low-effort endeavor and it can pay back by saving the time you normally spend fighting inconsistencies in other build systems. A side effect, and a potential benefit, is learning Nix, which might be a great investment given that it feels like the build tool of the future.</p>
<p>There’s a very similar <a href="https://medium.com/@ejpcmac/about-using-nix-in-my-development-workflow-12422a1f2f4c">article</a> focusing on different aspects. Both the <a href="https://nixos.org/nix/manual/">Nix manual</a> and <a href="https://nixos.org/nixos/nix-pills/">Nix pills</a> are good reads too.</p>If you’ve heard about Nix before and wanted to try it out for a simple use case you can skip directly to “Before continuing” sectionWriting native CLI applications in Scala with GraalVM2020-03-10T09:40:00+01:002020-03-10T09:40:00+01:00http://msitko.pl/blog/2020/03/10/writing-native-cli-applications-in-scala-with-graalvm<p>We’ve always been told that writing CLIs in Scala is not a good idea: memory consumption, slow startup, JIT warm-up and the prerequisite of having a JRE installed made the idea unappealing.</p>
<p>I believe that the development of GraalVM Native Image changed that drastically. It removed the initial overhead of starting a JVM while also significantly reducing the memory footprint of running executables. Moreover, it enabled the distribution model known from Go - releasing a single standalone binary. It’s safe to say that ahead-of-time compilation overcomes the drawbacks of running short-lived programs on the JVM.</p>
<p>While Native Image made writing small native CLIs <em>possible</em> in the JVM world, it has not, by itself, made it a <em>productive</em> option. To develop small CLIs <em>quickly and productively</em> you also need libraries that support that.</p>
<p>Let’s think about what we would expect from the ecosystem of the language to create CLIs effectively. We need to:</p>
<ul>
<li>work with files and paths efficiently. Provided by <a href="https://github.com/lihaoyi/os-lib">os-lib</a></li>
<li>work with subprocesses efficiently. Also achieved with <a href="https://github.com/lihaoyi/os-lib">os-lib</a></li>
<li>parse command line arguments, show proper error messages and show help messages. Provided by <a href="https://github.com/bkirwi/decline">decline</a></li>
<li>use ANSI coloring of strings - <a href="https://github.com/lihaoyi/fansi">fansi</a></li>
<li>have a fully automated release process. It should produce standalone binaries for macOS, Linux and Windows which are uploaded to the GitHub release page. Powered by GraalVM Native Image and Travis CI</li>
<li>version of the release should be propagated automatically all the way down so the binary contains it. Provided by <a href="https://github.com/sbt/sbt-buildinfo">sbt-buildinfo</a> and <a href="https://github.com/sbt/sbt-git">sbt-git</a></li>
</ul>
<h3 id="hypothesis">Hypothesis</h3>
<p>I hypothesize that Scala might be a productive language for writing small CLIs. I keep on repeating “small CLI” but what do I mean by that? I mean glue code traditionally written in Python or Go in data science, bioinformatics and devops, among others.</p>
<h3 id="example-application">Example application</h3>
<p>For the sake of this article, I decided to implement a tiny tool for quickly switching between directories. The goal is that you can type <code class="highlighter-rouge">tp goto x</code> instead of <code class="highlighter-rouge">cd /a/lot/of/directories/leading/to/x</code>. It’s modeled one to one on a project written in Haskell: <a href="https://github.com/bollu/teleport">teleport</a>.</p>
<p>Here’s a short asciinema animation presenting the tool:</p>
<script id="asciicast-310481" src="https://asciinema.org/a/310481.js" async=""></script>
<p>All of the code is available on GitHub as <a href="https://github.com/note/teleport-scala">teleport-scala</a>.</p>
<h3 id="ioapp">IOApp</h3>
<p>You may think that the starting point of any Scala application is <code class="highlighter-rouge">def main</code>, but it’s not present in <code class="highlighter-rouge">teleport-scala</code>. I used <code class="highlighter-rouge">cats.effect.IOApp</code> instead. Initially I refrained from using it, thinking that, similarly to <code class="highlighter-rouge">scala.App</code>, it does not solve any problem and just makes things less obvious. That’s not the case with <code class="highlighter-rouge">IOApp</code> though.</p>
<p>It models your program explicitly as <code class="highlighter-rouge">IO[ExitCode]</code>, helps you with <a href="https://typelevel.org/cats-effect/datatypes/ioapp.html#cancelation-and-safe-resource-release">cancellation and safe resource release</a> and brings <code class="highlighter-rouge">Timer[IO]</code> into scope. Read more <a href="https://typelevel.org/cats-effect/datatypes/ioapp.html">here</a>.</p>
<h3 id="parsing-arguments-with-decline">Parsing arguments with decline</h3>
<p>There are a few options for parsing command line arguments and I picked <a href="https://github.com/bkirwi/decline">decline</a>. Instead of explaining <a href="http://ben.kirw.in/decline/">its API</a>, I would like to study the anatomy of one of teleport-scala’s commands: <code class="highlighter-rouge">add</code>. Here’s an example invocation:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>teleport-scala <span class="nt">--no-colors</span> add notes some/dir
</code></pre></div></div>
<p>Using decline’s terminology we can say that:</p>
<ul>
<li><code class="highlighter-rouge">--no-colors</code> is a flag</li>
<li><code class="highlighter-rouge">add</code> is a subcommand</li>
<li><code class="highlighter-rouge">notes</code> and <code class="highlighter-rouge">some/dir</code> are positional arguments</li>
</ul>
<p>Since <code class="highlighter-rouge">--no-colors</code> is a flag that can be applied to any subcommand, I think of it as a “global flag”. I gathered all of the global flags together in their own type:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">final</span> <span class="k">case</span> <span class="k">class</span> <span class="nc">GlobalFlags</span><span class="o">(</span><span class="n">colors</span><span class="k">:</span> <span class="kt">Boolean</span><span class="o">,</span> <span class="n">headers</span><span class="k">:</span> <span class="kt">Boolean</span><span class="o">)</span>
</code></pre></div></div>
<p>This is what the <code class="highlighter-rouge">GlobalFlags</code> parser looks like:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">flags</span><span class="k">:</span> <span class="kt">Opts</span><span class="o">[</span><span class="kt">GlobalFlags</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">nocolorsOpt</span> <span class="k">=</span> <span class="nf">booleanFlag</span><span class="o">(</span><span class="s">"no-colors"</span><span class="o">,</span> <span class="n">help</span> <span class="k">=</span> <span class="s">"Disable ANSI color codes"</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">noheadersOpt</span> <span class="k">=</span> <span class="nf">booleanFlag</span><span class="o">(</span><span class="s">"no-headers"</span><span class="o">,</span> <span class="n">help</span> <span class="k">=</span> <span class="s">"Disable printing headers for tabular data"</span><span class="o">)</span>
<span class="o">(</span><span class="n">nocolorsOpt</span><span class="o">,</span> <span class="n">noheadersOpt</span><span class="o">).</span><span class="py">mapN</span><span class="o">((</span><span class="n">noColors</span><span class="o">,</span> <span class="n">noHeaders</span><span class="o">)</span> <span class="k">=></span> <span class="nc">GlobalFlags</span><span class="o">(!</span><span class="n">noColors</span><span class="o">,</span> <span class="o">!</span><span class="n">noHeaders</span><span class="o">))</span>
<span class="o">}</span>
</code></pre></div></div>
<p><code class="highlighter-rouge">add</code> subcommand is defined in the following way:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">nameOpt</span> <span class="k">=</span> <span class="nv">Opts</span><span class="o">.</span><span class="py">argument</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="s">"NAME"</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">add</span> <span class="k">=</span>
<span class="nc">Command</span><span class="o">(</span>
<span class="n">name</span> <span class="k">=</span> <span class="s">"add"</span><span class="o">,</span>
<span class="n">header</span> <span class="k">=</span> <span class="s">"add a teleport point"</span>
<span class="o">)((</span><span class="n">nameOpt</span><span class="o">,</span> <span class="nv">Opts</span><span class="o">.</span><span class="py">argument</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="s">"FOLDERPATH"</span><span class="o">).</span><span class="py">orNone</span><span class="o">).</span><span class="py">mapN</span><span class="o">(</span><span class="nc">AddCmdOptions</span><span class="o">))</span>
</code></pre></div></div>
<p>I extracted <code class="highlighter-rouge">Opts.argument[String]("NAME")</code> to its own value because it will be used in a few other places.</p>
<p>The other 4 subcommands are defined in the same declarative way (details omitted for conciseness):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">list</span> <span class="k">=</span> <span class="nc">Command</span><span class="o">(...)</span>
<span class="k">val</span> <span class="nv">remove</span> <span class="k">=</span> <span class="nc">Command</span><span class="o">(...)</span>
<span class="k">val</span> <span class="nv">goto</span> <span class="k">=</span> <span class="nc">Command</span><span class="o">(...)</span>
<span class="k">val</span> <span class="nv">version</span> <span class="k">=</span> <span class="nc">Command</span><span class="o">(...)</span>
</code></pre></div></div>
<p>We chain subcommands together:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">subcommands</span> <span class="k">=</span> <span class="nc">Opts</span>
<span class="o">.</span><span class="py">subcommand</span><span class="o">(</span><span class="n">add</span><span class="o">)</span>
<span class="o">.</span><span class="py">orElse</span><span class="o">(</span><span class="nv">Opts</span><span class="o">.</span><span class="py">subcommand</span><span class="o">(</span><span class="n">list</span><span class="o">))</span>
<span class="o">...</span>
</code></pre></div></div>
<p>We combine global flags with subcommands:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nv">appCmd</span><span class="k">:</span> <span class="kt">Opts</span><span class="o">[(</span><span class="kt">GlobalFlags</span>, <span class="kt">CmdOptions</span><span class="o">)]</span> <span class="k">=</span> <span class="o">(</span><span class="n">flags</span><span class="o">,</span> <span class="n">subcommands</span><span class="o">).</span><span class="py">tupled</span>
</code></pre></div></div>
<p>And… that’s all! We didn’t have to write any parsing code explicitly; the whole specification is written in a purely declarative way. Also, we provided names and headers for all the commands, so <code class="highlighter-rouge">decline</code> has enough data to generate help messages. Here is an example of the help message for <code class="highlighter-rouge">tp add</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> <span class="n">tp</span> <span class="n">add</span> <span class="o">--</span><span class="n">help</span>
<span class="nc">Usage</span><span class="k">:</span> <span class="kt">teleport-scala</span> <span class="kt">add</span> <span class="kt">&lt;NAME&gt;</span> <span class="err">[</span><span class="kt">&lt;FOLDERPATH&gt;</span><span class="err">]</span>
<span class="kt">add</span> <span class="kt">a</span> <span class="kt">teleport</span> <span class="kt">point</span>
<span class="nc">Options</span> <span class="n">and</span> <span class="n">flags</span><span class="k">:</span>
<span class="kt">--help</span>
<span class="nc">Display</span> <span class="k">this</span> <span class="n">help</span> <span class="n">text</span><span class="o">.</span>
</code></pre></div></div>
<h3 id="business-logic">Business logic</h3>
<p>In the previous section we parsed command line arguments to <code class="highlighter-rouge">(GlobalFlags, CmdOptions)</code>. Now, we need to dispatch the command to the proper handling code:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">dispatchCmd</span><span class="o">(</span><span class="n">globalFlags</span><span class="k">:</span> <span class="kt">GlobalFlags</span><span class="o">,</span> <span class="n">cmd</span><span class="k">:</span> <span class="kt">CmdOptions</span><span class="o">,</span> <span class="n">handler</span><span class="k">:</span> <span class="kt">Handler</span><span class="o">)(</span>
<span class="k">implicit</span> <span class="n">style</span><span class="k">:</span> <span class="kt">Style</span><span class="o">)</span><span class="k">:</span> <span class="kt">IO</span><span class="o">[</span><span class="kt">ExitCode</span><span class="o">]</span> <span class="k">=</span>
<span class="n">cmd</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">cmd</span><span class="k">:</span> <span class="kt">AddCmdOptions</span> <span class="o">=></span>
<span class="nv">handler</span><span class="o">.</span><span class="py">add</span><span class="o">(</span><span class="n">cmd</span><span class="o">).</span><span class="py">map</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">Right</span><span class="o">(</span><span class="n">tpPoint</span><span class="o">)</span> <span class="k">=></span>
<span class="nf">println</span><span class="o">(</span><span class="n">s</span><span class="s">"Creating teleport point: ${style.emphasis(tpPoint.name)}"</span><span class="o">)</span>
<span class="nv">ExitCode</span><span class="o">.</span><span class="py">Success</span>
<span class="k">case</span> <span class="nc">Left</span><span class="o">(</span><span class="n">err</span><span class="o">)</span> <span class="k">=></span>
<span class="nf">println</span><span class="o">(</span><span class="nv">err</span><span class="o">.</span><span class="py">fansi</span><span class="o">)</span>
<span class="nv">ExitCode</span><span class="o">.</span><span class="py">Error</span>
<span class="o">}</span>
<span class="o">...</span>
</code></pre></div></div>
<p>As I have spent most of my programming life writing servers, I like to think of it as routing code. This code has two responsibilities: dispatching the “request” to the proper “handler” and then presenting the result in the proper form. In the case of a CLI it’s not HTML or JSON but simply text.</p>
<p>The actual business logic lies in <code class="highlighter-rouge">Handler</code>. I don’t want to focus too much on it, though, as it’s not particularly relevant to the main point of the article. In a nutshell - we
persist teleport points as <code class="highlighter-rouge">TeleportState</code> in the file <code class="highlighter-rouge">$HOME/.teleport-data</code>. It’s a JSON file, and circe is used for working with JSON.</p>
<p>Even though the handler code is straightforward, there’s one interesting ingredient involved. It’s <a href="https://github.com/lihaoyi/os-lib">os-lib</a> for working with paths and files. os-lib has a unique philosophy behind it - it tries to use Scala more as a scripting language. Let’s take a look at the type of <code class="highlighter-rouge">os.read.apply</code>, which reads a file into a String:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">apply</span><span class="o">(</span><span class="n">arg</span><span class="k">:</span> <span class="kt">ReadablePath</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span>
</code></pre></div></div>
<p>Being a Scala developer you might be surprised by the simplicity of the result type. It’s not <code class="highlighter-rouge">IO[InputStream]</code> or <code class="highlighter-rouge">Try[InputStream]</code> - it’s just a <code class="highlighter-rouge">String</code>. Is this good? As always, it depends on the use case. If you write a CLI tool for power users then maybe just throwing an exception with the filename is enough for them to figure out what went wrong? And if you can predict the size of a file maybe you don’t need to stream it?</p>
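<p>To make the trade-off concrete, here is a minimal sketch of both styles. It deliberately uses only the Java and Scala standard libraries instead of os-lib, and the names <code class="highlighter-rouge">ReadExample</code>, <code class="highlighter-rouge">readOrThrow</code> and <code class="highlighter-rouge">readSafely</code> are hypothetical, made up for this illustration. The point is that a function returning a plain <code class="highlighter-rouge">String</code> can still be wrapped in <code class="highlighter-rouge">Try</code> at the call site if you later decide you want errors as values:</p>

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import scala.util.{Failure, Success, Try}

// Hypothetical helper object, not part of teleport-scala or os-lib.
object ReadExample {
  // "Scripting" style: returns the whole file as a String and
  // throws (e.g. NoSuchFileException) if the file cannot be read.
  def readOrThrow(path: Path): String =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)

  // The same read, with non-fatal exceptions captured as a value.
  def readSafely(path: Path): Try[String] = Try(readOrThrow(path))

  def main(args: Array[String]): Unit = {
    // The file name is an assumption used only for this demo.
    val path = Paths.get(args.headOption.getOrElse("some-state-file.json"))
    readSafely(path) match {
      case Success(content) => println(s"Read ${content.length} characters")
      case Failure(err)     => println(s"Could not read $path: $err")
    }
  }
}
```

<p>For a small CLI the first variant is often enough; the second one costs a single extra line wherever you need it.</p>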
<p>Even if you decide you need a more powerful tool for working with files, you may still be interested in using <code class="highlighter-rouge">os-lib</code> for its capabilities in working with <a href="http://www.lihaoyi.com/post/HowtoworkwithFilesinScala.html#paths">paths</a> and <a href="http://www.lihaoyi.com/post/HowtoworkwithSubprocessesinScala.html">subprocesses</a>.</p>
<p>In the rest of this article I will walk you through aspects of developing CLIs that I find important.</p>
<h3 id="coloring">Coloring</h3>
<p>We will use <a href="https://github.com/lihaoyi/fansi">fansi</a> for ANSI coloring. With fansi, coloring a string boils down to:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">fansi</span><span class="o">.</span><span class="py">Color</span><span class="o">.</span><span class="py">Red</span><span class="o">(</span><span class="s">"Hello World Ansi!"</span><span class="o">)</span>
</code></pre></div></div>
<p>We don’t want to use it directly, though, mostly so that we have a single place to control the color palette. Therefore, a trait <code class="highlighter-rouge">Style</code> is defined:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">Style</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">emphasis</span><span class="o">(</span><span class="n">input</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">fansi.Str</span>
<span class="k">def</span> <span class="nf">error</span><span class="o">(</span><span class="n">input</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">fansi.Str</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Every place that uses colors will require an instance of <code class="highlighter-rouge">Style</code> to be provided.</p>
<h3 id="interoperability-with-unix-tools">Interoperability with UNIX tools</h3>
<p>Let’s take a look at <code class="highlighter-rouge">tp list</code> output:</p>
<p><img src="/blog/assets/native-clis-in-scala/terminal-output.png" alt="Colored terminal output of tp list" /></p>
<p>Coloring makes it pleasant to read and helps to emphasize vital points - imagine how hard to read the results of unit tests would be without colors. However, keep in mind that it’s not always a human who reads the output; pretty often the output of a program is being processed by scripts. There’s <a href="https://monkey.org/~marius/unix-tools-hints.html">a great article</a> by Marius Eriksen on that.</p>
<p>Let’s take a look at the raw output:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>teleport points: [94m(total 1)[39m
oss [94m/home/michal/teleport-demo/code/scala-oss[39m
</code></pre></div></div>
<p>It does not look right because the ANSI escape codes show up in the output. The first line, being a header, also makes processing by scripts more difficult. That’s why we introduced the <code class="highlighter-rouge">--no-colors</code> and <code class="highlighter-rouge">--no-headers</code> flags. Thanks to <code class="highlighter-rouge">Style</code> being a trait we can define <code class="highlighter-rouge">NoColorsStyle</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">object</span> <span class="nc">NoColorsStyle</span> <span class="k">extends</span> <span class="nc">Style</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">emphasis</span><span class="o">(</span><span class="n">input</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">Str</span> <span class="o">=</span> <span class="nc">Str</span><span class="o">(</span><span class="n">input</span><span class="o">)</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">error</span><span class="o">(</span><span class="n">input</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">Str</span> <span class="o">=</span> <span class="nc">Str</span><span class="o">(</span><span class="n">input</span><span class="o">)</span>
<span class="o">}</span>
</code></pre></div></div>
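<p>How the two flags could drive the output can be sketched as follows. This is a simplified, dependency-free illustration rather than the actual teleport-scala code: it uses plain <code class="highlighter-rouge">String</code> with raw ANSI escape codes instead of <code class="highlighter-rouge">fansi.Str</code>, and <code class="highlighter-rouge">ColorStyle</code>, <code class="highlighter-rouge">styleFor</code>, <code class="highlighter-rouge">renderList</code> and the tab separator are all assumptions made for the example:</p>

```scala
// Simplified sketch; names other than Style and NoColorsStyle are hypothetical.
object StyleSelection {
  trait Style {
    def emphasis(input: String): String
    def error(input: String): String
  }

  object ColorStyle extends Style {
    private val BrightBlue = "\u001b[94m" // bright blue foreground
    private val Red        = "\u001b[31m" // red foreground
    private val ResetFg    = "\u001b[39m" // default foreground
    def emphasis(input: String): String = BrightBlue + input + ResetFg
    def error(input: String): String    = Red + input + ResetFg
  }

  object NoColorsStyle extends Style {
    def emphasis(input: String): String = input
    def error(input: String): String    = input
  }

  // Pick the style once, based on the parsed --no-colors flag.
  def styleFor(noColors: Boolean): Style =
    if (noColors) NoColorsStyle else ColorStyle

  // Render (name, path) pairs; the header is skipped when --no-headers is set.
  def renderList(points: Seq[(String, String)], noHeaders: Boolean)(
      implicit style: Style): Seq[String] = {
    val total  = style.emphasis("(total " + points.size + ")")
    val header = if (noHeaders) Seq.empty[String] else Seq("teleport points: " + total)
    header ++ points.map { case (name, path) => name + "\t" + style.emphasis(path) }
  }

  def main(args: Array[String]): Unit = {
    implicit val style: Style = styleFor(noColors = args.contains("--no-colors"))
    val points = Seq("oss" -> "/home/michal/teleport-demo/code/scala-oss")
    renderList(points, noHeaders = args.contains("--no-headers")).foreach(println)
  }
}
```

<p>With both flags set, every output line is a bare <code class="highlighter-rouge">name</code>/<code class="highlighter-rouge">path</code> pair, which is easy to feed into tools such as <code class="highlighter-rouge">cut</code> or <code class="highlighter-rouge">awk</code>.</p>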
<h3 id="keeping-version-in-sync">Keeping version in sync</h3>
<p>To keep it simple I decided to make Git the only source of truth with regard to the project’s version. If you want to release a new version you have to create and push a Git tag. That triggers a build on Travis CI. Since we used <code class="highlighter-rouge">sbt-git</code> and <a href="https://github.com/note/teleport-scala/blob/master/build.sbt#L6">enabled</a> <code class="highlighter-rouge">GitVersioning</code>, sbt uses the Git tag as the project version. Moreover, we configured <code class="highlighter-rouge">sbt-buildinfo</code> in the following way:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">buildInfoKeys</span> <span class="o">:=</span> <span class="nc">Seq</span><span class="o">[</span><span class="kt">BuildInfoKey</span><span class="o">](</span><span class="n">name</span><span class="o">,</span> <span class="n">version</span><span class="o">,</span> <span class="n">scalaVersion</span><span class="o">,</span> <span class="n">sbtVersion</span><span class="o">,</span> <span class="nv">git</span><span class="o">.</span><span class="py">baseVersion</span><span class="o">,</span> <span class="nv">git</span><span class="o">.</span><span class="py">gitHeadCommit</span><span class="o">),</span>
<span class="n">buildInfoPackage</span> <span class="o">:=</span> <span class="s">"pl.msitko.teleport"</span><span class="o">,</span>
<span class="n">buildInfoUsePackageAsPath</span> <span class="o">:=</span> <span class="kc">true</span><span class="o">,</span>
</code></pre></div></div>
<p>That means that all version-related data will be present in the generated <code class="highlighter-rouge">pl.msitko.teleport.BuildInfo</code> class. We can import that class and use it in the code handling the <code class="highlighter-rouge">version</code> subcommand:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">case</span> <span class="nc">VersionCmdOptions</span> <span class="k">=></span>
<span class="nc">IO</span><span class="o">(</span><span class="nf">println</span><span class="o">(</span><span class="nv">BuildInfo</span><span class="o">.</span><span class="py">version</span><span class="o">))</span> <span class="o">*></span> <span class="nc">IO</span><span class="o">(</span><span class="nv">ExitCode</span><span class="o">.</span><span class="py">Success</span><span class="o">)</span>
</code></pre></div></div>
<p>That way we don’t need to hardcode version in the code and we guarantee it will always be in sync with Git.</p>
<h3 id="building-binary-with-native-image">Building binary with Native Image</h3>
<p>In theory building binary with Native Image should be as simple as (<em>omitting some options for the sake of brevity</em>):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>native-image --verbose --no-fallback --static -jar teleport-scala.jar teleport-scala
</code></pre></div></div>
<p>I produced a fat JAR using <code class="highlighter-rouge">sbt-assembly</code>, ran the above <code class="highlighter-rouge">native-image</code> command and that’s what I got:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error: Unsupported features in 3 methods
Detailed message:
Error: com.oracle.graal.pointsto.constraints.UnsupportedFeatureException: Invoke with MethodHandle argument could not be reduced to at most a single call or single field access. The method handle must be a compile time constant, e.g., be loaded from a `static final` field. Method that contains the method handle invocation: java.lang.invoke.MethodHandle.invokeBasic()
</code></pre></div></div>
<p>It’s described in the <a href="https://github.com/scala/bug/issues/11634">GitHub issue</a>. I did what was advised there: I switched to Java 11 based GraalVM and added a proper <a href="https://github.com/note/teleport-scala/blob/master/src/main/resources/META-INF/native-image/org.scala-lang/scala-lang/native-image.properties"><code class="highlighter-rouge">native-image.properties</code></a>. After that, the code compiled just fine. <strong>Keep that in mind - debugging this gotcha was really frustrating.</strong></p>
<p>Another important thing - <strong>you should always use <code class="highlighter-rouge">native-image</code> with <code class="highlighter-rouge">--no-fallback</code></strong>. Without that option, in the case described above, native-image would print out a few warnings but would exit with code 0 and would generate an image that requires JDK for execution - something you definitely don’t want when using native-image.</p>
<h3 id="limitations-of-native-image">Limitations of Native Image</h3>
<p>Even though using Native Image was not as easy as it appeared initially, we are quite lucky with teleport-scala anyway. There are a number of Native Image <a href="https://github.com/oracle/graal/blob/master/substratevm/LIMITATIONS.md">limitations</a>. A lot of them are related to features like runtime reflection or dynamic class loading, which are not widely used in Scala libraries. You can see, however, how much hassle it is to run <a href="https://github.com/vmencik/akka-graal-native">Akka</a> or <a href="https://medium.com/graalvm/instant-netty-startup-using-graalvm-native-image-generation-ed6f14ff7692#4271">Netty</a> from an executable built with Native Image. <em>It’s not that problematic if someone has described how to annotate the library you are trying to run, but since GraalVM is not widely adopted it’s likely you will have to figure out some parts yourself.</em></p>
<p>When choosing Native Image you should be aware that you deal with immature and evolving software - new options are being added with each version, defaults for old options are being changed. That’s the price you need to pay for surmounting traditional limitations of JVM.</p>
<h3 id="ci-build">CI build</h3>
<p>As I mentioned in the introduction, we want to release executables for all 3 major operating systems. GraalVM Native Image <a href="https://github.com/oracle/graal/issues/407">does not, and probably will not</a>, support cross compilation. The only exception is targeting Linux - its executables can be built on any platform, as you can build them by running Native Image in a Docker container. Nevertheless, there’s no escape from running a build on a few platforms in our case. I decided to use Travis CI as it provides environments for Windows, macOS and Linux.</p>
<p>Coming up with the Linux script was easy, but problems appeared when I started working on the Windows build. Unfortunately, Native Image documentation for Windows is simply not comprehensive enough, and thus I had to go through a couple of posts found here and there to understand how Native Image is supposed to be used on Windows. The process was painful enough that I’ve written another <a href="https://msitko.pl/blog/2020/03/05/native-image-on-windows.html">blog post</a> about it.</p>
<p>You can see the eventual build definition supporting all major operating systems <a href="https://github.com/note/teleport-scala/blob/master/.travis.yml">here</a>.</p>
<h3 id="testing">Testing</h3>
<p>Analogously to server applications, you can choose from a few testing strategies - unit tests, integration tests, end to end tests, and everything in between them.</p>
<p>In teleport-scala I took an unusual approach. Since there’s not that much logic in the program itself, and since it heavily relies on a filesystem, it has hardly any unit tests. Instead, an executable built with Native Image is called from a dockerized Ammonite script. That script contains some testing code written with <a href="https://github.com/lihaoyi/utest">utest</a>. This approach has several advantages:</p>
<ul>
<li>since Ammonite is dockerized, it works on a separate filesystem and does not touch the host filesystem at all</li>
<li>it verifies the actual artifact shipped to the users as opposed to testing merely Scala code. That way we eliminate the possibility of <a href="https://msitko.pl/blog/2019/10/19/when-your-code-fails-after-being-packaged-as-jar.html">overlooking errors introduced</a> in the process of creating artifact out of the code</li>
</ul>
<p>To get a feeling of how those tests are defined take a look at <a href="https://github.com/note/teleport-scala/blob/master/smoke-test.sc#L20"><code class="highlighter-rouge">smoke-test.sc</code></a>.</p>
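<p>The core idea, running the shipped binary as a subprocess and asserting on its exit code and output, can be sketched with nothing but <code class="highlighter-rouge">scala.sys.process</code>. This is not the actual smoke test: <code class="highlighter-rouge">SmokeTestSketch</code>, <code class="highlighter-rouge">Result</code> and <code class="highlighter-rouge">run</code> are hypothetical names, and the default binary path is an assumption:</p>

```scala
import scala.sys.process._

// Hypothetical sketch of a black-box test harness; not taken from smoke-test.sc.
object SmokeTestSketch {
  final case class Result(exitCode: Int, stdout: String)

  // Runs the command, collecting stdout line by line and ignoring stderr.
  // Throws an IOException if the executable cannot be started at all.
  def run(cmd: Seq[String]): Result = {
    val out    = new StringBuilder
    val logger = ProcessLogger(line => out.append(line).append('\n'), _ => ())
    val exitCode = Process(cmd).!(logger)
    Result(exitCode, out.toString)
  }

  def main(args: Array[String]): Unit = {
    // Path to the native executable under test (assumed default).
    val binary = args.headOption.getOrElse("./teleport-scala")
    val res    = run(Seq(binary, "list"))
    assert(res.exitCode == 0, s"list failed with exit code ${res.exitCode}")
    println(res.stdout)
  }
}
```

<p>Because the test only talks to the process boundary, it does not care whether the artifact is a JAR launcher or a native executable.</p>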
<h3 id="performance">Performance</h3>
<p>We went a long way to get a native executable, so the question arises - was it worth it? The size of the binary is around 12 MB, which is a decent size given that the fat JAR <code class="highlighter-rouge">teleport-scala.jar</code> weighs 20 MB and the binary is completely self-contained - no JVM needed. We could make it even smaller if we had not used <code class="highlighter-rouge">--static</code> but that would contradict the idea of a standalone executable.</p>
<p>How about execution time? Let’s check out the performance of native binary:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>time ./teleport-scala list
./teleport-scala list 0,01s user 0,02s system 99% cpu 0,029 total
</code></pre></div></div>
<p>This is how JAR run with java performs:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>time java -jar teleport-scala.jar list
java -jar teleport-scala.jar list 1,23s user 0,10s system 191% cpu 0,698 total
</code></pre></div></div>
<p>It’s not a benchmark, but the difference is clear and as expected.</p>
<h3 id="conclusion">Conclusion</h3>
<p>Thanks to libraries like decline, os-lib and fansi I found the development process to be productive and enjoyable. Native Image produces fast and small binaries. Scala as a language and an ecosystem lays stress on reliability and maintainability. I think all those factors combined make for a good choice for writing CLI tools.</p>
<p>There were some serious setbacks, like the missing <code class="highlighter-rouge">native-image.properties</code> or setting up the build on Windows. Yet I believe they are mostly one-off issues which, once solved, should not reappear in subsequent projects.</p>
<p>Of course, no amount of blog posts will make a real difference - we will see if Scala starts being an attractive choice for CLIs by number and quality of projects. Time will tell but I feel the bright future might be ahead of us.</p>
<h3 id="references-and-interesting-links">References and interesting links</h3>
<ul>
<li><a href="https://github.com/note/teleport-scala">Source code</a> of teleport-scala (i.e. code described in this blog post)</li>
<li><a href="http://www.lihaoyi.com/post/HowtoworkwithFilesinScala.html">How to work with Files in Scala</a> and <a href="http://www.lihaoyi.com/post/HowtoworkwithSubprocessesinScala.html">How to work with Subprocesses in Scala</a> are must reads if you want to use os-lib</li>
<li><a href="https://github.com/remkop/picocli">Picocli</a> - a modern framework for building powerful, user-friendly, GraalVM-enabled command line apps. It’s written in Java and its API, from what I’ve seen, is not something we would consider functional. However, it is amazingly rich with features. Some that caught my eye are:
<ul>
<li>an annotation processor that automatically Graal-enables your JAR during compilation. Look for <code class="highlighter-rouge">picocli-codegen</code></li>
<li>tab autocompletion</li>
<li>generating manpage</li>
</ul>
</li>
<li>And if you want to see a Scala application using Picocli then follow <a href="https://medium.com/@takezoe/creating-cli-tools-with-scala-picocli-and-graalvm-ffde05bbd01d">Creating CLI tools with Scala, Picocli and GraalVM</a> - published just a day before I released this blog post</li>
<li>There are some good alternatives to <code class="highlighter-rouge">decline</code>: <a href="https://github.com/alexarchambault/case-app">case-app</a> and <a href="https://github.com/scopt/scopt">scopt</a></li>
<li><a href="https://github.com/battermann/pureapp">pureapp</a> - a library for writing referentially transparent and stack-safe sequential programs. Its scope is much bigger than what I was interested in. If you care about managing the state of your CLI app in a purely functional manner, check it out.</li>
<li><a href="https://monkey.org/~marius/unix-tools-hints.html">Hints for writing UNIX tools</a> - language agnostic guideline on how to create composable command line tools</li>
<li><a href="https://msitko.pl/blog/2020/03/05/native-image-on-windows.html">Building Windows executables with GraalVM on Travis CI</a></li>
<li><a href="https://medium.com/graalvm/updates-on-class-initialization-in-graalvm-native-image-generation-c61faca461f7">Updates on Class Initialization in GraalVM Native Image Generation</a> explains static initialization in Native Image</li>
<li><a href="https://medium.com/graalvm/instant-netty-startup-using-graalvm-native-image-generation-ed6f14ff7692#4271">Instant Netty Startup …</a> is a good read to understand better limitations of Native Image</li>
<li>A few other articles about building Scala code with Native Image were published: <a href="https://blog.softwaremill.com/small-fast-docker-images-using-graalvms-native-image-99c0bc92e70b">one focusing on building lightweight docker images</a> and <a href="https://www.inner-product.com/posts/serverless-scala-services-with-graalvm/">another one focusing on http-4s</a></li>
<li>Presentation by Francois Farquet: <a href="https://www.youtube.com/watch?v=vgivXdtFBXw">Run Programs Faster With GraalVM</a></li>
<li>There’s <a href="https://www.scala-sbt.org/sbt-native-packager/formats/graalvm-native-image.html">GraalVM Native Image Plugin</a> available as part of sbt-native-packager. I haven’t used it here just because I was learning <code class="highlighter-rouge">native-image</code> and I preferred to use it directly</li>
</ul>We’ve been always told that writing CLIs in Scala is not a good idea: memory consumption, slow startup, JIT warm-up and prerequisite of having JRE installed made this idea not seem appealing.Building Windows executables with GraalVM on Travis CI2020-03-05T10:30:00+01:002020-03-05T10:30:00+01:00http://msitko.pl/blog/2020/03/05/native-image-on-windows<p>I am not very well acquainted with Windows, and I spent an entire day figuring out how to use GraalVM Native Image on it, so I know how frustrating that process can be. Given how incomplete the documentation for Windows is and how many people are confused about it on forums, I think writing about it may save others unnecessary effort and frustration.</p>
<p>I will present how to do it both on a Windows desktop and programmatically on a remote Windows environment, which matters for CI. Since Native Image does not support cross-compilation you need to compile your application on a Windows machine if you want to distribute Windows binaries. I used Travis CI’s <a href="https://docs.travis-ci.com/user/reference/windows">Windows environment</a> for that purpose.</p>
<p>When using Native Image you should be aware of its limitations in regards to the use of reflection or <a href="https://medium.com/graalvm/updates-on-class-initialization-in-graalvm-native-image-generation-c61faca461f7">static initialization</a>. In this article I assume you made Native Image work for your input JAR on Linux or Mac and your problem is specifically with Windows.</p>
<p>I will use GraalVM 20.0.0. Starting from <a href="https://www.graalvm.org/docs/release-notes/20_0/">20.0.0</a>:</p>
<blockquote>
<p>Windows is no longer an experimental platform in the GraalVM ecosystem. Windows builds now contain the functional gu utility to install the components. GraalVM Native Image component needs to be installed with gu as on other platforms.</p>
</blockquote>
<p>Additionally, starting from 20.0.0, you can find GraalVM artifacts for Windows, as well as for other platforms, at <a href="https://github.com/graalvm/graalvm-ce-builds/releases">GitHub releases</a>. In this guide I will install GraalVM JDK with <a href="https://sdkman.io/">SDKMAN</a> but it’s good to be aware of the alternative.</p>
<h3 id="prerequisites">Prerequisites</h3>
<ul>
<li><a href="https://chocolatey.org/docs/installation">chocolatey</a> - package manager for Windows</li>
<li><a href="https://gitforwindows.org/">Git Bash</a> - we’re not even interested that much in git as we are in bash</li>
</ul>
<p><strong>Note for Travis CI:</strong> all prerequisites are already installed on Travis CI’s <a href="https://docs.travis-ci.com/user/reference/windows/">Windows environment</a> and <strong>Git Bash is <a href="https://docs.travis-ci.com/user/reference/windows/#git-bash">the shell</a> that’s used to run your build</strong>. So you don’t need to do anything here.</p>
<h3 id="install-java-11-based-graalvm">Install Java 11 based GraalVM</h3>
<p><em>All bash snippets are supposed to be run from Git Bash unless I specifically note any other one</em>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>choco install zip unzip
choco install visualstudio2017-workload-vctools
curl -sL https://get.sdkman.io | bash
mkdir -p "$HOME/.sdkman/etc/"
echo sdkman_auto_answer=true > "$HOME/.sdkman/etc/config"
echo sdkman_auto_selfupdate=true >> "$HOME/.sdkman/etc/config"
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install java 20.0.0.r11-grl
</code></pre></div></div>
<p><code class="highlighter-rouge">zip</code> and <code class="highlighter-rouge">unzip</code> are needed by SDKMAN. <code class="highlighter-rouge">visualstudio2017-workload-vctools</code> provides the Windows compiler toolchain, which is used by <code class="highlighter-rouge">native-image</code>. Then we installed SDKMAN and used it to install <code class="highlighter-rouge">20.0.0.r11-grl</code>. <code class="highlighter-rouge">r11</code> stands for Java 11 JDK and <code class="highlighter-rouge">grl</code> stands for <code class="highlighter-rouge">GraalVM</code>.</p>
<h3 id="install-native-image">Install native-image</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gu.cmd install native-image
</code></pre></div></div>
<h3 id="build-binary-using-native-image">Build binary using native-image</h3>
<p>Let’s try to build a hello-world program written in Java. I prepared an <a href="https://github.com/note/blog-examples/tree/master/native-image-on-windows">sbt project</a> with a hello world application and built the JAR with <code class="highlighter-rouge">sbt assembly</code>. How you build your JAR does not really matter for the essence of the article.</p>
<p>Now it’s time to finally run <code class="highlighter-rouge">native-image</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>native-image.cmd --verbose --static --no-fallback -H:+ReportExceptionStackTraces -jar Main.jar main
</code></pre></div></div>
<p>This is roughly the output you will receive:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
[main:12556] classlist: 1,145.18 ms, 1.00 GB
[main:12556] (cap): 113.18 ms, 1.00 GB
[main:12556] setup: 581.30 ms, 1.00 GB
Error: Unable to compile C-ABI query code. Make sure native software development toolchain is installed on your system.
</code></pre></div></div>
<p>If you inspect the stack traces closely you will also find:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Caused by: java.io.IOException: Cannot run program "CL" (in directory "C:\Users\user\AppData\Local\Temp\SVM-697645000753775759"): CreateProcess error=2, The system cannot find the file specified
</code></pre></div></div>
<p>It seems that native-image cannot find the <code class="highlighter-rouge">CL</code> command. Indeed, if we try to run <code class="highlighter-rouge">CL</code> in Git Bash:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ CL
bash: CL: command not found
</code></pre></div></div>
<p>So our task now is to bring <code class="highlighter-rouge">CL</code> into scope.</p>
<p>We installed <code class="highlighter-rouge">visualstudio2017-workload-vctools</code> at the beginning, so the toolchain is installed, but it’s available neither from Git Bash nor from standard <code class="highlighter-rouge">cmd</code>. It took some time and desperation to eventually find <a href="https://github.com/oracle/graal/issues/2116#issuecomment-590468544">the answer</a> on GitHub that you need to run <code class="highlighter-rouge">x64 Native Tools Command Prompt for VS 2017</code>. That means that on a Windows desktop you need to press the Windows key and type in <code class="highlighter-rouge">x64 Native Tools Command Prompt for VS 2017</code>. On Travis CI you need to execute <code class="highlighter-rouge">C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Auxiliary\Build\vcvars64.bat</code>.</p>
<p>Let’s try to do that. Since Git Bash maps Windows paths to Unix-style paths with forward slashes, we can do the following:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ /c/Program\ Files\ \(x86\)/Microsoft\ Visual\ Studio/2017/BuildTools/VC/Auxiliary/Build/vcvars64.bat
**********************************************************************
** Visual Studio 2017 Developer Command Prompt v15.0
** Copyright (c) 2017 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
</code></pre></div></div>
<p>Looks promising but we quickly realize that <code class="highlighter-rouge">CL</code> is still not available:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ CL
bash: CL: command not found
</code></pre></div></div>
<p>If we try to do the same in <code class="highlighter-rouge">cmd</code> (i.e. press Windows key and type in <code class="highlighter-rouge">cmd</code>) then:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>call "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
**********************************************************************
** Visual Studio 2017 Developer Command Prompt v15.0
** Copyright (c) 2017 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
>CL
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27035 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
usage: cl [ option... ] filename... [ /link linkoption... ]
</code></pre></div></div>
<p>That looks really good! How can we achieve that in the Travis CI environment? We can put the commands in a Windows Batch script. Create a file <code class="highlighter-rouge">build.bat</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>call "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
%HOME%/.sdkman/candidates/java/current/bin/native-image.cmd --verbose --static --no-fallback -H:+ReportExceptionStackTraces -jar Main.jar main
</code></pre></div></div>
<p>There was one more problem to solve on the way - how to invoke <code class="highlighter-rouge">native-image.cmd</code> from a standard console rather than Git Bash. I relied on the SDKMAN directory structure and used <code class="highlighter-rouge">%HOME%/.sdkman/candidates/java/current/bin/native-image.cmd</code>.</p>
<p>Now we can call <code class="highlighter-rouge">./build.bat</code> from Git Bash. If we do so and if everything went well the last lines of output are expected to be:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
[main:15708] image: 711.60 ms, 1.92 GB
[main:15708] write: 379.96 ms, 1.92 GB
[main:15708] [total]: 20,697.84 ms, 1.92 GB
</code></pre></div></div>
<p>Now you should be able to run the result:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./main.exe
Hello world!
</code></pre></div></div>
<h3 id="making-it-work-on-fresh-windows-system">Making it work on a fresh Windows system</h3>
<p>It may look like everything worked but, as noted <a href="https://github.com/remkop/picocli/blob/455fa3564c5a4c5f9780eabb123f2f9f3ed035ff/docs/build-great-native-cli-apps-in-java-with-graalvm-and-picocli.adoc#running-native-images-on-windows">here</a>, if you run this executable on a system without the Visual Studio toolchain it will fail with:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The code execution cannot proceed because VCRUNTIME140.dll was not found.
</code></pre></div></div>
<p>You can fix it by installing <code class="highlighter-rouge">Microsoft Visual C++ 2017 Redistributable</code> or by shipping <code class="highlighter-rouge">VCRUNTIME140.dll</code> next to your executable. As long as both the executable and the DLL file are in the same folder it should work. According to Microsoft <a href="https://docs.microsoft.com/en-us/visualstudio/productinfo/2017-redistribution-vs#visual-c-runtime-files">docs</a> it’s fine from a licensing point of view (as <code class="highlighter-rouge">VCRUNTIME140.dll</code> is distributed in [VisualStudioFolder]/VC\Redist\MSVC[version]\x86).</p>
<p>To be sure your executable works without any dependencies I suggest running it on a fresh Windows installation in a virtual machine.</p>
<h3 id="distributing-binary-using-github-releases-page">Distributing the binary using the GitHub releases page</h3>
<p>Since a single file is not enough we will create a zip containing <code class="highlighter-rouge">VCRUNTIME140.dll</code> alongside the <code class="highlighter-rouge">exe</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp /c/Windows/System32/VCRUNTIME140.dll .
zip main.zip main.exe VCRUNTIME140.dll
</code></pre></div></div>
<p>Deploy part of <code class="highlighter-rouge">.travis.yml</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>deploy:
  provider: releases
  api_key: $GITHUB_TOKEN
  file: main.zip
  skip_cleanup: true
  on:
    tags: true
</code></pre></div></div>
<p>To make it work you need to:</p>
<ul>
<li>generate a <code class="highlighter-rouge">Personal access token</code> at <a href="https://github.com/settings/tokens">https://github.com/settings/tokens</a></li>
<li>add an environment variable <code class="highlighter-rouge">GITHUB_TOKEN</code> in the Travis CI project settings and set it to the token value from the previous point</li>
</ul>
<h3 id="repository-with-a-reproducer">Repository with a reproducer</h3>
<p>As working code is worth a thousand words, I prepared a <a href="https://github.com/note/native-image-windows-travis">repository</a> containing all you need to reproduce the solution.</p>
<h3 id="references">References</h3>
<p>I found these two articles especially useful: <a href="https://github.com/remkop/picocli/blob/455fa3564c5a4c5f9780eabb123f2f9f3ed035ff/docs/build-great-native-cli-apps-in-java-with-graalvm-and-picocli.adoc">Build great native CLI apps in Java with GraalVM and Picocli</a> and <a href="https://happylynx.github.io/2019/04/30/graalvm-native-image-on-windows.html">GraalVM native image on Windows</a>.</p>
<p>You can track the discussion on cross compilation support in Native Image <a href="https://github.com/oracle/graal/issues/407">here</a>.</p>I am not very well acquainted with Windows, and I spent an entire day figuring out how to use GraalVM Native Image on it, so I know how frustrating that process can be. Given how incomplete the documentation for Windows is and how many people are confused about it on forums, I think writing about it may save others unnecessary effort and frustration.Haskell for impatient Scala developer: Getting into speed2020-02-08T09:00:00+01:002020-02-08T09:00:00+01:00http://msitko.pl/blog/2020/02/08/haskell-getting-into-speed<p>If you’re reading this I am assuming you are a Scala developer and you want to learn some Haskell. I have two pieces of news for you: a good one and a bad one. The good one is that there are plenty of Haskell resources available. The bad one is that not a single one targets Scala developers specifically.</p>
<p>Why would the lack of Haskell resources for Scala developers bother anyone? The thing is that as a Scala developer you know a lot of the concepts already. You know what a monad and an applicative are, you have folded through a list many times and you’re not scared of immutable collections. Therefore going through regular Haskell tutorials or books feels slow and is not very engaging because they assume you are starting from zero.</p>
<p>I prefer learning by practice so I tried to use Haskell for some side projects. However, I realized that I was missing a single succinct Scala to Haskell cheat sheet that I could glance at when in doubt about the basics. This article is not such a cheat sheet - I started working on one <a href="https://note.github.io/scala-to-haskell-cheatsheet">here</a>. You shouldn’t rely on such superficial knowledge for too long - after all, the Scala analogies are not 100% accurate. Their point is to get you up to speed.</p>
<p>The aim of this article is to walk you through the essential parts of learning a new language that are often missed: which build tool you should use, how to create a new project and how to add an external dependency. Some of those things work differently from their Scala counterparts and I will try to highlight the differences.</p>
<p>The whole code presented here is available in the <a href="https://github.com/note/blog-examples/tree/master/haskell-getting-into-speed">repository</a>. It also contains the Scala equivalent of the application we build.</p>
<h3 id="what-is-our-sample-applications-supposed-to-do">What is our sample application supposed to do</h3>
<p>It will be a simple command line application that, given a file like this as input:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="nl">"tag"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"blogPostId"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"content"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Some comment"</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="nl">"tag"</span><span class="p">:</span><span class="w"> </span><span class="s2">"BlogPost"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="nl">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Some blog post"</span><span class="p">,</span><span class="w"> </span><span class="nl">"summary"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Some post"</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="nl">"tag"</span><span class="p">:</span><span class="w"> </span><span class="s2">"BlogPost"</span><span class="p">,</span><span class="w"> </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="nl">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Another blog post"</span><span class="p">,</span><span class="w"> </span><span class="nl">"summary"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Another post"</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="nl">"tag"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"blogPostId"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"content"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Another comment"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>
<p>will print out the following output:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="nl">"tag"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"blogPostId"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"content"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Some comment"</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="nl">"tag"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Comment"</span><span class="p">,</span><span class="w"> </span><span class="nl">"blogPostId"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"content"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Another comment"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>
<p>It’s basically: parse JSON, filter out some of the items, print the result as JSON. Although very simple, it requires an external dependency for working with JSON.</p>
<p>The first step of developing any application is creating an initial directory structure that adheres to the build tool’s expectations. And speaking of which - you need a build tool.</p>
<h3 id="our-build-tool-of-choice---stack">Our build tool of choice - stack</h3>
<p>We need an <code class="highlighter-rouge">sbt</code> analogue for Haskell. The two most popular choices in Haskell are <code class="highlighter-rouge">cabal</code> and <code class="highlighter-rouge">stack</code>. I decided to stick to <code class="highlighter-rouge">stack</code> for this article.</p>
<p><code class="highlighter-rouge">stack</code> is built on top of <code class="highlighter-rouge">ghc</code>, <code class="highlighter-rouge">cabal</code> and <code class="highlighter-rouge">hackage</code> and tries to provide a better developer experience than using those tools directly. You can read more <a href="https://docs.haskellstack.org/en/stable/README/#why-stack">here</a>.</p>
<h3 id="ide---intellij-with-intellij-haskell-plugin">IDE - IntelliJ with IntelliJ-Haskell plugin</h3>
<p>You also need a code editor. As I am coming from Scala I have used IntelliJ on a daily basis for a few years. There’s an <a href="https://plugins.jetbrains.com/plugin/8258-intellij-haskell">IntelliJ-Haskell plugin</a> which “just works”.</p>
<p>The choice of both build tool and IDE is debatable and highly opinionated. Right now, however, we don’t care about the best option - we just want to get started. And I believe this setup resembles a typical Scala setup.</p>
<p>Let’s start!</p>
<h3 id="bootstrapping-the-project">Bootstrapping the project</h3>
<p>First, install stack by following these <a href="https://docs.haskellstack.org/en/stable/install_and_upgrade/">instructions</a>. Then, we need to bootstrap a new project. In Scala you may have used <code class="highlighter-rouge">sbt new</code> to do that. In the case of stack the command happens to have the same name:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> stack new haskell-introduction
</code></pre></div></div>
<p><code class="highlighter-rouge">haskell-introduction</code> is the name of the new project; it will be used as the directory name too. As we have passed only one argument to <code class="highlighter-rouge">stack new</code>, the default template will be used. After the command completes we should see something like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> tree haskell-introduction
haskell-introduction
├── app
│   └── Main.hs
├── ChangeLog.md
├── haskell-introduction.cabal
├── LICENSE
├── package.yaml
├── README.md
├── Setup.hs
├── src
│   └── Lib.hs
├── stack.yaml
└── test
    └── Spec.hs

3 directories, 10 files
</code></pre></div></div>
<p>All <code class="highlighter-rouge">*.hs</code> files are Haskell sources. The files most relevant to the build definition are <code class="highlighter-rouge">package.yaml</code> and <code class="highlighter-rouge">stack.yaml</code>. In simplistic terms, <code class="highlighter-rouge">package.yaml</code> corresponds to <code class="highlighter-rouge">build.sbt</code> as it defines the project we build, whereas <code class="highlighter-rouge">stack.yaml</code> controls stack-related settings - things we would expect in the <code class="highlighter-rouge">project</code> directory of <code class="highlighter-rouge">sbt</code>-based projects. We will only touch <code class="highlighter-rouge">package.yaml</code> in this article.</p>
<p>The generated project comes with the functionality of printing out a hardcoded string. Let’s run it with <code class="highlighter-rouge">stack exec</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> cd haskell-introduction # enter directory created by `stack new`
> stack build && stack exec haskell-introduction-exe
someFunc
</code></pre></div></div>
<p>If you see <code class="highlighter-rouge">someFunc</code> in your terminal too, you’re now good to open the project in the IDE. Start with installing IntelliJ-Haskell according to the <a href="https://github.com/rikvdkleij/intellij-haskell/blob/master/README.md#getting-started">getting started section</a>. This document also describes in detail how to open a new project. For the first project it includes some extra steps like configuring the Project SDK, so I suggest reading it carefully.</p>
<p>If you installed the plugin and opened the project you should observe no errors in the IDE and things such as highlighting, code completion and navigating to the definition should function properly.</p>
<h3 id="what-does-stack-actually-do">What does stack actually do?</h3>
<p>Let’s step back to understand what exactly happens when we run <code class="highlighter-rouge">stack exec</code>.</p>
<p>One of the things <code class="highlighter-rouge">stack</code> does is provide a compiler - <code class="highlighter-rouge">ghc</code>. I don’t have it installed on my system:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> which ghc
ghc not found
</code></pre></div></div>
<p>Yet, it is available to stack:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> stack exec -- which ghc
/home/michal/.stack/programs/x86_64-linux/ghc-tinfo6-8.6.5/bin/ghc
</code></pre></div></div>
<p>As you can see, stack stores binaries that may be shared between projects in <code class="highlighter-rouge">$HOME/.stack</code>. This directory is not supposed to be on <code class="highlighter-rouge">$PATH</code>, but <code class="highlighter-rouge">stack exec</code> is aware of the artifacts stored there and can resolve a command to the proper binary. <em>A single <code class="highlighter-rouge">ghc</code> binary is reused between projects only if those projects declare the same version of <code class="highlighter-rouge">ghc</code>. Projects that declare different versions each get their own compiler, so they can coexist without any issues.</em></p>
<p>And how about project related binaries like previously used <code class="highlighter-rouge">haskell-introduction-exe</code>? Let’s check it out:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> stack exec -- which haskell-introduction-exe
/home/michal/haskell-introduction/.stack-work/install/x86_64-linux-tinfo6/dd28ee69e237c048a9ddc4736a23ba5aabe5c6075009ccddf23dd601e1f9f4d6/8.6.5/bin/haskell-introduction-exe
</code></pre></div></div>
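<p>The output tells us that stack stores project related binaries in the <code class="highlighter-rouge">$PWD/.stack-work</code> directory.</p>
<p><em>Why do we sometimes run <code class="highlighter-rouge">stack exec command</code> and sometimes <code class="highlighter-rouge">stack exec -- command</code>? The <code class="highlighter-rouge">--</code> separates stack’s own options from the command being run, so everything after it is passed to the command untouched. Without it, stack may try to interpret the command’s flags as its own, so the former is only safe for simple commands while the latter works for any command.</em></p>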
<h3 id="stack-run">stack run</h3>
<p>If you need to simply run your project as we know it from <code class="highlighter-rouge">sbt run</code> then keep in mind that <code class="highlighter-rouge">stack exec</code> does not rebuild a project. That’s why we had to <code class="highlighter-rouge">stack build && stack exec haskell-introduction-exe</code>. Also, you need to pass the name of the executable (<code class="highlighter-rouge">haskell-introduction-exe</code> in our case) which depends on the project. Fortunately some time ago stack <a href="https://github.com/commercialhaskell/stack/pull/3952/files">introduced</a> <code class="highlighter-rouge">stack run</code> which we will use from now on to rebuild and run the project.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> stack purge && stack run # stack purge just to show that run triggers build
...
someFunc
</code></pre></div></div>
<h3 id="repl">REPL</h3>
<p>You can also run your code from the REPL. <code class="highlighter-rouge">ghci</code> is the default REPL distributed together with <code class="highlighter-rouge">ghc</code>. Similarly to ghc you don’t need to install it on your system - it will be fetched by stack based on your project definition.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> stack ghci
...
Ok, two modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/e5db0fdf/ghci-script
λ someFunc
someFunc
λ
</code></pre></div></div>
<p>If you just installed stack you probably see a different prompt. I configured it to <code class="highlighter-rouge">λ </code> and I will use it in snippets in this article to distinguish ghci code from bash commands, for which I use <code class="highlighter-rouge">></code> as prompt.</p>
<p>It’s important to note that <code class="highlighter-rouge">stack ghci</code> rebuilds your project and you can access your code from there. It gives you a powerful way of tinkering with the code. If you find the <code class="highlighter-rouge">ghci</code> input mode too limiting or need more IDE support, you can write your function in a file, rerun <code class="highlighter-rouge">ghci</code> and run the function. And it all feels close to immediate.</p>
<h3 id="adding-external-build-dependency">Adding external build dependency</h3>
<p>Let’s get back to the initial task of parsing JSON. A popular choice for JSON library in Haskell ecosystem is <code class="highlighter-rouge">aeson</code>. I think it’s safe to compare it to <code class="highlighter-rouge">circe</code>, both in terms of popularity and how it actually works.</p>
<p>The only thing you need to do to add a dependency is change <code class="highlighter-rouge">package.yaml</code> so that its dependencies section looks like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dependencies:
- base >= 4.7 && < 5
- aeson # The only new line
</code></pre></div></div>
<p>Looks neat but where is the organization name? - you may ask. And more importantly - where is the version specified?</p>
<p>To explain how stack manages dependencies I need to mention two components: Hackage and Stackage. Hackage is the central repository of Haskell packages and it contains thousands of open source libraries. You can think of it as the Maven Central Repository for Haskell.</p>
<p>Stackage, according to <a href="https://docs.haskellstack.org/en/stable/README/#why-stack">the docs</a>, is:</p>
<blockquote>
<p>a curated set of packages from Hackage which are regularly tested for compatibility. Stack defaults to using Stackage package sets to avoid dependency problems.</p>
</blockquote>
<p>There’s no counterpart of Stackage in the Scala ecosystem and I think it’s a pretty unusual concept for a language specific build tool. However, it’s a very common concept in OS package managers, so you can think of it as Nix channels or Debian releases.</p>
<p>Let’s see how it works. First, we need to understand which Stackage <code class="highlighter-rouge">resolver</code> we use in our project. We can determine that by checking <code class="highlighter-rouge">stack.yaml</code> file in which we can find:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>resolver: lts-14.22
</code></pre></div></div>
<p>Now we can go to <a href="https://www.stackage.org/lts-14.22">https://www.stackage.org/lts-14.22</a> to see what packages in what versions are available for the resolver in use. Here’s the <a href="https://www.stackage.org/lts-14.22/hoogle?q=aeson">result</a> of searching for <code class="highlighter-rouge">aeson</code> and clicking on the first entry redirects us to <a href="https://www.stackage.org/lts-14.22/package/aeson-1.4.6.0">https://www.stackage.org/lts-14.22/package/aeson-1.4.6.0</a>. Therefore, we should expect aeson of version <code class="highlighter-rouge">1.4.6.0</code> to be used.</p>
<p>Let’s try it out then: (the only file we changed after last build was <code class="highlighter-rouge">package.yaml</code>)</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> stack build
</code></pre></div></div>
<p>The output on a system with just-installed stack will be quite big. A few selected lines:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dlist > configure
dlist > Configuring dlist-0.8.0.7...
dlist > build
dlist > Preprocessing library for dlist-0.8.0.7..
dlist > Building library for dlist-0.8.0.7..
dlist > [1 of 1] Compiling Data.DList
dlist > copy/register
dlist > Installing library in /home/michal/.stack/snapshots/x86_64-linux-tinfo6/dd28ee69e237c048a9ddc4736a23ba5aabe5c6075009ccddf23dd601e1f9f4d6/8.6.5/lib/x86_64-linux-ghc-8.6.5/dlist-0.8.0.7-62vR0IWGKydvDRbWJTrKt
dlist > Registering library for dlist-0.8.0.7..
...
aeson > Registering library for aeson-1.4.6.0..
...
</code></pre></div></div>
<p>The key observation here is that the compiler on my machine actually compiled <code class="highlighter-rouge">dlist</code>. And I haven’t even asked for <code class="highlighter-rouge">dlist</code> - it’s being compiled because it’s a transitive dependency of aeson.</p>
<p><strong>One of the crucial differences between stack and sbt (or rather between the Haskell ecosystem and the JVM ecosystem) is that libraries are distributed as source code as opposed to prebuilt JARs with bytecode</strong>. That means that stack needs to build aeson from source. What’s more, it needs to build all of aeson’s dependencies too - that’s why we see <code class="highlighter-rouge">dlist</code> in the above output. Keep that fact in mind whenever you are surprised that your tiny app takes so long to compile - it’s probably the dependencies being compiled. Compiled libraries are stored in <code class="highlighter-rouge">$HOME/.stack</code>, so you will not pay this price on every build.</p>
<h3 id="defining-adt">Defining ADT</h3>
<p>We will be working with ADT that can be expressed in Scala as the following: (<a href="https://github.com/note/blog-examples/blob/master/haskell-getting-into-speed/scala/src/main/scala/example/entities/Entities.scala">full source</a>)</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">sealed</span> <span class="k">trait</span> <span class="nc">Activity</span>
<span class="k">final</span> <span class="k">case</span> <span class="k">class</span> <span class="nc">BlogPost</span><span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">title</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">summary</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">Activity</span>
<span class="k">final</span> <span class="k">case</span> <span class="k">class</span> <span class="nc">Comment</span><span class="o">(</span><span class="n">blogPostId</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">content</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">Activity</span>
</code></pre></div></div>
<p>It translates to the following Haskell code: (<code class="highlighter-rouge">Lib.hs</code> file - <a href="https://github.com/note/blog-examples/blob/master/haskell-getting-into-speed/haskell-introduction/src/Lib.hs">full source</a>)</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">module</span> <span class="nn">Lib</span> <span class="p">(</span> <span class="kt">Activity</span><span class="p">(</span><span class="kt">BlogPost</span><span class="p">,</span> <span class="kt">Comment</span><span class="p">)</span> <span class="p">)</span> <span class="kr">where</span>
<span class="kr">data</span> <span class="kt">Activity</span> <span class="o">=</span> <span class="kt">BlogPost</span> <span class="p">{</span> <span class="n">id</span> <span class="o">::</span> <span class="kt">Int</span>
                         <span class="p">,</span> <span class="n">title</span> <span class="o">::</span> <span class="kt">String</span>
                         <span class="p">,</span> <span class="n">summary</span> <span class="o">::</span> <span class="kt">String</span>
                         <span class="p">}</span>
              <span class="o">|</span> <span class="kt">Comment</span> <span class="p">{</span> <span class="n">blogPostId</span> <span class="o">::</span> <span class="kt">Int</span>
                        <span class="p">,</span> <span class="n">content</span> <span class="o">::</span> <span class="kt">String</span>
                        <span class="p">}</span>
</code></pre></div></div>
<p>ADTs by themselves are a good topic for a separate article so I will not go into details here. Let’s just try creating a <code class="highlighter-rouge">Comment</code> in ghci:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>λ :t Comment
Comment :: Int -> String -> Activity
λ let c = Comment 3 "awesome comment"
λ :force c
c = <Comment> 3 "awesome comment"
</code></pre></div></div>
<p>Note that the return type of <code class="highlighter-rouge">Comment</code> is <code class="highlighter-rouge">Activity</code>. That’s because data constructors (<code class="highlighter-rouge">BlogPost</code> and <code class="highlighter-rouge">Comment</code>) are not types but only functions.</p>
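<p>To make that concrete, here is a minimal sketch (not part of the original project) that treats <code class="highlighter-rouge">Comment</code> as an ordinary function and tells the two constructors apart by pattern matching. The first <code class="highlighter-rouge">BlogPost</code> field is renamed to <code class="highlighter-rouge">postId</code> in this sketch, and <code class="highlighter-rouge">commentsOn</code> and <code class="highlighter-rouge">describe</code> are hypothetical helpers introduced only for illustration:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>module Main where

-- Same shape as the article's ADT; the first BlogPost field is called
-- postId in this sketch to avoid clashing with Prelude's id.
data Activity
  = BlogPost { postId :: Int, title :: String, summary :: String }
  | Comment { blogPostId :: Int, content :: String }

-- Comment is a plain function Int -> String -> Activity, so it can be
-- partially applied and mapped over a list like any other function.
commentsOn :: Int -> [String] -> [Activity]
commentsOn postNumber = map (Comment postNumber)

-- Both constructors build values of the single type Activity;
-- pattern matching is how we tell them apart.
describe :: Activity -> String
describe (BlogPost _ t _) = "post: " ++ t
describe (Comment n c) = "comment on post " ++ show n ++ ": " ++ c

main :: IO ()
main = mapM_ (putStrLn . describe) (commentsOn 1 ["first", "second"])
</code></pre></div></div>
<p>Running it prints one line per comment: <code class="highlighter-rouge">comment on post 1: first</code> followed by <code class="highlighter-rouge">comment on post 1: second</code>.</p>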
<h3 id="derive-json-type-classes">Derive JSON type classes</h3>
<p>To be able to translate our ADT to JSON and back we need to have proper type class instances. In case of Scala we need to annotate trait <code class="highlighter-rouge">Activity</code> to derive its circe Encoder and Decoder: (<a href="https://github.com/note/blog-examples/blob/master/haskell-getting-into-speed/scala/src/main/scala/example/entities/Entities.scala">full source</a>)</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@ConfiguredJsonCodec</span>
<span class="k">sealed</span> <span class="k">trait</span> <span class="nc">Activity</span>
<span class="k">object</span> <span class="nc">Activity</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">val</span> <span class="nv">config</span><span class="k">:</span> <span class="kt">Configuration</span> <span class="o">=</span>
<span class="nv">Configuration</span><span class="o">.</span><span class="py">default</span><span class="o">.</span><span class="py">withDiscriminator</span><span class="o">(</span><span class="s">"tag"</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">...</span>
</code></pre></div></div>
<p>We can do the same in Haskell (<code class="highlighter-rouge">Lib.hs</code> file - <a href="https://github.com/note/blog-examples/blob/master/haskell-getting-into-speed/haskell-introduction/src/Lib.hs">full source</a>):</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">{-# LANGUAGE DeriveGeneric #-}</span>
<span class="o">...</span>
<span class="kr">import</span> <span class="nn">Data.Aeson</span> <span class="p">(</span><span class="kt">FromJSON</span><span class="p">,</span> <span class="kt">ToJSON</span><span class="p">)</span>
<span class="kr">import</span> <span class="nn">GHC.Generics</span> <span class="p">(</span><span class="kt">Generic</span><span class="p">)</span>
<span class="kr">data</span> <span class="kt">Activity</span> <span class="o">=</span> <span class="o">...</span>
              <span class="o">|</span> <span class="kt">Comment</span> <span class="p">{</span> <span class="n">blogPostId</span> <span class="o">::</span> <span class="kt">Int</span>
                        <span class="p">,</span> <span class="n">content</span> <span class="o">::</span> <span class="kt">String</span>
                        <span class="p">}</span>
              <span class="kr">deriving</span> <span class="p">(</span><span class="kt">Generic</span><span class="p">,</span> <span class="kt">Show</span><span class="p">)</span>

<span class="kr">instance</span> <span class="kt">ToJSON</span> <span class="kt">Activity</span>
<span class="kr">instance</span> <span class="kt">FromJSON</span> <span class="kt">Activity</span>
</code></pre></div></div>
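<p>The empty instance bodies rely on aeson’s default options, which encode a sum type of records as an object with an extra <code class="highlighter-rouge">tag</code> field, which happens to be the shape our input file uses. As a hedged sketch of how to spell that choice out, closer to the explicit <code class="highlighter-rouge">withDiscriminator("tag")</code> in the Scala version, the instances could also be written with <code class="highlighter-rouge">genericToJSON</code> and <code class="highlighter-rouge">genericParseJSON</code>. The name <code class="highlighter-rouge">activityOptions</code> is made up for this example, and the <code class="highlighter-rouge">contents</code> field name is there only because <code class="highlighter-rouge">TaggedObject</code> requires one; it is not used for record constructors:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{-# LANGUAGE DeriveGeneric #-}
module Main where

import Data.Aeson
import GHC.Generics (Generic)

data Activity
  = BlogPost { id :: Int, title :: String, summary :: String }
  | Comment { blogPostId :: Int, content :: String }
  deriving (Generic, Show)

-- Hypothetical name: the same settings aeson uses by default, written
-- out so that the "tag" discriminator is visible in the code.
activityOptions :: Options
activityOptions = defaultOptions
  { sumEncoding = TaggedObject { tagFieldName = "tag", contentsFieldName = "contents" } }

instance ToJSON Activity where
  toJSON = genericToJSON activityOptions

instance FromJSON Activity where
  parseJSON = genericParseJSON activityOptions

main :: IO ()
main = do
  -- Round trip: encode a Comment to JSON and decode it back.
  let parsed = decode (encode (Comment 1 "Some comment")) :: Maybe Activity
  print parsed
</code></pre></div></div>
<p>Changing <code class="highlighter-rouge">tagFieldName</code> here is the counterpart of passing a different discriminator name to circe.</p>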
<p>Having instances derived we can try to use them from <code class="highlighter-rouge">ghci</code>. Let’s find out the type of <code class="highlighter-rouge">Data.Aeson.encode</code> first:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>*Lib Lib> import Data.Aeson
*Lib Lib Data.Aeson> :t encode
encode
:: ToJSON a =>
a -> bytestring-0.10.8.2:Data.ByteString.Lazy.Internal.ByteString
</code></pre></div></div>
<p>In our application we intend to use <code class="highlighter-rouge">putStrLn</code> which is of type <code class="highlighter-rouge">String -> IO ()</code>. So we need to find a function <code class="highlighter-rouge">ByteString -> String</code>. As with any problem, we can “google it”. Alternatively, in the case of Haskell, we can also “hoogle it”. <a href="https://hoogle.haskell.org/">Hoogle</a> is a Haskell API search engine which allows you to search for Haskell functions by function name or by type signature.</p>
<p>Therefore you can just look for <code class="highlighter-rouge">Data.ByteString.Lazy.Internal.ByteString -> [Char]</code> (I took the first type from the ghci output). The only <a href="https://hackage.haskell.org/package/bytestring-0.10.10.0/docs/Data-ByteString-Lazy-Internal.html#v:unpackChars">result</a> suggests importing <code class="highlighter-rouge">Data.ByteString.Lazy.Internal</code>. Let’s give it a try:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>λ import Data.ByteString.Lazy.Internal
<no location info>: error:
Could not load module ‘Data.ByteString.Lazy.Internal’
It is a member of the hidden package ‘bytestring-0.10.8.2’.
You can run ‘:set -package bytestring’ to expose it.
(Note: this unloads all the modules in the current scope.)
</code></pre></div></div>
<p>What is this “hidden package” message about? It appears when your code uses a type or function defined in a transitive dependency, i.e. a package that is part of your build only because another package depends on it, not because you listed it yourself.</p>
<p>It’s very easy to fix it - just add the dependency explicitly in <code class="highlighter-rouge">package.yaml</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dependencies:
- base >= 4.7 && < 5
- aeson
- bytestring # The only new line
</code></pre></div></div>
<p>This is another difference between sbt and stack: <strong>stack does not allow you to refer to code defined in transitive dependencies</strong>. <em>Although there is an sbt <a href="https://github.com/cb372/sbt-explicit-dependencies">plugin</a> to achieve the same behaviour in sbt too.</em></p>
<p>Now, with <code class="highlighter-rouge">bytestring</code> as an explicit dependency, we should be able to import its types and finally get the encoded string:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> import Data.Aeson
> import Data.ByteString.Lazy.Internal
> unpackChars ( encode ( Comment 3 "awesome comment" ) )
"{\"tag\":\"Comment\",\"blogPostId\":3,\"content\":\"awesome comment\"}"
</code></pre></div></div>
<p>Looks good, although all those parentheses look a bit clunky. We can get rid of them using a widely used pattern, the <code class="highlighter-rouge">$</code> operator:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> unpackChars $ encode $ Comment 3 "awesome comment"
"{\"tag\":\"Comment\",\"blogPostId\":3,\"content\":\"awesome comment\"}"
</code></pre></div></div>
<p>You can think of <code class="highlighter-rouge">$</code> as an opening parenthesis accompanied by an implicit closing parenthesis at the end of the line.</p>
<h2 id="final-solution">Final solution</h2>
<p>We’ve implemented the JSON part of the task. It’s time to write the main function, which will load JSON from a file, filter the parsed content and print out the result. In Scala it may look like this: (<a href="https://github.com/note/blog-examples/blob/master/haskell-getting-into-speed/scala/src/main/scala/example/Main.scala">full source</a>)</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">object</span> <span class="nc">Main</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">main</span><span class="o">(</span><span class="n">args</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">activitiesEither</span> <span class="k">=</span> <span class="nf">parseFile</span><span class="o">(</span><span class="nv">Paths</span><span class="o">.</span><span class="py">get</span><span class="o">(</span><span class="s">"../input.json"</span><span class="o">).</span><span class="py">toFile</span><span class="o">).</span><span class="py">flatMap</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">as</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">Activity</span><span class="o">]])</span>
<span class="k">val</span> <span class="nv">output</span> <span class="k">=</span> <span class="n">activitiesEither</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">Right</span><span class="o">(</span><span class="n">activities</span><span class="o">)</span> <span class="k">=></span> <span class="nf">process</span><span class="o">(</span><span class="n">activities</span><span class="o">)</span>
<span class="k">case</span> <span class="nc">Left</span><span class="o">(</span><span class="n">e</span><span class="o">)</span> <span class="k">=></span> <span class="n">s</span><span class="s">"Something went wrong: $e"</span>
<span class="o">}</span>
<span class="nf">println</span><span class="o">(</span><span class="n">output</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">def</span> <span class="nf">process</span><span class="o">(</span><span class="n">activities</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">Activity</span><span class="o">])</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span>
<span class="nf">onlyComments</span><span class="o">(</span><span class="n">activities</span><span class="o">).</span><span class="py">asJson</span><span class="o">.</span><span class="py">spaces2</span>
<span class="k">def</span> <span class="nf">onlyComments</span><span class="o">(</span><span class="n">activities</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">Activity</span><span class="o">])</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">Activity</span><span class="o">]</span> <span class="k">=</span>
<span class="nv">activities</span><span class="o">.</span><span class="py">filter</span><span class="o">(</span><span class="n">isComment</span><span class="o">)</span>
<span class="k">def</span> <span class="nf">isComment</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">Activity</span><span class="o">)</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="n">a</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">Comment</span><span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="k">_</span><span class="o">)</span> <span class="k">=></span> <span class="kc">true</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="kc">false</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>And here’s the Haskell counterpart: (<a href="https://github.com/note/blog-examples/blob/master/haskell-getting-into-speed/haskell-introduction/app/Main.hs">full source</a>)</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">module</span> <span class="nn">Main</span> <span class="kr">where</span>
<span class="kr">import</span> <span class="k">qualified</span> <span class="nn">Data.ByteString.Lazy.Internal</span> <span class="k">as</span> <span class="n">C</span>
<span class="kr">import</span> <span class="nn">Data.Aeson</span>
<span class="kr">import</span> <span class="nn">Data.List</span>
<span class="kr">import</span> <span class="nn">Control.Arrow</span>
<span class="kr">import</span> <span class="nn">Lib</span>
<span class="n">main</span> <span class="o">::</span> <span class="kt">IO</span> <span class="nb">()</span>
<span class="n">main</span> <span class="o">=</span> <span class="kr">do</span> <span class="n">activitiesEither</span> <span class="o"><-</span> <span class="n">eitherDecodeFileStrict</span> <span class="s">"../input.json"</span> <span class="o">::</span> <span class="kt">IO</span> <span class="p">(</span><span class="kt">Either</span> <span class="kt">String</span> <span class="p">[</span><span class="kt">Activity</span><span class="p">])</span>
          <span class="kr">let</span> <span class="n">output</span> <span class="o">=</span> <span class="kr">case</span> <span class="n">activitiesEither</span> <span class="kr">of</span>
                         <span class="c1">-- It will not work properly for UTF-8 characters but for sake of a demonstration it's good enough</span>
                         <span class="p">(</span><span class="kt">Right</span> <span class="n">activities</span><span class="p">)</span> <span class="o">-></span> <span class="kt">C</span><span class="o">.</span><span class="n">unpackChars</span> <span class="o">$</span> <span class="n">process</span> <span class="n">activities</span>
                         <span class="p">(</span><span class="kt">Left</span> <span class="n">e</span><span class="p">)</span> <span class="o">-></span> <span class="s">"Something went wrong: "</span> <span class="o">++</span> <span class="n">e</span>
            <span class="kr">in</span> <span class="p">(</span><span class="n">putStrLn</span> <span class="n">output</span><span class="p">)</span>
<span class="n">process</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Activity</span><span class="p">]</span> <span class="o">-></span> <span class="kt">C</span><span class="o">.</span><span class="kt">ByteString</span>
<span class="n">process</span> <span class="n">activities</span> <span class="o">=</span> <span class="n">encode</span> <span class="o">$</span> <span class="n">onlyComments</span> <span class="n">activities</span>
<span class="n">onlyComments</span> <span class="o">::</span> <span class="p">[</span><span class="kt">Activity</span><span class="p">]</span> <span class="o">-></span> <span class="p">[</span><span class="kt">Activity</span><span class="p">]</span>
<span class="n">onlyComments</span> <span class="n">activities</span> <span class="o">=</span> <span class="n">filter</span> <span class="n">isComment</span> <span class="n">activities</span>
<span class="n">isComment</span> <span class="o">::</span> <span class="kt">Activity</span> <span class="o">-></span> <span class="kt">Bool</span>
<span class="n">isComment</span> <span class="p">(</span><span class="kt">Comment</span> <span class="kr">_</span> <span class="kr">_</span><span class="p">)</span> <span class="o">=</span> <span class="kt">True</span>
<span class="n">isComment</span> <span class="kr">_</span> <span class="o">=</span> <span class="kt">False</span>
</code></pre></div></div>
<p>I will not comment on the Haskell snippet above in detail; I hope you can make sense of it just by comparing it to the Scala snippet. What I want to draw your attention to is that there is nothing here that is foreign to an average Scala developer. The <code class="highlighter-rouge">Either</code> type, <code class="highlighter-rouge">filter</code> on a list, type class based encoders and decoders, the IO monad - those are standard tools for a Scala developer. It’s true that the syntax and the specifics of the implementation differ, but the ideas stay the same.</p>
<h2 id="where-to-go-next">Where to go next</h2>
<p>I am a Haskell beginner myself so I cannot offer you any definitive answer. And I highly doubt there is any definitive answer anyway. I can describe my current approach but I encourage you to determine your own learning strategy.</p>
<p>I solve exercises from <a href="https://adventofcode.com/2019">Advent of Code</a>. They are simple algorithmic problems, ones that make you proficient with control structures, syntax, and basic data structures. They are easy enough for me to not get stuck, even while learning a new language, but challenging enough not to get bored. The success criteria are clear and feedback after providing the answer is immediate.</p>
<p>While solving small coding exercises is fun and lets me familiarize myself with syntax it does not help in understanding how to work with libraries, networking, databases and all those bits that actually make programming difficult. Here I can wholeheartedly recommend an amazing <a href="https://vadosware.io/post/rest-ish-services-in-haskell-part-1/">REST-ish Services in Haskell tutorial</a>. It includes parsing command line arguments, config file, implementing REST API endpoints, writing to a database and many others. It’s a comprehensive manual on how to write your own web application in Haskell. Moreover, it is also an in-depth resource on how to write web services in general. I could not recommend it enough!</p>
<p>I do read classical resources and find them very useful. I do not read them page by page as they don’t keep me engaged enough. Instead, I selectively read the chapters I need right now to solve the problem at hand. I use the excellent <a href="https://haskellbook.com/">Haskell Book</a> and <a href="http://learnyouahaskell.com/">Learn You a Haskell</a>, among others.</p>
<p>I hope you will find the <a href="https://note.github.io/scala-to-haskell-cheatsheet">Scala to Haskell cheatsheet</a> useful in your learning process too.</p>If you’re reading this I am assuming you are a Scala developer and you want to learn some Haskell. I got 2 news for you - a good one and a bad one. The good one is that there are plenty of Haskell resources available. The bad one is that there is not a single one targeting Scala developers specifically.
When your code fails after being packaged as JAR
2019-10-19T22:00:00+02:00 2019-10-19T22:00:00+02:00 http://msitko.pl/blog/2019/10/19/when-your-code-fails-after-being-packaged-as-jar
<p>This is a “today I learned” kind of post. The code I want to show may appear boring by itself, as it just loads a file from resources. I found it interesting, though, because that code works when run from tests (e.g. with <code class="highlighter-rouge">sbt test</code>), whereas it fails after being packaged as a JAR. Realizing that was the beginning of an engaging investigation.</p>
<p><em>I am using Scala in this post but the essence remains the same for any code targeting JVM.</em></p>
<h2 id="the-case">The case</h2>
<p>Let’s say we want to read a CSV file using the <a href="https://github.com/tototoshi/scala-csv"><code class="highlighter-rouge">scala-csv</code></a> library. <code class="highlighter-rouge">CSVReader</code> has a method <code class="highlighter-rouge">open</code> which accepts an argument of type <code class="highlighter-rouge">File</code>. Thus, provided we want to read a file from the filesystem, we can write something like this:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">readFromFilesystem</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="nv">CSVReader</span><span class="o">.</span><span class="py">open</span><span class="o">(</span><span class="k">new</span> <span class="nc">File</span><span class="o">(</span><span class="s">"sample.csv"</span><span class="o">)).</span><span class="py">all</span>
<span class="o">}</span>
</code></pre></div></div>
<p>However, the case I want to focus on in this post is reading from a resource. We can start with the following code:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">readAsResource</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">classloader</span> <span class="k">=</span> <span class="nv">Thread</span><span class="o">.</span><span class="py">currentThread</span><span class="o">.</span><span class="py">getContextClassLoader</span>
<span class="k">val</span> <span class="nv">url</span> <span class="k">=</span> <span class="nv">classloader</span><span class="o">.</span><span class="py">getResource</span><span class="o">(</span><span class="s">"resource.csv"</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">file</span> <span class="k">=</span> <span class="nv">Paths</span><span class="o">.</span><span class="py">get</span><span class="o">(</span><span class="nv">url</span><span class="o">.</span><span class="py">toURI</span><span class="o">).</span><span class="py">toFile</span>
<span class="nv">CSVReader</span><span class="o">.</span><span class="py">open</span><span class="o">(</span><span class="n">file</span><span class="o">).</span><span class="py">all</span><span class="o">()</span>
<span class="o">}</span>
</code></pre></div></div>
<p>It is slightly more involved, and that <code class="highlighter-rouge">toURI</code> looks a bit dubious, but let’s give it a try. We will also write a test, so any potential problem should be caught by it.</p>
<p>We can call both methods from the <code class="highlighter-rouge">main</code> method:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">object</span> <span class="nc">Main</span> <span class="o">{</span>
<span class="k">def</span> <span class="nf">main</span><span class="o">(</span><span class="n">args</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="o">{</span>
<span class="nf">println</span><span class="o">(</span><span class="n">s</span><span class="s">"readFromFilesystem: ${Reader.readFromFilesystem}"</span><span class="o">)</span>
<span class="nf">println</span><span class="o">(</span><span class="n">s</span><span class="s">"readAsResource: ${Reader.readAsResource}"</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Then we run it with <code class="highlighter-rouge">sbt reStart</code> which produces the following result:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>readFromFilesystem: List(List(a, b, c), List(d, e, f))
readAsResource: List(List(g, h, i), List(j, k, l))
</code></pre></div></div>
<p>This is exactly what is expected.</p>
<p>If we create a <a href="https://github.com/note/blog-examples/blob/master/reading-resources/src/test/scala/pl/msitko/ReaderSpec.scala">test</a> it will also work:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
</code></pre></div></div>
<p>Everything looks fine. Then - time to deploy?</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> sbt assembly
...
<span class="o">></span> java <span class="nt">--show-version</span> <span class="nt">-jar</span> target/scala-2.13/read-resource-assembly-1.0.jar
openjdk 11.0.2 2019-01-15
OpenJDK Runtime Environment 18.9 <span class="o">(</span>build 11.0.2+9<span class="o">)</span>
OpenJDK 64-Bit Server VM 18.9 <span class="o">(</span>build 11.0.2+9, mixed mode<span class="o">)</span>
readFromFilesystem: List<span class="o">(</span>List<span class="o">(</span>a, b, c<span class="o">)</span>, List<span class="o">(</span>d, e, f<span class="o">))</span>
Exception <span class="k">in </span>thread <span class="s2">"main"</span> java.nio.file.FileSystemNotFoundException
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem<span class="o">(</span>ZipFileSystemProvider.java:169<span class="o">)</span>
at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath<span class="o">(</span>ZipFileSystemProvider.java:155<span class="o">)</span>
at java.base/java.nio.file.Path.of<span class="o">(</span>Path.java:208<span class="o">)</span>
at java.base/java.nio.file.Paths.get<span class="o">(</span>Paths.java:97<span class="o">)</span>
at pl.msitko.Reader<span class="nv">$.</span>readAsResource<span class="o">(</span>Reader.scala:21<span class="o">)</span>
at pl.msitko.Main<span class="nv">$.</span>main<span class="o">(</span>Main.scala:10<span class="o">)</span>
at pl.msitko.Main.main<span class="o">(</span>Main.scala<span class="o">)</span>
</code></pre></div></div>
<p>Oops, it does not look good, let’s see what went wrong.</p>
<h2 id="diving-in">Diving in</h2>
<p>If we print out <code class="highlighter-rouge">classloader.getResource("resource.csv")</code> for the packaged application we will see:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jar:file:/path/to/the/project/target/scala-2.13/read-resource-assembly-1.0.jar!/resource.csv
</code></pre></div></div>
<p><em>By the way, if we print out the same during tests the result will be <code class="highlighter-rouge">file:/path/to/the/project/target/scala-2.13/classes/resource.csv</code>, which explains why that code worked
when run as a test. During tests the resource’s URL points to the local file system.</em></p>
<p>The stack trace mentions <code class="highlighter-rouge">ZipFileSystemProvider</code>. After taking a look at its <a href="https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/jdk.zipfs/share/classes/jdk/nio/zipfs/ZipFileSystemProvider.java">code</a> and some legacy <a href="https://docs.oracle.com/javase/7/docs/technotes/guides/io/fsp/zipfilesystemprovider.html">docs</a> we may try the following:</p>
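<p>To see both URL shapes without packaging anything, here is a minimal, self-contained sketch. It is not part of the original project: <code class="highlighter-rouge">ResourceUrlDemo</code>, <code class="highlighter-rouge">buildDir</code>, <code class="highlighter-rouge">buildJar</code> and <code class="highlighter-rouge">protocolOf</code> are hypothetical names, and the temporary directory and JAR merely stand in for the <code class="highlighter-rouge">classes</code> directory and the assembly JAR. It assumes Scala 2.13 and uses only the JDK.</p>

```scala
import java.io.File
import java.net.URLClassLoader
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files
import java.util.jar.{JarEntry, JarOutputStream}

// Hypothetical demo object, not taken from the article's repository.
object ResourceUrlDemo {
  private val content = "g,h,i\nj,k,l\n".getBytes(UTF_8)

  // A throwaway directory holding resource.csv as a plain file
  // (stand-in for target/scala-2.13/classes).
  def buildDir(): File = {
    val dir = Files.createTempDirectory("demo-classes")
    Files.write(dir.resolve("resource.csv"), content)
    dir.toFile
  }

  // A throwaway JAR holding resource.csv as an entry
  // (stand-in for the assembly JAR).
  def buildJar(): File = {
    val jar = File.createTempFile("demo", ".jar")
    jar.deleteOnExit()
    val out = new JarOutputStream(Files.newOutputStream(jar.toPath))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write(content)
      out.closeEntry()
    } finally out.close()
    jar
  }

  // Protocol of the URL under which a class loader rooted at `root`
  // sees resource.csv.
  def protocolOf(root: File): String = {
    val loader = new URLClassLoader(Array(root.toURI.toURL), null)
    try loader.getResource("resource.csv").getProtocol
    finally loader.close()
  }

  def main(args: Array[String]): Unit = {
    println(protocolOf(buildDir())) // resource served from a directory
    println(protocolOf(buildJar())) // resource served from inside a JAR
  }
}
```

<p>On a stock JDK this should print <code class="highlighter-rouge">file</code> followed by <code class="highlighter-rouge">jar</code>: the same resource name resolves to two different kinds of URL depending on how the class path is laid out.</p>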
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">readAsResource</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">classloader</span> <span class="k">=</span> <span class="nv">Thread</span><span class="o">.</span><span class="py">currentThread</span><span class="o">.</span><span class="py">getContextClassLoader</span>
<span class="k">val</span> <span class="nv">url</span> <span class="k">=</span> <span class="nv">classloader</span><span class="o">.</span><span class="py">getResource</span><span class="o">(</span><span class="s">"resource.csv"</span><span class="o">)</span>
<span class="c1">// the next three lines are new compared to the previous code
</span> <span class="k">val</span> <span class="nv">jarProvider</span> <span class="k">=</span> <span class="nv">FileSystemProvider</span><span class="o">.</span><span class="py">installedProviders</span><span class="o">.</span><span class="py">asScala</span><span class="o">.</span><span class="py">toList</span><span class="o">.</span><span class="py">filter</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">getScheme</span> <span class="o">==</span> <span class="s">"jar"</span><span class="o">).</span><span class="py">head</span>
<span class="k">val</span> <span class="nv">jarUrl</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">URI</span><span class="o">(</span><span class="s">"jar:file:/path/to/the/project/target/scala-2.13/read-resource-assembly-1.0.jar"</span><span class="o">)</span>
<span class="nv">jarProvider</span><span class="o">.</span><span class="py">newFileSystem</span><span class="o">(</span><span class="n">jarUrl</span><span class="o">,</span> <span class="nv">Map</span><span class="o">.</span><span class="py">empty</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">].</span><span class="py">asJava</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">file</span> <span class="k">=</span> <span class="nv">Paths</span><span class="o">.</span><span class="py">get</span><span class="o">(</span><span class="nv">url</span><span class="o">.</span><span class="py">toURI</span><span class="o">).</span><span class="py">toFile</span>
<span class="nv">CSVReader</span><span class="o">.</span><span class="py">open</span><span class="o">(</span><span class="n">file</span><span class="o">).</span><span class="py">all</span><span class="o">()</span>
<span class="o">}</span>
</code></pre></div></div>
<p>That code is quite naive and assumes we know the location of the JAR file beforehand, but we are just playing around here. It yields:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>readFromFilesystem: List(List(a, b, c), List(d, e, f))
Exception in thread "main" java.lang.UnsupportedOperationException
at jdk.zipfs/jdk.nio.zipfs.ZipPath.toFile(ZipPath.java:661)
at pl.msitko.Reader$.readAsResource(Reader.scala:25)
at pl.msitko.Main$.main(Main.scala:10)
at pl.msitko.Main.main(Main.scala)
</code></pre></div></div>
<p>There is some progress: instead of the previous <code class="highlighter-rouge">FileSystemNotFoundException</code>, we got an <code class="highlighter-rouge">UnsupportedOperationException</code>. After looking at the <code class="highlighter-rouge">ZipPath.toFile</code> <a href="https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/jdk.zipfs/share/classes/jdk/nio/zipfs/ZipPath.java#L660">implementation</a> the culprit seems obvious:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@Override
public final File toFile() {
throw new UnsupportedOperationException();
}
</code></pre></div></div>
<p>That implementation makes sense considering that <code class="highlighter-rouge">java.io.File</code> is meant to model local files. There is simply no local path for a collection of bytes within a ZIP file (a JAR is technically a ZIP file). <strong>To conclude - the URL returned by <code class="highlighter-rouge">ClassLoader.getResource</code> cannot reliably be converted to <code class="highlighter-rouge">java.io.File</code>, as a resource packaged inside a JAR cannot be expressed as <code class="highlighter-rouge">java.io.File</code>.</strong></p>
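<p>The two failures can be reproduced in isolation with the sketch below. It is a hedged illustration rather than the article’s code: <code class="highlighter-rouge">JarUriDemo</code>, <code class="highlighter-rouge">buildJar</code>, <code class="highlighter-rouge">outcome</code> and <code class="highlighter-rouge">attempts</code> are hypothetical names, and a temporary JAR replaces the assembly JAR so that no path has to be hard-coded. It assumes Scala 2.13 and a JDK that ships the zip file system provider.</p>

```scala
import java.io.File
import java.net.URI
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{FileSystems, Files, Paths}
import java.util.Collections
import java.util.jar.{JarEntry, JarOutputStream}
import scala.util.Try

// Hypothetical demo object, not taken from the article's repository.
object JarUriDemo {
  // A throwaway JAR with a single entry, resource.csv.
  def buildJar(): File = {
    val jar = File.createTempFile("demo", ".jar")
    jar.deleteOnExit()
    val out = new JarOutputStream(Files.newOutputStream(jar.toPath))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write("g,h,i\nj,k,l\n".getBytes(UTF_8))
      out.closeEntry()
    } finally out.close()
    jar
  }

  // "ok" on success, otherwise the simple name of the thrown exception.
  private def outcome(attempt: => Any): String =
    Try(attempt).fold(_.getClass.getSimpleName, _ => "ok")

  def attempts(jar: File): (String, String) = {
    val jarUri   = URI.create(s"jar:${jar.toURI}")
    val entryUri = URI.create(s"jar:${jar.toURI}!/resource.csv")

    // Attempt 1: no zip file system has been opened for this JAR yet.
    val first = outcome(Paths.get(entryUri))

    // Attempt 2: open the file system, then ask the Path for a java.io.File.
    val fs = FileSystems.newFileSystem(jarUri, Collections.emptyMap[String, Any]())
    val second =
      try outcome(Paths.get(entryUri).toFile)
      finally fs.close()

    (first, second)
  }

  def main(args: Array[String]): Unit =
    println(attempts(buildJar()))
}
```

<p>The first element of the pair should name <code class="highlighter-rouge">FileSystemNotFoundException</code> and the second <code class="highlighter-rouge">UnsupportedOperationException</code>, mirroring the two stack traces above.</p>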
<h2 id="back-to-initial-task">Back to initial task</h2>
<p>With that conclusion we can go back to the initial scala-csv example. Another method for working with resources provided by <code class="highlighter-rouge">ClassLoader</code> is <code class="highlighter-rouge">getResourceAsStream</code>. We cannot use it directly as <code class="highlighter-rouge">CSVReader</code> has no API entry which accepts <code class="highlighter-rouge">InputStream</code>. Fortunately, among the numerous overloaded <code class="highlighter-rouge">CSVReader.open</code> methods there is one which takes a <code class="highlighter-rouge">java.io.Reader</code> as an argument. So we can rewrite the code which loads the CSV from a resource:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">readResourceUsingReader</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="nv">classloader</span> <span class="k">=</span> <span class="nv">Thread</span><span class="o">.</span><span class="py">currentThread</span><span class="o">.</span><span class="py">getContextClassLoader</span>
<span class="k">val</span> <span class="nv">stream</span> <span class="k">=</span> <span class="nv">classloader</span><span class="o">.</span><span class="py">getResourceAsStream</span><span class="o">(</span><span class="s">"resource.csv"</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">reader</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">InputStreamReader</span><span class="o">(</span><span class="n">stream</span><span class="o">,</span> <span class="nv">java</span><span class="o">.</span><span class="py">nio</span><span class="o">.</span><span class="py">charset</span><span class="o">.</span><span class="py">StandardCharsets</span><span class="o">.</span><span class="py">UTF_8</span><span class="o">)</span>
<span class="nv">CSVReader</span><span class="o">.</span><span class="py">open</span><span class="o">(</span><span class="n">reader</span><span class="o">).</span><span class="py">all</span><span class="o">()</span>
<span class="o">}</span>
</code></pre></div></div>
<p>By using <code class="highlighter-rouge">getResourceAsStream</code> we avoid the issues with <code class="highlighter-rouge">File</code> altogether.</p>
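<p>Two details are easy to miss with the stream-based approach: <code class="highlighter-rouge">getResourceAsStream</code> returns <code class="highlighter-rouge">null</code> when the resource is not found, and the stream should be closed after reading. The sketch below shows one way to handle both using only the standard library; it reads raw lines instead of calling scala-csv so that it stays self-contained. <code class="highlighter-rouge">StreamResourceDemo</code>, <code class="highlighter-rouge">readLines</code> and <code class="highlighter-rouge">demoLoader</code> are hypothetical names, and Scala 2.13 is assumed (for <code class="highlighter-rouge">scala.util.Using</code>).</p>

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.URLClassLoader
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files
import scala.jdk.CollectionConverters._
import scala.util.Using

// Hypothetical demo object, not taken from the article's repository.
object StreamResourceDemo {
  // Reads a resource line by line. Returns None when the class loader
  // cannot find it (getResourceAsStream signals that with null).
  def readLines(loader: ClassLoader, name: String): Option[List[String]] =
    Option(loader.getResourceAsStream(name)).map { stream =>
      // Using.resource closes the reader (and the wrapped stream) afterwards.
      Using.resource(new BufferedReader(new InputStreamReader(stream, UTF_8))) { reader =>
        reader.lines().iterator().asScala.toList
      }
    }

  // A class loader over a temporary directory containing resource.csv,
  // so the example runs without any project setup.
  def demoLoader(): URLClassLoader = {
    val dir = Files.createTempDirectory("demo-resources")
    Files.write(dir.resolve("resource.csv"), "g,h,i\nj,k,l\n".getBytes(UTF_8))
    new URLClassLoader(Array(dir.toFile.toURI.toURL), null)
  }

  def main(args: Array[String]): Unit = {
    val loader = demoLoader()
    try {
      println(readLines(loader, "resource.csv"))
      println(readLines(loader, "missing.csv"))
    } finally loader.close()
  }
}
```

<p>Returning an <code class="highlighter-rouge">Option</code> is just one possible design; throwing a descriptive exception for a missing resource would work equally well.</p>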
<h2 id="more-on-zip-file-system-provider">More on Zip File System Provider</h2>
<p>Since the Java SE 7 release the Zip File System Provider has been included as part of the JDK. We managed to make it work using <code class="highlighter-rouge">newFileSystem</code> and managed to resolve the URL into a <code class="highlighter-rouge">Path</code>. Thanks to that we can use any API which takes a <code class="highlighter-rouge">Path</code>; for example, we can read all bytes of that resource file with <code class="highlighter-rouge">Files.readAllBytes</code>.</p>
<p>That being said - that code is quite hacky and I would consider it a last-resort solution.</p>
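<p>A minimal sketch of that <code class="highlighter-rouge">Path</code>-based read is shown below. It is an illustration under assumptions, not the article’s code: <code class="highlighter-rouge">ZipFsReadDemo</code>, <code class="highlighter-rouge">buildJar</code> and <code class="highlighter-rouge">readEntry</code> are hypothetical names, and a temporary JAR is generated so that the JAR location does not need to be known beforehand.</p>

```scala
import java.io.File
import java.net.URI
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{FileSystems, Files, Paths}
import java.util.Collections
import java.util.jar.{JarEntry, JarOutputStream}

// Hypothetical demo object, not taken from the article's repository.
object ZipFsReadDemo {
  // A throwaway JAR with a single entry, resource.csv.
  def buildJar(): File = {
    val jar = File.createTempFile("demo", ".jar")
    jar.deleteOnExit()
    val out = new JarOutputStream(Files.newOutputStream(jar.toPath))
    try {
      out.putNextEntry(new JarEntry("resource.csv"))
      out.write("g,h,i\nj,k,l\n".getBytes(UTF_8))
      out.closeEntry()
    } finally out.close()
    jar
  }

  // Opens a zip file system for the JAR, resolves the entry URI to a Path
  // and reads it through the regular java.nio.file.Files API.
  def readEntry(jar: File, entry: String): String = {
    val fs = FileSystems.newFileSystem(
      URI.create(s"jar:${jar.toURI}"),
      Collections.emptyMap[String, Any]()
    )
    try {
      val path = Paths.get(URI.create(s"jar:${jar.toURI}!/$entry"))
      new String(Files.readAllBytes(path), UTF_8)
    } finally fs.close() // the Path is unusable once the file system is closed
  }

  def main(args: Array[String]): Unit =
    print(readEntry(buildJar(), "resource.csv"))
}
```

<p>Note that the content is read while the file system is still open; a <code class="highlighter-rouge">Path</code> that escapes the <code class="highlighter-rouge">try</code> block would no longer be readable.</p>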
<h2 id="key-takeaways">Key takeaways</h2>
<ol>
<li>As a library developer, you should provide alternatives to APIs that take <code class="highlighter-rouge">java.io.File</code>. <code class="highlighter-rouge">java.nio.file.Path</code> is probably a good idea as it is more general.</li>
<li>You should realize that if your application is packaged as a JAR there’s no resource file at runtime. There’s only a single JAR file and a classloader which knows how to resolve a resource path. While it may sound obvious to many readers, it can be really counterintuitive to many developers because they spend most of their time simply developing their code. At development time the simple association <code class="highlighter-rouge">resource = file</code> works, but at runtime it is no longer valid.</li>
<li>As a consequence of the above point - be cautious with <code class="highlighter-rouge">java.lang.ClassLoader.getResource</code> as it may return a URL that is not convertible to <code class="highlighter-rouge">java.io.File</code>. What is worse - you will learn about it only after packaging and running the code.</li>
<li>Be mindful of the differences between the environment in which you run tests and the production environment. The example described here is just one of a few differences between running Java code inside your build tool and from within a JAR.</li>
</ol>
<h2 id="github-repository">GitHub repository</h2>
<p><a href="https://github.com/note/blog-examples/tree/master/reading-resources">Repository</a> with the code used in this article.</p>
Replace JSON with Dhall: DynamoDB case study
2019-03-13T21:49:27+01:00 2019-03-13T21:49:27+01:00 http://msitko.pl/blog/2019/03/13/replace-json-with-dhall
<p>In this post I will show you how you can rewrite a schema-less JSON file in <a href="https://dhall-lang.org/">Dhall</a>. As an example I will use the <a href="https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html">JSON</a> used for creating a DynamoDB table. It was chosen for illustrative purposes only: you don’t need to know anything about DynamoDB, as it is not really relevant to the key message of this post.</p>
<p>Do not treat this blog post as either a comprehensive introduction to Dhall or a list of best practices. I am a Dhall beginner and want to present a use case where it is useful. Thus, the code itself might not be of the highest quality.</p>
<p>Before diving into Dhall we will take a look at how configuration files are being written currently.</p>
<h2 id="current-approach-to-configuration-files">Current approach to configuration files</h2>
<p>Dhall is advertised as a <code class="highlighter-rouge">non-repetitive alternative to YAML</code> and I think such positioning definitely makes sense. YAML, JSON, and their derivatives have become a de facto standard for many aspects of DevOps and configuration management. Just think how you write your <code class="highlighter-rouge">docker-compose</code> file, your Kubernetes files, your OpenAPI specification, your DynamoDB table specification or your CI job. All of them are either YAML or JSON. However, <a href="https://twitter.com/kartar/status/1081255787568205826">not</a> <a href="https://twitter.com/brunoborges/status/1098472238469111808">so</a> <a href="https://twitter.com/caged/status/1039937162769096704?lang=en">many</a> users would actually say they like those formats. Lack of schema, no support for code reuse or even variables, no type safety - those are the biggest problems, among others.</p>
<p>Another language in this domain is HashiCorp Configuration Language, also known simply as HCL, which is used to define Terraform-based infrastructure. To me HCL feels like a language that emerged in an ad-hoc fashion rather than one that was meticulously designed. It lacks basic tools like user-defined functions, so it is hard to structure your code in a lightweight way. The lack of enums is also quite disturbing. Consider the <code class="highlighter-rouge">encryption_type</code> attribute of the <code class="highlighter-rouge">aws_kinesis_stream</code> resource. Even though it is <a href="https://www.terraform.io/docs/providers/aws/r/kinesis_stream.html#encryption_type">documented</a> that <em>the only acceptable values are NONE or KMS</em>, Terraform will happily accept any other value.</p>
<p>As a person working daily with a strongly, statically typed language (i.e. Scala) I was struck that crucial parts of code are written in a way that lets a simple typo go undetected until runtime. I sighed <em>if only we had some simple, possibly Turing incomplete language specialized in configuration</em>. Then a colleague of mine pointed me to Dhall and I realized that was the thing I was looking for.</p>
<h2 id="we-can-do-better-dhall">We can do better: Dhall</h2>
<p>Dhall is a configuration language. You can think of it as JSON. However, unlike JSON, it is programmable - you can define functions. It is modular - you can extract commonly used functions to a file and import it in many places. It’s also statically typed so you will be notified of type errors ahead of time. Since it is also strongly typed there is no type <a href="https://github.com/rancher/rancher/issues/550">casting</a>.</p>
<p>Although Dhall is programmable, it is not Turing complete. It is a conscious design decision - thanks to that it is always guaranteed to terminate and will never hang. It only means that there is no general recursion in the language, but you can still, for example, map over a list.</p>
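<p>As a minimal sketch of what that looks like in practice, the snippet below doubles every element of a list using only the built-in <code class="highlighter-rouge">List/fold</code>, with no recursion involved. The <code class="highlighter-rouge">double</code> helper and the sample list are made up for illustration; in real code you would more likely import <code class="highlighter-rouge">List/map</code> from the Dhall Prelude.</p>
<pre><code class="language-dhall">-- Hypothetical helper: doubles a natural number
let double = λ(n : Natural) → n * 2

-- "Map" expressed as a right fold: each transformed element
-- is prepended to the already transformed rest of the list
in  List/fold
      Natural
      [ 1, 2, 3 ]
      (List Natural)
      (λ(x : Natural) → λ(acc : List Natural) → [ double x ] # acc)
      ([] : List Natural)
</code></pre>
<p>Evaluating this expression should yield <code class="highlighter-rouge">[ 2, 4, 6 ]</code>. Because the fold always walks a finite list, the evaluation is guaranteed to finish.</p>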
<p>I do not want to describe Dhall in detail in this blogpost. If you want to know more both Dhall’s <a href="https://github.com/dhall-lang/dhall-lang">readme</a> and <a href="https://dhall-lang.org/">site</a> are good places to start.</p>
<p>What I want to do instead is to show you an example of how Dhall can be used to simplify a configuration file.</p>
<h2 id="dynamodb-example---original-json">DynamoDB example - original JSON</h2>
<p>Before we start refactoring, we need to understand the starting point, namely what the JSON used by DynamoDB looks like. We will be working with a made-up example, so there is no need to think too much about the structure of that table. We focus on the way it is specified instead.</p>
<p>A DynamoDB table can be created using the AWS CLI:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws dynamodb create-table <span class="nt">--cli-input-json</span> file:///your/path/table.json
</code></pre></div></div>
<p>In this text we focus solely on the <code class="highlighter-rouge">table.json</code> file, whose syntax is described in the AWS <a href="https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html">docs</a>. Here is what it may look like:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"AttributeDefinitions"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"S"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Artist"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"S"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Song"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"S"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Year"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"N"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="nl">"KeySchema"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HASH"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Id"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="nl">"GlobalSecondaryIndexes"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"IndexName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ArtistSongIndex"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Projection"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"ProjectionType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ALL"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"ProvisionedThroughput"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"WriteCapacityUnits"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
</span><span class="nl">"ReadCapacityUnits"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"KeySchema"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HASH"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Artist"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RANGE"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Song"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"IndexName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"YearArtistIndex"</span><span class="p">,</span><span class="w">
</span><span class="nl">"Projection"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"ProjectionType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ALL"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"ProvisionedThroughput"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"WriteCapacityUnits"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"ReadCapacityUnits"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"KeySchema"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HASH"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Year"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nl">"KeyType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RANGE"</span><span class="p">,</span><span class="w">
</span><span class="nl">"AttributeName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Artist"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="nl">"ProvisionedThroughput"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"WriteCapacityUnits"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w">
</span><span class="nl">"ReadCapacityUnits"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"TableName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Songs"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Problems with the above JSON:</p>
<ul>
<li>lack of variables. If you make a typo by referring to <code class="highlighter-rouge">"Yearr"</code> instead of <code class="highlighter-rouge">"Year"</code> in any index definition, it will be caught only when the AWS request is executed</li>
<li>lack of types. You can define <code class="highlighter-rouge">KeyType</code> as <code class="highlighter-rouge">56</code> and nothing will complain</li>
<li>you can forget about <code class="highlighter-rouge">TableName</code>, which is a required field</li>
<li>lack of enums. You can define <code class="highlighter-rouge">KeyType</code> as <code class="highlighter-rouge">"whatever"</code> even though <code class="highlighter-rouge">"HASH"</code> and <code class="highlighter-rouge">"RANGE"</code> are the only valid values</li>
<li>lack of comments. It’s a JSON-specific issue; YAML does support comments</li>
<li>it’s very repetitive. You need to repeat the 4 lines of <code class="highlighter-rouge">ProvisionedThroughput</code> over and over although it is basically a function of 2 integer arguments. Thus, it is cumbersome to write</li>
<li>due to all the verbosity, the signal-to-noise ratio of the file is very low. It makes reading and comprehending the key ideas expressed in the file difficult</li>
</ul>
<p>Now that we know what we want to fix, let’s start doing that with Dhall!</p>
<h2 id="rewriting-dynamodb-example-with-dhall">Rewriting DynamoDB example with Dhall</h2>
<h3 id="how-to-run-the-code">How to run the code</h3>
<p>You can find the full code used in the example in the GitHub <a href="https://github.com/note/blog-examples/tree/master/dhall-dynamo">repository</a>. Its README contains instructions on how to run the code.</p>
<h3 id="file-structure">File structure</h3>
<p>File structure is as follows:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dhall
├── generic
│ ├── functions.dhall
│ ├── schema.dhall
│ └── types.dhall
└── migration.dhall
</code></pre></div></div>
<p>The <code class="highlighter-rouge">generic</code> directory contains common types and functions useful when working with the DynamoDB <code class="highlighter-rouge">create-table</code> JSON format. In an ideal world it would have been written already by someone else and published in some repository. It consists of things that are supposed to be written once and used many times. I cut corners though and implemented only the pieces that are relevant to the example presented in this post.</p>
<p>The file <code class="highlighter-rouge">migration.dhall</code> is the only one that includes pieces of information related to the example JSON file shown earlier in this post.</p>
<p>Given such a file structure, you can generate JSON out of <code class="highlighter-rouge">migration.dhall</code> with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dhall-to-json <span class="nt">--explain</span> <span class="nt">--pretty</span> <span class="o"><<<</span> <span class="s1">'./dhall/migration.dhall : ./dhall/generic/schema.dhall'</span>
</code></pre></div></div>
<h3 id="defining-types">Defining types</h3>
<p>Let’s start with defining types in <code class="highlighter-rouge">types.dhall</code>. Here is a fragment of it:</p>
<pre><code class="language-dhall">let AttributeDefinition = {
AttributeName: Text,
AttributeType: Text
}
let ProvisionedThroughput = {
WriteCapacityUnits: Natural,
ReadCapacityUnits: Natural
}
-- more types omitted for the sake of readability
</code></pre>
<p>As you can see, it is quite straightforward. It also shows the usual pattern of having a sequence of <code class="highlighter-rouge">let</code> definitions in the first part of a Dhall file. It needs to be followed by the <code class="highlighter-rouge">in</code> keyword and an expression using the definitions created with <code class="highlighter-rouge">let</code>.</p>
<p>In our case we will use a record with all defined types in the <code class="highlighter-rouge">in</code> section:</p>
<pre><code class="language-dhall">in
{
AttributeDefinition = AttributeDefinition,
GlobalSecondaryIndex = GlobalSecondaryIndex,
KeySchemaItem = KeySchemaItem,
ProvisionedThroughput = ProvisionedThroughput
}
</code></pre>
<p>Let’s try it out: <em>(I am using the <code class="highlighter-rouge">dhall</code> command here, which reads from standard input; press ctrl-d to signal the end of input)</em></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> dhall
let Types = ./generic/types.dhall in
{
WriteCapacityUnits = 5,
ReadCapacityUnits = 5
} : Types.ProvisionedThroughput
^D
{ ReadCapacityUnits = 5, WriteCapacityUnits = 5 }
</code></pre></div></div>
<p>It worked as expected. Now let’s make a type mistake and see if Dhall will catch it:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> dhall
let Types = ./generic/types.dhall in
{
WriteCapacityUnits = 5,
ReadCapacityUnits = "hello"
} : Types.ProvisionedThroughput
^D
Use "dhall --explain" for detailed errors
Error: Expression doesn't match annotation
{ ReadCapacityUnits : - Natural
+ Text
, …
}
</code></pre></div></div>
<p>Error caught, success!</p>
<h3 id="defining-schema">Defining schema</h3>
<p>Now we can import the types defined in the previous step into <code class="highlighter-rouge">schema.dhall</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let Types = ./generic/types.dhall
in {
TableName: Text,
KeySchema: List Types.KeySchemaItem,
AttributeDefinitions: List Types.AttributeDefinition,
GlobalSecondaryIndexes: List Types.GlobalSecondaryIndex,
ProvisionedThroughput: Types.ProvisionedThroughput
}
</code></pre></div></div>
<p>The split between <code class="highlighter-rouge">types.dhall</code> and <code class="highlighter-rouge">schema.dhall</code> is arbitrary; they could as well be a single file. I find it clean to have the top-level type defined in a separate file, but Dhall itself does not enforce any structure.</p>
<h3 id="using-schema">Using schema</h3>
<p>The most straightforward way of using that schema would be:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let Types = ./generic/types.dhall
in
{
AttributeDefinitions = [
{
AttributeName = "Id",
AttributeType = "S"
}
-- other attributes omitted
]
-- other fields omitted
}
</code></pre></div></div>
<p>However, it is as verbose as the original JSON, and we wanted to avoid that. To prevent repetition we will declare a few functions in <code class="highlighter-rouge">functions.dhall</code> to create a nice DSL we can use in <code class="highlighter-rouge">migration.dhall</code>.</p>
<p>Here’s a fragment of <code class="highlighter-rouge">functions.dhall</code> related to <code class="highlighter-rouge">AttributeDefinition</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let mkAttribute =
λ(attributeType: Text)
→ λ(attributeName: Text)
→ {
AttributeName = attributeName,
AttributeType = attributeType
}
-- partially applied functions for each of the types:
let mkStringAttribute = mkAttribute "S"
let mkNumberAttribute = mkAttribute "N"
</code></pre></div></div>
<p>As you can see, <code class="highlighter-rouge">Dhall</code> incorporates techniques known from functional programming such as currying and partial application. Thanks to that it gives us a simple and reliable framework for abstraction.</p>
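<p>The same technique extends to the other helpers a configuration like this needs, such as <code class="highlighter-rouge">mkThroughput</code>, <code class="highlighter-rouge">mkHashIndex</code> and <code class="highlighter-rouge">mkRangeIndex</code>. The sketch below is only my guess at how they could be written: the <code class="highlighter-rouge">mkKey</code> helper, the parameter names and the argument order of <code class="highlighter-rouge">mkThroughput</code> are assumptions, and the versions in the repository may differ.</p>
<pre><code class="language-dhall">-- Hypothetical sketches; the helpers in the repository may be written differently.

-- Assumed argument order: write capacity first, then read capacity
let mkThroughput =
      λ(writeCapacity : Natural) →
      λ(readCapacity : Natural) →
        { WriteCapacityUnits = writeCapacity
        , ReadCapacityUnits = readCapacity
        }

-- Generic key constructor (assumed name), curried so it can be partially applied
let mkKey =
      λ(keyType : Text) →
      λ(attributeName : Text) →
        { KeyType = keyType, AttributeName = attributeName }

-- Partially applied, just like mkStringAttribute and mkNumberAttribute
let mkHashIndex = mkKey "HASH"

let mkRangeIndex = mkKey "RANGE"

in  { defaultThroughput = mkThroughput 2 2
    , hashKey = mkHashIndex "Id"
    , rangeKey = mkRangeIndex "Song"
    }
</code></pre>
<p>The record at the end only exercises the helpers so that the file evaluates to something you can inspect; a real <code class="highlighter-rouge">functions.dhall</code> would export the functions themselves.</p>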
<h3 id="eventual-form">Eventual form</h3>
<p>Now that all the generic functionality is in place, it is time to use it to rewrite the initial example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>let Types = ./generic/types.dhall
let Functions = ./generic/functions.dhall
let id = "Id"
let artist = "Artist"
let song = "Song"
let year = "Year"
let defaultThroughput = Functions.mkThroughput 2 2
in
{
TableName = "Songs",
KeySchema = [Functions.mkHashIndex id],
AttributeDefinitions = [
Functions.mkStringAttribute id,
Functions.mkStringAttribute artist,
Functions.mkStringAttribute song,
Functions.mkNumberAttribute year
],
GlobalSecondaryIndexes = [
Functions.mkIndex [Functions.mkHashIndex artist, Functions.mkRangeIndex song] (Functions.mkThroughput 3 3),
Functions.mkIndex [Functions.mkHashIndex year, Functions.mkRangeIndex artist] defaultThroughput
],
ProvisionedThroughput = defaultThroughput
}
</code></pre></div></div>
<p>That’s it!</p>
<h2 id="dynamodb-example---what-was-achieved">DynamoDB example - what was achieved</h2>
<p>There is clear progress when you compare the final result with the original JSON example. The general feeling is that the resulting configuration is devoid of any noise; it simply conveys the essence of what needs to be expressed.</p>
<p>We were able to:</p>
<ul>
<li>eliminate the repetitiveness of the original format</li>
<li>introduce variables so we don’t have to repeat attribute names. It also reduces the risk of spelling mistakes</li>
<li>force our configuration to adhere to the defined schema. It protects us from type errors, omitted attribute keys etc.</li>
</ul>
<p>You may argue that I had to write the schema and the Dhall functions that allowed me to radically improve the level of expressiveness, so there is some additional code behind the nice demo at the end.</p>
<p>That’s right, but:</p>
<ul>
<li>you write your schema and helper functions only once and then you can use them multiple times</li>
<li>once Dhall becomes more popular, there will be a lot of schemas and code written by the community. To some extent that is already the case, examples being <a href="https://github.com/dhall-lang/dhall-nix">dhall-nix</a> or <a href="https://github.com/dhall-lang/dhall-kubernetes">dhall-kubernetes</a>.</li>
</ul>
<h2 id="dynamodb-example---deficiencies">DynamoDB example - deficiencies</h2>
<p>Even though it looks quite good, I must admit that when I heard about Dhall for the first time I had something more powerful in mind. I expected to be able to describe the whole schema with great precision using <a href="https://blog.softwaremill.com/algebraic-data-types-in-four-languages-858788043d4e">ADT</a>. Moreover, I hoped for strong typing in the sense that I would hardly ever use the <code class="highlighter-rouge">Text</code> type (i.e. Dhall’s <code class="highlighter-rouge">String</code>), and the solution here is full of it.</p>
<p>Take a look at part of schema:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AttributeDefinitions : List {
AttributeName: Text,
AttributeType: Text
}
</code></pre></div></div>
<p>While <code class="highlighter-rouge">AttributeName</code> is actually quite fine as <code class="highlighter-rouge">Text</code>, <code class="highlighter-rouge">AttributeType</code> is in substance an enum with only a few valid values, as documented <a href="https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_AttributeValue.html">here</a>. You cannot put <code class="highlighter-rouge">ABC</code> there, and that kind of mistake should be caught by the configuration language when checking against the schema. In that regard the mantra should be to check as much as possible as early as possible.</p>
<h3 id="union-types-to-the-rescue">Union types to the rescue?</h3>
<p>The good news is that Dhall makes it possible to express enums at the type level by using <a href="http://hackage.haskell.org/package/dhall-1.21.0/docs/Dhall-Tutorial.html#g:12">unions</a>. Here we try to be more explicit about what values we expect for <code class="highlighter-rouge">AttributeType</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- There are a few more types supported by DynamoDB, let's consider those 3 to be more concise:
let AttributeType = < Number : {} | Binary : {} | String : {} >
let attributeType = constructors AttributeType
let AttributeDefinition = {
AttributeName: Text,
AttributeType: AttributeType
}
let idAttr = {
AttributeName = "Id",
AttributeType = attributeType.String {=}
}
in
idAttr
</code></pre></div></div>
<p>We can run it against <code class="highlighter-rouge">dhall</code> to prove that Dhall “understands” the meaning of such configuration:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dhall <<< './unions.dhall'
{ AttributeName =
"Id"
, AttributeType =
< String = {=} | Binary : {} | Number : {} >
}
</code></pre></div></div>
<p>Now, let’s try to generate JSON out of it:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dhall-to-json --pretty <<< './unions.dhall'
{
"AttributeName": "Id",
"AttributeType": {}
}
</code></pre></div></div>
<p><code class="highlighter-rouge">"AttributeType": {}</code> is not what we want; we would like to have <code class="highlighter-rouge">"AttributeType": "S"</code>. It is understandable that <code class="highlighter-rouge">dhall-to-json</code> did not come up with the expected result, given that we have not defined a JSON representation for the <code class="highlighter-rouge">AttributeType</code> union. We may do that by defining a function <code class="highlighter-rouge">attributeTypeToString : AttributeType → Text</code> in Dhall, which is easy. There is a major problem here though - as the return type of that function is <code class="highlighter-rouge">Text</code>, we would need to declare the <code class="highlighter-rouge">AttributeType</code> field as <code class="highlighter-rouge">Text</code> again, negating most of the benefit of introducing the union type <code class="highlighter-rouge">AttributeType</code> in the first place. It may still have some benefit, but only provided you keep the convention of always setting the <code class="highlighter-rouge">AttributeType</code> field by using the <code class="highlighter-rouge">attributeTypeToString</code> function. Mind that it would work only by convention and there is nothing in Dhall’s type system that will stop you from setting <code class="highlighter-rouge">AttributeType</code> to any, possibly invalid, <code class="highlighter-rouge">Text</code>.</p>
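<p>For illustration, such a conversion function could be sketched with Dhall’s <code class="highlighter-rouge">merge</code> keyword, which takes a record with one handler per union alternative. This is a sketch under the same assumptions as the union example above (three alternatives with empty-record payloads); the codes <code class="highlighter-rouge">"N"</code>, <code class="highlighter-rouge">"B"</code> and <code class="highlighter-rouge">"S"</code> are the ones DynamoDB uses for number, binary and string attributes.</p>
<pre><code class="language-dhall">let AttributeType = < Number : {} | Binary : {} | String : {} >

-- One handler per alternative; each ignores the empty payload
let attributeTypeToString =
      λ(t : AttributeType) →
        merge
          { Number = λ(_ : {}) → "N"
          , Binary = λ(_ : {}) → "B"
          , String = λ(_ : {}) → "S"
          }
          t

-- The field ends up as plain Text, which is exactly the limitation discussed here
in  { AttributeName = "Id"
    , AttributeType = attributeTypeToString (AttributeType.String {=})
    }
</code></pre>
<p>Here the alternative is selected directly on the union type (<code class="highlighter-rouge">AttributeType.String</code>) rather than through the <code class="highlighter-rouge">constructors</code> keyword, so the snippet assumes a Dhall version that supports that form. The resulting record has an <code class="highlighter-rouge">AttributeType</code> field of type <code class="highlighter-rouge">Text</code>, so <code class="highlighter-rouge">dhall-to-json</code> can render it as <code class="highlighter-rouge">"S"</code>.</p>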
<p>All in all, the problem boils down to:</p>
<blockquote>
<p>When using Dhall via <code class="highlighter-rouge">dhall-to-json</code> all types in leaf nodes of a schema have to be declared as primitive types supported by <code class="highlighter-rouge">dhall-to-json</code>.</p>
</blockquote>
<p>It is not a problem of <code class="highlighter-rouge">dhall-to-json</code> itself; it is clear that it cannot be more precise than the underlying format. Hypothetically it could have some resolution mechanism that would try to find a function of type <code class="highlighter-rouge">AttributeType -> Text</code> to enable usage of rich types directly in the schema, but it is not a design goal of <code class="highlighter-rouge">dhall-to-json</code>. I have not checked <code class="highlighter-rouge">dhall-to-yaml</code> but I believe it has the same constraint.</p>
<p>Although it may look like an obvious limitation it took me some time to realize it. I believe it should be taken into account when thinking about potential use cases for <code class="highlighter-rouge">dhall-to-json</code>.</p>
<h3 id="possible-solutions">Possible solutions</h3>
<ul>
<li>One apparent solution would be to write our own <code class="highlighter-rouge">dhall-to-dynamo</code> using Dhall’s Haskell bindings. We would be able to treat DynamoDB related types differently there. However, in this blogpost I am advocating Dhall as a <em>Swiss army knife</em> for configuration formats. We should be able to write a few relatively straightforward <code class="highlighter-rouge">.dhall</code> files and simply profit without caring about Haskell bindings or even knowing Haskell at all, let alone building and distributing binaries.</li>
<li>We may define <code class="highlighter-rouge">AttributeType</code> as <code class="highlighter-rouge">< Number : Text | Binary : Text | String : Text ></code>. Then we may create helper constructors which supply valid <code class="highlighter-rouge">Text</code> values, e.g. <code class="highlighter-rouge">let mkNumber = attributeType.Number "N"</code>. The problem here is that nothing stops the user from bypassing the helper and simply specifying <code class="highlighter-rouge">attributeType.Number "rubbish"</code>. We cannot write <code class="highlighter-rouge">< Number : "N" | ...</code> as <code class="highlighter-rouge">"N"</code> is a term as opposed to a type, and Dhall provides no means of restricting the valid values of a type (I would be very happy to be proven wrong here but I was not able to find anything in that regard).</li>
<li>We can define two schemas in Dhall: <em>rich</em> and <em>primitive</em>. The <em>rich</em> one would operate on semantic types while the <em>primitive</em> one would operate on the types of the underlying format. A schema developer would need to provide a function <code class="highlighter-rouge">transformSchema: RichSchema -> Schema</code>, that function being the only gateway from rich to primitive types. A person using the schema would work only with rich types and would call the <code class="highlighter-rouge">transformSchema</code> function at the very end of the config.</li>
</ul>
<p>I implemented the third approach in a very limited scope (I enriched only <code class="highlighter-rouge">AttributeType</code> to be a union type) <a href="https://github.com/note/blog-examples/tree/master/dhall-dynamo/dhall-rich-schema">here</a>. In such a limited scope the change looks quite simple, but I am afraid that in an even slightly more advanced case maintaining <code class="highlighter-rouge">transformSchema.dhall</code> would become a bottleneck. The important factor in that regard is the depth of the schema structure. For really deep structures some tools for working with them, such as optics in FP or the visitor pattern in OOP, would be very useful. As far as I know, Dhall currently does not provide them.</p>
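<p>To make the rich/primitive idea more tangible, here is a self-contained sketch on a heavily reduced schema with just two fields. It is not the code from the linked repository: the type names <code class="highlighter-rouge">RichAttribute</code> and <code class="highlighter-rouge">Attribute</code>, the helper <code class="highlighter-rouge">transformAttribute</code> and the sample data are all made up for illustration.</p>
<pre><code class="language-dhall">-- Hypothetical, heavily reduced schema: only two fields, one of them "rich"
let AttributeType = < Number : {} | Binary : {} | String : {} >

let RichAttribute = { AttributeName : Text, AttributeType : AttributeType }

let Attribute = { AttributeName : Text, AttributeType : Text }

let RichSchema = { TableName : Text, AttributeDefinitions : List RichAttribute }

let Schema = { TableName : Text, AttributeDefinitions : List Attribute }

let attributeTypeToString =
      λ(t : AttributeType) →
        merge
          { Number = λ(_ : {}) → "N"
          , Binary = λ(_ : {}) → "B"
          , String = λ(_ : {}) → "S"
          }
          t

let transformAttribute =
      λ(a : RichAttribute) →
        { AttributeName = a.AttributeName
        , AttributeType = attributeTypeToString a.AttributeType
        }

-- The single gateway from rich types to primitive ones
let transformSchema
    : RichSchema → Schema
    = λ(rich : RichSchema) →
        { TableName = rich.TableName
        , AttributeDefinitions =
            List/fold
              RichAttribute
              rich.AttributeDefinitions
              (List Attribute)
              ( λ(a : RichAttribute) →
                λ(acc : List Attribute) →
                  [ transformAttribute a ] # acc
              )
              ([] : List Attribute)
        }

-- The user writes only rich values and converts once, at the very end
in  transformSchema
      { TableName = "Songs"
      , AttributeDefinitions =
        [ { AttributeName = "Id", AttributeType = AttributeType.String {=} }
        , { AttributeName = "Year", AttributeType = AttributeType.Number {=} }
        ]
      }
</code></pre>
<p>Even in this tiny case every field has to be copied by hand inside <code class="highlighter-rouge">transformSchema</code>, which hints at why deeper structures would quickly make the function tedious to maintain.</p>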
<p>Still, I believe the last approach is the best of the proposed ones and is worth exploring further.</p>
<p><em>In case you wonder - why not simply call <code class="highlighter-rouge">transformAttributeType</code> (and <code class="highlighter-rouge">transformHashType</code> and so on), avoiding any necessity of working with nested structures? While it would work, it would be against the whole idea of strong typing. The essence of the proposed solution is to have strictly one place where we translate rich types into primitive ones.</em></p>
<h2 id="other-use-cases--possible-extensions">Other use cases / Possible extensions</h2>
<p>What I described in this post is using Dhall only for generating one file which describes just one piece of the overall architecture. The vision worth pursuing is something I call <em>Dhall all the way down</em>. The idea is to use Dhall files as the only ones that should be modified by the developer of the application.</p>
<p>So instead of setting up a DynamoDB table with Terraform, providing the table schema with JSON and configuring your Scala application with HOCON (aka typesafe-config), you would configure everything at the Dhall level only once. Dhall can generate proper configuration files in the underlying formats, so it is not required for all pieces to understand Dhall. The biggest advantage of such an approach would be a <em>referential integrity</em> check. Without Dhall, when changing the table name in JSON it is easy to forget to update the HOCON used by the Scala application.</p>
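<p>A tiny sketch of that idea: a single <code class="highlighter-rouge">let</code> binding holds the table name and both generated fragments refer to it, so they cannot drift apart. The record layout and the field names on the application side are purely hypothetical, and how the result would be split into separate output files is left out.</p>
<pre><code class="language-dhall">-- Single source of truth for the table name
let tableName = "Songs"

-- Fragment destined for the DynamoDB table JSON
let dynamoTable = { TableName = tableName }

-- Fragment destined for the application config (field names are made up)
let application = { dynamodb = { songsTable = tableName } }

in  { dynamoTable = dynamoTable, application = application }
</code></pre>
<p>Renaming the table now means touching exactly one line, and a misspelled reference to <code class="highlighter-rouge">tableName</code> is reported as an unbound variable instead of surfacing at runtime.</p>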
<p>It is a high-level vision and I am not sure how feasible it is right now. One apparent problem in the example described above is the lack of <code class="highlighter-rouge">dhall-to-hocon</code>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Dhall provides a simple way of defining configuration files in a less verbose and less error-prone way than JSON or YAML. Also, writing a schema and helper functions is quite an easy job and can pay off in increased productivity even for small use cases. I would say that if you need to maintain more than 5-10 JSON configuration files similar to the one described in this post, it’s already a scale at which you start profiting from Dhall.</p>
<p><em>Disclaimer: for people not fluent in statically typed functional languages learning curve can be steeper</em>.</p>
<p>If you hope to be able to define a schema in a super typesafe and extremely precise way, so that you can express things like <em>field A is either a number lower than 5 or a string of length 15</em>, then Dhall itself will not take you that far (at least at the moment).</p>
<p>My general feeling is that the Dhall philosophy is to provide a set of clean, well-defined and thoroughly specified primitives while not caring that much about ergonomics for any specific use case. That goes along with observations gathered in the <a href="http://www.haskellforall.com/2019/02/dhall-survey-results-2019-2019.html">Dhall survey</a>. It seems like providing the right abstractions/tools for specific use cases is an exercise left for the future. I agree with that on a philosophical level because it is much easier to provide opinionated solutions on top of clean primitives than the other way round. From a pragmatic point of view the question is how fast and how big the community and tooling around Dhall will grow. I do not feel entitled to give my bet on this as I have just started my adventure with Dhall. I personally will start using Dhall for simple cases and experiment with more advanced ones, along with a grain of evangelism, which I hopefully delivered in this post.</p>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>Thanks to <a href="https://github.com/Gabriel439">Gabriel Gonzalez</a> and all <a href="https://github.com/dhall-lang/dhall-lang/graphs/contributors">contributors</a> for the wonderful work on Dhall. The high quality of all the software involved and the clarity of the documentation are stunning.</p>
<p>Thanks to <a href="https://github.com/kjanosz">Krzysztof Janosz</a> who introduced me to Dhall.</p>
<h2 id="github-repository">GitHub repository</h2>
<p><a href="https://github.com/note/blog-examples/tree/master/dhall-dynamo">Repository</a> with code used in this article</p>