Haskell for the impatient Scala developer: Getting up to speed

If you're reading this, I assume you are a Scala developer who wants to learn some Haskell. I have two pieces of news for you - good and bad. The good news is that there are plenty of Haskell resources available. The bad news is that none of them targets Scala developers specifically.

Why would a lack of Haskell resources for Scala developers bother anyone? The thing is that as a Scala developer you already know a lot of the concepts. You know what a monad and an applicative are, you have folded over a list many times and you're not scared of immutable collections. Therefore going through regular Haskell tutorials or books feels slow and is not very engaging, because they assume you are starting from zero.

I prefer learning by practice, so I tried to use Haskell for some side projects. However, I realized that I was missing a single succinct Scala to Haskell cheat sheet that I could glance at when in doubt about the basics. This article is not such a cheat sheet - I started working on one here. You shouldn't rely on such superficial knowledge for too long - after all, the Scala analogies are not 100% accurate. Their point is to get you up to speed.

The aim of this article is to walk you through the essential parts of learning a new language that are often skipped: which build tool to use, how to create a new project and how to add an external dependency. Some of those things work differently from their Scala counterparts and I will try to highlight the differences.

All the code presented here is available in the repository. It also contains the Scala equivalent of the application we build.

What our sample application is supposed to do

It will be a simple command line application that, given a file like this as input:

[
  {"tag": "Comment", "blogPostId": 1, "content": "Some comment"},
  {"tag": "BlogPost", "id": 1,"title": "Some blog post", "summary": "Some post"},
  {"tag": "BlogPost", "id": 2,"title": "Another blog post", "summary": "Another post"},
  {"tag": "Comment", "blogPostId": 1, "content": "Another comment" }
]

will print out the following output:

[
  {"tag": "Comment", "blogPostId": 1, "content": "Some comment"},
  {"tag": "Comment", "blogPostId": 1, "content": "Another comment" }
]

It's basically: parse JSON, filter out some of the items, print the result as JSON. Although very simple, it requires an external dependency for working with JSON.

The first step of developing any application is creating an initial directory structure that adheres to the build tool's expectations. And speaking of which - you need a build tool.

Our build tool of choice - stack

We need an sbt analogue for Haskell. The two most popular choices in Haskell are cabal and stack. I decided to stick to stack for this article.

stack is built on top of ghc, cabal and hackage and tries to provide a better developer experience than using those tools directly. You can read more here.

IDE - IntelliJ with IntelliJ-Haskell plugin

You also need a code editor. As I am coming from Scala I have used IntelliJ on a daily basis for a few years. There's an IntelliJ-Haskell plugin which "just works".

The choice of both build tool and IDE is debatable and highly opinionated. Right now, however, we don't care about the best option - we just want to get started. And I believe this setup resembles a typical Scala setup.

Let's start!

Bootstrapping the project

First, install stack by following the installation instructions. Then, we need to bootstrap a new project. In Scala you may have used sbt new to do that. In the case of stack the command happens to have the same name; therefore:

> stack new haskell-introduction

haskell-introduction is the name of the new project; it will be used as the directory name too. As we have passed only one argument to stack new, the default template will be used. After the command completes we should see something like this:

> tree haskell-introduction
haskell-introduction
├── app
│   └── Main.hs
├── ChangeLog.md
├── haskell-introduction.cabal
├── LICENSE
├── package.yaml
├── README.md
├── Setup.hs
├── src
│   └── Lib.hs
├── stack.yaml
└── test
    └── Spec.hs

3 directories, 10 files

All *.hs files are Haskell sources. The files most relevant to the build definition are package.yaml and stack.yaml. In simplistic terms, package.yaml corresponds to build.sbt as it defines the project we build, whereas stack.yaml controls stack-related settings - things we would expect in the project directory of sbt-based projects. We will only touch package.yaml in the scope of this article.

The generated project comes with the functionality of printing out a hardcoded string. Let's run it with stack exec:

> cd haskell-introduction  # enter directory created by `stack new`
> stack build && stack exec haskell-introduction-exe
someFunc

If you see someFunc in your terminal too, you're now good to open the project in the IDE. Start by installing IntelliJ-Haskell according to its getting started section. That document also describes in detail how to open a new project. For the first project it includes some extra steps, like configuring the Project SDK, so I suggest reading it carefully.

If you installed the plugin and opened the project you should observe no errors in the IDE and things such as highlighting, code completion and navigating to the definition should function properly.

What does stack actually do?

Let's step back to understand what exactly happens when we run stack exec.

One of the things stack does is provide a compiler - ghc. I don't have it installed on my system:

> which ghc
ghc not found

Yet, it is available to stack:

> stack exec -- which ghc
/home/michal/.stack/programs/x86_64-linux/ghc-tinfo6-8.6.5/bin/ghc

As you can see, stack stores binaries that may be shared between projects in $HOME/.stack. This directory is not supposed to be on $PATH, but stack exec is aware of the artifacts stored there and can resolve a command to the proper binary. A single ghc binary is reused between projects if and only if those projects declare the same version of ghc. Different versions of ghc can be used in different projects without any issues.

And how about project-related binaries like the previously used haskell-introduction-exe? Let's check it out:

> stack exec -- which haskell-introduction-exe
/home/michal/haskell-introduction/.stack-work/install/x86_64-linux-tinfo6/dd28ee69e237c048a9ddc4736a23ba5aabe5c6075009ccddf23dd601e1f9f4d6/8.6.5/bin/haskell-introduction-exe

The output tells us that stack stores project-related binaries in the $PWD/.stack-work directory.

Why do we sometimes run stack exec command and sometimes stack exec -- command? The -- separator tells stack to stop interpreting what follows as its own options, so everything after it is passed to the command untouched. It matters as soon as the command takes flags or arguments of its own, so using -- is the safer habit.

stack run

If you simply need to run your project, as we know it from sbt run, keep in mind that stack exec does not rebuild the project. That's why we had to run stack build && stack exec haskell-introduction-exe. Also, you need to pass the name of the executable (haskell-introduction-exe in our case), which depends on the project. Fortunately, some time ago stack introduced stack run, which we will use from now on to rebuild and run the project.

> stack purge && stack run       # stack purge just to show that run triggers build
...
someFunc

REPL

You can also run your code from the REPL. ghci is the default REPL distributed together with ghc. As with ghc, you don't need to install it on your system - it will be fetched by stack based on your project definition.

> stack ghci
...
Ok, two modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/e5db0fdf/ghci-script
λ someFunc
someFunc
λ

If you just installed stack you probably see a different prompt. I configured it to λ and I will use it in snippets in this article to distinguish ghci code from bash commands, for which I use > as prompt.

It's important to note that stack ghci rebuilds your project and you can access your code from there. It gives you a powerful way of tinkering with the code. If you find the ghci input mode too limiting or need more IDE support, you can write your function in a file, rerun ghci and run the function. And it all feels close to immediate.

Adding external build dependency

Let's get back to the initial task of parsing JSON. A popular choice for a JSON library in the Haskell ecosystem is aeson. I think it's safe to compare it to circe, both in terms of popularity and how it actually works.

The only thing you need to do to add a dependency is to change package.yaml so that its dependencies section looks like this:

dependencies:
- base >= 4.7 && < 5
- aeson                 # The only new line

Looks neat but where is the organization name? - you may ask. And more importantly - where is the version specified?

To be able to explain how stack manages dependencies I need to mention two components: Hackage and Stackage. Hackage is a repository of Haskell packages and it contains more than a thousand open source libraries. You can think of it as the Maven Central Repository for Haskell.

Stackage, according to the docs, is:

a curated set of packages from Hackage which are regularly tested for compatibility. Stack defaults to using Stackage package sets to avoid dependency problems.

There's no counterpart of Stackage in the Scala world and I think it's a pretty unusual concept for a language-specific build tool. However, it's a very common concept in OS package managers, so you can think of it as nix channels or debian releases.

Let's see how it works. First, we need to find out which Stackage resolver we use in our project. We can determine that by checking the stack.yaml file, in which we can find:

resolver: lts-14.22

Now we can go to https://www.stackage.org/lts-14.22 to see which packages, and in which versions, are available for the resolver in use. Searching for aeson there and clicking on the first entry takes us to https://www.stackage.org/lts-14.22/package/aeson-1.4.6.0. Therefore, we should expect aeson version 1.4.6.0 to be used.

Let's try it out then: (the only file we changed after last build was package.yaml)

> stack build

The output on a system with just-installed stack will be quite big. A few selected lines:

dlist               > configure
dlist               > Configuring dlist-0.8.0.7...
dlist               > build
dlist               > Preprocessing library for dlist-0.8.0.7..
dlist               > Building library for dlist-0.8.0.7..
dlist               > [1 of 1] Compiling Data.DList
dlist               > copy/register
dlist               > Installing library in /home/michal/.stack/snapshots/x86_64-linux-tinfo6/dd28ee69e237c048a9ddc4736a23ba5aabe5c6075009ccddf23dd601e1f9f4d6/8.6.5/lib/x86_64-linux-ghc-8.6.5/dlist-0.8.0.7-62vR0IWGKydvDRbWJTrKt
dlist               > Registering library for dlist-0.8.0.7..
...
aeson               > Registering library for aeson-1.4.6.0..
...

The key observation here is that the compiler on my machine actually compiled dlist. And I haven't even asked for dlist - it's being compiled because it's a transitive dependency of aeson.

One of the crucial differences between stack and sbt (or rather between the Haskell ecosystem and the JVM ecosystem) is that libraries are distributed as source code as opposed to prebuilt JARs with bytecode. That means that stack needs to build aeson from source. More than that - it needs to build all of aeson's dependencies too, which is why we see dlist in the above output. Keep that fact in mind whenever you are surprised that your tiny app takes so long to compile - it's probably the dependencies being compiled. Compiled libraries are stored in $HOME/.stack, so you pay that price only once rather than on every build.

Defining ADT

We will be working with an ADT that can be expressed in Scala as follows: (full source)

sealed trait Activity

final case class BlogPost(id: Int, title: String, summary: String) extends Activity
final case class Comment(blogPostId: Int, content: String) extends Activity

It translates to the following Haskell code: (Lib.hs file - full source)

module Lib ( Activity(BlogPost, Comment) ) where

data Activity = BlogPost { id       :: Int
                         , title    :: String
                         , summary  :: String
                         }
              | Comment  { blogPostId :: Int
                         , content    :: String
                         }

ADTs by themselves are a good topic for a separate article so I will not go into details here. Let's just try to create an instance of Comment in ghci:

λ :t Comment
Comment :: Int -> String -> Activity
λ let c = Comment 3 "awesome comment"
λ :force c
c = <Comment> 3 "awesome comment"

Please note that the return type of Comment is Activity. That's because data constructors (BlogPost and Comment) are not types; they are just functions that produce a value of type Activity.
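Since constructors are plain functions, they can be passed around and partially applied like any other function. The following is a minimal, self-contained sketch of that idea, not code from the repository. The helper names commentsOn and describe are hypothetical, and the id field is renamed to postId here only to keep the sketch clear of Prelude's id function.

```haskell
module Main where

-- Same shape as the Activity type above. The field `postId` is a renaming
-- of `id` made only for this sketch (an assumption, not the repository code).
data Activity
  = BlogPost { postId :: Int, title :: String, summary :: String }
  | Comment  { blogPostId :: Int, content :: String }
  deriving (Show, Eq)

-- A data constructor is an ordinary function, so it can be partially
-- applied: `Comment postNo` has type String -> Activity.
commentsOn :: Int -> [String] -> [Activity]
commentsOn postNo = map (Comment postNo)

-- Constructors also appear in patterns, which is how a value is taken apart.
describe :: Activity -> String
describe (BlogPost _ t _) = "post: " ++ t
describe (Comment n c)    = "comment on " ++ show n ++ ": " ++ c

main :: IO ()
main = mapM_ (putStrLn . describe) (commentsOn 1 ["first", "second"])
```

In Scala terms, Comment behaves here much like the companion object's apply method, while the type you annotate values with is always Activity.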

Derive JSON type classes

To be able to translate our ADT to JSON and back we need the proper type class instances. In the case of Scala we need to annotate the Activity trait to derive its circe Encoder and Decoder: (full source)

@ConfiguredJsonCodec
sealed trait Activity
object Activity {
  implicit val config: Configuration =
    Configuration.default.withDiscriminator("tag")
}
...

We can do the same in Haskell: (Lib.hs file - full source)

{-# LANGUAGE DeriveGeneric #-}
...
data Activity = ...
              | Comment  { blogPostId :: Int
                         , content    :: String
                         }
              deriving (Generic, Show)

instance ToJSON Activity
instance FromJSON Activity
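The snippet above elides the imports, so here is a minimal, self-contained sketch of how the pieces might fit together in a single file. It assumes aeson is listed in package.yaml; the Eq instance and the roundTrip helper are additions made only for this illustration and are not part of the repository code.

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main where

import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

data Activity
  = BlogPost { id :: Int, title :: String, summary :: String }
  | Comment  { blogPostId :: Int, content :: String }
  deriving (Generic, Show, Eq) -- Eq is added here only to compare results

-- Empty instance bodies: aeson fills in the methods from the Generic instance.
instance ToJSON Activity
instance FromJSON Activity

-- Hypothetical helper: encode a value to JSON and decode it straight back.
roundTrip :: Activity -> Maybe Activity
roundTrip = decode . encode

main :: IO ()
main = print (roundTrip (Comment 3 "awesome comment"))
```

If the derived instances are consistent with each other, roundTrip should return the original value wrapped in Just, which is a quick sanity check worth running in ghci.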

With the instances derived we can try to use them from ghci. Let's find out the type of Data.Aeson.encode first:

*Lib Lib> import Data.Aeson
*Lib Lib Data.Aeson> :t encode
encode
  :: ToJSON a =>
     a -> bytestring-0.10.8.2:Data.ByteString.Lazy.Internal.ByteString

In our application we intend to use putStrLn, which is of type String -> IO (). So we need to find a function of type ByteString -> String. As with any problem, we can "google it". Alternatively, in the case of Haskell, we can also "hoogle it". Hoogle is a Haskell API search engine which allows you to search for Haskell functions by name or by type signature.

Therefore you can just look for Data.ByteString.Lazy.Internal.ByteString -> [Char] (I took the first type from the ghci output). The only result suggests importing Data.ByteString.Lazy.Internal. Let's give it a try:

λ import Data.ByteString.Lazy.Internal

<no location info>: error:
    Could not load module ‘Data.ByteString.Lazy.Internal’
    It is a member of the hidden package ‘bytestring-0.10.8.2’.
    You can run ‘:set -package bytestring’ to expose it.
    (Note: this unloads all the modules in the current scope.)

What is this "hidden package" message about? It appears when your code tries to use a type or function defined in a transitive dependency, i.e. a package that is on your dependency list only because some other package depends on it.

It's very easy to fix - just add the dependency explicitly in package.yaml:

dependencies:
- base >= 4.7 && < 5
- aeson
- bytestring        # The only new line

This is another difference between sbt and stack: stack does not allow you to refer to code defined in transitive dependencies, although there is an sbt plugin that achieves the same behaviour in sbt too.

Now, with bytestring as an explicit dependency, we should be able to import its types and finally get the encoded string:

λ import Data.Aeson
λ import Data.ByteString.Lazy.Internal

λ unpackChars ( encode ( Comment 3 "awesome comment" ) )
"{\"tag\":\"Comment\",\"blogPostId\":3,\"content\":\"awesome comment\"}"

Looks good, although all those parentheses look a bit clunky. We can get rid of them by using the widely used $ operator:

λ unpackChars $ encode $ Comment 3 "awesome comment"
"{\"tag\":\"Comment\",\"blogPostId\":3,\"content\":\"awesome comment\"}"

You can think of $ as an opening parenthesis which is accompanied by an implicit closing parenthesis at the end of the expression (here, the end of the line).
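To see that $ only changes how an expression is grouped and not what it computes, here is a small sketch that uses nothing beyond base. The functions shout and exclaim are hypothetical helpers invented for this illustration; the third spelling uses the function composition operator (.), which you may know from Scala as compose.

```haskell
module Main where

import Data.Char (toUpper)

-- Hypothetical helpers, used only to have something to chain.
shout :: String -> String
shout = map toUpper

exclaim :: String -> String
exclaim s = s ++ "!"

-- Three equivalent spellings of the same nested application.
withParens, withDollar, withCompose :: String -> String
withParens  s = exclaim (shout (reverse s))   -- explicit parentheses
withDollar  s = exclaim $ shout $ reverse s   -- $ replaces the parentheses
withCompose   = exclaim . shout . reverse     -- point-free composition

main :: IO ()
main = mapM_ (\f -> putStrLn (f "olleh")) [withParens, withDollar, withCompose]
```

All three functions return the same result for the same input; which form to use is mostly a matter of readability.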

Final solution

We've implemented the JSON part of the task. It's time to write the main function, which will load JSON from a file, filter the parsed content and print out the result. In Scala it may look like this: (full source)

object Main {
  def main(args: Array[String]): Unit = {
    val activitiesEither = parseFile(Paths.get("../input.json").toFile).flatMap(_.as[List[Activity]])
    val output = activitiesEither match {
      case Right(activities) => process(activities)
      case Left(e)           => s"Something went wrong: $e"
    }
    println(output)
  }

  def process(activities: List[Activity]): String =
    onlyComments(activities).asJson.spaces2

  def onlyComments(activities: List[Activity]): List[Activity] =
    activities.filter(isComment)

  def isComment(a: Activity): Boolean = a match {
    case Comment(_, _) => true
    case _             => false
  }
}

And here's the Haskell counterpart: (full source)

module Main where

import qualified Data.ByteString.Lazy.Internal as C
import Data.Aeson
import Lib

main :: IO ()
main = do
  -- The type annotation tells aeson which FromJSON instance to use
  activitiesEither <- eitherDecodeFileStrict "../input.json" :: IO (Either String [Activity])
  let output = case activitiesEither of
        -- It will not work properly for UTF-8 characters but for sake of a demonstration it's good enough
        Right activities -> C.unpackChars $ process activities
        Left e           -> "Something went wrong: " ++ e
  putStrLn output

-- Keep only the comments and encode them back to JSON
process :: [Activity] -> C.ByteString
process activities = encode $ onlyComments activities

onlyComments :: [Activity] -> [Activity]
onlyComments activities = filter isComment activities

-- Pattern match on the data constructor; everything else is not a comment
isComment :: Activity -> Bool
isComment (Comment _ _) = True
isComment _             = False

I will not comment on the above Haskell snippet in detail and I hope you can make sense of it just by comparing it to the Scala snippet. What I want to draw your attention to is that there is nothing here that is foreign to an average Scala developer. The Either type, filter on a List, type class based encoders and decoders, the IO monad - those are standard tools for a Scala developer. It's true that the syntax and the specifics of the implementation differ, but the ideas stay the same.
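The Haskell snippet above carries a caveat about UTF-8 characters. One possible way around it, sketched here under the assumption that aeson and bytestring are both listed in package.yaml, is to skip the conversion to String and write the encoded bytes to stdout directly. The helper name commentsJson is hypothetical, and the Activity type is repeated so that the sketch is self-contained rather than importing Lib.

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main where

import Data.Aeson (FromJSON, ToJSON, eitherDecodeFileStrict, encode)
import qualified Data.ByteString.Lazy.Char8 as BL
import GHC.Generics (Generic)

data Activity
  = BlogPost { id :: Int, title :: String, summary :: String }
  | Comment  { blogPostId :: Int, content :: String }
  deriving (Generic, Show)

instance ToJSON Activity
instance FromJSON Activity

isComment :: Activity -> Bool
isComment (Comment _ _) = True
isComment _             = False

-- Hypothetical helper: keep only the comments and encode them as JSON bytes.
commentsJson :: [Activity] -> BL.ByteString
commentsJson = encode . filter isComment

main :: IO ()
main = do
  parsed <- eitherDecodeFileStrict "../input.json" :: IO (Either String [Activity])
  case parsed of
    -- BL.putStrLn writes the bytes as they are, so the UTF-8 text produced
    -- by aeson is not squeezed through a Char-per-byte conversion.
    Right activities -> BL.putStrLn (commentsJson activities)
    Left e           -> putStrLn ("Something went wrong: " ++ e)
```

The trade-off is that the two branches now print through different functions instead of building a single String first, which is why the article's version stays closer to the Scala code.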

Where to go next

I am a Haskell beginner myself so I cannot offer you any definitive answer. And I highly doubt there is any definitive answer anyway. I can describe my current approach but I encourage you to determine your own learning strategy.

I solve exercises from Advent of Code. They are simple algorithmic problems, ones that make you proficient with control structures, syntax, and basic data structures. They are easy enough for me to not get stuck, even while learning a new language, but challenging enough not to get bored. The success criteria are clear and feedback after providing the answer is immediate.

While solving small coding exercises is fun and lets me familiarize myself with syntax it does not help in understanding how to work with libraries, networking, databases and all those bits that actually make programming difficult. Here I can wholeheartedly recommend an amazing REST-ish Services in Haskell tutorial. It includes parsing command line arguments, config file, implementing REST API endpoints, writing to a database and many others. It's a comprehensive manual on how to write your own web application in Haskell. Moreover, it is also an in-depth resource on how to write web services in general. I could not recommend it enough!

I do read classical resources and find them very useful. I do not read them page by page as they don't keep me engaged enough. Instead, I selectively read the chapters I need right now to solve the problem at hand. I use the excellent Haskell Book and Learn You a Haskell, among others.

I hope you will find the Scala to Haskell cheat sheet useful in your learning process too.