Haskell for the impatient Scala developer: Getting up to speed
If you're reading this, I assume you are a Scala developer who wants to learn some Haskell. I have two pieces of news for you: good and bad. The good news is that plenty of Haskell resources are available. The bad news is that none of them targets Scala developers specifically.
Why would the lack of Haskell resources for Scala developers bother anyone? As a Scala developer you already know a lot of the concepts. You know what a monad and an applicative are, you have folded over a list many times and you're not scared of immutable collections. Going through regular Haskell tutorials or books therefore feels slow and not very engaging, because they assume you are starting from zero.
I prefer learning by practice, so I tried using Haskell for some side projects. However, I realized that I was missing a single succinct Scala-to-Haskell cheat sheet that I could glance at when in doubt about the basics. This article is not such a cheat sheet - I started working on one here. You shouldn't rely on such superficial knowledge for too long - after all, Scala analogies are not 100% accurate. Their point is to get you up to speed.
This article aims to walk you through the essential parts of learning a new language that are often skipped: which build tool to use, how to create a new project and how to add an external dependency. Some of those things work differently from their Scala counterparts and I will try to point out the differences.
All the code presented here is available in the repository. It also contains the Scala equivalent of the application we build.
What is our sample application supposed to do
It will be a simple command line application that, given a file like this as input:
[
{"tag": "Comment", "blogPostId": 1, "content": "Some comment"},
{"tag": "BlogPost", "id": 1,"title": "Some blog post", "summary": "Some post"},
{"tag": "BlogPost", "id": 2,"title": "Another blog post", "summary": "Another post"},
{"tag": "Comment", "blogPostId": 1, "content": "Another comment" }
]
will print out the following output:
[
{"tag": "Comment", "blogPostId": 1, "content": "Some comment"},
{"tag": "Comment", "blogPostId": 1, "content": "Another comment" }
]
In short: parse JSON, filter out some of the items, print the result as JSON. Although very simple, it requires an external dependency for working with JSON.
The first step in developing any application is creating an initial directory structure that meets the build tool's expectations. And speaking of which - you need a build tool.
Our build tool of choice - stack
We need an sbt analogue for Haskell. The two most popular choices in Haskell are cabal and stack. I decided to stick to stack for this article.
stack is built on top of ghci, cabal and hackage and tries to provide a better developer experience than using those tools directly. You can read more here.
IDE - IntelliJ with IntelliJ-Haskell plugin
You also need a code editor. Coming from Scala, I have used IntelliJ daily for a few years. There's an IntelliJ-Haskell plugin which "just works".
The choice of both build tool and IDE is debatable and highly opinionated. Right now, however, we don't care about the best option - we just want to get started. And I believe this setup resembles a typical Scala setup.
Let's start!
Bootstrapping the project
First, install stack following these instructions. Then we need to bootstrap a new project. In Scala you may have used sbt new to do that. In stack the command happens to have the same name:
> stack new haskell-introduction
haskell-introduction is the name of the new project; it will be used as the directory name too. As we passed only one argument to stack new, the default template will be used. After the command completes we should see something like this:
> tree haskell-introduction
haskell-introduction
├── app
│   └── Main.hs
├── ChangeLog.md
├── haskell-introduction.cabal
├── LICENSE
├── package.yaml
├── README.md
├── Setup.hs
├── src
│   └── Lib.hs
├── stack.yaml
└── test
    └── Spec.hs
3 directories, 10 files
All *.hs files are Haskell sources. The files most relevant to the build definition are package.yaml and stack.yaml. In simplistic terms, package.yaml corresponds to build.sbt, as it defines the project we build, whereas stack.yaml controls stack-related settings - things we would expect in the project directory of sbt-based projects. We will only touch package.yaml in this article.
The generated project comes with the functionality of printing out a hardcoded string. Let's run it with stack exec:
> cd haskell-introduction # enter directory created by `stack new`
> stack build && stack exec haskell-introduction-exe
someFunc
If you see someFunc in your terminal too, you're ready to open the project in the IDE. Start by installing IntelliJ-Haskell according to the getting started section. That document also describes in detail how to open a new project. For the first project it includes some extra steps, like configuring the Project SDK, so I suggest reading it carefully.
If you installed the plugin and opened the project, you should see no errors in the IDE, and things such as highlighting, code completion and navigating to a definition should work properly.
What does stack actually do?
Let's step back to understand what exactly happens when we run stack exec.
One of the things stack does is provide a compiler - ghc. I don't have it installed on my system:
> which ghc
ghc not found
Yet, it is available to stack:
> stack exec -- which ghc
/home/michal/.stack/programs/x86_64-linux/ghc-tinfo6-8.6.5/bin/ghc
As you can see, stack stores binaries that may be shared between projects in $HOME/.stack. This directory is not supposed to be on $PATH, but stack exec is aware of the artifacts stored there and can resolve a command to the proper binary. A single ghc binary is reused between projects if and only if those projects declare the same version of ghc. Different versions of ghc can be used in different projects without any issues.
And what about project-related binaries like the haskell-introduction-exe we used earlier? Let's check:
> stack exec -- which haskell-introduction-exe
/home/michal/haskell-introduction/.stack-work/install/x86_64-linux-tinfo6/dd28ee69e237c048a9ddc4736a23ba5aabe5c6075009ccddf23dd601e1f9f4d6/8.6.5/bin/haskell-introduction-exe
The output tells us that stack stores project-related binaries in the $PWD/.stack-work directory.
Why do we sometimes write stack exec command and sometimes stack exec -- command? The short form is fine for a bare command. The -- marks the end of stack's own options, so everything after it is handed to the command untouched; this form is the safe choice whenever the command takes arguments of its own.
stack run
If you simply need to run your project, as you know it from sbt run, keep in mind that stack exec does not rebuild the project. That's why we had to run stack build && stack exec haskell-introduction-exe. You also need to pass the name of the executable (haskell-introduction-exe in our case), which depends on the project. Fortunately, some time ago stack introduced stack run, which we will use from now on to rebuild and run the project.
> stack purge && stack run # stack purge just to show that run triggers build
...
someFunc
REPL
You can also run your code from the REPL. ghci is the default REPL distributed together with ghc. As with ghc, you don't need to install it on your system - stack will fetch it based on your project definition.
> stack ghci
...
Ok, two modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/e5db0fdf/ghci-script
λ someFunc
someFunc
λ
If you have just installed stack you probably see a different prompt. I configured mine to λ and I will use it in this article's snippets to distinguish ghci code from bash commands, for which I use > as the prompt.
It's important to note that stack ghci rebuilds your project and lets you access your code from there. It gives you a powerful way of tinkering with the code. If you find ghci's input mode too limiting or need more IDE support, you can write your function in a file, rerun ghci and call the function. And it all feels close to immediate.
Adding external build dependency
Let's get back to the initial task of parsing JSON. A popular choice of JSON library in the Haskell ecosystem is aeson. I think it's safe to compare it to circe, both in terms of popularity and how it actually works.
The only thing you need to do to add a dependency is change package.yaml so that its dependencies section looks like this:
dependencies:
- base >= 4.7 && < 5
- aeson # The only new line
Looks neat, but where is the organization name? - you may ask. And more importantly - where is the version specified?
To explain how stack manages dependencies I need to mention two components: Hackage and Stackage. Hackage is a repository of Haskell packages and it contains more than a thousand open source libraries. You can think of it as the Maven Central Repository for Haskell.
Stackage, according to the docs, is:
a curated set of packages from Hackage which are regularly tested for compatibility. Stack defaults to using Stackage package sets to avoid dependency problems.
There's no counterpart of Stackage in the Scala ecosystem and I think it's a pretty unusual concept for a language-specific build tool. However, it's a very common concept in OS package managers, so you can think of it as nix channels or debian releases.
Let's see how it works. First, we need to find out which Stackage resolver our project uses. We can determine that by checking the stack.yaml file, in which we find:
resolver: lts-14.22
Now we can go to https://www.stackage.org/lts-14.22 to see which packages, in which versions, are available for the resolver in use. Here's the result of searching for aeson; clicking on the first entry redirects us to https://www.stackage.org/lts-14.22/package/aeson-1.4.6.0. Therefore, we should expect aeson version 1.4.6.0 to be used.
Let's try it out then (the only file we changed after the last build was package.yaml):
> stack build
The output on a system with a freshly installed stack will be quite long. A few selected lines:
dlist > configure
dlist > Configuring dlist-0.8.0.7...
dlist > build
dlist > Preprocessing library for dlist-0.8.0.7..
dlist > Building library for dlist-0.8.0.7..
dlist > [1 of 1] Compiling Data.DList
dlist > copy/register
dlist > Installing library in /home/michal/.stack/snapshots/x86_64-linux-tinfo6/dd28ee69e237c048a9ddc4736a23ba5aabe5c6075009ccddf23dd601e1f9f4d6/8.6.5/lib/x86_64-linux-ghc-8.6.5/dlist-0.8.0.7-62vR0IWGKydvDRbWJTrKt
dlist > Registering library for dlist-0.8.0.7..
...
aeson > Registering library for aeson-1.4.6.0..
...
The key observation here is that the compiler on my machine actually compiled dlist. And I haven't even asked for dlist - it's being compiled because it's a transitive dependency of aeson.
One of the crucial differences between stack and sbt (or rather between the Haskell ecosystem and the JVM ecosystem) is that libraries are distributed as source code, as opposed to prebuilt JARs with bytecode. That means that stack needs to build aeson from source. More than that - it needs to build all of aeson's dependencies too - that's why we see dlist in the above output. Keep that in mind whenever you wonder why your tiny app takes so long to compile - it's probably the dependencies being compiled. Compiled libraries are stored in $HOME/.stack, so you will not pay this price on every compilation.
Defining an ADT
We will be working with an ADT that can be expressed in Scala as follows: (full source)
sealed trait Activity
final case class BlogPost(id: Int, title: String, summary: String) extends Activity
final case class Comment(blogPostId: Int, content: String) extends Activity
It translates to the following Haskell code: (Lib.hs file - full source)
module Lib ( Activity(BlogPost, Comment) ) where
data Activity = BlogPost { id :: Int
                         , title :: String
                         , summary :: String
                         }
              | Comment { blogPostId :: Int
                        , content :: String
                        }
ADTs by themselves are a good topic for a separate article, so I will not go into details here. Let's just try to create an instance of Comment in ghci:
> :t Comment
Comment :: Int -> String -> Activity
> let c = Comment 3 "awesome comment"
> :force c
c = <Comment> 3 "awesome comment"
Note that the return type of Comment is Activity. That's because data constructors (BlogPost and Comment) are not types but only functions.
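Because data constructors are ordinary functions, they can be partially applied and passed around like any other function. The following is a minimal, self-contained sketch of that idea. The module name Main, the helper names commentOnFirstPost and comments, and the extra Eq instance are assumptions made for this illustration and are not part of the article's repository.

```haskell
module Main where

-- Same shape as the Activity type above; Eq is derived only so that
-- values can be compared in this sketch.
data Activity
  = BlogPost { id :: Int, title :: String, summary :: String }
  | Comment { blogPostId :: Int, content :: String }
  deriving (Show, Eq)

-- Comment :: Int -> String -> Activity, so applying it to one argument
-- leaves a function String -> Activity (hypothetical helper).
commentOnFirstPost :: String -> Activity
commentOnFirstPost = Comment 1

-- A constructor can be handed to higher-order functions such as map.
comments :: [Activity]
comments = map commentOnFirstPost ["Some comment", "Another comment"]

main :: IO ()
main = do
  print comments
  -- Record field names double as accessor functions.
  print (map content comments)
```

The closest Scala analogy is probably passing Comment.apply as a function value, or a lambda such as Comment(1, _).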
Derive JSON type classes
To be able to translate our ADT to JSON and back we need proper type class instances. In Scala we need to annotate the Activity trait to derive its circe Encoder and Decoder: (full source)
@ConfiguredJsonCodec
sealed trait Activity
object Activity {
  implicit val config: Configuration =
    Configuration.default.withDiscriminator("tag")
}
...
We can do the same in Haskell: (Lib.hs file - full source)
{-# LANGUAGE DeriveGeneric #-}
...
data Activity = ...
              | Comment { blogPostId :: Int
                        , content :: String
                        }
              deriving (Generic, Show)
instance ToJSON Activity
instance FromJSON Activity
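To see the derived instances at work outside of the project, here is a self-contained round-trip sketch. It assumes the aeson package is available (for example through the package.yaml dependency added earlier). The module name Main and the extra Eq instance are additions for this illustration only; encode and eitherDecode are functions exported by Data.Aeson.

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main where

import Data.Aeson (FromJSON, ToJSON, eitherDecode, encode)
import GHC.Generics (Generic)

data Activity
  = BlogPost { id :: Int, title :: String, summary :: String }
  | Comment { blogPostId :: Int, content :: String }
  deriving (Generic, Show, Eq)

-- Empty instance bodies: aeson supplies the implementations
-- from the Generic representation of the type.
instance ToJSON Activity
instance FromJSON Activity

main :: IO ()
main = do
  let original = Comment 3 "awesome comment"
      -- encode produces a lazy ByteString; eitherDecode parses one back
      -- and reports a parse failure as Left with an error message.
      decoded = eitherDecode (encode original) :: Either String Activity
  print (decoded == Right original)
```

If the instances behave as expected, the decoded value equals the original one and the program prints True.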
With the instances derived, we can try to use them from ghci. Let's find out the type of Data.Aeson.encode first:
*Lib Lib> import Data.Aeson
*Lib Lib Data.Aeson> :t encode
encode
:: ToJSON a =>
a -> bytestring-0.10.8.2:Data.ByteString.Lazy.Internal.ByteString
In our application we intend to use putStrLn, which is of type String -> IO (). So we need to find a function of type ByteString -> String. As with any problem, we can "google it". Alternatively, in the case of Haskell, we can also "hoogle it". Hoogle is a Haskell API search engine which allows you to search for Haskell functions by name or by type signature.
So you can just search for Data.ByteString.Lazy.Internal.ByteString -> [Char] (I took the first type from the ghci output). The only result suggests importing Data.ByteString.Lazy.Internal. Let's give it a try:
> import Data.ByteString.Lazy.Internal
<no location info>: error:
Could not load module ‘Data.ByteString.Lazy.Internal’
It is a member of the hidden package ‘bytestring-0.10.8.2’.
You can run ‘:set -package bytestring’ to expose it.
(Note: this unloads all the modules in the current scope.)
What is this "hidden package" message about? It appears when your code tries to use a type or function defined in a transitive dependency, i.e. a dependency that is on your dependency list only because another package depends on it.
It's very easy to fix - just add the dependency explicitly in package.yaml:
dependencies:
- base >= 4.7 && < 5
- aeson
- bytestring # The only new line
This is another difference between sbt and stack: stack does not allow you to refer to code defined in transitive dependencies. There is, however, an sbt plugin that achieves the same behaviour in sbt.
Now, with bytestring as an explicit dependency, we should be able to import its types and finally get the encoded string:
> import Data.Aeson
> import Data.ByteString.Lazy.Internal
> unpackChars ( encode ( Comment 3 "awesome comment" ) )
"{\"tag\":\"Comment\",\"blogPostId\":3,\"content\":\"awesome comment\"}"
Looks good, although all those parentheses look a bit clunky. We can get rid of them using a widely used pattern, the $ operator:
> unpackChars $ encode $ Comment 3 "awesome comment"
"{\"tag\":\"Comment\",\"blogPostId\":3,\"content\":\"awesome comment\"}"
You can think of $ as an opening parenthesis accompanied by an implicit closing parenthesis at the end of the line.
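The same idea can be tried without any JSON involved. The sketch below uses only the base library and writes one small pipeline three ways: with nested parentheses, with $, and with function composition (.), which is another common way to avoid parentheses. The names nested, dollars and composed are made up for this example.

```haskell
module Main where

import Data.Char (toUpper)

main :: IO ()
main = do
  -- Three spellings of the same pipeline: uppercase the string,
  -- keep only the 'A' characters, count them.
  let nested   = length (filter (== 'A') (map toUpper "banana"))
      dollars  = length $ filter (== 'A') $ map toUpper "banana"
      composed = (length . filter (== 'A') . map toUpper) "banana"
  print (nested, dollars, composed)  -- prints (3,3,3)
```

All three expressions are equivalent; which one reads best is mostly a matter of taste and local convention.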
Final solution
We've implemented the JSON part of the task. It's time to write the main function, which will load JSON from a file, filter the parsed content and print out the result. In Scala it may look like this: (full source)
object Main {
  def main(args: Array[String]): Unit = {
    val activitiesEither = parseFile(Paths.get("../input.json").toFile).flatMap(_.as[List[Activity]])
    val output = activitiesEither match {
      case Right(activities) => process(activities)
      case Left(e) => s"Something went wrong: $e"
    }
    println(output)
  }
  def process(activities: List[Activity]): String =
    onlyComments(activities).asJson.spaces2
  def onlyComments(activities: List[Activity]): List[Activity] =
    activities.filter(isComment)
  def isComment(a: Activity): Boolean = a match {
    case Comment(_, _) => true
    case _ => false
  }
}
And here's the Haskell counterpart: (full source)
module Main where
import qualified Data.ByteString.Lazy.Internal as C
import Data.Aeson
import Data.List
import Control.Arrow
import Lib
main :: IO ()
main = do
  activitiesEither <- eitherDecodeFileStrict "../input.json" :: IO (Either String [Activity])
  let output = case activitiesEither of
        -- It will not work properly for UTF-8 characters but for the sake of a demonstration it's good enough
        Right activities -> C.unpackChars $ process activities
        Left e -> "Something went wrong: " ++ e
  putStrLn output
process :: [Activity] -> C.ByteString
process activities = encode $ onlyComments activities
onlyComments :: [Activity] -> [Activity]
onlyComments activities = filter isComment activities
isComment :: Activity -> Bool
isComment (Comment _ _) = True
isComment _ = False
I will not comment on the above Haskell snippet in detail; I hope you can make sense of it just by comparing it to the Scala snippet. What I want to draw your attention to is that nothing here is foreign to an average Scala developer. The Either type, filter on a List, type class based encoders and decoders, the IO monad - those are standard tools for a Scala developer. It's true that the syntax and the specifics of the implementation differ, but the ideas stay the same.
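The comment in the Haskell snippet admits that unpackChars does not handle UTF-8 properly, because it turns each byte into a separate Char. One possible way around that, sketched below under the assumption that both aeson and bytestring are listed as dependencies, is to skip the conversion to String and write the encoded bytes straight to standard output with putStrLn from Data.ByteString.Lazy.Char8. The sample text is made up for this illustration.

```haskell
module Main where

import Data.Aeson (encode)
import qualified Data.ByteString.Lazy.Char8 as BL

main :: IO ()
main =
  -- encode returns a lazy ByteString; BL.putStrLn writes its bytes
  -- to stdout unchanged, so no per-byte conversion to Char happens.
  BL.putStrLn (encode ["zażółć gęślą jaźń"])
```

In the article's program this would presumably mean printing the result of process with BL.putStrLn in the Right branch instead of going through C.unpackChars, since both modules refer to the same lazy ByteString type.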
Where to go next
I am a Haskell beginner myself so I cannot offer you any definitive answer. And I highly doubt there is any definitive answer anyway. I can describe my current approach but I encourage you to determine your own learning strategy.
I solve exercises from Advent of Code. They are simple algorithmic problems, ones that make you proficient with control structures, syntax, and basic data structures. They are easy enough for me to not get stuck, even while learning a new language, but challenging enough not to get bored. The success criteria are clear and feedback after providing the answer is immediate.
While solving small coding exercises is fun and lets me familiarize myself with the syntax, it does not help in understanding how to work with libraries, networking, databases and all those bits that actually make programming difficult. Here I can wholeheartedly recommend the amazing REST-ish Services in Haskell tutorial. It covers parsing command line arguments and a config file, implementing REST API endpoints, writing to a database and much more. It's a comprehensive manual on how to write your own web application in Haskell. Moreover, it is also an in-depth resource on how to write web services in general. I cannot recommend it enough!
I do read classical resources and find them very useful. I do not read them page by page as they don't keep me engaged enough. Instead, I selectively read the chapters I need right now to solve the problem at hand. I use the excellent Haskell Book and Learn You a Haskell, among others.
I hope you will find the Scala to Haskell cheat sheet useful in your learning process too.