Replace JSON with Dhall: DynamoDB case study
In this post I will show you how to rewrite a schema-less JSON file in Dhall. As an example I will use the JSON used for creating a DynamoDB table. It was chosen for illustrative purposes only; you don't need to know anything about DynamoDB, and it is not really relevant to the key message of this post.
Do not treat this blog post as either a comprehensive introduction to Dhall or a list of best practices. I am a Dhall beginner and want to present a use case where it is useful. Thus, the code itself might not be of the highest quality.
Before diving into Dhall, let's take a look at how configuration files are typically written today.
Current approach to configuration files
Dhall is advertised as a non-repetitive alternative to YAML, and I think such positioning definitely makes sense. YAML, JSON, and their derivatives have become a de facto standard for many aspects of DevOps and configuration management. Just think how you write your docker-compose file, your Kubernetes files, your OpenAPI specification, your DynamoDB table specification or CI job. All of them are either YAML or JSON. However, not many of their users would actually say they like those formats. Lack of schema, no support for code reuse or even variables, and no type safety are among the biggest problems.
Another language in this domain is HashiCorp Configuration Language, also known simply as HCL, which is used to define Terraform-based infrastructure. To me HCL feels like a language that emerged in an ad-hoc fashion rather than one that was meticulously designed. It lacks basic tools like user-defined functions, so it is hard to structure your code in a lightweight way. The lack of enums is also quite disturbing. Consider the encryption_type attribute of the aws_kinesis_stream resource: even though it is documented that the only acceptable values are NONE and KMS, Terraform will happily accept any other value.
As a person working daily with a strongly, statically typed language (namely Scala) I was struck that crucial parts of code are written in a way where a simple typo will be detected only at runtime. I sighed: if only we had some simple, possibly Turing-incomplete language specialized in configuration. Then a colleague of mine pointed me to Dhall and I realized it was exactly what I was looking for.
We can do better: Dhall
Dhall is a configuration language. You can think of it as JSON that, unlike JSON, is programmable: you can define functions. It is modular: you can extract commonly used functions to a file and import it in many places. It is also statically typed, so you will be notified of type errors ahead of time, and strongly typed, so there is no implicit type casting.
Although Dhall is programmable, it is not Turing complete. That is a conscious design decision: thanks to it, evaluation is always guaranteed to terminate and will never hang. It only means that there is no general recursion in the language; you can still, for example, map over a list.
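As a minimal sketch of that: the built-in List/fold lets you express a map without any recursion (the Dhall Prelude ships a ready-made List/map built on such primitives):

let increment = λ(n: Natural) → n + 1
in
List/fold Natural [1, 2, 3] (List Natural)
  (λ(x: Natural) → λ(acc: List Natural) → [increment x] # acc)
  ([] : List Natural)
-- evaluates to [ 2, 3, 4 ]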
I do not want to describe Dhall in detail in this blog post. If you want to know more, both Dhall's readme and its site are good places to start.
What I want to do instead is to show you an example of how Dhall can be used to simplify a configuration file.
DynamoDB example - original JSON
Before we start refactoring we need to understand the starting point, namely what the JSON used by DynamoDB looks like. We will be working with a made-up example, so there is no need to think too much about the structure of the table; we focus on the way it is specified instead.
A DynamoDB table can be created using the CLI:
aws dynamodb create-table --cli-input-json file:///your/path/table.json
In this text we focus solely on the table.json file, whose syntax is described in the AWS docs. Here is how it may look:
{
"AttributeDefinitions": [
{
"AttributeName": "Id",
"AttributeType": "S"
},
{
"AttributeName": "Artist",
"AttributeType": "S"
},
{
"AttributeName": "Song",
"AttributeType": "S"
},
{
"AttributeName": "Year",
"AttributeType": "N"
}
],
"KeySchema": [
{
"KeyType": "HASH",
"AttributeName": "Id"
}
],
"GlobalSecondaryIndexes": [
{
"IndexName": "ArtistSongIndex",
"Projection": {
"ProjectionType": "ALL"
},
"ProvisionedThroughput": {
"WriteCapacityUnits": 3,
"ReadCapacityUnits": 3
},
"KeySchema": [
{
"KeyType": "HASH",
"AttributeName": "Artist"
},
{
"KeyType": "RANGE",
"AttributeName": "Song"
}
]
},
{
"IndexName": "YearArtistIndex",
"Projection": {
"ProjectionType": "ALL"
},
"ProvisionedThroughput": {
"WriteCapacityUnits": 2,
"ReadCapacityUnits": 2
},
"KeySchema": [
{
"KeyType": "HASH",
"AttributeName": "Year"
},
{
"KeyType": "RANGE",
"AttributeName": "Artist"
}
]
}
],
"ProvisionedThroughput": {
"WriteCapacityUnits": 2,
"ReadCapacityUnits": 2
},
"TableName": "Songs"
}
Problems with the above JSON:
- lack of variables: if you make a typo and refer to "Yearr" instead of "Year" in any index definition, it will be caught as late as when running the AWS request
- lack of types: you can define KeyType as 56 and nothing will complain
- you can forget about TableName, which is a required field
- lack of enums: you can define KeyType as "whatever" even though "HASH" and "RANGE" are the only valid values
- lack of comments: this is a JSON-specific issue; YAML has a way of adding comments
- it's very repetitive: you need to repeat the 4 lines of ProvisionedThroughput over and over although it is basically a function of 2 integer arguments, which makes it cumbersome to write
- due to all this verbosity the signal-to-noise ratio of the file is very low, which makes reading and comprehending the key ideas expressed in the file difficult
Now that we know what we want to fix, let's start doing it with Dhall!
Rewriting DynamoDB example with Dhall
How to run the code
You can find the full code used in the example in the github repository. Its README contains instructions on how to run the code.
File structure
The file structure is as follows:
dhall
├── generic
│ ├── functions.dhall
│ ├── schema.dhall
│ └── types.dhall
└── migration.dhall
The directory generic contains common types and functions useful when working with the DynamoDB create-table JSON format. In an ideal world it would have been written already by someone else and published in some repository; it consists of things that are supposed to be written once and used many times. I cut corners, though, and implemented just the pieces relevant to the example presented in this post. The file migration.dhall is the only one that contains information specific to the example JSON file shown at the beginning of this post.
Given this file structure you can generate JSON out of migration.dhall with:
dhall-to-json --explain --pretty <<< './dhall/migration.dhall : ./dhall/generic/schema.dhall'
Defining types
Let's start by defining types in types.dhall. Here is a fragment of it:
let AttributeDefinition = {
AttributeName: Text,
AttributeType: Text
}
let ProvisionedThroughput = {
WriteCapacityUnits: Natural,
ReadCapacityUnits: Natural
}
-- more types omitted for the sake of readability
As you can see, it is quite straightforward. It also shows the usual pattern of starting a Dhall file with a sequence of let bindings, followed by the in keyword and an expression that uses the definitions created with let. In our case we will use a record with all the defined types in the in section:
in
{
AttributeDefinition = AttributeDefinition,
GlobalSecondaryIndex = GlobalSecondaryIndex,
KeySchemaItem = KeySchemaItem,
ProvisionedThroughput = ProvisionedThroughput
}
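For completeness, here is a sketch of the two omitted types that the schema will reference; this is my reconstruction from the JSON structure above, not necessarily the repository's exact code:

let KeySchemaItem = {
  KeyType: Text,
  AttributeName: Text
}
let GlobalSecondaryIndex = {
  IndexName: Text,
  Projection: { ProjectionType: Text },
  ProvisionedThroughput: ProvisionedThroughput,
  KeySchema: List KeySchemaItem
}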
Let's try it out (I am using the dhall command here, which reads from standard input; ctrl-d signals the end of input):
> dhall
let Types = ./generic/types.dhall in
{
WriteCapacityUnits = 5,
ReadCapacityUnits = 5
} : Types.ProvisionedThroughput
^D
{ ReadCapacityUnits = 5, WriteCapacityUnits = 5 }
It worked as expected. Now let's make a type mistake and see if Dhall will catch it:
> dhall
let Types = ./generic/types.dhall in
{
WriteCapacityUnits = 5,
ReadCapacityUnits = "hello"
} : Types.ProvisionedThroughput
^D
Use "dhall --explain" for detailed errors
Error: Expression doesn't match annotation
{ ReadCapacityUnits : - Natural
+ Text
, …
}
Error caught, success!
Defining schema
Now we can import the types defined in the previous section in schema.dhall:
let Types = ./generic/types.dhall
in {
TableName: Text,
KeySchema: List Types.KeySchemaItem,
AttributeDefinitions: List Types.AttributeDefinition,
GlobalSecondaryIndexes: List Types.GlobalSecondaryIndex,
ProvisionedThroughput: Types.ProvisionedThroughput
}
The split between types.dhall and schema.dhall is arbitrary; they could just as well be a single file. I find it clean to have the top-level type defined in a separate file, but Dhall itself does not enforce any structure.
Using schema
The most straightforward way of using that schema would be:
let Types = ./generic/types.dhall
in
{
AttributeDefinitions = [
  {
    AttributeName = "Id",
    AttributeType = "S"
  }
  -- other attributes omitted
]
-- other fields omitted
}
However, it is similarly verbose to the original JSON, which is what we wanted to avoid. To prevent repetition we will declare a few functions in functions.dhall, creating a nice DSL we can use in migration.dhall.
Here's the fragment of functions.dhall related to AttributeDefinition:
let mkAttribute =
    λ(attributeType: Text)
    → λ(attributeName: Text)
    → {
      AttributeName = attributeName,
      AttributeType = attributeType
    }
-- partially applied functions for each of the types:
let mkStringAttribute = mkAttribute "S"
let mkNumberAttribute = mkAttribute "N"
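The final example below also uses mkThroughput, mkHashIndex, mkRangeIndex and mkIndex from the same file. Here is a sketch of how the throughput and key-schema helpers might look; this is my reconstruction rather than the repository's exact code (mkIndex is analogous, additionally assembling the whole index record):

let mkThroughput =
    λ(writeCapacityUnits: Natural)
    → λ(readCapacityUnits: Natural)
    → {
      WriteCapacityUnits = writeCapacityUnits,
      ReadCapacityUnits = readCapacityUnits
    }
let mkKeySchemaItem =
    λ(keyType: Text)
    → λ(attributeName: Text)
    → {
      KeyType = keyType,
      AttributeName = attributeName
    }
-- partial application again, one helper per key type:
let mkHashIndex = mkKeySchemaItem "HASH"
let mkRangeIndex = mkKeySchemaItem "RANGE"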
As you can see, Dhall incorporates techniques known from functional programming, such as currying and partial application. Thanks to that it gives us a simple and reliable framework for abstraction.
Final form
All the generic functionality is in place; it is time to use it to rewrite the initial example:
let Types = ./generic/types.dhall
let Functions = ./generic/functions.dhall
let id = "Id"
let artist = "Artist"
let song = "Song"
let year = "Year"
let defaultThroughput = Functions.mkThroughput 2 2
in
{
TableName = "Songs",
KeySchema = [Functions.mkHashIndex id],
AttributeDefinitions = [
Functions.mkStringAttribute id,
Functions.mkStringAttribute artist,
Functions.mkStringAttribute song,
Functions.mkNumberAttribute year
],
GlobalSecondaryIndexes = [
Functions.mkIndex [Functions.mkHashIndex artist, Functions.mkRangeIndex song] (Functions.mkThroughput 3 3),
Functions.mkIndex [Functions.mkHashIndex year, Functions.mkRangeIndex artist] defaultThroughput
],
ProvisionedThroughput = defaultThroughput
}
That's it!
DynamoDB example - what was achieved
There is clear progress when you compare the final result with the original JSON example. The general feeling is that the resulting configuration is devoid of noise; it simply conveys the essence of what needs to be expressed.
We were able to:
- eliminate the repetitiveness of the original format
- introduce variables, so we don't have to repeat ourselves when it comes to field names; this also reduces spelling mistakes
- force our configuration to adhere to the defined schema, which protects us from type errors, omitted attribute keys, etc.
You may argue that I had to write the schema and the Dhall functions that allowed me to radically improve the level of expressiveness, so there is some additional code beyond the nice demo at the end.
That's right, but:
- you write your schema and helper functions only once and then use them many times
- once Dhall becomes more popular there will be a lot of schemas and code written by the community. To some extent this is already the case, examples being dhall-nix and dhall-kubernetes.
DynamoDB example - deficiencies
Even though it looks quite good, I must admit that when I first heard about Dhall I had something more powerful in mind. I expected to be able to describe the whole schema with great precision using ADTs. Moreover, I hoped for strong typing in the sense that I would hardly ever use the Text (i.e. Dhall's String) type, and the solution here is full of it.
Take a look at part of the schema:
AttributeDefinitions : List {
AttributeName: Text,
AttributeType: Text
}
While AttributeName is actually quite fine as Text, AttributeType is in substance an enum with only a few valid values, as documented here. You cannot put ABC there, and such a mistake should be caught by the configuration language when checking against the schema. In that regard the mantra should be to check as much as possible as early as possible.
Union types to the rescue?
The good news is that Dhall lets you express enums at the type level using unions. Here we try to be more explicit about which values we expect for AttributeType:
-- There are a few more types supported by DynamoDB; let's stick to these 3 for brevity:
let AttributeType = < Number : {} | Binary : {} | String : {} >
let attributeType = constructors AttributeType
let AttributeDefinition = {
AttributeName: Text,
AttributeType: AttributeType
}
let idAttr = {
AttributeName = "Id",
AttributeType = attributeType.String {=}
}
in
idAttr
We can run it through dhall to prove that Dhall "understands" the meaning of such a configuration:
dhall <<< './unions.dhall'
{ AttributeName =
"Id"
, AttributeType =
< String = {=} | Binary : {} | Number : {} >
}
Now, let's try to generate JSON out of it:
dhall-to-json --pretty <<< './unions.dhall'
{
"AttributeName": "Id",
"AttributeType": {}
}
"AttributeType": {}
is not something we want to achieve. We would like to have "AttributeType": "S"
. It is understandable that dhall-to-json
did not come up with expected result taking into account we have not defined JSON representation for AttributeType
union. We may do that by defining a function attributeTypeToString = λ(t : AttributeType) → Text
in Dhall, which is easy. There is a major problem here though - as return type of that function is Text
we would need to declare AttributeType
field as Text
again negating most of the benefit of introducing union type AttributeType
at first. It still may have some benefit, but only providing you will keep the convention of setting AttributeType
field always by using attributeTypeToString
function. Mind that it would work only by convention and there is nothing in Dhall's type system that will stop you from setting AttributeType
to any, possibly invalid, Text
.
All in all, the problem boils down to this: when using Dhall via dhall-to-json, all types in the leaf nodes of a schema have to be declared as primitive types supported by dhall-to-json.
It is not a problem of dhall-to-json itself; it is clear that it cannot be more precise than the underlying format. Hypothetically it could have some resolution mechanism that would try to find a function of type AttributeType → Text to enable the usage of rich types directly in the schema, but that is not a design goal of dhall-to-json. I have not checked dhall-to-yaml, but I believe it has the same constraint.
Although it may look like an obvious limitation, it took me some time to realize it. I believe it should be taken into account when thinking about potential use cases for dhall-to-json.
Possible solutions
- One apparent solution would be to write our own dhall-to-dynamo using Dhall's Haskell bindings. We would be able to treat DynamoDB-related types differently there. However, in this blog post I am advocating Dhall as a Swiss army knife for configuration formats: we should be able to write a few relatively straightforward .dhall files and simply profit, without caring about Haskell bindings or even knowing Haskell at all, let alone building and distributing binaries.
- We may define AttributeType as < Number : Text | Binary : Text | String : Text >. Then we may create type constructors which will propagate valid Text values, e.g. let mkNumber = attributeType.Number "N". The problem here is that nothing stops the user from bypassing the type constructor and simply specifying attributeType.Number "rubbish". We cannot write < Number : "N" | ... > as "N" is a term as opposed to a type, and Dhall provides no means of restricting the valid values of a type (I would be very happy to be proven wrong here, but I was not able to find anything in that regard).
- We can define two schemas in Dhall: a rich one and a primitive one. The rich one would operate on semantic types while the primitive one would operate on the underlying format's types. A schema developer would need to provide a function transformSchema : RichSchema → Schema, that function being the only gateway from rich to primitive types. A person using the schema would be supposed to work only with rich types and would call the transformSchema function at the very end of the config (see the sketch after this list).
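To make the third approach concrete, here is a minimal sketch limited to a single attribute definition, reusing the merge-based function shown earlier; the names RichAttributeDefinition and transformAttributeDefinition are illustrative rather than the repository's exact code:

let AttributeType = < Number : {} | Binary : {} | String : {} >
let attributeType = constructors AttributeType
-- rich variant: the type a schema user is supposed to work with
let RichAttributeDefinition = {
  AttributeName: Text,
  AttributeType: AttributeType
}
let attributeTypeToString =
    λ(t: AttributeType)
    → merge
      {
        Number = λ(_: {}) → "N",
        Binary = λ(_: {}) → "B",
        String = λ(_: {}) → "S"
      }
      t : Text
-- the single gateway from rich to primitive types; the result contains
-- only Text, so dhall-to-json can serialize it
let transformAttributeDefinition =
    λ(rich: RichAttributeDefinition)
    → {
      AttributeName = rich.AttributeName,
      AttributeType = attributeTypeToString rich.AttributeType
    }
in
transformAttributeDefinition {
  AttributeName = "Id",
  AttributeType = attributeType.String {=}
}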
I implemented the third approach in a very limited scope (I enriched only AttributeType to be a union type) here. In such a limited scope the change looks quite simple, but I am afraid that in even a slightly more advanced case maintaining transformSchema would become a bottleneck. The important factor in that regard is the depth of the schema structure: for really deep structures, tools for working with them, such as optics in FP or the visitor pattern in OOP, would be very useful. As far as I know, Dhall currently does not provide them.
Still, I believe the last approach is the best of the proposed ones and is worth exploring further.
In case you wonder why not simply call transformAttributeType (and transformHashType, and so on), avoiding any necessity of working with nested structures: while it would work, it would go against the whole idea of strong typing. The essence of the proposed solution is to have strictly one place where we translate rich types into primitive ones.
Other use cases / Possible extensions
What I described in this post is using Dhall to generate just one file describing just one piece of an overall architecture. The vision worth pursuing is something I call "Dhall all the way down": the idea is to make Dhall files the only ones a developer of the application should modify.
So instead of setting up a DynamoDB table with Terraform, providing the table schema with JSON, and configuring your Scala application with HOCON (aka typesafe-config), you would configure everything at the Dhall level, only once. Dhall can generate the proper configuration files in the underlying formats, so not every tool needs to understand Dhall. The biggest advantage of such an approach would be referential integrity checking: without Dhall, when changing the table name in JSON it is easy to forget to update the HOCON used by the Scala application.
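A minimal sketch of the idea, with a hypothetical file layout: keep each shared value in a single Dhall record and derive every downstream configuration from it, so renaming the table is a one-place change.

-- in reality `common` would live in its own file (e.g. ./common.dhall)
-- and be imported by each generated configuration
let common = { tableName = "Songs" }
-- rendered with dhall-to-json into the DynamoDB table spec
let tableSpec = { TableName = common.tableName }
-- rendered into the application's config (pending a dhall-to-hocon;
-- a JSON file also works, since HOCON is a superset of JSON)
let appConfig = { dynamo = { tableName = common.tableName } }
in
{ tableSpec = tableSpec, appConfig = appConfig }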
It is a high-level vision and I am not sure how feasible it is right now. One apparent problem in the example described above is the lack of a dhall-to-hocon.
Conclusion
Dhall provides a simple way of defining configuration files that is less verbose and less error-prone than JSON or YAML. Also, writing a schema and helper functions is quite an easy job and can pay off in increased productivity even for small use cases. I would say that if you need to maintain more than 5-10 configuration JSON files similar to the one described in this post, that is already a scale at which you start profiting from Dhall.
Disclaimer: for people not fluent in statically typed functional languages, the learning curve may be steep.
If you hope to define a schema in a super-typesafe and extremely precise way, expressing things like "field A is either a number lower than 5 or a string of length 15", then Dhall itself will not help you to that extent (at least at the moment).
My general feeling is that Dhall's philosophy is to provide a set of clean, thoroughly specified primitives while not caring that much about ergonomics for specific use cases. That goes along with observations gathered in the Dhall survey. It seems that providing the right abstractions and tools for specific use cases is an exercise left for the future. I agree with that on a philosophical level, because it is much easier to provide opinionated solutions on top of clean primitives than the other way round. From a pragmatic point of view, the question is how fast and how big the community and tooling around Dhall will grow. I do not feel entitled to place a bet on this, as I have just started my adventure with Dhall. Personally, I will start using Dhall for simple cases and experiment with more advanced ones, with a grain of evangelism, which I hopefully provided in this post.
Acknowledgements
Thanks to Gabriel Gonzalez and all contributors for the wonderful work on Dhall. The high quality of all the software involved and the clarity of the documentation are stunning.
Thanks to Krzysztof Janosz who introduced me to Dhall.
Github repository
Repository with code used in this article