Message compression formats in a web application

I have seen quite a few web applications that pass heaps of data between client and server. Pretty much all of them use JSON for that purpose. And whilst that solves the business problem, some of those applications aim to provide a nearly real-time user experience, and that's where the issues arise.

With the rise of web sockets, RTC and HTTP/2 tech, the data transfer speed issues might not be so noticeable, but the more data an app tries to transfer, the more pressing the matter gets. The issue also becomes more apparent on the server side - the server has to process a lot more data in a request thread.

At The Trade Desk I have observed an approach where we transfer a list of strings (later converted to integers to widen the bandwidth) between highly loaded services (think 15 million requests per second with a hard time bound on request processing of 100 milliseconds).

I thought this was a cool concept worth exploring deeper, so I searched the Internet for similar approaches.

One of the readings was an article on a Russian resource about speeding up a web application by switching from HTTP/1 to HTTP/2 and utilizing various data compression formats: for static assets - WEBP, AVIF and progressive JPEG; for general data - various stream compression algorithms (GZIP, Brotli and ZSTD); along with different data serialization formats such as MessagePack.

I decided to expand my search in the direction of serialization formats. In this article I compare a few of them, focusing on how they serialize different kinds of data (lists, nested objects, integers and strings, large and small datasets), their serialization & deserialization performance in the browser, and their compression rates.

The formats covered are:

  • Avro
  • BSON
  • CBOR
  • MessagePack
  • Protobuf (both plain and with compiled serializers)
  • FlatBuffers
  • Thrift (binary and compact protocols)

One of the optimizations we achieved at The Trade Desk was a 7x reduction in data size by switching from string IDs to integer values. Think of it this way: the string ID "something-else", which we used to operate on everywhere, was assigned an integer ID of 4092. That int is 4 bytes long, whereas the string value is 14 bytes long. That alone is more than a 3x reduction in size, but you still have to store the mapping somewhere, which we already did (in our case).
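To illustrate, here is a rough sketch with made-up sample IDs (actual savings depend on the format and the ID lengths):

// made-up sample IDs, just to compare byte counts
const stringIds = ["something-else", "another-thing", "one-more-thing"];
const intIds = [4092, 4093, 4094];

// as UTF-8 text (e.g. within JSON), each string ID costs a byte per character,
// plus the quotes and commas around it
const stringBytes = new TextEncoder().encode(JSON.stringify(stringIds)).length;

// a binary format can pack each integer ID into a fixed 4 bytes
const intBytes = new Uint32Array(intIds).byteLength; // 3 * 4 = 12 bytes

console.log({ stringBytes, intBytes });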

But the overall idea is worth researching too - how much compression each format provides when working with long lists of strings and long lists of ints.

To make the thing more interesting, I came up with a few different data types to mess around with (sketched in TypeScript right after the list):

  • a simple object Pet { name: string, kind: enum Kind { CAT, DOG } }
  • an object with an array of string identifiers StringIdsResource { ids: string[] }
  • an object with an array of integer identifiers IntIdsResource { ids: int[] }
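Here is roughly how these shapes look in TypeScript (just a sketch of the test data, not any specific format's schema):

enum Kind { CAT, DOG }

interface Pet {
    name: string;
    kind: Kind;
}

interface StringIdsResource {
    ids: string[];
}

interface IntIdsResource {
    ids: number[]; // JS has no dedicated int type; the binary formats do
}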

As mentioned before, the metrics I’m going to be focusing on are:

  • encoding & decoding performance in browser environment
  • encoded message length (raw, as UTF-8 string and base-64 encoded UTF-8 string) - see the measuring sketch after this list
  • amount of runtime (bundled code) required to work with the format
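For reference, here is a sketch of how the size metrics can be measured, assuming encoded is whatever Uint8Array a given serializer produces:

const measure = (encoded: Uint8Array) => {
    // byte-by-byte conversion to a binary string, suitable for btoa()
    let binary = '';

    for (const byte of encoded) {
        binary += String.fromCharCode(byte);
    }

    return {
        raw: encoded.byteLength,     // encoded data size (byte array)
        base64: btoa(binary).length, // size after base-64 encoding
    };
};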

To make it quick and easy, here’s the summary (time measured in browser, on a nested object with a list of 1000 children of various data types):

Serializer           Encoding time   Decoding time   Encoded data size (byte array)   Compression   Encoded data size (base-64)   Bundle size
Avro                 12ms            4ms             30003                            77.16%        40004                         111.7kb
BSON                 10ms            11ms            98912                            24.71%        131884                        98.0kb
CBOR                 3ms             4ms             89017                            32.24%        118692                        30.1kb
MessagePack          3ms             3ms             89017                            32.24%        118692                        27.7kb
Protobuf             13ms            3ms             38000                            71.07%        50668                         76.6kb
Protobuf, compiled   6ms             1ms             38000                            71.07%        50668                         30.0kb
Flatbuffers          9ms             3ms             32052                            75.60%        42736                         3.1kb
Thrift (binary)      42ms            6ms             45009                            65.74%        60012                         109.7kb
Thrift (compact)     33ms            11ms            36005                            72.59%        48008                         109.7kb

A few learnings:

  • Thrift is quite slow and not that straightforward to use (after all, it was designed to provide an entire communication layer for an application), but it provides a decent compression rate
  • Protobuf and Avro provide by far the most compact output (because of the schema provided)
  • The Protobuf library (protobufjs), unlike Avro, can't handle enumerations (protobufjs requires raw integer values to be used, whereas Avro supports semantic, string values)
  • Cap'n'Proto seems outdated and its JS plugin has not received any support for a few years now; I have yet to check the TS version
  • FlatBuffers is quite low-level and tricky to use (much more effort than the other tools)

My take on these results is that:

  • Protobuf, when using a compiled serializer/deserializer for specific message(-s):
    • has a comfortable API
    • fast serialization / deserialization in browser
    • decent compression rate
    • relatively small bundle size increase
  • Flatbuffers:
    • great compression rate
    • tiny bundle size impact
    • great performance in browser
    • super-cumbersome API (well, that’s low-level trade-offs for ya)

The source code for the tests can be found in my GitHub repo.

Under the cut you will find a stream of consciousness - my notes whilst implementing the tests for these technologies.

Read more

Jargon-free functional programming. Part 2: functional wrappers

Disclaimer

A big disclaimer before diving too deep: I am going to introduce all of the concepts without relying on any frameworks, libraries, specific programming languages and whatnot - just sticking to hand-written TypeScript. This might seem like a lot of boilerplate and overhead for little benefit, but (with a note of philosophy) the biggest benefit is in the cost of detecting and fixing errors in the code:

  • IDE highlighting an error (and maybe even suggesting a fix) - mere seconds of developer’s time
  • Local build (compiling the code locally, before pushing the code to the repository) - minutes, maybe tens of minutes
  • CI server build, running all the tests possible - around an hour
  • Pre-production environment (manual testing on a dedicated QA / staging environment or even testing on production) - around a few hours, may involve other people
  • Production - measured in days or months and risking the reputation with the customers

Hence if we could detect the errors while writing the code the first time - we could potentially save ourselves a fortune measured in both time and money.

Fancy-less introduction to functional programming

The ideas of functional programming are quite simple. In functional programming, the assumption is that every function only operates on the arguments it has been passed and nothing else. It can not change the "outer world" - it can have values temporarily assigned to internal constants, nothing more - take it as if there were no variables. A function should always return the same result for the same arguments, so functions are always predictable, no matter how many times you call them.
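A tiny illustration of the difference (my own sketch, not part of the original problem):

// impure: the result depends on (and changes) something outside the function
let counter = 0;
const impureNext = () => ++counter;

// pure: same argument - same result, no access to the outer world
const pureNext = (current: number) => current + 1;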

That sounds good and nice, but how does that solve the issues of the above problem, you might ask. For the most part the functions we have extracted already comply with the ideas of functional programming - do they not?

Well, not really. For one, fetching the data from the API is a big question mark in terms of how it fits into the picture of functional programming. Leaving the fetching aside, we have a random call in the middle. We also log some output to the console (and thus change the "outer world" outside the printGame function).

For a lot of things like those, functional programming tries to separate the “pure functional operations” and “impure operations”.

See, in functional programming you operate on these "pure functions", which only use constants and inputs to produce their outputs. So a program would be nothing but a chain of function calls. With "simple" functions, returning values which the next function in the chain can take as an input argument, this is quite easy.

Take the code above as an example:

fetchAPIResponse()
    .then(response => getResponseXML(response))
    .then(doc => extractGames(doc))
    .then(games => getRandomTop10Game(games))
    .then(game => printGame(game));

This could have been written as

fetchAPIResponse()
    .then(response =>
        printGame(
            getRandomTop10Game(
                extractGames(
                    getResponseXML(response)
                )
            )
        )
    );

Since each next function in the chain accepts exactly the type the previous function has returned, they all combine quite well.

JFYI: in other languages and some libraries there are operators to combine functions into one big function:

fetchAPIResponse()
    .then(response =>
        _.flow([ getResponseXML, extractGames, getRandomTop10Game, printGame ])(response)
    );

or

fetchAPIResponse()
    .then(response ->
        getResponseXML.andThen(extractGames).andThen(getRandomTop10Game).andThen(printGame).apply(response)
    );

You will understand why this matters in a minute.

There are also functions which need to interact with the outer world. In that case, functional programming suggests that we wrap them in specific constructions and do not run them immediately. Instead, we weave them into the program, describing what would happen to the result of the wrapped function call once we get one. This makes programs "pure functional" and "safe" again (as in not operating outside of the boundaries of the program itself - everything is contained in the function call chains). Then, once we run the program, we enter the world of "unsafe": we execute all those wrapped functions, and run the rest of the code once we get the results in place.

Sounds a bit hard to comprehend.

Let me rephrase this with a few bits of code.

For the problem above, we are trying to get the response of an API somewhere in the outer world. This is said to be an “unsafe” operation, since the data lives outside of the program, so we need to wrap this operation in a “safe” manner. Essentially, we will create an object which describes an intention to run the fetch call and then write our program around this object to describe how this data will be processed down the line, when we actually run the program (and the fetch request together with it).

Let’s go through the thought process all together: we first need a class to wrap an unsafe function without executing it:

class IO {
    constructor(private intentionFunc: Function) {
    }
}

We then need a way to explicitly execute this function when we are ready to do so:

class IO {
    constructor(private intentionFunc: Function) {
    }

    unsafeRun() {
        this.intentionFunc();
    }
}

The last piece: we want to be able to chain this function with some other function in a safe manner (i.e., again, wrapping the chained function):

class IO {
    constructor(private intentionFunc: Function) {
    }

    andThen(func: Function) {
        return new IO(() => func(this.intentionFunc()));
    }

    unsafeRun() {
        this.intentionFunc();
    }
}

Essentially, we save the function we intend to run in the intentionFunc member of the IO class. When we want to describe what would happen to the result of the data, we return a new IO object with a new function - one which wraps the saved function in a call to the chained function. It is important to understand why we return a new object: so that we do not mutate the original object.
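A quick usage sketch (the values are arbitrary) - nothing happens until unsafeRun is called explicitly:

const program = new IO(() => 21)
    .andThen((x: number) => x * 2)
    .andThen((x: number) => console.log(x));

// nothing has been printed so far - program is just a description

program.unsafeRun(); // only now "42" is printed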

You might notice this new IO thing is very similar to the Promise already available in the JS runtime. The similarities are obvious: we also have chaining with the then method, and the call to then also returns a new object.

But the main issue with Promises is that they start running the code you passed in the constructor immediately. And that is exactly the issue we are trying to resolve.
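This is easy to check with a minimal demo:

const promise = new Promise<number>((resolve) => {
    console.log('started!'); // logged immediately, at construction time

    resolve(42);
});

// "started!" has already been printed, even though no .then() was attached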

Now, let us see how we would use this new IO class in the original problem:

new IO(() => fetch(`https://boardgamegeek.com/xmlapi2/hot?type=boardgame`))

That would not work, however, since the fetch call returns a Promise. So we need to somehow work with Promise instances instead. Let me postpone this discussion for a short while.

Spoiler: we could have tried implementing an unpromisify helper which would make the fetch call synchronous, something like this:

const unpromisify = (promiseFn: Function) => {
    const state = { isReady: false, result: undefined, error: undefined };

    promiseFn()
        .then((result) => {
            state.result = result;
            state.isReady = true;
        })
        .catch((error) => {
            state.error = error;
            state.isReady = true;
        });

    while (!state.isReady);

    if (state.error) {
        throw state.error;
    }

    return state.result;
};

But in the JS world, promises start executing not immediately, but once you leave the context of the currently running function. So that endless while loop, waiting for the promise to get resolved, has zero effect: it would run until the end of days, because unless you exit the function beforehand, the promise won't start executing - JS is single-threaded, and the execution queue / event loop prevents the promise from running immediately.
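A minimal demonstration of that very problem:

let done = false;

Promise.resolve().then(() => { done = true; });

// spins forever: the .then callback is queued as a microtask, which can only
// run after the current call stack unwinds - and it never does
while (!done);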

End of spoiler

Read more

Jargon-free functional programming. Part 1: problem statement

Basics

Let me introduce you to functional programming with as little jargon and as few buzzwords as possible.

Shall we start with a simple problem to solve: get a random board game from the top-10 games on the BoardGameGeek website and print out its rank and title.

BoardGameGeek website has an API: a request GET https://boardgamegeek.com/xmlapi2/hot?type=boardgame will return an XML document like this:

<?xml version="1.0" encoding="utf-8"?>

<items termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
    <item id="361545" rank="1">
        <thumbnail value="https://cf.geekdo-images.com/lD8s_SQPObXTPevz-aAElA__thumb/img/YZG-deJK2vFm4NMOaniqZwwlaAE=/fit-in/200x150/filters:strip_icc()/pic6892102.png" />
        <name value="Twilight Inscription" />
        <yearpublished value="2022" />
    </item>

    <item id="276182" rank="2">
        <thumbnail value="https://cf.geekdo-images.com/4q_5Ox7oYtK3Ma73iRtfAg__thumb/img/TU4UOoot_zqqUwCEmE_wFnLRRCY=/fit-in/200x150/filters:strip_icc()/pic4650725.jpg" />
        <name value="Dead Reckoning" />
        <yearpublished value="2022" />
    </item>
</items>

In JavaScript a solution to this problem might look something like this:

fetch(`https://boardgamegeek.com/xmlapi2/hot?type=boardgame`)
    .then(response => response.text())
    .then(response => new DOMParser().parseFromString(response, "text/xml"))
    .then(doc => {
        const items = Array.from(doc.querySelectorAll('items item'));

        return items.map(item => {
            const rank = item.getAttribute('rank');
            const name = item.querySelector('name').getAttribute('value');

            return { rank, name };
        });
    })
    .then(games => {
        const randomRank = Math.floor((Math.random() * 100) % 10);

        return games[randomRank];
    })
    .then(randomTop10Game => {
        const log = `#${randomTop10Game.rank}: ${randomTop10Game.name}`;

        console.log(log);
    });

Quick and simple, quite easy to understand - seems good enough.

How about we write some tests for it? Oh, now it becomes a little bit clunky - we need to mock the fetch call (Fetch API) and Math.random. Oh, and the DOMParser with its querySelector and querySelectorAll calls too. Probably even the console.log method as well. Okay, we will probably need to modify the original code to make testing easier (if even possible). How about we split the program into separate blocks of code?

const fetchAPIResponse = () =>
    fetch(`https://boardgamegeek.com/xmlapi2/hot?type=boardgame`)
        .then(response => response.text());

const getResponseXML = (response) =>
    new DOMParser().parseFromString(response, "text/xml");

const extractGames = (doc) => {
    const items = Array.from(doc.querySelectorAll('items item'));

    return items.map(item => {
        const rank = item.getAttribute('rank');
        const name = item.querySelector('name').getAttribute('value');

        return { rank, name };
    });
};

const getRandomTop10Game = (games) => {
    const randomRank = Math.floor((Math.random() * 100) % 10);

    return games[randomRank];
};

const printGame = (game) => {
    const log = `#${game.rank}: ${game.name}`;

    console.log(log);
};

fetchAPIResponse()
    .then(response => getResponseXML(response))
    .then(doc => extractGames(doc))
    .then(games => getRandomTop10Game(games))
    .then(game => printGame(game));

Okay, now we can test some bits of the program without too much of a hassle - we could test that every call of getRandomTop10Game returns a value from within the given list of values (though not necessarily a different one every time). We could test the extractGames function on a mock XML document and verify it extracts all the <item> nodes and their <name> children. Testing the fetchAPIResponse, getResponseXML and printGame functions, though, would be a bit tricky without either mocking fetch, console.log and DOMParser or actually calling those functions.
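For instance, a test for extractGames could now look something like this (a sketch; the XML fixture and the game name are made up):

const doc = new DOMParser().parseFromString(
    `<items>
        <item id="1" rank="1"><name value="Some Game" /></item>
    </items>`,
    "text/xml"
);

const games = extractGames(doc);

console.assert(games.length === 1);
console.assert(games[0].rank === "1");
console.assert(games[0].name === "Some Game");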

Read more

Jargon-free functional programming. TL;DR

This is a boiled-down version of a much longer read, Jargon-free functional programming, giving a brief and visual introduction to the concepts of real-world functional programming. This blog is aimed at people who already know something about programming and want to learn what the heck functional programming is, how it is different from "normal" programming and what it looks like in the real world.

In a non-functional world, the code we write can depend on anything - a function, aside from its arguments, is free to use environment variables, global variables, outer scope, dependency injection - pretty much anything.

Moreover, it can modify all of the above (including outer scope, global and environment variables, etc.).

In a functional programming world we restrict a function to only rely on its arguments (or nothing at all).

But what about things like databases, user input, network communication, exceptions?

A typical application involving all of the above could be described algorithmically as an endless loop, waiting for some input to appear before doing something (waiting for a database query to complete, waiting for a user to provide input, waiting for a network request to complete).

And every step of the program is described as a sequence of actions (potentially involving some rather trivial decision making). This approach is known as “imperative programming” and is very commonly used.

In reality, however, every step of this algorithm can go wrong in many different ways - each step is free to modify some global state (think OS and filesystem), it can fail terribly with an exception. Moreover, anything from the outside world (think OS or dependency injection) can break into the program and change any value or state of the program.

In a functional programming world, functions (and programs) are not described as sequences of commands - instead, they are more like recipes for calculations that will happen once all the requirements are provided.

The way to handle all the nifty things such as exceptions, networking, databases, etc. is to wrap a function which works with a result of the “unsafe” operation in a safe container. This container won’t execute the function - just hold it for a while. However, this container would have two special properties: an ability to run the underlying function when it is deemed safe and an ability to be connected to other containers of the same type.

Each container will perform its very specific role - handling exceptions to return a value instead of breaking, running some input-output operations (incl. networking and databases), etc. We assume containers already do these operations in a safe manner - meaning they do not change anything in the program outside of themselves (think global variables, outer scope, etc.) and they always return a value. They only execute the function they wrap once requested explicitly.

By making it so that safe containers of different types can not be chained, we eliminate the chance of unexpected program failure. And we make sure at any point in time we can say what a program is doing exactly by just looking at its types.

By connecting such containers in a chain, we make programs.

But these chains do not do anything until they are explicitly executed.

A program is a chain of functions, wrapped in "safe" constructs, which is executed "at the end / edge of the world" - meaning the program is thought to be executed only once. If all the blocks of this chain of containers succeed - the entire program succeeds.

If any of the blocks fails - the program does not exit or terminate; the failed block simply returns a different value.

All the logic is hidden in those “safe” constructs - it is isolated from the rest of the world.

Those containers are only allowed access to their direct arguments. They are guaranteed to never break and to always return a value (which might be wrapped in another "safe" construct).

A program made of these safe recipes on how to calculate the result is just another recipe itself - essentially a series of recipes.

This safe set of recipes is then thrown together with a bunch of inputs into a grinder called “real world”, where nothing is safe and everything can happen (theoretically).

In the grinder, the dish is cooked from the inputs, following the recipes thrown into the grinder. The result of this cooking might be another program itself, which can then be recycled by being thrown back into the grinder - that would happen if a program enters the (infinite) loop, waiting for some inputs - it essentially becomes a new program, which also needs to be executed when all its requirements are met.

In order to build one of those containers, one starts by creating a simple class.

The class must hold a function without running it.

There should be a way to link (chain) this container with some other function, creating a new safe container.

And finally there should be a way to execute the function wrapped by this safe container.

The details of each container's implementation are what make them different. For a few examples, the container which makes an arbitrary function safe (in this case we assume it does some input-output stuff) could look like this:

class IO <A> {
    constructor(private f: () => A) {
    }

    andThen<B>(g: (_: A) => B) {
        return new IO(() => g(this.f()));
    }

    unsafeRun() {
        return this.f();
    }
}

A container which wraps a function returning a Promise might look similar (except all the dancing around Promise API):

class PromiseIO <A> {
    constructor(private readonly f: () => Promise<A>) {}

    andThen<B>(g: (_: A) => B) {
        return new PromiseIO<B>(() => this.unsafeRun().then(g));
    }

    unsafeRun() {
        return this.f();
    }
}

You can see the pattern - these classes all have a very similar interface. Hence you can extract it:

interface Container <A> {
    andThen<B>(g: (_: A) => B): Container<B>;
}

class IO <A> implements Container <A> { ... }

class PromiseIO <A> implements Container <A> { ... }

Then you can create a container which wraps a function which might throw an exception (together with its error handler):

class Try <A> implements Container <A> {
    constructor(private readonly f: () => Container<A>, private readonly errorHandler: (_: unknown) => Container<A>) {}

    andThen<B>(g: (_: A) => B) {
        return new Try<B>(
            () => this.f().andThen(g),
            (e) => this.errorHandler(e).andThen(g)
        );
    }

    unsafeRun() {
        try {
            return this.f();
        } catch (e) {
            return this.errorHandler(e);
        }
    }
}
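A quick usage sketch of Try (the malformed JSON string is just an illustration):

const safeParse = (raw: string) =>
    new Try(
        () => {
            const value = JSON.parse(raw); // may throw right here

            return new IO(() => console.log('parsed:', value));
        },
        (e) => new IO(() => console.error('could not parse:', e))
    );

// the exception is caught and turned into the fallback IO, which we then run
(safeParse('{ oops').unsafeRun() as IO<void>).unsafeRun();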

Then you can write programs using these containers:

const fetchSomeResponse = () => new PromiseIO(() => fetch('/').then(r => r.text()));

const processResponse = (response: string) =>
    new Try(
        () => new IO(() => console.log('OK', response)),
        (e) => new IO(() => console.error('ERR', e))
    );

const program = fetchSomeResponse()
    .andThen(processResponse)
    .andThen(t => t.unsafeRun())
    .andThen(io => (io as IO<void>).unsafeRun())
    .unsafeRun();

The next article contains a few examples and explains the above in bloody detail, using TypeScript, a (semi-)real-world problem and a step-by-step approach to arriving at the above concepts. It also introduces a few more of those containers, so you can actually go out and understand some (if not most) of the real-world applications made with the functional paradigm. Or even build your own!

.gitignore is not ignoring

This is going to be a very short blog. I have been struggling with this issue for a few days now - having a seemingly valid .gitignore file which does not make Git ignore any of the files.

The .gitignore file contents:

node_modules

**/*.bundle.js
*.log

*compiled-proto*
*compiled-bundle*

Running git status:

$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        node_modules/
        test1/flatbuffers-compiled-proto/
        test5/index.bundle.js
        test5/test5-avro.bundle.js
        test5/test5-bson.bundle.js
        test5/test5-cbor.bundle.js
        test5/test5-flatbuffers-compiled-proto/
        test5/test5-flatbuffers-compiled.bundle.js
        test5/test5-messagepack.bundle.js
        test5/test5-protobuf-compiled-proto.js
        test5/test5-protobuf-compiled.bundle.js
        test5/test5-protobuf.bundle.js

nothing added to commit but untracked files present (use "git add" to track)

Weird, isn’t it?

There is a command which checks whether a given path (whatever you pass as a parameter to the command, so technically just a string) would be ignored by any of the rules in the .gitignore file:

git check-ignore --verbose <path>

Let’s run it on my repo:

$ git check-ignore --verbose node_modules

No output means the path (in this case - node_modules) will not be ignored. Which should not be the case - there's a rule on the very first line of the .gitignore file, right?!

The issue turns out to be somewhat hidden - the file was saved in the UTF-16 LE encoding, since I had used PowerShell to initialize the file with echo 'node_modules' >> .gitignore.
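JFYI: in Windows PowerShell, output redirection writes UTF-16 LE by default - that is how the file ended up in this encoding in the first place. Something like Set-Content .gitignore (Get-Content .gitignore) -Encoding UTF8 would rewrite it as UTF-8 right from the shell.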

It seems the expected encoding for a .gitignore file is UTF-8. Let's use VSCode itself to save it in UTF-8 instead and try again:

$ git check-ignore --verbose node_modules
.gitignore:1:node_modules       node_modules

That output tells us that the rule on line 1, namely node_modules (no asterisks or slashes), ignores the path node_modules. Issue solved!

Distributed Erlang example

As promised in my previous blog about Erlang, I continue on a journey to more practical Erlang examples.

This might sound super ambitious, but let us build a distributed database.

For the sake of simplicity, let's make it a reduced version of Redis - a key-value distributed in-memory storage. Essentially, an over-engineered hashmap.

To draw some boundaries around this, let’s focus on these key features:

  • simple CRUD actions - get, set and delete a value; this should operate on REST-like API:
    • get(<key>) - get key value
    • set(<key>) - use request body for value and associate the value with the key
    • delete(<key>) - remove the key from the storage
  • running on multiple nodes (machines)
  • synchronizing the data between the nodes; this should support:
    • ability to rotate the nodes in the cluster (switch off nodes and add new ones randomly) without the loss of data
    • distributing the operations across the nodes to keep the data in sync

Sounds unbelievable, but with Erlang this is actually pretty simple. Erlang is a great choice for an application like this, since we do not have to worry about setting up the cluster or dealing with communication protocols (as in how to pass data over the network between the nodes).

We will need a “main” process, which will take the requests from the users and pass them to the nodes. Each node will store its own copy of the data in memory. On startup, each new node will receive the copy of the data from the first available node. If no nodes are available - that means the cluster is fresh and we can safely assume the data is empty (or the cluster has died altogether and that state is unrecoverable).

Step 1: draw a circle

Start simple: let us have a simple application which starts a background process and responds to some commands. It could be as simple as a ping-pong application - the background process waits for a ping message and responds with pong upon receiving one.

-module(db_server).

-export([start/0, stop/0, ping/0]).

start() ->
    Pid = spawn_link(fun () -> loop() end),

    register(server, Pid),

    ok.

stop() ->
    server ! stop,

    ok.

ping() ->
    server ! { ping, self() },

    receive
        pong -> pong
    end.

loop() ->
    receive
        { ping, Pid } ->
            Pid ! pong,
            loop();

        stop -> ok
    end.

This is a relatively simple application, but it has a few quirks:

  • spawn_link/1 (spawn_link(fun () -> loop() end)) - this function spawns a new background process and links it to the current one, so that once the current process stops, the linked process also stops
  • register/2 (register(server, Pid)) - registers a global (in the current module scope) symbol (server in this case) and associates it with the process PID passed as the second parameter (Pid in this case, the PID of the spawned process running the loop function)
  • the ping/0 function sends a message ({ ping, self() } tuple) containing the atom ping and current process’ PID to the process registered as server; it then starts waiting for any process to send a pong message to the current process, effectively blocking the current process until that message is received
  • the loop function responds to two messages:
    • { ping, Pid } - by sending pong message to the process with PID set in the Pid constant and running itself recursively (turning the entire loop/0 function into a while loop)
    • stop - by simply returning the ok atom and exiting (thus terminating the loop)

To run this application, start the Erlang shell with erl and type in a few commands:

1> c(db_server).
{ok,db_server}
2> db_server:start().
ok
3> db_server:ping().
pong

The benefit of this architecture is that once we have an Erlang shell open and run the start/0 function, it will start the background process and return immediately. But the background process will keep running, and we will have the other functions available to interact with that process, without losing access to the Erlang shell.

This protocol also highlights the concurrent aspect of the application - we send the current process’ PID to the callee and start listening for the messages from anywhere. Once we receive the message - we pass it to the other process. So the entire application is predominantly non-blocking and asynchronous in nature.

Step 2: draw a second circle

The application is surely awesome, but let's make it actually useful to some extent. We are designing a key-value storage, so why not have one? Instead of just reacting to the ping message, we can have a map and a simple API to set, get and delete a value in that map, while keeping the map on the child process.

-module(db_server).

-export([ start/0, stop/0 ]).
-export([ get/1, set/2, delete/1 ]).

start() ->
    Pid = spawn_link(fun () -> loop(maps:new()) end),

    register(server, Pid),

    ok.

stop() ->
    server ! stop,

    ok.

set(Key, Value) ->
    server ! { set, Key, Value },

    ok.

get(Key) ->
    server ! { get, Key, self() },

    receive
        { get, Value } -> Value
    end.

delete(Key) ->
    server ! { delete, Key },

    ok.

loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        stop -> ok
    end.

As you can see, there’s nothing special going on here - all is quite simple. We start the background process with some initial data (maps:new()) and in the loop/1 function whenever we enter recursion, we pass the new version of the data to the recursive call. The only function which actually is blocking its execution until it receives a response from a callee is get/1 - we wait for the background process to send the value to the current process.

The interaction with this application could look like this:

1> c(db_server).
{ok,db_server}
2> db_server:start().
ok
3> db_server:set(moo, -3.14).
ok
4> db_server:get(moo).
-3.14
5> db_server:delete(moo).
ok
6> db_server:get(moo).
none

Step 3: draw the rest of the owl

Now for the juicy bits: let us have a worker node, which will actually be a replica of the database. It is as simple as extracting the logic from the loop/1 function into a separate module and having a new process started on a new node once we decide to add it to the cluster:

-module(worker).

-export([ loop/1 ]).

loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        stop -> ok
    end.

The db_server module needs a few changes:

-module(db_server).

-export([ start/0, stop/0 ]).
-export([ get/1, set/2, delete/1 ]).
-export([ add_node/1 ]).

start() ->
    Pid = spawn_link(fun () -> loop([]) end),

    register(server, Pid),

    ok.

stop() ->
    server ! stop,

    ok.

set(Key, Value) ->
    server ! { set, Key, Value },

    ok.

get(Key) ->
    server ! { get, Key, self() },

    receive
        { get, Value } -> Value
    end.

delete(Key) ->
    server ! { delete, Key },

    ok.

add_node(NodeAddr) ->
    server ! { add_node, NodeAddr },

    ok.

loop(Nodes) ->
    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, Nodes),

            loop(Nodes);

        { get, Key, Pid } ->
            [{ _, FirstNodePid }|_] = Nodes,

            FirstNodePid ! { get, Key, self() },

            receive
                { get, Value } -> Pid ! { get, Value }
            end,

            loop(Nodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, Nodes),

            loop(Nodes);

        { add_node, NodeAddr } ->
            Data = maps:new(),

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], Nodes),

            loop(NewNodes);

        stop -> ok
    end.

The public API of the db_server module does not really change - there’s a new function add_node/1 but that’s about it - these exported (aka “public”) functions only send messages to the server process.

The loop/1 function, on the other hand, has most of the changes - instead of sending messages to a single process, it now distributes them across the processes stored in the Nodes constant. This constant is a list of tuples { NodeAddr, NodePid }, where NodeAddr (the first element of the tuple) is just to keep track of the nodes added to the list (predominantly for debug purposes) and NodePid (the second element of the tuple) is what is actually used to communicate with the worker process.

All the control messages (get, set and delete) are passed to all nodes from the Nodes list. The exception is the { get, Key, Pid } message - it is only sent to the first node in the list. And since all the worker nodes are (supposed to be) just replicas of each other, that is sufficient.

So far this should work. Start the db_server node with

$ erl -sname supervisor

followed by the call to

(supervisor@MACHINENAME)> c(db_server).
{ok,db_server}
(supervisor@MACHINENAME)> db_server:start().
ok

Then, add a few nodes to the cluster. For that, each node must be running and must use the same naming convention as the supervisor (a short name, set with the -sname param to erl, or a long name, set with the -name param to erl) - otherwise they won't see each other.

$ erl -sname subnode1

followed by

(subnode1@MACHINENAME)> c(worker).
{ok,worker}

Note how you don’t need to do anything on the worker node except compile the module. The supervisor node will start processes on that node itself.

Lastly, on the supervisor node, call

(supervisor@MACHINENAME)> db_server:add_node(subnode1@MACHINENAME).
ok

The node name is printed out to the Erlang shell once you start erl with the -sname or -name param provided. But you can also add io:format("Started node at ~s~n", [ node() ]) as the first statement in the loop/1 function in the worker module - it will print out the node name to the Erlang shell once the function is started. You can then use it as a raw atom parameter to db_server:add_node/1.

Now, the last piece of the system is keeping all the worker nodes in sync by copying the data from the first alive node to a new node once it is added to the supervisor.

To do so, we will need to utilize the net_adm module, which conveniently has the net_adm:ping/1 function, taking a node address and returning the pong atom if the node is alive and visible from the current node, or pang otherwise. This way we can replace all the occurrences of the Nodes list in the db_server:loop/1 function with the list of alive nodes:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            [ { _, FirstNodePid } | _ ] = AliveNodes,

            FirstNodePid ! { get, Key, self() },

            receive
                { get, Value } -> Pid ! { get, Value }
            end,

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            Data = maps:new(),

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

Now to get the data from the node, we will need an extra message to be supported by the worker module:

% worker
loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        { get_all, Pid } ->
            Pid ! { get_all, Data },

            loop(Data);

        stop -> ok
    end.

This way, when we want to add a new node to the cluster, we first get the data from the first available node and then send it to the newly added node in the db_server module. If there are no nodes alive, we use maps:new() as the new data:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            [ { _, FirstNodePid } | _ ] = AliveNodes,

            FirstNodePid ! { get, Key, self() },

            Value = receive
                { get, ReceivedValue } -> ReceivedValue
            end,

            Pid ! { get, Value },

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            Data = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

You may practice putting nodes on and off the running cluster, getting and setting the values at the same time. But occasionally you might stumble upon the db_server:get/1 function hanging. This is what happens when there are no alive nodes in the cluster. In order to fix this, we can simply add a check for that instead of blindly taking the first node from the AliveNodes list:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            Value = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get, Key, self() },

                    receive
                        { get, ReceivedValue } -> ReceivedValue
                    end;

                _ -> no_nodes
            end,

            Pid ! { get, Value },

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            Data = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

This will still not work, however. The issue is that we only assign the value to AliveNodes once we enter the db_server:loop/1 function, and that only happens after it has processed a message. But the moment the value is actually needed is right before processing a message. See the difference? There might be a delay between receiving a message and figuring out the list of actually alive nodes, because the process sits blocked waiting for new messages.

One way to fix this is to move the AliveNodes assignment to each branch of the receive block:

% supervisor
loop(Nodes) ->
    receive
        { set, Key, Value } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            Value = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get, Key, self() },

                    receive
                        { get, ReceivedValue } -> ReceivedValue
                    end;

                _ -> no_nodes
            end,

            Pid ! { get, Value },

            loop(AliveNodes);

        { delete, Key } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            Data = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

Alternatively, we can utilize the after keyword which effectively implements a timeout mechanism:

% supervisor
loop(Nodes) ->
    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, Nodes),

            loop(Nodes);

        { get, Key, Pid } ->
            Value = case Nodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get, Key, self() },

                    receive
                        { get, ReceivedValue } -> ReceivedValue;

                        _ -> none
                    end;

                _ -> no_nodes
            end,

            Pid ! { get, Value },

            loop(Nodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, Nodes),

            loop(Nodes);

        { add_node, NodeAddr } ->
            Data = case Nodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], Nodes),

            loop(NewNodes);

        stop -> ok
    after
        0 ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            loop(AliveNodes)
    end.

This makes the receive expression of the db_server:loop/1 function check if there are any messages in the message inbox (aka message queue) of the current process. If there are no messages after 0 (as in this case) units of time since the check started, the code in the corresponding after clause is executed. In our scenario, we figure out the new list of alive nodes and restart the loop function. This way, the time the process sits blocked waiting for messages is kept to a bare minimum.

On the other hand, if you look carefully at all those receive statements within the loop function, you might just notice… that you don't actually need them - the worker nodes can just communicate with each other! So instead of sending the supervisor's self() PID to each worker node (e.g. for the get or get_all message), you can just send the requester's PID (which is a part of the message) to the worker node. This makes the code highly asynchronous:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            case AliveNodes of
                [ { _, FirstNodePid } | _ ] -> FirstNodePid ! { get, Key, Pid };

                _ -> Pid ! { get, no_nodes }
            end,

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            NodePid = spawn(NodeAddr, worker, loop, [ maps:new() ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            case AliveNodes of
                [ { _, FirstNodePid } | _ ] -> FirstNodePid ! { get_all, NodePid };

                _ -> NodePid ! { got_all, maps:new() }
            end,

            loop(NewNodes);

        stop -> ok
    end.

This needs a tiny adjustment to the worker module (to support the { got_all, Data } message and to reply to get_all with got_all), but the benefits are more than worth it!

% worker
loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        { get_all, Pid } ->
            Pid ! { got_all, Data },

            loop(Data);

        { got_all, NewData } ->
            loop(NewData);

        stop -> ok
    end.

Wrap-up

The cherry on top of this cake would be to create an actual REST API (or any non-Erlang API for that matter) so that this database could be used from the outside world without starting an Erlang shell. But I’d rather save it for a later blog.

For now you can actually plug and unplug nodes from the cluster, and while there is at least one node alive, the data integrity is guaranteed.

The highly asynchronous architecture of the solution allows for non-blocking calls all over the place.

For me, this is a great example of how to actually use Erlang!

The code, built up slightly further, is hosted on GitHub.

What is a Monad?

What is a monad?

There is a talk by Gilad Bracha, “Deconstructing functional programming”. This is a very good introduction to functional programming, monads included.

As per that talk, think about a monad as a class like this:

abstract class FlatMappable <A> {
  private final A a;

  FlatMappable(A a) {
    this.a = a;
  }

  <B> FlatMappable<B> flatMap(Function<A, FlatMappable<B>> fn) {
    return fn.apply(a);
  }
}

Just rename FlatMappable to Monad and there you go.

Now, if you want more Haskell-like naming (gonna implement it in C++ for a more explicit syntax):

#include <functional>

template <typename A>
class Monad {
public:
  // `return` is a reserved word in C++, hence the trailing underscore
  static Monad<A> return_(A a) {
    return Monad(a);
  }

  template <typename B>
  Monad<B> operator>>=(std::function<Monad<B>(A)> fn) {
    return fn(m_a);
  }

private:
  Monad(A a) : m_a(a) {}

  const A m_a;
};

Essentially, renaming the constructor to return (return_ in the code above, since return is a reserved word in C++) and flatMap to operator>>=.

Also, in terms of Mappable vs FlatMappable:

class Mappable <A> {
  protected final A a;

  Mappable(A a) { this.a = a; }

  <B> Mappable<B> map(Function<A, B> fn) {
    return new Mappable<>(fn.apply(a));
  }
}

class Flattable <A> extends Mappable <A> {
  Flattable(A a) { super(a); }

  Flattable<A> flatten() {
    if (a instanceof Flattable) {
      return ((Flattable<A>) a).flatten();
    }

    return this;
  }
}

class FlatMappable <A> extends Flattable <A> {
  FlatMappable(A a) { super(a); }

  <B> FlatMappable<B> flatMap(Function<A, FlatMappable<B>> fn) {
    return (FlatMappable<B>) map(fn).flatten();
  }
}

Why do we need monads?

In order to preserve purity of programs in functional programming languages, you can not have side-effects in your program.

But if you need interaction with outside systems (IO, database, system clock, random number generator, etc.), you will need to somehow make these operations without side-effects.

So you write your entire program as a description of a workflow (e.g. how data will be processed). Basically your program becomes a chain of functions calling other functions. No waiting for events, nothing like that.

Then, when you run your program, “at the execution edge” (aka “event horizon”), just before your program finishes, the runtime will do all the “unsafe” (or rather “impure”) operations (interaction with IO, system clock, etc.) and will feed all the input to your functions, take the output and modify “the state of the world outside”.

Where do monads fit in this? Monads are basically just wrappers around other values, telling the runtime what to do. So your code stays pure, and all the impure side-effects are eventually handled by the runtime.

For example, reading from STDIN would be something like IO<String> - a monad, whose value will be provided eventually. So your program will be defined as a function something like

def main(IO[String] input): IO[String] =
  input.map { name -> printLn("Hello, ${name}") }

As you can see, main is a function which simply returns another IO[String] by mapping a monad parameter.

When you run it, the program will effectively stop until the IO[String] input parameter is filled with a value. When the value is provided, the runtime will execute the rest of the code (the map { ... } part).

irrPaint3d

Recently I was reviving some of my old projects. And to my surprise, there were some people actually interested in those! That was a good enough reason for me to revise the old pet projects of mine and exercise my skills (mostly in C++ though).

One really interesting project I had back in the day was irrPaint3d. It all started as an idea for my B.Sc. thesis, and the ultimate goal was to enable users to paint 3D objects and immediately see the results in 3D, in real time.

This is not a unique feature nowadays, but back in 2013, to my knowledge, the only ways to texture 3D models were to either unwrap them into UV maps and then paint the textures in a graphics editor (such as Gimp or Paint.NET), or to use proprietary software, 3D Coat.

Unwrapping 3D model texture in Blender in 2013

And the mere idea of implementing such a tool myself was quite exciting. Darn, if I had managed to implement it back in 2013, my thesis would have shone among the best theses in my uni!

Long story short, I failed with that. And now, after spending just a few days on this, I am happy to say I have achieved something with yet another glorious pet project revival:

Revised 3D model painting

And it actually allows painting the model in 3D, displaying the results in real time!

A bit of history on how I got there, and the maths behind the solution, under the cut.

Read more

Irrlicht application template

Often when I start revising my old applications built with the Irrlicht engine, I do a few things very similarly. Especially when the application contains a GUI and uses Irrlicht tools for it.

This is mostly due to the extremely outdated nature of Irrlicht itself. There are all sorts of things in it, like the lack of 4K monitor support, the use of a very obscure font, sample applications being simple yet messy single CPP files, etc.

The common things I do include:

  • using new C++ standard features such as:
    • shared pointers
    • automatic type inference
    • standard containers
    • C++-style string operations
  • setting a new font for the GUI, adapted to higher screen resolutions
  • using CMake to build the project and vcpkg to manage dependencies
  • utilizing object-oriented approach
  • moving the classes to separate header and CPP files

In this blog I describe in bloody detail what I do and why.

Read more

Strongly-typed front-end

Serial experiments Lain

In this little research project I describe my journey through a series of experiments trying out a number of technologies and clashing them against each other.