Distributed Erlang example

As promised in my previous blog about Erlang, I am continuing the journey towards more practical Erlang examples.

This might sound super ambitious, but let us build a distributed database.

For the sake of simplicity, let's make it a reduced version of Redis - a distributed in-memory key-value storage. Essentially, an over-engineered hashmap.

To draw some boundaries around this, let’s focus on these key features:

  • simple CRUD actions - get, set and delete a value; this should operate over a REST-like API:
    • get(<key>) - get key value
    • set(<key>) - use request body for value and associate the value with the key
    • delete(<key>) - remove the key from the storage
  • running on multiple nodes (machines)
  • synchronizing the data between the nodes; this should support:
    • ability to rotate the nodes in the cluster (switch off nodes and add new ones randomly) without the loss of data
    • distributing the operations across the nodes to keep the data in sync

Sounds unbelievable, but with Erlang this is actually pretty simple. Erlang is a great choice for an application like this, since we do not have to worry about setting up the cluster or dealing with communication protocols (as in how to pass data over the network between the nodes).
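To get a feel for how little ceremony this takes, here is a rough sketch (assuming two Erlang shells started on the same machine with the hypothetical short names a and b): run erl -sname a in one terminal, erl -sname b in another, and then, from the first shell:

(a@host)> net_adm:ping(b@host).
pong
(a@host)> nodes().
[b@host]

One ping and the nodes are connected - no sockets, no serialization, no service discovery code.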

We will need a “main” process, which will take the requests from the users and pass them to the nodes. Each node will store its own copy of the data in memory. On startup, each new node will receive the copy of the data from the first available node. If no nodes are available - that means the cluster is fresh and we can safely assume the data is empty (or the cluster has died altogether and that state is unrecoverable).
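Schematically, the message flow I am aiming for looks roughly like this (just a sketch - the exact message shapes are defined step by step below):

%% client     --> supervisor  : set / get / delete requests
%% supervisor --> all workers : set and delete, fanned out to every replica
%% supervisor --> one worker  : get, served by the first available replica
%% worker     --> client      : the value, as a reply to get (relayed by the supervisor at first)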

Step 1: draw a circle

Start small: let us have a simple application which starts a background process and responds to some commands. It could be as basic as a ping-pong application - the background process waits for a ping message and responds with pong upon receiving one.

-module(db_server).

-export([start/0, stop/0, ping/0]).

start() ->
    Pid = spawn_link(fun () -> loop() end),

    register(server, Pid),

    ok.

stop() ->
    server ! stop,

    ok.

ping() ->
    server ! { ping, self() },

    receive
        pong -> pong
    end.

loop() ->
    receive
        { ping, Pid } ->
            Pid ! pong,
            loop();

        stop -> ok
    end.

This is a relatively simple application, but it has a few quirks:

  • spawn_link/1 (spawn_link(fun () -> loop() end)) - this function spawns a new background process running loop/0 and links it to the current one, so that if either of the two processes crashes, the other one is taken down with it
  • register/2 (register(server, Pid)) - registers a name (the atom server in this case) for the process whose PID is passed as the second parameter (Pid in this case, the PID of the spawned process running the loop function); the name is visible to every process on the current node, so any of them can send messages to it with server ! Message
  • the ping/0 function sends a message ({ ping, self() } tuple) containing the atom ping and current process’ PID to the process registered as server; it then starts waiting for any process to send a pong message to the current process, effectively blocking the current process until that message is received
  • the loop function responds to two messages:
    • { ping, Pid } - by sending a pong message to the process whose PID is bound to the Pid variable and then calling itself recursively (turning the entire loop/0 function into a while loop)
    • stop - by simply returning the ok atom and exiting (thus terminating the loop)

To run this application, start the Erlang shell with erl and type in a few commands:

1> c(db_server).
{ok,db_server}
2> db_server:start().
ok
3> db_server:ping().
pong

The benefit of this architecture is that once we have an Erlang shell open and run the start/0 function, it starts the background process and returns immediately. The background process keeps running, and the other functions remain available to interact with it, without losing access to the Erlang shell.

This protocol also highlights the concurrent aspect of the application - we send the current process' PID to the callee and then listen for messages from anywhere; the callee, in turn, replies to whatever PID it received. So the entire application is predominantly non-blocking and asynchronous in nature.

Step 2: draw a second circle

The application is surely awesome, but let's make it actually useful to some extent. We are designing a key-value storage, so why not have one? Instead of just reacting to the ping message, we can keep a map on the child process and expose a simple API to set, get and delete values in that map.

-module(db_server).

-export([ start/0, stop/0 ]).
-export([ get/1, set/2, delete/1 ]).

start() ->
    Pid = spawn_link(fun () -> loop(maps:new()) end),

    register(server, Pid),

    ok.

stop() ->
    server ! stop,

    ok.

set(Key, Value) ->
    server ! { set, Key, Value },

    ok.

get(Key) ->
    server ! { get, Key, self() },

    receive
        { get, Value } -> Value
    end.

delete(Key) ->
    server ! { delete, Key },

    ok.

loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        stop -> ok
    end.

As you can see, there's nothing special going on here - it is all quite simple. We start the background process with some initial data (maps:new()), and whenever loop/1 recurses, it passes the new version of the data to the recursive call. The only function that blocks until it receives a response from the callee is get/1 - we wait for the background process to send the value back to the current process.

The interaction with this application could look like this:

1> c(db_server).
{ok,db_server}
2> db_server:start().
ok
3> db_server:set(moo, -3.14).
ok
4> db_server:get(moo).
-3.14
5> db_server:delete(moo).
ok
6> db_server:get(moo).
none

Step 3: draw the rest of the owl

Now for the juicy bits: let us have a worker node, which will effectively be a replica of the database. It is as simple as extracting the logic from the loop/1 function into a separate module and starting a new process on a new node once we decide to add it to the cluster:

-module(worker).

-export([ loop/1 ]).

loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        stop -> ok
    end.

The db_server module needs a few changes:

-module(db_server).

-export([ start/0, stop/0 ]).
-export([ get/1, set/2, delete/1 ]).
-export([ add_node/1 ]).

start() ->
    Pid = spawn_link(fun () -> loop([]) end),

    register(server, Pid),

    ok.

stop() ->
    server ! stop,

    ok.

set(Key, Value) ->
    server ! { set, Key, Value },

    ok.

get(Key) ->
    server ! { get, Key, self() },

    receive
        { get, Value } -> Value
    end.

delete(Key) ->
    server ! { delete, Key },

    ok.

add_node(NodeAddr) ->
    server ! { add_node, NodeAddr },

    ok.

loop(Nodes) ->
    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, Nodes),

            loop(Nodes);

        { get, Key, Pid } ->
            [{ _, FirstNodePid }|_] = Nodes,

            FirstNodePid ! { get, Key, self() },

            receive
                { get, Value } -> Pid ! { get, Value }
            end,

            loop(Nodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, Nodes),

            loop(Nodes);

        { add_node, NodeAddr } ->
            Data = maps:new(),

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], Nodes),

            loop(NewNodes);

        stop -> ok
    end.

The public API of the db_server module does not really change - there’s a new function add_node/1 but that’s about it - these exported (aka “public”) functions only send messages to the server process.

The loop/1 function on the other hand has most of the changes - instead of sending messages to a single process, it now distributes them across the processes stored in the Nodes variable. This is a list of { NodeAddr, NodePid } tuples, where NodeAddr (the first element of the tuple) is only there to keep track of which nodes were added to the list (predominantly for debugging purposes) and NodePid (the second element of the tuple) is what is actually used to communicate with the worker process.

The set and delete messages are passed to all nodes from the Nodes list. The { get, Key, Pid } message is the exception - it is only sent to the first node from the list. And since all the worker nodes are (supposed to be) just replicas of each other, that is sufficient.
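For illustration, with two workers registered, the Nodes list printed in the shell might look something like this (hypothetical node names and PIDs):

[ { subnode2@machine, <9355.90.0> }, { subnode1@machine, <9342.85.0> } ]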

So far this should work. Start the db_server node with

$ erl -sname supervisor

followed by the call to

(supervisor@MACHINENAME)> c(db_server).
{ok,db_server}
(supervisor@MACHINENAME)> db_server:start().
ok

Then, add a few nodes to the cluster. For that, a node must be running and must use the same naming convention as the supervisor (a short name, set with the -sname param to erl, or a long name, set with the -name param to erl) - otherwise they won't see each other.

$ erl -sname subnode1

followed by

(subnode1@MACHINENAME)> c(worker).
{ok,worker}

Note how you don’t need to do anything on the worker node except compile the module. The supervisor node will start processes on that node itself.

Lastly, on the server node call

(supervisor@MACHINENAME)> db_server:add_node(subnode1@MACHINENAME).
ok

The node name is printed out to the Erlang shell once you start erl with the -sname or -name param provided. But you can also add io:format("Started node at ~s~n", [ node() ]) as the first statement of the loop/1 function in the worker module - it will print out the node name once the function is started. You can then use it as a raw atom parameter to db_server:add_node/1.
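If add_node/1 seems to have no effect, it is worth first checking that the two nodes can actually see each other - a quick net_adm:ping/1 from the supervisor shell does the trick (same hypothetical node name as above):

(supervisor@MACHINENAME)> net_adm:ping(subnode1@MACHINENAME).
pong

pong means the nodes are connected; pang usually means a naming mismatch (short vs long names) or different Erlang cookies.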

Now, the last piece to the system is keeping all the worker nodes in sync by copying the data from the first alive node to the new node once it is added to the supervisor.

To do so, we will need to utilize the net_adm module, which conveniently has the net_adm:ping/1 function, taking the node address and returning the pong atom if the node is alive and visible from the current node, or pang otherwise. This way we can replace all the occurrences of the Nodes list in the db_server:loop/1 function with the list of alive nodes:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            [ { _, FirstNodePid } | _ ] = AliveNodes,

            FirstNodePid ! { get, Key, self() },

            receive
                { get, Value } -> Pid ! { get, Value }
            end,

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            Data = maps:new(),

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

Now to get the data from the node, we will need an extra message to be supported by the worker module:

% worker
loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        { get_all, Pid } ->
            Pid ! { get_all, Data },

            loop(Data);

        stop -> ok
    end.

This way, when we want to add a new node to the cluster, the db_server module first fetches the data from the first available node and then sends it to the newly added node. But if there are no nodes alive, we want maps:new() to be the new data:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            [ { _, FirstNodePid } | _ ] = AliveNodes,

            FirstNodePid ! { get, Key, self() },

            Value = receive
                { get, ReceivedValue } -> ReceivedValue
            end,

            Pid ! { get, Value },

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            Data = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

You may practice adding and removing nodes from the running cluster while getting and setting values at the same time. But occasionally you might stumble upon the db_server:get/1 function hanging. This is what happens when there are no alive nodes in the cluster. In order to fix this, we can simply add a check for that instead of blindly taking the first node from the AliveNodes list:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            Value = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get, Key, self() },

                    receive
                        { get, ReceivedValue } -> ReceivedValue
                    end;

                _ -> no_nodes
            end,

            Pid ! { get, Value },

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            Data = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

This will still not work reliably, however. The issue is that we only assign a value to AliveNodes when we enter the db_server:loop/1 function, and that only happens after the previous message has been processed. But the moment the value is actually needed is right before processing a message. See the difference? An arbitrary amount of time might pass between computing the list of alive nodes and processing the next message, because the process sits blocked waiting for new messages.

One way to fix this is to move the AliveNodes assignment to each branch of the receive block:

% supervisor
loop(Nodes) ->
    receive
        { set, Key, Value } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            Value = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get, Key, self() },

                    receive
                        { get, ReceivedValue } -> ReceivedValue
                    end;

                _ -> no_nodes
            end,

            Pid ! { get, Value },

            loop(AliveNodes);

        { delete, Key } ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            AliveNodes = lists:filter(fun ({ Addr, _ }) -> net_adm:ping(Addr) == pong end, Nodes),

            Data = case AliveNodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            loop(NewNodes);

        stop -> ok
    end.

Alternatively, we can utilize the after keyword which effectively implements a timeout mechanism:

% supervisor
loop(Nodes) ->
    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, Nodes),

            loop(Nodes);

        { get, Key, Pid } ->
            Value = case Nodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get, Key, self() },

                    receive
                        { get, ReceivedValue } -> ReceivedValue;

                        _ -> none
                    end;

                _ -> no_nodes
            end,

            Pid ! { get, Value },

            loop(Nodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, Nodes),

            loop(Nodes);

        { add_node, NodeAddr } ->
            Data = case Nodes of
                [ { _, FirstNodePid } | _ ] ->
                    FirstNodePid ! { get_all, self() },

                    receive
                        { get_all, NewData } -> NewData;

                        _ -> maps:new()
                    end;

                _ -> maps:new()
            end,

            NodePid = spawn(NodeAddr, worker, loop, [ Data ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], Nodes),

            loop(NewNodes);

        stop -> ok
    after
        0 ->
            AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

            loop(AliveNodes)
    end.

This makes the receive expression of the db_server:loop/1 function check whether there are any messages in the message inbox (aka message queue) of the current process. If there are no messages after the given timeout (0 in this case, meaning "check and move on immediately"), the code in the after branch is executed. In our scenario, we compute the new list of alive nodes and restart the loop function. This way the time the process sits blocked waiting for messages is minimal.
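For reference, here is the same construct in isolation - a minimal sketch of receive ... after, not part of the database code:

wait_for_pong() ->
    receive
        pong -> pong
    after
        1000 -> timeout % give up after 1000 ms if no pong message arrived
    end.

With a timeout of 0, as above, the process checks its mailbox and immediately falls through to the after branch if the mailbox is empty.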

On the other hand, if you look carefully at all those nested receive statements within the loop function, you might just notice… that you don't need them at all - the processes can simply reply to the interested party directly! So instead of sending the supervisor's self() PID to a worker node (e.g. for the get or get_all messages) and then forwarding the reply, you can just pass along the PID that is already part of the message (the caller's PID for get, the newly spawned worker's PID for get_all). This makes the code highly asynchronous:

% supervisor
loop(Nodes) ->
    AliveNodes = lists:filter(fun ({ NodeAddr, _ }) -> net_adm:ping(NodeAddr) == pong end, Nodes),

    receive
        { set, Key, Value } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { set, Key, Value } end, AliveNodes),

            loop(AliveNodes);

        { get, Key, Pid } ->
            case AliveNodes of
                [ { _, FirstNodePid } | _ ] -> FirstNodePid ! { get, Key, Pid };

                _ -> Pid ! { get, no_nodes }
            end,

            loop(AliveNodes);

        { delete, Key } ->
            lists:foreach(fun ({ _, NodePid }) -> NodePid ! { delete, Key } end, AliveNodes),

            loop(AliveNodes);

        { add_node, NodeAddr } ->
            NodePid = spawn(NodeAddr, worker, loop, [ maps:new() ]),

            monitor(process, NodePid),

            NewNodes = lists:append([ { NodeAddr, NodePid } ], AliveNodes),

            case AliveNodes of
                [ { _, FirstNodePid } | _ ] -> FirstNodePid ! { get_all, NodePid };

                _ -> NodePid ! { got_all, maps:new() }
            end,

            loop(NewNodes);

        stop -> ok
    end.

This needs a tiny adjustment to the worker module (the get_all request is now answered with a got_all message, which the receiving worker uses to replace its state), but the benefits are more than worth it!

% worker
loop(Data) ->
    receive
        { set, Key, Value } ->
            NewData = maps:put(Key, Value, Data),

            loop(NewData);

        { get, Key, Pid } ->
            Pid ! { get, maps:get(Key, Data, none) },

            loop(Data);

        { delete, Key } ->
            NewData = maps:remove(Key, Data),

            loop(NewData);

        { get_all, Pid } ->
            Pid ! { got_all, Data },

            loop(Data);

        { got_all, NewData } ->
            loop(NewData);

        stop -> ok
    end.

Wrap-up

The cherry on top of this cake would be to create an actual REST API (or any non-Erlang API for that matter) so that this database could be used from the outside world without starting an Erlang shell. But I’d rather save it for a later blog.

For now you can actually plug and unplug nodes from the cluster, and as long as at least one node stays alive, no data is lost.
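A hypothetical end-to-end session (using the same made-up node names as above, with worker.erl compiled on two extra nodes started via erl -sname subnode1 and erl -sname subnode2) could look like this:

(supervisor@MACHINENAME)> db_server:start().
ok
(supervisor@MACHINENAME)> db_server:add_node(subnode1@MACHINENAME).
ok
(supervisor@MACHINENAME)> db_server:set(moo, -3.14).
ok
(supervisor@MACHINENAME)> db_server:add_node(subnode2@MACHINENAME).
ok

After shutting down subnode1 (e.g. with Ctrl+C in its shell), the value should still be served by the replica on subnode2:

(supervisor@MACHINENAME)> db_server:get(moo).
-3.14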

The highly asynchronous architecture of the solution allows for non-blocking calls all over the place.

For me, this is a great example of how to actually use Erlang!

The slightly built-up code is hosted on GitHub.

What is a Monad?

What is a monad?

There is a talk by Gilad Bracha, “Deconstructing functional programming”. This is a very good introduction to functional programming, monads included.

As per that talk, think of a monad as a class like this:

import java.util.function.Function;

class FlatMappable<A> {
  // the wrapped value; final, so effectively immutable
  private final A a;

  FlatMappable(A a) {
    this.a = a;
  }

  <B> FlatMappable<B> flatMap(Function<A, FlatMappable<B>> fn) {
    return fn.apply(a);
  }
}

Just rename FlatMappable to Monad and there you go.

Now, if you want naming closer to Haskell's (implemented in C++ this time, for a more explicit syntax):

#include <functional>

template <typename A>
class Monad {
public:
  // Haskell calls this `return`, but `return` is a reserved keyword in C++
  static Monad<A> unit(A a) {
    return Monad(a);
  }

  template <typename B>
  Monad<B> operator>>=(std::function<Monad<B>(A)> fn) {
    return fn(m_a);
  }

private:
  Monad(A a) : m_a(a) {}

  const A m_a;
};

Essentially, this renames the constructor to return (called unit above, since return is a reserved keyword in C++) and flatMap to operator>>=.

Also, in terms of Mappable vs FlatMappable:

import java.util.function.Function;

class Mappable<A> {
  protected final A a;

  Mappable(A a) { this.a = a; }

  <B> Mappable<B> map(Function<A, B> fn) {
    return new Mappable<>(fn.apply(a));
  }
}

class Flattable<A> extends Mappable<A> {
  Flattable(A a) { super(a); }

  // keep unwrapping while the wrapped value is itself a Flattable
  @SuppressWarnings("unchecked")
  Flattable<A> flatten() {
    if (a instanceof Flattable) {
      return ((Flattable<A>) a).flatten();
    }

    return this;
  }
}

class FlatMappable<A> extends Flattable<A> {
  FlatMappable(A a) { super(a); }

  // conceptually: flatMap is map followed by flatten; the casts are needed
  // because Java cannot express "flatten removes one level of nesting"
  @SuppressWarnings("unchecked")
  <B> FlatMappable<B> flatMap(Function<A, FlatMappable<B>> fn) {
    return (FlatMappable<B>) (Flattable<?>) new FlatMappable<>(fn.apply(a)).flatten();
  }
}

Why do we need monads?

In order to preserve the purity of programs in functional programming languages, you cannot have side effects in your program.

But if you need to interact with outside systems (IO, a database, the system clock, a random number generator, etc.), you need to somehow express these operations without side effects.

So you write your entire program as a description of a workflow (e.g. how data will be processed). Basically your program becomes a chain of functions calling other functions. No waiting for events, nothing like that.

Then, when you run your program, “at the execution edge” (aka “event horizon”), just before your program finishes, the runtime will do all the “unsafe” (or rather “impure”) operations (interaction with IO, system clock, etc.) and will feed all the input to your functions, take the output and modify “the state of the world outside”.

Where do monads fit in this? Monads are basically just wrappers around other values, telling the runtime what to do, so that your code stays pure and all the impure side effects are eventually handled by the runtime.

For example, reading from STDIN would be something like IO<String> - a monad, whose value will be provided eventually. So your program will be defined as a function something like

def main(IO[String] input): IO[String] =
  input.map { name -> printLn("Hello, ${name}") }

As you can see, main is a function which simply returns another IO[String] by mapping a monad parameter.

When you run it, the program will effectively wait until the IO[String] input parameter is filled with a value. Once the value is provided, the runtime will execute the rest of the code (the map { ... } part).

irrPaint3d

Recently I was reviving some of my old projects. And to my surprise, there were some people actually interested in those! That was a good enough reason for me to revise the old pet projects of mine and exercise my skills (mostly in C++ though).

One really interesting project I had back in the day was irrPaint3d. It all started as an idea for my B.Sc. thesis, and the ultimate goal was to enable users to paint 3D objects and immediately see the results in real time, in 3D.

This is hardly a unique feature nowadays, but back in 2013, to my knowledge, the only way to texture 3D models was to either unwrap them to UV maps and then paint the textures in a graphics editor (such as Gimp or Paint.NET) or to use proprietary software, 3D Coat.

Unwrapping 3D model texture in Blender in 2013

And the mere idea of implementing such a tool myself was quite exciting. Darn, had I managed to implement it back in 2013, my thesis would have shone among the best theses in my uni!

Long story short, I failed at it back then. And now, after spending just a few days on this, I am happy to say I have achieved something with yet another glorious pet project revival:

Revised 3D model painting

And it actually allows painting the model in 3D, displaying the results in real time!

A bit of history on how I got there, and the maths behind the solution, is under the cut.


Irrlicht application template

Often when I start revising my old applications built with the Irrlicht engine, I do a few things very similarly. Especially when the application contains a GUI and uses Irrlicht tools for it.

This is mostly due to the extremely outdated nature of Irrlicht itself. There are all sorts of issues with it, like the lack of 4K monitor support, the use of a very obscure font, sample applications being simple yet messy single CPP files, etc.

The common things I do include:

  • using new C++ standard features such as:
    • shared pointers
    • automatic type inference
    • standard containers
    • C++-style string operations
  • setting a new font for the GUI, adapted to higher screen resolutions
  • using CMake to build the project and vcpkg to manage dependencies
  • utilizing object-oriented approach
  • moving the classes to separate header and CPP files

In this blog I describe in bloody detail what I do and why.


Strongly-typed front-end

Serial experiments Lain

In this little research project I describe my journey through a series of experiments trying out a number of technologies and clashing them against each other.

Strongly-typed front-end: introduction

Contents

  1. Introduction (you are here)
  2. Experiment 1, darken_color
  3. Experiment 2, simple application

In this little research project I describe my journey through a series of experiments trying out a number of technologies and clashing them against each other.

I have always questioned the real value that all those languages that compile to JS, especially TypeScript or Flow, give you.

So I have asked Atlassian front-enders a question:

I need your opinions for my research: what benefits does TypeScript (or Flow, depending on your camp) give you? why do you use it?

The answers I have received varied but the common themes were:

  • we like types! 😍
  • less errors
  • easier refactoring
  • tools & IDEs integration (mainly for code navigation and autocomplete)
  • self-documented or more readable code

The issues I see with TypeScript and Flow are a bit closer to the real world:

  • they catch way too few errors - unless your whole project (including 3rd-party dependencies) uses the thing correctly, you will keep hitting errors in the layer between native JS and typed code
  • more often than not, error messages are either pointless or hard to read (see the examples below)

I do acknowledge that the earlier you catch an error, the cheaper the fix is. You have to put some effort into writing the typed code, but if it only catches a fraction of the errors, and only at compile time, then why all the hassle?

“Benefits” of TypeScript

Pointless errors

Most of the issues I have seen so far happen in what is considered the stdlib:

interface MyClass {
  id: string;
  val: number;
}

const a: MyClass[] = [
  { id : 'moo', val: 1 }, 
  { id: 'foo', val: -1 },
  { id: 'bar', val: 3.14 },
];

const counts = a.reduce((acc, e) => {
  if (acc.has(e.id)) {
    acc.set(e.id, acc.get(e.id) + e.val);
  } else {
    acc.set(e.id, e.val);
  }
  
  return acc;
}, new Map<string, number>());

Here we are reducing a list of objects. TS freaks out every time you use a Map (even though it is a natively supported type, IIRC) - calling map.get() will always produce the T | undefined type, unless you explicitly tell TS the type is T by using the as operator:

acc.set(e.id, acc.get(e.id) + e.val); // ERROR: Object is possibly 'undefined'.

Adding an explicit check does not have any effect:

if (acc.has(e.id) && acc.get(e.id) !== undefined) {
  acc.set(e.id, acc.get(e.id) + e.val); // TS does not give a damn: Object is possibly 'undefined'.
}

whereas

acc.set(e.id, acc.get(e.id) as number + e.val); // OK

But the issue is: if you do not add the check, the value in fact might be undefined.

Flow has flaws here too:

/* @flow */

interface MyClass {
  id: string;
  val: number;
}

const a: MyClass[] = [
  { id : 'moo', val: 1 }, 
  { id: 'foo', val: -1 },
  { id: 'bar', val: 3.14 },
];

const counts = a.reduce((acc, e) => {
  if (acc.has(e.id) && acc.get(e.id) !== undefined) { // ERROR: Cannot perform arithmetic operation because undefined [1] is not a number. [unsafe-addition]
    acc.set(e.id, acc.get(e.id) + e.val);
  } else {
    acc.set(e.id, e.val);
  }
  
  return acc;
}, new Map<string, number>());

But it provides a bit more context about the issue:

    16:     acc.set(e.id, acc.get(e.id) + e.val);
                          ^ Cannot perform arithmetic operation because undefined [1] is not a number. [unsafe-addition]
        References:
        [LIB] ..//static/v0.135.0/flowlib/core.js:617:     get(key: K): V | void;
                                                                            ^ [1]

Both Flow and TS work fine if you extract the .get call result to a variable and add a check for undefined:

interface MyClass {
    id: string;
    val: number;
}

const a: MyClass[] = [
    { id : 'moo', val: 1 }, 
    { id: 'foo', val: -1 },
    { id: 'bar', val: 3.14 },
];

const counts = a.reduce((acc, e) => {
    const prevVal = acc.get(e.id);

    if (prevVal !== undefined) {
        acc.set(e.id, prevVal + e.val);
    } else {
        acc.set(e.id, e.val);
    }

    return acc;
}, new Map<string, number>());

Implementation-specific errors

In order to understand this error message, you have to know how enums are implemented in TypeScript:

enum Figure {
  RECTANGLE,
  SQUARE,
  CIRCLE,
}

const area: Record<Figure, Function> = {
  [Figure.RECTANGLE]: (w: number, h: number) => w * h,
  [Figure.CIRCLE]: (r: number) => Math.PI * r * r,
  // ERROR: Property '1' is missing in type '{ 0: (w: number, h: number) => number; 2: (r: number) => number; }' but required in type 'Record<Figure, Function>'.
};

console.log(area);

Enums in TS are backed by numbers by default. In order for the error above to make sense, you have to provide reasonable (string-backed) values for the enum members:

enum Figure {
  RECTANGLE = 'RECTANGLE',
  SQUARE = 'SQUARE',
  CIRCLE = 'CIRCLE',
}

const area: Record<Figure, Function> = {
  [Figure.RECTANGLE]: (w: number, h: number) => w * h,
  [Figure.CIRCLE]: (r: number) => Math.PI * r * r,
  // ERROR: Property 'SQUARE' is missing in type '{ RECTANGLE: (w: number, h: number) => number; CIRCLE: (r: number) => number; }' but required in type 'Record<Figure, Function>'.
};

console.log(area);

Runtime is imperfect

You might have typed every single bit of your project and all the 3rd-party dependencies. And you did it right. This still does not guarantee you won't get Cannot read property XXX of undefined or XXX is not a function at run time.

Type system won’t really save you, if you only have covered some of the use cases, but the user ended up in uncovered one:

import React, { useState, useCallback } from 'react';

enum Shape {
  SQUARE = 'SQUARE',
  CIRCLE = 'CIRCLE',
}

const AREA: Record<Shape, Function> = {
  [Shape.SQUARE]: (side: number) => side * side,
  [Shape.CIRCLE]: (r: number) => Math.PI * r * r,
};

export default () => {
  const [shape, setShape] = useState<Shape>(null);
  const [value, setValue] = useState<number>(0);
  const [area, setArea] = useState<number>(0);

  const onShapeChanged = useCallback((e: React.ChangeEvent<HTMLSelectElement>) => {
    setShape(e.target.value);
  }, []);

  const onValueChanged = useCallback((e: React.ChangeEvent<HTMLInputElement>) => {
    setValue(parseFloat(e.target.value));
  }, []);

  const onSubmit = useCallback(() => {
    setArea(AREA[shape](value));
  }, [shape, value]);

  return (
    <div>
      <select onChange={onShapeChanged}>
        <option value="">Choose shape</option>

        {Object.keys(AREA).map(shape => (<option value={shape}>{shape}</option>))}
      </select>

      <input value={value} onChange={onValueChanged} />

      <button onClick={onSubmit}>Calculate area</button>

      <div>
        Area: {area}
      </div>
    </div>
  );
};

This is quite a simple application (sandbox), built on top of the previous example with enums and records in TypeScript.

There are at least two uncovered scenarios in this application, resulting in errors:

  1. when the user does not select a shape and clicks “calculate”, a TypeError: AREA[shape] is not a function will be thrown
  2. when the user types anything but a number in the input, the value immediately becomes NaN; if the user then clicks “calculate”, an error won't be thrown (since the app does not use the calculation result in any way), but the calculated area will also be NaN; for this example this is fine, but imagine using that value further down the line in some financial calculations

This is a trivial synthetic example and the errors might be easy to spot and fix, but the important question is: did TypeScript help you find those errors?

If you set up the tooling strictly enough, you might have some of these errors caught at compile time:

Argument of type 'null' is not assignable to parameter of type 'Shape | (() => Shape)'.ts(2345)
Argument of type 'string' is not assignable to parameter of type 'SetStateAction<Shape>'.ts(2345)
Type 'null' cannot be used as an index type.ts(2538)

If you do the right thing, you will end up properly handling those scenarios. But instead, I often see solutions like these (sandbox):

const [shape, setShape] = useState<Shape | null>(null);

// ...

setShape(e.target.value as Shape);

// ...

setArea(shape ? AREA[shape](value) : 0);

Those solutions do solve a subset of the errors, at the cost of readability and potentially introducing other errors.

There are no classes in JavaScript

I found this one recently; apparently TypeScript classes are the same as JavaScript (ES6) classes:

namespace TypeScriptIsGarbage {
  export class A {};

  export class B {};

  export const a = (): A => new B(); // perfectly fine

  export const b = (): B => new A(); // perfectly fine
}

namespace TypeScriptIsJavaScript {
  export enum Types {
    A,
    B,
  }

  export type A = { type: Types.A };

  export type B = { type: Types.B };

  export const a = (): A => ({ type: Types.B }); // Type 'Types.B' is not assignable to type 'Types.A'.

  export const b = (): B => ({ type: Types.A }); // Type 'Types.A' is not assignable to type 'Types.B'.
}

This one is actually quite serious if your application relies on type safety and object-oriented design.

Try doing it in C# (which TS tries to inherit from, IIRC) and you'll get sane compile-time errors:

using System;

class A {}

class B {}

class Program {
  A createA() {
    return new B(); // Cannot implicitly convert type `B` to `A`
  }

  B createB() {
    return new A(); // Cannot implicitly convert type `A` to `B`
  }

  public static void Main(string[] args) {}
}

IDE integration is awesome

Recently I had to use both Cypress and Jest in a project of mine. Cypress was used for E2E tests and Jest was used for unit-tests. And they both provide some sort of assertion framework (think all those expect() calls).

And apparently their definitions are different and are clashing, since my VSCode looks like this:

TS definitions errors in VSCode

Apparently, I needed two separate tsconfig.json files, one for each specific set of tests, just to compile the thing. And that setup is still not recognized by VSCode.

Errors can be found and eliminated early

The helpfulness of the error messages from the TS compiler is far from perfect. And the same holds for TSLint.

And in some cases (as with type checks), they are even completely missing, so good luck finding out why the application does not work.

It is possible to add a bunch of debugger statements, breakpoints and console.log()s to the code. But what is the benefit of that complex setup and extra overhead of typing every single line then?

Strongly-typed front-end: experiment 2, simple application, in ReasonML

Contents

  1. Introduction
  2. Experiment 1, darken_color
  3. Experiment 2, simple application

For the sake of experiment, I have decided to implement the very same application in ReasonML → ReScript by Facebook.

Starting from the React Hooks example on the Try ReasonML website, you get this code, which resembles familiar React features, just in a slightly weird syntax:

[@bs.config {jsx: 3}];

module Counter = {
  [@react.component]
  let make = (~name) => {
    let (count, setCount) = React.useState(() => 0);

    <div>
      <p> {React.string(name ++ " clicked " ++ string_of_int(count) ++ " times")} </p>
      <button onClick={_ => setCount(_ => count + 1)}>
        {React.string("Click me")}
      </button>
    </div>
  };
};

ReactDOMRe.renderToElementWithId(<Counter name="Counter" />, "preview");

Starting off by defining the enum type for shape:

type Shape = Circle | Square;

And immediately getting an error:

Line 4:8-12 A type name must start with a lower-case letter or an underscore

That one is easy to fix:

type shape = Circle | Square;

Now, add some markup:

[@react.component]
let make = (~name) => {
  let (_shape, setShape) = React.useState(() => None);
  let (value, setValue) = React.useState(() => 0.0);
  let (area, setArea) = React.useState(() => 0.0);

  <div>
    <select>
      <option value=""> Choose shape </option>
      <option value="circle"> Circle </option>
      <option value="square"> Square </option>
    </select>
    <input value={value} />
    <p> {React.string(string_of_float(area))} </p>
    <button>
      {React.string("Calculate")}
    </button>
  </div>
};

Strongly-typed front-end: experiment 2, simple application, in Elm

Contents

  1. Introduction
  2. Experiment 1, darken_color
  3. Experiment 2, simple application

(Heavily over-opinionated statement) Elm forces you to handle error scenarios when writing the code.

Sandbox

This is pretty much a translation of the TypeScript code from above:

module Main exposing (..)

import Browser
import Html exposing (Html, button, div, text, input, select, option)
import Html.Attributes exposing (value)
import Html.Events exposing (onClick, onInput)

-- util

type Shape = Circle | Square

calculateArea : Shape -> Float -> Float
calculateArea shape value =
  case shape of
    Circle -> pi * value * value
    
    Square -> value * value
    
-- MAIN

main =
  Browser.sandbox { init = init, update = update, view = view }

-- MODEL

type alias Model = { shape: Shape, value: Float, area: Float }

init : Model
init = { shape = "", value = 0, area = 0 }

-- UPDATE

type Msg
  = ShapeChanged Shape
  | ValueChanged Float
  | CalculateArea

update : Msg -> Model -> Model
update msg model =
  case msg of
    ShapeChanged shape ->
      { model | shape = shape }

    ValueChanged value ->
      { model | value = value }
      
    CalculateArea ->
      { model | area = (calculateArea model.shape model.value) }

-- VIEW

onShapeChanged : String -> Msg
onShapeChanged shape = 
  case shape of
    "circle" -> ShapeChanged Circle
    "square" -> ShapeChanged Square

onValueChanged : String -> Msg
onValueChanged value = ValueChanged (Maybe.withDefault 0 (String.toFloat value))

view : Model -> Html Msg
view model =
  div []
    [ select [ onInput onShapeChanged ] [ 
      option [ value "" ] [ text "Choose shape" ], 
      option [ value "circle" ] [ text "Circle" ],
      option [ value "square" ] [ text "Square" ] ]
    , input [ value (String.fromFloat model.value), onInput onValueChanged ] []
    , button [ onClick CalculateArea ] [ text "Calculate area" ]
    , div [] [ text ("Area: " ++ (String.fromFloat model.area)) ]
    ]

Note that it won’t compile:

-- TYPE MISMATCH ----------------------------------------------- Jump To Problem

Something is off with the body of the `init` definition:

29| init = { shape = "", value = 0, area = 0 }
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The body is a record of type:

    { area : Float, shape : String, value : Float }

But the type annotation on `init` says it should be:

    Model

Strongly-typed front-end: experiment 2, simple application, in PureScript

Contents

  1. Introduction
  2. Experiment 1, darken_color
  3. Experiment 2, simple application

In the PureScript world there are quite a few libraries for React. And all of them have terrible (or rather non-existent) documentation, so I had to rely on intuition as much as on outdated and barely working code samples. In this case I have picked purescript-react-dom.

Initial application structure:

module Main where

import Prelude

import Control.Monad.Eff

import Data.Maybe
import Data.Maybe.Unsafe (fromJust)
import Data.Nullable (toMaybe)

import Effect (Effect)
import Effect.Console (log)

import DOM (DOM())
import DOM.HTML (window)
import DOM.HTML.Document (body)
import DOM.HTML.Types (htmlElementToElement)
import DOM.HTML.Window (document)

import DOM.Node.Types (Element())

import React

import React.DOM as DOM
import React.DOM.Props as Props

type Shape = Circle | Square

calculateArea :: Maybe Shape -> Float -> Float
calculateArea Nothing _ = 0
calculateArea (Just Circle) value = pi * value * value
calculateArea (Just Square) value = value * value

getShape :: String -> Maybe Shape
getShape "circle" = Just Circle
getShape "square" = Just Square
getShape _ = Nothing

onShapeChanged ctx evt = do
  writeState ctx { shape: getShape ((unsafeCoerce evt).target.value) }

onCalculateAreaClicked ctx evt = do
  { shape, value } <- readState ctx
  writeState ctx { area: calculateArea shape value }

areaCalculator = createClass $ spec { shape: Nothing, value: 0, area: 0 } \ctx -> do
  { shape, value, area } <- readState ctx
  return $ DOM.div [] [
    DOM.div [] [
      DOM.select [ Props.onChange (onShapeChanged ctx) ] [
          DOM.option [ Props.value "" ] [ DOM.text "Select shape" ],
          DOM.option [ Props.value "circle" ] [ DOM.text "Circle" ],
          DOM.option [ Props.value "square" ] [ DOM.text "Square" ]
      ],
      DOM.input [ Props.value (show value) ] [],
      DOM.button [ Props.onClick (onCalculateAreaClicked ctx) ] [ DOM.text "Calculate area" ]
    ],
    DOM.div [] [
      DOM.text ("Area: " ++ (show area))
    ]
    ]

main = container >>= render ui
  where
  ui :: ReactElement
  ui = createFactory areaCalculator {}

  container :: forall eff. Eff (dom :: DOM | eff) Element
  container = do
    win <- window
    doc <- document win
    elt <- fromJust <$> toMaybe <$> body doc
    return $ htmlElementToElement elt
Read more

Strongly-typed front-end: experiment 2, simple application, in F#

Contents

  1. Introduction
  2. Experiment 1, darken_color
  3. Experiment 2, simple application

In the F# world, there is a framework called Fable. It allows one to compile their F# code to JavaScript. There is a built-in package for React, but the Fable developers themselves suggest using Elmish, which is a framework similar to Elm, just suited for F#.

A sample Elmish application in the online editor looks like this:

module Elmish.SimpleInput

(**
Minimal application showing how to use Elmish
You can find more info about Elmish architecture and samples at https://elmish.github.io/
*)

open Fable.Core.JsInterop
open Fable.React
open Fable.React.Props
open Elmish
open Elmish.React

// MODEL

type Model =
    { Value : string }

type Msg =
    | ChangeValue of string

let init () = { Value = "" }, Cmd.none

// UPDATE

let update (msg:Msg) (model:Model) =
    match msg with
    | ChangeValue newValue ->
        { model with Value = newValue }, Cmd.none

// VIEW (rendered with React)

let view model dispatch =
    div [ Class "main-container" ]
        [ input [ Class "input"
                  Value model.Value
                  OnChange (fun ev -> ev.target?value |> string |> ChangeValue |> dispatch) ]
          span [ ]
            [ str "Hello, "
              str model.Value
              str "!" ] ]

// App
Program.mkProgram init update view
|> Program.withConsoleTrace
|> Program.withReactSynchronous "elmish-app"
|> Program.run

One can easily see the similarities to Elm (or so I think).
