Clean before running a Spring Boot application in IntelliJ Idea

12 Jul 2025

My teammate ran into an issue when running our Spring Boot application in IntelliJ Idea: certain Gradle artifacts stayed cached when switching between branches, which prevented the application from running. The issue is easily solved by running the Gradle clean task. But my teammate was running the application from IntelliJ's UI, and by default it does not do a clean build before running - just the normal build, if the files have changed.

The solution was a bit of UI trickery.

In the run menu, edit the run configuration (from the meatball menu):

From the run configuration window, click the "Modify options" link:

In the "Before launch" section of a very long drop-down menu, click the "Add before launch task":

From the "Add task" pop-up menu select "Run Gradle task":

In the Gradle task window fill out the "Gradle project" and "Tasks" fields (they both have auto-complete):

Bear in mind: this advice might become obsolete as the IntelliJ team keeps changing their new UI.

Modal text editors

27 Mar 2025

I have been using Vim on and off since my pal showed me Linux on my first ever computer, back in circa 2005. I then learned Emacs in 2015 and used it for remote development (over SSH). That was the first time I used a bunch of plugins to improve my developer experience. Then, circa 2022, I heard about an interesting take on the Vim ideology - Kakoune. I liked how it lets the user see what they are about to operate on before actually performing an action. I did not end up using it often, but I liked the idea. Then, at the end of 2024, I heard about Helix - another take on the Vim ideology (or more like on Kakoune at that point).

One random morning, scrolling through the suggested videos on YouTube, I stumbled upon a semi-interesting (at the time) video about Vim productivity tips. The creator showed an interesting plugin, flash, which allowed him to quickly jump between any two points on the screen - much faster and easier than what I would usually do in Vim (relative line numbers and motions). I really liked the idea, to the point where I was on the edge of giving it a try. Since a few people in our team use NeoVim or the Vim plugin for VS Code or IntelliJ Idea, I thought about using each of them for work for a while and comparing the experience.

IntelliJ Idea, baseline

For a number of years, the way I develop in IntelliJ has been quite keyboard-centric - I disable tabs and use the Shift-Shift menu and its tabs to navigate the project and perform actions.

The quick action menu has 6 tabs:

Classes
Files
Symbols
Actions
Text
All (combined of the above)

For the most part I only use Classes, Files and Actions tabs via a keyboard shortcut. From the Actions tab I rarely use more than "rename", "extract variable/method" or "copy path/reference".

The only other two IDE features I actually use are the file tree (so that I can see the structure of the project - it comes in handy for organizing Java packages, for example), which I configure to automatically focus the file currently open in the editor, and the terminal, which I use for pretty much everything else (including Git interactions and running / building the project).

Actually, IntelliJ Idea gives you the statistics of features you use in the Help -> My Productivity menu.

In my case it looks like this (ordered by the usage frequency, descending - from most used to least):

Recent files
Go to declaration
Code completion
Syntax-aware selection (Cmd + w)
Find in files
Context actions (suggestions like "remove unused", "replace with X", "import Y", etc.)
Find (in current file)
Toggle comment
Go to file
Go to class
Rename
Find and replace (in current file)
Generate code (constructors, getters and setters)
Go to implementation
Introduce variable
Change case (camel case)
Implement methods

Some of these (implement methods, introduce variable, generate code) might be hard to achieve in either editor, so they are not requirements, but rather nice-to-haves.

So this would be my measure of success in other editors (in terms of comfort) - on top of movement in a document, I would like to compare the editors in terms of moving in the project (files, symbols / classes and references) and quick actions.

NeoVim

I thought about trying NeoVim since it was all the hype at the time. I was dumbfounded by the fact that it did not even come with a default config file to work with (in stark contrast to Helix, which has the :config-open and :config-reload commands built in). Yet I kept going, spending days perfecting my config. I switched my terminal emulator from Warp to Ghostty (so that I could see images right inside my terminal, improving the file manager experience), installed a good dozen plugins and messed with color schemes.

My full neovim config is actually public.

After almost a week, my NeoVim plugin setup looked like this:

Lazy - plugin manager
Telescope - for fzf and quick actions
Neo-tree - the file tree

These three plugins alone already made me quite a bit more productive compared to the vanilla Vim experience. But it was not a complete setup:

telescope-file-browser (?) - as an alternative for Neo-tree
telescope-ui-select - for LSP integration (code actions, specifically)
mason - for LSP management
mason-lspconfig - a middleware between mason and lspconfig
nvim-lspconfig - for configuring LSP in NeoVim
nvim-treesitter - for treesitter integration
toggleterm - terminal integration, toggleable; obsolete when using zellij
vim-sleuth - automatically detects tabwidth settings for the buffer
Comment.nvim - toggle comment for lines and blocks

A rather large set of plugins is required for autocomplete to work:

nvim-cmp - autocomplete engine
cmp-nvim-lsp - autocomplete integration for LSP
cmp-path - autocomplete for file paths
LuaSnip - snippets for Lua
cmp_luasnip - autocomplete integration for luasnip
friendly-snippets - a set of snippets
lspkind.nvim - icons for autocomplete options

Those plugins make up an almost complete setup.

But I wanted to further improve the Vim experience somewhat so motions become easier (like in the video from the top of the post):

nvim-treesitter-textobjects (?) - for moving between text blocks (paragraphs, function / class scopes, parameters, etc.)
nvim-surround - for operations on surrounding characters (braces, brackets, quotes, ticks, etc.)
flash - the fast movements
vim-visual-multi - true multi-cursor editing (unlike visual block mode, which would require macros for the best experience)

And there are a few quality-of-life improvements:

undotree - local file history
noice - moves command line and search line to the middle of the screen, in a popup panel
indent-blankline - to show indentation lines
cmp-cmdline - for autocomplete suggestions in the Noice command popup
nvim-treesitter-context - shows the current context as fixed lines at the top of the screen (like in VSCode)
outline with outline-treesitter-provider - shows symbols defined in the current buffer
quicker - allows for a better quickfix buffer experience
telescope-live-grep-args - allows searching not by text (or regex) alone, but also by filename / filetype (and more)

The config I have at the moment allows the following:

file tree
- <leader>n toggles NeoTree
- ~~when NeoVim opens, NeoTree opens too~~ later I decided against this, since sometimes I want to just edit a single file without a care for the directory containing it
- NeoTree follows the currently open file
- NeoTree shows hidden files (including dot files and directories, like .github, which are hidden by default)
- when NeoTree is the last open buffer, NeoVim will quit
movements
- s performs a Flash search (with literal references to different points in text)
- S-s performs a treesitter Flash search
terminal
- trying to utilize zellij instead of built-in NeoVim terminal, following the "one tool for its purpose" principle
- <leader>t toggles a terminal
- C-\ C-n switches terminal to Normal mode, but I also remapped it to <Esc><Esc>
- i switches terminal to Insert mode
navigation
- C-left / C-right goes through the jumplist (similarly to back/forward in IntelliJ / VSCode)
editing
- ys<motion><character> wraps selected text in a specified character
- cs<old char><new char> changes the wrapped character around text under cursor from <old char> to <new char>
- ds<char> deletes wrapped character around text under cursor
- gcc comments / uncomments the line
- gc comments the selected block of text
- <Esc> in Normal mode sends nohlsearch to remove search highlights
- <leader>u shows a list of changes (undo tree), aka local file history
- C-n / C-<down> / C-<up> creates a new cursor for the same word occurrences / below current cursor / above current cursor
LSP
- S-k shows the hint for the symbol under cursor
- <leader>ca (built-in gra since NeoVim 0.11) shows code actions for the symbol under cursor in a Telescope panel
- treesitter-textobjects provides custom scopes (like w for word or W for WORD):
  - ac / ic - outer class code / inner class code
  - af / if - outer function / inner function
  - as - local scope
  - ai / ii - outer function call / inner function call (i for invocation)
  - ap / ip - outer / inner parameter
  - ar / ir - outer / inner return statement
- <leader>o shows list of symbols defined in the current buffer (outline)
- gr (built-in grr since NeoVim 0.11) shows a list of LSP references; <C-q> sends the list of locations to the quickfix list
- gd shows a list of LSP definitions
- gI (built-in gri since NeoVim 0.11) shows a list of LSP implementations
- grn (built-in since NeoVim 0.11) renames a symbol under cursor
- C-s (built-in since NeoVim 0.11) shows method signature reference
- ]d / [d (built-in) goes to next / previous diagnostic message location (e.g. warnings, errors, suggestions in the code)
- C-w,d shows the diagnostic message at cursor
Telescope
- <leader><leader> opens a quick action menu (Telescope builtin)
- <leader>f opens search in files
- <leader>gf opens search only in files tracked by Git
- <leader>b opens a list of NeoVim buffers, ignoring current buffer and sorting by last used timestamp
- <leader>/ performs a live-grep (with arguments) on files
- C-q (in the search results; needs Zellij in the locked mode - C-g - to prevent conflict) moves files to the quickfix list; then
  - ]c / [c for next / previous quickfix entry
  - <leader>q toggles the quickfix window
  - <leader>l toggles the location list window

Aside from the ridiculous amount of time spent configuring Neovim, it ticks most of my boxes:

Recent files ✅ <leader>b
Go to declaration ✅ gd
Code completion ✅ (nvim-cmp) C-space
Syntax-aware selection ✅ (nvim-treesitter with customized incremental_selection): initialize treesitter selection mode with gsn and then increment / decrement node selection with gss and gsm
Find in files ✅ <leader>/ (telescope + telescope-live-grep-args, rg-powered)
Context actions ✅ gra
Find in current file ✅ /
Toggle comment ✅ gcc / gc<object> (e.g. gcaw - comment word)
Go to file ✅ <leader>f
Go to class ✅ <leader>S (telescope + LSP)
Rename ✅ grn
Find and replace in current file ✅ :%s/<search>/<replace>
Generate code ❌
Go to implementation ✅ gri
Introduce variable ❌
Change case ✅ (text-case.nvim or through external program)
Implement methods ❌

Helix

Actually, my first try was Helix, but I immediately stumbled upon the first blocker - the task I was working on at that time involved some Mustache templates, and Helix did not support them even at the most basic level (syntax highlighting). Moreover, Helix did not have plugins at the time, so there was little I could do for that particular task. The reason I wanted at least syntax highlighting had to do with the issue I was working on, which had misplaced conditionals in the template file, resulting in incorrect rendering. I liked, however, how Helix came with a lot of handy utilities out of the box - the file picker and treesitter integration (so I could jump around the rest of the Java / TypeScript codebase with ease).

I tried configuring a file tree via Yazi, which has an integration example on its website (using zellij terminal multiplexer panels), but it just refused to work for me on my Mac. That was when I timeboxed this and decided to switch my focus to NeoVim for the time being.

After a while of just living my life, a new version of Helix dropped, 25.07, and it introduced a few quality of life changes. For example, the file explorer, which is now built in. It is not as powerful as Yazi, but it does the job. So I decided to switch to Helix for a while.

Upon getting back to Helix some time later, there was a new release, 25.07.1, which introduced a few quality of life improvements, including the file browser, sort of addressing the file tree issue I had before. Invoked with <leader>e (compared to <leader>f, which just lists all the files in one long list), it allows you to see directories. It is sort of an immediate-mode file manager, in that it only shows the contents of the selected directory, but it is arguably a more useful solution to the file tree problem than a list of all the files (recursively listing sub-directories). Additionally, tree-sitter was replaced, so the syntax highlighting should be better - this would most likely be helpful for content like my blog posts, which often feature multiple languages on top of the main Markdown.

Something that I liked about Helix was how it handles selections. For the most part, native multi-cursor is a great start - you just have to remember the difference between splitting the selection and selecting inside the selection - S (split) vs s (select) - both would create multiple cursors, just in a different way.

The same goes for the different selection objects, including the ones provided by tree-sitter - with the match mode you can select different objects (ma and mi are your friends). But you can also surround the selection (ms<char>), change the surrounding (mr<from><to>) and delete the surrounding (md<char>), which I find pretty handy sometimes.

Something I find bearable is the use of external tools for some actions, instead of plugins. Since Helix lacks a plugin system in any shape or form (as of late October 2025), you can just pipe the selection to an external program. One such case is changing the case of a selection - I use this often, especially in conjunction with multiple cursors - to quickly edit, say, test data or a bunch of constants, or to refactor multiple class fields. In this case I found a comment on GitHub suggesting the use of ccase - you first need to install this Rust utility, and then you will be able to send your selection(s) to it by using :pipe ccase -t screamingsnake, for instance.

Something I did not realize until writing this blog is that there are some really useful default keybindings for insert mode, like Ctrl+w to delete the previous word, Alt+d to delete the next word, Ctrl+k to delete to the end of the line and Ctrl+x to trigger autocomplete.

One annoying thing about Helix is the buffer picker (available via <lead>b) - it shows the list of open buffers (files). It does not appear to sort the elements of the list, and the first element is always the current buffer. This is not really helpful if you want to quickly go to the previously edited file. To make things worse, it does not seem to have any configuration around it, so the behaviour is pretty much set in stone.

I am also still missing the one feature that made me try NeoVim in the first place: flash.nvim, for quick jumps on the screen. The Helix team did add something similar to the HopWord command from hop.nvim, called goto_word and available via the gw key shortcut. But unlike hop.nvim, the Helix implementation is barely configurable (you can only configure the alphabet used for making labels) and comes with just one mode - HopWord. Additionally, it enforces exactly two-character abbreviations (jump labels). I found this rather limiting, and since Helix is an open-source project, I decided to implement a behaviour similar to leap.nvim, since it is much less obstructive (it does not cover your entire screen in labels, making it super hard to figure out where you want to jump) and it lets you narrow down the places to jump to with a prefix search. So I raised a PR and a suggestion discussion on the Helix GitHub repo. To which the team just said "we don't want it" and dismissed the proposal:

We made intentional choices when implementing the goto_word behavior. We were quite aware if the nvim plugins. We are not going to replace or add alternative commands

One comment pointed to the plugin system when it is available:

I'm not inclined to add more jumping commands as core Helix commands like gw. There are a lot of different spins on jumping functions in Neovim plugins and I believe that future work should be done in plugins (once available) rather than as core faetures.

For context, the plugin system has been discussed for over three years (as of this writing, since September 2022) and apparently has been worked on for over two years (since October 2023). So I have no hope of seeing it released in the near future.

But instead of giving up, I decided to give this work-in-progress plugin system a try, and after a week the flash.hx plugin was born.

Here is my condensed cheatsheet of key shortcuts I use in Helix:

Pickers
- <lead>b buffer list
- <lead>e file explorer; far superior to file list
- <lead>/ search
- <lead>d show diagnostics (hints, errors, warnings, etc)
Multiple cursors
- , collapse all cursors into one
- s select in selection; create a new cursor at each selection; takes a regexp as a pattern
- S split selection; create a new cursor at each occurrence of pattern; takes regexp as a pattern
Selection
- v enters visual (selection) mode; this is how you can select multiple words - vwwww
- m enters match mode; this is how you can select paragraphs, text and LSP objects, and add, change and remove wrapping characters (like brackets and quotes); a few examples:
  - mi( - select inside parentheses
  - ma" - select everything inside double-quotes and the double-quotes themselves
  - ms[ - surround selection with square brackets
  - mr[( - replace the surrounding square brackets with parentheses
  - mif - select the body of a function (method)
- = / < / > - format / unindent / indent selection
- :pipe or | - send selection to an external program and replace selection with the output of said program
  - as of version 25.07 you can also use interpolation with %{}, for instance, %{cursor_line} or %{buffer_name}, so you can pass in the filename to the external tool
- % - select an entire buffer (file)
- <lead><c> / <lead><C> - comment/uncomment the selection
Go to
- gl / gh - go to end of line / start of line (contrary to $ and ^ in Vim)
- gg / ge - go to first line / last line of the file
- Ctrl+i / Ctrl+o - go forward / back in your jumplist (places where the cursor has been placed; jumplist itself is available with <lead>j)
- gd / gD - go to definition / declaration
- gr - go to references
Copying (yanking)
- "+y - copy to the OS clipboard (translates to "select the + register, which is the OS clipboard, and then yank the selection into it")

After a few weeks of working in Helix I was pleasantly surprised by how much out-of-the-box stuff it comes shipped with. And I did not have to spend a whole lot of time configuring it (aside from that week I spent developing flash.hx).

It checks most of my boxes too:

Recent files ✅ <leader>b
Go to declaration ✅ gd
Code completion ✅ C-x
Syntax-aware selection ✅ m<selection><see tooltip>
Find in files ✅ <leader>/ (simplified)
Context actions ✅ <leader>a (depends on LSP)
Find in current file ✅ /
Toggle comment ✅ <leader>c
Go to file ✅ <leader>f
Go to class ✅ <leader>S (capital S for workspace symbols, lowercase s for current file symbols) (partial)
Rename ✅ <leader>r
Find and replace in current file ⚠️ (unconventional): select scope (% for current buffer), then use s to select occurrences or S to split selections into multiple cursors, then use actions - i or c to change the selections, d to delete selections
Generate code ❌
Go to implementation ✅ gi (depends on LSP)
Introduce variable ❌
Change case ✅ (through external tool): select text, then | (pipe it), then specify external program (ccase -t snake, for instance)
Implement methods ❌

Kakoune

TBD

Final thoughts

I was surprised by how nice these seemingly limiting editors have become over the years (the last one I used was an over-configured Emacs back in circa 2015)!

Although rough around the edges and coming with massive pros and massive cons, both editors are being actively developed to become even nicer.

Neovim:

👍 comes with some quite nice things out of the box
👎 the out-of-the-box niceties need LSP to be plugged in, which comes through a plugin manager
👍 insane amount of plugins for all sorts of things
👎 a proportionally insane amount of time required to find and configure the plugins to make for a comfortable experience
👍 overall, the experience that you can create is in a league of its own
👍 has a nice file tree implementation (yes, I like to see the structure of the directory and its parents)
👍 has very nice semantic and LSP selection expand
👍 has an OG Flash implementation
👍 has nice surround behaviour
👍 file picker powered by ripgrep so you can filter all you want
👎 both selection expand and surround are a bit tricky to trigger (gsn followed by gss; ysaw" and cs[( and ds")

Helix:

👍 comes with a ton of nice things out of the box
👍 none of the niceties require extensive configuration (if any at all)
👎 very slow development cycle, so new features do not come out often (like twice a year, I believe)
👎 no plugin system, making the new features even less frequent or even possible
👎 very strongly opinionated developers, so if you are not comfortable with a feature - tough luck!
👎 does not have file tree
👎 file picker is very basic - only offers filtering by partial path match
👎 semantic selection expansion is not exactly expansion, but more like "select this exact scope"
👍 surround and semantic selection triggers are consistent, i.e. they make much more sense (ms(, mr([, md[; maf, maa)

My current stance is that Neovim is a much stronger contender with a much broader set of (much better implemented) features, but the amount of time you have to spend to get to that state is enormous. And whilst much more nicely organised out of the box, Helix is very much undercooked (in my opinion).

Hence, for powerful and actual workloads - use Neovim, it is worth spending all that time configuring it for that purpose. For casual, relaxing, but very restricted editing - Helix is an okay choice.

Strongly-typed front-end: experiment 2, simple application, in Gleam / Lustre

20 Dec 2024

import gleam/float
import gleam/option.{type Option, None, Some}
import gleam/string
import lustre
import lustre/attribute.{value}
import lustre/element.{text}
import lustre/element/html.{button, div, input, p, select}
import lustre/event.{on_click, on_input}

type Shape {
  Circle
  Square
}

type Msg {
  ShapeChanged(s: Option(Shape))
  ValueChanged(x: Float)
  CalculateArea
}

type State {
  State(shape: Option(Shape), value: Float, area: Option(Float))
}

const pi = 3.14

fn calculate_area(shape: Shape, x: Float) -> Float {
  case shape {
    Circle -> pi *. x *. x
    Square -> x *. x
  }
}

fn init(_flags) -> State {
  State(shape: None, value: 0.0, area: None)
}

fn update(model: State, msg: Msg) -> State {
  case msg {
    ShapeChanged(s) -> State(..model, shape: s)
    ValueChanged(v) -> State(..model, value: v)
    CalculateArea ->
      State(
        ..model,
        area: option.map(model.shape, fn(s) { calculate_area(s, model.value) }),
      )
  }
}

fn handle_value_change(s: String) -> Msg {
  case string.is_empty(s) {
    True -> ValueChanged(0.0)
    False ->
      case float.parse(s) {
        Ok(x) -> ValueChanged(x)
        _ -> ValueChanged(0.0)
      }
  }
}

fn handle_shape_change(s: String) -> Msg {
  case s {
    "circle" -> ShapeChanged(Some(Circle))
    "square" -> ShapeChanged(Some(Square))
    _ -> ShapeChanged(None)
  }
}

fn view(model: State) {
  div([], [
    select([on_input(handle_shape_change)], [
      html.option([value("")], "Select shape"),
      html.option([value("circle")], "Circle"),
      html.option([value("square")], "Square"),
    ]),
    input([value(float.to_string(model.value)), on_input(handle_value_change)]),
    button([on_click(CalculateArea)], [text("Calculate area")]),
    p([], [
      text(
        "Area: " <> option.unwrap(option.map(model.area, float.to_string), ""),
      ),
    ]),
  ])
}

pub fn main() {
  let app = lustre.simple(init, update, view)
  let assert Ok(_) = lustre.start(app, "#app", Nil)

  Nil
}

The resulting bundle is quite big, sitting at a whopping 65.9 KB.

Iterating a vector in C++

31 Oct 2024

Such a simple topic - iterating over a vector, is it even worth discussing?

Interestingly enough, there is a difference in how exactly you iterate - be it using iterators, for(:) sugar or plain old for(i=0; i<vec.size(); ++i).

Let us see what output a compiler produces in each of these cases.

sample1 (simple for loop):

std::vector<int> data{ 5, 3, 2, 1, 4 };

for (auto i = 0; i < data.size(); ++i) {
  moo(data[i]);
}

        mov     DWORD PTR [rbp-20], 0
        jmp     .L3
.L4:
        mov     eax, DWORD PTR [rbp-20]
        movsx   rdx, eax
        lea     rax, [rbp-64]
        mov     rsi, rdx
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::operator[](unsigned long)
        mov     eax, DWORD PTR [rax]
        mov     edi, eax
        call    moo(int)
        add     DWORD PTR [rbp-20], 1
.L3:
        mov     eax, DWORD PTR [rbp-20]
        movsx   rbx, eax
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::size() const
        cmp     rbx, rax
        setb    al
        test    al, al
        jne     .L4

but

sample2 (reversed for loop):

for (auto i = data.size() - 1; i >= 0; --i) {
  moo(data[i]);
}

        lea     rax, [rbp-64]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::size() const
        sub     rax, 1
        mov     QWORD PTR [rbp-24], rax
.L3:
        mov     rdx, QWORD PTR [rbp-24]
        lea     rax, [rbp-64]
        mov     rsi, rdx
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::operator[](unsigned long)
        mov     eax, DWORD PTR [rax]
        mov     edi, eax
        call    moo(int)
        sub     QWORD PTR [rbp-24], 1
        jmp     .L3

also

sample3 (foreach):

for (auto i : data) {
  moo(i);
}

        lea     rax, [rbp-80]
        mov     QWORD PTR [rbp-24], rax
        mov     rax, QWORD PTR [rbp-24]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::begin()
        mov     QWORD PTR [rbp-88], rax
        mov     rax, QWORD PTR [rbp-24]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::end()
        mov     QWORD PTR [rbp-96], rax
        jmp     .L3
.L4:
        lea     rax, [rbp-88]
        mov     rdi, rax
        call    __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator*() const
        mov     eax, DWORD PTR [rax]
        mov     DWORD PTR [rbp-28], eax
        mov     eax, DWORD PTR [rbp-28]
        mov     edi, eax
        call    moo(int)
        lea     rax, [rbp-88]
        mov     rdi, rax
        call    __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator++()
.L3:
        lea     rdx, [rbp-96]
        lea     rax, [rbp-88]
        mov     rsi, rdx
        mov     rdi, rax
        call    bool __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&)
        test    al, al
        jne     .L4
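
The calls to begin(), end(), operator!=, operator* and operator++ in the unoptimized listing above follow from how the language defines the range-based for: it is shorthand for an iterator loop. A rough sketch of that equivalence (the function names are made up for illustration and are not part of the original code):

```cpp
#include <vector>

// Hypothetical helper: sums the elements using the range-based for loop.
int sum_range_for(const std::vector<int>& data) {
  int total = 0;
  for (auto value : data) {
    total += value;
  }
  return total;
}

// The same loop spelled out with explicit iterators, roughly the way the
// range-based for is expanded by the compiler.
int sum_explicit_iterators(const std::vector<int>& data) {
  int total = 0;
  auto first = data.begin();
  auto last = data.end();  // end() is evaluated once, before the loop starts
  for (; first != last; ++first) {
    auto value = *first;  // copy of the element, like `auto value` above
    total += value;
  }
  return total;
}
```

Both functions visit the elements in the same order, so any difference between the two forms comes down to what the optimizer makes of them.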

but with -O1

sample1 (simple for loop):

        movabs  rax, 12884901893
        movabs  rdx, 4294967298
        mov     QWORD PTR [r12], rax
        mov     QWORD PTR [r12+8], rdx
        mov     DWORD PTR [r12+16], 4
        mov     QWORD PTR [rsp+8], rbp
        mov     rbx, r12
        jmp     .L10
.L20:
        add     rbx, 4
        cmp     rbp, rbx
        je      .L19
.L10:
        mov     edi, DWORD PTR [rbx]
        call    moo(int)
        jmp     .L20

sample2 (reversed for loop):

        movabs  rax, 12884901893
        movabs  rdx, 4294967298
        mov     QWORD PTR [rbp+0], rax
        mov     QWORD PTR [rbp+8], rdx
        mov     DWORD PTR [rbp+16], 4
        lea     rbx, [rbp+16]
        jmp     .L4
.L8:
        sub     rbx, 4
.L4:
        mov     edi, DWORD PTR [rbx]
        call    moo(int)
        jmp     .L8

sample3 (foreach):

        mov     r12, rax
        lea     rbp, [rax+20]
        mov     QWORD PTR [rsp+16], rbp
        movabs  rax, 12884901893
        movabs  rdx, 4294967298
        mov     QWORD PTR [r12], rax
        mov     QWORD PTR [r12+8], rdx
        mov     DWORD PTR [r12+16], 4
        mov     QWORD PTR [rsp+8], rbp
        mov     rbx, r12
        jmp     .L10
.L20:
        add     rbx, 4
        cmp     rbx, rbp
        je      .L19
.L10:
        mov     edi, DWORD PTR [rbx]
        call    moo(int)
        jmp     .L20

Interesting how iterating forwards adds an extra boundary check and a jump (cmp rbx, rbp and je .L19 in samples #1 and #3), whereas iterating backwards does not. Bear in mind, though, that in sample2 the index is unsigned (auto deduces the type of data.size() - 1), so i >= 0 is always true: the compiler drops the comparison altogether, and the loop as written has no exit condition.

But this finding must come with one pretty big caveat: the cache lines. The number of assembly instructions is a pretty poor measure of performance - after all, CPU instructions are ridiculously fast compared to any form of IO - specifically memory access.

The code is also available on GitHub: [https://github.com/shybovycha/iterate-over-vector-in-cpp/]

The benchmarking results seem to support the assumption: backwards iteration seems to be faster:
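
Because data.size() returns an unsigned type, a backward index loop cannot use i >= 0 as its exit condition. A minimal sketch of one common way to write it so that it terminates; visit_backwards and the visited vector are hypothetical stand-ins for the loop around moo:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: walks `data` from the last element to the first and
// returns the elements in the order they were visited.
std::vector<int> visit_backwards(const std::vector<int>& data) {
  std::vector<int> visited;
  visited.reserve(data.size());
  // `i` counts down from size() to 1 and the element is read at i - 1,
  // so the unsigned index never has to go below zero.
  for (std::size_t i = data.size(); i > 0; --i) {
    visited.push_back(data[i - 1]);
  }
  return visited;
}
```

Written this way the loop also handles an empty vector, where data.size() - 1 would wrap around to a huge value.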

Test	Min	Max	50%	90%	95%	delta
sample1 (`++i`)	1325447	3762733	1339309	1416805	1429724	+7%
sample2 (`--i`)	1290571	2596630	1308953	1334955	1342008	0%
sample3 (`for(:)`)	1438847	3034474	1460698	1506001	1508657	+11%

Out of curiosity, I also benchmarked the postfix operator against the prefix one: i++ (postfix) instead of ++i (prefix). In these measurements the postfix operator came out slightly faster:

Test	Min	Max	50%	90%	95%	delta
sample1 prefix (`++i`)	1325447	3762733	1339309	1416805	1429724	+11%
sample1 postfix (`i++`)	1292195	4155596	1308721	1370542	1382136	+7%
sample2 prefix (`--i`)	1290571	2596630	1308953	1334955	1342008	+5%
sample2 postfix (`i--`)	1256929	2598426	1267088	1277779	1282787	0%

Interestingly enough, there is some difference:

backward iteration with postfix decrement for (auto i=vec.size()-1; i>0; i--) is the fastest
backward iteration with prefix decrement --i is 5% slower than i--
forward iteration with postfix increment i++ is:
- 3% slower than --i
- 7% slower than i--
forward iteration with prefix increment ++i is the slowest:
- 11% slower than i--
- 7% slower than --i
- 4% slower than i++

The above tests were run 10000 times on a heap-allocated vector (std::vector) of 10000 int elements.
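
The actual benchmark code lives in the repository linked earlier; what follows is not that code, but a minimal sketch of how the Min / Max / 50% / 90% / 95% columns could be produced. It assumes std::chrono::steady_clock timings in nanoseconds and a nearest-rank percentile; the names percentile, Summary and time_runs are hypothetical:

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile of an already sorted, non-empty sample set.
// `p` is expected to be in the range (0, 100].
long long percentile(const std::vector<long long>& sorted, double p) {
  const auto rank = static_cast<std::size_t>(
      std::ceil(p / 100.0 * static_cast<double>(sorted.size())));
  return sorted[std::max<std::size_t>(rank, 1) - 1];
}

struct Summary {
  long long min, max, p50, p90, p95;
};

// Runs `body` `runs` times (runs must be greater than zero) and summarises
// the wall-clock duration of each run, measured in nanoseconds.
template <typename F>
Summary time_runs(std::size_t runs, F&& body) {
  std::vector<long long> samples;
  samples.reserve(runs);
  for (std::size_t r = 0; r < runs; ++r) {
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
            .count());
  }
  std::sort(samples.begin(), samples.end());
  return {samples.front(), samples.back(), percentile(samples, 50),
          percentile(samples, 90), percentile(samples, 95)};
}
```

A call such as time_runs(10000, [&] { for (auto x : data) moo(x); }) would then yield one row of the table. Reporting percentiles rather than an average keeps a few slow outlier runs (visible in the Max column) from skewing the comparison.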

Standard library iterators

Out of sheer curiosity, I decided to test how using an iterator would affect the results (sample4):

for (auto it = a.begin(); it != a.end(); ++it) {
  moo(*it);
}

With both prefix and suffix (postfix) variations:

for (auto it = a.begin(); it != a.end(); it++) {
  moo(*it);
}

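And the reverse direction, still with a regular iterator (sample5):

for (auto it = a.end(); it != a.begin();) {
  --it; // step back before dereferencing: a.end() does not point at an element
  moo(*it);
}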
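
The standard library also provides dedicated reverse iterators through rbegin() and rend(), which walk the container backwards while still being advanced with ++. This variant was not part of the benchmark, so the sketch below only illustrates the traversal; collect_reversed is a hypothetical helper:

```cpp
#include <vector>

// Hypothetical helper: collects the elements of `a` from last to first
// using the standard reverse iterators.
std::vector<int> collect_reversed(const std::vector<int>& a) {
  std::vector<int> out;
  out.reserve(a.size());
  // rbegin() refers to the last element and rend() to the position before
  // the first one, so ++it moves towards the front of the vector.
  for (auto it = a.rbegin(); it != a.rend(); ++it) {
    out.push_back(*it);
  }
  return out;
}
```

With reverse iterators there is no need to decrement before dereferencing, and the first element is not skipped.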

Test	Min	Max	50%	90%	95%	delta
sample1 prefix (`++i`)	1325447	3762733	1339309	1416805	1429724	+11%
sample1 postfix (`i++`)	1292195	4155596	1308721	1370542	1382136	+7%
sample2 prefix (`--i`)	1290571	2596630	1308953	1334955	1342008	+4%
sample2 postfix (`i--`)	1256929	2598426	1267088	1277779	1282787	0%
sample3 (`for(:)`)	1438847	3034474	1460698	1506001	1508657	+17%
sample4 prefix (`++it`)	1470583	2834386	1495310	1561023	1564850	+22%
sample4 postfix (`it++`)	1461634	2695552	1535905	1547865	1556815	+21%
sample5 prefix (`--it`)	1462156	2802714	1470567	1478957	1512621	+18%
sample5 postfix (`it--`)	1447920	2831859	1471334	1479670	1490088	+16%

The results are a bit surprising, looking at the assembly code generated by G++:

.L25:
        cmp     rbx, r13
        je      .L10
        mov     rbp, r13
        jmp     .L11
.L27:
        add     rbp, 4
        cmp     rbx, rbp
        je      .L26
.L11:
        mov     edi, DWORD PTR [rbp+0]
        call    moo(int)
        jmp     .L27
.L26:
        test    r13, r13
        je      .L18
        mov     rsi, r12
        sub     rsi, r13
.L15:
        mov     rdi, r13
        call    operator delete(void*, unsigned long)
        ; ... cleanup & exit ...
        ret

Clang 17.0.1 is a slightly different story:

.LBB0_1:
        cmp     r14, rax
        jne     .LBB0_2
        sub     r14, r15
        movabs  rax, 9223372036854775804
        cmp     r14, rax
        mov     qword ptr [rsp], r15
        je      .LBB0_11
        mov     rbp, r14
        sar     rbp, 2
        cmp     rbp, 1
        mov     rax, rbp
        adc     rax, 0
        lea     rdx, [rax + rbp]
        mov     rcx, r12
        cmp     rdx, r12
        jbe     .LBB0_14
        add     rax, rbp
        jae     .LBB0_16
.LBB0_17:
        test    r12, r12
        je      .LBB0_18
.LBB0_19:
        lea     rdi, [4*r12]
        call    operator new(unsigned long)@PLT
        mov     r15, rax
        jmp     .LBB0_21
.LBB0_14:
        mov     rcx, rdx
        add     rax, rbp
        jb      .LBB0_17
.LBB0_16:
        mov     r12, rcx
        test    r12, r12
        jne     .LBB0_19
.LBB0_18:
        xor     r15d, r15d
.LBB0_21:
        mov     dword ptr [r15 + 4*rbp], r13d
        test    r14, r14
        mov     rbx, qword ptr [rsp]
        jle     .LBB0_23
        mov     rdi, r15
        mov     rsi, rbx
        mov     rdx, r14
        call    memmove@PLT
.LBB0_23:
        test    rbx, rbx
        je      .LBB0_25
        mov     rdi, rbx
        call    operator delete(void*)@PLT
.LBB0_25:
        lea     rbp, [r15 + 4*rbp]
        lea     rax, [r15 + 4*r12]
        movabs  r12, 2305843009213693951
        jmp     .LBB0_26
.LBB0_3:
        cmp     r15, r14
        je      .LBB0_7
        lea     r14, [r15 - 4]
.LBB0_5:
        mov     edi, dword ptr [r14 + 4]
        call    moo(int)@PLT
        add     r14, 4
        cmp     r14, rbp
        jne     .LBB0_5
.LBB0_7:
        test    r15, r15
        je      .LBB0_9
        mov     rdi, r15
        call    operator delete(void*)@PLT
.LBB0_9:
        xor     eax, eax
        add     rsp, 8
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        pop     rbp
        ret
.LBB0_11:
        lea     rdi, [rip + .L.str.1]
        call    std::__throw_length_error(char const*)@PLT
        jmp     .LBB0_30
        jmp     .LBB0_30
        mov     qword ptr [rsp], r15
.LBB0_30:
        mov     r14, rax
        cmp     qword ptr [rsp], 0
        je      .LBB0_32
        mov     rdi, qword ptr [rsp]
        call    operator delete(void*)@PLT
.LBB0_32:
        mov     rdi, r14
        call    _Unwind_Resume@PLT

Caching impact

One other assumption is that iterating over a vector backwards hurts memory caching. This is rather complex to test, since there are two ways this could happen:

swap memory usage - happens when the dataset is so big it does not fit in the available RAM and OS has to switch to using disk instead of RAM; this is the worst-case scenario
data locality / CPU cache usage - CPUs usually try to predict memory access patterns and pre-load more data than actually required by a given operation; when the data block is accessed at random points (indices), CPU fails to pre-load the correct slices of memory

Since this blog is about iterating over a vector, it would seem easier to generate absurdly large chunks of data and try to iterate over them. But in reality it would only happen with absurdly large files - my machine, for instance, sports 32GB of RAM, so it would be quite a task to generate multiple 32GB files.

Simulating random memory access would defeat the purpose of iterating over the elements one by one - in these cases we either use iterators to access a linked list (each element is in a random part of RAM) or access elements by index. And there is no way to iterate over them by incrementing an index. Alternatively, we have to dereference indexes from another block of memory. Either way, it has nothing to do with comparing for loops.

Simulating accessing multiple cache lines could actually be easier - we just need to replace integers with structures of variable size and use std::list (linked list) instead of std::vector (single block of memory):

struct MyStruct {
  bool f0; // 1 byte
  float f1[100]; // sizeof(float) * 100 = 4 * 100 = 400 bytes
  double f2[53]; // sizeof(double) * 53 = 8 * 53 = 424 bytes
  char f3[64]; // sizeof(char) * 64 = 1 * 64 = 64 bytes
}; // 889 bytes of fields (sizeof is slightly larger due to alignment padding), likely to span across multiple cache lines

std::list<MyStruct> a;

and generate a large enough number of these objects:

#include <limits>
#include <list>
#include <random>

std::random_device rd;
std::mt19937_64 gen(rd());

std::uniform_int_distribution<int> bool_dist(0, 1);
std::uniform_real_distribution<float> float_dist(0.0f, 1000.0f);
std::uniform_int_distribution<int> int_dist(std::numeric_limits<int>::min(), std::numeric_limits<int>::max());

for (auto i = 0; i < 10000; ++i) {
  MyStruct e;

  e.f0 = static_cast<bool>(bool_dist(gen));

  for (auto t = 0; t < 100; ++t)
    e.f1[t] = float_dist(gen);

  for (auto t = 0; t < 53; ++t)
    e.f2[t] = static_cast<double>(float_dist(gen));

  for (auto t = 0; t < 64; ++t)
    e.f3[t] = static_cast<char>(int_dist(gen) % 255);

  a.push_back(e);
}

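To prevent memory leaks, clean up the test data at the end. This only applies when the fields are heap-allocated pointers, as in the reorganized MyStruct further down; the fixed-size arrays above must not be deleted: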

for (auto& item : a) {
  delete item.f3;
}

a.clear();

These benchmarks take significantly longer to run and produce bigger numbers:

Test	Min	Max	50%	90%	95%	delta
sample1 (`++i`)	7882057	15385569	7948012	8004355	8031673	0%
sample2 (`--i`)	7915870	16297070	8971293	9109493	9151567	+13%
sample3 (`for(:)`)	8094894	16496526	8159004	8229138	8259364	+3%

Compare this to a structure which actually fits in one cache line. Cache line size is apparently 64 B - that is sixty-four bytes - on most CPUs (AMD Zen, Intel Ice Lake). My Apple M1 laptop has twice as much, 128 B, as shown by running sysctl -a | grep 'hw.cachelinesize'.

If the structure was reorganized to fit in that limit, like so:

struct MyStruct {
  bool f0; // 1 byte
  float* f1; // sizeof(float*) = 8 bytes
  double* f2; // sizeof(double*) = 8 bytes
  char* f3; // sizeof(char*) = 8 bytes
}; // 25 bytes of fields (32 with alignment padding on a 64-bit platform), likely to fit in a single cache line

The timings are considerably lower for the backwards iteration:

Test	Min	Max	50%	90%	95%	delta
sample1 (`++i`)	7575860	20005185	7640651	7700057	7725933	0%
sample2 (`--i`)	7466816	18068148	7657415	8109531	8208603	+6%
sample3 (`for(:)`)	7832724	14264212	7882930	7932295	7960718	+3%

This highlights how predictive cache loading and data element size that fits into a single CPU cache line actually impacts RAM reads - by a significant margin.

Conclusion

In the simplest case (a vector of numbers), the worst-case scenario is within an 11% difference, with backwards iteration being the fastest, followed by forward iteration, and foreach being the slowest of the bunch. Postfix decrement in the backwards iteration is the fastest. Bear in mind: these numbers are in nanoseconds, meaning the worst-case scenario is 10000 (ten thousand) elements being iterated in 1.5 ms and the fastest being 1.3 ms - that is milliseconds. For comparison, rendering a frame at a 120 FPS rate would take about 8.3 ms. And if you were to process some requests (in a client-server system), you would be able to handle 699 requests per second while iterating over this list. So realistically, for a vector of integers, the choice between these techniques only really matters if you are working on a really high-performance system.

That being said, this is all true while the vector elements fit in one CPU cache line or cache block, that is L1 and L2 caches.

So if the data element does not fit in a block of 64 B (or 128 B on some CPUs), the performance impact of backwards iteration is going to be more significant. In this case we are talking about 8 ms vs 9.1 ms. In terms of FPS, the difference would be 125 FPS vs 109 FPS. Or about 15 requests per second more.
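
The back-of-the-envelope conversions in this conclusion can be reproduced with a few lines of arithmetic. The sketch below is plain JavaScript rather than C++ (it only serves as a calculator here); the helper names are made up for illustration, and the inputs are the 95% value of sample1 from the first table and the rounded timings quoted above.

```javascript
// Hypothetical helpers, not part of the benchmark itself.
const NS_PER_MS = 1e6;
const NS_PER_SECOND = 1e9;

// How many full passes over the data fit in one second.
const passesPerSecond = (nanoseconds) => Math.floor(NS_PER_SECOND / nanoseconds);

// Time budget for a single frame at a given frame rate, in milliseconds.
const frameBudgetMs = (fps) => 1000 / fps;

// Frame rate achievable if one frame costs `ms` milliseconds.
const framesPerSecond = (ms) => Math.floor(1000 / ms);

const sample1P95 = 1429724; // ns, 95% column of sample1

console.log((sample1P95 / NS_PER_MS).toFixed(2)); // the timing in milliseconds
console.log(passesPerSecond(sample1P95)); // passes (or "requests") per second
console.log(frameBudgetMs(120).toFixed(1)); // frame budget at 120 FPS, in ms
console.log(framesPerSecond(8), framesPerSecond(9.1)); // 8 ms vs 9.1 ms per frame
```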

Gantt chart. Part 4

27 Jun 2024

Seems like every two years or so I hop on my Gantt chart implementation and rework it completely.

The last few attempts (rev. 1, rev. 2, rev. 3) were alright, but I was never quite satisfied with the implementation - be it SVG, which takes a toll on the browser and has quite limited customization, or the Canvas API, which is fast but has the same limited customization.

With the recent introduction of grid layouts in CSS, now supported in all browsers, it seems like a perfect time to revisit the old implementations once again:

CodeSandbox / live demo

This revision now has proper horizontal scrolling on the panel with bars - meaning the labels on the left panel stay in place whilst the right panel is scrollable. Moreover, the chart now relies on pure HTML and CSS (being rendered with React though), making it possible to use rich markup inside the bars and labels.

Implementation steps

The data for the tests is going to look like this:

export const data = [
  {
    id: 1,
    name: "epic 1"
  },
  {
    id: 2,
    name: "epic 2"
  },
  {
    id: 3,
    name: "epic 3"
  },
  {
    id: 4,
    name: "story 1",
    parent: 1
  },
  {
    id: 5,
    name: "story 2",
    parent: 1
  },
  {
    id: 6,
    name: "story 3",
    parent: 1
  },
  {
    id: 7,
    name: "story 4",
    parent: 2
  },
  {
    id: 8,
    name: "story 5",
    parent: 2
  },
  {
    id: 9,
    name: "lorem ipsum dolor atata",
    parent: 5
  },
  {
    id: 10,
    name: "task 2",
    parent: 5
  }
];

The main component, <Gantt>, initially was implemented as follows:

import React, { useMemo } from "react";

import style from "./gantt.module.css";

const LeftPaneRow = ({ id, name }) => {
  return <div className={style.row}>{name}</div>;
};

const LeftPane = ({ items }) => {
  return (
    <div className={style.left_pane}>
      <div className={style.left_pane_header}>/</div>

      <div className={style.left_pane_rows}>
        {items.map((item) => (
          <LeftPaneRow key={item.id} {...item} />
        ))}
      </div>
    </div>
  );
};

const RightPaneRow = ({ id, name }) => {
  return (
    <div className={style.row}>
      <div className={style.entry} style={{ left: 0 }}>
        {id}
      </div>
    </div>
  );
};

const RightPane = ({ items }) => {
  return (
    <div className={style.right_pane}>
      <div className={style.right_pane_header}>...scale...</div>
      <div className={style.right_pane_rows}>
        {items.map((item) => (
          <RightPaneRow key={item.id} {...item} />
        ))}
      </div>
    </div>
  );
};

export const flattenTree = (items) => {
  const queue = [];

  items.filter(({ parent }) => !parent).forEach((item) => queue.push(item));

  const result = [];
  const visited = new Set();

  while (queue.length > 0) {
    const item = queue.shift();

    if (visited.has(item.id)) {
      continue;
    }

    result.push(item);
    visited.add(item.id);

    items
      .filter((child) => child.parent === item.id)
      .forEach((child) => queue.unshift(child));
  }

  return result;
};

export const Gantt = ({ items }) => {
  const itemList = useMemo(() => flattenTree(items), [items]);

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} />
    </div>
  );
};

The core of the proper representation of this diagram is the CSS:

.gantt {
  display: grid;
  grid-template: 1fr / auto 1fr;
  grid-template-areas: "left right";
  width: 100%;
}

.gantt .left_pane {
  display: grid;
  grid-area: left;
  border-right: 1px solid #bbb;
  grid-template: auto 1fr / 1fr;
  grid-template-areas: "corner" "rows";
}

.gantt .left_pane .left_pane_rows {
  display: grid;
  grid-area: rows;
}

.gantt .left_pane .left_pane_header {
  display: grid;
  grid-area: corner;
}

.gantt .right_pane {
  display: grid;
  grid-template: auto 1fr / 1fr;
  grid-template-areas: "scale" "rows";
  grid-area: right;
  overflow: auto;
}

.gantt .right_pane .right_pane_rows {
  width: 10000px; /*temp*/
  display: grid;
  grid-area: rows;
}

.gantt .right_pane .right_pane_header {
  display: flex;
  grid-area: scale;
}

.gantt .row {
  height: 40px;
  align-items: center;
  display: flex;
}

.gantt .right_pane .row {
  position: relative;
}

.gantt .right_pane .row .entry {
  position: absolute;
  background: #eeeeee;
  padding: 0.1rem 0.5rem;
  border-radius: 0.4rem;
}

Split into two panels, right is scrollable

Good, we now have two panels with items aligned in rows and the right panel being scrollable if it gets really long. Next thing, position: absolute is absolutely disgusting - we use grid layout already! Instead, split each row into the same number of columns using grid and position the elements in there:

const RightPaneRow = ({ id, name, columns, start, end }) => {
  const gridTemplate = `auto / repeat(${columns}, 1fr)`;
  const gridArea = `1 / ${start} / 1 / ${end}`;

  return (
    <div
      className={style.row}
      style={{
        gridTemplate,
      }}
    >
      <div
        className={style.entry}
        style={{
          gridArea,
        }}
      >
        {id}
      </div>
    </div>
  );
};

and clean up the CSS a bit (like removing the position: absolute and reducing the width from 10000px down to 1000px):

.gantt .right_pane .right_pane_rows {
  width: 1000px; /*temp*/
  display: grid;
  grid-area: rows;
}

.gantt .row {
  height: 40px;
  align-items: center;
  display: grid;
}

.gantt .right_pane .row {
  position: relative;
}

.gantt .right_pane .row .entry {
  background: #eeeeee;
  padding: 0.1rem 0.5rem;
  border-radius: 0.4rem;
}
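
One detail worth spelling out: the numbers in grid-area refer to grid lines, not columns. The value reads row-start / column-start / row-end / column-end, lines are numbered from 1, and the end line is exclusive. A minimal sketch of the same string-building logic as plain functions (the names gridTemplateFor, gridAreaFor and columnSpan are made up for this illustration):

```javascript
// Builds the value used for the row's `grid-template`:
// one auto-sized row split into `columns` equal columns.
const gridTemplateFor = (columns) => `auto / repeat(${columns}, 1fr)`;

// Builds the value used for the entry's `grid-area`:
// row-start / column-start / row-end / column-end (all grid line numbers).
const gridAreaFor = (start, end) => `1 / ${start} / 1 / ${end}`;

// Number of columns an entry covers: the end line is exclusive,
// so an entry placed between lines 2 and 4 covers columns 2 and 3.
const columnSpan = (start, end) => end - start;

console.log(gridTemplateFor(12));
console.log(gridAreaFor(2, 4));
console.log(columnSpan(2, 4));
```

This also means that a chart with N columns has N + 1 column lines, so an entry reaching the very end of the chart uses end = N + 1.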

Now, let's position the elements in each row using the column index:

const RightPanelRowEntry = ({ id, start, end, children }) => {
  const gridArea = `1 / ${start} / 1 / ${end}`;

  return (
    <div
      className={style.entry}
      style={{
        gridArea,
      }}
    >
      {children}
    </div>
  );
};

const RightPaneRow = ({ columns, children }) => {
  const gridTemplate = `auto / repeat(${columns}, 1fr)`;

  // the row only defines the columns; entries are passed in as children
  return (
    <div
      className={style.row}
      style={{
        gridTemplate,
      }}
    >
      {children}
    </div>
  );
};

const RightPaneHeaderRow = ({ columns, children }) => {
  const gridTemplate = `auto / repeat(${columns}, 1fr)`;

  return (
    <div
      className={style.right_pane_header_row}
      style={{
        gridTemplate,
      }}
    >
      {children}
    </div>
  );
};

const RightPaneHeader = ({ children }) => {
  return <div className={style.right_pane_header}>{children}</div>;
};

const RightPane = ({ items, columns }) => {
  const columnHeaders = [...Array(columns)].map((_, idx) => (
    <RightPaneHeader key={idx}>{idx + 1}</RightPaneHeader>
  ));

  const rows = items.map((item) => (
    <RightPaneRow key={item.id} columns={columns}>
      <RightPanelRowEntry {...item}>{item.id}</RightPanelRowEntry>
    </RightPaneRow>
  ));

  return (
    <div className={style.right_pane}>
      <RightPaneHeaderRow columns={columns}>{columnHeaders}</RightPaneHeaderRow>
      <div className={style.right_pane_rows}>{rows}</div>
    </div>
  );
};

And add corresponding new CSS styles:

.gantt .right_pane .right_pane_header_row {
  display: grid;
  grid-area: scale;
}

.gantt .right_pane .right_pane_header_row .right_pane_header {
  display: grid;
  align-items: center;
  text-align: center;
}

This requires start and end defined for each entry:

export const data = [
  {
    id: 1,
    name: "epic 1",
    start: 1,
    end: 12,
  },
  {
    id: 2,
    name: "epic 2",
    start: 2,
    end: 4,
  },
  {
    id: 3,
    name: "epic 3",
    start: 9,
    end: 11,
  },
  {
    id: 4,
    name: "story 1",
    parent: 1,
    start: 6,
    end: 7,
  },
  // ...
];

And, to avoid repeating a dozen inline CSS styles, we can utilize CSS variables:

const RightPaneRow = ({ id, columns, children }) => {
  return (
    <div className={style.row}>
      {children}
    </div>
  );
};

const RightPanelRowEntry = ({ id, start, end, children }) => {
  return (
    <div
      className={style.entry}
      style={{
        "--col-start": start,
        "--col-end": end,
      }}
    >
      {children}
    </div>
  );
};

const RightPane = ({ items, columns }) => {
  const columnHeaders = [...Array(columns)].map((_, idx) => (
    <RightPaneHeader key={idx}>{idx + 1}</RightPaneHeader>
  ));

  const rows = items.map((item) => (
    <RightPaneRow key={item.id} columns={columns}>
      <RightPanelRowEntry {...item}>{item.id}</RightPanelRowEntry>
    </RightPaneRow>
  ));

  return (
    <div className={style.right_pane} style={{ "--columns": columns }}>
      <RightPaneHeaderRow>{columnHeaders}</RightPaneHeaderRow>
      <div className={style.right_pane_rows}>{rows}</div>
    </div>
  );
};

We can also re-use the same row for header:

const RightPaneHeaderRow = ({ children }) => {
  return <div className={style.right_pane_header_row}>{children}</div>;
};

And corresponding CSS:

.gantt .right_pane .right_pane_header_row {
  display: grid;
  grid-area: scale;

  grid-template: auto / repeat(var(--columns, 1), 1fr);
}

.gantt .right_pane .row {
  position: relative;

  grid-template: auto / repeat(var(--columns, 1), 1fr);
}

.gantt .right_pane .row .entry {
  background: #eeeeee;
  padding: 0.1rem 0.5rem;
  border-radius: 0.5rem;
  align-items: center;
  text-align: center;

  grid-area: 1 / var(--col-start, 1) / 1 / var(--col-end, 1);
}

I like to also change the fonts, since the default sans-serif just looks terrible:

@import url("https://fonts.googleapis.com/css2?family=Assistant:wght@200..800&display=swap");

:root {
  font-family: "Assistant", sans-serif;
  font-optical-sizing: auto;
  font-weight: 300;
  font-style: normal;
  font-variation-settings: "wdth" 100;
}

And maybe add some grid lines for the rows:

.gantt .row:first-child {
  border-top: 1px solid var(--border-color, #eee);
}

.gantt .row {
  padding: 0 0.75rem;
  border-bottom: 1px solid var(--border-color, #eee);
}

Now let's add some padding to separate parent and child items of a chart:

const LeftPaneRow = ({ level, id, name }) => {
  const nestingPadding = `${level}rem`;

  return (
    <div className={style.row} style={{ "--label-padding": nestingPadding }}>
      {name}
    </div>
  );
};

.gantt .left_pane .row {
  padding-left: var(--label-padding, 0);
}

and fill out the level property when flattening the item tree:

export const flattenTree = (items) => {
  const queue = [];

  items
    .filter(({ parent }) => !parent)
    .forEach((item) => queue.push({ level: 0, item }));

  const result = [];
  const visited = new Set();

  while (queue.length > 0) {
    const { level, item } = queue.shift();

    if (visited.has(item.id)) {
      continue;
    }

    result.push({ ...item, level });
    visited.add(item.id);

    items
      .filter((child) => child.parent === item.id)
      .forEach((child) => queue.unshift({ item: child, level: level + 1 }));
  }

  return result;
};
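
One thing to note about the queue-based version above: because children are added with unshift() one at a time, siblings come out in reverse order compared to the input (for the sample data, story 3 is listed before story 1). If the original order matters, a recursive depth-first walk is one alternative. This is a sketch under that assumption, not the implementation used in the demo; flattenTreeInOrder and the shortened sample data are made up for illustration:

```javascript
// Depth-first flattening that keeps siblings in their input order.
// Each returned item gets a `level` (0 for roots), like in the version above.
const flattenTreeInOrder = (items) => {
  // Group children by parent id once, instead of filtering on every step.
  const childrenOf = new Map();

  for (const item of items) {
    const key = item.parent ?? null; // roots have no `parent`
    if (!childrenOf.has(key)) {
      childrenOf.set(key, []);
    }
    childrenOf.get(key).push(item);
  }

  const result = [];
  const visited = new Set(); // guards against duplicate ids in malformed data

  const visit = (item, level) => {
    if (visited.has(item.id)) {
      return;
    }
    visited.add(item.id);
    result.push({ ...item, level });

    for (const child of childrenOf.get(item.id) ?? []) {
      visit(child, level + 1);
    }
  };

  for (const root of childrenOf.get(null) ?? []) {
    visit(root, 0);
  }

  return result;
};

const sample = [
  { id: 1, name: "epic 1" },
  { id: 2, name: "epic 2" },
  { id: 4, name: "story 1", parent: 1 },
  { id: 5, name: "story 2", parent: 1 },
  { id: 9, name: "task 1", parent: 5 },
];

// Prints the ids with their nesting level, e.g. "4@1" for story 1.
console.log(flattenTreeInOrder(sample).map(({ id, level }) => `${id}@${level}`));
```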

And automate the number of columns calculation:

export const Gantt = ({ items }) => {
  const itemList = flattenTree(items);

  const startsAndEnds = items.flatMap(({ start, end }) => [start, end]);
  const columns = Math.max(...startsAndEnds) - Math.min(...startsAndEnds);

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} columns={columns} />
    </div>
  );
};
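
A caveat with this calculation: if any item lacks start or end, Math.max() and Math.min() receive undefined and the result is NaN. A minimal sketch of the same idea as a standalone function that skips such items (countColumns is a hypothetical name, and falling back to a single column for an empty chart is an arbitrary choice made here):

```javascript
// Column count = distance between the smallest and the largest grid line
// referenced by any item. Items without numeric `start`/`end` are ignored.
const countColumns = (items) => {
  const lines = items
    .flatMap(({ start, end }) => [start, end])
    .filter((value) => Number.isFinite(value));

  if (lines.length === 0) {
    return 1; // assumption: an empty chart still renders one column
  }

  return Math.max(...lines) - Math.min(...lines);
};

console.log(
  countColumns([
    { id: 1, start: 1, end: 12 },
    { id: 2, start: 2, end: 4 },
    { id: 3 }, // no start/end yet
  ])
);
```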

In order to make chart panel scrollable, one can set a width CSS property for the .right_pane_rows and .right_pane_header_row:

.gantt .right_pane .right_pane_rows {
  width: 2000px;
}

.gantt .right_pane .right_pane_header_row {
  width: 2000px;
}

The last bit for this prototype would be to have a scale for the columns.

Assume a chart item has abstract start and end fields - these could be dates or some domain-specific numbers (like a week in a quarter or a sprint, etc.). Those will then need to be mapped onto column indices. Then the chart width (in columns) would be the difference between the smallest start value and the biggest end value:

export const Gantt = ({ items, scale }) => {
  const itemList = flattenTree(items).map((item) => ({
    ...item,
    ...scale(item), // assuming `scale` function returns an object { start: number; end: number }
  }));

  const minStartItem = minBy(itemList, (item) => item.start);
  const maxEndItem = maxBy(itemList, (item) => item.end);

  const columns = maxEndItem.end - minStartItem.start;

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} columns={columns} />
    </div>
  );
};

The minBy and maxBy helper functions could be either taken from lodash or manually defined like this:

const minBy = (items, selector) => {
  if (items.length === 0) {
    return undefined;
  }

  let minIndex = 0;

  items.forEach((item, index) => {
    if (selector(item) < selector(items[minIndex])) {
      minIndex = index;
    }
  });

  return items[minIndex];
}
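
Only minBy is spelled out above; maxBy can be written the same way with the comparison flipped. A sketch (lodash's own maxBy may behave differently in edge cases, for example with non-numeric values):

```javascript
// Returns the item for which `selector` yields the largest value,
// or undefined for an empty array (mirrors minBy above).
const maxBy = (items, selector) => {
  if (items.length === 0) {
    return undefined;
  }

  let maxIndex = 0;

  items.forEach((item, index) => {
    if (selector(item) > selector(items[maxIndex])) {
      maxIndex = index;
    }
  });

  return items[maxIndex];
};

const bars = [
  { id: 1, end: 12 },
  { id: 2, end: 4 },
  { id: 3, end: 11 },
];

console.log(maxBy(bars, (bar) => bar.end)); // the item with the largest `end`
console.log(maxBy([], (bar) => bar.end)); // undefined for an empty list
```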

For better navigation around this code we can add some types:

interface GanttChartItem {
  id: number; // ids in the sample data are numbers
  name: string;
  parent?: number;
}

interface GanttChartProps {
  items: GanttChartItem[];
  scale: (item: GanttChartItem) => { start: number; end: number };
}

function minBy<T>(items: T[], selector: (item: T) => number): T | undefined {
  // ...
}

export const Gantt = ({ items, scale }: GanttChartProps) => {
  // ...
};

export default function App() {
  const scale = ({ start, end }) => {
    return { start: start * 2, end: end * 2 };
  };

  return <Gantt items={data} scale={scale} />;
}

We can extend this even further by adding an API to provide labels for columns:

interface GanttChartProps {
  // ...
  scaleLabel: (column: number) => React.ReactNode;
}

export const Gantt = ({ items, scale, scaleLabel }: GanttChartProps) => {
  // ...

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} columns={columns} scaleLabel={scaleLabel} />
    </div>
  );
};


const RightPane = ({ items, columns, scaleLabel }) => {
  const columnHeaders = [...Array(columns)].map((_, idx) => (
    <RightPaneHeader key={idx}>{scaleLabel(idx)}</RightPaneHeader>
  ));

  // ...
};

export default function App() {
  const scale = ({ start, end }) => ({ start, end });

  const scaleLabel = (col) => `${col}`;

  return <Gantt items={data} scale={scale} scaleLabel={scaleLabel} />;
}

This new API can then be utilized to show month names, for instance:

export default function App() {
  const scale = ({ start, end }) => {
    return { start, end };
  };

  const months = [
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep",
    "Oct",
    "Nov",
    "Dec",
  ];

  const scaleLabel = (col) => months[col % 12];

  return <Gantt items={data} scale={scale} scaleLabel={scaleLabel} />;
}
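
If the items carry real dates rather than column numbers, the scale function can do the conversion. A sketch, assuming ISO date strings in hypothetical startDate and endDate fields, an inclusive end month, and a chart whose first column is January 2024; none of these names or choices come from the demo:

```javascript
// Months elapsed since year 0, so that differences are easy to compute.
const monthIndex = (isoDate) => {
  const date = new Date(isoDate); // date-only ISO strings are parsed as UTC
  return date.getUTCFullYear() * 12 + date.getUTCMonth();
};

// Assumption: the first column of the chart is January 2024.
const chartOrigin = monthIndex("2024-01-01");

// Grid lines are 1-based, hence the `+ 1` for the start line. The end line
// is exclusive, so an item ending in June needs the line after the June column.
const scale = ({ startDate, endDate }) => ({
  start: monthIndex(startDate) - chartOrigin + 1,
  end: monthIndex(endDate) - chartOrigin + 2,
});

const months = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

// Same label function as above: receives a 0-based column index.
const scaleLabel = (col) => months[col % 12];

console.log(scale({ startDate: "2024-03-01", endDate: "2024-06-30" }));
console.log(scaleLabel(2));
```

With these assumptions an item running from March to June is placed between grid lines 3 and 7, which covers the columns labelled Mar through Jun.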

Moreover, it is now possible to inline HTML and CSS in the name of each chart item:

export const LeftPaneRow = ({ level, name }) => {
  const nestingPadding = `${level}rem`;

  return (
    <div className={style.row} style={{ "--label-padding": nestingPadding }}>
      <span dangerouslySetInnerHTML={{__html: name}}></span>
    </div>
  );
};

And then in data.json (note that FontAwesome requires its CSS on a page in order to work):

[
  {
    id: 7,
    name: '<i style="font-family: \'FontAwesome\';" class="fa fa-car"></i>&nbsp;story with FontAwesome',
    parent: 2,
    start: 4,
    end: 6,
  },
  {
    id: 9,
    name: 'inline <em><b style="color: #5ebebe">CSS</b> color</em> <u style="border: 1px dashed #bebefe; padding: 2px; border-radius: 2px">works</u>',
    parent: 5,
    start: 5,
    end: 6,
  },
]

The API can be further improved by providing the render function for the bars' labels:

export const RightPane = ({ items, columns, scaleLabel, barLabel }) => {
  const rows = items.map((item) => (
    <RightPaneRow key={item.id} columns={columns}>
      <RightPanelRowEntry {...item}>{barLabel ? barLabel(item) : <>{item.id}</>}</RightPanelRowEntry>
    </RightPaneRow>
  ));

  // ...
};

export interface GanttChartProps {
  items: GanttChartItem[];
  scale: (item: GanttChartItem) => { start: number; end: number };
  scaleLabel: (column: number) => React.ReactNode;
  barLabel?: (item: GanttChartItem) => React.ReactNode; // optional: falls back to the item id
}

export const Chart = ({
  items,
  barLabel,
  scale,
  scaleLabel,
}: GanttChartProps) => {
  // ...
  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane
        items={itemList}
        columns={columns}
        scaleLabel={scaleLabel}
        barLabel={barLabel}
      />
    </div>
  );
};

and then in App component:

import pluralize from "pluralize";

export default function App() {
  const barLabel = ({ start, end }) => (
    <>
      {end - start} {pluralize("month", end - start)}
    </>
  );

  // ...

  return (
    <Chart
      items={data}
      scale={scale}
      scaleLabel={scaleLabel}
      barLabel={barLabel}
    />
  );
};

So why is Redux bad?

26 Jun 2024

Redux is generally considered a bad choice for state management on the front-end. But check out an average React application (or a few fragments of it):

import { Provider, useAtom } from 'jotai';
import { useQuery } from 'react-query';

const useInfo = () => {
  const { data, error, isLoading } = useQuery({
    queryKey: [ 'info' ],
    queryFn: () => fetch('/info').then(r => r.json()),
    staleTime: Infinity,
  });

  return {
    info: data,
    isLoading,
    error,
  };
};

const HomeWithoutProvider = () => {
  const { raiseToast } = useToast();

  const [ initialRender, setInitialRender ] = useState(false);

  const [ pageType, setPageType ] = useAtom(pageTypeAtom);

  const { info } = useInfo();

  useEffect(() => {
    if (info?.isNewVersionAvailable) { // info is undefined until the query resolves
      raiseToast({
        // ...
      });
    }
  }, [info]);

  return (
    <div>...</div>
  );
};

const Home = () => (
  <Provider>
    <HomeWithoutProvider />
  </Provider>
);

const Routes = () => (
  <BrowserRouter>
    <Suspense fallback={<Loader />}>
      <Route element={<Home />} path="/" />
    </Suspense>
  </BrowserRouter>
);

const root = ReactDOM.createRoot(document.getElementById('root')); // createRoot lives in 'react-dom/client', not in React

root.render(
  <StrictMode>
    <StyleThemeProvider>
      <ToastProvider>
        <GlobalErrorHandler>
          <ReactQueryProvider>
            <Routes />
          </ReactQueryProvider>
        </GlobalErrorHandler>
      </ToastProvider>
    </StyleThemeProvider>
  </StrictMode>
);

Usually most of the components (Home, Routes) and hooks (useInfo) are in separate files, but for the sake of simplicity I combined them all into one code block.

What I find suboptimal with this code is that it has at least three obvious different state management systems:

jotai for shared atoms (pieces of global state)
React.useState for internal component state
various React.Contexts (StyleThemeProvider, ToastProvider, GlobalErrorHandler, etc.)

On top of those, there are less obvious state management systems:

react-router uses internal router state, which could be treated as global application state
react-query uses its internal cache for each query
react-hook-form uses the form state of a component's ancestor (which could be declared on any level above the current component)

In the pursuit of encapsulation and reducing the boilerplate, front-end developers came up with all of these solutions aiming to solve the problem of managing application state.

So what exactly is this problem? And what are the issues all of the aforementioned solutions try to address?

As I see it, there are two competing camps:

containing the logic in small reusable chunks (hooks, components)
sharing chunks of state between different parts of the application

There are some side-tracks like dealing with asynchronous actions (like fetching the data from server), changing the state of external components (like showing a toast message), reducing the unnecessary re-renders.

Back in the day, Redux seemingly addressed these areas to a degree. Redux implements the Flux architecture, which at the time was often compared to the MVC (Model-View-Controller) architecture.

It became especially popular after Angular.JS' MVVM (Model-View-ViewModel) architecture implementation was considered slow with its dirty checks and constant re-rendering.

It could be said that Redux is being shipped in recent versions of React itself - with the use of the useReducer() hook. One would rarely use Redux on its own, often sticking to a somewhat opinionated stack of reselect (for derived state), react-redux (to connect() components to the store) and redux-thunk or redux-saga (for asynchronous dispatch() calls).

The aforementioned component could be implemented with the "conventional" (old) Redux approach like so:

import { createStore, combineReducers, applyMiddleware } from 'redux';
import { connect } from 'react-redux';
import { createSelector } from 'reselect';
import { thunk } from 'redux-thunk';

const infoReducer = (state = { isNewVersionAvailable: false }, action) => {
  switch (action.type) {
    case 'INFO_LOADED':
      return { ...state, ...action.payload };

    default:
      return state;
  }
};

const pageDataReducer = (state = { pageType: undefined }, action) => {
  switch (action.type) {
    case 'SET_PAGE_TYPE':
      return { ...state, pageType: action.pageType };

    default:
      return state;
  }
};

const toastReducer = (state = { isOpen: false, content: undefined }, action) => {
  switch (action.type) {
    case 'SHOW_TOAST':
      return { ...state, isOpen: true, content: action.payload };

    default:
      return state;
  }
};

const rootReducer = combineReducers({
  info: infoReducer,
  pageData: pageDataReducer,
  toast: toastReducer,
});

const store = createStore(rootReducer, applyMiddleware(thunk));

// Meet thunks.
// A thunk in this context is a function that can be dispatched to perform async
// activity and can dispatch actions and read state.
// This is an action creator that returns a thunk:
const loadInfoAction = () =>
  (dispatch) =>
    fetch('/info')
      .then(r => r.json())
      .then(payload => dispatch({ type: 'INFO_LOADED', payload }));

const raiseToastAction = (content) =>
  (dispatch) =>
    dispatch({ type: 'SHOW_TOAST', payload: content });

const setPageTypeAction = (pageType) =>
  (dispatch) =>
    dispatch({ type: 'SET_PAGE_TYPE', pageType });

const Home = ({ info, pageType, loadInfo, raiseToast, setPageType }) => {
  useEffect(() => {
    loadInfo();
  }, []);

  useEffect(() => {
    if (info.isNewVersionAvailable) {
      raiseToast({
        // ...
      });
    }
  }, [info]);

  return (
    <div>...</div>
  );
};

const mapStateToProps = ({ info, pageData: { pageType } }) => ({ info, pageType });

const mapDispatchToProps = (dispatch) => ({
  raiseToast: (content) => dispatch(raiseToastAction(content)),

  setPageType: (pageType) => dispatch(setPageTypeAction(pageType)),

  loadInfo: () => dispatch(loadInfoAction()),
});

const HomeContainer = connect(mapStateToProps, mapDispatchToProps)(Home);

const Routes = () => (
  <BrowserRouter>
    <Suspense fallback={<Loader />}>
      <Route element={<HomeContainer />} path="/" />
    </Suspense>
  </BrowserRouter>
);

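// createRoot is not part of the React object; this assumes
// `import { createRoot } from 'react-dom/client'` among the imports
const root = createRoot(document.getElementById('root'));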

root.render(
  <Routes />
);

In my opinion, Redux is not suitable for complex projects for a few reasons:

- it combines all state (both local and global) in one big messy furball; managing it is quite a hurdle
  - as the project complexity grows, one cannot just change a piece of state or a selector without affecting the entirety of the project (and teams)
  - combining reducers into one supermassive function makes any state update an unreasonably long and complex process (remember: each reducer returns a new state instance; now imagine having even a hundred reducers, each of which returns a new state)
- it is easy for a component to be re-rendered on any change to the state; a lot of effort goes into making sure selectors are well memoized and not re-calculated
- asynchronous actions are a big unsolved mystery (are you going to use thunks, flux, sagas or something else?)
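
To make the cost of combined reducers concrete, here is a minimal sketch. The `combineReducers` below is a simplified stand-in written for this example (an assumption, not the Redux source), and the `calls` counters are hypothetical instrumentation. It shows that every slice reducer runs for every action, even when only one slice cares about it:

```javascript
// Simplified stand-in for combineReducers (assumption: not the library code).
const combineReducers = (reducers) => (state = {}, action) => {
  let changed = false;
  const next = {};
  for (const key of Object.keys(reducers)) {
    // Every slice reducer is invoked, whether or not it handles the action.
    next[key] = reducers[key](state[key], action);
    changed = changed || next[key] !== state[key];
  }
  return changed ? next : state;
};

// Hypothetical counters to observe how often each reducer runs.
const calls = { info: 0, toast: 0 };

const infoReducer = (state = {}, action) => {
  calls.info += 1;
  return action.type === 'INFO_LOADED' ? { ...state, ...action.payload } : state;
};

const toastReducer = (state = { isOpen: false }, action) => {
  calls.toast += 1;
  return action.type === 'SHOW_TOAST' ? { ...state, isOpen: true } : state;
};

const rootReducer = combineReducers({ info: infoReducer, toast: toastReducer });

const before = rootReducer(undefined, { type: '@@INIT' });
const after = rootReducer(before, { type: 'SHOW_TOAST' });

console.log(calls); // both reducers ran twice, once per action
```

In this sketch a slice that does not handle the action keeps its old reference, so the work is mostly the function calls themselves; with a hundred slices that is still a hundred calls per dispatched action.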

On the flip side, the idea itself could actually bring a lot of positives if cooked properly:

- there is only one possible flow of data: via a dispatch() call, through the reducers and back to the components connected to the store via component props
  - this is supposed to make following the data (e.g. debugging the application) easy
- components are pretty much stateless at this point, encapsulated and not having side effects leaking everywhere
- logic is nicely separated from the representation and is encapsulated in the reducers (and maybe, to a small extent, in selectors)
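
The one-way flow can be sketched with a tiny store. This `createStore` is a hypothetical, stripped-down version written for illustration (no middleware, no enhancers), not the Redux implementation:

```javascript
// Minimal store sketch (assumption: illustrative only, not Redux itself).
const createStore = (reducer) => {
  let state = reducer(undefined, { type: '@@INIT' });
  const listeners = new Set();

  return {
    getState: () => state,
    subscribe: (listener) => {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
    // The only way to change state: action in, reducer runs, listeners notified.
    dispatch: (action) => {
      state = reducer(state, action);
      listeners.forEach((listener) => listener());
      return action;
    },
  };
};

const pageDataReducer = (state = { pageType: undefined }, action) =>
  action.type === 'SET_PAGE_TYPE' ? { ...state, pageType: action.pageType } : state;

const store = createStore(pageDataReducer);

// A "component" here is just a subscriber recording what it would render.
const seen = [];
const unsubscribe = store.subscribe(() => seen.push(store.getState().pageType));

store.dispatch({ type: 'SET_PAGE_TYPE', pageType: 'home' });
store.dispatch({ type: 'SET_PAGE_TYPE', pageType: 'settings' });
unsubscribe();
store.dispatch({ type: 'SET_PAGE_TYPE', pageType: 'ignored' });
```

Every change passes through `dispatch`, which is what makes the data easy to follow: logging inside that single function would capture the whole history of the application.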

Elm utilizes the language features and its own runtime combined with Redux-like architecture to improve some aspects of the more traditional pure JS way of things, where there are only opinionated libraries and no one way of doing things.

Consider Elm architecture and how it compares to Redux:

- all the states are still combined into one big cauldron of chaos
- by default, any component is just a function returning an array; the entire application will be re-rendered on each state change, which is still suboptimal, since literally all components are connected to the store
- asynchronous actions are handled separately by the runtime in a similar way to synchronous actions; each action returns a new state and a command (triggering the asynchronous processing)
  - since commands are handled by the runtime and there's a handful of commands, all of them will (eventually) circle back to dispatching messages just like components do, following the same one-way data flow
- reducers are a lot faster, since they are essentially a big switch..case statement (which is cheap)
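
The "new state plus a command" idea can be approximated in plain JavaScript. This is a rough sketch under loud assumptions: `run`, `none` and `fakeFetchInfo` are hypothetical names, and the command runs synchronously here, whereas the real Elm runtime handles effects asynchronously:

```javascript
// A command is modelled as a function that receives `send` and may feed a
// message back into the loop. `none` is the command that does nothing.
const none = () => {};

// Stand-in for an HTTP command (assumption: a real one would be asynchronous).
const fakeFetchInfo = (send) =>
  send({ type: 'GotInfo', info: { isNewVersionAvailable: true } });

// update never performs effects itself; it only describes them.
const update = (msg, model) => {
  switch (msg.type) {
    case 'LoadInfo':
      return [{ ...model, loading: true }, fakeFetchInfo];
    case 'GotInfo':
      return [{ ...model, loading: false, info: msg.info }, none];
    default:
      return [model, none];
  }
};

// Hypothetical mini-runtime: owns the model and executes commands.
const run = (init, updateFn) => {
  let model = init;
  const log = [];
  const send = (msg) => {
    log.push(msg.type);
    const [next, cmd] = updateFn(msg, model);
    model = next;
    cmd(send); // the result of the effect re-enters through the same door
  };
  return { send, getModel: () => model, log };
};

const app = run({ loading: false, info: null }, update);
app.send({ type: 'LoadInfo' });
```

The effect result comes back as just another message, so the log of messages stays the single source of truth for what happened.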

The above component could be re-implemented in Elm as follows:

import Browser
import Html exposing (..)
import Html.Attributes exposing (style)
import Html.Events exposing (..)
import Http
import Json.Decode exposing (Decoder, map, field, bool)

main =
  Browser.element
    { init = init
    , update = update
    , subscriptions = subscriptions
    , view = view
    }

type alias Model =
  { info : Maybe (Result Http.Error Info)
  , toast : Maybe Toast
  , pageType : PageType
  }

type alias Info =
  { isNewVersionAvailable : Bool }

type alias Toast =
  { content : String }

type PageType = Page1 | Page2

init : () -> (Model, Cmd Msg)
init _ =
  ({ info = Nothing, toast = Nothing, pageType = Page1 }, loadInfo)

type Msg
  = GotInfo (Result Http.Error Info)
  | ShowToast Toast
  | SetPageType PageType

update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
  case msg of
    GotInfo result ->
      case result of
        Ok info ->
          ({ model | info = Just (Ok info) }, Cmd.none)

        Err e ->
          ({ model | info = Just (Err e) }, Cmd.none)

    ShowToast t ->
      ({ model | toast = Just t }, Cmd.none)

    SetPageType p ->
      ({ model | pageType = p }, Cmd.none)

subscriptions : Model -> Sub Msg
subscriptions model =
  Sub.none

view : Model -> Html Msg
view model =
  div []
    [ h2 [] [ text "Demo App" ]
    , viewInfo model.info
    , viewToast model.toast
    ]

viewInfo : Maybe (Result Http.Error Info) -> Html Msg
viewInfo mbInfoResult =
  case mbInfoResult of
    Nothing ->
      text "Loading..."

    Just infoResult ->
      case infoResult of
        Err _ ->
          div []
            [ text "Could not load info" ]

        Ok info ->
          div []
            [ text "App loaded" ]

viewToast : Maybe Toast -> Html Msg
viewToast mbToast =
  case mbToast of
    Nothing ->
      div [] []

    Just toast ->
      div [] [ text toast.content ]

loadInfo : Cmd Msg
loadInfo =
  Http.get
    { url = "/info"
    , expect = Http.expectJson GotInfo infoDecoder
    }

infoDecoder : Decoder Info
infoDecoder =
  map Info
    (field "isNewVersionAvailable" bool)

The good bits are:

- it forces you to handle all possible actions (messages) and results (HTTP success and error scenarios)
- expressive language features (union types, strong typing, records, case expressions) ensure robust code (as in this code does not leave room for mistakes like null/undefined/unhandled exceptions/unhandled code path/wrong value type)
- no leeway for various ways to get things done (as in there is only one way to handle HTTP requests, only one way to handle asynchronous message dispatches, only one way to parse HTTP responses)

But if Redux spreads things apart compared to modern React, Elm feels like it spreads things further apart by handling effect results separately (like sending HTTP request, parsing HTTP response and processing the result by dispatching another message).

One other example would be PureScript (or rather Halogen). PureScript itself elevates the complexity to the skies and beyond, by making you run around with monads like a headless chicken. Consider a "simple" example of sending an HTTP request:

module Main where

import Prelude

import Affjax.Web as AX
import Affjax.ResponseFormat as AXRF
import Data.Either (hush)
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Aff.Class (class MonadAff)
import Halogen as H
import Halogen.Aff (awaitBody, runHalogenAff)
import Halogen.HTML as HH
import Halogen.HTML.Events as HE
import Halogen.HTML.Properties as HP
import Halogen.VDom.Driver (runUI)
import Web.Event.Event (Event)
import Web.Event.Event as Event

main :: Effect Unit
main = runHalogenAff do
  body <- awaitBody
  runUI component unit body

type State =
  { loading :: Boolean
  , username :: String
  , result :: Maybe String
  }

data Action
  = SetUsername String
  | MakeRequest Event

component :: forall query input output m. MonadAff m => H.Component query input output m
component =
  H.mkComponent
    { initialState
    , render
    , eval: H.mkEval $ H.defaultEval { handleAction = handleAction }
    }

initialState :: forall input. input -> State
initialState _ = { loading: false, username: "", result: Nothing }

render :: forall m. State -> H.ComponentHTML Action () m
render st =
  HH.form
    [ HE.onSubmit \ev -> MakeRequest ev ]
    [ HH.h1_ [ HH.text "Look up GitHub user" ]
    , HH.label_
        [ HH.div_ [ HH.text "Enter username:" ]
        , HH.input
            [ HP.value st.username
            , HE.onValueInput \str -> SetUsername str
            ]
        ]
    , HH.button
        [ HP.disabled st.loading
        , HP.type_ HP.ButtonSubmit
        ]
        [ HH.text "Fetch info" ]
    , HH.p_
        [ HH.text $ if st.loading then "Working..." else "" ]
    , HH.div_
        case st.result of
          Nothing -> []
          Just res ->
            [ HH.h2_
                [ HH.text "Response:" ]
            , HH.pre_
                [ HH.code_ [ HH.text res ] ]
            ]
    ]

handleAction :: forall output m. MonadAff m => Action -> H.HalogenM State Action () output m Unit
handleAction = case _ of
  SetUsername username -> do
    H.modify_ _ { username = username, result = Nothing }

  MakeRequest event -> do
    H.liftEffect $ Event.preventDefault event
    username <- H.gets _.username
    H.modify_ _ { loading = true }
    response <- H.liftAff $ AX.get AXRF.string ("https://api.github.com/users/" <> username)
    H.modify_ _ { loading = false, result = map _.body (hush response) }

Now add the halogen-store package to the mix to make use of Redux-like state management:

module Main where

import Prelude

import Affjax.Web as AX
import Affjax.ResponseFormat as AXRF
import Data.Either (hush)
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Aff.Class (class MonadAff)
import Halogen as H
import Halogen.Aff as HA
import Halogen.HTML as HH
import Halogen.HTML.Events as HE
import Halogen.HTML.Properties as HP
import Halogen.VDom.Driver (runUI)
import Web.Event.Event (Event)
import Web.Event.Event as Event
import Halogen.Store.Monad (class MonadStore, updateStore, runStoreT)
import Halogen.Store.Connect (Connected, connect)
import Halogen.Store.Select (selectAll)
import Effect.Aff (launchAff_)

data StoreAction
  = StoreSetUsername String
  | StoreMakeRequest
  | StoreReceiveResponse (Maybe String)

reduce :: State -> StoreAction -> State
reduce store = case _ of
  StoreSetUsername username ->
    store { username = username, result = Nothing }
  StoreMakeRequest ->
    store { loading = true }
  StoreReceiveResponse response ->
    store { loading = false, result = response }

initialStore :: State
initialStore = { username: "", loading: false, result: Nothing }

main :: Effect Unit
main = launchAff_ do
  body <- HA.awaitBody
  root <- runStoreT initialStore reduce component
  void $ runUI root unit body

type State =
  { loading :: Boolean
  , username :: String
  , result :: Maybe String
  }

data Action
  = SetUsername String
  | MakeRequest Event
  | ReceiveState (Connected State Unit)

deriveState :: Connected State Unit -> State
deriveState { context: { username, loading, result }, input: _ } =
  { username: username
  , loading: loading
  , result: result
  }

component :: forall query output m. MonadAff m => MonadStore StoreAction State m => H.Component query Unit output m
component =
  connect selectAll $ H.mkComponent
    { initialState: deriveState
    , render
    , eval: H.mkEval $ H.defaultEval
      { handleAction = handleAction
      , receive = Just <<< ReceiveState
      }
    }

render :: forall m. State -> H.ComponentHTML Action () m
render st =
  HH.form
    [ HE.onSubmit \ev -> MakeRequest ev ]
    [ HH.h1_ [ HH.text "Look up GitHub user" ]
    , HH.label_
        [ HH.div_ [ HH.text "Enter username:" ]
        , HH.input
            [ HP.value st.username
            , HE.onValueInput \str -> SetUsername str
            ]
        ]
    , HH.button
        [ HP.disabled st.loading
        , HP.type_ HP.ButtonSubmit
        ]
        [ HH.text "Fetch info" ]
    , HH.p_
        [ HH.text $ if st.loading then "Working..." else "" ]
    , HH.div_
        case st.result of
          Nothing -> []
          Just res ->
            [ HH.h2_
                [ HH.text "Response:" ]
            , HH.pre_
                [ HH.code_ [ HH.text res ] ]
            ]
    ]

handleAction :: forall output m. MonadAff m => MonadStore StoreAction State m => Action -> H.HalogenM State Action () output m Unit
handleAction = case _ of
  SetUsername username -> do
    updateStore $ StoreSetUsername username

  MakeRequest event -> do
    H.liftEffect $ Event.preventDefault event
    username <- H.gets _.username
    updateStore $ StoreMakeRequest
    response <- H.liftAff $ AX.get AXRF.string ("https://api.github.com/users/" <> username)
    updateStore $ StoreReceiveResponse (map _.body (hush response))

  ReceiveState input ->
    H.put $ deriveState input

The really nice things about this approach are:

- components could be self-sufficient, as opposed to Elm:
  - they can have both internal state and communicate with the external application via Aff
  - they can be extracted into separate modules, making them actually reusable components
- state selectors and connecting to a store are seamlessly implemented based on existing Halogen tools (subscriptions)
- it feels like you do not have to worry about state growing big, since each component explicitly declares which parts of that messy furball it needs (derives)

The bad news is that everything relies on monads and transformers - lifting, mapping, flat-mapping are just the very tip of the iceberg. Once you hit some mysterious error, it is quite tricky to understand what is going on. Elm, by contrast, presents a nicely structured and formatted error message, its location and ways to fix it.

Just looking at the type definitions is nauseating at best:

handleAction :: forall output m. MonadAff m => MonadStore StoreAction State m => Action -> H.HalogenM State Action () output m Unit

And then there is this bit, lifting everything to the same monad and then mapping and flat-mapping it to get the response body:

MakeRequest event -> do
  H.liftEffect $ Event.preventDefault event
  username <- H.gets _.username
  updateStore $ StoreMakeRequest
  response <- H.liftAff $ AX.get AXRF.string ("https://api.github.com/users/" <> username)
  updateStore $ StoreReceiveResponse (map _.body (hush response))

And do not forget that this monad is merely describing the computation, you will have to run it at some point:

main = HA.runHalogenAff do
  body <- HA.awaitBody
  root <- runStoreT initialStore reduce component
  let ui = runUI root unit body
  ui

The example on halogen-store suggests using launchAff_, but then you will have to cast the return value type to match the monad of the main function (Effect Unit) or lift runStoreT to the Effect Unit monad - whichever you find suitable:

main = launchAff_ do
  body <- HA.awaitBody
  root <- runStoreT initialStore reduce component
  let ui = runUI root unit body
  void ui

But having to worry about all these intricacies actually strengthens the point that PureScript is not for the faint-hearted - Elm prevails here.

The other drawback is that mixing logic in both the component and the reducers is weird - it is not clear from the Flux architecture where the side effects should live - like network calls, asynchronous actions, actions triggering other actions. Redux is known for suffering in all of these areas.

Elm solves this nicely with commands.

Halogen kind of takes a step backwards from Elm - it does have subscriptions, but it does not prevent you from issuing side effects from handleAction. And halogen-store does not have a recipe for complex chained actions.

Ultimately, I don't think Redux is bad - the idea of having full visibility into all possible application interactions is scary in a complex project and it is hard to come up with a clean way to work around it, but one-way data flow is actually nice.

Interesting how developers went from "we don't want Angular.js dirty checks - it is not clear where the data is flowing" to "we don't want a single point of contention for all application interactions".

In my eyes, the four technologies (modern-day React, Redux, Elm and PureScript) all come with their own pros and massive cons and there is no good or one-size-fits-all solution among them. And none of them ultimately solves the problem of managing application state and interactions in a non-bloated way. Maybe Angular or React 19 have an answer?

Bun is still undercooked

17 May 2024

My Skunkworks project was trying out Bun. It was not a successful project, but there are some learnings:

TL;DR: I think Bun is still undercooked and despite being super cool and competitive on paper, it is a bit too early to use it in big projects. But hey, it works for my blog!

What is Bun

Bun is a combined alternative to Node.js, a package manager (npm / yarn / pnpm), a bundler (vite / webpack / esbuild) and a test runner (vitest / jest).

Bun is ridiculously fast

Command	Yarn	Bun
`yarn install`	`49 sec`	`7.5 sec`
`vite build`	`14 sec`	`0.9 sec`
`vite test`	`forever`	`4.5 sec`

This most likely has to do with what happens inside those tools - Bun went with parsing files as ASTs, applying transformations and running them in memory (to the best of my knowledge, from digging through the Bun code).

Some things work out of the box

Dependency management works like a charm. No questions asked. Bun is just about 6.5x faster (49 sec vs 7.5 sec). Comparing the node_modules directories:

Only in node_modules_yarn:
    .yarn-state.yml
    @aashutoshrathi
    @isaacs
    @npmcli
    @pkgjs
    @tootallnate
    abbrev
    agentkeepalive
    aggregate-error
    aproba
    are-we-there-yet
    asynciterator.prototype
    cacache
    chownr
    clean-stack
    color-support
    console-control-strings
    deep-equal
    delegates
    depd
    eastasianwidth
    encoding
    env-paths
    err-code
    es-get-iterator
    exponential-backoff
    foreground-child
    fs-minipass
    gauge
    graceful-fs
    has
    has-unicode
    humanize-ms
    ip
    is-lambda
    jackspeak
    jsonc-parser
    make-fetch-happen
    minipass-collect
    minipass-fetch
    minipass-flush
    minipass-pipeline
    minipass-sized
    minizlib
    mkdirp
    negotiator
    node-gyp
    nopt
    npmlog
    object-is
    p-map
    promise-retry
    retry
    set-blocking
    smart-buffer
    socks
    socks-proxy-agent
    ssri
    stop-iteration-iterator
    string-width-cjs
    strip-ansi-cjs
    tar
    unique-filename
    unique-slug
    wide-align
    wrap-ansi-cjs

Only in node_modules_bun:
    confbox
    es-object-atoms
    word-wrap

Curious to see if those packages missing in bun's node_modules are actually used anywhere.

Plugins

In Relational Migrator we use a few plugins with vite, namely svgr, vanilla-extract and sentry. Bun only supports a limited set of esbuild plugins and does not have the aforementioned plugins. Some of them work with minimal changes, some of them do not work at all.

`svgr` plugin

svgr worked with minimal alterations:

import type { BunPlugin } from 'bun';
import svgrEsbuildPlugin from 'esbuild-plugin-svgr';

Bun.build({
    plugins: [
        svgrEsbuildPlugin() as unknown as BunPlugin,
    ]
})

But it required changing the imports from

import { ReactComponent as DatabaseAccessImage } from './assets/database-access-image.svg';

to

import DatabaseAccessImage from './assets/database-access-image.svg';

`vanilla-extract` plugin

This one loads vite server to compile the CSS and does not work no matter what I tried, throwing the following errors all over the place:

error: Styles were unable to be assigned to a file. This is generally caused by one of the following:

- You may have created styles outside of a '.css.ts' context
- You may have incorrect configuration. See https://vanilla-extract.style/documentation/getting-started
      at getFileScope (.../frontend/node_modules/@vanilla-extract/css/fileScope/dist/vanilla-extract-css-fileScope.cjs.dev.js:35:11)
      at generateIdentifier (.../frontend/node_modules/@vanilla-extract/css/dist/vanilla-extract-css.cjs.dev.js:175:7)
      at style (.../frontend/node_modules/@vanilla-extract/css/dist/vanilla-extract-css.cjs.dev.js:374:19)
      at .../frontend/src/shared/leafygreen-ui/badge/badge.css.ts:4:28

Followed by

error: Module._load is not a function. (In 'Module._load(file, parentModule)', 'Module._load' is undefined)
    at .../frontend/src/components/mapping-banner.css.ts:1:0

`sentry` plugin

This one was trivial and did not complain (I did not check if it actually works):

import { sentryEsbuildPlugin } from '@sentry/esbuild-plugin';

Bun.build({
    plugins: [
        sentryEsbuildPlugin({
            disable: !process.env.SENTRY_AUTH_TOKEN,
            org: 'mongodb-org',
            project: 'relational-migrator-frontend',
            telemetry: false,
            sourcemaps: {
                filesToDeleteAfterUpload: '**/*.map',
            },
        }) as unknown as BunPlugin,
    ]
})

Bundling

Bun is great for bundling, testing or managing packages from the CLI when things are relatively simple. When you need plugins (for instance), the interactions become tricky. For specifying and configuring bundle-time plugins one needs to use Bun's JS/TS API and make a custom build script:

await Bun.build({ ... });

By default, Bun does not log anything, which is actually quite inconvenient - not even build failures are logged. One has to get the result of Bun.build() and manually process it, which is a bit of a bummer:

const result = await Bun.build(...);

if (!result.success) {
  console.error('Build failed');

  for (const message of result.logs) {
    console.error(message);
  }
} else {
  console.info('Build succeeded');
}

Configuration

Another inconvenient interaction - some actions require entire scripts (like build configuration, serving files, etc.). Then there is a config file, bunfig.toml where users can specify some configurations for Bun.

Running tests

This one had the most issues on my end.

`react-testing-library`

Bun declares support for react-testing-library, which worked as expected.

Browser APIs

Had to use happy-dom and configure it in the bunfig.toml to enable some of the UI testing features (such as access to the window object). Yet, happy-dom still lacks support for Canvas API, for instance.

`test.each`

It is one example of Bun's partial compatibility with Jest - with Jest one can use nice-ish string interpolation to generate test name:

test.each`
  currentStep | lastCompletedStep | progressType
  ${0}        | ${0}              | ${'active'}
  ${1}        | ${0}              | ${'inactive'}
  ${1}        | ${2}              | ${'checked'}
`(
  'returns $progressType for $currentStep and $lastCompletedStep',
  ({ currentStep, lastCompletedStep, progressType }) => {}
);

With bun:test it is slightly different - you can't use arguments out of order, nor do you have access to their names. Neither can you use this nice syntactic sugar for defining test cases in a table manner.

const cases = [
  // currentStep | lastCompletedStep | progressType
  [0, 0, 'active'],
  [1, 0, 'inactive'],
  [1, 2, 'checked'],
];

test.each(cases)(
  'For %p and %p returns %p',
  (currentStep, lastCompletedStep, progressType) => {
    expect(getProgressType(currentStep, lastCompletedStep)).toEqual(
      progressType
    );
  }
);
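
One possible way to keep the table syntax is a small helper of your own. The sketch below is hypothetical (`table` and `formatTitle` are names made up for this example, not part of Jest or bun:test); it parses a tagged template into header names and row arrays, and expands `$name` placeholders in a title:

```javascript
// Hypothetical helper: parses a Jest-style table template into headers and rows.
const table = (strings, ...values) => {
  // The header line is the literal text before the first interpolated value.
  const headers = strings[0].split('|').map((h) => h.trim()).filter(Boolean);
  const rows = [];
  for (let i = 0; i < values.length; i += headers.length) {
    rows.push(values.slice(i, i + headers.length));
  }
  return { headers, rows };
};

// Hypothetical helper: replaces $name placeholders with the matching row value.
const formatTitle = (template, headers, row) =>
  template.replace(/\$(\w+)/g, (match, name) => {
    const index = headers.indexOf(name);
    return index === -1 ? match : String(row[index]);
  });

const { headers, rows } = table`
  currentStep | lastCompletedStep | progressType
  ${0}        | ${0}              | ${'active'}
  ${1}        | ${0}              | ${'inactive'}
  ${1}        | ${2}              | ${'checked'}
`;

const titles = rows.map((row) =>
  formatTitle('returns $progressType for $currentStep and $lastCompletedStep', headers, row)
);

console.log(titles[0]); // returns active for 0 and 0
```

The resulting `rows` has the same array-of-arrays shape as `cases` above, so it could be fed to test.each(); the named titles would need a separate loop with one test() call per row.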

Oh, and there is no describe.each() functionality at all, which makes defining suites of tests more tedious.

Mocks

Mocks work as expected, out of the box. There are mocks for system clock, which is nice. Had to replace vi.fn() with mock() and a corresponding import { mock } from 'bun:test';.

`ObjectContaining` matchers

When using nested matchers in the ObjectContaining, some of them are missing in Jest compatibility (like expect.toBeNumber):

expect(nodes).toEqual([
        expect.objectContaining({
          id: 'node-1',
          position: { x: expect.toBeNumber(), y: expect.toBeNumber() },
        }),
]);

Had to use expect.any(Number) instead:

expect(nodes).toEqual([
        expect.objectContaining({
          id: 'node-1',
          position: { x: expect.any(Number), y: expect.any(Number) },
        }),
]);

Using `Fragment` import alongside `<>`

If a component contains both import { Fragment } from 'react' and uses a shorthand <>, Bun will yell at test time (but not at build time, interestingly enough):

SyntaxError: Cannot declare an imported binding name twice: 'Fragment'.

If you specify a different jsxFragmentFactory in tsconfig.json and set "jsx": "react" (and not "react-jsx" or anything), you will get further.

After meddling with the Bun source code itself, I figured out part of it: Bun parses the files' AST and modifies it to add missing imports, like Fragment, but ends up with duplicates. Even after applying some crude hacks to prevent it from adding those duplicate statements, I could not fix the issue. I left a comment on Bun's GitHub issue, but in my experience developers do not pay enough attention to those.

Ended up manually changing sources for the libraries in question in node_modules folder directly (just for the test), which did actually help. Might be worth changing it in the libraries directly, but that won't work with everything.

Ace editor

It is still kind of impossible to use UMD/AMD modules in conjunction with TS in Bun tests - the nature of UMD is that once the file is imported, it uses an IIFE (immediately invoked function expression) to define stuff, but Bun does not tolerate this (I presume it only parses the AST of the imported file but does not actually execute it in the right order).

Hence Ace editor, which uses UMDs, cannot really be used as intended.
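
For context, this is roughly what a UMD-style wrapper decides when it runs. It is a simplified sketch, not Ace's actual code: a real UMD file is an IIFE that inspects the ambient `define`, `module` and global object, while here a hypothetical `loadUmd` takes a fake environment as a parameter so the three branches can be exercised anywhere:

```javascript
// Simplified UMD decision logic (assumption: parameterised for illustration).
const loadUmd = (env, factory) => {
  if (typeof env.define === 'function' && env.define.amd) {
    env.define([], factory);          // an AMD loader is present
  } else if (env.module && env.module.exports) {
    env.module.exports = factory();   // CommonJS
  } else {
    env.root.editor = factory();      // fall back to a global
  }
};

// Hypothetical library body.
const factory = () => ({ name: 'editor' });

// Fake AMD loader that records whatever gets defined.
const amdRegistry = [];
const amdDefine = (deps, f) => amdRegistry.push(f());
amdDefine.amd = {};

const commonJsEnv = { module: { exports: {} }, root: {} };
const globalEnv = { root: {} };

loadUmd({ define: amdDefine, root: {} }, factory);
loadUmd(commonJsEnv, factory);
loadUmd(globalEnv, factory);
```

The point is that the module registers itself as a side effect of being evaluated, so it only works if the file is actually executed, in order, at import time.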

Bun's meat and potatoes

I did a bit of digging in Bun's source code and it seems... immature - commented out code, ignored tests, thousand-line-functions and files (js_parser.zig has 23.3k LOC). And this is on top of using Zig, which is still at version 0.12 (as of writing of this post, 10 May 2024) and has quite limited standard library (no remove and find methods in lists, no hash sets, etc.).

Bottom line

My experience shows that Bun might be fine to use in new and low-risk projects, but it is not ready to be a drop-in replacement in existing or more or less complex projects.

Strongly-typed front-end: experiment 2, simple application, in PureScript / Halogen

17 May 2024

A more "conventional" way to implement the front-end application in PureScript would be using a framework called Halogen.

Starting off with a "hello world" example:

module Main where

import Prelude

import Effect (Effect)
import Halogen as H
import Halogen.Aff as HA
import Halogen.HTML as HH
import Halogen.HTML.Events as HE
import Halogen.VDom.Driver (runUI)

main :: Effect Unit
main = HA.runHalogenAff do
  body <- HA.awaitBody
  runUI component unit body

data Action = Increment | Decrement

component =
  H.mkComponent
    { initialState
    , render
    , eval: H.mkEval $ H.defaultEval { handleAction = handleAction }
    }
  where
  initialState _ = 0

  render state =
    HH.div_
      [ HH.button [ HE.onClick \_ -> Decrement ] [ HH.text "-" ]
      , HH.div_ [ HH.text $ show state ]
      , HH.button [ HE.onClick \_ -> Increment ] [ HH.text "+" ]
      ]

  handleAction = case _ of
    Increment -> H.modify_ \state -> state + 1
    Decrement -> H.modify_ \state -> state - 1

Adding the utility code akin to the other technologies:

data Shape = Circle | Square

calculateArea :: Maybe Shape -> Float -> Float
calculateArea Nothing _ = 0
calculateArea (Just Circle) value = pi * value * value
calculateArea (Just Square) value = value * value

getShape :: String -> Maybe Shape
getShape "circle" = Just Circle
getShape "square" = Just Square
getShape _ = Nothing

This surfaces a few differences from Haskell, Elm and others:

- there is no pi constant in the Prelude, so one needs to import one of the available definitions; I went with Data.Number
- Float is not a type; there is Number, however
- 0 is not a Number, it is an Int, confusing the audience

These are all minor differences, however. But this code is not conventional PureScript either - it is working against the good practices of functional programming and thus defeats the purpose of these experiments. An example of this is the heavy reliance on String instead of the available type system.

Let us change that a bit:

import Data.Maybe (Maybe(..))
import Data.Number (pi)
import Data.String.Read (class Read)

data Shape = Circle | Square

calculateArea :: Shape -> Number -> Number
calculateArea Circle value = pi * value * value
calculateArea Square value = value * value

instance Read Shape where
  read = case _ of
    "square" -> Just Square
    "circle" -> Just Circle
    _ -> Nothing

instance Show Shape where
  show = case _ of
    Square -> "square"
    Circle -> "circle"

Now, to the UI:

import Halogen.HTML.Properties as HP

render state =
  HH.div_
    [
      HH.select [] [
        HH.option [ HP.value "" ] [ HH.text "Select shape" ],
        HH.option [ HP.value (show Circle) ] [ HH.text (show Circle) ],
        HH.option [ HP.value (show Square) ] [ HH.text (show Square) ]
      ],
      HH.input [],
      HH.div_ [ HH.text "<area>" ]
    ]

In the application state we need to store the selected shape and the value, so we can utilize records for that:

initialState _ = { shape: Nothing, value: Nothing }

Then we need to modify the possible actions. Let's stick to the same approach of utilizing the type system:

data Action = ChangeValue (Maybe Number) | ChangeShape (Maybe Shape)

The thing glueing the two together is the handleAction function:

handleAction = case _ of
  ChangeValue value ->
    H.modify_ \state -> state { value = value }
  ChangeShape shape ->
    H.modify_ \state -> state { shape = shape }

Here, unlike Haskell (to the best of my knowledge), the placeholder variable is being used for pattern matching against the only function argument. So instead of the slightly verbose

handleAction action = case action of
  -- ...

you can use this placeholder variable and just provide the branches for each of its possible values:

handleAction = case _ of
  -- ...

Modifying the state is done using the H.modify_ function (modify_ from the Halogen module, as imported above), which allows us to only use the previous state value and provide a new state value, without the need to mess with monads. In turn, we modify the state record using the record update syntax:

state { shape = newShapeValue }

Now the only bit left is tying the UI with the actions:

import Halogen.HTML.Events as HE
import Data.String.Read (read)
import Data.Number as N
import Data.Tuple (Tuple(..))

render state =
  HH.div_
    [
      HH.select [ HE.onValueChange onShapeChanged ] [
        HH.option [ HP.value "" ] [ HH.text "Select shape" ],
        HH.option [ HP.value (show Circle) ] [ HH.text (show Circle) ],
        HH.option [ HP.value (show Square) ] [ HH.text (show Square) ]
      ],
      HH.input [ HE.onValueChange onValueChanged ],
      HH.div_ [ HH.text "<area>" ]
    ]

onShapeChanged v = ChangeShape (read v)

onValueChanged v = ChangeValue (N.fromString v)

showArea state =
  case res of
    Nothing ->
      HH.text "Choose shape and provide its parameter"

    Just (Tuple shape area) ->
      HH.text $ "Area of " <> (show shape) <> " is " <> (show area)

  where
    res = do
      shape <- state.shape
      value <- state.value
      let area = calculateArea shape value
      pure (Tuple shape area)

This is where most of the fun and benefit of using PureScript comes into play.

First of all, the HE.onValueChange event handler (the onShapeChanged and onValueChanged functions) - it will be called with the new value for the input instead of an entire event object. This allows us to skip unpacking the raw value from that object.

Then, the action dispatchers take the value from the input and try to parse it, which yields a Maybe a, and wrap the result in the corresponding Action:

onShapeChanged :: String -> Action
onShapeChanged v = ChangeShape (read v)

onValueChanged :: String -> Action
onValueChanged v = ChangeValue (N.fromString v)

This is actually quite an important part, since the shape might not be selected (making the <select> value an empty string) and the value might be either a blank string or not a valid number string. PureScript does not allow us to skip these cases, so whenever we parse the user input, we get a Maybe a value and we have to handle both scenarios: when the value is valid and when it is not.

The function showArea is where this neatness comes together - we handle both values as one, using the Data.Tuple type to pair them together:

res = do
  shape <- state.shape -- unpacks `Shape` from `Maybe Shape`
  value <- state.value -- unpacks `Number` from `Maybe Number`
  let area = calculateArea shape value -- always returns a Number, since both `shape` and `value` are always provided
  pure (Tuple shape area) -- returns a tuple of shape and area, packed in a `Maybe`

The above code will short-circuit as soon as it tries to unpack a value from a Nothing, and the whole do block will return Nothing.
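
For readers more at home in JavaScript, the do block behaves like nested bind calls. The sketch below is only an analogy under stated assumptions: `null` stands in for Nothing, any other value for Just, a two-element array for the Tuple, and `bind` and `areaOf` are hypothetical names:

```javascript
// Hypothetical Maybe: null plays the role of Nothing.
const bind = (maybe, f) => (maybe === null ? null : f(maybe));

const calculateArea = (shape, value) =>
  shape === 'circle' ? Math.PI * value * value : value * value;

// Mirrors the do block: each bind unpacks one value or stops the whole chain.
const areaOf = (state) =>
  bind(state.shape, (shape) =>
    bind(state.value, (value) => [shape, calculateArea(shape, value)]));

console.log(areaOf({ shape: null, value: 2 }));     // null
console.log(areaOf({ shape: 'square', value: 3 })); // [ 'square', 9 ]
```

The difference is that nothing in JavaScript forces the caller to check for `null` afterwards, whereas the PureScript compiler rejects code that ignores the Nothing case.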

Putting it all together:

module Main where

import Prelude

import Data.Maybe (Maybe(..))
import Data.Number as N
import Data.String.Read (class Read, read)
import Data.Tuple (Tuple(..))
import Effect (Effect)
import Halogen as H
import Halogen.Aff as HA
import Halogen.HTML as HH
import Halogen.HTML.Events as HE
import Halogen.HTML.Properties as HP
import Halogen.VDom.Driver (runUI)

data Shape = Circle | Square

calculateArea :: Shape -> Number -> Number
calculateArea Circle value = N.pi * value * value
calculateArea Square value = value * value

instance Read Shape where
  read = case _ of
    "square" -> Just Square
    "circle" -> Just Circle
    _ -> Nothing

instance Show Shape where
  show = case _ of
    Square -> "square"
    Circle -> "circle"

data Action = ChangeValue (Maybe Number) | ChangeShape (Maybe Shape)

component =
  H.mkComponent
    { initialState
    , render
    , eval: H.mkEval $ H.defaultEval { handleAction = handleAction }
    }
  where
  initialState _ = { shape: Nothing, value: Nothing }

  render state =
    HH.div_
      [
        HH.select [ HE.onValueChange onShapeChanged ] [
          HH.option [ HP.value "" ] [ HH.text "Select shape" ],
          HH.option [ HP.value (show Circle) ] [ HH.text (show Circle) ],
          HH.option [ HP.value (show Square) ] [ HH.text (show Square) ]
        ],
        HH.input [ HE.onValueChange onValueChanged ],
        HH.div_ [ showArea state ]
      ]

  onShapeChanged v = ChangeShape (read v)

  onValueChanged v = ChangeValue (N.fromString v)

  showArea state =
    case res of
      Nothing ->
        HH.text "Select shape and provide its value"

      Just (Tuple shape area) ->
        HH.text $ "Area of " <> (show shape) <> " is " <> (show area)
    where
      res = do
        shape <- state.shape
        value <- state.value
        let area = calculateArea shape value
        pure (Tuple shape area)

  handleAction = case _ of
    ChangeValue value ->
      H.modify_ \state -> state { value = value }
    ChangeShape shape ->
      H.modify_ \state -> state { shape = shape }

main :: Effect Unit
main = HA.runHalogenAff do
  body <- HA.awaitBody
  runUI component unit body

More than TypeScript

12 May 2024

Back in 2011 frontend was a very different thing - JavaScript had no class, Object.entries / Object.keys, promises were a proof of concept idea (unless you used 3rd party library bluebird) and Node was v0.10.

Then came CoffeeScript, which added nice helper features to JavaScript - list comprehensions, classes, string interpolation and if statements that work as expressions (meaning you could use them for variable assignment):

# if statements
text = if happy and knowsIt
  chaChaCha()
else if sexy
  knowsIt()
else if tooSexy
  removeShirt()
else
  showIt()

# list comprehensions
courses = [ 'greens', 'caviar', 'truffles', 'roast', 'cake' ]
menu = (i, dish) -> "Menu Item #{i}: #{dish}"
menu i + 1, dish for dish, i in courses

# ranges and list comprehensions
countdown = (num for num in [10..1])

# iterating over object entries
yearsOld = max: 10, ida: 9, tim: 11

ages = for child, age of yearsOld
  "#{child} is #{age}"

Whilst it was still compiled to an inferior ES5 JavaScript, it helped to organise the code and make it substantially cleaner. The one drawback is that it did not provide any type safety (this was only added recently, via @flow, and you would still have to duplicate your classes if you wanted to use it - once in CoffeeScript and once in @flow annotations). Anyhow, CoffeeScript was a good tool for the task, if you ask me.

Then came Dart, Flow and TypeScript, which were also compiled to ES5 JavaScript, but instead of adding new syntax features, they aimed to solve a different problem - by introducing types they sought to reduce the number of runtime errors with type checks at compile time.

This sounded so good that many developers and companies immediately got on board. Alas, the new tech still suffered from the same issue as JavaScript itself, the most common cause of runtime errors - null and undefined were still a thing, causing the exact same runtime errors.

The one true benefit offered by TypeScript over the others was that it allowed seamless use of existing JavaScript code. And, provided you had the type signatures for that JavaScript code, it could even perform type checking on it too, effectively lowering the bar for using TypeScript in an existing codebase. No wonder it was an easy buy-in for many projects.

Fast-forward to 2024 (twelve years since its first release) and TypeScript dominates the frontend world. Over the years it seems to have been focused on improving the type system in terms of what you can do with types - union types, partial types, etc. It still suffers from the original issues though and there still are not too many new syntactic features to pair with its powerful type system (like pattern matching).

With the new EcmaScript standards, classes and promises becoming the first-class citizens in all browsers (even Internet Explorer / Edge, when it was still around), the APIs and syntax became more mature (Object.entries, async / await to reduce callback hell, for .. of, const / let, string interpolation and many others). There are still no list comprehensions or conditional expressions / pattern matching though.

TypeScript still does help in what I think is a small subset of highly-specific scenarios like navigating the code (jump to definition / declaration / find usages) in the IDE and refactoring the code. But IDEs matured as well, code navigation is not as much of a feature unique to TypeScript anymore as it used to be (in the era of Sublime Text). TypeScript can prevent some really naive errors at compile time like using a number instead of an object, but I don't think developers run into them these days - because, again, IDEs are really powerful these days, even VSCode and they help eliminate such mistakes to a large degree.

Let's consider a real-world scenario (from my work project): server returns a list of LocationType objects. Each one of the objects can be either a Collection, a Document (in a specific collection) or a Table, each with its own subset of fields, specific to the type. We need to handle each case differently (display them on the UI differently).

The OpenAPI spec:

tableLocation:
  type: object
  required:
    - table
  properties:
    table:
      type: string

collectionLocation:
  type: object
  required:
    - collection
  properties:
    collection:
      type: string

documentLocation:
  type: object
  required:
    - collection
    - document
  properties:
    collection:
      type: string
    document:
      type: string

location:
  oneOf:
    - $ref: "#/components/tableLocation"
    - $ref: "#/components/collectionLocation"
    - $ref: "#/components/documentLocation"

And the TypeScript code generated by OpenAPI for the above spec looks like this:

interface CollectionLocation {
    collection: string;
}

interface DocumentLocation {
    collection: string;
    document: string;
}

interface TableLocation {
    table: string;
}

function instanceOfCollectionLocation(value: object): boolean {
    let isInstance = true;
    isInstance = isInstance && "collection" in value;

    return isInstance;
}

function instanceOfDocumentLocation(value: object): boolean {
    let isInstance = true;
    isInstance = isInstance && "collection" in value;
    isInstance = isInstance && "document" in value;

    return isInstance;
}

function instanceOfTableLocation(value: object): boolean {
    let isInstance = true;
    isInstance = isInstance && "table" in value;

    return isInstance;
}

function CollectionLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): CollectionLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return {
        'collection': json['collection'],
    };
}

function DocumentLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): DocumentLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return {
        'collection': json['collection'],
        'document': json['document'],
    };
}

function TableLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): TableLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return {
        'table': json['table'],
    };
}

type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

function JobLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): JobLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return { ...CollectionLocationFromJSONTyped(json, true), ...DocumentLocationFromJSONTyped(json, true), ...TableLocationFromJSONTyped(json, true) };
}

This might be a bit too harsh on TypeScript as a language, considering this is not the best implementation (in my opinion), but this highlights one of the issues with TypeScript: this very same code caused a number of runtime errors caught by users - not exceptions, not compilation errors, but wrong UI behaviour - on the UI all locations were treated as a table location.

The reason why this was happening is the code itself - it does not really handle the choice type JobLocation correctly. Instead of picking one of the alternatives, it merges the fields of all three into a single object. To put it roughly, instead of oneOf it essentially returns an allOf object.
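The effect is easy to reproduce in isolation. The sketch below uses shortened stand-ins for the generated helpers (the names collectionFromJSON, documentFromJSON, tableFromJSON and mergedFromJSON are mine, the behaviour is copied from the generated code above):

```typescript
// Simplified copies of the generated *FromJSONTyped helpers.
const collectionFromJSON = (json: any) => ({ collection: json['collection'] });
const documentFromJSON = (json: any) => ({
  collection: json['collection'],
  document: json['document'],
});
const tableFromJSON = (json: any) => ({ table: json['table'] });

// What the generated JobLocationFromJSONTyped effectively does.
const mergedFromJSON = (json: any) => ({
  ...collectionFromJSON(json),
  ...documentFromJSON(json),
  ...tableFromJSON(json),
});

const parsed = mergedFromJSON({ collection: 'users' });

// Every key exists on the result, even though two of them hold `undefined`,
// so all of the `"key" in value` checks succeed.
console.log(Object.keys(parsed)); // [ 'collection', 'document', 'table' ]
console.log('table' in parsed);   // true
```

Since the in operator only checks for the presence of a key, a collection location passes every instanceOf check above, including instanceOfTableLocation.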

Now, even if we were to rewrite it by hand (which would defeat the purpose of using OpenAPI and could be harder to keep in sync between client and server code), we would end up with something like this:

type CollectionLocation = {
    collection: string;
}

type DocumentLocation = {
    collection: string;
    document: string;
}

type TableLocation = {
    table: string;
}

function instanceOfCollectionLocation(value: object): boolean {
    return ("collection" in value) && !("document" in value);
}

function instanceOfDocumentLocation(value: object): boolean {
    return ("collection" in value) && ("document" in value);
}

function instanceOfTableLocation(value: object): boolean {
    return ("table" in value);
}

function CollectionLocationFromJSONTyped(json: any): CollectionLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    return {
        'collection': json['collection'],
    };
}

function DocumentLocationFromJSONTyped(json: any): DocumentLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    return {
        'collection': json['collection'],
        'document': json['document'],
    };
}

function TableLocationFromJSONTyped(json: any): TableLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    return {
        'table': json['table'],
    };
}

type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

function JobLocationFromJSONTyped(json: any): JobLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    if (instanceOfCollectionLocation(json) && !instanceOfDocumentLocation(json) && !instanceOfTableLocation(json)) {
      return CollectionLocationFromJSONTyped(json);
    }
    if (instanceOfDocumentLocation(json) && !instanceOfCollectionLocation(json) && !instanceOfTableLocation(json)) {
      return DocumentLocationFromJSONTyped(json);
    }
    if (instanceOfTableLocation(json) && !instanceOfCollectionLocation(json) && !instanceOfDocumentLocation(json)) {
      return TableLocationFromJSONTyped(json);
    }
    return undefined;
}

This is quite verbose code, with a few bad practices in place (the use of any and object, the need to always be conscious of undefined), but this is literally what we ended up doing (the helpers, not redefining the types).

The types in TypeScript (or rather classes and interfaces) are also a point of a few confusing tricks you have to keep in mind - if we were to use class instead of type, as follows

class CollectionLocation {
    constructor(public collection: string) {}
}

class DocumentLocation {
    constructor(public collection: string, public document: string) {}
}

class TableLocation {
    constructor(public table: string) {}
}

we would eventually run into a similar bug at runtime, which is perfectly valid behaviour from the perspective of TypeScript: type compatibility is structural, meaning a DocumentLocation can be used wherever a CollectionLocation is expected, since it has all the fields of the latter:

const a: CollectionLocation = new DocumentLocation('col', 'doc'); // ok
const b: DocumentLocation = new CollectionLocation('col'); // not ok, `document` is missing
const c: TableLocation = a; // not ok

Object literals are checked more strictly (excess properties are rejected), so the first assignment would not compile if CollectionLocation, DocumentLocation and TableLocation were types assigned from literals instead:

const a: CollectionLocation = { collection: 'col', document: 'doc' }; // not ok
const b: DocumentLocation = { collection: 'col' }; // not ok
const c: TableLocation = b; // not ok

And the code to parse location from JSON is actually quite ugly. It would benefit so much from switch expressions or pattern matching!
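The closest approximation in today's TypeScript is a discriminated union with an exhaustive switch. A sketch, assuming we are free to attach a kind tag while parsing; the tag, the TaggedLocation type and the tagLocation and describe helpers are made up for this example and are not part of the generated client:

```typescript
type TaggedLocation =
  | { kind: 'collection'; collection: string }
  | { kind: 'document'; collection: string; document: string }
  | { kind: 'table'; table: string };

// Decide on the variant once, at the boundary, from the fields that are present.
function tagLocation(json: unknown): TaggedLocation | undefined {
  if (typeof json !== 'object' || json === null) return undefined;
  const { collection, document: doc, table } = json as Record<string, unknown>;

  if (typeof table === 'string' && collection === undefined && doc === undefined) {
    return { kind: 'table', table };
  }
  if (typeof collection === 'string' && table === undefined) {
    if (typeof doc === 'string') return { kind: 'document', collection, document: doc };
    if (doc === undefined) return { kind: 'collection', collection };
  }
  return undefined;
}

// The `never` assignment makes the compiler complain when a variant is not handled.
function describe(location: TaggedLocation): string {
  switch (location.kind) {
    case 'collection':
      return `collection ${location.collection}`;
    case 'document':
      return `document ${location.document} in ${location.collection}`;
    case 'table':
      return `table ${location.table}`;
    default: {
      const unhandled: never = location;
      return unhandled;
    }
  }
}
```

It is still a statement rather than an expression, and the tagging code has to be written and maintained by hand.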

And we still have to remember to handle those potentially undefined values whenever we use the helper functions. Here's an example from literally a few days ago:

interface Job {
    projectId: string;
}

// provided elsewhere in the component
declare const jobsInProgress: Job[];

const useProjectStatus = (projectId: string) => {
    const { data } = useQuery({ queryFn: () => fetch(`/project/${projectId}`) });

    return data;
};

const message = useProjectStatus(
    jobsInProgress[0]?.projectId ?? ''
);

The above code started throwing an error, since the server responded with 500 Server Error. The reason was quite simple - the projectId, which we recently started to validate on the server (expecting it to be a valid ID), was a blank string. Interestingly, no one had questioned the very line causing the "default value" for projectId to become an empty string:

jobsInProgress[0]?.projectId ?? ''

Issues like these are unbelievably common in the front-end world, while being quite tricky to detect and resolve. To fix this particular one we added client-side validation (to an extent) to run the query only when the projectId value is provided:

useQuery({
  enabled: !!projectId,
  queryFn: () => fetch(`/project/${projectId}`)
})

The problem remains: it is really easy to miss this rather small fallback to an empty string.

There is a solution in the Scala world that somewhat addresses this issue - refined types, which allow for something along the lines of:

import eu.timepit.refined.*
import eu.timepit.refined.api.Refined
import eu.timepit.refined.string.*

type NonBlankString = MatchesRegex[".+"]

def useProjectStatus(projectId: String Refined NonBlankString) = ???

This would require wrapping all the values passed to fetchData in refineV[NonBlankString]() call and handling the case when the validation fails:

def generateId(): String = List("some", "").last

def fetchData(id: String Refined NonBlankString) = println(s"fetching '$id'...")

def main(args: Array[String]) = {
  for {
    id <- refineV[NonBlankString](generateId())
  } yield fetchData(id)
}

But in the land of TypeScript, there is only so much you can do - TypeScript only works at compile time.

The above examples might sound like very far-fetched edge case scenarios, but keep in mind: this is the code generated automatically by one of the most popular tools from a trivial schema. This is not as far-fetched as it might seem.

Can we do better in TypeScript? Something like refined in Scala? What if we had a powerful type system and syntax to support it? And, if possible, get rid of the null and undefined along the way?

The problem of null and undefined can be mitigated to an extent by using some concepts of functional programming, similarly to how I showcased some time ago. It would be quite hard to achieve, though, given the problem remains imbued in the language itself. Moreover, targeting the issues described above, it would take an entire standard library to really reduce the possibility of the issue:

const jobInProgressMaybe: Maybe<Job> = new List<Job>(jobsInProgress).first();
const projectIdMaybe: Maybe<string> = jobInProgressMaybe.map(j => j.projectId);

useQuery({
  enabled: projectIdMaybe.isSome(),
  queryFn: () =>
    projectIdMaybe
      .map(projectId => fetch(`/project/${projectId}`))
      .orElse(Promise.reject('no projectId'))
})

We could try to address the empty string issue with the extensive type system:

type NonEmptyString<T extends string> = '' extends T ? never : T;
type MyString<T extends string> = T;

function fetchData<T extends string>(id: NonEmptyString<T>) {
  console.log(`Fetching ${id}`);
}

But it would only work if all the values are known at compile time, which is easily broken with the simplest test:

function generateId(): NonEmptyString<string> {
  return ['moo', ''][1] as NonEmptyString<string>;
}

fetchData(''); // not ok
fetchData(generateId()); // ok, but guaranteed undefined behaviour at runtime

And it would take even more effort to make all types uniquely identifiable (to solve the choice type problem).
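A common workaround for the empty string case is to move the check to runtime and record its result in the type, which is roughly what refineV does in the Scala example. A minimal sketch, assuming a so-called branded type; NonBlankString, refineNonBlank and fetchProject are hypothetical names, the blank check is my own choice of predicate, and nothing stops a determined caller from bypassing it with an as cast:

```typescript
// The brand exists only at compile time; at runtime the value is a plain string.
type NonBlankString = string & { readonly __brand: 'NonBlankString' };

// The only intended way to obtain a NonBlankString: validate first.
function refineNonBlank(value: string): NonBlankString | undefined {
  return value.trim().length > 0 ? (value as NonBlankString) : undefined;
}

function fetchProject(projectId: NonBlankString): string {
  return `/project/${projectId}`;
}

// Stands in for a value that is only known at runtime.
const generateId = (): string => '';

const id = refineNonBlank(generateId());
// fetchProject(generateId()); // compile error: string is not a NonBlankString
const url = id === undefined ? undefined : fetchProject(id);
```

The caller is forced to deal with the undefined branch before fetchProject can be called, but the guarantee is only as strong as the team's discipline in never casting to NonBlankString directly.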

The way most developers would approach solving similar issues in a real-world project would be (at best) adding some linters and checkers, and relying on automated tests and high-quality code reviews. In my experience, this is a flimsy excuse rather than a real solution and it does not work most of the time - especially in edge case scenarios.

This is where I'd suggest using another language altogether, one which, similarly to CoffeeScript and TypeScript back in the day, solves some problems at compile time. And suggest I will.

There is a big warning before I proceed though: adopting another technology is a rather big decision, not to be taken lightly. Some might see it as an impossible switch, one that only a few small or indie developers on rather unimportant projects would make. But remember it was the same story with ES6, TypeScript, bundlers and pretty much any significant upgrade in the past.

A balance bike is a good tool to get you going - it gets you from walking to moving fast. But if you want to get faster and further, you have to drop it at some stage in favour of a more advanced bike.

Similarly, TypeScript and ESNext got you from plain callback-hell-infested JavaScript code to a better place - you can refactor code faster, it saves you from a few errors at compile time, and the code is much cleaner and more concise now. But if you really want to get even further, you will have to make a leap of faith and invest in the future.

Here is my big controversial suggestion: a pure functional language, one with a strong type system, which does not have a concept of null and undefined in the first place, with a nice sweet syntax.

Before landing on a specific choice, check out Elm (dead, but a good starting point) and PureScript, in that order. Let me explain.

Elm is like a very simplified Haskell - it is a pure functional language with a subset of Haskell syntax. It has a nice compiler with really good error messages. It enforces a structure for your application (redux-like). It gives you a gentle introduction to the functional programming concepts and it targets browsers (web applications). With its architecture, you can look at the message (action from redux) type and see exactly what are all possible operations in the application (which makes reading code and getting to know new codebases much easier).

On a bad note, it has not been developed since 2019, it comes with an entire runtime (saves you from runtime errors, but blows up the bundle size) and it is an all-or-nothing commitment for the project - it is an all-in-one platform and if you want to gradually migrate your application from React - sorry, you will have to rewrite entire parts of your application in Elm. The good point turned bad: having all possible actions defined in one message type makes complex applications really complex (with one massive type definition, an issue very familiar to developers who had to deal with Redux).

here could have been an Elm code sample

The next step on this journey would be PureScript. It is an actively developed language, it has a minimal footprint once compiled to JS (much smaller than Elm), it has a very rich ecosystem and, best of all, it has a very simple interop with JS and it can compile just one module. Top it off with the Halogen framework and you have effectively got yourself Elm on steroids. The downside is that it is a slightly more complex platform (language and framework) compared to Elm, so the learning curve is a bit steeper.

The above example of CoffeeScript code could be written in plain PureScript like this:

foreign import happy :: Boolean
foreign import knowsIt :: Boolean
foreign import sexy :: Boolean
foreign import tooSexy :: Boolean
foreign import chaChaCha :: String
foreign import knowsItStr :: String
foreign import removeShirt :: String
foreign import showIt :: String

import Data.Array ((..), mapWithIndex)
import Data.Map as M
import Data.Tuple (Tuple(..))

-- if statements with multiple branches become guards
text
  | happy && knowsIt = chaChaCha
  | sexy = knowsItStr
  | tooSexy = removeShirt
  | otherwise = showIt

-- list comprehensions become function application
courses = [ "greens", "caviar", "truffles", "roast", "cake" ]

-- string interpolation is possible via external packages
-- https://pursuit.purescript.org/packages/purescript-interpolate
import Data.Interpolate as I
menu' i dish = I.i "Menu Item " i ": " dish

-- https://pursuit.purescript.org/packages/purescript-template-strings
import Data.TemplateString.Unsafe ((<~>))
menu'1 i dish = "Menu Item ${i}: ${dish}" <~> { i: i, dish: dish }

import Data.TemplateString ((<->))
import Data.Tuple.Nested ((/\))
menu'2 i dish = "Menu Item ${i}: ${dish}" <-> [ "i" /\ i, "dish" /\ dish ]

-- pure PureScript string interpolation
menu i dish = "Menu Item " <> (show i) <> ": " <> dish

x i dish = menu (i + 1) dish

-- can not just call a function and ignore its result
x' = mapWithIndex x courses

-- ranges become Array monad
-- countdown :: Array Int
countdown = do
  num <- 10 .. 1
  pure num

-- JavaScript objects exist in a separate package
import Foreign.Object as FO
yearsOld' = FO.fromHomogeneous { max: 10, ida: 9, tim: 11 }

-- object as Map
yearsOld = M.fromFoldable [Tuple "max" 10, Tuple "ida" 9, Tuple "tim" 11]

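-- iterating over entries: convert to an array of key/value tuples first
y child age = child <> " is " <> (show age)

ages :: Array String
ages = map (\(Tuple child age) -> y child age) (M.toUnfoldable yearsOld)

ages' :: Array String
ages' = map (\(Tuple child age) -> y child age) (FO.toUnfoldable yearsOld')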

The real deal with this approach is how to migrate from an existing (most likely) React/TypeScript/(webpack | vite) ecosystem to PureScript?

Expanding on Scott Wlaschin's talk, you can (and probably should) separate the pure application logic from IO, potentially utilising the foreign imported functions to interact with the existing JS code (libraries). This way you keep your application logic error-free, and all the errors that can happen are shifted towards the presentation layer (MVC/MVP, remember this concept?).

This would be the best strategy for most projects, migrating one bit at a time and making the application less and less error prone whilst not wreaking havoc by rewriting everything from scratch (very few businesses will buy into that).
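The split itself can be tried out before any PureScript is involved. A minimal sketch, reusing the project status example from earlier; planProjectRequest, runRequest and the Request type are invented names, and the HTTP client is passed in as a parameter so the sketch does not assume any particular library:

```typescript
type Job = { projectId: string };

// Pure core: no IO, just a decision that is trivial to test.
type Request =
  | { kind: 'skip'; reason: string }
  | { kind: 'fetch'; url: string };

const planProjectRequest = (jobs: Job[]): Request => {
  const first = jobs[0];
  return first === undefined || first.projectId === ''
    ? { kind: 'skip', reason: 'no project to query' }
    : { kind: 'fetch', url: `/project/${first.projectId}` };
};

// Thin shell: the only place that performs IO, via the injected `http` function.
const runRequest = async (
  request: Request,
  http: (url: string) => Promise<unknown>,
): Promise<unknown> =>
  request.kind === 'fetch' ? http(request.url) : undefined;
```

The pure part is the piece that could later be moved to PureScript, while the shell stays in the existing codebase.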

The bigger issue is that most modern frontend apps I have seen mix the business logic and the presentation layer so thoroughly that it would be challenging (to say the least) to untangle them back into reasonable code. Check how we handle a UI action, triggering an HTTP request and updating both the UI (to display the request progress/status) and the application state (for other parts of the UI) at the same time.

here could have been a real-world application interaction handling code sample

Calling PureScript code from JavaScript (based on FFI example in PureScript book):

module Test where

import Prelude

gcd :: Int -> Int -> Int
gcd 0 m = m
gcd n 0 = n
gcd n m
  | n > m     = gcd (n - m) m
  | otherwise = gcd (m - n) n

data ZeroOrOne a = Zero | One a

inc :: ZeroOrOne Int -> ZeroOrOne Int
inc Zero = Zero
inc (One n) = One (n + 1)

_zero = Zero
_one = One 1
_two = One 2

and then in JS:

import Test from 'Test.js';

Test.gcd(15)(20);

const _zero = new Test.Zero();
const _one = new Test.One(1);
const _two = new Test.One(2);

console.log(Test.inc(_zero));
console.log(Test.inc(_one));
console.log(Test.inc(_two));

In the other direction (calling JS code from PureScript):

export const setItem = key => value => () =>
  window.localStorage.setItem(key, value);

export const getItem = key => () =>
  window.localStorage.getItem(key);

and then in PureScript:

foreign import setItem :: String -> String -> Effect Unit

foreign import getItem :: String -> Effect Json

import Data.Argonaut (class DecodeJson, class EncodeJson)
import Data.Argonaut.Decode.Generic (genericDecodeJson)
import Data.Argonaut.Encode.Generic (genericEncodeJson)
import Data.Generic.Rep (class Generic)

-- define PhoneType

derive instance Generic PhoneType _

instance EncodeJson PhoneType where encodeJson = genericEncodeJson
instance DecodeJson PhoneType where decodeJson = genericDecodeJson

processItem :: Json -> Either String Person
processItem item = do
  jsonString <- decodeJson item
  j          <- jsonParser jsonString
  decodeJson j

main = do
  item <- getItem "person"
  initialPerson <- case processItem item of
    Left  err -> do
      log $ "Error: " <> err <> ". Loading examplePerson"
      pure examplePerson
    Right p   -> pure p

To align this with the original problem about JobLocation, here's how this would look in PureScript:

module Main where

import Prelude
import Data.Argonaut (jsonParser)
import Data.Argonaut.Decode (decodeJson, (.:), (.:?), printJsonDecodeError)
import Data.Argonaut.Decode.Class (class DecodeJson)
import Data.Argonaut.Decode.Error (JsonDecodeError(..))
import Data.Argonaut.Encode (encodeJson, (:=), (~>))
import Data.Argonaut.Encode.Class (class EncodeJson)
import Data.Bifunctor (bimap)
import Data.Either (Either(..))
import Data.Generic.Rep (class Generic)
import Data.Maybe (Maybe, isJust)
import Data.Show.Generic (genericShow)
import Effect (Effect)
import Effect.Console (log)

data JobLocation
  = CollectionLocation { collection :: String }
  | DocumentLocation { collection :: String, document :: String }
  | TableLocation { table :: String }

derive instance Eq JobLocation
derive instance Generic JobLocation _
instance showJobLocation :: Show JobLocation where
  show a = genericShow a

instance DecodeJson JobLocation where
  decodeJson json = do
    obj <- decodeJson json
    
    -- Check which fields are present
    maybeCollection :: Maybe String <- obj .:? "collection"
    maybeDocument :: Maybe String <- obj .:? "document" 
    maybeTable :: Maybe String <- obj .:? "table"
    
    let hasCollection = isJust maybeCollection
    let hasDocument = isJust maybeDocument
    let hasTable = isJust maybeTable
    
    case hasCollection, hasDocument, hasTable of
      true, true, false -> do
        collection <- obj .: "collection"
        document <- obj .: "document"
        pure $ DocumentLocation { collection, document }
        
      true, false, false -> do
        collection <- obj .: "collection"
        pure $ CollectionLocation { collection }
        
      false, false, true -> do
        table <- obj .: "table"
        pure $ TableLocation { table }
        
      _, _, _ -> Left $ AtKey "structure" $ UnexpectedValue json

parseJobLocation :: String -> Either String JobLocation
parseJobLocation jsonStr = jsonParser jsonStr >>= (decodeJson >>> bimap printJsonDecodeError identity)

-- Example usage
examples :: Effect Unit
examples = do
  let collectionJson = """{"collection": "users"}"""
  case parseJobLocation collectionJson of
    Left err -> log $ "Collection parse error: " <> err
    Right location -> do
      log $ "Parsed CollectionLocation: " <> show location
  
  let documentJson = """{"collection": "users", "document": "user123"}"""
  case parseJobLocation documentJson of
    Left err -> log $ "Document parse error: " <> err
    Right location -> do
      log $ "Parsed DocumentLocation: " <> show location
  
  let tableJson = """{"table": "analytics"}"""
  case parseJobLocation tableJson of
    Left err -> log $ "Table parse error: " <> err
    Right location -> do
      log $ "Parsed TableLocation: " <> show location
  
  let invalidJson = """{"data": "something"}"""
  case parseJobLocation invalidJson of
    Left err -> log $ "Expected error: " <> err
    Right location -> log $ "Unexpected success: " <> show location

main :: Effect Unit
main = do
  examples

While this code is type safe and will handle all of the edge cases just fine, I find it pretty hard to read. There are parts that I would not be able to understand after a month of not working with this code, like this one:

parseJobLocation jsonStr = jsonParser jsonStr >>= (decodeJson >>> bimap printJsonDecodeError identity)

It could be rewritten in an imperative way:

parseJobLocation jsonStr = do
  json <- jsonParser jsonStr
  bimap printJsonDecodeError identity (decodeJson json)

But I doubt this makes it any more readable - the bimap printJsonDecodeError identity (decodeJson json) part, specifically. This is a common issue with Haskell and PureScript - some of the functions are weirdly named and can only be understood either with experience or by reading the docs (which, in turn, are also quite cryptic). Take the bimap function for example. Its signature looks like this:

bimap :: forall a b c d. (a -> b) -> (c -> d) -> f a c -> f b d

This does not really help a non-seasoned Haskeller.

The docs only say this:

The bimap function maps a pair of functions over the two type arguments of the bifunctor.

I had to use it since there are two function calls in parseJobLocation: jsonParser :: String -> Either String Json and decodeJson :: Json -> Either JsonDecodeError a. If you just pipe the output of the first into the second (jsonParser jsonStr >>= decodeJson), the error types have to match, because of the nature of the >>= operator. Specialised to Either, its type is >>= :: Either e a -> (a -> Either e b) -> Either e b: the function on the right may change the success type (from a to b), but the error type e stays the same. In our case the left-hand side is Either String Json, so the function has to be Json -> Either String a, but decodeJson returns Either JsonDecodeError a instead.

Effectively, >>= only operates on the second type argument of Either, whereas we need to change the second argument (Json into a) and the first one as well (JsonDecodeError into String). And this has to be done in one go, or, rather, in the one function passed to >>=. We can create such a function by combining decodeJson, which returns Either JsonDecodeError a, with another function that converts the error, JsonDecodeError, into a String, making the result Either String a. We achieve this with the >>> operator, which is an alias for composeFlipped, defined as composeFlipped :: a b c -> a c d -> a b d.

In fact, instead of bimap _ identity _ it is better to use lmap _ _, which only applies a function to the left side of Either:

parseJobLocation jsonStr = do
  json <- jsonParser jsonStr
  lmap printJsonDecodeError (decodeJson json)

Explanations like those make it a huge barrier to entry for newcomers.
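For readers more comfortable with TypeScript, the same plumbing can be sketched with a hand-written Either. Everything here is hypothetical and only mirrors the shape of >>= and lmap: the Either type, chain, mapLeft, and the stand-in parseJson and decodeCount functions are not the Argonaut API.

```typescript
type Either<E, A> = { tag: 'left'; error: E } | { tag: 'right'; value: A };

const left = <E, A>(error: E): Either<E, A> => ({ tag: 'left', error });
const right = <E, A>(value: A): Either<E, A> => ({ tag: 'right', value });

// Counterpart of >>= : the error type E is fixed, only the success type changes.
const chain = <E, A, B>(e: Either<E, A>, f: (a: A) => Either<E, B>): Either<E, B> =>
  e.tag === 'right' ? f(e.value) : e;

// Counterpart of lmap: touches only the error side.
const mapLeft = <E, F, A>(e: Either<E, A>, f: (error: E) => F): Either<F, A> =>
  e.tag === 'left' ? left<F, A>(f(e.error)) : e;

type DecodeError = { path: string };

// Stand-in for jsonParser: the error is a plain string.
const parseJson = (s: string): Either<string, unknown> => {
  try {
    return right<string, unknown>(JSON.parse(s));
  } catch {
    return left<string, unknown>('invalid JSON');
  }
};

// Stand-in for decodeJson: the error is a structured value.
const decodeCount = (json: unknown): Either<DecodeError, number> =>
  typeof json === 'number'
    ? right<DecodeError, number>(json)
    : left<DecodeError, number>({ path: '$' });

// Passing decodeCount to chain directly is rejected: string vs DecodeError.
// Converting the error first makes both steps agree on Either<string, _>.
const parseCount = (s: string): Either<string, number> =>
  chain(parseJson(s), (json) =>
    mapLeft(decodeCount(json), (e) => `bad value at ${e.path}`));
```

The types carry the same constraint as in PureScript: chain insists on one error type for the whole pipeline, and mapLeft is the adapter that makes the second step fit.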

Incorporating the above solution in an existing TypeScript project might look a tad cumbersome. For Vite, there is a PureScript plugin:

import { defineConfig } from 'vite';
import purescript from 'vite-plugin-purescript';

export default defineConfig({
  plugins: [
    // ...
    purescript(),
  ],
});

The tricky part is that we have to re-define types in TypeScript:

type CollectionLocation = { collection: string };
type DocumentLocation = { collection: string; document: string };
type TableLocation = { table: string };

type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

Moreover, since the function returns an Either, we would need that one too:

interface Left<E> {
  readonly _tag: 'Left';
  readonly value0: E;
}

interface Right<A> {
  readonly _tag: 'Right';
  readonly value0: A;
}

type Either<E, A> = Left<E> | Right<A>;

And then importing the function and calling it from TypeScript:

import { parseJobLocation } from '../output/JobLocation';

const loc = parseJobLocation('{ "collection": "col", "document": "doc" }');

if (loc._tag === 'Right') {
    // ...
}
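To avoid sprinkling _tag checks all over the codebase, one could wrap them in a small helper. The sketch below assumes the Either shape defined above; foldEither and describeResult are hypothetical names of my own, and the two sample values are hand-built stand-ins for whatever parseJobLocation would return.

```typescript
// Either shape copied from the definitions above.
interface Left<E> {
  readonly _tag: 'Left';
  readonly value0: E;
}

interface Right<A> {
  readonly _tag: 'Right';
  readonly value0: A;
}

type Either<E, A> = Left<E> | Right<A>;

// Hypothetical helper: collapse both branches of an Either into a single value.
function foldEither<E, A, R>(
  either: Either<E, A>,
  onLeft: (error: E) => R,
  onRight: (value: A) => R,
): R {
  return either._tag === 'Left' ? onLeft(either.value0) : onRight(either.value0);
}

type DocumentLocation = { collection: string; document: string };

// Hand-built sample values, standing in for results of parseJobLocation.
const parsed: Either<string, DocumentLocation> = {
  _tag: 'Right',
  value0: { collection: 'col', document: 'doc' },
};
const failed: Either<string, DocumentLocation> = {
  _tag: 'Left',
  value0: 'Unexpected token',
};

const describeResult = (result: Either<string, DocumentLocation>): string =>
  foldEither(
    result,
    (error) => `Cannot parse: ${error}`,
    (found) => `${found.collection}/${found.document}`,
  );
```

This keeps the tag inspection in one place, but it is still extra code that exists only to bridge the two languages.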

Quite the bother, right? That's why it is most beneficial if the entire critical section can be written in PureScript. In my case, JobLocation is used for displaying a corresponding UI element (in React), so there's little benefit, and it comes at the cost of lots of boilerplate and a not-so-great experience defining those parsers.

I personally prefer the Scala 3 implementation (not the integration with TypeScript part though, just the JSON parser implementation):

import io.circe.*
import io.circe.parser.*
import cats.implicits.*
import cats.effect.{IO, IOApp}
import io.circe.generic.semiauto.*

enum JobLocation:
  case CollectionLocation(collection: String)
  case DocumentLocation(collection: String, document: String)
  case TableLocation(table: String)

object JobLocation:
  given Decoder[DocumentLocation] = deriveDecoder
  given Decoder[CollectionLocation] = deriveDecoder
  given Decoder[TableLocation] = deriveDecoder

object JobLocationApp extends IOApp.Simple:

  import JobLocation.*

  def parseJobLocation(jsonString: String): Either[Error, JobLocation] =
    parse(jsonString).flatMap { json =>
      json.as[TableLocation].widen[JobLocation] orElse
      json.as[DocumentLocation].widen[JobLocation] orElse
      json.as[CollectionLocation].widen[JobLocation]
    }

  def run: IO[Unit] =
    val testCases = List(
      """{"collection": "users"}""",
      """{"collection": "orders", "document": "order-123"}""",
      """{"table": "analytics"}""",
      """{"invalid": "data"}"""
    )

    testCases.traverse_ { jsonStr =>
      parseJobLocation(jsonStr) match
        case Right(location) =>
          IO.println(s"Parsed: $jsonStr -> $location")

        case Left(error) =>
          IO.println(s"Failed to parse: $jsonStr -> ${error.getMessage}")
    }

This is a really straightforward implementation in my opinion. In this example, the parsing literally boils down to "try parsing TableLocation and if it returns Left, try parsing DocumentLocation instead, and if that returns Left, try parsing CollectionLocation, otherwise return what you got".

Alternatively, circe, the JSON parsing library for scala-cats used in the example above, provides an even neater way:

object JobLocation:
  given Decoder[DocumentLocation] = deriveDecoder
  given Decoder[CollectionLocation] = deriveDecoder
  given Decoder[TableLocation] = deriveDecoder

  given Decoder[JobLocation] =
    summon[Decoder[TableLocation]].widen[JobLocation] or
    summon[Decoder[DocumentLocation]].widen[JobLocation] or
    summon[Decoder[CollectionLocation]].widen[JobLocation]

def parseJobLocation(jsonString: String): Either[Error, JobLocation] =
    parse(jsonString).flatMap(_.as[JobLocation])

In this code, the parser for the parent type, JobLocation, is defined as a combination of the case class parsers, where whichever one manages to parse first wins:

summon[Decoder[TableLocation]].widen[JobLocation] or
summon[Decoder[DocumentLocation]].widen[JobLocation] or
summon[Decoder[CollectionLocation]].widen[JobLocation]
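There is no magic in such an or combinator. A minimal TypeScript sketch of the same idea could look like the following; the Result and Decoder types, as well as the stringFields and or helpers, are hypothetical and are not circe's (or any other library's) API.

```typescript
type Result<A> = { ok: true; value: A } | { ok: false; error: string };
type Decoder<A> = (json: unknown) => Result<A>;

type CollectionLocation = { collection: string };
type DocumentLocation = { collection: string; document: string };
type TableLocation = { table: string };
type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

// Hypothetical helper: a decoder for an object with the given string fields.
// Fields that were not asked for are ignored.
const stringFields = <K extends string>(...keys: K[]): Decoder<Record<K, string>> =>
  (json) => {
    if (typeof json !== 'object' || json === null) {
      return { ok: false, error: 'expected an object' };
    }
    const source = json as Record<string, unknown>;
    const out: Record<string, string> = {};
    for (const key of keys) {
      const field = source[key];
      if (typeof field !== 'string') {
        return { ok: false, error: `missing string field "${key}"` };
      }
      out[key] = field;
    }
    return { ok: true, value: out as Record<K, string> };
  };

// Try the first decoder; fall back to the second one only if the first fails.
const or = <A, B>(first: Decoder<A>, second: Decoder<B>): Decoder<A | B> =>
  (json) => {
    const attempt = first(json);
    return attempt.ok ? attempt : second(json);
  };

const tableLocation: Decoder<TableLocation> = stringFields('table');
const documentLocation: Decoder<DocumentLocation> = stringFields('collection', 'document');
const collectionLocation: Decoder<CollectionLocation> = stringFields('collection');

// Order matters: documentLocation goes before collectionLocation, since every
// document location would also decode as a collection location.
const jobLocation: Decoder<JobLocation> =
  or(tableLocation, or(documentLocation, collectionLocation));
```

As in the Scala version, the more specific decoder has to be tried before the less specific one, otherwise the document field would be silently dropped.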

Just to reiterate, I do understand that converting the application (and developers) to this new weird technology is an almost impossible task, especially in a large long-lived project. One way to reason about it and justify the transition is the resilience requirements of a project (the need for code that is actually resistant to errors) and the amount of time and effort spent to date on finding and fixing those nasty bugs and undefined behaviours in an application.

IO impact

10 May 2024

At MongoDB I work on the Relational Migrator project - a tool which helps people migrate their relational databases to MongoDB. Recently we grew interested in the performance of our tool. Due to the nature of the migrations, they are usually extremely long (potentially even never-ending, in some scenarios), so it is rather valuable to know where we can speed things up.

Hence we ran a profiler on a relatively big database of 1M rows. And this was what we saw:

The handleBatch method is where the meat and potatoes of our migration logic reside. It lasts for approx. 6.5 sec. We could have debated on which parts of this flame graph we could optimize (and we actually did), but we first decided to take a quick look at the same graph from the higher level - not the CPU time (when CPU is actually doing the active work) but the total time:

The entire application run took 4,937 sec (1hr 22min 17sec). Of that, the migration itself took only 130 sec:

The biggest chunk of it was writing to MongoDB database at 120 sec:

The actual migration logic is really just 3.5 sec:

So out of 130 sec of the actual migration run, the actual logic took 3.5 sec, or a mere 2.69%. The rest was just IO (input/output). We also saw this on the thread timeline:

Most of the time, all the threads were sleeping.
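The proportions are easy to double-check. Here is a quick sketch which uses only the numbers quoted above; the constant and function names are mine and have nothing to do with the profiler or the migrator code.

```typescript
// Timings quoted in this post, in seconds.
const totalRunSec = 4937;
const migrationSec = 130;
const mongoWriteSec = 120;
const logicSec = 3.5;

// Share of `part` in `whole`, as a percentage with two decimals.
const share = (part: number, whole: number): string =>
  `${((part / whole) * 100).toFixed(2)}%`;

// Whole seconds rendered in the same "1hr 22min 17sec" style as above.
const formatDuration = (sec: number): string => {
  const hours = Math.floor(sec / 3600);
  const minutes = Math.floor((sec % 3600) / 60);
  const seconds = sec % 60;
  return `${hours}hr ${minutes}min ${seconds}sec`;
};

console.log(formatDuration(totalRunSec));        // 1hr 22min 17sec
console.log(share(logicSec, migrationSec));      // 2.69%
console.log(share(mongoWriteSec, migrationSec)); // 92.31%
```

In other words, writing to MongoDB alone accounts for over 92% of the migration time.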

This is not new information, just a reminder that the slowest part of pretty much any application is input-output.
