Gantt chart. Part 4

Seems like every two years or so I hop on my Gantt chart implementation and rework it completely.

Last few attempts (rev. 1, rev. 2, rev. 3) were alright, but I was never quite satisfied with the implementation - be it SVG, which has a toll on a browser and has quite limited customization functionality or Canvas API, with same limited customization but being fast.

With the recent introduction of grid layouts in CSS, now supported in all browsers, now seems like a perfect time to revisit the old implementations once again:

Gantt chart, revision 4

CodeSandbox / live demo

This revision now has a proper horizontal scrolling on the panel with bars - meaning the labels on the left panel stay in place whilst the left panel is scrollable. Moreover, the chart is now relies on pure HTML and CSS (being rendered with React though), making it is possible to use rich markup inside the bars and labels.

Implementation steps

The data for the tests is going to look like this:

export const data = [
  {
    id: 1,
    name: "epic 1"
  },
  {
    id: 2,
    name: "epic 2"
  },
  {
    id: 3,
    name: "epic 3"
  },
  {
    id: 4,
    name: "story 1",
    parent: 1
  },
  {
    id: 5,
    name: "story 2",
    parent: 1
  },
  {
    id: 6,
    name: "story 3",
    parent: 1
  },
  {
    id: 7,
    name: "story 4",
    parent: 2
  },
  {
    id: 8,
    name: "story 5",
    parent: 2
  },
  {
    id: 9,
    name: "lorem ipsum dolor atata",
    parent: 5
  },
  {
    id: 10,
    name: "task 2",
    parent: 5
  }
];

The main component, <Gantt>, initially was implementated as follows:

import React, { useMemo } from "react";

import style from "./gantt.module.css";

const LeftPaneRow = ({ id, name }) => {
  return <div className={style.row}>{name}</div>;
};

const LeftPane = ({ items }) => {
  return (
    <div className={style.left_pane}>
      <div className={style.left_pane_header}>/</div>

      <div className={style.left_pane_rows}>
        {items.map((item) => (
          <LeftPaneRow key={item.id} {...item} />
        ))}
      </div>
    </div>
  );
};

const RightPaneRow = ({ id, name }) => {
  return (
    <div className={style.row}>
      <div className={style.entry} style={{ left: 0 }}>
        {id}
      </div>
    </div>
  );
};

const RightPane = ({ items }) => {
  return (
    <div className={style.right_pane}>
      <div className={style.right_pane_header}>...scale...</div>
      <div className={style.right_pane_rows}>
        {items.map((item) => (
          <RightPaneRow key={item.id} {...item} />
        ))}
      </div>
    </div>
  );
};

export const flattenTree = (items) => {
  const queue = [];

  items.filter(({ parent }) => !parent).forEach((item) => queue.push(item));

  const result = [];
  const visited = new Set();

  while (queue.length > 0) {
    const item = queue.shift();

    if (visited.has(item.id)) {
      continue;
    }

    result.push(item);
    visited.add(item.id);

    items
      .filter((child) => child.parent === item.id)
      .forEach((child) => queue.unshift(child));
  }

  return result;
};

export const Gantt = ({ items }) => {
  const itemList = useMemo(() => flattenTree(items), [items]);

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} />
    </div>
  );
};

The core of the proper representation of this diagram is the CSS:

.gantt {
  display: grid;
  grid-template: 1fr / auto 1fr;
  grid-template-areas: "left right";
  width: 100%;
}

.gantt .left_pane {
  display: grid;
  grid-area: left;
  border-right: 1px solid #bbb;
  grid-template: auto 1fr / 1fr;
  grid-template-areas: "corner" "rows";
}

.gantt .left_pane .left_pane_rows {
  display: grid;
  grid-area: rows;
}

.gantt .left_pane .left_pane_header {
  display: grid;
  grid-area: corner;
}

.gantt .right_pane {
  display: grid;
  grid-template: auto 1fr / 1fr;
  grid-template-areas: "scale" "rows";
  grid-area: right;
  overflow: auto;
}

.gantt .right_pane .right_pane_rows {
  width: 10000px; /*temp*/
  display: grid;
  grid-area: rows;
}

.gantt .right_pane .right_pane_header {
  display: flex;
  grid-area: scale;
}

.gantt .row {
  height: 40px;
  align-items: center;
  display: flex;
}

.gantt .right_pane .row {
  position: relative;
}

.gantt .right_pane .row .entry {
  position: absolute;
  background: #eeeeee;
  padding: 0.1rem 0.5rem;
  border-radius: 0.4rem;
}
Split into two panels, right is scrollable

Good, we now have two panels with items aligned in rows and the right panel being scrollable if it gets really long. Next thing, position: absolute is absolutely disgusting - we use grid layout already! Instead, split each row into the same number of columns using grid and position the elements in there:

const RightPaneRow = ({ id, name, columns, start, end }) => {
  const gridTemplate = `auto / repeat(${columns}, 1fr)`;
  const gridArea = `1 / ${start} / 1 / ${end}`;

  return (
    <div
      className={style.row}
      style={{
        gridTemplate,
      }}
    >
      <div
        className={style.entry}
        style={{
          gridArea,
        }}
      >
        {id}
      </div>
    </div>
  );
};

and clean up the CSS a bit (like removing the position: absolute and reducing the width from 10000px down to 1000px):

.gantt .right_pane .right_pane_rows {
  width: 1000px; /*temp*/
  display: grid;
  grid-area: rows;
}

.gantt .row {
  height: 40px;
  align-items: center;
  display: grid;
}

.gantt .right_pane .row {
  position: relative;
}

.gantt .right_pane .row .entry {
  background: #eeeeee;
  padding: 0.1rem 0.5rem;
  border-radius: 0.4rem;
}

Now, let’s position the elements in each row using the column index:

const RightPanelRowEntry = ({ id, start, end, children }) => {
  const gridArea = `1 / ${start} / 1 / ${end}`;

  return (
    <div
      className={style.entry}
      style={{
        gridArea,
      }}
    >
      {children}
    </div>
  );
};

const RightPaneRow = ({ id, name, columns, start, end }) => {
  const gridTemplate = `auto / repeat(${columns}, 1fr)`;
  const gridArea = `1 / ${start} / 1 / ${end}`;

  return (
    <div
      className={style.row}
      style={{
        gridTemplate,
      }}
    >
      <div
        className={style.entry}
        style={{
          gridArea,
        }}
      >
        {id}
      </div>
    </div>
  );
};

const RightPaneHeaderRow = ({ columns, children }) => {
  const gridTemplate = `auto / repeat(${columns}, 1fr)`;

  return (
    <div
      className={style.right_pane_header_row}
      style={{
        gridTemplate,
      }}
    >
      {children}
    </div>
  );
};

const RightPaneHeader = ({ children }) => {
  return <div className={style.right_pane_header}>{children}</div>;
};

const RightPane = ({ items, columns }) => {
  const columnHeaders = [...Array(columns)].map((_, idx) => (
    <RightPaneHeader>{idx + 1}</RightPaneHeader>
  ));

  const rows = items.map((item) => (
    <RightPaneRow key={item.id} columns={columns}>
      <RightPanelRowEntry {...item}>{item.id}</RightPanelRowEntry>
    </RightPaneRow>
  ));

  return (
    <div className={style.right_pane}>
      <RightPaneHeaderRow columns={columns}>{columnHeaders}</RightPaneHeaderRow>
      <div className={style.right_pane_rows}>{rows}</div>
    </div>
  );
};

And add corresponding new CSS styles:

.gantt .right_pane .right_pane_header_row {
  display: grid;
  grid-area: scale;
}

.gantt .right_pane .right_pane_header_row .right_pane_header {
  display: grid;
  align-items: center;
  text-align: center;
}

This requires start and end defined for each entry:

export const data = [
  {
    id: 1,
    name: "epic 1",
    start: 1,
    end: 12,
  },
  {
    id: 2,
    name: "epic 2",
    start: 2,
    end: 4,
  },
  {
    id: 3,
    name: "epic 3",
    start: 9,
    end: 11,
  },
  {
    id: 4,
    name: "story 1",
    parent: 1,
    start: 6,
    end: 7,
  },
  // ...
};
Aligning items in a grid

And, to make it not repeat a dozen of inline CSS styles, we can utilize CSS variables:

const RightPaneRow = ({ id, columns, children }) => {
  return (
    <div className={style.row}>
      {children}
    </div>
  );
};

const RightPanelRowEntry = ({ id, start, end, children }) => {
  return (
    <div
      className={style.entry}
      style={{
        "--col-start": start,
        "--col-end": end,
      }}
    >
      {children}
    </div>
  );
};

const RightPane = ({ items, columns }) => {
  const columnHeaders = [...Array(columns)].map((_, idx) => (
    <RightPaneHeader>{idx + 1}</RightPaneHeader>
  ));

  const rows = items.map((item) => (
    <RightPaneRow key={item.id} columns={columns}>
      <RightPanelRowEntry {...item}>{item.id}</RightPanelRowEntry>
    </RightPaneRow>
  ));

  return (
    <div className={style.right_pane} style={{ "--columns": columns }}>
      <RightPaneHeaderRow>{columnHeaders}</RightPaneHeaderRow>
      <div className={style.right_pane_rows}>{rows}</div>
    </div>
  );
};

We can also re-use the same row for header:

const RightPaneHeaderRow = ({ children }) => {
  return <div className={style.right_pane_header_row}>{children}</div>;
};

And corresponding CSS:

.gantt .right_pane .right_pane_header_row {
  display: grid;
  grid-area: scale;

  grid-template: auto / repeat(var(--columns, 1), 1fr);
}

.gantt .right_pane .row {
  position: relative;

  grid-template: auto / repeat(var(--columns, 1), 1fr);
}

.gantt .right_pane .row .entry {
  background: #eeeeee;
  padding: 0.1rem 0.5rem;
  border-radius: 0.5rem;
  align-items: center;
  text-align: center;

  grid-area: 1 / var(--col-start, 1) / 1 / var(--col-end, 1);
}

I like to also change the fonts, since the default sans-serif just looks terrible:

@import url("https://fonts.googleapis.com/css2?family=Assistant:wght@200..800&display=swap");

:root {
  font-family: "Assistant", sans-serif;
  font-optical-sizing: auto;
  font-weight: 300;
  font-style: normal;
  font-variation-settings: "wdth" 100;
}
Changing the font

And maybe add some grid lines for the rows:

.gantt .row:first-child {
  border-top: 1px solid var(--border-color, #eee);
}

.gantt .row {
  padding: 0 0.75rem;
  border-bottom: 1px solid var(--border-color, #eee);
}
Adding grid lines

Now let’s add some padding to separate parent and child items of a chart:

const LeftPaneRow = ({ level, id, name }) => {
  const nestingPadding = `${level}rem`;

  return (
    <div className={style.row} style={{ "--label-padding": nestingPadding }}>
      {name}
    </div>
  );
};
.gantt .left_pane .row {
  padding-left: var(--label-padding, 0);
}

and fill out the level property when flattening the item tree:

export const flattenTree = (items) => {
  const queue = [];

  items
    .filter(({ parent }) => !parent)
    .forEach((item) => queue.push({ level: 0, item }));

  const result = [];
  const visited = new Set();

  while (queue.length > 0) {
    const { level, item } = queue.shift();

    if (visited.has(item.id)) {
      continue;
    }

    result.push({ ...item, level });
    visited.add(item.id);

    items
      .filter((child) => child.parent === item.id)
      .forEach((child) => queue.unshift({ item: child, level: level + 1 }));
  }

  return result;
};
Shrinking right panel

And automate the number of columns calculation:

export const Gantt = ({ items }) => {
  const itemList = flattenTree(items);

  const startsAndEnds = items.flatMap(({ start, end }) => [start, end]);
  const columns = Math.max(...startsAndEnds) - Math.min(...startsAndEnds);

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} columns={columns} />
    </div>
  );
};

In order to make chart panel scrollable, one can set a width CSS property for the .right_pane_rows and .right_pane_header_row:

.gantt .right_pane .right_pane_rows {
  width: 2000px;
}

.gantt .right_pane .right_pane_header_row {
  width: 2000px;
}

The last bit for a this prototype would be to have a scale for the columns.

Assume a chart item has an abstract start and end fields - these could be dates or some domain-specific numbers (like a week in a quarter or a sprint, etc.). Those will then need to be mapped onto column index. Then the chart width (in columns) would be the difference between the smallest start value and the biggest end value:

export const Gantt = ({ items, scale }) => {
  const itemList = flattenTree(items).map((item) => ({
    ...item,
    ...scale(item), // assuming `scale` function returns an object { start: number; end: number }
  }));

  const minStartItem = minBy(itemList, (item) => item.start);
  const maxEndItem = maxBy(itemList, (item) => item.end);

  const columns = maxEndItem.end - minStartItem.start;

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} columns={columns} />
    </div>
  );
};

The minBy and maxBy helper functions could be either taken from lodash or manually defined like this:

const minBy = (items, selector) => {
  if (items.length === 0) {
    return undefined;
  }

  let minIndex = 0;

  items.forEach((item, index) => {
    if (selector(item) < selector(items[minIndex])) {
      minIndex = index;
    }
  });

  return items[minIndex];
}

For better navigation around this code we can add some types:

interface GanttChartItem {
  id: string;
  name: string;
}

interface GanttChartProps {
  items: GanttChartItem[];
  scale: (item: GanttChartItem) => { start: number; end: number };
}

function minBy<T>(items: T[], selector: (item: T) => number): T | undefined {
  // ...
}

export const Gantt = ({ items, scale }: GanttChartProps) => {
  // ...
};

export default function App() {
  const scale = ({ start, end }) => {
    return { start: start * 2, end: end * 2 };
  };

  return <Gantt items={data} scale={scale} />;
}

We can extend this even further by adding an API to provide labels for columns:

interface GanttChartProps {
  // ...
  scaleLabel: (column: number) => React.Element;
}

export const Gantt = ({ items, scale, scaleLabel }: GanttChartProps) => {
  // ...

  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane items={itemList} columns={columns} scaleLabel={scaleLabel} />
    </div>
  );
};


const RightPane = ({ items, columns, scaleLabel }) => {
  const columnHeaders = [...Array(columns)].map((_, idx) => (
    <RightPaneHeader>{scaleLabel(idx)}</RightPaneHeader>
  ));

  // ...
};

export default function App() {
  const scale = ({ start, end }) => ({ start, end });
  };

  const scaleLabel = (col) => `${col}`;

  return <Gantt items={data} scale={scale} scaleLabel={scaleLabel} />;
}

This new API can then be utilized to show month names, for instance:

export default function App() {
  const scale = ({ start, end }) => {
    return { start, end };
  };

  const months = [
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep",
    "Oct",
    "Nov",
    "Dec",
  ];

  const scaleLabel = (col) => months[col % 12];

  return <Gantt items={data} scale={scale} scaleLabel={scaleLabel} />;
}
Adding scale with labels

Moreover, it is now possible to inline HTML and CSS in the name of each chart item:

export const LeftPaneRow = ({ level, name }) => {
  const nestingPadding = `${level}rem`;

  return (
    <div className={style.row} style={{ "--label-padding": nestingPadding }}>
      <span dangerouslySetInnerHTML={{__html: name}}></span>
    </div>
  );
};

And then in data.json (note that FontAwesome requires its CSS on a page in order to work):

[
  {
    id: 7,
    name: '<i style="font-family: \'FontAwesome\';" class="fa fa-car"></i>&nbsp;story with FontAwesome',
    parent: 2,
    start: 4,
    end: 6,
  },
  {
    id: 9,
    name: 'inline <em><b style="color: #5ebebe">CSS</b> color</em> <u style="border: 1px dashed #bebefe; padding: 2px; border-radius: 2px">works</u>',
    parent: 5,
    start: 5,
    end: 6,
  },
]

The API can be further improved by providing the render function for the bars’ labels:

export const RightPane = ({ items, columns, scaleLabel, barLabel }) => {
  const rows = items.map((item) => (
    <RightPaneRow key={item.id} columns={columns}>
      <RightPaneRowEntry {...item}>{barLabel ? barLabel(item) : <>{item.id}</>}</RightPaneRowEntry>
    </RightPaneRow>
  ));

  // ...
};

export interface GanttChartProps {
  items: GanttChartItem[];
  scale: (item: GanttChartItem) => { start: number; end: number };
  scaleLabel: (column: number) => React.Element;
  barLabel: (item: GanttChartItem) => React.Element;
}

export const Chart = ({
  items,
  barLabel,
  scale,
  scaleLabel,
}: GanttChartProps) => {
  // ...
  return (
    <div className={style.gantt}>
      <LeftPane items={itemList} />
      <RightPane
        items={itemList}
        columns={columns}
        scaleLabel={scaleLabel}
        barLabel={barLabel}
      />
    </div>
  );
};

and then in App component:

import pluralize from "pluralize";

export default function App() {
  const barLabel = ({ start, end }) => (
    <>
      {end - start} {pluralize("month", end - start)}
    </>
  );

  // ...

  return (
    <Chart
      items={data}
      scale={scale}
      scaleLabel={scaleLabel}
      barLabel={barLabel}
    />
  );
};
More customized labels

Strongly-typed front-end: experiment 2, simple application, in PureScript / Halogen

Contents

  1. Introduction
  2. Experiment 1, darken_color
  3. Experiment 2, simple application

A more “conventional” way to implement the front-end application in PureScript would be using a framework called Halogen.

Starting off with a “hello world” example:

module Main where

import Prelude

import Effect (Effect)
import Halogen as H
import Halogen.Aff as HA
import Halogen.HTML as HH
import Halogen.HTML.Events as HE
import Halogen.VDom.Driver (runUI)

main :: Effect Unit
main = HA.runHalogenAff do
  body <- HA.awaitBody
  runUI component unit body

data Action = Increment | Decrement

component =
  H.mkComponent
    { initialState
    , render
    , eval: H.mkEval $ H.defaultEval { handleAction = handleAction }
    }
  where
  initialState _ = 0

  render state =
    HH.div_
      [ HH.button [ HE.onClick \_ -> Decrement ] [ HH.text "-" ]
      , HH.div_ [ HH.text $ show state ]
      , HH.button [ HE.onClick \_ -> Increment ] [ HH.text "+" ]
      ]

  handleAction = case _ of
    Increment -> H.modify_ \state -> state + 1
    Decrement -> H.modify_ \state -> state - 1

Adding the utility code akin to the other technologies:

data Shape = Circle | Square

calculateArea :: Maybe Shape -> Float -> Float
calculateArea Nothing _ = 0
calculateArea (Just Circle) value = pi * value * value
calculateArea (Just Square) value = value * value

getShape :: String -> Maybe Shape
getShape "circle" = Just Circle
getShape "square" = Just Square
getShape _ = Nothing

This resurfaces few differences from Haskell, Elm and others:

  • there is no pi constant in the Prelude, so need to import one of the available definitions, I went with Data.Number
  • Float is not a type; there is Number, however
  • 0 is not a Number, it is Int, confusing the audience

These are all minor differences, however. But this code is not a conventional PureScript either - it is working against the good practices of functional programming and thus defeats the purpose of these experiments. Examples of this are the heavy reliance on String instead of using the available type system.

Let us change that a bit:

import Data.String.Read (class Read)

data Shape = Circle | Square

calculateArea :: Shape -> Number -> Number
calculateArea Circle value = pi * value * value
calculateArea Square value = value * value

instance Read Shape where
  read = case _ of
    "square" -> Just Square
    "circle" -> Just Circle
    _ -> Nothing

instance Show Shape where
  show = case _ of
    Square -> "square"
    Circle -> "circle"

Now, to the UI:

import Halogen.HTML.Properties as HP

render state =
  HH.div_
    [
      HH.select [] [
        HH.option [ HP.value "" ] [ HH.text "Select shape" ],
        HH.option [ HP.value (show Circle) ] [ HH.text (show Circle) ],
        HH.option [ HP.value (show Square) ] [ HH.text (show Square) ]
      ],
      HH.input [],
      HH.div_ [ HH.text "<area>" ]
    ]

In the application state we need to store the selected shape and the value, so we can utilize records for that:

initialState _ = { shape: Nothing, value: Nothing }

Then we need to modify the possible actions. Let’s stick to the same approach of utilizing the type system:

data Action = ChangeValue (Maybe Number) | ChangeShape (Maybe Shape)

The thing glueing the two together is the handleAction function:

handleAction = case _ of
  ChangeValue value ->
    H.modify_ \state -> state { value = value }
  ChangeShape shape ->
    H.modify_ \state -> state { shape = shape }

Here, unlike Haskell (to my best knowledge), the placeholder variable is being used for pattern matching against the only function argument. So instead of a little verbose

handleAction action = case action of
  -- ...

you can use this placeholder variable and just provide the branches for each of its possible values:

handleAction = case _ of
  -- ...

Modifying the state is done using the Halogen.Hooks.HookM.modify_ function, which allows us to only use the previous state value and provide a new state value, without the need to mess with monads. In turn, we modify the state record using the record syntax:

state { shape = newShapeValue }

Now the only bit left is tying the UI with the actions:

import Halogen.HTML.Events as HE
import Data.String.Read (read)
import Data.Number as N
import Data.Tuple (Tuple(..))

render state =
  HH.div_
    [
      HH.select [ HE.onValueChange onShapeChanged ] [
        HH.option [ HP.value "" ] [ HH.text "Select shape" ],
        HH.option [ HP.value (show Circle) ] [ HH.text (show Circle) ],
        HH.option [ HP.value (show Square) ] [ HH.text (show Square) ]
      ],
      HH.input [ HE.onValueChange onValueChanged ],
      HH.div_ [ HH.text "<area>" ]
    ]

onShapeChanged v = ChangeShape (read v)

onValueChanged v = ChangeValue (N.fromString v)

showArea state =
  case res of
    Nothing ->
      HH.text "Choose shape and provide its parameter"

    Just (Tuple shape area) ->
      HH.text $ "Area of " <> (show shape) <> " is " <> (show area)

  where
    res = do
      shape <- state.shape
      value <- state.value
      let area = calculateArea shape value
      pure (Tuple shape area)

Here is where most fun and benefit from using PureScript comes into play.

First of all, the HE.onValueChange event handler (the onShapeChanged and onValueChanged functions) - it will be called with the new value for the input instead of an entire event object. This allows us to skip unpacking the raw value from that object.

Then, the action dispatchers take the value from the input and try to parse it, returning a Maybe a:

onShapeChanged :: String -> Maybe Shape
onShapeChanged v = ChangeShape (read v)

onValueChanged :: String -> Maybe Number
onValueChanged v = ChangeValue (N.fromString v)

It is actually a quite important part, since the shape might not be selected (making the <select> value an empty string) and the value might be either a blank string or not a valid number string. PureScript does not allow us to not handle these cases, so whenever we parse the user input, we get a Maybe a value and we have to handle both scenarios when the value is valid and when it is not.

The function showArea is where this neatness comes together - we handle both values as one, using the Data.Tuple type to pair them together:

res = do
  shape <- state.shape -- unpacks `Shape` from `Maybe Shape`
  value <- state.value -- unpacks `Number` from `Maybe Number`
  let area = calculateArea shape value -- always returns a Number, since both `shape` and `value` are always provided
  pure (Tuple shape area) -- returns a tuple of shape and area, packed in a `Maybe`

The above code will shortcircuit whenever at any point it is trying to unpack a value from a Nothing and the whole do block will return Nothing.

Putting it all together:

module Main where

import Prelude

import Data.Maybe (Maybe(..))
import Data.Number as N
import Data.String.Read (class Read, read)
import Data.Tuple (Tuple(..))
import Effect (Effect)
import Halogen as H
import Halogen.Aff as HA
import Halogen.HTML as HH
import Halogen.HTML.Events as HE
import Halogen.HTML.Properties as HP
import Halogen.VDom.Driver (runUI)

data Shape = Circle | Square

calculateArea :: Shape -> Number -> Number
calculateArea Circle value = N.pi * value * value
calculateArea Square value = value * value

instance Read Shape where
  read = case _ of
    "square" -> Just Square
    "circle" -> Just Circle
    _ -> Nothing

instance Show Shape where
  show = case _ of
    Square -> "square"
    Circle -> "circle"

data Action = ChangeValue (Maybe Number) | ChangeShape (Maybe Shape)

component =
  H.mkComponent
    { initialState
    , render
    , eval: H.mkEval $ H.defaultEval { handleAction = handleAction }
    }
  where
  initialState _ = { shape: Nothing, value: Nothing }

  render state =
    HH.div_
      [
        HH.select [ HE.onValueChange onShapeChanged ] [
          HH.option [ HP.value "" ] [ HH.text "Select shape" ],
          HH.option [ HP.value (show Circle) ] [ HH.text (show Circle) ],
          HH.option [ HP.value (show Square) ] [ HH.text (show Square) ]
        ],
        HH.input [ HE.onValueChange onValueChanged ],
        HH.div_ [ showArea state ]
      ]

  onShapeChanged v = ChangeShape (read v)

  onValueChanged v = ChangeValue (N.fromString v)

  showArea state =
    case res of
      Nothing ->
        HH.text "Select shape and provide its value"

      Just (Tuple shape area) ->
        HH.text $ "Area of " <> (show shape) <> " is " <> (show area)
    where
      res = do
        shape <- state.shape
        value <- state.value
        let area = calculateArea shape value
        pure (Tuple shape area)

  handleAction = case _ of
    ChangeValue value ->
      H.modify_ \state -> state { value = value }
    ChangeShape shape ->
      H.modify_ \state -> state { shape = shape }

main :: Effect Unit
main = HA.runHalogenAff do
  body <- HA.awaitBody
  runUI component unit body

IO impact

At MongoDB I work on a Relational Migrator project - a tool which helps people migrate their relational database to MongoDB. And recently we grew interested in the performance of our tool. Due to the nature of the migrations, they are usually extremely long (potentially even never ending, for some scenarios). It is a rather valuable information to know where we can speed things up.

Hence we ran a profiler on a relatively big database of 1M rows. And this was what we saw:

The handleBatch method is where the meat and potatoes of our migration logic reside. It lasts for approx. 6.5 sec. We could have debated on which parts of this flame graph we could optimize (and we actually did), but we first decided to take a quick look at the same graph from the higher level - not the CPU time (when CPU is actually doing the active work) but the total time:

The entire application run took 4,937 sec (1hr 22min 17sec). Of which, the migration itself took only 130 sec:

The biggest chunk of it was writing to MongoDB database at 120 sec:

The actual migration logic is really just 3.5 sec:

So out of 130 sec of the actual migration run, the actual logic took 3.5 sec or mere 2.69%. The rest was just IO (input/output). Which we also saw on the thread timeline:

Most time all the threads spent sleeping.

This is not new information, just a reminder that the slowest part of pretty much any application is input-output.

How unique are your bundles?

In the modern front-end development we are all used to package managers, transpilation and bundling. These concepts are the biproduct of the tools which we hoped would simplify developer’ burden and speed up the development.

However, how confident are you these tools are actually doing good job?

Developers seem comfortable off-loading the processing power to users’ machine (browser, predominantly). We are not surprised by seeing slow websites anymore. A simple blog is downloading 55MB of JavaScript? Seems fine nowadays.

I currently work on a fairly small tool (MongoDB Relational Migrator), which also utilizes TypeScript, React and, of course, bundling. We use Vite for that. Our bundles are split into chunks (but I have accounted for that, too).

I went ahead and wrote a rather simple script which parses the bundles (using TypeScript compiler API, because why not), extracting the function definitions (both arrow functions and plain old function) and counts how many times they occur in the file. For this last bit, to make sure I am not counting a => true and x => true as different occurrences, I am minimizing the function definition with uglifyjs and counting the SHA256 hashes of the minimized functions (just to have a reasonable key in my hashmap instead of entire function code).

These are my findings.

Out of 54 chunks, 47 are not css-in-js chunks. Out of 47 remaining, 7 have any significant duplication (over 5%). But when they do, they do it hard: duplication varies between 18% and a whopping 42% of sheer file size. Absolute numbers are also astonishing: 33% to 59% functions are duplicates.

Found 15192 functions, 8963 are unique (59%)
Duplicates length: 1518418 bytes out of 3537579 bytes are duplicate code (42.92%)

Found 1202 functions, 494 are unique (41.1%)
Duplicates length: 130649 bytes out of 340227 bytes are duplicate code (38.4%)

Found 513 functions, 231 are unique (45.03%)
Duplicates length: 50160 bytes out of 136057 bytes are duplicate code (36.87%)

Found 598 functions, 267 are unique (44.65%)
Duplicates length: 57607 bytes out of 164737 bytes are duplicate code (34.97%)

Found 17 functions, 10 are unique (58.82%)
Duplicates length: 1932 bytes out of 6532 bytes are duplicate code (29.58%)

Found 154 functions, 98 are unique (63.64%)
Duplicates length: 11140 bytes out of 45135 bytes are duplicate code (24.68%)

Found 968 functions, 651 are unique (67.25%)
Duplicates length: 52616 bytes out of 281406 bytes are duplicate code (18.7%)

I thought my code might be wrong, so I looked into the bundle code itself. Here’s a short excerpt:

Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},yR.apply(this,arguments)}var Zce;function bR(){return bR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},bR.apply(this,arguments)}var Wce;function wR(){return wR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},wR.apply(this,arguments)}var Uce;function $R(){return $R=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},$R.apply(this,arguments)}var Gce;function OR(){return OR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},OR.apply(this,arguments)}var Kce;function xR(){return xR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},xR.apply(this,arguments)}

See how the following fragment of code repeats multiple times:

function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e}

In fact, this exact same fragment of code repeats 137 times in the same piece of bundle chunk (same file):

Repeated function definition in a single chunk of code

By the way, this is a production build of our front-end, built using Vite, with minification enabled.

The raw length of this function code is 146 characters. So in a single place, in a single file, you have 136 * 146 = 19_992 bytes of waste. Meaning, browser has to load these 20KB of code, parse it and create 136 duplicating functions.

Looking at the overall size of 3.5MB of code in this chunk and its insane 41% duplicated code (in sheer bytes, not occurrences, so 1.5MB wasted), imagine how much faster this single file could have been loaded in a browser.

I was keen on seeing what functions get duplicated most often and ran my script on an entire build output directory. Here are top 80-ish offenders:

Function code (minimized) Copies
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2205
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1197
function n(){return Object.assign,n.apply(this,arguments)} 1008
function(){} 753
function(i){this.a=i} 250
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 191
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 187
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 159
function(n){return this===n} 119
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 113
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 111
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 111
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 104
function(c,i){He.call(this,c,i)} 94
function(n,r){(null==r||r&gt;n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 93
function(){return!0} 92
function(){return!1} 78
function(){return new gt(this)} 77
function(r){if(Array.isArray(r))return r} 77
function(){throw new TypeError(`Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.`)} 77
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 76
function(i,t){this.a=i,this.b=t} 58
function(){return this.a} 49
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 49
function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(c,e),Object.defineProperty(y,e,t.get?t:{enumerable:!0,get:function(){return c[e]}}))} 49
function(){return c[b]} 49
function(a,e,i){var r,t=i["aria-label"],n=i["aria-labelledby"],c=i.title;switch(a){case"img":return t||n||c?(l(r={},"aria-labelledby",n),l(r,"aria-label",t),l(r,"title",c),r):{"aria-label":"".concat(e.replace(/([a-z])([A-Z])/g,"$1 $2")," Icon")};case"presentation":return{"aria-hidden":!0,alt:""}}} 49
function(i){Di(this,i)} 48
function(r){return Object.getOwnPropertyDescriptor(e,r).enumerable} 48
function(){throw M(new De)} 46
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var n,u,e=[],a=!0,o=!1;try{for(t=t.call(l);!(a=(n=t.next()).done)&&(e.push(n.value),!r||e.length!==r);a=!0);}catch(l){o=!0,u=l}finally{try{a||null==t.return||t.return()}finally{if(o)throw u}}return e}} 44
function(n){throw M(new De)} 39
function(r){var n;return r&&"object"==typeof r&&"default"in r?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 39
function n(r){return n(r)} 38
function(n){return typeof n} 38
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 38
function(r){var n;return r&&r.__esModule?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 37
function(){return this.b} 33
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,a,f=[],i=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;i=!1}else for(;!(i=(e=u.call(t)).done)&&(f.push(e.value),f.length!==r);i=!0);}catch(l){o=!0,n=l}finally{try{if(!i&&null!=t.return&&(a=t.return(),Object(a)!==a))return}finally{if(o)throw n}}return f}} 33
function(t){Object.defineProperty(e,t,Object.getOwnPropertyDescriptor(n,t))} 32
function(n){return Ei(n)} 30
function(n){return L(fn,X,2,n,6,1)} 30
function(n){} 29
function(r){if(typeof Symbol<"u"&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 28
function(){throw new TypeError(`Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.`)} 28
()=>{} 27
function(){return null} 27
function(n,o){e.exports=o(m,on(),dn)} 27
function(i){Fi(this,i)} 23
function(n,o){e.exports=o(dn,on(),m)} 22
function(){throw M(new Fe(Re((xe(),Ds))))} 21
function(){return this} 20
function(i,t){this.b=i,this.a=t} 19
function(n,e){throw M(new De)} 18
function(){return 0} 17
function(r){Object.defineProperty(e,r,Object.getOwnPropertyDescriptor(t,r))} 16
()=>{var l;return null!=(l=e.options.debugAll)?l:e.options.debugHeaders} 16
function(n){return n} 15
function(c){Kr.call(this,c)} 15
function(){return this.c} 15
function(){return this.d} 15
function(n,r){return n} 14
d=>d.id 14
function(){var r=function(t,o){return(r=Object.setPrototypeOf||({__proto__:[]}instanceof Array?function(t,o){t.__proto__=o}:function(t,o){for(var n in o)Object.prototype.hasOwnProperty.call(o,n)&&(t[n]=o[n])}))(t,o)};return function(t,o){if("function"!=typeof o&&null!==o)throw new TypeError("Class extends value "+String(o)+" is not a constructor or null");function n(){this.constructor=t}r(t,o),t.prototype=null===o?Object.create(o):(n.prototype=o.prototype,new n)}} 14
function(_,o){_.__proto__=o} 14
function(o,r){for(var t in r)Object.prototype.hasOwnProperty.call(r,t)&&(o[t]=r[t])} 14
function(n){return!1} 13
function(t){var l=-1,n=null==t?0:t.length;for(this.clear();++l<n;){var r=t[l];this.set(r[0],r[1])}} 12
function(a,e,l){var i,r=l["aria-label"],t=l["aria-labelledby"],n=l.title;switch(a){case"img":return r||t||n?(f(i={},"aria-labelledby",t),f(i,"aria-label",r),f(i,"title",n),i):{"aria-label":"".concat(e.replace(/([a-z])([A-Z])/g,"$1 $2")," Icon")};case"presentation":return{"aria-hidden":!0,alt:""}}} 12
a=>a 12
a=>a() 12
function(n,a){n.a=a} 11
function(){ia.call(this)} 11
function(i,t,h){this.a=i,this.b=t,this.c=h} 11
function(n,r){return r} 11
function(){return this.a.gc()} 11
function(e){return e&&e.__esModule?e:{default:e}} 11
()=>n(!1) 11
function(i){this.b=i} 10
function(c){this.c=c} 10
function(){return this.f} 10
function(n){return n||"div"} 10
function(n,r,i){var l;return i=null!=(l=i)?l:"div",n||("string"==typeof(null==r?void 0:r.href)?"a":i)} 10

Let’s dive deeper, shall we?

Imagine for a second that we could define the duplicated functions once and then just reuse the short name instead (sounds reasonable, does it not?).

But not all of those functions could be de-duplicated in such way (at least not so easily). Some of these functions use the outer closure functions and variables (something defined outside of the function itself), so we can skip these. For instance, function(i){Di(this,i)} and function(){throw M(new De)} can be ignored.

Then, there are function using this. These might be tricky (this is not what you think is a famous JavaScript mantra).

Lastly, some (if not most) of the functions could be either replaced with arrow functions or standard library function. But that is uncertain - one must understand what a function does first.

With those points in mind, let’s look at the offenders once again:

Function code (minimized) Copies Notes
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2205 spread operator?
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1197 Object.assign methods?
function n(){return Object.assign,n.apply(this,arguments)} 1008 Object.assign properties?
function(){} 753 no-op
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 191 ?
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 187 ?
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 159 ?
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 113 ?
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 111 ?
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 111 ?
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 104 ?
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 93 array spread?
function(){return!0} 92 always-true
function(){return!1} 78 always-false
function(r){if(Array.isArray(r))return r} 77 self explanatory
function(){throw new TypeError('Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.')} 77 isSymbol?
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 76 ?
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 49 ?
function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(c,e),Object.defineProperty(y,e,t.get?t:{enumerable:!0,get:function(){return c[e]}}))} 49 ?
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var n,u,e=[],a=!0,o=!1;try{for(t=t.call(l);!(a=(n=t.next()).done)&&(e.push(n.value),!r||e.length!==r);a=!0);}catch(l){o=!0,u=l}finally{try{a||null==t.return||t.return()}finally{if(o)throw u}}return e}} 44 ?
function(r){var n;return r&&"object"==typeof r&&"default"in r?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 39 ?
function n(r){return n(r)} 38 Function.apply?
function(n){return typeof n} 38 typeof
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 38 ?
function(r){var n;return r&&r.__esModule?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 37 import?
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,a,f=[],i=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;i=!1}else for(;!(i=(e=u.call(t)).done)&&(f.push(e.value),f.length!==r);i=!0);}catch(l){o=!0,n=l}finally{try{if(!i&&null!=t.return&&(a=t.return(),Object(a)!==a))return}finally{if(o)throw n}}return f}} 33 ?
function(n){} 29 no-op
function(r){if(typeof Symbol<"u"&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 28 ?
function(){throw new TypeError('Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.')} 28 ?
()=>{} 27 no-op
function(){return null} 27 always-null
function(){return this} 20 always-this
function(){return 0} 17 always-zero
function(n){return n} 15 identity
function(n,r){return n} 14 always-first-argument
d=>d.id 14 .id
function(){var r=function(t,o){return(r=Object.setPrototypeOf||({__proto__:[]}instanceof Array?function(t,o){t.__proto__=o}:function(t,o){for(var n in o)Object.prototype.hasOwnProperty.call(o,n)&&(t[n]=o[n])}))(t,o)};return function(t,o){if("function"!=typeof o&&null!==o)throw new TypeError("Class extends value "+String(o)+" is not a constructor or null");function n(){this.constructor=t}r(t,o),t.prototype=null===o?Object.create(o):(n.prototype=o.prototype,new n)}} 14 ?
function(_,o){_.__proto__=o} 14 Object.is_a?
function(o,r){for(var t in r)Object.prototype.hasOwnProperty.call(r,t)&&(o[t]=r[t])} 14 ?
function(n){return!1} 13 always-false
a=>a 12 identity
a=>a() 12 call first argument
function(n,a){n.a=a} 11 enum/const definition?
function(n,r){return r} 11 always-second-argument
function(e){return e&&e.__esModule?e:{default:e}} 11 import default
function(c){this.c=c} 10 enum/const definition?

As a matter of fact, someone on the internet did a very similar research few years ago. So I hoped to see the improvement in the build tools over the years.

As I mentioned above, our front-end is bundled with Vite. Let’s see if using esbuild or bun (since both are fairly new and stand out in terms of architecture and performance) do a better job.

With few small adjustments to make things fair (e.g. build the same thing in the same way), like disabling the plugins for Vite, setting up svgr loader, here are some build time stats:

yarn install:

➤ YN0000: Done with warnings in 17s 798ms
yarn  12.11s user 19.69s system 175% cpu 18.122 total

bun install:

warn: esbuild's postinstall script took 748.9ms

 1028 packages installed [1.82s]
  Removed: 2
bun install  0.22s user 0.65s system 47% cpu 1.849 total
Bundler Build time
bun 0.43s
esbuild 2.57s
vite 85.04s
webpack 138.64s

And the analysis of the built bundles:

vite:

Found 968 functions, 651 are unique (67.25%)
Found 598 functions, 267 are unique (44.65%)
Found 154 functions, 98 are unique (63.64%)
Found 17 functions, 10 are unique (58.82%)
Found 15192 functions, 8963 are unique (59%)
Found 1202 functions, 494 are unique (41.1%)
Found 513 functions, 231 are unique (45.03%)
= Total 18644 functions, 10714 are unique (57.4%)

Duplicates length: 52616 bytes out of 281406 bytes are duplicate code (18.7%)
Duplicates length: 57607 bytes out of 164737 bytes are duplicate code (34.97%)
Duplicates length: 11140 bytes out of 45135 bytes are duplicate code (24.68%)
Duplicates length: 1932 bytes out of 6532 bytes are duplicate code (29.58%)
Duplicates length: 1518418 bytes out of 3537579 bytes are duplicate code (42.92%)
Duplicates length: 130649 bytes out of 340227 bytes are duplicate code (38.4%)
Duplicates length: 50160 bytes out of 136057 bytes are duplicate code (36.87%)
= Total 1822522 out of 4511673 bytes are duplicate code (40.3%)

esbuild:

Found 46654 functions, 28952 are unique (62.06%)
Duplicates length: 6905599 bytes out of 9645594 bytes are duplicate code (71.59%)

bun:

Found 31113 functions, 25755 are unique (82.78%)
Duplicates length: 446020 bytes out of 5696964 bytes are duplicate code (7.83%)

webpack:

Found 2898 functions, 1434 are unique (49.48%)
Duplicates length: 320940 bytes out of 4645589 bytes are duplicate code (6.91%)

And a deeper analysis of the duplicated functions:

esbuild:

Function Copies
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2216
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1204
function n(){return Object.assign,n.apply(this,arguments)} 1010
function(){} 844
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 260
function(i){this.a=i} 250
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 203
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 194
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 160
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 137
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 134
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 134
()=>{} 129
function(n){return this===n} 119
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 115
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 106
function(c,i){Bs.call(this,c,i)} 94
function(){return!0} 93
function(r){if(Array.isArray(r))return r} 83
function(){throw new TypeError(Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a Symbol.iterator method.)} 83
function(){return!1} 79
function(){return new cu(this)} 77
function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(b,e),Object.defineProperty(E,e,t.get?t:{enumerable:!0,get:function(){return b[e]}}))} 76
function(){return b[P]} 76
function(a,e,l){var i,r=l["aria-label"],t=l["aria-labelledby"],n=l.title;switch(a){case"img":return r||t||n?(s(i={},"aria-labelledby",t),s(i,"aria-label",r),s(i,"title",n),i):{"aria-label":"".concat(e.replace(/([a-z])([A-Z])/g,"$1 $2")," Icon")};case"presentation":return{"aria-hidden":!0,alt:""}}} 76
function(r){var n;return r&&r.__esModule?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 67
function(n){return typeof n} 64
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 64
function n(r){return n(r)} 63
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 61
function(i,t){this.a=i,this.b=t} 58
function(r){var n;return r&&"object"==typeof r&&"default"in r?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 50
function(r){if(typeof Symbol<"u"&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 49
function(){throw new TypeError(`Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.`)} 49
function(){return this.a} 49
function(i){p0(this,i)} 48
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var n,u,e=[],a=!0,o=!1;try{for(t=t.call(l);!(a=(n=t.next()).done)&&(e.push(n.value),!r||e.length!==r);a=!0);}catch(l){o=!0,u=l}finally{try{a||null==t.return||t.return()}finally{if(o)throw u}}return e}} 46
function(){throw St(new Ss)} 46
function(n){throw St(new Ss)} 39
()=>{"use strict";Vu(),Du()} 38
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,a,f=[],i=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;i=!1}else for(;!(i=(e=u.call(t)).done)&&(f.push(e.value),f.length!==r);i=!0);}catch(l){o=!0,n=l}finally{try{if(!i&&null!=t.return&&(a=t.return(),Object(a)!==a))return}finally{if(o)throw n}}return f}} 37
function(){return this.b} 33
function(){X(x)} 32
function(n){} 31
a=>a() 30
function(n){return V1(n)} 30
function(n){return dn(oi,mr,2,n,6,1)} 30
function(){return null} 29
function(){return this} 24
()=>{"use strict";Du()} 23
function(i){g0(this,i)} 23
function(t,r){var e;if(t)return"string"==typeof t?k(t,r):"Map"===(e="Object"===(e=Object.prototype.toString.call(t).slice(8,-1))&&t.constructor?t.constructor.name:e)||"Set"===e?Array.from(t):"Arguments"===e||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(e)?k(t,r):void 0} 22
function(t,r){var e;if(t)return"string"==typeof t?P(t,r):"Map"===(e="Object"===(e=Object.prototype.toString.call(t).slice(8,-1))&&t.constructor?t.constructor.name:e)||"Set"===e?Array.from(t):"Arguments"===e||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(e)?P(t,r):void 0} 22
function(n){return null!=n&&n instanceof Array} 21
function(r){if(Array.isArray(r))return P(r)} 21
function(e){return e&&e.__esModule?e:{default:e}} 21
function(t,r){var e;if(t)return"string"==typeof t?Q(t,r):"Map"===(e="Object"===(e=Object.prototype.toString.call(t).slice(8,-1))&&t.constructor?t.constructor.name:e)||"Set"===e?Array.from(t):"Arguments"===e||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(e)?Q(t,r):void 0} 21
function(){throw St(new Os(il((qs(),_g))))} 21
function(n){return n} 20
function(n){return null!=n&&n.nodeType===Node.ELEMENT_NODE} 20
function(e){throw Error("Received unhandled value: ".concat(e))} 20
function(r,n){return Array.isArray(r)?r.concat(n):"string"==typeof r?r:void 0} 19

bun:

Function Copies
function(){} 739
function(i){this.a=i} 250
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 197
()=>{} 141
function(n){return this===n} 119
function(c,f){f7.call(this,c,f)} 94
function(){return!0} 91
function(){return new p9(this)} 77
function(){return!1} 76
function(i,t){this.a=i,this.b=t} 58
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 53
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 51
function(){return this.a} 49
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 48
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 48
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 48
function(i){T6(this,i)} 48
()=>{R3(),G3()} 46
function(){throw x0(new w7)} 46
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 45
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 41
function(n){throw x0(new w7)} 39
function(r){if(Array.isArray(r))return r} 36
function(){throw new TypeError("Invalid attempt to destructure non-iterable instance.\\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")} 36
function(){return this.b} 33
function(n){return n2(n)} 30
function(n){return J1($5,p1,2,n,6,1)} 30
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 29
function(l,r){var t=null==l?null:"undefined"!=typeof Symbol&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,f,i=[],a=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;a=!1}else for(;!(a=(e=u.call(t)).done)&&(i.push(e.value),i.length!==r);a=!0);}catch(l){o=!0,n=l}finally{try{if(!a&&null!=t.return&&(f=t.return(),Object(f)!==f))return}finally{if(o)throw n}}return i}} 29
function(n){} 29
function(){return null} 28
function(e){return Object.getOwnPropertyDescriptor(Z,e).enumerable} 25
function(e){Object.defineProperty(Z,e,Object.getOwnPropertyDescriptor(W,e))} 25
function(n){return n} 23
function(i){C6(this,i)} 23
()=>{G3()} 22
function(){throw x0(new C7(n7((y7(),KG))))} 21
function(){return this} 19
function(i,t){this.b=i,this.a=t} 19
function(n){return!1} 18
function(n,w){throw x0(new w7)} 18
function(){return 0} 17
function(n){return typeof n} 17
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 17
()=>{R3()} 16
()=>{var e;return null!=(e=Z.options.debugAll)?e:Z.options.debugHeaders} 16
function(n,r){return n} 15
function(c){hX.call(this,c)} 15
function(){return this.c} 15
function(){return this.d} 15
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 14
function(r){if("undefined"!=typeof Symbol&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 14
function(){throw new TypeError("Invalid attempt to spread non-iterable instance.\\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")} 14
d=>d.id 14
()=>W(!1) 13
a=>a 12
function(t){var l=-1,n=null==t?0:t.length;for(this.clear();++l<n;){var r=t[l];this.set(r[0],r[1])}} 12
function(n,c){} 12
function(n,r){return r} 11
function(n,a){n.a=a} 11
function(){_V.call(this)} 11
function(i,t,h){this.a=i,this.b=t,this.c=h} 11
function(){return this.a.gc()} 11
function(i){this.b=i} 10
function(c){this.c=c} 10
function(){return this.f} 10

Interestingly enough, all three tools handled the job bad in different aspects:

  • vite was the slowest and produced second biggest bundle
  • esbuild was the fastest and produced the largest bundle
  • bun was slower than esbuild by a split of hair, but produced smallest bundle with least duplicates

Bonus points to bun for installing node modules in a link of an eye.

In terms of duplicates, however, all three failed miserably (in my opinion), with the best result being the bundle produced by bun with 18% duplicates and the rest having almost half the bundle wasted.

For the most part, bundlers seem to be doing a pretty bad job at tree shaking and keep a lot of those utility functions’ duplicates. One can estimate how much of a wasted space these use, by multiplying the function code length by the number of duplicates minus one (for one definition).

Let’s imagine some of the above functions could be de-duplicated. What are the benefit of that? For the most part, the front-end can load faster for users - simply because there is less bytes to transfer. On top of that, there are less functions to be created in memory. So technically, the front-end can act faster. Although, on the modern machines the difference between having one function and few thousand of the same function is negligible.

Here is a shortened list of top abusers from different bundlers for our tool:

Function vite esbuild bun
# Bytes # Bytes # Bytes
function(){} 753 9036 844 10128 739 8868
function(){return!0} 92 1840 93 1860 91 1820
function(){return!1} 78 1560 79 1580 76 1520
function(){return null} 27 621 29 667 28 644
function(){return this} 20 460 24 552 19 437
function(){return 0} 17 340 n/a 0 n/a 0
function(n){return!1} 13 273 n/a 0 18 378
function(n){} 29 377 31 403 29 377
function(n){return n} 15 315 20 420 23 483
function(n){return typeof n} n/a 0 64 1792 17 476
function(n){return this===n} n/a 0 119 3332 119 3332
function(n,r){return n} 14 322 n/a 0 15 345
function(n,r){return r} 11 253 n/a 0 11 253
function(n,c){} n/a 0 n/a 0 12 180
()=>{} 27 162 129 774 141 846
a=>a 12 48 n/a 0 12 48
a=>a() 12 72 n/a 0 n/a 0
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2205 317520 2216 319104 197 28368
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1197 95760 1204 96320 n/a 0
function n(){return Object.assign,n.apply(this,arguments)} 1008 58464 1010 58580 n/a 0
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 93 9672 106 11024 n/a 0
function(r){if(Array.isArray(r))return r} 77 3157 83 3403 36 1476
function(){throw new TypeError(&#39;Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.&#39;)} 77 13244 83 14276 36 6192
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 76 5624 260 19240 n/a 0
function(_,o){_.__proto__=o} 14 392 n/a 0 n/a 0
function(o,r){for(var t in r)Object.prototype.hasOwnProperty.call(r,t)&&(o[t]=r[t])} 14 1176 n/a 0 n/a 0
function(e){return e&&e.__esModule?e:{default:e}} 11 539 21 1029 n/a 0
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} n/a 0 115 13570 n/a 0
function(r,n){return Array.isArray(r)?r.concat(n):"string"==typeof r?r:void 0} n/a 0 19 1520 n/a 0
function(n){return null!=n&&n instanceof Array} n/a 0 21 987 n/a 0
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} n/a 0 n/a 0 53 6890
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} n/a 0 137 36853 51 13719
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} n/a 0 160 17600 n/a 0
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} n/a 0 64 7424 n/a 0
function(r){if("undefined"!=typeof Symbol&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} n/a 0 n/a 0 14 1624

Interestingly enough, aside from a lot of () => {} and (a, b) => a and () => true (as I call them, utility) functions, there are a lot of ES6 / TypeScript helpers such as class definition and spread operator variants, presumingly made to be compatible with ES5-only browsers. Maybe if we had targeted only platforms supporting the latest ES features we would get better results?

Well, not quite much:

bundle sizes:

Bundler Bundle size
bun 5.4M
esbuild 9.2M
esbuild (tuned) 8.0M
vite 7.1M
vite (tuned) 3.8M
webpack 4.4M

vite:

Found 15983 functions, 9581 are unique (59.94%)
Duplicates length: 1495985 bytes out of 4019326 bytes are duplicate code (37.22%)

esbuild:

Found 41736 functions, 29224 are unique (70.02%)
Duplicates length: 3406606 bytes out of 8347230 bytes are duplicate code (40.81%)

webpack is (should be) already using the target config option from tsconfig.json (which is set to ESNext in our case) and bun does not really have a whole lot of customization in this regard.

One unlikely possibility when these duplicates can be faster than having just one function is that running the nearly-defined code (in terms of a single block of code) might be slightly faster than making code jumps. This is super far-fetched idea from low-level programming, when CPU does not have to jump thousands or millions of (assembly) instructions back and forth but literally a few instead. This won’t justify using verbose ES5-compatible code on ESnext browser, however. How about we run a very synthetic benchmark to check just this one theory?

function f(){return!1}

console.time('1');
for(var i=0;i<100000;i++){var a=[];for(var t=0;t<20000;t++)a.push(Math.random()*1000000);var x=a.filter(f).length}
console.timeEnd('1');
console.time('2');
for(var i=0;i<100000;i++){var a=[];for(var t=0;t<20000;t++)a.push(Math.random()*1000000);var f=function(){return!1},x=a.filter(f).length}
console.timeEnd('2');

The results are actually quite stable:

1: 17029ms
1: 16998ms
1: 16903ms

2: 21877ms
2: 21811ms
2: 21821ms

Having just one function instance is approx. 23% faster. But what happens at consequitive runs?

1: 9194ms
1: 9159ms
1: 14044ms
1: 13882ms
1: 13975ms
1: 9205ms
1: 14026ms

2: 21821ms
2: 13843ms
2: 13866ms
2: 13854ms
2: 13961ms

2: 21718ms
2: 13952ms
2: 13925ms
2: 13923ms

Seems like CPU does indeed do a little bit of instruction caching and branch prediction (first run is visibly slower than the subsequent runs). But the observation still holds: having one function definition instead of many copies (even “near” copies) has a much bigger impact.

With that being said, there is one interesting thing to try here: what if we actually replace some of those bulky duplicates with one-time declarations, within the same bundle?

Without performing an in-depth code analysis and optimization, I came up with the following naive implementation:

  1. analyze the code and pick a few functions (biggest abusers) to fix them up
  2. extract all function names, function parameter names (including potential destructuring objects in function params) and all variable and constant names
  3. for each function to be de-duplicated, create a unique name (using the variable and function parameter and function names to avoid any clashes)
  4. from the end of the file to the beginning, remove all occurrences of the function declaration and replace all function references with the unique name
  5. add the function declarations to the beginning of the file

This approach is rather naive, since it does not account for a number of edge cases. For instance, if there are two functions to be replaced:

function(e){for(var r=1;r<arguments.length;r++){var t,l=arguments[r];for(t in l)Object.prototype.hasOwnProperty.call(l,t)&&(e[t]=l[t])}return e}

function p(){return(p=Object.assign||function(e){for(var r=1;r<arguments.length;r++){var t,l=arguments[r];for(t in l)Object.prototype.hasOwnProperty.call(l,t)&&(e[t]=l[t])}return e}).apply(this,arguments)}

e.g. one includes the other, the algorithm could have potentially evicted cases like this.

Before proceeding further, it is a good idea to test if the cleaned up bundle can safely replace the original one. Hence I just stuffed it in the static assets folder of our project and ran it with the modified bundle.

This way I figured few issues with the naive approach:

  • two function definitions are causing stack overflow:
    • $z=function n(){return Object.assign,n.apply(this,arguments)}
    • $q=function n(){return n=Object.assign&&Object.assign.bind(),n.apply(this,arguments)}
  • some of the empty functions are actually used as constructors (ES5-compatible OOP model) which is only discovered by finding the expressions like $FnName.prototype.something = somethingElse;
  • some functions are named and then referenced later in the code
  • some functions are not used at all: Unused aliases

For the shorthand functions I first tried manually fixing them up - had to replace them with $z=function(){return $z=Object.assign.bind(),$z.apply(this,arguments)} alikes. This worked, so I created an AST transformer to handle these one-line return-only functions:

const simplifyFunction = (code, fname) => {
    const tmpFilename = '_tmp';

    fs.writeFileSync(tmpFilename, code, 'utf-8');

    const root = ts.createSourceFile(
        tmpFilename,
        code,
        ts.ScriptTarget.ESNext,
        /* setParentNodes */ true
    );

    let rootFnName = undefined;

    const parse = (node) => {
        if (ts.isFunctionDeclaration(node) && ts.isIdentifier(node.name) && node.name.escapedText !== '') {
            rootFnName = node.name.escapedText;
            return;
        }

        ts.forEachChild(node, child => {
            if (child) {
                parse(child);
            }
        });
    };

    parse(root);

    if (!rootFnName) {
        fs.rmSync(tmpFilename);
        return code;
    }

    const transformer = (ctx) => (sourceFile) => {
        const visit = (node) => {
            if (ts.isIdentifier(node) && node.escapedText === rootFnName) {
                return ts.factory.createIdentifier(fname);
            }

            if (
                ts.isFunctionDeclaration(node) &&
                ts.isBlock(node.body) &&
                node.body.statements.length === 1 &&
                ts.isReturnStatement(node.body.statements[0]) &&
                ts.isBinaryExpression(node.body.statements[0].expression)
            ) {
                const next = ts.factory.createFunctionDeclaration(
                    [],
                    undefined,
                    undefined,
                    [],
                    [],
                    undefined,

                    ts.factory.createBlock([
                        ts.factory.createReturnStatement(
                            ts.factory.createComma(
                                ts.factory.createAssignment(
                                    ts.factory.createIdentifier(fname),
                                    node.body.statements[0].expression.left
                                ),

                                node.body.statements[0].expression.right
                            )
                        )
                    ])
                );

                return ts.visitEachChild(next, visit, ctx);
            }

            return ts.visitEachChild(node, visit, ctx);
        };

        return ts.visitNode(sourceFile, visit);
    };

    const s = ts.createSourceFile(tmpFilename, code, ts.ScriptTarget.ESNext);
    const { transformed } = ts.transform(s, [ transformer ]);

    const newCode = ts.createPrinter({ omitTrailingSemicolon: true })
        .printFile(transformed.find(({ fileName }) => fileName === tmpFilename));

    fs.rmSync(tmpFilename);

    return newCode;
};

The transformer is essentially a two-pass processor: it first parses the code and identifies the first function declaration. If none was found - it just returns the original code. If there was a so-called “root function” defined, it then replaces all identifier with that “root function” name with the alias provided as fname. It also replaces the return statements in form of a return something && something() with return alias = something, alias().

This approach is different from simply using uglifyjs to just try and minimize the code - it is way more complex (compared to just one function call). It results in few extra whitespaces being added. But using uglifyjs messes things up again and TypeScript compiler does not have an option for minimizing the output. But few extra whitespaces are totally acceptable given the much bigger savings from this transformation.

With the constructors I simply excluded them from being de-duplicated. This resulted in 155 fewer substitutions (in Vite mode), which is negligible on the overall scale of the problem.

As for the named functions which are in the global scope and are referenced later down the line, I had to create a list of “backwards-compatible aliases”, mapping those old function names onto the new unique names.

I ended up with a three-pass parser-transformer utility (not really “simple script” anymore). The passes being:

  1. figuring out the duplicate function declarations and replacing them from the end to the start of the code (to minimize the chance of writing over the just-changed code)
  2. removing the duplicate global-scope named functions and replacing them with shorthand aliases to the de-duplicated declarations
  3. replace the usages of all known aliases and shorthand-named de-duplicated functions with their corresponding new (generated) names

Other than those, replacing the original bundle with the optimized one worked like a charm!

The results? With the threshold of 20 duplicates or more:

Bundler Before optimization
Bundle size Total functions Unique functions Unique functions, % Duplicate code, %
bun 6.2M 8903 7443 83.6% 0.78%
esbuild 8.7M 13057 10250 78.5% 3.9%
vite 3.9M 3502 2365 67.53% 6.39%
webpack 4.4M 2898 1434 49.48% 6.91%
After optimization
bun 6.2M (same) 7865 (-1038) 7355 (-88) 93.52% (+9.92%) 0.51% (-0.27%)
esbuild 8.5M (-0.2M) 3265 (-9792) 2990 (-7260) 91.58% (+13.08%) 0.62% (-3.28%)
vite 3.6M (-0.3M) 2483 (-1019) 2277 (-88) 91.7% (+24.17%) 1.68% (-4.71%)
webpack 4.1M (-0.3M) 1484 (-1414) 1375 (-59) 92.65% (+43.17%) 0.43% (-6.48%)

In conclusion, the bundlers do a pretty average job at optimizing the bundles, even in production mode with some extra tuning. And if some brave soul is willing to invest even more time and effort than I did into developing a sophisticated solution (potentially improving the existing tools, like uglifyjs or bundlers themselves), the numbers can be improved even further. It would be really interesting to see what would the results be running this optimizer on a bigger bundle.

In my humble opinion, Bun does produce the cleanest bundle. It might be not the smallest one, but it has least unnecessary stuff. On top of that, it is the fastest tool in JS world I have ever used. By the way, this very blog is built using Bun and React SSR - on Github Actions it takes around a minute to build and publish:

Github Actions build and publish with Bun Github Actions build breakdown

TypeScript classes are not what you think

There was a talk at CPPCon 2017 by Matt Goldbolt with an inspirational title, What has my compiler done for me lately?. It actually resonates with me quite a bit, especially in the world where we try to push so hard for statically typed compiled languages on front-end.

Recently I recalled one of (many) aspects why I think TypeScript is useless (as a way to introduce strong type system to JavaScript world) in many cases.

To be realistic and not just throw bare accusations around, this was inspired by the work on contracts for a new microservice, specifically - the request & response types to be used in both client and the back-end of the microservice. The types were similar to the types used on a database layer and one of the developers just returned objects of that DB layer model type on a controller (endpoint) level, which confused me.

Consider the code below:

class A {
    constructor(public moo: string, public foo: number) {}
}

class B {
    constructor(public moo: string, public foo: number, public zoo?: string[]) {}
}

const b: B = new B('ololo', -3.14, ['1']);
const a: A = b;
const c: B = a;

This works because of a (rather questionable) design decision by TypeScript team called Type compatibility, where any two classes or interfaces that have overlapping public fields are deemed compatible and can be mutually interplaceable.

However, in most languages with reasonable type system, you would expect two different classes to be just that - two different classes - in the case of a code above, an object of class B can never be assigned to a variable of type A and vice versa.

There is a way to achieve this in TypeScript, however (a bit cumbersome, though): by hiding the properties and only exposing them through non-getter/non-setter methods, when needed:

class A {
    constructor(private moo: string, private foo: number) {}

    getMoo() {
        return this.moo;
    }

    getFoo() {
        return this.foo;
    }
}

class B {
    constructor(private moo: string, private foo: number, private zoo?: ReadonlyArray<string>) {}

    getMoo() {
        return this.moo;
    }

    getFoo() {
        return this.foo;
    }

    getZoo() {
        return this.zoo;
    }
}

const b: B = new B('ololo', -3.14, ['1']);
const a: A = b; // Type 'B' is not assignable to type 'A'. Types have separate declarations of a private property 'moo'.(2322)
const c: B = a; // Property 'getZoo' is missing in type 'A' but required in type 'B'.(2741)

Without knowing this “feature” beforehand, one might end up with an inconsistent code or errors down the line (when somebody decides to modify the DB layer model and gets errors on an API layer). And TypeScript does not really help here.

How not to use monads

Recently I have watched a video titled “Optionals and errors in Haskell & Rust - monads by example”. In the video, the author makes a simple application simulating the retrieval of a HTML document via URL and checking if the document contains the title. This is a half-simulated experience, aimed at the beginners to demonstrate how the error handling is done in Rust and Haskell.

However, author in his code has made quite a few bad decisions. And that’s what I want to make a focus of this blog: how one should not use monads and handle errors in Haskell.

Author’s original source looks like this:

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordUpdate #-}

import Control.Monad (forM_)
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)

data Error = Error (String) deriving (Show)

data Doc = Doc {head :: Maybe Head}

data Head = Head {title :: Maybe String}

data Summary = Summary
  { title :: Maybe String,
    ok :: Bool
  }
  deriving (Show)

readDoc :: String -> Either Error Doc
readDoc url =
  if
      | isInfixOf "fail" url -> Left $ Error $ printf "Bad read of %s" url
      | otherwise ->
          Right $
            if
                | isInfixOf "head-missing" url -> Doc {head = Nothing}
                | isInfixOf "title-missing" url ->
                    Doc {head = Just Head {title = Nothing}}
                | isInfixOf "title-empty" url ->
                    Doc {head = Just Head {title = Just ""}}
                | otherwise ->
                    Doc
                      { head =
                          Just Head {title = Just $ printf "Title of %s" url}
                      }

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = doc.head >>= (.title), ok = True}

readAndBuildSummary :: String -> Summary
readAndBuildSummary url = case readDoc url of
  Left err -> Summary {title = Nothing, ok = False}
  Right doc -> buildSummary doc

isTitleNonEmpty :: Doc -> Maybe Bool
isTitleNonEmpty doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

isTitleNonEmpty' :: Doc -> Maybe Bool
isTitleNonEmpty' doc = not <$> null <$> (doc.head >>= (.title))

isTitleNonEmpty'' :: Doc -> Maybe Bool
isTitleNonEmpty'' doc = not . null <$> (doc.head >>= (.title))

readWhetherTitleNonEmpty :: String -> Either Error (Maybe Bool)
readWhetherTitleNonEmpty url = do
  doc <- readDoc url
  return $ isTitleNonEmpty doc

readWhetherTitleNonEmpty' :: String -> Either Error (Maybe Bool)
readWhetherTitleNonEmpty' url = isTitleNonEmpty <$> readDoc url

main :: IO ()
main = do
  let urls = ["good", "title-empty", "title-missing", "head-missing", "fail"]
  forM_ urls $ \url -> do
    putStrLn $ printf "Checking \"https://%s/\":" url
    -- Summary.
    let summary = readAndBuildSummary url
    putStrLn $ printf "  Summary: %s" $ show summary
    putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title
    -- Has title.
    let hasTitle = readWhetherTitleNonEmpty url
    -- let hasTitleSure = either (const False) id $ fromMaybe False <$> hasTitle
    let hasTitleSure = fromMaybe False $ either (const Nothing) id hasTitle
    putStrLn $
      printf "  Has title: %s vs %s" (show hasTitle) (show hasTitleSure)

He then proceeded to wrapping this code in IO monad and ended up with the following:

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordUpdate #-}

import Control.Monad (forM_)
import Data.Functor ((<&>))
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)

data Error = Error (String) deriving (Show)

data Doc = Doc {head :: Maybe Head}

data Head = Head {title :: Maybe String}

data Summary = Summary
  { title :: Maybe String,
    ok :: Bool
  }
  deriving (Show)

-- readDoc

readDoc :: String -> IO (Either Error Doc)
readDoc url =
  return
    if
        | isInfixOf "fail" url -> Left $ Error $ printf "Bad read of %s" url
        | otherwise ->
            Right $
              if
                  | isInfixOf "head-missing" url -> Doc {head = Nothing}
                  | isInfixOf "title-missing" url ->
                      Doc {head = Just Head {title = Nothing}}
                  | isInfixOf "title-empty" url ->
                      Doc {head = Just Head {title = Just ""}}
                  | otherwise ->
                      Doc
                        { head =
                            Just Head {title = Just $ printf "Title of %s" url}
                        }

-- buildSummary

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = doc.head >>= (.title), ok = True}

-- readAndBuildSummary

readAndBuildSummary :: String -> IO Summary
readAndBuildSummary url = do
  docOrError <- readDoc url
  return $ case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

readAndBuildSummary' :: String -> IO Summary
readAndBuildSummary' url =
  readDoc url >>= \docOrError -> return $ case docOrError of
    Left err -> Summary {title = Nothing, ok = True}
    Right doc -> buildSummary doc

readAndBuildSummary'' :: String -> IO Summary
readAndBuildSummary'' url =
  readDoc url <&> \case
    Left err -> Summary {title = Nothing, ok = True}
    Right doc -> buildSummary doc

-- isTitleNonEmpty

isTitleNonEmpty :: Doc -> Maybe Bool
isTitleNonEmpty doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

isTitleNonEmpty' :: Doc -> Maybe Bool
isTitleNonEmpty' doc = not <$> null <$> (doc.head >>= (.title))

isTitleNonEmpty'' :: Doc -> Maybe Bool
isTitleNonEmpty'' doc = not . null <$> (doc.head >>= (.title))

-- readWhetherTitleNonEmpty

readWhetherTitleNonEmpty :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty url = do
  docOrError <- readDoc url
  return $ do
    doc <- docOrError
    return $ isTitleNonEmpty doc

readWhetherTitleNonEmpty' :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty' url =
  readDoc url >>= \docOrError ->
    return $ docOrError >>= \doc -> return $ isTitleNonEmpty doc

readWhetherTitleNonEmpty'' :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty'' url = (isTitleNonEmpty <$>) <$> readDoc url

readWhetherTitleNonEmpty''' :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty''' url = readDoc url <&> (<&> isTitleNonEmpty)

-- main

main :: IO ()
main = do
  let urls = ["good", "title-empty", "title-missing", "head-missing", "fail"]
  forM_ urls $ \url -> do
    putStrLn $ printf "Checking \"https://%s/\":" url
    -- Summary.
    summary <- readAndBuildSummary url
    putStrLn $ printf "  Summary: %s" $ show summary
    putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title
    -- Has title.
    hasTitle <- readWhetherTitleNonEmpty url
    -- let hasTitleSure = either (const False) id $ fromMaybe False <$> hasTitle
    let hasTitleSure = fromMaybe False $ either (const Nothing) id hasTitle
    putStrLn $
      printf "  Has title: %s vs %s" (show hasTitle) (show hasTitleSure)

I see a few issues with this code:

  1. the abuse of IO monad
  2. the excessive (an I mean unnecessary) use of Maybe monad
  3. too many unnecessary readDoc calls (bear in mind: it is supposed to be a network call + document parse, so quite a heavy function)

Improvements

My suggestions on how to make their code better:

  1. only stick to IO where the function is actually interacting with the external world; in this case this would be either main or readDoc (when it actually makes a network call)
  2. eliminate the abuse of Maybe Bool - in the functions where the Maybe Bool is used (isTitleNonEmpty and its variations) just Bool would suffice
  3. only call readDoc once and use the result down the pipeline, no need to call it multiple times (readAndBuildSummary, readWhetherTitleNonEmpty)
  4. reduce the nested if statements to guard expressions
  5. replace the case eitherValue expressions with the either function
  6. remove the dead / unused / unnecessary code (like those duplicated functions - readWhetherTitleNonEmpty', readWhetherTitleNonEmpty'' and readWhetherTitleNonEmpty''')

Use IO monad only for the functions which interact with the outside world

IO monad tells compiler and the people who read the code the function operates outside of the program internals. Like actually performing input-output, working with network, operating system, etc.

You can think of any such function as an unsafe part of an application.

In the example above, most of the functions do not comply with that - they are simply a side-effect of poor monad operations and could be easily changed as follows:

isTitleNonEmpty :: Doc -> Maybe Bool
isTitleNonEmpty doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = doc.head >>= (.title), ok = True}

-- there is absolutely no need for this function
readWhetherTitleNonEmpty :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty url = do
  docOrError <- readDoc url
  return $ do
    doc <- docOrError
    return $ isTitleNonEmpty doc

-- this is also redundant
readAndBuildSummary :: String -> IO Summary
readAndBuildSummary url = do
  docOrError <- readDoc url
  return $ case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

-- this is the (reduced) existing code
main__old :: IO ()
main__old = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  summary <- readAndBuildSummary url

  putStrLn $ printf "  Summary: %s" $ show summary
  putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title

  hasTitle <- readWhetherTitleNonEmpty url

-- note how the document is only passed down to the functions which actually operate on the document
-- the only two functions using `IO` monad here are `main` and `readDoc`, as both perform input-output operations
main :: IO ()
main = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  docOrError <- readDoc url

  let summary = case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

  putStrLn $ printf "  Summary: %s" $ show summary
  putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title

  let hasTitle = isTitleNonEmpty doc

Reduce calls to IO functions

Pure functions must return the exact same value for the same arguments, regardless of how many times they are called.

IO (and a lot of other monads, like Reader, Writer, etc.) work around this rule by changing the outside world. So even though a function technically returns the same value every time, the side-effect of an IO monad can potentially change something elsewhere, outside of the function itself.

This makes the programs non-pure and might cause a lot of negative impact. These side effects are what is thought to be the root of all evils in (functional) programming. They make it hard to figure out what a program actually does. They make the functions and programs hard to test and prove to be correct.

From the performance perspective, if a function actually does change the outside world, it can’t be memoized and it won’t be optimized by the compiler to only use its result, computed once.

Think calling a function which tries to create a user in a system or drop a database table. This would happen on every call. If you call such function twice, it will most likely fail (due to database failures, hopefully).

But what if the function creates data instead and there are no checks for duplicates in the external system? Like appending a log line. If such function was to be called multiple times, it can quickly bloat the storage. This is a potential security issue as well.

So instead of calling such functions (recap: functions returning IO monad), why not call them once and store the result? Then the entire program can operate on the stored result value.

Of course, there are plethora of cases where such behaviour (calling such functions multiple times) is actually desired. But in the sample code from above, this is not the case.

Check the transformed code from the previous point how this solution is utilized:

main :: IO ()
main = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  -- store the result of fetching a document by its URL
  docOrError <- readDoc url

  -- work on the stored value instead of sending an extra HTTP request
  let summary = case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

  putStrLn $ printf "  Summary: %s" $ show summary
  putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title

  -- work on the stored value again, preventing one more HTTP request
  let hasTitle = case docOrError of
    Left err -> False
    Right doc -> isTitleNonEmpty doc

Eliminate the abuse of Maybe Bool

There’s simply no need to go overboard with Maybe in methods which simply check whether something is present or not.

Maybe Bool introduces a tri-state:

  • Just False
  • Just True
  • Nothing

In cases where this is desired, is much better to represent each of the state with a semantic value:

type HeaderPresence = NoHeader | EmptyHeader | HasHeader

This way the code is much easier to follow and debug.

For the sample in question even this approach is redundant - the function simply checks whether the value (title) is present or not, so a simple Bool would suffice:

isTitleNonEmpty__old :: Doc -> Maybe Bool
isTitleNonEmpty__old doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

isTitleNonEmpty :: Doc -> Bool
isTitleNonEmpty doc =
  not $ null title
  where
    title = doc.head doc >>= head.title

main__old :: IO ()
main__old = do
  let url = "good"

  docOrError <- readDoc url

  hasTitle <- case docOrError of
    Left err -> None
    Right doc -> isTitleNonEmpty__old doc

  let hasTitleSure = fromMaybe False $ either (const Nothing) id hasTitle

  putStrLn $
    printf "  Has title: %s vs %s" (show hasTitle) (show hasTitleSure)

main :: IO ()
main = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  docOrError <- readDoc url

  -- a simple True | False value, no need to mess with Maybe
  let hasTitle = case docOrError of
    Left err -> False
    Right doc -> isTitleNonEmpty doc

  -- note: no need to wrap & unwrap Maybe and Either
  putStrLn $ printf "  Has title: %s" (show hasTitle)

Reduce the nested if statements - use guard expressions

The original readDoc function uses a rather unnecessarily bulky if statement:

readDoc :: String -> IO (Either Error Doc)
readDoc url =
  return
    if
        | isInfixOf "fail" url -> Left $ Error $ printf "Bad read of %s" url
        | otherwise ->
            Right $
              if
                  | isInfixOf "head-missing" url -> Doc {head = Nothing}
                  | isInfixOf "title-missing" url ->
                      Doc {head = Just Head {title = Nothing}}
                  | isInfixOf "title-empty" url ->
                      Doc {head = Just Head {title = Just ""}}
                  | otherwise ->
                      Doc
                        { head =
                            Just Head {title = Just $ printf "Title of %s" url}
                        }

Haskell provides a built-in language feature for just this case, which really does make code cleaner a lot:

readDoc :: String -> IO (Either Error Doc)

readDoc url
  | isInfixOf "fail" url = return $ Left $ Error $ printf "Bad read of %s" url
  | isInfixOf "head-missing" url = return $ Right Doc { head = Nothing }
  | isInfixOf "title-missing" url = return $ Right Doc { head = Just Head { title = Nothing } }
  | isInfixOf "title-empty" url = return $ Right Doc { head = Just Head { title = Just "" } }
  | otherwise = return $ Right Doc { head = Just Head { title = Just $ printf "Title of %s" url } }

Use existing functions on monads instead of case statements

In my experience, for the most part, you want to simply apply a function to a Maybe or an Either.

And, again, for the most part, you either ignore one of the states of the monad (be it Nothing or Left), or apply a different function to it.

There are handy helper functions for these situations which do make the code cleaner - fromMaybe and either.

Consider few examples:

hasTitle__old <- case docOrError of
  Left err -> None
  Right doc -> isTitleNonEmpty__old doc

hasTitle <- either (const None) (isTitleNonEmpty__old)
let summary__old = case docOrError of
  Left err -> Summary {title = Nothing, ok = False}
  Right doc -> buildSummary doc

-- needs a nice helper function
invalid_summary = Summary { title = Nothing, ok = False }

let summary = either invalid_summary buildSummary
let hasTitle__old = case docOrError of
  Left err -> False
  Right doc -> isTitleNonEmpty doc

let hasTitle = either (const False) isTitleNonEmpty

End result

Here’s what the code looks like when applying all of the above suggestions and tidying up the code:

{-# LANGUAGE DuplicateRecordFields #-}

import Control.Monad (forM_)
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)

newtype Error = Error String deriving (Show)

newtype Doc = Doc {_head :: Maybe Head}

newtype Head = Head {title :: Maybe String}

data Summary = Summary
  { title :: Maybe String,
    ok :: Bool
  }
  deriving (Show)

-- helpers for Repl.It is meh with language extensions

docHead :: Doc -> Maybe Head
docHead = _head

headTitle :: Head -> Maybe String
headTitle = title

summaryTitle :: Summary -> Maybe String
summaryTitle = title

-- implementations

readDoc :: String -> IO (Either Error Doc)
readDoc url
  | "fail" `isInfixOf` url = return $ Left $ Error $ printf "Bad read of %s" url
  | "head-missing" `isInfixOf` url = return $ Right Doc {_head = Nothing}
  | "title-missing" `isInfixOf` url = return $ Right Doc {_head = Just Head {title = Nothing}}
  | "title-empty" `isInfixOf` url = return $ Right Doc {_head = Just Head {title = Just ""}}
  | otherwise = return $ Right Doc {_head = Just Head {title = Just $ printf "Title of %s" url}}

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = docHead doc >>= headTitle, ok = True}

invalidSummary :: Summary
invalidSummary = Summary {title = Nothing, ok = False}

readAndBuildSummary :: Either Error Doc -> Summary
readAndBuildSummary = either invalidSummary buildSummary

readAndBuildTitle :: Summary -> String
readAndBuildTitle summary = fromMaybe "" (summaryTitle summary)

isTitleNonEmpty :: Doc -> Bool
isTitleNonEmpty doc =
  not $ null title
  where
    title = docHead doc >>= headTitle

readWhetherTitleNonEmpty :: Either Error Doc -> Bool
readWhetherTitleNonEmpty = either (const False) isTitleNonEmpty

-- main

main :: IO ()
main = do
  let urls = ["good", "title-empty", "title-missing", "head-missing", "fail"]

  forM_ urls $ \url -> do
    putStrLn $ printf "Checking \"https://%s/\":" url

    docOrError <- readDoc url

    -- Summary
    let summary = readAndBuildSummary docOrError
    let title = readAndBuildTitle summary

    putStrLn $ printf "  Summary: %s" $ show summary
    putStrLn $ printf "  Title: %s" $ title

    -- Has title
    let hasTitle = readWhetherTitleNonEmpty docOrError

    putStrLn $ printf "  Has title: %s" (show hasTitle)

Floyd-Warshall algorithm, revised

In this blog I would like to brush upon Floyd-Warshall algorithm implementation I have described previously.

See, generally, when explaining Floyd-Warshall algorithm, the graph is given as an adjacency matrix - an V x V matrix, where V is the number of vertices (nodes) in a graph and each matrix value representing the distance between each vertex. For instance, the value G[1][3] = 12 represents the edge from vertex 1 to vertex 3 with the value of 12.

Aside from adjacency matrix, there are two other common ways to represent a graph:

  • incidence matrix of size V x E for V vertices and E edges, where each row represents an edge value (G[i][j] > 0 for edge i -> j, G[i][j] < 0 for edge j -> i, G[i][j] = 0 for no edge between vertices i and j)
  • adjacency list, where each vertex is represented as an object with a list of its connections

Consider the graph from the previous blog:

It can be represented in three different ways:

adjacency list:

v0: [ [2 -> 1] ]
v1: [ [2 -> 6], [0 -> 7] ]
v2: [ [3 -> 5] ]
v3: [ [1 -> 2] ]

adjacency matrix:

   |  v0  v1  v2  v3
---+------------------
v0 |   0   0   1   0
v1 |   7   0   6   0
v2 |   0   0   0   5
v3 |   0   2   0   0

incidence matrix:

   |  e0  e1  e2  e3  e4
---+---------------------
v0 |   0   0   0   7   0
v1 |   0   0   2   0   0
v2 |   1   0   0   0   6
v3 |   0   5   0   0   0

However, arguably the most common and most memory-efficient way to represent graph in a real-world application would be the adjacency list.

Hence I thought a bit of a revised implementation is needed, since the previous one (aka “classic” implementation) relies on matrices. And those matrices contain quite a bit of unused data. Take the graph above as an example: it has 4 vertices and 5 edges. An adjacency matrix would use 16 numbers, incidence matrix would use 20 numbers and adjacency list would use 5 tuples of 2 numbers (10 numbers, so to speak).

“Classic” Floyd-Warshall algorithm implementation constructs an additional matrix for the shortest paths’ distances (at most it can reuse the existing one representing the graph itself) and an additional V x V matrix for path reconstruction.

With a graph represented as an adjacency list (or just a list of node objects), the algorithm can use something like a dictionary (or a hashmap) to reduce the memory consumption. Subjectively, it also becomes easier to read and understand the algorithm.

First, we’d need few definitions:

class Node
{
    public string Name;
    public Dictionary<Node, int> Connections;

    public Node(string name)
    {
        this.Name = name;
        this.Connections = new Dictionary<Node, int>();
    }

    public override string ToString()
    {
        return this.Name;
    }
}

class Graph
{
    public List<Node> Nodes;

    public Graph()
    {
        this.Nodes = new List<Node>();
    }
}

class Path
{
    public Node From;
    public Node To;
    public List<Node> Nodes;
    public int Distance;

    public Path(Node from, Node to)
    {
        this.Distance = 0;
        this.From = from;
        this.To = to;
        this.Nodes = new List<Node>();
    }
}

The Path class is a bit excessive, as it really only needs a list of Node objects, but I decided to make it verbose on purpose - to make it easier to debug.

The algorithm then becomes like this:

Dictionary<(Node from, Node to), Node> FindAllPaths(Graph g)
{
    var distance = new Dictionary<(Node from, Node to), int>();
    var pathThrough = new Dictionary<(Node from, Node to), Node>();

    foreach (var nodeFrom in g.Nodes)
    {
        foreach (var nodeTo in g.Nodes)
        {
            if (nodeFrom == nodeTo)
            {
                distance.Add((nodeFrom, nodeTo), 0);
            }
            else
            {
                distance.Add((nodeFrom, nodeTo), int.MaxValue);
            }
        }

        foreach (var nodeTo in nodeFrom.Connections.Keys)
        {
            distance[(nodeFrom, nodeTo)] = nodeFrom.Connections[nodeTo];
            pathThrough.Add((nodeFrom, nodeTo), nodeTo);
        }

        pathThrough.Add((nodeFrom, nodeFrom), nodeFrom);
    }

    foreach (var nodeThrough in g.Nodes)
    {
        foreach (var nodeFrom in g.Nodes)
        {
            foreach (var nodeTo in g.Nodes)
            {
                if (nodeFrom == nodeTo || distance[(nodeThrough, nodeTo)] == int.MaxValue || distance[(nodeFrom, nodeThrough)] == int.MaxValue)
                {
                    continue;
                }

                if (distance[(nodeFrom, nodeTo)] > distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)])
                {
                    distance[(nodeFrom, nodeTo)] = distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)];
                    pathThrough[(nodeFrom, nodeTo)] = pathThrough[(nodeFrom, nodeThrough)];
                }
            }
        }
    }

    return pathThrough;
}

One can easily see the separate parts of the algorithm - initializing the intermediate storage (for the paths and distances) and actually finding the shortest paths. It is pretty much the same algorithm as before with the only difference being the use of dictionary instead of a matrix.

Path reconstruction bit is also a little bit different (not too much though):

Path ReconstructPath(Node from, Node to, Dictionary<(Node from, Node to), Node> allPaths)
{
    if (!allPaths.ContainsKey((from, to)))
    {
        return null;
    }

    var path = new Path(from, to);

    var tempNode = from;

    path.Nodes.Add(tempNode);

    while (tempNode != to)
    {
        var nextNode = allPaths[(tempNode, to)];

        path.Distance += tempNode.Connections[nextNode];
        path.Nodes.Add(nextNode);

        tempNode = nextNode;
    }

    return path;
}

Now, there is one major improvement we can make to this algorithm. See, when we initialize the dictionaries for paths and distances, we follow the same logic as with the classic implementation, putting in the values for edges which do not exist in a graph:

foreach (var nodeFrom in g.Nodes)
{
    // this bit will add entries for inexistent edges
    foreach (var nodeTo in g.Nodes)
    {
        if (nodeFrom == nodeTo)
        {
            distance.Add((nodeFrom, nodeTo), 0);
        }
        else
        {
            distance.Add((nodeFrom, nodeTo), int.MaxValue);
        }
    }

    // ...
}

But as a matter of fact, we do not really have to store these values - instead, we could simply have a check whether the needed entry exists in the dictionary or not:

Dictionary<(Node from, Node to), Node> FindAllPaths(Graph g)
{
    var distance = new Dictionary<(Node from, Node to), int>();
    var pathThrough = new Dictionary<(Node from, Node to), Node>();

    foreach (var nodeFrom in g.Nodes)
    {
        // -- no more unnecessary entries --

        foreach (var nodeTo in nodeFrom.Connections.Keys)
        {
            distance.Add((nodeFrom, nodeTo), nodeFrom.Connections[nodeTo]);
            pathThrough.Add((nodeFrom, nodeTo), nodeTo);
        }

        pathThrough.Add((nodeFrom, nodeFrom), nodeFrom);
    }

    foreach (var nodeThrough in g.Nodes)
    {
        foreach (var nodeFrom in g.Nodes)
        {
            foreach (var nodeTo in g.Nodes)
            {
                if (nodeFrom == nodeTo)
                {
                    continue;
                }

                // ++ but one extra check here, to prevent unnecessary iterations ++
                if (!distance.ContainsKey((nodeFrom, nodeThrough)) || !distance.ContainsKey((nodeThrough, nodeTo)))
                {
                    continue;
                }

                // ++ and one extra check here, to annotate the existence of a new, indirect shortest path between nodeFrom and nodeTo ++
                if (!distance.ContainsKey((nodeFrom, nodeTo)) || distance[(nodeFrom, nodeTo)] > distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)])
                {
                    distance[(nodeFrom, nodeTo)] = distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)];
                    pathThrough[(nodeFrom, nodeTo)] = pathThrough[(nodeFrom, nodeThrough)];
                }
            }
        }
    }

    return pathThrough;
}

For that matter, we don’t really need to have a separate dictionary for path reconstruction - since both distance and pathThrough dictionaries serve the same purpose (have the exact same key, but different value), we can smash those two together and use even less memory:

Dictionary<(Node from, Node to), (Node through, int distance)> FindAllPaths(Graph g)
{
    var pathThrough = new Dictionary<(Node from, Node to), (Node through, int distance)>();

    foreach (var nodeFrom in g.Nodes)
    {
        foreach (var nodeTo in nodeFrom.Connections.Keys)
        {
            pathThrough.Add((nodeFrom, nodeTo), (nodeTo, nodeFrom.Connections[nodeTo]));
        }

        pathThrough.Add((nodeFrom, nodeFrom), (nodeFrom, 0));
    }

    foreach (var nodeThrough in g.Nodes)
    {
        foreach (var nodeFrom in g.Nodes)
        {
            foreach (var nodeTo in g.Nodes)
            {
                if (nodeFrom == nodeTo)
                {
                    continue;
                }

                if (!pathThrough.ContainsKey((nodeFrom, nodeThrough)) || !pathThrough.ContainsKey((nodeThrough, nodeTo)))
                {
                    continue;
                }

                if (!pathThrough.ContainsKey((nodeFrom, nodeTo)) || pathThrough[(nodeFrom, nodeTo)].distance > pathThrough[(nodeFrom, nodeThrough)].distance + pathThrough[(nodeThrough, nodeTo)].distance)
                {
                    pathThrough[(nodeFrom, nodeTo)] = (nodeThrough, pathThrough[(nodeFrom, nodeThrough)].distance + pathThrough[(nodeThrough, nodeTo)].distance);
                }
            }
        }
    }

    return pathThrough;
}

Essentially, nothing special - just using the tuples for both the intermediate node and the distance for the shortest paths.

And with a tiny bit of a change, path reconstruction:

Path ReconstructPath(Node from, Node to, Dictionary<(Node from, Node to), (Node through, int distance)> allPaths)
{
    if (!allPaths.ContainsKey((from, to)))
    {
        return null;
    }

    var path = new Path(from, to);

    var tempNode = from;

    path.Nodes.Add(tempNode);

    while (tempNode != to)
    {
        var (nextNode, distance) = allPaths[(tempNode, to)];

        path.Distance += distance;
        path.Nodes.Add(nextNode);

        tempNode = nextNode;
    }

    return path;
}

Parser monad

Few years ago I did a course project for the “Functional programming” course at Jagiellonian University. The project was to implement a program for solving systems of equations… in Haskell.

For me back then, the entire concept of monads was quite foreign and scary, so I implemented a very crude state machine to parse the equations as strings from STDIN:

parseRow :: String -> Integer -> [Integer] -> [String] -> ([Integer], [String])
parseRow [] state coefficients var_names
    | state == 7 = (reverse coefficients, reverse var_names)
    | otherwise = error ("Invalid equation (state: " ++ show state ++ "; coefficients: " ++ show coefficients ++ "; var_names: " ++ show var_names ++ ")")

parseRow (c:cs) state coefficients var_names
    | (state == 1) && (c == '-') = parseRow cs 2 ((-1) : coefficients) var_names
    | state == 1 && isNum c = parseRow cs 3 (c_int : coefficients) var_names
    | state == 1 && isAlpha c = parseRow cs 4 (1 : coefficients) ([c] : var_names)
    | state == 2 && isNum c = parseRow cs 3 (repl_k : drop 1 coefficients) var_names
    | state == 2 && isAlpha c = parseRow cs 4 coefficients ([c] : var_names)
    | (state == 3) && (c == '=') = parseRow cs 5 coefficients var_names
    | state == 3 && isAlpha c = parseRow cs 4 coefficients ([c] : var_names)
    | state == 3 && isNum c = parseRow cs 3 (new_k : drop 1 coefficients) var_names
    | (state == 4) && (c == '-') = parseRow cs 2 ((-1) : coefficients) var_names
    | (state == 4) && (c == '+') = parseRow cs 2 (1 : coefficients) var_names
    | state == 4 && isAlphaNum c = parseRow cs 4 coefficients (new_v : drop 1 var_names)
    | (state == 4) && (c == '=') = parseRow cs 5 coefficients var_names
    | (state == 5) && (c == '-') = parseRow cs 6 ((-1) : coefficients) var_names
    | state == 5 && isNum c = parseRow cs 7 (c_int : coefficients) var_names
    | state == 6 && isNum c = parseRow cs 7 (repl_k : drop 1 coefficients) var_names
    | state == 7 && isNum c = parseRow cs 7 (new_k : drop 1 coefficients) var_names
    | otherwise = parseRow cs state coefficients var_names
    where
        k = abs $ head coefficients
        k_sign = sign (head coefficients)
        c_int = atoi c
        new_k = k_sign * ((k * 10) + c_int)
        repl_k = k_sign * c_int
        v = if not (null var_names) then head var_names else []
        new_v = v ++ [c]

parseLine :: String -> ([Integer], [String])
parseLine s = parseRow s 1 [] []

However, recently I (hopefully) got a grip of monads, so I decided to modify my old project in Haskell to make it prettier.

One of the improvements I made was the parsing part. Since one of the requirements for the course project was to not use any third-party libraries, I decided to stick to this and implement a parser monad.

This proved to be an interesting task and made the entire parsing part quite neat:

equationFactor :: Parser Fraction
equationFactor = do
    _sign <- factorSign

    zeroOrMore (sat isSpace)

    factor <- fmap (fromMaybe (1%1)) (zeroOrOne rationalFactor)

    return (_sign * factor)

equationMember :: Parser (Fraction, String)
equationMember = do
    factor <- equationFactor

    zeroOrMore (sat isSpace)
    zeroOrOne (sat (== '*'))
    zeroOrMore (sat isSpace)

    nameFirst <- oneOrMore (sat isAlpha)
    nameRest <- zeroOrMore (sat isAlphaNum)

    zeroOrMore (sat isSpace)

    return (factor, nameFirst ++ nameRest)

-- An equation consists of a list of pairs (factor, variable name) and a free member
equation :: Parser ([(Fraction, String)], Fraction)
equation = do
    members <- oneOrMore equationMember

    zeroOrMore (sat isSpace)
    sat (== '=')
    zeroOrMore (sat isSpace)

    freeMember <- rationalFactor

    return (members, freeMember)

This is all possible thanks to the Parser monad. Under the cut - the implementation details.

Read more

ConfigScript

Once upon a time I thought OGRE was overly too complicated - all those unnecessary script files, custom formats, a ton of setup hassle. That was until I tried figuring out an entire modern 3D rendering stack from scratch. That’s when I realized configuring a rendering process (rendering pipeline) can be tricky. And that’s when I realized configuring application with config files can be helpful. OGRE suddenly became very appealing to me and I really began to appreciate all the work the devs have put into it.

One good aspect of OGRE was the “unnecessary” script and configuration files. But the syntax of those files looked much cleaner than that of JSON:

// This is a comment
object_keyword Example/ObjectName
{
    attribute_name "some value"

    object_keyword2 "Nested Object"
    {
        other_attribute 1 2 3
        // and so on..
    }
}

I thought if I could harvest anything from OGRE into my OpenGL application, that would be the configuration based on this format (rather than Lua or whatever scripts).

Hence I crafted this simple grammar in ANTLR4 to parse these files:

grammar ConfigScript;

config : (object | comment)* EOF ;

object
    : Identifier '{' property* '}'
    | Identifier STRING '{' (property)* '}'
    ;

property : Identifier propertyValue ;

propertyValue
    : vector
    | INT
    | FLOAT
    | BOOL
    | STRING
    | objectValue
    ;

objectValue
    : '{' property* '}'
    | STRING '{' (property)* '}'
    ;

vector
    : INT+
    | FLOAT+
    ;

comment : LINE_COMMENT | BLOCK_COMMENT ;

STRING : DOUBLE_QUOTED_STRING | SINGLE_QUOTED_STRING ;

BOOL : 'true' | 'false' ;

DOUBLE_QUOTED_STRING : '"' DoubleQuoteStringChar* '"' ;
SINGLE_QUOTED_STRING : '\'' SingleQuoteStringChar* '\'' ;

Identifier : ALPHA (ALPHA | NUM)* ;

fragment SingleQuoteStringChar : ~['\r\n] ;
    // : ~['\\\r\n]
    // | SimpleEscapeSequence ;

fragment DoubleQuoteStringChar : ~["\r\n] ;
    // : ~["\\\r\n]
    // | SimpleEscapeSequence ;

// fragment SimpleEscapeSequence : '\\' ['"?abfnrtv\\] ;

INT : '0'
    | '-'? [1-9] [0-9]*
    ;

FLOAT : ('+' | '-')? NUM+ '.' NUM+ ;

WHITESPACE : [ \r\n\t]+ -> skip ;
ALPHA : [a-zA-Z_] ;
NUM : [0-9] ;

LINE_COMMENT : '//' ~[\r\n]* -> skip ;
BLOCK_COMMENT : '/*' .*? '*/' -> skip ;

The only difference is that the object name can only be a quoted string:

// This is a comment
object_keyword "Example/ObjectName" // <--- this can not be just Example/ObjectName
{
    attribute_name "some value"

    object_keyword2 "Nested Object"
    {
        other_attribute 1 2 3
        // and so on..
        /* block comment */
    }
}

The only thing left with this parser thing is to compile it and use in a project.

Read more

OpenGL: advanced samples

Damn this blog was long coming - I picked my interest in shaders over a decade ago, started to actually learn something in early 2021 and started writing this blog at the end of 2021. Now the time has finally come to share this with the world!

For a long time I was keen in learning low-level computer graphics APIs and algorithms, but only recently actually made any progress towards that goal. I have spent few weeks if not months learning about shaders, graphics pipeline and rendering techniques, so this write-up was long forecoming. There might be some mistakes, since I am not an expert in CG and OpenGL and there is a big chunk missing, namely rendering animated 3D models, compute shaders and tesselation, but I hope to fix those in the future.

One might ask: why OpenGL? Why not Vulkan (or DirectX 12)? Why even bother with these emulaiton techniques - why not jump straight to raytracing? Well, it is the end of October 2022 at the moment of publishing this blog, the most recent GPU on the market is GeForce RTX 4090, which is yet to arrive and its RRP (recommended retailer price) is A$2959 (as advertised by nVidia):

RTX4090 price as of October 2022

And most of those cards are gone in a blink of an eye and used for mining whatever-coins. My PC runs GTX1080, which is still more than enough for most of the games I play (being Genshin Impact at most).

I think it is still not a widely spread consumer market, and whilst raytracing, DLSS and all those fine things sound super cool and nice, it is still a niche market. That is not to say I won’t be looking into that in the near future. I just think OpenGL still has place in this world, at least as a starting point to build a decent understanding of the modern graphics pipeline, some basic rendering techniques and justify switching to more recent APIs. OpenGL as well as DirectX 11 are still a very feasible fallback for those sytems which do not run top-notch hardware or are struggling running those massive open-world games in anything higher than 1080p@60fps.

In this blog I will try to condense the knowledge I have acquired. I will skip most basic parts such as initializing context, creating window and so on, since most of the tutorials and articles on the Internet focus on those topics. Unlike most tutorials and articles out there, this blog will be all about rendering techniques.

This article was heavily inspired by few blogs on russian website (Habr), namely “super-modern OpenGL” (part 1 and part 2) - they are quite messy and lack a lot of material on really interesting topics (again, rendering techniques). This blog partially builds on top of those two and uses a lot more other materials (references provided).

This blog is enormous, so blow is the table of contents for the topics covered. Feel free to jump straight to the topic that picks your interest, screenshots are provided:

The source code is available on GitHub.

Read more