More than TypeScript

Back in 2011 frontend was a very different thing - JavaScript had no class, Object.entries / Object.keys, promises were a proof of concept idea (unless you used 3rd party library bluebird) and Node was v0.10.

Then came CoffeeScript, which added nice helper features to JavaScript - list comprehensions, classes, string interpolation and if statements (meaning you could use them for variable assignment):

# if statements
text = if happy and knowsIt
  chaChaCha()
else if sexy
  knowsIt()
else if tooSexy
  removeShirt()
else
  showIt()

# list comprehensions
courses = [ 'greens', 'caviar', 'truffles', 'roast', 'cake' ]
menu = (i, dish) -> "Menu Item #{i}: #{dish}"
menu i + 1, dish for dish, i in courses

# ranges and list comprehensions
countdown = (num for num in [10..1])

# iterating over object entries
yearsOld = max: 10, ida: 9, tim: 11

ages = for child, age of yearsOld
  "#{child} is #{age}"

Whilst it was still compiled to an inferior ES5 JavaScript, it helped to organise the code and make it substantially cleaner. The one drawback is that it did not provide any type safety (only added recently, via @flow and you would still have to duplicate your classes if you wanted to use it - once in coffeescript and once in @flow annotations). Anyhow, CoffeeScript was a good tool for the task, if you ask me.

Then came Dart, Flow and TypeScript, which were also compiled to ES5 JavaScript, but instead of adding new syntax features, they aimed to solve a different problem - by introducing types they sought to reduce the number of runtime errors with type checks at compile time.

This sounded so good that many developers and companies immediately got on board. Alas, the new tech still suffered from the same issue as JavaScript itself and most common cause of runtime errors - the null and undefined still was a thing, causing the exact same runtime errors.

The one true benefit offered by TypeScript over the others was that it allowed to seamlessly use existing JavaScript code. And, provided you have the type signatures for that JavaScript code, it could even perform type checking on it too, effectively reducing the requirements for using TypeScript in the existing codebase. No wonder it was an easy buy-in for many projects.

Fast-forward to 2024 (twelve years since its first release) and TypeScript dominates the frontend world. Over the years it seems to have been focused on improving the type system in terms of what can you do with types - union types, partial types, etc. It still suffers from the original issues though and there still are not too many new syntactic features to pair with its powerful type system (like pattern matching).

With the new EcmaScript standards, classes and promises becoming the first-class citizens in all browsers (even Internet Explorer / Edge, when it was still around), the APIs and syntax became more mature (Object.entries, async / await to reduce callback hell, for .. of, const / let, string interpolation and many others). There are still no list comprehensions or conditional expressions / pattern matching though.

TypeScript still does help in what I think is a small subset of highly-specific scenarios like navigating the code (jump to definition / declaration / find usages) in the IDE and refactoring the code. But IDEs matured as well, code navigation is not as much of a feature unique to TypeScript anymore as it used to be (in the era of Sublime Text). TypeScript can prevent some really naive errors at compile time like using a number instead of an object, but I don’t think developers run into them these days - because, again, IDEs are really powerful these days, even VSCode and they help eliminate such mistakes to a large degree.

Let’s consider a real-world scenario (from my work project): server returns a list of LocationType objects. Each one of the objects can be either a Collection, a Document (in a specific collection) or a Table, each with its own subset of fields, specific to the type. We need to handle each case differently (display them on the UI differently).

The OpenAPI spec:

tableLocation:
  type: object
  required:
    - table
  properties:
    table:
      type: string

collectionLocation:
  type: object
  required:
    - collection
  properties:
    collection:
      type: string

documentLocation:
  type: object
  required:
    - collection
    - document
  properties:
    collection:
      type: string
    document:
      type: string

location:
  oneOf:
    - $ref: "#/components/tableLocation"
    - $ref: "#/components/collectionLocation"
    - $ref: "#/components/documentLocation"

And the TypeScript code generated by OpenAPI for the above spec looks like this:

interface CollectionLocation {
    collection: string;
}

interface DocumentLocation {
    collection: string;
    document: string;
}

interface TableLocation {
    table: string;
}

function instanceOfCollectionLocation(value: object): boolean {
    let isInstance = true;
    isInstance = isInstance && "collection" in value;

    return isInstance;
}

function instanceOfDocumentLocation(value: object): boolean {
    let isInstance = true;
    isInstance = isInstance && "collection" in value;
    isInstance = isInstance && "document" in value;

    return isInstance;
}

function instanceOfTableLocation(value: object): boolean {
    let isInstance = true;
    isInstance = isInstance && "table" in value;

    return isInstance;
}

function CollectionLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): CollectionLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return {
        'collection': json['collection'],
    };
}

function DocumentLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): DocumentLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return {
        'collection': json['collection'],
        'document': json['document'],
    };
}

function TableLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): TableLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return {
        'table': json['table'],
    };
}

type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

function JobLocationFromJSONTyped(json: any, ignoreDiscriminator: boolean): JobLocation {
    if ((json === undefined) || (json === null)) {
        return json;
    }
    return { ...CollectionLocationFromJSONTyped(json, true), ...DocumentLocationFromJSONTyped(json, true), ...TableLocationFromJSONTyped(json, true) };
}

This might be a bit too harsh on TypeScript as a language, considering this is not the best implementation (in my opinion), but this highlights one of the issues with TypeScript: this very same code caused a number of runtime errors caught by users - not exceptions, not compilation errors, but wrong UI behaviour - on the UI all locations were treated as a table location.

The reason why this was happening is the code itself - it does not really handle the choice type JobUpdateLocation correctly and instead of a choice type it returns a union type, to put it roughly - instead of oneOf it returns essentially allOf object.

Now, even if we were to rewrite it by hand (which would defeat the purpose of using OpenAPI and could be harder to keep in sync between client and server code), we would end up with something like this:

type CollectionLocation = {
    collection: string;
}

type DocumentLocation = {
    collection: string;
    document: string;
}

type TableLocation = {
    table: string;
}

function instanceOfCollectionLocation(value: object): boolean {
    return ("collection" in value) && !("document" in value);
}

function instanceOfDocumentLocation(value: object): boolean {
    return ("collection" in value) && ("document" in value);
}

function instanceOfTableLocation(value: object): boolean {
    return ("table" in value);
}

function CollectionLocationFromJSONTyped(json: any): CollectionLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    return {
        'collection': json['collection'],
    };
}

function DocumentLocationFromJSONTyped(json: any): DocumentLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    return {
        'collection': json['collection'],
        'document': json['document'],
    };
}

function TableLocationFromJSONTyped(json: any): TableLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    return {
        'table': json['table'],
    };
}

type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

function JobLocationFromJSONTyped(json: any): JobLocation | undefined {
    if ((json === undefined) || (json === null)) {
        return undefined;
    }
    if (instanceOfCollectionLocation(json) && !instanceOfDocumentLocation(json) && !instanceOfTableLocation(json)) {
      return CollectionLocationFromJSONTyped(json);
    }
    if (instanceOfDocumentLocation(json) && !instanceOfCollectionLocation(json) && !instanceOfTableLocation(json)) {
      return DocumentLocationFromJSONTyped(json);
    }
    if (instanceOfTableLocation(json) && !instanceOfCollectionLocation(json) && !instanceOfDocumentLocation(json)) {
      return TableLocationFromJSONTyped(json);
    }
    return undefined;
}

This is quite a verbose code, with few bad practices in place (the use of any and object, need to always be conscious of undefined), but this is literally what we ended up doing (the helpers, not redefining the types).

The types in TypeScript (or rather classes and interfaces) are also a point of a few confusing tricks you have to keep in mind - if we were to use class instead of type, as follows

class CollectionLocation {
    constructor(public collection: string) {}
}

class DocumentLocation {
    constructor(public collection: string, document: string) {}
}

class TableLocation {
    constructor(public table: string) {}
}

we would eventually run into the similar bug at runtime, which is a perfectly valid behaviour from the perspective of TypeScript, because of type compatibility, meaning DocumentLocation and CollectionLocation could be used interchangeably, since they have a subset of compatible fields:

const a: CollectionLocation = new DocumentLocation('col', 'doc'); // ok
const b: DocumentLocation = new CollectionLocation('col'); // also ok
const c: TableLocation = a; // not ok

This would not work if CollectionLocation, DocumentLocation and TableLocation were types instead:

const a: CollectionLocation = { collection: 'col', document: 'doc' }; // not ok
const b: DocumentLocation = { collection: 'col' }; // not ok
const c: TableLocation = b; // not ok

And the code to parse location from JSON is actually quite ugly. It would benefit so much from switch expressions or pattern matching!

And we still have to remember to handle those potentially undefined values whenever we use the helper functions. And here’s an example from literally few days ago:

interface Job {
    projectId: string;
}

const jobsInProgress: Job[];

const message = useProjectStatus(
    jobsInProgress[0]?.projectId ?? ''
);

const useProjectStatus = (projectId: string) {
    const { data } = useQuery({ queryFn: () => fetch(`/project/${projectId}`) });

    return data;
};

The above code started throwing an error, since server responded with 500 Server Error. Reason was quite simple - the projectId, which we recently started to validate on the server (expecting it to be a valid ID), was a blank string. Interesting thing: no one has questioned the very line causing the “default value” for projectId to become an empty string:

jobsInProgress[0]?.projectId ?? ''

And issues like these are unbelievably common in front-end world, while being quite tricky to detect and resolve. To fix this particular issue we added a client-side validation (to an extent) to run the query only when the projectId value is provided:

useQuery({
  enabled: !!projectId,
  queryFn: () => fetch(`/project/${projectId}`)
})

The problem remains: it is really easy to miss this rather small fallback to an empty string.

There is a solution in Scala world that somewhat addresses this issue - refined types, which allows to have something along the lines of:

import eu.timepit.refined.*
import eu.timepit.refined.api.Refined
import eu.timepit.refined.string.*

type NonBlankString = MatchesRegex[".+"]

def useProjectStatus(projectId: String Refined NonBlankString) = ???

This would require wrapping all the values passed to fetchData in refineV[NonBlankString]() call and handling the case when the validation fails:

def generateId(): String = List("some", "").last

def fetchData(id: String Refined NonBlankString) = println(s"fetching '$id'...")

def main(args: Array[String]) = {
  for {
    id <- refineV[NonBlankString](generateId())
  } yield fetchData(id)
}

But in the land of TypeScript, there is only so much you can do - TypeScript only works at compile time.

The above examples might sound like very far-fetched edge case scenarios, but keep in mind: this is the code generated automatically by one of the most popular tools from a trivial schema. This is not as far-fetched as it might seem.

Can we do better in TypeScript? Something like refined in Scala? What if we had a powerful type system and syntax to support it? And, if possible, get rid of the null and undefined along the way?

The problem of null and undefined can be mitigated to an extent by using some concepts of functional programming, similarly to how I showcased some time ago. It would be quite hard to achieve, though, given the problem remains imbued in the language itself. Moreover, targeting the issues described above, it would take an entire standard library to really reduce the possibility of the issue:

const jobInProgressMaybe: Maybe<Job> = new List<Job>(jobsInProgress).first();
const projectIdMaybe: Maybe<string> = jobInProgress.map(j => j.projectId);

useQuery({
  enabled: projectIdMaybe.isSome(),
  queryFn: () =>
    projectIdMaybe
      .flatMap(projectId => fetch(`/project/${projectId}`))
      .orElse(Promise.reject('no projectId'))
})

We could try to address the empty string issue with the extensive type system:

type NonEmptyString<T extends string> = '' extends T ? never : T;
type MyString<T extends string> = T;

function fetchData<T extends string>(id: NonEmptyString<T>) {
  console.log(`Fetching ${id}`);
}

But it would only work if all the values are known at compile time, which is easily broken with the simplest test:

function generateId(): NonEmptyString<string> {
  return ['moo', ''][1] as NonEmptyString<string>;
}

fetchData(''); // not ok
fetchData(generateId()); // ok, but guaranteed undefined behaviour at runtime

And it would take even more effort to make all types uniquely identifiable (to solve the choice type problem).

The way most developers would approach solving similar issues in a real-world project would be (at best) adding some linters, checkers and relying on automated tests and high-quality code reviews. In my experience, this is a rather flimsy excuse rather than a real solution and it does not work most of the time - especially in edge case scenarios.

This is where I’d suggest to use another language altogether, which, similarly to CoffeeScript and TypeScript back in the day, solved some problems at compile time. And suggest I will.

There is a big warning before I proceed though: another technology is a rather big decision, not to be taken lightly. Some might see it as an impossible switch, only few small or indie developers on rather unimportant projects would make. But remember it was the same story with ES6, TypeScript, bundlers and pretty much any significant upgrade in the past.

Balance bike is a good tool to get you going - it gets you from walking to moving fast. But if you want to get faster and further, you have to drop it at some stage in favour of a more advanced bike.

Similarly to how TypeScript and ESNext got you from plain callback-hell-infested JavaScript code to a better place - you can refactor code faster, it saves you from a few errors at compile time, the code is much cleaner and conscise now. But if you really want to get even further, you will have to make a leap of faith, make an investment into the future.

Here is my big controversial suggestion: a pure functional language, one with strong type system, which does not have a concept of null and undefined in the first place, with a nice sweet syntax.

Before landing on a specific choice, check out Elm (dead, but a good starting point) and PureScript, in that order. Let me explain.

Elm is like a very simplified Haskell - it is a pure functional language with a subset of Haskell syntax. It has a nice compiler with really good error messages. It enforces a structure for your application (redux-like). It gives you a gentle introduction to the functional programming concepts and it targets browsers (web applications). With its architecture, you can look at the message (action from redux) type and see exactly what are all possible operations in the application (which makes reading code and getting to know new codebases much easier).

On a bad note, it is not being developed since 2019, it comes with an entire runtime (saves you from runtime errors, but blows up the bundle size) and it is a all-or-nothing commitment for the project - it is an all-in-one platform and if you want to gradually update your application from React - sorry, you will have to rewrite entire parts of you application entirely in Elm. The good point turned bad, having all possible actions defined in one message type make complex applications really complex (with one massive type definition, an issue very familiar to developers who had to deal with Redux).

here could have been an Elm code sample

The next step on this journey would be PureScript. It is an actively developed language, it has a minimal footprint after compiled to JS (much smaller than Elm), it has a very rich ecosystem and, best of all, it has a very simple interop with JS and it can compile just one module. Top it up with Halogen framework and you effectively got yourself Elm on steroids. The downside is that it is slightly more complex platform (language and framework) compared to Elm, so the learning curve is a bit steeper.

The above example of CoffeeScript code could be written in plain PureScript like this:

foreign import happy :: Boolean
foreign import knowsIt :: Boolean
foreign import sexy :: Boolean
foreign import tooSexy :: Boolean
foreign import chaChaCha :: String
foreign import knowsItStr :: String
foreign import removeShirt :: String
foreign import showIt :: String

import Data.Array ((..), mapWithIndex)
import Data.Map as M
import Data.Tuple (Tuple(..))

-- if statements with multiple branches become pattern matching
text
  | happy && knowsIt = chaChaCha
  | sexy = knowsItStr
  | tooSexy = removeShirt
  | otherwise = showIt

-- list comprehensions become function application
courses = [ "greens", "caviar", "truffles", "roast", "cake" ]

-- string interpolation is possible via an external packages
-- https://pursuit.purescript.org/packages/purescript-interpolate
import Data.Interpolate as I
menu' i dish = I.i "Menu Item " i ": " dish

-- https://pursuit.purescript.org/packages/purescript-template-strings
import Data.TemplateString.Unsafe ((<~>))
menu'1 i dish = "Menu Item ${i}: ${dish}" <~> { i: i, dish: dish }

import Data.TemplateString ((<->))
import Data.Tuple.Nested ((/\))
menu'2 i dish = "Menu Item ${i}: ${dish}" <-> [ "i" /\ i, "dish" /\ dish ]

-- pure PureScript string interpolation
menu i dish = "Menu Item " <> (show i) <> ": " <> dish

x i dish = menu (i + 1) dish

-- can not just call a function and ignore its result
x' = mapWithIndex x courses

-- ranges become Array monad
-- countdown :: Array Int
countdown = do
  num <- 10 .. 1
  pure num

-- JavaScript objects exist in a separate package
import Foreign.Object as FO
yearsOld' = FO.fromHomogeneous { max: 10, ida: 9, tim: 11 }

-- object as Map
yearsOld = M.fromFoldable [Tuple "max" 10, Tuple "ida" 9, Tuple "tim" 11]

y child age = (show child) <> " is " <> (show age)
ages = map y yearsOld
ages = map y yearsOld'

The real deal with this approach is how to migrate from an existing (most likely) React/TypeScript/(webpack | vite) ecosystem to PureScript?

Expanding on Scott Wlaschin’s talk, you can (and probably should) separate the pure application logic from IO, potentially utilising the foreign imported functions to interact with the existing JS code (libraries). This way you keep your application logic error-free, and all the errors that can happen are shifted towards the presentation layer (MVC/MVP, remember this concept?).

This would be the best strategy for the most projects, migrating one bit at a time and making the application less and less error prone whilst not wreaking the havok by rewriting everything from scratch (very few businesses will buy into that).

The bigger issue is that most modern frontend apps I have seen are so mangled in mixing the business logic and the presentation layer, it would be challenging (to say the least) to unmangle it back to a reasonable code. Check how we handle UI action, triggering a HTTP request and updating both the UI (to display the request progress/status) and the application state (for other parts of the UI) at the same time.

here could have been a real-world application interaction handling code sample

Calling PureScript code from JavaScript (based on FFI example in PureScript book):

module Test where

import Prelude

gcd :: Int -> Int -> Int
gcd 0 m = m
gcd n 0 = n
gcd n m
  | n > m     = gcd (n - m) m
  | otherwise = gcd (m - n) n

data ZeroOrOne a = Zero | One a

inc :: ZeroOrOne Int -> ZeroOrOne Int
inc Zero = Zero
inc (One n) = One (n + 1)

_zero = Zero
_one = One 1
_two = One 2

and then in JS:

import Test from 'Test.js';

Test.gcd(15)(20);

const _zero = new Test.Zero();
const _one = new Test.One(1);
const _two = new Test.One(2);

console.log(Test.inc(_zero));
console.log(Test.inc(_one));
console.log(Test.inc(_two));

In the other direction (calling JS code from PureScript):

export const setItem = key => value => () =>
  window.localStorage.setItem(key, value);

export const getItem = key => () =>
  window.localStorage.getItem(key);

and then in PureScript:

foreign import setItem :: String -> String -> Effect Unit

foreign import getItem :: String -> Effect Json

import Data.Argonaut (class DecodeJson, class EncodeJson)
import Data.Argonaut.Decode.Generic (genericDecodeJson)
import Data.Argonaut.Encode.Generic (genericEncodeJson)
import Data.Generic.Rep (class Generic)

-- define PhoneType

derive instance Generic PhoneType _

instance EncodeJson PhoneType where encodeJson = genericEncodeJson
instance DecodeJson PhoneType where decodeJson = genericDecodeJson

processItem :: Json -> Either String Person
processItem item = do
  jsonString <- decodeJson item
  j          <- jsonParser jsonString
  decodeJson j

main = do
  item <- getItem "person"
  initialPerson <- case processItem item of
    Left  err -> do
      log $ "Error: " <> err <> ". Loading examplePerson"
      pure examplePerson
    Right p   -> pure p

To align this with the original problem about JobLocation, here’s how this would look like in PureScript:

module Main where

import Prelude
import Data.Argonaut (jsonParser)
import Data.Argonaut.Decode (decodeJson, (.:), printJsonDecodeError)
import Data.Argonaut.Decode.Class (class DecodeJson)  
import Data.Argonaut.Decode.Error (JsonDecodeError(..))
import Data.Argonaut.Encode (encodeJson, (:=), (~>))
import Data.Argonaut.Encode.Class (class EncodeJson)
import Data.Bifunctor (bimap)
import Data.Either (Either(..))
import Data.Generic.Rep (class Generic)
import Data.Show.Generic (genericShow)
import Effect (Effect)
import Effect.Console (log)

data JobLocation
  = CollectionLocation { collection :: String }
  | DocumentLocation { collection :: String, document :: String }
  | TableLocation { table :: String }

derive instance Eq JobLocation
derive instance Generic JobLocation _
instance showJobLocation :: Show JobLocation where
  show a = genericShow a

instance DecodeJson JobLocation where
  decodeJson json = do
    obj <- decodeJson json
    
    -- Check which fields are present
    maybeCollection :: Maybe String <- obj .:? "collection"
    maybeDocument :: Maybe String <- obj .:? "document" 
    maybeTable :: Maybe String <- obj .:? "table"
    
    let hasCollection = isJust maybeCollection
    let hasDocument = isJust maybeDocument
    let hasTable = isJust maybeTable
    
    case hasCollection, hasDocument, hasTable of
      true, true, false -> do
        collection <- obj .: "collection"
        document <- obj .: "document"
        pure $ DocumentLocation { collection, document }
        
      true, false, false -> do
        collection <- obj .: "collection"
        pure $ CollectionLocation { collection }
        
      false, false, true -> do
        table <- obj .: "table"
        pure $ TableLocation { table }
        
      _, _, _ -> Left $ AtKey "structure" $ UnexpectedValue json

parseJobLocation :: String -> Either String JobLocation
parseJobLocation jsonStr = jsonParser jsonStr >>= (decodeJson >>> bimap printJsonDecodeError identity)

-- Example usage
examples :: Effect Unit
examples = do
  let collectionJson = """{"collection": "users"}"""
  case parseJobLocation collectionJson of
    Left err -> log $ "Collection parse error: " <> err
    Right location -> do
      log $ "Parsed CollectionLocation: " <> show location
  
  let documentJson = """{"collection": "users", "document": "user123"}"""
  case parseJobLocation documentJson of
    Left err -> log $ "Document parse error: " <> err
    Right location -> do
      log $ "Parsed DocumentLocation: " <> show location
  
  let tableJson = """{"table": "analytics"}"""
  case parseJobLocation tableJson of
    Left err -> log $ "Table parse error: " <> err
    Right location -> do
      log $ "Parsed TableLocation: " <> show location
  
  let invalidJson = """{"data": "something"}"""
  case parseJobLocation invalidJson of
    Left err -> log $ "Expected error: " <> err
    Right location -> log $ "Unexpected success: " <> show location

main :: Effect Unit
main = do
  examples

While this code is type safe and will handle all of the edge cases just perfectly, I find it pretty hard to read. Parts that I would be unable to understand in a month of not working with this code, like this one:

parseJobLocation jsonStr = jsonParser jsonStr >>= (decodeJson >>> bimap printJsonDecodeError identity)

It could be rewritten in an imperative way:

parseJobLocation jsonStr = do
  json <- jsonParser jsonStr
  bimap printJsonDecodeError identity (decodeJson json)

But I doubt this makes it any more readable - the bimap printJsonDecodeError identity (decodeJson json) part, specifically. This is a common issue with Haskell and PureScript - some of the functions are weirdly named and can only be understood either with experience or by reading the docs (which, in turn, are also quite cryptic). Take the bimap function for example. Its signature looks like this:

bimap :: forall a b c d. (a -> b) -> (c -> d) -> f a c -> f b d

This does not really help non-seasoned Haskeller.

The docs only say this:

The bimap function maps a pair of functions over the two type arguments of the bifunctor.

I had to use it since there are two function calls in parseJobLocation: jsonParser :: String -> Either String Json and decodeJson :: Json -> Either JsonDecodeError a. This means if you just pipe the output of the first to the second (jsonParser jsonStr >>= decodeJson), the return type should match - because of the nature of the >>= operator: >>= :: (b -> c) -> m a b -> m a c, meaning for Either a b and a function b -> c the return type would have to be Either a c, in our case (Json -> a) -> (Either String Json) should return Either String a, but decodeJson returns Either JsonDecodeError a instead. Effectively, >>= operates on the second argument of a functor (in case of Either a b, operator >>= changes b), whereas we want to change it first and then change the first argument as well (making the chain Either String Json -> Either JsonDecodeError a -> Either String a). But this has to be done in one go - or, rather, in one argument of >>=. We can create such function by combining decodeJson which returns Either JsonDecodeError a and another function, which converts that middle argument, JsonDecodeError into String, making it Either String a. We achieve this by using the >>> operator which is an alias for composeFlipped and is defined as composeFlipped :: a b c -> a c d -> a b d. In fact, instead of using bimap _ identity _ it is better to use lmap _ _ which would only apply function to the left side of Either:

parseJobLocation jsonStr = do
  json <- jsonParser jsonStr
  lmap printJsonDecodeError (decodeJson json)

Explanations like those make it a huge barrier to entry for newcomers.

Incorporating the above solution in an existing TypeScript project might look a tad cumbersome. For Vite, there is a PureScript plugin:

import { defineConfig } from 'vite';
import purescript from 'vite-plugin-purescript';

export default defineConfig({
  plugins: [
    // ...
    purescript(),
  ],
});

The tricky part is that we have to re-define types in TypeScript:

type CollectionLocation = { collection: string };
type DocumentLocation = { collection: string; document: string };
type TableLocation = { table: string };

type JobLocation = CollectionLocation | DocumentLocation | TableLocation;

Moreover, since the function returns an Either, we would need that one too:

interface Left<E> {
  readonly _tag: 'Left';
  readonly value0: E;
}

interface Right<A> {
  readonly _tag: 'Right';
  readonly value0: A;
}

type Either<E, A> = Left<E> | Right<A>;

And then importing the function and calling it from TypeScript:

import { parseJobLocation } from '../output/JobLocation';

const loc = parseJobLocation('{ "collection": "col", "document": "doc" }');

if (loc._tag === 'Right') {
    // ...
}

Quite the bother, right? That’s why it is most beneficial if the entire critical section can be written in PureScript entirely. In my case, the JobLocation is used for displaying a corresponding UI element (in React), so there’s little benefit at a cost of lots of boilerplate and not the best experience defining those parsers.

I personally prefer Scala 3 implementation (not the integration with TypeScript part though, just the JSON parser implementation):

import io.circe.*
import io.circe.parser.*
import cats.implicits.*
import cats.effect.{IO, IOApp}
import io.circe.generic.semiauto.*

enum JobLocation:
  case CollectionLocation(collection: String)
  case DocumentLocation(collection: String, document: String)
  case TableLocation(table: String)

object JobLocation:
  given Decoder[DocumentLocation] = deriveDecoder
  given Decoder[CollectionLocation] = deriveDecoder
  given Decoder[TableLocation] = deriveDecoder

object JobLocationApp extends IOApp.Simple:

  import JobLocation.*

  def parseJobLocation(jsonString: String): Either[Error, JobLocation] =
    parse(jsonString).flatMap { json =>
      json.as[TableLocation].widen[JobLocation] orElse
      json.as[DocumentLocation].widen[JobLocation] orElse
      json.as[CollectionLocation].widen[JobLocation]
    }

  def run: IO[Unit] =
    val testCases = List(
      """{"collection": "users"}""",
      """{"collection": "orders", "document": "order-123"}""",
      """{"table": "analytics"}""",
      """{"invalid": "data"}"""
    )

    testCases.traverse_ { jsonStr =>
      parseJobLocation(jsonStr) match
        case Right(location) =>
          IO.println(s"Parsed: $jsonStr -> $location")

        case Left(error) =>
          IO.println(s"Failed to parse: $jsonStr -> ${error.getMessage}")
    }

This is a really straightforward implementation in my opinion. In this example, the parsing is literally boiled down to “try parsing TableLocation and if it returns Left, try parsing DocumentLocation instead, and if that returns Left, try parsing CollectionLocation, otherwise return what you got”.

Alternatively, circe, the JSON parsing library for scala-cats used in the example above, provides an even neater way:

object JobLocation:
  given Decoder[DocumentLocation] = deriveDecoder
  given Decoder[CollectionLocation] = deriveDecoder
  given Decoder[TableLocation] = deriveDecoder

  given Decoder[JobLocation] =
    summon[Decoder[TableLocation]].widen[JobLocation] or
    summon[Decoder[DocumentLocation]].widen[JobLocation] or
    summon[Decoder[CollectionLocation]].widen[JobLocation]

def parseJobLocation(jsonString: String): Either[Error, JobLocation] =
    parse(jsonString).flatMap(_.as[JobLocation])

In this code, the parser for the parent type, JobLocation is defined as a combined parser of whichever case class manages to get parsed first:

summon[Decoder[TableLocation]].widen[JobLocation] or
summon[Decoder[DocumentLocation]].widen[JobLocation] or
summon[Decoder[CollectionLocation]].widen[JobLocation]

Just to reiterate, I do understand that converting the application (and developers) to this new weird technology is an almost impossible task, especially in a large long-lived project. One way to reason about it and justify the transition is the resilience requirements of a project (the need for actually error-prone code) and the amount of time and effort spent to date on finding and fixing those nasty bugs and undefined behaviours in an application.

IO impact

At MongoDB I work on a Relational Migrator project - a tool which helps people migrate their relational database to MongoDB. And recently we grew interested in the performance of our tool. Due to the nature of the migrations, they are usually extremely long (potentially even never ending, for some scenarios). It is a rather valuable information to know where we can speed things up.

Hence we ran a profiler on a relatively big database of 1M rows. And this was what we saw:

The handleBatch method is where the meat and potatoes of our migration logic reside. It lasts for approx. 6.5 sec. We could have debated on which parts of this flame graph we could optimize (and we actually did), but we first decided to take a quick look at the same graph from the higher level - not the CPU time (when CPU is actually doing the active work) but the total time:

The entire application run took 4,937 sec (1hr 22min 17sec). Of which, the migration itself took only 130 sec:

The biggest chunk of it was writing to MongoDB database at 120 sec:

The actual migration logic is really just 3.5 sec:

So out of 130 sec of the actual migration run, the actual logic took 3.5 sec or mere 2.69%. The rest was just IO (input/output). Which we also saw on the thread timeline:

Most time all the threads spent sleeping.

This is not new information, just a reminder that the slowest part of pretty much any application is input-output.

Simulating network outages in Docker

Recently, for my work project I needed to simulate a system of ours going offline for a period of time (similar to having a network outage on a customer side).

I figured there are two ways to do it:

  • using Docker’ networks
  • using host OS’ firewall

With Docker network it is as easy as

docker network disconnect <network-name> <container-name>

This way you disconnect a container from a network (even if it is a bridge network, exposing the container to the host OS).

You can find a list of networks with docker network ls and list of containers with docker ps.

To roll it back (simulate recovery from an outage), simply

$ docker network connect <network-name> <container-name>

Disconnecting a container from a network is okay, but sometimes you might want to have a fine-grain control over the outage, like forbid a specific IP or port being accessed by your container.

With the firewall it is totally possible, but the steps differ for Linux and OSX.

In Linux you use iptables and control a rule group specific to Docker only:

$ iptables -I DOCKER -p tcp --dport 27017 -j DROP

The above command will block all TCP traffic on port 27017 for all Docker containers. To revert this, run

$ iptables -I DOCKER -p tcp —dport 22 ACCEPT

To control a specific IP address use the -s parameter:

$ iptables -I DOCKER -p tcp -s 192.168.0.10 --dport 27017 -j DROP
$ iptables -I DOCKER -p tcp -s 192.168.0.10 --dport 27017 -j ACCEPT

On the other hand, OSX uses a tool with BSD roots, pf. You can use the /etc/pf.conf file to mess with firewall rules.

Blocking traffic is achieved by adding a rule like below to the /etc/pf.conf file:

block drop out quick proto tcp from any to any port 27017

followed by reloading the rule list with

$ pfctl -f /etc/pf.conf

In case pf is disabled, one is enabled running

$ pfctl -e

Re-enabling is as easy as removing (or commenting out) the rule line in the /etc/pf.conf file and reloading it with pfctl -f /etc/pf.conf.

How unique are your bundles?

In the modern front-end development we are all used to package managers, transpilation and bundling. These concepts are the biproduct of the tools which we hoped would simplify developer’ burden and speed up the development.

However, how confident are you these tools are actually doing good job?

Developers seem comfortable off-loading the processing power to users’ machine (browser, predominantly). We are not surprised by seeing slow websites anymore. A simple blog is downloading 55MB of JavaScript? Seems fine nowadays.

I currently work on a fairly small tool (MongoDB Relational Migrator), which also utilizes TypeScript, React and, of course, bundling. We use Vite for that. Our bundles are split into chunks (but I have accounted for that, too).

I went ahead and wrote a rather simple script which parses the bundles (using TypeScript compiler API, because why not), extracting the function definitions (both arrow functions and plain old function) and counts how many times they occur in the file. For this last bit, to make sure I am not counting a => true and x => true as different occurrences, I am minimizing the function definition with uglifyjs and counting the SHA256 hashes of the minimized functions (just to have a reasonable key in my hashmap instead of entire function code).

These are my findings.

Out of 54 chunks, 47 are not css-in-js chunks. Out of 47 remaining, 7 have any significant duplication (over 5%). But when they do, they do it hard: duplication varies between 18% and a whopping 42% of sheer file size. Absolute numbers are also astonishing: 33% to 59% functions are duplicates.

Found 15192 functions, 8963 are unique (59%)
Duplicates length: 1518418 bytes out of 3537579 bytes are duplicate code (42.92%)

Found 1202 functions, 494 are unique (41.1%)
Duplicates length: 130649 bytes out of 340227 bytes are duplicate code (38.4%)

Found 513 functions, 231 are unique (45.03%)
Duplicates length: 50160 bytes out of 136057 bytes are duplicate code (36.87%)

Found 598 functions, 267 are unique (44.65%)
Duplicates length: 57607 bytes out of 164737 bytes are duplicate code (34.97%)

Found 17 functions, 10 are unique (58.82%)
Duplicates length: 1932 bytes out of 6532 bytes are duplicate code (29.58%)

Found 154 functions, 98 are unique (63.64%)
Duplicates length: 11140 bytes out of 45135 bytes are duplicate code (24.68%)

Found 968 functions, 651 are unique (67.25%)
Duplicates length: 52616 bytes out of 281406 bytes are duplicate code (18.7%)

I thought my code might be wrong, so I looked into the bundle code itself. Here’s a short excerpt:

Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},yR.apply(this,arguments)}var Zce;function bR(){return bR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},bR.apply(this,arguments)}var Wce;function wR(){return wR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},wR.apply(this,arguments)}var Uce;function $R(){return $R=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},$R.apply(this,arguments)}var Gce;function OR(){return OR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},OR.apply(this,arguments)}var Kce;function xR(){return xR=Object.assign?Object.assign.bind():function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e},xR.apply(this,arguments)}

See how the following fragment of code repeats multiple times:

function(e){for(var t=1;t<arguments.length;t++){var n=arguments[t];for(var r in n)Object.prototype.hasOwnProperty.call(n,r)&&(e[r]=n[r])}return e}

In fact, this exact same fragment of code repeats 137 times in the same piece of bundle chunk (same file):

Repeated function definition in a single chunk of code

By the way, this is a production build of our front-end, built using Vite, with minification enabled.

The raw length of this function code is 146 characters. So in a single place, in a single file, you have 136 * 146 = 19_992 bytes of waste. Meaning, browser has to load these 20KB of code, parse it and create 136 duplicating functions.

Looking at the overall size of 3.5MB of code in this chunk and its insane 41% duplicated code (in sheer bytes, not occurrences, so 1.5MB wasted), imagine how much faster this single file could have been loaded in a browser.

I was keen on seeing what functions get duplicated most often and ran my script on an entire build output directory. Here are top 80-ish offenders:

Function code (minimized) Copies
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2205
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1197
function n(){return Object.assign,n.apply(this,arguments)} 1008
function(){} 753
function(i){this.a=i} 250
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 191
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 187
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 159
function(n){return this===n} 119
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 113
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 111
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 111
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 104
function(c,i){He.call(this,c,i)} 94
function(n,r){(null==r||r&gt;n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 93
function(){return!0} 92
function(){return!1} 78
function(){return new gt(this)} 77
function(r){if(Array.isArray(r))return r} 77
function(){throw new TypeError(`Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.`)} 77
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 76
function(i,t){this.a=i,this.b=t} 58
function(){return this.a} 49
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 49
function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(c,e),Object.defineProperty(y,e,t.get?t:{enumerable:!0,get:function(){return c[e]}}))} 49
function(){return c[b]} 49
function(a,e,i){var r,t=i["aria-label"],n=i["aria-labelledby"],c=i.title;switch(a){case"img":return t||n||c?(l(r={},"aria-labelledby",n),l(r,"aria-label",t),l(r,"title",c),r):{"aria-label":"".concat(e.replace(/([a-z])([A-Z])/g,"$1 $2")," Icon")};case"presentation":return{"aria-hidden":!0,alt:""}}} 49
function(i){Di(this,i)} 48
function(r){return Object.getOwnPropertyDescriptor(e,r).enumerable} 48
function(){throw M(new De)} 46
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var n,u,e=[],a=!0,o=!1;try{for(t=t.call(l);!(a=(n=t.next()).done)&&(e.push(n.value),!r||e.length!==r);a=!0);}catch(l){o=!0,u=l}finally{try{a||null==t.return||t.return()}finally{if(o)throw u}}return e}} 44
function(n){throw M(new De)} 39
function(r){var n;return r&&"object"==typeof r&&"default"in r?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 39
function n(r){return n(r)} 38
function(n){return typeof n} 38
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 38
function(r){var n;return r&&r.__esModule?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 37
function(){return this.b} 33
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,a,f=[],i=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;i=!1}else for(;!(i=(e=u.call(t)).done)&&(f.push(e.value),f.length!==r);i=!0);}catch(l){o=!0,n=l}finally{try{if(!i&&null!=t.return&&(a=t.return(),Object(a)!==a))return}finally{if(o)throw n}}return f}} 33
function(t){Object.defineProperty(e,t,Object.getOwnPropertyDescriptor(n,t))} 32
function(n){return Ei(n)} 30
function(n){return L(fn,X,2,n,6,1)} 30
function(n){} 29
function(r){if(typeof Symbol<"u"&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 28
function(){throw new TypeError(`Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.`)} 28
()=>{} 27
function(){return null} 27
function(n,o){e.exports=o(m,on(),dn)} 27
function(i){Fi(this,i)} 23
function(n,o){e.exports=o(dn,on(),m)} 22
function(){throw M(new Fe(Re((xe(),Ds))))} 21
function(){return this} 20
function(i,t){this.b=i,this.a=t} 19
function(n,e){throw M(new De)} 18
function(){return 0} 17
function(r){Object.defineProperty(e,r,Object.getOwnPropertyDescriptor(t,r))} 16
()=>{var l;return null!=(l=e.options.debugAll)?l:e.options.debugHeaders} 16
function(n){return n} 15
function(c){Kr.call(this,c)} 15
function(){return this.c} 15
function(){return this.d} 15
function(n,r){return n} 14
d=>d.id 14
function(){var r=function(t,o){return(r=Object.setPrototypeOf||({__proto__:[]}instanceof Array?function(t,o){t.__proto__=o}:function(t,o){for(var n in o)Object.prototype.hasOwnProperty.call(o,n)&&(t[n]=o[n])}))(t,o)};return function(t,o){if("function"!=typeof o&&null!==o)throw new TypeError("Class extends value "+String(o)+" is not a constructor or null");function n(){this.constructor=t}r(t,o),t.prototype=null===o?Object.create(o):(n.prototype=o.prototype,new n)}} 14
function(_,o){_.__proto__=o} 14
function(o,r){for(var t in r)Object.prototype.hasOwnProperty.call(r,t)&&(o[t]=r[t])} 14
function(n){return!1} 13
function(t){var l=-1,n=null==t?0:t.length;for(this.clear();++l<n;){var r=t[l];this.set(r[0],r[1])}} 12
function(a,e,l){var i,r=l["aria-label"],t=l["aria-labelledby"],n=l.title;switch(a){case"img":return r||t||n?(f(i={},"aria-labelledby",t),f(i,"aria-label",r),f(i,"title",n),i):{"aria-label":"".concat(e.replace(/([a-z])([A-Z])/g,"$1 $2")," Icon")};case"presentation":return{"aria-hidden":!0,alt:""}}} 12
a=>a 12
a=>a() 12
function(n,a){n.a=a} 11
function(){ia.call(this)} 11
function(i,t,h){this.a=i,this.b=t,this.c=h} 11
function(n,r){return r} 11
function(){return this.a.gc()} 11
function(e){return e&&e.__esModule?e:{default:e}} 11
()=>n(!1) 11
function(i){this.b=i} 10
function(c){this.c=c} 10
function(){return this.f} 10
function(n){return n||"div"} 10
function(n,r,i){var l;return i=null!=(l=i)?l:"div",n||("string"==typeof(null==r?void 0:r.href)?"a":i)} 10

Let’s dive deeper, shall we?

Imagine for a second that we could define the duplicated functions once and then just reuse the short name instead (sounds reasonable, does it not?).

But not all of those functions could be de-duplicated in such way (at least not so easily). Some of these functions use the outer closure functions and variables (something defined outside of the function itself), so we can skip these. For instance, function(i){Di(this,i)} and function(){throw M(new De)} can be ignored.

Then, there are function using this. These might be tricky (this is not what you think is a famous JavaScript mantra).

Lastly, some (if not most) of the functions could be either replaced with arrow functions or standard library function. But that is uncertain - one must understand what a function does first.

With those points in mind, let’s look at the offenders once again:

Function code (minimized) Copies Notes
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2205 spread operator?
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1197 Object.assign methods?
function n(){return Object.assign,n.apply(this,arguments)} 1008 Object.assign properties?
function(){} 753 no-op
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 191 ?
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 187 ?
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 159 ?
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 113 ?
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 111 ?
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 111 ?
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 104 ?
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 93 array spread?
function(){return!0} 92 always-true
function(){return!1} 78 always-false
function(r){if(Array.isArray(r))return r} 77 self explanatory
function(){throw new TypeError('Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.')} 77 isSymbol?
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 76 ?
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 49 ?
function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(c,e),Object.defineProperty(y,e,t.get?t:{enumerable:!0,get:function(){return c[e]}}))} 49 ?
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var n,u,e=[],a=!0,o=!1;try{for(t=t.call(l);!(a=(n=t.next()).done)&&(e.push(n.value),!r||e.length!==r);a=!0);}catch(l){o=!0,u=l}finally{try{a||null==t.return||t.return()}finally{if(o)throw u}}return e}} 44 ?
function(r){var n;return r&&"object"==typeof r&&"default"in r?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 39 ?
function n(r){return n(r)} 38 Function.apply?
function(n){return typeof n} 38 typeof
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 38 ?
function(r){var n;return r&&r.__esModule?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 37 import?
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,a,f=[],i=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;i=!1}else for(;!(i=(e=u.call(t)).done)&&(f.push(e.value),f.length!==r);i=!0);}catch(l){o=!0,n=l}finally{try{if(!i&&null!=t.return&&(a=t.return(),Object(a)!==a))return}finally{if(o)throw n}}return f}} 33 ?
function(n){} 29 no-op
function(r){if(typeof Symbol<"u"&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 28 ?
function(){throw new TypeError('Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.')} 28 ?
()=>{} 27 no-op
function(){return null} 27 always-null
function(){return this} 20 always-this
function(){return 0} 17 always-zero
function(n){return n} 15 identity
function(n,r){return n} 14 always-first-argument
d=>d.id 14 .id
function(){var r=function(t,o){return(r=Object.setPrototypeOf||({__proto__:[]}instanceof Array?function(t,o){t.__proto__=o}:function(t,o){for(var n in o)Object.prototype.hasOwnProperty.call(o,n)&&(t[n]=o[n])}))(t,o)};return function(t,o){if("function"!=typeof o&&null!==o)throw new TypeError("Class extends value "+String(o)+" is not a constructor or null");function n(){this.constructor=t}r(t,o),t.prototype=null===o?Object.create(o):(n.prototype=o.prototype,new n)}} 14 ?
function(_,o){_.__proto__=o} 14 Object.is_a?
function(o,r){for(var t in r)Object.prototype.hasOwnProperty.call(r,t)&&(o[t]=r[t])} 14 ?
function(n){return!1} 13 always-false
a=>a 12 identity
a=>a() 12 call first argument
function(n,a){n.a=a} 11 enum/const definition?
function(n,r){return r} 11 always-second-argument
function(e){return e&&e.__esModule?e:{default:e}} 11 import default
function(c){this.c=c} 10 enum/const definition?

As a matter of fact, someone on the internet did a very similar research few years ago. So I hoped to see the improvement in the build tools over the years.

As I mentioned above, our front-end is bundled with Vite. Let’s see if using esbuild or bun (since both are fairly new and stand out in terms of architecture and performance) do a better job.

With few small adjustments to make things fair (e.g. build the same thing in the same way), like disabling the plugins for Vite, setting up svgr loader, here are some build time stats:

yarn install:

➤ YN0000: Done with warnings in 17s 798ms
yarn  12.11s user 19.69s system 175% cpu 18.122 total

bun install:

warn: esbuild's postinstall script took 748.9ms

 1028 packages installed [1.82s]
  Removed: 2
bun install  0.22s user 0.65s system 47% cpu 1.849 total
Bundler Build time
bun 0.43s
esbuild 2.57s
vite 85.04s
webpack 138.64s

And the analysis of the built bundles:

vite:

Found 968 functions, 651 are unique (67.25%)
Found 598 functions, 267 are unique (44.65%)
Found 154 functions, 98 are unique (63.64%)
Found 17 functions, 10 are unique (58.82%)
Found 15192 functions, 8963 are unique (59%)
Found 1202 functions, 494 are unique (41.1%)
Found 513 functions, 231 are unique (45.03%)
= Total 18644 functions, 10714 are unique (57.4%)

Duplicates length: 52616 bytes out of 281406 bytes are duplicate code (18.7%)
Duplicates length: 57607 bytes out of 164737 bytes are duplicate code (34.97%)
Duplicates length: 11140 bytes out of 45135 bytes are duplicate code (24.68%)
Duplicates length: 1932 bytes out of 6532 bytes are duplicate code (29.58%)
Duplicates length: 1518418 bytes out of 3537579 bytes are duplicate code (42.92%)
Duplicates length: 130649 bytes out of 340227 bytes are duplicate code (38.4%)
Duplicates length: 50160 bytes out of 136057 bytes are duplicate code (36.87%)
= Total 1822522 out of 4511673 bytes are duplicate code (40.3%)

esbuild:

Found 46654 functions, 28952 are unique (62.06%)
Duplicates length: 6905599 bytes out of 9645594 bytes are duplicate code (71.59%)

bun:

Found 31113 functions, 25755 are unique (82.78%)
Duplicates length: 446020 bytes out of 5696964 bytes are duplicate code (7.83%)

webpack:

Found 2898 functions, 1434 are unique (49.48%)
Duplicates length: 320940 bytes out of 4645589 bytes are duplicate code (6.91%)

And a deeper analysis of the duplicated functions:

esbuild:

Function Copies
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 2216
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)} 1204
function n(){return Object.assign,n.apply(this,arguments)} 1010
function(){} 844
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}} 260
function(i){this.a=i} 250
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 203
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 194
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 160
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 137
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 134
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 134
()=>{} 129
function(n){return this===n} 119
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 115
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 106
function(c,i){Bs.call(this,c,i)} 94
function(){return!0} 93
function(r){if(Array.isArray(r))return r} 83
function(){throw new TypeError(Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a Symbol.iterator method.)} 83
function(){return!1} 79
function(){return new cu(this)} 77
function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(b,e),Object.defineProperty(E,e,t.get?t:{enumerable:!0,get:function(){return b[e]}}))} 76
function(){return b[P]} 76
function(a,e,l){var i,r=l["aria-label"],t=l["aria-labelledby"],n=l.title;switch(a){case"img":return r||t||n?(s(i={},"aria-labelledby",t),s(i,"aria-label",r),s(i,"title",n),i):{"aria-label":"".concat(e.replace(/([a-z])([A-Z])/g,"$1 $2")," Icon")};case"presentation":return{"aria-hidden":!0,alt:""}}} 76
function(r){var n;return r&&r.__esModule?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 67
function(n){return typeof n} 64
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 64
function n(r){return n(r)} 63
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 61
function(i,t){this.a=i,this.b=t} 58
function(r){var n;return r&&"object"==typeof r&&"default"in r?r:(n=Object.create(null),r&&Object.keys(r).forEach(function(e){var t;"default"!==e&&(t=Object.getOwnPropertyDescriptor(r,e),Object.defineProperty(n,e,t.get?t:{enumerable:!0,get:function(){return r[e]}}))}),n.default=r,Object.freeze(n))} 50
function(r){if(typeof Symbol<"u"&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 49
function(){throw new TypeError(`Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.`)} 49
function(){return this.a} 49
function(i){p0(this,i)} 48
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var n,u,e=[],a=!0,o=!1;try{for(t=t.call(l);!(a=(n=t.next()).done)&&(e.push(n.value),!r||e.length!==r);a=!0);}catch(l){o=!0,u=l}finally{try{a||null==t.return||t.return()}finally{if(o)throw u}}return e}} 46
function(){throw St(new Ss)} 46
function(n){throw St(new Ss)} 39
()=>{"use strict";Vu(),Du()} 38
function(l,r){var t=null==l?null:typeof Symbol<"u"&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,a,f=[],i=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;i=!1}else for(;!(i=(e=u.call(t)).done)&&(f.push(e.value),f.length!==r);i=!0);}catch(l){o=!0,n=l}finally{try{if(!i&&null!=t.return&&(a=t.return(),Object(a)!==a))return}finally{if(o)throw n}}return f}} 37
function(){return this.b} 33
function(){X(x)} 32
function(n){} 31
a=>a() 30
function(n){return V1(n)} 30
function(n){return dn(oi,mr,2,n,6,1)} 30
function(){return null} 29
function(){return this} 24
()=>{"use strict";Du()} 23
function(i){g0(this,i)} 23
function(t,r){var e;if(t)return"string"==typeof t?k(t,r):"Map"===(e="Object"===(e=Object.prototype.toString.call(t).slice(8,-1))&&t.constructor?t.constructor.name:e)||"Set"===e?Array.from(t):"Arguments"===e||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(e)?k(t,r):void 0} 22
function(t,r){var e;if(t)return"string"==typeof t?P(t,r):"Map"===(e="Object"===(e=Object.prototype.toString.call(t).slice(8,-1))&&t.constructor?t.constructor.name:e)||"Set"===e?Array.from(t):"Arguments"===e||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(e)?P(t,r):void 0} 22
function(n){return null!=n&&n instanceof Array} 21
function(r){if(Array.isArray(r))return P(r)} 21
function(e){return e&&e.__esModule?e:{default:e}} 21
function(t,r){var e;if(t)return"string"==typeof t?Q(t,r):"Map"===(e="Object"===(e=Object.prototype.toString.call(t).slice(8,-1))&&t.constructor?t.constructor.name:e)||"Set"===e?Array.from(t):"Arguments"===e||/^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(e)?Q(t,r):void 0} 21
function(){throw St(new Os(il((qs(),_g))))} 21
function(n){return n} 20
function(n){return null!=n&&n.nodeType===Node.ELEMENT_NODE} 20
function(e){throw Error("Received unhandled value: ".concat(e))} 20
function(r,n){return Array.isArray(r)?r.concat(n):"string"==typeof r?r:void 0} 19

bun:

Function Copies
function(){} 739
function(i){this.a=i} 250
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r} 197
()=>{} 141
function(n){return this===n} 119
function(c,f){f7.call(this,c,f)} 94
function(){return!0} 91
function(){return new p9(this)} 77
function(){return!1} 76
function(i,t){this.a=i,this.b=t} 58
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t} 53
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")} 51
function(){return this.a} 49
function(e,r){if(null==e)return{};var t,n=function(e,r){if(null==e)return{};for(var t,n={},l=Object.keys(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||(n[t]=e[t]);return n}(e,r);if(Object.getOwnPropertySymbols)for(var l=Object.getOwnPropertySymbols(e),o=0;o<l.length;o++)t=l[o],0<=r.indexOf(t)||Object.prototype.propertyIsEnumerable.call(e,t)&&(n[t]=e[t]);return n} 48
function(r,e,t){return i=function(r,e){if("object"!=typeof r||null===r)return r;var t=r[Symbol.toPrimitive];if(void 0===t)return String(r);t=t.call(r,e);if("object"!=typeof t)return t;throw new TypeError("@@toPrimitive must return a primitive value.")}(e,"string"),(e="symbol"==typeof i?i:String(i))in r?Object.defineProperty(r,e,{value:t,enumerable:!0,configurable:!0,writable:!0}):r[e]=t,r;var i} 48
function(t){t=function(t,r){if("object"!=typeof t||null===t)return t;var i=t[Symbol.toPrimitive];if(void 0===i)return String(t);i=i.call(t,r);if("object"!=typeof i)return i;throw new TypeError("@@toPrimitive must return a primitive value.")}(t,"string");return"symbol"==typeof t?t:String(t)} 48
function(i){T6(this,i)} 48
()=>{R3(),G3()} 46
function(){throw x0(new w7)} 46
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))} 45
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l} 41
function(n){throw x0(new w7)} 39
function(r){if(Array.isArray(r))return r} 36
function(){throw new TypeError("Invalid attempt to destructure non-iterable instance.\\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")} 36
function(){return this.b} 33
function(n){return n2(n)} 30
function(n){return J1($5,p1,2,n,6,1)} 30
function(t,e){var r,n=Object.keys(t);return Object.getOwnPropertySymbols&&(r=Object.getOwnPropertySymbols(t),e&&(r=r.filter(function(e){return Object.getOwnPropertyDescriptor(t,e).enumerable})),n.push.apply(n,r)),n} 29
function(l,r){var t=null==l?null:"undefined"!=typeof Symbol&&l[Symbol.iterator]||l["@@iterator"];if(null!=t){var e,n,u,f,i=[],a=!0,o=!1;try{if(u=(t=t.call(l)).next,0===r){if(Object(t)!==t)return;a=!1}else for(;!(a=(e=u.call(t)).done)&&(i.push(e.value),i.length!==r);a=!0);}catch(l){o=!0,n=l}finally{try{if(!a&&null!=t.return&&(f=t.return(),Object(f)!==f))return}finally{if(o)throw n}}return i}} 29
function(n){} 29
function(){return null} 28
function(e){return Object.getOwnPropertyDescriptor(Z,e).enumerable} 25
function(e){Object.defineProperty(Z,e,Object.getOwnPropertyDescriptor(W,e))} 25
function(n){return n} 23
function(i){C6(this,i)} 23
()=>{G3()} 22
function(){throw x0(new C7(n7((y7(),KG))))} 21
function(){return this} 19
function(i,t){this.b=i,this.a=t} 19
function(n){return!1} 18
function(n,w){throw x0(new w7)} 18
function(){return 0} 17
function(n){return typeof n} 17
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o} 17
()=>{R3()} 16
()=>{var e;return null!=(e=Z.options.debugAll)?e:Z.options.debugHeaders} 16
function(n,r){return n} 15
function(c){hX.call(this,c)} 15
function(){return this.c} 15
function(){return this.d} 15
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e} 14
function(r){if("undefined"!=typeof Symbol&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)} 14
function(){throw new TypeError("Invalid attempt to spread non-iterable instance.\\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")} 14
d=>d.id 14
()=>W(!1) 13
a=>a 12
function(t){var l=-1,n=null==t?0:t.length;for(this.clear();++l<n;){var r=t[l];this.set(r[0],r[1])}} 12
function(n,c){} 12
function(n,r){return r} 11
function(n,a){n.a=a} 11
function(){_V.call(this)} 11
function(i,t,h){this.a=i,this.b=t,this.c=h} 11
function(){return this.a.gc()} 11
function(i){this.b=i} 10
function(c){this.c=c} 10
function(){return this.f} 10

Interestingly enough, all three tools handled the job bad in different aspects:

  • vite was the slowest and produced second biggest bundle
  • esbuild was the fastest and produced the largest bundle
  • bun was slower than esbuild by a split of hair, but produced smallest bundle with least duplicates

Bonus points to bun for installing node modules in a link of an eye.

In terms of duplicates, however, all three failed miserably (in my opinion), with the best result being the bundle produced by bun with 18% duplicates and the rest having almost half the bundle wasted.

For the most part, bundlers seem to be doing a pretty bad job at tree shaking and keep a lot of those utility functions’ duplicates. One can estimate how much of a wasted space these use, by multiplying the function code length by the number of duplicates minus one (for one definition).

Let’s imagine some of the above functions could be de-duplicated. What are the benefit of that? For the most part, the front-end can load faster for users - simply because there is less bytes to transfer. On top of that, there are less functions to be created in memory. So technically, the front-end can act faster. Although, on the modern machines the difference between having one function and few thousand of the same function is negligible.

Here is a shortened list of top abusers from different bundlers for our tool:

Function vite esbuild bun
#Bytes #Bytes #Bytes
function(){}7539036844101287398868
function(){return!0}921840931860911820
function(){return!1}781560791580761520
function(){return null}276212966728644
function(){return this}204602455219437
function(){return 0}17340n/a0n/a0
function(n){return!1}13273n/a018378
function(n){}293773140329377
function(n){return n}153152042023483
function(n){return typeof n}n/a064179217476
function(n){return this===n}n/a011933321193332
function(n,r){return n}14322n/a015345
function(n,r){return r}11253n/a011253
function(n,c){}n/a0n/a012180
()=>{}27162129774141846
a=>a1248n/a01248
a=>a()1272n/a0n/a0
function(r){for(var t=1;t<arguments.length;t++){var n,o=arguments[t];for(n in o)Object.prototype.hasOwnProperty.call(o,n)&&(r[n]=o[n])}return r}2205317520221631910419728368
function n(){return Object.assign&&Object.assign.bind(),n.apply(this,arguments)}119795760120496320n/a0
function n(){return Object.assign,n.apply(this,arguments)}100858464101058580n/a0
function(n,r){(null==r||r>n.length)&&(r=n.length);for(var e=0,l=new Array(r);e<r;e++)l[e]=n[e];return l}93967210611024n/a0
function(r){if(Array.isArray(r))return r}773157833403361476
function(){throw new TypeError('Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.')}77132448314276366192
function(t){return t&&"object"==typeof t&&"default"in t?t:{default:t}}76562426019240n/a0
function(_,o){_.__proto__=o}14392n/a0n/a0
function(o,r){for(var t in r)Object.prototype.hasOwnProperty.call(r,t)&&(o[t]=r[t])}141176n/a0n/a0
function(e){return e&&e.__esModule?e:{default:e}}11539211029n/a0
function(e,n,r){return n in e?Object.defineProperty(e,n,{value:r,enumerable:!0,configurable:!0,writable:!0}):e[n]=r,e}n/a011513570n/a0
function(r,n){return Array.isArray(r)?r.concat(n):"string"==typeof r?r:void 0}n/a0191520n/a0
function(n){return null!=n&&n instanceof Array}n/a021987n/a0
function(n,e){if(null==n)return{};for(var r,t={},f=Object.keys(n),u=0;u<f.length;u++)r=f[u],0<=e.indexOf(r)||(t[r]=n[r]);return t}n/a0n/a0536890
function(r,t){if("object"!=typeof r||null===r)return r;var e=r[Symbol.toPrimitive];if(void 0===e)return("string"===t?String:Number)(r);e=e.call(r,t||"default");if("object"!=typeof e)return e;throw new TypeError("@@toPrimitive must return a primitive value.")}n/a0137368535113719
function(e,r){return r=r||e.slice(0),Object.freeze(Object.defineProperties(e,{raw:{value:Object.freeze(r)}}))}n/a016017600n/a0
function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o}n/a0647424n/a0
function(r){if("undefined"!=typeof Symbol&&null!=r[Symbol.iterator]||null!=r["@@iterator"])return Array.from(r)}n/a0n/a0141624

Interestingly enough, aside from a lot of () => {} and (a, b) => a and () => true (as I call them, utility) functions, there are a lot of ES6 / TypeScript helpers such as class definition and spread operator variants, presumingly made to be compatible with ES5-only browsers. Maybe if we had targeted only platforms supporting the latest ES features we would get better results?

Well, not quite much:

bundle sizes:

Bundler Bundle size
bun 5.4M
esbuild 9.2M
esbuild (tuned) 8.0M
vite 7.1M
vite (tuned) 3.8M
webpack 4.4M

vite:

Found 15983 functions, 9581 are unique (59.94%)
Duplicates length: 1495985 bytes out of 4019326 bytes are duplicate code (37.22%)

esbuild:

Found 41736 functions, 29224 are unique (70.02%)
Duplicates length: 3406606 bytes out of 8347230 bytes are duplicate code (40.81%)

webpack is (should be) already using the target config option from tsconfig.json (which is set to ESNext in our case) and bun does not really have a whole lot of customization in this regard.

One unlikely possibility when these duplicates can be faster than having just one function is that running the nearly-defined code (in terms of a single block of code) might be slightly faster than making code jumps. This is super far-fetched idea from low-level programming, when CPU does not have to jump thousands or millions of (assembly) instructions back and forth but literally a few instead. This won’t justify using verbose ES5-compatible code on ESnext browser, however. How about we run a very synthetic benchmark to check just this one theory?

function f(){return!1}

console.time('1');
for(var i=0;i<100000;i++){var a=[];for(var t=0;t<20000;t++)a.push(Math.random()*1000000);var x=a.filter(f).length}
console.timeEnd('1');
console.time('2');
for(var i=0;i<100000;i++){var a=[];for(var t=0;t<20000;t++)a.push(Math.random()*1000000);var f=function(){return!1},x=a.filter(f).length}
console.timeEnd('2');

The results are actually quite stable:

1: 17029ms
1: 16998ms
1: 16903ms

2: 21877ms
2: 21811ms
2: 21821ms

Having just one function instance is approx. 23% faster. But what happens at consequitive runs?

1: 9194ms
1: 9159ms
1: 14044ms
1: 13882ms
1: 13975ms
1: 9205ms
1: 14026ms

2: 21821ms
2: 13843ms
2: 13866ms
2: 13854ms
2: 13961ms

2: 21718ms
2: 13952ms
2: 13925ms
2: 13923ms

Seems like CPU does indeed do a little bit of instruction caching and branch prediction (first run is visibly slower than the subsequent runs). But the observation still holds: having one function definition instead of many copies (even “near” copies) has a much bigger impact.

With that being said, there is one interesting thing to try here: what if we actually replace some of those bulky duplicates with one-time declarations, within the same bundle?

Without performing an in-depth code analysis and optimization, I came up with the following naive implementation:

  1. analyze the code and pick a few functions (biggest abusers) to fix them up
  2. extract all function names, function parameter names (including potential destructuring objects in function params) and all variable and constant names
  3. for each function to be de-duplicated, create a unique name (using the variable and function parameter and function names to avoid any clashes)
  4. from the end of the file to the beginning, remove all occurrences of the function declaration and replace all function references with the unique name
  5. add the function declarations to the beginning of the file

This approach is rather naive, since it does not account for a number of edge cases. For instance, if there are two functions to be replaced:

function(e){for(var r=1;r<arguments.length;r++){var t,l=arguments[r];for(t in l)Object.prototype.hasOwnProperty.call(l,t)&&(e[t]=l[t])}return e}

function p(){return(p=Object.assign||function(e){for(var r=1;r<arguments.length;r++){var t,l=arguments[r];for(t in l)Object.prototype.hasOwnProperty.call(l,t)&&(e[t]=l[t])}return e}).apply(this,arguments)}

e.g. one includes the other, the algorithm could have potentially evicted cases like this.

Before proceeding further, it is a good idea to test if the cleaned up bundle can safely replace the original one. Hence I just stuffed it in the static assets folder of our project and ran it with the modified bundle.

This way I figured few issues with the naive approach:

  • two function definitions are causing stack overflow:
    • $z=function n(){return Object.assign,n.apply(this,arguments)}
    • $q=function n(){return n=Object.assign&&Object.assign.bind(),n.apply(this,arguments)}
  • some of the empty functions are actually used as constructors (ES5-compatible OOP model) which is only discovered by finding the expressions like $FnName.prototype.something = somethingElse;
  • some functions are named and then referenced later in the code
  • some functions are not used at all: Unused aliases

For the shorthand functions I first tried manually fixing them up - had to replace them with $z=function(){return $z=Object.assign.bind(),$z.apply(this,arguments)} alikes. This worked, so I created an AST transformer to handle these one-line return-only functions:

const simplifyFunction = (code, fname) => {
    const tmpFilename = '_tmp';

    fs.writeFileSync(tmpFilename, code, 'utf-8');

    const root = ts.createSourceFile(
        tmpFilename,
        code,
        ts.ScriptTarget.ESNext,
        /* setParentNodes */ true
    );

    let rootFnName = undefined;

    const parse = (node) => {
        if (ts.isFunctionDeclaration(node) && ts.isIdentifier(node.name) && node.name.escapedText !== '') {
            rootFnName = node.name.escapedText;
            return;
        }

        ts.forEachChild(node, child => {
            if (child) {
                parse(child);
            }
        });
    };

    parse(root);

    if (!rootFnName) {
        fs.rmSync(tmpFilename);
        return code;
    }

    const transformer = (ctx) => (sourceFile) => {
        const visit = (node) => {
            if (ts.isIdentifier(node) && node.escapedText === rootFnName) {
                return ts.factory.createIdentifier(fname);
            }

            if (
                ts.isFunctionDeclaration(node) &&
                ts.isBlock(node.body) &&
                node.body.statements.length === 1 &&
                ts.isReturnStatement(node.body.statements[0]) &&
                ts.isBinaryExpression(node.body.statements[0].expression)
            ) {
                const next = ts.factory.createFunctionDeclaration(
                    [],
                    undefined,
                    undefined,
                    [],
                    [],
                    undefined,

                    ts.factory.createBlock([
                        ts.factory.createReturnStatement(
                            ts.factory.createComma(
                                ts.factory.createAssignment(
                                    ts.factory.createIdentifier(fname),
                                    node.body.statements[0].expression.left
                                ),

                                node.body.statements[0].expression.right
                            )
                        )
                    ])
                );

                return ts.visitEachChild(next, visit, ctx);
            }

            return ts.visitEachChild(node, visit, ctx);
        };

        return ts.visitNode(sourceFile, visit);
    };

    const s = ts.createSourceFile(tmpFilename, code, ts.ScriptTarget.ESNext);
    const { transformed } = ts.transform(s, [ transformer ]);

    const newCode = ts.createPrinter({ omitTrailingSemicolon: true })
        .printFile(transformed.find(({ fileName }) => fileName === tmpFilename));

    fs.rmSync(tmpFilename);

    return newCode;
};

The transformer is essentially a two-pass processor: it first parses the code and identifies the first function declaration. If none was found - it just returns the original code. If there was a so-called “root function” defined, it then replaces all identifier with that “root function” name with the alias provided as fname. It also replaces the return statements in form of a return something && something() with return alias = something, alias().

This approach is different from simply using uglifyjs to just try and minimize the code - it is way more complex (compared to just one function call). It results in few extra whitespaces being added. But using uglifyjs messes things up again and TypeScript compiler does not have an option for minimizing the output. But few extra whitespaces are totally acceptable given the much bigger savings from this transformation.

With the constructors I simply excluded them from being de-duplicated. This resulted in 155 fewer substitutions (in Vite mode), which is negligible on the overall scale of the problem.

As for the named functions which are in the global scope and are referenced later down the line, I had to create a list of “backwards-compatible aliases”, mapping those old function names onto the new unique names.

I ended up with a three-pass parser-transformer utility (not really “simple script” anymore). The passes being:

  1. figuring out the duplicate function declarations and replacing them from the end to the start of the code (to minimize the chance of writing over the just-changed code)
  2. removing the duplicate global-scope named functions and replacing them with shorthand aliases to the de-duplicated declarations
  3. replace the usages of all known aliases and shorthand-named de-duplicated functions with their corresponding new (generated) names

Other than those, replacing the original bundle with the optimized one worked like a charm!

The results? With the threshold of 20 duplicates or more:

Bundler Before optimization
Bundle size Total functions Unique functions Unique functions, % Duplicate code, %
bun6.2M8903744383.6%0.78%
esbuild8.7M130571025078.5%3.9%
vite3.9M3502236567.53%6.39%
webpack4.4M2898143449.48%6.91%
After optimization
bun6.2M (same)7865 (-1038)7355 (-88)93.52% (+9.92%)0.51% (-0.27%)
esbuild8.5M (-0.2M)3265 (-9792)2990 (-7260)91.58% (+13.08%)0.62% (-3.28%)
vite3.6M (-0.3M)2483 (-1019)2277 (-88)91.7% (+24.17%)1.68% (-4.71%)
webpack4.1M (-0.3M)1484 (-1414)1375 (-59)92.65% (+43.17%)0.43% (-6.48%)

In conclusion, the bundlers do a pretty average job at optimizing the bundles, even in production mode with some extra tuning. And if some brave soul is willing to invest even more time and effort than I did into developing a sophisticated solution (potentially improving the existing tools, like uglifyjs or bundlers themselves), the numbers can be improved even further. It would be really interesting to see what would the results be running this optimizer on a bigger bundle.

In my humble opinion, Bun does produce the cleanest bundle. It might be not the smallest one, but it has least unnecessary stuff. On top of that, it is the fastest tool in JS world I have ever used. By the way, this very blog is built using Bun and React SSR - on Github Actions it takes around a minute to build and publish:

Github Actions build and publish with Bun Github Actions build breakdown

TypeScript classes are not what you think

There was a talk at CPPCon 2017 by Matt Goldbolt with an inspirational title, What has my compiler done for me lately?. It actually resonates with me quite a bit, especially in the world where we try to push so hard for statically typed compiled languages on front-end.

Recently I recalled one of (many) aspects why I think TypeScript is useless (as a way to introduce strong type system to JavaScript world) in many cases.

To be realistic and not just throw bare accusations around, this was inspired by the work on contracts for a new microservice, specifically - the request & response types to be used in both client and the back-end of the microservice. The types were similar to the types used on a database layer and one of the developers just returned objects of that DB layer model type on a controller (endpoint) level, which confused me.

Consider the code below:

class A {
    constructor(public moo: string, public foo: number) {}
}

class B {
    constructor(public moo: string, public foo: number, public zoo?: string[]) {}
}

const b: B = new B('ololo', -3.14, ['1']);
const a: A = b;
const c: B = a;

This works because of a (rather questionable) design decision by TypeScript team called Type compatibility, where any two classes or interfaces that have overlapping public fields are deemed compatible and can be mutually interplaceable.

However, in most languages with reasonable type system, you would expect two different classes to be just that - two different classes - in the case of a code above, an object of class B can never be assigned to a variable of type A and vice versa.

There is a way to achieve this in TypeScript, however (a bit cumbersome, though): by hiding the properties and only exposing them through non-getter/non-setter methods, when needed:

class A {
    constructor(private moo: string, private foo: number) {}

    getMoo() {
        return this.moo;
    }

    getFoo() {
        return this.foo;
    }
}

class B {
    constructor(private moo: string, private foo: number, private zoo?: ReadonlyArray<string>) {}

    getMoo() {
        return this.moo;
    }

    getFoo() {
        return this.foo;
    }

    getZoo() {
        return this.zoo;
    }
}

const b: B = new B('ololo', -3.14, ['1']);
const a: A = b; // Type 'B' is not assignable to type 'A'. Types have separate declarations of a private property 'moo'.(2322)
const c: B = a; // Property 'getZoo' is missing in type 'A' but required in type 'B'.(2741)

Without knowing this “feature” beforehand, one might end up with an inconsistent code or errors down the line (when somebody decides to modify the DB layer model and gets errors on an API layer). And TypeScript does not really help here.

How not to use monads

Recently I have watched a video titled “Optionals and errors in Haskell & Rust - monads by example”. In the video, the author makes a simple application simulating the retrieval of a HTML document via URL and checking if the document contains the title. This is a half-simulated experience, aimed at the beginners to demonstrate how the error handling is done in Rust and Haskell.

However, author in his code has made quite a few bad decisions. And that’s what I want to make a focus of this blog: how one should not use monads and handle errors in Haskell.

Author’s original source looks like this:

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordUpdate #-}

import Control.Monad (forM_)
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)

data Error = Error (String) deriving (Show)

data Doc = Doc {head :: Maybe Head}

data Head = Head {title :: Maybe String}

data Summary = Summary
  { title :: Maybe String,
    ok :: Bool
  }
  deriving (Show)

readDoc :: String -> Either Error Doc
readDoc url =
  if
      | isInfixOf "fail" url -> Left $ Error $ printf "Bad read of %s" url
      | otherwise ->
          Right $
            if
                | isInfixOf "head-missing" url -> Doc {head = Nothing}
                | isInfixOf "title-missing" url ->
                    Doc {head = Just Head {title = Nothing}}
                | isInfixOf "title-empty" url ->
                    Doc {head = Just Head {title = Just ""}}
                | otherwise ->
                    Doc
                      { head =
                          Just Head {title = Just $ printf "Title of %s" url}
                      }

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = doc.head >>= (.title), ok = True}

readAndBuildSummary :: String -> Summary
readAndBuildSummary url = case readDoc url of
  Left err -> Summary {title = Nothing, ok = False}
  Right doc -> buildSummary doc

isTitleNonEmpty :: Doc -> Maybe Bool
isTitleNonEmpty doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

isTitleNonEmpty' :: Doc -> Maybe Bool
isTitleNonEmpty' doc = not <$> null <$> (doc.head >>= (.title))

isTitleNonEmpty'' :: Doc -> Maybe Bool
isTitleNonEmpty'' doc = not . null <$> (doc.head >>= (.title))

readWhetherTitleNonEmpty :: String -> Either Error (Maybe Bool)
readWhetherTitleNonEmpty url = do
  doc <- readDoc url
  return $ isTitleNonEmpty doc

readWhetherTitleNonEmpty' :: String -> Either Error (Maybe Bool)
readWhetherTitleNonEmpty' url = isTitleNonEmpty <$> readDoc url

main :: IO ()
main = do
  let urls = ["good", "title-empty", "title-missing", "head-missing", "fail"]
  forM_ urls $ \url -> do
    putStrLn $ printf "Checking \"https://%s/\":" url
    -- Summary.
    let summary = readAndBuildSummary url
    putStrLn $ printf "  Summary: %s" $ show summary
    putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title
    -- Has title.
    let hasTitle = readWhetherTitleNonEmpty url
    -- let hasTitleSure = either (const False) id $ fromMaybe False <$> hasTitle
    let hasTitleSure = fromMaybe False $ either (const Nothing) id hasTitle
    putStrLn $
      printf "  Has title: %s vs %s" (show hasTitle) (show hasTitleSure)

He then proceeded to wrapping this code in IO monad and ended up with the following:

{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE OverloadedRecordUpdate #-}

import Control.Monad (forM_)
import Data.Functor ((<&>))
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)

data Error = Error (String) deriving (Show)

data Doc = Doc {head :: Maybe Head}

data Head = Head {title :: Maybe String}

data Summary = Summary
  { title :: Maybe String,
    ok :: Bool
  }
  deriving (Show)

-- readDoc

readDoc :: String -> IO (Either Error Doc)
readDoc url =
  return
    if
        | isInfixOf "fail" url -> Left $ Error $ printf "Bad read of %s" url
        | otherwise ->
            Right $
              if
                  | isInfixOf "head-missing" url -> Doc {head = Nothing}
                  | isInfixOf "title-missing" url ->
                      Doc {head = Just Head {title = Nothing}}
                  | isInfixOf "title-empty" url ->
                      Doc {head = Just Head {title = Just ""}}
                  | otherwise ->
                      Doc
                        { head =
                            Just Head {title = Just $ printf "Title of %s" url}
                        }

-- buildSummary

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = doc.head >>= (.title), ok = True}

-- readAndBuildSummary

readAndBuildSummary :: String -> IO Summary
readAndBuildSummary url = do
  docOrError <- readDoc url
  return $ case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

readAndBuildSummary' :: String -> IO Summary
readAndBuildSummary' url =
  readDoc url >>= \docOrError -> return $ case docOrError of
    Left err -> Summary {title = Nothing, ok = True}
    Right doc -> buildSummary doc

readAndBuildSummary'' :: String -> IO Summary
readAndBuildSummary'' url =
  readDoc url <&> \case
    Left err -> Summary {title = Nothing, ok = True}
    Right doc -> buildSummary doc

-- isTitleNonEmpty

isTitleNonEmpty :: Doc -> Maybe Bool
isTitleNonEmpty doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

isTitleNonEmpty' :: Doc -> Maybe Bool
isTitleNonEmpty' doc = not <$> null <$> (doc.head >>= (.title))

isTitleNonEmpty'' :: Doc -> Maybe Bool
isTitleNonEmpty'' doc = not . null <$> (doc.head >>= (.title))

-- readWhetherTitleNonEmpty

readWhetherTitleNonEmpty :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty url = do
  docOrError <- readDoc url
  return $ do
    doc <- docOrError
    return $ isTitleNonEmpty doc

readWhetherTitleNonEmpty' :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty' url =
  readDoc url >>= \docOrError ->
    return $ docOrError >>= \doc -> return $ isTitleNonEmpty doc

readWhetherTitleNonEmpty'' :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty'' url = (isTitleNonEmpty <$>) <$> readDoc url

readWhetherTitleNonEmpty''' :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty''' url = readDoc url <&> (<&> isTitleNonEmpty)

-- main

main :: IO ()
main = do
  let urls = ["good", "title-empty", "title-missing", "head-missing", "fail"]
  forM_ urls $ \url -> do
    putStrLn $ printf "Checking \"https://%s/\":" url
    -- Summary.
    summary <- readAndBuildSummary url
    putStrLn $ printf "  Summary: %s" $ show summary
    putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title
    -- Has title.
    hasTitle <- readWhetherTitleNonEmpty url
    -- let hasTitleSure = either (const False) id $ fromMaybe False <$> hasTitle
    let hasTitleSure = fromMaybe False $ either (const Nothing) id hasTitle
    putStrLn $
      printf "  Has title: %s vs %s" (show hasTitle) (show hasTitleSure)

I see a few issues with this code:

  1. the abuse of IO monad
  2. the excessive (an I mean unnecessary) use of Maybe monad
  3. too many unnecessary readDoc calls (bear in mind: it is supposed to be a network call + document parse, so quite a heavy function)

Improvements

My suggestions on how to make their code better:

  1. only stick to IO where the function is actually interacting with the external world; in this case this would be either main or readDoc (when it actually makes a network call)
  2. eliminate the abuse of Maybe Bool - in the functions where the Maybe Bool is used (isTitleNonEmpty and its variations) just Bool would suffice
  3. only call readDoc once and use the result down the pipeline, no need to call it multiple times (readAndBuildSummary, readWhetherTitleNonEmpty)
  4. reduce the nested if statements to guard expressions
  5. replace the case eitherValue expressions with the either function
  6. remove the dead / unused / unnecessary code (like those duplicated functions - readWhetherTitleNonEmpty', readWhetherTitleNonEmpty'' and readWhetherTitleNonEmpty''')

Use IO monad only for the functions which interact with the outside world

IO monad tells compiler and the people who read the code the function operates outside of the program internals. Like actually performing input-output, working with network, operating system, etc.

You can think of any such function as an unsafe part of an application.

In the example above, most of the functions do not comply with that - they are simply a side-effect of poor monad operations and could be easily changed as follows:

isTitleNonEmpty :: Doc -> Maybe Bool
isTitleNonEmpty doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = doc.head >>= (.title), ok = True}

-- there is absolutely no need for this function
readWhetherTitleNonEmpty :: String -> IO (Either Error (Maybe Bool))
readWhetherTitleNonEmpty url = do
  docOrError <- readDoc url
  return $ do
    doc <- docOrError
    return $ isTitleNonEmpty doc

-- this is also redundant
readAndBuildSummary :: String -> IO Summary
readAndBuildSummary url = do
  docOrError <- readDoc url
  return $ case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

-- this is the (reduced) existing code
main__old :: IO ()
main__old = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  summary <- readAndBuildSummary url

  putStrLn $ printf "  Summary: %s" $ show summary
  putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title

  hasTitle <- readWhetherTitleNonEmpty url

-- note how the document is only passed down to the functions which actually operate on the document
-- the only two functions using `IO` monad here are `main` and `readDoc`, as both perform input-output operations
main :: IO ()
main = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  docOrError <- readDoc url

  let summary = case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

  putStrLn $ printf "  Summary: %s" $ show summary
  putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title

  let hasTitle = isTitleNonEmpty doc

Reduce calls to IO functions

Pure functions must return the exact same value for the same arguments, regardless of how many times they are called.

IO (and a lot of other monads, like Reader, Writer, etc.) work around this rule by changing the outside world. So even though a function technically returns the same value every time, the side-effect of an IO monad can potentially change something elsewhere, outside of the function itself.

This makes the programs non-pure and might cause a lot of negative impact. These side effects are what is thought to be the root of all evils in (functional) programming. They make it hard to figure out what a program actually does. They make the functions and programs hard to test and prove to be correct.

From the performance perspective, if a function actually does change the outside world, it can’t be memoized and it won’t be optimized by the compiler to only use its result, computed once.

Think calling a function which tries to create a user in a system or drop a database table. This would happen on every call. If you call such function twice, it will most likely fail (due to database failures, hopefully).

But what if the function creates data instead and there are no checks for duplicates in the external system? Like appending a log line. If such function was to be called multiple times, it can quickly bloat the storage. This is a potential security issue as well.

So instead of calling such functions (recap: functions returning IO monad), why not call them once and store the result? Then the entire program can operate on the stored result value.

Of course, there are plethora of cases where such behaviour (calling such functions multiple times) is actually desired. But in the sample code from above, this is not the case.

Check the transformed code from the previous point how this solution is utilized:

main :: IO ()
main = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  -- store the result of fetching a document by its URL
  docOrError <- readDoc url

  -- work on the stored value instead of sending an extra HTTP request
  let summary = case docOrError of
    Left err -> Summary {title = Nothing, ok = False}
    Right doc -> buildSummary doc

  putStrLn $ printf "  Summary: %s" $ show summary
  putStrLn $ printf "  Title: %s" $ fromMaybe "" summary.title

  -- work on the stored value again, preventing one more HTTP request
  let hasTitle = case docOrError of
    Left err -> False
    Right doc -> isTitleNonEmpty doc

Eliminate the abuse of Maybe Bool

There’s simply no need to go overboard with Maybe in methods which simply check whether something is present or not.

Maybe Bool introduces a tri-state:

  • Just False
  • Just True
  • Nothing

In cases where this is desired, is much better to represent each of the state with a semantic value:

type HeaderPresence = NoHeader | EmptyHeader | HasHeader

This way the code is much easier to follow and debug.

For the sample in question even this approach is redundant - the function simply checks whether the value (title) is present or not, so a simple Bool would suffice:

isTitleNonEmpty__old :: Doc -> Maybe Bool
isTitleNonEmpty__old doc = do
  head <- doc.head
  title <- head.title
  return $ not $ null title

isTitleNonEmpty :: Doc -> Bool
isTitleNonEmpty doc =
  not $ null title
  where
    title = doc.head doc >>= head.title

main__old :: IO ()
main__old = do
  let url = "good"

  docOrError <- readDoc url

  hasTitle <- case docOrError of
    Left err -> None
    Right doc -> isTitleNonEmpty__old doc

  let hasTitleSure = fromMaybe False $ either (const Nothing) id hasTitle

  putStrLn $
    printf "  Has title: %s vs %s" (show hasTitle) (show hasTitleSure)

main :: IO ()
main = do
  let url = "good"

  putStrLn $ printf "Checking \"https://%s/\":" url

  docOrError <- readDoc url

  -- a simple True | False value, no need to mess with Maybe
  let hasTitle = case docOrError of
    Left err -> False
    Right doc -> isTitleNonEmpty doc

  -- note: no need to wrap & unwrap Maybe and Either
  putStrLn $ printf "  Has title: %s" (show hasTitle)

Reduce the nested if statements - use guard expressions

The original readDoc function uses a rather unnecessarily bulky if statement:

readDoc :: String -> IO (Either Error Doc)
readDoc url =
  return
    if
        | isInfixOf "fail" url -> Left $ Error $ printf "Bad read of %s" url
        | otherwise ->
            Right $
              if
                  | isInfixOf "head-missing" url -> Doc {head = Nothing}
                  | isInfixOf "title-missing" url ->
                      Doc {head = Just Head {title = Nothing}}
                  | isInfixOf "title-empty" url ->
                      Doc {head = Just Head {title = Just ""}}
                  | otherwise ->
                      Doc
                        { head =
                            Just Head {title = Just $ printf "Title of %s" url}
                        }

Haskell provides a built-in language feature for just this case, which really does make code cleaner a lot:

readDoc :: String -> IO (Either Error Doc)

readDoc url
  | isInfixOf "fail" url = return $ Left $ Error $ printf "Bad read of %s" url
  | isInfixOf "head-missing" url = return $ Right Doc { head = Nothing }
  | isInfixOf "title-missing" url = return $ Right Doc { head = Just Head { title = Nothing } }
  | isInfixOf "title-empty" url = return $ Right Doc { head = Just Head { title = Just "" } }
  | otherwise = return $ Right Doc { head = Just Head { title = Just $ printf "Title of %s" url } }

Use existing functions on monads instead of case statements

In my experience, for the most part, you want to simply apply a function to a Maybe or an Either.

And, again, for the most part, you either ignore one of the states of the monad (be it Nothing or Left), or apply a different function to it.

There are handy helper functions for these situations which do make the code cleaner - fromMaybe and either.

Consider few examples:

hasTitle__old <- case docOrError of
  Left err -> None
  Right doc -> isTitleNonEmpty__old doc

hasTitle <- either (const None) (isTitleNonEmpty__old)
let summary__old = case docOrError of
  Left err -> Summary {title = Nothing, ok = False}
  Right doc -> buildSummary doc

-- needs a nice helper function
invalid_summary = Summary { title = Nothing, ok = False }

let summary = either invalid_summary buildSummary
let hasTitle__old = case docOrError of
  Left err -> False
  Right doc -> isTitleNonEmpty doc

let hasTitle = either (const False) isTitleNonEmpty

End result

Here’s what the code looks like when applying all of the above suggestions and tidying up the code:

{-# LANGUAGE DuplicateRecordFields #-}

import Control.Monad (forM_)
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe)
import Text.Printf (printf)

newtype Error = Error String deriving (Show)

newtype Doc = Doc {_head :: Maybe Head}

newtype Head = Head {title :: Maybe String}

data Summary = Summary
  { title :: Maybe String,
    ok :: Bool
  }
  deriving (Show)

-- helpers for Repl.It is meh with language extensions

docHead :: Doc -> Maybe Head
docHead = _head

headTitle :: Head -> Maybe String
headTitle = title

summaryTitle :: Summary -> Maybe String
summaryTitle = title

-- implementations

readDoc :: String -> IO (Either Error Doc)
readDoc url
  | "fail" `isInfixOf` url = return $ Left $ Error $ printf "Bad read of %s" url
  | "head-missing" `isInfixOf` url = return $ Right Doc {_head = Nothing}
  | "title-missing" `isInfixOf` url = return $ Right Doc {_head = Just Head {title = Nothing}}
  | "title-empty" `isInfixOf` url = return $ Right Doc {_head = Just Head {title = Just ""}}
  | otherwise = return $ Right Doc {_head = Just Head {title = Just $ printf "Title of %s" url}}

buildSummary :: Doc -> Summary
buildSummary doc =
  Summary {title = docHead doc >>= headTitle, ok = True}

invalidSummary :: Summary
invalidSummary = Summary {title = Nothing, ok = False}

readAndBuildSummary :: Either Error Doc -> Summary
readAndBuildSummary = either invalidSummary buildSummary

readAndBuildTitle :: Summary -> String
readAndBuildTitle summary = fromMaybe "" (summaryTitle summary)

isTitleNonEmpty :: Doc -> Bool
isTitleNonEmpty doc =
  not $ null title
  where
    title = docHead doc >>= headTitle

readWhetherTitleNonEmpty :: Either Error Doc -> Bool
readWhetherTitleNonEmpty = either (const False) isTitleNonEmpty

-- main

main :: IO ()
main = do
  let urls = ["good", "title-empty", "title-missing", "head-missing", "fail"]

  forM_ urls $ \url -> do
    putStrLn $ printf "Checking \"https://%s/\":" url

    docOrError <- readDoc url

    -- Summary
    let summary = readAndBuildSummary docOrError
    let title = readAndBuildTitle summary

    putStrLn $ printf "  Summary: %s" $ show summary
    putStrLn $ printf "  Title: %s" $ title

    -- Has title
    let hasTitle = readWhetherTitleNonEmpty docOrError

    putStrLn $ printf "  Has title: %s" (show hasTitle)

Floyd-Warshall algorithm, revised

In this blog I would like to brush upon Floyd-Warshall algorithm implementation I have described previously.

See, generally, when explaining Floyd-Warshall algorithm, the graph is given as an adjacency matrix - an V x V matrix, where V is the number of vertices (nodes) in a graph and each matrix value representing the distance between each vertex. For instance, the value G[1][3] = 12 represents the edge from vertex 1 to vertex 3 with the value of 12.

Aside from adjacency matrix, there are two other common ways to represent a graph:

  • incidence matrix of size V x E for V vertices and E edges, where each row represents an edge value (G[i][j] > 0 for edge i -> j, G[i][j] < 0 for edge j -> i, G[i][j] = 0 for no edge between vertices i and j)
  • adjacency list, where each vertex is represented as an object with a list of its connections

Consider the graph from the previous blog:

It can be represented in three different ways:

adjacency list:

v0: [ [2 -> 1] ]
v1: [ [2 -> 6], [0 -> 7] ]
v2: [ [3 -> 5] ]
v3: [ [1 -> 2] ]

adjacency matrix:

   |  v0  v1  v2  v3
---+------------------
v0 |   0   0   1   0
v1 |   7   0   6   0
v2 |   0   0   0   5
v3 |   0   2   0   0

incidence matrix:

   |  e0  e1  e2  e3  e4
---+---------------------
v0 |   0   0   0   7   0
v1 |   0   0   2   0   0
v2 |   1   0   0   0   6
v3 |   0   5   0   0   0

However, arguably the most common and most memory-efficient way to represent graph in a real-world application would be the adjacency list.

Hence I thought a bit of a revised implementation is needed, since the previous one (aka “classic” implementation) relies on matrices. And those matrices contain quite a bit of unused data. Take the graph above as an example: it has 4 vertices and 5 edges. An adjacency matrix would use 16 numbers, incidence matrix would use 20 numbers and adjacency list would use 5 tuples of 2 numbers (10 numbers, so to speak).

“Classic” Floyd-Warshall algorithm implementation constructs an additional matrix for the shortest paths’ distances (at most it can reuse the existing one representing the graph itself) and an additional V x V matrix for path reconstruction.

With a graph represented as an adjacency list (or just a list of node objects), the algorithm can use something like a dictionary (or a hashmap) to reduce the memory consumption. Subjectively, it also becomes easier to read and understand the algorithm.

First, we’d need few definitions:

class Node
{
    public string Name;
    public Dictionary<Node, int> Connections;

    public Node(string name)
    {
        this.Name = name;
        this.Connections = new Dictionary<Node, int>();
    }

    public override string ToString()
    {
        return this.Name;
    }
}

class Graph
{
    public List<Node> Nodes;

    public Graph()
    {
        this.Nodes = new List<Node>();
    }
}

class Path
{
    public Node From;
    public Node To;
    public List<Node> Nodes;
    public int Distance;

    public Path(Node from, Node to)
    {
        this.Distance = 0;
        this.From = from;
        this.To = to;
        this.Nodes = new List<Node>();
    }
}

The Path class is a bit excessive, as it really only needs a list of Node objects, but I decided to make it verbose on purpose - to make it easier to debug.

The algorithm then becomes like this:

Dictionary<(Node from, Node to), Node> FindAllPaths(Graph g)
{
    var distance = new Dictionary<(Node from, Node to), int>();
    var pathThrough = new Dictionary<(Node from, Node to), Node>();

    foreach (var nodeFrom in g.Nodes)
    {
        foreach (var nodeTo in g.Nodes)
        {
            if (nodeFrom == nodeTo)
            {
                distance.Add((nodeFrom, nodeTo), 0);
            }
            else
            {
                distance.Add((nodeFrom, nodeTo), int.MaxValue);
            }
        }

        foreach (var nodeTo in nodeFrom.Connections.Keys)
        {
            distance[(nodeFrom, nodeTo)] = nodeFrom.Connections[nodeTo];
            pathThrough.Add((nodeFrom, nodeTo), nodeTo);
        }

        pathThrough.Add((nodeFrom, nodeFrom), nodeFrom);
    }

    foreach (var nodeThrough in g.Nodes)
    {
        foreach (var nodeFrom in g.Nodes)
        {
            foreach (var nodeTo in g.Nodes)
            {
                if (nodeFrom == nodeTo || distance[(nodeThrough, nodeTo)] == int.MaxValue || distance[(nodeFrom, nodeThrough)] == int.MaxValue)
                {
                    continue;
                }

                if (distance[(nodeFrom, nodeTo)] > distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)])
                {
                    distance[(nodeFrom, nodeTo)] = distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)];
                    pathThrough[(nodeFrom, nodeTo)] = pathThrough[(nodeFrom, nodeThrough)];
                }
            }
        }
    }

    return pathThrough;
}

One can easily see the separate parts of the algorithm - initializing the intermediate storage (for the paths and distances) and actually finding the shortest paths. It is pretty much the same algorithm as before with the only difference being the use of dictionary instead of a matrix.

Path reconstruction bit is also a little bit different (not too much though):

Path ReconstructPath(Node from, Node to, Dictionary<(Node from, Node to), Node> allPaths)
{
    if (!allPaths.ContainsKey((from, to)))
    {
        return null;
    }

    var path = new Path(from, to);

    var tempNode = from;

    path.Nodes.Add(tempNode);

    while (tempNode != to)
    {
        var nextNode = allPaths[(tempNode, to)];

        path.Distance += tempNode.Connections[nextNode];
        path.Nodes.Add(nextNode);

        tempNode = nextNode;
    }

    return path;
}

Now, there is one major improvement we can make to this algorithm. See, when we initialize the dictionaries for paths and distances, we follow the same logic as with the classic implementation, putting in the values for edges which do not exist in a graph:

foreach (var nodeFrom in g.Nodes)
{
    // this bit will add entries for inexistent edges
    foreach (var nodeTo in g.Nodes)
    {
        if (nodeFrom == nodeTo)
        {
            distance.Add((nodeFrom, nodeTo), 0);
        }
        else
        {
            distance.Add((nodeFrom, nodeTo), int.MaxValue);
        }
    }

    // ...
}

But as a matter of fact, we do not really have to store these values - instead, we could simply have a check whether the needed entry exists in the dictionary or not:

Dictionary<(Node from, Node to), Node> FindAllPaths(Graph g)
{
    var distance = new Dictionary<(Node from, Node to), int>();
    var pathThrough = new Dictionary<(Node from, Node to), Node>();

    foreach (var nodeFrom in g.Nodes)
    {
        // -- no more unnecessary entries --

        foreach (var nodeTo in nodeFrom.Connections.Keys)
        {
            distance.Add((nodeFrom, nodeTo), nodeFrom.Connections[nodeTo]);
            pathThrough.Add((nodeFrom, nodeTo), nodeTo);
        }

        pathThrough.Add((nodeFrom, nodeFrom), nodeFrom);
    }

    foreach (var nodeThrough in g.Nodes)
    {
        foreach (var nodeFrom in g.Nodes)
        {
            foreach (var nodeTo in g.Nodes)
            {
                if (nodeFrom == nodeTo)
                {
                    continue;
                }

                // ++ but one extra check here, to prevent unnecessary iterations ++
                if (!distance.ContainsKey((nodeFrom, nodeThrough)) || !distance.ContainsKey((nodeThrough, nodeTo)))
                {
                    continue;
                }

                // ++ and one extra check here, to annotate the existence of a new, indirect shortest path between nodeFrom and nodeTo ++
                if (!distance.ContainsKey((nodeFrom, nodeTo)) || distance[(nodeFrom, nodeTo)] > distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)])
                {
                    distance[(nodeFrom, nodeTo)] = distance[(nodeFrom, nodeThrough)] + distance[(nodeThrough, nodeTo)];
                    pathThrough[(nodeFrom, nodeTo)] = pathThrough[(nodeFrom, nodeThrough)];
                }
            }
        }
    }

    return pathThrough;
}

For that matter, we don’t really need to have a separate dictionary for path reconstruction - since both distance and pathThrough dictionaries serve the same purpose (have the exact same key, but different value), we can smash those two together and use even less memory:

Dictionary<(Node from, Node to), (Node through, int distance)> FindAllPaths(Graph g)
{
    var pathThrough = new Dictionary<(Node from, Node to), (Node through, int distance)>();

    foreach (var nodeFrom in g.Nodes)
    {
        foreach (var nodeTo in nodeFrom.Connections.Keys)
        {
            pathThrough.Add((nodeFrom, nodeTo), (nodeTo, nodeFrom.Connections[nodeTo]));
        }

        pathThrough.Add((nodeFrom, nodeFrom), (nodeFrom, 0));
    }

    foreach (var nodeThrough in g.Nodes)
    {
        foreach (var nodeFrom in g.Nodes)
        {
            foreach (var nodeTo in g.Nodes)
            {
                if (nodeFrom == nodeTo)
                {
                    continue;
                }

                if (!pathThrough.ContainsKey((nodeFrom, nodeThrough)) || !pathThrough.ContainsKey((nodeThrough, nodeTo)))
                {
                    continue;
                }

                if (!pathThrough.ContainsKey((nodeFrom, nodeTo)) || pathThrough[(nodeFrom, nodeTo)].distance > pathThrough[(nodeFrom, nodeThrough)].distance + pathThrough[(nodeThrough, nodeTo)].distance)
                {
                    pathThrough[(nodeFrom, nodeTo)] = (nodeThrough, pathThrough[(nodeFrom, nodeThrough)].distance + pathThrough[(nodeThrough, nodeTo)].distance);
                }
            }
        }
    }

    return pathThrough;
}

Essentially, nothing special - just using the tuples for both the intermediate node and the distance for the shortest paths.

And with a tiny bit of a change, path reconstruction:

Path ReconstructPath(Node from, Node to, Dictionary<(Node from, Node to), (Node through, int distance)> allPaths)
{
    if (!allPaths.ContainsKey((from, to)))
    {
        return null;
    }

    var path = new Path(from, to);

    var tempNode = from;

    path.Nodes.Add(tempNode);

    while (tempNode != to)
    {
        var (nextNode, distance) = allPaths[(tempNode, to)];

        path.Distance += distance;
        path.Nodes.Add(nextNode);

        tempNode = nextNode;
    }

    return path;
}

Parser monad

Few years ago I did a course project for the “Functional programming” course at Jagiellonian University. The project was to implement a program for solving systems of equations… in Haskell.

For me back then, the entire concept of monads was quite foreign and scary, so I implemented a very crude state machine to parse the equations as strings from STDIN:

parseRow :: String -> Integer -> [Integer] -> [String] -> ([Integer], [String])
parseRow [] state coefficients var_names
	| state == 7 = (reverse coefficients, reverse var_names)
	| otherwise = error ("Invalid equation (state: " ++ show state ++ "; coefficients: " ++ show coefficients ++ "; var_names: " ++ show var_names ++ ")")

parseRow (c:cs) state coefficients var_names
	| (state == 1) && (c == '-') = parseRow cs 2 ((-1) : coefficients) var_names
	| state == 1 && isNum c = parseRow cs 3 (c_int : coefficients) var_names
	| state == 1 && isAlpha c = parseRow cs 4 (1 : coefficients) ([c] : var_names)
	| state == 2 && isNum c = parseRow cs 3 (repl_k : drop 1 coefficients) var_names
	| state == 2 && isAlpha c = parseRow cs 4 coefficients ([c] : var_names)
	| (state == 3) && (c == '=') = parseRow cs 5 coefficients var_names
	| state == 3 && isAlpha c = parseRow cs 4 coefficients ([c] : var_names)
	| state == 3 && isNum c = parseRow cs 3 (new_k : drop 1 coefficients) var_names
	| (state == 4) && (c == '-') = parseRow cs 2 ((-1) : coefficients) var_names
	| (state == 4) && (c == '+') = parseRow cs 2 (1 : coefficients) var_names
	| state == 4 && isAlphaNum c = parseRow cs 4 coefficients (new_v : drop 1 var_names)
	| (state == 4) && (c == '=') = parseRow cs 5 coefficients var_names
	| (state == 5) && (c == '-') = parseRow cs 6 ((-1) : coefficients) var_names
	| state == 5 && isNum c = parseRow cs 7 (c_int : coefficients) var_names
	| state == 6 && isNum c = parseRow cs 7 (repl_k : drop 1 coefficients) var_names
	| state == 7 && isNum c = parseRow cs 7 (new_k : drop 1 coefficients) var_names
	| otherwise = parseRow cs state coefficients var_names
	where
		k = abs $ head coefficients
		k_sign = sign (head coefficients)
		c_int = atoi c
		new_k = k_sign * ((k * 10) + c_int)
		repl_k = k_sign * c_int
		v = if not (null var_names) then head var_names else []
		new_v = v ++ [c]

parseLine :: String -> ([Integer], [String])
parseLine s = parseRow s 1 [] []

However, recently I (hopefully) got a grip of monads, so I decided to modify my old project in Haskell to make it prettier.

One of the improvements I made was the parsing part. Since one of the requirements for the course project was to not use any third-party libraries, I decided to stick to this and implement a parser monad.

This proved to be an interesting task and made the entire parsing part quite neat:

equationFactor :: Parser Fraction
equationFactor = do
    _sign <- factorSign

    zeroOrMore (sat isSpace)

    factor <- fmap (fromMaybe (1%1)) (zeroOrOne rationalFactor)

    return (_sign * factor)

equationMember :: Parser (Fraction, String)
equationMember = do
    factor <- equationFactor

    zeroOrMore (sat isSpace)
    zeroOrOne (sat (== '*'))
    zeroOrMore (sat isSpace)

    nameFirst <- oneOrMore (sat isAlpha)
    nameRest <- zeroOrMore (sat isAlphaNum)

    zeroOrMore (sat isSpace)

    return (factor, nameFirst ++ nameRest)

-- An equation consists of a list of pairs (factor, variable name) and a free member
equation :: Parser ([(Fraction, String)], Fraction)
equation = do
    members <- oneOrMore equationMember

    zeroOrMore (sat isSpace)
    sat (== '=')
    zeroOrMore (sat isSpace)

    freeMember <- rationalFactor

    return (members, freeMember)

This is all possible thanks to the Parser monad. Under the cut - the implementation details.

Read more

ConfigScript

Once upon a time I thought OGRE was overly too complicated - all those unnecessary script files, custom formats, a ton of setup hassle. That was until I tried figuring out an entire modern 3D rendering stack from scratch. That’s when I realized configuring a rendering process (rendering pipeline) can be tricky. And that’s when I realized configuring application with config files can be helpful. OGRE suddenly became very appealing to me and I really began to appreciate all the work the devs have put into it.

One good aspect of OGRE was the “unnecessary” script and configuration files. But the syntax of those files looked much cleaner than that of JSON:

// This is a comment
object_keyword Example/ObjectName
{
    attribute_name "some value"

    object_keyword2 "Nested Object"
    {
        other_attribute 1 2 3
        // and so on..
    }
}

I thought if I could harvest anything from OGRE into my OpenGL application, that would be the configuration based on this format (rather than Lua or whatever scripts).

Hence I crafted this simple grammar in ANTLR4 to parse these files:

grammar ConfigScript;

config : (object | comment)* EOF ;

object
    : Identifier '{' property* '}'
    | Identifier STRING '{' (property)* '}'
    ;

property : Identifier propertyValue ;

propertyValue
    : vector
    | INT
    | FLOAT
    | BOOL
    | STRING
    | objectValue
    ;

objectValue
    : '{' property* '}'
    | STRING '{' (property)* '}'
    ;

vector
    : INT+
    | FLOAT+
    ;

comment : LINE_COMMENT | BLOCK_COMMENT ;

STRING : DOUBLE_QUOTED_STRING | SINGLE_QUOTED_STRING ;

BOOL : 'true' | 'false' ;

DOUBLE_QUOTED_STRING : '"' DoubleQuoteStringChar* '"' ;
SINGLE_QUOTED_STRING : '\'' SingleQuoteStringChar* '\'' ;

Identifier : ALPHA (ALPHA | NUM)* ;

fragment SingleQuoteStringChar : ~['\r\n] ;
    // : ~['\\\r\n]
    // | SimpleEscapeSequence ;

fragment DoubleQuoteStringChar : ~["\r\n] ;
    // : ~["\\\r\n]
    // | SimpleEscapeSequence ;

// fragment SimpleEscapeSequence : '\\' ['"?abfnrtv\\] ;

INT : '0'
    | '-'? [1-9] [0-9]*
    ;

FLOAT : ('+' | '-')? NUM+ '.' NUM+ ;

WHITESPACE : [ \r\n\t]+ -> skip ;
ALPHA : [a-zA-Z_] ;
NUM : [0-9] ;

LINE_COMMENT : '//' ~[\r\n]* -> skip ;
BLOCK_COMMENT : '/*' .*? '*/' -> skip ;

The only difference is that the object name can only be a quoted string:

// This is a comment
object_keyword "Example/ObjectName" // <--- this can not be just Example/ObjectName
{
    attribute_name "some value"

    object_keyword2 "Nested Object"
    {
        other_attribute 1 2 3
        // and so on..
        /* block comment */
    }
}

The only thing left with this parser thing is to compile it and use in a project.

Read more

OpenGL: advanced samples

Damn this blog was long coming - I picked my interest in shaders over a decade ago, started to actually learn something in early 2021 and started writing this blog at the end of 2021. Now the time has finally come to share this with the world!

For a long time I was keen in learning low-level computer graphics APIs and algorithms, but only recently actually made any progress towards that goal. I have spent few weeks if not months learning about shaders, graphics pipeline and rendering techniques, so this write-up was long forecoming. There might be some mistakes, since I am not an expert in CG and OpenGL and there is a big chunk missing, namely rendering animated 3D models, compute shaders and tesselation, but I hope to fix those in the future.

One might ask: why OpenGL? Why not Vulkan (or DirectX 12)? Why even bother with these emulaiton techniques - why not jump straight to raytracing? Well, it is the end of October 2022 at the moment of publishing this blog, the most recent GPU on the market is GeForce RTX 4090, which is yet to arrive and its RRP (recommended retailer price) is A$2959 (as advertised by nVidia):

RTX4090 price as of October 2022

And most of those cards are gone in a blink of an eye and used for mining whatever-coins. My PC runs GTX1080, which is still more than enough for most of the games I play (being Genshin Impact at most).

I think it is still not a widely spread consumer market, and whilst raytracing, DLSS and all those fine things sound super cool and nice, it is still a niche market. That is not to say I won’t be looking into that in the near future. I just think OpenGL still has place in this world, at least as a starting point to build a decent understanding of the modern graphics pipeline, some basic rendering techniques and justify switching to more recent APIs. OpenGL as well as DirectX 11 are still a very feasible fallback for those sytems which do not run top-notch hardware or are struggling running those massive open-world games in anything higher than 1080p@60fps.

In this blog I will try to condense the knowledge I have acquired. I will skip most basic parts such as initializing context, creating window and so on, since most of the tutorials and articles on the Internet focus on those topics. Unlike most tutorials and articles out there, this blog will be all about rendering techniques.

This article was heavily inspired by few blogs on russian website (Habr), namely “super-modern OpenGL” (part 1 and part 2) - they are quite messy and lack a lot of material on really interesting topics (again, rendering techniques). This blog partially builds on top of those two and uses a lot more other materials (references provided).

This blog is enormous, so blow is the table of contents for the topics covered. Feel free to jump straight to the topic that picks your interest, screenshots are provided:

The source code is available on GitHub.

Read more