IO impact

At MongoDB I work on a Relational Migrator project - a tool which helps people migrate their relational database to MongoDB. And recently we grew interested in the performance of our tool. Due to the nature of the migrations, they are usually extremely long (potentially even never ending, for some scenarios). It is a rather valuable information to know where we can speed things up.

Hence we ran a profiler on a relatively big database of 1M rows. And this was what we saw:

The handleBatch method is where the meat and potatoes of our migration logic reside. It lasts for approx. 6.5 sec. We could have debated on which parts of this flame graph we could optimize (and we actually did), but we first decided to take a quick look at the same graph from the higher level - not the CPU time (when CPU is actually doing the active work) but the total time:

The entire application run took 4,937 sec (1hr 22min 17sec). Of which, the migration itself took only 130 sec:

The biggest chunk of it was writing to MongoDB database at 120 sec:

The actual migration logic is really just 3.5 sec:

So out of 130 sec of the actual migration run, the actual logic took 3.5 sec or mere 2.69%. The rest was just IO (input/output). Which we also saw on the thread timeline:

Most time all the threads spent sleeping.

This is not new information, just a reminder that the slowest part of pretty much any application is input-output.