Samrat Man Singh

Email: mail@samrat.me

Hi! I’m Samrat, a programmer from Nepal.

Summary of recent reading(July-September 2019)

2019-09-30


The book is slightly outdated and has a couple of typos but I found it a worthwhile read– it’s relatively quick to work through and the book tries to build up a good mental model of what OTP behaviours offer and when to use them. Of note is the chapter showing how you might implement a Supervisor using a GenServer.

[spoiler] Vlad enlists in Morrolan’s army for war against Fornia, a Dragon who’s stolen something from Morrolan. [/spoiler]


Summary of recent reading(April-June 2019)

2019-06-30



Summary of recent reading(January-March 2019)

2019-03-17



Book reviews- Alone on the wall, The Push, Can't Hurt Me

2018-12-31


Alone on the Wall by Alex Honnold and David Roberts

This is a memoir of sorts by a rock climber most famous for his free solo ascents(free soloing = rock climbing without a rope). The reason I picked up this book is because I’ve been doing some climbing myself and while the type of climbing I’ve done– mostly bouldering, mostly indoors– is very different from what Honnold does, it’s still very interesting to read about the perspective and experiences of someone who performs at such a high level.

Especially interesting was his perspective on risk versus consequences. Climbing a big wall without a rope has obviously very grave consequences if he falls, but he figures that most of the time the risk of his actually falling is pretty low.

The Push by Tommy Caldwell

Another rock climbing book. A big theme in this book is getting over adversity and pushing past perceived limits. Caldwell has had quite an interesting life and there are bits of biographical details in the book alongside his rock climbing exploits. The biographical bits are actually quite interesting and they tie in with what fed his sense of purpose while doing some of his climbs.

Can’t Hurt Me by David Goggins

This one is not about rock-climbing! Instead, it’s by a former Navy SEAL turned endurance athlete– Goggins has participated in several ultra-marathons. This is somewhat thematically similar to “The Push” in that it is also about pushing past your perceived limits, although unlike “The Push”, this one is a bit prescriptive and borders on the “self-help” genre at times.

Still, I think the core message of the book– that you should constantly be pushing past your comfort zone– is a valuable message.


Disk-based algorithms: External merge-sort

2017-11-13


I have lately been studying how databases work under the hood. There are many problems that a database has to solve. At the moment, I’ve been having fun exploring how data is stored on disk. Previously, I wrote about how a hashtable is stored on disk. This post is somewhat similar– I talk about how data stored on disk can be sorted.

Databases frequently have to sort data of massive sizes, sizes big enough that they don’t fit in memory at once. Sorting algorithms that can sort data residing in disk are called external sorting algorithms. External merge-sort is one such algorithm.

External merge-sort

Just like merge-sort, external merge-sort is a divide-and-conquer algorithm– it recursively sorts smaller subarrays, and then merges those subarrays. Because disk I/O happens in blocks, however, the smallest subarray is the size of a disk page. So for a file with n pages, the algorithm proceeds as follows.

First, all pages are sorted internally. To achieve this, we bring in however many pages can fit into the program’s buffers(remember, the entire data doesn’t fit in memory), sort each page and write them back. This step is usually referred to as pass 0.

Next, in pass 1, we sort runs of length 1. A run is a collection of sorted pages, so that means that we take pairs of pages and merge them to get $n/2$ runs of length 2. Now, the contents of pages 0 and 1 are sorted; as are 2 and 3, and so on. In pass 2, we do the same thing, but with runs of length 2. And so on, until finally we have two subarrays(runs of length $n/2$ pages). Merging those gives us the final sorted result. We start with $n$ runs and halve at each pass, so there will be $log_2(n)$ passes.

All this may seem very familiar to you if you’re already acquainted with merge-sort. The difference here is that is some extra work to be done when merging. Take, for example, the last pass where we do the final merge between two runs of $n/2$ pages. Our original constraint was that n pages don’t fit in memory, so 2 runs of $n/2$ pages certainly won’t either. To solve this, the pages in the two input runs are streamed in.

First we bring in the first pages from each of the runs. As with “regular” merge-sort, we iterate through both, picking off whichever has the smaller item. Eventually, one will run out. At that point, bring in another page from that run into the buffer, and continue. Our output is also a file, so we need to do the same on the output side– once the output buffer is filled, write it to the output file, and start with an empty buffer.

You can check out my Rust implementation of this algorithm on Github.

Multi-way merging

What I’ve described above is called two-way merging since at any given step we merge two runs. However, it can be advantageous to merge $k$ runs at once. This would result in $log_k(n)$ number of passes. Since we write out to disk in each pass, the efficiency gain this reduction in the number of passes gives us can be significant.

You can read more about the ways to do k-way merging on Wikipedia.