Samrat Man Singh


Hi! I’m Samrat, a programmer from Nepal.


WebAssembly branching instructions, by example


I have been familiarizing myself with WebAssembly by writing small WAT files and looking at the WASM output for some simple C functions. WebAssembly Studio is a great tool for doing this kind of exploration.

Branching in WebAssembly takes some getting used to– for one thing, the target of your branch instructions are relative to the current scope. That is, the target of the br 1 refers to the scope one level up from the current scope and the same target will be referred to as different indices depending on where the branch is being called from.

(block ;; <- this is the target block for the br below
  (block ;; <- if you wanted to refer to this, you would use `0`.
    br 1

The other thing is that the branching behaviour is a bit different depending on whether the target is a block or a loop. If the target is a block, control exits from the block. While if the target is a loop, the loop continues.

I thought I’d jot down some notes on my understanding of how branching works in WASM along with some examples.


The branch instructions take an index– this index specifies the number of levels outward to go from the current scope:

    local.get 0 ;; get 0th local, and put into stack
    i32.eqz ;; pop stack and check if it is zero, put result(boolean) into stack
    br_if 0 ;; branch out of 0th `block` if top item in stack is true
    i32.const 42
    local.set 1)

As mentioned in the comment in the code block, if the 0th local is zero, control reaches the instruction after the block.

Let’s look at a function involving nested blocks. Suppose we want to encode the following logic in WASM:

if (x == 0)
  return 42;
else if (x == 1)
  return 99;
  return 7;

One way to achieve this would be with the following function:

(func $foo (param i32) (result i32)
       (local i32)
                   ;; x == 0
                   local.get 0
                   br_if 0

                   ;; x == 1
                   local.get 0
                   i32.const 1
                   br_if 1

                   ;; the `else` case
                   i32.const 7
                   local.set 1
                   br 2)
             i32.const 42
             local.set 1
             br 1)
         i32.const 99
         local.set 1)
       local.get 1)

If you actually compile the C or Rust equivalent of the code above, this is not the WASM you’ll get but this serves nicely for our purposes.

Let’s start with the “else” case first. We set the local to 7 and branch out of the block two levels up. This is the outermost block, which makes the next instruction the local.get 1. So we get the local we just set and return.

In the x == 0 case, we branch out from the innermost block, and the next three instructions are:

i32.const 42
local.set 1
br 1

Again we set the local to 42. And the index we pass to br is 1. Remember though that we branched out from a block, so branching out with index 1 means we are branching out of the outermost block again.


A branch instruction referring to a loop behaves a bit differently– if the index passed to a branch instruction is a loop, control passes to the beginning of the loop instead of exiting as happens in a block.

Here is an example showing the iterative factorial algorithm:

;; int result = 1;
;; while (i > 0) {
;;   result = result * i;
;;   i = i - 1;
;; }
(func $iterFact (param i64) (result i64)
       (local i64)
       i64.const 1
       local.set 1
           local.get 0
           br_if 0
            local.get 1
            local.get 0
            local.set 1
            local.get 0
            i64.const -1
            local.tee 0
            br_if 1
            br 0))
       local.get 1)

Let’s focus on just the branching statements.

The first br_if, outside the loop exits the block if the argument passed is 0.

The br_if inside the loop also exits the block(and hence also the loop), although note that this time the same block is referred to as index 1 instead of 0.

The final br 0(the penultimate line in the code block above) branches to the beginning of the loop, thus continuing the loop.

Summary of recent reading(October-December 2019)


On porting code


Of late, my main side project has been rug, a stripped-down Git implementation which I’m building in Rust. I’m following along James Coglan’s excellent book Building Git where he lays out how he went about building the same project, in Ruby.

rug showing the (git) status
for it's own code repository

In essence, my project boils down to porting James’s code1 into Rust. But why bother with this at all? After all, porting code from one language to another is probably not on your top ten list of cool project ideas. Building something new will always be exciting. But I do think that at least as far as personal projects are concerned, porting a piece of code gets a bad rap. They can be a great learning experience and while it may not match the thrill of starting something afresh, can be plenty exciting to work on.

Working to re-write a piece of software in another language is a great way to dive deep into how the project works. With Git, you learn how the files, directories and commits in a repo are stored as objects, what happens as you are staging each hunk of your work and how diffing works. And so much more. To be sure, you could read an article or watch a video explaining how any of these things work but grappling with actual code to implement a functionality will give you a far superior understanding of things.

Besides the technical nuances of the software’s functionality, porting code will also expose you to how the project itself is architected. What are the entities the project deals with and what is the high-level structure of the project? What layers are there and why is it layered this way and not some other way? These are problems you will face when working on your own personal and professional projects. Choosing a well-designed system to study and port can give you some new perspectives on how to think about such things.

Translating a project from one language to another can also be a great opportunity to get acquainted with a new language. I had done some Rust before working on rug but this project has been large enough that I’ve been able to go deeper into both the language and the ecosystem. For example, the last two days I’ve been working on replacing the ad-hoc CLI argument parsing code with a wonderful Rust library called clap. With Rust, there are plenty of new concepts, and sometimes it just takes some practice using it to get used to the language.

Hopefully, I’ve convinced you that choosing to port a piece of software as your next programming side-project is at least worth considering. As I said earlier, building something entirely new has it’s thrills. But diving into a codebase and understanding it deeply enough to be able to port it to another language can also be immensely satisfying.

  1. Which itself is a port(sorta) of the original Git codebase, but this is not really relevant for this post.


Piping output to a pager


When you run git diff on a project where you’ve made more than a screenful of changes, the output is automatically redirected to a pager and you can browse through your changes using the familiar less interface. This post shows a minimal example piping output to less.

To achieve this, git creates a pipe and spawns a new process, whose purpose is to exec the pager.

int fds[2];
if (pipe(fds) == -1) {
int child_pid = fork();

Let’s start by looking at the parent process, even though in the code this case is handled last. The STDOUT of the parent process is aliased to the write end of the pipe, so any further printing in the parent process writes stuff to fds[WRITE_END]. Once we’re done, we close the write end of the pipe to signal EOF to the child process.

We wait for the child process to exit since otherwise control returns to the shell.

switch (child_pid)
// ...
  default: { /* Parent */
    /* STDOUT is now fds[WRITE_END] */
    dup2(fds[WRITE_END], STDOUT_FILENO);
    /* parent doesn't read from pipe */

    /* "Business" logic which determines what to actually print */
    int num_lines = 1024;
    if (argc > 1) {
      num_lines = atoi(argv[1]);


    /* Signal EOF to the pager process */

    int stat_loc;
    waitpid(child_pid, &stat_loc, 0);

The STDIN of the new process is aliased to the read end of the pipe. Anything being printed in the parent process is thus now an input to the pager process. After setting up STDIN, this process runs less.

switch (child_pid) {
case 0: {    /* Child(pager) */
    /* Pager process doesn't write to pipe */

    /* Make READ_END of pipe pager's STDIN */
    dup2(fds[READ_END], STDIN_FILENO);

    /* F -> quit-if-one-screen */
    /* R -> preserve color formatting */
    /* X -> don't send some special instructions eg. to clear terminal screen before starting */
    char *less_argv[] = {"less", "-FRX", NULL};
    int exec_status = execvp(less_argv[0], less_argv);

            "execvp failed with status: %d and errno: %d\n", exec_status, errno);
// ...
} // switch

The full example can be found in this gist. You can also check out the Ruby and Rust implementations of this.

Myers' diff algorithm in Clojure


If you’re a programmer, you’re probably fairly comfortable with understanding and interpreting diff patches of the kind that git and other version control systems use. In this post we’ll walk through a Clojure implementation of the Myers’ diff algorithm, which is the algorithm used by Git.

I have been implementing Git following along James Coglan’s book “Building Git”– you can check out the project here. The project itself is in Rust but I thought it might be an interesting exercise to implement the algorithm in Clojure.

Edit graph

Say we want to turn a file with the lines A,B,C into A,C,E. The Myers’ diff algorithm finds out the shortest edit script that does so. The algorithm works on an edit graph as shown above– a rightward traversal from one vertex to another indicates deletion of a line from the old file; a downward traversal indicates insertion of a line from the new file.

Also notice that in our example ABC -> ACE there are two lines which are the same(A & C). Myers’ algorithm leverages this to produce the shortest edit script. In our edit graph above, you can see that where lines are equal we have drawn dotted lines to indicate diagonal traversals which require neither insertion nor deletion. The Myers’ algorithm maximizes the number of diagonal traversals in the edit script it produces1.

Let’s walk through the output Myers’ algorithm produces for our example above:

The first character is the same in both files so take the first diagonal down. Then delete B. The third character in the old file(C) is the same as the second in the new file so keep it and do a diagonal traversal again. Then insert E. This series of edits on the old file has produced our desired new file, and so we stop.

Finally, before we dive into the algorithm, we have one more addition to the edit graph. We will draw downward-diagonals across the graph, also called k-lines. Each k-line has the equation k = x - y.


The maximum number of operations(insertions or deletions) the edit script can have is len(old) + len(new)– in this case we would be deleting every line in the old file and inserting every line from the new file.

We iterate through 0 to maximum number of operations and see what the farthest point along each k-line we can go. In the reduce, we can see that our accumulator is a tuple with v and trace.

v maps each k to the farthest x we have found so far along that k-line. We can discard the y values since we can easily compute them(y = x - k).

trace is a vector of the vs we’ve seen so far.

We can see that the function ultimately returns trace. This will be used to reconstruct the path that we took along the graph.

(defn shortest-edit [a b]
  (let [max (+ (count a) (count b))
        [_ trace] (reduce (fn [[v trace] d]
                            (let [[found-end? farthest-xs] (get-farthest-xs-for-d v a b d)
                                  [new-v new-trace] [(merge v farthest-xs)
                                                     (conj trace v)]]
                              (if found-end?
                                (reduced [new-v new-trace])
                                [new-v new-trace])))
                          [{1 0} []]
                          (range 0 (inc max)))]

To get to a line k, we can either go down from k+1 or go right from line k-1. We use the move-down? function to decide whether to move down or right:

(defn move-down?
  [v d k]
  (or (= k (- d))
      (and (not= k d)
           (< (get v (dec k))
              (get v (inc k))))))

Next, in get-farthest-xy-along-k* we figure out what the farthest (x,y) we can reach in this round d along the k line given the v map we’ve built so far. a and b contain the lines that we want to diff. This simply uses move-down?as described above; then if possible walks down any diagonals.

(defn get-farthest-xy-along-k*
  [v d a b k]
  (let [x (if (move-down? v d k)
            (get v (inc k))
            (inc (get v (dec k))))
        y (- x k)
        ;; walk down diagonals
        [x y] (loop [[x y] [x y]]
                (if (and (< x (count a))
                         (< y (count b))
                         (= (get a x) (get b y)))
                  (recur [(inc x) (inc y)])
                  [x y]))]
    [x y]))

get-farthest-xs-for-d iterates from -d to d(both inclusive), 2 steps at a time2 and stores the farthest x for each k, returning early if we reach the bottom-right corner in our edit graph.

(defn get-farthest-xs-for-d
  [v a b d]
  (let [get-farthest-xy-along-k (partial get-farthest-xy-along-k* v d a b)]
    (reduce (fn [[found-end? k->x] k]
              (let [[x y] (get-farthest-xy-along-k k)]
                (if (and (>= x (count a))
                         (>= y (count b)))
                  (reduced [true k->x])
                  [found-end? (assoc k->x k x)])))
            [false {}]
            (range (- d) (inc d) 2))))

So far, we’ve built up functions that give us a trace:

myers.core> (shortest-edit (clojure.string/split "ABCABBA" #"")
                  (clojure.string/split "CBABAC" #""))
[{1 0}
 {1 0, 0 0}
 {1 1, 0 0, -1 0}
 {1 1, 0 2, -1 0, -2 2, 2 3}
 {1 5, 0 2, -1 4, -2 2, 2 3, -3 3, 3 5}
 {1 5, 0 5, -1 4, -2 4, 2 7, -3 3, 3 5, -4 3, 4 7}]

Now, let’s look at how to turn this into a diff.

Backtracking and producing the diff

This next function will take an input a trace and output an edit sequence. Each element in the edit-sequence is of the form [prev-x prev-y x y].

This time we start from the bottom-right node of the edit graph. In each iteration, we figure out the previous (x,y)– also walking back up diagonals if required.

(defn trace->edit-seq
  [n m trace]
  (let [[_ edit-seq]
        (reduce (fn [[[x y] edit-seq] [d v]]
                  (let [k (- x y)
                        prev-k (if (move-down? v d k)
                                 (inc k)
                                 (dec k))
                        prev-x (get v prev-k)
                        prev-y (- prev-x prev-k)
                        [[x y] edit-seq] (loop [[x y] [x y]
                                                edit-seq edit-seq]
                                           (if (and (> x prev-x) (> y prev-y))
                                             (recur [(dec x) (dec y)]
                                                    (conj edit-seq
                                                          [(dec x) (dec y) x y]))
                                             [[x y] edit-seq]))
                        edit-seq (if (pos? d)
                                   (conj edit-seq [prev-x prev-y x y])
                    [[prev-x prev-y] edit-seq]))
                [[n m] []]
                (reverse (map vector (range) trace)))]
    (reverse edit-seq)))

Once we have an edit sequence, printing the diff is pretty straightforward:

(defn diff
  [a b]
  (for [[prev-x prev-y x y] (trace->edit-seq (count a)
                                             (count b)
                                             (shortest-edit a b))]
    (let [a-line (get a prev-x)
          b-line (get b prev-y)]
        (= x prev-x) {:op :ins
                      :content b-line}
        (= y prev-y) {:op :del
                      :content a-line}
        :else {:op :eql
               :content a-line}))))

(defn print-diff
  (doseq [{:keys [op content]} diff]
    (let [d-out (case op
                  :ins (format "+%s" content)
                  :del (format "-%s" content)
                  :eql (format " %s" content))]
      (println d-out))))

  1. which is also why finding the shortest edit script and finding the longest common subsequence are equivalent problems.

  2. if d is even, you can only reach an even k-line, and vice-versa.