Samrat Man Singh

Email: mail@samrat.me

Hi! I’m Samrat, a programmer from Nepal.

Piping output to a pager

2019-10-28

When you run git diff on a project where you’ve made more than a screenful of changes, the output is automatically sent to a pager, and you can browse through your changes using the familiar less interface. This post shows a minimal example of piping a program’s output to less.

To achieve this, git creates a pipe and spawns a new process whose sole purpose is to exec the pager.

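#define READ_END 0  /* fds[0] is the read end of the pipe */
#define WRITE_END 1 /* fds[1] is the write end */

int fds[2];
if (pipe(fds) == -1) {
    perror_die("pipe"); /* helper that reports the error and exits (not shown) */
}
pid_t child_pid = fork();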

Let’s start with the parent process, even though the code handles this case last. The parent’s STDOUT is aliased to the write end of the pipe, so anything the parent prints from then on goes to fds[WRITE_END]. Once we’re done, we close the write end of the pipe to signal EOF to the child process; the pager only sees EOF once every descriptor referring to the write end has been closed.

We wait for the child process to exit; otherwise the parent would exit right away and control would return to the shell while the pager is still running.

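switch (child_pid) {
// ...
default: { /* Parent */
    /* STDOUT is now fds[WRITE_END] */
    dup2(fds[WRITE_END], STDOUT_FILENO);
    /* The STDOUT copy is enough; close the original so EOF can reach the pager */
    close(fds[WRITE_END]);
    /* parent doesn't read from pipe */
    close(fds[READ_END]);

    /* "Business" logic which determines what to actually print */
    int num_lines = 1024;
    if (argc > 1) {
        num_lines = atoi(argv[1]);
    }
    print_n_lines(num_lines);

    fflush(stdout);

    /* Signal EOF to the pager process (last open write end) */
    close(STDOUT_FILENO);

    /* Wait for the pager to exit before returning to the shell */
    int stat_loc;
    waitpid(child_pid, &stat_loc, 0);
    break;
}
} // switch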
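To see the EOF rule in isolation, here is a small sketch that reads a pipe the way a pager reads its stdin. count_until_eof is a name made up for this example. read() keeps returning data until every descriptor referring to the write end, including duplicates made with dup or dup2, has been closed; only then does it return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read fd until read() reports end of file (0)
 * and return the number of bytes seen, or -1 if a read fails.
 * If any copy of the pipe's write end is still open, read() blocks
 * here instead of returning 0. */
long count_until_eof(int fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += (long)n;
    return n < 0 ? -1 : total;
}
```

For example, if the writer duplicates fds[1] with dup(), closing only fds[1] is not enough: the reader keeps waiting until the duplicate is closed as well.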

In the child process, STDIN is aliased to the read end of the pipe, so anything the parent prints becomes input to the pager. After setting up STDIN, the child execs less.

switch (child_pid) {
case 0: { /* Child (pager) */
    /* Pager process doesn't write to pipe */
    close(fds[WRITE_END]);

    /* Make READ_END of pipe pager's STDIN */
    dup2(fds[READ_END], STDIN_FILENO);
    close(fds[READ_END]);

    /* F -> quit if the whole input fits on one screen */
    /* R -> pass ANSI color escape sequences through (preserve colors) */
    /* X -> don't send termcap init/deinit strings, e.g. clearing the screen */
    char *less_argv[] = {"less", "-FRX", NULL};
    int exec_status = execvp(less_argv[0], less_argv);

    /* execvp only returns if it failed */
    fprintf(stderr,
            "execvp failed with status: %d and errno: %d\n", exec_status, errno);
    exit(EXIT_FAILURE);
}

// ...
} // switch

The full example can be found in this gist. You can also check out the Ruby and Rust implementations of this.
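The snippets above are fragments of a larger program. Below is a self-contained sketch of the same pipe/fork/dup2/execvp sequence wrapped in a function that takes the command to run instead of hard-coding less; run_through_command is a name made up for this example, not part of the post’s code. One deliberate difference: the parent writes into the pipe with write() rather than redirecting its own STDOUT, so the function can be called more than once from the same process.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define READ_END 0
#define WRITE_END 1

/* Hypothetical helper: run argv (e.g. {"less", "-FRX", NULL}) with its
 * stdin connected to a pipe, write text into the pipe, close it, and wait.
 * Returns the command's exit status, or -1 on error. SIGPIPE is not
 * handled: if the command exits before reading everything, the default
 * action of that signal terminates the caller. */
int run_through_command(char *const argv[], const char *text)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[READ_END]);
        close(fds[WRITE_END]);
        return -1;
    }

    if (pid == 0) {                          /* child: becomes the command */
        close(fds[WRITE_END]);               /* child never writes to the pipe */
        dup2(fds[READ_END], STDIN_FILENO);   /* read end becomes stdin */
        close(fds[READ_END]);                /* stdin is now the only copy */
        execvp(argv[0], argv);
        perror("execvp");                    /* reached only if exec failed */
        _exit(127);
    }

    close(fds[READ_END]);                    /* parent never reads from the pipe */
    const char *p = text;
    size_t left = strlen(text);
    while (left > 0) {                       /* write() may be partial for big inputs */
        ssize_t w = write(fds[WRITE_END], p, left);
        if (w <= 0)
            break;
        p += w;
        left -= (size_t)w;
    }
    close(fds[WRITE_END]);                   /* last write end closed: child sees EOF */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing {"less", "-FRX", NULL} and the text to page should behave much like the post’s example; any command that reads its stdin can be substituted, which also makes the plumbing easy to try out with something like grep.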

Myers' diff algorithm in Clojure

2019-10-21

If you’re a programmer, you’re probably comfortable reading the diff patches that git and other version control systems produce. In this post we’ll walk through a Clojure implementation of Myers’ diff algorithm, which is the default diff algorithm in Git.

I have been implementing Git by following along with James Coglan’s book “Building Git”; you can check out the project here. The project itself is in Rust, but I thought it might be an interesting exercise to implement the algorithm in Clojure.

Edit graph

Say we want to turn a file with the lines A, B, C into one with the lines A, C, E. Myers’ diff algorithm finds the shortest edit script that does so. The algorithm works on an edit graph as shown above: a rightward move from one vertex to the next deletes a line from the old file, and a downward move inserts a line from the new file.

Also notice that in our example ABC -> ACE, two lines are the same in both files (A and C). Myers’ algorithm leverages this to produce the shortest edit script. In our edit graph above, dotted lines mark the diagonal moves available where lines are equal; a diagonal move requires neither an insertion nor a deletion. Myers’ algorithm maximizes the number of diagonal moves in the edit script it produces.[1]

Let’s walk through the output Myers’ algorithm produces for our example above:

The first line is the same in both files, so we take the first diagonal down. Then we delete B. The third line of the old file (C) is the same as the second line of the new file, so we keep it with another diagonal move. Then we insert E. This series of edits turns the old file into the new one, so we stop.

Finally, before we dive into the algorithm, there is one more addition to the edit graph. We draw diagonal lines running down and to the right across the graph, called k-lines. Each k-line has the equation k = x - y.
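As a worked example, here is the path from the ABC -> ACE walkthrough, with the k-line each point lands on. A move right adds one to x and therefore to k, a move down adds one to y and so subtracts one from k, and a diagonal adds one to both, leaving k unchanged. The path ends on k = 3 - 3 = 0, the line through the bottom-right corner.

```latex
\begin{aligned}
(0,0) &\xrightarrow{\text{keep A}} (1,1), & k &= 1 - 1 = 0 \\
(1,1) &\xrightarrow{\text{delete B}} (2,1), & k &= 2 - 1 = 1 \\
(2,1) &\xrightarrow{\text{keep C}} (3,2), & k &= 3 - 2 = 1 \\
(3,2) &\xrightarrow{\text{insert E}} (3,3), & k &= 3 - 3 = 0
\end{aligned}
```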

Implementation

The maximum number of operations (insertions or deletions) the edit script can have is len(old) + len(new); in that case we would delete every line in the old file and insert every line from the new file.

For each d from 0 up to this maximum number of operations, we find the farthest point we can reach along each k-line using d operations. In the reduce, the accumulator is a tuple of v and trace.

v maps each k to the farthest x we have found so far along that k-line. We don’t need to store the y values, since they are easy to compute (y = x - k).

trace is a vector of the v maps we’ve seen so far, holding a snapshot of v from the start of each round d.

The function ultimately returns trace, which we will use to reconstruct the path we took through the graph.

(defn shortest-edit [a b]
  (let [max-d (+ (count a) (count b))
        [_ trace] (reduce (fn [[v trace] d]
                            (let [[found-end? farthest-xs] (get-farthest-xs-for-d v a b d)
                                  new-v (merge v farthest-xs)
                                  new-trace (conj trace v)]
                              (if found-end?
                                (reduced [new-v new-trace])
                                [new-v new-trace])))
                          [{1 0} []]
                          (range 0 (inc max-d)))]
    trace))
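As a cross-check, here is a C sketch of the same forward search, reduced to returning the number of operations d at which the bottom-right corner is first reached. It is my own translation rather than the post’s code, and shortest_edit_length is a made-up name. The map v becomes an int array indexed by k + off, where off is an offset that keeps negative k in bounds. Unlike the Clojure version, it updates v in place during a round; that should be safe because round d only reads lines k - 1 and k + 1, which have the opposite parity to d and so still hold their values from round d - 1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: length D of the shortest edit script turning
 * a[0..n) into b[0..m), or -1 if allocation fails.
 * v[k + off] holds the farthest x reached on k-line k (y = x - k). */
int shortest_edit_length(const char **a, int n, const char **b, int m)
{
    int max = n + m;
    int off = max + 1;                      /* k - 1 and k + 1 stay in [0, 2*max + 2] */
    int *v = calloc((size_t)(2 * max + 3), sizeof *v);
    if (v == NULL)
        return -1;
    /* calloc zeroes everything, so v[1 + off] == 0 like the {1 0} seed */

    for (int d = 0; d <= max; d++) {
        for (int k = -d; k <= d; k += 2) {
            int x;
            if (k == -d || (k != d && v[k - 1 + off] < v[k + 1 + off]))
                x = v[k + 1 + off];         /* move down from k+1 */
            else
                x = v[k - 1 + off] + 1;     /* move right from k-1 */
            int y = x - k;
            while (x < n && y < m && strcmp(a[x], b[y]) == 0) {
                x++;                        /* follow matching lines diagonally */
                y++;
            }
            v[k + off] = x;
            if (x >= n && y >= m) {         /* reached the bottom-right corner */
                free(v);
                return d;
            }
        }
    }
    free(v);
    return max;                             /* not reached: d = max always suffices */
}
```

For the ABCABBA and CBABAC strings used in the REPL session below, this returns 5; for ABC -> ACE it returns 2.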

To reach line k, we can either move down from line k+1 or move right from line k-1. The move-down? function decides which:

(defn move-down?
  [v d k]
  (or (= k (- d))
      (and (not= k d)
           (< (get v (dec k))
              (get v (inc k))))))
• If k=-d, it is not possible to go right from k-1, so go down from k+1.

• If k=d, it is not possible to go down from k+1, so go right from k-1.

• Otherwise, go down if v[k-1] < v[k+1].

The final condition says: if we have got further along line k+1 than along line k-1, we move down from k+1 to reach line k; otherwise we move right from k-1.
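The same decision as a small C function, in case it helps to see it outside Clojure. This is a hypothetical translation: instead of the map v, it takes the two values (get v (dec k)) and (get v (inc k)) directly, as v_km1 and v_kp1.

```c
#include <stdbool.h>

/* Hypothetical C version of move-down?. v_km1 and v_kp1 are the farthest
 * x values on lines k-1 and k+1 from the previous round. */
bool move_down(int d, int k, int v_km1, int v_kp1)
{
    if (k == -d)
        return true;          /* line k-1 was out of reach: go down from k+1 */
    if (k == d)
        return false;         /* line k+1 was out of reach: go right from k-1 */
    return v_km1 < v_kp1;     /* further along on k+1, so move down from there */
}
```

Because the comparison is strict, a tie sends us right from k-1, as in the Clojure version.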

Next, get-farthest-xy-along-k* figures out the farthest (x, y) we can reach along line k in round d, given the v map we’ve built so far; a and b contain the lines that we want to diff. It uses move-down? as described above to pick the starting x, then walks down any diagonals (matching lines) it can.

(defn get-farthest-xy-along-k*
  [v d a b k]
  (let [x (if (move-down? v d k)
            (get v (inc k))
            (inc (get v (dec k))))
        y (- x k)
        ;; walk down diagonals
        [x y] (loop [[x y] [x y]]
                (if (and (< x (count a))
                         (< y (count b))
                         (= (get a x) (get b y)))
                  (recur [(inc x) (inc y)])
                  [x y]))]
    [x y]))
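A C sketch of get-farthest-xy-along-k*, assuming v is an int array indexed by k + off (off being an offset that keeps negative k in bounds); farthest_xy_along_k and its parameter names are illustrative. Since C has no tuples, the point comes back through two output pointers.

```c
#include <string.h>

/* Hypothetical C version of get-farthest-xy-along-k*: picks the starting
 * x with the move-down? rule, then follows matching lines diagonally.
 * On return *out_x and *out_y hold the farthest point on line k. */
void farthest_xy_along_k(const int *v, int off, int d, int k,
                         const char **a, int n, const char **b, int m,
                         int *out_x, int *out_y)
{
    int x;
    if (k == -d || (k != d && v[k - 1 + off] < v[k + 1 + off]))
        x = v[k + 1 + off];       /* down from k+1: x unchanged, y grows */
    else
        x = v[k - 1 + off] + 1;   /* right from k-1: x grows by one */
    int y = x - k;
    while (x < n && y < m && strcmp(a[x], b[y]) == 0) {
        x++;                      /* matching lines: free diagonal move */
        y++;
    }
    *out_x = x;
    *out_y = y;
}
```

For ABC -> ACE, round 0 on line 0 stops at (1, 1) after keeping A, and round 1 on line 1 reaches (3, 2) after keeping C.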

get-farthest-xs-for-d iterates k from -d to d (both inclusive) in steps of 2,[2] storing the farthest x for each k and returning early if we reach the bottom-right corner of the edit graph.

(defn get-farthest-xs-for-d
  [v a b d]
  (let [get-farthest-xy-along-k (partial get-farthest-xy-along-k* v d a b)]
    (reduce (fn [[found-end? k->x] k]
              (let [[x y] (get-farthest-xy-along-k k)]
                (if (and (>= x (count a))
                         (>= y (count b)))
                  (reduced [true k->x])
                  [found-end? (assoc k->x k x)])))
            [false {}]
            (range (- d) (inc d) 2))))
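In C, one round might look like the sketch below; farthest_xs_for_d is an illustrative name. It reads the previous round’s values from prev and writes into next, which the caller is assumed to have initialised as a copy of prev; that copy plays the role of the merge in shortest-edit. As in the Clojure code, it returns as soon as a point at or beyond (n, m) is found, without storing that last x.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical C version of get-farthest-xs-for-d. prev holds the
 * previous round's farthest x per line, next receives this round's;
 * both are indexed as arr[k + off]. Returns true once a point at or
 * past (n, m) is reached. */
bool farthest_xs_for_d(const int *prev, int *next, int off, int d,
                       const char **a, int n, const char **b, int m)
{
    for (int k = -d; k <= d; k += 2) {      /* only lines with the same parity as d */
        int x;
        if (k == -d || (k != d && prev[k - 1 + off] < prev[k + 1 + off]))
            x = prev[k + 1 + off];          /* move down from k+1 */
        else
            x = prev[k - 1 + off] + 1;      /* move right from k-1 */
        int y = x - k;
        while (x < n && y < m && strcmp(a[x], b[y]) == 0) {
            x++;                            /* follow matching lines diagonally */
            y++;
        }
        if (x >= n && y >= m)
            return true;
        next[k + off] = x;
    }
    return false;
}
```

Driving it for ABC -> ACE, rounds 0 and 1 return false and round 2 returns true, matching a shortest edit script of two operations.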

So far, we’ve built up functions that give us a trace:

myers.core> (shortest-edit (clojure.string/split "ABCABBA" #"")
                           (clojure.string/split "CBABAC" #""))
[{1 0}
 {1 0, 0 0}
 {1 1, 0 0, -1 0}
 {1 1, 0 2, -1 0, -2 2, 2 3}
 {1 5, 0 2, -1 4, -2 2, 2 3, -3 3, 3 5}
 {1 5, 0 5, -1 4, -2 4, 2 7, -3 3, 3 5, -4 3, 4 7}]
myers.core>
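To reproduce this trace outside the REPL, here is a C sketch that records a copy of v at the start of each round, just as (conj trace v) does. Trace, shortest_edit_trace and trace_free are names made up for this example. One difference from the Clojure maps: keys that were never set read as 0 here instead of being absent.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C counterpart of the trace: rounds[d] is a copy of v
 * taken at the start of round d, indexed as rounds[d][k + off]. */
typedef struct {
    int **rounds;
    int count;
    int off;
} Trace;

/* Forward search that snapshots v before each round.
 * Allocation failures are not checked, to keep the sketch short. */
Trace shortest_edit_trace(const char **a, int n, const char **b, int m)
{
    int max = n + m;
    int off = max + 1;
    size_t width = (size_t)(2 * max + 3);
    Trace t = { calloc((size_t)max + 1, sizeof(int *)), 0, off };
    int *v = calloc(width, sizeof *v);   /* all zeros, so v[1 + off] == 0 like {1 0} */

    for (int d = 0; d <= max; d++) {
        t.rounds[t.count] = malloc(width * sizeof *v);
        memcpy(t.rounds[t.count], v, width * sizeof *v);
        t.count++;
        for (int k = -d; k <= d; k += 2) {
            int x = (k == -d || (k != d && v[k - 1 + off] < v[k + 1 + off]))
                        ? v[k + 1 + off]         /* down from k+1 */
                        : v[k - 1 + off] + 1;    /* right from k-1 */
            int y = x - k;
            while (x < n && y < m && strcmp(a[x], b[y]) == 0) {
                x++;
                y++;
            }
            if (x >= n && y >= m) {              /* corner reached: trace is complete */
                free(v);
                return t;
            }
            v[k + off] = x;
        }
    }
    free(v);
    return t;
}

void trace_free(Trace *t)
{
    for (int i = 0; i < t->count; i++)
        free(t->rounds[i]);
    free(t->rounds);
}
```

For the strings above, t.count comes out as 6, and t.rounds[3][2 + t.off] and t.rounds[5][2 + t.off] are 3 and 7, matching the {2 3} and {2 7} entries in the printed trace.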

Now, let’s look at how to turn this into a diff.

Backtracking and producing the diff

This next function takes a trace as input and outputs an edit sequence. Each element of the edit sequence has the form [prev-x prev-y x y].

This time we start from the bottom-right corner of the edit graph and work backwards through the trace. In each iteration, we figure out the previous (x, y), walking back up diagonals where needed.

(defn trace->edit-seq
  [n m trace]
  (let [[_ edit-seq]
        (reduce (fn [[[x y] edit-seq] [d v]]
                  (let [k (- x y)
                        prev-k (if (move-down? v d k)
                                 (inc k)
                                 (dec k))
                        prev-x (get v prev-k)
                        prev-y (- prev-x prev-k)
                        ;; walk back up any diagonals
                        [[x y] edit-seq] (loop [[x y] [x y]
                                                edit-seq edit-seq]
                                           (if (and (> x prev-x) (> y prev-y))
                                             (recur [(dec x) (dec y)]
                                                    (conj edit-seq
                                                          [(dec x) (dec y) x y]))
                                             [[x y] edit-seq]))
                        ;; the single insertion or deletion made in round d
                        edit-seq (if (pos? d)
                                   (conj edit-seq [prev-x prev-y x y])
                                   edit-seq)]
                    [[prev-x prev-y] edit-seq]))
                [[n m] []]
                (reverse (map vector (range) trace)))]
    (reverse edit-seq)))
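A C sketch of the same backtracking step. To keep it self-contained it repeats a compact version of the forward search and stores the trace as one flat array, where row d is v at the start of round d. myers_edit_seq is a made-up name, and this is a translation sketch rather than the post’s code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: edit sequence for a[0..n) -> b[0..m). Each edit is
 * written to out as {prev_x, prev_y, x, y}, in order from the top-left
 * corner; out needs room for n + m edits. Returns the edit count.
 * Allocation failures are not checked, to keep the sketch short. */
int myers_edit_seq(const char **a, int n, const char **b, int m, int (*out)[4])
{
    int max = n + m, off = max + 1, width = 2 * max + 3, rounds = 0;
    int *trace = calloc((size_t)(max + 1) * (size_t)width, sizeof *trace);
    int *v = calloc((size_t)width, sizeof *v);

    /* Forward search, saving v at the start of every round. */
    for (int d = 0, done = 0; d <= max && !done; d++) {
        memcpy(trace + (size_t)d * width, v, (size_t)width * sizeof *v);
        rounds = d + 1;
        for (int k = -d; k <= d && !done; k += 2) {
            int x = (k == -d || (k != d && v[k - 1 + off] < v[k + 1 + off]))
                        ? v[k + 1 + off] : v[k - 1 + off] + 1;
            int y = x - k;
            while (x < n && y < m && strcmp(a[x], b[y]) == 0) {
                x++;
                y++;
            }
            v[k + off] = x;
            done = (x >= n && y >= m);
        }
    }

    /* Backtrack from (n, m), collecting edits in reverse order. */
    int count = 0, x = n, y = m;
    for (int d = rounds - 1; d >= 0; d--) {
        const int *vd = trace + (size_t)d * width;
        int k = x - y;
        int prev_k = (k == -d || (k != d && vd[k - 1 + off] < vd[k + 1 + off]))
                         ? k + 1 : k - 1;
        int prev_x = vd[prev_k + off], prev_y = prev_x - prev_k;
        while (x > prev_x && y > prev_y) {   /* walk back up a diagonal */
            int e[4] = {x - 1, y - 1, x, y};
            memcpy(out[count++], e, sizeof e);
            x--;
            y--;
        }
        if (d > 0) {                         /* the one insertion or deletion of round d */
            int e[4] = {prev_x, prev_y, x, y};
            memcpy(out[count++], e, sizeof e);
        }
        x = prev_x;
        y = prev_y;
    }
    free(trace);
    free(v);

    /* Reverse so the edits run from the top-left corner, like (reverse edit-seq). */
    for (int i = 0, j = count - 1; i < j; i++, j--) {
        int tmp[4];
        memcpy(tmp, out[i], sizeof tmp);
        memcpy(out[i], out[j], sizeof tmp);
        memcpy(out[j], tmp, sizeof tmp);
    }
    return count;
}
```

For ABC -> ACE it produces {0,0,1,1}, {1,1,2,1}, {2,1,3,2} and {3,2,3,3}: keep A, delete B, keep C, insert E, which matches the walkthrough at the start of the post.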

Once we have an edit sequence, printing the diff is pretty straightforward:

(defn diff
  [a b]
  (for [[prev-x prev-y x y] (trace->edit-seq (count a)
                                             (count b)
                                             (shortest-edit a b))]
    (let [a-line (get a prev-x)
          b-line (get b prev-y)]
      (cond
        (= x prev-x) {:op :ins
                      :content b-line}
        (= y prev-y) {:op :del
                      :content a-line}
        :else {:op :eql
               :content a-line}))))

(defn print-diff
  [diff]
  (doseq [{:keys [op content]} diff]
    (let [d-out (case op
                  :ins (format "+%s" content)
                  :del (format "-%s" content)
                  :eql (format " %s" content))]
      (println d-out))))
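A C sketch of diff and print-diff combined; Op, classify_edit and print_diff are illustrative names. It uses the same two tests as the cond in diff to classify each edit, then prints the line with a +, - or space prefix.

```c
#include <stdio.h>

/* Hypothetical names for this sketch. */
typedef enum { OP_INS, OP_DEL, OP_EQL } Op;

/* Same tests as the cond in diff: x unchanged means a downward move
 * (insertion), y unchanged means a rightward move (deletion), and
 * anything else is a diagonal (an unchanged line). */
Op classify_edit(const int e[4])
{
    if (e[2] == e[0])
        return OP_INS;
    if (e[3] == e[1])
        return OP_DEL;
    return OP_EQL;
}

/* Like print-diff: one line per edit, prefixed with '+', '-' or ' '.
 * Each edit is {prev_x, prev_y, x, y}; a and b hold the old and new lines. */
void print_diff(FILE *out, const char **a, const char **b, int (*edits)[4], int count)
{
    for (int i = 0; i < count; i++) {
        const int *e = edits[i];
        switch (classify_edit(e)) {
        case OP_INS: fprintf(out, "+%s\n", b[e[1]]); break;
        case OP_DEL: fprintf(out, "-%s\n", a[e[0]]); break;
        case OP_EQL: fprintf(out, " %s\n", a[e[0]]); break;
        }
    }
}
```

Given the four ABC -> ACE edits {0,0,1,1}, {1,1,2,1}, {2,1,3,2} and {3,2,3,3}, it prints " A", "-B", " C" and "+E" on separate lines.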

1. which is also why finding the shortest edit script and finding the longest common subsequence are equivalent problems.

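2. if d is even, you can only reach even k-lines, and if d is odd, only odd ones: each insertion or deletion changes k by exactly one, and diagonal moves don’t change it.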


2019-09-30

• The Little Elixir & OTP Guidebook by Benjamin Tan Wei Hao: an introduction to the Elixir programming language and the OTP framework that comes with it. It focusses on Elixir’s concurrency features and shows you what OTP can do (mostly GenServers and Supervisors) without spending too much time on the language itself. The idea is to get someone already familiar with functional programming excited about the features that set Elixir apart.

The book is slightly outdated and has a couple of typos, but I found it a worthwhile read: it’s relatively quick to work through, and it builds up a good mental model of what OTP behaviours offer and when to use them. Of note is the chapter showing how you might implement a Supervisor using a GenServer.

• Atomic Habits by James Clear: on how to cultivate good habits and cull bad ones, and why you should focus on building good habits in the first place. I’m not sure how valuable this was for me personally. The book was easy enough to read, and it says sensible things; the problem is that I never find myself actively applying what I learn from books like these.

• Exhalation by Ted Chiang: a collection of short sci-fi stories. I liked it just as much as the author’s first book, Stories of Your Life and Others. I don’t know if there is a name for the flavor of science fiction that Chiang writes, but he really nails it. I especially liked that the book includes Chiang’s notes on what he was going for when constructing each story.

• Orca by Steven Brust: book 7 of the Vlad Taltos series. The series seems to get better with every book. This one is still fantasy and not a financial thriller, but at times it tries to be. I also loved that big parts of the book are told from the perspective of Kiera, Vlad’s friend and a renowned thief.

[spoiler] Vlad agrees to help a woman whose land is being foreclosed on, in exchange for her help healing Savn. This leads to an investigation into the death of Fyres, a shady Orca businessman. Vlad soon realizes that the situation involves a much larger scam, with big banks covering up a murder for fear of going under if the truth comes to light. [/spoiler]

• Dragon by Steven Brust: the next book in the Vlad Taltos series. I found some stretches of the book a bit of a chore to read, even though it is still a short book, though that might be because I took a longish break partway through.

[spoiler] Vlad enlists in Morrolan’s army for war against Fornia, a Dragon who’s stolen something from Morrolan. [/spoiler]

2019-06-30

• Peak by Anders Ericsson and Robert Pool: a book on deliberate practice by one of the researchers who originally coined the term. The book lays out a framework for achieving expertise. Briefly: identify the skills and mental representations an expert possesses, design a step-by-step program to build up those skills, get feedback on what you’re doing wrong, and focus on your weaknesses as you practice. The book emphasizes mental representations as essential to expertise: as chess players study games for many, many hours, they learn to see not just individual pieces but larger patterns in any configuration. Deliberate practice is about building similar mental representations in any skill you want to master.

Overall, I think this book is a worthwhile read. You’ve probably encountered the idea of deliberate practice elsewhere by now, and the authors lay out exactly what it entails. The book can be a bit dry at times: it tries to stay firmly grounded in facts, which means citing one study after another. But I found the description of how Benjamin Franklin designed a program to practice his writing really cool.

• Backstabbing for Beginners by Michael Soussan: an insider’s account of the United Nations operation overseeing the sanctions imposed on Saddam Hussein’s Iraq. The book is very open about the many failings (including incompetence, naivete and downright corruption) of the Oil-for-Food program and the people involved, and about how Saddam Hussein benefitted massively by exploiting these failings, to the detriment of the people of Iraq.

I found the book really well-written, and highly recommend it.

• Phoenix by Steven Brust: This is book 5 of the Vlad Taltos series. I’ve reviewed the earlier ones, and I definitely stand by my recommendation for this series: these books are short and entertaining. I’m reading them in order of publication; in terms of chronological order, this book follows book 3 (Teckla).

• Athyra by Steven Brust: Book 6 of the Vlad Taltos series. This one’s written in a different voice and centers on a different character, although the series’ protagonist, Vlad, still plays a significant role. Still really fun.

2019-03-17

• Messy by Tim Harford: how pursuing tidy systems is harming us, and why we should embrace messy systems that better reflect the unpredictability of the real world. The systems in question can be anything from how you organize your inbox to how cities are laid out. Although it struck me while reading this book how similar most non-fiction books tend to be, I did enjoy it and found the arguments quite compelling.

• Learning to Climb Indoors by Eric Horst: a guide to learning and improving for the beginning climber. The author seems like a knowledgeable person and the book covers everything from how to approach different kinds of handholds to managing mental roadblocks like fear to optimum training schedules.

I also found this to be a really well-written book. It goes beyond the technical and talks about how to stay motivated in the long run, and even how climbing holds many life lessons.

• Jhereg by Steven Brust: It had been a while since I’d read any fantasy, and this was a great book to come back to the genre with. It is part of a series called Vlad Taltos, named after its main character, an assassin.

Unlike most other fantasy books I’ve read (not that many), this one doesn’t take itself too seriously. I thoroughly enjoyed it, and I plan on picking up other books in the series soon.

[spoiler] Someone has stolen from the Jhereg organization, and Vlad is hired to assassinate him. The thief is taking refuge in Morrolan’s castle, where he knows Jhereg dare not attack. [/spoiler]

• Show Your Work! by Austin Kleon: makes the case that creativity is a social process and that you should share your work online to build an audience. I got excited about this book after reading Derek Sivers’s review of it, but I didn’t find it as insightful as I had hoped.

• Yendi by Steven Brust: the second book in the Vlad Taltos series mentioned above. I enjoyed this book as well, although the structure of the plot felt a bit similar to the first book’s.

[spoiler] Vlad gets into a turf war, gets killed by Cawti, gets revivified, then falls in love with Cawti. The turf war turns out to be part of a plan to install Norathor as Dragon heir, displacing Aliera. [/spoiler]

• A Mind For Numbers by Barbara Oakley: strategies to learn better. The book also has ample explanations, based on research on the human mind, of why the techniques it describes work. I’d already encountered many of the ideas in Make It Stick (e.g. practicing recall through mini-tests, spaced repetition), but it was still useful to remind myself that I don’t have any excuses not to put these techniques to use.

• Teckla by Steven Brust: next book in the Vlad Taltos series. I felt like this book hinted at a much deeper big-picture narrative than the first two books did. I continue to be excited about this series.

[spoiler] Vlad’s wife Cawti has joined a revolution against the Dragaeran oppression of Easterners and Teckla. [/spoiler]

• The Defining Decade by Meg Jay: targeted at twenty-somethings. The book tries to push readers into taking ownership of their transition into adulthood and being more intentional with their lives.

• Taltos by Steven Brust: book 4 of the Vlad Taltos series. I really enjoyed this one.

[spoiler] Vlad and Morrolan travel to the Paths of the Dead to revive Aliera. [/spoiler]