# Parsing

# Arithmetic with Instaparse

### 2014-03-15

Last week I started playing around with the Instaparse
library, which makes writing parsers really easy in Clojure. What
Instaparse does is take the formal definition of a language and give
you a parser for that language. You can then use that parser on a
string, possibly containing something in that language. You need to
write the *grammar* in one of two forms: Extended Backus-Naur
Form(EBNF) or Augmented Backus-Naur form(ABNF). You can read about
these grammars in Matt Might’s excellent blog post:
The language of languages. I’ve picked EBNF as my grammar of
choice for now, admittedly without having studied the nuances of
each. Instaparse does let you mix up EBNF and ABNF in your grammar but
you might end up unnecessarily complecting things for yourself.

If the string contains a syntactically correct string, the parser gives you a tree. Instaparse also provides you with the tools to transform that tree into something else, say a Clojure form, so that it can get evaluated. To get familiar with the Backus-Naur form(and its descendants, EBNF and ABNF), I recommend going through Matt’s blog post linked above.

In this post, we’ll write a grammar for parsing arithmetic
expressions, so that we can turn something like `"11*1-1/5"`

into:

`(clojure.core/- (clojure.core/* 11 1) (clojure.core// 1 5))`

Notice that the parser has spit out the Clojure form with the correct precedence, that is multiplicative operators get evaluated before additive operators.

## The grammar

Here is the grammar that I came up with. It handles the four basic
arithmetic operators: `+`

, `-`

, `*`

and `/`

. `+`

and `-`

have the same
precedence, as do `*`

and `/`

. The multiplicative operators(* and /)
have higher precedence than the additive operators(+ and -).

```
(def arithmetic
(insta/parser
"<expr> = additive-expr | multiplicative-expr | number
additive-expr = expr (('+' | '-' ) expr)
multiplicative-expr = number ( ('*' | '/') (multiplicative-expr|number) )
number = #'-?[0-9]+'"))
```

Let us now take a closer look at the grammar. Starting from the
bottom-up, a `number`

is any integer. A `multiplicative-expr`

is any
number multiplied or divided by another `multiplicative-expr`

or a
number. Notice that a `multiplicative-expr`

cannot contain an
`additive-expr`

.

Next, an `additive-expr`

is any `expr`

added or subtracted with any
other `expr`

. `expr`

is defined in the first line to be one of
`additive-expr`

, `multiplicative-expr`

or `number`

. The angled
brackets around `expr`

means that the `:expr`

tag will be hidden in
the parse tree Instaparse produces.

The definitions of additive and multiplicative expressions implicitly specify that multiplicative operators have higher precedence.

## Trying out the parser

Let us now try out the parser to see if it actually works:

First, let’s try the simplest possible operation:

```
user> (arithmetic "42")
([:number "42"])
```

A bit more compound expression:

```
user> (arithmetic "2+2")
([:additive-expr [:number "2"] "+" [:number "2"]])
```

And now a slightly ridiculous expression:

```
user> (clojure.pprint/pprint (arithmetic "2+2-2/2*2-1/1+1"))
([:additive-expr
[:number "2"]
"+"
[:additive-expr
[:number "2"]
"-"
[:additive-expr
[:multiplicative-expr
[:number "2"]
"/"
[:multiplicative-expr [:number "2"] "*" [:number "2"]]]
"-"
[:additive-expr
[:multiplicative-expr [:number "1"] "/" [:number "1"]]
"+"
[:number "1"]]]]])
```

## Transforming the parse tree

Like I mentioned above, we really want to have the arithmetic expression parsed into a Clojure form. To do this we will specify what transformations we want:

```
(def parse-tree->sexp
{:additive-expr (fn [operand operator & rest]
(if (= operator "+")
`(+ ~operand ~@rest)
`(- ~operand ~@rest)))
:multiplicative-expr (fn [operand operator & rest]
(if (= operator "*")
`(* ~operand ~@rest)
`(/ ~operand ~@rest)))
:number read-string})
```

And finally to get the Clojure form we can run the `insta/transform`

functions like this:

```
user> (insta/transform parse-tree->sexp (arithmetic "2+2*2-2"))
((clojure.core/+ 2 (clojure.core/- (clojure.core/* 2 2) 2)))
```