Macros In Haskell

Haskell too has macros. The subsystem is called Template Haskell. It is a Lisp-like AST generation framework that comes as a GHC extension (that is, it isn’t a part of the language as of now). This blog post is a (hopefully) simple introduction to TH, and how it can possibly be used. As usual, it is less of an instructive guide, and more like a chronicle of my own personal adventure. Moderate competence in Haskell is required.

Unlike Lisp’s macro system, TH is typed — like everything else in Haskell. You have typed constructors like FunD (function declaration) and InfixP which are used the build the AST of whatever form you wish to splice into your regular code. You can find the current TH user manual at [1].

The AST constructors are in the module Language.Haskell.TH, which looks quite overwhelming at the first glance, and, as such, writing TH expressions from scratch can get quite difficult. Thankfully, we also have a nice runQ function which allows us to write TH expressions without really grokking everything in the module. I’ll try to demonstrate this by getting TH to generate a small (read useless / toy) string matcher.

Objective

We’ll try to generate a function with the type String->Bool. The function’s job will be to check if the String it receives belongs to a list of Strings we provide at compile time. To make things a little more interesting, we’ll allow the strings we have at compile time to have a regex-like "?" in them, which can match any character. As an example, if the list of Strings is ["asdf", "a?foo", "P"], the generated function should match "asdf", "P", "amfoo" but not "afoo". We don’t address the issue of how to match a literal "?". Heck, out end product boils down to ten lines of code.

Once we’re done writing our TH code, we should have a function generateMatcher, which we shall invoke from the top level of a source file, like:

  generateMatcher "generatedF" ["asdf", "a?foo", "P"]

It will then generate the matcher, with the name generatedF and the behaviour we’re just described. It can then be called like a regular function. So a source file can look like:

  generateMatcher "generatedF" ["asdf", "a?foo", "P"]

  main = do
    str < - getLine
    putStrLn $ show $ generatedF str

Starting Off

The first thing to think about is how the compiled code should look like. Given the above problem instance, we could generate something like

    generatedF ('a':'s':'d':'f':[]) = True
    generatedF ('a':_:'f':'o':'o':[]) = True
    generatedF ('P':[]) = True
    generatedF _ = False
  

Haskell’s pattern matcher is, AFAIK, intelligent enough to convert the cases into a tree structure and quickly jump to the correct clause. Now the second question: how should the AST look like? Here we take the help of the runQ function. First we start GHCi as ghci -XTemplateHaskell (since TH is not enabled by default). Once in GHCi, we import the required module by :m + Language.Haskell.TH. The scene is set! We can now see the AST for generatedF ('a':'s':'d':'f':[]) = True. For that, we evaluate runQ [d|generatedF ('a':'s':'d':'f':[]) = True|]. The [d| ... |] is used to quote a declaration. Detailed information (about this, and generally about TH) is in [1]. We will now see some (rather incomprehensible) output:

[FunD generatedF [Clause [InfixP (LitP (CharL 'a'))
GHC.Types.: (InfixP (LitP (CharL 's')) GHC.Types.:
(InfixP (LitP (CharL 'd')) GHC.Types.:
(InfixP (LitP (CharL 'f')) GHC.Types.:
(ConP GHC.Types.[] []))))] (NormalB
(ConE GHC.Bool.True)) []]]

A little indentation, of course, goes a long way in making things clearer (when does it not? :D):

[FunD
 generatedF [Clause
             [InfixP (LitP (CharL 'a'))
              GHC.Types.:
              (InfixP (LitP (CharL 's'))
               GHC.Types.:
               (InfixP (LitP (CharL 'd'))
                GHC.Types.:
                (InfixP (LitP (CharL 'f'))
                 GHC.Types.:
                 (ConP GHC.Types.[] []))))]
             (NormalB (ConE GHC.Bool.True)) []]]
  

I usually use guesswork from this point, leaving it to Haskell’s type system to correct my incompetence. Haskell’s AST has the convention of suffixing most data constructors with a letter which hints at what type it belongs to. We can guess (and purely guess, I have not looked up the documentation) that InfixP is an infix pattern, a LitP is a literal pattern, a NormalB is a normal body, a ConE is a constructor expression, a FunD is a function declaration and so forth. The AST for the match-anything clause is simpler:

  [FunD
   generatedF [Clause [WildP]
               (NormalB (ConE GHC.Bool.False)) []]]

You can guess what a WildP is, especially since it derives from a _.

The core idea

The basic idea behind TH is that allowing the user to splice in a bit of custom-generated AST into an otherwise normal Haskell program can lead to all kinds of awesomeness. A splicable AST has to be typed as a Q Exp, a Q Type or a Q [Dec]. The last one stands for a list of declarations, which is the one we need right now. To splice in an expression as an AST, we place it inside a $( ... ). To keep things clean, top level declarations don’t need the $( ... ) nesting — anything of the type Q [Dec] at the top level of a source file is automatically interpreted correctly. Q is a Monad, which is sometimes needed to generate unique identifiers, like Lisp’s gensym. We don’t need that right now.

Getting back to the point, we need a function that takes the specified input, and outputs the AST of the code we want to generate. Since we need to generate something of the type Q [Dec], and we will be given the name of the function to generate, and a list of Strings, there is at least one line we can write:

  generateMatcher :: String -> [String] -> Q [Dec]

We already have seen the basic structure of the AST. We now write some code to generate it. We will generate only one Dec (i.e. the [Dec] in Q [Dec] will only have one element), corresponding to the one function we want to declare. The Dec will be a FunD (just like in the AST runQ showed us), with multiple Clauses, each representing one pattern (which itself corresponds to one String we were given at compile time).

One thing though — for some reason I don’t really understand, while runQ prints them, things in the GHC module (like GHC.Bool.True, GHC.Types.:) don’t really work in a generated AST. A token in TH is essentially a Name, which we can get using mkName :: String -> Name or by using the quote (') operator. So one way to get around the problem is to represent things like GHC.Bool.True as Names — this seem to work well in practice. The quote operator does not seem to work for symbols — we replace things like GHC.Bool.True with 'True and things like : with mkName ":". We also need to use mkName for the functions’ name, which needs to be a (surprise) Name.

Okay, phew! Time to see some code that actually works. With the little Haskell-foo I have, I wrote the following piece of code:

generateMatcher name strs =
  return [FunD (mkName name) (map thMagic strs ++ baseCase)]
  where
    thMagic str = Clause [transform str] (NormalB (ConE 'True)) []
    baseCase = [Clause [WildP] (NormalB (ConE 'False)) []]
    transform [] = ConP (mkName "[]") []
    transform ('?':rest) =
      let colon = mkName ":" in (InfixP WildP colon $ transform rest)
    transform (x:rest) =
      InfixP (LitP (CharL x)) (mkName ":") $ transform rest

Ten lines, as promised. I used return to lift the [Dec] into the Q monad. Other than, most of it is a direct translation of the input we get into something that looks like the output runQ dumped on me. The code is in my snippets repo, as usual. TH does not, as of now, allow you to declare an expression and splice it in the same file. You need to have, say, a FooTH.hs where you have all the functions for generating the AST and a Foo.hs where you actually use them.

TL;DR

This was a long and meandering post. In short, here are the two basic points I want to make:

  • TH is awesome. It can save a lot of time, if used properly.
  • Don’t beat yourself up trying to write ASTs from scratch, play around with runQ, steal and modify.

[1] http://www.haskell.org/ghc/docs/latest/html/users_guide/template-haskell.html

Advertisements
This entry was posted in Computers and tagged , , , . Bookmark the permalink.

6 Responses to Macros In Haskell

  1. Issac Trotts says:

    I get this error message when I try to compile it:

    ijt@curry:~/haskell/template_example$ ghc -XTemplateHaskell –make th.hs
    [1 of 1] Compiling Main ( th.hs, th.o )

    th.hs:18:1:
    GHC stage restriction: `generateMatcher’
    is used in a top-level splice or annotation,
    and must be imported, not defined locally
    In the expression:
    generateMatcher “generatedF” [“asdf”, “a?foo”, “P”]

  2. Issac Trotts says:

    Update: It compiles if you move the generateMatcher function into a separate module.

  3. Carl says:

    You would gain a lot from further exploration of quotes. That was the biggest thing I missed when I was first learning TH, and now that I’ve learned more of what they can do, I feel really ridiculous for using anything else to generate the AST in most cases.

    Most importantly, they’re actually what lisp calls quasiquotes – you can refer to the surrounding environment from inside them. Let’s take a Q Exp, for example, as they have the simplest syntax:

    let a = 5 in [| print a |]

    The names referred to in the quote are handled in different ways. “print” is in scope at the top level, and therefore the AST generated by the quote contains a reference to the Name of the top-level binding.

    “a” is in scope locally, so it is treated somewhat differently. The Lift instance is actually used to lift its value into the Exp, and the generated AST contains a representation of the value of the name. In this case, a numeric literal.

    There’s one more way you can affect the contents of a quote, and that’s with an explicit splice into it. How about…

    [| print $foo |] or [|print $(varE bar) |]

    In these cases, expressions are being spliced into a quote. This is a non-top-level splice, the opposite of actually splicing TH generated code into non-TH code. The splice is indicated with a $, as with top-level splices. The parens are optional, unless you are splicing in an expression that’s more than a single token. Spliced-in values need to be in Q, and of the correct type for where they’re being spliced (Q Exp, Q [Decl], etc).

    Together, the various ways of using quotes mean you rarely need to directly generate the AST. This is nice because the AST is rather verbose to work with, after all.

  4. laughedelic says:

    Hello, Carl.
    Thanks for short explanation o quotation syntax. I’ve spent last night trying to make simple macros, using quotes, but failed, because I don’t understand something important about allowed scope for using in qoutes.

    Can you rewrite for example generateMatcher only with quotes?

  5. Carl says:

    No, it can’t be done only with quotes, because there’s no syntax for splicing an arbitrary name into a function declaration.

    Lots of little pieces of generateMatcher can be done with quotes, but that’s not hugely helpful, as the structure of the code you’re generating depends on the input.

    Slightly more helpful is that you could use ‘[] and ‘(:) instead of using mkName for those.

    But it’s relatively uncommon to build things so dynamically structured with TH. TH is really not very much like lisp macros. It’s usually used to create instances mechanically or create type-safe functions when they otherwise wouldn’t be type-safe. See http://hackage.haskell.org/package/Printf-TH for an example of the latter.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s