Lindsey Kuper (lindseykuper) wrote,
Lindsey Kuper

"A pattern matcher for miniKanren, or, how to get into trouble with CPS macros"

The abstract for the paper I mentioned working on a while ago is now in the preliminary program for Scheme and Functional Programming 2009. Actually, we've changed the abstract slightly since we submitted the version that appears there, so here's the version that (I think) is going to appear in the final, published workshop proceedings:

CPS macros written using Scheme's syntax-rules macro system allow for guaranteed composition of macros and control over the order of macro expansion. We identify a limitation of CPS macros when used to generate bindings from a non-unique list of user-specified identifiers. Implementing a pattern matcher for the miniKanren relational programming language revealed this limitation. Identifiers come from the pattern, and repetition indicates that the same variable binding should be used. Using a CPS macro, binding is delayed until after the comparisons are performed. This may cause free identifiers that are symbolically equal to be conflated, even when they are introduced by different parts of the source program. After expansion, this leaves some identifiers unbound that should be bound. In our first solution, we use syntax-case with bound-identifier=? to correctly compare the delayed bindings. Our second solution uses eager binding with syntax-rules. This requires abandoning the CPS approach when discovering new identifiers.

I'd like to take a stab at explaining what we did in slightly less technical language, but it'll require some background. So, here goes: In the Scheme programming language, macros are programs that generate programs. Macros can generate other macros, too. Scheme programs may be run by a Scheme interpreter (which is itself a program). If an interpreter has to run a Scheme program that makes use of macros, one of the first things it has to do is expand all the macros, which themselves may expand into other macros, until no more expansion can be done. This phase of execution is called, unsurprisingly, macro expansion.

In some situations, the programmer might like to be able to control the order in which macro expansion takes place (which macro to expand first, which to expand second, and so on). For instance, a particular order of expansion might produce code that runs faster than some other order of expansion would have. Or maybe expansion itself can proceed more efficiently if it's done in a particular order. So, in Scheme, programmers can use a technique called continuation-passing style (CPS) in their macros to control the order of macro expansion. (CPS is interesting for other reasons, but I won't discuss them here.)

Scheme programmers can use macros to add new features to Scheme, and even to build new languages. Our paper documents a discovery we made while we were working on adding a new feature to the miniKanren programming language. miniKanren is itself a language that's built using Scheme macros that were written using Scheme's syntax-rules macro system, and the feature we were adding to miniKanren was pattern matching. For our miniKanren pattern matcher, which we were also implementing with syntax-rules, we wanted to delay the generation of variable bindings on the fly based on certain conditions. We could do this by using CPS to delay the expansion of the macro that generated the bindings until other macro expansions had taken place. By doing this, our macros could generate more concise code. Most of the time, it worked well, but we were able to construct a situation in which the delayed expansion caused incorrect code to be generated. The bug had the flavor of the hygiene problem, in which variables in user programs can be inappropriately "shadowed" or "captured" by variables in macros because they happen to have the same name. However, in our case, no inappropriate variable capture was taking place; rather, the problem was that variables that should have been bound in the generated code could be left unbound in certain situations.

After demonstrating and explaining the variable-binding problem in the paper, we present two workarounds. One is to knuckle under and use the more powerful syntax-case macro system instead of syntax-rules. The syntax-case version is conceptually clean and generates efficient code. However, since syntax-case is a really, really big hammer, and one of the guiding principles of miniKanren is, well, minimalism, we also present a syntax-rules solution that doesn't use (as much) CPS and therefore doesn't delay the generation of variable bindings. The generated code is rather sloppier and less efficient, but it does not suffer from the variable-binding problem.

With the publication of this paper, my Erdős number makes a precipitous drop from infinity to 4 (Paul Erdős -> Paul T. Bateman -> Eugene E. Kohlbecker -> Daniel P. Friedman -> Lindsey Kuper). The other authors had surprised me by adding my name to the author list after the paper was accepted to the workshop (I had previously only been in the acknowledgments). At first, it seemed to me as though I was getting a publication almost for free; I had contributed an idea, but hadn't done any of the writing. However, we found some errors in the already-accepted paper that necessitated rewriting significant chunks of it, and I spent a lot of time on that. So, now that I really have done a lot of the writing, I actually feel that my name belongs on the thing!

Tags: grad school, research

  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

  • 1 comment