Parsing
In this case study, we show how to implement a parser, using the lexer from the Lexer case study.
Again, we require some imports – in particular, we reuse the lexer implementation.
module examples/casestudies/parser
import examples/casestudies/lexer
import string
Parsers can be expressed by using the lexer effect and process the token stream. To model different alternatives in the grammar, we use the following effect for non-determinism:
interface Nondet {
def alt(): Bool
def fail[A](msg: String): A
}
effect Parser = { Nondet, Lexer }
Parser Combinators
Given these two effects, we can readily define a host of (imperative) parser combinators. We start by the simplest one, which applies a predicate to the next element in the input stream and fails, if it does not match.
def accept { p: Token => Bool } : Token / Parser = {
val got = do next();
if (p(got)) got
else do fail("Unexpected token " ++ got.show)
}
Using accept
, we can define parsers for the different token types.
def any() = accept { t => true }
def accept(exp: TokenKind) = accept { t => t.kind == exp }
def ident() = accept(Ident()).text
def number() = accept(Number()).text
def punct(p: String) = {
val tok = accept(Punct())
if (tok.text == p) ()
else do fail("Expected " ++ p ++ " but got " ++ tok.text)
}
def kw(exp: String): Unit / Parser = {
val got = ident();
if (got == exp) ()
else do fail("Expected keyword " ++ exp ++ " but got " ++ got)
}
Using the effect for non-deterministic choice alt
, we can model alternatives, optional matches and various repetitions:
def or[R] { p: => R } { q: => R } =
if (do alt()) { p() } else { q() }
def opt[R] { p: => R }: Option[R] / Parser =
or { Some(p()) } { None() }
def many { p: => Unit }: Unit / Parser =
or { some { p() } } { () }
def some { p: => Unit }: Unit / Parser =
{ p(); many { p() } }
Example: A Simple Expression Language
We illustrate the usage of the above parser combinators by parsing a simple expressions language.
type Tree {
Lit(value: Int)
Var(name: String)
Let(name: String, binding: Tree, body: Tree)
App(name: String, arg: Tree)
}
Let us start by defining the parser for numeric literals.
def parseNum(): Tree / Parser = {
val numText = number();
with default[WrongFormat, Tree] { do fail("Expected number, but cannot convert input to integer: " ++ numText) };
Lit(numText.toInt)
}
We simply call the parser for number()
and try to convert the
resulting string to an intenger.
def parseVar(): Tree / Parser =
Var(ident())
def parseAtom() = or { parseVar() } { parseNum() }
Parsing variables is simply a matter of reusing the ident
parser and changing the
result. Users of monadic parser combinator libraries might be delighted to see, that we
do not need to use map
or something similar to transform the result. The parser is
simply written in direct style, still offering great flexibility in modifying the
semantics of effects.
Similarly, we can write parsers for let bindings, by sequentially composing our existing parsers:
def parseLet(): Tree / Parser = {
kw("let");
val name = ident();
punct("=");
val binding = parseExpr();
kw("in");
val body = parseExpr();
Let(name, binding, body)
}
Again, note how naturally the result can be composed from the individual results, much like
manually writing a recursive descent parser. Compared to handcrafted parsers, the imperative
parser combinators presented here offer a similar flexibility. At the same time, the semantics
of alt
and fail
is still left open, offering flexibility in the implementation of the actual underlying parsing algorithm.
We proceed to implement the remaining parsers for our expression language:
def parseGroup() = or { parseAtom() } {
punct("(");
val res = parseExpr();
punct(")");
res
}
def parseApp(): Tree / Parser = {
val funName = ident();
punct("(");
val arg = parseExpr();
punct(")");
App(funName, arg)
}
def parseExpr(): Tree / Parser =
or { parseLet() } { or { parseApp() } { parseGroup() } }
Example: Combining Parsers and Local Mutable State
It is possible to combine the imperative parser combinators with local mutable state. The implementation of local variables in Effekt is designed to interact well with continuation capture and multiple resumptions. This is important for use cases like the parser example where the continuation is potentially called multiple times.
The following example implements an example
// <EXPR> ::= <NUMBER> | <IDENT> `(` <EXPR> (`,` <EXPR>)* `)`
It uses local (mutable) variables to count the number of leafs as semantic action.
def parseCalls(): Int / Parser =
or { number(); 1 } {
var count = 1;
ident();
punct("(");
count = count + parseCalls();
many {
punct(",");
count = count + parseCalls()
};
punct(")");
count
}
Notice how the user defined combinator many
feels like a built-in control operator
while
.
Backtracking Parsers
As mentioned above, so far we used the non-determinism effects, but did not specify what they mean. We could implement depth-first parsing corresponding to recursive descent. Alternatively, we also could implement breadth-first parsing. Further, in case of ambiguities, we have the choice whether we want to recognize only the first result or compute all possible alternative ways to recognize the input.
For this case study, we implement a depth-first parser, computing the first successful result. Results are represented by the following datatype:
type ParseResult[R] {
Success(t: R);
Failure(msg: String)
}
The parsing algorithm is simply implemented as a handler for Parser
.
def parse[R](input: String) { p: => R / Parser }: ParseResult[R] = try {
lexer(input) { skipWhitespace { Success(p()) } }
} with Nondet {
def alt() = resume(true) match {
case Failure(msg) => resume(false)
case Success(res) => Success(res)
}
def fail[A](msg) = Failure(msg)
} with LexerError { (msg, pos) =>
Failure(msg)
}
The handler reuses the lexer implementation of the Lexer case study. The lexer
raises a LexerError
in case of an unexpected enf of the input stream or if it cannot
recognize a token. Those lexer errors are simply converted into failures, which the
parser can backtrack. To establish the backtracking behavior, it is important that the
lexer is executed under the parser handler and not the other way around. Only this way
the lexer positions will be restored when calling the continuation a second time with resume(false)
.
Running the Examples
Having implemented a handler for the Parser
effect, we can run our example “grammars” on some inputs.
def main() = {
inspect(parse("42") { parseCalls() })
inspect(parse("foo(1)") { parseCalls() })
inspect(parse("foo(1, 2)") { parseCalls() })
inspect(parse("foo(1, 2, 3, 4)") { parseCalls() })
inspect(parse("foo(1, 2, bar(4, 5))") { parseCalls() })
inspect(parse("foo(1, 2,\nbar(4, 5))") { parseCalls() })
inspect(parse("}42") { parseExpr() })
inspect(parse("42") { parseExpr() })
inspect(parse("let x = 4 in 42") { parseExpr() })
inspect(parse("let x = let y = 2 in 1 in 42") { parseExpr() })
inspect(parse("let x = (let y = 2 in 1) in 42") { parseExpr() })
inspect(parse("let x = (let y = f(42) in 1) in 42") { parseExpr() })
inspect(parse("let x = (let y = f(let z = 1 in z) in 1) in 42") { parseExpr() })
}