Superpower 3.2.2-dev-00214

Superpower NuGet Version

A parser combinator library based on Sprache. Superpower generates friendlier error messages through its support for token-driven parsers.

Logo

What is Superpower?

The job of a parser is to take a sequence of characters as input, and produce a data structure that's easier for a program to analyze, manipulate, or transform. From this point of view, a parser is just a function from string to T - where T might be anything from a simple number, a list of fields in a data format, or the abstract syntax tree of some kind of programming language.

Just like other kinds of functions, parsers can be built by hand, from scratch. This is-or-isn't a lot of fun, depending on the complexity of the parser you need to build (and how you plan to spend your next few dozen nights and weekends).

Superpower is a library for writing parsers in a declarative style that mirrors the structure of the target grammar. Parsers built with Superpower are fast, robust, and report precise and informative errors when invalid input is encountered.

Usage

Superpower is embedded directly into your C# program, without the need for any additional tools or build-time code generation tasks.

dotnet add package Superpower

The simplest text parsers consume characters directly from the source text:

// Parse any number of capital 'A's in a row
var parseA = Character.EqualTo('A').AtLeastOnce();

The Character.EqualTo() method is a built-in parser. The AtLeastOnce() method is a combinator, that builds a more complex parser for a sequence of 'A' characters out of the simple parser for a single 'A'.

Superpower includes a library of simple parsers and combinators from which more sophisticated parsers can be built:

TextParser<string> identifier =
    from first in Character.Letter
    from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()
    select first + new string(rest);

var id = identifier.Parse("abc123");

Assert.Equal("abc123", id);

Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers that use them.

Tokenization

Along with text parsers that consume input character-by-character, Superpower supports token parsers.

A token parser consumes elements from a list of tokens. A token is a fragment of the input text, tagged with the kind of item that fragment represents - usually specified using an enum:

public enum ArithmeticExpressionToken
{
    None,
    Number,
    Plus,

A major benefit of driving parsing from tokens, instead of individual characters, is that errors can be reported in terms of tokens - unexpected identifier `frm`, expected keyword `from` - instead of the cryptic unexpected `m`.

Token-driven parsing takes place in two distinct steps:

  1. Tokenization, using a class derived from Tokenizer<TKind>, then
  2. Parsing, using a function of type TokenListParser<TKind>.
var expression = "1 * (2 + 3)";

// 1.
var tokenizer = new ArithmeticExpressionTokenizer();
var tokenList = tokenizer.Tokenize(expression);

// 2.
var parser = ArithmeticExpressionParser.Lambda; // parser built with combinators
var expressionTree = parser.Parse(tokenList);

// Use the result
var eval = expressionTree.Compile();
Console.WriteLine(eval()); // -> 5

Assembling tokenizers with TokenizerBuilder<TKind>

The job of a tokenizer is to split the input into a list of tokens - numbers, keywords, identifiers, operators - while discarding irrelevant trivia such as whitespace or comments.

Superpower provides the TokenizerBuilder<TKind> class to quickly assemble tokenizers from recognizers, text parsers that match the various kinds of tokens required by the grammar.

A simple arithmetic expression tokenizer is shown below:

var tokenizer = new TokenizerBuilder<ArithmeticExpressionToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)
    .Match(Character.EqualTo('-'), ArithmeticExpressionToken.Minus)
    .Match(Character.EqualTo('*'), ArithmeticExpressionToken.Times)
    .Match(Character.EqualTo('/'), ArithmeticExpressionToken.Divide)
    .Match(Character.EqualTo('('), ArithmeticExpressionToken.LParen)
    .Match(Character.EqualTo(')'), ArithmeticExpressionToken.RParen)
    .Match(Numerics.Natural, ArithmeticExpressionToken.Number)
    .Build();

Tokenizers constructed this way produce a list of tokens by repeatedly attempting to match recognizers against the input in top-to-bottom order.

Writing tokenizers by hand

Tokenizers can alternatively be written by hand; this can provide the most flexibility, performance, and control, at the expense of more complicated code. A handwritten arithmetic expression tokenizer is included in the test suite, and a more complete example can be found here.

Writing token list parsers

Token parsers are defined in the same manner as text parsers, using combinators to build up more sophisticated parsers out of simpler ones.

class ArithmeticExpressionParser
{
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Add =
        Token.EqualTo(ArithmeticExpressionToken.Plus).Value(ExpressionType.AddChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Subtract =
        Token.EqualTo(ArithmeticExpressionToken.Minus).Value(ExpressionType.SubtractChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Multiply =
        Token.EqualTo(ArithmeticExpressionToken.Times).Value(ExpressionType.MultiplyChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Divide = 
        Token.EqualTo(ArithmeticExpressionToken.Divide).Value(ExpressionType.Divide);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Constant =
            Token.EqualTo(ArithmeticExpressionToken.Number)
            .Apply(Numerics.IntegerInt32)
            .Select(n => (Expression)Expression.Constant(n));

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Factor =
        (from lparen in Token.EqualTo(ArithmeticExpressionToken.LParen)
            from expr in Parse.Ref(() => Expr)
            from rparen in Token.EqualTo(ArithmeticExpressionToken.RParen)
            select expr)
        .Or(Constant);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Operand =
        (from sign in Token.EqualTo(ArithmeticExpressionToken.Minus)
            from factor in Factor
            select (Expression)Expression.Negate(factor))
        .Or(Factor).Named("expression");

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Term =
        Parse.Chain(Multiply.Or(Divide), Operand, Expression.MakeBinary);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Expr =
        Parse.Chain(Add.Or(Subtract), Term, Expression.MakeBinary);

    public static readonly TokenListParser<ArithmeticExpressionToken, Expression<Func<int>>>
        Lambda = Expr.AtEnd().Select(body => Expression.Lambda<Func<int>>(body));
}

Error messages

The error scenario tests demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the tests for some examples.

ArithmeticExpressionParser.Lambda.Parse(new ArithmeticExpressionTokenizer().Tokenize("1 + * 3"));
     // -> Syntax error (line 1, column 5): unexpected operator `*`, expected expression.

To improve the error reporting for a particular token type, apply the [Token] attribute:

public enum ArithmeticExpressionToken
{
    None,

    Number,

    [Token(Category = "operator", Example = "+")]
    Plus,

Performance

Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of allocations and indirect dispatch, mean that Superpower can be quite a bit faster than Sprache.

Recent benchmark for parsing a long arithmetic expression:

Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Windows
Processor=?, ProcessorCount=8
Frequency=2533306 ticks, Resolution=394.7411 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003121

Type=ArithmeticExpressionBenchmark  Mode=Throughput  

Method Median StdDev Scaled Scaled-SD
Sprache 283.8618 µs 10.0276 µs 1.00 0.00
Superpower (Token) 81.1563 µs 2.8775 µs 0.29 0.01

Benchmarks and results are included in the repository.

Tips: if you find you need more throughput: 1) consider a hand-written tokenizer, and 2) avoid the use of LINQ comprehensions and instead use chained combinators like Then() and especially IgnoreThen() - these allocate fewer delegates (closures) during parsing.

Examples

Superpower is introduced, with a worked example, in this blog post.

Example parsers to learn from:

  • JsonParser is a complete, annotated example implementing the JSON spec with good error reporting
  • DateTimeTextParser shows how Superpower's text parsers work, parsing ISO-8601 date-times
  • IntCalc is a simple arithmetic expresion parser (1 + 2 * 3) included in the repository, demonstrating how Superpower token parsing works
  • Plotty implements an instruction set for a RISC virtual machine
  • tcalc is an example expression language that computes durations (1d / 12m)

Real-world projects built with Superpower:

  • Serilog.Expressions uses Superpower to implement an expression and templating language for structured log events
  • The query language of Seq is implemented using Superpower
  • seqcli extraction patterns use Superpower for plain-text log parsing
  • PromQL.Parser is a parser for the Prometheus Query Language

Have an example we can add to this list? Let us know.

Getting help

Please post issues to the issue tracker, or tag your question on StackOverflow with superpower.

The repository's title arose out of a talk "Parsing Text: the Programming Superpower You Need at Your Fingertips" given at DDD Brisbane 2015.

Showing the top 20 packages that depend on Superpower.

Packages Downloads
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
2,327
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
14
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
13
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
12
Serilog.Filters.Expressions
Expression-based event filtering for Serilog.
11

.NET 6.0

  • No dependencies.

.NET 8.0

  • No dependencies.

.NET Standard 2.0

Version Downloads Last updated
3.2.2-dev-00214 1 05/28/2026
3.2.1 2 05/04/2026
3.2.1-dev-00211 2 05/04/2026
3.2.1-dev-00210 2 05/04/2026
3.2.0 2 05/04/2026
3.2.0-dev-00207 2 05/04/2026
3.2.0-dev-00206 2 05/04/2026
3.2.0-dev-00204 2 05/04/2026
3.2.0-dev-00203 2 05/04/2026
3.1.1-dev-00257 8 09/17/2025
3.1.1-dev-00201 2 05/04/2026
3.1.0 10 07/10/2025
3.1.0-dev-00250 10 07/11/2025
3.1.0-dev-00247 13 02/20/2025
3.1.0-dev-00245 13 02/20/2025
3.1.0-dev-00241 13 02/20/2025
3.1.0-dev-00239 13 02/20/2025
3.0.0 13 02/20/2025
3.0.0-dev-00236 13 02/20/2025
3.0.0-dev-00233 13 02/20/2025
3.0.0-dev-00231 12 02/20/2025
3.0.0-dev-00229 13 02/20/2025
3.0.0-dev-00222 13 02/20/2025
3.0.0-dev-00221 13 02/20/2025
3.0.0-dev-00220 13 02/20/2025
3.0.0-dev-00219 14 02/20/2025
3.0.0-dev-00215 13 02/20/2025
3.0.0-dev-00213 12 02/20/2025
3.0.0-dev-00210 12 02/20/2025
3.0.0-dev-00206 12 02/20/2025
3.0.0-dev-00203 13 02/20/2025
3.0.0-dev-00201 12 02/20/2025
3.0.0-dev-00196 13 02/20/2025
3.0.0-dev-00193 12 02/20/2025
3.0.0-dev-00189 12 02/20/2025
2.3.1-dev-00185 11 02/20/2025
2.3.1-dev-00184 13 02/20/2025
2.3.0 1,968 08/06/2024
2.3.0-dev-00172 13 02/20/2025
2.2.0 12 02/20/2025
2.2.0-dev-00170 13 02/20/2025
2.2.0-dev-00169 13 02/20/2025
2.2.0-dev-00161 13 02/20/2025
2.1.1-dev-00159 13 02/20/2025
2.1.1-dev-00157 13 02/20/2025
2.1.0 13 02/20/2025
2.1.0-dev-00155 13 02/20/2025
2.1.0-dev-00135 13 02/20/2025
2.1.0-dev-00132 13 02/20/2025
2.1.0-dev-00128 13 02/20/2025
2.1.0-dev-00126 13 02/20/2025
2.1.0-dev-00124 13 02/20/2025
2.0.1-dev-00123 13 02/20/2025
2.0.1-dev-00120 13 02/20/2025
2.0.0 12 02/20/2025
2.0.0-dev-00116 13 02/20/2025
2.0.0-dev-00115 13 02/20/2025
2.0.0-dev-00114 13 02/20/2025
2.0.0-dev-00111 13 02/20/2025
2.0.0-dev-00109 13 02/20/2025
2.0.0-dev-00107 13 02/20/2025
2.0.0-dev-00103 14 02/20/2025
2.0.0-dev-00102 13 02/20/2025
2.0.0-dev-00101 13 02/20/2025
2.0.0-dev-00100 13 02/20/2025
2.0.0-dev-00093 13 02/20/2025
2.0.0-dev-00091 13 02/20/2025
2.0.0-dev-00089 13 02/20/2025
1.1.1-dev-00086 13 02/20/2025
1.1.1-dev-00081 13 02/20/2025
1.1.1-dev-00079 13 02/20/2025
1.1.0 13 02/20/2025
1.1.0-dev-00065 13 02/20/2025
1.1.0-dev-00063 13 02/20/2025
1.1.0-dev-00061 13 02/20/2025
1.1.0-dev-00059 12 02/20/2025
1.0.3-dev-00051 13 02/20/2025
1.0.2 13 02/20/2025
1.0.1 13 02/20/2025
1.0.1-dev-00044 12 02/20/2025
1.0.0 13 02/20/2025
1.0.0-dev-00038 13 02/20/2025
1.0.0-dev-00037 13 02/20/2025
1.0.0-dev-00036 12 02/20/2025
1.0.0-dev-00035 13 02/20/2025
1.0.0-dev-00034 13 02/20/2025
1.0.0-dev-00033 12 02/20/2025
1.0.0-dev-00032 12 02/20/2025
1.0.0-dev-00031 12 02/20/2025
1.0.0-dev-00030 13 02/20/2025
1.0.0-dev-00029 13 02/20/2025
1.0.0-dev-00028 13 02/20/2025
1.0.0-dev-00027 13 02/20/2025
1.0.0-dev-00026 13 02/20/2025
1.0.0-dev-00025 13 02/20/2025
1.0.0-dev-00024 13 02/20/2025
1.0.0-dev-00023 12 02/20/2025
1.0.0-dev-00022 13 02/20/2025
1.0.0-dev-00021 13 02/20/2025
1.0.0-dev-00020 12 02/20/2025
1.0.0-dev-00019 13 02/20/2025
1.0.0-dev-00018 13 02/20/2025
1.0.0-dev-00017 13 02/20/2025
1.0.0-dev-00016 13 02/20/2025
1.0.0-dev-00014 13 02/20/2025
1.0.0-dev-00013 13 02/20/2025
1.0.0-dev-00012 13 02/20/2025
1.0.0-dev-00010 13 02/20/2025
1.0.0-dev-00009 13 02/20/2025
1.0.0-dev-00008 13 02/20/2025
1.0.0-dev-00007 13 02/20/2025
1.0.0-dev-00006 13 02/20/2025
1.0.0-dev-00005 13 02/20/2025
1.0.0-dev-00004 12 02/20/2025
1.0.0-dev-00003 13 02/20/2025
1.0.0-dev-00002 12 02/20/2025