Home
`combine` is a parser combinator library. Let's explain that in two steps.
A parser is a thing that turns some input (for example a `&str`) into some output (for example `(i32, Vec<i32>)`) by applying an algorithm.
"Combinator" refers to a particular way of defining the exact algorithm for a parser. If you write a parser from scratch, you usually end up with a big state machine and lots of slice handling. In contrast, a parser combinator library builds the final parser by combining small building blocks. This is what it looks like:
use combine::parser::range::{range, take_while1};
use combine::parser::repeat::sep_by;
use combine::parser::Parser;
// Depending on the combine version (4.x), `easy_parse` may also require `use combine::EasyParser;`
let input = "Hammer, Saw, Drill";
// a chain of alphabetic characters
let tool = take_while1(|c: char| c.is_alphabetic());
// many `tool`s, separated by ", "
let mut tools = sep_by(tool, range(", "));
// `easy_parse` returns (output, remaining input); keep only the output
let output: Vec<&str> = tools.easy_parse(input).unwrap().0;
// vec!["Hammer", "Saw", "Drill"]
Listing A-1 - 'Hello combine' example
`take_while1`, `range` and `sep_by` are building blocks from the `combine` library. `tool` and `tools` are self-made building blocks. The latter is also the final parser.
Note: From now on, I will no longer use the term 'building block', but instead call them 'parsers'. Parsers that have nested parsers are 'combinators'.
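To make the parser/combinator distinction concrete, here is a minimal sketch that uses only the standard library. It is not how `combine` is implemented: the names `word` and `separated` are made up for this illustration, and a "parser" is simplified to a function returning `Option<(output, remaining input)>`.

```rust
/// A parser: consumes one or more alphabetic characters.
/// Returns the matched text and the remaining input, or `None` on failure.
fn word(input: &str) -> Option<(&str, &str)> {
    let end = input
        .char_indices()
        .find(|(_, c)| !c.is_alphabetic())
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}

/// A combinator: takes an item parser and a separator, returns a new parser
/// that collects zero or more items separated by `sep`.
fn separated<'a, P>(
    item: P,
    sep: &'static str,
) -> impl Fn(&'a str) -> Option<(Vec<&'a str>, &'a str)>
where
    P: Fn(&'a str) -> Option<(&'a str, &'a str)>,
{
    move |input| {
        let mut items = Vec::new();
        let mut rest = input;
        // The first item is optional: an empty list is a valid result.
        match item(rest) {
            Some((value, remaining)) => {
                items.push(value);
                rest = remaining;
            }
            None => return Some((items, rest)),
        }
        // Each further item must be preceded by the separator.
        while let Some(after_sep) = rest.strip_prefix(sep) {
            match item(after_sep) {
                Some((value, remaining)) => {
                    items.push(value);
                    rest = remaining;
                }
                // Leave a trailing separator unconsumed.
                None => break,
            }
        }
        Some((items, rest))
    }
}

fn main() {
    // `word` is a parser, `separated(word, ", ")` is a combinator applied to it.
    let tools = separated(word, ", ");
    let (output, rest) = tools("Hammer, Saw, Drill").unwrap();
    assert_eq!(output, vec!["Hammer", "Saw", "Drill"]);
    assert_eq!(rest, "");
}
```

The nesting is the point: `separated` knows nothing about tools or letters, it only knows how to repeat whatever parser it is given.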
Learn `combine` with the not so quick Quickstart Tutorial.
Every parser in every language needs roughly these four things to work:
- The data to parse or a way to obtain that data
- A definition of the format to parse
- A way of gathering and returning the information it has found
- A way to report errors during parsing
It may also support one or more of these extra features:
- Resume parsing / streaming of input data
- Giving location information of input data tokens (e.g. line, column for text input)
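The items above can be pointed out in even a tiny hand-written parser. The following is a sketch under simple assumptions (one non-negative integer per line, standard library only); `ParseError` and `parse_int_lines` are hypothetical names for this illustration and not part of `combine`.

```rust
/// (4) How a failure is reported, including where it happened
/// (the optional "location information" feature).
#[derive(Debug, PartialEq)]
struct ParseError {
    line: usize,   // 1-based
    column: usize, // 1-based
    message: String,
}

/// (1) The data: a `&str`.
/// (2) The format: one non-negative decimal integer per line.
/// (3) The result: the numbers collected into a `Vec<i32>`.
fn parse_int_lines(input: &str) -> Result<Vec<i32>, ParseError> {
    let mut values = Vec::new();
    for (index, line) in input.lines().enumerate() {
        // Reject the first character that does not fit the format.
        if let Some((offset, c)) = line.char_indices().find(|(_, c)| !c.is_ascii_digit()) {
            return Err(ParseError {
                line: index + 1,
                // Everything before `offset` is an ASCII digit,
                // so the byte offset equals the character offset.
                column: offset + 1,
                message: format!("unexpected character {:?}", c),
            });
        }
        // Empty lines and numbers that overflow `i32` are reported here.
        let value = line.parse::<i32>().map_err(|e| ParseError {
            line: index + 1,
            column: 1,
            message: e.to_string(),
        })?;
        values.push(value);
    }
    Ok(values)
}

fn main() {
    assert_eq!(parse_int_lines("12\n7\n300"), Ok(vec![12, 7, 300]));

    let err = parse_int_lines("12\n7x\n300").unwrap_err();
    assert_eq!((err.line, err.column), (2, 2));
}
```

Note that this sketch needs the whole input up front, so it does not cover resuming or streaming.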
`combine` tries to be as generic as possible in all of these areas, which results in quite a few trait bounds all over the place.
The linked chapters describe how `combine` approaches each of them and why. Knowing this helps a lot with understanding error messages and getting past stumbling blocks.
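To see why being generic leads to trait bounds, consider this simplified, standard-library-only sketch. The `TokenSource` trait and `skip_leading` function are hypothetical stand-ins invented for this example; `combine`'s real input traits are more elaborate and differ in detail.

```rust
/// Hypothetical stand-in for "anything a parser can read tokens from".
trait TokenSource: Sized {
    type Token;
    /// Split off the first token, or return `None` at end of input.
    fn split_first_token(self) -> Option<(Self::Token, Self)>;
}

impl<'a> TokenSource for &'a str {
    type Token = char;
    fn split_first_token(self) -> Option<(char, Self)> {
        let mut chars = self.chars();
        let first = chars.next()?;
        Some((first, chars.as_str()))
    }
}

impl<'a, T: Copy> TokenSource for &'a [T] {
    type Token = T;
    fn split_first_token(self) -> Option<(T, Self)> {
        let (first, rest) = self.split_first()?;
        Some((*first, rest))
    }
}

/// Skip leading tokens equal to `wanted` and report how many were skipped.
/// Being generic over the input type is what forces the `where` clause.
fn skip_leading<I>(mut input: I, wanted: I::Token) -> (usize, I)
where
    I: TokenSource + Clone,
    I::Token: PartialEq,
{
    let mut skipped = 0;
    loop {
        match input.clone().split_first_token() {
            Some((token, rest)) if token == wanted => {
                skipped += 1;
                input = rest;
            }
            // End of input or a different token: stop without consuming it.
            _ => return (skipped, input),
        }
    }
}

fn main() {
    // The same function works on text ...
    assert_eq!(skip_leading("aaab", 'a'), (3, "b"));
    // ... and on byte slices.
    let bytes = [0u8, 0, 1];
    assert_eq!(skip_leading(&bytes[..], 0u8), (2, &bytes[2..]));
}
```

A version that only accepted `&str` would need no bounds at all. Every capability the generic function relies on (reading a token, cloning the input, comparing tokens) has to be spelled out in the signature instead.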
For reference, here are some alternatives in the Rust ecosystem:
All parser libraries come with their own trade-offs, so choose wisely 😄.