-
I'm very new to peg and pegtl, so probably I'm missing something. I have a grammar very similar to the following one:
he parse works great, but there is too many activation of test_action::apply. The program outputs "AAAB", because, if I understand well, the parse tries the first alternative (AB) for the first character and fails, then proceeds with the other (A). But even if it "rewinds", it always call test_action::apply. What is the correct way to handle this situation? My intent is to output "AAB", possibly without complicate the grammar. |
Beta Was this translation helpful? Give feedback.
Replies: 7 comments
-
Any parser that may "rewind", like those generated by PEGTL, will have to 'undo' any state modifications it made on any failed branches in the grammar. The way PEGTL allows you to do this is with the "control class", which has hooks that are invoked when a grammar rule starts to match, and when it matches either successfully or matching fails. However, it's up to you as the programmer to put a solution together using this low level tool. Imho, for complex grammars, where backtracking is not easy to see, the best thing to do is make your parser construct a parse tree, which is easy to fix using simple push and pop operations when the parser backtracks (e.g. in your example, this would remove the first 'A' node when it fails), and then pass that tree to the rest of your program. Constructing a parse tree inside a backtracking PEGTL parser is a simple and generic algorithm that works for any grammar. Out of the box, all PEGTL does is give you a recognizer, answering the question "Does this input conform to this grammar?" |
Beta Was this translation helpful? Give feedback.
-
The analysis of the issue from @nlyan is spot on. Generating a parse tree and taking it from there is one solution, another is to play around with the rules and the actions (and control) in a way that minimises backtracking and/or attaches the actions to rules in a way that backtracking in the grammar doesn't need to undo previous actions. The latter is not always super easy, however between actions and control it's usually possible to find a solution that works well and is not overly complicated. |
Beta Was this translation helpful? Give feedback.
-
To add: The PEGTL does have out-of-the-box support for generating a parse-tree (and even apply some transformations towards an AST on-the-fly), have a look at the examples. Note that parse-tree generation is a two-pass approach, you first generate the parse-tree from the input, then you use the (now stable) parse-tree to do whatever you want with it. If your application does not parse enormous amounts of data, this is usually not a performance issue and you get rid of all backtracking issues. |
Beta Was this translation helpful? Give feedback.
-
Thank you for your support. I wrote a couple of solution following your suggestions:
|
Beta Was this translation helpful? Give feedback.
-
Thank you very much! |
Beta Was this translation helpful? Give feedback.
-
You're welcome! For the record, another possibility, if not particularly elegant, is to use the (infinite) lookahead feature of PEGs. With something like |
Beta Was this translation helpful? Give feedback.
-
I took the last approach suggested by @ColinH in one of my own recent projects. I had a grammar matching strings that looked approximately like "key = value", but where the right-hand side had a more elaborate grammar for a particular 'key' string. Most of these pairs were handled by a generic grammar placed at the end of a Here's a cut out of my solution: template <typename... Rules>
using lookahead = seq<at<Rules...>, Rules...>;
template <typename Key, typename Value, typename Separator>
struct option : seq<lookahead<Key, Separator>, must<Value>> {}; There's certainly a degree of black magic to creating PEG grammars. |
Beta Was this translation helpful? Give feedback.
You're welcome! For the record, another possibility, if not particularly elegant, is to use the (infinite) lookahead feature of PEGs.
With something like
seq< at< foo >, foo >
, the firstfoo
will attempt to match with actions disabled, and only if that succeeded willfoo
be matched "for real", i.e. with actions enabled.