Regenerate parser for the observe mini-language #1743

Merged on Apr 17, 2023 (2 commits)

mdickinson
Member

This PR regenerates the parser for the observe mini-language using the newest version of Lark. This should fix some deprecation warnings coming from imports used by the old parser.

The parser was regenerated using Lark 1.1.5, via

    python -m lark.tools.standalone traits/observation/_dsl_grammar.lark --out traits/observation/_generated_parser.py

Closes #1739
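
For anyone wanting to script the regeneration step, the command above could be wrapped roughly as follows. This is a minimal sketch and not part of the PR: the helper names `standalone_command` and `regenerate` are hypothetical, and it assumes Lark is installed in the current environment and that the script is run from the repository root.

```python
import subprocess
import sys

# Paths taken from the command quoted in the PR description.
GRAMMAR = "traits/observation/_dsl_grammar.lark"
OUTPUT = "traits/observation/_generated_parser.py"


def standalone_command(grammar=GRAMMAR, output=OUTPUT):
    """Build the argument list for the standalone-parser generation command."""
    return [sys.executable, "-m", "lark.tools.standalone", grammar, "--out", output]


def regenerate(grammar=GRAMMAR, output=OUTPUT):
    """Run the generator; raises CalledProcessError if it exits non-zero."""
    subprocess.run(standalone_command(grammar, output), check=True)
```

Calling `regenerate()` overwrites the generated parser in place, so it is worth committing or stashing other changes first so that the resulting diff contains only the regenerated file.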

@mdickinson
Member Author

Tests are currently failing for unrelated reasons; we need #1741 to be merged.

Contributor

@rahulporuri rahulporuri left a comment

LGTM with one comment regarding the review process itself.

DATA = (
{'parser': {'parser': {'tokens': {0: 'NAME', 1: 'series', 2: 'element', 3: 'LSQB', 4: 'STAR', 5: 'metadata', 6: 'trait', 7: 'series_terminal', 8: 'anytrait', 9: 'PLUS', 10: 'ITEMS', 11: 'items', 12: '$END', 13: 'COMMA', 14: 'DOT', 15: 'RSQB', 16: 'COLON', 17: 'notify', 18: 'quiet', 19: 'parallel', 20: 'start', 21: 'parallel_terminal'}, 'states': {0: {0: (0, 29)}, 1: {1: (0, 19), 2: (0, 34), 3: (0, 22), 0: (0, 9), 4: (0, 3), 5: (0, 30), 6: (0, 12), 7: (0, 13), 8: (0, 24), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 2: {5: (0, 30), 2: (0, 16), 8: (0, 8), 6: (0, 12), 4: (0, 3), 0: (0, 9), 3: (0, 22), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 3: {12: (1, {'@': 10}), 13: (1, {'@': 10})}, 4: {13: (1, {'@': 11}), 14: (1, {'@': 11}), 15: (1, {'@': 11}), 16: (1, {'@': 11})}, 5: {14: (1, {'@': 12}), 16: (1, {'@': 12}), 12: (1, {'@': 13}), 13: (1, {'@': 13})}, 6: {5: (0, 30), 10: (0, 18), 2: (0, 5), 6: (0, 12), 0: (0, 9), 3: (0, 22), 8: (0, 33), 9: (0, 0), 4: (0, 3), 11: (0, 14)}, 7: {13: (1, {'@': 14}), 14: (1, {'@': 14}), 15: (1, {'@': 14}), 16: (1, {'@': 14})}, 8: {12: (1, {'@': 15}), 13: (1, {'@': 15})}, 9: {14: (1, {'@': 16}), 12: (1, {'@': 16}), 13: (1, {'@': 16}), 16: (1, {'@': 16}), 15: (1, {'@': 16})}, 10: {13: (0, 1), 12: (1, {'@': 17})}, 11: {13: (0, 31), 15: (0, 35)}, 12: {14: (1, {'@': 18}), 12: (1, {'@': 18}), 13: (1, {'@': 18}), 16: (1, {'@': 18}), 15: (1, {'@': 18})}, 13: {12: (1, {'@': 19}), 13: (1, {'@': 19})}, 14: {14: (1, {'@': 20}), 12: (1, {'@': 20}), 13: (1, {'@': 20}), 16: (1, {'@': 20}), 15: (1, {'@': 20})}, 15: {16: (0, 21), 17: (0, 32), 14: (0, 26), 18: (0, 23), 15: (1, {'@': 21}), 13: (1, {'@': 21})}, 16: {14: (1, {'@': 14}), 16: (1, {'@': 14}), 12: (1, {'@': 22}), 13: (1, {'@': 22})}, 17: {}, 18: {14: (1, {'@': 23}), 12: (1, {'@': 23}), 13: (1, {'@': 23}), 16: (1, {'@': 23}), 15: (1, {'@': 23})}, 19: {16: (0, 21), 17: (0, 2), 14: (0, 26), 18: (0, 6)}, 20: {13: (1, {'@': 12}), 14: (1, {'@': 12}), 15: (1, {'@': 12}), 16: (1, {'@': 12})}, 21: {9: (1, {'@': 
24}), 0: (1, {'@': 24}), 10: (1, {'@': 24}), 3: (1, {'@': 24}), 4: (1, {'@': 24})}, 22: {5: (0, 30), 1: (0, 15), 19: (0, 11), 6: (0, 12), 0: (0, 9), 3: (0, 22), 2: (0, 4), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 23: {5: (0, 30), 2: (0, 20), 6: (0, 12), 0: (0, 9), 3: (0, 22), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 24: {12: (1, {'@': 25}), 13: (1, {'@': 25})}, 25: {12: (1, {'@': 26}), 13: (1, {'@': 26})}, 26: {9: (1, {'@': 27}), 0: (1, {'@': 27}), 10: (1, {'@': 27}), 3: (1, {'@': 27}), 4: (1, {'@': 27})}, 27: {16: (0, 21), 17: (0, 32), 18: (0, 23), 14: (0, 26), 15: (1, {'@': 28}), 13: (1, {'@': 28})}, 28: {7: (0, 25), 1: (0, 19), 3: (0, 22), 0: (0, 9), 2: (0, 34), 20: (0, 17), 4: (0, 3), 5: (0, 30), 6: (0, 12), 21: (0, 10), 8: (0, 24), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 29: {14: (1, {'@': 29}), 12: (1, {'@': 29}), 13: (1, {'@': 29}), 16: (1, {'@': 29}), 15: (1, {'@': 29})}, 30: {14: (1, {'@': 30}), 12: (1, {'@': 30}), 13: (1, {'@': 30}), 16: (1, {'@': 30}), 15: (1, {'@': 30})}, 31: {5: (0, 30), 1: (0, 27), 6: (0, 12), 0: (0, 9), 3: (0, 22), 2: (0, 4), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 32: {5: (0, 30), 2: (0, 7), 6: (0, 12), 0: (0, 9), 3: (0, 22), 9: (0, 0), 10: (0, 18), 11: (0, 14)}, 33: {12: (1, {'@': 31}), 13: (1, {'@': 31})}, 34: {14: (1, {'@': 11}), 16: (1, {'@': 11}), 12: (1, {'@': 32}), 13: (1, {'@': 32})}, 35: {14: (1, {'@': 33}), 12: (1, {'@': 33}), 13: (1, {'@': 33}), 16: (1, {'@': 33}), 15: (1, {'@': 33})}}, 'start_states': {'start': 28}, 'end_states': {'start': 17}}, 'lexer_conf': {'tokens': [{'@': 0}, {'@': 1}, {'@': 2}, {'@': 3}, {'@': 4}, {'@': 5}, {'@': 6}, {'@': 7}, {'@': 8}, {'@': 9}], 'ignore': ['WS'], 'g_regex_flags': 0, '__type__': 'LexerConf'}, 'start': ['start'], '__type__': 'LALR_ContextualLexer'}, 'rules': [{'@': 16}, {'@': 23}, {'@': 29}, {'@': 10}, {'@': 27}, {'@': 24}, {'@': 18}, {'@': 20}, {'@': 30}, {'@': 33}, {'@': 14}, {'@': 12}, {'@': 11}, {'@': 28}, {'@': 21}, {'@': 22}, {'@': 15}, {'@': 13}, {'@': 31}, {'@': 32}, {'@': 25}, {'@': 
19}, {'@': 26}, {'@': 17}], 'options': {'debug': False, 'keep_all_tokens': False, 'tree_class': None, 'cache_grammar': False, 'postlex': None, 'parser': 'lalr', 'lexer': 'contextual', 'transformer': None, 'start': ['start'], 'priority': None, 'ambiguity': 'auto', 'propagate_positions': False, 'lexer_callbacks': {}, 'maybe_placeholders': False, 'edit_terminals': None, 'g_regex_flags': 0}, '__type__': 'Lark'}
{'parser': {'lexer_conf': {'terminals': [{'@': 0}, {'@': 1}, {'@': 2}, {'@': 3}, {'@': 4}, {'@': 5}, {'@': 6}, {'@': 7}, {'@': 8}, {'@': 9}], 'ignore': ['WS'], 'g_regex_flags': 0, 'use_bytes': False, 'lexer_type': 'contextual', '__type__': 'LexerConf'}, 'parser_conf': {'rules': [{'@': 10}, {'@': 11}, {'@': 12}, {'@': 13}, {'@': 14}, {'@': 15}, {'@': 16}, {'@': 17}, {'@': 18}, {'@': 19}, {'@': 20}, {'@': 21}, {'@': 22}, {'@': 23}, {'@': 24}, {'@': 25}, {'@': 26}, {'@': 27}, {'@': 28}, {'@': 29}, {'@': 30}, {'@': 31}, {'@': 32}, {'@': 33}], 'start': ['start'], 'parser_type': 'lalr', '__type__': 'ParserConf'}, 'parser': {'tokens': {0: 'trait', 1: 'PLUS', 2: 'NAME', 3: 'ITEMS', 4: 'metadata', 5: 'LSQB', 6: 'items', 7: 'element', 8: 'DOT', 9: 'COLON', 10: '$END', 11: 'COMMA', 12: 'parallel', 13: 'series', 14: 'STAR', 15: 'anytrait', 16: 'RSQB', 17: 'quiet', 18: 'notify', 19: 'parallel_terminal', 20: 'start', 21: 'series_terminal'}, 'states': {0: {0: (0, 17), 1: (0, 23), 2: (0, 25), 3: (0, 18), 4: (0, 14), 5: (0, 2), 6: (0, 4), 7: (0, 19)}, 1: {8: (1, {'@': 22}), 9: (1, {'@': 22}), 10: (1, {'@': 29}), 11: (1, {'@': 29})}, 2: {0: (0, 17), 1: (0, 23), 12: (0, 27), 2: (0, 25), 13: (0, 12), 3: (0, 18), 4: (0, 14), 5: (0, 2), 6: (0, 4), 7: (0, 15)}, 3: {0: (0, 17), 1: (0, 23), 2: (0, 25), 3: (0, 18), 14: (0, 5), 5: (0, 2), 7: (0, 35), 15: (0, 9), 4: (0, 14), 6: (0, 4)}, 4: {16: (1, {'@': 17}), 9: (1, {'@': 17}), 11: (1, {'@': 17}), 8: (1, {'@': 17}), 10: (1, {'@': 17})}, 5: {10: (1, {'@': 13}), 11: (1, {'@': 13})}, 6: {16: (1, {'@': 19}), 9: (1, {'@': 19}), 11: (1, {'@': 19}), 8: (1, {'@': 19}), 10: (1, {'@': 19})}, 7: {0: (0, 17), 1: (0, 23), 2: (0, 25), 3: (0, 18), 4: (0, 14), 5: (0, 2), 6: (0, 4), 7: (0, 8)}, 8: {8: (1, {'@': 20}), 16: (1, {'@': 20}), 9: (1, {'@': 20}), 11: (1, {'@': 20})}, 9: {10: (1, {'@': 26}), 11: (1, {'@': 26})}, 10: {17: (0, 0), 18: (0, 7), 8: (0, 29), 9: (0, 13), 16: (1, {'@': 23}), 11: (1, {'@': 23})}, 11: {10: (1, {'@': 30}), 11: (1, {'@': 30})}, 
12: {17: (0, 0), 18: (0, 7), 8: (0, 29), 9: (0, 13), 16: (1, {'@': 24}), 11: (1, {'@': 24})}, 13: {5: (1, {'@': 15}), 3: (1, {'@': 15}), 2: (1, {'@': 15}), 1: (1, {'@': 15}), 14: (1, {'@': 15})}, 14: {16: (1, {'@': 18}), 9: (1, {'@': 18}), 11: (1, {'@': 18}), 8: (1, {'@': 18}), 10: (1, {'@': 18})}, 15: {8: (1, {'@': 22}), 16: (1, {'@': 22}), 9: (1, {'@': 22}), 11: (1, {'@': 22})}, 16: {1: (0, 23), 2: (0, 25), 19: (0, 20), 7: (0, 1), 13: (0, 31), 0: (0, 17), 15: (0, 11), 20: (0, 21), 3: (0, 18), 14: (0, 5), 4: (0, 14), 21: (0, 33), 5: (0, 2), 6: (0, 4)}, 17: {16: (1, {'@': 16}), 9: (1, {'@': 16}), 11: (1, {'@': 16}), 8: (1, {'@': 16}), 10: (1, {'@': 16})}, 18: {8: (1, {'@': 11}), 16: (1, {'@': 11}), 9: (1, {'@': 11}), 11: (1, {'@': 11}), 10: (1, {'@': 11})}, 19: {8: (1, {'@': 21}), 16: (1, {'@': 21}), 9: (1, {'@': 21}), 11: (1, {'@': 21})}, 20: {11: (0, 26), 10: (1, {'@': 33})}, 21: {}, 22: {8: (1, {'@': 21}), 9: (1, {'@': 21}), 10: (1, {'@': 27}), 11: (1, {'@': 27})}, 23: {2: (0, 28)}, 24: {15: (0, 34), 0: (0, 17), 1: (0, 23), 2: (0, 25), 3: (0, 18), 14: (0, 5), 4: (0, 14), 5: (0, 2), 6: (0, 4), 7: (0, 22)}, 25: {8: (1, {'@': 10}), 16: (1, {'@': 10}), 9: (1, {'@': 10}), 11: (1, {'@': 10}), 10: (1, {'@': 10})}, 26: {1: (0, 23), 2: (0, 25), 5: (0, 2), 7: (0, 1), 13: (0, 31), 0: (0, 17), 15: (0, 11), 3: (0, 18), 14: (0, 5), 4: (0, 14), 21: (0, 30), 6: (0, 4)}, 27: {16: (0, 6), 11: (0, 32)}, 28: {8: (1, {'@': 12}), 16: (1, {'@': 12}), 9: (1, {'@': 12}), 11: (1, {'@': 12}), 10: (1, {'@': 12})}, 29: {5: (1, {'@': 14}), 3: (1, {'@': 14}), 2: (1, {'@': 14}), 1: (1, {'@': 14}), 14: (1, {'@': 14})}, 30: {10: (1, {'@': 31}), 11: (1, {'@': 31})}, 31: {17: (0, 24), 18: (0, 3), 8: (0, 29), 9: (0, 13)}, 32: {0: (0, 17), 1: (0, 23), 2: (0, 25), 3: (0, 18), 13: (0, 10), 4: (0, 14), 5: (0, 2), 6: (0, 4), 7: (0, 15)}, 33: {10: (1, {'@': 32}), 11: (1, {'@': 32})}, 34: {10: (1, {'@': 28}), 11: (1, {'@': 28})}, 35: {8: (1, {'@': 20}), 9: (1, {'@': 20}), 10: (1, {'@': 25}), 11: (1, {'@': 
25})}}, 'start_states': {'start': 16}, 'end_states': {'start': 21}}, '__type__': 'ParsingFrontend'}, 'rules': [{'@': 10}, {'@': 11}, {'@': 12}, {'@': 13}, {'@': 14}, {'@': 15}, {'@': 16}, {'@': 17}, {'@': 18}, {'@': 19}, {'@': 20}, {'@': 21}, {'@': 22}, {'@': 23}, {'@': 24}, {'@': 25}, {'@': 26}, {'@': 27}, {'@': 28}, {'@': 29}, {'@': 30}, {'@': 31}, {'@': 32}, {'@': 33}], 'options': {'debug': False, 'keep_all_tokens': False, 'tree_class': None, 'cache': False, 'postlex': None, 'parser': 'lalr', 'lexer': 'contextual', 'transformer': None, 'start': ['start'], 'priority': 'normal', 'ambiguity': 'auto', 'regex': False, 'propagate_positions': False, 'lexer_callbacks': {}, 'maybe_placeholders': False, 'edit_terminals': None, 'g_regex_flags': 0, 'use_bytes': False, 'import_paths': [], 'source_path': None, '_plugins': {}}, '__type__': 'Lark'}
Contributor

@rahulporuri rahulporuri Apr 17, 2023

I naively assumed that I could run python -m lark.tools.standalone traits/observation/_dsl_grammar.lark --out traits/observation/_generated_parser.py on my personal machine and check that there are no changes in the generated parser code.

That doesn't seem to be the case. This dictionary gets modified when I regenerate the parser code. The change just moved around the key/value pairs in the parser tokens dictionary.

I'm not sure what the usual review process for such PRs is.
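
One way to confirm that a reordered table differs only in numbering is to compare the token names rather than the integer keys. The sketch below is an illustration only: it uses the two `tokens` dictionaries from the diff shown above as sample data, and the helper name `same_token_names` is hypothetical. It does not check the `states` table, whose entries are keyed by those same integers and would need renumbering before they could be compared.

```python
# Token tables copied from the old and new versions of DATA in the diff above.
OLD_TOKENS = {
    0: 'NAME', 1: 'series', 2: 'element', 3: 'LSQB', 4: 'STAR',
    5: 'metadata', 6: 'trait', 7: 'series_terminal', 8: 'anytrait',
    9: 'PLUS', 10: 'ITEMS', 11: 'items', 12: '$END', 13: 'COMMA',
    14: 'DOT', 15: 'RSQB', 16: 'COLON', 17: 'notify', 18: 'quiet',
    19: 'parallel', 20: 'start', 21: 'parallel_terminal',
}
NEW_TOKENS = {
    0: 'trait', 1: 'PLUS', 2: 'NAME', 3: 'ITEMS', 4: 'metadata',
    5: 'LSQB', 6: 'items', 7: 'element', 8: 'DOT', 9: 'COLON',
    10: '$END', 11: 'COMMA', 12: 'parallel', 13: 'series', 14: 'STAR',
    15: 'anytrait', 16: 'RSQB', 17: 'quiet', 18: 'notify',
    19: 'parallel_terminal', 20: 'start', 21: 'series_terminal',
}


def same_token_names(a, b):
    """Return True if both mappings number exactly the same token names."""
    return sorted(a.values()) == sorted(b.values())


print(same_token_names(OLD_TOKENS, NEW_TOKENS))  # prints True
```

The dictionaries compare unequal as written, yet contain the same 22 names, which matches the observation that only the key/value pairing moved around.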

Member Author

Hmm, interesting; that'll be due to hash randomization, I guess. lark-parser/lark#595 looks related.

Possibly we should be generating using a fixed PYTHONHASHSEED.

Member Author

@rahulporuri I just tried using a fixed PYTHONHASHSEED, generating with:

    PYTHONHASHSEED=12345 python -m lark.tools.standalone traits/observation/_dsl_grammar.lark --out traits/observation/_generated_parser.py

But I still get a different ordering each time. It's not clear to me what the source of the non-determinism is.
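
For context, `PYTHONHASHSEED` fixes the hashes of `str` and `bytes` objects, so it removes one source of run-to-run variation in set iteration order, though evidently not the one at work here. A minimal sketch of that effect follows, using a few token names as sample data. The helper name `set_order` is hypothetical, and this is not the Lark generation step itself.

```python
import os
import subprocess
import sys

# Code run in a fresh interpreter: print the iteration order of a set of strings.
SNIPPET = "print(list({'NAME', 'DOT', 'COMMA', 'COLON', 'STAR', 'PLUS'}))"


def set_order(seed):
    """Return the order printed by a new interpreter started with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Same seed, separate processes: the order is reproducible.
print(set_order("12345") == set_order("12345"))

# Different seeds: the order may differ, although that is not guaranteed.
print(set_order("12345"))
print(set_order("54321"))
```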

Member Author

> It's not clear to me what the source of the non-determinism is.

From a brief look at the Lark source, I suspect that the source of non-determinism is memory addresses of objects: the code is using ids of objects as set elements and/or dictionary keys. (Which then affects iteration order, etc.) There is a use of random.randint in earley_forest.py, but I don't think we exercise that code path.

I'll stop trying to find a way to make this deterministic and merge as-is.
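
The suspected mechanism can be illustrated without Lark. In CPython, an object whose class does not define `__hash__` gets a hash derived from its memory address, which `PYTHONHASHSEED` does not control, so the iteration order of a set of such objects can vary between runs. The sketch below uses a hypothetical `Item` class (not taken from the Lark source) and shows that sorting on a stable key yields a reproducible order.

```python
class Item:
    """Hypothetical stand-in for an object with default, identity-based hashing."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Item({self.name!r})"


items = {Item(name) for name in ("NAME", "DOT", "COMMA", "COLON")}

# Iteration order follows the objects' hashes, which depend on where the
# objects were allocated, so this may print a different order on each run.
print(list(items))

# Sorting on a stable attribute gives the same order on every run.
stable = sorted(items, key=lambda item: item.name)
print(stable)
```

Making the generated output deterministic along these lines would require changes inside the generator itself, which is beyond what this PR attempts.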

@mdickinson mdickinson merged commit bef7857 into main Apr 17, 2023
@mdickinson mdickinson deleted the update-grammar branch April 17, 2023 09:47
Linked issue (may be closed by merging this pull request): sre_parse and sre_constants modules are deprecated