Expose raw byte position in position information. #31

Closed
wants to merge 1 commit into from

Conversation

samhocevar
Contributor

In some cases it may be desirable to know the byte offset in addition
to the line / column information, for instance when parsing binary
files, or when feeding the parser with partial data.
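As an illustration of what tracking a byte offset next to line and column could look like, here is a minimal standalone sketch. It is not the PEGTL's actual position type or the code from this commit; the names `position` and `advance` are hypothetical, and the column is counted in bytes since the start of the line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical position record (assumption: not the PEGTL's real type).
struct position
{
   std::size_t byte = 0;    // bytes consumed since the start of the input
   std::size_t line = 1;    // 1-based line number
   std::size_t column = 0;  // bytes consumed on the current line
};

// Advance the position over a chunk of consumed input.
inline void advance( position& p, const std::string& consumed )
{
   for( const char c : consumed ) {
      ++p.byte;  // the byte offset always grows, even across newlines
      if( c == '\n' ) {
         ++p.line;
         p.column = 0;
      }
      else {
         ++p.column;
      }
   }
}
```

Because the byte counter never resets, it remains meaningful for binary input and across partial chunks, where line and column are of little use.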

@samhocevar
Contributor Author

This commit changes the API, so I am not sure it is acceptable
as is. However, the feature is desirable for my purposes, and I
found it easier to modify the PEGTL than to implement a byte
tracking mechanism using additional state variables, which I
would then have to add to every parser needing it.

I am also not sure how to format the byte offset (or whether I should
ignore it) in position_info::operator<<.

If this does not fit within your overall plans, maybe you can suggest a
more elegant solution?

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling e5f8968 on lolengine:byte-position-in-input into 84a64ef on ColinH:master.

@ColinH
Member

ColinH commented Sep 26, 2016

Thanks for the pull request. I won't merge it right now because it might interfere with some other things that we have planned, but I'll keep it as a reminder and for possible future inclusion if it is sufficiently independent of these other changes.

@ColinH
Member

ColinH commented Nov 29, 2016

Do you later use the byte position programmatically, or do you only need the human-readable form in the exception message?

@samhocevar
Contributor Author

I use the byte position programmatically, yes.

My current use case here is the creation of a transpiler that does not need an intermediate AST. The parser analyses the input and marks parts of the code using their byte offsets. A post-process then uses search/replace to perform the language transformation, and if the replaced chunk changes the size, offsets located after it get updated. Using line/column notation would make the search/replace work more complex.
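A rough sketch of that post-processing step, assuming the marked spans do not overlap. The `edit` struct and `apply_edits` function are hypothetical names used only for illustration, not code from the actual transpiler:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of a span marked by the parser (illustrative only).
struct edit
{
   std::size_t offset;       // byte offset of the span in the original input
   std::size_t length;       // number of bytes to replace
   std::string replacement;  // new text for the span
};

// Apply non-overlapping edits in offset order. `shift` accumulates how far
// earlier replacements have moved the text that follows them.
inline std::string apply_edits( std::string text, std::vector< edit > edits )
{
   std::sort( edits.begin(), edits.end(),
              []( const edit& a, const edit& b ) { return a.offset < b.offset; } );
   std::ptrdiff_t shift = 0;
   for( const edit& e : edits ) {
      const auto pos = static_cast< std::size_t >( static_cast< std::ptrdiff_t >( e.offset ) + shift );
      text.replace( pos, e.length, e.replacement );
      shift += static_cast< std::ptrdiff_t >( e.replacement.size() )
               - static_cast< std::ptrdiff_t >( e.length );
   }
   return text;
}
```

With byte offsets the adjustment is a single running difference; with line/column pairs, every replacement containing or removing a newline would require recomputing both coordinates for all later marks.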

@ColinH
Member

ColinH commented Dec 2, 2016

OK, thanks. Another small question: do you ever need the byte offset together with column and line, or do you not need the column and line in cases where you use the byte offset?

@samhocevar
Contributor Author

In my case, unless I am parsing binary data, I think I always need column and line in order to provide meaningful error reporting, regardless of whether my action class uses the byte offset.

@ColinH
Member

ColinH commented Dec 6, 2016

I've been trying to make this more flexible, so that not all three numbers are always kept, but it seems to be more work than anticipated, and only for a small optimisation. Your pull request is now unfortunately out of date, but some of the changes will be simpler with the latest commits. I will now look into redoing this.
