Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't assume all text files end with a line break #309

Closed
cessen opened this issue Jun 18, 2021 · 9 comments
Closed

Don't assume all text files end with a line break #309

cessen opened this issue Jun 18, 2021 · 9 comments
Assignees
Labels
C-bug Category: This is a bug

Comments

@cessen
Copy link
Contributor

cessen commented Jun 18, 2021

Looking through Helix's code, and then testing out with some text files, it looks like Helix tries really, really hard to enforce all text files having a final end-of-file line break. To the point that its internals assume this.

Let me start by saying that I agree that text files should generally end with a line break. Among other reasons:

  1. Unix and POSIX mandate it, and technically don't consider non-compliant files to even be valid text files.
  2. The C standard also technically requires this for all source files (although I suspect few modern compilers enforce it).

Moreover, Unix and C have good reasons for requiring this, mostly having to do with various non-interactive text processing/preprocessing tools. So I'm definitely in the camp that believes this convention should generally be followed.

However, in practice the world's text files frequently don't adhere to this, and I don't think it's the place of a text editor to enforce platform/tool/language-specific conventions unless directed to by the user. A text editor should be able to round-trip a text file in a bit-for-bit identical way whenever reasonably possible. And forcing file-end line breaks prevents that in the general case.

Furthermore (assuming that we want to support round-tripping), trying to make the internal code view everything in terms of this convention actually complicates the code with special cases. Which I learned the hard way in the early days of Ropey.

So, speaking from experience, my recommendation is that we don't have any of Helix's internals assume a file-end line break.

I do think there should be an optional feature that ensures file-end line breaks on save. But that's different than having Helix's internals view the world in those terms.

@cessen cessen added the C-discussion Category: Discussion or questions that doesn't represent real issues label Jun 18, 2021
@kellytk
Copy link
Contributor

kellytk commented Jun 19, 2021

I recently dealt with this issue in a different editor.

A text editor should be able to round-trip a text file in a bit-for-bit identical way whenever reasonably possible.

Relatedly, a text editor should not modify file contents (not including file access time) without explicit action by the user.

So, speaking from experience, my recommendation is that we don't have any of Helix's internals assume a file-end line break.

Agreed.

I do think there should be an optional feature that ensures file-end line breaks on save.

Assuming it defaults to disabled.

I think an option and mechanism to both inform the user, and empower them to rectify it manually, is preferable. When opening the file, it would be helpful if there was an informative banner to the user, perhaps with an easy action to add the newline. The info+action banner presentation could be an option, however I would think to default it to enabled.

@cessen
Copy link
Contributor Author

cessen commented Jun 19, 2021

I've been testing out some prior art. When opening two files that are identical except for the presence/absence of a final file-end line break:

  • Vim/neovim: both files lack a final blank line, and behave identically in the editor.
  • Kakoune: both files lack a final blank line, and behave identically in the editor.
  • Emacs: by default, emacs doesn't give any indication in the gutter of what lines are present or not. But navigating to the very end of the file, it does allow the cursor to move to a final blank row when there's a final line break. If you turn on line numbers in the gutter, it doesn't show a line number for that final row until you insert some contents.
  • Nano: both files do have a final blank line, and behave identically in the editor.
  • Sublime Text: the file with a file-end line break has a final blank line.
  • GEdit: both files lack a final blank line, and behave identically in the editor.
  • VSCode: the file with a file-end line break has a final blank line.

So, in other words, it's all over the place. In general, I think it's weird when editors don't reflect the file difference somehow, so vim/kak/nano/gedit all feel off to me. I think emacs' solution is kind of clever, where a final blank line is there as far as cursor navigation goes, but isn't treated as a line for line-numbering purposes. Although maybe that behavior is still a little to unix/posix-specific.

I do think there should be an optional feature that ensures file-end line breaks on save.

Assuming it defaults to disabled.

I don't personally feel strongly one way or the other, but I think you're right that it follows the principle of least surprise, and so that would probably be the best default. Intuitively, if I load a file and immediately save it again without modification, I don't expect the on-disk contents to me modified.

There's also an interesting niche case where this came up for me. In the reddit thread of one of Ropey's releases, BurntSushi asked about how Ropey would handle binary data, and that led me down a rabbit hole of wondering how various text editors handle binary data. Vim behaved as if it would probably successfully round-trip the binary data, but it didn't. It was really unclear to me at the time why that was the case, but in retrospect it's fairly obvious: it probably added a line break to the end of the file.

@kirawi
Copy link
Member

kirawi commented Jun 19, 2021

VSCode also acts the same as Emacs if you disable editor.renderFinalNewline, if I understood correctly. It's disabled by default on Linux systems.

@kirawi
Copy link
Member

kirawi commented Jun 19, 2021

This was something I sort of "attempted" in #98, though of course there are many problems with it and it doesn't fully address everything. I think if I were to do it again knowing what I know now, I would probably deal with it by creating a thin abstraction so the internals don't have to be rewritten.

@cessen
Copy link
Contributor Author

cessen commented Jun 19, 2021

VSCode also acts the same as Emacs if you disable editor.renderFinalNewline,

Having an option like this is probably the best solution. It's also super easy to support--probably a one-liner (maybe two). And if we want to be opinionated about it, it can default to on.

I think if I were to do it again knowing what I know now, I would probably deal with it by creating a thin abstraction so the internals don't have to be rewritten.

Unless I'm misunderstanding you, resolving this issue is mostly just a matter of combing through the Helix's code to make sure lines are being handled in a way that doesn't discard the "last line" from Ropey's perspective. It will likely touch a lot of source files, but the actual size of the changes should be very small. Certainly not a rewrite.

I've been wanting to comb through Helix's use of Ropey anyway, so I can just do both at the same time if @archseer gives his okay on this issue. I could also switch the appropriate spots to use Ropey's new non-panicking methods while I'm at it.

@kirawi
Copy link
Member

kirawi commented Jun 19, 2021

@cessen Feel free, since you're already going that far. For me, my decision was because the changes would be quite far reaching in many aspects. Now that we have far more tests than when I started that PR, it might be a different situation now.

@cessen
Copy link
Contributor Author

cessen commented Jun 20, 2021

Discussed this with @archseer on matrix, and he's fine with moving forward on this. I'll start work on this once I've finished some other work on Helix.

@cessen cessen self-assigned this Jun 20, 2021
@cessen cessen added C-bug Category: This is a bug and removed C-discussion Category: Discussion or questions that doesn't represent real issues labels Jun 20, 2021
@cessen
Copy link
Contributor Author

cessen commented Jun 20, 2021

Marking as bug just because of the round-tripping aspect.

@cessen
Copy link
Contributor Author

cessen commented Jun 24, 2021

Closing this issue because #362 subsumes it and will track the task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug
Projects
None yet
Development

No branches or pull requests

3 participants