Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of EncodingDetectingInputStream #2535

Merged
merged 1 commit into from
Dec 17, 2022

Conversation

knutwannheden
Copy link
Contributor

Slightly improves the performance of EncodingDetectingInputStream by refactoring the read() method and avoiding the additional byte[] copy in readFully().

Also make sure that PlainTextParser#parseInputs() reads the source text before the encoding, since the latter won't be correctly set otherwise.

Slightly improves the performance of `EncodingDetectingInputStream` by refactoring the `read()` method and avoiding the additional `byte[]` copy in `readFully()`.

Also make sure that `PlainTextParser#parseInputs()` reads the source text before the encoding, since the latter won't be correctly set otherwise.
@knutwannheden
Copy link
Contributor Author

Have you evaluated supporting byte buffers at the API level and working with memory-mapped files?

@knutwannheden
Copy link
Contributor Author

I will try to supply some performance improvement figures for the parsing benchmark. I think it was around 10%. But for this change that parsing benchmark should be trimmed down a bit. The current two tests basically compare the results of using buffering vs no buffering when reading very small files (much smaller than the buffer size, even). I can provide a benchmark specifically for the refactored class.

@jkschneider jkschneider merged commit d29ae3a into openrewrite:main Dec 17, 2022
@jkschneider jkschneider added this to the 7.34.3 milestone Dec 17, 2022
@jkschneider
Copy link
Member

Have you evaluated supporting byte buffers at the API level and working with memory-mapped files?

Not yet. Are you meaning as an alternative implementation for our JavaFileObject?

@knutwannheden knutwannheden deleted the reading-performance branch December 17, 2022 19:31
@knutwannheden
Copy link
Contributor Author

Not yet. Are you meaning as an alternative implementation for our JavaFileObject?

We would still have to conform to the FileObject API. I have not done any high-level profiling, so I don't know how crucial this part actually is (I am assuming it will often be rather irrelevant compared to the other phases). Depending on that it could still be worthwhile to check this out.

Regarding file encoding in general, I quite like IntelliJ's approach to this: https://www.jetbrains.com/help/idea/encoding.html#single-file. In contrast, OpenRewrite will for ASCII and UTF-8 sources try to detect the character set for the entire contents, which is a bit wasteful.

At least when using the Maven plugin (not with the Gradle plugin AFAICT), the org.openrewrite.tree.ParsingExecutionContextView#setCharset() method is invoked if the project.build.sourceEncoding property is set, which will then skip the entire character set detection logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants