Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance Proposal] Use SIMDJson in Corehost #59642

Open
deeprobin opened this issue Sep 27, 2021 · 13 comments
Open

[Performance Proposal] Use SIMDJson in Corehost #59642

deeprobin opened this issue Sep 27, 2021 · 13 comments
Labels
area-Host tenet-performance Performance related issue
Milestone

Comments

@deeprobin
Copy link
Contributor

Description

Replace rapidjson with simdjson

Data

https://github.com/simdjson/simdjson#performance-results

Analysis

SIMDJSON is 4 times faster than RapidJSON

@deeprobin deeprobin added the tenet-performance Performance related issue label Sep 27, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Sep 27, 2021
@ghost
Copy link

ghost commented Sep 27, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Replace rapidjson with simdjson

Data

https://github.com/simdjson/simdjson#performance-results

Analysis

SIMDJSON is 4 times faster than RapidJSON

Author: deeprobin
Assignees: -
Labels:

area-System.Text.Json, tenet-performance, untriaged

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Sep 27, 2021

Is json parsing in corehost a bottleneck for you? If so please share some numbers. It's supposed to parse a relatively small json once, isn't? while I love simdjson library it sounds a bit overkill for corehost - it will increase binary size.

@ghost
Copy link

ghost commented Sep 27, 2021

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Replace rapidjson with simdjson

Data

https://github.com/simdjson/simdjson#performance-results

Analysis

SIMDJSON is 4 times faster than RapidJSON

Author: deeprobin
Assignees: -
Labels:

tenet-performance, area-Host, untriaged

Milestone: -

@am11
Copy link
Member

am11 commented Sep 27, 2021

Generally, I agree with @EgorBo. However, it won't be the first time we are improving performance of apps startup by replacing JSON engine in corehost: dotnet/core-setup#7708. Could be worth exploring?

@EgorBo
Copy link
Member

EgorBo commented Sep 27, 2021

Generally, I agree with @EgorBo. However, it won't be the first time we are improving performance of apps startup by replacing JSON engine in corehost: dotnet/core-setup#7708. Could be worth exploring?

nice! that pr even reduced binary size. However, I fear simdjson will increase by e.g. half-a-megabyte or so. Still indeed worth exploring then.
I personally would love SimdJSON to power System.Text.Json instead (JsonDocument)

@deeprobin
Copy link
Contributor Author

Well, performance-wise it is not a bottleneck for me, however, such an optimization does not hurt in my opinion.
In my opinion we shouldn't care about the binary size in the end - or do you want to run the corehost on a floppy disk?

nice! that pr even reduced binary size. However, I fear simdjson will increase by e.g. half-a-megabyte or so. Still indeed worth exploring then.

@EgorBo

@am11
Copy link
Member

am11 commented Sep 27, 2021

performance-wise it is not a bottleneck for me

Empirically speaking, when it comes to performance vs. size in runtime, normally performance takes precedence. We would still need to show numbers to discuss/understand the tradeoffs (I guess it would be a non-trivial work to get there).

do you want to run the corehost on a floppy disk?

Impact on file sizes in .NET is typically frowned upon, there is even a tag for it: size-reduction. In case of host, it will impact size of all hosts including singlefilehost where slimmer binaries are preferred: #12629.

@deeprobin
Copy link
Contributor Author

Then maybe an Optimization Level option would be useful. On the one hand the type optimized for binary size and on the other hand the one optimized for performance. The result can be achieved with preprocessor variables.

In Rust for example you can define this in the file Cargo.toml¹, in our case maybe in the Project.csproj file.

References

[1]: See profiles - https://doc.rust-lang.org/cargo/reference/profiles.html

@vitek-karas
Copy link
Member

As @am11 mentions size is pretty important, specifically for the host, since it ships with basically every .NET app (so even small increase adds up in some cases).

I definitely think it's worth to see what the perf gain would be versus the size regression.

Regarding "optimize for size" - we have the ability to do that in managed code (Trimming), but currently we don't have that ability in native code. This is mainly due to the fact that the SDK doesn't actually produce native executables directly, it uses precompiled executables. So we would have to ship two versions of the host, one for perf, one for size. We already have two (normal, single-file) so this would double the number of hosts...
All of this is doable, just relatively expensive.

In the case of the host there's additional tradeoff to think about - this code runs only during startup. So it's not just about CPU, it's also about IO and memory. All of these metrics should be as low as possible. The current hosts are definitely not perfect in this regard, but any changes will have to consider these aspects as well.

@hez2010
Copy link
Contributor

hez2010 commented Sep 28, 2021

Related: #28937

@TkTech
Copy link

TkTech commented Oct 1, 2021

If you're planning on including simdjson verbatim, rather than just the algorithms described in the paper, the size can be tweaked a decent bit with just build-time flags and minimal effort. You're likely only going to use the OnDemand or the DOM-based API, so the other can be removed. The runtime dispatch can be narrowed if Core requires, say, Haswell at a minimum then Westmere can be discarded. If at least SSE4 is required by other parts of Core, then the fallback parser can be removed.

I'm sure @lemire can offer some input here. Ultimately someone will need to try it to get realistic numbers since so much can be eliminated based off of how Core will use it.

@lemire
Copy link

lemire commented Oct 1, 2021

@TkTech For something like a runtime, it would make sense to build a trimmed down kernel version and it would not be hard.

@am11
Copy link
Member

am11 commented Oct 1, 2021

If at least SSE4 is required by other parts of Core, then the fallback parser can be removed.

Yes, this is the case.

Note that this issue is about corehost, the hosting layer where we use RapidJSON to parse two/three kinds of json files (deps, runtimeconfig, rid graph etc.). #28937 is tracking the actual runtime libraries part (wider audience).

@agocke agocke added this to AppModel Jul 1, 2022
@agocke agocke added this to the Future milestone Jul 1, 2022
@agocke agocke removed the untriaged New issue has not been triaged by the area owner label Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Host tenet-performance Performance related issue
Projects
Status: No status
Development

No branches or pull requests

8 participants