SuperPMI asm diffs and metrics improvement #52877
For non-numeric data we'd have to think harder about what aggregation would look like. I suppose we'd build a histogram and somehow compare base/diff histograms?

Also seems like there is a lot of ad-hoc counting stuff in the jit that perhaps we should try and fold into this somehow. So much so that I'm not quite sure where to start. Some of the existing counting also does aggregation across runs of the jit (eg min/max/average). I am wondering if we should just stop doing jit-side aggregation and always rely on an external processor, or if being able to see aggregation directly from the jit is useful and we should keep it.
This issue proposes adding a new JIT-EE API for the JIT to report "metrics" back to SuperPMI, and describes how that could be used by SuperPMI replay and asm diffs.
Currently, running SuperPMI asm diffs happens as follows:

1. Run `superpmi.py asmdiffs` to orchestrate the process.
2. `superpmi.py` invokes `superpmi.exe -a`.
   a. Each of the baseline and diff JITs compiles a method context and reports results back to the "VM" (or here, the SuperPMI host).
   b. `superpmi.exe` runs the "near differ" to determine if there is a binary diff. If so, the method context number is added to a list of method contexts with asm diffs.
3. `superpmi.py` iterates over the list of method context numbers, invoking both the baseline and diff JIT to generate textual .dasm (and, thus, compiling the functions again).
4. `superpmi.py` calls `jit-analyze` to analyze the results. It parses the textual output and generates improvement/regression information for each metric it is concerned with.

I propose a set of new JIT-EE methods:
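One possible shape, shown here only as a sketch (the class and method names are illustrative and are not existing JIT-EE interface members), is a pair of named overloads, one per metric value type:

```cpp
#include <cstdint>

// Hypothetical reporting surface; in the real proposal these methods would
// live on the JIT-EE interface rather than on a standalone class.
class IMetricsReporter
{
public:
    virtual ~IMetricsReporter() = default;

    // Named overloads, one per metric value type.
    virtual void reportMetric(const char* name, int64_t value) = 0;
    virtual void reportMetric(const char* name, double value)  = 0;
};
```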
If we don't want named overloading, we could add the metric type to the function name. Or, we could have something like:
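A rough sketch of that variant (again, all names here are hypothetical) would be a single method that carries an explicit type tag rather than relying on overload resolution:

```cpp
#include <cstdint>

// Hypothetical metric kinds plus a tagged value, so one method can carry
// either an integer or a floating-point metric.
enum class MetricKind
{
    Int64,
    Double,
};

struct MetricValue
{
    MetricKind kind;
    union
    {
        int64_t i64;
        double  dbl;
    };
};

class IMetricsReporterTagged
{
public:
    virtual ~IMetricsReporterTagged() = default;
    virtual void reportMetric(const char* name, const MetricValue& value) = 0;
};
```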
At the end of compilation, the JIT calls this API for each interesting metric it has computed. This should be the same set that `jit-analyze` will parse (or a superset), namely, "CodeSize, PerfScore, PrologSize, InstrCount, AllocSize, ExtraAllocBytes, DebugClauseCount, DebugVarCount" (we would also report hot/cold/data sizes).
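For example, a minimal sketch (assuming the hypothetical reportMetric overloads above; the helper function and the counters it takes are invented for illustration):

```cpp
#include <cstdint>

// Sketch: at the end of compiling a method, the JIT pushes each metric it
// computed through the hypothetical reportMetric overloads. The metric
// names mirror the ones jit-analyze already understands.
void reportJitMetrics(IMetricsReporter* jitInfo,
                      int64_t codeSize,
                      int64_t prologSize,
                      int64_t instrCount,
                      int64_t allocSize,
                      double  perfScore)
{
    jitInfo->reportMetric("CodeSize",   codeSize);
    jitInfo->reportMetric("PrologSize", prologSize);
    jitInfo->reportMetric("InstrCount", instrCount);
    jitInfo->reportMetric("AllocSize",  allocSize);
    jitInfo->reportMetric("PerfScore",  perfScore);
    // Hot/cold/data section sizes would be reported the same way.
}
```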
SuperPMI will save these in a new CompileResult table (or tables). The VM would implement this function as a no-op, and in fact, the JIT could optionally skip reporting them if SuperPMI is not the host.
For asm diffs, if there is a diff, instead of reporting just the method context number, superpmi.exe could also report the metrics. This could be a file in CSV form, with one column per metric (we might need to handle the case where different compilations report different metrics, in which case JSON/XML might be better?).
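A minimal sketch of what one row per diffed method context might look like in such a file (the column names and values here are purely illustrative):

```
MethodContext,CodeSize,PerfScore,PrologSize,InstrCount,AllocSize
1637,112,85.20,16,34,112
20211,57,310.75,8,19,57
```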
Then, jit-analyze could work directly on this set of data, instead of needing to parse the metrics out of the generated text asm files. This would make jit-analyze much faster. In fact, generating the asm files could be deferred until after jit-analyze is run (not before, as a prerequisite for running jit-analyze). For example, we could choose to only generate dasm files for the set of improved/regressed routines that jit-analyze reports, and not automatically generate dasm for all the method contexts with diffs. In the worst case, where every function has diffs, this could save an enormous amount of time, avoiding generating dasm files for thousands of functions which will never be examined.

We would need to make sure jit-analyze reports all the diffs that we as developers prefer to look at, including the smallest, simplest diffs, not just the most impactful ones. There would need to be a way to keep the current behavior of generating all diffs, or perhaps a way to take the list of diffs and generate dasm from them after the normal processing. Generating the .dasm files could also have a reasonable default cut-off (a maximum number of generated files), since generating the dasm is not required to gather the stats.
This depends on all the interesting metrics being available during "normal", non-disasm compilation, which should be true.
So, the new process would look like:

1. Run `superpmi.py asmdiffs` to orchestrate the process.
2. `superpmi.py` invokes `superpmi.exe -a`.
   a. Each of the baseline and diff JITs compiles a method context and reports results back to the SuperPMI host, including metrics.
   b. `superpmi.exe` runs the "near differ" to determine if there is a binary diff. If so, the method context number is added to a list of method contexts with asm diffs, including metrics.
3. `superpmi.py` calls `jit-analyze` to analyze the results. It reads the generated list of metrics for the diffs and generates improvement/regression information for each metric it is concerned with. It needs to generate a machine-readable summary of its results, including method context numbers, in addition to the text/markdown format (there is current provision for TSV/JSON output); see the sketch below.
4. `superpmi.py` iterates over the list of method context numbers with diffs that jit-analyze says are interesting, invoking both the baseline and diff JIT to generate textual .dasm for developers to examine.

The work for this should concurrently consider fixing #41653, by passing the summary of metrics available to superpmi.exe on to jit-analyze; in that case, by somehow providing complete aggregate code size (not just the total code size of functions with diffs) to jit-analyze.
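As a purely illustrative sketch (not an existing jit-analyze output format), the machine-readable summary mentioned in step 3 might look something like:

```
{
  "metric": "CodeSize",
  "base_total_bytes": 1234567,
  "diff_total_bytes": 1230042,
  "regressions":  [ { "method_context": 1637,  "base": 112, "diff": 124 } ],
  "improvements": [ { "method_context": 20211, "base": 57,  "diff": 49  } ]
}
```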
@dotnet/jit-contrib Comments?
category:eng-sys
theme:super-pmi
skill-level:expert
cost:large
impact:large