
convenient self-profile option? #50780

Closed

nikomatsakis opened this issue May 15, 2018 · 16 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@nikomatsakis
Contributor

It'd be really nice if there were an easy way to break down compilation time into a few categories, and to get a summary of reuse:

  • Parsing/expansion
  • Type-checking, borrow-checking
  • Codegen (aka trans)
  • LLVM + linking

Something that we could conveniently copy and paste into bug reports to get an overall sense of what is going on. This would replace and/or build on the existing profile options. (Even better would be if we could then have an option to get per-query data, later on.)

cc @nnethercote @michaelwoerister

@nikomatsakis nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance labels May 15, 2018
@nnethercote
Contributor

Excluding the reuse part, isn't this what -Ztime-passes is for?

I've been told that profiling queries are meant to replace -Ztime-passes, but when I followed the instructions at https://forge.rust-lang.org/profile-queries.html I couldn't get it to work -- it produced empty files.

Here are some ways that I think -Ztime-passes could be improved:

  • Fewer buckets. There are too many that take up almost no time, and just clutter the output.
  • Percentages. I don't like having to constantly do percentage math (this stage took 3.7s out of 11.1s) in my head. Parallelism complicates this.
  • The nesting is unclear. It took me ages to realize that the total for nested buckets shows up below them.
  • Parallelism messes up the ordering.

A lot of improvements could be made by recording data and then having a post-processing step, which could aggregate insignificant buckets, for example. I don't know whether that post-processing should happen within rustc itself or be a separate tool.

@nikomatsakis
Contributor Author

@nnethercote I think that "time-passes" and the profile-queries concepts probably ought to be merged, and renamed to something like -Zself-profile.

One problem with time-passes is that the compiler is moving towards a model where it doesn't have passes, but we're not there yet. This means that the implementation model of time-passes is pretty broken: it is based around some top-level calls, but in fact those calls can unreliably include a bunch of unrelated work. For example, invoking the MIR borrow checker may also construct MIR, but it's probably not a good idea to account for those the same way.

This is what profile-queries was meant to fix, but as you observed, the impl is broken. I think we should probably fix profile-queries and simplify it; for example, it doesn't seem like we've really had much need for the HTML-based output it produces, and we'd rather just get a flat list of which queries are taking up time. (And yes, it'd be nice to group those queries, because often we don't need that much detail.)

I think my preference would be that when we declare queries, we also declare a "phase" or something that is supposed to be used for a coarse-grained dump (and then you could use -Zfull-profile or something if you want the full details).

@wesleywiser
Member

I'd like to work on this.

@michaelwoerister
Member

It would be great if we had a solution that can deal well with parallelism, potentially also showing points of contention. We have the (possibly bitrotted) -Ztrans_time_graph option which generates a diagram that is rather crude but was indispensable for implementing the scheduler correctly. There was also some discussion about generalizing this and using a known output format that can be visualized in existing tools here: #43804

That being said, I don't think we should try to fix any of the existing solutions; we should build a new one instead. It should support at least a pure text summary format that can easily be collected by users and pasted into bug reports. An additional, more detailed output format that can be easily processed by other tools would be great too.

@nnethercote, are there any lessons to be learned from FF's profiling support? Or the valgrind-based tools?

Also cc the whole @rust-lang/wg-compiler-performance

@nnethercote
Contributor

Apologies for the slow reply.

Text-based output is indeed extremely convenient for pasting into bugs, etc., so I recommend it whenever it's appropriate. The simplest thing to do is just dump out the final text directly from rustc, as -Ztime-passes currently does, but that's inflexible.

The alternative is to write data to a file and then have a separate viewing tool. If the data has a tree-like structure (as -Ztime-passes already does) then the ability to expand/collapse subtrees is very useful. Firefox's about:memory is a good example of this. Here's some sample output:

397.53 MB (100.0%) -- explicit
├──108.74 MB (27.35%) ── heap-unclassified
├───63.97 MB (16.09%) -- js-non-window
│   ├──39.85 MB (10.02%) -- zones
│   │  ├──35.94 MB (09.04%) -- zone(0x114a80000)
│   │  │  ├──18.46 MB (04.64%) ++ (36 tiny)
│   │  │  └──17.48 MB (04.40%) -- realm([System Principal], shared JSM global)
│   │  │     ├───8.81 MB (02.22%) ++ classes
│   │  │     └───8.67 MB (02.18%) ++ (11 tiny)

It's all text so it can be copied and pasted easily. (The lines are Unicode light box-drawing chars \u2500, \u2502, \u2514, \u251c, which usually render nicely in terminals and text boxes.) But it's also interactive: when viewing it within Firefox you can click on any of those lines that have "++" or "--" to expand or collapse a sub-tree. Also, by default it only shows as expanded the sub-trees that exceed a certain threshold; this is good because you can collect arbitrarily detailed data (i.e. as many buckets, sub-buckets, sub-sub-buckets, etc., as you like) without it being overwhelming.

I would recommend using a web browser as your display tool. One possibility is to generate an HTML page (or pages) directly from rustc, as is done by profile-queries. Another possibility is to generate some other intermediate format (JSON?) and then write a webpage/website that reads that in, as is done by rustc-perf's site. The latter approach might be more flexible.

@wesleywiser
Member

I've got a very basic version of this here. Here's a sample of the output:

wesley@endurance:~/code/rust/rustc-serialize> RUSTFLAGS="-Zself-profile" cargo build
   Compiling rustc-serialize v0.3.24 (file:///home/wesley/code/rust/rustc-serialize)
Self profiling results for rustc_serialize:
Parsing         		 27             
Expansion       		 132            
TypeChecking    		 2199           
	1597268 hits 1666449 queries
BorrowChecking  		 150            
	10064 hits 14436 queries
Codegen         		 2098           
	44011 hits 54580 queries
Linking         		 17             
	4724 hits 5660 queries
Other           		 601            
	2389409 hits 2461645 queries

Optimization level: No
Incremental: on
    Finished dev [unoptimized + debuginfo] target(s) in 5.36 secs

It might be nice to have a category for "incremental overhead", such as hashing and serializing/deserializing the graph? Perhaps Parsing could also output the number of lines parsed to give a sense of the size of the crate? Are there any other rustc options that would be useful to include in the output?

@michaelwoerister
Member

That looks very cool! Some feedback:

  • It would be good to display the time unit (are these ms?).
  • Is hits cache hits? It might be more readable to display a ratio instead of absolute numbers.
  • Some other rustc options take the --verbose flag into account. This could be used here to control how much data this provides.
  • Having more data about incremental would be great. Some of the aggregate data from -Zincremental-info could also be shown here (like CGU re-use, red vs green nodes, etc).

@nikomatsakis
Contributor Author

This is awesome work @wesleywiser! 💯

I agree with @michaelwoerister's suggestion that we include ms or some other unit. I also think it'd be nice to display the data more "tabularly". Something like this:

| Phase                  | Time (ms) | Queries | Hits (%) |
| -----                  | --------- | ------- | -------- |
| Parsing                | 27        |         |          |
| Expansion              | 132       |         |          |
| TypeChecking           | 2199      | 1666449 | 95.8     |
| BorrowChecking         | 150       | 14436   | 69.7     |
| Codegen                | 2098      | 54580   | 80.6     |
| Linking                | 17        | 5660    | 83.5     |
| Other                  | 601       | 2461645 | 97.1     |

I've gone ahead and made it valid Markdown, too, so that it can easily be pasted for nice display =)


Finally, I think we should display the time as a percentage of the total for each phase.

As a last point, I think it'd be great if — to determine the total time — we do a single measurement from the "outside". This will help us to ensure we don't have "leakage" — that is, time that is not accounted for. Or at least it will give us an idea how much leakage there is.

@wesleywiser
Member

Thanks for the feedback @nikomatsakis and @michaelwoerister! Those are all great ideas and I'll start working on those.

@nnethercote
Contributor

So it looks like the output is being produced directly by rustc, is that right?

@wesleywiser
Member

@nnethercote Yes, that's correct.

@wesleywiser
Member

Progress update: I've implemented much of the above feedback. The output now looks like this:

wesley@endurance:~/code/rust/rustc-serialize> RUSTFLAGS="-Zself-profile -Zprofile-json" cargo build
   Compiling rustc-serialize v0.3.24 (file:///home/wesley/code/rust/rustc-serialize)
Self profiling results for rustc_serialize:

| Phase            | Time (ms)      | Queries        | Hits (%) |
| ---------------- | -------------- | -------------- | -------- |
| Parsing          | 27             |                |          |
| Expansion        | 132            |                |          |
| TypeChecking     | 2182           | 3321830        | 96.17    |
| BorrowChecking   | 144            | 28872          | 69.71    |
| Codegen          | 2065           | 110571         | 79.61    |
| Linking          | 17             | 11320          | 83.46    |
| Other            | 590            | 4922566        | 97.07    |

Optimization level: No
Incremental: on
    Finished dev [unoptimized + debuginfo] target(s) in 5.29 secs

In addition, -Zprofile-json causes a file containing the timing data to be created:

wesley@endurance:~/code/rust/rustc-serialize> cat self_profiler_results.json 
[
            {
                "category": "Parsing",
                "time_ms": 27
            },
            {
                "category": "Expansion",
                "time_ms": 132
            },
            {
                "category": "TypeChecking",
                "time_ms": 2182
            },
            {
                "category": "BorrowChecking",
                "time_ms": 144
            },
            {
                "category": "Codegen",
                "time_ms": 2065
            },
            {
                "category": "Linking",
                "time_ms": 17
            },
            {
                "category": "Other",
                "time_ms": 590
            }
        ]

@nnethercote
Contributor

One suggestion: the JSON file should contain enough data to exactly recreate the -Zself-profile output. Currently it appears to lack the information to recreate:

  • the "Self profiling results for rustc_serialize:" line;
  • the "Optimization level: No" line;
  • the "Incremental: on" line;
  • the "Queries" and "Hits" columns.

If it's possible to get the full command line that rustc was invoked with, I suggest including that in the JSON output too. That way if you end up with a bunch of different JSON files you have full data on how each one was created.

Finally, whenever you have coarse categories like this, the natural inclination is for users to want more detail. "Codegen takes 2065ms... can we break that down some more?" (I experienced this back in the early days of Firefox's about:memory.) This naturally takes us back to the idea of a tree-like structure, where you have categories, sub-categories, sub-sub-categories, etc.

@andjo403
Contributor

andjo403 commented Jun 6, 2018

A while ago I did some experiments with replacing time-passes with Chrome tracing events. I have updated my branch of rustc now and have some results posted on this page.

One sad thing is that the generated pages only work in Chrome.

For some of the events I include extra data describing the event, so it is possible to see why some parts take more time.

Maybe something similar could be used for the verbose mode.

@nikomatsakis
Contributor Author

@wesleywiser this looks really nice.

bors added a commit that referenced this issue Aug 3, 2018
Implement a self profiler

This is a work in progress implementation of #50780. I'd love feedback on the overall structure and code as well as some specific things:

- [The query categorization mechanism](https://github.com/rust-lang/rust/compare/master...wesleywiser:wip_profiling?expand=1#diff-19e0a69c10eff31eb2d16805e79f3437R101). This works but looks kind of ugly to me. Perhaps there's a better way?

- [The profiler assumes only one activity can run at a time](https://github.com/rust-lang/rust/compare/master...wesleywiser:wip_profiling?expand=1#diff-f8a403b2d88d873e4b27c097c614a236R177). This is obviously incompatible with the ongoing parallel queries.

- [The output code is just a bunch of `format!()`s](https://github.com/rust-lang/rust/compare/master...wesleywiser:wip_profiling?expand=1#diff-f8a403b2d88d873e4b27c097c614a236R91). Is there a better way to generate markdown or json in the compiler?

- [The query categorizations are likely wrong](https://github.com/rust-lang/rust/compare/master...wesleywiser:wip_profiling?expand=1#diff-19e0a69c10eff31eb2d16805e79f3437R101). I've marked what seemed obvious to me but I'm sure I got a lot of them wrong.

The overhead currently seems very low. Running `perf` on a sample compilation with profiling enabled reveals:
![image](https://user-images.githubusercontent.com/831192/41657821-9775efec-7462-11e8-9e5e-47ec94105d9d.png)
@wesleywiser wesleywiser self-assigned this Aug 15, 2018
@XAMPPRocky XAMPPRocky added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Oct 2, 2018
@wesleywiser
Member

I believe this can be closed now. We have the -Zself-profile option, tools to process that data and documentation on how to do so.
