Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update status periodically in smart terminals #2516

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

digit-google
Copy link
Contributor

Ensure that time-related NINJA_STATUS formatters such as %w as properly updated when running in a smart terminal.

This fixes issue #2515 by ensuring that Ninja can refresh its status even when only long commands are running.

  • Modify Subprocess::DoWork() to accept a timeout, and return an enum describing the exit condition.
  • Add Status::Refresh() and StatusPrinter::Refresh() to update the status on smart terminals.
  • Modify RealCommandRunner to call DoWork() with a timeout, and call status->Refresh() on expiration to trigger the update.

The end result is that the status is properly updated in smart terminals when a single long command is invoked (see the example build plan in #2515).

NOTE: This adds a new regression test in misc/output_test.py, however, the test is flaky since it is time-dependent, i.e. it runs successfully on a lightly loaded laptop, but fails frequently on Github CI. Hence it is disabled by default but can be run manually during development.

@jhasse
Copy link
Collaborator

jhasse commented Oct 31, 2024

Respect for the low-level timer implementations!

I'm actually working on a complete rewrite of Ninja's frontend that also fixes this. It does change a lot other stuff too though.

In order to have this working without an environment variable and to fix a few issues with ninja's frontend that require breaking changes, my plan would be to do this in ninja 2.0. This new v2 would live alongside ninja v1 so that we can test stuff and experiment with ideas (you mentioned some in #2515 (comment)). The new v2 would be heavier and shipped as a second binary (see #1499). I will open a PR for that in the next few days.

@digit-google
Copy link
Contributor Author

Interesting, would you care sharing your plans about this rewrite? This might have a very important impact on the Fuchsia fork, as we try very hard to stay close to upstream (i.e. I routinely rebase our stack of patchs on top of upstream Ninja to make them manageable without any merges).

Since we are sharing our intentions:

I didn't mention here is that for the Fuchsia fork, which also needs to implement a client/server mode, we refactored Ninja considerably to introduce an AsyncLoop abstraction which allows for waiting for different types of I/O events, and made SubprocessSet use it for both Posix and Win32. This also supports timers, which are used in our implementation of StatusTable, which is how we print running commands in smart terminals.

I didn't want to introduce such level of complexity here, and I don't think upstream needs the AsyncLoop interface at this point.

I have also another branch to implement StatusTable in upstream Ninja that uses a simpler version of the class (i.e. one which doesn't depend on AsyncLoop at all). It hasn't been uploaded as a PR yet, but works on top of this PR properly.

For the record, this has been used by my team for over a year, and has helped tremendously in identifying build commands that were bogging the build for no good reason, which led to several build performance improvements that would not have been possible with Ninja's current output. Which is why I think having such a feature in v1 would benefit

Finally, we are working to introduce in our own fork a new Status sub-class to generate a Build Event Protocol stream. The BEP originated in Bazel but has become a format supported by many other build / CI analytics tools, it is a binary protobuf with a well-known fixed schema that provides rich information about what's happening during the build, is easy to convert to other formats (in particular into perfetto traces), and doesn't have the problems of JSON (slow encoding + bad non-UTF8 support). We do not expect upstream to accept such a feature though, but any change to how the frontend works is important for us. Surprisingly it doesn't add much code to the source tree.

FWIW, the Android Ninja fork already has such a feature, except that they use their own custom protobuf schema, which includes far more metrics in the event stream, but we prefer to stick to BEP for compatibility. They have been using it for years and are very happy with it.

Finally, I am a little wary about delivery. A major rewrite like the one you describe would take a large amount of work, and I didn't have the feeling that you have a lot of time to work on this. How many months / years are you forecasting for such work? I am really curious to know.

@jhasse
Copy link
Collaborator

jhasse commented Oct 31, 2024

Interesting!

I think some kind of async event loop is a good idea. I've used Boost.Asio since I'm not good enough to implement that from scratch, but I guess Boost is banned at Google because NIH? I guess a hand-crafted solution is also much faster to compile, results in a smaller binary and can be tuned for performance.

Those extensions you have in your Fuchsia branch are very interesting! That's why I dig the idea of a second v2 binary / folder: I could basically merge PRs that only touch v2 right away without much considerations, because it would be marked as "beta" and I don't have to worry about breaking something for existing users.

These are the breaking changes I have in mind:

** IMPORTANT NOTE BEFORE PEOPLE START COMPLAINING: ** These are only ideas!

How many months / years are you forecasting for such work? I am really curious to know.

As that would be in parallel I would guess years? Because it's a second binary and the source would be in a v2 directory, I could quickly merge PRs that only touch on that code. My hope is that this encourages people to help out.

For the past years there were many good PRs that added useful functionality but weren't merged because there's always the risk of breaking something. I want to change that while keeping the stable ninja v1.

btw: My changes don't touch the existing code that much, nearly nothing at all. So you shouldn't have any problem rebasing the Fuchsia fork on top of them :)
I really should find the time to open the PR and show you my prototype ...

@digit-google digit-google force-pushed the periodic-status-updates branch 2 times, most recently from a8d5a9f to c36bdc2 Compare November 11, 2024 13:09
@digit-google digit-google force-pushed the periodic-status-updates branch from c36bdc2 to f236c88 Compare November 25, 2024 09:03
Allow the DoWork() method to return after a given timeout
has expired. The new method returns a WorkResult value which
is an enum that can take three values to indicate process
completion, user interruption or that the timeout has
expired.
Add a method to the abstract Status interface to refresh
the status after some time has passed. Implement it properly
in StatusPrinter.

+ Fix a bug in FormatProgressMessage() which ignored the
  value pass as argument (time_millis), and was using
  a member value instead (time_millis_). In practice, they
  were the same, but this is no longer true after this
  change.
Use the new Subprocess::DoWork(int64_t) method to wait for at
most one second before updating the status in the Ninja terminal.

NOTE: A new output_test.py is added to check this feature, but
      since it is time-dependent, it tends to fail on Github CI
      so is disabled by default. It is possible to run it manually
      though on a lightly-loaded machine.
@digit-google digit-google force-pushed the periodic-status-updates branch from f236c88 to b09f853 Compare January 10, 2025 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants