speed up precommit [LUCENE-9861] #10900

Closed
asfimport opened this issue Mar 22, 2021 · 22 comments

@asfimport

A lot of the java tools for precommit aren't being called in efficient ways (compilation, linting, etc).

For example, ecjLint runs very slowly:

Aggregate task times (possibly running in parallel!):
 271.73 sec.  ecjLintMain
 270.18 sec.  ecjLintTest
 227.17 sec.  compileJava
  12.07 sec.  compileTestJava
   1.21 sec.  processResources
   0.18 sec.  clean

Simply adding a couple of reasonable JVM arguments to the ecj linter (jvmArgs = [ '-XX:+UseParallelGC', '-XX:TieredStopAtLevel=1' ]) speeds it up significantly.

Speedup for ecjLint is 3x for me:

Aggregate task times (possibly running in parallel!):
 163.38 sec.  compileJava
  84.57 sec.  ecjLintMain
  83.12 sec.  ecjLintTest
   6.11 sec.  compileTestJava
   0.95 sec.  processResources
   0.15 sec.  clean

I imagine the same may be true for a lot of these tasks. We're currently tossing default JVM args at these short-lived subprocesses, which is very suboptimal.
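
As a rough illustration of the idea above, the sketch below assembles a command line for a short-lived forked tool JVM using the two flags quoted in this issue. The class name `ForkedToolCommand`, the helper `build`, and the `ecj.jar` and `src` arguments are hypothetical placeholders; the actual build configures this through Gradle rather than through `ProcessBuilder`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds a command line for a short-lived tool JVM. */
final class ForkedToolCommand {
  // Flags quoted in this issue: parallel GC, and stop JIT tiering at C1 (no C2).
  static final List<String> SHORT_LIVED_JVM_ARGS =
      List.of("-XX:+UseParallelGC", "-XX:TieredStopAtLevel=1");

  /** Returns the java executable, the tuned JVM flags, then the tool's own arguments. */
  static List<String> build(String javaExecutable, List<String> toolArgs) {
    List<String> command = new ArrayList<>();
    command.add(javaExecutable);
    command.addAll(SHORT_LIVED_JVM_ARGS);
    command.addAll(toolArgs);
    return command;
  }

  public static void main(String[] args) {
    // "ecj.jar" and "src" are placeholders for illustration only.
    List<String> command = build("java", List.of("-jar", "ecj.jar", "src"));
    System.out.println(String.join(" ", command));
    // A real caller could pass the list to new ProcessBuilder(command).
  }
}
```

The JVM flags must come before `-jar` so that they are read by the launcher and not handed to the tool as program arguments.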


Migrated from LUCENE-9861 by Robert Muir (@rmuir), resolved Mar 23 2021
Attachments: LUCENE-9861_hack.patch
Linked issues:

Pull requests: #33

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

For renderJavadoc this is also a 2x speedup:
Before:

Aggregate task times (possibly running in parallel!):
 357.87 sec.  renderJavadoc
 259.61 sec.  compileJava
   1.05 sec.  processResources
   0.28 sec.  clean

After:

Aggregate task times (possibly running in parallel!):
 182.77 sec.  renderJavadoc
 166.17 sec.  compileJava
   0.94 sec.  processResources
   0.16 sec.  clean

I haven't yet attacked javac; the tasks there get faster only because C2 compiler threads from javadoc aren't eating up all my CPU and slowing them down.

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

My current hack patch. Still want to investigate javac and any other bad guys that are slow during precommit.

Also I think we need not hardcode the args? I already have this stuff set correctly in org.gradle.jvmargs, should we just plumb those to these subprocesses?

It will be a little annoying for some things such as javadoc as we'll need to put -J in front of each arg. cc: @dweiss
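
The `-J` plumbing mentioned above could look roughly like this. The sketch splits a whitespace-separated JVM argument string (such as one might read from `org.gradle.jvmargs`) and prefixes each entry with `-J`, which is how the `javadoc` launcher forwards options to its own JVM. `JavadocJvmArgs` and `toJavadocOptions` are hypothetical names, and the naive whitespace split assumes no quoted arguments containing spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: adapts plain JVM arguments for the javadoc launcher. */
final class JavadocJvmArgs {
  /** Splits on whitespace and prefixes every argument with -J. */
  static List<String> toJavadocOptions(String jvmArgs) {
    return Arrays.stream(jvmArgs.trim().split("\\s+"))
        .filter(arg -> !arg.isEmpty()) // an empty input yields one empty token
        .map(arg -> "-J" + arg)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Prints: [-J-XX:+UseParallelGC, -J-XX:TieredStopAtLevel=1]
    System.out.println(toJavadocOptions("-XX:+UseParallelGC -XX:TieredStopAtLevel=1"));
  }
}
```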

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

simple bench of current hack patch

before:

$ ./gradlew clean precommit
BUILD SUCCESSFUL in 6m 20s
581 actionable tasks: 514 executed, 67 up-to-date

after:

$ ./gradlew clean precommit
BUILD SUCCESSFUL in 3m 55s
581 actionable tasks: 514 executed, 67 up-to-date

For reference, here are the task times from the output:

Aggregate task times (possibly running in parallel!):
 194.10 sec.  compileTestJava
 185.43 sec.  renderJavadoc
 184.49 sec.  compileJava
  85.09 sec.  ecjLintMain
  81.87 sec.  spotlessJava
  78.61 sec.  ecjLintTest
  29.43 sec.  validateSourcePatterns
   7.77 sec.  rat
   7.41 sec.  jar
   3.07 sec.  compileToolsJava
   2.66 sec.  ecjLintTools
   1.52 sec.  verifyLocks
   1.28 sec.  processResources
   0.86 sec.  copyTestResources
   0.75 sec.  gitStatus
   0.75 sec.  validateJarChecksums
   0.44 sec.  collectJarInfos
   0.15 sec.  spotlessInternalRegisterDependencies
   0.12 sec.  clean
   0.09 sec.  validateJarLicenses
   0.07 sec.  processTestResources
   0.05 sec.  syncConf
   0.04 sec.  spotlessJavaCheck
   0.02 sec.  checkDanglingLicenseFiles
   0.01 sec.  checkWorkingCopyClean
   0.01 sec.  versionsPropsAreSorted

So I still need to investigate javac, spotlessJava, and validateSourcePatterns
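
For a quick sense of scale, the ratios implied by the numbers reported so far in this thread can be computed as below. The class `Speedup` and its methods are hypothetical; the inputs are the ecjLintMain and renderJavadoc task times and the 6m 20s and 3m 55s wall times quoted above.

```java
import java.util.Locale;

/** Hypothetical helper: computes before/after speedup ratios. */
final class Speedup {
  static double ratio(double beforeSeconds, double afterSeconds) {
    return beforeSeconds / afterSeconds;
  }

  static String format(String label, double beforeSeconds, double afterSeconds) {
    return String.format(Locale.ROOT, "%s: %.2fx", label, ratio(beforeSeconds, afterSeconds));
  }

  public static void main(String[] args) {
    System.out.println(format("ecjLintMain", 271.73, 84.57));    // ecjLintMain: 3.21x
    System.out.println(format("renderJavadoc", 357.87, 182.77)); // renderJavadoc: 1.96x
    // 6m 20s = 380 s before, 3m 55s = 235 s after
    System.out.println(format("clean precommit", 380, 235));     // clean precommit: 1.62x
  }
}
```

The task times are aggregates over tasks that may run in parallel, so only the last line reflects elapsed wall time.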

@asfimport

Dawid Weiss (@dweiss) (migrated from JIRA)

Whaaat? :))

You should really be looking at all dependencies of the "check -x test" task (checks without tests). Precommit is just a subset of those checks I typically run.

As for javadoc - I had this crazy idea that it'd be great to combine multiple doclets into a single pass over the sources. Javadoc is repeating the job of javac (and multiple javadoc passes are doing it multiple times). It'd be great to reuse the compile tree somehow. Don't know how to do it but it sounds super hacky... I mean, exciting.

@asfimport

Dawid Weiss (@dweiss) (migrated from JIRA)

I did take a look at the patch... oh, man. There is a lovely way to pull it up - let me give it a try in an hour or so.

@asfimport

Uwe Schindler (@uschindler) (migrated from JIRA)

I would also really like to look into not forking a subprocess. Javac runs perfectly fine inside the Gradle JVM, so why do we need to fork for each ECJ invocation? In Ant, ECJ used to run inside the Compiler Adapter of Ant and runs in-process.
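
To make the in-process option concrete, here is a minimal sketch that runs `javac` inside the current JVM through the JDK's `java.util.spi.ToolProvider` API (Java 9+). It is an illustration only, not how the Lucene build invokes ECJ; ECJ has its own entry points, which are not shown here. The class `InProcessTool` and the `-1` "tool not found" convention are hypothetical.

```java
import java.util.Optional;
import java.util.spi.ToolProvider;

/** Hypothetical helper: runs a JDK tool inside the current JVM instead of forking. */
final class InProcessTool {
  /** Returns the tool's exit code, or -1 if this runtime does not provide the tool. */
  static int run(String toolName, String... args) {
    Optional<ToolProvider> tool = ToolProvider.findFirst(toolName);
    if (!tool.isPresent()) {
      return -1; // for example, a runtime image without the jdk.compiler module
    }
    // No new process is started: the tool reuses this JVM's loaded classes and JIT state.
    return tool.get().run(System.out, System.err, args);
  }

  public static void main(String[] args) {
    int exitCode = run("javac", "--version");
    System.out.println("exit code: " + exitCode);
  }
}
```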

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

Not forking is fine too. But forking with C2 enabled in order to run for only 5-20 seconds is where things get bad (that's what we are doing today).

@asfimport

Uwe Schindler (@uschindler) (migrated from JIRA)

Yes, especially also renderJavadoc should use the better JVM options. RenderJavadoc can't be in-process, as javadoc calls System.exit().

But for stuff that works perfectly fine in the Gradle JVM, disabling forking also helps because it can run for a very long time, so the class loading cost goes away, the optimizer can optimize, and the Gradle daemon can keep it alive.

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

Also the "-times" can be removed from the patch. I was using it for debugging the ecj speed, but it would be noise to most people.

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

"But for stuff that works perfectly fine in the Gradle JVM, disabling forking also helps because it can run for a very long time, so the class loading cost goes away, the optimizer can optimize, and the Gradle daemon can keep it alive."

In my tests it is better to run gradle's JVM with C1-only, too... even though it is longer lived. Maybe because of all the groovy and stuff, I'm not sure. Note: I don't use the daemon, but we are still talking about a process running for minutes and still C2 only slows things down.

@asfimport

Dawid Weiss (@dweiss) (migrated from JIRA)

I've refactored this into a somewhat more shared logic (see the PR).

@asfimport

Dawid Weiss (@dweiss) (migrated from JIRA)

The problem with running javadoc in-process is that I'm slightly worried about heap... we already have it quite high for Gradle. Also, more importantly, javadoc is forked for the target JVM selected for compilation/tests - this would be inconsistent if we used Gradle's VM.

@asfimport

Dawid Weiss (@dweiss) (migrated from JIRA)

My results. After (sorry, it was the other way around):

Aggregate task times (possibly running in parallel!):
  77.53 sec.  renderSiteJavadoc
  76.72 sec.  renderJavadoc
  75.67 sec.  compileJava
  60.92 sec.  compileTestJava
  34.16 sec.  ecjLintMain
  30.85 sec.  ecjLintTest
  29.53 sec.  checkBrokenLinks
  29.18 sec.  clean
  24.42 sec.  spotlessJava
...
BUILD SUCCESSFUL in 1m 47s

Before:

 136.29 sec.  renderJavadoc
 115.73 sec.  renderSiteJavadoc
  93.04 sec.  compileJava
  76.30 sec.  compileTestJava
  52.09 sec.  ecjLintMain
  48.96 sec.  ecjLintTest
  41.54 sec.  spotlessJava
  30.09 sec.  clean
  29.71 sec.  checkBrokenLinks
  12.31 sec.  rat
  10.65 sec.  validateSourcePatterns
   7.58 sec.  checkUnusedConstraints
...
BUILD SUCCESSFUL in 2m 12s

The wall-time difference isn't that big, but this is an AMD Ryzen machine with lots of cores.

@asfimport

Uwe Schindler (@uschindler) (migrated from JIRA)

Looks slower after? 🤔

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

If you have 87 cores, you can afford to spend a bunch of unused ones compiling c2 code to make the build go a few seconds faster.

If you only have limited cores like me, the computer is fully maxed out when you run the build: you have 100% CPU usage. So cycles wasted on C2 add whole minutes to the build.

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

Another way to look at it: if you see that, you've set Gradle parallelism too low and you have unused cores. Better to increase it so you are doing 64 of these things at a time than to only do 16 at a time and burn 48 cores on C2 :)

@asfimport

Dawid Weiss (@dweiss) (migrated from JIRA)

The improvement is clearly seen on individual tasks so I'm +1 for committing this in. I agree my work computer isn't representative - it's a beast (I love it).

@asfimport

Uwe Schindler (@uschindler) (migrated from JIRA)

+1 for committing

@asfimport

ASF subversion and git services (migrated from JIRA)

Commit 078d007 in lucene's branch refs/heads/main from Dawid Weiss
https://gitbox.apache.org/repos/asf?p=lucene.git;h=078d007

LUCENE-9861: pull tuned vm options into a separate aspect. (#33)

@asfimport

Robert Muir (@rmuir) (migrated from JIRA)

Thank you @dweiss, this one really helps my computer.

@asfimport

ASF subversion and git services (migrated from JIRA)

Commit 078d007 in lucene's branch refs/heads/jira/LUCENE-9856-static-analysis from Dawid Weiss
https://gitbox.apache.org/repos/asf?p=lucene.git;h=078d007

LUCENE-9861: pull tuned vm options into a separate aspect. (#33)

@asfimport

Adrien Grand (@jpountz) (migrated from JIRA)

Closing after the 9.0.0 release
