Performance: debugger source lookup becomes unusable #5367
I did an analysis. I pulled the latest changes from git up to commit 5853f25 and added a log point in the deferred SHA256 calculation:
Then I built, installed the freshly built plugins into my Eclipse and opened a bndrun file. The log point was triggered way over 40k times and the file had still not opened in the BND runfile editor. I was only able to capture the first 23337 calls to the log point in a log file because I had to briefly leave, and when I came back the scroll buffer had already overflowed... anyway... So from the 23337 times the log point was triggered, here are the top ten counts of how often a SHA256 digest was calculated for a given file. You see here the name of the file and how often, during the opening of a single BND runfile, the same hash was calculated over and over again. For most files, calculating the hash only takes a couple of milliseconds - but I also have some large files for which it takes several seconds.
@bjhargrave you wrote that even if the calculation is deferred now, it still needs to be done at some point - ok, fine - but does it have to be done 40+ times on the same file just for opening a runfile in the editor?
I have built myself a locally modified version with a little cache for the SHA256 hashes. It expires hashes that are older than 30 minutes or whose file modification timestamp on disk differs from the one recorded in the cache. It also records misses, file updates and hits - here is an example:
(Those are the numbers for opening my bndrun file in the bnd runfile editor directly after starting Eclipse.) Also, overall it feels quite a bit faster now... I still have to see if I notice any negative side effects of this cache.
You are using m2e, which uses Bndtools m2e to handle the bndrun support. The bndrun support depends upon OSGi repositories which are made using FileSetRepository. The FileSetRepository is then a collection of ResourceImpls whose equals method depends upon the SHA256 value. It is unclear to me from the information here how many times FileSetRepositories are being created. Each FSR will have distinct ResourceImpls which each need to compute the SHA256 value. I would need a small GitHub repo which demonstrates the problem to investigate further. Otherwise, I am just guessing at what the actual issue is and how to solve it.
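To make that concrete, here is a minimal, hypothetical sketch (class and field names are made up, this is not the actual bnd source) of why an equals method defined over a content hash forces the digest to be computed as soon as resources from two repositories are compared:

```java
import java.io.File;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.util.Arrays;

// Sketch of a resource whose identity is its content hash.
class ResourceSketch {
	private final File file;
	private volatile byte[] sha256; // computed lazily ("deferred")

	ResourceSketch(File file) {
		this.file = file;
	}

	byte[] sha256() throws Exception {
		byte[] digest = sha256;
		if (digest == null) {
			digest = MessageDigest.getInstance("SHA-256")
				.digest(Files.readAllBytes(file.toPath()));
			sha256 = digest; // cached within THIS instance only
		}
		return digest;
	}

	@Override
	public boolean equals(Object o) {
		try {
			// Comparing two resources resolves BOTH deferred digests. Since
			// each FileSetRepository holds its own instance for the same
			// file, every repository pays for the hash again.
			return o instanceof ResourceSketch
				&& Arrays.equals(sha256(), ((ResourceSketch) o).sha256());
		} catch (Exception e) {
			return false;
		}
	}

	@Override
	public int hashCode() {
		try {
			return Arrays.hashCode(sha256());
		} catch (Exception e) {
			return 0;
		}
	}
}
```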
Well, you can assume that there are a couple of hundred Eclipse projects in the workspace, each of which is an m2e module, each of which includes the bnd maven plugins to generate the OSGi metadata for the main artifact and for the test artifact, and which run the bnd resolver maven plugin to update the bndrun files. The m2e modules cluster into several groups that each inherit from a joint parent pom module. That parent pom usually contains a large dependency management section which manages all dependency versions for the entire reactor.
The set of dependencies of all of these projects is probably the ~1000 JAR files that end up in my cache (see comment above) - so if two of the repositories refer to a ResourceImpl that has the same path, it should usually be the case that the file also has the same hash - so information about hashes calculated for files referenced from multiple repositories should be centrally shareable, no?
This is the makeshift static cache I hacked together for the moment - not very well done, but apparently effective...
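The snippet itself is not reproduced above, so here is a minimal sketch of what a static cache with the described properties could look like (all names are made up; this is not the code actually used):

```java
import java.io.File;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Static SHA-256 cache: entries expire after 30 minutes or when the file's
// modification timestamp changes; hits, misses and file updates are counted.
final class Sha256Cache {
	private static final long EXPIRY_MS = 30L * 60L * 1000L;

	private record Entry(byte[] digest, long lastModified, long cachedAt) {}

	private static final ConcurrentHashMap<String, Entry> CACHE = new ConcurrentHashMap<>();
	private static final LongAdder HITS = new LongAdder();
	private static final LongAdder MISSES = new LongAdder();
	private static final LongAdder UPDATES = new LongAdder();

	static byte[] digest(File file) throws Exception {
		String key = file.getAbsolutePath();
		long lastModified = file.lastModified();
		long now = System.currentTimeMillis();

		Entry entry = CACHE.get(key);
		if (entry != null && entry.lastModified() == lastModified
			&& now - entry.cachedAt() < EXPIRY_MS) {
			HITS.increment();
			return entry.digest();
		}
		if (entry == null) {
			MISSES.increment();
		} else {
			UPDATES.increment(); // file changed on disk, or the entry expired
		}

		byte[] digest = MessageDigest.getInstance("SHA-256")
			.digest(Files.readAllBytes(file.toPath()));
		CACHE.put(key, new Entry(digest, lastModified, now));
		return digest;
	}
}
```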
Looking at the code, I note that there is a small bug...
I am not sure I see the value of the DeferredValue classes if we will have a cache of file->SHA values. Both together are overkill. I would want to rip out the DeferredValue change if we move to a SHA cache. Then ResourceBuilder.addFile would itself consult a new SHA cache class instead of directly using SHA256.digest.
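A sketch of that substitution (the real ResourceBuilder.addFile signature differs; Sha256Cache is the hypothetical cache class sketched earlier in this thread):

```java
// Inside a ResourceBuilder-like class: consult a shared cache instead of
// digesting the file directly on every call.
public void addFile(File file) throws Exception {
	byte[] sha256 = Sha256Cache.digest(file); // instead of SHA256.digest(file)
	// ... record the osgi.content capability using sha256, file size, URI ...
}
```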
I guess deferring the calculation until it is needed could still save effort in case the value is never needed at all. I did notice a performance improvement when you first put in the deferral. But at the time, I still had fewer Eclipse projects converted to OSGi bundles. The more projects I migrated, the slower it appeared to become again.
It should be easy for you to test removing the use of the DeferredValue class in the SHA value calculation while keeping your SHA cache.
I think instead of removing it, I might add counters that count the calls to addFile and to the actual SHA256 calculation. Btw, I tried writing log messages through an SLF4J logger in the cache code, but I could not find the output anywhere.
If you start Eclipse from the command line, then you should be able to see the console output. We use logback. See bnd/bndtools.core/bndtools.shared.bndrun, lines 11 to 15 and line 26, at commit 5853f25.
So here are some statistics - again, opening one of my bndrun files:
That gives 1055 times that the deferred SHA256 calculation was actually triggered.
... I tried running Eclipse from the terminal, and I do see some messages there, but not from the logger call I added to the cache code. So I used the debugger with the logpoint again ;)
You will probably need to edit the logback.xml to set the desired log level for your logger name.
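For reference, a minimal logback.xml fragment for that; the logger name below is an assumption - match it to the package of wherever the cache/logging code actually lives:

```xml
<configuration>
	<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
		<encoder>
			<pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
		</encoder>
	</appender>

	<!-- Assumed logger name; adjust to the package of your cache class. -->
	<logger name="aQute.bnd.osgi.resource" level="DEBUG"/>

	<root level="INFO">
		<appender-ref ref="STDOUT"/>
	</root>
</configuration>
```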
1055 is only 2% of the times addFile was called. So that does not seem to be large enough to justify retaining the complexity of the DeferredValue code in addition to adding the complexity of a SHA cache.
Still, 10 seconds of additional delay simply when trying to open a bndrun file in the editor is noticeable by the user... personally, I'd vote for keeping the deferral in.
You don't know it is a 10 sec delay. You just pulled a number out of thin air :-)
Well, if I look at my logpoint output, I see lots of times in the 5ms-10ms range for calculating the hash sums. As I said before, I have some files that take several seconds for the hash to be calculated, so the true average is probably higher than 10ms.
I think that 2% of the total time spent computing SHA hash values is a more accurate estimate, since you don't know whether the files whose SHA was not needed have expensive SHA calculations or not.
Here's a more accurate measurement. I took the first 1000 timings of SHA256 calculations from the run I logged earlier. These sum up to 17847 ms - that's ~17.8 ms on average per file.
FYI, here are some notes on logging in m2e/bndtools on Eclipse: https://github.com/bndtools/bnd/wiki/Eclipse-Logging-with-Logback
Here are the timings ;) a batch of 1000, times in ms. The longest in this batch was 1039 ms, but I have also seen longer calculations in the full run.
So if we assume 10 ms per SHA256 calculation and we had 1004 cache misses in the above run, that would be about 10 seconds with the deferral in place. If we removed the deferral, we'd get an additional 10 seconds - adding up to 20 seconds in total, twice the time compared to keeping the deferral.
Ok, what we do not know is whether the calls saved by the deferral would have been cache hits - in that case, the deferral would indeed be pointless.
So, new numbers from my little real-world scenario here:
That means the SHA256 calculation for ~100 files was saved in this case by the deferral. Assuming a conservative 10 ms per file, that would mean a slow-down of at least 1 second in this scenario if the deferral were removed.
The cache reduces the need to create new Resource objects, including SHA-256 computation, for unchanged files. Fixes bndtools#5367 Signed-off-by: BJ Hargrave <bj@hargrave.dev>
@bjhargrave thanks :) I'm building and trying it.
When I attach a debugger with a log point in ...
The cache reduces the need to process files to Resource objects, including SHA-256 computation, for unchanged files. Fixes bndtools#5367 Signed-off-by: BJ Hargrave <bj@hargrave.dev>
This is a kind-of follow-up issue to #5322
I am currently using a bnd 6.4.0-SNAPSHOT from the p2 update site.
When I try to debug, the source lookup has become unbearably slow. Checking what happens, I found the checksum calculation in the BND plugins to be the culprit again. The screenshot below shows who is calling the aQute...Digester.write() method - which is mainly ResourceImpl.equals(). I really wonder... is my setup so extraordinary with a couple of hundred Maven modules and bundles? Am I doing something wrong?