-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add component type to dependencies to self #490
Add component type to dependencies to self #490
Conversation
64e849a
to
4396ab6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot claim (at all) that I had understood how all the AccessContext
, TopProcess
, CompletionProcess
, MemberDependenciesByTarget
worked before... 🙈
... but I definitely agree that the new concept of ReverseDependencies
is much simpler! 👍
It would take me much more time to understand everything in detail, but IMO the Christmas release shouldn't be blocked by that.
Regarding the speedup: You were only looking at the reverse dependency lookup, right? (What did you measure exactly?)
For the entire import (or even execution) of the ClassFileImporterSlowTest
, I didn't see a significant difference, nor for the class file import in some larger project.
archunit/src/main/java/com/tngtech/archunit/core/importer/AccessRecord.java
Show resolved
Hide resolved
archunit/src/main/java/com/tngtech/archunit/core/domain/ReverseDependencies.java
Outdated
Show resolved
Hide resolved
for (JavaClass javaClass : classes) { | ||
if (signaturePredicate.exists(javaClass, target)) { | ||
matching.add(javaClass); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially wondering whether we could save some iterations by breaking the loop once we know that there are multiple matches:
} | |
if (matching.size() > 1) { | |
break; | |
} | |
} |
But it seems that there usually aren't that many matches anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this only occurs, if a method has been overridden several times in the hierarchy. E.g. if you have
class Base {
void foo() {
// some impl
}
}
class Child extends Base {
@Override
void foo() {
// also some impl
}
}
class GrandChild extends Child {
// not overridden here, otherwise it would be directly found
}
And then by accident you pick Base.foo()
as the first candidate to find the target of a call to GrandChild.foo()
. I'll still look into it 👍 (unfortunately the whole "start with any method in the hierarchy is pretty sub-optimal" 😞)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I'll leave it out for now. I could not find any noticeable difference in performance. And the problem is, that the benefit is not completely clear for me. We are exchanging a couple of unnecessary loops in a rare case against an operation we have to execute in every single loop. Take the extreme case that we for example have no single method that has been overridden multiple times in the whole set of imported classes, then we would actually even perform worse than if we don't do the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I absolutely agree.
When I initially thought about it, I hadn't realized what containsExactlyOneMatch
is actually used for...
*/ | ||
@PublicAPI(usage = ACCESS) | ||
public Set<JavaAnnotation<?>> getAnnotationsWithParameterTypeOfSelf() { | ||
return reverseDependencies.getAnnotationsWithParameterTypeOf(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the test coverage, this can – as well as getInstanceofChecksWithTypeOfSelf
– just be simplified to:
return Collections.emptySet();
Maybe we should have some tests for those new methods...? 😜
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, now I remember that I wanted to add some additional test 🙈 In particular, because for the final state this method is not used anymore to create the reverse dependencies 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I added tests for the two new public API methods 😃
Thanks a lot for looking into it!! Yes, this is probably one of the most complicated parts 😉 If you understand the code after better than before, I would say mission accomplished 😂 I mean, there is IMHO a really good test coverage, so I'm not too scared that I have destroyed anything fundamental with this refactoring. But there can always be some subtle corner cases I've overlooked. Regarding performance, yes, the dependency and reverse dependency creation is a lazy process. So what I did to measure the difference is to add the following to
If you don't do anything like that, then all the changed code is not executed anyway 😉 |
4396ab6
to
4b6c645
Compare
For component type dependencies we have overlooked to filter out self dependencies. Such dependencies occur in particular for the bytecode generated for enums. I have now encapsulated the actual dependency creation in a specific method and moved the check into that method to make it easier to remember this in the future. I have also added an argument validation into the constructor of `Dependency` as another safeguard. Funny thing is, that this was clearly visible in the frozen rules store being modified, yet I did not realize it, when I reviewed the change. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
At the moment `isArray()` and `getComponentType()` can behave inconsistent, if participating classes are only imported partially. I.e. `isArray()` looks at the descriptor, which might say "this is an array type", while `getComponentType()` cannot resolve the component type, because we did not import the respective class. Since this is definitely a plausible case (even though we should probably automatically resolve these types similarly to call targets, type hierarchy or annotations), we should be more robust and only rely on the component type being present or not. I considered switching `isArray()` to simply check, if the component type is present, but that just feels wrong, because then we could have types with name `Foo[]` which supposedly are no array. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
`ClassFileImporterTest` has meanwhile grown too much. It is hard to keep an overview of all the aspects covered and code analysis is sluggish. I have started to split it into several tests for different aspects, beginning with annotations. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
There is no need to check `getAllConstructors()` (i.e. also superclass constructors) for potential targets of constructor calls, because constructors are never inherited. All invocable constructors are declared directly within the class (at least on bytecode level). Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Methods like `getAllSubClasses()` are pretty expensive if called frequently (in particular for very basic classes, e.g. interfaces that are widely used or `java.lang.Object`). Since these methods are also used internally for the reverse dependency lookup, the reverse dependency lookup is 3x faster on my machine for the classes imported in `ClassFileImporterSlowTest`, than without caching. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
We have already checked that the call is not directly to `parent` before (because otherwise the respective member candidate would already have been an exact match). Thus it is futile to check if the call targets `parent` and we can leave it out from our search and remove `Sets.union(..)`. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
I feel this was originally over-generified. While it is possible to treat the resolution uniformly for field accesses, method calls and constructor calls, it completely hides the fact that we do not need any class hierarchy resolution complexity for constructor calls. Since constructor calls will always be directly to the target owner, i.e. it is impossible to call a super class constructor from the outside, the constructor is always declared within the class itself. I have thus decided to remove the polymorphism from `TargetInfo` and move the resolution logic into `AccessFactory`, where the complex hierarchy resolution now only needs to be applied to fields and methods. The logic still feels overcomplicated and sub-optimal, e.g. we get all fields/methods of a class as a set and randomly look for a path from `target.desc` to `member`, and if there is EXACTLY ONE match in the path, we accept it. If we would sort the members according to the class hierarchy we would never get to the case that we find a match, but there is another field shadowing / method overriding in the hierarchy between `target` and `member`. However since this part is not completely trivial I decided to save this for a later point in time. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Replacing the overcomplicated `tryFind(.., nameMatching(quote(..)))` by a simple loop + `equals` check saves another 20-30% of the execution time on my machine (when running `ClassFileImporterSlowTest`) Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
For objects that are frequently used in sets and have a non-trivial hash code this has proven to make sense. `Dependency` is immutable, thus we can calculate the hash code once on construction instead of repeatedly calculating it on set operations. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
While I generally really like having local classes within the test method I think the disadvantages are too big here. Namely that we assert the line number, which means whenever e.g. any import in this test class changes, the test method will change. This would in turn cause constant changes on this test method that are completely unrelated to the task at hand. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
These methods do more than a simple "get", but they create the respective access objects. We should make this clear from the name to also make it clear that these methods should only be called once during the import process. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
The completion process seems to be more convoluted than necessary. Originally this was intended so that no further access to the domain objects would be necessary after `complete`, but to return an inner class with access to the state that will do the completion is quite hard to understand in hindsight. We will make a series of little refactorings to improve this and eventually get rid of `CompletionProcess`. As a first step the `ImportContext` can take over the reverse registering of accesses by their target owner type, instead of doing this within the completion process. We will keep all the method signatures as they are for now, since all those classes should eventually be obsolete. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
We can now get rid of `CompletionProcess`, since it does not contribute anything anymore. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
We can then remove the whole `AccessContext` within `DomainObjectCreationContext`. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
I felt the original test was a little hard to read and unterstand. I have split the test in three tests, i.e. one test per case (overridden method, shadowed field and default super constructor call). Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
I have decided to encapsulate all reverse dependencies, i.e. "dependencies to self", within a separate object only taking care of this aspect. As a first step I migrated field accesses to self, method calls to self and constructor calls to self into this object. These are also the most difficult cases, because of inheritance. Take e.g. ``` class Base{ Object field; } class Child extends Base {} class Caller { void access(Child child) { child.field } } ``` When `base.getFieldAccessesToSelf()` is invoked, we want to return that `Caller` actually accesses a field of `Base`. But this is not so easy, because in the byte code we have registered a field access to `Child.field`. To further complicate this, if we would only import `Caller` and `Child`, then ArchUnit would not even know how the accessed field is declared at all. This is why we have the concept of `FieldAccess`, which only with extra context dependent logic, on `resolve()` returns the actual `JavaField` that is accessed by the call (if resolvable from the context). In turn, if we want to know all the accesses to `Base.field`, we need to check all the accesses to any subclass, resolve each of these accesses and see if the actually resolved field matches the field we are interested in. Since the resolution is not trivial, it has always been wrapped into a memoizing supplier to only be invoked, if users really are interested in reverse dependencies or the resolved access targets. I have replaced these memoizing suppliers by a single Guava `LoadingCache` without eviction policy, because I felt it is easier to understand and has the same effect in the end. One more thing I simplified is that we have one clear point `reverseDependenciesCreation.registerDependenciesOf(clazz)` now, where for each class the reverse dependencies are registered. To me this feels easier to understand than to set up this reverse lookup very deep inside of `complete` methods for the concrete `JavaMembers` and hide the concrete reverse lookup within the import context. I thought about unifying the cache loader used for the reverse lookup for method calls/field accesses to also look up constructor calls, but then decided against it. In the end constructor calls can just be looked up much easier, because there is no possible direct call to super class constructors. Also it sped up the reverse lookup about 5-10% on average on my machine to do the constructor lookup directly by full name. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
This will make it easier to understand how reverse dependencies work, because now there is one uniform place, in the very end, where we register all reverse dependencies of a class. I have thought about encapsulating `ReverseDependencies` into `JavaClassDependencies` to have an API similar to earlier, but since `JavaMember` meanwhile also uses `ReverseDependencies` directly, I refrained from it. I think it is still acceptable to have these two objects and another level of indirection (`JavaClassDependencies` delegating to `ReverseDependencies`) will certainly not make dependency resolution any faster either. In any case the `ImportContext` is now simpler and I measured performance and could not detect any significant degradation (even though we now have additional loops to register the reverse dependencies that we did not have before). Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Instead of manually adding component type reverse dependencies, we can simply reuse all dependencies from self for the reverse lookup. This does not only speed up the reverse lookup by about factor 2 on my machine, it also makes it trivial in the future to add reverse dependencies for any further new types of dependencies from self. Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
82701fa
to
7f4b8cd
Compare
…o-self Add component type to dependencies to self
This is the reverse side of #428, i.e. it adds the component type of arrays to
JavaClass.dependenciesToSelf
. In other words it for the following examplewe will now see the dependency
DependsOnComponentType -> Target
fortarget.getDependenciesToSelf()
.On the way I figured that the dependency resolution process, and in particular the reverse dependency resolution process, is meanwhile pretty convoluted. I have thus simplified this and at least on my machine increased the speed for the reverse dependency lookup by a factor 7 for
ClassFileImporterSlowTest.importsClasspath()
.In the end I figured that the process of adding new types of dependencies is pretty tedious, because we first add the "dependencies from self", then in a completely parallel process we add the "dependencies to self". With the refactoring now we will simply reuse the "dependencies from self" to create the "dependencies to self", thus making it a lot easier in the future to add both directions in one shot.
Issue: #257