Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add component type to dependencies to self #490

Merged
merged 19 commits into from
Dec 15, 2020

Conversation

codecholeric
Copy link
Collaborator

This is the reverse side of #428, i.e. it adds the component type of arrays to JavaClass.dependenciesToSelf. In other words it for the following example

class Target {}

class DependsOnComponentType {
    Target[] fieldDependency;
}

we will now see the dependency DependsOnComponentType -> Target for target.getDependenciesToSelf().

On the way I figured that the dependency resolution process, and in particular the reverse dependency resolution process, is meanwhile pretty convoluted. I have thus simplified this and at least on my machine increased the speed for the reverse dependency lookup by a factor 7 for ClassFileImporterSlowTest.importsClasspath().
In the end I figured that the process of adding new types of dependencies is pretty tedious, because we first add the "dependencies from self", then in a completely parallel process we add the "dependencies to self". With the refactoring now we will simply reuse the "dependencies from self" to create the "dependencies to self", thus making it a lot easier in the future to add both directions in one shot.

Issue: #257

@codecholeric codecholeric requested a review from hankem December 13, 2020 22:17
@codecholeric codecholeric force-pushed the add-component-type-to-dependencies-to-self branch 2 times, most recently from 64e849a to 4396ab6 Compare December 13, 2020 23:24
Copy link
Member

@hankem hankem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot claim (at all) that I had understood how all the AccessContext, TopProcess, CompletionProcess, MemberDependenciesByTarget worked before... 🙈
... but I definitely agree that the new concept of ReverseDependencies is much simpler! 👍
It would take me much more time to understand everything in detail, but IMO the Christmas release shouldn't be blocked by that.

Regarding the speedup: You were only looking at the reverse dependency lookup, right? (What did you measure exactly?)
For the entire import (or even execution) of the ClassFileImporterSlowTest, I didn't see a significant difference, nor for the class file import in some larger project.

for (JavaClass javaClass : classes) {
if (signaturePredicate.exists(javaClass, target)) {
matching.add(javaClass);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially wondering whether we could save some iterations by breaking the loop once we know that there are multiple matches:

Suggested change
}
if (matching.size() > 1) {
break;
}
}

But it seems that there usually aren't that many matches anyways?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this only occurs, if a method has been overridden several times in the hierarchy. E.g. if you have

class Base {
  void foo() {
    // some impl
  }
}

class Child extends Base {
  @Override
  void foo() {
    // also some impl
  }
}

class GrandChild extends Child {
  // not overridden here, otherwise it would be directly found
}

And then by accident you pick Base.foo() as the first candidate to find the target of a call to GrandChild.foo(). I'll still look into it 👍 (unfortunately the whole "start with any method in the hierarchy is pretty sub-optimal" 😞)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I'll leave it out for now. I could not find any noticeable difference in performance. And the problem is, that the benefit is not completely clear for me. We are exchanging a couple of unnecessary loops in a rare case against an operation we have to execute in every single loop. Take the extreme case that we for example have no single method that has been overridden multiple times in the whole set of imported classes, then we would actually even perform worse than if we don't do the check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree.
When I initially thought about it, I hadn't realized what containsExactlyOneMatch is actually used for...

*/
@PublicAPI(usage = ACCESS)
public Set<JavaAnnotation<?>> getAnnotationsWithParameterTypeOfSelf() {
return reverseDependencies.getAnnotationsWithParameterTypeOf(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the test coverage, this can – as well as getInstanceofChecksWithTypeOfSelfjust be simplified to:

return Collections.emptySet();

Maybe we should have some tests for those new methods...? 😜

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, now I remember that I wanted to add some additional test 🙈 In particular, because for the final state this method is not used anymore to create the reverse dependencies 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I added tests for the two new public API methods 😃

@codecholeric
Copy link
Collaborator Author

codecholeric commented Dec 15, 2020

Thanks a lot for looking into it!! Yes, this is probably one of the most complicated parts 😉 If you understand the code after better than before, I would say mission accomplished 😂 I mean, there is IMHO a really good test coverage, so I'm not too scared that I have destroyed anything fundamental with this refactoring. But there can always be some subtle corner cases I've overlooked.

Regarding performance, yes, the dependency and reverse dependency creation is a lazy process. So what I did to measure the difference is to add the following to ClassFileImporterSlowTest.imports_the_classpath()

for (JavaClass clazz : classes) {
    clazz.getDirectDependenciesToSelf();
}

If you don't do anything like that, then all the changed code is not executed anyway 😉

@codecholeric codecholeric force-pushed the add-component-type-to-dependencies-to-self branch from 4396ab6 to 4b6c645 Compare December 15, 2020 17:11
For component type dependencies we have overlooked to filter out self dependencies. Such dependencies occur in particular for the bytecode generated for enums. I have now encapsulated the actual dependency creation in a specific method and moved the check into that method to make it easier to remember this in the future. I have also added an argument validation into the constructor of `Dependency` as another safeguard.
Funny thing is, that this was clearly visible in the frozen rules store being modified, yet I did not realize it, when I reviewed the change.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
At the moment `isArray()` and `getComponentType()` can behave inconsistent, if participating classes are only imported partially. I.e. `isArray()` looks at the descriptor, which might say "this is an array type", while `getComponentType()` cannot resolve the component type, because we did not import the respective class. Since this is definitely a plausible case (even though we should probably automatically resolve these types similarly to call targets, type hierarchy or annotations), we should be more robust and only rely on the component type being present or not.
I considered switching `isArray()` to simply check, if the component type is present, but that just feels wrong, because then we could have types with name `Foo[]` which supposedly are no array.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
`ClassFileImporterTest` has meanwhile grown too much. It is hard to keep an overview of all the aspects covered and code analysis is sluggish. I have started to split it into several tests for different aspects, beginning with annotations.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
There is no need to check `getAllConstructors()` (i.e. also superclass constructors) for potential targets of constructor calls, because constructors are never inherited. All invocable constructors are declared directly within the class (at least on bytecode level).

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Methods like `getAllSubClasses()` are pretty expensive if called frequently (in particular for very basic classes, e.g. interfaces that are widely used or `java.lang.Object`). Since these methods are also used internally for the reverse dependency lookup, the reverse dependency lookup is 3x faster on my machine for the classes imported in `ClassFileImporterSlowTest`, than without caching.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
We have already checked that the call is not directly to `parent` before (because otherwise the respective member candidate would already have been an exact match). Thus it is futile to check if the call targets `parent` and we can leave it out from our search and remove `Sets.union(..)`.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
I feel this was originally over-generified. While it is possible to treat the resolution uniformly for field accesses, method calls and constructor calls, it completely hides the fact that we do not need any class hierarchy resolution complexity for constructor calls. Since constructor calls will always be directly to the target owner, i.e. it is impossible to call a super class constructor from the outside, the constructor is always declared within the class itself. I have thus decided to remove the polymorphism from `TargetInfo` and move the resolution logic into `AccessFactory`, where the complex hierarchy resolution now only needs to be applied to fields and methods.
The logic still feels overcomplicated and sub-optimal, e.g. we get all fields/methods of a class as a set and randomly look for a path from `target.desc` to `member`, and if there is EXACTLY ONE match in the path, we accept it. If we would sort the members according to the class hierarchy we would never get to the case that we find a match, but there is another field shadowing / method overriding in the hierarchy between `target` and `member`. However since this part is not completely trivial I decided to save this for a later point in time.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Replacing the overcomplicated `tryFind(.., nameMatching(quote(..)))` by a simple loop + `equals` check saves another 20-30% of the execution time on my machine (when running `ClassFileImporterSlowTest`)

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
For objects that are frequently used in sets and have a non-trivial hash code this has proven to make sense. `Dependency` is immutable, thus we can calculate the hash code once on construction instead of repeatedly calculating it on set operations.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
While I generally really like having local classes within the test method I think the disadvantages are too big here. Namely that we assert the line number, which means whenever e.g. any import in this test class changes, the test method will change. This would in turn cause constant changes on this test method that are completely unrelated to the task at hand.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
These methods do more than a simple "get", but they create the respective access objects. We should make this clear from the name to also make it clear that these methods should only be called once during the import process.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
The completion process seems to be more convoluted than necessary. Originally this was intended so that no further access to the domain objects would be necessary after `complete`, but to return an inner class with access to the state that will do the completion is quite hard to understand in hindsight. We will make a series of little refactorings to improve this and eventually get rid of `CompletionProcess`. As a first step the `ImportContext` can take over the reverse registering of accesses by their target owner type, instead of doing this within the completion process. We will keep all the method signatures as they are for now, since all those classes should eventually be obsolete.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
We can now get rid of `CompletionProcess`, since it does not contribute anything anymore.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
We can then remove the whole `AccessContext` within `DomainObjectCreationContext`.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
I felt the original test was a little hard to read and unterstand. I have split the test in three tests, i.e. one test per case (overridden method, shadowed field and default super constructor call).

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
I have decided to encapsulate all reverse dependencies, i.e. "dependencies to self", within a separate object only taking care of this aspect. As a first step I migrated field accesses to self, method calls to self and constructor calls to self into this object.
These are also the most difficult cases, because of inheritance. Take e.g.

```
class Base{
  Object field;
}

class Child extends Base {}

class Caller {
  void access(Child child) {
    child.field
  }
}
```

When `base.getFieldAccessesToSelf()` is invoked, we want to return that `Caller` actually accesses a field of `Base`. But this is not so easy, because in the byte code we have registered a field access to `Child.field`. To further complicate this, if we would only import `Caller` and `Child`, then ArchUnit would not even know how the accessed field is declared at all. This is why we have the concept of `FieldAccess`, which only with extra context dependent logic, on `resolve()` returns the actual `JavaField` that is accessed by the call (if resolvable from the context). In turn, if we want to know all the accesses to `Base.field`, we need to check all the accesses to any subclass, resolve each of these accesses and see if the actually resolved field matches the field we are interested in.
Since the resolution is not trivial, it has always been wrapped into a memoizing supplier to only be invoked, if users really are interested in reverse dependencies or the resolved access targets. I have replaced these memoizing suppliers by a single Guava `LoadingCache` without eviction policy, because I felt it is easier to understand and has the same effect in the end.
One more thing I simplified is that we have one clear point `reverseDependenciesCreation.registerDependenciesOf(clazz)` now, where for each class the reverse dependencies are registered. To me this feels easier to understand than to set up this reverse lookup very deep inside of `complete` methods for the concrete `JavaMembers` and hide the concrete reverse lookup within the import context.
I thought about unifying the cache loader used for the reverse lookup for method calls/field accesses to also look up constructor calls, but then decided against it. In the end constructor calls can just be looked up much easier, because there is no possible direct call to super class constructors. Also it sped up the reverse lookup about 5-10% on average on my machine to do the constructor lookup directly by full name.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
This will make it easier to understand how reverse dependencies work, because now there is one uniform place, in the very end, where we register all reverse dependencies of a class. I have thought about encapsulating `ReverseDependencies` into `JavaClassDependencies` to have an API similar to earlier, but since `JavaMember` meanwhile also uses `ReverseDependencies` directly, I refrained from it. I think it is still acceptable to have these two objects and another level of indirection (`JavaClassDependencies` delegating to `ReverseDependencies`) will certainly not make dependency resolution any faster either.
In any case the `ImportContext` is now simpler and I measured performance and could not detect any significant degradation (even though we now have additional loops to register the reverse dependencies that we did not have before).

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
Instead of manually adding component type reverse dependencies, we can simply reuse all dependencies from self for the reverse lookup. This does not only speed up the reverse lookup by about factor 2 on my machine, it also makes it trivial in the future to add reverse dependencies for any further new types of dependencies from self.

Signed-off-by: Peter Gafert <peter.gafert@tngtech.com>
@codecholeric codecholeric force-pushed the add-component-type-to-dependencies-to-self branch from 82701fa to 7f4b8cd Compare December 15, 2020 20:34
@codecholeric codecholeric requested a review from hankem December 15, 2020 20:45
@codecholeric codecholeric merged commit 99841dc into master Dec 15, 2020
@codecholeric codecholeric deleted the add-component-type-to-dependencies-to-self branch December 15, 2020 20:56
codecholeric added a commit that referenced this pull request Feb 21, 2021
…o-self

Add component type to dependencies to self
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants