-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify DeclaredDependency and fix typing{-,_}extensions #193
Conversation
After talking about this in today's status meeting, I will make a couple of commits that adds the mapping information back into the |
98d196e
to
d97b6ad
Compare
Ok, this is ready for a new round of reviews now. The only substantive change (in addition to rebasing) is adding two commits at the end that expose the dep-to-import mappings in the Here is an abbreviated look at what that looks like in out JSON output (running {
"settings": { ... },
"imports": [ ... ],
"declared_deps": [ ... ],
"resolved_deps": {
"importlib_metadata": {
"package_name": "importlib-metadata",
"import_names": ["importlib_metadata"],
"mappings": ["local_env"]
},
...
"typing-extensions": {
"package_name": "typing_extensions",
"import_names": ["typing_extensions"],
"mappings": ["local_env"]
},
"pytest": {
"package_name": "pytest",
"import_names": ["_pytest", "py", "pytest"],
"mappings": ["local_env"]
},
...
"scipy": {
"package_name": "scipy",
"import_names": ["scipy"],
"mappings": ["identity"]
},
"numpy": {
"package_name": "numpy",
"import_names": ["numpy"],
"mappings": ["identity"]
}
},
"undeclared_deps": [ ... ],
"unused_deps": [ ... ],
"version": "0.4.1"
} |
As Maria suggests in [1], knowing which import name came from which mapping can be very useful when debugging. We achieve this by combining .mappings and .import_names into a dict that associates each mapping with the import names it contributes. However, although now redundant, we still want to hang on to the flattened .import_names set for checking if each declared dep is used. Therefore we assign the new dict to .mappings and keep the redundant .import_names, except that we hide it from our JSON output. [1]: #193 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very clear refactor and a good idea to create a Package
class to govern mappings and to simplify the UnusedDependency
and UndeclaredDepencency
.
What I would love to see more of (and my main concern here) are tests for Package
class methods. They are tested indirectly in other places, but it would fit to include them in test_types.py
.
Apart from that, more of a nit-pick, is that LocalPackageLookup
is now more related to Packages
than to dependencies and I would see it separate from check.py
and gathered in one module with Package
.
Agreed. Will add some |
LOCAL_ENV is more descriptive for describing the source of this mapping from package name to import names: we query the current/local Python environment with importlib_metadata.packages_distributions() to get this mapping.
This moves around a lot of code, but the ultimate behavior is unchanged (no tests are changed in a meaningful way): - We introduce a new Package class whose sole purpose is to encapsulate the mapping from package (i.e. dependency) names to an associated set of import names. This mapping will replace the similar mapping that lived inside the DeclaredDependency class. - This class also holds the logic to _normalize_ package names. For now, this consists only of lowercasing them, but this will change soon. - Note that the .package_name member is NOT meant to be normalized, but should reflect the spelling of the source from which it was created (packages_distributions() in the case of LOCAL_ENV, or the DeclaredDependency object in case of IDENTITY). Instead the normalization will happen in the surrounding code when lookup is performed (see e.g. LocalPackageLookup.lookup_package()). - Also, the .is_used() helps simplify the matching of imports from the code (aka. imported_names) against the import names exposed by the package. - As a temporary measure, we also include a .modify() method which will modify/replace a DeclaredDependency instance in the same way as the old dependency_to_imports_mapping() function did. Although this modification/replacement is no longer necessary, it allows us to delay major test changes until later commit. - The new Package class allows the LocalPackageLookup to become somewhat simpler: Its constructor still calls packages_distributions(), but now builds a dict of Packages objects to cache the extracted information, and the .lookup_package() method becomes a simple lookup in this dict. - Finally, the new resolve_dependencies() function takes over what the dependency_to_imports_mapping() and map_dependencies_to_imports() functions previously did: Create a local package lookup, and map all declared dependencies into a Package object - either one found in the local environment, or a fallback pseudo-Package encoding the identity mapping. Instead of the dependency-to-import-names mapping living _inside_ each DeclaredDependency object, the mapping is now explicitly returned from resolved_dependencies() as a Dict[str, Package] mapping. The matching/ calculation performed in compare_imports_to_dependencies proceeds largely as before, albeit with some name changes that IMHO improve readability. Althought the tests are left unchanged as much as possible, there are some necessary changes in test_map_dep_name_to_import_names due to the API changes described above.
Some dependencies (notably "typing-extensions") have a package name that contain a hyphen ('-'), but provide an import name where that hyphen has been replaced with an underscore ('_'). Adding this normalization to our Package class fixes real-world cases, and brings us closer to the kind of normalization already performed by similar implementations in this space, e.g. the various safe_* functions in setuptools:pkg_resources, or the Pants build system[1] [1]: e.g. look at the map_third_party_modules_to_addresses() method of ThirdPartyPythonModuleMapping in release_2.15.0: src/python/pants/backend/python/dependency_inference/module_mapper.py:319
There is no longer any need to store the import names and mapping enum details _inside_ the DeclaredDependency class. If we do want these details in the (e.g. JSON) output of FawltyDeps, we can instead add the corresponding Package objects back into the UnusedDependency objects. For now, though, simply remove the .import_names and .mapping members from the DeclaredDependency class, along with associated code and test cases. This helps cut down on the duplication inside the internal Analysis data structure, which is also reflected in the JSON output.
Until now we had: class UndeclaredDependency: name: str references: List[ParsedImport] class UnusedDependency: name: str references: List[DeclaredDependency] but both ParsedImport and DeclaredDependency contained their own .name which was identical to the corresponding .name above. This duplication is unnecessary, and we can avoid it by storing the .source member of the latter objects in the .references lists of the respective former objects. In other words: class UndeclaredDependency: name: str references: List[Location] class UnusedDependency: name: str references: List[Location] The simpler structure is nicely reflected in some shorter test cases.
This is in preparation for including Package objects in the overall Analysis data structure.
When we removed .import_names and .mapping from DeclaredDependency, we also remove any information in our JSON output that could help debug the dependency-to-import-names mapping that FD performs. This commits adds that information back in a more useful format. Instead of duplicating it across many DelaredDependency objects, we now add a dedicated field to the top-level Analysis object called .resolved_deps. Analysis.resolved_deps is a mapping from dependency names (as they occur in the --deps files we have parsed) to the corresponding Package object, which contains information on what import names that FawltyDeps found, and which mapping it used to find those import names (currently either via the local environment, or by falling back to the identity mapping).
As Maria suggests in [1], knowing which import name came from which mapping can be very useful when debugging. We achieve this by combining .mappings and .import_names into a dict that associates each mapping with the import names it contributes. However, although now redundant, we still want to hang on to the flattened .import_names set for checking if each declared dep is used. Therefore we assign the new dict to .mappings and keep the redundant .import_names, except that we hide it from our JSON output. [1]: #193 (comment)
25ec61c
to
5392788
Compare
Suggested-by: Maria Knorps <maria.knorps@tweag.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the tests! they are very helpful and descriptive 🎉
🚀
We have one discussion left, but this is very corner-casy stuff. The PR is good to merge :)
This verifies that deps declared with only case differences ("Pip" vs "pip", which is interpreted as the _same_ package by "pip install") still end up being reported as two different unused dependencies.
As Maria suggests in [1], knowing which import name came from which mapping can be very useful when debugging. We achieve this by combining .mappings and .import_names into a dict that associates each mapping with the import names it contributes. However, although now redundant, we still want to hang on to the flattened .import_names set for checking if each declared dep is used. Therefore we assign the new dict to .mappings and keep the redundant .import_names, except that we hide it from our JSON output. [1]: #193 (comment)
We will soon remove compare_imports_to_dependencies(), but we have a lot of parametrized tests for this function that we want to retain. Turn each test vector into a FDTestVector instance to make the definition of each vector more readable, and then use the same test data to test resolve_dependencies(), calculate_undeclared(), and calculate_unused() in addition to compare_imports_to_dependencies(). Based on Maria's great idea in [1]. [1]: #193 (comment)
We will soon remove compare_imports_to_dependencies(), but we have a lot of parametrized tests for this function that we want to retain. Turn each test vector into a FDTestVector instance to make the definition of each vector more readable, and then use the same test data to test resolve_dependencies(), calculate_undeclared(), and calculate_unused() in addition to compare_imports_to_dependencies(). Based on Maria's great idea in [1]. [1]: #193 (comment)
We will soon remove compare_imports_to_dependencies(), but we have a lot of parametrized tests for this function that we want to retain. Turn each test vector into a FDTestVector instance to make the definition of each vector more readable, and then use the same test data to test resolve_dependencies(), calculate_undeclared(), and calculate_unused() in addition to compare_imports_to_dependencies(). Based on Maria's great idea in [1]. [1]: #193 (comment)
We will soon remove compare_imports_to_dependencies(), but we have a lot of parametrized tests for this function that we want to retain. Turn each test vector into a FDTestVector instance to make the definition of each vector more readable, and then use the same test data to test resolve_dependencies(), calculate_undeclared(), and calculate_unused() in addition to compare_imports_to_dependencies(). Based on Maria's great idea in [1]. [1]: #193 (comment)
This started out as an attempt to fix #173, but also picked up some of the simplifications and fixing of unnecessary duplication that we discussed on Slack recently: https://tweag.slack.com/archives/C04AAQP8C85/p1677243757048469
This series does fix the
typing-extensions
vstyping_extensions
issue in the middle, but there is quite a bit of refactoring happening before, and some very nice cleanup happening afterwards. In sum I think the refactoring is worthwhile, but if you'd rather have the fix committed in its own PR, I can have a look at pulling it out of this context.types
: RenameDependenciesMapping.DEPENDENCY_TO_IMPORT
->.LOCAL_ENV
check
: Reorganize mapping of dependencies to import namescheck.Package.normalize_name()
: Replace hyphens with underscorestypes
: Remove mapping details fromDeclaredDependency
types
: Reduce duplication by changing.references
intoList[Location]