Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add alias for compound street names with abbreviated generic #145

Merged
merged 1 commit into from
Jan 13, 2022

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Jan 8, 2022

implementation of #144

this method is slightly different from previous in that all versions of the name (including aliases) are considered rather than only the primary name.

note: this method produces duplicate names, the subsequent post processing step 'deduplication' handles removing duplicate entries so I felt it wasn't necessary to do so within this script, at the cost of having duplicate entries in the tests.

resolves #144
resolves pelias/api#1594

@orangejulius
Copy link
Member

Cool, from the test output this looks perfect. Next step is probably to run a small (or maybe planet build) and see how this compares in some of our relevant test cases. The new one from pelias/api#1594 should be especially helpful.

@missinglink missinglink force-pushed the contractions_abbreviated branch from 6abb64c to cb44cdb Compare January 11, 2022 11:47
@missinglink missinglink marked this pull request as ready for review January 11, 2022 11:48
@missinglink
Copy link
Member Author

missinglink commented Jan 11, 2022

opening this up for review/merge.

there were a couple things I wasn't quite happy about but decided not to tackle in this PR:

  • in the bottom of test/post/seperable_street_names.js I have enumerated all the common permutations of a german street name, two of these forms are not covered by this functionality due to a lack of full-token synonym substitution. I went back and forth on this and decided that since we already have the pelias/schema synonyms and the index-time street name cleanup script that it would be messy and error-prone to duplicate that functionality here. I added a code comment with a little more info.
  • I don't really like the names of the functionality being expansions/contractions (which I probably named in the first place 😝), these words make me think of conversion to/from abbreviations, whereas the functionality here is about separating/combining of compound words. I would be 👍 for finding better names although I doubt anyone except for me really cares about this 😆

@missinglink
Copy link
Member Author

worth noting that more computation will be required than before since we are operating on lists now rather than scalar values, I think this is preferable in order to produce more permutations, but it may result in a very slight index-time perf slowdown for affected records (ie. DE/CH/AT/NL plus address/street/intersection).

@missinglink
Copy link
Member Author

@orangejulius before merging this we might want to quickly discuss the config and consider expanding it a little more (or not?), it's no extra 'work' per-se, more of a question of completeness/coverage.

@@ -101,20 +96,27 @@ function post(doc) {
if( !TARGET_LAYERS.includes( doc.getLayer() ) ) { return; }

// detect document country code
let docCountryCode = _.get(doc, 'parent.country_a[0]');
let docCountryCode = _.get(doc, 'parent.country_a[0]') || _.get(doc, 'parent.dependency_a[0]');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while not strictly required yet, this change allows future configuration for children of a dependency.

@missinglink missinglink force-pushed the contractions_abbreviated branch from cb44cdb to 870800f Compare January 11, 2022 12:03
@missinglink
Copy link
Member Author

I didn't add an appropriate semantic-release message to #147 so it will be included in this release.

@missinglink
Copy link
Member Author

This looks good on the dev environment:

Screenshot 2022-01-13 at 12 47 36

Happy to merge this as-is, there are a couple more potential separable street suffixes in https://github.com/openvenues/libpostal/blob/master/resources/dictionaries/de/concatenated_suffixes_separable.txt but I don't feel like we need to add any of those at this stage since they're less common.

@missinglink
Copy link
Member Author

need to add acceptance-tests

@missinglink missinglink merged commit 2cb45c1 into master Jan 13, 2022
@missinglink missinglink deleted the contractions_abbreviated branch January 13, 2022 16:25
orangejulius added a commit to pelias/acceptance-tests that referenced this pull request Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants