OSM name of a feature matches to Wikidata name #105
Some 👀 and feedback, if any, from you @planemad and @amishas157 would be really helpful.
cc: @batpad @geohacker
},
{
    "description": "Test OSM name matches with aliases on Wikidata",
    "expectedResult": {},
Should this give `"result:name_matches_to_wikidata": true`?
@planemad the current design of all compare functions is to return results only when they are interesting, and by interesting we mostly mean harmful changes. When things are good, the compare functions return nothing, i.e. `{}`.
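As an illustration of that convention, here is a minimal sketch written in the style of the compare functions. It assumes the OSM name, the Wikidata label and the alias list have already been looked up; `checkName` is a hypothetical name, and only the `result:name_matches_to_wikidata` key comes from this PR.

```javascript
// Hypothetical sketch of the "only report interesting results" convention.
// osmName:         value of the OSM `name` tag
// wikidataLabel:   Wikidata label to compare against (assumed already fetched)
// wikidataAliases: array of alias strings for the same language (may be empty)
function checkName(osmName, wikidataLabel, wikidataAliases) {
  const matches = osmName === wikidataLabel || wikidataAliases.includes(osmName);
  // Good case: nothing worth reporting, so return an empty object.
  if (matches) return {};
  // Potentially problematic case: flag the feature for review.
  return { 'result:name_matches_to_wikidata': false };
}

// Illustrative input only: label `Bangalore` with alias `Bengaluru`.
console.log(checkName('Bengaluru', 'Bangalore', ['Bengaluru'])); // {}
console.log(checkName('Bangalore City', 'Bangalore', ['Bengaluru']));
```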
This design essentially does not differentiate between no data and a positive result.
If the goal is to have every changeset reviewed by the community, comparators should pass on as much useful knowledge as possible to a human reviewer to make the final decision. This comparator has already done the tedious work of looking up Wikidata and comparing names; withholding this finding will lead to duplicated human effort on the same task. What do we gain by this?
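For comparison, a variant that keeps these two cases apart could reserve `{}` for "nothing to compare" and report the finding in both directions. This is only a sketch of the alternative being proposed here, not the current design; `checkNameVerbose` is a hypothetical name.

```javascript
// Hypothetical variant that distinguishes "no data" from "checked and matched".
function checkNameVerbose(osmName, wikidataLabel, wikidataAliases = []) {
  // No data: either side is missing, so no comparison was made.
  if (!osmName || !wikidataLabel) return {};
  const matches = osmName === wikidataLabel || wikidataAliases.includes(osmName);
  // Report the outcome either way, so a reviewer does not repeat the lookup.
  return { 'result:name_matches_to_wikidata': matches };
}

console.log(checkNameVerbose(undefined, 'Holige')); // {}
console.log(checkNameVerbose('Holige', 'Holige'));
```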
comparators should pass on as much useful knowledge as possible to a human reviewer to make the final decision.
Totally agree @planemad, 💯 The easier bit is returning `"result:name_matches_to_wikidata": true` from the compare function. The harder one is on osmcha's side. To keep the scale of things manageable, osmcha stores only the features that the comparators have flagged as potentially harmful or problematic, not all features. Yes, osmcha has all the changesets, but not all the features.
I am curious to hear more on this. Shall we create a separate ticket for it?
shall we create a separate ticket for the same?
Yes please. We should have consistent design principles that guide us in building useful compare functions without being constrained by the limitations of osmcha.
Created a new ticket here: #106
This looks like a good start. My biggest concern is that `name` on OSM is the name in the local language and not necessarily English.
Since the comparator compares only against the English Wikidata label, there is potential for a lot of noise from data in non-English regions.
Can we get an idea of the level of noise this will generate if it goes out live?
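One possible mitigation, sketched here as an assumption rather than what the comparator currently does, is to collect the labels and aliases of every language on the Wikidata entity and accept the OSM `name` if it matches any of them. The `labels` and `aliases` shapes follow the `wbgetentities` response shown in the PR description; `allWikidataNames` and `nameMatchesAnyLanguage` are hypothetical helpers.

```javascript
// Hypothetical helper: gather every label and alias of a Wikidata entity,
// across all languages, from a wbgetentities-style entity object.
//   labels:  { <lang>: { language, value } }
//   aliases: { <lang>: [ { language, value }, ... ] }  (may be {})
function allWikidataNames(entity) {
  const names = new Set();
  for (const label of Object.values(entity.labels || {})) names.add(label.value);
  for (const list of Object.values(entity.aliases || {})) {
    for (const alias of list) names.add(alias.value);
  }
  return names;
}

function nameMatchesAnyLanguage(osmName, entity) {
  return allWikidataNames(entity).has(osmName);
}

// Labels copied from the Holige (Q19891734) response in the PR description.
const holige = {
  labels: {
    en: { language: 'en', value: 'Holige' },
    pa: { language: 'pa', value: '\u0a2a\u0a42\u0a30\u0a28 \u0a2a\u0a4b\u0a32\u0a40' }
  },
  aliases: {}
};
console.log(nameMatchesAnyLanguage('\u0a2a\u0a42\u0a30\u0a28 \u0a2a\u0a4b\u0a32\u0a40', holige)); // true
```

This would likely cut noise from non-English regions, at the cost of accepting a name that matches a label in any language, which may be too lenient in some cases.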
This looks like a good start. My biggest concern is that name on OSM is the name in the local language and not necessarily English.
Great observation @planemad
Can we get an idea of the level of noise this will generate if it goes out live?
Once deployed on osmcha, all feature changes flagged by this comparator can be filtered by the reason: Name does not match to Wikidata
if ((osmName !== wikidataName) && (wikidataAliasNames.indexOf(osmName) === -1)) return callback(null, {
    'result:name_matches_to_wikidata': false
});
}
@bkowshik Although we assume that we won't be hitting the Wikidata API too hard, just to be 💯 sure, we could catch the errors returned when the Wikidata API is rate limited and find a way for them to be reported to us. Maybe we can also return `'result:wikidataApiLimitExceeded': true`, the way we do it for escalate, and then read it on the vandalism side to send us these errors. [This page](https://www.mediawiki.org/wiki/API:Errors_and_warnings) lists the error codes returned by the Wikidata API; we can catch `ratelimited`. If we get such errors from vandalism, we can figure out some other way to avoid hitting the Wikidata API hard while still being sure that this comparator works as expected.
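A rough sketch of that idea: inspect the API response before comparing names and short-circuit with a flag when the error code is `ratelimited`. It assumes the default MediaWiki error format, where an error arrives as `{ "error": { "code": ..., "info": ... } }`; `apiErrorResult` is a hypothetical helper, and `result:wikidataApiLimitExceeded` is the key proposed in this comment, not existing code.

```javascript
// Hypothetical helper: decide whether a wbgetentities response should stop the
// name comparison. Assumes the default MediaWiki error shape { error: { code, info } }.
// Returns a result object to hand to the callback, or null to keep going.
function apiErrorResult(body) {
  if (!body || !body.error) return null; // no API error: continue comparing names
  if (body.error.code === 'ratelimited') {
    // Proposed flag so the vandalism side can notice and report rate limiting.
    return { 'result:wikidataApiLimitExceeded': true };
  }
  // Other API errors: nothing was compared, so stay quiet per the current convention.
  return {};
}
```

Inside the compare function this could be used as `const early = apiErrorResult(body); if (early) return callback(null, early);` before the name check. Note that `{}` is truthy in JavaScript, so other API errors also end the comparison early.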
Thank you @amishas157, I copied your comments over to a new ticket about best practices for working with external APIs: #107
@bkowshik, the comparator looks 🎉
Published to npm as version:
I downloaded the Wikidata dump and found a total of 25,327,505 features, of which 3,090,713 have a `latitude` tag, i.e. 12.2% of Wikidata features have a location component. 🎉
There are 589,087 features on OpenStreetMap with a Wikidata tag. For this iteration, the focus is on name modifications to features with a Wikidata tag, so querying the Wikidata API in real time is a better option than creating a local dump similar to `landmarks.sqlite`, for a couple of reasons:
Sample Wikidata query
Holige is a 😋 sweet flatbread from South India with the Wikidata ID `Q19891734`:
$ curl "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q19891734&format=json"
{"entities":{"Q19891734":{"pageid":21535239,"ns":0,"title":"Q19891734","lastrevid":390829462,"modified":"2016-10-18T23:45:44Z","type":"item","id":"Q19891734","labels":{"en":{"language":"en","value":"Holige"},"pa":{"language":"pa","value":"\u0a2a\u0a42\u0a30\u0a28 \u0a2a\u0a4b\u0a32\u0a40"}},"descriptions":{"pa":{"language":"pa","value":"\u0a2d\u0a3e\u0a30\u0a24\u0a40 \u0a16\u0a3e\u0a23\u0a3e"},"en":{"language":"en","value":"Indian Food"}},"aliases":{},"claims":{"P279":[{"mainsnak":{"snaktype":"value","property":"P279","datavalue":{"value":{"entity-type":"item","numeric-id":2095,"id":"Q2095"},"type":"wikibase-entityid"},"datatype":"wikibase-item"},"type":"statement","id":"Q19891734$5AD3D27B-6D89-4435-84B4-D25744E4D81C","rank":"normal"}],"P495":[{"mainsnak":{"snaktype":"value","property":"P495","datavalue":{"value":{"entity-type":"item","numeric-id":668,"id":"Q668"},"type":"wikibase-entityid"},"datatype":"wikibase-item"},"type":"statement","id":"Q19891734$FC9CC4A4-9858-44B9-BAC2-1C4FAAF03A70","rank":"normal","references":[{"hash":"7eb64cf9621d34c54fd4bd040ed4b61a88c4a1a0","snaks":{"P143":[{"snaktype":"value","property":"P143","datavalue":{"value":{"entity-type":"item","numeric-id":328,"id":"Q328"},"type":"wikibase-entityid"},"datatype":"wikibase-item"}]},"snaks-order":["P143"]}]}],"P373":[{"mainsnak":{"snaktype":"value","property":"P373","datavalue":{"value":"Obbattu","type":"string"},"datatype":"string"},"type":"statement","id":"Q19891734$3E388244-771E-44AC-91BB-57F72AEDA0D5","rank":"normal","references":[{"hash":"7eb64cf9621d34c54fd4bd040ed4b61a88c4a1a0","snaks":{"P143":[{"snaktype":"value","property":"P143","datavalue":{"value":{"entity-type":"item","numeric-id":328,"id":"Q328"},"type":"wikibase-entityid"},"datatype":"wikibase-item"}]},"snaks-order":["P143"]}]}],"P18":[{"mainsnak":{"snaktype":"value","property":"P18","datavalue":{"value":"Holige1.JPG","type":"string"},"datatype":"commonsMedia"},"type":"statement","id":"Q19891734$A56187C7-F1B4-4FA6-B3A9-4770D2B33BB3","rank":"normal","references":[{"hash":"7eb64cf9621d34c54fd4bd040ed4b61a88c4a1a0","snaks":{"P143":[{"snaktype":"value","property":"P143","datavalue":{"value":{"entity-type":"item","numeric-id":328,"id":"Q328"},"type":"wikibase-entityid"},"datatype":"wikibase-item"}]},"snaks-order":["P143"]}]}]},"sitelinks":{"enwiki":{"site":"enwiki","title":"Puran poli","badges":[]},"pawiki":{"site":"pawiki","title":"\u0a2a\u0a42\u0a30\u0a28 \u0a2a\u0a4b\u0a32\u0a40","badges":[]}}}},"success":1}
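The name check only needs labels and aliases, so the request can probably be trimmed. This sketch assumes the `wbgetentities` `props` and `languages` parameters are used to narrow the response above; `wikidataEntityUrl` and `namesFromResponse` are hypothetical helpers whose return fields mirror `wikidataName` and `wikidataAliasNames` in the diff.

```javascript
// Hypothetical helper: build a wbgetentities request for one item, asking only
// for the parts the name check needs (labels and aliases) in one language.
function wikidataEntityUrl(qid, language = 'en') {
  const url = new URL('https://www.wikidata.org/w/api.php');
  url.searchParams.set('action', 'wbgetentities');
  url.searchParams.set('ids', qid);
  url.searchParams.set('props', 'labels|aliases');
  url.searchParams.set('languages', language);
  url.searchParams.set('format', 'json');
  return url.toString(); // URLSearchParams percent-encodes "|" as %7C
}

// Hypothetical helper: pull the label and alias strings for `language` out of a
// successful wbgetentities response. A missing label gives null; an item without
// aliases (as with Holige above, where "aliases" is {}) gives an empty array.
function namesFromResponse(response, qid, language = 'en') {
  const entity = response.entities[qid];
  const label = entity.labels && entity.labels[language];
  const aliases = (entity.aliases && entity.aliases[language]) || [];
  return {
    wikidataName: label ? label.value : null,
    wikidataAliasNames: aliases.map((alias) => alias.value)
  };
}
```

On Node.js 18 or later, which ships a global `fetch`, these could be wired up as `fetch(wikidataEntityUrl('Q19891734')).then((res) => res.json()).then((body) => namesFromResponse(body, 'Q19891734'))`.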
Wikidata alias
An object in Wikidata can have one or more aliases. For example, the city of Bengaluru is named `Bengaluru` in OpenStreetMap but `Bangalore` on Wikidata. I have modified the comparator so that Bangalore is flagged only when its OSM name differs from both `Bangalore` and `Bengaluru`. 😃
Quality of data
It has been an eye-opening experience seeing OpenStreetMap data used with other open data sources. The boost in quality of data and maintainability is a win-win! 🚀