-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace geolix with locus #2362
Conversation
lib/plausible/geo/locus.ex
Outdated
nil | ||
|
||
{:error, reason} -> | ||
raise "failed to lookup ip address #{inspect(ip_address)}: " <> inspect(reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raising on invalid ip address or any other kind of unexpected error
BundleMonUnchanged files (8)
No change in files bundle size Final result: ✅ View report in BundleMon website ➡️ |
lib/plausible/geo/locus.ex
Outdated
|
||
unless opts[:async] do | ||
{:ok, _version} = :locus.await_loader(@db) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting for db to be loaded by default, blocks root supervisor startup
e736ccd
to
c279785
Compare
lib/plausible/geo/locus.ex
Outdated
:ok = :locus.start_loader(@db, {:maxmind, edition}, license_key: license_key) | ||
|
||
path = opts[:path] -> | ||
:ok = :locus.start_loader(@db, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if locus:start_loader
returns an error, this function crashes with a match error, I think it's good enough as the error reason would still be easily visible in logs
lib/plausible/geo/locus.ex
Outdated
:ok = :locus.start_loader(@db, path) | ||
|
||
true -> | ||
raise "failed to load geolocation db: need :path or :license_key to be provided" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raising here too in case not enough info was provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job on this!
039c73c
to
053e836
Compare
I pushed your changes to staging, and it's looking good! Could you please help me doing some manual testing? https://staging.srv.plausible.io/ I believe we can merge and deploy it to production on Monday. |
I sent some events to dummy.site with "forwarded" headers to simulate different ip addresses. One thing I noticed now and also before these changes is that for some locations regions seem to be showing cities and cities seem to be showing city districts. Like As a user, for the Shinjuku example above I'd expect to see |
@@ -34,7 +34,7 @@ defmodule Mix.Tasks.DownloadCountryDatabase do | |||
|
|||
if res.status_code == 200 do | |||
File.mkdir("priv/geodb") | |||
File.write!("priv/geodb/dbip-country.mmdb", res.body) | |||
File.write!("priv/geodb/dbip-country.mmdb.gz", res.body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locus
trusts the file extension.
locus
didn't try to gunzip dbip-country.mmdb
because it didn't have a .gz
suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, the error that locus
logged was
[error] [locus] [geolocation] database failed to load (:filesystem): {:unpack_database_from, :mmdb_blob,
{:bad_metadata,
{:marker_not_found,
<<171, 205, 239, 77, 97, 120, 77, 105, 110, 100, 46, 99, 111, 109>>}}}
yeah, that's not ideal. we've been manually changing some of these districts into cities ourselves to make it better when people report it but it's difficult to do the whole world like that. see https://github.com/plausible/analytics/blob/master/lib/plausible/ingestion/city_overrides.ex. is there a better way to do it? |
Before I go into speculation on how to work around this, I'll post some example responses from dbip-city-lite and maxmind's geolite2-city for two Shinjuku ip addresses to better understand what we are dealing with: ip lookup response from dbip-city-liteiex(6)> :locus.lookup(:dbip, "124.150.157.214")
{:ok,
%{
# The full dbip response from the paid version would also contain geonames ids.
"city" => %{"names" => %{"en" => "Shinjuku"}},
"continent" => %{
"code" => "AS",
"geoname_id" => 6255147,
"names" => %{
"de" => "Asien",
"en" => "Asia",
"es" => "Asia",
"fa" => " آسیا",
"fr" => "Asie",
"ja" => "アジア大陸",
"ko" => "아시아",
"pt-BR" => "Ásia",
"ru" => "Азия",
"zh-CN" => "亚洲"
}
},
"country" => %{
"geoname_id" => 1861060,
"is_in_european_union" => false,
"iso_code" => "JP",
"names" => %{
"de" => "Japan",
"en" => "Japan",
"es" => "Japón",
"fa" => "ژاپن",
"fr" => "Japon",
"ja" => "日本",
"ko" => "일본",
"pt-BR" => "Japão",
"ru" => "Япония",
"zh-CN" => "日本"
}
},
"location" => %{"latitude" => 35.6967, "longitude" => 139.709},
"subdivisions" => [%{"names" => %{"en" => "Tokyo"}}]
}} The full dbip response from the paid version would also contain geonames ids like in this maxmind lookup example: ip lookup response from maxmind's geolite2-city, contains geoname idsiex(10)> :locus.lookup(:maxmind, "122.216.10.1")
{:ok,
%{
"city" => %{
"geoname_id" => 1852083, # but https://www.geonames.org/1852083 can be used to double check
"names" => %{"en" => "Shinjuku", "ja" => "新宿区"} # same problem, Shinjuku is listed as city
},
"continent" => %{
"code" => "AS",
"geoname_id" => 6255147,
"names" => %{
"de" => "Asien",
"en" => "Asia",
"es" => "Asia",
"fr" => "Asie",
"ja" => "アジア",
"pt-BR" => "Ásia",
"ru" => "Азия",
"zh-CN" => "亚洲"
}
},
"country" => %{
"geoname_id" => 1861060,
"iso_code" => "JP",
"names" => %{
"de" => "Japan",
"en" => "Japan",
"es" => "Japón",
"fr" => "Japon",
"ja" => "日本",
"pt-BR" => "Japão",
"ru" => "Япония",
"zh-CN" => "日本"
}
},
"location" => %{
"accuracy_radius" => 5,
"latitude" => 35.6948,
"longitude" => 139.6976,
"time_zone" => "Asia/Tokyo"
},
"postal" => %{"code" => "160-0021"},
"registered_country" => %{
"geoname_id" => 1861060,
"iso_code" => "JP",
"names" => %{
"de" => "Japan",
"en" => "Japan",
"es" => "Japón",
"fr" => "Japon",
"ja" => "日本",
"pt-BR" => "Japão",
"ru" => "Япония",
"zh-CN" => "日本"
}
},
"subdivisions" => [
%{
"geoname_id" => 1850144,
"iso_code" => "13",
"names" => %{
"en" => "Tokyo",
"fr" => "Préfecture de Tokyo",
"ja" => "東京都"
}
}
]
}} |
So what we have is basically wrong data, in my opinion. DB-IP thinks Shinjuku is a city, there is no way around it just by using the data we have from this response. But since the paid version includes geoname id, it can be used to lookup the geoname info by the id and check what kind of place it is: https://www.geonames.org/1852083/shinjuku-ku.html Shinjuku is listed as a second-order administrative division, under Tokyo (JP (independent political entity) -> Tokyo (first-order administrative division) -> Shinjuku (second-order administrative division)), maybe that information can be used to automatically rename the "city_geoname_id" key into something like "district_geoname_id" before inserting into the DB. Something similar is already being done in https://github.com/plausible/location, but on the "presenting" side, when turning geoname ids into city names before rendering the dashboard. |
It might have been too soon for me to claim that the data is wrong 🙈 (The suggested workaround might still work, though, I've checked it on Bangkok / Watthana responses as well, I'll need to check more cities)
I tried to research a bit what a second-order administrative division is, and it usually means something like a city in most countries. But from https://en.wikipedia.org/wiki/Municipalities_of_Japan
So maybe in this particular case, Shinjuku is indeed a city-like administrative division (a special ward) even though it sounds a bit strange. |
would be great if we can improve this somehow. we get three types of feedback for cities (everyone seems to be happy with countries and regions). here they are ranked by how often we hear about them:
|
Hey 👋! I pushed some commits to this branch and merged with master. We're also ready to give MaxMind in Plausible Cloud a try after some benchmarking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just one comment about adding geolocation readiness status to our healthcheck endpoint.
Once this is tested and we've settled on using Locus+Maxmind, we also need to document this for self-hosted and include it in the release notes for the next release (cc @ruslandoga)
|
||
defp setup_geolocation do | ||
opts = Application.fetch_env!(:plausible, Plausible.Geo) | ||
:ok = Plausible.Geo.load_db(opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this seems to be async by default, I think it would also be a good idea to add this loading step to the /api/health
endpoint like we do with the sites cache. This way the load balancer will not route traffic to a newly restarted node until the geo database is loaded.
Side question: how long does it take to download and be ready to run geolocation lookups with the Maxmind license key?
Side note: not a concern for this PR but I think we're getting to a point where we should really separate liveness vs readiness probes as suggested by @cnkk. Loading the geolocation DB is required for the app to be 'ready' but it's independent from it being 'live'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's sync by default in this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side question: how long does it take to download and be ready to run geolocation lookups with the Maxmind license key?
On my laptop it takes six seconds. I don't know how much bigger the paid version is though. I think most of the time is spent on IO, decoding (or rather, verification) is fast.
Mix.install [:locus]
:timer.tc fn ->
Application.put_env(:locus, :license_key, "XXdDMc5OhchOTazu") # I'll delete it in a few days
:ok = :locus.start_loader(:city, {:maxmind, "GeoLite2-City"}, [:no_cache])
{:ok, _} = :locus.await_loader(:city)
end
#=> {6204296, {:ok, {{2023, 1, 10}, {15, 33, 48}}}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Seeing as this is sync and called before Location.load
, I assume this will slow down the whole startup process. If this loading was async, it could run in parallel and probably finish before Location.load
is done if I understand right. That seems desirable so the whole startup process is only limited by the slowest piece which would be Location.load
. What do you think @ruslandoga @vinibrsl ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, maybe we can fire it off before Location.load
, and wait for it with something like Plausible.Geo.await_load()
after Location.load
(just in case)? await_load
would call :locus.await_loader(:city)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using :locus.await_load
but I couldn't know when the task was completed. The alternative I considered is to use Task.await_many/2
to start Location and Geolix in parallel. This ensures the app will never be in an invalid state where Geolix or Location haven't yet finished booting up. Changed in 35393b2. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using :locus.await_load but I couldn't know when the task was completed.
What was the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was that I was using it wrong 😅. Fixed in c775070. Let me know what you think :)
506c2f1
to
35393b2
Compare
Changes
This PR replaces geolix with locus to simplify self-hosted setup.
locus
can auto-update maxmind dbs which are recommended for self-hosters if they want city-level geolocation.locus
is also a bit faster.This PR also uses a test mmdb file from https://github.com/maxmind/MaxMind-DB for e2e geolocation tests without stubs.
Tests
Changelog
Documentation
Dark mode