Database performance? #171
Comments
This issue is critical for us and calls into question the relevance of GOST for our project. Do you have any insight into the underlying reason? |
Hi, what's the difference between Things(1130)/Locations and Things(1129)/Locations? |
Hi,
|
but I guess 1130 has many (historical) locations (how many?) |
Well, 1130 has only one location, and 1129 has 10, which is pretty strange at first sight! |
One way to investigate is to get SQL database access and reproduce the 1130 behaviour (using a query containing thing, location, historicallocation). Maybe an index is missing somewhere. |
I've added a document to get to the performed SQL queries quickly, see https://github.com/gost/docs/blob/master/gost_debug_sql_queries.md
So for the query 'Things(1)/Locations' the query is:
SELECT A_location.location_id AS A_location_id,
A_location.location_name AS A_location_name,
A_location.location_description AS A_location_description,
A_location.location_encodingtype AS A_location_encodingtype,
A_location.location_geojson AS A_location_geojson
FROM (SELECT location.id AS location_id,
location.name AS location_name,
location.description AS location_description,
location.encodingtype AS location_encodingtype,
location.geojson::text AS location_geojson
FROM v1.location
WHERE (SELECT thing.id AS thing_id FROM v1.thing
INNER JOIN v1.thing_to_location
ON thing.id = thing_to_location.thing_id AND location.id = thing_to_location.location_id
WHERE thing.id = 1)
IS NOT NULL
ORDER BY location_id DESC
LIMIT 2 OFFSET 0
) AS A_location |
Thanks for the GOST_LOG_VERBOSE_FLAG exposition, saves us some time! The query on the database for SELECT A_location.location_id AS A_location_id,
A_location.location_name AS A_location_name,
A_location.location_description AS A_location_description,
A_location.location_encodingtype AS A_location_encodingtype,
A_location.location_geojson AS A_location_geojson
FROM (SELECT location.id AS location_id,
location.name AS location_name,
location.description AS location_description,
location.encodingtype AS location_encodingtype,
location.geojson::text AS location_geojson
FROM v1.location
WHERE (SELECT thing.id AS thing_id FROM v1.thing
INNER JOIN v1.thing_to_location
ON thing.id = thing_to_location.thing_id AND location.id = thing_to_location.location_id
WHERE thing.id = 1130)
IS NOT NULL
ORDER BY location_id DESC
LIMIT 2 OFFSET 0
) AS A_location Here's the output for the
We believe the problem hides within the nested loop; maybe an index issue (regarding |
Our preliminary investigations confirm that using On a short term, can you consider only including it in the query when needed? This would avoid triggering the On a longer term, it looks like the nested queries generated by the query builder would benefit from being simplified (by preferring joins over nested select) to drastically reduce the execution time. |
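The longer-term suggestion above (preferring joins over a correlated nested select) can be illustrated with a small sketch. It assumes a simplified, hypothetical three-table schema in an in-memory SQLite database, not GOST's actual PostgreSQL `v1` schema, and it only checks that both query shapes return the same rows. Whether the join form is actually faster on PostgreSQL would still need to be confirmed with `EXPLAIN ANALYZE` on the real database.

```python
import sqlite3

# Hypothetical, simplified stand-in for v1.thing, v1.location and
# v1.thing_to_location (SQLite here, not PostgreSQL).
SCHEMA = """
CREATE TABLE thing (id INTEGER PRIMARY KEY);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE thing_to_location (thing_id INTEGER, location_id INTEGER);
"""

# Same shape as the query quoted in this thread: a correlated subquery
# is evaluated for every row of location.
NESTED = """
SELECT location.id, location.name
FROM location
WHERE (SELECT thing.id FROM thing
       INNER JOIN thing_to_location
         ON thing.id = thing_to_location.thing_id
        AND location.id = thing_to_location.location_id
       WHERE thing.id = ?) IS NOT NULL
ORDER BY location.id DESC
LIMIT ? OFFSET 0
"""

# Join-based rewrite. It assumes each (thing_id, location_id) pair
# appears at most once in thing_to_location.
JOINED = """
SELECT location.id, location.name
FROM location
INNER JOIN thing_to_location
  ON thing_to_location.location_id = location.id
INNER JOIN thing
  ON thing.id = thing_to_location.thing_id
WHERE thing.id = ?
ORDER BY location.id DESC
LIMIT ? OFFSET 0
"""


def build_demo_db():
    """Create a tiny in-memory database with two things and five locations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO thing VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO location VALUES (?, ?)",
        [(i, f"loc-{i}") for i in range(1, 6)],
    )
    # Thing 1 is linked to locations 1..3, thing 2 to locations 4..5.
    conn.executemany(
        "INSERT INTO thing_to_location VALUES (?, ?)",
        [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5)],
    )
    return conn


def locations_for_thing(conn, sql, thing_id, limit=2):
    """Run one of the two query shapes and return the matching rows."""
    return conn.execute(sql, (thing_id, limit)).fetchall()
```

Calling `locations_for_thing(conn, NESTED, 1)` and `locations_for_thing(conn, JOINED, 1)` on the demo database should give identical rows, which is the property a query-builder change would need to preserve.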
OK, it would be helpful to have a small reproducible case. Is there something available? The script should contain an HTTP POST for a thing, HTTP POSTs for locations in a loop, and the test query to be optimized (HTTP GET). |
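A reproduction script of the kind requested above might look roughly like the following sketch. It is not the script used in this thread. The base URL, the entity names and the synthetic coordinates are assumptions, and it also assumes the server returns the created Thing as JSON with an `@iot.id` field. The payload fields follow the SensorThings API entity model.

```python
import json
import time
import urllib.request


def thing_payload(name="perf-test-thing"):
    """Body for POST /Things (name and description are made up)."""
    return {"name": name, "description": "Thing used for the Locations performance test"}


def location_payload(index, thing_id):
    """Body for POST /Locations, linked to an existing Thing."""
    return {
        "name": f"location-{index}",
        "description": "synthetic point",
        "encodingType": "application/vnd.geo+json",
        # Synthetic coordinates, shifted slightly for each location.
        "location": {"type": "Point", "coordinates": [116.0 + index * 1e-5, 39.0]},
        "Things": [{"@iot.id": thing_id}],
    }


def post_json(url, payload):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def timed_get(url):
    """Return the wall-clock time in seconds of a single GET request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()
    return time.perf_counter() - start


def run(base_url, n_locations):
    """Create one Thing, attach n_locations Locations, then time the test query."""
    thing = post_json(f"{base_url}/Things", thing_payload())
    thing_id = thing["@iot.id"]  # assumes the created entity is echoed back
    for i in range(n_locations):
        post_json(f"{base_url}/Locations", location_payload(i, thing_id))
    return timed_get(f"{base_url}/Things({thing_id})/Locations")
```

For example, `run("http://localhost:8080/v1.0", 1000)` would target a local instance (the URL is an assumption). Increasing `n_locations` step by step would show whether the response time of the GET grows with the number of stored locations.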
Sure, we can share the script we use to load the GeoLife dataset but, given the problem occurs when increasing the number of registered objects and locations, it may take some time to execute (although it would be a good case for benchmarking GOST). |
You can download the scripts we used to upload GeoLife stuff into gost here : https://cloud.remyraes.com/s/yxp43RJmieqYpx7 (password is Geolife-STA-0) We only uploaded locations for the thing n°153 of the dataset, which contains 2024 paths and took around 19h47 to upload to gost with this script. (In case you're wondering, |
ok I can run download/import, but it's unclear to me where to specify the target SensorThings server. |
By default, it targets our VM; however the function to change the target server is not exposed publicly, I will come back to you when it's the case! |
I have to refactor the way we handle configurations within the library. I still added a method for you to be able to target any STA server, using this code sample:
using the I uploaded there (password: Geolife-STA-1) the geolife script we used with a localhost config that you can edit to suit your needs ( |
I've successfully executed download/import (for nr 153), tool output:
$ npm run dataset:import 153
Thing n°153 | 2024/2024 files | 2159018/2159018 locations imported | 55:52
In the database I now see 2156994 locations, 3 things (1,2,3), 2156994 historicallocation. Only thing_id = 3 has locations. Some remarks:
|
Perf testing, requesting /v1.0/Things(3)/Locations gives:
So first two requests within 20 ms. Any more ideas how to reproduce the behaviour you are seeing? |
I used the tool to import the n°153 on a new gost instance in localhost, it created one single thing and imported all locations in 55 mins 52 secs too! Maybe excessive importation time is due to network stuff or VM configuration, will investigate about this. (regarding the 2024 "missing" locations, due to 2024 also being the number of files, I highly suspect an error in the counter rather than in the importation process) Requesting /v1.0/Things(1)/Locations now seems to behave normally:
We are now investigating towards VM configuration and SQL queries optimisation. |
OK, I'm closing this issue. If there is a reproducible performance issue, you can open a new one. |
Hello guys,
Pushing lots of Location entities on my gost instance (a bit more than 2 million), the system starts to behave strangely:
Maybe related to #152?
Cheers!