-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async rdbms order #3611
Async rdbms order #3611
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3611 +/- ##
==========================================
- Coverage 80.90% 80.86% -0.05%
==========================================
Files 425 425
Lines 32231 32237 +6
==========================================
- Hits 26077 26068 -9
- Misses 6154 6169 +15
Continue to review full report at Codecov.
|
This comment was marked as outdated.
This comment was marked as outdated.
b47f53b
to
6d4138a
Compare
In the case of mysql, such incremental field is required to be the last one, so we need to reorder the fields here. See https://thewebfellas.com/blog/conditional-duplicate-key-updates-with-mysql/ ------------------------------------------------------------------------ > Simple conditional updates Unfortunately there’s a catch: the app can process events out-of-order meaning updates to the summary table won’t necessarily occur in the order that the events are created. This means that the last_event_id and last_event_created_at fields should only be updated if the event is newer than the last one for the day. In SQL terms it should look like this: INSERT INTO daily_events (created_on, last_event_id, last_event_created_at) VALUES ('2010-01-19', 23, '2010-01-19 10:23:11') ON DUPLICATE KEY UPDATE last_event_id = VALUES(last_event_id), last_event_created_at = VALUES(last_event_created_at) WHERE last_event_created_at < VALUES(last_event_created_at); The bad news is that this won’t work as MySQL doesn’t allow WHERE clauses in the update portion of the query. In a simple case like this the easiest workaround is to use the IF function: INSERT INTO daily_events (created_on, last_event_id, last_event_created_at) VALUES ('2010-01-19', 23, '2010-01-19 10:23:11') ON DUPLICATE KEY UPDATE last_event_id = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_id), last_event_id), last_event_created_at = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_created_at), last_event_created_at); This works by checking if the last_event_created_at timestamp of the event being updated is newer than the current timestamp, if it is then the new value is assigned to the field in the update, otherwise the current value is used. An important thing to keep in mind when using this approach is that the order in which you update your fields is very important. I was wrongly under the impression that the updates took place in one mass-assignment after the entire query had been interpreted by MySQL. But they’re not: the assignments happen in the order they appear in the query. To give you an example, this query won’t produce the expected result: INSERT INTO daily_events (created_on, last_event_id, last_event_created_at) VALUES ('2010-01-19', 23, '2010-01-19 10:23:11') ON DUPLICATE KEY UPDATE last_event_created_at = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_created_at), last_event_created_at), last_event_id = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_id), last_event_id); When the update is executed with a more recent event, the last_event_created_at field will be updated, but the last_event_id field won’t. This is because when the second IF is evaluated last_event_created_at has already been updated so that last_event_created_at is equal to VALUES(last_event_created_at). Crazy huh?! ------------------------------------------------------------------------
6d4138a
to
32a7f94
Compare
small_tests_24 / small_tests / 32a7f94 small_tests_23 / small_tests / 32a7f94 dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 32a7f94 dynamic_domains_mysql_redis_24 / mysql_redis / 32a7f94 dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 32a7f94 dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 32a7f94 ldap_mnesia_24 / ldap_mnesia / 32a7f94 ldap_mnesia_23 / ldap_mnesia / 32a7f94 internal_mnesia_24 / internal_mnesia / 32a7f94 pgsql_mnesia_24 / pgsql_mnesia / 32a7f94 mysql_redis_24 / mysql_redis / 32a7f94 pgsql_mnesia_23 / pgsql_mnesia / 32a7f94 elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 32a7f94 mssql_mnesia_24 / odbc_mssql_mnesia / 32a7f94 riak_mnesia_24 / riak_mnesia / 32a7f94 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, I only have some general notes:
- How do we check that it indeed works as intended? Maybe add a test in
rdbms_SUITE
? - Of course it would be good to add it for MS SQL. Do you know if this is even possible?
SQL = upsert_query(Host, Table, InsertFields, Updates, UniqueKeyFields), | ||
prepare_upsert(Host, Name, Table, InsertFields, Updates, UniqueKeyFields, none). | ||
|
||
-spec prepare_upsert(Host :: mongoose_rdbms:server(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: I think we should make this a map of arguments, especially that some of them are optional.
Test added 👌🏽
Probably, but honestly I have no idea how, and I'd like to unblock inbox now. Perhaps we can leave the mssql case as a ticket for the future? Perhaps to when a client actually requests it?
May be, but there's plenty of modules using this function and I didn't want to introduce a big diff (albein mechanical) in this PR, dunno, but could do 🤔 |
small_tests_24 / small_tests / 7db782e small_tests_23 / small_tests / 7db782e dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 7db782e dynamic_domains_mysql_redis_24 / mysql_redis / 7db782e muc_SUITE:hibernation:hibernated_room_can_be_queried_for_archive{error,{{assertion_failed,assert,is_groupchat_message,
[<<"Restorable message">>],
undefined,"undefined"},
[{escalus_new_assert,assert_true,2,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
{line,84}]},
{muc_SUITE,wait_for_mam_result,3,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4383}]},
{muc_SUITE,'-hibernated_room_can_be_queried_for_archive/1-fun-0-',3,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4124}]},
{escalus_story,story,4,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
{line,72}]},
{muc_SUITE,hibernated_room_can_be_queried_for_archive,1,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4120}]},
{test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
{test_server,run_test_case_eval1,6,
[{file,"test_server.erl"},{line,1292}]},
{test_server,run_test_case_eval,9,
[{file,"test_server.erl"},{line,1224}]}]}} ldap_mnesia_24 / ldap_mnesia / 7db782e dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 7db782e dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 7db782e internal_mnesia_24 / internal_mnesia / 7db782e ldap_mnesia_23 / ldap_mnesia / 7db782e mysql_redis_24 / mysql_redis / 7db782e pgsql_mnesia_24 / pgsql_mnesia / 7db782e elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 7db782e pgsql_mnesia_23 / pgsql_mnesia / 7db782e mssql_mnesia_24 / odbc_mssql_mnesia / 7db782e riak_mnesia_24 / riak_mnesia / 7db782e |
Nice!
Ok, let's do it like this
Yes, let's do it separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
If we have asynchronous inserts for inbox or smart_markers, there's a risk of different nodes aggregating different data, and then inserting it in undefined orders, so an earlier event could be flushed by one node after a latter event from another node. For example in the case of a conversation between Alice and Bob, when Alice and Bob are connected to different nodes on a cluster.
So what we want here, is for the DB to ignore updates, if the timestamp of such update is not strictly higher than the timestamp from the previous update.
For this, we add one more parameter to the preparing of queries, that constructs the appropriate query for Postgres and MySQL. This is currently not implemented for MSSQL.
With this, now the asynchronous backends for inbox and markers are much closer to being production ready, the only thing missing is the remove events, which don't have a timestamp so they won't be reordered correctly with updates. I plan to make removals tagged updates instead, like in the case of inbox, moving them to a hidden box instead.
Note the commit message for f887636 for how the trick works for MySQL.