-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cylc.rundb: improve commit/rollback logic #1788
Conversation
This looks good, although my level of sqlite wizardry is sub-optimal. Could you (or did you) test your theory about what's wrong with current master, and that this will fix it, by messing with the code to cause a deliberate failure mid-way through some statements? |
04c18b3
to
93adf74
Compare
Branch re-based. Added test to prove that DB statement diagnostic is printed on STDERR. |
93adf74
to
0b7ca25
Compare
Yes, I have simulated the problem by adding to if self.is_public and self.n_tries == 0 and stmt.startswith("UPDATE task_states"):
raise sqlite3.Error Running a single-task suite, we'll end up with no entry in the |
0b7ca25
to
8010205
Compare
There might be a performance issue here. This randomly chosen small test suite takes almost twice as long to run on this branch, and the extra time is all spent in the db code:
Profiling on master:
Profiling on this branch:
8 seconds does seem like a long time given the small amount of data being written here. (I have not had time to profile any more realistic suites) |
To reduce the number of commits, the alternative is to commit all statements together like before, but also allow rollback and retry all statements together, hence don't delete any statements until all successful. |
8010205
to
adcabba
Compare
I have updated the logic to reduce the number of commits. It will only call commit on success of the whole set of queued items, and will only delete items in the queue if everything is OK. This will allow it to retry the whole set of queued items on failure. The change continues to work correctly with my simulated failure described above. |
Performance is similar to master now. Test battery passes here. Review 1 - good. |
Actually (post review 1!) I decided to run the database tests a few times, and in two of eight runs
|
@matthewrmshin - can you look into the issue reported by @hjoliver above? |
Test should now be robust. |
8341cdd
to
d7be7b8
Compare
Branch re-based. |
Commit and delete all queued items only on success. Roll back if anything dies, and retry all queued items next time. This ensures statements are never missed on retry. Print statement diagnostics on DB connect error for write.
A sleep now guarantees that when the database is locked, the next statement would be an INSERT INTO task_events statement.
d7be7b8
to
5113af1
Compare
All good now; back to @arjclark |
Looks OK to me. No problems from the test-battery in my environment. |
Commit as soon as statements are executed correctly - should reset roll back point.Roll back as soon as statements are not executed correctly - should prevent roll back of more statements than intended.Commit and delete all queued items only on success. Roll back if anything dies, and retry all queued items next time. This ensures statements are never missed on retry.
Print statement diagnostics on DB connect error for write.
@hjoliver please take an initial look.