Support for parallel copy (FR) #193
I need a very strong incentive to parallelize again. I'm unsure whether, at this time, partitioned tables actually allow concurrent writes to different partitions (they didn't use to allow that). For non-partitioned tables this is an absolute NO-GO. Unless I see a strong case, I'm going to label this as WontFix.
pt-online-schema-change is also single-threaded. Partitioned tables do allow parallel writes for sure. There are hints for partitions for DML (http://dev.mysql.com/doc/refman/5.6/en/partitioning-selection.html), and most of the metadata required is in information_schema.partitions, with estimates similar to those for tables. Partitioning clearly allows for a divide-and-conquer copy.
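To make the metadata point concrete, here is a minimal Go sketch (not gh-ost code) that lists per-partition row estimates from information_schema.partitions; a per-partition copy worker could then read each partition explicitly via MySQL 5.6+ partition selection. The DSN and the `mydb.mytable` names are illustrative assumptions.

```go
// Minimal sketch: list per-partition row estimates for an illustrative
// `mydb.mytable`, the kind of metadata a divide-and-conquer copy could use.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Per-partition row estimates, analogous to table-level estimates.
	rows, err := db.Query(`
		SELECT partition_name, table_rows
		FROM information_schema.partitions
		WHERE table_schema = 'mydb'
		  AND table_name = 'mytable'
		  AND partition_name IS NOT NULL`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var name string
		var est int64
		if err := rows.Scan(&name, &est); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("partition %s: ~%d rows\n", name, est)
		// A per-partition copy worker could then read only this partition
		// (MySQL 5.6+ explicit partition selection):
		//   SELECT * FROM mytable PARTITION (p0) WHERE ...
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```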
On rowcopy, yes. However, the triggers create parallelism, which causes locking: https://github.com/github/gh-ost/blob/master/doc/why-triggerless.md
Not on single-table access; serial is still faster than concurrent (I'm talking about bulk-load workloads).
This does not depict the full picture. At least on 5.5 (and as I recall also on early 5.6 releases), writing to a partition, even an InnoDB one, would require traversing the entire partition set to attempt to acquire locks, which would not be acquired after all because there would be no need to. But the traversal is there, and it is a serializing mechanism. I'm happy to see benchmarks that show write throughput, serial vs. parallel, on partitioned tables. If these show significant improvement with parallelism, then I'm happy to pursue this.
I agree that it would be very useful if it actually works. But I also agree that this could make things more complicated (locking issues) and that it might not work as expected.
Maybe applying changes which are read from the binlog stream can be done in parallel, or maybe grouped into one transaction (like group commit). Is there any group info available from the binlogs?
Another thing we could (but probably should not) do is to prefetch pages by selecting them. This would only speed up the writes on the master, because the pages are already in the buffer pool.
We should not focus only on InnoDB. Running gh-ost against MyRocks, TokuDB or NDB might work and might be able to benefit from parallelism. (NDB might not be the best example here.)
For copying the table: multiple […]
This is certainly cool; it would leave the server in a weird state if gh-ost crashes.
I tested the exchange on a big partitioned table (3 billion rows, 500GB, 90 partitions) with a conventional online alter, and it ran twice as fast in 5.6 for a full rebuild. One problem was exchanging back, because the full table is read to check that it matches the partitioning rules. 5.7 has exchange without validation, which makes the exchange nearly immediate, though serialized due to MDL. If the exchange is done without validation, the gain is a factor of 3-4. Still interesting, especially for testing.
About the "would leave the server in a weird state if gh-ost crashes": it would be the case anyway! You would have to drop everything that was in progress. There would be some cleanup if the server crashes, just like for pt-online-schema-change.
Sure, this is certainly something that the community can contribute to if it is not on your roadmap!
Sorry, I didn't quite follow. What was the experiment, exactly? A factor of 3-4 reduction in runtime is very interesting and would justify such a development.
True.
It's just that it's a relatively big change to work on right now.
The parallel rebuild of a big partitioned table using exchange partitions: if P is the number of partitions, […] you can parallelize step 2. I am not sure we could apply this to gh-ost, but it works very well using online alters if you do not crash in the middle. So it is good for development.
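The numbered steps of that procedure did not survive above, so the following Go sketch is only one possible way to script such a per-partition rebuild, not necessarily the exact experiment described. It assumes an old partitioned table `mytable`, an empty `mytable_new` already created with the target schema and the same partitioning, and an illustrative ADD COLUMN as the schema change; WITHOUT VALIDATION requires MySQL 5.7.

```go
// Rough sketch only: one way to script a per-partition rebuild via
// EXCHANGE PARTITION. `mytable`, `mytable_new`, partition names and the
// ADD COLUMN change are all illustrative assumptions.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql"
)

func rebuildPartition(db *sql.DB, part string) error {
	stage := "stage_" + part
	stmts := []string{
		// The staging table must match the old table's structure exactly,
		// minus the partitioning.
		fmt.Sprintf("CREATE TABLE %s LIKE mytable", stage),
		fmt.Sprintf("ALTER TABLE %s REMOVE PARTITIONING", stage),
		// Swap the partition's rows out into the staging table.
		// Cheap, but serialized with other exchanges on mytable's MDL.
		fmt.Sprintf("ALTER TABLE mytable EXCHANGE PARTITION %s WITH TABLE %s", part, stage),
		// The expensive step: rebuild the plain table. This is what runs in parallel.
		fmt.Sprintf("ALTER TABLE %s ADD COLUMN extra_col INT", stage),
		// Swap the rebuilt rows into the new partitioned table; 5.7's
		// WITHOUT VALIDATION skips the row scan against partitioning rules.
		fmt.Sprintf("ALTER TABLE mytable_new EXCHANGE PARTITION %s WITH TABLE %s WITHOUT VALIDATION", part, stage),
		fmt.Sprintf("DROP TABLE %s", stage),
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			return fmt.Errorf("%s: %w", s, err)
		}
	}
	return nil
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Partition names would come from information_schema.partitions.
	partitions := []string{"p0", "p1", "p2"}

	var wg sync.WaitGroup
	for _, p := range partitions {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			if err := rebuildPartition(db, p); err != nil {
				log.Printf("partition %s: %v", p, err)
			}
		}(p)
	}
	wg.Wait()
}
```

In this shape, the heavy per-partition ALTERs run concurrently while the EXCHANGE PARTITION statements serialize on metadata locks, which matches the behaviour described above; at the end, mytable is empty and mytable_new holds the rebuilt data.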
Thanks for explaining. Very interesting.
I've set up the following primitive benchmark: […]
@valeriikravchuk thank you for the benchmark! Looking somewhat deeper into this, the code change would be non-trivial and would consume some time. This feature is not on my immediate roadmap and will be pushed to a later time, unless someone wants to do it (in which case, please consult with me beforehand -- thank you!)
Quick question regarding the statement below: […]
Do you have any links that actually benchmark parallel inserts into the same table? We have a table with > 6 billion rows, and I estimate gh-ost would take around 3 weeks to run a migration on this table. Other than sharding (which we are working towards), is there any way to speed up a gh-ost migration on such tables?
I do not have public ones (I haven't searched). I have observed this myself on multiple occasions in the past.
We just came out of a […]. The usual […]
@shlomi-noach I ran this on Aurora to test it out, on an idle db.r3.8xlarge instance.
Single inserts: 188 minutes for 100M rows.
In one parallel thread I went from 0 million to 50 million, while in the other I went from 50 million to 100 million. […]
Now, this could be because of how AWS Aurora scales, which is a black box. Just wanted to share the results, for those curious.
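For anyone wanting to reproduce this kind of comparison, here is a hedged sketch of the parallel half of the test: two ranged INSERT ... SELECT statements run concurrently and timed. The `source`/`dest` table names and the `id` column are assumptions, not taken from the original test.

```go
// Sketch of the parallel variant: two concurrent ranged INSERT ... SELECT
// statements, timed. Table and column names are illustrative.
package main

import (
	"database/sql"
	"log"
	"sync"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Two non-overlapping PK ranges, as in the test described above.
	ranges := [][2]int64{{0, 50_000_000}, {50_000_000, 100_000_000}}
	start := time.Now()

	var wg sync.WaitGroup
	for _, r := range ranges {
		wg.Add(1)
		go func(lo, hi int64) {
			defer wg.Done()
			if _, err := db.Exec(
				"INSERT INTO dest SELECT * FROM source WHERE id > ? AND id <= ?",
				lo, hi); err != nil {
				log.Printf("range (%d, %d]: %v", lo, hi, err)
			}
		}(r[0], r[1])
	}
	wg.Wait()
	log.Printf("parallel copy took %s", time.Since(start))
}
```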
@pratik60 thank you, very interesting!
If the PK ranges don't overlap, they will not split each other's pages.
For big tables, it should be possible to copy the rows in parallel.
It is even more obvious for partitioned tables, and replication is now parallel.
I propose this syntax:
--copy-threads
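To make the proposal concrete, here is a rough sketch of what a hypothetical --copy-threads=N could do: read the min/max of an integer primary key, split that range into N contiguous chunks, and copy each chunk on its own connection. This is illustrative only, not gh-ost's actual row-copy code; the flag semantics, table and column names are assumptions.

```go
// Rough sketch of a hypothetical --copy-threads=N: split the integer PK range
// of `source` into N contiguous, non-overlapping chunks and copy each chunk
// concurrently into `dest`.
package main

import (
	"database/sql"
	"flag"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	copyThreads := flag.Int("copy-threads", 4, "number of parallel copy threads (hypothetical flag)")
	flag.Parse()

	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var minID, maxID int64
	if err := db.QueryRow("SELECT MIN(id), MAX(id) FROM source").Scan(&minID, &maxID); err != nil {
		log.Fatal(err) // also fails if the table is empty (NULL min/max)
	}

	n := int64(*copyThreads)
	step := (maxID - minID + n) / n // ceiling division: n contiguous chunks cover the full range

	var wg sync.WaitGroup
	for i := int64(0); i < n; i++ {
		lo := minID + i*step
		hi := lo + step
		wg.Add(1)
		go func(lo, hi int64) {
			defer wg.Done()
			// Each worker copies its own non-overlapping PK range.
			if _, err := db.Exec(
				"INSERT INTO dest SELECT * FROM source WHERE id >= ? AND id < ?",
				lo, hi); err != nil {
				log.Printf("chunk [%d, %d): %v", lo, hi, err)
			}
		}(lo, hi)
	}
	wg.Wait()
}
```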