[INLONG-7554][Sort] MySQL CDC supports parsing gh-ost records #7750

Merged
merged 17 commits into from
Apr 23, 2023

@e-mhui e-mhui commented Mar 31, 2023

Prepare a Pull Request

[INLONG-7554][Sort] MySQL CDC supports parsing gh-ost records

Motivation

gh-ost is a triggerless online schema migration solution for MySQL.

A gh-ost migration generates multiple DDL statements. For example, adding a column c to table tb1 with gh-ost produces the following statements, which show how gh-ost works:

DROP TABLE IF EXISTS `menghuiyu`.`_tb1_gho`
DROP TABLE IF EXISTS `menghuiyu`.`_tb1_del`
DROP TABLE IF EXISTS `menghuiyu`.`_tb1_ghc`
create /* gh-ost */ table `menghuiyu`.`_tb1_ghc` (\n\t\t\tid bigint auto_increment,\n\t\t\tlast_update timestamp not null DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n\t\t\thint varchar(64) charset ascii not null,\n\t\t\tvalue varchar(4096) charset ascii not null,\n\t\t\tprimary key(id),\n\t\t\tunique key hint_uidx(hint)\n\t\t) auto_increment=256
create /* gh-ost */ table `menghuiyu`.`_tb1_gho` like `menghuiyu`.`tb1`
alter /* gh-ost */ table `menghuiyu`.`_tb1_gho` add column c varchar(255)
create /* gh-ost */ table `menghuiyu`.`_tb1_del` (\n\t\t\tid int auto_increment primary key\n\t\t) engine=InnoDB comment='ghost-cut-over-sentry'
DROP TABLE IF EXISTS `menghuiyu`.`_tb1_del`
rename /* gh-ost */ table `menghuiyu`.`tb1` to `menghuiyu`.`_tb1_del`
rename /* gh-ost */ table `menghuiyu`.`_tb1_gho` to `menghuiyu`.`tb1`
DROP TABLE IF EXISTS `menghuiyu`.`_tb1_ghc`
DROP TABLE IF EXISTS `menghuiyu`.`_tb1_del`

MySQL CDC captures these DDL statements and synchronizes them to the sink. The sink cannot recognize the gh-ost tables (_tb1_gho, _tb1_ghc, _tb1_del) in these statements, and only the alter operation needs to be synchronized. Therefore, we need to extract the alter statement and restore the gh-ost table name in it to the original table tb1.
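The filtering described above can be sketched as a small standalone program. This is only an illustration under stated assumptions, not the code merged in this PR: the class `GhostDdlFilterSketch` and its methods `rewrite` and `filter` are hypothetical names, the pattern is the default "gh-ost.table.regex" with its anchors removed, and the statement for `tb2` is an invented non-gh-ost example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keeps only the gh-ost alter statement and restores the original table name.
class GhostDdlFilterSketch {

    // Default "gh-ost.table.regex" without ^ and $, so it can be searched inside a full statement.
    private static final Pattern GHOST_TABLE = Pattern.compile("_(.*)_(gho|ghc|del|new|old)");

    /** Returns the statement to forward to the sink, or null if the statement should be dropped. */
    static String rewrite(String ddl) {
        Matcher m = GHOST_TABLE.matcher(ddl);
        if (!m.find()) {
            return ddl; // no gh-ost table involved: pass through unchanged
        }
        if (!ddl.toUpperCase(Locale.ROOT).contains("ALTER")) {
            return null; // gh-ost create/drop/rename bookkeeping: not synchronized
        }
        String originTable = m.group(1);
        // quoteReplacement keeps '$' or '\' in a table name from being read as a group reference.
        return m.replaceAll(Matcher.quoteReplacement(originTable));
    }

    /** Applies rewrite to every statement and collects the ones that survive. */
    static List<String> filter(List<String> ddls) {
        List<String> out = new ArrayList<>();
        for (String ddl : ddls) {
            String rewritten = rewrite(ddl);
            if (rewritten != null) {
                out.add(rewritten);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> captured = Arrays.asList(
                "DROP TABLE IF EXISTS `menghuiyu`.`_tb1_gho`",
                "create /* gh-ost */ table `menghuiyu`.`_tb1_gho` like `menghuiyu`.`tb1`",
                "alter /* gh-ost */ table `menghuiyu`.`_tb1_gho` add column c varchar(255)",
                "rename /* gh-ost */ table `menghuiyu`.`_tb1_gho` to `menghuiyu`.`tb1`",
                "alter table `menghuiyu`.`tb2` add column d int"); // invented non-gh-ost statement
        for (String ddl : filter(captured)) {
            System.out.println(ddl);
        }
    }
}
```

Of the five sample statements, only the gh-ost alter (rewritten against `tb1`) and the unrelated `tb2` statement reach the output; the drop, create and rename statements are discarded.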

Main workflow:

image

Modifications

  1. Add "gh-ost.ddl.change" and "gh-ost.table.regex" options.
    public static final ConfigOption<Boolean> GH_OST_DDL_CHANGE = ConfigOptions
            .key("gh-ost.ddl.change")
            .booleanType()
            .defaultValue(false)
            .withDescription(
                    "Whether parse ddl changes of gh-ost, default value is 'false'.");
    public static final ConfigOption<String> GH_OST_TABLE_REGEX = ConfigOptions
            .key("gh-ost.table.regex")
            .stringType()
            .defaultValue("^_(.*)_(gho|ghc|del|new|old)$")
            .withDescription(
                    "Matcher the original table name from the ddl of gh-ost.");
  2. Process the gh-ost records.
    /**
     * Extracts the DDL record to emit for a gh-ost statement.
     *
     * @param data the row whose first field is a map containing the DDL statement
     * @return the converted alter record with the gh-ost table name restored to the original table,
     *         {@code null} for any other gh-ost statement, or {@code data} unchanged if no gh-ost table matches
     * @throws Exception if converting the rewritten DDL fails
     */
    private GenericRowData extractGhostRecord(GenericRowData data) throws Exception {
        String ddl = ((Map<String, String>) data.getField(0)).get(DDL_FIELD_NAME);
        // Strip the ^ and $ anchors so the regex can find the gh-ost table name inside the full DDL
        if (this.ghostTableRegex.startsWith(CARET) && this.ghostTableRegex.endsWith(DOLLAR)) {
            this.ghostTableRegex = this.ghostTableRegex.substring(1, this.ghostTableRegex.length() - 1);
        }
        Pattern ghostTablePattern = Pattern.compile(this.ghostTableRegex);
        Matcher ghostMatcher = ghostTablePattern.matcher(ddl);
        if (ghostMatcher.find()) {
            // Only the alter statement needs to be synchronized
            if (ddl.toUpperCase().contains(DDL_OP_ALTER)) {
                // Group 1 of the regex captures the original table name
                String originTable = ghostMatcher.group(1);
                return (GenericRowData) physicalConverter
                        .convert(ddl.replaceAll(this.ghostTableRegex, originTable), null);
            } else {
                return null;
            }
        }
        return data;
    }
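The anchor stripping at the top of the method matters because the default regex is written to match a bare table name, while the method searches a whole DDL statement. A minimal sketch of the difference, assuming hypothetical helper names (`GhostRegexAnchorSketch`, `stripAnchors`, `originalTable`) that are not part of the PR:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: shows why the ^ and $ anchors are removed before searching a full statement.
class GhostRegexAnchorSketch {

    static final String DEFAULT_REGEX = "^_(.*)_(gho|ghc|del|new|old)$";

    /** Removes a leading '^' and a trailing '$' when both are present. */
    static String stripAnchors(String regex) {
        if (regex.startsWith("^") && regex.endsWith("$")) {
            return regex.substring(1, regex.length() - 1);
        }
        return regex;
    }

    /** Returns capture group 1 of the first match in text, or null when nothing matches. */
    static String originalTable(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String ddl = "alter /* gh-ost */ table `menghuiyu`.`_tb1_gho` add column c varchar(255)";
        // Anchored regex against a bare table name: matches, group 1 is the original table.
        System.out.println(originalTable(DEFAULT_REGEX, "_tb1_gho"));
        // Anchored regex against the full statement: the statement does not start with '_'.
        System.out.println(originalTable(DEFAULT_REGEX, ddl));
        // Unanchored regex against the full statement: the ghost table is found inside it.
        System.out.println(originalTable(stripAnchors(DEFAULT_REGEX), ddl));
    }
}
```

One property of the default pattern worth keeping in mind: `(.*)` is greedy, so a statement that mentions two ghost-style names would capture everything between the first `_` and the last suffix. In the sample DDL above only the rename statements have that shape, and they are dropped rather than rewritten.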

Verifying this change

Run AllMigrateTest.java

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

Option              Required  Default                        Type     Description
gh-ost.ddl.change   optional  false                          Boolean  Whether parse ddl changes of gh-ost, default value is 'false'.
gh-ost.table.regex  optional  ^_(.*)_(gho|ghc|del|new|old)$  String   Matcher the original table name from the ddl of gh-ost.

@dockerzhang dockerzhang requested review from EMsnap, yunqingmoswu and gong and removed request for EMsnap and yunqingmoswu March 31, 2023 14:25
@e-mhui e-mhui marked this pull request as draft April 4, 2023 06:47
@e-mhui e-mhui marked this pull request as ready for review April 5, 2023 10:01
@e-mhui e-mhui marked this pull request as draft April 6, 2023 01:48
@e-mhui e-mhui marked this pull request as ready for review April 20, 2023 09:30

@EMsnap EMsnap left a comment

Thanks for your contribution, LGTM @e-mhui

@EMsnap EMsnap merged commit cd02218 into apache:master Apr 23, 2023
Yizhou-Yang pushed a commit to Yizhou-Yang/inlong-yyz that referenced this pull request May 30, 2023
Development

Successfully merging this pull request may close these issues.

[Feature][Sort] MySQL CDC supports parsing gh-ost records
4 participants