Skip to content
This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

When br is restoring cdclog, if there is a ddl operation in the TS interval, the ddl operation will not be restored #870

Closed
tongtongyin opened this issue Mar 15, 2021 · 6 comments
Assignees

Comments

@tongtongyin
Copy link
Contributor

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    When cdc outputs the ddl operation, name the suffix of the ddl file name as uint64max-TS of the first ddl operation (it can be directly named as the TS of the last ddl), and then when collecting ddl files during br recovery, add this ts is regarded as the TS of the last ddl, and all ddl smaller than startTs are filtered out, so some ddl files that should not be filtered are filtered out, so some ddl operations are not restored when they are restored.

  2. What did you expect to see?

  3. What did you see instead?

  4. What version of BR and TiDB/TiKV/PD are you using?

  1. Operation logs

    • Please upload br.log for BR if possible
    • Please upload tidb-lightning.log for TiDB-Lightning if possible
    • Please upload tikv-importer.log from TiKV-Importer if possible
    • Other interesting logs
  2. Configuration of the cluster and the task

    • tidb-lightning.toml for TiDB-Lightning if possible
    • tikv-importer.toml for TiKV-Importer if possible
    • topology.yml if deployed by TiUP
  3. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible

@tongtongyin tongtongyin added the type/bug Something isn't working label Mar 15, 2021
@3pointer
Copy link
Collaborator

The reason we choose uint64max-TS is here, And I didn't get it why all ddls smaller than startTs are filtered out is wrong? could you provide an example?

@dveeden
Copy link
Contributor

dveeden commented Apr 26, 2021

tidb 5.7.25-TiDB-v5.0.1 > begin; select @@tidb_current_ts; rollback;
Query OK, 0 rows affected (0.01 sec)

+--------------------+
| @@tidb_current_ts  |
+--------------------+
| 424528599009984514 |
+--------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

tidb 5.7.25-TiDB-v5.0.1 > create table cdcinsert (id int primary key);
Query OK, 0 rows affected (0.60 sec)

tidb 5.7.25-TiDB-v5.0.1 > insert into cdcinsert values (1),(2),(3);
Query OK, 3 rows affected (0.10 sec)
Records: 3  Duplicates: 0  Warnings: 0

tidb 5.7.25-TiDB-v5.0.1 > update cdcinsert set id=4 where id=2;
Query OK, 1 row affected (0.10 sec)
Rows matched: 1  Changed: 1  Warnings: 0

tidb 5.7.25-TiDB-v5.0.1 > delete from cdcinsert where id=3;
Query OK, 1 row affected (0.12 sec)

tidb 5.7.25-TiDB-v5.0.1 > insert into cdcinsert values (5);
Query OK, 1 row affected (0.12 sec)

tidb 5.7.25-TiDB-v5.0.1 > insert into cdcinsert values (6);
Query OK, 1 row affected (0.12 sec)

tidb 5.7.25-TiDB-v5.0.1 > insert into cdcinsert values (7);
Query OK, 1 row affected (0.07 sec)

tidb 5.7.25-TiDB-v5.0.1 > begin; select @@tidb_current_ts; rollback;
Query OK, 0 rows affected (0.01 sec)

+--------------------+
| @@tidb_current_ts  |
+--------------------+
| 424528626626854913 |
+--------------------+
1 row in set (0.01 sec)

Query OK, 0 rows affected (0.00 sec)
[dvaneeden@dve-carbon ~]$ tiup br restore cdclog --pd=192.168.122.50:2379 --storage=/tmp/cdclog --start-ts=424528599009984514 --end-ts=424528626626854913 --log-level=debug
Starting component `br`: /home/dvaneeden/.tiup/components/br/v5.0.1/br restore cdclog --pd=192.168.122.50:2379 --storage=/tmp/cdclog --start-ts=424528599009984514 --end-ts=424528626626854913 --log-level=debug
Detail BR log in /tmp/br.log.2021-04-26T16.47.05+0200 
Error: [schema:1146]Table 'test.cdcinsert' doesn't exist
Error: run `/home/dvaneeden/.tiup/components/br/v5.0.1/br` (wd:/home/dvaneeden/.tiup/data/SVhQY9u) failed: exit status 1
[2021/04/26 16:47:06.140 +02:00] [INFO] [log_client.go:198] ["filter ddl file by ts"] [name=ddl.18022218450899042302] [ts=424525622810509313]
[2021/04/26 16:47:06.140 +02:00] [INFO] [log_client.go:708] ["collect ddl files"] [files="[]"]
[2021/04/26 16:47:06.140 +02:00] [INFO] [log_client.go:241] ["no ddls to restore"]

So it looks like the DDL file is filtered out while part of the file should have been applied.

This looks like it may be the same issue as reported here.

@dengqee
Copy link

dengqee commented May 6, 2021

The reason we choose uint64max-TS is here, And I didn't get it why all ddls smaller than startTs are filtered out is wrong? could you provide an example?

func (l *LogClient) maybeTSInRange(ts uint64) bool {
// We choose the last event's ts as file name in cdclog when rotate.
// so even this file name's ts is larger than l.endTS,
// we still need to collect it, because it may have some events in this ts range.
// TODO: find another effective filter to collect files
return ts >= l.startTS
}

When filtering a DDL file, here treat the suffix of the file name as the commit ts of the last DDL in this file, but in fact, the file name corresponds to the commit ts of the first DDL.
https://github.com/pingcap/ticdc/blob/e4e96ed4204abc30f1e983e477cdf74d267948af/cdc/sink/cdclog/file.go#L267-L274
https://github.com/pingcap/ticdc/blob/e4e96ed4204abc30f1e983e477cdf74d267948af/cdc/sink/cdclog/s3.go#L307-L311

@dengqee
Copy link

dengqee commented May 6, 2021

The reason we choose uint64max-TS is here, And I didn't get it why all ddls smaller than startTs are filtered out is wrong? could you provide an example?

func (l *LogClient) maybeTSInRange(ts uint64) bool {
// We choose the last event's ts as file name in cdclog when rotate.
// so even this file name's ts is larger than l.endTS,
// we still need to collect it, because it may have some events in this ts range.
// TODO: find another effective filter to collect files
return ts >= l.startTS
}

When filtering a DDL file, here treat the suffix of the file name as the commit ts of the last DDL in this file, but in fact, the file name corresponds to the commit ts of the first DDL.
https://github.com/pingcap/ticdc/blob/e4e96ed4204abc30f1e983e477cdf74d267948af/cdc/sink/cdclog/file.go#L267-L274
https://github.com/pingcap/ticdc/blob/e4e96ed4204abc30f1e983e477cdf74d267948af/cdc/sink/cdclog/s3.go#L307-L311

// We choose the last event's ts as file name in cdclog when rotate.

This operate is done by cdclog, but ddl files do not rename to the last commit ts. @3pointer

@3pointer
Copy link
Collaborator

The reason we choose uint64max-TS is here, And I didn't get it why all ddls smaller than startTs are filtered out is wrong? could you provide an example?

func (l *LogClient) maybeTSInRange(ts uint64) bool {
// We choose the last event's ts as file name in cdclog when rotate.
// so even this file name's ts is larger than l.endTS,
// we still need to collect it, because it may have some events in this ts range.
// TODO: find another effective filter to collect files
return ts >= l.startTS
}

When filtering a DDL file, here treat the suffix of the file name as the commit ts of the last DDL in this file, but in fact, the file name corresponds to the commit ts of the first DDL.
https://github.com/pingcap/ticdc/blob/e4e96ed4204abc30f1e983e477cdf74d267948af/cdc/sink/cdclog/file.go#L267-L274
https://github.com/pingcap/ticdc/blob/e4e96ed4204abc30f1e983e477cdf74d267948af/cdc/sink/cdclog/s3.go#L307-L311

// We choose the last event's ts as file name in cdclog when rotate.

This operate is done by cdclog, but ddl files do not rename to the last commit ts. @3pointer

yes, you are right, I give a workaround PR to fix it. And I think we'd better unified it in the future before we release this feature.

@IANTHEREAL
Copy link
Collaborator

can we close it?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants