This repository was archived by the owner on Dec 8, 2021. It is now read-only.

backend/local: more robust range retry strategy #476

Merged (9 commits) on Nov 26, 2020


@glorv (Contributor) commented on Nov 16, 2020:

What problem does this PR solve?

Add a better range retry strategy to reduce the probability of EpochNotMatch errors and to make Lightning run faster when range conflicts are heavy.

What is changed and how it works?

  • Remove the parallel write & ingest code. Concurrency is already handled at a higher level, so dropping it makes the retry logic simpler and clearer.
  • If a KV range spans multiple regions, retry only the regions that failed.
  • Increase the retry counter only when a retry round makes no progress at all.

Benchmark results:

The benchmark was run on the following two datasets:

  1. dataset1: randomly generated int primary key (50 GB).
  2. dataset2: randomly generated int primary key and 2 extra secondary keys (126 GB).

Lightning config:

[lightning]
table-concurrency = 8

[mydumper]
batch-size = "1GiB"
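The first bullet in the PR description relies on concurrency that already exists above the range level. The Go sketch below illustrates that layering, assuming for illustration that `table-concurrency` caps how many table jobs run at once; `importAll` and `importTable` are hypothetical names, not Lightning's API. Parallelism comes only from the bounded pool of table jobs (a buffered channel used as a counting semaphore), so each job can write and ingest its ranges sequentially and keep its retry logic simple.

```go
package main

import (
	"fmt"
	"sync"
)

// importTable is a hypothetical per-table job. It handles the table's KV
// ranges one after another (write, then ingest) without spawning goroutines.
func importTable(name string, ranges int) int {
	done := 0
	for i := 0; i < ranges; i++ {
		// A real job would write and ingest range i of table `name` here.
		done++
	}
	return done
}

// importAll runs at most tableConcurrency table jobs at the same time.
// It returns the number of ranges processed and the highest number of
// jobs observed running at once.
func importAll(tables []string, tableConcurrency, rangesPerTable int) (processed, peak int) {
	var (
		mu     sync.Mutex
		active int
		wg     sync.WaitGroup
	)
	sem := make(chan struct{}, tableConcurrency)
	for _, t := range tables {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while all slots are taken
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the table is done

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			n := importTable(name, rangesPerTable)

			mu.Lock()
			active--
			processed += n
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return processed, peak
}

func main() {
	tables := []string{"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"}
	processed, peak := importAll(tables, 8, 4) // table-concurrency = 8
	fmt.Println(processed, peak <= 8)          // prints: 40 true
}
```

Because `active` is incremented after a slot is acquired and decremented before it is released, the observed peak can never exceed the configured limit.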

| Dataset | Branch | EpochNotMatch errors | Time taken |
| --- | --- | --- | --- |
| dataset1 | master | 1949 | 25m23s |
| dataset1 | retry-range | 196 | 6m38s |
| dataset2 | master | 4130 | 2h14m |
| dataset2 | retry-range | 298 | 50m15s |
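The relative gains in the table can be checked with a few lines of arithmetic. This small Go program (a helper written for illustration, not part of the PR) parses the reported times with `time.ParseDuration` and divides the master figures by the retry-range figures.

```go
package main

import (
	"fmt"
	"time"
)

// result holds one dataset's master vs retry-range numbers from the table above.
type result struct {
	dataset                   string
	masterErrors, rangeErrors int
	masterTime, rangeTime     string // time.ParseDuration syntax, e.g. "25m23s"
}

// speedup returns masterTime / rangeTime, i.e. how many times faster retry-range ran.
func speedup(masterTime, rangeTime string) (float64, error) {
	m, err := time.ParseDuration(masterTime)
	if err != nil {
		return 0, err
	}
	r, err := time.ParseDuration(rangeTime)
	if err != nil {
		return 0, err
	}
	return m.Seconds() / r.Seconds(), nil
}

func main() {
	results := []result{
		{"dataset1", 1949, 196, "25m23s", "6m38s"},
		{"dataset2", 4130, 298, "2h14m", "50m15s"},
	}
	for _, res := range results {
		s, err := speedup(res.masterTime, res.rangeTime)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %.2fx faster, %.1fx fewer EpochNotMatch errors\n",
			res.dataset, s, float64(res.masterErrors)/float64(res.rangeErrors))
	}
}
```

It reports 3.83x faster with 9.9x fewer errors for dataset1, and 2.67x faster with 13.9x fewer errors for dataset2.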

Check List

Tests

  • Unit test
  • Integration test
  • Manual test
  • No code

Side effects

Related changes

@glorv requested review from @3pointer and @kennytm on November 16, 2020 09:03
@kennytm (Collaborator) left a comment:

LGTM

@kennytm added the status/LGT1 label (One reviewer already commented LGTM) on Nov 16, 2020
@glorv added this to the 4.0.9 milestone on Nov 23, 2020
@lance6716 (Contributor) commented:

/run-all-tests

[2020-11-25T11:24:29.480Z] ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104

@lance6716 (Contributor) left a comment:

LGTM

@kennytm added the status/LGT2 label (Two reviewers already commented LGTM, ready for merge) and removed the status/LGT1 label on Nov 26, 2020
@glorv merged commit cbad36a into master on Nov 26, 2020
@glorv deleted the retry-range branch on November 26, 2020 04:34
Labels
status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2)

3 participants