[INLONG-7249][Sort] JDBC accurate dirty data archive and metric calculation #7253

Closed
Yizhou-Yang wants to merge 30 commits

Conversation

Yizhou-Yang
Contributor

@Yizhou-Yang Yizhou-Yang commented Jan 17, 2023

This PR still needs its debug logs removed and comments added before the final merge.

Fixes #7249

Feature: record-level accurate dirty data archiving and metrics for the JDBC connector

Design:

  1. Use Java reflection to override TableSimpleStatementExecutor, passing in the InLong metric and dirty helper so that metrics are accurate.
  2. Add a counter so that the record-level flush is skipped for the first 3 attempts, in case the failure is caused by network problems.
  3. Parse the failing record out of the error message.
  4. Add a batch mechanism and iterate over the batch to find the position of the dirty data.
  5. Refresh the batch, update the metrics, and retry the records that follow the dirty data.
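
The fallback described in steps 2 to 5 can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `RecordWriter` and `DirtyAwareBatchSketch` are hypothetical names, the writer is assumed to reject a whole batch atomically when any record is bad, and the sketch replays records one at a time instead of parsing the error message to locate the dirty record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a JDBC batch writer; not Flink's or InLong's API. */
interface RecordWriter<T> {
    /** Assumed to write all records, or to throw and write none of them. */
    void writeBatch(List<T> records) throws Exception;
}

/** Buffers records and falls back to record-level writes when a batch keeps failing. */
class DirtyAwareBatchSketch<T> {
    // Whole-batch attempts before falling back, mirroring the "first 3 times" counter.
    private static final int BATCH_RETRIES = 3;

    private final RecordWriter<T> writer;
    private final Consumer<T> dirtySink; // receives every record archived as dirty
    private final List<T> batch = new ArrayList<>();
    private long cleanCount;
    private long dirtyCount;

    DirtyAwareBatchSketch(RecordWriter<T> writer, Consumer<T> dirtySink) {
        this.writer = writer;
        this.dirtySink = dirtySink;
    }

    void add(T record) {
        batch.add(record);
    }

    void flush() {
        if (batch.isEmpty()) {
            return;
        }
        for (int attempt = 0; attempt < BATCH_RETRIES; attempt++) {
            try {
                writer.writeBatch(batch);
                cleanCount += batch.size();
                batch.clear();
                return;
            } catch (Exception e) {
                // The failure may be transient (for example a network problem),
                // so the whole batch is retried before any record is blamed.
            }
        }
        // The batch still fails: write records individually to find the dirty ones.
        for (T record : batch) {
            try {
                writer.writeBatch(Collections.singletonList(record));
                cleanCount++;
            } catch (Exception e) {
                dirtySink.accept(record);
                dirtyCount++;
            }
        }
        batch.clear();
    }

    long getCleanCount() {
        return cleanCount;
    }

    long getDirtyCount() {
        return dirtyCount;
    }
}
```

With this shape, a record is only counted as dirty after it has failed on its own, so the clean and dirty counters stay accurate per record rather than per batch.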

@Yizhou-Yang Yizhou-Yang changed the title To be renamed [INLONG-7249][Sort] JDBC accurate dirty data archive and metric calculation Jan 17, 2023
@Yizhou-Yang
Contributor Author

Yizhou-Yang commented Jan 29, 2023

todo:
For multiple sinks, avoid redundant metric calculation when the executor is enhanced.
Check compatibility and performance in various scenarios.

Some thoughts on this PR:
1. This PR DEPENDS on the variable naming of specific Flink executors. Hopefully the names in Flink won't be changed.
2. The long[] used for metrics is not thread-safe on write. I think this is OK because a flush only happens after a certain interval or batch size, so there should be no concurrent modification.
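
Both concerns can be illustrated with a small sketch. It is not taken from this PR: `ReflectionSketch` and `MetricCountersSketch` are hypothetical names. The first class looks up a private field by name and fails with an explicit message if the name no longer exists, which would surface a rename in Flink at startup instead of as an obscure error later. The second shows `AtomicLongArray` as one possible drop-in for a shared `long[]`, should concurrent writes ever become possible.

```java
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical helper; the field names passed to it are illustrative only. */
final class ReflectionSketch {
    private ReflectionSketch() {
    }

    /** Reads a private field by name, walking up the class hierarchy. */
    static Object readField(Object target, String fieldName) {
        Class<?> type = target.getClass();
        while (type != null) {
            try {
                Field field = type.getDeclaredField(fieldName);
                field.setAccessible(true);
                return field.get(target);
            } catch (NoSuchFieldException e) {
                type = type.getSuperclass(); // not declared here, try the parent class
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot access field '" + fieldName + "'", e);
            }
        }
        // Fail fast with a clear hint when the upstream name has changed.
        throw new IllegalStateException("Field '" + fieldName + "' not found on "
                + target.getClass().getName() + "; the upstream class may have been renamed");
    }
}

/** Counters backed by AtomicLongArray instead of a plain long[]. */
final class MetricCountersSketch {
    static final int CLEAN = 0;
    static final int DIRTY = 1;

    private final AtomicLongArray counters = new AtomicLongArray(2);

    void add(int index, long delta) {
        counters.addAndGet(index, delta); // atomic read-modify-write
    }

    long get(int index) {
        return counters.get(index);
    }
}
```

If flushes really are serialized, the plain `long[]` is sufficient and slightly cheaper; the atomic variant only matters once more than one thread can update the same counters.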

@Yizhou-Yang
Contributor Author

TODO: this PR is still being performance-tested, and its logs need to be removed and comments added before the final merge.

@Yizhou-Yang Yizhou-Yang marked this pull request as draft February 8, 2023 06:56
@Yizhou-Yang Yizhou-Yang marked this pull request as ready for review February 8, 2023 07:44
@Yizhou-Yang Yizhou-Yang marked this pull request as draft February 9, 2023 09:58
@Yizhou-Yang
Contributor Author

I will reorganize the code and open a new PR sometime later this month.

Development

Successfully merging this pull request may close these issues.

[Feature][Sort] JDBC accurate dirty data archive and metric calculation
6 participants