[INLONG-5169][Sort] Fix race condition issue of HBaseSinkFunction metric collection #5170
@yunqingmoswu @thexiay PTAL, thanks.
...sort-connectors/hbase/src/main/java/org/apache/inlong/sort/hbase/sink/HBaseSinkFunction.java
Maybe we can define a
Done.
Hi guys, thanks for reviewing. First, sorry that I didn't explain the reasoning behind this PR. As you can see, all the comments focus on the flush-failure scenario. I don't think we need to care about the metrics after a flush fails, because
My points here are:
What do you guys think of this? @EMsnap @gong @yunqingmoswu
As far as I know, a flush first generates HFiles and then loads them, so each flushed batch of data is either entirely visible or entirely lost. If I understand correctly, collecting the statistics after the flush would therefore give more accurate numbers, not only for dirty data but also for normally synced data.
@ifndef-SleePy Hello, here dirty data is defined as records that failed to write, and we need both the record count and the data size.
[1] https://hbase.apache.org/book.html#arch.bulk.load
Metrics can be reported either before or after flushing, but we recommend reporting them after flushing.
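Reporting after flushing could look roughly like the sketch below. It is a hypothetical illustration, not the actual `HBaseSinkFunction` code: the class `FlushAwareMetrics` and its methods are invented here, and it assumes the all-or-nothing flush behavior described earlier in this thread. Each buffered record is held as "pending" and, once the flush outcome is known, the whole batch is counted either as sent or as dirty (failed to write), tracking both record count and byte size.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical helper (not part of InLong): buffers per-batch counts and moves
 * them into "sent" or "dirty" totals only after the flush outcome is known.
 */
class FlushAwareMetrics {
    // Counts for records buffered since the last flush; guarded by "this".
    private long pendingRecords;
    private long pendingBytes;

    // Totals read by the metric reporter; LongAdder tolerates concurrent updates and reads.
    private final LongAdder sentRecords = new LongAdder();
    private final LongAdder sentBytes = new LongAdder();
    private final LongAdder dirtyRecords = new LongAdder();
    private final LongAdder dirtyBytes = new LongAdder();

    /** Called by the task thread for every record added to the write buffer. */
    synchronized void onBuffered(long bytes) {
        pendingRecords++;
        pendingBytes += bytes;
    }

    /** Called after a flush attempt; the whole pending batch goes to one outcome. */
    synchronized void onFlushed(boolean success) {
        if (success) {
            sentRecords.add(pendingRecords);
            sentBytes.add(pendingBytes);
        } else {
            // Dirty data = records that failed to write (count and size).
            dirtyRecords.add(pendingRecords);
            dirtyBytes.add(pendingBytes);
        }
        pendingRecords = 0;
        pendingBytes = 0;
    }

    long getSentRecords() { return sentRecords.sum(); }
    long getSentBytes() { return sentBytes.sum(); }
    long getDirtyRecords() { return dirtyRecords.sum(); }
    long getDirtyBytes() { return dirtyBytes.sum(); }

    public static void main(String[] args) {
        FlushAwareMetrics metrics = new FlushAwareMetrics();
        metrics.onBuffered(10);
        metrics.onBuffered(20);
        metrics.onFlushed(true);   // first batch succeeds
        metrics.onBuffered(5);
        metrics.onFlushed(false);  // second batch fails
        System.out.println("sent=" + metrics.getSentRecords() + "/" + metrics.getSentBytes()
                + " dirty=" + metrics.getDirtyRecords() + "/" + metrics.getDirtyBytes());
    }
}
```

This sketch assumes no new records are buffered while a flush is in flight (for example, because the caller holds a lock across the flush); otherwise a record added mid-flush would be attributed to the batch being flushed.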
Thanks for clarifying.
After an offline discussion with @yunqingmoswu, we agreed to just keep the original implementation with a thread-safe counter, because the decision is based on production requirements rather than technical ones.
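To illustrate the kind of race a thread-safe counter avoids: when a counter is updated from more than one thread (for example, the task thread and a periodic flush thread), a plain `long` field incremented with `++` is a non-atomic read-modify-write and can lose updates, whereas `java.util.concurrent.atomic.LongAdder` does not. The class `CounterRaceDemo` below is a hypothetical sketch, not the code merged in this PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class CounterRaceDemo {
    private long unsafeCount;                             // plain field: "++" is not atomic
    private final LongAdder safeCount = new LongAdder();  // thread-safe counter

    /** Runs `threads` workers that each increment both counters `perThread` times. */
    long[] run(int threads, int perThread) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers together to increase contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++;          // racy: concurrent updates may be lost
                    safeCount.increment();  // never loses updates
                }
            });
        }
        start.countDown();
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return new long[] {unsafeCount, safeCount.sum()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = new CounterRaceDemo().run(4, 1_000_000);
        System.out.println("plain long: " + result[0] + ", LongAdder: " + result[1]);
    }
}
```

The plain counter may or may not show lost updates on a given run (the JIT and scheduling affect it), but it can never exceed the expected total, while the `LongAdder` always matches it. `AtomicLong` would also be correct; `LongAdder` trades a slightly more expensive `sum()` for cheaper increments under contention.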
@ifndef-SleePy Please resolve the conflicts.
I've re-created the PR after resolving the conflicts.
…lacing counter with thread-safe implementation
LGTM
Prepare a Pull Request
Motivation
Modifications
Describe the modifications you've done.
Verifying this change
(Please pick either of the following options)
Documentation