-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cherry-pick](branch-2.1) Pick "[Enhancement](group commit)Optimize be select for group commit #35558" #37830
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
run buildall |
1 similar comment
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TeamCity be ut coverage result: |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TeamCity be ut coverage result: |
…e#35558) 1. Streamload and insert into, if batched and sent to the master FE, should use a consistent BE strategy (previously, insert into reused the first selected BE, while streamload used round robin). First, a map <table id, be id> records a fixed be id for a certain table. The first time a table is imported, a BE is randomly selected, and this table id and be id are recorded in the map permanently. Subsequently, all data imported into this table will select the BE corresponding to the table id recorded in the map. This ensures that batching is maximized to a single BE. To address the issue of excessive load on a single BE, a variable similar to a bvar window is used to monitor the total data volume sent to a specific BE for a specific table during the batch interval (default 10 seconds). A second map <be id, window variable> is used to track this. If a new import finds that its corresponding BE's window variable is less than a certain value (e.g., 1G), the new import continues to be sent to the corresponding BE according to map1. If it exceeds this value, the new import is sent to another BE with the smallest window variable value, and map1 is updated. If every BE exceeds this value, the one with the smallest value is still chosen. This helps to alleviate excessive pressure on a single BE. 2. For streamload, if batched and sent to a BE, it will batch directly on this BE and will commit the transaction at the end of the import. At this point, a request is sent to the FE, which records the size of this import and adds it to the window variable. 3. Streamload sent to observer FE, as well as insert into sent to observer FE, follow the logic in 1 by RPC, passing the table id to the master FE to obtain the selected be id.
run buildall |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
TeamCity be ut coverage result: |
run feut |
Proposed changes
Issue Number: close #xxx
Pick #35558