Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhencement](Export) support export with outfile syntax #18325

Merged
merged 11 commits into from
Apr 20, 2023

Conversation

BePPPower
Copy link
Contributor

@BePPPower BePPPower commented Apr 3, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

Export syntax provides asynchronous export function, but Export does not achieve vectorization.
Outfile syntax provides synchronous export function`.
So we can reimplement the export syntax with oufile syntax.

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner kind/docs Categorizes issue or PR as related to documentation. labels Apr 3, 2023
connectContext.setEnv(Env.getCurrentEnv());
connectContext.setDatabase(job.getTableName().getDb());
// TODO(ftw): 这里的权限可能要改一下
connectContext.setQualifiedUser(UserIdentity.ROOT.getQualifiedUser());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to save user info in export job

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@morningman morningman added the kind/meta-version-change Categorizes issue or PR as related to changing meta version label Apr 12, 2023
@@ -547,6 +546,7 @@ private Env(boolean isCheckpointCatalog) {
this.routineLoadManager = new RoutineLoadManager();
this.sqlBlockRuleMgr = new SqlBlockRuleMgr();
this.exportMgr = new ExportMgr();
this.exportMgr.start();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to startMasterOnlyDaemonThreads()

public ExportMgr() {
int poolSize = Config.export_running_job_num_limit == 0 ? 5 : Config.export_running_job_num_limit;
exportingExecutor = new MasterTaskExecutor("export-exporting-job", poolSize, true);
exportingExecutor.start();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Override the start() of MasterDaemon and move this exportingExecutor.start() to start()

List<ExportJob> jobs = getExportJobs(JobState.PENDING);
// Because exportJob may be replayed from log
// we also need handle EXPORTING state exportJob.
jobs.addAll(getExportJobs(JobState.EXPORTING));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to handle exporting state jobs here.
All jobs replayed from log should be reset to PENDING.

// we also need handle EXPORTING state exportJob.
jobs.addAll(getExportJobs(JobState.EXPORTING));
int runningJobNumLimit = Config.export_running_job_num_limit;
if (runningJobNumLimit > 0 && !jobs.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to calc the remaining job num.
The number of thread in executor pool is limit to 5, and the submitted job will be queued when all threads are taken.
So here, we can simply do:

  1. change the job's state to EXPORTING, so that it won't be scheduled again.
  2. submit the job to the executor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exporting state means this exportJob is exporting. If we change the job's state to EXPORTING here, the job may not be assigned to the thread in the executor pool. So I think the change of job's state should in ExportExportingTask.exec()

return;
}

if (job.updateState(ExportJob.JobState.EXPORTING)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to persist EXPORTING state.
The state should already be changed before submitting to the executor thread pool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

newFiles.add(destPath);
}

private Status makeSnapshots() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we can comment out this makeSnapshots() method because it is not used.
It can be reopened after we implement exporting tablet one by one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


}
} catch (Exception e) {
LOG.warn("run export exporting job error", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.warn("run export exporting job error", e);
LOG.warn("run export exporting job {}.", job, e);

@@ -714,6 +500,52 @@ public synchronized boolean isFinalState() {
return this.state == ExportJob.JobState.CANCELLED || this.state == ExportJob.JobState.FINISHED;
}

public Status makeSnapshots() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Status makeSnapshots() {
private Status makeSnapshots() {

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@BePPPower BePPPower marked this pull request as ready for review April 14, 2023 09:21
@BePPPower
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@BePPPower BePPPower changed the title [Enhencement](Export) support export [Enhencement](Export) support export with outfile syntax Apr 14, 2023
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

morningman
morningman previously approved these changes Apr 16, 2023
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 16, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman
Copy link
Contributor

run clickbench

@morningman
Copy link
Contributor

run beut

@hello-stephen
Copy link
Contributor

hello-stephen commented Apr 16, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 37.01 seconds
stream load tsv: 430 seconds loaded 74807831229 Bytes, about 165 MB/s
stream load json: 22 seconds loaded 2358488459 Bytes, about 102 MB/s
stream load orc: 60 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230419154003_clickbench_pr_131231.html

@morningman
Copy link
Contributor

run feut

@BePPPower
Copy link
Contributor Author

run buildall

@BePPPower
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@BePPPower
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@BePPPower
Copy link
Contributor Author

run feut

@BePPPower
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@BePPPower
Copy link
Contributor Author

run p0

@BePPPower
Copy link
Contributor Author

run p1

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit 8e2146f into apache:master Apr 20, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
`Export` syntax provides asynchronous export function, but `Export` does not achieve vectorization.
`Outfile` syntax provides synchronous export function`.
So we can reimplement the export syntax with oufile syntax.
Reminiscent pushed a commit to Reminiscent/doris that referenced this pull request May 15, 2023
`Export` syntax provides asynchronous export function, but `Export` does not achieve vectorization.
`Outfile` syntax provides synchronous export function`.
So we can reimplement the export syntax with oufile syntax.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner area/vectorization kind/docs Categorizes issue or PR as related to documentation. kind/meta-version-change Categorizes issue or PR as related to changing meta version reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants