
Create the new benchmark database #5839

Merged
merged 4 commits into from
Nov 7, 2024

Conversation

huydhn
Contributor

@huydhn huydhn commented Oct 29, 2024

As defined in https://fburl.com/gdoc/pczummt2, this PR doesn't create the table itself, but I want to start keeping the CREATE TABLE SQL query in git

Testing

Manually run the query to create the table on CH https://console.clickhouse.cloud/services/c9b76950-2cf3-4fa0-93bb-94a65ff5f27d/console/database/benchmark/table/oss_ci_benchmark_v3

@huydhn huydhn requested review from kit1980, clee2000 and a team October 29, 2024 00:51

vercel bot commented Oct 29, 2024

@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 29, 2024

vercel bot commented Oct 29, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 7, 2024 0:07am

@huydhn
Contributor Author

huydhn commented Oct 29, 2024

cc @FindHao Here is the database schema to hold the benchmark results for TritonBench. It supports adding new metrics and operators (the model.name field is reused to hold the op name). I will need to check with you about the nested ops part (IIRC from https://fburl.com/gdoc/6yxdmuh5) to make sure that this schema can express it

@FindHao
Member

FindHao commented Oct 29, 2024

> cc @FindHao Here is the database schema to hold the benchmark results for TritonBench. It supports adding new metrics and operators (the model.name field is reused to hold the op name). I will need to check with you about the nested ops part (IIRC from https://fburl.com/gdoc/6yxdmuh5) to make sure that this schema can express it

Thank you! Will sync up with you offline.

@huydhn huydhn requested a review from xuzhao9 October 30, 2024 02:29
@huydhn
Contributor Author

huydhn commented Oct 30, 2024

@xuzhao9 I'd much appreciate it if you could help take a look at the schema. Ideally, it needs to be flexible enough to contain all the information required by https://fburl.com/gdoc/6yxdmuh5

I'm working with @FindHao to nail this down atm.

@xuzhao9
Contributor

xuzhao9 commented Nov 6, 2024

@huydhn We are planning the following JSON output schema for Tritonbench nightly run: https://docs.google.com/document/d/1jttjVsYqW_rQNISp1jX6ysCblFZA1erGIM_7yQWHg_M/edit

I am wondering whether we are planning to use the same schema for Torchbench (module-level benchmarking) and Tritonbench (operator-level benchmarking)?

@huydhn
Contributor Author

huydhn commented Nov 6, 2024

@xuzhao9

> @huydhn We are planning the following JSON output schema for Tritonbench nightly run: https://docs.google.com/document/d/1jttjVsYqW_rQNISp1jX6ysCblFZA1erGIM_7yQWHg_M/edit

Thank you for sharing this doc. I'm looking at the schema there, and the general schema here should be able to cover all the fields needed by TritonBench. There are fields to keep information about the runner (CPU, GPU devices), about the benchmark (name, mode, precision), about the change (sha, branch), and about the important dependencies (Triton sha and branch). The good news is that it's easy to write a simple script to convert the TritonBench nightly run output to the format here, and to keep both formats if needed. For example, I have this script https://github.com/pytorch/executorch/blob/main/.github/scripts/extract_benchmark_results.py to convert the mobile benchmark results from ExecuTorch to this format.

> I am wondering whether we are planning to use the same schema for Torchbench (module-level benchmarking) and Tritonbench (operator-level benchmarking)?

That's the goal. I'm looking to converge the different benchmark schemas we are using into one, so that we can store the results properly in a database table and build an API around it to allow people to access the data in OSS. This schema can then be used by other benchmarks that people are building too; TorchChat and ExecuTorch are some examples.
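The conversion mentioned above can be sketched in a few lines. This is an illustrative mapping from a hypothetical operator-level TritonBench result onto the shared v3 record; all field names besides the ones discussed in this thread (model.name carrying the op name, schema_version, head_branch, head_sha) are assumptions, not the finalized schema.

```python
# Hedged sketch: map one operator-level TritonBench result onto the unified
# v3 schema discussed in this PR. Input/output field names are illustrative.

def to_v3_record(op_result: dict, metadata: dict) -> dict:
    """Convert one TritonBench operator result into a v3-style record.

    The operator name is stored in model.name, as discussed above.
    """
    return {
        "timestamp": metadata["timestamp"],
        "schema_version": "v3",
        "repo": metadata.get("repo", "pytorch/pytorch"),
        "head_branch": metadata["head_branch"],
        "head_sha": metadata["head_sha"],
        "model": {
            # Reuse model.name to carry the operator name
            "name": op_result["op"],
        },
        "benchmark": {
            "name": "TritonBench",
            "mode": op_result.get("mode", ""),
            "extra_info": {"precision": op_result.get("precision", "")},
        },
        "metric": {
            "name": op_result["metric"],
            "benchmark_values": [op_result["value"]],
        },
    }


record = to_v3_record(
    {"op": "softmax", "metric": "latency_ms", "value": 0.12},
    {"timestamp": 1730000000, "head_branch": "main", "head_sha": "abc123"},
)
print(record["model"]["name"])  # → softmax
```

A small converter like this would let TritonBench keep emitting its own JSON while a post-processing step produces the database-compatible records, similar to the ExecuTorch script linked above.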

mem_info String,
avail_mem_in_gb UInt32,
gpu_info String,
gpu_count UInt32,
Contributor Author


@Jack-Khuu Just FYI, I think that keeping the gpu_info and gpu_count here would capture the case of having more than one GPU in one runner. My assumption is that all GPUs on the same runner will be the same. For distributed, there might be more than one runner I guess, so let me make this a list of runners instead to capture that
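The idea above (a list of runners, each with a single gpu_count under the assumption that all GPUs on one runner are identical) could look roughly like this. The field names follow the review snippet; the concrete values are made up for illustration.

```python
# Hedged sketch: represent the runner as a list so one record can describe
# multi-runner (distributed) jobs. Values below are illustrative only.

runners = [
    {
        "name": "linux.g5.4xlarge.nvidia.gpu",  # hypothetical runner label
        "mem_info": "64GB",
        "avail_mem_in_gb": 64,
        "gpu_info": "NVIDIA A10G",
        # One count per runner, assuming all GPUs on a runner are identical
        "gpu_count": 1,
    },
    # A distributed job would simply append more runner entries here
]

total_gpus = sum(r["gpu_count"] for r in runners)
print(total_gpus)  # → 1
```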

@huydhn huydhn merged commit 1592557 into pytorch:main Nov 7, 2024
7 checks passed
huydhn added a commit that referenced this pull request Nov 7, 2024
…5845)

After #5839, it's time to update the GHA workflow to upload to S3. I'll update the S3 lambda replicator in
a separate PR.

### Testing

* Locally
```
# Backward compatibility, upload to both dynamoDB and S3 for v2 schema
$ python upload_benchmark_results.py --benchmark-results-dir benchmark-results-dir-for-testing/v2 --schema-version v2 --dry-run
INFO:root:Uploading benchmark-results-dir-for-testing/v2/android-artifacts-31017223108.json to dynamoDB (v2)
INFO:root:Writing 16 documents to DynamoDB torchci-oss-ci-benchmark
INFO:root:Upload benchmark-results-dir-for-testing/v2/android-artifacts-31017223108.json to s3://ossci-benchmarks/v2/pytorch/executorch/12345/31017223108/android-artifacts-31017223108.json
INFO:root:Uploading benchmark-results-dir-for-testing/v2/android-artifacts-31017223431.json to dynamoDB (v2)
INFO:root:Writing 12 documents to DynamoDB torchci-oss-ci-benchmark
INFO:root:Upload benchmark-results-dir-for-testing/v2/android-artifacts-31017223431.json to s3://ossci-benchmarks/v2/pytorch/executorch/12345/31017223431/android-artifacts-31017223431.json

# We use only S3 for v3 schema
$ python upload_benchmark_results.py --benchmark-results-dir benchmark-results-dir-for-testing/v3 --schema-version v3
INFO:root:Upload benchmark-results-dir-for-testing/v3/mock.json to s3://ossci-benchmarks/v3/pytorch/pytorch/1/1/mock.json
```
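The routing the log above demonstrates (v2 goes to both DynamoDB and S3 for backward compatibility, v3 goes to S3 only) can be sketched as a tiny dispatch function. This is a simplified illustration, not the upload script's actual code.

```python
# Hedged sketch of the schema-version routing shown in the test log above:
# v2 results go to both DynamoDB and S3, v3 results go to S3 only.

def destinations(schema_version: str) -> list:
    """Return the upload targets for a given benchmark schema version."""
    if schema_version == "v2":
        # Backward compatibility: keep writing to DynamoDB alongside S3
        return ["dynamodb", "s3"]
    if schema_version == "v3":
        # v3 is S3-only; a lambda replicator handles downstream ingestion
        return ["s3"]
    raise ValueError(f"unknown schema version: {schema_version}")


print(destinations("v2"))  # → ['dynamodb', 's3']
print(destinations("v3"))  # → ['s3']
```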

* CI
* v2
https://github.com/pytorch/test-infra/actions/runs/11606273442/job/32318017857?pr=5845#step:4:55
* v3
https://github.com/pytorch/test-infra/actions/runs/11606273442/job/32318017857?pr=5845#step:5:43

* Test PR on ExecuTorch to use the new version
https://github.com/pytorch/executorch/actions/runs/11606339159 to see
that the files are uploaded to S3
https://github.com/pytorch/executorch/actions/runs/11606339159/job/32318826449#step:8:87
huydhn added a commit that referenced this pull request Nov 15, 2024
To ease the process of gathering the benchmark metadata before uploading
to the database, I'm adding a script
`.github/scripts/benchmarks/gather_metadata.py` to gather this
information and pass it to the upload script. From
#5839, the benchmark metadata
includes the following required fields:

```
-- Metadata
`timestamp` UInt64,
`schema_version` String DEFAULT 'v3',
`name` String,
-- About the change
`repo` String DEFAULT 'pytorch/pytorch',
`head_branch` String,
`head_sha` String,
`workflow_id` UInt64,
`run_attempt` UInt32,
`job_id` UInt64,
-- The raw records on S3
`s3_path` String,
```
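A metadata-gathering step like the one this commit describes could be sketched as below. The helper name and the s3_path layout are assumptions for illustration; only the `GITHUB_*` environment variable names are standard GitHub Actions variables, and the job id typically requires a separate GitHub API lookup.

```python
# Hedged sketch: collect the required v3 metadata fields from standard
# GitHub Actions environment variables. Helper name and s3_path layout
# are illustrative assumptions.
import os
import time


def gather_metadata(filename: str) -> dict:
    repo = os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch")
    workflow_id = os.environ.get("GITHUB_RUN_ID", "0")
    return {
        "timestamp": int(time.time()),
        "schema_version": "v3",
        "name": os.environ.get("GITHUB_WORKFLOW", ""),
        "repo": repo,
        "head_branch": os.environ.get("GITHUB_HEAD_REF", ""),
        "head_sha": os.environ.get("GITHUB_SHA", ""),
        "workflow_id": int(workflow_id),
        "run_attempt": int(os.environ.get("GITHUB_RUN_ATTEMPT", "1")),
        "job_id": 0,  # usually resolved separately via the GitHub API
        "s3_path": f"v3/{repo}/{workflow_id}/{filename}",
    }


meta = gather_metadata("mock.json")
print(meta["schema_version"])  # → v3
```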

I'm going to test this out with PT2 compiler instruction count benchmark
at pytorch/pytorch#140493

### Testing


https://github.com/pytorch/test-infra/actions/runs/11831746632/job/32967412160?pr=5918#step:5:105
gathers the metadata and uploads the benchmark results correctly

Also, an actual upload at
https://github.com/pytorch/pytorch/actions/runs/11831781500/job/33006545698#step:24:138
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 20, 2024
I'm trying to make these benchmark results available in the OSS benchmark database, so that people can query them from outside.  The first step is to also record the results in a JSON format compatible with the database schema defined in pytorch/test-infra#5839.

Existing CSV files remain unchanged.

### Testing

The JSON results are uploaded as artifacts to S3 https://github.com/pytorch/pytorch/actions/runs/11809725848/job/32901411180#step:26:13, for example https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/11809725848/1/artifact/test-jsons-test-pr_time_benchmarks-1-1-linux.g4dn.metal.nvidia.gpu_32901411180.zip

Pull Request resolved: #140493
Approved by: https://github.com/laithsakka
cyyever pushed a commit to cyyever/pytorch that referenced this pull request Nov 20, 2024
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 20, 2024
youssef62 pushed a commit to youssef62/pytorch that referenced this pull request Nov 23, 2024
youssef62 pushed a commit to youssef62/pytorch that referenced this pull request Nov 23, 2024
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024