feat: Add Dataset Image Use for E2E Tests (#548)
**Reason for Change**: This PR makes the e2e tests use a locally built dataset image instead of fetching the dataset from an external URL. This prevents flaky network errors that can waste entire pipeline runs. Addresses #531.
1 parent 39eb92f, commit 08d2800
Showing 8 changed files with 54 additions and 6 deletions.
@@ -0,0 +1,5 @@
FROM busybox:latest

RUN mkdir -p /data

COPY docker/dataset/dataset.parquet /data/
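
For a quick local check of the Dockerfile above, the following sketch builds the image and lists the contents of `/data`. The Dockerfile location (`docker/dataset/Dockerfile`) and the tag `e2e-dataset:local` are assumptions rather than values taken from this commit; the `COPY` source path only suggests that the build context is the repository root. The project's own entry point is the `make docker-build-dataset` target mentioned in the README.

```shell
#!/bin/sh
# Sketch only: DOCKERFILE and IMAGE are assumed values, not taken from the commit.
DOCKERFILE="${DOCKERFILE:-docker/dataset/Dockerfile}"
IMAGE="${IMAGE:-e2e-dataset:local}"

# Build from the repository root so the COPY source path resolves.
build_dataset_image() {
    docker build -f "$DOCKERFILE" -t "$IMAGE" .
}

# List /data inside the image to confirm the parquet file was copied.
list_dataset_files() {
    docker run --rm "$IMAGE" ls -l /data
}
```

Run from the repository root, this could be used as `build_dataset_image && list_dataset_files`.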
@@ -0,0 +1,21 @@
# E2E Fine-Tuning Dataset Files

## Overview

This dataset file is used for end-to-end (E2E) testing of fine-tuning. The Dockerfile builds an image containing the [dolly-15k-oai-style](https://huggingface.co/datasets/philschmid/dolly-15k-oai-style) dataset, which is then used within an init container for fine-tuning.

## Files

- **Dockerfile**: Builds the Docker image for the E2E tests.
- **dataset.parquet**: The dataset itself, downloaded from [dolly-15k-oai-style](https://huggingface.co/datasets/philschmid/dolly-15k-oai-style).

## Usage

Build the Docker image with the following command:

```bash
make docker-build-dataset
```
Binary file not shown.
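
The README describes using this image in an init container. One way such a container could hand the dataset to the fine-tuning container is to copy it from `/data` into a shared volume. The sketch below shows that copy step as a POSIX shell function that would run under busybox. The function name `copy_dataset` and the destination `/mnt/data` are hypothetical; this commit does not show the actual init container command.

```shell
#!/bin/sh
# Hypothetical init-container step; the real command is not shown in this commit.
# Copies every .parquet file from a source directory into a destination directory.
copy_dataset() {
    src_dir="$1"
    dest_dir="$2"
    # Create the destination first so cp has a directory to write into.
    mkdir -p "$dest_dir" || return 1
    cp "$src_dir"/*.parquet "$dest_dir"/
}
```

Inside the container this might be invoked as `copy_dataset /data /mnt/data`, with `/mnt/data` backed by a volume that the fine-tuning container also mounts.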