fix: auto alpaca #778
Codecov Report
@@ Coverage Diff @@
## main #778 +/- ##
==========================================
+ Coverage 87.68% 87.70% +0.01%
==========================================
Files 184 184
Lines 15092 15127 +35
==========================================
+ Hits 13234 13267 +33
- Misses 1858 1860 +2
... and 9 files with indirect coverage changes
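As a quick sanity check on the report: Hits + Misses equals Lines on both branches, and coverage is Hits / Lines. The sketch below recomputes it exactly with `fractions.Fraction`. Truncating to two decimal places reproduces the figures shown (87.68%, 87.70%, +0.01%), whereas conventional rounding would give 87.69% for main. That the report rounds down is an inference from these numbers, not something the report states.

```python
from fractions import Fraction
from math import floor


def coverage_pct(hits: int, lines: int) -> Fraction:
    # Exact coverage as a percentage: hits / lines * 100.
    return Fraction(hits, lines) * 100


def truncate_2dp(value: Fraction) -> Fraction:
    # Drop (rather than round) everything past two decimal places.
    return Fraction(floor(value * 100), 100)


# Hits + Misses should add up to Lines on each branch.
assert 13234 + 1858 == 15092  # main
assert 13267 + 1860 == 15127  # #778

base = coverage_pct(13234, 15092)  # main
head = coverage_pct(13267, 15127)  # #778

print(float(truncate_2dp(base)))         # 87.68
print(float(truncate_2dp(head)))         # 87.7
print(float(truncate_2dp(head - base)))  # 0.01
```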
Creds in notebook again...
This looks good. Left a suggestion.
Let's clean up the notebook and creds before merging.
dataquality/dq_auto/schema.py
@@ -46,6 +48,9 @@ class BaseAutoDatasetConfig:
    # Column names
    input_col: str = "text"
    target_col: str = "label"
    # Dataset input / output formatter
    max_train_size: Optional[int] = None
    formatter: BaseFormatter = DefaultFormatter()
This is a more idiomatic way to initialize a dataclass field:
Suggested change
- formatter: BaseFormatter = DefaultFormatter()
+ from dataclasses import field
+ ...
+ formatter: BaseFormatter = field(default_factory=DefaultFormatter)
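The reason `default_factory` matters: a plain default like `DefaultFormatter()` is evaluated once, when the class is defined, so every config instance shares the same formatter object. `field(default_factory=DefaultFormatter)` calls the factory per instance. A minimal sketch, where `BaseFormatter` and `DefaultFormatter` are empty stand-ins for the PR's real classes:

```python
from dataclasses import dataclass, field


class BaseFormatter:
    """Stand-in for the PR's BaseFormatter (hypothetical, empty)."""


class DefaultFormatter(BaseFormatter):
    """Stand-in for the PR's DefaultFormatter (hypothetical, empty)."""


@dataclass
class SharedDefaultConfig:
    # Evaluated once at class-definition time: all instances share this object.
    formatter: BaseFormatter = DefaultFormatter()


@dataclass
class FactoryDefaultConfig:
    # DefaultFormatter() is called for each new instance.
    formatter: BaseFormatter = field(default_factory=DefaultFormatter)


a, b = SharedDefaultConfig(), SharedDefaultConfig()
print(a.formatter is b.formatter)  # True

c, d = FactoryDefaultConfig(), FactoryDefaultConfig()
print(c.formatter is d.formatter)  # False
```

A second reason to prefer the factory: since Python 3.11, dataclasses reject any unhashable default value with a `ValueError`. If the formatter class ever sets `__hash__` to `None` (as a non-frozen dataclass with `eq=True` does), the plain-default form would fail at class creation.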
Force-pushed from 1d37f19 to 9695326
bump version, docstring, make max train size default to None, remove notebook
Force-pushed from 9695326 to 50e0a87
We were running into some column-renaming issues, which I fix in this PR.
I also made some edits that let the user pass in an upper limit on the dataset size, configurable in DatasetConfig.
An HF dataset has a default size limit, which the user can raise.
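One way the size cap could behave, sketched under assumptions: the config field `max_train_size` defaults to `None` (matching the diff), a user-supplied value wins, and otherwise an optional default limit applies. `HypotheticalDatasetConfig`, `cap_train_rows`, and the `default_limit` parameter are illustrative names, not the PR's code, and no actual default value is assumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class HypotheticalDatasetConfig:
    # Mirrors the PR's field: None means the user set no limit.
    max_train_size: Optional[int] = None


def cap_train_rows(
    rows: Sequence[dict],
    config: HypotheticalDatasetConfig,
    default_limit: Optional[int] = None,  # hypothetical per-source default
) -> List[dict]:
    """Return at most `limit` rows, where the user's max_train_size
    overrides default_limit; with neither set, return every row."""
    limit = config.max_train_size if config.max_train_size is not None else default_limit
    if limit is None:
        return list(rows)
    if limit < 0:
        raise ValueError("max_train_size must be non-negative")
    # Slicing past the end is safe: it simply returns all rows.
    return list(rows[:limit])


rows = [{"text": f"example {i}", "label": i % 2} for i in range(10)]
print(len(cap_train_rows(rows, HypotheticalDatasetConfig())))                   # 10
print(len(cap_train_rows(rows, HypotheticalDatasetConfig(), default_limit=5)))  # 5
print(len(cap_train_rows(rows, HypotheticalDatasetConfig(8), default_limit=5))) # 8
```

On a Hugging Face `datasets.Dataset`, the same cap could be applied with `ds.select(range(min(limit, len(ds))))`. The `min` keeps the indices in range.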