Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chore: Documentation and input / output cols #715

Merged
merged 1 commit into from
Jul 18, 2023
Merged

Conversation

elboy3
Copy link
Contributor

@elboy3 elboy3 commented Jul 18, 2023

Small cleanup leading up to S2S sprint during onsite week

@elboy3 elboy3 requested review from a team and dcaustin33 as code owners July 18, 2023 00:24
@@ -135,7 +154,7 @@ def _get_data_dict(self) -> Dict:
C.epoch.value: [self.epoch] * batch_size,
}
if self.split == Split.inference:
data["inference_name"] = [self.inference_name] * batch_size
data[C.inference_name.value] = [self.inference_name] * batch_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a .value here? I think it works simply, too?

Suggested change
data[C.inference_name.value] = [self.inference_name] * batch_size
data[C.inference_name] = [self.inference_name] * batch_size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we do! This creates a dict that later gets cast to a vaex DataFrame, which can't have enum as the keys to the dict

The following errors:

from enum import Enum
import vaex

class MyType(str, Enum):
    a = "a"
    b = "b"
    c = "c"

df = vaex.from_dict(
    {
        MyType.a: [1, 2, 3],
        MyType.b: [4, 5, 6],
        MyType.c: [7, 8, 9],
    }
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL!

token_deps = "token_deps"
token_gold_probs = "token_gold_probs"
# Mypy complained about split as an attribute, so we use `split_`
split_ = "split"
epoch = "epoch"
inference_name = "inference_name"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why inference_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For line 157 in dataquality/loggers/model_logger/seq2seq.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was why call it inference name? Name as a column name is too generic and doesn't feel like that's what we should be getting?

@codecov-commenter
Copy link

codecov-commenter commented Jul 18, 2023

Codecov Report

Merging #715 (3b8333c) into main (59cfd9a) will increase coverage by 0.00%.
The diff coverage is 90.00%.

@@           Coverage Diff           @@
##             main     #715   +/-   ##
=======================================
  Coverage   89.69%   89.70%           
=======================================
  Files         166      166           
  Lines       13320    13323    +3     
=======================================
+ Hits        11948    11951    +3     
  Misses       1372     1372           
Impacted Files Coverage Δ
dataquality/utils/arrow.py 100.00% <ø> (ø)
dataquality/loggers/model_logger/seq2seq.py 91.80% <75.00%> (ø)
dataquality/loggers/data_logger/seq2seq.py 95.06% <100.00%> (ø)
dataquality/schemas/seq2seq.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@elboy3 elboy3 merged commit 7db3ef4 into main Jul 18, 2023
@elboy3 elboy3 deleted the chore/s2s-cleanup branch July 18, 2023 19:51
bogdan-galileo pushed a commit that referenced this pull request Jul 21, 2023
Small cleanup leading up to S2S sprint during onsite week
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants