Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Adding bounds checking for --scaled and --num in sourmash sketch #1711

Merged
merged 16 commits into from
Aug 27, 2021

Conversation

keyabarve
Copy link
Contributor

Fixes #1708

  • Add checks for the values of scaled and num in sourmash sketch.
  • num values should always be > 0. if they are unreasonable, say > 50,000 or < 50, we should emit a warning.
  • scaled values should always be > 0. if they are unreasonable, say > 1e6 or < 100, we should emit a warning.
  • Add tests to check this behavior.

@codecov
Copy link

codecov bot commented Aug 11, 2021

Codecov Report

Merging #1711 (9b62059) into latest (e262810) will increase coverage by 7.39%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1711      +/-   ##
==========================================
+ Coverage   82.70%   90.10%   +7.39%     
==========================================
  Files         114       87      -27     
  Lines       12207     8398    -3809     
  Branches     1558     1555       -3     
==========================================
- Hits        10096     7567    -2529     
+ Misses       1852      573    -1279     
+ Partials      259      258       -1     
Flag Coverage Δ
python 90.10% <100.00%> (+0.02%) ⬆️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/cli/utils.py 100.00% <100.00%> (ø)
src/sourmash/command_sketch.py 88.62% <100.00%> (+1.56%) ⬆️
src/sourmash/sourmash_args.py 93.81% <100.00%> (+0.24%) ⬆️
src/core/src/sketch/hyperloglog/estimators.rs
src/core/src/sketch/minhash.rs
src/core/src/errors.rs
src/core/src/wasm.rs
src/core/src/index/storage.rs
src/core/tests/minhash.rs
src/core/src/encodings.rs
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e262810...9b62059. Read the comment docs.

@keyabarve
Copy link
Contributor Author

@ctb I'm not being able to figure out how and where to get started. Could you provide some hints?

@ctb
Copy link
Contributor

ctb commented Aug 13, 2021

hi @keyabarve, have you found the param string parsing code?

@keyabarve
Copy link
Contributor Author

Yes, I found it in src/sourmash/command_sketch.py. This is the function: def _parse_params_str(params_str):.

@ctb
Copy link
Contributor

ctb commented Aug 13, 2021

ok. So that's where current value checking is happening, and where any new value checking would need to happen.

@keyabarve
Copy link
Contributor Author

Oh okay. So should I call the functions that I wrote in utils.py here, or should I basically rewrite the functions here?

@ctb
Copy link
Contributor

ctb commented Aug 13, 2021

In general, I don't think we should have multiple functions that do the same thing, but I would say to just get something working and tested, and then I can suggest refactorings in review.

@keyabarve
Copy link
Contributor Author

Alright, I'll do that. Thanks!

@keyabarve
Copy link
Contributor Author

@ctb I have commented out the existing check statements that were in the function and also the same statements that were in one of the tests: tests/test_sourmash_sketch.py::test_do_sourmash_sketchdna_with_bad_scaled. Please review.

@keyabarve
Copy link
Contributor Author

Also, this check: Dev env instructions / nix (pull_request) Failing after 7m — nix is failing in all of my PRs and I can't seem to figure out why.

@keyabarve
Copy link
Contributor Author

I have added the tests for both num and scaled. Please review.

src/sourmash/command_sketch.py Outdated Show resolved Hide resolved
src/sourmash/command_sketch.py Outdated Show resolved Hide resolved
@keyabarve
Copy link
Contributor Author

I have changed the error statements. Could you take a look at it and let me know if it looks good?

@ctb
Copy link
Contributor

ctb commented Aug 15, 2021 via email

@ctb
Copy link
Contributor

ctb commented Aug 15, 2021

Two thoughts -

it seems like you're just adding new code with new checks, and not centralizing the code so that there is one place for the value checking. In that case, why change the error messages at all? You can just add new ones without changing the old error messages.

That having been said, we will now run into the problem where the code for doing the checking becomes potentially inconsistent.

So I suggest refactoring things so there are two functions, one for each of num and scaled value checking, located in sourmash_args.py. Each function prints out errors or warnings, and may error exit. Then you rework the argparse stuff and the param_string stuff so that those functions are called.

(I'm not quite sure how to get it all to work with argparse properly, though.)

@keyabarve
Copy link
Contributor Author

Two thoughts -

it seems like you're just adding new code with new checks, and not centralizing the code so that there is one place for the value checking. In that case, why change the error messages at all? You can just add new ones without changing the old error messages.

So then I'll keep the error checking statements for num and scaled that were already there in place of the code that I wrote and I'll just add the extra statements.

That having been said, we will now run into the problem where the code for doing the checking becomes potentially inconsistent.

Right, because all the other commands are using the functions from utils.py and they will have that specific set of error messages, while the sketch command will have different ones.

So I suggest refactoring things so there are two functions, one for each of num and scaled value checking, located in sourmash_args.py. Each function prints out errors or warnings, and may error exit. Then you rework the argparse stuff and the param_string stuff so that those functions are called.

So then would I have to completely remove the 2 functions from utils.py too? And then the same functions that I write in sourmash_args.py will be used by all of the commands including sketch?

(I'm not quite sure how to get it all to work with argparse properly, though.)

Hmm, I'll take a look into it.

@ctb
Copy link
Contributor

ctb commented Aug 16, 2021

So I suggest refactoring things so there are two functions, one for each of num and scaled value checking, located in sourmash_args.py. Each function prints out errors or warnings, and may error exit. Then you rework the argparse stuff and the param_string stuff so that those functions are called.

So then would I have to completely remove the 2 functions from utils.py too? And then the same functions that I write in sourmash_args.py will be used by all of the commands including sketch?

Not necessarily - IIRC, the functions in utils.py are argparse-specific, and you could have them call a function in sourmash_args? I think my suggestion would be to get the value checking and error messages about values working in one function (one for num, one for scaled), and then use/display the error messages from that one function in argparse code and in the param_string code, and then we can revisit.

@keyabarve
Copy link
Contributor Author

So I suggest refactoring things so there are two functions, one for each of num and scaled value checking, located in sourmash_args.py. Each function prints out errors or warnings, and may error exit. Then you rework the argparse stuff and the param_string stuff so that those functions are called.

So then would I have to completely remove the 2 functions from utils.py too? And then the same functions that I write in sourmash_args.py will be used by all of the commands including sketch?

Not necessarily - IIRC, the functions in utils.py are argparse-specific, and you could have them call a function in sourmash_args? I think my suggestion would be to get the value checking and error messages about values working in one function (one for num, one for scaled), and then use/display the error messages from that one function in argparse code and in the param_string code, and then we can revisit.

I'm still a little confused, but I'll try something out, push it, and ask for a review.

@ctb
Copy link
Contributor

ctb commented Aug 17, 2021 via email

@keyabarve
Copy link
Contributor Author

keyabarve commented Aug 17, 2021

@ctb I moved the 2 functions from utils.py to sourmash_args.py and I called them in the argparse and param_string codes. I also changed the error statements a little, so some of the tests are failing because the error messages don't match. Could you please take a look at it and let me know if it looks good?

@keyabarve
Copy link
Contributor Author

@ctb Please review.

Copy link
Contributor

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it's heading in the right direction!

tests/test_sourmash_sketch.py Outdated Show resolved Hide resolved
src/sourmash/sourmash_args.py Show resolved Hide resolved
src/sourmash/sourmash_args.py Outdated Show resolved Hide resolved
@keyabarve
Copy link
Contributor Author

@ctb Please review.

@keyabarve
Copy link
Contributor Author

@ctb I accidentally wrote a function and pushed it on the current branch (KB_1708) instead of the branch it was actually supposed to be pushed on (KB_1707), but I immediately deleted the function and pushed again. Hence the 2 commits above.

@keyabarve
Copy link
Contributor Author

@bluegenes Could you please review this?

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
@bluegenes
Copy link
Contributor

This is looking good to me. Why don't you clean up the commented out bits of code and merge in latest to see if all tests continue to pass.

@keyabarve
Copy link
Contributor Author

Sure! I'll do that and ask for a review. Thanks!

@keyabarve keyabarve changed the title [WIP] Adding bounds checking for --scaled and --num in sourmash sketch [MRG] Adding bounds checking for --scaled and --num in sourmash sketch Aug 27, 2021
@keyabarve
Copy link
Contributor Author

@ctb Should I merge this?

@ctb
Copy link
Contributor

ctb commented Aug 27, 2021 via email

@keyabarve keyabarve merged commit c7426ae into latest Aug 27, 2021
@keyabarve keyabarve deleted the KB_1708 branch August 27, 2021 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

confirm and/or add bounds checking for scaled and num in sourmash sketch
3 participants