-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/contour speed optimizations #634
Conversation
@dcaustin33 @elboy3 how will retrieval work on the backend with a single file? Are you going to need to download it every time? Or in runners, will you split this up into individual files? |
Codecov Report
@@ Coverage Diff @@
## main #634 +/- ##
==========================================
- Coverage 84.42% 84.21% -0.21%
==========================================
Files 165 165
Lines 12973 13005 +32
==========================================
Hits 10952 10952
- Misses 2021 2053 +32
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@Ben-Epstein my thought was to keep it as one. We only need to retrieve it for the /rows endpoint (nothing else). I was considering adding it to the cache as well if the file is large and it takes some time to download. wdyt? |
is it currently a single file, or would this change it into a single file from multiple individual files? I would imagine that a single contours file would be pretty large right? |
@Ben-Epstein for 200 examples the size of the file rn is 1.4 mb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense on the dq side, i would watch the performance on the API and consider breaking its into individual polygon JSONs (or testing that behavior alongside) if the performance suffers
if not os.path.isdir(prefix): | ||
with lock: | ||
os.makedirs(prefix, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just do os.makedirs(prefix, exist_ok=True)
or preferably can we just create this folder at the beginning of a run and then not have to do any checks here?
with lock: | ||
os.makedirs(prefix, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need the lock?
Optimizations to write contours all locally and then upload in one file at the end because of colab's speed. For 50 examples this takes us from 60 seconds down to 30 seconds in colab.