Fix generator support in fromdicts - use file cache instead of iterto… #625
Pull Request Test Coverage Report for Build 2693186888 (Coveralls)
@@ -742,6 +746,18 @@
    return peek, chain(peek, it)


def iterchunk(fn):
    # reopen so iterators from file cache are independent
    debug('iterchunk, opening %s' % fn)
Check notice (Code scanning): Formatting a regular string which could be a f-string (consider-using-f-string)
Check notice (Code scanning): Use lazy % formatting in logging functions (logging-not-lazy)
                yield pickle.load(f)
        except EOFError:
            pass
    debug('end of iterchunk, closed %s' % fn)
Check notice (Code scanning): Formatting a regular string which could be a f-string (consider-using-f-string)
Check notice (Code scanning): Use lazy % formatting in logging functions (logging-not-lazy)
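The two fragments of iterchunk quoted above can be read together as roughly the following. This is a minimal sketch, not the PR's exact code: the `with open(...)` and `while True` lines are assumed, since the diff excerpt elides them, and `write_cache` is a hypothetical helper added only to produce a file to read. The `debug` calls pass `fn` as an argument rather than pre-formatting with `%`, which is what the logging-not-lazy notice asks for.

```python
import logging
import pickle
from tempfile import NamedTemporaryFile

logger = logging.getLogger(__name__)
debug = logger.debug  # assumption: debug is bound to a logger method


def iterchunk(fn):
    # reopen so iterators from file cache are independent
    debug('iterchunk, opening %s', fn)
    with open(fn, 'rb') as f:  # assumed: elided in the quoted diff
        try:
            while True:  # assumed: elided in the quoted diff
                yield pickle.load(f)
        except EOFError:
            # pickle.load raises EOFError once every row has been read
            pass
    debug('end of iterchunk, closed %s', fn)


def write_cache(rows):
    """Hypothetical helper: pickle each row to a temp file, return its path."""
    with NamedTemporaryFile(delete=False, mode='wb') as f:
        for row in rows:
            pickle.dump(row, f, protocol=pickle.HIGHEST_PROTOCOL)
        return f.name


# A generator can be consumed only once, but the cache file can be re-read.
path = write_cache(row for row in [('n', 'foo'), (1, 'a'), (2, 'b')])
it1 = iterchunk(path)
it2 = iterchunk(path)
```

Because each call to iterchunk opens its own file handle, `it1` and `it2` advance independently. The file is created with `delete=False`, so in this sketch the caller is responsible for removing it.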
@@ -175,16 +177,50 @@


class DictsGeneratorView(DictsView):

    def __init__(self, dicts, header=None, sample=1000, missing=None):
        super(DictsGeneratorView, self).__init__(dicts, header, sample, missing)
Check notice (Code scanning): Consider using Python 3 style super() without arguments (super-with-arguments)
        yield self._header

        if not self._filecache:
            self._filecache = NamedTemporaryFile(delete=False, mode='wb')
Check notice (Code scanning): Consider using 'with' for resource-allocating operations (consider-using-with)
    # reverse order
    second_row3 = next(it2)
    first_row3 = next(it1)
    assert second_row3 == first_row3
Check warning (Code scanning / Bandit, reported by Codacy): Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
    assert second_row3 == first_row3
    ieq(actual, actual)
    assert actual.header() == ('n', 'foo', 'bar')
    assert len(actual) == 6
Check warning (Code scanning / Bandit, reported by Codacy): Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
@@ -1,6 +1,7 @@
from __future__ import absolute_import, print_function, division


import logging
import pickle
Check warning (Code scanning / Bandit, reported by Codacy): Consider possible security implications associated with pickle module.
Found 15 potential problems in the proposed changes. Check the Files changed tab for more details.
@juarezr I've updated the docstring for
Nice solution.
I couldn't tell whether you consider this PR ready to merge.
            self._filecache = NamedTemporaryFile(delete=False, mode='wb')
            it = iter(self.dicts)
            for o in it:
                row = tuple(o[f] if f in o else self.missing for f in self._header)
Suggested change:
-                row = tuple(o[f] if f in o else self.missing for f in self._header)
+                row = tuple(o.get(f, self.missing) for f in self._header)
Committed in the branch that has superseded this one: 662b785
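For plain dicts, the suggested `o.get(f, self.missing)` produces the same row as the original conditional expression, with a single lookup per field instead of a membership test followed by a lookup. A small check, using made-up values for the header, the fill value and the input dict:

```python
# Illustrative values only; none of these come from the PR's tests.
header = ('n', 'foo', 'bar')
missing = None
o = {'n': 1, 'foo': 'a'}  # 'bar' is absent on purpose

# Original form: membership test, then lookup.
row_original = tuple(o[f] if f in o else missing for f in header)

# Suggested form: dict.get with a default.
row_suggested = tuple(o.get(f, missing) for f in header)
```

Both expressions fill the absent 'bar' field with the `missing` value.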
@juarezr this took a while. I had asked for some more reviews and discussion about this solution and ended up not entirely happy with it. Dumping the whole generator to the file before yielding any data was considered a faulty solution. I did some more trials and have finally created a solution that I am happy with:
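One way to avoid dumping the whole generator before yielding anything is to write each row to the cache only when some iterator first asks for it. The sketch below illustrates that idea only; it is not the code from the superseding PR, which is not shown here. `LazyFileCache` and all of its attributes are hypothetical names, and the sketch is not thread-safe.

```python
import pickle
from tempfile import NamedTemporaryFile


class LazyFileCache(object):
    """Hypothetical helper: cache rows to disk only as they are requested."""

    def __init__(self, source):
        self._source = iter(source)
        self._cache = NamedTemporaryFile(delete=False, mode='wb')
        self._written = 0        # number of rows pickled to the cache so far
        self._exhausted = False  # True once the source generator is finished

    def _fetch_one(self):
        # Pull one more row from the source and append it to the cache file.
        if self._exhausted:
            return False
        try:
            row = next(self._source)
        except StopIteration:
            self._exhausted = True
            self._cache.close()
            return False
        pickle.dump(row, self._cache, protocol=pickle.HIGHEST_PROTOCOL)
        self._cache.flush()  # make the new row visible to the read handles
        self._written += 1
        return True

    def __iter__(self):
        # Each iterator gets its own read handle, so passes stay independent.
        with open(self._cache.name, 'rb') as reader:
            served = 0
            while True:
                # Only touch the source when this iterator has caught up
                # with everything already written to the cache.
                if served == self._written and not self._fetch_one():
                    return
                yield pickle.load(reader)
                served += 1


cache = LazyFileCache({'n': i} for i in range(3))
it1, it2 = iter(cache), iter(cache)
```

The first row is available as soon as it has been read from the source, and memory use stays bounded by one row at a time because later passes are served from the file.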
Can we close this, since #626 superseded it and was merged?
Definitely to be closed; this one is obsolete now.
…ols.tee
This PR has the objective of improving the support of generators in fromdicts.
The current implementation uses itertools.tee, which, according to the docs and to experience in production deployments, can use large amounts of memory, leading to processes being killed for running out of memory.
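The memory concern follows from how itertools.tee works: it has to keep every item that one of its iterators has consumed and another has not. A small illustration, where `gen_rows` is a made-up stand-in for the dicts generator passed to fromdicts:

```python
from itertools import tee


def gen_rows(count):
    # Stand-in for a generator of dicts; can only be consumed once.
    for i in range(count):
        yield {'n': i}


it1, it2 = tee(gen_rows(5))

# Consuming it1 completely forces tee to buffer all rows in memory,
# because it2 has not seen any of them yet.
first_pass = list(it1)

# it2 is then served entirely from that in-memory buffer.
second_pass = list(it2)
```

With five rows the buffer is negligible, but when one pass runs to completion before the next one starts, the buffer grows to the size of the whole dataset.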
This PR aims to keep the improved support of generators by using a file cache, similar to the one used for sorting, to allow multiple iterations.
Closes #618
Changes
DictsGeneratorView in petl.io.json to use file cache
Checklist
Use this checklist for assuring the quality of pull requests that include new code and/or make changes to existing code.
tox / pytest