csv file time history #115
Codecov Report
@@          Coverage Diff          @@
##           master   #115   +/-   ##
=====================================
  Coverage     100%   100%
=====================================
  Files           4      4
  Lines         966    987   +21
  Branches      226    231    +5
=====================================
+ Hits          966    987   +21
Continue to review full report at Codecov.
@jsantner Why are these changes needed?
@bryanwweber
You can see this on the Travis report after my second commit on this branch, where I had only added a test yaml file with time history defined in a csv file.
0f5cd88 to 5d851fd
@jsantner Thanks for submitting this! A few suggestions:
- I think the assumption is that the CSV filename will be specified relative to the directory of the YAML file, hence the directory argument should be unnecessary. If that's not working, I'd be interested to see a failing test case.
- If possible, I think we should load and check the CSV file as well. However, I think that will fit better when we refactor all of the validation, so we can skip it for now.
pyked/validation.py
Outdated
# If reading from a file, the file will not be validated.
# A file can have an arbitrary number of columns, and the columns
# to be used are specified.
if type(value['values']) is list:
Why not 'filename' not in value['values'].keys()?
I agree that we can assume the CSV file is specified relative to the yaml file, but DataPoint doesn't know where the yaml file directory is unless it's specified as an argument to __init__, right? Is there a simpler way to deal with a relative path?
I tried using 'filename' not in value['values'].keys() in a previous commit but this fails when the values are given as a list. In that case, value['values'] is a list, not a dict, so if 'filename' not in value['values'].keys() raises an error.
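The failure described here is easy to reproduce in isolation. A minimal sketch, assuming the parsed values entry is either a dict with a filename key or a plain list of rows (the variable names and numbers below are made up for illustration and are not taken from PyKED):

```python
# Hypothetical parsed YAML fragments; names and numbers are illustrative only.
values_as_dict = {'filename': 'file_1.csv'}
values_as_list = [[0.0, 1.0], [0.1, 1.2]]

# A dict supports .keys(), so the membership test works.
dict_has_filename = 'filename' in values_as_dict.keys()

# A list has no .keys() method, so the same expression raises AttributeError.
try:
    'filename' in values_as_list.keys()
    list_error = None
except AttributeError as err:
    list_error = type(err).__name__
```

Note that the exception here is an AttributeError, which matters when choosing what to catch in a try...except version of this check.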
As far as I know, just trying to open the file will try to open it relative to the working directory of the Python process. Are you saying that if I have a file structure like
|- database
|---butanol
|------file_1.yaml
|------file_1.csv
and I start Python in the database directory, and load file_1.yaml like
>>> ChemKED('butanol/file_1.yaml')
it won't work, because Python will assume the file_1.csv is relative to database, not butanol?
As on your other PR, I think the simpler/more "pythonic" way to do this is a try...except block, rather than checking the type of the value.
Yes, that's exactly what I'm saying. I was using a script to read multiple yaml files in a complex directory structure within a database directory, and Python was looking for the csv file in database, not in the folder with the yaml file.
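The behaviour can be checked without PyKED at all. The sketch below rebuilds the database/butanol layout from this thread in a temporary directory and asks whether a bare relative name such as file_1.csv is found from each working directory. The helper csv_visible_from is hypothetical and exists only for this demonstration.

```python
import os
import tempfile
from pathlib import Path


def csv_visible_from(cwd, csv_name):
    """Return True if the relative csv_name points at a file when run from cwd."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        # Relative paths are interpreted against the current working directory.
        return Path(csv_name).is_file()
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    database = Path(tmp) / 'database'
    butanol = database / 'butanol'
    butanol.mkdir(parents=True)
    (butanol / 'file_1.yaml').write_text('placeholder\n')
    (butanol / 'file_1.csv').write_text('0.0,1.0\n')

    found_from_database = csv_visible_from(database, 'file_1.csv')
    found_from_butanol = csv_visible_from(butanol, 'file_1.csv')
```

Here found_from_database is False and found_from_butanol is True, which is the mismatch reported above.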
I'm not sure that a try...except block would work well here. Are you thinking of something like this? It seems more complex and confusing this way, and it puts a lot of code between the try and except:
try:
    if 'filename' in value['values'].keys():
        # Don't do anything because csv files aren't validated
        pass
    else:
        # This should never happen. If value['values'] is a dictionary with keys, 'filename' should be a key
        self._error(field, 'must include filename or list of values')
except AttributeError:
    # value['values'] is probably a list, which has no .keys() method.
    # Code from earlier that checks the number of columns
Since this PR is related to time histories, I have a somewhat related question for you that I just stumbled on. Let's say somebody has a csv file with three columns - time, pressure, and volume. Right now, these must be implemented as two separate time-histories and the csv will be loaded twice, right? Is there interest in allowing the user to specify multiple time-histories using a single file?
Just to keep this focused on the code here, I moved the other discussion to the main comment thread. Anyhow, two things:
- I want to validate csv files in the future, so we might as well set up for that here
- The code could be
try:
    if 'filename' not in value['values'].keys():
        self._error(field, 'must include filename or list of values')
except AttributeError:
    # A list has no .keys() method.
    # Code from earlier that checks the number of columns
or
try:
    filename = value['values'].get('filename')
    if filename is None:
        self._error(field...
except AttributeError:
    # Code from earlier
which isn't all that confusing to me. The reason (to me) to avoid the type function is that it doesn't always handle inheritance in a straightforward way, so we'd be relying on the underlying YAML library to always return something that's exactly a list. On the other hand, with the try-except, we're using the duck-typing in Python to try something that we expect to be the case, and catch the resulting errors.
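To make the inheritance point concrete, here is a small sketch. RowList is a hypothetical list subclass standing in for whatever a YAML loader might return, and get_filename is an illustrative helper, not a PyKED function.

```python
class RowList(list):
    """Hypothetical list subclass, e.g. a loader-specific sequence type."""


def get_filename(values):
    """Return the CSV filename for dict-like values, or None for inline rows."""
    try:
        return values.get('filename')
    except AttributeError:
        # Lists (and list subclasses) have no .get(), so treat as inline data.
        return None


rows = RowList([[0.0, 1.0], [0.1, 1.2]])

# An exact type comparison rejects the subclass even though it behaves like a list.
exact_type_check = type(rows) is list

# The duck-typed helper handles the subclass, a plain list, and a dict alike.
filename_from_rows = get_filename(rows)
filename_from_list = get_filename([[0.0, 1.0]])
filename_from_dict = get_filename({'filename': 'file_1.csv'})
```

The exact type comparison evaluates to False for the subclass, while the try-except version gives the same answer for both kinds of list.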
OK, that's a case that we missed for sure. What if, rather than passing the directory name around, we turn the
I should note that this is just off the top of my head, and there may be a better way to handle this backwards dependency. Also, this doesn't help in the case of using a dictionary as the input. I'm not sure there's a good way to handle that, though.
The only problem I see with loading twice is that it might take some time to load the file from disk. I think it makes more sense to keep the specifications of the time histories separate, and try to cache the data in the csv file somehow, rather than loading it from disk twice.
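One possible shape for such a cache, offered only as a sketch: wrap the loader in functools.lru_cache keyed on the filename, so two time histories that point at the same file trigger a single read. The names load_csv, column, and load_count are hypothetical, and the standard-library csv module is used here in place of np.genfromtxt purely to keep the example self-contained.

```python
import csv
import tempfile
from functools import lru_cache
from pathlib import Path

load_count = 0  # counts real disk reads, for demonstration only


@lru_cache(maxsize=None)
def load_csv(filename):
    """Read a CSV file into a tuple of float rows, cached by filename."""
    global load_count
    load_count += 1
    with open(filename, newline='') as csv_file:
        return tuple(tuple(float(x) for x in row) for row in csv.reader(csv_file))


def column(filename, index):
    """Return one column of the (cached) CSV data as a list."""
    return [row[index] for row in load_csv(filename)]


with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / 'history.csv')
    Path(path).write_text('0.0,1.0,10.0\n0.1,1.2,9.5\n')

    # Two separate time histories read different columns of the same file.
    pressure = column(path, 1)
    volume = column(path, 2)
```

After both calls, load_count is 1. A cache keyed only on the filename would return stale data if the file changed on disk during the session, so this trade-off would need to be acceptable.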
pyked/chemked.py
Outdated
values = np.genfromtxt(hist['values']['filename'], delimiter=',')
filename = hist['values']['filename']
if not isabs(filename):
    filename = join(directory, filename)
I think this will fail if the input is a dict and therefore directory is None.
Changing the default directory to '' would fix that. But, if the input is a dict, then the filename must be specified as an absolute path, right? Since there's no yaml file, a path relative to a yaml file wouldn't make sense. So, line 693 won't be run anyway.
I don't think that change will fix the problem on this line, because the directory argument gets set to None if yaml_file is None, so this line will join None and filename, which won't work. Actually, I think that if people provide a dictionary input, the CSV file has to be specified relative to the PWD of the Python process, or as an absolute file name, but we don't need to check for that case.
Good catch. I just pushed another commit so that directory will never be set to None. Now, if yaml_file is None, then directory will be ''.
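The difference between the two defaults can be seen directly with os.path.join. The filename below is just the example name used earlier in this thread.

```python
from os.path import join

filename = 'file_1.csv'

# Joining with None raises TypeError, which is the failure discussed above.
try:
    join(None, filename)
    none_error = None
except TypeError as err:
    none_error = type(err).__name__

# Joining with an empty string leaves the relative filename unchanged.
joined_with_empty = join('', filename)
```

With the empty-string default, the relative filename passes through untouched and is therefore still interpreted relative to the working directory of the Python process.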
…r forgiveness instead of permission when looking for csv file
I've never used Using a dictionary input, I think the filename would have to be specified as an absolute path. If it were a relative path, what would it be relative to? There's no yaml file.
What if we do the path munging in the loop that creates the DataPoint objects:
for point in self._properties['datapoints']:
    if 'time-histories' in point:
        for th in point['time-histories']:
            try:
                filename = Path(th['values']['filename'])
            except TypeError:
                # th['values'] is a list of values, not a dict with a filename
                pass
            else:
                if yaml_file is not None:
                    # Path.parent is the directory containing the YAML file
                    th['values']['filename'] = (yaml_file.parent / filename).resolve()
                else:
                    th['values']['filename'] = filename.resolve()
    self.datapoints.append(DataPoint(point))
(please correct any indentation errors, writing code with proportional fonts is hard...)
Then we don't have to pass anything around. This will resolve the path into an absolute path relative to the yaml file (if given) or relative to the CWD of the Python process, which should handle the dictionary and the yaml file cases gracefully.
BTW, one of the reasons I'm pushing back here is that I don't want to change how
Can you please make sure to add tests for all these code branches? The diff coverage should be 100%; you can see the lines that haven't been run here: https://codecov.io/gh/pr-omethe-us/PyKED/pull/115/diff where the lines that are in the brighter red in the left column weren't executed during testing.
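For reference, the same idea can be exercised as a standalone function. This is a sketch rather than the code that was merged: resolve_history_filename is a hypothetical helper, and it uses Path.parent (the directory containing the YAML file) because Path.stem is a plain string and cannot be combined with the / operator on the left-hand side.

```python
import tempfile
from pathlib import Path


def resolve_history_filename(values, yaml_file=None):
    """Hypothetical helper: absolute CSV path, or None when values are inline."""
    try:
        filename = Path(values['filename'])
    except TypeError:
        # values is a list of rows; indexing a list with a string raises TypeError.
        return None
    if yaml_file is not None:
        # Resolve relative to the directory that holds the YAML file.
        return (Path(yaml_file).parent / filename).resolve()
    # No YAML file (dictionary input): resolve relative to the working directory.
    return filename.resolve()


with tempfile.TemporaryDirectory() as tmp:
    yaml_file = Path(tmp) / 'butanol' / 'file_1.yaml'
    yaml_file.parent.mkdir()

    from_yaml = resolve_history_filename({'filename': 'file_1.csv'}, yaml_file)
    expected = (yaml_file.parent / 'file_1.csv').resolve()
    from_dict = resolve_history_filename({'filename': 'file_1.csv'})
    from_list = resolve_history_filename([[0.0, 1.0]])
```

A dict without a filename key would raise KeyError in this sketch; that case is assumed to be caught earlier by validation.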
That's a smart way to do it, I'll add it in and test it.
https://github.com/pr-omethe-us/PyKED/blob/master/pyked/tests/test_chemked.py#L59
By the way, you'll also have to turn the
if yaml_file is not None:
    yaml_file = Path(yaml_file)
    with open...
Revert DataPoint to its original form, use Path objects to indicate location of csv file. Still need to add tests for 100% coverage.
this is a test PR review
@@ -11,7 +11,7 @@ A BibTeX entry for LaTeX users is
```TeX
@misc{PyKED,
author = {Kyle E Niemeyer and Bryan W Weber},
year = 2017,
year = 2018,
It is now 2019
year = 2018,
year = 2020,
CHANGELOG.md
Changes proposed in this pull request:
@pr-omethe-us/chemked