MAINT: save data and rebuild model #49

Merged
merged 2 commits into from
Jul 18, 2018

Conversation

Contributor

@eigenfoo eigenfoo commented Jul 18, 2018

I forgot to save the data as part of the trace. This does that and also allows us to rebuild_model.

@eigenfoo eigenfoo requested a review from aseyboldt July 18, 2018 15:29
@@ -233,6 +239,12 @@ def fit_authors(data,
    trace.attrs['model-version'] = get_versions()['version']
    trace.attrs['model-type'] = AUTHOR_MODEL_TYPE

    if save_data:
        d = data.set_index(['meta_user_id',
Collaborator

These dimension names should match the ones used in the model (line 59)

Collaborator

That way xarray knows that they are the same dimension, and doesn't store the coordinates twice.
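
A minimal sketch of the point about shared dimensions, assuming a toy dataset. The variable names `author_mean` and `author_sd` and the author labels are invented for illustration and are not taken from the model; only the dimension name `meta_user_id` comes from this thread. When two variables in one xarray Dataset declare the same dimension name, they share a single coordinate:

```python
import numpy as np
import xarray as xr

# Hypothetical author labels; only the dimension name comes from the thread.
authors = ["a1", "a2", "a3"]

ds = xr.Dataset(
    {
        # Both variables declare the same dimension name ...
        "author_mean": (("meta_user_id",), np.zeros(3)),
        "author_sd": (("meta_user_id",), np.ones(3)),
    },
    # ... so the labels are attached once, as a single coordinate.
    coords={"meta_user_id": authors},
)

# Only one coordinate exists, and both variables see the same labels.
coord_names = list(ds.coords)
shared = ds["author_mean"]["meta_user_id"].equals(ds["author_sd"]["meta_user_id"])
```

Note that this only works when both variables really have the same length along that dimension.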

Contributor Author

Ah, so then:

author -> meta_user_id
algo -> meta_algorithm_id
backtest -> meta_code_id

Right?

Collaborator

yes

Contributor Author

Done!

@aseyboldt
Collaborator

thanks

@aseyboldt aseyboldt merged commit 48bf273 into master Jul 18, 2018
@aseyboldt aseyboldt deleted the save_data branch July 18, 2018 15:48
@eigenfoo
Contributor Author

eigenfoo commented Jul 18, 2018

@aseyboldt latest Jenkins build failed because of the same dim names

E           ValueError: conflicting MultiIndex level name(s):
E           'meta_code_id' (dim_0), (meta_code_id)
E           'meta_user_id' (dim_0), (meta_user_id)
E           'meta_algorithm_id' (dim_0), (meta_algorithm_id)

@eigenfoo
Contributor Author

I think the easiest solution to this would be to change the dim names for _data to be meta_code_id__, meta_user_id__, etc. The underscores can be added or deleted when saving and loading the data.
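
A rough sketch of that save/load round trip, assuming plain lists of dimension names. The helper names `add_suffix` and `strip_suffix` are hypothetical and are not part of this PR:

```python
SUFFIX = "__"  # the suffix proposed above


def add_suffix(names):
    """Return the dimension names with the suffix appended (for saving)."""
    return [name + SUFFIX for name in names]


def strip_suffix(names):
    """Return the dimension names with a trailing suffix removed (for loading)."""
    return [
        name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
        for name in names
    ]


dims = ["meta_user_id", "meta_algorithm_id", "meta_code_id"]
saved = add_suffix(dims)      # names as they would be stored
loaded = strip_suffix(saved)  # names as the caller would see them again
```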

I'm not sure if this is the most elegant solution though, since the user may want to access the data by itself, and the underscores would make that a pain.

@aseyboldt what do you think?

@aseyboldt
Collaborator

It should be possible to fix that. We somehow need to tell xarray to reuse those coords. Give me a couple of minutes.

@aseyboldt
Collaborator

This seems to be more difficult than I thought: pydata/xarray#1603

The dimension meta_user_id in _data is not the same as the other meta_user_id. The first has repeated entries, so its length equals len(data), while the second has length num_authors. So you could use new names like data_meta_user_id. This isn't all that nice, but I think it makes some sense.
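
To make the length mismatch concrete, here is a small pandas sketch. The table contents are made up; only the column names come from this PR:

```python
import pandas as pd

# Toy table with one row per backtest, so author ids repeat.
data = pd.DataFrame(
    {
        "meta_user_id": ["u1", "u1", "u2", "u2", "u2"],
        "meta_algorithm_id": ["a1", "a1", "a2", "a3", "a3"],
        "meta_code_id": ["c1", "c2", "c3", "c4", "c5"],
    }
)

# Per-row labels: one entry per row of data, with repeats.
per_row_authors = data["meta_user_id"].to_numpy()

# Per-author labels: one entry per distinct author.
unique_authors = data["meta_user_id"].unique()
num_authors = len(unique_authors)
```

The two label arrays carry the same name but different lengths, which is why a single dimension cannot describe both.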

@eigenfoo
Contributor Author

data storage is necromancy

@aseyboldt
Collaborator

I created pydata/xarray#2299.

I think we should just add a

d.index.names = ['data_meta_user_id', 'data_meta_...']

just before the assignment to the xarray dataset.
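
A hedged sketch of that rename on a toy frame. The `value` column and the order of the index levels are assumptions made for illustration, and the xarray assignment itself is left out:

```python
import pandas as pd

# Toy data; the "value" column is hypothetical.
data = pd.DataFrame(
    {
        "meta_user_id": ["u1", "u1", "u2"],
        "meta_algorithm_id": ["a1", "a1", "a2"],
        "meta_code_id": ["c1", "c2", "c3"],
        "value": [0.1, 0.2, 0.3],
    }
)

# Level order is an assumption; the diff above is truncated after the first name.
d = data.set_index(["meta_user_id", "meta_algorithm_id", "meta_code_id"])

# Prefix every index level so it cannot clash with the model's dimension names.
d.index.names = ["data_" + name for name in d.index.names]

index_names = list(d.index.names)
```

Building the names from the existing levels avoids keeping a second hard-coded list in sync.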

@eigenfoo eigenfoo mentioned this pull request Jul 18, 2018