Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vb/issue235 -- Convert config.pkl to config.yaml #240

Merged
merged 5 commits into from
Apr 25, 2023
Merged

Conversation

vineetbansal
Copy link
Collaborator

@vineetbansal vineetbansal commented Mar 13, 2023

Addresses issue #235.

Whenever configuration files are saved, config.yaml is used by default. When analysis commands are used, where a config.pkl would have been unconditionally read initially, we check to see if a config.yaml is present, failing which we look for a config.pkl (and generate a DeprecationWarning if we do so).

view_config works but generates a DeprecationWarning too.

I've tested the whole thing on old/newly generated training data, and it works correctly. I've modified the notebook templates too, but I haven't confirmed correct behavior on them. I'll leave a note here once I have. In the meantime, I'm opening up the PR so you can start looking around.

@vineetbansal vineetbansal requested a review from zhonge March 13, 2023 14:46
@@ -88,7 +88,7 @@ def __init__(
self.use_cupy = use_cupy # estimate_normalization may need access to self.use_cupy, so save it first
if norm is None:
norm = self.estimate_normalization()
self.norm = norm
self.norm = [float(x) for x in norm]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The norm was the only config attribute that in some cases would be a numpy float32 (which would cause serialization problems), so I'm converting it to a float everywhere its generated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, so yml can't serialize float32? Good to keep in mind... the pytorch models have dtype float32 so we should be careful about typing. Can you add a comment about why?

Afaik norm should be only used during model eval (to re-scale the intensity of the densities), so I don't think it's an issue in this case.

Copy link
Collaborator Author

@vineetbansal vineetbansal Mar 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any data type that is not a built-in python type needs extra bookkeeping information as to what library/class is needed to initialize a new object given a string representation of it. So serializing a simple variable x which is a numpy float results in the yaml:

x: !!python/object/apply:numpy.core.multiarray._reconstruct
  args:
  - !!python/name:numpy.ndarray ''
  - !!python/tuple
    - 0
  - !!binary |
    Yg==
  state: !!python/tuple
  - 1
  - !!python/tuple []
  - !!python/object/apply:numpy.dtype
    args:
    - f8
    - false
    - true
    state: !!python/tuple
    - 3
    - <
    - null
    - null
    - null
    - -1
    - -1
    - 0
  - false
  - !!binary |
    AAAAAAAAAAA=

Serializing/Reading the norm variable by casting it as a float is indeed not an issue in this case as you suspect. I've verified this both locally as well as in the unit tests.

cryodrgn train_nn data/toy_projections.mrcs --poses data/toy_angles.pkl -o output/toy_recon -n 10
cryodrgn train_nn data/toy_projections.star --poses data/toy_angles.pkl -o output/toy_recon -n 10
cryodrgn train_nn data/toy_projections.txt --poses data/toy_angles.pkl -o output/toy_recon -n 10
cryodrgn train_nn data/toy_projections.mrcs --poses data/toy_angles.pkl -o output/toy_recon -n 10 --no-amp
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toy_projections.mrcs are 30x30 in size, and amp (default) insists on a side length being a multiple of 8. So this never works in my local setup without --no-amp. This is unrelated to this PR though, so I'm happy to move it to a different PR/issue if you think it's better. (It fails further down anyway since there is no script called translate_stack.py).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this! We could consider relaxing that assertion but will start a separate issue for tracking that.

@vineetbansal
Copy link
Collaborator Author

vineetbansal commented Mar 19, 2023

I've tested the notebooks too and they're working correctly w.r.t the rest of the config changes introduced in this PR. However, the notebooks are failing both in the master branch as well as this branch on an unrelated error, on the line below:

# Load poses
if config['dataset_args']['do_pose_sgd']:
    pose_pkl = f'{WORKDIR}/pose.{EPOCH}.pkl'
    with open(pose_pkl,'rb') as f:
        rot, trans = pickle.load(f)
else:
    pose_pkl = config['dataset_args']['poses']
    rot, trans = utils.load_pkl(pose_pkl)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 8
      6 else:
      7     pose_pkl = config['dataset_args']['poses']
----> 8     rot, trans = utils.load_pkl(pose_pkl)

ValueError: too many values to unpack (expected 2)

My understanding is that poses can be rotations, translations, or just translations, and the toy dataset that I'm testing out this notebook on (the hand dataset) just has rotations, no translations. I think the rest of cryodrgn code is written so as to take care of this situation, but perhaps the notebooks are not?

I'm not sure how big an issue this is for the real usage of notebooks. If it is, perhaps we can open up a new issue on this? In any case it is unrelated to this PR so I'm confident that this can be merged, and there are no obvious places that we're overlooking here for bugs.

@zhonge zhonge changed the title Vb/issue235 Vb/issue235 -- Convert config.pkl to config.yaml Mar 26, 2023
@zhonge
Copy link
Collaborator

zhonge commented Mar 26, 2023

I've tested the notebooks too and they're working correctly w.r.t the rest of the config changes introduced in this PR. However, the notebooks are failing both in the master branch as well as this branch on an unrelated error, on the line below:

# Load poses
if config['dataset_args']['do_pose_sgd']:
    pose_pkl = f'{WORKDIR}/pose.{EPOCH}.pkl'
    with open(pose_pkl,'rb') as f:
        rot, trans = pickle.load(f)
else:
    pose_pkl = config['dataset_args']['poses']
    rot, trans = utils.load_pkl(pose_pkl)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 8
      6 else:
      7     pose_pkl = config['dataset_args']['poses']
----> 8     rot, trans = utils.load_pkl(pose_pkl)

ValueError: too many values to unpack (expected 2)

My understanding is that poses can be rotations, translations, or just translations, and the toy dataset that I'm testing out this notebook on (the hand dataset) just has rotations, no translations. I think the rest of cryodrgn code is written so as to take care of this situation, but perhaps the notebooks are not?

I'm not sure how big an issue this is for the real usage of notebooks. If it is, perhaps we can open up a new issue on this? In any case it is unrelated to this PR so I'm confident that this can be merged, and there are no obvious places that we're overlooking here for bugs.

Ah yes, I think we should open a new ticket for updating the toy datasets. Right now the toy dataset doesn't have any translations (just rotations since the images are perfectly centered). There is also no CTF applied on this dataset. I'm pretty sure this idealized setting breaks some of the assumptions in the notebooks, and we can discuss whether we need to support these idealized settings in the code and in the jupyter notebooks.

@zhonge zhonge merged commit 0216a08 into master Apr 25, 2023
@zhonge zhonge deleted the vb/issue235 branch April 25, 2023 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants