Vb/issue235 -- Convert config.pkl to config.yaml #240

vineetbansal · 2023-03-13T14:46:24Z

Addresses issue #235.

Configuration files are now saved as config.yaml by default. Analysis commands, which previously read config.pkl unconditionally, now check for a config.yaml first and fall back to config.pkl only if it is missing (generating a DeprecationWarning when they do).

view_config works but generates a DeprecationWarning too.
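A minimal sketch of the lookup order described above, assuming the two files sit side by side in the output directory. The helper name `resolve_config_path` is hypothetical and is not the function added in this PR; it only picks the file and leaves the actual parsing to the caller.

```python
import warnings
from pathlib import Path


def resolve_config_path(workdir):
    """Return the config file to read from workdir, preferring YAML.

    Hypothetical helper for illustration; not the function added in this PR.
    """
    workdir = Path(workdir)

    # Preferred format: use config.yaml whenever it exists.
    yaml_path = workdir / "config.yaml"
    if yaml_path.exists():
        return yaml_path

    # Legacy format: still accepted, but flagged as deprecated.
    pkl_path = workdir / "config.pkl"
    if pkl_path.exists():
        warnings.warn(
            "config.pkl is deprecated; re-save the configuration as config.yaml",
            DeprecationWarning,
            stacklevel=2,
        )
        return pkl_path

    raise FileNotFoundError(f"no config.yaml or config.pkl found in {workdir}")
```

Keeping the fallback in one place means every analysis command gets the same precedence and the same warning, rather than each one re-implementing the check.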

I've tested the changes on both old and newly generated training data, and they work correctly. I've also modified the notebook templates, but I haven't yet confirmed that they behave correctly; I'll leave a note here once I have. In the meantime, I'm opening the PR so you can start looking around.

vineetbansal · 2023-03-13T14:48:08Z

cryodrgn/dataset.py

@@ -88,7 +88,7 @@ def __init__(
        self.use_cupy = use_cupy  # estimate_normalization may need access to self.use_cupy, so save it first
        if norm is None:
            norm = self.estimate_normalization()
-        self.norm = norm
+        self.norm = [float(x) for x in norm]


The norm was the only config attribute that in some cases would be a numpy float32 (which would cause serialization problems), so I'm converting it to a float everywhere it's generated.

Gotcha, so yml can't serialize float32? Good to keep in mind... the pytorch models have dtype float32 so we should be careful about typing. Can you add a comment about why?

Afaik norm should be only used during model eval (to re-scale the intensity of the densities), so I don't think it's an issue in this case.

Any data type that is not a built-in python type needs extra bookkeeping information as to what library/class is needed to initialize a new object given a string representation of it. So serializing a simple variable x which is a numpy float results in the yaml:

x: !!python/object/apply:numpy.core.multiarray._reconstruct
  args:
  - !!python/name:numpy.ndarray ''
  - !!python/tuple
    - 0
  - !!binary |
    Yg==
  state: !!python/tuple
  - 1
  - !!python/tuple []
  - !!python/object/apply:numpy.dtype
    args:
    - f8
    - false
    - true
    state: !!python/tuple
    - 3
    - <
    - null
    - null
    - null
    - -1
    - -1
    - 0
  - false
  - !!binary |
    AAAAAAAAAAA=

Serializing/Reading the norm variable by casting it as a float is indeed not an issue in this case as you suspect. I've verified this both locally as well as in the unit tests.
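The cast applied to `norm` can be generalized to a whole config dictionary. The sketch below is an assumption-laden illustration, not code from this PR: `to_builtin` is a hypothetical helper that relies on duck typing (NumPy arrays and scalars both expose `tolist()`, and scalars also expose `item()`) so that only built-in types reach the serializer.

```python
def to_builtin(value):
    """Recursively convert a config value to built-in Python types.

    Hypothetical helper, not part of cryodrgn. Array-like and scalar-like
    objects are detected by duck typing rather than by importing numpy.
    """
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists so the result contains only plain containers.
        return [to_builtin(item) for item in value]
    if hasattr(value, "tolist"):
        # numpy arrays and numpy scalars convert to nested lists / Python scalars.
        return value.tolist()
    if hasattr(value, "item"):
        # Other scalar wrappers that only expose item().
        return value.item()
    return value
```

Running a config through a converter like this just before it is written would catch any other attribute that happens to hold a non-built-in scalar, at the cost of hiding where that value came from.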

vineetbansal · 2023-03-13T14:53:03Z

testing/unittest.sh

-cryodrgn train_nn data/toy_projections.mrcs --poses data/toy_angles.pkl -o output/toy_recon -n 10
-cryodrgn train_nn data/toy_projections.star --poses data/toy_angles.pkl -o output/toy_recon -n 10
-cryodrgn train_nn data/toy_projections.txt --poses data/toy_angles.pkl -o output/toy_recon -n 10
+cryodrgn train_nn data/toy_projections.mrcs --poses data/toy_angles.pkl -o output/toy_recon -n 10 --no-amp


toy_projections.mrcs are 30x30 in size, and amp (default) insists on a side length being a multiple of 8. So this never works in my local setup without --no-amp. This is unrelated to this PR though, so I'm happy to move it to a different PR/issue if you think it's better. (It fails further down anyway since there is no script called translate_stack.py).
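The arithmetic behind the failure is short enough to check directly. This is only a sketch of the divisibility rule as stated in the comment above; the exact assertion inside cryodrgn is not shown in this thread, and both function names are hypothetical.

```python
def is_amp_compatible(side: int) -> bool:
    """True if the image side length is a multiple of 8 (the rule stated above)."""
    return side % 8 == 0


def nearest_compatible_sides(side: int) -> tuple:
    """Return the closest multiples of 8 at or below and at or above side."""
    lower = (side // 8) * 8
    upper = lower if lower == side else lower + 8
    return lower, upper


print(is_amp_compatible(30))         # False, since 30 % 8 == 6
print(nearest_compatible_sides(30))  # (24, 32)
```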

Thanks for updating this! We could consider relaxing that assertion but will start a separate issue for tracking that.

vineetbansal · 2023-03-19T19:48:02Z

I've tested the notebooks too and they're working correctly w.r.t the rest of the config changes introduced in this PR. However, the notebooks are failing both in the master branch as well as this branch on an unrelated error, on the line below:

# Load poses
if config['dataset_args']['do_pose_sgd']:
    pose_pkl = f'{WORKDIR}/pose.{EPOCH}.pkl'
    with open(pose_pkl,'rb') as f:
        rot, trans = pickle.load(f)
else:
    pose_pkl = config['dataset_args']['poses']
    rot, trans = utils.load_pkl(pose_pkl)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 8
      6 else:
      7     pose_pkl = config['dataset_args']['poses']
----> 8     rot, trans = utils.load_pkl(pose_pkl)

ValueError: too many values to unpack (expected 2)

My understanding is that poses can be either rotations and translations, or just rotations, and the toy dataset that I'm testing this notebook on (the hand dataset) has only rotations, no translations. I think the rest of the cryodrgn code is written to handle this situation, but perhaps the notebooks are not?

I'm not sure how big an issue this is for the real usage of notebooks. If it is, perhaps we can open up a new issue on this? In any case it is unrelated to this PR so I'm confident that this can be merged, and there are no obvious places that we're overlooking here for bugs.
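One way the notebook cell could tolerate a rotations-only file is sketched below. It assumes the pickle holds either a `(rot, trans)` tuple or a bare rotations array, which is my reading of the error rather than something confirmed in this thread; `split_poses` is a hypothetical helper, not cryodrgn API.

```python
def split_poses(poses):
    """Return (rot, trans) from a loaded poses object; trans is None if absent.

    Assumption for illustration: the pickle holds either a (rot, trans) tuple
    or a bare array of rotations. Hypothetical helper, not cryodrgn API.
    """
    if isinstance(poses, tuple):
        if len(poses) == 2:
            rot, trans = poses
            return rot, trans
        if len(poses) == 1:
            return poses[0], None
        raise ValueError(f"expected 1 or 2 pose arrays, got {len(poses)}")
    # Anything that is not a tuple is treated as rotations only.
    return poses, None
```

In the notebook this would replace the direct unpacking, e.g. `rot, trans = split_poses(utils.load_pkl(pose_pkl))`, with later cells checking `trans is None` before using translations.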

zhonge · 2023-03-26T16:07:51Z

Ah yes, I think we should open a new ticket for updating the toy datasets. Right now the toy dataset doesn't have any translations (just rotations since the images are perfectly centered). There is also no CTF applied on this dataset. I'm pretty sure this idealized setting breaks some of the assumptions in the notebooks, and we can discuss whether we need to support these idealized settings in the code and in the jupyter notebooks.

vineetbansal added 5 commits March 12, 2023 14:54

saving

e96176c

changes to support config pkl/yaml

2d93803

pyright fixes

c0c50d1

Merge branch 'master' into vb/issue235

d0d72d5

flake8 fixes

5fb84a5

vineetbansal requested a review from zhonge March 13, 2023 14:46

vineetbansal commented Mar 13, 2023

zhonge changed the title ~~Vb/issue235~~ Vb/issue235 -- Convert config.pkl to config.yaml Mar 26, 2023

zhonge mentioned this pull request Mar 26, 2023

Support idealized imaging settings / toy datasets? #253

Open

zhonge approved these changes Apr 25, 2023

zhonge merged commit 0216a08 into master Apr 25, 2023

zhonge deleted the vb/issue235 branch April 25, 2023 19:19
