-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vb/issue235 -- Convert config.pkl to config.yaml #240
Conversation
@@ -88,7 +88,7 @@ def __init__( | |||
self.use_cupy = use_cupy # estimate_normalization may need access to self.use_cupy, so save it first | |||
if norm is None: | |||
norm = self.estimate_normalization() | |||
self.norm = norm | |||
self.norm = [float(x) for x in norm] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The norm
was the only config
attribute that in some cases would be a numpy float32
(which would cause serialization problems), so I'm converting it to a float
everywhere its generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, so yml can't serialize float32? Good to keep in mind... the pytorch models have dtype float32
so we should be careful about typing. Can you add a comment about why?
Afaik norm
should be only used during model eval (to re-scale the intensity of the densities), so I don't think it's an issue in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any data type that is not a built-in python type needs extra bookkeeping information as to what library/class is needed to initialize a new object given a string representation of it. So serializing a simple variable x
which is a numpy float results in the yaml:
x: !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- !!python/name:numpy.ndarray ''
- !!python/tuple
- 0
- !!binary |
Yg==
state: !!python/tuple
- 1
- !!python/tuple []
- !!python/object/apply:numpy.dtype
args:
- f8
- false
- true
state: !!python/tuple
- 3
- <
- null
- null
- null
- -1
- -1
- 0
- false
- !!binary |
AAAAAAAAAAA=
Serializing/Reading the norm
variable by casting it as a float
is indeed not an issue in this case as you suspect. I've verified this both locally as well as in the unit tests.
cryodrgn train_nn data/toy_projections.mrcs --poses data/toy_angles.pkl -o output/toy_recon -n 10 | ||
cryodrgn train_nn data/toy_projections.star --poses data/toy_angles.pkl -o output/toy_recon -n 10 | ||
cryodrgn train_nn data/toy_projections.txt --poses data/toy_angles.pkl -o output/toy_recon -n 10 | ||
cryodrgn train_nn data/toy_projections.mrcs --poses data/toy_angles.pkl -o output/toy_recon -n 10 --no-amp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toy_projections.mrcs
are 30x30 in size, and amp (default) insists on a side length being a multiple of 8. So this never works in my local setup without --no-amp
. This is unrelated to this PR though, so I'm happy to move it to a different PR/issue if you think it's better. (It fails further down anyway since there is no script called translate_stack.py
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this! We could consider relaxing that assertion but will start a separate issue for tracking that.
I've tested the notebooks too and they're working correctly w.r.t the rest of the
My understanding is that poses can be rotations, translations, or just translations, and the toy dataset that I'm testing out this notebook on (the I'm not sure how big an issue this is for the real usage of notebooks. If it is, perhaps we can open up a new issue on this? In any case it is unrelated to this PR so I'm confident that this can be merged, and there are no obvious places that we're overlooking here for bugs. |
Ah yes, I think we should open a new ticket for updating the toy datasets. Right now the toy dataset doesn't have any translations (just rotations since the images are perfectly centered). There is also no CTF applied on this dataset. I'm pretty sure this idealized setting breaks some of the assumptions in the notebooks, and we can discuss whether we need to support these idealized settings in the code and in the jupyter notebooks. |
Addresses issue #235.
Whenever configuration files are saved,
config.yaml
is used by default. When analysis commands are used, where aconfig.pkl
would have been unconditionally read initially, we check to see if aconfig.yaml
is present, failing which we look for aconfig.pkl
(and generate aDeprecationWarning
if we do so).view_config
works but generates aDeprecationWarning
too.I've tested the whole thing on old/newly generated training data, and it works correctly. I've modified the notebook templates too, but I haven't confirmed correct behavior on them. I'll leave a note here once I have. In the meantime, I'm opening up the PR so you can start looking around.