Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return distance results as float rather than double for 3.0 or make return type flexible. #3927

Open
hmacdope opened this issue Nov 21, 2022 · 4 comments
Labels
Component-lib CZI-performance performance track of CZIEOSS4 grant
Milestone

Comments

@hmacdope
Copy link
Member

Is your feature request related to a problem?

As detailed in #3914 and observed other places, the need to mix precision when computing distances makes the mda.lib.distances python and C++ layers slow. It also requires an ugly shim to get distopia to work properly and IMO does not make heaps of sense with float coordinates to start with.

Describe the solution you'd like

We could return float results or have the return type be flexible, Ie make no promises about the return type.

Describe alternatives you've considered

Leave everything the way it is.

Leave your thoughts @MDAnalysis/coredevs as would be a breaking change.

@hmacdope hmacdope added Component-lib CZI-performance performance track of CZIEOSS4 grant labels Nov 21, 2022
@hmacdope hmacdope added this to the Release 3.0 milestone Nov 21, 2022
@hmacdope hmacdope mentioned this issue Nov 21, 2022
4 tasks
@hmacdope hmacdope moved this to In Progress in CZI Performance track Nov 21, 2022
@hmacdope hmacdope moved this from In Progress to Todo in CZI Performance track Nov 21, 2022
@richardjgowers
Copy link
Member

Ideally a Universe has a dtype, and all coordinates are of that type. Currently we're float/single for coordinates, boxes are double and then distances are returned as doubles, it's a bit inconsistent.

@orbeckst
Copy link
Member

I'd say a dtype makes sense for a homogenous data structure like an array. For something heterogenous like a Universe it's a "nice to have". I think it's more important to think about loss of precision when you're working with the data than to make promises up-front. For instance, it may make a lot of sense to use double accumulator variables when aggregating float coordinates.

@orbeckst
Copy link
Member

orbeckst commented Dec 2, 2022

The performance improvements that you show in #3914 (comment) make a strong case to give up the guarantee of double precision output.

Admittedly, after reading the other thread I understand better why @richardjgowers suggests

Ideally a Universe has a dtype, and all coordinates are of that type. Currently we're float/single for coordinates, boxes are double and then distances are returned as doubles, it's a bit inconsistent.

as this situation would avoid any casting in between.

I assume there will still be situations where we would want/need to use mixed precision.

Do we make boxes double because that's what we get from trajectories or because we need it for, e.g., PBC stuff?

@hmacdope
Copy link
Member Author

hmacdope commented Dec 2, 2022

My understanding is that the actual precision of coordinates as stored in the trajectory files varies wildly ie XTC it is less accurate than a float due to compression while a DCD I think reads coords as float and a box as double from what I can see.

There was also some arguments made that box ops should be done in double precision (I remember seeing these, I'll try and find them again). But I think this is a question of accuracy, i.e my tests done in distopia indicate you can get 5 sig fig accuracy with all floats anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component-lib CZI-performance performance track of CZIEOSS4 grant
Projects
Development

No branches or pull requests

3 participants