
SLEAP read user labels #79

Merged · 13 commits into main · Nov 16, 2023

Conversation

@lochhh (Collaborator) commented Oct 31, 2023

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
This PR enables load_poses._load_from_sleap_labels_file to load pose tracks from a SLEAP .slp file consisting of either:

  • user-labelled instances only,
  • predicted instances only, or
  • a mix of user-labelled and predicted instances

What does this PR do?
This PR adds:

  • load_poses._sleap_labels_to_numpy, a function that converts SLEAP Labels to a NumPy array. It is adapted from Labels.numpy() in the sleap_io package, but prioritises user-labelled instances over predicted instances when the .slp file contains a mix of both. This mirrors SLEAP's behaviour when exporting ".h5" analysis files, so that loading a .slp file in movement returns the same dataset as loading the corresponding SLEAP analysis .h5 file (i.e., one exported from the same .slp file).
  • new test cases for
    • checking that loading pairs of .h5 and .slp files returns the same dataset, regardless of whether the .slp file consists of user-labelled instances only, predicted instances only, or a mix of both
    • converting SLEAP datasets to DeepLabCut-style pandas DataFrames
    • converting SLEAP datasets to DeepLabCut file formats
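
The prioritisation rule described above can be illustrated with a minimal, hypothetical sketch (plain dicts, not the actual movement or sleap_io objects): for each track in each frame, keep the user-labelled instance if one exists, otherwise fall back to the prediction, and give user labels NaN confidence as SLEAP's analysis-HDF5 export does.

```python
import math

def select_instance(user_instance, predicted_instance):
    """Pick the instance to keep for one track in one frame.

    Each instance is a dict like {"points": [...], "score": float};
    user-labelled instances carry no score, so we attach NaN.
    (Illustrative only; not the real _sleap_labels_to_numpy code.)
    """
    if user_instance is not None:
        # a user label always wins over a prediction
        return {**user_instance, "score": math.nan}
    return predicted_instance

# the user has "fixed" a predicted instance: the user label is kept
fixed = select_instance(
    {"points": [(1.0, 2.0)]},
    {"points": [(0.9, 2.1)], "score": 0.8},
)
# only a prediction exists: it is kept with its original score
kept = select_instance(None, {"points": [(3.0, 4.0)], "score": 0.95})
```

Applying this rule per frame is what makes a loaded .slp dataset match the .h5 analysis file exported from it.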

References

This PR addresses #84

How has this PR been tested?

Tests have been written and run for the new functions added.

Does this PR require an update to the documentation?

Docstrings have been updated accordingly.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@lochhh lochhh requested a review from niksirbi October 31, 2023 17:17

codecov bot commented Oct 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (73858ad) 98.05% vs. head (f81f6d2) 98.19%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
+ Coverage   98.05%   98.19%   +0.14%     
==========================================
  Files           8        8              
  Lines         308      333      +25     
==========================================
+ Hits          302      327      +25     
  Misses          6        6              


@niksirbi niksirbi linked an issue Oct 31, 2023 that may be closed by this pull request
@niksirbi niksirbi modified the milestone: v0.1 - alpha release Oct 31, 2023
@lochhh lochhh marked this pull request as ready for review November 1, 2023 09:45
@niksirbi (Member) left a comment


Great job @lochhh!

I have two general comments on this, apart from the specifics:

  • I think we need to explicitly test for the case that a .slp file contains both predicted and user-labeled instances and make sure that we load the expected things in that case (according to what you describe in the docstring of _sleap_user_labels_to_numpy())
  • load_poses.py is starting to get a bit long. How about refactoring the private underscore-prefixed functions into a separate movement.io.utils.py file? If you agree, let's open an issue to track this (we should do it in a separate PR, after this one is merged).

tests/test_unit/test_load_poses.py (outdated, resolved)
Comment on lines 13 to 29
@pytest.fixture(scope="module")
def sleap_user_labels(self):
    """Return a SLEAP `Labels` object containing only
    user-labeled instances.
    """
    slp_file = POSE_DATA.get(
        "SLEAP_three-mice_Aeon_proofread.predictions.slp"
    )
    return read_labels(slp_file)

@pytest.fixture(scope="module")
def sleap_predicted_labels(self):
    """Return a SLEAP `Labels` object containing only
    predicted instances.
    """
    slp_file = POSE_DATA.get("SLEAP_single-mouse_EPM.predictions.slp")
    return read_labels(slp_file)
Member:

Currently we have two types of datasets, either all predicted or all user-labeled instances. A user may supply a .slp file with a mix of both, and we want to make sure that we handle that appropriately.

So it seems to me that we are missing a test case. Could we "mock" such a dataset by taking "SLEAP_single-mouse_EPM.predictions.slp" and converting some instances to user-labeled? I'm open to other ways of creating this test-case as well.

Collaborator (Author):

I think this concerns a separate issue. For simplicity, let's consider a single-animal frame here.
Possible scenarios leading to a mix of Predicted and User-labelled Instances in a sleap.instance.LabeledFrame:

  • case 1 - user "fixes" incorrectly predicted individual A:
    [PredictedInstance(A)] → [PredictedInstance(A), Instance(A)]
  • case 2 - user "adds" missing individual A:
    [] → [Instance(A)]

In either case, the current behaviour using sleap-io's Labels.numpy() is to ignore user-labelled instances:

  • loading case1.slp in movement returns [PredictedInstance(A)]
  • loading case2.slp in movement returns []

However, if the user exports case1.slp as case1.h5, and case2.slp as case2.h5, SLEAP's write_tracking_h5 prioritises the user-labelled instances: if a user-labelled instance is present, it is selected; otherwise the predicted instance is. Hence:

  • loading case1.h5 in movement returns [Instance(A)] with NaN confidence
  • loading case2.h5 in movement returns [Instance(A)] with NaN confidence

I guess we should handle .slp files with mixed instance types the same way SLEAP does in write_tracking_h5, so that loading .slp and .h5 in movement return the same dataset. And perhaps, for consistency, we should leave confidence scores for user-labels-only files as NaNs, rather than setting these as 1.0...

Wrt test data, I think it's more straightforward to add a pair of .h5 and .slp files to our sample datasets.

Wdyt?
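
For illustration, a sketch of the write_tracking_h5-style rule described above, applied to the two cases (plain tuples stand in for real sleap_io objects; the names here are illustrative, not the actual API):

```python
import math

def resolve_frame(instances):
    """Keep one instance per individual, preferring user labels.

    An instance is (individual, kind) with kind in {"user", "predicted"}.
    User-labelled instances win and are given NaN confidence; predictions
    get a placeholder 1.0 score here.
    """
    chosen = {}
    for individual, kind in instances:
        # a user label always replaces a prediction for the same individual
        if kind == "user" or individual not in chosen:
            chosen[individual] = kind
    return [
        (individual, kind, math.nan if kind == "user" else 1.0)
        for individual, kind in chosen.items()
    ]

# case 1: user fixes a wrong prediction for individual A
case1 = resolve_frame([("A", "predicted"), ("A", "user")])
# case 2: user adds a missing individual A
case2 = resolve_frame([("A", "user")])
```

Both cases resolve to a single user-labelled instance for A with NaN confidence, matching what loading case1.h5 or case2.h5 returns.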

Member:

I thought about it, and I think what you propose here makes sense and aligns better with the expectations of users. If I correct a prediction (thereby, turning it into a user-labelled instance), I expect it to be prioritised for going into the output, like write_tracking_h5 does.

Moreover, as you point out, it makes sense for the imported data to be the same, whether it came from .slp or .h5. This means that we should indeed duplicate the functionality of write_tracking_h5. Since we cannot depend on sleap, and sleap-io doesn't have it, we should write our own implementation for it.

There are two issues I foresee with this approach:

  1. We have to keep an eye on write_tracking_h5 in sleap. If its behaviour changes significantly, we may need to update our own version. I hope they won't do that, because then we would have to ship different data loaders for different versions of sleap. But I suppose this is already the case: we have to stay up to date with developments in deeplabcut and sleap.
  2. As you mention, we probably have to mark the user-labelled instances with NaN confidence. This will not be a problem if we assume that user-labelling is the only process that can generate NaNs in .slp and .h5 files. If this assumption holds, we can treat NaNs coming from sleap as having the highest confidence during preprocessing. If it does not hold, we need another way of keeping track of which instances are user-labelled (maybe an additional boolean DataArray, or a list of the indices of user-labelled instances stored in the Dataset's metadata attrs).

In summary, I think you should go ahead with implementing this plan in this PR, and we should come to a decision on how to handle NaNs. We definitely need to keep information on which predictions we can "trust", because it will come in handy during data cleaning.
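
Point 2 above can be sketched in a few lines, under the stated assumption that NaN confidence only ever marks user-labelled instances (the variable names are illustrative, not part of movement's API):

```python
import numpy as np

# per-instance confidence scores; NaN is assumed to mean "user-labelled"
confidence = np.array([0.92, np.nan, 0.75, np.nan])

# a boolean mask is enough to keep track of which instances we can "trust"
is_user_labelled = np.isnan(confidence)

# during preprocessing, user labels can be treated as maximally confident
effective_confidence = np.where(is_user_labelled, 1.0, confidence)
```

If the assumption ever fails, this mask would have to be built from explicit metadata instead of from the NaN pattern.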

movement/io/load_poses.py (outdated, resolved)
movement/io/load_poses.py (outdated, resolved)
movement/io/load_poses.py (outdated, resolved)
@lochhh lochhh mentioned this pull request Nov 7, 2023
6 tasks
@lochhh lochhh marked this pull request as draft November 9, 2023 12:06
@lochhh lochhh linked an issue Nov 14, 2023 that may be closed by this pull request
@lochhh lochhh marked this pull request as ready for review November 15, 2023 11:37
@lochhh lochhh requested a review from niksirbi November 15, 2023 11:38
@niksirbi (Member) left a comment

That's very good @lochhh!
I think this is ready to go, apart from one tiny comment I left.

We should merge this ahead of PR 83, because some of the tests I wrote for that one will conflict with the ones here. After we merge this one, I'll merge/rebase PR 83, fix conflicts, and merge that one too.

loaded. If there are multiple videos in the file, only the first one will
be used.
You can also directly load the ".slp" file. However, if the file contains
multiple videos, only the pose tracks from the first video will be loaded.
Member:

Should we also mention here that user-labeled instances are prioritised over the predicted ones? You have a thorough note about it in the docstring of the private function, but the users won't interface with that one, so I think it's also worth mentioning here.

Collaborator (Author):

Good point! Will add that.

Kudos, SonarCloud Quality Gate passed!

  Bugs: 0 (rated A)
  Vulnerabilities: 0 (rated A)
  Security Hotspots: 0 (rated A)
  Code Smells: 0 (rated A)
  Coverage: no coverage information
  Duplication: 0.0%

@niksirbi niksirbi merged commit 3b61db6 into main Nov 16, 2023
24 checks passed
@niksirbi niksirbi deleted the sleap-user-label branch November 16, 2023 17:30
Development

Successfully merging this pull request may close these issues.

  • Handle SLEAP data with mixed (user/predicted) labels
  • Read SLEAP user-labelled instances
2 participants