Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Integration for Neural Embedding based workflowes (eg: SAM (Segment Anything)) #6049

Closed
2 tasks done
kunaltyagi opened this issue Apr 20, 2023 · 3 comments · Fixed by #6019
Closed
2 tasks done
Labels
enhancement New feature or request

Comments

@kunaltyagi
Copy link
Contributor

My actions before raising this issue

On large images, SAM latency per query adds up.

Possible Solution

Is there a way we can store the encoded image embedding in CVAT db, perhaps a separate table and then run the interactive prompt encoder + decoder with onnx.js?

The table would have a primary key on the project/task/job/frame with columns for different model embeddings. We can use a simple binary blob to store the data. This table can be populated by either by task (workflow already exists) or per frame if the db doesn't have an entry.

This would require adding support for arbitrary scripts/webasm binaries in the browser to enable use of onnx.js

Context

Scaling DB is much easier (and cheaper) than scaling GPU HW, specially since the GPU compute is idempotent

Current Implementation and Thoughts

  1. Create a table manually in the database: Easy, but official support would be appreciated
  2. Create a detector model which gets the request and populates the table, and doesn't detect anything ever
    • Feels like a hack to have 2 models, one for pre-process and one for use
    • Need to access poorly documented event.body payload
  3. Create an interactor which runs pytorch for the prompt encoder and decoder as a serverless function
    • Sadly, making this use onnx.js would take too much effort (and know-how of cvat and js)
    • We add a checksum to the db to ensure the image is the same, else generate embeddings again
@bsekachev
Copy link
Member

Hi @kunaltyagi

I am working on client-side optimizations for SAM in #6019

@bsekachev bsekachev added the enhancement New feature or request label Apr 25, 2023
@ryanalexmartin
Copy link

Excellent idea, @kunaltyagi

@bsekachev A frontend solution is fantastic, and I'm very much looking forward to using your work once it's finished, but I would also love to see a backend implementation. as some users would prefer to use more powerful GPUs available on their cloud instances, and may be connecting from thin clients. Perhaps the Nuclio function itself could store and look-up the embeddings for images in a mounted volume.

I think this per-image workflow would feel good and responsive:

  1. User selects the SAM tool, via the magic wand, and clicks "Interact", but is not yet allowed to click on the image. Red cross hair does not yet appear. A request is sent to the Nuclio serverless function. The function checks a table/db/volume/etc to see if embeddings for the image name (and matching checksum) exist.

  2. The modal that usually says "Waiting for a response from Segment Anything..." instead says "generating embeddings for image..." or, if the embeddings have been previously generated, are loaded from wherever the Nuclio function stores them. The embedding of image's location is returned to the client.

  3. The embeddings for the image are now generated, or loaded from the database. Red crosshair appears, and now we are able to annotate the image. The resource locator of the embeddings are included with these requests, so the model no longer has to generate them every time we click, saving the world tons of money in compute and reducing CO2 emissions.

I'll try my hand at writing this, but if anybody sees any glaring issues with what I described above, I'd appreciate a heads-up.

@kunaltyagi
Copy link
Contributor Author

In step 2, I'd provide a button to try again (in case the image changed due to some db operation)

bsekachev added a commit that referenced this issue May 11, 2023
<!-- Raise an issue to propose your change
(https://github.com/opencv/cvat/issues).
It helps to avoid duplication of efforts from multiple independent
contributors.
Discuss your ideas with maintainers to be sure that changes will be
approved and merged.
Read the [Contribution
guide](https://opencv.github.io/cvat/docs/contributing/). -->

<!-- Provide a general summary of your changes in the Title above -->

### Motivation and context
Resolved #5984 
Resolved #6049
Resolved #6041

- Compatible only with ``sam_vit_h_4b8939.pth`` weights. Need to
re-export ONNX mask decoder with some custom model changes (see below)
to support other weights (or just download them using links below)
- Need to redeploy the serverless function because its interface has
been changed.

Decoders for other weights:
sam_vit_l_0b3195.pth:
[Download](https://drive.google.com/file/d/1Nb5CJKQm_6s1n3xLSZYso6VNgljjfR-6/view?usp=sharing)
sam_vit_b_01ec64.pth:
[Download](https://drive.google.com/file/d/17cZAXBPaOABS170c9bcj9PdQsMziiBHw/view?usp=sharing)

Changes done in ONNX part:
```
git diff scripts/export_onnx_model.py
diff --git a/scripts/export_onnx_model.py b/scripts/export_onnx_model.py
index 8441258..18d5be7 100644
--- a/scripts/export_onnx_model.py
+++ b/scripts/export_onnx_model.py
@@ -138,7 +138,7 @@ def run_export(

     _ = onnx_model(**dummy_inputs)

-    output_names = ["masks", "iou_predictions", "low_res_masks"]
+    output_names = ["masks", "iou_predictions", "low_res_masks", "xtl", "ytl", "xbr", "ybr"]

     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
bsekachev@DESKTOP-OTBLK26:~/sam$ git diff segment_anything/utils/onnx.py
diff --git a/segment_anything/utils/onnx.py b/segment_anything/utils/onnx.py
index 3196bdf..85729c1 100644
--- a/segment_anything/utils/onnx.py
+++ b/segment_anything/utils/onnx.py
@@ -87,7 +87,15 @@ class SamOnnxModel(nn.Module):
         orig_im_size = orig_im_size.to(torch.int64)
         h, w = orig_im_size[0], orig_im_size[1]
         masks = F.interpolate(masks, size=(h, w), mode="bilinear", align_corners=False)
-        return masks
+        masks = torch.gt(masks, 0).to(torch.uint8)
+        nonzero = torch.nonzero(masks)
+        xindices = nonzero[:, 3:4]
+        yindices = nonzero[:, 2:3]
+        ytl = torch.min(yindices).to(torch.int64)
+        ybr = torch.max(yindices).to(torch.int64)
+        xtl = torch.min(xindices).to(torch.int64)
+        xbr = torch.max(xindices).to(torch.int64)
+        return masks[:, :, ytl:ybr + 1, xtl:xbr + 1], xtl, ytl, xbr, ybr

     def select_masks(
         self, masks: torch.Tensor, iou_preds: torch.Tensor, num_points: int
@@ -132,7 +140,7 @@ class SamOnnxModel(nn.Module):
         if self.return_single_mask:
             masks, scores = self.select_masks(masks, scores, point_coords.shape[1])

-        upscaled_masks = self.mask_postprocessing(masks, orig_im_size)
+        upscaled_masks, xtl, ytl, xbr, ybr = self.mask_postprocessing(masks, orig_im_size)

         if self.return_extra_metrics:
             stability_scores = calculate_stability_score(
@@ -141,4 +149,4 @@ class SamOnnxModel(nn.Module):
             areas = (upscaled_masks > self.model.mask_threshold).sum(-1).sum(-1)
             return upscaled_masks, scores, stability_scores, areas, masks

-        return upscaled_masks, scores, masks
+        return upscaled_masks, scores, masks, xtl, ytl, xbr, ybr
```

### How has this been tested?
<!-- Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran to
see how your change affects other areas of the code, etc. -->

### Checklist
<!-- Go over all the following points, and put an `x` in all the boxes
that apply.
If an item isn't applicable for some reason, then ~~explicitly
strikethrough~~ the whole
line. If you don't do that, GitHub will show incorrect progress for the
pull request.
If you're unsure about any of these, don't hesitate to ask. We're here
to help! -->
- [x] I submit my changes into the `develop` branch
- [x] I have added a description of my changes into the
[CHANGELOG](https://github.com/opencv/cvat/blob/develop/CHANGELOG.md)
file
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues (see [GitHub docs](

https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [x] I have increased versions of npm packages if it is necessary

([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning),

[cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning),

[cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning)
and

[cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](
https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the
project.
  Feel free to contact the maintainers if that's a concern.
mikhail-treskin pushed a commit to retailnext/cvat that referenced this issue Jul 1, 2023
<!-- Raise an issue to propose your change
(https://github.com/opencv/cvat/issues).
It helps to avoid duplication of efforts from multiple independent
contributors.
Discuss your ideas with maintainers to be sure that changes will be
approved and merged.
Read the [Contribution
guide](https://opencv.github.io/cvat/docs/contributing/). -->

<!-- Provide a general summary of your changes in the Title above -->

### Motivation and context
Resolved cvat-ai#5984 
Resolved cvat-ai#6049
Resolved cvat-ai#6041

- Compatible only with ``sam_vit_h_4b8939.pth`` weights. Need to
re-export ONNX mask decoder with some custom model changes (see below)
to support other weights (or just download them using links below)
- Need to redeploy the serverless function because its interface has
been changed.

Decoders for other weights:
sam_vit_l_0b3195.pth:
[Download](https://drive.google.com/file/d/1Nb5CJKQm_6s1n3xLSZYso6VNgljjfR-6/view?usp=sharing)
sam_vit_b_01ec64.pth:
[Download](https://drive.google.com/file/d/17cZAXBPaOABS170c9bcj9PdQsMziiBHw/view?usp=sharing)

Changes done in ONNX part:
```
git diff scripts/export_onnx_model.py
diff --git a/scripts/export_onnx_model.py b/scripts/export_onnx_model.py
index 8441258..18d5be7 100644
--- a/scripts/export_onnx_model.py
+++ b/scripts/export_onnx_model.py
@@ -138,7 +138,7 @@ def run_export(

     _ = onnx_model(**dummy_inputs)

-    output_names = ["masks", "iou_predictions", "low_res_masks"]
+    output_names = ["masks", "iou_predictions", "low_res_masks", "xtl", "ytl", "xbr", "ybr"]

     with warnings.catch_warnings():
         warnings.filterwarnings("ignore", category=torch.jit.TracerWarning)
bsekachev@DESKTOP-OTBLK26:~/sam$ git diff segment_anything/utils/onnx.py
diff --git a/segment_anything/utils/onnx.py b/segment_anything/utils/onnx.py
index 3196bdf..85729c1 100644
--- a/segment_anything/utils/onnx.py
+++ b/segment_anything/utils/onnx.py
@@ -87,7 +87,15 @@ class SamOnnxModel(nn.Module):
         orig_im_size = orig_im_size.to(torch.int64)
         h, w = orig_im_size[0], orig_im_size[1]
         masks = F.interpolate(masks, size=(h, w), mode="bilinear", align_corners=False)
-        return masks
+        masks = torch.gt(masks, 0).to(torch.uint8)
+        nonzero = torch.nonzero(masks)
+        xindices = nonzero[:, 3:4]
+        yindices = nonzero[:, 2:3]
+        ytl = torch.min(yindices).to(torch.int64)
+        ybr = torch.max(yindices).to(torch.int64)
+        xtl = torch.min(xindices).to(torch.int64)
+        xbr = torch.max(xindices).to(torch.int64)
+        return masks[:, :, ytl:ybr + 1, xtl:xbr + 1], xtl, ytl, xbr, ybr

     def select_masks(
         self, masks: torch.Tensor, iou_preds: torch.Tensor, num_points: int
@@ -132,7 +140,7 @@ class SamOnnxModel(nn.Module):
         if self.return_single_mask:
             masks, scores = self.select_masks(masks, scores, point_coords.shape[1])

-        upscaled_masks = self.mask_postprocessing(masks, orig_im_size)
+        upscaled_masks, xtl, ytl, xbr, ybr = self.mask_postprocessing(masks, orig_im_size)

         if self.return_extra_metrics:
             stability_scores = calculate_stability_score(
@@ -141,4 +149,4 @@ class SamOnnxModel(nn.Module):
             areas = (upscaled_masks > self.model.mask_threshold).sum(-1).sum(-1)
             return upscaled_masks, scores, stability_scores, areas, masks

-        return upscaled_masks, scores, masks
+        return upscaled_masks, scores, masks, xtl, ytl, xbr, ybr
```

### How has this been tested?
<!-- Please describe in detail how you tested your changes.
Include details of your testing environment, and the tests you ran to
see how your change affects other areas of the code, etc. -->

### Checklist
<!-- Go over all the following points, and put an `x` in all the boxes
that apply.
If an item isn't applicable for some reason, then ~~explicitly
strikethrough~~ the whole
line. If you don't do that, GitHub will show incorrect progress for the
pull request.
If you're unsure about any of these, don't hesitate to ask. We're here
to help! -->
- [x] I submit my changes into the `develop` branch
- [x] I have added a description of my changes into the
[CHANGELOG](https://github.com/opencv/cvat/blob/develop/CHANGELOG.md)
file
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [x] I have linked related issues (see [GitHub docs](

https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [x] I have increased versions of npm packages if it is necessary

([cvat-canvas](https://github.com/opencv/cvat/tree/develop/cvat-canvas#versioning),

[cvat-core](https://github.com/opencv/cvat/tree/develop/cvat-core#versioning),

[cvat-data](https://github.com/opencv/cvat/tree/develop/cvat-data#versioning)
and

[cvat-ui](https://github.com/opencv/cvat/tree/develop/cvat-ui#versioning))

### License

- [x] I submit _my code changes_ under the same [MIT License](
https://github.com/opencv/cvat/blob/develop/LICENSE) that covers the
project.
  Feel free to contact the maintainers if that's a concern.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants