Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for remotely-sourced zoo models #4786

Merged
merged 5 commits into from
Sep 17, 2024
Merged

Conversation

brimoor
Copy link
Contributor

@brimoor brimoor commented Sep 10, 2024

Adds support for loading zoo models whose definitions are stored in a GitHub repository or publicly accessible URL! 🎉

Example datasets published in this format:

Notes

Requires voxel51/eta#633

Usage

fiftyone zoo models register-source https://github.com/voxel51/openai-clip
fiftyone zoo models list-sources
fiftyone zoo models list --source https://github.com/voxel51/openai-clip
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=50,
    shuffle=True,
)

model = foz.load_zoo_model(
    "voxel51/clip-vit-base32-torch",
    text_prompt="A photo of a",
    classes=["person", "dog", "cat", "bird", "car", "tree", "chair"],
)

dataset.apply_model(model, label_field="predictions")

session = fo.launch_app(dataset)
fiftyone zoo models delete voxel51/clip-vit-base32-torch
fiftyone zoo models delete-source https://github.com/voxel51/openai-clip

Creating remotely-sourced models

This is extensively documented in the PR, but the basic ingredients to defining a remotely-sourced zoo model are a manifest.json file:

{
    "name": "voxel51/openai-clip",
    "url": "https://github.com/voxel51/openai-clip",
    "models": [
        {
            "base_name": "voxel51/clip-vit-base32-torch",
            "base_filename": "CLIP-ViT-B-32.pt",
            "author": "OpenAI",
            "license": "MIT",
            "source": "https://github.com/openai/CLIP",
            "description": "CLIP text/image encoder from Learning Transferable Visual Models From Natural Language Supervision (https://arxiv.org/abs/2103.00020) trained on 400M text-image pairs",
            "tags": [
                "classification",
                "logits",
                "embeddings",
                "torch",
                "clip",
                "zero-shot"
            ],
            "size_bytes": 353976522,
            "date_added": "2022-04-12 17:49:51",
            "requirements": {
                "packages": ["torch", "torchvision"],
                "cpu": {
                    "support": true
                },
                "gpu": {
                    "support": true
                }
            }
        }
    ]
}

and an __init__.py file that contains download_model() and load_model() methods:

def download_model(model_name, model_path):
    """Downloads the model.

    Args:
        model_name: the name of the model to download, as declared by the
            ``base_name`` and optional ``version`` fields of the manifest
        model_path: the absolute filename or directory to which to download the
            model, as declared by the ``base_filename`` field of the manifest
    """

    # Determine where to download `model_name` from
    url = ...

    # Download `url` to `model_path`
    ...

def load_model(model_name, model_path, **kwargs):
    """Loads the model.

    Args:
        model_name: the name of the model to load, as declared by the
            ``base_name`` and optional ``version`` fields of the manifest
        model_path: the absolute filename or directory to which the model was
            donwloaded, as declared by the ``base_filename`` field of the
            manifest
        **kwargs: optional keyword arguments that configure how the model
            is loaded

    Returns:
        a :class:`fiftyone.core.models.Model`
    """

    # The directory containing this file
    model_dir = os.path.dirname(model_path)

    # Consturct the specified `Model` instance, generally by importing
    # other modules in `model_dir`
    model = ...

    return model

Summary by CodeRabbit

  • New Features

    • Enhanced CLI commands for managing remote model sources, including list-sources, register-source, and delete-source.
    • Added functionality to download and apply models from remote sources, with improved command structure.
    • Introduced a comprehensive overview of the Model Zoo design, detailing model integration and prediction processes.
  • Documentation

    • Updated documentation for the FiftyOne Model Zoo CLI and API to clarify new commands and functionalities.
    • Streamlined content in various sections to improve usability and focus on essential information.
  • Bug Fixes

    • Improved error handling and logging for model registration and downloading processes.

Copy link
Contributor

coderabbitai bot commented Sep 10, 2024

Walkthrough

The pull request introduces significant enhancements to the FiftyOne Model Zoo, focusing on the management of remote model sources. Key updates include new CLI commands for listing, registering, and deleting remote sources, as well as modifications to existing commands to accommodate remote models. Documentation has been updated to reflect these changes, providing clearer guidance on using the Model Zoo and its API. Additionally, the version requirement for the voxel51-eta package has been updated to ensure compatibility with newer features.

Changes

File Change Summary
docs/scripts/make_model_zoo_docs.py Updated documentation header and description for clarity; no functional changes.
docs/source/cli/index.rst Added new commands for managing remote model sources; updated existing commands to support remote models.
docs/source/user_guide/model_zoo/api.rst Removed introductory text to streamline the Model Zoo API documentation.
docs/source/user_guide/model_zoo/design.rst Introduced comprehensive overview of Model Zoo design, detailing model integration and prediction processes.
docs/source/user_guide/model_zoo/index.rst Enhanced introduction and structure to clarify Model Zoo capabilities and usage.
docs/source/user_guide/model_zoo/remote.rst New documentation on working with remotely-sourced models, including registration and management instructions.
fiftyone/core/cli.py Added commands for managing remote sources; modified existing commands for remote model flexibility.
fiftyone/zoo/models/__init__.py Enhanced functionality for managing remote models, including new source management functions and improved download logic.
setup.py Updated version requirement for voxel51-eta package to >=0.13.0,<0.14.

Possibly related PRs

  • Adding support for remotely-sourced zoo datasets #4752: The changes in this PR enhance the functionality of the FiftyOne library by adding support for remotely-sourced datasets, which aligns with the main PR's updates to documentation regarding built-in models and their management.

Suggested labels

feature

Poem

🐇 In the Model Zoo, oh what a sight,
New commands and sources, all feel just right.
With models remote, we hop and we play,
Managing them easily, hip-hip-hooray!
So gather 'round, friends, let’s dance and rejoice,
For FiftyOne's magic gives us a voice! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai generate interesting stats about this repository and render them as a table.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@brimoor brimoor marked this pull request as ready for review September 15, 2024 15:50
Base automatically changed from remote-zoo-datasets to develop September 15, 2024 15:54
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range and nitpick comments (1)
fiftyone/core/cli.py (1)

Line range hint 2293-2320: Looks good, but consider using a ternary operator

Adding the "remote" column to indicate if a model is remotely-sourced looks great and fits well with the goal of supporting remote models.

As suggested by the static analysis tool, you could simplify the code by replacing the if-else block with a ternary operator:

is_remote = "✓" if isinstance(model, fozm.RemoteZooModel) else ""

This is a non-blocking suggestion, feel free to keep the current code if you prefer it for clarity.

Tools
Ruff

2303-2306: Use ternary operator is_remote = "✓" if isinstance(model, fozm.RemoteZooModel) else "" instead of if-else-block

Replace if-else-block with is_remote = "✓" if isinstance(model, fozm.RemoteZooModel) else ""

(SIM108)

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 38527cb and 1fd3654.

Files selected for processing (9)
  • docs/scripts/make_model_zoo_docs.py (1 hunks)
  • docs/source/cli/index.rst (9 hunks)
  • docs/source/user_guide/model_zoo/api.rst (0 hunks)
  • docs/source/user_guide/model_zoo/design.rst (1 hunks)
  • docs/source/user_guide/model_zoo/index.rst (6 hunks)
  • docs/source/user_guide/model_zoo/remote.rst (1 hunks)
  • fiftyone/core/cli.py (12 hunks)
  • fiftyone/zoo/models/init.py (18 hunks)
  • setup.py (1 hunks)
Files not reviewed due to no reviewable changes (1)
  • docs/source/user_guide/model_zoo/api.rst
Files skipped from review due to trivial changes (1)
  • docs/scripts/make_model_zoo_docs.py
Additional context used
Ruff
fiftyone/zoo/models/__init__.py

275-278: Use ternary operator name = model_name if model_name is not None else name_or_url instead of if-else-block

Replace if-else-block with name = model_name if model_name is not None else name_or_url

(SIM108)


648-651: Use ternary operator repo = GitHubRepository(url_or_gh_repo) if "github" in url else None instead of if-else-block

Replace if-else-block with repo = GitHubRepository(url_or_gh_repo) if "github" in url else None

(SIM108)


754-754: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


762-762: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

fiftyone/core/cli.py

2303-2306: Use ternary operator is_remote = "✓" if isinstance(model, fozm.RemoteZooModel) else "" instead of if-else-block

Replace if-else-block with is_remote = "✓" if isinstance(model, fozm.RemoteZooModel) else ""

(SIM108)

Additional comments not posted (38)
setup.py (1)

78-78: Verify compatibility with the updated voxel51-eta version.

Updating the voxel51-eta version requirement is a good change. However, please ensure that:

  • The codebase has been thoroughly tested with version 0.13.0 of voxel51-eta.
  • Any necessary changes have been made to leverage new features or updates in voxel51-eta 0.13.0.
  • There are no breaking changes in voxel51-eta 0.13.0 that could negatively impact the functionality of this project.

You can verify the compatibility by running the following commands:

Also, review the changelog or release notes of voxel51-eta 0.13.0 to identify any breaking changes or updates that may require corresponding changes in this codebase.

Verification successful

Version update confirmed, no direct usage found

The update to voxel51-eta>=0.13.0,<0.14 in setup.py is confirmed. However, no direct imports or usage of voxel51.eta were found in the codebase. This suggests that voxel51-eta might be used as a dependency for other packages or modules.

To ensure compatibility:

  • Thoroughly test the project with the new version of voxel51-eta.
  • Review the changelog for voxel51-eta 0.13.0 to identify any breaking changes or new features that might affect your project indirectly.
  • Pay attention to any warnings or deprecation notices during testing.

The lack of direct usage reduces the risk of incompatibility, but careful testing is still recommended.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash

# Verify that the codebase imports and uses `voxel51.eta` correctly
rg --type python -A 5 $'import voxel51.eta'

# Check if there are any references to deprecated or removed features from `voxel51.eta`
rg --type python -i $'voxel51.eta'

Length of output: 140


Script:

#!/bin/bash

# Find Python files and search for voxel51.eta imports
fd -e py -x grep -Hn "import.*voxel51\.eta" {} +

# Find Python files and search for any mentions of voxel51.eta
fd -e py -x grep -Hn "voxel51\.eta" {} +

# Check for any potential usage of voxel51-eta in other file types (e.g., configuration files)
grep -r "voxel51-eta" .

Length of output: 26798

docs/source/user_guide/model_zoo/index.rst (5)

8-13: LGTM!

The introduction provides a clear and concise overview of the FiftyOne Model Zoo's capabilities, highlighting its key features and supported model sources. The content aligns well with the AI-generated summary.


29-45: LGTM!

The "Built-in models" section effectively focuses on the pre-trained models available in the Model Zoo. The custom button provides a clear call-to-action for users to explore the available models, and the note about custom models is a relevant and informative addition. The content aligns well with the AI-generated summary.


46-65: LGTM!

The addition of the "Remotely-sourced models" and "Model interface" sections enhances the documentation by introducing key aspects of the Model Zoo's functionality and design. The content aligns well with the AI-generated summary, and the custom buttons provide clear navigation to pages with more detailed information.


70-75: LGTM!

The "API reference" section effectively informs users about the dual access methods (Python library and CLI) for managing models in the Model Zoo. The custom button provides a clear link to the API reference page for more detailed information. The content aligns well with the AI-generated summary.


Line range hint 88-239: LGTM!

The "Basic recipe" section provides clear and comprehensive guidance on using the Model Zoo for prediction, embeddings, and logits. The code samples and explanations are well-structured and easy to follow. The content aligns well with the AI-generated summary, particularly regarding the improved instructions on checking for logits and storing them during predictions.

docs/source/user_guide/model_zoo/design.rst (7)

1-10: LGTM!

The introduction provides a clear and concise overview of the Model Zoo design and the Model interface. It highlights the key aspects of the Model class, such as providing a common interface for loading models and generating predictions with defined input and output data formats, which is important for consistency and ease of use.


12-25: Informative note for custom model integration!

The note provides useful information for users who want to integrate their custom models with FiftyOne. It encourages implementing the Model interface to leverage built-in methods and mentions the TorchImageModel class as a utility for deploying models in custom frameworks, which is helpful for users.


27-201: Comprehensive explanation of the prediction process!

The code segment provides a detailed explanation of the prediction process using the Model interface. The code examples for image and video models, along with the description of the predict() method's input and output formats, provide clarity on how to use the interface correctly. The mentions of the context manager convention, predict_all() method for batching, and using DataLoaders for PyTorch models are helpful for users to leverage the full capabilities of the Model interface.


203-234: Clear explanation of embeddings computation!

The code segment provides a clear explanation of how models can expose embeddings computation capability using the EmbeddingsMixin. The similarity to the predict() method in terms of usage and the convention of returning a numpy array make it easy for users to understand and use the embed() method. The note about embedding shapes and the mention of the embed_all() method for batching provide flexibility and efficiency for embeddings computation.


236-249: Clear explanation of logits computation!

The code segment provides a clear explanation of how models can expose logits computation capability using the LogitsMixin. The mention of the store_logits property and how logits are stored in the Label instances during inference provides clarity on how users can access and utilize the logits generated by the model.


251-349: Detailed explanation of using custom models!

The code segment provides a detailed explanation of how to use custom models, particularly Torch models, with FiftyOne. The example of loading a pretrained model from torchvision and the explanation of the TorchImageModelConfig class and its parameters provide a clear understanding of the process. The descriptions of model loading, preprocessing options, output processors, and the embeddings_layer parameter cover all the essential aspects of integrating custom models with FiftyOne, making it easier for users to leverage their own models.


351-360: Useful information on registering custom models!

The note provides useful information for users who want to register and reuse their custom models. The ability to register custom models under a chosen name allows for easy reference and reusability. The example code snippet clearly demonstrates how to load and use a registered custom model, making it convenient for users to leverage their custom models across different projects.

docs/source/user_guide/model_zoo/remote.rst (5)

1-194: Excellent documentation!

The introduction and usage sections provide a comprehensive guide on working with remotely-sourced zoo models. The instructions are clear, well-structured, and easy to follow. The inclusion of code snippets for both Python and CLI usage is particularly helpful.


195-211: Great overview of creating remotely-sourced models!

The section provides a clear and concise overview of the components required for creating a remote source of models. The directory structure and required files are well-explained without overwhelming the reader with too much detail.


212-319: Comprehensive description of the manifest.json file!

The section provides a detailed and informative description of the manifest.json file and its fields. The table with field descriptions is well-structured and easy to understand. The inclusion of an example manifest.json file is a great addition to help users grasp the structure and content of the file.


320-354: Clear explanation of the download_model() function!

The section provides a concise and informative description of the download_model() function, its signature, and its purpose. The explanation of when the function is called under-the-hood is particularly helpful for understanding its role in the overall process.


355-412: Excellent documentation of the load_model() function!

The section provides a comprehensive and well-structured explanation of the load_model() function, its signature, and its purpose. The mention of optional keyword arguments for customized loading is a great addition, as it highlights the flexibility of the function. The reference to an example remote model source is also a valuable resource for users who want to see a real-world implementation.

fiftyone/zoo/models/__init__.py (10)

Line range hint 35-69: LGTM!

The addition of the source parameter to filter models based on registered remote sources is a useful enhancement. The implementation looks good.


Line range hint 72-88: Looks good!

The function has been updated to correctly handle the new source parameter and filter the manifest accordingly. The implementation is straightforward and easy to follow.


Line range hint 124-159: Great work!

The updates to the download_zoo_model function provide more flexibility in specifying the model to download by accepting either a model name or a remote source URL. The added note about using a GitHub personal access token for private repositories is also helpful for users.

The implementation looks solid and handles the different scenarios correctly.


Line range hint 219-308: Excellent updates!

The enhancements to the load_zoo_model function provide more flexibility by allowing users to specify either a model name or a remote source URL. The added note about using a GitHub personal access token for private repositories is a helpful addition.

The implementation handles various scenarios, such as caching loaded models, downloading models if necessary, installing requirements, and loading remote models. The code is well-structured and easy to follow.

Great job on these improvements!


363-370: New function looks good!

The addition of the list_zoo_model_sources function is a helpful utility for users to easily retrieve a list of all registered remote model sources. The implementation is straightforward and leverages the existing _load_zoo_models_manifest function to get the necessary data.

Nice work on adding this functionality!


373-393: Great addition!

The register_zoo_model_source function is a valuable addition that allows users to easily register a new remote model source. The flexibility in accepting various formats for the url_or_gh_repo parameter, such as GitHub repo URLs, refs, or public archive URLs, makes it convenient for users to specify the remote source.

The inclusion of the overwrite flag is a good touch, giving users control over whether to overwrite existing files during the registration process.

The note about using a GitHub personal access token for private repositories is a helpful reminder for users.

Overall, this function is a great enhancement to the FiftyOne Model Zoo functionality.


396-421: Solid implementation!

The delete_zoo_model_source function is a useful addition that allows users to remove a registered remote source and its associated downloaded models. The flexibility in accepting various formats for the url_or_gh_repo parameter, such as GitHub repo URLs or refs, makes it convenient for users to specify the remote source to delete.

The function handles different scenarios appropriately:

  • If the manifest is found and its directory is not the top-level model zoo directory, it deletes the directory.
  • If the manifest is found but its directory is the top-level model zoo directory, it logs a warning message to prevent accidental deletion.
  • If the manifest is not found, it raises a ValueError with a clear message.

The implementation is clean and easy to understand.

Great job on adding this functionality to manage remote sources in the FiftyOne Model Zoo!


497-504: Nice addition!

The RemoteZooModel class is a good addition to represent remotely-sourced models in the FiftyOne Model Zoo. By inheriting from the ZooModel class, it inherits all the necessary attributes and methods.

The override of the __init__ method is clean and handles the case when the manager attribute is None by creating a default RemoteModelManagerConfig instance with the model name. This ensures that remotely-sourced models have a manager configuration even if one is not explicitly provided.

The class is well-defined and fits nicely into the existing model zoo architecture.

Great work on adding this class to support remotely-sourced models!


506-510: Looks good!

The RemoteModelManagerConfig class is a straightforward addition to represent the configuration for a remote model manager. By inheriting from the etam.ModelManagerConfig class, it inherits the necessary functionality for a model manager configuration.

The override of the __init__ method is simple and clear. It calls the superclass constructor with the provided dictionary and then extracts the model_name using the parse_string method, assigning it to the model_name attribute. This ensures that the model name is easily accessible as a separate attribute.

The class is well-defined and serves its purpose of storing the configuration for a remote model manager.

Nice work on adding this class to support the configuration of remote model managers!


512-515: Good implementation!

The RemoteModelManager class is a nice addition to handle the

docs/source/cli/index.rst (4)

2560-2560: LGTM!

The addition of the list-sources command to list remote zoo model sources is a valuable enhancement that aligns with the PR objectives.


2572-2583: LGTM!

The addition of the list-sources, register-source, and delete-source commands to manage remote zoo model sources is a valuable enhancement that aligns with the PR objectives.


Line range hint 2590-2635: LGTM!

The addition of the --source argument to the fiftyone zoo models list command for listing models from a specific remote source is a valuable enhancement that aligns with the PR objectives.


2744-2789: LGTM!

The updates to the fiftyone zoo models download command to support downloading remotely-sourced zoo models, along with the addition of the --model-name argument, are valuable enhancements that align with the PR objectives.

fiftyone/core/cli.py (6)

2202-2210: LGTM!

The new subcommands for managing remote model sources look good.


Line range hint 2218-2235: Looks good!

The updated docstring and new --source argument for filtering models by remote source look great.


2258-2263: LGTM!

Adding the source argument to handle the --source CLI option looks good.


2269-2275: Looks good!

Extracting the tags and source arguments, and splitting the tags on commas, looks correct and consistent with the CLI options.


2279-2287: LGTM!

Retrieving the filtered models using fozm._list_zoo_models() with the tags and source arguments, and passing the result to _print_zoo_models_list(), looks great.


2480-2536: LGTM!

The changes to support downloading remotely-sourced zoo models look great:

  • The updated docstring clearly explains the supported formats for specifying remote models.
  • Renaming the argument to name_or_url and updating the help text matches the new behavior.
  • The --model-name argument is a good addition to handle remote sources with multiple models.
  • Calling fozm.download_zoo_model() with the name_or_url and model_name arguments is the right approach.

Overall, this looks like a solid implementation of the remote model downloading feature.

@brimoor brimoor merged commit 846b8e2 into develop Sep 17, 2024
10 of 14 checks passed
@brimoor brimoor deleted the remote-zoo-models branch September 17, 2024 13:36
@brimoor
Copy link
Contributor Author

brimoor commented Sep 17, 2024

Reviewed live w/ ML team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant