
Check gpu task #361

Merged
merged 7 commits into master from check-gpu-task on Mar 21, 2024
Conversation

annehaley
Collaborator

This PR is intended to demonstrate that the mock deepssm task has access to a GPU when run from the "gpu" queue. These changes can be reverted when the real deepssm task is implemented, but for now this version can be used to test GPU availability.

In order to demonstrate a difference between the two queues, the "mock-deepssm" endpoint will spawn a task on each queue. To persist the feedback from each task, two TaskProgress objects are used. These TaskProgress objects are not associated with any project, so one migration needs to be applied to allow project=null on TaskProgress objects. Since we need a migration anyway, I added a field "message" to the model (similar to the "error" field but without the connotation of failure). The mock deepssm task will save a string to this field which describes the availability of a GPU device.
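As a rough sketch, the model change might look like the following (assuming Django; field names other than "project", "error", and the new "message" are assumptions for illustration, not this PR's code):

# Hypothetical sketch of TaskProgress after this PR's migration.
from django.db import models

class TaskProgress(models.Model):
    # project is now nullable so progress can be tracked for tasks
    # that are not associated with any project
    project = models.ForeignKey(
        "Project", on_delete=models.CASCADE, null=True, blank=True
    )
    # existing field used to report failures
    error = models.TextField(blank=True)
    # new field: informational text without the connotation of failure
    message = models.TextField(blank=True)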

Once merged, the expected behavior is as follows:

  • User submits a POST request to api/mock-deepssm
  • Two tasks are spawned: one deepssm task sent to the "gpu" queue and one sent to the default queue ("celery")
  • User receives a response similar to the following, containing two TaskProgress ids:
{
  "success": true,
  "progress_ids": {
    "gpu": 5,
    "default": 6
  }
}
  • The task sent to the default queue will run and save the following message to its TaskProgress object: "DeepSSM task not implemented; testing GPU availability. GPU available = False."
  • The task sent to the "gpu" queue will stay in the queue until the "manage_workers" task is spawned by Celery beat, whereupon the GPU worker on AWS will be started. The GPU worker will pick up the waiting task and save a success message to its TaskProgress object, similar to the following: "DeepSSM task not implemented; testing GPU availability. GPU available = True. Found device [device_name]." (A sketch of this GPU check appears after this list.)
  • The user can compare these results by making GET requests to api/v1/task-progress/5 and api/v1/task-progress/6.
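Below is a minimal sketch of how the endpoint and mock task could fit together, assuming Celery for task dispatch and PyTorch for the GPU check. The names mock_deepssm_task and mock_deepssm_view, and the import path for TaskProgress, are assumptions for illustration rather than the code in this PR.

# Hypothetical sketch; names and structure are assumptions, not this PR's code.
import torch
from celery import shared_task
from django.http import JsonResponse

from .models import TaskProgress  # assumed import path


@shared_task
def mock_deepssm_task(progress_id):
    """Record whether a GPU is visible to the worker that ran this task."""
    progress = TaskProgress.objects.get(id=progress_id)
    gpu_available = torch.cuda.is_available()
    message = (
        "DeepSSM task not implemented; testing GPU availability. "
        f"GPU available = {gpu_available}."
    )
    if gpu_available:
        message += f" Found device {torch.cuda.get_device_name(0)}."
    progress.message = message
    progress.save()


def mock_deepssm_view(request):
    """Spawn the mock task once on each queue and return both progress ids."""
    gpu_progress = TaskProgress.objects.create()
    default_progress = TaskProgress.objects.create()
    mock_deepssm_task.apply_async(args=[gpu_progress.id], queue="gpu")
    mock_deepssm_task.apply_async(args=[default_progress.id], queue="celery")
    return JsonResponse({
        "success": True,
        "progress_ids": {"gpu": gpu_progress.id, "default": default_progress.id},
    })

Once both tasks finish, GET requests to api/v1/task-progress/<id> for each returned id should show the contrasting messages described above.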

@annehaley annehaley marked this pull request as ready for review March 21, 2024 18:23
@annehaley annehaley merged commit a47f575 into master Mar 21, 2024
4 checks passed
@annehaley annehaley deleted the check-gpu-task branch March 21, 2024 19:04