Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor criu plugin baseline v2 #2298

Merged

Conversation

rerrabolu
Copy link
Contributor

No description provided.

…vices

Add a new compilation unit to host symbols and methods that will be
needed to C&R DRM devices. Refactor code that indicates support for
C&R and checkpoints KFD and DRM devices

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Refactor code used to Checkpoint DRM devices. Code is moved
into amdgpu_plugin_drm.c file which hosts various methods to
checkpoint and restore a workload.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
@avagin avagin requested a review from dayatsin-amd November 12, 2023 03:09
@rerrabolu
Copy link
Contributor Author

My changelist fails for two scenarios:

  1. Cirrus CI / Vagrant Fedora Rawhide based test
  2. Cirrus CI / Vagrant Fedora based test (no VDSO)

Is it possible for me to re-run them as the "details" does not show me if my change is the cause. Furthermore, I see other changelists that are having failures in the same validating element.

@avagin avagin requested a review from fdavid-amd November 29, 2023 23:39
@avagin
Copy link
Member

avagin commented Nov 29, 2023

@rerrabolu can someone from your team review this change?

@dayatsin-amd
Copy link
Contributor

@rerrabolu can someone from your team review this change?

Sorry, been busy with other things. I will take a look.

Copy link

A friendly reminder that this PR had no activity for 30 days.

@avagin avagin added no-auto-close Don't auto-close as a stale issue and removed stale-pr labels Jan 17, 2024
@avagin
Copy link
Member

avagin commented Jan 17, 2024

@dayatsin-amd friendly ping

@rst0git
Copy link
Member

rst0git commented Jan 17, 2024

@rerrabolu Would it be possible to add tests that can be used to verify the functionality on a system with GPU?

Copy link
Contributor

@dayatsin-amd dayatsin-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
One nit pick, the commit message title is exceeding 50 chars. See if you can make it shorter.

@avagin avagin merged commit 9d9ae29 into checkpoint-restore:criu-dev Feb 5, 2024
7 of 9 checks passed
@rerrabolu
Copy link
Contributor Author

rerrabolu commented Feb 22, 2024

  • Thanks for taking time to merge my patch.
  • I see some lint errors, will try to provide a patch to address it next couple of weeks
  • I will find out if we could add a small test case to validate CRIU on a system with AMD GPUs. This will take sometime and will require some coordination

@rst0git
Copy link
Member

rst0git commented Feb 22, 2024

@rerrabolu Some of the lint errors have been fixed in #2345.

I have been using the example applications from the following repository for testing locally with MI100 GPU:
https://github.com/ROCm/HIP-Examples/tree/master/HIP-Examples-Applications

We could add a few simple test cases, similar to these examples, to the ZDTM test suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-auto-close Don't auto-close as a stale issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants