VFIO tests are not stable #261

Closed
glazychev-art opened this issue Mar 29, 2022 · 5 comments

@glazychev-art (Collaborator)

Description

Caught with Calico-CNI

Logs

logs-1069.zip

Build

https://github.com/networkservicemesh/integration-k8s-packet/actions/runs/2054555544

@glazychev-art (Collaborator, Author)

It seems that this is a packet-specific problem - I've run this test on all available servers and it works fine.
I think we need to wait for new incidents.

@glazychev-art (Collaborator, Author)

New incident: https://github.com/networkservicemesh/integration-k8s-packet/actions/runs/2081993976

See TestVfio2Noop.TestMultiForwarder/TestVfio2Noop

@glazychev-art (Collaborator, Author) commented Apr 12, 2022

Incident without Calico.
See:
TestKernel2Kernel_Vfio2Noop:1 | Test execution failed TestKernel2Kernel_Vfio2Noop
TestKernel2Vxlan2Kernel_Vfio2Noop:1 | Test execution failed TestKernel2Vxlan2Kernel_Vfio2Noop
https://github.com/networkservicemesh/integration-k8s-packet/actions/runs/2154463143/attempts/1

@glazychev-art glazychev-art changed the title TestKernel2Kernel_Vfio2Noop.TestMultiForwarder is not stable VFIO tests are not stable Apr 14, 2022
@denis-tingaikin denis-tingaikin added this to the v1.4.0 milestone May 19, 2022
@glazychev-art (Collaborator, Author)

Results so far:

  1. There was a suggestion that previous tests affect the subsequent ones. This has not been confirmed yet, because from time to time the first VFIO test crashes even on a fresh setup.
  2. The DPDK application hangs in this loop:
    https://github.com/glazychev-art/dpdk-pingpong/blob/d1e8000b72fc4d3170f8726b375836b66f09a270/main.c#L276-L291
    In other words, we get no response from the server (nse-vfio), but the server does not receive the request either.
  3. Refreshes don't fix the problem. If test A fails, it keeps failing even after refreshes (checked manually), but the next test B might start working. This may be because these chain elements run only on the first request:
    https://github.com/networkservicemesh/sdk-sriov/blob/main/pkg/networkservice/chains/forwarder/server.go#L123-L124
    due to resetmechanism - https://github.com/networkservicemesh/sdk-sriov/blob/main/pkg/networkservice/common/resetmechanism/server.go#L45-L65 (see the sketch after this list).
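
A minimal Go sketch of point 3 (not the actual sdk-sriov code; all types and names below are hypothetical simplifications): a resetmechanism-style wrapper remembers which connections already have a mechanism and bypasses the wrapped mechanism-setting element on refreshes, so if the first request leaves the VFIO device in a bad state, a refresh cannot repair it:

```go
package main

import "fmt"

// Request is a simplified stand-in for a networkservice request.
type Request struct {
	ConnID    string
	Mechanism string // empty until a mechanism has been set up
}

// Server is a simplified stand-in for a chain element.
type Server interface {
	Handle(req *Request) error
}

// vfioElement stands in for the mechanism-setting elements referenced in
// forwarder/server.go: it configures the device only when it is invoked.
type vfioElement struct{}

func (e *vfioElement) Handle(req *Request) error {
	fmt.Printf("configuring VFIO device for connection %q\n", req.ConnID)
	req.Mechanism = "VFIO"
	return nil
}

// resetMechanismServer mimics the gating behavior: once a connection has an
// established mechanism, the wrapped element is bypassed on refreshes.
type resetMechanismServer struct {
	wrapped     Server
	established map[string]bool
}

func (s *resetMechanismServer) Handle(req *Request) error {
	if s.established[req.ConnID] {
		// Refresh path: the mechanism-setting element never runs again,
		// so a broken device configuration cannot be repaired here.
		fmt.Printf("refresh for %q: skipping mechanism element\n", req.ConnID)
		return nil
	}
	if err := s.wrapped.Handle(req); err != nil {
		return err
	}
	s.established[req.ConnID] = true
	return nil
}

func main() {
	srv := &resetMechanismServer{
		wrapped:     &vfioElement{},
		established: map[string]bool{},
	}
	req := &Request{ConnID: "vfio-1"}
	_ = srv.Handle(req) // first request: device gets configured
	_ = srv.Handle(req) // refresh: element is skipped
}
```

Under that assumption, test A's failed connection keeps taking the skip path on every refresh, while a new connection (test B) goes through the full first-request path and may succeed, which would match the observed behavior.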

Next steps:

  1. Check the nse-vfio configuration
  2. Check that the client and NSE configurations are consistent
