Issue with graphman rewind while subgraph has failed with non-deterministic error #4459

Closed
brianluong opened this issue Mar 15, 2023 · 3 comments


brianluong commented Mar 15, 2023

Do you want to request a feature or report a bug?
Bug

What is the current behavior?

  1. When a subgraph has failed with a non-deterministic error, the runner thread goes to sleep.

  2. Run graphman rewind -s 30 $CURRENT_BLOCK $IPFS to restart the subgraph processing thread. I also tried a larger sleep (480s), longer than the time the runner thread was sleeping for. The log was: Mar 15 22:45:25.734 ERRO Subgraph failed with non-deterministic error: Failed to transact block operations: subgraph writer poisoned by previous error, retry_delay_s: 240

  3. The subgraph starts up and indexes blocks happily. I think this starts a new subgraph processing thread.

  4. When the runner thread from (1) wakes up, it sees that the writer is still poisoned and fails again with a non-deterministic error. (Shouldn't this already be fixed?)

  5. The subgraph stops indexing. I think the thread spawned by (2) gets killed as well.
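
To make the suspected interleaving concrete, here is a minimal toy model in Rust. It is only a sketch of the hypothesis described in the steps above, not graph-node code: the `Writer` and `Deployment` types, the shared `failed` flag, and the function name are all hypothetical, and a channel stands in for the `retry_delay_s` sleep so the run is deterministic.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for a subgraph writer; it only tracks poisoning.
struct Writer {
    poisoned: bool,
}

/// Hypothetical per-deployment state shared by every runner of that deployment.
struct Deployment {
    failed: AtomicBool,
}

/// Replays steps 1-5 and returns (blocks indexed by the new runner,
/// whether the deployment ended up marked as failed).
pub fn simulate_rewind_race() -> (u64, bool) {
    let deployment = Arc::new(Deployment {
        failed: AtomicBool::new(false),
    });

    // Step 1: the old runner holds a poisoned writer and goes to sleep.
    // Receiving on this channel stands in for the retry delay elapsing.
    let (wake_tx, wake_rx) = mpsc::channel::<()>();
    let old_runner = {
        let deployment = Arc::clone(&deployment);
        let stale_writer = Writer { poisoned: true };
        thread::spawn(move || {
            wake_rx.recv().expect("wake-up signal");
            // Step 4: on waking, the stale writer is still poisoned, so the
            // old runner marks the whole deployment as failed.
            if stale_writer.poisoned {
                deployment.failed.store(true, Ordering::SeqCst);
            }
        })
    };

    // Steps 2-3: the rewind starts a second runner that indexes normally.
    let new_runner = {
        let deployment = Arc::clone(&deployment);
        thread::spawn(move || {
            let mut blocks: u64 = 0;
            // Step 5: the new runner stops once the deployment is failed.
            while !deployment.failed.load(Ordering::SeqCst) {
                blocks += 1;
                if blocks == 3 {
                    // The old runner's retry delay "elapses" here.
                    let _ = wake_tx.send(());
                }
                thread::yield_now();
            }
            blocks
        })
    };

    old_runner.join().expect("old runner panicked");
    let blocks = new_runner.join().expect("new runner panicked");
    (blocks, deployment.failed.load(Ordering::SeqCst))
}
```

In this model the new runner always makes some progress and is then stopped by the stale runner, which matches what I observed. Whether graph-node actually shares state between the two runners this way is an assumption.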

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.

What is the expected behavior?
After graphman rewind, once the subgraph has restarted and is indexing happily, it shouldn't fail again.

  • Should two threads even be allowed to process a single subgraph? When a new subgraph thread starts up, should we kill the one that's sleeping?
  • Why isn't the poisoned writer clearing its state? I think it should, as indicated by this fix.
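
One possible way to handle the first question is a generation counter per deployment, so that a runner waking from its retry sleep can tell it has been superseded and exit quietly instead of failing the subgraph. The sketch below is a hedged illustration of that idea only. `RunnerRegistry`, `RunnerHandle`, `WakeAction`, and `on_wake` are hypothetical names, not graph-node APIs, and `"QmExample"` in the usage note is a placeholder deployment id.

```rust
use std::collections::HashMap;

/// Hypothetical token a runner receives when it is started.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RunnerHandle {
    deployment: String,
    generation: u64,
}

/// Hypothetical registry that remembers the newest runner per deployment.
#[derive(Default)]
pub struct RunnerRegistry {
    current: HashMap<String, u64>,
}

impl RunnerRegistry {
    /// Starting a runner bumps the generation, which implicitly
    /// invalidates every handle issued earlier for that deployment.
    pub fn start(&mut self, deployment: &str) -> RunnerHandle {
        let generation = self.current.entry(deployment.to_string()).or_insert(0);
        *generation += 1;
        RunnerHandle {
            deployment: deployment.to_string(),
            generation: *generation,
        }
    }

    /// True only for the most recently started runner of the deployment.
    pub fn is_current(&self, handle: &RunnerHandle) -> bool {
        self.current.get(&handle.deployment) == Some(&handle.generation)
    }
}

/// What a runner should do when it wakes from its retry sleep.
#[derive(Debug, PartialEq, Eq)]
pub enum WakeAction {
    /// A newer runner exists: exit without touching the deployment status.
    ExitSuperseded,
    /// Still the current runner and the writer is poisoned: fail and retry.
    FailAndRetry,
    /// Still the current runner and the writer is healthy: keep indexing.
    Resume,
}

pub fn on_wake(
    registry: &RunnerRegistry,
    handle: &RunnerHandle,
    writer_poisoned: bool,
) -> WakeAction {
    if !registry.is_current(handle) {
        WakeAction::ExitSuperseded
    } else if writer_poisoned {
        WakeAction::FailAndRetry
    } else {
        WakeAction::Resume
    }
}
```

With this scheme, calling `start("QmExample")` twice leaves the first handle stale, so `on_wake` for the sleeping runner yields `ExitSuperseded` even though its writer is still poisoned, and the runner started by the rewind keeps indexing.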
@github-actions

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.

github-actions bot added the Stale label Sep 13, 2023

azf20 (Contributor) commented Sep 15, 2023

Hey @brianluong, are you still seeing this issue when rewinding?

brianluong (Author) commented Sep 22, 2023

@azf20 To be honest, I haven't tried it recently. Our workaround was restarting the entire indexer. I'm down to close this issue.

github-actions bot removed the Stale label Sep 26, 2023
fordN closed this as completed Apr 9, 2024