Issue with graphman rewind while subgraph has failed with non-deterministic error #4459

brianluong · 2023-03-15T20:03:45Z

Do you want to request a feature or report a bug?
Bug

What is the current behavior?

1. When a subgraph has failed with a non-deterministic error, the runner thread goes to sleep.
2. Run graphman rewind -s 30 $CURRENT_BLOCK $IPFS to restart the subgraph processing thread. I also tried a larger sleep (480s), longer than the duration the thread was sleeping for. The log was: Mar 15 22:45:25.734 ERRO Subgraph failed with non-deterministic error: Failed to transact block operations: subgraph writer poisoned by previous error, retry_delay_s: 240
3. The subgraph starts up and is indexing blocks happily. I think this starts a new subgraph processing thread.
4. When the runner thread from (1) wakes up, it sees that the writer is still poisoned and fails again with a non-deterministic error. (Should this be fixed already?)
5. The subgraph stops indexing. I think the thread spawned by (2) got killed as well.

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.

What is the expected behavior?
After the graphman rewind, once the subgraph has restarted and is indexing happily, it shouldn't fail again.

Should having two threads processing a single subgraph even be allowed? When a new subgraph thread starts up, should we be killing the one that's sleeping?
Why isn't the poisoned writer clearing its state? I think it should be, as indicated by this fix.
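
To illustrate the first question, here is a minimal Python sketch of "a new runner cancels the sleeping one". It is a toy model and not graph-node code: RunnerRegistry, retry_after_delay, and the deployment id "QmExample" are hypothetical names made up for this example, and the real runner lifecycle may work quite differently.

```python
import threading


class RunnerRegistry:
    """Toy registry allowing at most one active runner per deployment (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancel_flags = {}  # deployment id -> threading.Event

    def start(self, deployment):
        """Register a new runner and cancel any previous runner for the same deployment."""
        with self._lock:
            old = self._cancel_flags.get(deployment)
            if old is not None:
                old.set()  # wakes the old runner if it is sleeping and marks it cancelled
            flag = threading.Event()
            self._cancel_flags[deployment] = flag
            return flag


def retry_after_delay(cancel_flag, retry_delay_s):
    """Sleep for the retry delay; return True only if this runner should retry.

    Event.wait returns True when the flag was set, which here means a newer
    runner has replaced this one, so it should exit instead of retrying.
    """
    cancelled = cancel_flag.wait(timeout=retry_delay_s)
    return not cancelled


registry = RunnerRegistry()
old_flag = registry.start("QmExample")  # runner from step (1), about to sleep
new_flag = registry.start("QmExample")  # runner started by the rewind in step (2)

# Short delays stand in for the 240s retry delay seen in the log.
old_should_retry = retry_after_delay(old_flag, retry_delay_s=0.01)
new_should_retry = retry_after_delay(new_flag, retry_delay_s=0.01)
```

With this shape, the runner that went to sleep in (1) never touches the writer again once the rewind has started a replacement, so it could not fail a second time and take the new runner down with it.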


github-actions · 2023-09-13T00:17:14Z

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.

azf20 · 2023-09-15T14:45:02Z

hey @brianluong are you still seeing this issue when rewinding?

brianluong · 2023-09-22T13:46:46Z

@azf20 To be honest, I haven't tried it recently. Our workaround was restarting the entire indexer. I'm down to close this issue.

github-actions bot added the Stale label Sep 13, 2023

github-actions bot removed the Stale label Sep 26, 2023

fordN closed this as completed Apr 9, 2024
