
Subgraph using Aggregations failing randomly #5530

Closed
1 of 3 tasks
itsjerryokolo opened this issue Jul 11, 2024 · 11 comments · Fixed by #5675
Assignees
Labels
area/aggregations bug Something isn't working

Comments

@itsjerryokolo
Contributor

Bug report

This subgraph fails randomly with the error below.

Failed to transact block operations: store error: duplicate key value violates unique constraint "transfer_stats_hour_id_key"

The same block, 6284464, at which the deployment (QmQZadt6wd8wo8U6opsVnjpAjx2Ysh3iLiPXuydJ65eqT3) failed is processed successfully in a new deployment (QmYktZYrbaENqm9nS8kvBxnzsRUyh4ueuQ679mYwMdNLpz).

Relevant log output

No response

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

  • Tick this box if this bug is caused by a regression found in the latest release.
  • Tick this box if this bug is specific to the hosted service.
  • I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

@itsjerryokolo itsjerryokolo added the bug Something isn't working label Jul 11, 2024
@george-openformat

george-openformat commented Jul 18, 2024

Running into the same issue:
subgraph | playground

The first deployment on arbitrum-sepolia failed at block 64040656; after deploying a new version, it failed at block 64603832.

@silvercondor

Getting the same error on multiple subgraphs.

Note that it can also happen for daily aggregations.

@tinypell3ts

I'm also getting the same error in my subgraph.

ERRO Subgraph failed with non-deterministic error: Failed to transact block operations: store error: duplicate key value violates unique constraint "transaction_stat_hour_id_key", retry_delay_s: 3732, attempt: 40, sgd: 5389, subgraph_id: QmZFm5y76cnjiF5TBfCCufVyR9sQAaWV6Ygm9Fg3cTUSed, component: SubgraphInstanceManager

@tsudmi

tsudmi commented Oct 1, 2024

Running into the same issue. Any updates on this?

@gperezalba

Same issue here. It would also be nice to include a working example of the mapping in https://github.com/graphprotocol/graph-node/blob/master/docs/aggregations.md

@lutter
Collaborator

lutter commented Oct 8, 2024

Digging into one such example on Avalanche, this seems to be caused by blocks having the same timestamps. For example, on Avalanche, blocks 51482738 and 51482739 both have the timestamp 0x67040580, which corresponds to 2024-10-07 16:00:00+00. That would trigger a rollup for the hour starting at 15:00.
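For anyone following along, the timestamp and bucket arithmetic can be checked with a quick sketch (this is purely illustrative, not graph-node code):

```python
from datetime import datetime, timezone

# Decode the raw block timestamp reported for Avalanche blocks
# 51482738 and 51482739.
ts = 0x67040580          # 1728316800 seconds since the Unix epoch
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt.isoformat())    # 2024-10-07T16:00:00+00:00

# Hourly buckets start on the hour; a block stamped exactly 16:00:00
# closes the bucket that began at 15:00, triggering that rollup.
bucket_start = (ts // 3600) * 3600 - 3600
print(datetime.fromtimestamp(bucket_start, tz=timezone.utc).isoformat())
# 2024-10-07T15:00:00+00:00
```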

I have to look through the code to see if that truly is the issue, but non-monotonic block times are a bit of a bummer.

@lutter
Collaborator

lutter commented Oct 8, 2024

The timestamps were a red herring. What's happening is this: when a subgraph starts, we assume that the last rollup happened at the block time of the block where we last had an actual write/entity change, which we look up in the PoI table. But for subgraphs with big gaps between actual writes, that time doesn't change even though we do rollups as new blocks come in. When a subgraph in that state is restarted, we redo those rollups, which causes the unique constraint violation.

Until we have a fix, it might help to rewind the subgraph to before the last actual write. Unfortunately, that data is not available in the API anywhere, and you need to run a query like `select lower(block_range) from sgdNNN.poi2$ order by vid desc limit 1` to find that block. You'll then want to rewind to a block before that one. But if the subgraph doesn't have any real writes after that block, the constraint violation will happen again when the subgraph is restarted.
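The failure mode described in the comment above can be sketched as a toy model (purely illustrative; the names and data structures do not match graph-node's internals):

```python
# Toy model of the restart bug: the last-rollup time is recovered from
# the PoI table (last actual write), not from the rollups themselves.
HOUR = 3600

class UniqueViolation(Exception):
    pass

agg_table = {}       # hour bucket -> rollup row; the bucket is the unique key
last_poi_write = 0   # block time of the last entity change

def do_rollup(bucket):
    if bucket in agg_table:
        raise UniqueViolation(f"duplicate key for bucket {bucket}")
    agg_table[bucket] = "rollup"

# Hour 0: the subgraph has a real write, recorded in the PoI table.
do_rollup(0)

# Hours 1-3: blocks come in but no entities change, so the PoI table is
# not updated -- yet rollups still happen as each interval passes.
for bucket in (HOUR, 2 * HOUR, 3 * HOUR):
    do_rollup(bucket)

# Restart: graph-node assumes the last rollup happened at last_poi_write
# and redoes every interval since then, hitting the unique key.
try:
    for bucket in range(last_poi_write + HOUR, 4 * HOUR, HOUR):
        do_rollup(bucket)
except UniqueViolation as e:
    print(e)   # duplicate key for bucket 3600
```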

lutter added a commit that referenced this issue Oct 18, 2024
When graph-node is restarted, we need to determine for subgraphs with
aggregations when the last rollup was triggered to ensure aggregations get
filled without gaps or duplication.

The code used the `block_time` column in the PoI table for that but that is
not correct as the PoI table only records blocks and times for which the
subgraph actually has writes. When the subgraph scans through a largish
number of blocks without changes, we only update the head pointer but also
do rollups as aggregation intervals pass. Because of that, we might perform
a rollup without a corresponding entry in the PoI table.

With this change, we actually find the maximum timestamp from all
aggregation tables to tell us when the last rollup was triggered as that
data reflects when rollups happened accurately.

For safety, this new behavior can be turned off by setting
`GRAPH_STORE_LAST_ROLLUP_FROM_POI=true` to return to the old buggy behavior
in case the new behavior causes some other unexpected problems.

Fixes #5530
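The fix described in the commit message can be expressed in the same toy terms (again just a sketch under assumed names; the real change lives in graph-node's store code):

```python
# Toy version of the fix: on restart, recover the last rollup time from
# the aggregation tables themselves instead of the PoI table.
HOUR = 3600

agg_table = {0: "rollup", HOUR: "rollup", 2 * HOUR: "rollup", 3 * HOUR: "rollup"}
last_poi_write = 0   # stale: no entity changes since hour 0

# Old (buggy) starting point: the PoI block time.
buggy_resume = last_poi_write + HOUR   # 3600, a bucket already rolled up

# New behavior: the maximum timestamp across the aggregation tables
# reflects exactly which rollups have already happened.
last_rollup = max(agg_table)           # 10800
fixed_resume = last_rollup + HOUR      # 14400, the next fresh bucket

print(buggy_resume, fixed_resume)      # 3600 14400
```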
lutter added a commit that referenced this issue Oct 25, 2024
(same commit message as above)
@lutter lutter closed this as completed in 22bca4e Oct 25, 2024
@antho31

antho31 commented Jan 11, 2025

Hello,
I'm encountering the same error while indexing contracts on the Abstract Testnet with the subgraph (Deployment ID: Qmd6J1vaLnKCtnfKEBpmzEUv7Ng1y73EDQiZTHyopj486H, Slug: noodles-subgraph-testnet).
Is there a workaround to resolve this issue?

@lutter
Collaborator

lutter commented Jan 14, 2025

> Hello, I'm encountering the same error while indexing contracts on the Abstract Testnet with the subgraph (Deployment ID: Qmd6J1vaLnKCtnfKEBpmzEUv7Ng1y73EDQiZTHyopj486H, Slug: noodles-subgraph-testnet). Is there a workaround to resolve this issue?

Sorry that this is happening (again). I am looking into what is causing this, but don't have an ETA for a fix yet.

@antho31

antho31 commented Jan 14, 2025

> Hello, I'm encountering the same error while indexing contracts on the Abstract Testnet with the subgraph (Deployment ID: Qmd6J1vaLnKCtnfKEBpmzEUv7Ng1y73EDQiZTHyopj486H, Slug: noodles-subgraph-testnet). Is there a workaround to resolve this issue?

> Sorry that this is happening (again). I am looking into what is causing this, but don't have an ETA for a fix yet.

Thanks for your response! Let me know when you have updates on this :)

@george-openformat

Can also confirm the error is back. We deployed QmNfan446bptuSx2JJ19szpD7koUQgtkfmdP8GNLUR31nP to arbitrum-sepolia. It ran for a month or so before this error cropped up again.

Error Logs:

error
2025-01-14 05:50:11 a.m.
Subgraph failed with non-deterministic error: Failed to transact block operations: store error: duplicate key value violates unique constraint "user_reward_app_stat_hour_id_key", retry_delay_s: 248, attempt: 1
error

2025-01-14 05:50:11 a.m.
Subgraph writer failed, error: store error: duplicate key value violates unique constraint "user_reward_app_stat_hour_id_key"
error

2025-01-14 05:48:44 a.m.
Unable to connect to endpoint: status: Unknown, message: "transport error", details: [], metadata: MetadataMap { headers: {} }: transport error: operation was canceled: connection closed: connection closed, provider: arb-sep-firehose-pinax, deployment: QmTRh5Sjj9E65C7cWdtCTcijPmnmm1Ne8F4eVQPVbsz4pB, component: FirehoseBlockStream

Good luck @lutter

9 participants