fix(node): [NET-1377] Operator stops inspecting after Polygon RPC timeout #2893

Merged
merged 3 commits into from
Nov 21, 2024

@teogeb teogeb commented Nov 20, 2024

Error

If an Operator.fetchRedundancyFactor() call throws an error during the inspectRandomNode task, the task is unable to complete. The scheduler won't start subsequent inspections because the frozen task is still pending.

This kind of error is shown in logs:

INFO [2024-11-16T23:42:11.049] (inspectOverTime          ): Inspecting target {"traceId":"iSu6S5","attemptNo":1,"target":{"sponsorshipAddress":"0x335370713cea4321330cbcb5f2e2b87fe6a33e7c","operatorAddress":"0x22f694c92b74a31dc0af06cff53586ac84b2c9fc","streamPart":"streamr.eth/demos/radio#0"}}
WARN [2024-11-16T23:42:16.110] (inspectOverTime          ): Error encountered {"traceId":"iSu6S5"}
    err: {
      "type": "Error",
      "message": "Error while executing contract call \"operator.metadata\", code=TIMEOUT",
      "stack":
          Error: Error while executing contract call "operator.metadata", code=TIMEOUT
              at withErrorHandling (/home/streamr/network/packages/sdk/dist/src/contracts/contract.js:52:30)
              at async Object.fn [as metadata] (/home/streamr/network/packages/sdk/dist/src/contracts/contract.js:62:29)
              at async Operator.fetchRedundancyFactor (/home/streamr/network/packages/sdk/dist/src/contracts/Operator.js:407:34)
              at async InspectionOverTimeTask.findNodesForTargetGivenFleetState [as findNodesForTargetGivenFleetStateFn] (/home/streamr/network/packages/node/dist/src/plugins/operator/inspectionUtils.js:66:31)
              at async InspectionOverTimeTask.run (/home/streamr/network/packages/node/dist/src/plugins/operator/inspectOverTime.js:96:43)
      "reason": {
        "type": "Error",
        "message": "request timeout (code=TIMEOUT, version=6.13.1)",
        "stack":
            Error: request timeout (code=TIMEOUT, version=6.13.1)
                at makeError (/home/streamr/network/node_modules/ethers/lib.commonjs/utils/errors.js:129:21)
                at ClientRequest.<anonymous> (/home/streamr/network/node_modules/ethers/lib.commonjs/utils/geturl.js:59:50)
                at ClientRequest.emit (node:events:519:28)
                at TLSSocket.emitRequestTimeout (node:_http_client:856:9)
                at Object.onceWrapper (node:events:633:28)
                at TLSSocket.emit (node:events:531:35)
                at Socket._onTimeout (node:net:591:8)
                at listOnTimeout (node:internal/timers:581:17)
                at process.processTimers (node:internal/timers:519:7)
        "code": "TIMEOUT",
        "shortMessage": "request timeout"
      }

Fix

Added a doneGate.open() call to InspectionOverTimeTask#destroy():

  • the error handler created in start() calls the destroy() method
  • as the gate is open, the await task.waitUntilPassOrDone() at line 35 is no longer blocked
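
The mechanism can be illustrated with a minimal, self-contained sketch. This is not the actual code from the repository: the `Gate` class, `SketchInspectionTask`, `waitUntilDone()` and `runFailingInspection()` below are hypothetical, simplified stand-ins, and the success path (opening the gate when the work finishes) is an assumption made for the sake of the example. It only shows why opening the gate in destroy() releases a caller that is awaiting the task after the work has thrown.

```typescript
// Hypothetical stand-in for a gate: callers of waitUntilOpen() block until open() is called.
class Gate {
    private opened = false
    private waiters: Array<() => void> = []

    open(): void {
        this.opened = true
        for (const resolve of this.waiters) {
            resolve()
        }
        this.waiters = []
    }

    async waitUntilOpen(): Promise<void> {
        if (this.opened) {
            return
        }
        await new Promise<void>((resolve) => {
            this.waiters.push(resolve)
        })
    }
}

// Hypothetical, heavily simplified task (not the real InspectionOverTimeTask).
class SketchInspectionTask {
    private readonly doneGate = new Gate()
    private readonly work: () => Promise<void>
    destroyed = false

    constructor(work: () => Promise<void>) {
        this.work = work
    }

    start(): void {
        this.work()
            // Assumption: on success the task marks itself done by opening the gate.
            .then(() => this.doneGate.open())
            // The error handler created in start() calls destroy().
            .catch(() => this.destroy())
    }

    waitUntilDone(): Promise<void> {
        return this.doneGate.waitUntilOpen()
    }

    destroy(): void {
        this.destroyed = true
        // The fix: without this call, a waiter would stay blocked after an error.
        this.doneGate.open()
    }
}

// Simulates an inspection whose contract call times out.
async function runFailingInspection(): Promise<boolean> {
    const task = new SketchInspectionTask(async () => {
        throw new Error('request timeout')
    })
    task.start()
    await task.waitUntilDone() // resolves only because destroy() opens the gate
    return task.destroyed
}
```

If the `this.doneGate.open()` line in destroy() is removed from this sketch, the promise returned by `runFailingInspection()` never settles, which mirrors the frozen task described in the Error section.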


linear bot commented Nov 20, 2024

@github-actions github-actions bot added the node label Nov 20, 2024
@github-actions github-actions bot added the docs label Nov 20, 2024
@teogeb teogeb requested a review from harbu November 20, 2024 23:26
@teogeb teogeb merged commit 42d48ae into main Nov 21, 2024
24 checks passed
@teogeb teogeb deleted the inspectOverTime-freeze-NET-1377 branch November 21, 2024 10:45
2 participants