Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connections do not close when using Boost #9327

Closed
8 of 18 tasks
rjan90 opened this issue Sep 18, 2022 · 16 comments · Fixed by #9328
Closed
8 of 18 tasks

Connections do not close when using Boost #9327

rjan90 opened this issue Sep 18, 2022 · 16 comments · Fixed by #9328
Labels

Comments

@rjan90
Copy link
Contributor

rjan90 commented Sep 18, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

# lotus-miner -v
lotus-miner version 1.17.1+mainnet+git.8db6a939c

# boostd -v
boostd version 1.4.0+git.810afec

Describe the Bug

This issue is based on the report and Slack-thread here: https://filecoinproject.slack.com/archives/CPFTWMY7N/p1663296405788379.

The Storage Provider sees a lot of connections not closing when using Boost <> Lotus, and the amount of open connections (connections not closing) causes the Storage Provider to miss windowPoSt. The issue seem similar to the one reported here and potentially fixed with #9230. However, in this case the user are importing legacy deals, which do not run the DataCid-task reported in the previous issue.

Therefore it seems like the issue of not closing connections when using Boost <> Lotus are more widespread, and potentially between all tasks a lotus-worker can run.

Example: The user reported Allmost 411 Connections in EST or Time_wait to this one worker - somthing is odd here on a lotus-worker running only PC2/C2

image (2)

The user has access to a legacy markets setup with the same amount of workers, and one can see the difference in open connections:
a miner running Monolith Markets still - 0 issues with the same amount of workers:
image (3)

VS my problem miner running boost:
image (4)

Logging Information

Ok - so i dropped boost for a few hours and watch the con's
Its not boot opening and closing a metric ton of connections to workers - its lotus.
On my problem worker i have the following to one worker:
# netstat -an | grep 6.81 |awk '/^tcp/ {++State[$NF]}END{for(key in State)print key "\t" State[key]}'
ESTABLISHED     30
TIME_WAIT       454
On a another server (not sealing at the moment, be graphs for cons have been stable for 2+ weeks even when sealing)
# netstat -an  | grep 100.110.6.72 | awk '/^tcp/ {++State[$NF]}END{for(key in State)print key "\t" State[key]}'
ESTABLISHED     18

Repo Steps

No response

@rjan90 rjan90 added kind/bug Kind: Bug area/markets Area: Markets labels Sep 18, 2022
@magik6k
Copy link
Contributor

magik6k commented Sep 19, 2022

Possibly related to #9263

@marshyonline
Copy link

I'm not overly optimistic that the PR will fix the issue as I'm seeing the huge connection requests to all workers but not all workers do AP in my setup.

@TippyFlitsUK
Copy link
Contributor

We will absolutely reopen this ticket if you continue to see the same issues @marshyonline. Please keep us posted!!

@marshyonline
Copy link

Thanks @TippyFlitsUK
Can you guys please provide instructions on how to cheery pick this commit into 1.17.1?
Ill test this asap

@rjan90
Copy link
Contributor Author

rjan90 commented Sep 20, 2022

Hey!

You can cherry-pick the commit with:

cd lotus
git cherry-pick d68bb937d81cd83b9e5b46d8870e6e88a9942cdf

Then rebuild 😄 There will also be a v1.17.2-rc1 out later today (US time) that will have this commit in it!

@marshyonline
Copy link

Ill wait for the RC i think :)
Thanks!

@marshyonline
Copy link

marshyonline commented Sep 23, 2022

Spot when i started sealing :(
image
image

Im seeing {"level":"warn","ts":"2022-09-23T01:07:18.373Z","logger":"ffiwrapper","caller":"ffiwrapper/sealer_cgo.go:196","msg":"AddPiece: cannot close pieceData reader *nullreader.NullReader because it is not an io.Closer"} in my AP logs after every sector is done.

Here is a breakdown of my workers (AP=AP, WRK1=PC1, WRK2=4x PC2 + 4x C2):
image

@marshyonline
Copy link

marshyonline commented Sep 23, 2022

Connections start Cycling out so quickly that we begin to see this:

{"level":"error","ts":"2022-09-23T11:28:19.768+1000","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n
   github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendReques
t failed: Post \"http://100.110.6.83:20001/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-09-23T11:28:19.768+1000","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n
   github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendReques
t failed: Post \"http://100.110.6.83:20001/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-09-23T11:28:19.768+1000","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n
   github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendReques
t failed: Post \"http://100.110.6.83:20001/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-09-23T11:28:19.768+1000","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n    github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendRequest failed: Post \"http://100.110.6.83:20001/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-09-23T11:28:22.281+1000","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.83:20001/rpc/v0\": context deadline exceeded"}

Its only a matter of time now before the node and miner loose touch

@rjan90 rjan90 reopened this Sep 23, 2022
@marshyonline
Copy link

image

2 DC's from the node over the first 2 Deadlines running the latest RC

@marshyonline
Copy link

I spun up another miner on identical hardware (NB: Both servers had ubunut22.04 installed a week apart), Sealing at the exact same rate

  1. Lotus + Boost(With boost Shutdown)
    image

  2. Lotus w/o Boost
    image

It seems to be lotus with markets disabled is cycling through 100x more connection's to workers than when markets are enabled thus causing the link to drop.
In this case, while trying to seal CC with a boost server - 9/10 lotus will drop connection to the daemon and will either restart chain notifs or drop head updates and C2 will no longer run until either a miner or node restart. I can replicate both results on a remote daemon and a local daemon

@marshyonline
Copy link

So after much digging - i found the reason for this high connection count.
There was ~50 sectors stuck in the sched that were in a state of constant retry (the sector files were missing) - after creating dummy files and removing them from the node the connection count is normal again. (<1k tw cons) but this problem still persists.

@marshyonline
Copy link

I have now got the same symptom on another miner + node combo

I cant find anything to indicate a network issue or the like on my end at this time

Node Logs

{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:234","msg":"closed out channel sendRequest failed: write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"error","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"error","ts":"2022-10-26T06:08:03.569+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"error","ts":"2022-10-26T06:08:03.570+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"error","ts":"2022-10-26T06:08:03.570+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"error","ts":"2022-10-26T06:08:03.570+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"error","ts":"2022-10-26T06:08:03.570+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"error","ts":"2022-10-26T06:08:03.570+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}
{"level":"error","ts":"2022-10-26T06:08:03.570+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 100.110.6.40:1234->113.29.246.212:56846: use of closed network connection"}

Miner Logs

{"level":"warn","ts":"2022-10-25T19:07:02.281Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":24}
{"level":"warn","ts":"2022-10-25T19:07:02.282Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":25}
{"level":"warn","ts":"2022-10-25T19:07:02.285Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":26}
{"level":"debug","ts":"2022-10-25T19:07:02.288Z","logger":"advmgr","caller":"sealer/sched_assigner_common.go:39","msg":"SCHED 1 queued; 130 open windows"}
{"level":"warn","ts":"2022-10-25T19:07:02.289Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":27}
{"level":"warn","ts":"2022-10-25T19:07:02.291Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":28}
{"level":"warn","ts":"2022-10-25T19:07:02.300Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":29}
{"level":"warn","ts":"2022-10-25T19:07:02.302Z","logger":"rpc","caller":"go-jsonrpc@v0.1.7/client.go:365","msg":"rpc output message buffer","n":30}
{"level":"error","ts":"2022-10-25T19:07:05.332Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store004/sealed/s-t01938357-5603","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:05.332Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store004/cache/s-t01938357-5603","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:05.332Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store001/sealed/s-t01938357-5615","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:05.332Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store001/cache/s-t01938357-5615","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:05.332Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store002/sealed/s-t01938357-5618","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:05.332Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store002/cache/s-t01938357-5618","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:07.344Z","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n    github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendRequest failed: Post \"http://100.110.6.68:3458
/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:08.320Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.70:3458/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:08.320Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.70:3456/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:08.320Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.70:3461/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:08.320Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.70:3462/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:08.739Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.64:20001/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:10.609Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.79:20008/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-10-25T19:07:12.350Z","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n    github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendRequest failed: Post \"http://100.110.6.68:3457
/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-10-25T19:07:15.334Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store004/sealed/s-t01938357-5603","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:15.334Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store004/cache/s-t01938357-5603","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:15.334Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store001/sealed/s-t01938357-5615","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:15.334Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store001/cache/s-t01938357-5615","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:15.334Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store002/sealed/s-t01938357-5618","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:15.334Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store002/cache/s-t01938357-5618","error":"file does not exist"}
{"level":"warn","ts":"2022-10-25T19:07:15.411Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.101:3455/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:15.411Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.101:3456/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:15.411Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.73:3459/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:15.411Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.73:3455/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:15.411Z","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.73:3457/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-10-25T19:07:17.421Z","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting supported worker task types:\n    github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:34\n  - RPC client error: sendRequest failed: Post \"http://10
0.110.6.70:3461/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-10-25T19:07:25.336Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store004/sealed/s-t01938357-5603","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:25.336Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store004/cache/s-t01938357-5603","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:25.336Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store001/sealed/s-t01938357-5615","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:25.336Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store001/cache/s-t01938357-5615","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:25.336Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store002/sealed/s-t01938357-5618","error":"file does not exist"}
{"level":"error","ts":"2022-10-25T19:07:25.336Z","logger":"stores","caller":"paths/localstorage_cached.go:105","msg":"error getting disk usage","path":"/mnt/store002/cache/s-t01938357-5618","error":"file does not exist"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5699 got error event sealing.SectorChainPreCommitFailed: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"wdpost","caller":"wdpost/wdpost_sched.go:164","msg":"window post scheduler notifs channel closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5707 got error event sealing.SectorChainPreCommitFailed: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-25T19:07:33.102Z","logger":"advmgr","caller":"sealer/sched_assigner_common.go:95","msg":"trySched(1) req.Sel.Ok error: getting worker paths:\n    github.com/filecoin-project/lotus/storage/sealer.(*existingSelector).Ok\n        /root/workspace/lotus/storage/sealer/selector_existing.go:42\n  - RPC client error: sendRequest failed: Post \"http://100.110.6.98:2000
1/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5617 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5614 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-25T19:07:33.102Z","logger":"events","caller":"events/events_called.go:230","msg":"chain trigger (@H 2280604, triggered @ 2280610) failed: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5677 got error event sealing.SectorChainPreCommitFailed: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-25T19:07:33.102Z","logger":"events","caller":"events/events_called.go:480","msg":"messagesForTs MessagesForBlock failed (ts.H=2280613, Bcid:bafy2bzacebdrxoluhoxgsorny6vysf6nc3mhpheyasciahltg5n642mlzvqqi, B.Mcid:bafy2bzacebdxbcmr467oflrcx3r67hs4zapv5tmbtk2lspw226pwp3pv3mg7i): handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5624 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5672 got error event sealing.SectorChainPreCommitFailed: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-25T19:07:33.102Z","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 5612 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}

@marshyonline
Copy link

And again - this time on a server that is running miner and node together(so no network to connect them together - all local cons)

Miner

{"level":"debug","ts":"2022-10-28T19:53:03.189+1100","logger":"advmgr","caller":"sealer/sched_assigner_common.go:152","msg":"SCHED Acceptable win: [[]]"}
found dealIdx 0
found dealIdx 0
found dealIdx 2
found dealIdx 2
found dealIdx 1
found dealIdx 1
{"level":"warn","ts":"2022-10-28T19:53:08.984+1100","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.55:3462/rpc/v0\": context deadline exceeded"}
found dealIdx 1
{"level":"warn","ts":"2022-10-28T19:53:09.253+1100","logger":"advmgr","caller":"sealer/stats.go:27","msg":"getting worker task types in WorkerStats","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.55:3462/rpc/v0\": context deadline exceeded"}
found dealIdx 1
{"level":"warn","ts":"2022-10-28T19:53:45.814+1100","logger":"advmgr","caller":"sealer/sched_worker.go:209","msg":"failed to check worker session","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.52:3460/rpc/v0\": context deadline exceeded"}
{"level":"error","ts":"2022-10-28T19:53:45.814+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:667","msg":"Connection timeout","remote":"127.0.0.1:1234"}
{"level":"error","ts":"2022-10-28T19:53:45.814+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:97","msg":"setting read deadlineset tcp 127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.814+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:97","msg":"setting read deadlineset tcp 127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"advmgr","caller":"sealer/stats.go:27","msg":"getting worker task types in WorkerStats","error":"RPC client error: sendRequest failed: Post \"http://100.110.6.66:2069/rpc/v0\": context deadline exceeded"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 24831 got error event sealing.SectorChainPreCommitFailed: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:498","msg":"sending ping message: write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"info","ts":"2022-10-28T19:53:45.817+1100","logger":"markets","caller":"loggers/loggers.go:21","msg":"storage provider event","name":"ProviderEventDealPublishError","proposal CID":"bafyreifulmjl5bv4qnlwwp3kdmdspuvsdkxlqsykbsw4i346u2edciwxny","state":"StorageDealFailing","message":"PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 24753 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}
{"level":"info","ts":"2022-10-28T19:53:45.817+1100","logger":"markets","caller":"loggers/loggers.go:21","msg":"storage provider event","name":"ProviderEventDealPublishError","proposal CID":"bafyreifujoegpvv6enkcicojrfaksku2dr4vzxfnmxv5anlicifefhwflq","state":"StorageDealFailing","message":"PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 24826 got error event sealing.SectorChainPreCommitFailed: handler: websocket connection closed"}
{"level":"info","ts":"2022-10-28T19:53:45.817+1100","logger":"markets","caller":"loggers/loggers.go:21","msg":"storage provider event","name":"ProviderEventDealPublishError","proposal CID":"bafyreiftnhxuxvilcooowkb6kmf3kldnct56b2stbwnj244rnsan2rkxmi","state":"StorageDealFailing","message":"PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 24747 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"sectors","caller":"pipeline/fsm.go:792","msg":"sector 24749 got error event sealing.SectorCommitFailed: failed to wait for porep inclusion: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-28T19:53:45.817+1100","logger":"wdpost","caller":"wdpost/wdpost_sched.go:164","msg":"window post scheduler notifs channel closed"}
{"level":"info","ts":"2022-10-28T19:53:45.817+1100","logger":"markets","caller":"loggers/loggers.go:21","msg":"storage provider event","name":"ProviderEventDealPublishError","proposal CID":"bafyreifu34e7txzadvghqed2rv6ehqwfkn5y6kbm6y2kz6noxqsoygiyhq","state":"StorageDealFailing","message":"PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"info","ts":"2022-10-28T19:53:45.816+1100","logger":"graphsync","caller":"impl/graphsync.go:488","msg":"Graphsync ReceiveError from 12D3KooWBG9hnG21y2okjRPHUqS5XykNeMjX6Vgc8f2LjUMCvpsJ: stream reset"}
{"level":"error","ts":"2022-10-28T19:53:45.820+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"warn","ts":"2022-10-28T19:53:45.823+1100","logger":"providerstates","caller":"providerstates/provider_states.go:599","msg":"deal bafyreifulmjl5bv4qnlwwp3kdmdspuvsdkxlqsykbsw4i346u2edciwxny failed: PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-28T19:53:45.825+1100","logger":"providerstates","caller":"providerstates/provider_states.go:599","msg":"deal bafyreifujoegpvv6enkcicojrfaksku2dr4vzxfnmxv5anlicifefhwflq failed: PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"warn","ts":"2022-10-28T19:53:45.825+1100","logger":"providerstates","caller":"providerstates/provider_states.go:599","msg":"deal bafyreifu34e7txzadvghqed2rv6ehqwfkn5y6kbm6y2kz6noxqsoygiyhq failed: PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-28T19:53:45.825+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"warn","ts":"2022-10-28T19:53:45.825+1100","logger":"providerstates","caller":"providerstates/provider_states.go:599","msg":"deal bafyreiftnhxuxvilcooowkb6kmf3kldnct56b2stbwnj244rnsan2rkxmi failed: PublishStorageDeal error: PublishStorageDeals errored: WaitForPublishDeals errored: handler: websocket connection closed"}
{"level":"error","ts":"2022-10-28T19:53:45.825+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.826+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.826+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.830+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:648","msg":"sendReqest failed (Handle me): write tcp 127.0.0.1:33768->127.0.0.1:1234: use of closed network connection"}

Node Logs

{"level":"warn","ts":"2022-10-28T19:53:44.512+1100","logger":"net/identify","caller":"identify/id.go:329","msg":"failed to identify 12D3KooWCj6urHcbBZTnpxRDwEhZvbcuhqvocoXZN69n8TjYo88u: stream reset"}
{"level":"warn","ts":"2022-10-28T19:53:45.814+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"error","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:129","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"error","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"warn","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/handler.go:329","msg":"error in RPC call to 'Filecoin.StateWaitMsg': context canceled"}
{"level":"error","ts":"2022-10-28T19:53:45.816+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"error","ts":"2022-10-28T19:53:45.817+1100","logger":"rpc","caller":"go-jsonrpc@v0.1.7/websocket.go:122","msg":"handle me:write tcp4 127.0.0.1:1234->127.0.0.1:33768: use of closed network connection"}
{"level":"info","ts":"2022-10-28T19:53:46.611+1100","logger":"net/identify","caller":"identify/id.go:364","msg":"failed negotiate identify protocol with peer","peer":"12D3KooWCBsey7qcWfdMXGtUWLQbqY8gytEHgEpSEEvep9CyAKXo","error":"stream reset"}

@jacobheun
Copy link
Contributor

@marshyonline are you still running into this issue post v1.18?

@marshyonline
Copy link

Moving onto 1.18 in the next few days - will update after upgrade.

I plan on testing RAFT in 1.19 when its ready to see if this helps also

@TippyFlitsUK
Copy link
Contributor

Hi 👋

The Legacy Lotus Markets sub-system reached EOL at the end of the 31st January 2023.

This ticket is being marked as won't fix and closed as the Lotus Team will no longer be making any further fixes or enhancements to the legacy markets subsystem.

Please feel free to re-open this ticket in the new Boost markets sub-system repository at https://github.com/filecoin-project/boost if you feel that it is still relevant.

Many thanks 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants