Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Sector spinning on GetTicket/PreCommit1 causing kvlog to grow huge #6191

Closed
neondragon opened this issue May 5, 2021 · 7 comments
Closed
Labels
area/sealing kind/bug Kind: Bug kind/stale P2 P2: Should be resolved

Comments

@neondragon
Copy link

Problem

I had four sectors get into a state where they moved quickly between GetTicket and PreCommit1 in a loop.

This also caused .miner/kvlog to grow at a rate of ~200MB/s to 1.6TB in size, which nearly exhausted all disk space before I caught the problem.

Restarting lotus-miner resulted in 18 minutes of extra downtime at the kvlog compaction step.

Version

lotus-miner version 1.9.0-rc1+mainnet+git.050ce5092.dirty

To reproduce

I am not sure what caused the issue, but it coincided with a problem with a C2 worker resulting in these sectors failing to schedule C2 a number of times, then re-running PC1, and failing C2 again. After I restarted lotus-miner to resolve that issue, the sectors went into this GetTicket/PreCommit1 loop.

Expected

Sectors perform GetTicket once when necessary and then proceed through the pipeline as normal. Kvlog doesn't grow unreasonably large.

Logs

lotus-miner sectors status --log
Truncated sector log -- produced SectorTicket/SectorOldTicket until I remvoed the sector

4563.   2021-05-04 21:32:22 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4564.   2021-05-04 21:32:22 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4565.   2021-05-04 21:32:23 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4566.   2021-05-04 21:32:23 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4567.   2021-05-04 21:32:23 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4568.   2021-05-04 21:32:24 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4569.   2021-05-04 21:32:24 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4570.   2021-05-04 21:32:24 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4571.   2021-05-04 21:32:25 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4572.   2021-05-04 21:32:25 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4573.   2021-05-04 21:32:25 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4574.   2021-05-04 21:32:25 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4575.   2021-05-04 21:32:26 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4576.   2021-05-04 21:32:26 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4577.   2021-05-04 21:32:27 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
4578.   2021-05-04 21:32:27 +0000 UTC:  [event;sealing.SectorOldTicket] {"User":{}}
4579.   2021-05-04 21:32:27 +0000 UTC:  [event;sealing.SectorRemove]    {"User":{}}
4580.   2021-05-04 21:32:27 +0000 UTC:  [event;sealing.SectorTicket]    {"User":{"TicketValue":"IvlTBBsaocm0+SBq4HVpEaqCXtW9Ke0LpLAIQku45og=","TicketEpoch":724232}}
@Shekelme
Copy link

I see something very similar on lotus-miner version 1.9.0-rc2+mainnet+git.16a9baa3b (rc3 actually).

@Shekelme
Copy link

Do we know why *.log.cbor is growing so fast? The root partition on miner was full twice in the last 24 hours because of *.log.cbor. What's going on?

@jennijuju
Copy link
Member

One of the possible ways to handle this is: when we notice this log file(the write log for the metadata) is too large, we should compact it or overwrites it.

@jennijuju jennijuju added area/sealing P2 P2: Should be resolved labels May 24, 2021
@llifezou
Copy link
Contributor

I submitted a pr, it may solve this problem. #6635

@Reiers
Copy link

Reiers commented Dec 14, 2021

@neondragon Hi Alex. This issue solved, right?
I have not seen this issue from Lotus v1.13.0 and up.

@Reiers Reiers added the need/author-input Hint: Needs Author Input label Dec 14, 2021
@github-actions
Copy link

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.

@github-actions
Copy link

This issue was closed because it is missing author input.

@TippyFlitsUK TippyFlitsUK removed the need/author-input Hint: Needs Author Input label Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sealing kind/bug Kind: Bug kind/stale P2 P2: Should be resolved
Projects
None yet
Development

No branches or pull requests

7 participants