Log progress on network migrations #12259
A progress bar is only going to be useful if you're running Lotus in a terminal, but we could probably pair this with simply improving these log messages, because they're so opaque. I don't even know what they mean or how to connect them to any sense of "progress". I think we can do better.
From triage today:
We currently have this in the migration: https://github.com/filecoin-project/go-state-types/blob/9a76026713cdf92be72a085c78a041ed3aa100af/migration/runner.go#L103C9-L115, and we set it to 10s here: lotus/chain/consensus/filcns/upgrades.go, lines 2649 to 2650 in 2cd6f40.
So we should be seeing info logs from that code every 10 seconds. Maybe the question for this issue is: why aren't we seeing them?
OK, so now we've had the migration:
So we don't get any of those. I've asked @virajbhartiya to have a look at this and see if he can work out what the blocker might be, so that we can make it better for the next upgrade. Here are some pointers:
lotus/chain/consensus/filcns/upgrades.go, lines 2692 to 2705 in 467c6ff
The … That … And the important bit is where it calls … That's where the logging happens, and it's the … So back to Lotus, …
I think the answer to this might just be that the timing is too low. My duration is … There's a thread in Slack with some others posting their greps: https://filecoinproject.slack.com/archives/C027TQMUVJN/p1732143841560979. @TippyFlitsUK's took particularly long because he had a splitstore compaction, so he might have logs that show … So, @virajbhartiya, the things I think we need to do with this are:
Oh, and I think the message itself should be improved. Think about it from a user's perspective: they don't really care how many jobs there are, they care about the percentage complete. So we should at least add a % to the output so they don't have to work it out for themselves.
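As a rough illustration of this suggestion, a percentage can be derived from the same job counters just before the message is formatted. This is a hypothetical helper, not existing Lotus or go-state-types code; the function name and message wording are assumptions. It returns 0 when no jobs exist yet, and because jobs can still be created while earlier ones run, the result is only an estimate until job creation has finished.

```go
package main

import "fmt"

// percentComplete returns done/total as a percentage in [0, 100].
// Hypothetical helper: it returns 0 when no jobs exist yet, and clamps
// at 100 in case the two counters are read at slightly different
// moments and done briefly exceeds total.
func percentComplete(done, total int64) float64 {
	if total <= 0 {
		return 0
	}
	pct := float64(done) / float64(total) * 100
	if pct > 100 {
		return 100
	}
	return pct
}

func main() {
	done, total := int64(750), int64(1000)
	// Prints: Performing migration: 75.0% complete (750 of 1000 jobs)
	fmt.Printf("Performing migration: %.1f%% complete (%d of %d jobs)\n",
		percentComplete(done, total), done, total)
}
```

Keeping the raw counts next to the percentage preserves the detail that maintainers use while giving users the number they actually care about.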
Thanks @rvagg for the logs and for pointing me to where to start. I'll begin working on it.
@rvagg, if I'm correct, these are the "pending after" logs you were referring to in the comment above, right?
Excellent, yes, those are the logs, @virajbhartiya; thanks for hunting them down. So we know they are showing up. I think the two things remaining here are:
Looking at them now, it's clear that we're creating jobs at the same time as processing them, so we don't get to do a percentage. We should probably add the word "migration" in there; perhaps start with "Performing migration"?
Once every 10 minutes is far too slow here; we should see it ticking away. I'd be happy with a number between 2 and 5 seconds. For a 30-second migration that would be between 15 and 6 log entries; for a worst case of 10 minutes it would be between 300 and 120. Is that bad? I don't think so. @rjan90, you're a log watcher: do you have an opinion on how much spam is too much for this progress output?

Next, where to do that timing. I think you could start by doing it in the v15 migration, but @kamuik16 has already opened a skeleton PR at #12707, which would need the changes in the v16 block. So we need to coordinate on getting that done. Either we land the v15 changes and get @kamuik16 to merge them in; or we wait until @kamuik16's PR has landed and you just do v16; or you open a PR against @kamuik16's branch, but then you don't get a commit in the git log for your efforts.
I'm fine with anything; it's your and @virajbhartiya's call.
Though I think it would be best after my PR lands :)
Sure, let's wait. It looks like #12707 could land pretty quickly; just a small change is needed.
The nv16 skeleton is now merged, so this is good to go.
Done Criteria:
Why Important:
User/Customer:
Example Output from a Migration: