-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rough cut: new mining loop with timing. WIP #722
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hairy stuff, Wyatt. And also this code is hairy too :) I spent about 20m looking this over and can take a closer look with you if you want, maybe while we zoom? Broad strokes:
-
I think your approach of driving the scheduling via states is sensible. Doesn't seem like we need a FSM package, I think the simple thing you're doing seems to work.
-
My biggest suggestion is to encapsulate the scheduling into its own thingy. Pull all that delay/grace/etc logic out of the worker and put it into a WorkerScheduler or MiningScheduler that sits in between the input channel and the worker. This will make it easier to understand but more importantly far easier to test. It will enable you to test the scheduling logic separate from the worker logic. Lemme know if you want to talk more about this.
-
Unclear from cursory inspection whether access to the worker's fields are safe from all these goroutines. It would be helpful to carefully state the access model (what prevents them from being access concurrently?), and you might even need a mutex.
-
You probably considered this but you might be able to reduce the size of the scheduling implementation if you inline the changes of behavior in one function with if statements instead of breaking them out. It does seem like a tradeoff length of implementation against clarity, feel free to keep it as is.
mining/worker.go
Outdated
defer doneWg.Done() | ||
fmt.Printf("mine: running 'mine'\n") | ||
w.mine(currentRunCtx, input, w.nullBlockTimer, w.blockGenerator, w.createPoST, outCh) | ||
mineOutCh <- struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to tell at first read but are we using two separate mechanisms to signal done-ness? The wg and the mineOutCh? If so we should probably have exactly one mechanism if that's possible.
The entire mining code feels drastically more complicated than it needs to be. The EC simulator is (as far as i know) a correct implementation of expected consensus, and is really small and easy (subjective, i know) to read: https://github.com/filecoin-project/ecsim/blob/master/main.go#L225 What are we doing that necessitates the huge blowup in complexity? |
@whyrusleeping thanks for checking in. It will help me if we can identify the specific areas where you see this blow up happening. I understand and generally agree with the indirection issues brought up in the conversation between you and @phritz Friday night:
These readability issues have existed for a while and I think it makes sense to address at least some of them in this PR because a lot of things are changing. Something I am less clear about: it seems like you are also commenting here about the changes specific to this PR. If so great, I've been looking for feedback, but could you ack/nack that this is the case? Also if this is the case is the extent of your feedback that timing should work more as it does in the EC sim? If so I can work with that, but if you have other opinions about these changes then it would be good to hear them now. |
mining/worker.go
Outdated
// Older epoch? Do nothing. TODO: is it secure that old heavier tipsets are ignored? | ||
// Current epoch? Replace base. | ||
// Newer epoch? Loop back to DELAY state for new epoch. | ||
if inEpoch == epoch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be using a switch here
878874a
to
033d9f0
Compare
Hey hey hey Wassawassawassup? @whryusleeping @phritz @dignifiedquire I have a refactoring I am excited about and would like to share it with y'all now. Terms and conditionsDisclaimer # 1 -- I haven't messed with the scheduling code since the last push, it is still broken and lots of tests fail Disclaimer # 2 -- This touches a lot of files and is probably really annoying to read. I am sorry about this. Doing all of this at once seemed like the only way. If it is too annoying to read I can try to go back and split it into more digestable pieces. Disclaimer # 3 -- Elephant in the room is that the merge conflicts will probably be pretty bad with #729. I am willing to deal with all of these after that merges. I already have some changes from the early commits of that PR added into this refactor. What is all this?Before this mining was split roughly into the Worker and BlockGenerator interfaces. This refactor changes this to a Scheduler and Worker interface. Scheduler -- we are now creating a more complex mining strategy that involves timing. The Scheduler interface is where these strategies fit into the mining code. This interface has a Start() method that begins mining and provides input and output channels to the rest of the system just like the old Worker interface. Schedulers schedule mining work for a Worker. Worker -- as before Worker is the thing that actually mines. However in this commit Worker is now seperate from the thing that Starts mining (that is the Scheduler). Workers do work with the interface Mine method. BlockGenerator -- the BlockGenerator has been absorbed into the Worker.
Lots of tests have changed but this commit has kept most of the old behavioral coverage. Moving the seam to Scheduler/Worker is about as fine-grained as the old seam between Worker/BlockGenerator. Lots of Scheduler tests are missing because the timingScheduler is still broken. Some of the behavior moved into the scheduler from the Mine method, like null block incrementing, is not yet tested. Asks from y'all@whyrusleeping do you find that this improves the problems you were having reading the old mining code? Any suggestions on doing things better from this perspective? @phritz do you find that components are separated out cleanly enough that testing this is acceptably easy? Any suggestions on doing things better from this perspective? Are people sold on the Fall-Of-The-BlockGenerator premise? Everyone: Would people prefer that I try to merge this sooner rather than later? Concretely would you prefer I revert the default scheduler to something that resembles scheduling code before this PR so that I can merge this refactor before completing and fixing the timingScheduler in a separate PR? |
Rebase on #729 is in, the refactor is ready for review. |
mining/scheduler.go
Outdated
for { | ||
select { | ||
case <-miningCtx.Done(): | ||
fmt.Printf("runMine: miningCtx.Done()\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm debug statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or turn them into event logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make this debug or info logs ("Mining canceled, shutting down current mining run.", etc)
Re event logs since we're going to be relying on event logs for kitty hawk I'd like to keep them relatively clean and high level. But I guess we haven't had that discussion yet...
mining/scheduler.go
Outdated
fmt.Printf("runMine: mineInCh: %v\n", input) | ||
if ok { | ||
currentRunCancel() | ||
currentRunCtx, currentRunCancel = context.WithCancel(input.Ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to rely on a context passed from the input. Whats the reasoning for that?
It seems like we could drastically simplify this loop without having to worry about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Using the input context was used in the previous structure because it allowed for canceling mining jobs in a fine grained way based on input. I am leaning towards saying that the whole paradigm of canceling specific jobs as the producer of the inCh is not what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah get rid of input.context, doesn't seem useful.
mining/scheduler.go
Outdated
// PoSTs too fast. | ||
defer func() { | ||
if s.miningAt != epoch { | ||
// log.Warningf("Not enough time to mine on epochs %d to %d", s.miningAt, epoch - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave this warning in.
mining/scheduler.go
Outdated
} | ||
epoch += s.base.NullBlks | ||
// No blocks found last epoch? Add a null block for this epoch's input. | ||
if s.miningAt > epoch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic feels funky. I need to think about it a bit more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least one funky thing to me is mutating the input. I would expect the input to be const and to do bookkeeping elsewhere.
mining/scheduler.go
Outdated
if err != nil { | ||
panic("this can't be happening") | ||
} | ||
// Older epoch? Do nothing. TODO: is it secure that old heavier tipsets are ignored? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should always select the heaviest tipset, regardless of epoch. For example, if we're mining on top of a chain that has a ton of null blocks in it, from someone who is ignoring others in the chain, and then we get a block from the main chain that is much denser, we want to switch over to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly agree with the spirit of this comment but am left unsure of how to proceed with regards to writing the scheduler logic as presented in #572 and with regards to resolving the attack motivating that issue. See my next comment for more detail.
mining/scheduler.go
Outdated
if inEpoch == epoch { | ||
s.base = input | ||
} | ||
if inEpoch > epoch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit weird to be tracking the epoch like this. The epoch itself doesnt really matter, we should just always be mining on top of the heaviest known tipset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is weird, relying on this feels brittle and unsettling to me. Alternative: instead of collecting tipsets at the current epoch the scheduler collects heaviest tipsets, then ignores as we do now. Now the scheduler always waits for mining to finish during the ignore stage, because it can't tell the difference between adversarial (same epoch) and non-adversarial (higher epoch) interruptions.
The downside I see is that now miners won't see non-adversarially delayed heaviest tipsets ASAP anymore. It seems like this increases the potential of nodes' clocks drifting. In this alternative strategy the node's delay period no longer begins near the delay period of other nodes. Once clocks start drifting my sense, though I'm not so sure about this, is that the whole delay collect paradigm is no longer that useful.
My general observation is that these timed block withholding / wasted work attacks remind me a lot of selfish mining and seem tricky to totally address by improving only this part of the system. Maybe we should be thinking about these attacks in the context of the whole system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be useful to get a better sense for the timing guarantees behind PoSTs in order to evaluate and improve the design. For example should we consider the case where a subset of miners can consistently accelerate their PoST calculation to finish x% faster someday in the future or even today? What is the typical proving time standard deviation?
We should sync up on this stuff tomorrow!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the adversarial model, if it's something future code authors should consider please comment it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally, yeah of all the code here I find delay() most confusing. See my suggestion that we just axe it if it is as it seems just an optimization. We are sitting here talking about it when we could be doing other useful stuff.
@whyrusleeping I agree with most of what you are saying and these comments are really useful with regards to improving the scheduler. To be clear though I would like your input on the mining refactor pushed in the latest commit (basically everything except the scheduler). Specifically:
and
Maybe you saw this and didn't have any complaints, but just wanted to make sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are people sold on the Fall-Of-The-BlockGenerator premise?
Sure.
@phritz do you find that components are separated out cleanly enough that testing this is acceptably easy? Any suggestions on doing things better from this perspective?
I didn't get to reviewing the tests yet!
mining/scheduler.go
Outdated
|
||
// The Scheduler listens for new heaviest TipSets and schedules mining work on | ||
// these TipSets. The scheduler is ultimately responsible for informing the | ||
// rest of the system about new successful blocks. This is the interface to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Continued language nit around precise language: what is a "successful" block? I hit that and start to go looking for a "success" concept in the code, we have successful messages is that what he means? So much more clear to just say: "The scheduler informs the rest of the system when the worker mines a new block."
mining/scheduler.go
Outdated
// | ||
// The default Scheduler implementation, timingScheduler, operates in two | ||
// states, 'collect', where the scheduler listens for new heaviest tipsets to | ||
// replace the mining base, and 'ignore', where mining proceeds uninterrupted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "collect" is now "grace"?
Also here you've said what the states are but not why or how they transition. Somewhere you should give a high level description, might as well be here because you've already started. I think in another 1-2 sentences you can define terms epoch/grace/ignore/etc, say why we have these states, and say how they transition.
mining/scheduler.go
Outdated
// Scheduler is the mining interface consumers use. When you Start() the | ||
// scheduler it returns two channels (inCh, outCh) and a sync.WaitGroup: | ||
// - inCh: the caller sends Inputs to mine on to this channel. | ||
// - outCh: the worker sends Outputs to the caller on this channel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the scheduler?
mining/scheduler.go
Outdated
// mineGrace is the protocol grace period | ||
const mineGrace = time.Second | ||
|
||
// mineSleepTime is the protocol's estimated mining time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say why we define this (so we can fake it).
mining/scheduler.go
Outdated
// mineSleepTime is the protocol's estimated mining time | ||
const mineSleepTime = mineGrace * 30 | ||
|
||
// mineDelay is the protocol mining delay at the beginning of an epoch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say why we have it.
mining/scheduler.go
Outdated
// worker to cancel the context of the previous mining run if any and start | ||
// mining on the new block. Any results are sent into its output channel. | ||
// Closing the input channel does not cause the worker to stop; cancel | ||
// the Input.Ctx to cancel an individual mining run or the mininCtx to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say axe input.ctx.
mining/scheduler.go
Outdated
// worker to cancel the context of the previous mining run if any and start | ||
// mining on the new block. Any results are sent into its output channel. | ||
// Closing the input channel does not cause the worker to stop; cancel | ||
// the Input.Ctx to cancel an individual mining run or the mininCtx to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miningCtx
messagePool *core.MessagePool | ||
applyMessages miningApplier | ||
powerTable core.PowerTableView | ||
blockstore blockstore.Blockstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the mining code need the block and ipld store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of 1. the refactor puts block generation on the worker, 2. the changes in #729 cause powerTable reading to depend on the ipld store and cause applyMessages to depend on the blockstore.
mining/worker.go
Outdated
TipSet core.TipSet | ||
Ctx context.Context // TODO: we should evaluate if this is still useful | ||
TipSet core.TipSet | ||
NullBlks uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I'd put the null block count here. mining.Input is what the scheduler gets and this field I think will always be empty for the scheduler. The null block count is a function of the scheduler and so could be passed in separately. If you keep it here please put a comment on this saying that it will be set by the scheduler.
node.miningCtx, node.cancelMining = context.WithCancel(context.Background()) | ||
inCh, outCh, doneWg := node.MiningWorker.Start(node.miningCtx) | ||
inCh, outCh, doneWg := node.MiningScheduler.Start(node.miningCtx) | ||
node.miningInCh = inCh | ||
node.miningDoneWg = doneWg | ||
node.AddNewlyMinedBlock = node.addNewlyMinedBlock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(EC) on line 458 can be deleted
@phritz thanks for all of this review. Unfortunately it looks like I wasn't clear enough in Disclaimer # 1 in the request for review comment. I was trying to spare you from looking at the confusing, broken scheduler until I took another stab at it. In any case @whyrusleeping and I came to many similar conclusions and I think you will be a lot happier with the next iteration. |
baad210
to
33a3e19
Compare
mining/scheduler.go
Outdated
func (s *timingScheduler) runWorker(miningCtx context.Context, outCh chan<- Output, mineInCh <-chan Input, doneWg *sync.WaitGroup) { | ||
defer doneWg.Done() | ||
currentRunCancel := func() {} | ||
var currentRunCtx context.Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a need for keeping this up here, seems like its only relevant to the scope its created in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is necessary for canceling existing mining runs when a new input arrives on the mineInCh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we only want one goroutine mining at a time right now, and canceling this context seems like the cleanest way.
mining/scheduler.go
Outdated
// collect initializes the next round of mining, canceling any previous mining | ||
// calls still running. If the eager flag is set, collect starts mining right | ||
// away, possibly starting and stopping multiple mining jobs. | ||
func (s *timingScheduler) collect(miningCtx context.Context, inCh <-chan Input, mineInCh chan<- Input, eager bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the return value signify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It determines whether we should end the scheduler (like the old version's end state), will update.
return false | ||
case input, ok := <-inCh: | ||
if !ok { | ||
// sender closed inCh, close and ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we not exit this loop here? Is the in channel being closed an expected thing?
When the caller shuts itself down it could tear things down in any order, eg close the channel and then cancel the context, or vice versa. It's accommodating that fact gracefully -- basically ignoring a channel close and shutting down only when the context cancels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, Wyatt. Way to go!
// The timingScheduler operates in two states, 'collect', where the scheduler | ||
// listens for new heaviest tipsets to use as the best mining base, and 'ignore', | ||
// where mining proceeds uninterrupted. The scheduler enters the 'collect' state | ||
// each time a new heaviest tipset arrives with a greater height. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean greater height here? or weight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
height!
// where mining proceeds uninterrupted. The scheduler enters the 'collect' state | ||
// each time a new heaviest tipset arrives with a greater height. The | ||
// scheduler finishes the collect state after the mining delay time, a protocol | ||
// parameter, has passed. The scheduler then enters the 'ignore' state. Here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a protocol parameter in the sense that everyone has to respect it? seems doubtful right -- how do we enforce it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding right now is that this is something that everyone honestly following the protocol does. Definitely not enforceable. Like all of our parameters the goal is to get incentive hacking just right so that it is in everyone's best interests to operate like this.
// or 'mining base' is used to denote the tipset that the miner uses as the | ||
// parent of the block it attempts to generate during mining. | ||
// | ||
// The timingScheduler operates in two states, 'collect', where the scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say something about the rationality of miners waiting the delay period? Why is it in their best interest to do that instead of mine right away (because then they can mine on an even heavier chain?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, you've got it, they don't want to waste their time on the first tipset they see because another could show up real quick. I will add this.
mining/scheduler.go
Outdated
// scheduler finishes the collect state after the mining delay time, a protocol | ||
// parameter, has passed. The scheduler then enters the 'ignore' state. Here | ||
// the scheduler mines, ignoring all inputs with the most recent and lower | ||
// heights. 'ignore' concludes when the scheduler receives an input tipset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for clarity point out that the newly arriving tipset of greater height could be the one it just mined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, that is definitely a subtle thing.
mining/scheduler.go
Outdated
|
||
// mineDelay is the protocol mining delay. The timingScheduler waits for the | ||
// mining delay during its 'collect' state. | ||
const mineDelay = time.Millisecond * 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is really tiny. Is 20ms a realistic time window? My RTT to europe is 200ms. Perhaps it's set this small to keep tests running quickly? In any case, please comment where the value came from.
Also, this seems like we'll need a way to easily set parameters of the system like this. At some point (not this CL) we will need to move settings like this into the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup right now this is for testing and I will update the comment. Check out PR #779 for a first step towards getting parameters into the system.
// mineSleepTime is the estimated mining time. We define this so that we can | ||
// fake mining with the current incomplete system. TODO this needs to be | ||
// configurable to expediate both unit and large scale testing. | ||
const mineSleepTime = mineDelay * 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say units in the name. mineDelayMS or similar.
mining/worker.go
Outdated
return <-outCh | ||
// DefaultWorker runs a mining job. | ||
type DefaultWorker struct { | ||
createPoST DoSomeWorkFunc // TODO: rename createPoSTFunc? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yoda say: do or do not. There is no TODO maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Om Sri Bool
mining/worker_test.go
Outdated
return st, nil | ||
} | ||
|
||
// Success case. TODO: this case isn't testing much. Testing w.Mine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing that the code runs and doesn't wait forever or deadlock or anything, which is something. But agree need more testing here.
mining/worker_test.go
Outdated
@@ -279,3 +121,337 @@ func TestCreateChallenge(t *testing.T) { | |||
assert.Equal(decoded, r) | |||
} | |||
} | |||
|
|||
// Returns a ticket checking function that return true every third time | |||
/*func everyThirdWinningTicket() func(_ []byte, _, _ int64) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? Commented out so you could use it in manual testing or something?
mining/worker_test.go
Outdated
var mockSigner = types.NewMockSigner(ki) | ||
|
||
func TestGenerate(t *testing.T) { | ||
// TODO fritz use core.FakeActor for state/contract tests for generate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about just TODO here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me, let's go go go
3ff1247
to
f05e271
Compare
Before this mining was split roughly into the Worker and BlockGenerator interfaces. This commit changes this to a Scheduler and Worker interface. Scheduler -- we are now creating a more complex mining strategy that involves timing. The Scheduler interface is where these strategies fit into the mining code. This interface has a Start() method that begins mining and provides input and output channels to the rest of the system just like the old Worker interface. Schedulers schedule mining work for a Worker. Worker -- as before Worker is the thing that actually mines. However in this commit Worker is now seperate from the thing that Starts mining (that is the Scheduler). Workers do work with the interface Mine method. BlockGenerator -- the BlockGenerator has been absorbed into the Worker. Here is my rational. 1. This facilitates making the code easier to follow with respect to @whyrusleeping's issues. We are now one interface and one object fewer in the mining code. 2. The seperation is no longer justified. In my view now that the worker is no longer responsible for scheduling, there is enough room in the abstraction for block generation as a member function. Looking back at the code of Mine I was not convinced this interface is worth the trouble. It seems likely that any alternate implementation of Generate would warrant an alternate implementation of Mine. I don't forsee lots of swapping implementations across the BlockGenerator interface and don't think testing Mine against an abstract Generate buys us that much 3. The seperation is actively getting in the way. worker.Mine has been using fake power values for a while. The way to fix this is to use the powerTableView and getStateTree dependencies used by the block generator. It makes far more sense for one thing to share these dependencies than to inject them into two things. With this refactor we are much closer to getting actual power values. Lots of tests have changed but this commit has kept most of the old behavioral coverage. Moving the seam to Scheduler/Worker is about as fine-grained as the old seam between Worker/BlockGenerator. Lots of Scheduler tests are missing because the timingScheduler is still broken. Some of the behavior moved into the scheduler from the Mine method, e.g. null block incrementing is not yet tested.
This is an attempt to drastically simplify the timingScheduler Now the scheduler only has two states and only tracks a base input. Nullblock increment is moved back to the worker. This scheduler has a few idiosyncracies but is simple to reason about and prevents mining processes from getting interrupted midway through PoST generation. New scheduler tests added.
This PR feels more like design than the last several have, so I’m looking for feedback.
Context, specific feedback requests
Disclaimer: I know things are somewhat broken. I’m not looking for feedback about smaller details but rather about the bigger picture. I want to know sooner rather than later if this needs to be rewritten in a major way.
I am not in love with the general structure of the new implementation. The primitives I went with here are basically select statements, goroutines, and big ole infinite loops. The code is a bit repetitive and makes a lot of goroutines. Possibly this would be simpler to read and update if we use more expressive building blocks. What I wrote is basically a little special purpose FSM. Maybe using an FSM library explicitly would make things easier to follow and modify. I wanted to give this implementation a try before making or importing a more general FSM library or something because this was simpler in the short term. Please help me evaluate if I should do a rewrite now to make this simpler in the long term!
There is one big departure from the previous design I’d like to call out. Before the mining entry point was input driven, every mining event was triggered by a new input on the mining channel. In the current design the mining loop, while affected by the arrival of new mining inputs, runs even if there are no new inputs. Another way to put this is that instead of having the mining function loop internally to account for null blocks, this looping is now factored out into the main mining loop. While this seemed like an improvement, I’d like to check in with others if they see things the same way, specifically because this might change what we can expect from canceling specific
input.Ctx
sWhat’s in this design?
Idea
To read this it will help to familiarize yourself with this issue. What’s described there is essentially a finite state machine that controls how mining operates. This FSM can be described with 3 states:
Delay — The worker does not mine yet. Instead the worker receives heaviest TipSets and records each new TipSet as what it is going to mine on top of. At the same time unfinished mining jobs are given one last chance to finish before the next epoch of mining starts. After 2 seconds of waiting the next epoch of mining starts and the state transitions to Grace.
Grace — The worker is mining but also listening for better TipSets. If it hears about a better one then it interrupts the latest mining job and restarts another on top of the new best TipSet. After 1 second the worker transitions to Ignore
Ignore — The worker does not care about better TipSets for this epoch anymore and ignores them. Mining is not interrupted. The mining run should finish within this time and the worker takes note of that. If the worker hears of a heavier TipSet from a new epoch, possibly its own TipSet from mining, it transitions to Delay immediately. Additionally the worker will only wait for 30 seconds before transitioning back to Delay.
If there is no new input for an epoch the worker simply increases the null count on the last mining input during the Delay state.
Code
worker.Start()
launches 3 goroutines:Additionally runMine launches a goroutine for every mining job so that it can stay available to listen to new requests from the receive loop
What's not in this design?
Accounting for any fancier mining strategies. E.g. no thought yet about rationally mining on own tipsets, heavy tipsets from past epochs, lighter tipsets from the future ...