Rough cut: new mining loop with timing. WIP #722

Merged 2 commits on Aug 23, 2018

Conversation

ZenGround0
Contributor

This PR feels more like design than the last several have, so I’m looking for feedback.

Context, specific feedback requests

Disclaimer: I know things are somewhat broken. I’m not looking for feedback about smaller details but rather about the bigger picture. I want to know sooner rather than later if this needs to be rewritten in a major way.

I am not in love with the general structure of the new implementation. The primitives I went with here are basically select statements, goroutines, and big ole infinite loops. The code is a bit repetitive and makes a lot of goroutines. Possibly this would be simpler to read and update if we use more expressive building blocks. What I wrote is basically a little special purpose FSM. Maybe using an FSM library explicitly would make things easier to follow and modify. I wanted to give this implementation a try before making or importing a more general FSM library or something because this was simpler in the short term. Please help me evaluate if I should do a rewrite now to make this simpler in the long term!

There is one big departure from the previous design I’d like to call out. Before, the mining entry point was input driven: every mining event was triggered by a new input on the mining channel. In the current design the mining loop, while affected by the arrival of new mining inputs, runs even if there are no new inputs. Another way to put this is that instead of having the mining function loop internally to account for null blocks, this looping is now factored out into the main mining loop. While this seemed like an improvement, I’d like to check whether others see it the same way, specifically because this might change what we can expect from canceling specific input.Ctxs.

What’s in this design?

Idea

To read this it will help to familiarize yourself with this issue. What’s described there is essentially a finite state machine that controls how mining operates. This FSM can be described with 3 states (a rough sketch of the loop follows the state descriptions below):
Delay — The worker does not mine yet. Instead the worker receives heaviest TipSets and records each new TipSet as what it is going to mine on top of. At the same time unfinished mining jobs are given one last chance to finish before the next epoch of mining starts. After 2 seconds of waiting the next epoch of mining starts and the state transitions to Grace.

Grace — The worker is mining but also listening for better TipSets. If it hears about a better one then it interrupts the latest mining job and restarts another on top of the new best TipSet. After 1 second the worker transitions to Ignore.

Ignore — The worker no longer cares about better TipSets for this epoch and ignores them. Mining is not interrupted. The mining run should finish within this time and the worker takes note of that. If the worker hears of a heavier TipSet from a new epoch, possibly its own TipSet from mining, it transitions to Delay immediately. Additionally, the worker waits at most 30 seconds before transitioning back to Delay.

If there is no new input for an epoch the worker simply increases the null count on the last mining input during the Delay state.
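
For concreteness, here is a minimal sketch of that receive loop in Go. The state names match the description above, but the helper functions (runDelay, runGrace, runIgnore), the Input type, and the channel wiring are illustrative stand-ins, not the PR's actual code.

// Illustrative sketch only: cycle Delay -> Grace -> Ignore until the mining
// context is canceled. Each run* helper implements one state's receive logic
// and returns the next state.
type schedState int

const (
    delayState schedState = iota
    graceState
    ignoreState
)

func (w *worker) receiveLoop(miningCtx context.Context, inCh <-chan Input, mineInCh chan<- Input) {
    state := delayState
    for miningCtx.Err() == nil {
        switch state {
        case delayState:
            // Collect heaviest TipSets for ~2s, then start the next epoch.
            state = w.runDelay(miningCtx, inCh, mineInCh)
        case graceState:
            // Mine, but restart on a better TipSet; after ~1s move to Ignore.
            state = w.runGrace(miningCtx, inCh, mineInCh)
        case ignoreState:
            // Ignore same-epoch TipSets; a new-epoch TipSet or a 30s timeout
            // sends the loop back to Delay.
            state = w.runIgnore(miningCtx, inCh, mineInCh)
        }
    }
}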

Code

worker.Start() launches 3 goroutines:

  1. runMine — listens for new requests to mine, runs one mining job at a time, signals that mining is done to the receive loop, takes care of sending output over the output channel
  2. receive loop — cycles through the receiver states Delay, Grace, Ignore, running each state’s specific receiver function implementing the above logic. Listens on inCh to receive new heaviest TipSets, and issues requests to the runMine goroutine
  3. tear down function for cleanup

Additionally, runMine launches a goroutine for every mining job so that it can stay available to listen for new requests from the receive loop.
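
A rough sketch of that runMine loop, pieced together from the snippets quoted later in this thread (the condensed mine signature and the doneWg handling are simplified assumptions):

// Illustrative sketch: accept one job at a time from mineInCh, run it in its
// own goroutine so runMine stays responsive, and report completion back to
// the receive loop on mineOutCh.
func (w *worker) runMine(miningCtx context.Context, mineInCh <-chan Input,
    mineOutCh chan<- struct{}, outCh chan<- Output, doneWg *sync.WaitGroup) {
    defer doneWg.Done()
    currentRunCancel := func() {}
    for {
        select {
        case <-miningCtx.Done():
            currentRunCancel()
            return
        case input, ok := <-mineInCh:
            if !ok {
                return
            }
            currentRunCancel() // at most one mining job runs at a time
            var currentRunCtx context.Context
            currentRunCtx, currentRunCancel = context.WithCancel(input.Ctx)
            doneWg.Add(1)
            go func(in Input, runCtx context.Context) {
                defer doneWg.Done()
                w.mine(runCtx, in, outCh)  // simplified signature
                mineOutCh <- struct{}{}    // tell the receive loop this run finished
            }(input, currentRunCtx)
        }
    }
}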

What's not in this design?

Accounting for any fancier mining strategies. E.g. no thought yet about rationally mining on own tipsets, heavy tipsets from past epochs, lighter tipsets from the future ...

@phritz phritz left a comment

Hairy stuff, Wyatt. And also this code is hairy too :) I spent about 20m looking this over and can take a closer look with you if you want, maybe while we zoom? Broad strokes:

  • I think your approach of driving the scheduling via states is sensible. Doesn't seem like we need an FSM package; the simple thing you're doing seems to work.

  • My biggest suggestion is to encapsulate the scheduling into its own thingy. Pull all that delay/grace/etc logic out of the worker and put it into a WorkerScheduler or MiningScheduler that sits in between the input channel and the worker. This will make it easier to understand but more importantly far easier to test. It will enable you to test the scheduling logic separate from the worker logic. Lemme know if you want to talk more about this.

  • Unclear from cursory inspection whether access to the worker's fields is safe from all these goroutines. It would be helpful to carefully state the access model (what prevents them from being accessed concurrently?), and you might even need a mutex.

  • You probably considered this, but you might be able to reduce the size of the scheduling implementation if you inline the changes of behavior in one function with if statements instead of breaking them out. It does seem like a tradeoff of implementation length against clarity, so feel free to keep it as is.

mining/worker.go Outdated
defer doneWg.Done()
fmt.Printf("mine: running 'mine'\n")
w.mine(currentRunCtx, input, w.nullBlockTimer, w.blockGenerator, w.createPoST, outCh)
mineOutCh <- struct{}{}
Contributor

Hard to tell at first read, but are we using two separate mechanisms to signal done-ness? The wg and the mineOutCh? If so we should probably have exactly one mechanism if that's possible.
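
If the WaitGroup here only tracks this inner goroutine, one way to collapse to a single mechanism would be to keep just the channel signal and drop the per-run wg bookkeeping (a sketch of the suggestion, not the PR's code):

go func() {
    // mine runs to completion; the single send on mineOutCh is the only
    // done-ness signal the receive loop needs.
    w.mine(currentRunCtx, input, w.nullBlockTimer, w.blockGenerator, w.createPoST, outCh)
    mineOutCh <- struct{}{}
}()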

@ZenGround0 ZenGround0 changed the title from "Rough cut: new mining loop with timing. WANTS DESIGN REVIEW. WANTS FEEDBACK." to "Rough cut: new mining loop with timing. WIP" on Aug 10, 2018
@ZenGround0 ZenGround0 removed the request for review from whyrusleeping August 10, 2018 02:24
@whyrusleeping
Member

The entire mining code feels drastically more complicated than it needs to be. The EC simulator is (as far as I know) a correct implementation of expected consensus, and is really small and easy (subjective, I know) to read: https://github.com/filecoin-project/ecsim/blob/master/main.go#L225

What are we doing that necessitates the huge blowup in complexity?

@ZenGround0
Contributor Author

@whyrusleeping thanks for checking in. It will help me if we can identify the specific areas where you see this blow up happening. I understand and generally agree with the indirection issues brought up in the conversation between you and @phritz Friday night:

Start is an interface method, it launches a goroutine to call w.mine, which is a member function, a small amount of searching finds ‘Mine’ as the usual culprit for being ‘mine’, ‘Mine’ calls blockgenerator.Generate, which is an interface. So to find the implementation I look where the variable itself comes from. That gets passed into ‘Mine’, so we go back up to ‘Start’ and see that it comes from a field on the worker. Soooo where does that get set? We have to go find whoever calls ‘NewWorker’. A small amount of grep later lands me in node/node.go. I find that the block generator gets constructed here, but its code is back in the mining package. And now finally we get to the code that's actually creating the new block.

These readability issues have existed for a while and I think it makes sense to address at least some of them in this PR because a lot of things are changing.

Something I am less clear about: it seems like you are also commenting here about the changes specific to this PR. If so, great; I've been looking for feedback, but could you ack/nack that this is the case? Also, if this is the case, is the extent of your feedback that timing should work more as it does in the EC sim? If so I can work with that, but if you have other opinions about these changes then it would be good to hear them now.

mining/worker.go Outdated
// Older epoch? Do nothing. TODO: is it secure that old heavier tipsets are ignored?
// Current epoch? Replace base.
// Newer epoch? Loop back to DELAY state for new epoch.
if inEpoch == epoch {
Member

nit: should be using a switch here
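
For example, the three epoch cases spelled out in the comment above could be written as a single switch (a sketch using the names from the snippet):

switch {
case inEpoch < epoch:
    // Older epoch: do nothing.
    // TODO: is it secure that old heavier tipsets are ignored?
case inEpoch == epoch:
    // Current epoch: replace the mining base.
    s.base = input
default: // inEpoch > epoch
    // Newer epoch: loop back to the DELAY state for the new epoch.
}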

@ZenGround0 ZenGround0 force-pushed the feat/ec/8 branch 2 times, most recently from 878874a to 033d9f0 on August 16, 2018 19:47
@ZenGround0
Contributor Author

ZenGround0 commented Aug 16, 2018

Hey hey hey Wassawassawassup? @whyrusleeping @phritz @dignifiedquire

I have a refactoring I am excited about and would like to share it with y'all now.

Terms and conditions

Disclaimer # 1 -- I haven't messed with the scheduling code since the last push; it is still broken and lots of tests fail.

Disclaimer # 2 -- This touches a lot of files and is probably really annoying to read. I am sorry about this. Doing all of this at once seemed like the only way. If it is too annoying to read I can try to go back and split it into more digestible pieces.

Disclaimer # 3 -- Elephant in the room is that the merge conflicts will probably be pretty bad with #729. I am willing to deal with all of these after that merges. I already have some changes from the early commits of that PR added into this refactor.

What is all this?

Before this, mining was split roughly into the Worker and BlockGenerator interfaces. This refactor changes this to a Scheduler and Worker interface.

Scheduler -- we are now creating a more complex mining strategy that involves timing. The Scheduler interface is where these strategies fit into the mining code. This interface has a Start() method that begins mining and provides input and output channels to the rest of the system just like the old Worker interface. Schedulers schedule mining work for a Worker.

Worker -- as before, Worker is the thing that actually mines. However, in this commit Worker is now separate from the thing that Starts mining (that is the Scheduler). Workers do work through the interface's Mine method.
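
Roughly, the new seam looks like the following. This is an illustrative sketch assembled from this description and from the Start() call shown later in node code; the exact signatures in the PR may differ.

// Scheduler decides when, and on top of which base, the node mines.
type Scheduler interface {
    // Start begins mining. It returns the channel the rest of the system
    // feeds heaviest TipSets into, the channel results come back on, and a
    // WaitGroup used for clean shutdown.
    Start(miningCtx context.Context) (chan<- Input, <-chan Output, *sync.WaitGroup)
}

// Worker does the actual mining work; block generation now lives here too.
type Worker interface {
    Mine(runCtx context.Context, input Input, outCh chan<- Output)
}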

BlockGenerator -- the BlockGenerator has been absorbed into the Worker.
Here is my rationale.

  1. This facilitates making the code easier to follow with respect to @whyrusleeping's issues. We are now one interface and one object fewer in the mining code.
  2. The separation is no longer justified. In my view, now that the worker is no longer responsible for scheduling, there is enough room in the abstraction for block generation as a member function. Looking back at the code of Mine I was not convinced this interface is worth the trouble. It seems likely that any alternate implementation of Generate would warrant an alternate implementation of Mine. I don't foresee lots of swapping implementations across the BlockGenerator interface and don't think testing Mine against an abstract Generate buys us that much.
  3. The separation is actively getting in the way. worker.Mine has been using fake power values for a while. The way to fix this is to use the powerTableView and getStateTree dependencies used by the block generator. It makes far more sense for one thing to share these dependencies than to inject them into two things. With this refactor we are much closer to getting actual power values.

Lots of tests have changed but this commit has kept most of the old behavioral coverage. Moving the seam to Scheduler/Worker is about as fine-grained as the old seam between Worker/BlockGenerator. Lots of Scheduler tests are missing because the timingScheduler is still broken. Some of the behavior moved into the scheduler from the Mine method, like null block incrementing, is not yet tested.

Asks from y'all

@whyrusleeping do you find that this improves the problems you were having reading the old mining code? Any suggestions on doing things better from this perspective?

@phritz do you find that components are separated out cleanly enough that testing this is acceptably easy? Any suggestions on doing things better from this perspective?

Are people sold on the Fall-Of-The-BlockGenerator premise?

Everyone: Would people prefer that I try to merge this sooner rather than later? Concretely would you prefer I revert the default scheduler to something that resembles scheduling code before this PR so that I can merge this refactor before completing and fixing the timingScheduler in a separate PR?

@ZenGround0
Contributor Author

Rebase on #729 is in, the refactor is ready for review.

for {
select {
case <-miningCtx.Done():
fmt.Printf("runMine: miningCtx.Done()\n")
Member

rm debug statements

Member

or turn them into event logs

Contributor

You can make these debug or info logs ("Mining canceled, shutting down current mining run.", etc.)

Re event logs: since we're going to be relying on event logs for kitty hawk, I'd like to keep them relatively clean and high level. But I guess we haven't had that discussion yet...
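
A minimal sketch of what that could look like, assuming the mining package uses the same ipfs go-log logger as the rest of go-filecoin (an assumption; the logger setup is not shown in this diff):

package mining

import logging "github.com/ipfs/go-log"

var log = logging.Logger("mining")

func exampleShutdownLog() {
    // Instead of: fmt.Printf("runMine: miningCtx.Done()\n")
    log.Debug("Mining canceled, shutting down current mining run.")
}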

fmt.Printf("runMine: mineInCh: %v\n", input)
if ok {
currentRunCancel()
currentRunCtx, currentRunCancel = context.WithCancel(input.Ctx)
Member

It seems weird to rely on a context passed from the input. What's the reasoning for that?

It seems like we could drastically simplify this loop without having to worry about that

Contributor Author

I agree. The input context was used in the previous structure because it allowed for canceling mining jobs in a fine-grained way based on input. I am leaning towards saying that the whole paradigm of canceling specific jobs as the producer of the inCh is not what we want.

Contributor

Yeah, get rid of input.Ctx; it doesn't seem useful.
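
Concretely, the simplification being suggested might look like this inside the loop quoted above, deriving each run's context from miningCtx rather than from the input (a sketch, not the final code):

case input, ok := <-mineInCh:
    if ok {
        currentRunCancel() // stop any in-flight mining run
        // Tie the run's lifetime to the scheduler's context, not the input's.
        currentRunCtx, currentRunCancel = context.WithCancel(miningCtx)
        // ... launch the mining goroutine with currentRunCtx as before ...
    }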

// PoSTs too fast.
defer func() {
if s.miningAt != epoch {
// log.Warningf("Not enough time to mine on epochs %d to %d", s.miningAt, epoch - 1)
Member

I would leave this warning in.

}
epoch += s.base.NullBlks
// No blocks found last epoch? Add a null block for this epoch's input.
if s.miningAt > epoch {
Member

this logic feels funky. I need to think about it a bit more

Contributor

At least one funky thing to me is mutating the input. I would expect the input to be const and to do bookkeeping elsewhere.

if err != nil {
panic("this can't be happening")
}
// Older epoch? Do nothing. TODO: is it secure that old heavier tipsets are ignored?
Member

We should always select the heaviest tipset, regardless of epoch. For example, if we're mining on top of a chain that has a ton of null blocks in it, from someone who is ignoring others in the chain, and then we get a block from the main chain that is much denser, we want to switch over to it.

Contributor Author

I strongly agree with the spirit of this comment but am left unsure how to proceed, both with regard to writing the scheduler logic as presented in #572 and with regard to resolving the attack motivating that issue. See my next comment for more detail.

if inEpoch == epoch {
s.base = input
}
if inEpoch > epoch {
Member

It feels a bit weird to be tracking the epoch like this. The epoch itself doesn't really matter; we should just always be mining on top of the heaviest known tipset.

Contributor Author

I agree it is weird; relying on this feels brittle and unsettling to me. Alternative: instead of collecting tipsets at the current epoch, the scheduler collects heaviest tipsets, then ignores as we do now. Now the scheduler always waits for mining to finish during the ignore stage, because it can't tell the difference between adversarial (same epoch) and non-adversarial (higher epoch) interruptions.

The downside I see is that now miners won't see non-adversarially delayed heaviest tipsets ASAP anymore. It seems like this increases the potential for nodes' clocks to drift. In this alternative strategy the node's delay period no longer begins near the delay period of other nodes. Once clocks start drifting, my sense (though I'm not so sure about this) is that the whole delay/collect paradigm is no longer that useful.

My general observation is that these timed block withholding / wasted work attacks remind me a lot of selfish mining and seem tricky to totally address by improving only this part of the system. Maybe we should be thinking about these attacks in the context of the whole system.

@ZenGround0 ZenGround0 Aug 20, 2018

It would also be useful to get a better sense of the timing guarantees behind PoSTs in order to evaluate and improve the design. For example, should we consider the case where a subset of miners can consistently accelerate their PoST calculation to finish x% faster, either someday in the future or even today? What is the typical proving-time standard deviation?

We should sync up on this stuff tomorrow!

Contributor

I don't understand the adversarial model; if it's something future code authors should consider, please comment it.

Contributor

More generally, yeah, of all the code here I find delay() most confusing. See my suggestion that we just axe it if it is, as it seems, just an optimization. We are sitting here talking about it when we could be doing other useful stuff.

@ZenGround0
Contributor Author

@whyrusleeping I agree with most of what you are saying, and these comments are really useful with regard to improving the scheduler. To be clear, though, I would like your input on the mining refactor pushed in the latest commit (basically everything except the scheduler).

Specifically:

@whyrusleeping do you find that this improves the problems you were having reading the old mining code? Any suggestions on doing things better from this perspective?

and

Are people sold on the Fall-Of-The-BlockGenerator premise?

Maybe you saw this and didn't have any complaints, but just wanted to make sure.

@phritz phritz left a comment

Are people sold on the Fall-Of-The-BlockGenerator premise?

Sure.

@phritz do you find that components are separated out cleanly enough that testing this is acceptably easy? Any suggestions on doing things better from this perspective?

I didn't get to reviewing the tests yet!


// The Scheduler listens for new heaviest TipSets and schedules mining work on
// these TipSets. The scheduler is ultimately responsible for informing the
// rest of the system about new successful blocks. This is the interface to
Contributor

Continued language nit around precise language: what is a "successful" block? I hit that and start to go looking for a "success" concept in the code; we have successful messages, is that what he means? So much clearer to just say: "The scheduler informs the rest of the system when the worker mines a new block."

//
// The default Scheduler implementation, timingScheduler, operates in two
// states, 'collect', where the scheduler listens for new heaviest tipsets to
// replace the mining base, and 'ignore', where mining proceeds uninterrupted.
Contributor

I think "collect" is now "grace"?

Also here you've said what the states are but not why or how they transition. Somewhere you should give a high-level description; it might as well be here because you've already started. I think in another 1-2 sentences you can define the terms epoch/grace/ignore/etc, say why we have these states, and say how they transition.

// Scheduler is the mining interface consumers use. When you Start() the
// scheduler it returns two channels (inCh, outCh) and a sync.WaitGroup:
// - inCh: the caller sends Inputs to mine on to this channel.
// - outCh: the worker sends Outputs to the caller on this channel.
Contributor

the scheduler?

// mineGrace is the protocol grace period
const mineGrace = time.Second

// mineSleepTime is the protocol's estimated mining time
Contributor

Say why we define this (so we can fake it).

// mineSleepTime is the protocol's estimated mining time
const mineSleepTime = mineGrace * 30

// mineDelay is the protocol mining delay at the beginning of an epoch
Contributor

Say why we have it.

// worker to cancel the context of the previous mining run if any and start
// mining on the new block. Any results are sent into its output channel.
// Closing the input channel does not cause the worker to stop; cancel
// the Input.Ctx to cancel an individual mining run or the mininCtx to
Contributor

I say axe input.ctx.

// worker to cancel the context of the previous mining run if any and start
// mining on the new block. Any results are sent into its output channel.
// Closing the input channel does not cause the worker to stop; cancel
// the Input.Ctx to cancel an individual mining run or the mininCtx to
Contributor

miningCtx

messagePool *core.MessagePool
applyMessages miningApplier
powerTable core.PowerTableView
blockstore blockstore.Blockstore
Contributor

Why does the mining code need the block and ipld store?

Contributor Author

This is because (1) the refactor puts block generation on the worker, and (2) the changes in #729 make powerTable reading depend on the ipld store and applyMessages depend on the blockstore.

mining/worker.go Outdated
TipSet core.TipSet
Ctx context.Context // TODO: we should evaluate if this is still useful
TipSet core.TipSet
NullBlks uint64
Contributor

I don't know if I'd put the null block count here. mining.Input is what the scheduler gets and this field I think will always be empty for the scheduler. The null block count is a function of the scheduler and so could be passed in separately. If you keep it here please put a comment on this saying that it will be set by the scheduler.

node.miningCtx, node.cancelMining = context.WithCancel(context.Background())
inCh, outCh, doneWg := node.MiningWorker.Start(node.miningCtx)
inCh, outCh, doneWg := node.MiningScheduler.Start(node.miningCtx)
node.miningInCh = inCh
node.miningDoneWg = doneWg
node.AddNewlyMinedBlock = node.addNewlyMinedBlock
Contributor

TODO(EC) on line 458 can be deleted

@ZenGround0
Contributor Author

ZenGround0 commented Aug 21, 2018

@phritz thanks for all of this review. Unfortunately it looks like I wasn't clear enough in Disclaimer # 1 in the request for review comment. I was trying to spare you from looking at the confusing, broken scheduler until I took another stab at it. In any case @whyrusleeping and I came to many similar conclusions and I think you will be a lot happier with the next iteration.

@ZenGround0 ZenGround0 force-pushed the feat/ec/8 branch 4 times, most recently from baad210 to 33a3e19 on August 22, 2018 15:38
func (s *timingScheduler) runWorker(miningCtx context.Context, outCh chan<- Output, mineInCh <-chan Input, doneWg *sync.WaitGroup) {
defer doneWg.Done()
currentRunCancel := func() {}
var currentRunCtx context.Context
Member

I don't see a need for keeping this up here; it seems like it's only relevant to the scope it's created in.

Contributor Author

I believe it is necessary for canceling existing mining runs when a new input arrives on the mineInCh.

Contributor Author

I am pretty sure we only want one goroutine mining at a time right now, and canceling this context seems like the cleanest way.

// collect initializes the next round of mining, canceling any previous mining
// calls still running. If the eager flag is set, collect starts mining right
// away, possibly starting and stopping multiple mining jobs.
func (s *timingScheduler) collect(miningCtx context.Context, inCh <-chan Input, mineInCh chan<- Input, eager bool) bool {
Member

what does the return value signify?

Contributor Author

Thanks. It determines whether we should end the scheduler (like the old version's end state); will update.

return false
case input, ok := <-inCh:
if !ok {
// sender closed inCh, close and ignore
Member

why do we not exit this loop here? Is the in channel being closed an expected thing?

@phritz
Contributor

phritz commented Aug 22, 2018

why do we not exit this loop here? Is the in channel being closed an expected thing?

When the caller shuts itself down it could tear things down in any order, eg close the channel and then cancel the context, or vice versa. It's accommodating that fact gracefully -- basically ignoring a channel close and shutting down only when the context cancels.
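
One common way to express that in Go (a sketch of the general idiom, not necessarily the PR's exact code): when inCh is reported closed, nil out the channel so its case stops firing, and let context cancellation remain the only shutdown path.

for {
    select {
    case <-miningCtx.Done():
        return false
    case input, ok := <-inCh:
        if !ok {
            // The sender closed inCh: ignore it from now on and keep waiting
            // for the context to be canceled. Receiving from a nil channel
            // blocks forever, so this case simply never fires again.
            inCh = nil
            continue
        }
        // ... handle the new heaviest TipSet input ...
        _ = input
    }
}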

@phritz phritz left a comment

Very nice, Wyatt. Way to go!

// The timingScheduler operates in two states, 'collect', where the scheduler
// listens for new heaviest tipsets to use as the best mining base, and 'ignore',
// where mining proceeds uninterrupted. The scheduler enters the 'collect' state
// each time a new heaviest tipset arrives with a greater height. The
Contributor

do you mean greater height here? or weight?

Contributor Author

height!

// where mining proceeds uninterrupted. The scheduler enters the 'collect' state
// each time a new heaviest tipset arrives with a greater height. The
// scheduler finishes the collect state after the mining delay time, a protocol
// parameter, has passed. The scheduler then enters the 'ignore' state. Here
Contributor

it's a protocol parameter in the sense that everyone has to respect it? seems doubtful right -- how do we enforce it?

@ZenGround0 ZenGround0 Aug 22, 2018

My understanding right now is that this is something that everyone honestly following the protocol does. Definitely not enforceable. Like all of our parameters the goal is to get incentive hacking just right so that it is in everyone's best interests to operate like this.

// or 'mining base' is used to denote the tipset that the miner uses as the
// parent of the block it attempts to generate during mining.
//
// The timingScheduler operates in two states, 'collect', where the scheduler
Contributor

Can you say something about the rationality of miners waiting the delay period? Why is it in their best interest to do that instead of mining right away (because then they can mine on an even heavier chain?).

Contributor Author

Yup, you've got it: they don't want to waste their time on the first tipset they see because another could show up real quick. I will add this.

// scheduler finishes the collect state after the mining delay time, a protocol
// parameter, has passed. The scheduler then enters the 'ignore' state. Here
// the scheduler mines, ignoring all inputs with the most recent and lower
// heights. 'ignore' concludes when the scheduler receives an input tipset
Contributor

Maybe for clarity point out that the newly arriving tipset of greater height could be the one it just mined.

Contributor Author

Great idea, that is definitely a subtle thing.


// mineDelay is the protocol mining delay. The timingScheduler waits for the
// mining delay during its 'collect' state.
const mineDelay = time.Millisecond * 20
Contributor

Wow, this is really tiny. Is 20ms a realistic time window? My RTT to Europe is 200ms. Perhaps it's set this small to keep tests running quickly? In any case, please comment on where the value came from.

Also, it seems like we'll need a way to easily set parameters of the system like this. At some point (not this CL) we will need to move settings like this into the config.

Contributor Author

Yup, right now this is for testing, and I will update the comment. Check out PR #779 for a first step towards getting parameters into the system.

// mineSleepTime is the estimated mining time. We define this so that we can
// fake mining with the current incomplete system. TODO this needs to be
// configurable to expediate both unit and large scale testing.
const mineSleepTime = mineDelay * 30
Contributor

Say units in the name. mineDelayMS or similar.

mining/worker.go Outdated
return <-outCh
// DefaultWorker runs a mining job.
type DefaultWorker struct {
createPoST DoSomeWorkFunc // TODO: rename createPoSTFunc?
Contributor

Yoda say: do or do not. There is no TODO maybe.

Contributor Author

Om Sri Bool

return st, nil
}

// Success case. TODO: this case isn't testing much. Testing w.Mine
Contributor

It's testing that the code runs and doesn't wait forever or deadlock or anything, which is something. But agree need more testing here.

@@ -279,3 +121,337 @@ func TestCreateChallenge(t *testing.T) {
assert.Equal(decoded, r)
}
}

// Returns a ticket checking function that return true every third time
/*func everyThirdWinningTicket() func(_ []byte, _, _ int64) bool {
Contributor

Do we need this? Commented out so you could use it in manual testing or something?

var mockSigner = types.NewMockSigner(ki)

func TestGenerate(t *testing.T) {
// TODO fritz use core.FakeActor for state/contract tests for generate:
Contributor

how about just TODO here :)

@whyrusleeping whyrusleeping left a comment

sounds good to me, let's go go go

@ZenGround0 ZenGround0 force-pushed the feat/ec/8 branch 3 times, most recently from 3ff1247 to f05e271 on August 23, 2018 01:54
This is an attempt to drastically simplify the timingScheduler.
Now the scheduler only has two states and only tracks a base input.
Null block incrementing is moved back to the worker. This scheduler has a few
idiosyncrasies but is simple to reason about and prevents mining processes
from getting interrupted midway through PoST generation. New scheduler tests
added.