This repository has been archived by the owner on Nov 30, 2019. It is now read-only.

Stop relying on time #4

Closed
paddycarver opened this issue Oct 22, 2012 · 2 comments

@paddycarver

Using real time in a distributed application is apparently a Very Bad Idea (tm).

Use Lamport timestamps or vector clocks for detecting race conditions.

Find some equivalent of CLOCK_MONOTONIC for Node.lastHeardFrom.

Fix the Cluster.sendToIP timers to not use real time.

Source: http://www.reddit.com/r/programming/comments/11sgc7/pastry_a_distributed_hash_table_in_go/c6pagbn

@paddycarver
Author

Looks like the best bet for resolving conflicts when doing routing table updates is to use http://labix.org/govclock. It needs to be brought up to date to work with Go 1, and I need to find a way to send that patch to Gustavo to get it merged in. It seems simple enough (I have a quick and dirty version working now) but there are some unexplained discrepancies (an unknown Bug(string, interface{}) method in tests, mainly) that I'd like to look into a bit more before I feel confident it works as expected. Also need to decide what needs versioning in the vector clock--I'm tempted to say the state table changes for the node sending the state tables, but that could use a bit more thought.
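To make the conflict-detection idea concrete, here is a minimal vector clock sketch. It is not govclock's API; the `VClock` type, its methods, and the node names are all hypothetical, and it assumes each node counts its own state-table changes.

```go
package main

import "fmt"

// VClock is a hypothetical vector clock: node ID -> number of
// state-table changes observed from that node. Not govclock's API.
type VClock map[string]uint64

// Tick records one local change made by the given node.
func (v VClock) Tick(node string) { v[node]++ }

// Merge folds another clock into v by taking the per-node maximum.
func (v VClock) Merge(other VClock) {
	for node, n := range other {
		if n > v[node] {
			v[node] = n
		}
	}
}

// Order describes how two clocks relate to each other.
type Order int

const (
	Equal Order = iota
	Before
	After
	Concurrent
)

// Compare reports whether v is equal to, before, after, or
// concurrent with other. Concurrent means a conflicting update.
func (v VClock) Compare(other VClock) Order {
	less, greater := false, false
	for node, n := range v {
		m := other[node] // zero if other has never seen this node
		if n < m {
			less = true
		} else if n > m {
			greater = true
		}
	}
	for node, m := range other {
		if _, ok := v[node]; !ok && m > 0 {
			less = true // other has seen a node that v has not
		}
	}
	switch {
	case less && greater:
		return Concurrent
	case less:
		return Before
	case greater:
		return After
	}
	return Equal
}

func main() {
	a, b := VClock{}, VClock{}
	a.Tick("nodeA") // nodeA changes its state table
	b.Merge(a)      // nodeB receives that state table
	b.Tick("nodeB") // nodeB then makes its own change
	fmt.Println(a.Compare(b) == Before)

	a.Tick("nodeA") // nodeA changes again without hearing from nodeB
	fmt.Println(a.Compare(b) == Concurrent)
}
```

An update whose clock compares as `Concurrent` with the local one is the race worth flagging; `Before` and `After` can be resolved by simply keeping the newer table.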

Because lastHeardFrom doesn't actually impact the performance of Wendy (it's provided as a helper method for applications built on Wendy, etc.), I'm not sure the pain to implement a monotonic clock is worth the dubious gains. The method mainly exists for debugging purposes: "gee, I haven't heard from Node X in two days, that's probably why I'm not getting messages from it--it fell off the face of the earth." Worth fixing eventually, perhaps, but not a pressing concern.
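For what it's worth, Go releases from 1.9 onward attach a monotonic reading to values returned by `time.Now()`, and `time.Since` and `Time.Sub` use it when both sides have one. That was not available when this issue was filed. A rough sketch of a "last heard from" helper under that assumption follows; the `peer` type and its methods are hypothetical and are not Wendy's `Node`.

```go
package main

import (
	"fmt"
	"time"
)

// peer is a hypothetical stand-in for a node record.
type peer struct {
	lastHeardFrom time.Time // keeps the monotonic reading from time.Now()
}

// touch records that a message from this peer was just received.
func (p *peer) touch() { p.lastHeardFrom = time.Now() }

// silentFor returns how long it has been since the peer was heard from.
// With Go 1.9+, time.Since uses the monotonic reading, so a wall-clock
// jump between touch and silentFor does not skew the result.
func (p *peer) silentFor() time.Duration {
	return time.Since(p.lastHeardFrom)
}

func main() {
	p := &peer{}
	p.touch()
	time.Sleep(10 * time.Millisecond)
	fmt.Println("silent for", p.silentFor())
}
```

One caveat: the monotonic reading is dropped when a `time.Time` is serialized or passed through `Round(0)`, so this only helps for elapsed-time checks made inside a single process.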

Likewise, I'm not convinced the timers need to stop using real time. Using real time for timeouts is standard Go practice, and I think the cases in which it will be problematic are fairly limited. The only one I can think of is a clock jump occurring while a message is in transit, and the worst-case scenario there is that 1) you get an error (not good, but the algorithm should be built to be error-resilient) or 2) a request takes longer than its timeout should allow, which probably isn't the end of the world. Again, definitely a bug, but I'm not sure it's a high enough priority to focus on now.

The vector clocks, however, should be implemented.

@paddycarver
Author

This has been resolved as of the beta1 release. While timeout detection still depends on the local clock, that's the standard Go timeout practice, so I'm going to stick with it. That is the only place the system clock is used now.
