Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time drift on MacOS #2076

Closed
oojacoboo opened this issue Sep 20, 2017 · 57 comments
Closed

Time drift on MacOS #2076

oojacoboo opened this issue Sep 20, 2017 · 57 comments

Comments

@oojacoboo
Copy link

Version 17.06.2-ce-mac27 (19124)

Can confirm that there is time drift on the Docker client for Mac. I was able to confirm that restarting the Docker client corrects the time issue. Issue is present across all docker containers and docker run busybox date.

There are quite a few reports of this on previous versions. And it was supposedly fixed, but it is not.

The time being off here was only about a minute prior to restarting the Docker client. I cannot say how far it would have been off had the host been put to sleep. In fact, that may have triggered an update and corrected the issue.

It seems the VM clock emulation is just not accurate. Is the VM not using NTP? Can we not have a cron re-sync regularly, even without a Docker client interruption?

@djs55
Copy link
Contributor

djs55 commented Sep 21, 2017

Time is currently set by 2 mechanisms:

  • NTP via chronyd is supposed to keep things in sync with the host
  • when the host wakes from sleep -- and the clock is too out-of-sync for chronyd to correct -- there's an explicit hwclock -s

I'm not sure what's allowing it to drift in your case. Perhaps the machine wakes, and runs hwclock -s but chronyd has already given up trying to sync the clocks?

When next in this state, could you log into the VM with:

docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/sh

and then query the NTP state with chronyc e.g.

/ # chronyc activity
200 OK
1 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address

/ # chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* 192.168.65.1                  2   6   377    53    +71us[ +120us] +/-   32ms

@oojacoboo
Copy link
Author

oojacoboo commented Sep 21, 2017

@djs55 sure - will do. Also, please note that my system never went to sleep. And, I know this to be the case because I just installed OS X 10.13 the day before. But, I also rarely put the system to sleep in general.

@jpz
Copy link

jpz commented Oct 5, 2017

I'm also seeing this - output captured below:

% docker --version
Docker version 17.09.0-ce, build afdb6d4
% docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/sh
/ # chronyc activity
200 OK
1 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address
/ # chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.65.1                  0   9     0     -     +0ns[   +0ns] +/-    0ns

27secs of drift:

%docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/date ; date
Thu Oct  5 22:53:50 UTC 2017
Fri  6 Oct 2017 09:54:17 AEDT

@ddollar
Copy link

ddollar commented Oct 26, 2017

Seeing this as well

$ docker --version
Docker version 17.09.0-ce, build afdb6d4

$ date
Thu Oct 26 11:57:14 EDT 2017

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/date
Thu Oct 26 16:01:25 UTC 2017

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /usr/bin/chronyc activity
200 OK
1 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /usr/bin/chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.65.1                  0   5     0     -     +0ns[   +0ns] +/-    0ns

@felixbuenemann
Copy link

I'm also seeing considerable drift of several minutes on macOS 10.13.1 (17B46a). My system is always on, I only set the screens to sleep when I'm away from the computer.

$ docker --version
Docker version 17.09.0-ce, build afdb6d4

$ uname -a
Darwin couchbook.local 17.2.0 Darwin Kernel Version 17.2.0: Sun Oct  1 00:46:50 PDT 2017; root:xnu-4570.20.62~10/RELEASE_X86_64 x86_64

$ last reboot | head -n1
reboot    ~                         Sat Oct 28 20:24

$ uptime
16:17  up 5 days, 20:53, 1 user, load averages: 2,92 3,54 3,87

$ date -u ; docker run busybox date
Fr  3 Nov 2017 15:20:43 UTC
Fri Nov  3 15:26:26 UTC 2017

$ date -u; docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /sbin/hwclock -r
Fr  3 Nov 2017 15:25:11 UTC
Fri Nov  3 15:25:12 2017  0.000000 seconds

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /usr/bin/chronyc activity
200 OK
1 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /usr/bin/chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.65.1                  0  10     0     -     +0ns[   +0ns] +/-    0ns

I also noticed that the chrony drift file can't be written because of a missing directory:

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/sh

# cat /etc/chrony/chrony.conf
server 192.168.65.1 trust iburst minpoll 2
makestep 0.5 -1
keyfile /etc/chrony/chrony.keys
driftfile /var/lib/chrony/chrony.drift
cmdport 0

# stat /var/lib/chrony/chrony.drift
stat: can't stat '/var/lib/chrony/chrony.drift': No such file or directory

# ls /var/lib
dhcpcd  docker  misc

This probably doesn't matter much, since it only affects initial clock drift correction after chronyd restarts.

One interesting thing to note though is that the hwclock in the VM is not drifting, but the system time is drifting from RTC. This indicates that the clock provider is doing a bad job.

Currently it is using the TSC time source, it might make sense to experiment with other providers like hpet or acpi_pm. There are various post on the web with TSC causing trouble on HyperV, os it is likely it affects other hypervisors as well.

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/sh

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

I'll try experimenting with hpet first and see if that helps and try acpi_pm next.

The clocksource can be changed at runtime:

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/sh

# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

If your VMs system time currently drifts from the rtc, sync it back after switching the clocksource:

# hwclock --hctosys

I found lots of useful info under the following links:

@OscarBarrett
Copy link

Same here, with a drift of 16 minutes. Fixed by restarting Docker.

$ docker --version                                                                                                                                                               
Docker version 17.09.0-ce, build afdb6d4

$ docker run --rm --net=host --pid=host --privileged -it justincormack/nsenter1 /bin/sh
/ # chronyc activity
200 OK
1 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address

/ # chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.65.1                  0  10     0     -     +0ns[   +0ns] +/-    0ns

/ # date
Thu Nov 23 02:35:33 UTC 2017
/# exit

$ date                                                                                                                                                                           
Thu 23 Nov 2017 10:19:57 AWST

@felixbuenemann
Copy link

Btw. none of the available clocksources fixed the time drift problem for me.

@ggascoigne
Copy link

I'm seeing a pretty consistent drift of about 1-2ms per second. This is pretty consistent. Resetting the clock time just changes the time, it doesn't address the drift.

@adrienjoly
Copy link

Same problem on Version 17.09.1-ce-mac42 (21090), Channel: stable, 3176a6af01.

My PostgreSQL container's clock gains several seconds in the future every time I wake up my MacBook Pro from sleep. I love Docker, but having to restart it so often is really annoying...

Let me know if I can help in any way.

@buchs
Copy link

buchs commented Dec 23, 2017

@adrienjoly - here is a tip: docker run --rm --privileged alpine hwclock -s will reset the time in the docker server. No need to stop and restart.

@adrienjoly
Copy link

@buchs Thanks for the tip! But isn't there a risk of altering data integrity of my PostgreSQL database running on that container, if I change the time without rebooting?

@kbuchsicf
Copy link

Hmm, interesting question. If you are timestamping entries, and you turn the docker server clock back, you might have issues. What else can you do in a database to cause problems? Constraints based on time? Maybe the log would turn out funny, if you needed to do a recovery. It all depends on what and how you use the database, I suppose.

@adrienjoly
Copy link

You're right, changing the time will cause wrong timestamps in our data. That's not that bad for a development environment.
I was mostly worried about PostgreSQL's internal behavior when changing the time back to the past. (i.e. impact on caching, handling of atomic transactions, etc...)

@hughsw
Copy link

hughsw commented Jan 16, 2018

I'm seeing the same issue using a Python image. Note that some servers refuse to authenticate my client due to the time disparity! At present a 13 second disparity:

$ docker --version
Docker version 17.12.0-ce, build c97c6d6

$ docker run -it --rm python:3-jessie python -c 'import time;print(time.time())' && python3 -c 'import time;print(time.time())'
1516141290.7820508
1516141277.759674

The hwclock workaround does work for me.

@ntodd
Copy link

ntodd commented Jan 20, 2018

My clock is drifting roughly 2.5 seconds every minute. hwclock can sync it back up, but resetting, reinstalling, nothing seems to resolve the persistent drift.

$ docker --version
Docker version 17.12.0-ce, build c97c6d6

@adrienjoly
Copy link

Here is the bash script I use on my Mac to restart Docker when needed:

echo "Stopping Docker containers..."
docker-compose down

echo "Stopping Docker gracefully..."
test -z "$(docker ps -q 2>/dev/null)" && osascript -e 'quit app "Docker"'

echo "Starting Docker..."
open --background -a Docker

echo "Waiting 30 seconds..."
sleep 30

echo "Starting Docker containers..."
docker-compose up -d

@thinkspill
Copy link

Also seeing very heavy drift.

Docker version 18.01.0-ce, build 03596f5
High Sierra 10.13.2 (17C2120)

@ddollar
Copy link

ddollar commented Jan 24, 2018

Recent versions of Docker for Mac have removed easy access to the "restart" button which was useful for fixing this

@stromnet
Copy link

stromnet commented Jan 25, 2018

After an upgrade from 17.09.0-ce-mac35 (19611) to 17.12.0-ce-mac49 (21995) yesterday, I've started noticing heavy drifting on Mac OS 10.11.6. The only other change was that the host machine was rebooted just before the upgrade (but no OS patches applied).

I'm developing a time sensitive application which immediately starts triggering protection mechanisms if time is more than 5 seconds away from an external source.
Today it has happened twice that this has kicked in. It has never happened before (for several versions back).

It could be fixed by running hwclock -s as shown in earlier posts, but this absolutely seems like some kind of regression.

Edit: First noticed it ~09:25 today, at which time it had ~30s drift, corrected with hwclock -s Now at ~14:30 (approx 5h) it had accumulated another 6s of drift. Machine has been on all the time, docker VM as well. Containers have been restarted & recreated a few times.

@thinkspill
Copy link

I upgraded another computer that previously wasn't showing drift to 17.12.0-ce-mac, and then it too started showing drift.

It's drifting in a really strange way too. Try this:

  1. Sync your clock with docker run --rm --privileged alpine hwclock -s
  2. Open two terminal windows side by side, one on host, one in any container.
  3. Run this in each terminal, starting them as closely together as possible:
while true; do date; sleep 10; done
  1. Initially you should see each date output at the same time, with matching times, within a second or so, depending on how quickly you started each command.

  2. On my system, here's where it gets weird. The outputs will slowly become out-of-sync as to when the output appears, but the date/time that is output always matches!

For example, after a while my host will output: Thu Jan 25 10:00:24 PST 2018 ... and then some seconds later the container will output: Thu Jan 25 10:00:24 PST 2018

Same times, but the container is delayed ever longer as time goes on. The times will always match, but the container falls further and further into the past.

It seems like time is moving more slowly in Containerlandia.

@hughsw
Copy link

hughsw commented Jan 25, 2018

I've attached a bash script, dfm-drift.txt that demonstrates the problem on my Mac. The script uses two Python processes to demonstrate the problem by interleaving timestamps from the Mac host and Docker guest.

$ docker --version
Docker version 17.12.0-ce, build c97c6d6

$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.13.3
BuildVersion:   17D47

Here's most of the commentary from the script. This commentary includes actual output from my MacBook Pro showing at least a second of drift within an hour of the synchronization step.

# This script starts two processes, one on the host and one in a
# Docker container.  Each of these prints out a timestamp each second,
# beginning shortly after the process began.  Each process endeavors to
# print the timestamp immediately after the Unix clock has rolled over
# a second.  The fractional part of each timestamp indicates the
# inevitable, but irrelevant small delay between waking up and
# sampling the clock.
#
# What am I looking for?
#
# In the beginning, the timestamps will come in interleaved pairs with
# the same whole seconds, e.g.:
#
#   $ bash dfm-drift.txt
#
#   Docker for Mac drift demonstrator.
#   See: https://github.com/docker/for-mac/issues/2076
#
#   host : 1.005
#   guest: 1.005
#   host : 2.000
#   guest: 2.002
#   host : 3.001
#   guest: 3.004
#   host : 4.001
#   guest: 4.002
#   host : 5.001
#   guest: 5.002
#
# If there is significant drift accumulating between the clocks,
# eventually the same whole second timestamps will not come in pairs,
# e.g.
#   host : 3597.002
#   guest: 3599.006
#   host : 3598.005
#   guest: 3600.003
#   host : 3599.005
#   guest: 3601.005
#   host : 3600.001
#   guest: 3602.006
#   host : 3601.003
#   guest: 3603.002
#
# In the above you can see that guest printed out 3600.003 more than a
# second before host printed out 3600.001

Perhaps a simpler way to say that is to note that guest emitted timestamp 3601.005 before host emitted 3600.001, so guest is at least a second ahead of host.

Here's the script, so named so that I could attach it: dfm-drift.txt
Look it over and run as:

$ bash dfm-drift.txt

@thinkspill
Copy link

thinkspill commented Jan 25, 2018

@hughsw I'm seeing something similar, when I add an iteration count so I can match the timestamps:

export i=0 && while ((i++)); :; do echo $i; date; sleep 10; done

I'm seeing two weird things:

  1. The container iteration will eventually start printing before the host iteration prints.
  2. The container time will be drifting behind the host time.

Like so:

Container iteration 1: Thu Jan 25 12:48:09 PST 2018
(no delay)
Host iteration 1: Thu Jan 25 12:48:09 PST 2018

....time passes....

Container iteration 640: Thu Jan 25 14:34:11 PST 2018
[......delay......]
Host on iteration 640: Thu Jan 25 14:34:14 PST 2018

Watching the system clock, the host is outputting at the correct time.

@hughsw
Copy link

hughsw commented Jan 25, 2018

@thinkspill Your script doesn't execute exactly every 10 seconds, so you have to be very careful about drawing conclusions that may assume that it does execute exactly every 10 seconds. That is, the wall-clock time it takes for the loop's scripting logic to execute and for date to run is not controlled. And, the semantics of sleep are very coarse (and may be the dominant noise). So, I don't think your point (2) is a valid conclusion.

I wrote the Python script in my test above to wait until the Unix clock rolls over each second before sampling the clock and emitting it. This makes it a little easier to draw conclusions about the drift.

@stromnet
Copy link

stromnet commented Jan 26, 2018

Another test:

#!/bin/sh
#set -x

[ -x /usr/sbin/ntpdate ] || yum install -y ntpdate
[ -x /usr/bin/bc ] || yum install -y bc
NTP_SERVER=${1:-pool.ntp.org}

hwclock -s
INIT_TS=$(date +%s%3N)
while true; do 
	NOW=$(date +%s%3N)
	DIFF=$((NOW-INIT_TS))
	echo "since initial sync: ${DIFF}ms. Comparing:"
	ntpdate -qs $NTP_SERVER
	NOW2=$(date +%s%3N)
	SLEEP=$(bc -l <<< "(60000 - ($NOW2 - $NOW))/1000.0")
	sleep $SLEEP
done

Executed in a docker run --rm -it --privileged centos:7. It has been running for a few hours now:

[root@cac530f72970 /]# ./clock.sh
since initial sync: 1ms. Comparing:
server 62.242.89.196, stratum 1, offset 0.061091, delay 0.03651
server 82.100.248.10, stratum 2, offset 0.058634, delay 0.05911
server 91.206.8.36, stratum 2, offset 0.062463, delay 0.06264
server 188.227.227.31, stratum 3, offset 0.062852, delay 0.06487
since initial sync: 60004ms. Comparing:
server 195.22.17.7, stratum 3, offset 0.020723, delay 0.08838
server 178.218.172.164, stratum 2, offset 0.018149, delay 0.08115
server 131.188.3.221, stratum 1, offset 0.020063, delay 0.05389
server 91.192.36.161, stratum 2, offset 0.019926, delay 0.04819
...
since initial sync: 6060384ms. Comparing:
server 91.206.8.36, stratum 2, offset -1.893754, delay 0.06270
server 51.15.2.109, stratum 2, offset -1.889942, delay 0.04703
server 37.59.27.22, stratum 2, offset -1.890578, delay 0.06775
server 196.10.52.57, stratum 2, offset -1.889525, delay 0.23308
...
since initial sync: 9960620ms. Comparing:
server 46.175.224.7, stratum 3, offset -3.028879, delay 0.06477
server 27.124.125.251, stratum 2, offset -3.028504, delay 0.06219
server 217.147.208.1, stratum 2, offset -3.028060, delay 0.05672
server 88.198.35.53, stratum 2, offset -3.028581, delay 0.05347

So after 101 minutes, it's -1.88s off
After 166 minutes, -3.02s.
Will keep it running during the day

Edit:
after 10.4 hours: -12s off.

since initial sync: 37562227ms. Comparing:
server 91.121.154.174, stratum 2, offset -12.843266, delay 0.05824
server 176.9.144.121, stratum 2, offset -12.845264, delay 0.05408
server 91.189.182.32, stratum 2, offset -12.846145, delay 0.04349
server 82.78.227.6, stratum 2, offset -12.849203, delay 0.09723

Edit 2:
Kept running during weekend. After approx 27 hours:

since initial sync: 96965600ms. Comparing:
server 62.149.0.30, stratum 1, offset -33.038150, delay 0.09132

Then, for some reason, it started to log 0.00000s offset.. until it after a while started spewing out ./clock.sh: line 16: cannot create temp file for here-document: No space left on device. And after yet a while, Docker daemon crashed, which I discovered now after weekend.
The system.log has a bunch of Docker[905]: Hostnet_udp.input: bind raised Host.Sockets.Too_many_connections, not sure if it is related or not though.

@xoba
Copy link

xoba commented May 14, 2018

@djs55, i'm trying it now, thanks!

@xoba
Copy link

xoba commented May 15, 2018

@djs55 so far so good on development build mentioned above, will try out some other use cases today just to see how broadly functional this fix is. thanks again!

@oojacoboo
Copy link
Author

@djs55 looking good here, running for about 24 hours now.

[jacob:/api]$ docker exec api-php date && date                                   (development✱)
Tue May 15 18:57:26 UTC 2018
Tue May 15 14:57:26 EDT 2018

@xoba
Copy link

xoba commented May 16, 2018

i would say this solves it for me, looking forward to when it lands in the official release, thanks @djs55! best, mike.

@timkeeler
Copy link

Is this slated for this quarter's stable release?

@djs55
Copy link
Contributor

djs55 commented Jun 8, 2018

@timkeeler the fix should be in edge 18.05.0-ce-mac66 and later and should also be in the next stable. Hope this helps!

@tmack8001
Copy link

I hope so and how much this is a problem for me might just jump to a unstable version to get around these issues.

@oojacoboo
Copy link
Author

@tmack8001 been running this for about 3 weeks or so now and it seems to have fully resolved the issue. I'm going ahead and closing this.

@ThatLarsGuy
Copy link

Any idea when this fix is going to hit a stable release? Right now seems to be the longest period in Docker For Mac's history without a stable release!

@djs55
Copy link
Contributor

djs55 commented Jul 9, 2018

@ThatLarsGuy the next stable Desktop release should coincide with the 18.06 engine release. The edge channel currently has 18.06.0-ce-rc1 but this isn't the final release candidate. So it should be soon, depending on whether serious bugs are found in the next release candidate(s).

@eshork
Copy link

eshork commented Jul 17, 2018

Not to kick a dead horse (but also, I will, because why else would I be on this thread). This timing bug is a bit long-lived at this point; seriously, it is years long and there's issue history to back that up. Let's get it released already! 🚀

@eshork
Copy link

eshork commented Jul 18, 2018

This is enough of a problem for me that I ended up making a docker image to address it until a real fix gets released. Still in the middle of testing it myself, but so far it seems to work well for me: https://hub.docker.com/r/eshork/docker-mac-chrony/

It's a really simple solution. I'm honestly not sure why this doesn't already exist somewhere else. Perhaps I simply didn't stumble across the appropriate google result...

Either way, if it helps one other person, that makes it worth while.

@shadowspawn
Copy link

Release notes for Docker for Mac stable release, version 18.06.0-ce-mac70 (26399) say

Bug fixes and minor changes:

  • Use Simple NTP to minimise clock drift between the VM and the host. Fixes docker/for-mac#2076

@djs55
Copy link
Contributor

djs55 commented Jul 26, 2018

@JohnRGee thanks for highlighting this -- the fix has indeed been released on both stable and edge channels. Hopefully this is fixed for everyone. If not, please do open a fresh ticket with a set of diagnostics.

Thanks all for your patience!

@docker-robott
Copy link
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jun 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests