Add initial WAL implementation and tests #3569
Conversation
55a4443 to 3c988b8
DefaultSegmentSize = 2 * 1024 * 1024 // 2MB, default size of a segment file in a partition
DefaultReadySeriesSize = 32 * 1024 // 32KB, series is ready to flush once it reaches this size
DefaultCompactionThreshold = 0.5 // flush and compact a partition once this ratio of keys are over the flush size
DefaultMaxSeriesSize = 2 * 1024 * 1024 // if any key in a WAL partition reaches this size, we'll force compaction of the partition
Can you put the comments above the identifiers? :)
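For illustration, here is how that constant block might look with the comments moved above the identifiers, as suggested; this is just a sketch, with the comment wording lightly paraphrased from the diff above:

```go
const (
	// DefaultSegmentSize is the default size (2MB) a segment file in a
	// partition may reach before a new segment file is opened.
	DefaultSegmentSize = 2 * 1024 * 1024

	// DefaultReadySeriesSize (32KB) is the size at which a series is
	// considered ready to flush.
	DefaultReadySeriesSize = 32 * 1024

	// DefaultCompactionThreshold is the ratio of keys over the flush size
	// at which a partition is flushed and compacted.
	DefaultCompactionThreshold = 0.5

	// DefaultMaxSeriesSize forces compaction of a WAL partition if any key
	// in it reaches this size.
	DefaultMaxSeriesSize = 2 * 1024 * 1024
)
```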
Took a first pass, mostly very small stuff. I don't have the full picture yet for this, but tried to look for common mistakes we make during coding, etc. Generally looks pretty clean to me though. Did have one real question about two
Updated to fix @otoolep's comment about logging. I'm going to merge this in so I can cut RC1, since it's important to get this out for people to test against. However, keep making comments on this PR and I'll address them in a separate one.
@jhorwit2 just cut 0.9.3-rc1. Interested in hearing how it performs for you: https://groups.google.com/forum/#!topic/influxdb/T6mF4_WeNkM
@pauldix the RPM link doesn't work :I I'll try to run some tests tomorrow.
@jhorwit2 -- please try the link below for the RPM: https://s3.amazonaws.com/influxdb/influxdb-0.9.3_rc1-1.x86_64.rpm
Thanks, are WAL flushes still supposed to cause latency in writes?
Decided to give it a quick shot. I am noticing that WAL flushes are sometimes taking a while (like 5+ seconds), which causes writes to time out with a 500. Besides that issue, I'm seeing better performance off the bat.
Also, I noticed the config has
@jhorwit2 you sure you're upgraded? What version does it say it is? WAL flushes shouldn't cause writes to time out at all. Writes should still happen while flush and compact is running. If that changed, then it has to be due to something I merged into the WAL branch earlier today, because I tested it extensively and definitely wasn't getting any delay due to WAL flush. The new location should have the WAL files, but it does a complete flush of a given partition if you haven't written data in 5 minutes. But there should still be directories in there for the partitions.
@jhorwit2 actually, if you upgraded, but have old data around, then you're probably still writing to a
I can confirm that the performance enhancement appears to be great. I'm currently writing 10k points/sec peak without any issues (it appears Influx is sweating way less than before) and have done so for a few hours. Before the change, it would stop working after ~30 min at the latest.
@kenzodeluxe What kind of hardware? Are you writing a single data point per request? I'd expect you to be able to get 10x that if you batch points together in requests of 1k to 10k points.
I did some preliminary testing this last weekend on your branch and was getting up to 50k/s with pretty standard points (a few tags and one float field value). This was on my home Linux box, so a 4-core, 16GB RAM, SSD machine. I'm updating our test box (32GB RAM, 16 cores, SSD) this morning and I'll run more tests. @pauldix you were right about using the old b1. I made a new database and things seem better.
@pauldix This is right now a VM running on a PowerEdge R720 with 4GB memory and 4 cores (for testing); the requests are batched, but we had reduced the number of points per batch based on our past experience (with the actual hardware and running natively). With 90k entries/batch, I see about 3s to write the data, which is great. Once we're done with our preliminary testing we'll move this over to actual hardware (and I can share further details then).
length := make([]byte, 8)
for {
	// get the length of the compressed seriesAndFields blob
	_, err := f.Read(length)
Read() can read less than len(length). I think you want to use io.ReadFull(f, length).
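A minimal, self-contained sketch of that suggestion, with io.ReadFull replacing the bare Read call; the function name and surrounding loop are hypothetical, not the PR's actual code:

```go
package walsketch

import (
	"io"
	"os"
)

// readLengths walks a segment file, reading the 8-byte length prefix of each
// compressed seriesAndFields blob. io.ReadFull keeps reading until the buffer
// is full, so a short read can't silently return a partial length.
func readLengths(f *os.File) error {
	length := make([]byte, 8)
	for {
		_, err := io.ReadFull(f, length)
		if err == io.EOF {
			return nil // clean end of the segment file
		}
		if err != nil {
			return err // includes io.ErrUnexpectedEOF for a truncated prefix
		}
		// ... decode `length` and read the blob of that many bytes here ...
	}
}
```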
fixed in 370f008
@pauldix - I'm seeing some really weird things with queries. I'm getting non-deterministic results from my select queries. I sometimes get back something super small like 15, and sometimes I get 2000+ (I expect this number). It appears to only be in pretty recent data, which in our use case isn't good because we use this for near real-time insights, but the last few minutes of data is spotty.
@pauldix - ran some tests with this script. Times are in milliseconds.
My rate was about ~18.40 requests per second over a minute, which at a batch of 5k is ~92k points per second. 0.9.2 definitely couldn't handle this scale even with a tweaked config. This was on my home box, which is a quad-core, 16GB RAM, SSD machine. (Didn't get a chance to spin up a server today; hopefully will tomorrow.) Here are 8k batches at the same request rate per second (so ~147k/s):
influxd was killed by OOM yesterday; I increased available memory to 64GB, and it has been running stable since then. Had 5 timeouts on writes that took longer than 5s in a time frame with heavy querying from Grafana. Without that load, a batch of 10k points usually takes ~250ms to process (according to the logs).
Posted this on the mailing list, re-posting here. In short, there are still write timeouts with many clients sending small-ish batches of metrics. To replicate the write timeouts, the following load test client command can be used with the influxdb graphite plugin listening on TCP port 2003. The graphite plugin batch size is 102400, though adjusting it makes no difference.

# 100 persistent concurrent connections sending batches of 10 metrics with no delay between batches
$ python graphite_test_client.py -c 100 -m 10 -d 0 127.0.0.1 3600

Effect on influxdb:

To install the load test client dependencies, run
Hope this helps.
@pkittenis Are your test results from 0.9.3-rc1 or another version of the database? |
@beckettsean - |
This implements a WAL that is backed by the filesystem and disconnected from BoltDB.
Partitions
The WAL is broken into different partitions. The default number of partitions is 5. Each partition consists of a number of segment files. By default these files will get up to 2MB in size before a new segment file is opened. The files are numbered and start at 1. The number indicates the order in which the files should be read on startup to ensure data is recovered in the same order it was written.
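As a rough illustration of the segment rotation described above, here is a minimal sketch; the partition type, field names, and file-name format are assumptions for illustration, not the PR's actual implementation:

```go
package walsketch

import (
	"fmt"
	"os"
	"path/filepath"
)

const segmentSize = 2 * 1024 * 1024 // mirrors the 2MB DefaultSegmentSize

// partition is a hypothetical stand-in for a WAL partition; the real type
// in this PR carries more state than this.
type partition struct {
	dir        string
	segmentNum int
	current    *os.File
}

// rollSegment opens the next numbered segment file once the current one has
// reached the segment size limit. Sequential numbering lets startup replay
// the files in the same order the data was written.
func (p *partition) rollSegment() error {
	if p.current != nil {
		info, err := p.current.Stat()
		if err != nil {
			return err
		}
		if info.Size() < segmentSize {
			return nil // current segment still has room
		}
		p.current.Close()
	}
	p.segmentNum++ // numbering starts at 1
	name := filepath.Join(p.dir, fmt.Sprintf("%05d.wal", p.segmentNum))
	f, err := os.OpenFile(name, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0666)
	if err != nil {
		return err
	}
	p.current = f
	return nil
}
```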
Partitions are flushed and compacted individually. One of the goals with having multiple partitions was to be able to flush only a portion of the WAL at a time.
Compaction
The WAL does not flush everything in a partition when it comes time. It will only flush series that are over a given threshold (32KB by default). Here's the order of steps for compaction:
The reason for having a flush cache and the regular cache is so that queries can be answered while a compaction is occurring. Also, writes can come into the series that are in the middle of being flushed. They will simply get written into the new segment file and added to the in-memory cache for the next flush.
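A hedged sketch of those two ideas together: flushing only the series whose pending bytes have crossed the ready-size threshold, and keeping a separate flush cache so queries and new writes keep working while the compaction runs. All type and method names here are illustrative, not the PR's actual ones:

```go
package walsketch

import "sync"

const readySeriesSize = 32 * 1024 // mirrors the default ready-to-flush series size

// cacheSketch is a hypothetical stand-in for a partition's in-memory cache.
type cacheSketch struct {
	mu         sync.RWMutex
	cache      map[string][][]byte // key -> pending values from new writes
	flushCache map[string][][]byte // values currently being compacted
}

// prepareFlush moves only the series that have crossed the ready-size
// threshold into the flush cache; smaller series stay in the live cache.
func (c *cacheSketch) prepareFlush() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.flushCache = make(map[string][][]byte)
	for key, vals := range c.cache {
		size := 0
		for _, v := range vals {
			size += len(v)
		}
		if size >= readySeriesSize {
			c.flushCache[key] = vals
			delete(c.cache, key) // writes arriving mid-flush build up for the next flush
		}
	}
}

// values answers a query by merging the flush cache (older data being
// compacted) with the live cache (writes that arrived during compaction).
func (c *cacheSketch) values(key string) [][]byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := append([][]byte{}, c.flushCache[key]...)
	return append(out, c.cache[key]...)
}
```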
Task List
Cursor
on them