Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fatal error: runtime: cannot allocate memory during write operation #4548

Closed
priyadarsh opened this issue Oct 23, 2015 · 15 comments
Closed

fatal error: runtime: cannot allocate memory during write operation #4548

priyadarsh opened this issue Oct 23, 2015 · 15 comments

Comments

@priyadarsh
Copy link

  1. pre built version number along with commit number - 0.9.5-nightly-66a31ee
  2. OS version- Linux 3.10.0-229.14.1.el7.x86_64
  3. ls -lah on data//default and wal//default folders respectively on node 1.
drwx------ 4 influxdb influxdb   34 Oct 22 12:20 .
drwx------ 3 influxdb influxdb   20 Oct 21 22:34 ..
drwxrwxr-x 2 influxdb influxdb 4.0K Oct 23 11:34 4
drwxrwxr-x 2 influxdb influxdb  102 Oct 23 10:33 5

drwx------ 4 influxdb influxdb 34 Oct 22 12:20 .
drwx------ 3 influxdb influxdb 28 Oct 21 22:34 ..
drwx------ 2 influxdb influxdb 10 Oct 21 22:34 4
drwx------ 2 influxdb influxdb 10 Oct 22 12:20 5
  1. gist with config - using generated config - https://gist.github.com/314312035ba49ae4b4fa.git
  2. gist with the log - https://gist.github.com/c435e086e0fda92bd35a.git
  3. using tsm engine
  4. Writing with a java client @ 1500/s
  5. Data comprises of irregular time series data from application
  6. Mention cluster setup - 3 node cluster set up on 3 separate machines with same OS and hardware configurations
  7. machine hardware
Linux 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

MemTotal:       98596756 kB

model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
model name  : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
  1. Viewed by grafana only
@priyadarsh
Copy link
Author

@priyadarsh
Copy link
Author

Runtime memstats of the node which is crashing
/debug/pprof/heap

https://gist.github.com/priyadarsh/22813b9e2698e83179dc#file-influxdb-heap-dump-node1

@priyadarsh
Copy link
Author

I tried again with latest nighlty build (0.9.5-nightly-827c513) and by enabling write trace.
Here is the log before the node crashed.

[tsm1wal] 2015/10/23 16:34:10 memory flush of /data/influxdb/.influxdb/data/travelguru/default/6 with 165675 keys and 328707 total values of 5259312 bytes
fatal error: runtime: cannot allocate memory

runtime stack:
runtime.throw(0xcf8540, 0x1f)
        /root/.gvm/gos/go1.5.1/src/runtime/panic.go:527 +0x90
runtime.persistentalloc1(0x4000, 0x8, 0x10ccb38, 0x1ada69)
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:878 +0x2e3
runtime.persistentalloc.func1()
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:831 +0x3b
runtime.systemstack(0x7f9d357f9c18)
        /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:278 +0xab
runtime.persistentalloc(0x4000, 0x0, 0x10ccb38, 0x1)
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:832 +0x58
runtime.fixAlloc_Alloc(0x10b9018, 0x1)
        /root/.gvm/gos/go1.5.1/src/runtime/mfixalloc.go:67 +0xee
runtime.mHeap_AllocSpanLocked(0x10acd00, 0x1, 0x7f9d03ffff00)
        /root/.gvm/gos/go1.5.1/src/runtime/mheap.go:561 +0x1a7
runtime.mHeap_Alloc_m(0x10acd00, 0x1, 0x2, 0x7f9d03ffff00)
        /root/.gvm/gos/go1.5.1/src/runtime/mheap.go:425 +0x1ac
runtime.mHeap_Alloc.func1()
        /root/.gvm/gos/go1.5.1/src/runtime/mheap.go:484 +0x41
runtime.systemstack(0x7f9d357f9d58)
        /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:278 +0xab
runtime.mHeap_Alloc(0x10acd00, 0x1, 0x10000000002, 0x411404)
        /root/.gvm/gos/go1.5.1/src/runtime/mheap.go:485 +0x63
runtime.mCentral_Grow(0x10b42e8, 0x0)
        /root/.gvm/gos/go1.5.1/src/runtime/mcentral.go:190 +0x93
runtime.mCentral_CacheSpan(0x10b42e8, 0x7f9d03ffff00)
        /root/.gvm/gos/go1.5.1/src/runtime/mcentral.go:86 +0x4d4
runtime.mCache_Refill(0x7f9d59273000, 0xc900000002, 0x7f9d03ffff00)
        /root/.gvm/gos/go1.5.1/src/runtime/mcache.go:118 +0xcf
runtime.mallocgc.func2()
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:611 +0x2b
runtime.systemstack(0xc820032a00)
        /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:262 +0x79
runtime.mstart()
        /root/.gvm/gos/go1.5.1/src/runtime/proc1.go:674

goroutine 2609 [running]:
runtime.systemstack_switch()
        /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:216 fp=0xc9fcd8cc88 sp=0xc9fcd8cc80
runtime.mallocgc(0x10, 0xa1cea0, 0xc900000000, 0xc9f41f3930)
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:612 +0x842 fp=0xc9fcd8cd58 sp=0xc9fcd8cc88
runtime.newobject(0xa1cea0, 0xc9fe18ad20)
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:760 +0x42 fp=0xc9fcd8cd80 sp=0xc9fcd8cd58
reflect.unsafe_New(0xa1cea0, 0xc9f41f3968)
        /root/.gvm/gos/go1.5.1/src/runtime/malloc.go:765 +0x21 fp=0xc9fcd8cd98 sp=0xc9fcd8cd80
reflect.Value.MapKeys(0xa193a0, 0xc987217bd0, 0xd5, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/reflect/value.go:1082 +0x1d4 fp=0xc9fcd8ce58 sp=0xc9fcd8cd98
encoding/json.(*mapEncoder).encode(0xc8200440e8, 0xcb1f5180b0, 0xa193a0, 0xc987217bd0, 0xd5, 0x1be15b00)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:609 +0xd6 fp=0xc9fcd8cf48 sp=0xc9fcd8ce58
encoding/json.(*mapEncoder).(encoding/json.encode)-fm(0xcb1f5180b0, 0xa193a0, 0xc987217bd0, 0xd5, 0x0)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:627 +0x51 fp=0xc9fcd8cf80 sp=0xc9fcd8cf48
encoding/json.(*structEncoder).encode(0xca55d0ccc0, 0xcb1f5180b0, 0xbb3080, 0xc987217bc0, 0xd9, 0x575100)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:582 +0x2c0 fp=0xc9fcd8d128 sp=0xc9fcd8cf80
encoding/json.(*structEncoder).(encoding/json.encode)-fm(0xcb1f5180b0, 0xbb3080, 0xc987217bc0, 0xd9, 0xc987217b00)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:596 +0x51 fp=0xc9fcd8d160 sp=0xc9fcd8d128
encoding/json.(*ptrEncoder).encode(0xc8200440f0, 0xcb1f5180b0, 0xbaa380, 0xc987217bc0, 0x16, 0x1be15a00)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:704 +0xea fp=0xc9fcd8d1b0 sp=0xc9fcd8d160
encoding/json.(*ptrEncoder).(encoding/json.encode)-fm(0xcb1f5180b0, 0xbaa380, 0xc987217bc0, 0x16, 0xc96b8fd700)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:709 +0x51 fp=0xc9fcd8d1e8 sp=0xc9fcd8d1b0
encoding/json.(*mapEncoder).encode(0xc8200441d8, 0xcb1f5180b0, 0xa17820, 0xc9eef0bf20, 0x15, 0x0)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:617 +0x344 fp=0xc9fcd8d2d8 sp=0xc9fcd8d1e8
encoding/json.(*mapEncoder).(encoding/json.encode)-fm(0xcb1f5180b0, 0xa17820, 0xc9eef0bf20, 0x15, 0xc9eef0bf00)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:627 +0x51 fp=0xc9fcd8d310 sp=0xc9fcd8d2d8
encoding/json.(*encodeState).reflectValue(0xcb1f5180b0, 0xa17820, 0xc9eef0bf20, 0x15)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:297 +0x6b fp=0xc9fcd8d340 sp=0xc9fcd8d310
encoding/json.(*encodeState).marshal(0xcb1f5180b0, 0xa17820, 0xc9eef0bf20, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:270 +0xa9 fp=0xc9fcd8d388 sp=0xc9fcd8d340
encoding/json.Marshal(0xa17820, 0xc9eef0bf20, 0x0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/encoding/json/encode.go:135 +0x83 fp=0xc9fcd8d3c0 sp=0xc9fcd8d388
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).writeSeries(0xc92a296360, 0xc9eef0bf20, 0x0, 0x0)
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1590 +0x61 fp=0xc9fcd8d510 sp=0xc9fcd8d3c0
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).writeNewSeries(0xc92a296360, 0xcb6d4ba000, 0xc3e1, 0xf000, 0x0, 0x0)
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1586 +0x14b fp=0xc9fcd8d580 sp=0xc9fcd8d510
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).convertKeysAndWriteMetadata(0xc92a296360, 0xc9ed6569f0, 0xcc335e0f30, 0xcb6d4ba000, 0xc3e1, 0xf000, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:816 +0x1a7 fp=0xc9fcd8d9e8 sp=0xc9fcd8d580
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write(0xc92a296360, 0xc9ed6569f0, 0xcc335e0f30, 0xcb6d4ba000, 0xc3e1, 0xf000, 0x0, 0x0)
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:401 +0x100 fp=0xc9fcd8db78 sp=0xc9fcd8d9e8
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).flush(0xcc18241600, 0x1, 0x0, 0x0)
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:635 +0xb0b fp=0xc9fcd8df08 sp=0xc9fcd8db78
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).addToCache.func1(0xcc18241600)
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:275 +0x31 fp=0xc9fcd8df88 sp=0xc9fcd8df08
runtime.goexit()
 /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc9fcd8df90 sp=0xc9fcd8df88
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).addToCache
        /tmp/tmp.kvLu2bJaVO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:278 +0x153

@priyadarsh
Copy link
Author

Is this something related to go 1.5.1 (golang/go#12233) ?

@otoolep
Copy link
Contributor

otoolep commented Oct 23, 2015

I have never seen this error before with our software.

@DanielMorsing -- any ideas?

@jwilder jwilder changed the title Influxdb process crashing during write operation fatal error: runtime: cannot allocate memory during write operation Oct 23, 2015
@priyadarsh
Copy link
Author

Looks like I found something. Influxdb process is using up all of vm.max_map_count whose default value is 65530. It is throwing "cannot allocate memory" error after hitting this limit.
First I doubled it but in vain. Then I bumped it to vm.max_map_count = 1580000. Now the influxdb process is not crashing but it is nearing this limit as well. After running for around 10hrs, the used counts on all three machines is as follows (grep -c ^Size /proc/processid/smaps).
721627
262732
966316

@priyadarsh
Copy link
Author

The influxdb appears to be stable now. The only query remains is why I had to increase the vm.max_map_count to such a high value?

@DanielMorsing
Copy link
Contributor

The output from pmap -X pidof influxd`` could be helpful in figuring out what is causing all the mmaps to be created.

@priyadarsh
Copy link
Author

@DanielMorsing here you go. Shared via drop box as it is to big to share through gist.
https://www.dropbox.com/s/0u8fggpo8sifbo1/pmap.out.gz?dl=0

@priyadarsh
Copy link
Author

@DanielMorsing please let me know if you need anything else or if you are facing any problem downloading the file.

@jwilder
Copy link
Contributor

jwilder commented Oct 28, 2015

There is a fix for go 1.5.2 that might resolve this issue: https://go-review.googlesource.com/#/c/15191

@jwilder
Copy link
Contributor

jwilder commented Oct 28, 2015

Not sure when 1.5.2 is planned for release, but a workaround might be to build with go 1.4.2.

@otoolep
Copy link
Contributor

otoolep commented Oct 28, 2015

https://github.com/golang/go/milestones -- October 31st is the goal for the release of 1.5.2, it seems.

@grobian
Copy link

grobian commented Oct 30, 2015

Just in case I can help to debug this, I experienced the same problem sending in a copy of our graphite metrics generated by diamond.

-2

I'm running the influxdb-0.9.4.2-1.x86_64.rpm.

@jsternberg
Copy link
Contributor

@priyadarsh we have now moved the nightlies to Go 1.5.2. We think that's the issue so this issue should be fixed. If you find that you're still running into this issue with the latest nightly, please reopen so we can continue investigating.

Thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants