
Memory usage after select query #7607

Closed
RidgeA opened this issue Nov 8, 2016 · 6 comments

RidgeA commented Nov 8, 2016

Hello.
I have databases from an older version (we used 0.9). There are 2 databases; their total size in the old format (b1 engine) was about 10 GB.
I converted these databases to the TSM engine format (using the influx_tsm tool) and their size decreased to about 2 GB (is that normal?).
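For reference, the conversion command was something along these lines (the backup path and data directory here are placeholders, not my real paths):

influx_tsm -backup /path/to/b1-backup /var/lib/influxdb/data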

After the conversion, the InfluxDB service starts properly and can receive INSERT requests (at least according to the logs).

But after any SELECT request the service gets stuck and memory usage starts to climb (the first time I saw it, the influxdb service occupied about 150 GB of memory).

InfluxDB version - 1.0.2
OS - CentOS 7
Server has 32 cores and 128 GB RAM
Config:
influxdb.conf.txt

Can somebody explain the reason for this behaviour?

Thank you!

This is after the server started:
goroutine_start.txt
heap_start.txt

This is after sending a SELECT request:
goroutine.txt
goroutine_2.txt
heap.txt
heap_2.txt

joshuajoh commented:

This seems similar to the issue I'm experiencing, though my hardware is 1/4 of yours.

Querying a database with a series count of around 40-50K causes all memory to eventually be consumed, even though there should be no issues below 1 million series.

Out of curiosity, how many series are in the database you are querying?

SELECT numSeries FROM "_internal".."database" WHERE time > now() - 10s GROUP BY "database" ORDER BY desc LIMIT 1
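For reference, that check can also be run non-interactively with the influx CLI, e.g.:

influx -execute 'SELECT numSeries FROM "_internal".."database" WHERE time > now() - 10s GROUP BY "database" ORDER BY desc LIMIT 1'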


jwilder commented Nov 8, 2016

@RidgeA TSM has much better compression than the older b1/bz1 engines so a reduction in size is expected.

The heap profiles show the process consuming about 12GB of RSS. How were you determining memory usage?
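For reference, the RSS reported by the kernel can be cross-checked with something like the following (influxd is assumed to be the process name; rss is reported in kilobytes):

ps -C influxd -o pid,rss,vsz,comm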

Can you run the following when the heap is large and attach the output in a gist:

curl -o block.txt "http://localhost:8086/debug/pprof/block?debug=1" 
curl -o goroutine.txt "http://localhost:8086/debug/pprof/goroutine?debug=1" 
curl -o heap.txt "http://localhost:8086/debug/pprof/heap?debug=1" 
curl -o vars.txt "http://localhost:8086/debug/vars" 
iostat -xd 1 30 > iostat.txt
influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt


RidgeA commented Nov 9, 2016

@joshuajoh

> SELECT numSeries FROM "_internal".."database" WHERE time > now() - 10s GROUP BY "database" ORDER BY desc LIMIT 1
name: database
tags: database=positions_history
time            numSeries
----            ---------
1478674720000000000 231088

name: database
tags: database=main_page_performance
time            numSeries
----            ---------
1478674720000000000 0

name: database
tags: database=advertisementsHistory
time            numSeries
----            ---------
1478674720000000000 559811

name: database
tags: database=_internal
time            numSeries
----            ---------
1478674720000000000 734

I have 560k series in the database I'm trying to query.
Is that a big quantity?
I have only one tag on each record - it is a unique id from MySQL. According to my retention policy, records should be deleted after 6 months. But after reading your comment I found #5092 and learned that series are not dropped from the index even after all of their records have been deleted by the retention policy.

@jwilder

> The heap profiles show the process consuming about 12GB of RSS. How were you determining memory usage?

htop

Before starting the influxdb service:
_029
After starting the influxdb service:
_030
After the query (I had to stop the service because it would take all memory and cause trouble on my server):
_031

> Can you run the following when the heap is large and attach the output in a gist:

After starting the service:
start.tar.gz
After the query:
query.tar.gz

From what I have read, a large number of series could cause this problem.
So I have 2 questions:

  1. Can I somehow optimize the database for my use case? I store the history of changes to advertisements in InfluxDB, and I need to quickly look up the records related to an advertisement by its id.
  2. Can I somehow export all of the data from InfluxDB to CSV, JSON, or a similar format, so I can store it in another database? (See the example below.)
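For question 2, I mean something along these lines, assuming the influx CLI's -format flag can do this (the measurement name and time range here are just placeholders):

influx -database advertisementsHistory -execute 'SELECT * FROM "history" WHERE time > now() - 1d' -format csv > export.csv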

UPD.
Can I drop only a series from a measurement and keep its data?
When I try DROP SERIES FROM, it drops all my data.
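To be concrete, what I tried looks roughly like this (the measurement and tag value are placeholders for my real schema); if I read the docs correctly, the second form limits the drop to one series but still deletes that series' points:

DROP SERIES FROM "history"
DROP SERIES FROM "history" WHERE "id" = '123456'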


jwilder commented Nov 17, 2016

Can you try the 1.1 release? There are several memory improvements in that release that may help with your issue.


RidgeA commented Nov 29, 2016

Hello. Sorry for the late response.
I have tried the 1.1 release, and it is much better.
Now I'm able to query some data, but only over a small time interval - 1 day, for example.
When I request a 10-day interval, it again consumes all available memory (plus swap).
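To illustrate, a bounded query like the following works (the measurement and id value are placeholders):

SELECT * FROM "history" WHERE "id" = '123456' AND time > now() - 1d

but widening the range to time > now() - 10d is what exhausts memory.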


jwilder commented Apr 6, 2017

1.2 has additional query memory improvements. If you are still experiencing issues with 1.2.2 or later, please log a new issue.

jwilder closed this as completed Apr 6, 2017