
Memory usage after select query #7607

Closed
RidgeA opened this issue Nov 8, 2016 · 6 comments

RidgeA commented Nov 8, 2016

Hello.
I have databases from an older version (we used 0.9). There are 2 databases; their total size in the old format (b1 engine) was about 10 GB.
I converted these databases to the TSM engine format (using the influx_tsm tool) and their size decreased to about 2 GB (is that normal?).
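For reference, the conversion command was something along these lines (the backup path and data directory here are placeholders, not my real paths):

influx_tsm -backup /path/to/b1-backup /var/lib/influxdb/data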

After the conversion, the InfluxDB service starts properly and can receive INSERT requests (at least according to the logs).

But after any SELECT request the service gets stuck and memory usage starts to climb (the first time I saw it, the influxdb service occupied about 150 GB of memory).

InfluxDB version - 1.0.2
OS - CentOS 7
Server has 32 cores and 128 GB RAM
Config:
influxdb.conf.txt

Can somebody explain the reason for this behaviour?

Thank you!

This is after the server started:
goroutine_start.txt
heap_start.txt

This is after sending a SELECT request:
goroutine.txt
goroutine_2.txt
heap.txt
heap_2.txt

joshuajoh commented:

This seems similar to the issue I'm experiencing, though my hardware is 1/4 of yours.

Querying a database with a series count of around 40-50K causes all memory to eventually be consumed, even though there should be no issues below 1 million series.

Out of curiosity, how many series are in the database you are querying?

SELECT numSeries FROM "_internal".."database" WHERE time > now() - 10s GROUP BY "database" ORDER BY desc LIMIT 1
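For reference, that check can also be run non-interactively with the influx CLI, e.g.:

influx -execute 'SELECT numSeries FROM "_internal".."database" WHERE time > now() - 10s GROUP BY "database" ORDER BY desc LIMIT 1'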


jwilder commented Nov 8, 2016

@RidgeA TSM has much better compression than the older b1/bz1 engines so a reduction in size is expected.

The heap profiles show the process consuming about 12GB of RSS. How were you determining memory usage?
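For reference, the RSS reported by the kernel can be cross-checked with something like the following (influxd is assumed to be the process name; rss is reported in kilobytes):

ps -C influxd -o pid,rss,vsz,comm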

Can you run the following when the heap is large and attach the output in a gist:

curl -o block.txt "http://localhost:8086/debug/pprof/block?debug=1" 
curl -o goroutine.txt "http://localhost:8086/debug/pprof/goroutine?debug=1" 
curl -o heap.txt "http://localhost:8086/debug/pprof/heap?debug=1" 
curl -o vars.txt "http://localhost:8086/debug/vars" 
iostat -xd 1 30 > iostat.txt
influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt


RidgeA commented Nov 9, 2016

@joshuajoh

> SELECT numSeries FROM "_internal".."database" WHERE time > now() - 10s GROUP BY "database" ORDER BY desc LIMIT 1
name: database
tags: database=positions_history
time            numSeries
----            ---------
1478674720000000000 231088

name: database
tags: database=main_page_performance
time            numSeries
----            ---------
1478674720000000000 0

name: database
tags: database=advertisementsHistory
time            numSeries
----            ---------
1478674720000000000 559811

name: database
tags: database=_internal
time            numSeries
----            ---------
1478674720000000000 734

I have 560k series in the database I'm trying to query.
Is that a big quantity?
I have only one tag on each record - it is a unique id from MySQL. According to my retention policy, records should be deleted after 6 months. But after reading your comment I found #5092 and learned that series are not dropped from the index even after all of their records have been deleted by the retention policy.

@jwilder

> The heap profiles show the process consuming about 12GB of RSS. How were you determining memory usage?

htop

Before starting the influxdb service:
_029
After starting the influxdb service:
_030
After the query (I had to stop the service because it would take all memory and cause trouble on my server):
_031

> Can you run the following when the heap is large and attach the output in a gist:

After starting the service:
start.tar.gz
After the query:
query.tar.gz

From what I have read, a large number of series could cause this problem.
So I have 2 questions:

  1. Can I somehow optimize the database for my use case? I store the history of changes to advertisements in InfluxDB, and I need to quickly look up the records related to an advertisement by its id.
  2. Can I somehow export all of the data from InfluxDB to CSV, JSON, or a similar format, so I can store it in another database? (See the example below.)
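For question 2, I mean something along these lines, assuming the influx CLI's -format flag can do this (the measurement name and time range here are just placeholders):

influx -database advertisementsHistory -execute 'SELECT * FROM "history" WHERE time > now() - 1d' -format csv > export.csv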

UPD.
Can I drop only a series from a measurement and keep its data?
When I try DROP SERIES FROM, it drops all my data.
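To be concrete, what I tried looks roughly like this (the measurement and tag value are placeholders for my real schema); if I read the docs correctly, the second form limits the drop to one series but still deletes that series' points:

DROP SERIES FROM "history"
DROP SERIES FROM "history" WHERE "id" = '123456'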


jwilder commented Nov 17, 2016

Can you try the 1.1 release? There are several memory improvements in that release that may help with your issue.


RidgeA commented Nov 29, 2016

Hello. Sorry for the late response.
I have tried the 1.1 release, and it is much better.
Now I'm able to query some data, but only over a small time interval - 1 day, for example.
When I request a 10-day interval, it again consumes all available memory (plus swap).
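To illustrate, a bounded query like the following works (the measurement and id value are placeholders):

SELECT * FROM "history" WHERE "id" = '123456' AND time > now() - 1d

but widening the range to time > now() - 10d is what exhausts memory.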


jwilder commented Apr 6, 2017

1.2 has additional query memory improvements. If you are still experiencing issues with 1.2.2 or later, please log a new issue.

jwilder closed this as completed Apr 6, 2017