InfluxDB unable to write data (localhost) with 25 tags and 300 fields. #5826

Closed
deepujain opened this issue Feb 25, 2016 · 27 comments

@deepujain

One day of data: 74 files with 1000 points each; every point has 25 tags and 300 fields.

I was able to ingest one file correctly, without any syntax errors.
However, writing all of the files started to throw {"error":"timeout"}.

influxd:
After a while it started to go crazy.
Logs: http://pastebin.com/b2wJwFHT

I was planning to load 30 days of data. The same load ran quickly and without errors on Druid + Imply, and I wanted to compare that with InfluxDB + Grafana. The whole installation is on a local machine (Mac, with 24 GB RAM).

@jonseymour
Contributor

How were you loading the data? How many points in each batch? How many batches in parallel?

@jonseymour
Contributor

Ah, I see you were using curl. So, one curl command per file? How many curl commands in parallel?

@deepujain
Author

All in sequence, in a for loop. I am not testing the write path right now; I can afford for that to be slow, but I want correct ingestion and fast queries.
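For reference, the load pattern described above amounts to roughly the following; the glob pattern, file layout, and database name ("mydb") are assumptions from this thread, not the actual script used.

```go
// A rough reconstruction of the load described above: one sequential HTTP
// write per file, equivalent to a shell loop over curl commands. The glob
// pattern and database name ("mydb") are placeholders, not the real script.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

func main() {
	files, err := filepath.Glob("/data/day1/*.txt") // 74 files, 1000 points each
	if err != nil {
		panic(err)
	}
	for _, f := range files {
		body, err := os.ReadFile(f)
		if err != nil {
			panic(err)
		}
		// Each file is posted as one batch of line-protocol points.
		resp, err := http.Post("http://localhost:8086/write?db=mydb",
			"text/plain", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
		fmt.Println(f, resp.Status) // the 500 {"error":"timeout"} responses show up here
	}
}
```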

@deepujain
Author

In addition to the question above: how can I recover? InfluxDB won't start now.

[filestore]2016/02/24 20:00:06 /Users/dvasthimal/.influxdb/data/mydb/default/28/000000003-000000002.tsm (#2) opened in 123.004398ms
[filestore]2016/02/24 20:00:14 /Users/dvasthimal/.influxdb/data/mydb/default/28/000000002-000000002.tsm (#0) opened in 8.452389905s
[filestore]2016/02/24 20:00:15 /Users/dvasthimal/.influxdb/data/mydb/default/28/000000003-000000001.tsm (#1) opened in 8.57616921s
[cacheloader] 2016/02/24 20:00:15 reading file /Users/dvasthimal/.influxdb/wal/mydb/default/28/_00025.wal, size 29333172
[cacheloader] 2016/02/24 20:00:17 reading file /Users/dvasthimal/.influxdb/wal/mydb/default/28/_00026.wal, size 22579152
[cacheloader] 2016/02/24 20:00:18 reading file /Users/dvasthimal/.influxdb/wal/mydb/default/28/_00027.wal, size 14073637
...

@jonseymour
Contributor

How big is each file? 300 fields per point seems quite high, and it is possible this might be a factor. How many of these are numeric, and how many are strings?

@deepujain
Author

All fields (metrics) are numeric, mostly doubles; the dimensions (tags) are strings.

One day of data: 74 files with 1000 points each; every point has 25 tags and 300 fields.

@jonseymour
Contributor

Could you post the output of:

find '/Users/dvasthimal/.influxdb/' -ls 

I am interested to see the total size of all the .wal files.

There is a possibility that you have run out of RAM.

I'll leave it for the influx guys to advise on a recovery approach.

@jonseymour
Contributor

How big - in bytes - is each file?

@deepujain
Author

10MB

@jonseymour
Contributor

How many files are processed before you start seeing errors?

@deepujain
Author

20

@jonseymour
Contributor

Ok, here is a guess at what the problem is and perhaps a workaround, depending on whether you are in a position to rewrite the point tags.

This line of code:

https://github.com/influxdata/influxdb/blob/master/models/points.go#L331

uses an insertion sort to sort tags. If your tags are not in Go sort order, then there is a potentially O(n^2) sort for each point. If the tags were in Go sort order, the cost of this sort would be ~O(n). For n = 25, the difference between O(n^2) and O(n) is quite a lot.

(I am not 100% sure about this diagnosis - it is possible that something in the write path prior to this code normally sorts the tags in Go sort order, and so this isn't a factor, but it is worth testing if you are in a position to do so.)

Updated: my fears about a potential O(n^2) issue are actually unfounded. Influx sorts the tags on the way in, so that when they hit the insertion sort later on it executes with O(n) efficiency. Confirmed with a unit test.
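To illustrate what "Go sort order" means here - lexicographic byte order, as produced by Go's sort package - a minimal sketch (not code from InfluxDB itself; the tag keys are made up):

```go
// "Go sort order" = lexicographic byte order, as produced by sort.Strings
// (or bytes.Compare). Writing tags pre-sorted this way keeps a later
// insertion sort at ~O(n). Minimal illustration only.
package main

import (
	"fmt"
	"sort"
)

func main() {
	keys := []string{"region", "host", "Zone", "dc"}
	sort.Strings(keys) // byte order: uppercase sorts before lowercase
	fmt.Println(keys)  // [Zone dc host region]
}
```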

@deepujain
Author

What do you mean by Go sort order? Do you want each row (point) to be sorted by multiple tags? How can I do that?

@jonseymour
Contributor

Actually, it isn't going to help anyway because of the way Influx makes points. [Update: Influx does the right thing and sorts tags on construction, so that the binary form of the point is sorted in the optimal order.]

Are you in a position to build your own version of influx, or are you using a packaged version?

@deepujain
Author

Output of the command:
find '/Users/dvasthimal/.influxdb/' -ls

http://pastebin.com/QL6ZcKnw

@deepujain
Author

I am using the packaged version.

@deepujain
Author

What is the maximum tested data size? How many tags and how many fields in a given measurement?

@jonseymour
Contributor

When you say it doesn't start, does it fail with an error message, or do you kill it? How long do you wait? Can you paste the tail of the log file at the point where the restart fails/is terminated? Is the system showing any evidence of high memory or CPU usage at the point where the start fails/is terminated?

I'll have to defer to the influx support team regarding what they consider to be a reasonable number of tags and fields.

Updated: deleted a question that is not relevant.

@jonseymour
Contributor

@deepujain The O(n^2) issues I was worried about with point construction don't actually exist, as verified both by closer inspection of the code and by some actual unit tests. Still, 300 fields is a lot of fields, so that could still be a factor.

@jonseymour
Contributor

The number of WAL logs seems quite high. How much disk space do you have free in your home directory?

@jonseymour
Contributor

@deepujain According to this line:

https://github.com/influxdata/influxdb/blob/v0.10.1/tsdb/shard.go#L490-L492

influx only supports up to 255 fields per point. Perhaps violating this restriction is a contributing factor to your issues?

Updated: that restriction only applies to the b1 and bz1 engines - you are using tsm, so it shouldn't be an issue here.
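If that limit did apply, one possible workaround would be to split a wide point into two or more lines that share the same series and timestamp, since InfluxDB merges fields written to the same series at the same timestamp. A hypothetical sketch; the measurement name, tag set, and chunk size are illustrative:

```go
// Hypothetical workaround for a per-point field limit: emit the same
// series+timestamp as several line-protocol lines, each carrying a chunk
// of the fields. InfluxDB merges fields that share a series and timestamp.
package main

import (
	"fmt"
	"sort"
	"strings"
)

func chunkLines(measurement, tags string, fields map[string]float64, ts int64, max int) []string {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var lines []string
	for start := 0; start < len(keys); start += max {
		end := start + max
		if end > len(keys) {
			end = len(keys)
		}
		parts := make([]string, 0, end-start)
		for _, k := range keys[start:end] {
			parts = append(parts, fmt.Sprintf("%s=%g", k, fields[k]))
		}
		lines = append(lines, fmt.Sprintf("%s,%s %s %d", measurement, tags, strings.Join(parts, ","), ts))
	}
	return lines
}

func main() {
	fields := map[string]float64{"m1": 1.5, "m2": 2, "m3": 3}
	for _, l := range chunkLines("ppw", "dc=ams,host=h1", fields, 1456380000000000000, 2) {
		fmt.Println(l)
	}
}
```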

@deepujain
Author

I reduced the number of fields to ~200. This time 95% of the files got ingested; the rest hit the same error.

Client side: {"error":"timeout"}

Server side:
[http] 2016/02/25 07:41:05 ::1 - - [25/Feb/2016:07:41:04 -0800] POST /write?db=ppw HTTP/1.1 204 0 - curl/7.43.0 2dfdca80-dbd6-11e5-805a-000000000000 1.394094614s
[http] 2016/02/25 07:41:11 ::1 - - [25/Feb/2016:07:41:05 -0800] POST /write?db=ppw HTTP/1.1 500 20 - curl/7.43.0 2ed6337a-dbd6-11e5-805b-000000000000 5.132936465s
[http] 2016/02/25 07:41:16 ::1 - - [25/Feb/2016:07:41:11 -0800] POST /write?db=ppw HTTP/1.1 500 20 - curl/7.43.0 31ea6c52-dbd6-11e5-805c-000000000000 5.171391369s
[http] 2016/02/25 07:41:20 ::1 - - [25/Feb/2016:07:41:16 -0800] POST /write?db=ppw HTTP/1.1 204 0 - curl/7.43.0 3506b06d-dbd6-11e5-805d-000000000000 4.353095492s

Is there a limit on the number of fields (metrics), and what are tsm / b1 / bz1?
This is only 1 day of data for POC.
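A common mitigation for intermittent {"error":"timeout"} responses like the 500s above is to retry the batch with backoff (or shrink it). A sketch, assuming the same /write endpoint used earlier in this thread; the retry policy is illustrative, not an InfluxDB recommendation:

```go
// Retry a write with exponential backoff when the server answers with
// anything other than 204, e.g. 500 {"error":"timeout"}. The endpoint and
// database name ("ppw") are taken from the logs above.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func writeWithRetry(batch []byte, attempts int) error {
	delay := time.Second
	for i := 0; i < attempts; i++ {
		resp, err := http.Post("http://localhost:8086/write?db=ppw",
			"text/plain", bytes.NewReader(batch))
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == 204 {
				return nil // write accepted
			}
		}
		time.Sleep(delay) // give the cache/compactions time to drain
		delay *= 2
	}
	return fmt.Errorf("write failed after %d attempts", attempts)
}

func main() {
	batch := []byte("ppw,host=h1 m1=1.5 1456380000000000000\n")
	if err := writeWithRetry(batch, 5); err != nil {
		fmt.Println(err)
	}
}
```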

@deepujain
Author

InfluxQL is dead slow, and the disk starts spinning like crazy.

select * from ppw limit 10

I never saw the output.

@deepujain
Author

Better question:

I have a time series DB that is updated once every day. It has around 30 dimensions (used as filters; these will map to tags) and 175 to 200 metrics (used for aggregate functions; these will map to fields).

Each day, around 100 to 150 files, each with 1000 points, will be ingested.

Ingestion can take a few minutes, but query time has to be on the order of a few seconds.

I was able to do this with Druid (backend) and Imply (frontend), but I like Grafana better, hence wanting to try out InfluxDB.
Can InfluxDB support this kind of data?

@jackzampolin
Contributor

With 25 tags I would be surprised if your series cardinality (https://docs.influxdata.com/influxdb/v0.10/concepts/glossary/#series-cardinality) was under 10 million. Influx should be able to handle this data given proper schema design.
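Rough arithmetic behind that estimate: series cardinality is approximately the product of the number of distinct values per tag key, so 25 tags multiply out quickly even with very few values each. The per-tag counts below are illustrative, not from this issue:

```go
// Series cardinality ≈ product of distinct values per tag key.
// With just 2 distinct values per tag, 25 tags give 2^25 ≈ 33.5M series.
package main

import "fmt"

func main() {
	distinct := make([]int, 25) // one entry per tag key
	for i := range distinct {
		distinct[i] = 2 // assume 2 distinct values each (illustrative)
	}
	card := 1
	for _, n := range distinct {
		card *= n
	}
	fmt.Println(card) // 33554432
}
```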

@deepujain
Author

What do you mean by proper schema design? Should I have one measurement with 25 tags and the remaining 200 as fields?

Please share details on proper schema design.

@deepujain
Author

Any suggestions here? This issue was closed. https://influxdata.com/blog/announcing-influxdb-v0-10-100000s-writes-per-second-better-compression/ clearly says that with the columnar store there is no limit on the number of fields (0 to 100 to 1000). However, my use case does not seem to work.
