
Old fields and tags show up after dropping measurement and rewriting #10052

Closed
cheribral opened this issue Jul 6, 2018 · 60 comments
@cheribral

Using version influxdb-1.5.2, if I drop a measurement, and then write again using the same name, the measurement is recreated but it contains all the tag and field keys from the previous measurement.

I noticed this after writing a measurement with an identifier as a field, then deciding to make it a tag. I could no longer query the data without a ::tag cast because the database still retained the old field key. I started playing around with arbitrary keys and values, and dropping the measurement. Everything I've sent is retained across DROP MEASUREMENTs.

I've even tried to drop the measurement, stop the database and rebuild the index, but this doesn't work either.

@cheribral cheribral changed the title Old fileds and tags show up after dropping measurement and rewriting Old fields and tags show up after dropping measurement and rewriting Jul 6, 2018
@e-dard (Contributor) commented Jul 9, 2018

@cheribral could you provide some steps for us to reproduce this issue? Which index type are you using?

@cheribral (Author)

This is using the disk based tsi1 index.
This came right after I deleted the measurement and let it sit for a day before writing again.

I went back to make a test measurement to copy and paste the steps, and I can't reproduce it for some reason. I have no idea what the difference is, other than that I don't have writes coming in to the measurement while I delete.
I'll see if I can get it to happen again tomorrow.

@e-dard (Contributor) commented Jul 10, 2018

@cheribral thanks, we will need example data and steps so we can follow along and understand the issue better.

@dustin96080 commented Jul 13, 2018

@e-dard
I'm having the same issue. I'm using version 1.5.2. Here is an example:
> drop measurement memory
> show field keys from memory    -- returns nothing
> select * from memory           -- returns nothing

> insert memory,host=test,type=memory value=0
> show field keys from memory
name: memory
fieldKey       fieldType
--------       ---------
buffered       float
cached         float
free           float
heap_usage     float
non_heap_usage float
slab_recl      float
slab_unrecl    float
used           float
value          float
> select * from memory limit 10
name: memory
time                buffered cached free heap_usage host non_heap_usage slab_recl slab_unrecl type   used value
----                -------- ------ ---- ---------- ---- -------------- --------- ----------- ----   ---- -----
1531445331027521830                                 test                                      memory      0

All these fields shown are old fields I am not using anymore, but I can't seem to get them to go away. I have gotten them to go away before by running this:
influx_inspect buildtsi -database graphite -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/
That is not fixing the issue this time.

Hope this helps.

@e-dard (Contributor) commented Jul 13, 2018

Thanks @dustin96080,

do you have a set of data to insert that will reproduce this bug?

@e-dard (Contributor) commented Jul 13, 2018

Also, were you using TSI previous to 1.5.2?

@dustin96080 commented Jul 13, 2018

@e-dard We have been using TSI from the beginning (about 1 year). Not sure how much data you want or how I would get it to you, as I don't want it public. I have included a small subset of the data.

memory,host=dustintest01,type=memory buffered=4.272128e+06 1531180067000000000
memory,host=dustintest01,type=memory buffered=4.272128e+06 1531180127000000000
memory,host=dustintest01,type=memory cached=9.96225024e+08 1531179767000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531179827000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531179887000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531179947000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531180007000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531180067000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531180127000000000
memory,host=dustintest01,type=memory free=1.43478784e+09 1531179767000000000
memory,host=dustintest01,type=memory free=1.572323328e+09 1531179827000000000
memory,host=dustintest01,type=memory free=1.572294656e+09 1531179887000000000
memory,host=dustintest01,type=memory free=1.57231104e+09 1531179947000000000
memory,host=dustintest01,type=memory free=1.57231104e+09 1531180007000000000
memory,host=dustintest01,type=memory free=1.572327424e+09 1531180067000000000
memory,host=dustintest01,type=memory free=1.571733504e+09 1531180127000000000
memory,host=dustintest01,type=memory slab_recl=1.668096e+08 1531179767000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531179827000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531179887000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531179947000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531180007000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531180067000000000
memory,host=dustintest01,type=memory slab_recl=1.66846464e+08 1531180127000000000
memory,host=dustintest01,type=memory slab_unrecl=4.1725952e+07 1531179767000000000
memory,host=dustintest01,type=memory slab_unrecl=4.1197568e+07 1531179827000000000
memory,host=dustintest01,type=memory slab_unrecl=4.093952e+07 1531179887000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531179947000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531180007000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531180067000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531180127000000000
memory,host=dustintest01,type=memory used=4.396711936e+09 1531179767000000000
memory,host=dustintest01,type=memory used=4.25957376e+09 1531179827000000000
memory,host=dustintest01,type=memory used=4.25986048e+09 1531179887000000000
memory,host=dustintest01,type=memory used=4.259950592e+09 1531179947000000000
memory,host=dustintest01,type=memory used=4.259950592e+09 1531180007000000000
memory,host=dustintest01,type=memory used=4.259934208e+09 1531180067000000000
memory,host=dustintest01,type=memory used=4.26053632e+09 1531180127000000000
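The sample above uses InfluxDB line protocol: `measurement,tag=value,... field=value,... timestamp`, with tag keys written in lexicographic order. As a point of reference, here is a minimal Go sketch of assembling such a line (a hypothetical helper, not part of any InfluxDB client library):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildLine assembles one line-protocol entry:
//   measurement,tag1=v1,tag2=v2 field1=v1,field2=v2 timestamp
// Tag and field keys are emitted in sorted order, as the protocol expects.
// Note: this sketch omits the escaping rules for spaces and commas.
func buildLine(measurement string, tags map[string]string, fields map[string]float64, ts int64) string {
	var b strings.Builder
	b.WriteString(measurement)

	tagKeys := make([]string, 0, len(tags))
	for k := range tags {
		tagKeys = append(tagKeys, k)
	}
	sort.Strings(tagKeys)
	for _, k := range tagKeys {
		fmt.Fprintf(&b, ",%s=%s", k, tags[k])
	}

	fieldKeys := make([]string, 0, len(fields))
	for k := range fields {
		fieldKeys = append(fieldKeys, k)
	}
	sort.Strings(fieldKeys)
	sep := " "
	for _, k := range fieldKeys {
		fmt.Fprintf(&b, "%s%s=%g", sep, k, fields[k])
		sep = ","
	}

	fmt.Fprintf(&b, " %d", ts)
	return b.String()
}

func main() {
	line := buildLine("memory",
		map[string]string{"host": "dustintest01", "type": "memory"},
		map[string]float64{"buffered": 4.272128e+06},
		1531180067000000000)
	fmt.Println(line)
}
```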

@shakefu commented Jul 27, 2018

Also seeing this issue on Influx Cloud 1.5.3-c1.5.3.

@cha87de commented Oct 6, 2018

I'm running into this behavior of InfluxDB as well. I'm using Logstash to write JSON into an influx measurement. I'm actually only trying to change the field type from float to int - since this seems to be impossible I tried DROP MEASUREMENT ... - but the field with the old type reappears. Seems I'm going from one issue to another :-( I'm using the docker image influxdb:1.4-alpine.

@sada-narayanappa

Same issue for me - why can't we drop and wipe out a measurement? InfluxDB - why are you caching the OLD fields .... grr ....

If I insert the same data into a new table, the insert goes through; otherwise I get this error:

influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: field type conflict: input field "BLAH_007" on measurement "BB2" is type float, already exists as type integer dropped=1"}

@abbasqamar

Me too, I have the same issue.

@derrix060 commented Dec 5, 2018

What I've found is that if I do a backup, and then restore this backup somewhere else (in another docker container, for example), the new database will work fine.

@derrix060

The only way I managed to deal with this (a very hackish workaround) was creating a backup of the database I want, dropping the database, and finally restoring the backup.

@f1-outsourcing

I am still not able to drop measurements, and I have been waiting quite some time for an update that fixes this. How is it even possible that you are releasing versions that are so buggy?
I would be ashamed if I put clients in such a position; there is also no support at all on your community forum.

drop measurement "collectd"
drop series from "collectd"
delete from "collectd"

influxdb-1.7.1-1.x86_64
CentOS Linux release 7.5.1804 (Core)
Linux db1 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@f1-outsourcing

Seems I'm going from one issue to another :-( I'm using the docker image influxdb:1.4-alpine.

Yes, it is quite bad here. I think it also has to do with the Go language: such a 'low entry threshold' language, compared to something like C and C++, attracts the 'left over' programmers who were 'unable' to adapt to more complex languages that take more effort. This difference is also noticeable with something like PHP. But that is a personal opinion from years of experience; maybe I was just not too lucky with my contacts.
Here some rookie accepts this as being normal. I guess it is a sign of the times.
https://community.influxdata.com/t/or-rention-policies-are-not-correctly-dropped-or-there-is-something-wrong-with-the-cli/7461/4?u=f1outsourcing

@nibynool

Currently experiencing this issue with InfluxCloud - really painful as I can't even try the backup/restore option someone else mentioned

@e-dard (Contributor) commented Dec 19, 2018

We think this could be fixed with the 1.7.2 release, where we have fixed a few bugs to do with concurrent writes and deletes.

@nibynool please drop a ticket into support where you will be assisted.

@e-dard (Contributor) commented Dec 19, 2018

@f1-outsourcing would you upgrade to 1.7.2 and see if that resolves your issue? If you have a dataset that we can use to reproduce your issue that would be great too.

@f1-outsourcing

This seemed to work, (just posting it here also)

find . -type d -name index -exec rm -Rf {} \;

influx_inspect buildtsi -datadir data/ -waldir wal/

@dgnorton dgnorton added the 1.x label Jan 7, 2019
@garceri commented Jan 9, 2019

Running 1.7.2, experiencing the same issue here..

@e-dard (Contributor) commented Jan 11, 2019

@garceri do you have a definitive way to reproduce this issue on 1.7.2?

@silviot commented Jan 18, 2019

@e-dard I'm experiencing this problem too.

I tried to reproduce it on a lean install, but I was not able to.

But I have a database where it consistently happens. It's 75 MB and contains info that is not confidential, but that I'm not comfortable sharing publicly. I can send it your way so you can check what's going on.

A few more details:

  • I had a type mismatch after changing telegraf config, so I dropped the measurement to start clean
  • I can only insert new rows using the old types
  • if I restart the database, the measurement does not show up in SHOW MEASUREMENTS until I try to insert a new row. After that it does show up, and SHOW FIELD KEYS shows the incorrect, old types

@phemmer (Contributor) commented Jan 21, 2019

Experiencing this issue as well on 1.7.2.
Tried the index delete & influx_inspect mentioned by @f1-outsourcing but it didn't work for me :-/

I'm unable to reproduce on demand. I just have a screwed up measurement I can't get fixed.

@drb-germany

I have spent quite some time trying to find a sure way to reproduce the problem. It did not work. The same procedure (with little data) only seldom reproduced the problem. However, I have a large number of larger datasets where I almost always found the problem.

Therefore, I have the feeling that this only occurs if large amounts of data are accumulated, e.g. a datapoint every 5 or 10 seconds for several days or weeks (maybe if data is spread over different files or shards?).

Thanks for looking into this; it gives us a huge headache, as sometimes datapoints are written with the wrong datatype and we have to rename the values and keep the old fields. Currently, the only way is to copy the whole measurement without the unwanted fields, drop the measurement, copy it back, and then backup and restore the whole database. It can take hours just to get rid of a single field.

@wollew commented May 7, 2019

Quoting @e-dard: "@wollew that would be great. Please email me edd@<nameoftherepo>.com and I will provide you with some credentials where you can securely upload data to our company SFTP server. If that doesn't work for you we can figure something else out."

I just emailed you, hope that helps.

@wollew commented Jun 4, 2019

Any news on this issue?

@kezsto commented Jun 27, 2019

Just got slammed with this in production. Really really frustrating.

@jfcg (Contributor) commented Jul 4, 2019

Same here with InfluxDB 1.6.6 on Linux

I wrote some fields as strings instead of floats. The old schema survives a DROP MEASUREMENT if you reuse the measurement name. It is really surprising that a database company:

  • fails to test a fundamental feature
  • fails to reproduce a simple and dire bug

@e-dard (Contributor) commented Jul 4, 2019

We are actively investigating this issue. Thanks for your patience, and to those of you who have provided me with data to reproduce the issue.

@e-dard e-dard self-assigned this Jul 4, 2019
e-dard added a commit that referenced this issue Jul 5, 2019
This commit fixes an issue where field keys would reappear in results
when querying previously dropped measurements.

The issue manifests itself when duplicates of a new series are inserted
into the `inmem` index. In this case, a map that tracks the number of
series belonging to a measurement was incorrectly incremented once for
each duplication of the series. Then, when it came time to drop the
measurement, the index assumed there were several series belonging to
the measurement left in the index (because the counter was higher than
it should be). The result of that was that the `fields.idx` file (which
stores a mapping between measurements and field keys) was not truncated
and rebuilt. This left old field keys in that file, which were then
returned in subsequent queries over all field keys.
e-dard added a commit that referenced this issue Jul 5, 2019
Fixes #10052 (same commit message as above)
e-dard added a commit that referenced this issue Jul 5, 2019
Fixes #10052 (same commit message as above)
e-dard added a commit that referenced this issue Jul 5, 2019
Fixes #10052 (same commit message as above)
@e-dard e-dard removed the area/tsi label Jul 5, 2019
@e-dard (Contributor) commented Jul 5, 2019

Update

Hello everyone affected by this issue. Firstly, I would like to apologise that it's been almost a year since this issue was filed. Yesterday I was able to really dig into what was causing this issue, mainly thanks to the .influxdb directory that @ragnarkurmwunder sent me a while back.

This is a pretty difficult issue to reproduce without an existing dataset. Triggering the issue relies on duplicates of a new series being inserted into a database using the inmem index within the same batch. Further, it looks like they need to sit inside the WAL so that when the database is restarted they will be replayed and the problem will continue...

The cause of the issue is that the inmem index, in this rare case, over-counts how many series belong to the measurement (it counts duplicate points for the same series as different series). Then, when you go to delete the measurement, the index thinks there are still some series around for the measurement and it does not clean up the fields.idx file. This file contains mappings from measurements to field keys, and if it's not cleaned up properly, those old field keys can be returned in some cases.
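The over-counting described above can be sketched roughly like this (a toy model in Go, not the actual inmem index code; the type and method names are invented for illustration):

```go
package main

import "fmt"

// seriesIndex is a toy model of the bookkeeping described above: a per-measurement
// series counter that gates whether fields.idx would be rebuilt on drop.
type seriesIndex struct {
	seriesCount map[string]int      // measurement -> number of series
	seen        map[string]struct{} // series key dedup set (used by the fixed path)
}

func newSeriesIndex() *seriesIndex {
	return &seriesIndex{seriesCount: map[string]int{}, seen: map[string]struct{}{}}
}

// addSeriesBuggy increments the counter for every point in a batch, so a new
// series that appears twice in one batch is counted as two series.
func (idx *seriesIndex) addSeriesBuggy(measurement, seriesKey string) {
	idx.seriesCount[measurement]++ // duplicates counted as distinct series
}

// addSeriesFixed only counts a series the first time it is seen.
func (idx *seriesIndex) addSeriesFixed(measurement, seriesKey string) {
	if _, ok := idx.seen[seriesKey]; ok {
		return
	}
	idx.seen[seriesKey] = struct{}{}
	idx.seriesCount[measurement]++
}

// dropSeries removes n series; only when the counter reaches zero would the
// measurement's field mappings (fields.idx) be truncated and rebuilt.
func (idx *seriesIndex) dropSeries(measurement string, n int) (fieldsIdxRebuilt bool) {
	idx.seriesCount[measurement] -= n
	return idx.seriesCount[measurement] <= 0
}

func main() {
	buggy := newSeriesIndex()
	// The same new series is written twice in one batch.
	buggy.addSeriesBuggy("memory", "memory,host=a")
	buggy.addSeriesBuggy("memory", "memory,host=a")
	// Dropping the one real series leaves the counter at 1, so fields.idx
	// is never rebuilt and the stale field keys survive.
	fmt.Println(buggy.dropSeries("memory", 1)) // false

	fixed := newSeriesIndex()
	fixed.addSeriesFixed("memory", "memory,host=a")
	fixed.addSeriesFixed("memory", "memory,host=a")
	fmt.Println(fixed.dropSeries("memory", 1)) // true
}
```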

I believe I have fixed this issue in #14266

The fix will be available in the 1.8 release, and also in a future 1.7.8 release.

Operational Mitigation Steps

Here are some operational steps you could take to try and resolve this issue without waiting for 1.8 or 1.7.8.

Use the TSI index

I was unable to reproduce this issue using the TSI index. Even when I triggered the issue on the inmem index, and then upgraded to the TSI index, I saw the issue disappear. Whilst we will of course continue to support the inmem index on the 1.x line, from 2.x onwards the TSI index will be the main index InfluxDB uses, and all our development effort will continue on that.

You can find out more information about how to upgrade to TSI here. In the simplest case, you bring your server down and then do something like:

influx_inspect buildtsi -datadir ~/.influxdb/data -waldir ~/.influxdb/wal

Remove invalid fields.idx files

The bug is caused because the fields.idx files (there is one per shard directory) are not properly rebuilt when the measurement is deleted. However, InfluxDB will rebuild these files if they're missing. If you are currently suffering from fields that appear in queries when they shouldn't, then I recommend that you delete all of the fields.idx files for the problematic database/retention policy. You will need to bring down your server first, then:

$ rm -i ~/.influxdb/data/<db_name>/<rp_name>/*/fields.idx

@1ma commented Jul 5, 2019

That's some great news!

Will we need to do the manual cleanup if we just update to 1.7.8 when it's released?

@e-dard (Contributor) commented Jul 5, 2019

@1ma that's a great point. We will have to add something to the release notes. You will have to either do a manual cleanup, or the issue will resolve itself if you re-drop the measurement.

The manual cleanup would involve removing the stale fields.idx files.

e-dard added a commit that referenced this issue Jul 9, 2019
Fixes #10052 (same commit message as above)
@ghost commented Aug 23, 2019

The official docker image influxdb:1.7.7 still has this issue, so I had to drop the database.

@lovasoa commented Oct 11, 2019

Any news on this? I am not sure the issue mentioned by @e-dard above is the same as what everyone is encountering here. The issue is not "hard to reproduce"; for me it happens systematically with any measurement that I drop.

@jeankarunadewi

Also experienced this issue on Influx v1.7.7.
Just now, I'm experimenting with using telegraf to load a CSV into my InfluxDB database.
At first the drop is successful, but after 3 attempts the drops suddenly didn't work. The points are gone from the measurement, but every time I run "show measurements", it's still there. Also when I use "show tag keys" and "show field keys", the tags and fields are still there. And until now, the measurement cannot be dropped at all.
I hope there is a follow-up from the InfluxDB team to solve this "bug".

@e-dard (Contributor) commented Nov 11, 2019

This issue should be fixed in 1.7.8. Please open a new issue if you see problems on 1.7.8.

@rakopoul commented Dec 5, 2023

We are having the same issue with InfluxDB. We use tag 2.5.1-alpine inside a container.
I have a table where metrics coming from Kafka are inserted.
Initially the table had three tags and several fields. To make measurement faster I changed the telegraf agent config to turn some of the fields into tags.
So instead of having measurement like:
tag1,tag2,tag3 field1,field2,field3,field4,field5
I made them
tag1,tag2,tag3,tag4 (name of field1),tag5(name of field2) field3,field4,field5

I dropped the old table and Influx seemed to work fine.
After some time I wanted to change my dashboards showing the metrics and I redeployed the whole stack (changing only the dashboard code). After this it seemed like Influx somehow brought back the old metrics as well, creating conflicts and renaming the new tags to tag_1 etc., of course making the queries show nothing. Dropping the tables again fixed the problem, until a new restart reproduces it.

I checked my metrics in Kafka and there is no old message; it seems influx caches the old data somewhere, and we are not sure how to resolve it.
