Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typed iterators are racy #14267

Closed
e-dard opened this issue Jul 5, 2019 · 2 comments
Closed

Typed iterators are racy #14267

e-dard opened this issue Jul 5, 2019 · 2 comments
Labels
area/storage flaky test This is a flaky test. These issues should be groomed at the next opportunity kind/bug wontfix

Comments

@e-dard
Copy link
Contributor

e-dard commented Jul 5, 2019

Once I developed a fix for #10052 I found that a concurrent test we run started to fail occasionally: TestShard_WritePoints_FieldConflictConcurrentQuery.

This test continuously writes into the same series using different typed values (one goroutine integers, one goroutine floats) and then attempts to read them out. It panics occasionally because by the time the query engine has determined it needs a float cursor, the series has been deleted and reinserted as an integer. Therefore the type conversion fails with a panic:

func (c *floatDescendingCursor) peekCache() (t int64, v float64) {
if c.cache.pos < 0 || c.cache.pos >= len(c.cache.values) {
return tsdb.EOF, 0
}
item := c.cache.values[c.cache.pos]
return item.UnixNano(), item.(FloatValue).value
}

The stacktrace on the failing test looks like:

=== RUN   TestShard_WritePoints_FieldConflictConcurrentQuery
panic: interface conversion: tsm1.Value is tsm1.IntegerValue, not tsm1.FloatValue

goroutine 51548 [running]:
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).peekCache(0xc000198aa0, 0x0, 0x6)
	/Users/edd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:412 +0x1a8
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).nextFloat(0xc000198aa0, 0xc0002e33e0, 0xc0004ea000)
	/Users/edd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:443 +0x3d
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatIterator).Next(0xc00042a000, 0x5, 0xc000671648, 0x104278a)
	/Users/edd/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:244 +0x952
github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrentQuery.func1(0xc0001346c0, 0xc0004577a0)
	/Users/edd/go/src/github.com/influxdata/influxdb/tsdb/shard_test.go:568 +0xd58
created by github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrentQuery
	/Users/edd/go/src/github.com/influxdata/influxdb/tsdb/shard_test.go:519 +0x4e9

We need to determine the following:

  1. The test does not use the top-level write/delete/query interface, so is it possible for a user to actually trigger this panic? I think technically it could be, because the tsm1.Engine does not lock over deletes/queries...
  2. The fix would probably involve peeking at the first values the cursor has, and then doing a type check on it. If it's not the type we expect then we should return an error.

The error could be returned from:

// Next returns the next point from the iterator.
func (itr *floatIterator) Next() (*query.FloatPoint, error) {
for {
seek := tsdb.EOF
if itr.cur != nil {
// Read from the main cursor if we have one.
itr.point.Time, itr.point.Value = itr.cur.nextFloat()
seek = itr.point.Time

However, it's not clear how many places we need to check and the scope of that work...

@e-dard e-dard added kind/bug flaky test This is a flaky test. These issues should be groomed at the next opportunity labels Jul 5, 2019
@e-dard e-dard changed the title Types iterators are racy Typed iterators are racy Jul 5, 2019
@stale
Copy link

stale bot commented Oct 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 17, 2019
@stale
Copy link

stale bot commented Oct 24, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

@stale stale bot closed this as completed Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage flaky test This is a flaky test. These issues should be groomed at the next opportunity kind/bug wontfix
Projects
None yet
Development

No branches or pull requests

2 participants