Compactor: Unsolicited response received on idle HTTP channel starting, read TOC: invalid checksum #3858

Closed
goober opened this issue Mar 1, 2021 · 4 comments

goober commented Mar 1, 2021

I can confirm that #3795 solved part of our issue with invalid checksums when the compactor downloads blocks: it now recovers and tries again. However, as the log below shows, we hit another issue where the compactor stops. After a restart it continues downloading new blocks for processing until it hits the same error again. A different block seems to be affected each time.

Thanos, Prometheus and Golang version used:
thanos/thanos:master-2021-02-26-2027fb30

Object Storage Provider:

Ceph version 3.2

Running parameters:

compact \
--http-address=0.0.0.0:10902 \
--objstore.config-file=/etc/thanos/config/thanos.yaml \
--data-dir=/var/thanos/compact \
--retention.resolution-raw=14d \
--retention.resolution-5m=14d \
--retention.resolution-1h=14d \
--delete-delay=12h \
--wait

What happened:

Compactor stops compacting with the following error:

2021/03/01 15:43:00 Unsolicited response received on idle HTTP channel starting with "*\x00,\xed/\x00,\xed4\x00,\xed9\x00,\xed>\x00,\xedC\x00,\xedH\x00,\xedM\x00,\xedR\x00,\xedW\x00,\xed\\\x00,\xeda\x00,\xedf\x00,\xedk\x00,\xedp\x00,\xedu\x00,\xedz\x00,\xed\u007f\x00,\xed\x84\x00,\xed\x89\x00,\xed\x8e\x00,\xed\x93\x00,\xed\x98\x00,\xed\x9d\x00,\xed\xa2\x00,\xed\xa7\x00,\xed\xac\x00,\xed\xb1\x00,\xed\xb6\x00,\xed\xbb\x00,\xed\xc0\x00,\xed\xc5\x00,\xed\xca\x00,\xed\xcf\x00,\xed\xd4\x00,\xed\xd9\x00,\xed\xde\x00,\xed\xe3\x00,\xed\xe8\x00,\xed\xed\x00,\xed\xf2\x00,\xed\xf7\x00,\xed\xfc\x00,\xee\x01\x00,\xee\x06\x00,\xee\v\x00,\xee\x10\x00,\xee\x15\x00,\xee\x1a\x00,\xee\x1f\x00,\xee$\x00,\xee)\x00,\xee.\x00,\xee3\x00,\xee8\x00,\xee=\x00,\xeeB\x00,\xeeG\x00,\xeeL\x00,\xeeQ\x00,\xeeV\x00,\xee[\x00,\xee`\x00,\xeee\x00,\xeej\x00,\xeeo\x00,\xeet\x00,\xeey\x00,\xee~\x00,\xee\x83\x00,\xee\x88\x00,\xee\x8d\x00,\xee\x92\x00,\xee\x97\x00,\xee\x9c\x00,\xee\xa1\x00,\xee\xa6\x00,\xee\xab\x00,\xee\xb0\x00,\xee\xb5\x00,\xee\xba\x00,\xee\xbf\x00,\xee\xc4\x00,\xee\xc9\x00,\xee\xce\x00,\xee\xd3\x00,\xee\xd8\x00,\xee\xdd\x00,\xee\xe2\x00,\xee\xe7\x00,\xee\xec\x00,\xee\xf1\x00,\xee\xf6\x00,\xee\xfb\x00,\xef\x00\x00,\xef\x05\x00,\xef\n\x00,\xef\x0f\x00,\xef\x14\x00,\xef\x19\x00,\xef\x1e\x00,\xef#\x00,\xef(\x00,\xef-\x00,\xef2\x00,\xef7\x00,\xef<\x00,\xefA\x00,\xefF\x00,\xefK\x00,\xefP\x00,\xefU\x00,\xefZ\x00,\xef_\x00,\xefd\x00,\xefi\x00,\xefn\x00,\xefs\x00,\xefx\x00,\xef}\x00,\xef\x82\x00,\xef\x87\x00,\xef\x8c\x00,\xef\x91\x00,\xef\x96\x00,\xef\x9b\x00,\xef\xa0\x00,\xef\xa5\x00,\xef\xaa\x00,\xef\xaf\x00,\xef\xb4\x00,\xef\xb9\x00,\xef\xbe\x00,\xef\xc3\x00,\xef\xc8\x00,\xef\xcd\x00,\xef\xd2\x00,\xef\xd7\x00,\xef\xdc\x00,\xef\xe1\x00,\xef\xe6\x00,\xef\xeb\x00,\xef\xf0\x00,\xef\xf5\x00,\xef\xfa\x00,\xef\xff\x00,\xf0\x04\x00,\xf0\t\x00,\xf0\x0e\x00,\xf0\x13\x00,\xf0\x18\x00,\xf0\x1d\x00,\xf0\"\x00,\xf0'\x00,\xf0,\x00,\xf01\x00,\xf06\x00,\xf0;\x00,\xf0@\x00,\xf0E\x00,\xf0J\x00,\xf0O\x00,\xf0T\x00
,\xf0Y\x00,\xf0^\x00,\xf0c\x00,\xf0h\x00,\xf0m\x00,\xf0r\x00,\xf0w\x00,\xf0\|\x00,\xf0\x81\x00,\xf0\x86\x00,\xf0\x8b\x00,\xf0\x90\x00,\xf0\x95\x00,\xf0\x9a\x00,\xf0\x9f\x00,\xf0\xa4\x00,\xf0\xa9\x00,\xf0\xae\x00,\xf0\xb3\x00,\xf0\xb8\x00,\xf0\xbd\x00,\xf0\xc2\x00,\xf0\xc7\x00,\xf0\xcc\x00,\xf0\xd1\x00,\xf0\xd6\x00,\xf0\xdb\x00,\xf0\xe0\x00,\xf0\xe5\x00,\xf0\xea\x00,\xf0\xef\x00,\xf0\xf4\x00,\xf0\xf9\x00,\xf0\xfe\x00,\xf1\x03\x00,\xf1\b\x00,\xf1\r\x00,\xf1\x12\x00,\xf1\x17\x00,\xf1\x1c\x00,\xf1!\x00,\xf1&\x00,\xf1+\x00,\xf10\x00,\xf15\x00,\xf1:\x00,\xf1?\x00,\xf1D\x00,\xf1I\x00,\xf1N\x00,\xf1S\x00,\xf1X\x00,\xf1]\x00,\xf1b\x00,\xf1g\x00,\xf1l\x00,\xf1q\x00,\xf1v\x00,\xf1{\x00,\xf1\x80\x00,\xf1\x85\x00,\xf1\x8a\x00,\xf1\x8f\x00,\xf1\x94\x00,\xf1\x99\x00,\xf1\x9e\x00,\xf1\xa3\x00,\xf1\xa8\x00,\xf1\xad\x00,\xf1\xb2\x00,\xf1\xb7\x00,\xf1\xbc\x00,\xf1\xc1\x00,\xf1\xc6\x00,\xf1\xcb\x00,\xf1\xd0\x00,\xf1\xd5\x00,\xf1\xda\x00,\xf1\xdf\x00,\xf1\xe4\x00,\xf1\xe9\x00,\xf1\xee\x00,\xf1\xf3\x00,\xf1\xf8\x00,\xf1\xfd\x00,\xf2\x02\x00,\xf2\a\x00,\xf2\f\x00,\xf2\x11\x00,\xf2\x16\x00,\xf2\x1b\x00,\xf2 
\x00,\xf2%\x00,\xf2*\x00,\xf2/\x00,\xf24\x00,\xf29\x00,\xf2>\x00,\xf2C\x00,\xf2H\x00,\xf2M\x00,\xf2R\x00,\xf2W\x00,\xf2\\\x00,\xf2a\x00,\xf2f\x00,\xf2k\x00,\xf2p\x00,\xf2u\x00,\xf2z\x00,\xf2\u007f\x00,\xf2\x84\x00,\xf2\x89\x00,\xf2\x8e\x00,\xf2\x93\x00,\xf2\x98\x00,\xf2\x9d\x00,\xf2\xa2\x00,\xf2\xa7\x00,\xf2\xac\x00,\xf2\xb1\x00,\xf2\xb6\x00,\xf2\xbb\x00,\xf2\xc0\x00,\xf2\xc5\x00,\xf2\xca\x00,\xf2\xcf\x00,\xf2\xd4\x00,\xf2\xd9\x00,\xf2\xde\x00,\xf2\xe3\x00,\xf2\xe8\x00,\xf2\xed\x00,\xf2\xf2\x00,\xf2\xf7\x00,\xf2\xfc\x00,\xf3\x01\x00,\xf3\x06\x00,\xf3\v\x00,\xf3\x10\x00,\xf3\x15\x00,\xf3\x1a\x00,\xf3\x1f\x00,\xf3$\x00,\xf3)\x00,\xf3.\x00,\xf33\x00,\xf38\x00,\xf3=\x00,\xf3B\x00,\xf3G\x00,\xf3L\x00,\xf3Q\x00,\xf3V\x00,\xf3[\x00,\xf3`\x00,\xf3e\x00,\xf3j\x00,\xf3o\x00,\xf3t\x00,\xf3y\x00,\xf3~\x00,\xf3\x83\x00,\xf3\x88\x00,\xf3\x8d\x00,\xf3\x92\x00,\xf3\x97\x00,\xf3\x9c\x00,\xf3\xa1\x00,\xf3\xa6\x00,\xf3\xab\x00,\xf3\xb0\x00,\xf3\xb5\x00,\xf3\xba\x00,\xf3\xbf\x00,\xf3\xc4\x00,\xf3\xc9\x00,\xf3\xce\x00,\xf3\xd3\x00,\xf3\xd8\x00,\xf3\xdd\x00,\xf3\xe2\x00,\xf3\xe7\x00,\xf3\xec\x00,\xf3\xf1\x00,\xf3\xf6\x00,\xf3\xfb\x00,\xf4\x00\x00,\xf4\x05\x00,\xf4\n\x00,\xf4\x0f\x00,\xf4\x14\x00,\xf4\x19\x00,\xf4\x1e\x00,\xf4#\x00,\xf4(\x00,\xf4-\x00,\xf42\x00,\xf47\x00,\xf4<\x00,\xf4A\x00,\xf4F\x00,\xf4K\x00,\xf4P\x00,\xf4U\x00,\xf4Z\x00,\xf4_\x00,\xf4d\x00,\xf4i\x00,\xf4n\x00,\xf4s\x00,\xf4x\x00,\xf4}\x00,\xf4\x82\x00,\xf4\x87\x00,\xf4\x8c\x00,\xf4\x91\x00,\xf4\x96\x00,\xf4\x9b\x00,\xf4\xa0\x00,\xf4\xa5\x00,\xf4\xaa\x00,\xf4\xaf\x00,\xf4\xb4\x00,\xf4\xb9\x00,\xf4\xbe\x00,\xf4\xc3\x00,\xf4\xc8\x00,\xf4\xcd\x00,\xf4\xd2\x00,\xf4\xd7\x00,\xf4\xdc\x00,\xf4\xe1\x00,\xf4\xe6\x00,\xf4\xeb\x00,\xf4\xf0\x00,\xf4\xf5\x00,\xf4\xfa\x00,\xf4\xff\x00,\xf5\x04\x00,\xf5\t\x00,\xf5\x0e\x00,\xf5\x13\x00,\xf5\x18\x00,\xf5\x1d\x00,\xf5\"\x00,\xf5'\x00,\xf5,\x00,\xf51\x00,\xf56\x00,\xf5;\x00,\xf5@\x00,\xf5E\x00,\xf5J\x00,\xf5O\x00,\xf5T\x00,\xf5Y\x00,\xf5^\x00,\xf5c\x00,\xf5h\x00,\xf5m\x00,\xf5r\x00,\xf5w\
x00,\xf5\|\x00,\xf5\x81\x00,\xf5\x86\x00,\xf5\x8b\x00,\xf5\x90\x00,\xf5\x95\x00,\xf5\x9a\x00,\xf5\x9f\x00,\xf5\xa4\x00,\xf5\xa9\x00,\xf5\xae\x00,\xf5\xb3\x00,\xf5\xb8\x00,\xf5\xbd\x00,\xf5\xc2\x00,\xf5\xc7\x00,\xf5\xcc\x00,\xf5\xd1\x00,\xf5\xd6\x00,\xf5\xdb\x00,\xf5\xe0\x00,\xf5\xe5\x00,\xf5\xea\x00,\xf5\xef\x00,\xf5\xf4\x00,\xf5\xf9\x00,\xf5\xfe\x00,\xf6\x03\x00,\xf6\b\x00,\xf6\r\x00,\xf6\x12\x00,\xf6\x17\x00,\xf6\x1c\x00,\xf6!\x00,\xf6&\x00,\xf6+\x00,\xf60\x00,\xf65\x00,\xf6:\x00,\xf6?\x00,\xf6D\x00,\xf6I\x00,\xf6N\x00,\xf6S\x00,\xf6X\x00,\xf6]\x00,\xf6b\x00,\xf6g\x00,\xf6l\x00,\xf6q\x00,\xf6v\x00,\xf6{\x00,\xf6\x80\x00,\xf6\x85\x00,\xf6\x8a\x00,\xf6\x8f\x00,\xf6\x94\x00,\xf6\x99\x00,\xf6\x9e\x00,\xf6\xa3\x00,\xf6\xa8\x00,\xf6\xad\x00,\xf6\xb2\x00,\xf6\xb7\x00,\xf6\xbc\x00,\xf6\xc1\x00,\xf6\xc6"; err=<nil>
--
  | level=warn ts=2021-03-01T15:43:02.121080162Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 0@393068013787079835: gather index issues for block /var/thanos/compact/compact/0@393068013787079835/01EY7QBN6MTBD43AVC7BAQQ62P: open index file: read TOC: read TOC: invalid checksum"
  | level=info ts=2021-03-01T15:43:02.121130533Z caller=http.go:69 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 0@393068013787079835: gather index issues for block /var/thanos/compact/compact/0@393068013787079835/01EY7QBN6MTBD43AVC7BAQQ62P: open index file: read TOC: read TOC: invalid checksum"
  | level=info ts=2021-03-01T15:43:02.624302054Z caller=http.go:88 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 0@393068013787079835: gather index issues for block /var/thanos/compact/compact/0@393068013787079835/01EY7QBN6MTBD43AVC7BAQQ62P: open index file: read TOC: read TOC: invalid checksum"
  | level=info ts=2021-03-01T15:43:02.624398259Z caller=intrumentation.go:66 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 0@393068013787079835: gather index issues for block /var/thanos/compact/compact/0@393068013787079835/01EY7QBN6MTBD43AVC7BAQQ62P: open index file: read TOC: read TOC: invalid checksum"
  | level=error ts=2021-03-01T15:43:02.632852605Z caller=main.go:156 err="group 0@393068013787079835: gather index issues for block /var/thanos/compact/compact/0@393068013787079835/01EY7QBN6MTBD43AVC7BAQQ62P: open index file: read TOC: read TOC: invalid checksum\ncompaction\nmain.runCompact.func7\n\t/home/circleci/project/cmd/thanos/compact.go:362\nmain.runCompact.func8.1\n\t/home/circleci/project/cmd/thanos/compact.go:416\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/home/circleci/project/pkg/runutil/runutil.go:73\nmain.runCompact.func8\n\t/home/circleci/project/cmd/thanos/compact.go:415\ngithub.com/oklog/run.(*Group).Run.func1\n\t/home/circleci/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nerror executing compaction\nmain.runCompact.func8.1\n\t/home/circleci/project/cmd/thanos/compact.go:443\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/home/circleci/project/pkg/runutil/runutil.go:73\nmain.runCompact.func8\n\t/home/circleci/project/cmd/thanos/compact.go:415\ngithub.com/oklog/run.(*Group).Run.func1\n\t/home/circleci/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\ncompact command failed\nmain.main\n\t/home/circleci/project/cmd/thanos/main.go:156\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"

goober commented Mar 11, 2021

It may have been caused by corrupt data. With the latest version from master and a clean bucket, the compactor has been running smoothly for over a week, so this ticket can be closed for now.

goober closed this as completed Mar 11, 2021

goober commented Mar 12, 2021

The issue has been observed again.

Adding the argument --no-debug.halt-on-error works around the issue: the compact component is restarted when the error occurs, and the blocks are then successfully compacted or downsampled in the next iteration.

goober reopened this Mar 12, 2021

stale bot commented Jun 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Jun 2, 2021

stale bot commented Jun 16, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale bot closed this as completed Jun 16, 2021