Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8 #445

Closed
cyberonstian opened this issue Mar 12, 2019 · 21 comments

Comments

@cyberonstian
Copy link

We are in the process of deploying Kolide to our customers and we get this error on multiple of the windows clients. When the error is logged launcher stops sending data to the server.

We have tried updating osqueryd (3.3.2) and launcher (9.0.2) to latest versions but the error still persists.

If we delete the osquery.db files and restart launcher it works for a while and then the error reappears and the log traffic stops.

We have done some investigation and nailed it down to this query:
SELECT * FROM programs

This works perfectly in Kolide fleet dashboard, but when running it scheduled launcher fails to send the logs due to the error message. We have seen the error message on multiple computers but are still investigation what queries triggers those endpoints.

To nail it down a bit further we found the software in the list that had the characters that triggered the error:
SELECT * FROM programs WHERE name = "HxD Hex Editor version 1.7.7.0"

What version of fleet are you using (fleet version --full)?

fleet - version 2.0.2
branch: master
revision: 8ca0358bf28173685815b79d8683a4239d629a14
build date: 2019-01-18T00:39:40Z
build user: zwass
go version: go1.11.3¨

What operating system are you using?

Windows 10 Pro 1893

What did you do?

  1. Install Kolide Fleet agent on the client
  2. Start in debug mode
  3. Wait a couple of minutes, we see that the server is recieving data from the client
  4. The error appears in the logs and no data is recieved on the server.

What did you expect to see?

Data sent from launcher.exe to kolide fleet

What did you see instead?

The two first log entries are the error message, the last one is the software that we think is causing them.

{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8","errcode":"","logType":"string","log_count":3,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"0s","ts":"2019-03-12T12:34:15.8004054Z","uuid":"bde64b33-697d-4782-b8a2-866e0a44e71a"}

{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8","severity":"info","ts":"2019-03-12T12:34:15.8014026Z"}

{"caller":"publish_results.go:168","err":null,"errcode":"","message":"","method":"PublishResults","reauth":false,"results":"[{"query_name":"kolide_distributed_query_352","status":0,"rows":[{"identifying_number":"","install_date":"20180625","install_location":"C:\\Program Files (x86)\\HxD\\","install_source":"","language":"","name":"HxD Hex Editor version 1.7.7.0","publisher":"Ma�l H�rz","uninstall_string":"\"C:\\Program Files (x86)\\HxD\\unins000.exe\"","version":"1.7.7.0"}]}]","severity":"debug","took":"16.9527ms","ts":"2019-03-12T22:34:46.991793Z","uuid":"7729e05c-71a1-4f7b-bd37-8fb37e76e6db"}

@groob
Copy link
Contributor

groob commented Mar 12, 2019

osquery/osquery#5288

directionless added a commit that referenced this issue May 3, 2019
Osquery sometimes mis-encodes utf8 data osquery/osquery#5288

This is a broad attempt to repair log files that exhibit that issue. This runs against the entire log file. Hopefully, there isn’t going to be a case where it misfires.

Fixes: #445
@polak785
Copy link

polak785 commented Mar 20, 2020

Hello,

This bug still exists and block a normal usage. I'm on Kolide Fleet 2.5.0 on server side and Windows 10 1709 with Kolide Launcher 0.11.9 and osquery 4.2.0 on client side. This a test virtual machine, therefore there are few things installed on it.

I confirm that the bug happen when a query is done on table programs. Then, none query return result, Kolide Launcher can be restarted but is still on a bad state and the quick fix is to delete the local database of Kolide Launcher. But this will block again when the scheduled query is started.

I think the bug is on Kolide Launcher side because when I run osquery with exact same parameters as Kolide Launcher starts it, it works like a charm, osqueryd return results and keep continue to work.

Here is the command I run for osqueryd:

"C:\Program Files\Kolide\Launcher-launcher\bin\osqueryd.exe-updates\1583771796\osqueryd" --tls_hostname=xxx.yyy.zzz:8080 --enroll_secret_path="C:\Program Files\Kolide\Launcher-launcher\conf\secret" --enroll_tls_endpoint=/api/v1/osquery/enroll --config_plugin=tls --config_tls_endpoint=/api/v1/osquery/config --config_refresh=10 --host_identifier=uuid --pidfile="C:\Program Files\Kolide\Launcher-launcher\data\osquery.db\osquery.pid" --database_path="C:\Program Files\Kolide\Launcher-launcher\data\osquery.db\osquery.db" --extensions_socket="C:\Program Files\Kolide\Launcher-launcher\data\osquery.sock" --extensions_timeout=10 -logger_plugin=tls --distributed_plugin=tls --disable_distributed=false --distributed_interval=5 --pack_delimiter=: --host_identifier=uuid --force=true --disable_watchdog --utc --verbose --config_refresh=300 --config_accelerated_refresh=30 --extensions_autoload="C:\Program Files\Kolide\Launcher-launcher\data\osquery.autoload"

Can you fix this bug please? I can do more tests if required.

@directionless
Copy link
Contributor

I think the encoding bug is in osquery, as discussed in osquery/osquery#5288 but I agree that launcher shouldn't crash.

How are you issuing the that query? Is it a scheduled query, or being sent as a distributed query?

@polak785
Copy link

This is a scheduled query with logging as snapshot.

Yes this is very blocking that Launcher crash because the only fix is to delete local database.

@directionless
Copy link
Contributor

I have not yet spun up a test environment for this, but I don't think we see this on the SaaS side of things. I'm wondering if launcher is still up, but it's stuck in a loop trying to send the same log to the server. Do the launcher logs indicate?

@polak785
Copy link

Yes the same log occur every minute:

{"caller":"publish_logs.go:157","err":null,"errcode":"","logType":"status","log_count":9,"message":"","method":"PublishLogs","reauth":false,"severity":"debug","took":"34.045ms","ts":"2020-03-19T09:45:29.3740801Z","uuid":"bf28a7df-0748-4076-8c59-09d300b3b8e0"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"16.3766ms","ts":"2020-03-19T09:45:29.4973711Z","uuid":"7196d32e-f1a0-4d82-8b6f-2be5306376bb"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:45:29.5006111Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.0947ms","ts":"2020-03-19T09:45:37.0671645Z","uuid":"9c8f5303-2baa-4271-81d9-25321910f906"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"31.2682ms","ts":"2020-03-19T09:45:47.4014405Z","uuid":"d8ffa71b-6486-41a8-8f2c-d87f4013533a"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.0556ms","ts":"2020-03-19T09:45:56.6949539Z","uuid":"299d8b07-3360-47f1-bdd4-b938f0a006d6"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.7393ms","ts":"2020-03-19T09:46:05.0590382Z","uuid":"4ebf9656-9878-4d1f-96e1-285985d5bfba"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"34.1126ms","ts":"2020-03-19T09:46:12.8515392Z","uuid":"77db8c33-5e0c-4890-8d9f-d41895f6e5a1"}
{"caller":"query_target.go:71","msg":"server does not implement GetTargets","severity":"debug","ts":"2020-03-19T09:46:21.3208714Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"31.79ms","ts":"2020-03-19T09:46:21.4265874Z","uuid":"2fdb6afb-eb1b-499a-b586-f1e938e43d28"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"10.1363ms","ts":"2020-03-19T09:46:24.2113109Z","uuid":"40f6d97a-1b81-4074-8343-71a5623b46be"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:46:24.2203495Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.8889ms","ts":"2020-03-19T09:46:30.8535053Z","uuid":"74066323-9566-4e30-9d56-be3c7d9fc525"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"32.7023ms","ts":"2020-03-19T09:46:41.2258338Z","uuid":"c90a6fbf-ea71-4993-aa9f-6a5f6b006803"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.8766ms","ts":"2020-03-19T09:46:52.6475113Z","uuid":"402dd069-77f5-4034-9a04-98aec1f56c45"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.3889ms","ts":"2020-03-19T09:47:05.2297429Z","uuid":"75efbae4-4699-483e-93a0-f335fee6cb41"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.0314ms","ts":"2020-03-19T09:47:18.5264584Z","uuid":"7e8c8d50-3f12-4023-985c-c830aa435a1b"}
{"caller":"query_target.go:71","msg":"server does not implement GetTargets","severity":"debug","ts":"2020-03-19T09:47:30.0738021Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"46.5929ms","ts":"2020-03-19T09:47:30.5033555Z","uuid":"9633aa80-b003-4651-9478-d5d6014ada96"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"8.4364ms","ts":"2020-03-19T09:47:33.4980049Z","uuid":"d505fbb3-7326-4cd5-8c3b-0e8105407cf5"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:47:33.5020486Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.5181ms","ts":"2020-03-19T09:47:41.2621737Z","uuid":"febf6a67-e727-48bd-b5de-6abafe0b03e0"}
{"caller":"scheduler.cpp:96","component":"osquery","level":"stderr","msg":"I0319 10:47:49.286285  7356 scheduler.cpp:96] Executing scheduled query pack/Software/Installed software list - Windows: select name as software_name, version as software_version from programs;","severity":"debug","ts":"2020-03-19T09:47:49.2939863Z"}
{"caller":"process_ops.cpp:99","component":"osquery","level":"stderr","msg":"I0319 10:47:49.298260  7356 process_ops.cpp:99] No account name provided","severity":"info","ts":"2020-03-19T09:47:49.3016493Z"}
{"caller":"logged_in_users.cpp:129","component":"osquery","level":"stderr","msg":"I0319 10:47:49.298260  7356 logged_in_users.cpp:129] Error converting username to SID","severity":"info","ts":"2020-03-19T09:47:49.3025674Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.894ms","ts":"2020-03-19T09:47:50.9465303Z","uuid":"aaf33ff4-6b2d-42fc-bda8-449650f93766"}

@directionless
Copy link
Contributor

I would say that indicates it's not a launcher bug. Launcher is sending a log, and fleet is rejecting it. Launcher is then resending it. The reason you need to drop the database, is because launcher will always try to resend stored logs.

I am not immediately sure whether I think launcher should drop the log when the server rejects it. It's the sort of thing that would cause logs to be dropped, and potentially break any kind of monitoring.

My gut sense is that fleet should accept these and either fix them, and record the data, or discard them. @zwass what do you think here?

@polak785
Copy link

As reference, the original issue has been created on fleet : kolide/fleet#2014

But as with osquery this works fine, I think a fix should be done on Kolide side to accept the log and not discard it.

@zwass
Copy link
Contributor

zwass commented Mar 20, 2020

This does not look like an error on the Fleet side. Notice the error message:

rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8

There wouldn't be any marshalling on the Fleet side, only unmarshalling.

@directionless
Copy link
Contributor

rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8

Hrm. I'll take a closer look. @zwass What do you think launcher should do? We talked about it briefly in #481

@zwass
Copy link
Contributor

zwass commented Mar 20, 2020

Either escape it in some way, or drop the invalid characters I suppose. Probably best to do this only after receiving the error so we don't add an extra encoding/scanning step for every single log.

@wkleinhenz
Copy link

I seem to be experience this error based on logs im seeing from the service on windows host, has a workaround or solution been determined yet, im running the most recent version of launcher (0.11.9) and fleet version 2.6.0

@polak785
Copy link

For my concern, the bug has been fixed with latest osquery release (4.3.0).

The related PR is osquery/osquery#6190

All encoding issues should be fixed when the PR osquery/osquery#6338 will be released

@B3DTech
Copy link

B3DTech commented Jan 19, 2021

Is there any progress on this? 4.5.1 and all my event logs across all my Windows servers are seeing thousands of:

caller=log.go:124 ts=2021-01-19T16:50:26.127444Z caller=level.go:63 level=info caller=extension.go:494 err="sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8"

caller=log.go:124 ts=2021-01-19T16:50:26.1264442Z caller=level.go:63 level=info caller=publish_logs.go:179 method=PublishLogs uuid=6e2ccedd-1f13-4ea9-9bcc-37d9611e3eff logType=string log_count=1103 message= errcode= reauth=false err="rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8" took=14.0018ms

@zwass
Copy link
Contributor

zwass commented Jan 20, 2021

@B3DTech FWIW we (github.com/fleetdm) are working on an osquery autoupdater that takes a different approach by letting osquery do the heavy lifting. This (I believe) will help resolve issues like this one with the grpc transport in Launcher.

@directionless
Copy link
Contributor

@B3DTech The underlying issue here, is that osquery sometimes produces data that is not utf-8 compatible. This is generally believed to be a bug. What do you think launcher should do when that happens?

@B3DTech
Copy link

B3DTech commented Jan 20, 2021

My problem is that it isn't "sometimes". It's my event logs on every machine are getting flooded with 2-4 of these message every minute. I'm not sure what should happen because I don't know why it's happening or what it's choking on. Osquery shouldn't be failing to produce UTF-8 messages.

And additionally, my 4.5.1 clients don't seem to be sending event logs, but my 4.2 clients are.

@directionless
Copy link
Contributor

Regardless of whether osquery "should" produce non-utf8, it does. So what do you think launcher should do? Attempt to repair the data? Drop it on the floor? Encode it?

directionless added a commit that referenced this issue Jan 21, 2021
Osquery sometimes mis-encodes utf8 data osquery/osquery#5288

This is a broad attempt to repair log files that exhibit that issue. This runs against the entire log file. Hopefully, there isn’t going to be a case where it misfires.

Fixes: #445
@directionless
Copy link
Contributor

I've updated #481. It should now attempt to repair the data, then redact the non-utf8 characters. I can't easily test it though

@B3DTech
Copy link

B3DTech commented Jan 29, 2021

It was determined that there is mis-formatted osquery results in the osquery store from the previous version of osquery, and Launcher is still trying to send that. Removing the C:\Program Files\Kolide\Launcher-so-launcher\data\ directory and restarting the service fixed the issue - no more UTF8 events.

@directionless
Copy link
Contributor

I think this is as resolved as it's getting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants