-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define UNICODE and _UNICODE preprocessors for windows #6338
Define UNICODE and _UNICODE preprocessors for windows #6338
Conversation
@Breakwell if you could please review the changes to windows\process.cpp I made, would appreciate it. :-) I think you had also made changes recently around the code I updated here. |
This is great! I'll bring up the performance concern again. :D Is there a way to compare the before / after:
Heads up I am not sure the Windows equivalent of Also @alessandrogario, do you have time to help test performance impact to |
With powershell you can do something like
Performance-wise, these changes will be the same or better than what currently exists in master. Reasoning is below: Existing case. Let's say, a piece of osquery code calls OS's So in pseudo-code simplification,
What the update does. We perform
So, these should two are at least equal in performance (ignoring CP_ACP/CP_UTF8 parameter difference). Why the updated code may be better* performance-wise: In the case of ANSI Code Page (ACP), the system is performing localization table lookups, whereas converting from UTF-16 to UTF-8 or reverse is a functional transformation as per RFC 2279. So, it depends on whether the localization table (for ACP) is on disk or loaded in memory, and if it's from disk, IO will make it slower than functional transformation between UTF-16 and UTF-8. |
My performance concern is with the added string conversions, |
Understood. And I agree-- let's perform some experiments to determine performance impact of these changes. |
I assume this is still in limbo. |
@directionless Yes, that's correct. I do not have enough sample data to perform comparative measurements. I think @theopolis wanted @alessandrogario to perform some tests, if he had time available. Conceptually, the string conversions ought to not double memory impact because we're doing this conversion explicitly now whereas before this conversion was happening implicitly when the ANSI version of the functions (*A) were performing conversion and then calling the Wide (*W) and then reconverting the string back to ANSI. You can observe by disassembling the *A function's implementation to see that it's performing MultiByteToWideChar, then calling *W version, and then calling WideCharToMultiByte. |
Ok, I'll take another detailed pass at the changes here and if everything looks OK we can merge. |
Great! Thanks @theopolis! Sorry, I've been a bit busy on my end with our own releases in July, so I haven't had a chance to look over any changes in osquery. |
No worries, thanks for all the work thus far here! I can take care of rebasing the PR and resolving any conflicts that show up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions, then this looks good!
Is there a way to prevent the use of |
Haha, if it were that easy. :) I do not know of any way. However, what this PR does is if a programmer does not explicitly specify A suffix, e.g., just calls |
@farfella, thanks for the insight/details. With this understanding I feel confident we can land this with the nitpick fills for |
I will have some time starting Wednesday night this week. Not sure if other *A functions have been added to master recently. I see r["mode"] = TEXT(meta->getMode()); in sleuthkit.cpp, which should have SQL_ prefix. Not sure if there are any others. |
Right, it would be any code that was added since your original PR. This is my fault for letting the PR sit for so long. |
Sorry, I had a migraine last night, and didn't get a chance to work on this. I was in fact building right now, as you approved and fixed the remaining merge issues. :) Thank you, @theopolis ! |
Should this have resolved the following that's flooding my Windows event logs? Running 4.5.1 caller=log.go:124 ts=2021-01-19T16:50:26.1264442Z caller=level.go:63 level=info caller=publish_logs.go:179 method=PublishLogs uuid=6e2ccedd-1f13-4ea9-9bcc-37d9611e3eff logType=string log_count=1103 message= errcode= reauth=false err="rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8" took=14.0018ms caller=log.go:124 ts=2021-01-19T16:52:01.0215442Z caller=level.go:63 level=info caller=log.go:69 component=osquery level=stderr msg="I0119 11:52:01.021148 11232 registry.cpp:558] Failed to expand globs: Failed to open registry handle" caller=registry.cpp:558 |
Hmm not sure. This seems to be a grpc marshalling error. Might not be related to osquery proper. |
The GRPC error is coming from https://github.com/kolide/launcher/, but it launcher isn't doing anything other than trying to marshal the data from osquery. So it's likely indicative of non-utf8 data. Whether it's this, or an instance of #5288 is hard to know without seeing the logs it's trying to encode. |
It was determined that there is mis-formatted osquery results in the osquery store from the previous version of osquery, and Launcher is still trying to send that. Removing the C:\Program Files\Kolide\Launcher-so-launcher\data\ directory and restarting the service fixed the issue - no more UTF8 events. |
Define UNICODE and _UNICODE preprocessors. This will result in Windows API function macros defaulting to the wide implementation instead of the ANSI (local) implementations. This PR ought to be the final PR in the line of PRs that aim to resolve internationalization/localization issues observed in osquery on Windows.
This should resolve #2569 from 2016. :-)
Please encourage future PR submitters to not call the *A functions but the *W functions instead.