Define UNICODE and _UNICODE preprocessors for windows #6338

farfella · 2020-03-29T02:25:22Z

Define UNICODE and _UNICODE preprocessors. This will result in Windows API function macros defaulting to the wide implementation instead of the ANSI (local) implementations. This PR ought to be the final PR in the line of PRs that aim to resolve internationalization/localization issues observed in osquery on Windows.

This should resolve #2569 from 2016. :-)

Please encourage future PR submitters to not call the *A functions but the *W functions instead.

osquery/config/tests/config_tests.cpp

farfella · 2020-04-11T23:44:11Z

@Breakwell if you could please review the changes to windows\process.cpp I made, would appreciate it. :-) I think you had also made changes recently around the code I updated here.

theopolis · 2020-04-26T02:19:43Z

This is great! I'll bring up the performance concern again. :D

Is there a way to compare the before / after:

» time osqueryi --profile 1  --disable_extensions "select * from processes"

Heads up, I am not sure what the Windows equivalent of time is.

Also @alessandrogario, do you have time to help test performance impact to windows_events usage on a system under minor load?

farfella · 2020-04-26T13:25:37Z

> Is there a way to compare the before / after:
> » time osqueryi --profile 1  --disable_extensions "select * from processes"

With powershell you can do something like

Measure-Command {your command statement}

Performance-wise, these changes will be the same or better than what currently exists in master. Reasoning is below:

Existing case. Let's say a piece of osquery code calls the OS's CreateFileA(A, ...). CreateFileA will perform W = MultiByteToWideChar(GetACP(), A, ...) and then call CreateFileW(W, ...).

So in pseudo-code simplification,

void OurCode()
{
    const std::string S = "Hello, world!";
    CreateFileA(S.c_str(), ...);
    ...
}

HANDLE CreateFileA(A, ...)
{
    W = MultiByteToWideChar(CP_ACP, A);
    return CreateFileW(W, ...);
}

What the update does. We perform MultiByteToWideChar(CP_UTF8, A, W, ...), and then call CreateFileW(W).

void OurCode()
{
    std::string S = "Hello, world!";
    W = MultiByteToWideChar(CP_UTF8, S.c_str());*
    CreateFileW(W,...);
    ...
}

So, these two should be at least equal in performance (ignoring the CP_ACP/CP_UTF8 parameter difference).

Why the updated code may be better* performance-wise: for the ANSI Code Page (ACP), the system performs localization table lookups, whereas converting between UTF-16 and UTF-8 in either direction is a functional transformation as per RFC 2279 (since obsoleted by RFC 3629). So it depends on whether the localization table for the ACP is on disk or already loaded in memory; if it has to come from disk, the I/O will make it slower than the functional transformation between UTF-16 and UTF-8.
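To illustrate what "functional transformation" means here, below is a minimal sketch of UTF-8 to UTF-16 decoding that uses only bit arithmetic and no lookup tables. The function name utf8ToUtf16Sketch is hypothetical and not part of osquery or this PR, and the sketch assumes well-formed UTF-8 input; it says nothing about how MultiByteToWideChar is implemented internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch only (hypothetical helper, not osquery code): decodes UTF-8 into
// UTF-16 code units with shifts and masks. Assumes the input is well-formed;
// no validation of malformed sequences is attempted.
std::u16string utf8ToUtf16Sketch(const std::string& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const auto lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp = 0;
    std::size_t len = 1;
    // The lead byte's high bits give the sequence length and first payload bits.
    if (lead < 0x80) {
      cp = lead;
      len = 1;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F;
      len = 2;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F;
      len = 3;
    } else {
      cp = lead & 0x07;
      len = 4;
    }
    if (i + len > in.size()) {
      break;  // truncated trailing sequence: stop rather than read out of bounds
    }
    // Each continuation byte contributes its low 6 bits.
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    }
    i += len;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Code points above the BMP become a surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

For example, the three bytes E2 82 AC (the euro sign) decode to the single code unit 0x20AC, and the four bytes F0 9F 98 80 decode to the surrogate pair 0xD83D 0xDE00. Production code should keep using the OS conversion routines, which also handle invalid input.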

theopolis · 2020-04-26T14:17:08Z

My performance concern is with the added string conversions: stringToWstring and wstringToString will at least double the memory impact for these functions. I'd love to see some of the more critical parts of the code measured before and after to see if there's a noticeable change.

farfella · 2020-04-26T19:07:52Z

Understood. And I agree-- let's perform some experiments to determine performance impact of these changes.

directionless · 2020-05-27T02:06:46Z

I assume this is still in limbo.

farfella · 2020-05-28T01:37:30Z

@directionless Yes, that's correct. I do not have enough sample data to perform comparative measurements. I think @theopolis wanted @alessandrogario to perform some tests, if he had time available. Conceptually, the string conversions should not double the memory impact, because we are now doing explicitly a conversion that previously happened implicitly: the ANSI versions of the functions (*A) convert their input, call the wide versions (*W), and then convert the result back to ANSI. You can observe this by disassembling an *A function's implementation: it performs MultiByteToWideChar, then calls the *W version, and then calls WideCharToMultiByte.
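The argument above can be modeled with a small portable sketch that counts conversions on both paths. Everything here is a hypothetical stand-in: widen and narrow are ASCII-only placeholders for MultiByteToWideChar and WideCharToMultiByte, and queryA/queryW are not real Windows or osquery functions. It only illustrates the reasoning and is not a measurement.

```cpp
#include <string>

// Hypothetical stand-ins for illustration; not Windows or osquery APIs.
struct ConversionCount {
  int toWide = 0;
  int toNarrow = 0;
};

// ASCII-only placeholder for MultiByteToWideChar.
std::wstring widen(const std::string& s, ConversionCount& n) {
  ++n.toWide;
  std::wstring w;
  for (unsigned char ch : s) {
    w.push_back(static_cast<wchar_t>(ch));
  }
  return w;
}

// ASCII-only placeholder for WideCharToMultiByte.
std::string narrow(const std::wstring& w, ConversionCount& n) {
  ++n.toNarrow;
  std::string s;
  for (wchar_t ch : w) {
    s.push_back(static_cast<char>(ch));
  }
  return s;
}

// Stand-in for a *W function that returns a string.
std::wstring queryW(const std::wstring& key) {
  return key + L"=value";
}

// Models the described *A behavior: convert in, call *W, convert out.
std::string queryA(const std::string& key, ConversionCount& n) {
  return narrow(queryW(widen(key, n)), n);
}

// Models the approach in this PR: the caller converts and calls *W directly.
std::string queryExplicit(const std::string& key, ConversionCount& n) {
  const std::wstring wideKey = widen(key, n);
  const std::wstring wideResult = queryW(wideKey);
  return narrow(wideResult, n);
}
```

In this model both paths perform one narrow-to-wide and one wide-to-narrow conversion per call, so the explicit path adds no conversions over the implicit one. Whether real allocations behave the same way would still need the before/after measurements discussed in this thread.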

theopolis · 2020-07-08T03:10:17Z

Ok, I'll take another detailed pass at the changes here and if everything looks OK we can merge.

farfella · 2020-07-09T02:01:50Z

Great! Thanks @theopolis! Sorry, I've been a bit busy on my end with our own releases in July, so I haven't had a chance to look over any changes in osquery.

theopolis · 2020-07-09T14:40:42Z

> Great! Thanks @theopolis! Sorry, I've been a bit busy on my end with our own releases in July, so I haven't had a chance to look over any changes in osquery.

No worries, thanks for all the work thus far here! I can take care of rebasing the PR and resolving any conflicts that show up.

theopolis

Two questions, then this looks good!

osquery/process/windows/process.cpp

theopolis · 2020-07-17T02:28:15Z

Is there a way to prevent the use of *A functions?

farfella · 2020-07-18T23:28:06Z

> Is there a way to prevent the use of *A functions?

Haha, if it were that easy. :) I do not know of any way. However, what this PR does is this: if a programmer does not explicitly specify the A suffix, e.g., just calls CreateFile, the call will now be defined as CreateFileW. So, when reviewing, if someone explicitly uses an *A function, it will now stick out like a sore thumb.
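As a rough illustration of the macro selection described above, here is a portable sketch that mimics the pattern the Windows headers use for functions such as CreateFile. OpenThingA, OpenThingW, OpenThing and MY_TEXT are hypothetical names invented for this example, and UNICODE is defined inline only to keep the sketch self-contained (in a real build it would come from the compiler flags).

```cpp
#include <string>

// Assumption: defined inline for the sketch; normally set by the build system.
#define UNICODE

// Hypothetical stand-ins for an *A / *W pair such as CreateFileA / CreateFileW.
std::string OpenThingA(const char* /*path*/) {
  return "narrow (ANSI)";
}
std::string OpenThingW(const wchar_t* /*path*/) {
  return "wide (UTF-16)";
}

// Same shape as the Windows headers: the unsuffixed name is a macro that
// resolves to the W variant when UNICODE is defined, and to A otherwise.
#ifdef UNICODE
#define OpenThing OpenThingW
#define MY_TEXT(s) L##s
#else
#define OpenThing OpenThingA
#define MY_TEXT(s) s
#endif

// A caller that never writes a suffix gets the wide variant automatically.
std::string whichVariant() {
  return OpenThing(MY_TEXT("C:\\example.txt"));
}
```

With UNICODE defined, whichVariant() goes through OpenThingW. An explicit OpenThingA(...) call would still compile, which is why such calls have to be caught in review rather than by the compiler.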

theopolis · 2020-07-20T02:18:48Z

@farfella, thanks for the insight/details. With this understanding I feel confident we can land this with the nitpick fixes for NULL->nullptr. I am happy to rebase this on master and apply the change if you don't have time this week, let me know.

farfella · 2020-07-21T13:59:32Z

I will have some time starting Wednesday night this week. Not sure if other *A functions have been added to master recently. I see r["mode"] = TEXT(meta->getMode()); in sleuthkit.cpp, which should have SQL_ prefix. Not sure if there are any others.

theopolis · 2020-07-22T02:44:07Z

Right, it would be any code that was added since your original PR. This is my fault for letting the PR sit for so long.

farfella · 2020-07-24T01:24:29Z

Sorry, I had a migraine last night, and didn't get a chance to work on this. I was in fact building right now, as you approved and fixed the remaining merge issues. :) Thank you, @theopolis !

B3DTech · 2021-01-19T17:02:55Z

Should this have resolved the following that's flooding my Windows event logs? Running 4.5.1

caller=log.go:124 ts=2021-01-19T16:50:26.1264442Z caller=level.go:63 level=info caller=publish_logs.go:179 method=PublishLogs uuid=6e2ccedd-1f13-4ea9-9bcc-37d9611e3eff logType=string log_count=1103 message= errcode= reauth=false err="rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8" took=14.0018ms

caller=log.go:124 ts=2021-01-19T16:52:01.0215442Z caller=level.go:63 level=info caller=log.go:69 component=osquery level=stderr msg="I0119 11:52:01.021148 11232 registry.cpp:558] Failed to expand globs: Failed to open registry handle" caller=registry.cpp:558

farfella · 2021-01-20T02:04:43Z

Hmm not sure. This seems to be a grpc marshalling error. Might not be related to osquery proper.

directionless · 2021-01-21T03:58:10Z

The GRPC error is coming from https://github.com/kolide/launcher/, but launcher isn't doing anything other than trying to marshal the data from osquery. So it's likely indicative of non-UTF-8 data.

Whether it's this, or an instance of #5288 is hard to know without seeing the logs it's trying to encode.

B3DTech · 2021-01-29T20:14:04Z

It was determined that there were misformatted osquery results in the osquery store from the previous version of osquery, and Launcher was still trying to send them. Removing the C:\Program Files\Kolide\Launcher-so-launcher\data\ directory and restarting the service fixed the issue: no more UTF-8 events.

directionless added this to the 4.2.1 milestone Mar 31, 2020

theopolis reviewed Apr 6, 2020

osquery/config/tests/config_tests.cpp (outdated, resolved)

Smjert modified the milestones: 4.3.0, 4.3.1 Apr 9, 2020

theopolis added Windows high pri labels Apr 26, 2020

polak785 mentioned this pull request Apr 27, 2020

error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8 kolide/launcher#445

Closed

directionless modified the milestones: 4.4.0, 4.5.0 May 27, 2020

theopolis reviewed Jul 17, 2020

osquery/process/windows/process.cpp (outdated, resolved)

farfella added 5 commits July 22, 2020 21:56

Define UNICODE and _UNICODE preprocessors Windows API function macros…

8ff090a

… default to wide implementation

updating tests

2339966

check format

4c3b23a

updating additional tests

85d7760

updating windows\process.cpp

dd78c38

farfella and others added 6 commits July 22, 2020 21:56

check style

f617569

rebase fixes

3df4ae6

add events tables

2a52784

add tsk tables

f694f50

add shimcache table

7f1d7cf

add LoadLibraryExW

f70c4ab

theopolis approved these changes Jul 24, 2020

theopolis merged commit f79d7e3 into osquery:master Jul 24, 2020

farfella deleted the define_unicode_preprocessors_for_windows branch August 16, 2020 23:30

farfella restored the define_unicode_preprocessors_for_windows branch August 16, 2020 23:31

Smjert mentioned this pull request Oct 25, 2020

osquery appears to ignore files with names that contain multibyte characters over U+00FF on Windows #4150

Closed

YingFengOu mentioned this pull request Jun 24, 2021

unicode character escaped in the log results #7175

Open
