Marshalling native non-PODs yield different results depending on their size #12312

Alan-FGR · 2019-03-21T17:43:19Z

Hi there. This issue was originally reported on dotnet/csharplang here: dotnet/csharplang#2357 but as requested I'm migrating it here since it's the most appropriate place.

I was marshaling some data from C++ and I came across a strange inconsistent behavior, here's a repro project:
https://github.com/Alan-FGR/PinvokeDebug
here's the native code:
https://github.com/Alan-FGR/PinvokeDebug/blob/c63b8e1b310ab6ad0316e1044580789aa7348b2f/Dll/Dll.cpp#L9-L44
here's the C# code:
https://github.com/Alan-FGR/PinvokeDebug/blob/c63b8e1b310ab6ad0316e1044580789aa7348b2f/PinvokeProblemDebug/Program.cs#L8-L46
(github should be displaying the lines here, that isn't working on my end though)
Note how the only difference is that single extra float. The version with 3 floats works fine, with 2 it doesn't, but if I cast it into a POD it works fine. I can see why that doesn't work since the destructor is called when exiting the scope, but what I don't understand is the inconsistency there.

Am I not supposed to do that? Is it OK to return PODs?

jkoritzinsky · 2019-03-21T18:11:34Z

Are you hitting this issue on x86 or x64 (or both)?

Alan-FGR · 2019-03-21T18:51:23Z

@jkoritzinsky sorry, I forgot to add those details. These are all x64, I've tried both .Net Framework 4.7.2 and CoreCLR 2.1 (and minor revs like 2.1.503 and .402).
So is that code supposed to work?
EDIT: Also, while .Net FW returns garbage data, Core simply crashes with no debug info whatsoever

jkoritzinsky · 2019-03-21T20:43:11Z

I know what's going on and there's no good way to fix it:

In the Windows x64 calling convention, user-defined types can be returned by value in a register by global and static member functions (not instance member functions) if its size is a multiple of 2 and at most 64 bits and is a C++03 POD type. Otherwise, the convention requires a return buffer to be passed as the first parameter. There's not really a good way for CoreCLR to identify if the structure it is marshalling is a C++03 POD or not, so the JIT assumes that it is a C++03 POD. I believe that behavior was chosen because most Win32 structures are PODs and it would enable the expected behavior most of the time.

See the Win-x64 calling convention documentation on Microsoft Docs for more information.

The reason the 3-float scenario passes is because a 3-float struct is 12 bytes, which isn't a ~~multiple~~power of 2 and as a result is never eligable for a return-by-register (the question of if it is a POD never comes into play).

Now I don't know why NETFX and Core have different behavior.

My suggestion would be to manually pass any non-POD structures as out parameters instead of return values if you are able to do so.

scalablecory · 2019-03-21T21:30:33Z

Can confirm this fails in 32-bit. I didn't even test x64 last night. cdecl is not implemented identically between CLR and VC++.

Leaving the C++ alone, you can change the signature in .NET to this to fix it:

static extern void getVec2(ref p2 x)

I also noticed that even with this version, it appears CLR is putting the reference in both stack and in ecx.

jkoritzinsky · 2019-03-21T21:56:45Z

In the x86 cdecl calling convention, ecx is a caller-saved register anyway so the fact that the CLR puts the reference in ecx is unrelated to the parameter passing.

After a small experiment on Compiler Explorer, it looks like the x86 cdecl , stdcall, calling conventions have the same corner case as the x64 calling convention (PODs can be enregistered but non-PODs cannot).

So for Windows-x86 I suggest the same workaround as I did with x64 or @scalablecory's workaround if you cannot change the native signature.

Alan-FGR · 2019-03-21T22:03:42Z

@jkoritzinsky Thank you for investigating this. While I can certainly use ref/out parameters to retrieve the data, I don't really like the unnecessary memory initialization required for that. However, casting into a POD certainly has some overhead too, and I think zeroing the memory should be faster than copying, so maybe that's acceptable.
Thank you guys.

jkoritzinsky · 2019-03-21T22:44:50Z

I'm going to close this issue then since we've found a workaround.

Alan-FGR · 2019-03-22T15:40:30Z

Hi guys, I have one more question, is this really a POD?:

struct p2POD
{
	float x{42},y{2};
};

that is the version that works, but at least for Clang that's not a POD because of the fields defaults (which counts as having default constructor (i.e. non trivial)). I'm using Clang to parse the code and generate bindings and I don't know how to identify the ones that don't have to be passed as ref in this case. Simply checking if they are POD won't work, and I couldn't really find any other difference.

jkoritzinsky · 2019-03-22T16:08:15Z

Maybe because the constructor is still compiler generated and implicitly defined it qualifies for enregistered return? You could try checking the exact list of conditions listed in the documentation I linked above.

scalablecory · 2019-03-22T16:08:36Z

@Alan-FGR Clang is correct. Part of the requirement of POD is to be trivial, and to be trivial it must have a trivial (i.e. no-op) constructor. Your supplying defaults creates a non-trivial constructor.

Alan-FGR · 2019-03-22T17:07:13Z

I dumped all the Clang attributes and ran a diffviewer and it seems the only difference (that could be related) between all of these non-PODs:

struct ActualPOD //WORKS, 
{
	float x,y;
};

struct NonPOD //WORKS, 
{
	float x{2},y{3};
};

struct NonPOD2 //BROKEN. 
{
	float x{2},y{3};
	NonPOD2(){}
};

struct NonPOD3 //BROKEN, 
{
	float x,y;
	NonPOD3(){}
};

Is that the 'broken' ones have a non-implicit constructor (which isn't defaulted but I think this doesn't matter)... now the question is whether that information alone is enough to reliably know whether the object is safe to return. I think it is but I'm not 100% sure if I won't have false negatives just by checking whether all constructors (and copy and destructors) are implicit.
I'm thinking about maybe just trying to somehow come up with a hacky way to not initialize structs in C#, maybe by using some IL, so I can pass everything by ref and there's no need to deal with that issue at all. If you guys know something about that please let me know :P.

jkoritzinsky · 2019-03-22T17:12:55Z

That condition sounds like it should work. Here's the quote from the documentation that describes the exact conditions you need to detect:

To return a user-defined type by value in RAX, it must have a length of 1, 2, 4, 8, 16, 32, or 64 bits. It must also have no user-defined constructor, destructor, or copy assignment operator; no private or protected non-static data members; no non-static data members of reference type; no base classes; no virtual functions; and no data members that do not also meet these requirements.

If you can detect those cases (which Clang must be able to since it accurately implements the Microsoft x64 ABI on Windows), then you should be all good.

Alan-FGR · 2019-03-22T17:16:40Z

Oh well... actually, that seems to be as easy as initblk?

Alan-FGR · 2019-03-22T17:18:44Z

@jkoritzinsky Sorry, your message didn't appear here before I reply... in any case, it would be nice not to initialize objects I'm going to fully overwrite anyway :)
Thank you

jkotas · 2019-03-22T19:35:49Z

For interop, it is best to stay with the POD or pointers. Otherwise, your interop code will be non-portable between architectures (e.g. it can work on Windows x64, but not on Linux x64 or Windows x86) and you need to maintain architecture-specific versions of it to be portable.

initialize objects I'm going to fully overwrite anyway

You do not need to initialize objects that you are going to fully overwrite. Taking a pointer of uninitialized structure works in C# just fine.

Alan-FGR · 2019-03-23T00:46:41Z

@jkotas so you do not recommend returning for the types that Clang doesn't classify as POD but that still worked in my tests? (basically types with defaults but no explicit ctors/dtors, as that NonPOD class in the code above) Do you think that won't be portable? Maybe another thing to consider is whether compilers and options affect that.

EDIT: Actually I misread something :P

tritao · 2019-03-23T00:59:07Z

This is how we work it out from Clang in CppSharp: https://github.com/mono/CppSharp/blob/master/src/CppParser/Parser.cpp#L3240

Alan-FGR · 2019-03-23T01:00:27Z

@jkotas well, you're actually right, and it seems that the inline out declaration actually expands into the version that doesn't initialize the object, in other words this:

initP2(out p2 p);

expands into this:

p2 p;
initP2(out p);

and there's no object initialization instruction in the IL...
So is there any reason not to simply return all the objects as ref/out values?

Alan-FGR · 2019-03-23T01:02:19Z

@tritao I'll take a look, thanks... but now I don't see a reason not to simply use the first parameter for all the objects tbh, is there any reason not to?
EDIT: lol, that's looking very similar to my code here :P

tritao · 2019-03-23T01:14:26Z

If you're just looking for the logic Clang uses, see bool MicrosoftCXXABI::classifyReturnType(CGFunctionInfo &FI) const { .

https://github.com/llvm-mirror/clang/blob/2982a888ff/lib/CodeGen/MicrosoftCXXABI.cpp#L1051

jkotas · 2019-03-23T01:35:49Z

Do you think that won't be portable?

Yes, I am pretty sure that the generated C# code won't be portable. The rules for when the return value is returned via return buffer are different between platforms.

So is there any reason not to simply return all the objects as ref/out values?

It is even better (more performant) to use unmanaged pointer, e.g.:

p2 p;
initP2(&p);

Alan-FGR · 2019-03-23T02:53:51Z

Awesome! Thank you guys! 👍

Alan-FGR · 2019-05-14T15:05:46Z

It is even better (more performant) to use unmanaged pointer

@jkotas sorry to resurrect this for a kinda off-topic question, but I can't really understand the reason for that, if you're passing as ref a stack/fixed object (or say unmanaged in native heap), what prevents the compiler from generating the exact same code as a pointer? I can't think of any reason like extra safety checks that would be necessary there.

jkotas · 2019-05-14T15:39:36Z

what prevents the compiler from generating the exact same code as a pointer?

ref requires pinning before it is passed it to PInvoke. You are right that it may be possible to optimized out the pinning for stack allocated objects. The current .NET Core JIT does not do this optimization.

jkoritzinsky closed this as completed Mar 21, 2019

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Marshalling native non-PODs yield different results depending on their size #12312

Marshalling native non-PODs yield different results depending on their size #12312

Alan-FGR commented Mar 21, 2019

jkoritzinsky commented Mar 21, 2019

Alan-FGR commented Mar 21, 2019 •

edited

Loading

jkoritzinsky commented Mar 21, 2019 •

edited by RussKeldorph

Loading

scalablecory commented Mar 21, 2019

jkoritzinsky commented Mar 21, 2019

Alan-FGR commented Mar 21, 2019

jkoritzinsky commented Mar 21, 2019

Alan-FGR commented Mar 22, 2019 •

edited

Loading

jkoritzinsky commented Mar 22, 2019

scalablecory commented Mar 22, 2019

Alan-FGR commented Mar 22, 2019

jkoritzinsky commented Mar 22, 2019

Alan-FGR commented Mar 22, 2019

Alan-FGR commented Mar 22, 2019

jkotas commented Mar 22, 2019

Alan-FGR commented Mar 23, 2019 •

edited

Loading

tritao commented Mar 23, 2019

Alan-FGR commented Mar 23, 2019

Alan-FGR commented Mar 23, 2019 •

edited

Loading

tritao commented Mar 23, 2019

jkotas commented Mar 23, 2019

Alan-FGR commented Mar 23, 2019

Alan-FGR commented May 14, 2019

jkotas commented May 14, 2019

Marshalling native non-PODs yield different results depending on their size #12312

Marshalling native non-PODs yield different results depending on their size #12312

Comments

Alan-FGR commented Mar 21, 2019

jkoritzinsky commented Mar 21, 2019

Alan-FGR commented Mar 21, 2019 • edited Loading

jkoritzinsky commented Mar 21, 2019 • edited by RussKeldorph Loading

scalablecory commented Mar 21, 2019

jkoritzinsky commented Mar 21, 2019

Alan-FGR commented Mar 21, 2019

jkoritzinsky commented Mar 21, 2019

Alan-FGR commented Mar 22, 2019 • edited Loading

jkoritzinsky commented Mar 22, 2019

scalablecory commented Mar 22, 2019

Alan-FGR commented Mar 22, 2019

jkoritzinsky commented Mar 22, 2019

Alan-FGR commented Mar 22, 2019

Alan-FGR commented Mar 22, 2019

jkotas commented Mar 22, 2019

Alan-FGR commented Mar 23, 2019 • edited Loading

tritao commented Mar 23, 2019

Alan-FGR commented Mar 23, 2019

Alan-FGR commented Mar 23, 2019 • edited Loading

tritao commented Mar 23, 2019

jkotas commented Mar 23, 2019

Alan-FGR commented Mar 23, 2019

Alan-FGR commented May 14, 2019

jkotas commented May 14, 2019

Alan-FGR commented Mar 21, 2019 •

edited

Loading

jkoritzinsky commented Mar 21, 2019 •

edited by RussKeldorph

Loading

Alan-FGR commented Mar 22, 2019 •

edited

Loading

Alan-FGR commented Mar 23, 2019 •

edited

Loading

Alan-FGR commented Mar 23, 2019 •

edited

Loading