Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Marshalling native non-PODs yield different results depending on their size #12312

Closed
Alan-FGR opened this issue Mar 21, 2019 · 24 comments
Closed
Labels
area-Interop-coreclr question Answer questions and provide assistance, not an issue with source code or documentation.

Comments

@Alan-FGR
Copy link

Hi there. This issue was originally reported on dotnet/csharplang here: dotnet/csharplang#2357 but as requested I'm migrating it here since it's the most appropriate place.


I was marshaling some data from C++ and I came across a strange inconsistent behavior, here's a repro project:
https://github.com/Alan-FGR/PinvokeDebug
here's the native code:
https://github.com/Alan-FGR/PinvokeDebug/blob/c63b8e1b310ab6ad0316e1044580789aa7348b2f/Dll/Dll.cpp#L9-L44
here's the C# code:
https://github.com/Alan-FGR/PinvokeDebug/blob/c63b8e1b310ab6ad0316e1044580789aa7348b2f/PinvokeProblemDebug/Program.cs#L8-L46
(github should be displaying the lines here, that isn't working on my end though)
Note how the only difference is that single extra float. The version with 3 floats works fine, with 2 it doesn't, but if I cast it into a POD it works fine. I can see why that doesn't work since the destructor is called when exiting the scope, but what I don't understand is the inconsistency there.

Am I not supposed to do that? Is it OK to return PODs?

@jkoritzinsky
Copy link
Member

Are you hitting this issue on x86 or x64 (or both)?

@Alan-FGR
Copy link
Author

Alan-FGR commented Mar 21, 2019

@jkoritzinsky sorry, I forgot to add those details. These are all x64, I've tried both .Net Framework 4.7.2 and CoreCLR 2.1 (and minor revs like 2.1.503 and .402).
So is that code supposed to work?
EDIT: Also, while .Net FW returns garbage data, Core simply crashes with no debug info whatsoever

@jkoritzinsky
Copy link
Member

jkoritzinsky commented Mar 21, 2019

I know what's going on and there's no good way to fix it:

In the Windows x64 calling convention, user-defined types can be returned by value in a register by global and static member functions (not instance member functions) if its size is a multiple of 2 and at most 64 bits and is a C++03 POD type. Otherwise, the convention requires a return buffer to be passed as the first parameter. There's not really a good way for CoreCLR to identify if the structure it is marshalling is a C++03 POD or not, so the JIT assumes that it is a C++03 POD. I believe that behavior was chosen because most Win32 structures are PODs and it would enable the expected behavior most of the time.

See the Win-x64 calling convention documentation on Microsoft Docs for more information.

The reason the 3-float scenario passes is because a 3-float struct is 12 bytes, which isn't a multiplepower of 2 and as a result is never eligable for a return-by-register (the question of if it is a POD never comes into play).

Now I don't know why NETFX and Core have different behavior.

My suggestion would be to manually pass any non-POD structures as out parameters instead of return values if you are able to do so.

@scalablecory
Copy link
Contributor

Can confirm this fails in 32-bit. I didn't even test x64 last night. cdecl is not implemented identically between CLR and VC++.

Leaving the C++ alone, you can change the signature in .NET to this to fix it:

static extern void getVec2(ref p2 x)

I also noticed that even with this version, it appears CLR is putting the reference in both stack and in ecx.

@jkoritzinsky
Copy link
Member

In the x86 cdecl calling convention, ecx is a caller-saved register anyway so the fact that the CLR puts the reference in ecx is unrelated to the parameter passing.

After a small experiment on Compiler Explorer, it looks like the x86 cdecl , stdcall, calling conventions have the same corner case as the x64 calling convention (PODs can be enregistered but non-PODs cannot).

So for Windows-x86 I suggest the same workaround as I did with x64 or @scalablecory's workaround if you cannot change the native signature.

@Alan-FGR
Copy link
Author

@jkoritzinsky Thank you for investigating this. While I can certainly use ref/out parameters to retrieve the data, I don't really like the unnecessary memory initialization required for that. However, casting into a POD certainly has some overhead too, and I think zeroing the memory should be faster than copying, so maybe that's acceptable.
Thank you guys.

@jkoritzinsky
Copy link
Member

I'm going to close this issue then since we've found a workaround.

@Alan-FGR
Copy link
Author

Alan-FGR commented Mar 22, 2019

Hi guys, I have one more question, is this really a POD?:

struct p2POD
{
	float x{42},y{2};
};

that is the version that works, but at least for Clang that's not a POD because of the fields defaults (which counts as having default constructor (i.e. non trivial)). I'm using Clang to parse the code and generate bindings and I don't know how to identify the ones that don't have to be passed as ref in this case. Simply checking if they are POD won't work, and I couldn't really find any other difference.

@jkoritzinsky
Copy link
Member

Maybe because the constructor is still compiler generated and implicitly defined it qualifies for enregistered return? You could try checking the exact list of conditions listed in the documentation I linked above.

@scalablecory
Copy link
Contributor

@Alan-FGR Clang is correct. Part of the requirement of POD is to be trivial, and to be trivial it must have a trivial (i.e. no-op) constructor. Your supplying defaults creates a non-trivial constructor.

@Alan-FGR
Copy link
Author

I dumped all the Clang attributes and ran a diffviewer and it seems the only difference (that could be related) between all of these non-PODs:

struct ActualPOD //WORKS, 
{
	float x,y;
};

struct NonPOD //WORKS, 
{
	float x{2},y{3};
};

struct NonPOD2 //BROKEN. 
{
	float x{2},y{3};
	NonPOD2(){}
};

struct NonPOD3 //BROKEN, 
{
	float x,y;
	NonPOD3(){}
};

Is that the 'broken' ones have a non-implicit constructor (which isn't defaulted but I think this doesn't matter)... now the question is whether that information alone is enough to reliably know whether the object is safe to return. I think it is but I'm not 100% sure if I won't have false negatives just by checking whether all constructors (and copy and destructors) are implicit.
I'm thinking about maybe just trying to somehow come up with a hacky way to not initialize structs in C#, maybe by using some IL, so I can pass everything by ref and there's no need to deal with that issue at all. If you guys know something about that please let me know :P.

@jkoritzinsky
Copy link
Member

That condition sounds like it should work. Here's the quote from the documentation that describes the exact conditions you need to detect:

To return a user-defined type by value in RAX, it must have a length of 1, 2, 4, 8, 16, 32, or 64 bits. It must also have no user-defined constructor, destructor, or copy assignment operator; no private or protected non-static data members; no non-static data members of reference type; no base classes; no virtual functions; and no data members that do not also meet these requirements.

If you can detect those cases (which Clang must be able to since it accurately implements the Microsoft x64 ABI on Windows), then you should be all good.

@Alan-FGR
Copy link
Author

Oh well... actually, that seems to be as easy as initblk?

@Alan-FGR
Copy link
Author

@jkoritzinsky Sorry, your message didn't appear here before I reply... in any case, it would be nice not to initialize objects I'm going to fully overwrite anyway :)
Thank you

@jkotas
Copy link
Member

jkotas commented Mar 22, 2019

For interop, it is best to stay with the POD or pointers. Otherwise, your interop code will be non-portable between architectures (e.g. it can work on Windows x64, but not on Linux x64 or Windows x86) and you need to maintain architecture-specific versions of it to be portable.

initialize objects I'm going to fully overwrite anyway

You do not need to initialize objects that you are going to fully overwrite. Taking a pointer of uninitialized structure works in C# just fine.

@Alan-FGR
Copy link
Author

Alan-FGR commented Mar 23, 2019

@jkotas so you do not recommend returning for the types that Clang doesn't classify as POD but that still worked in my tests? (basically types with defaults but no explicit ctors/dtors, as that NonPOD class in the code above) Do you think that won't be portable? Maybe another thing to consider is whether compilers and options affect that.

EDIT: Actually I misread something :P

@tritao
Copy link
Contributor

tritao commented Mar 23, 2019

This is how we work it out from Clang in CppSharp: https://github.com/mono/CppSharp/blob/master/src/CppParser/Parser.cpp#L3240

@Alan-FGR
Copy link
Author

@jkotas well, you're actually right, and it seems that the inline out declaration actually expands into the version that doesn't initialize the object, in other words this:

initP2(out p2 p);

expands into this:

p2 p;
initP2(out p);

and there's no object initialization instruction in the IL...
So is there any reason not to simply return all the objects as ref/out values?

@Alan-FGR
Copy link
Author

Alan-FGR commented Mar 23, 2019

@tritao I'll take a look, thanks... but now I don't see a reason not to simply use the first parameter for all the objects tbh, is there any reason not to?
EDIT: lol, that's looking very similar to my code here :P

@tritao
Copy link
Contributor

tritao commented Mar 23, 2019

If you're just looking for the logic Clang uses, see bool MicrosoftCXXABI::classifyReturnType(CGFunctionInfo &FI) const { .

https://github.com/llvm-mirror/clang/blob/2982a888ff/lib/CodeGen/MicrosoftCXXABI.cpp#L1051

@jkotas
Copy link
Member

jkotas commented Mar 23, 2019

Do you think that won't be portable?

Yes, I am pretty sure that the generated C# code won't be portable. The rules for when the return value is returned via return buffer are different between platforms.

So is there any reason not to simply return all the objects as ref/out values?

It is even better (more performant) to use unmanaged pointer, e.g.:

p2 p;
initP2(&p);

@Alan-FGR
Copy link
Author

Awesome! Thank you guys! 👍

@Alan-FGR
Copy link
Author

It is even better (more performant) to use unmanaged pointer

@jkotas sorry to resurrect this for a kinda off-topic question, but I can't really understand the reason for that, if you're passing as ref a stack/fixed object (or say unmanaged in native heap), what prevents the compiler from generating the exact same code as a pointer? I can't think of any reason like extra safety checks that would be necessary there.

@jkotas
Copy link
Member

jkotas commented May 14, 2019

what prevents the compiler from generating the exact same code as a pointer?

ref requires pinning before it is passed it to PInvoke. You are right that it may be possible to optimized out the pinning for stack allocated objects. The current .NET Core JIT does not do this optimization.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Interop-coreclr question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

5 participants