Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow Up Refactoring for PR #2874 #2946

Closed
christophwille opened this issue Apr 3, 2023 · 4 comments
Closed

Follow Up Refactoring for PR #2874 #2946

christophwille opened this issue Apr 3, 2023 · 4 comments
Labels
Decompiler The decompiler engine itself Enhancement Areas for improvement

Comments

@christophwille
Copy link
Member

See #2874 (comment)

@ElektroKill @siegfriedpammer Please continue impl discussion here (or in the eventual PR).

@christophwille christophwille added the Enhancement Areas for improvement label Apr 3, 2023
@ElektroKill
Copy link
Contributor

The current way of storing the inferred information about the compiler that was used to compile a given ILFunction is not very good. It's not very scalable and rather clunky to use. There are currently two fields responsible for this information but this number may increase in the future making managing this information harder.

/// <summary>
/// Gets whether the YieldReturnDecompiler determined that the Mono C# compiler was used to compile this function.
/// </summary>
public bool StateMachineCompiledWithMono;
/// <summary>
/// Gets whether the YieldReturnDecompiler determined that the Legacy VB compiler was used to compile this function.
/// </summary>
public bool StateMachineCompiledWithLegacyVisualBasic;

Proposal:
Replace the two fields with a single field of type CompilerFlags like so:

public CompilerFlags CompilerFlags;

CompilerFlags definition:

[Flags]
public enum CompilerFlags : ushort
{
	// Lower 8 bits are used for compiler flags, allowing for 256 possible languages
	Language_None = 0,
	Language_CSharp = 0x1,
	Language_VisualBasic = 0x2,
	LanguageMask = 0xFF,

	// Upper 8 bits are used for compiler kind, allowing for 256 possible compiler kinds
	CompilerKind_None = 0,
	CompilerKind_Mono = 0x100,
	CompilerKind_Legacy = 0x200,
	CompilerKindMask = 0xFF00,

	CSharp = Language_CSharp | CompilerKind_None,
	CSharpLegacy = Language_CSharp | CompilerKind_Legacy,
	CSharpMono = Language_CSharp | CompilerKind_Mono,

	VisualBasic = Language_VisualBasic | CompilerKind_None,
	VisualBasicLegacy = Language_VisualBasic | CompilerKind_Legacy,
	VisualBasicMono = Language_VisualBasic | CompilerKind_Mono,
}

Additional extension methods to make working with this enum easier:

public static class CompilerFlagsExtensions
{
	public static bool IsCSharp(this CompilerFlags flags) => (flags & CompilerFlags.LanguageMask) == CompilerFlags.Language_CSharp;

	public static bool IsVisualBasic(this CompilerFlags flags) => (flags & CompilerFlags.LanguageMask) == CompilerFlags.Language_VisualBasic;

	public static bool IsLegacy(this CompilerFlags flags) => (flags & CompilerFlags.CompilerKindMask) == CompilerFlags.CompilerKind_Legacy;

	public static bool IsMono(this CompilerFlags flags) => (flags & CompilerFlags.CompilerKindMask) == CompilerFlags.CompilerKind_Mono;
}

The split for language and compiler kind is to allow different languages to be compiled with known compiler kinds without introducing too many changes to the enum or the code utilizing it. Code that depends on the Mono family of compilers may use the included IsMono() method to check this. The same applies for code based on a specific language without regard for a specific compiler.

The choice to use ushort and allow 256 languages and compiler kinds was done as I thought that byte is too restrictive with its mere 8 bits. On 16 compiler kinds and 16 languages (if bits are split equally) should be enough for the current state of the .NET ecosystem so this is something that could be changed.

Currently, the default value is None meaning that the language is unknown but it is also possible that ILSpy could default to C# as it is the most common and most likely input language. The current CompilerKind_None could also be used as a unknown compiler kind and a new CompilerKind_Roslyn could be added if necessary.

This implementation would also allow for a future ILSpy API to detect the compiler used to generate the IL (don't know if this is something that has been thought about earlier or suggested, just something that came to mind when I was drafting this implementation).

Feedback on this proposal is greatly appreciated :P

@siegfriedpammer
Copy link
Member

Currently, the default value is None meaning that the language is unknown but it is also possible that ILSpy could default to C# as it is the most common and most likely input language. The current CompilerKind_None could also be used as a unknown compiler kind and a new CompilerKind_Roslyn could be added if necessary.

I think what you mention in the last sentence is a good idea. For our unit tests we currently use a similar enumeration:

[Flags]
public enum CompilerOptions
{
None,
Optimize = 0x1,
UseDebug = 0x2,
Force32Bit = 0x4,
Library = 0x8,
UseRoslyn1_3_2 = 0x10,
UseMcs2_6_4 = 0x20,
ReferenceVisualBasic = 0x40,
TargetNet40 = 0x80,
GeneratePdb = 0x100,
Preview = 0x200,
UseRoslyn2_10_0 = 0x400,
UseRoslyn3_11_0 = 0x800,
UseRoslynLatest = 0x1000,
UseMcs5_23 = 0x2000,
UseTestRunner = 0x4000,
NullableEnable = 0x8000,
ReferenceUnsafe = 0x10000,
UseMcsMask = UseMcs2_6_4 | UseMcs5_23,
UseRoslynMask = UseRoslyn1_3_2 | UseRoslyn2_10_0 | UseRoslyn3_11_0 | UseRoslynLatest
}

Where we have distinct modes for legacy csc, Roslyn csc and Mono mcs. In that case legacy csc is the default and the other two get a separate value. I think we could do the same here?

I have no idea if this would make sense when taking VB.NET and its compilers into the picture...

@dgrunwald
Copy link
Member

Honestly, I think this is overengineering for the yield-return/async decompiler. I don't think there's anything wrong with having 2 bools.

Also, I don't we'll be able to detect a whole bunch of compiler versions/options from the IL code; so a detailed enum seems misleading: not all combinations can be accurately detected.

@ElektroKill
Copy link
Contributor

Currently, the default value is None meaning that the language is unknown but it is also possible that ILSpy could default to C# as it is the most common and most likely input language. The current CompilerKind_None could also be used as a unknown compiler kind and a new CompilerKind_Roslyn could be added if necessary.

I think what you mention in the last sentence is a good idea. For our unit tests we currently use a similar enumeration:

[Flags]
public enum CompilerOptions
{
None,
Optimize = 0x1,
UseDebug = 0x2,
Force32Bit = 0x4,
Library = 0x8,
UseRoslyn1_3_2 = 0x10,
UseMcs2_6_4 = 0x20,
ReferenceVisualBasic = 0x40,
TargetNet40 = 0x80,
GeneratePdb = 0x100,
Preview = 0x200,
UseRoslyn2_10_0 = 0x400,
UseRoslyn3_11_0 = 0x800,
UseRoslynLatest = 0x1000,
UseMcs5_23 = 0x2000,
UseTestRunner = 0x4000,
NullableEnable = 0x8000,
ReferenceUnsafe = 0x10000,
UseMcsMask = UseMcs2_6_4 | UseMcs5_23,
UseRoslynMask = UseRoslyn1_3_2 | UseRoslyn2_10_0 | UseRoslyn3_11_0 | UseRoslynLatest
}

I don't think including the same level of detail as found in the enum used in the unit tests is possible in the decompiler. Detecting the flavor of the compiler should be doable but detecting the exact version used could be quite tricky if even possible to do accurately. The same goes for compiler options like optimize.

Where we have distinct modes for legacy csc, Roslyn csc and Mono mcs. In that case legacy csc is the default and the other two get a separate value. I think we could do the same here?

I have no idea if this would make sense when taking VB.NET and its compilers into the picture...

I don't think we should default to legacy csc. A better option would be to have an unknown state for both parts of this new enumeration so that when ILSpy is unable to detect the compiler kind it is properly reflected in the proposed field. As for VB.NET it is definitely possible to use the same legacy vbc, mono vbc, and roslyn differentiation. Although I am not sure about the necessity for the mono vbc compiler option as I am yet to see it being used.

Honestly, I think this is overengineering for the yield-return/async decompiler. I don't think there's anything wrong with having 2 bools.

Yes, the two fields are enough for the purposes of the decompiler right now (this probably won't change that much) but since the decompiler already has some idea about other compiler information at certain stages of the ILAst processing, why not disclose this information to the user/ICSharpCode.Decompiler API consumer!

Also, I don't we'll be able to detect a whole bunch of compiler versions/options from the IL code; so a detailed enum seems misleading: not all combinations can be accurately detected.

The language can definitely be determined in a lot of cases as many of the patterns for various transforms differ across languages and compiler "flavors". For example async and yield-return state machines; display classes; lock, using, cached delegate initialization patterns; etc. can be used to make quite good guesses or sometimes conclusions about the compiler and language used to produce given a given IL body.

@christophwille christophwille added the Decompiler The decompiler engine itself label Apr 9, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Decompiler The decompiler engine itself Enhancement Areas for improvement
Projects
None yet
Development

No branches or pull requests

4 participants