Improve switch statement decompilation #1258

Chicken-Bones · 2018-08-14T02:03:45Z

This PR contains a number of commits aimed at improving code quality surrounding [potential] switch statements. It primarily addresses two issues currently producing unnecessary gotos.

SwitchDetection.UseCSharpSwitch is over-aggressive, resulting in the replacement of condition trees with much poorer looking switch statements. A number of heuristics are added to better discern which representation to use.

LoopDetection.DetectSwitchBody, FindExitPoint and associated loop partitioning code is unaware of the interaction between switch statements and continue resulting in poor exit point selection when cases can leave the switch body via a branch to a continue block

The test cases added with each commit are the best place to look for a concise before/after. Unfortunately the number of test cases required to ensure correctness would be massive, so we have to resort to inspecting the round trip tests.

The only switch related gotos remaining in the round trip tests are in NRefactory.MonoCSharp/Tokenizer|Convert|MemberCore and Newtonsoft.Json.Utilities/ConvertUtils, all containing "goto case" statements. I have a solution based on condition tree flow analysis, but it is comparatively massive for addressing such a small target, so I've left it for future review.

I'd be happy to discuss wider design decision via PM or some form of instant messaging if this is a lot to review.

Edit: I am unsatisfied with the quality of the LoopDetection changes, but decided to leave invasing refactoring to the maintainers.

…tements

…cing poor quality code.

Remedy incorrect assumption that the default case was special.

… blocks.

… if body

…itchDetection

…nested loops

…ents

christophwille · 2018-08-14T07:43:34Z

Please note that @siegfriedpammer is on vacation so you might not get technical feedback for some time. Just wanted to let you know so you don't feel "ignored".

dgrunwald · 2018-09-01T16:09:26Z

ICSharpCode.Decompiler/IL/ControlFlow/SwitchAnalysis.cs

@@ -230,15 +231,36 @@ bool MatchSwitchVar(ILInstruction inst)
 				return inst.MatchLdLoc(out switchVar);
 		}

+		bool MatchSwitchVar(ILInstruction inst, out long sub)
+		{
+			if (inst.MatchBinaryNumericInstruction(BinaryNumericOperator.Sub, out var left, out var right) && right.MatchLdcI(out sub))


I'm assuming Roslyn uses a plain sub here? You should also check the overflow flag, as-is you're also matching sub.ovf/sub.ovf.un.

dgrunwald · 2018-09-01T16:21:51Z

ICSharpCode.Decompiler/IL/ControlFlow/SwitchAnalysis.cs

 				// if (comp(V OP val))
 				trueValues = MakeSetWhereComparisonIsTrue(comp.Kind, val, comp.Sign);
+				if (sub != 0)
+					trueValues = new LongSet(trueValues.Intervals.Select(i => ShiftInterval(i, sub)));


The ShiftInterval logic here doesn't seem quite right to me. Is there any reason you did not use trueValues.AddOffset(sub)?
AddOffset should be a better match for the wrap-around semantics of sub. (see also: SwitchAnalysis.AnalyzeSwitch())

Roslyn always uses unsigned casts with the sub, but I did just miss AddOffset

dgrunwald · 2018-09-01T16:54:44Z

ICSharpCode.Decompiler/IL/ControlFlow/LoopDetection.cs

+
+		/// <summary>
+		/// Returns the children in a loop dominator tree, with an optional exit point
+		/// Avoids returning continue statements when analysing switches (because increment blocks can be dominated)


This function isn't returning statements, but control flow nodes. I guess Avoids returning continue statements should be Avoids returning the target block of continue statements?
After all, continue; is fine within a switch, we just don't want to move the increment block itself.

Should be changed to Avoids returning continue statement target nodes (increment blocks)

Chicken-Bones · 2018-09-03T04:48:01Z

Awaiting further feedback before adding a fixup commit. Any idea when siegfriedpammer will be back?

dgrunwald · 2018-10-03T14:16:24Z

I'm not quite sure whether switch body detection is even in the right place (talking about the pass order).
I put it together with loop detection mostly because I wanted to reuse the logic for finding the body blocks.
Maybe it really ought to be somewhere in the big interleaved transform step instead.
So I thought about this for a while:

Fundamental differences between the C# control flow constructs:

if etc. has single entry-point, and a single "natural" exit point (allowing additional exits via goto/break/continue/return),
but don't provide targets for break/continue.
loops have single entry-point, single-exit point (allowing additional exits only via goto/return).
That loop exit point also serves as target for break.
The target for continue is more complicated.
switch has single entry-point, single-exit point (allowing additional exits via goto/continue/return).
The switch exit point also serves as target for break.

This means:

finding a nice exit point for "if" requires knowing about break/continue, so that those can be accepted while goto is avoided.
This means ConditionDetection should run after loop and switch bodies are detected.
finding a nice exit point for "switch" requires knowing about continue, so that those can be accepted while goto is avoided.
finding a nice exit point for loops doesn't require anything knowledge about anything but return statements, so it's safe to do loops first.

Unfortunately, the "continue" business is tricky and depends on the loop type.
In particular, for "for" loops, only some types of C# statements are allowed as increment statement.
So we need to run all our statement transforms before we can tell whether something is a valid "increment block".

Basically, we gain the information about what is a valid expression/statement (for if conditions and increment-blocks) in a bottom-up manner.
But we also need to know about break/continue targets during this bottom-up pass constructing if statement (basically passing the "current break/continue target" in a top-down manner).

When there's only if and loops in play, there's an easy approach: first find the loops,
then use the bottom-up-construction for expressions/statements and structured control flow with if.
Though even here, we'd need to take care that candidates for increment blocks are visited before any potential continue-statement is -- otherwise it's possible that an increment block isn't detected as such (e.g. because it contains not-yet-lifted nullable operations) by the time ConditionDetection runs, thus does not consider a branch to the increment block as "continue" and ends up picking the wrong exit point. We don't do this yet, but it's at least plausible that we could, with some additional care in the order we visit the control flow blocks.

But switch adds significant complexity: the "break" part of switch should work as in loops (thus needing the currently implemented approach), while the "continue" part of switch needs a bottom-up approach similar to how "if" is currently handled. I don't see any way to satisfy both parts at once. Since "continue" is somewhat rare within switch, that means we should keep treating switch similar to loops.

So I think your changes here are the right way to go. It feels like a hack to use HighLevelLoopTransform.GetIncrementBlock() at this point where it isn't reliable yet because the increment block isn't transformed; but I think the reasoning above explains that it is a pretty much unavoidable hack.
Fortunately most for loops have a pretty simple i++ increment block, so it's pretty difficult to actually hit this issue with our pass ordering. You'd need a loop with non-trivial increment block containing a switch containing a continue; statement. And unless I'm mistaken, any false positives or false negatives in IsIncrementBlock() will just result in redundant gotos.

dgrunwald · 2018-10-03T17:19:38Z

ICSharpCode.Decompiler/IL/ControlFlow/SwitchDetection.cs

+			private readonly IDictionary<ControlFlowNode, int> continueDepth = new Dictionary<ControlFlowNode, int>();
+
+			public LoopContext(ControlFlowGraph cfg, ControlFlowNode contextNode)
+			{


Can you comment some more on what this constructor is supposed to be doing?
Why is it analyzing the control flow reachable from the switch head, but stopping at loop heads?

Is it trying to follow the control flow up from the switch through the loop back-edge (a continue; statement) to the loop head, thus discovering the outer loop containing the switch?

If so, doesn't that fail for the following code?

while (true) { switch (i) { case 0: while (i < 10) i++; continue; default: return; } }

The outer loop head is not reachable from the switch without going through an inner loop head, but your traversal stops at the inner loop head.

Oh, n.Dominates(contextNode) is not the typical loop-head check. My example is no problem, since there's no loop head detected for the inner loop.
Still not quite sure which loop heads this constructor is supposed to find. All loop heads reachable from contextNode that also dominate the contextNode? It doesn't quite do that, though at least it should usually find the head of the loop that a continue; would target.

A different approach here would be to just iterate through the nodes dominating contextNode, and check if any of them is a loop head:

for (ControlFlowNode n = contextNode.ImmediateDominator; n != null; n = n.ImmediateDominator) { if (n.Predecessors.Any(p => n.Dominates(p))) { loopHeads.Add(n); } }

This should give you the heads of all loops containing contextNode, already ordered from innermost to outermost.

"heads of all loops containing contextNode" is the goal yes, so that continues (and their depth) can be identified.
I believe the example you've given will find loops which dominate the switch but don't contain it. My approach also ensures there's a back edge to the loop head from the switch's successor tree.

The problem is that we're running at a point where loop bodies haven't been detected yet. "loops containing contextNode" is not quite defined yet.
Code dominated by a loop head could end up in the loop, even if there's no back-edge:

outer_loop_head: while (...) { while (...) { if (...) { switch (...) { ... goto outer_loop_head; } return; } } }

But such a block dominated by the loop without back-edge could also end up outside the loop -- ultimately it's up to loop detection's decision.
My code snipper approximates "heads of all loops containing contextNode" by computing "heads of all loops dominating contextNode" (thus giving an upper bound). Your code seems to give a lower bound instead, so really the question is: which type of error is better here, false positive or false negative?

Maybe we could change the pass ordering to detect loops before we start dealing with switches? I don't see a reason why SwitchDetection needs to run before LoopDetection; and loops->switches->ifs seems like it would cause less issues.

The key is that if there's no back edge, then there's no continue statement, so you'll never encounter the false negative anyway.

dgrunwald · 2018-10-03T17:24:26Z

ICSharpCode.Decompiler/IL/ControlFlow/SwitchDetection.cs

+
+				// match do-while condition
+				if (pred.Successors.Count <= 2) {
+					if (HighLevelLoopTransform.MatchDoWhileConditionBlock((Block)pred.UserData, out var t1, out var t2) &&


This is quite unreliable here; any non-trivial loop condition might take up multiple blocks at this point in time.
The "do-while condition block" doesn't exist until much later in the transformation pipeline. Not sure what could be done about that...

Chicken-Bones · 2018-10-03T23:48:46Z

Good analysis. I wasn't able to achieve a provably correct or satisfactorily general approach here like I was with ConditionDetection, but I'm convinced such isn't possible with the current architecture, as you've mentioned. Particularly the fact that switch statements are often optimised to condition trees meant I had to build prediction heuristics. This implementation still provides great improvement in the vast majority of cases.

Proper detection would rely on two things.

outer loops being constructed first (currently inner loops and switches must be constructed first due to some ambiguity in loop contents but there may be another way around this)
high level loop transforms (and any pre-requisite transforms) being performed on the outer loop before anything that depends on continue statements (condition/switch detection)

dgrunwald · 2018-10-04T09:22:57Z

As a next step, I think we should try changing up the pass order to be:

Loop(Body)Detection
Switch(Head)Detection
the other switch-head-detecting transforms
SwitchBodyDetection

The guesswork about the "continue target block" seems unavoidable, but at least we get to avoid the guesswork about which loops contain the switch.
Maybe we could also run switch-body-detection within the pass that detects the switch head -- so instead of the SwitchUsesGoto() heuristic, we could ask "does switch-body-detection find a single unambiguous exit point?", and abort constructing the switch head if it doesn't.

Chicken-Bones · 2018-10-04T13:52:29Z

In general, I support said changing of pass order. The only issue I ran into when I attempted it was related to loops (and switch containers) being constructed inner first #915 Part of the switch body would end up outside the loop, and have to be poached from the parent container into the child. I didn't go any further with this because control flow graphs are restricted to container context, and I got quite good results with the current approach.

dgrunwald · 2018-10-07T14:42:31Z

Well, let's keep future improvements for the future.

I'm merging the branch as-is. It's already a huge improvement, no need to wait for the perfect solution.

Chicken-Bones added 11 commits August 14, 2018 11:19

Handle roslyn generated sub instructions in partitioned C# switch sta…

9a3914f

…tements

Improve UseCSharpSwitch to reduce over-aggressive use of switch produ…

62b2ad4

…cing poor quality code.

Add IL ordering based hint to UseCSharpSwitch

cb4fa90

Fix detection of switch statements with cases containing a single break;

1a02163

Remedy incorrect assumption that the default case was special.

Improve switch decompilation in loops via early detection of continue…

7017d99

… blocks.

Restore HighLevelLoopTransform pattern match for loop contents within…

52a279f

… if body

Prefer simple if statements over switch with single case block

d86f4b4

Remove dead code when unreachable condition blocks are consumed by Sw…

eed337c

…itchDetection

Increase max values per switch section

9f21e6a

Select outer-loop continue branches as break targets for switches in …

d8244e3

…nested loops

Better shortcircuit detection to avoid single condition switch statem…

481e05e

…ents

dgrunwald reviewed Sep 1, 2018

View reviewed changes

dgrunwald reviewed Oct 3, 2018

View reviewed changes

dgrunwald merged commit 481e05e into icsharpcode:master Oct 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve switch statement decompilation #1258

Improve switch statement decompilation #1258

Chicken-Bones commented Aug 14, 2018 •

edited

Loading

christophwille commented Aug 14, 2018

dgrunwald Sep 1, 2018

dgrunwald Sep 1, 2018

Chicken-Bones Sep 2, 2018

dgrunwald Sep 1, 2018

Chicken-Bones Sep 2, 2018

Chicken-Bones commented Sep 3, 2018

dgrunwald commented Oct 3, 2018

dgrunwald Oct 3, 2018

dgrunwald Oct 3, 2018

Chicken-Bones Oct 3, 2018 •

edited

Loading

dgrunwald Oct 4, 2018

Chicken-Bones Oct 4, 2018

dgrunwald Oct 3, 2018 •

edited

Loading

Chicken-Bones commented Oct 3, 2018

dgrunwald commented Oct 4, 2018

Chicken-Bones commented Oct 4, 2018

dgrunwald commented Oct 7, 2018

Improve switch statement decompilation #1258

Improve switch statement decompilation #1258

Conversation

Chicken-Bones commented Aug 14, 2018 • edited Loading

christophwille commented Aug 14, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Chicken-Bones commented Sep 3, 2018

dgrunwald commented Oct 3, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Chicken-Bones Oct 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgrunwald Oct 3, 2018 • edited Loading

Choose a reason for hiding this comment

Chicken-Bones commented Oct 3, 2018

dgrunwald commented Oct 4, 2018

Chicken-Bones commented Oct 4, 2018

dgrunwald commented Oct 7, 2018

Chicken-Bones commented Aug 14, 2018 •

edited

Loading

Chicken-Bones Oct 3, 2018 •

edited

Loading

dgrunwald Oct 3, 2018 •

edited

Loading