Improve switch statement decompilation #1258
…cing poor quality code.
Remedy incorrect assumption that the default case was special.
Please note that @siegfriedpammer is on vacation so you might not get technical feedback for some time. Just wanted to let you know so you don't feel "ignored".
@@ -230,15 +231,36 @@ bool MatchSwitchVar(ILInstruction inst)
	return inst.MatchLdLoc(out switchVar);
}

bool MatchSwitchVar(ILInstruction inst, out long sub)
{
	if (inst.MatchBinaryNumericInstruction(BinaryNumericOperator.Sub, out var left, out var right) && right.MatchLdcI(out sub))
I'm assuming Roslyn uses a plain sub here? You should also check the overflow flag; as-is, you're also matching sub.ovf/sub.ovf.un.
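To illustrate why the overflow flag matters, here is a minimal sketch in plain C# (not ILSpy code; `SubOverflowDemo` and its methods are hypothetical names). An unchecked subtraction corresponds to the plain sub instruction and wraps around, while a checked subtraction corresponds to sub.ovf and throws, so the two cannot be treated as the same operation.

```csharp
using System;

static class SubOverflowDemo
{
    // Unchecked subtraction (compiles to plain `sub`): wraps around silently.
    public static int SubWrapping(int value, int constant)
    {
        return unchecked(value - constant);
    }

    // Checked subtraction (compiles to `sub.ovf`): throws on overflow.
    // Returns null here to signal that an OverflowException occurred.
    public static int? SubChecked(int value, int constant)
    {
        try {
            return checked(value - constant);
        } catch (OverflowException) {
            return null;
        }
    }
}
```

For `int.MinValue - 1`, the wrapping variant yields `int.MaxValue`, while the checked variant throws. A pattern that matches both forms would therefore assign wrap-around semantics to code that actually throws.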
// if (comp(V OP val))
trueValues = MakeSetWhereComparisonIsTrue(comp.Kind, val, comp.Sign);
if (sub != 0)
	trueValues = new LongSet(trueValues.Intervals.Select(i => ShiftInterval(i, sub)));
The ShiftInterval logic here doesn't seem quite right to me. Is there any reason you did not use trueValues.AddOffset(sub)?
AddOffset should be a better match for the wrap-around semantics of sub. (See also: SwitchAnalysis.AnalyzeSwitch())
Roslyn always uses unsigned casts with the sub, but I did just miss AddOffset
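As a rough picture of what "wrap-around semantics" means for an interval, here is a minimal, self-contained sketch. It is not ILSpy's LongSet.AddOffset; `WrapAroundDemo` and `AddOffsetWrapping` are hypothetical names, and intervals are assumed to be inclusive. Shifting both bounds with unchecked arithmetic can push the end past long.MaxValue, in which case the result has to be split into two intervals.

```csharp
using System;

static class WrapAroundDemo
{
    // Shifts the inclusive interval [start, end] by offset using wrap-around
    // (unchecked) arithmetic. If exactly one bound wraps, the shifted set is
    // no longer contiguous and is returned as two intervals.
    public static (long Start, long End)[] AddOffsetWrapping(long start, long end, long offset)
    {
        long newStart = unchecked(start + offset);
        long newEnd = unchecked(end + offset);
        if (newStart <= newEnd)
            return new[] { (newStart, newEnd) };
        // The interval straddles the wrap-around point: split it.
        return new[] { (long.MinValue, newEnd), (newStart, long.MaxValue) };
    }
}
```

For example, shifting [long.MaxValue - 1, long.MaxValue] by 1 gives the two single-value intervals [long.MinValue, long.MinValue] and [long.MaxValue, long.MaxValue], which a shift that only adds to each bound without handling the wrap would not represent correctly.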
/// <summary>
/// Returns the children in a loop dominator tree, with an optional exit point
/// Avoids returning continue statements when analysing switches (because increment blocks can be dominated)
This function isn't returning statements, but control flow nodes. I guess "Avoids returning continue statements" should be "Avoids returning the target block of continue statements"?
After all, continue; is fine within a switch, we just don't want to move the increment block itself.
Should be changed to Avoids returning continue statement target nodes (increment blocks)
Awaiting further feedback before adding a fixup commit. Any idea when siegfriedpammer will be back?
I'm not quite sure whether switch body detection is even in the right place (talking about the pass order). Fundamental differences between the C# control flow constructs:
This means:
Unfortunately, the "continue" business is tricky and depends on the loop type. Basically, we gain the information about what is a valid expression/statement (for When there's only But So I think your changes here are the right way to go. It feels like a hack to use
private readonly IDictionary<ControlFlowNode, int> continueDepth = new Dictionary<ControlFlowNode, int>();

public LoopContext(ControlFlowGraph cfg, ControlFlowNode contextNode)
{
Can you comment some more on what this constructor is supposed to be doing?
Why is it analyzing the control flow reachable from the switch head, but stopping at loop heads?
Is it trying to follow the control flow up from the switch through the loop back-edge (a continue; statement) to the loop head, thus discovering the outer loop containing the switch?
If so, doesn't that fail for the following code?
    while (true) {
        switch (i) {
            case 0:
                while (i < 10) i++;
                continue;
            default:
                return;
        }
    }
The outer loop head is not reachable from the switch without going through an inner loop head, but your traversal stops at the inner loop head.
Oh, n.Dominates(contextNode) is not the typical loop-head check. My example is no problem, since there's no loop head detected for the inner loop.
Still not quite sure which loop heads this constructor is supposed to find. All loop heads reachable from contextNode that also dominate the contextNode? It doesn't quite do that, though at least it should usually find the head of the loop that a continue; would target.
A different approach here would be to just iterate through the nodes dominating contextNode, and check if any of them is a loop head:
    for (ControlFlowNode n = contextNode.ImmediateDominator; n != null; n = n.ImmediateDominator) {
        if (n.Predecessors.Any(p => n.Dominates(p))) {
            loopHeads.Add(n);
        }
    }
This should give you the heads of all loops containing contextNode, already ordered from innermost to outermost.
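The dominator walk suggested above can be tried out on a toy graph. The following is a minimal sketch, assuming a hypothetical `Node` class that stands in for ILSpy's ControlFlowNode (only the members used by the loop are modelled, and `Dominates` is implemented by walking the immediate-dominator chain). `LoopHeadDemo` and `BuildExample` are likewise made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a control flow node; not ILSpy's ControlFlowNode.
sealed class Node
{
    public readonly string Name;
    public Node ImmediateDominator;
    public readonly List<Node> Predecessors = new List<Node>();

    public Node(string name) { Name = name; }

    // A node dominates `other` if it is `other` itself or lies on
    // `other`'s chain of immediate dominators.
    public bool Dominates(Node other)
    {
        for (Node n = other; n != null; n = n.ImmediateDominator) {
            if (n == this) return true;
        }
        return false;
    }
}

static class LoopHeadDemo
{
    // Collects every dominator of contextNode that has a back-edge,
    // i.e. a predecessor which it dominates. Innermost head comes first.
    public static List<Node> DominatingLoopHeads(Node contextNode)
    {
        var loopHeads = new List<Node>();
        for (Node n = contextNode.ImmediateDominator; n != null; n = n.ImmediateDominator) {
            if (n.Predecessors.Any(p => n.Dominates(p)))
                loopHeads.Add(n);
        }
        return loopHeads;
    }

    // Builds: entry -> head -> switch -> body -> head (back-edge).
    // Returns the switch node.
    public static Node BuildExample()
    {
        var entry = new Node("entry");
        var head = new Node("head") { ImmediateDominator = entry };
        var sw = new Node("switch") { ImmediateDominator = head };
        var body = new Node("body") { ImmediateDominator = sw };
        head.Predecessors.Add(entry);
        head.Predecessors.Add(body); // back-edge produced by `continue;`
        sw.Predecessors.Add(head);
        body.Predecessors.Add(sw);
        return sw;
    }
}
```

Calling `LoopHeadDemo.DominatingLoopHeads(LoopHeadDemo.BuildExample())` returns a list containing only the `head` node, since `entry` has no predecessor that it dominates.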
"heads of all loops containing contextNode" is the goal, yes, so that continues (and their depth) can be identified.
I believe the example you've given will find loops which dominate the switch but don't contain it. My approach also ensures there's a back edge to the loop head from the switch's successor tree.
The problem is that we're running at a point where loop bodies haven't been detected yet. "loops containing contextNode" is not quite defined yet.
Code dominated by a loop head could end up in the loop, even if there's no back-edge:
    outer_loop_head:
    while (...) {
        while (...) {
            if (...) {
                switch (...) { ... goto outer_loop_head; }
                return;
            }
        }
    }
But such a block, dominated by the loop head without a back-edge, could also end up outside the loop; ultimately that is up to loop detection's decision.
My code snippet approximates "heads of all loops containing contextNode" by computing "heads of all loops dominating contextNode" (thus giving an upper bound). Your code seems to give a lower bound instead, so really the question is: which type of error is better here, false positive or false negative?
Maybe we could change the pass ordering to detect loops before we start dealing with switches? I don't see a reason why SwitchDetection needs to run before LoopDetection; and loops->switches->ifs seems like it would cause fewer issues.
The key is that if there's no back edge, then there's no continue statement, so you'll never encounter the false negative anyway.
// match do-while condition
if (pred.Successors.Count <= 2) {
	if (HighLevelLoopTransform.MatchDoWhileConditionBlock((Block)pred.UserData, out var t1, out var t2) &&
This is quite unreliable here; any non-trivial loop condition might take up multiple blocks at this point in time.
The "do-while condition block" doesn't exist until much later in the transformation pipeline. Not sure what could be done about that...
Good analysis. I wasn't able to achieve a provably correct or satisfactorily general approach here like I was with Proper detection would rely on two things.
As a next step, I think we should try changing up the pass order to be:
The guesswork about the "continue target block" seems unavoidable, but at least we get to avoid the guesswork about which loops contain the switch.
In general, I support changing the pass order. The only issue I ran into when I attempted it was related to loops (and switch containers) being constructed inner first (#915). Part of the switch body would end up outside the loop, and would have to be poached from the parent container into the child. I didn't go any further with this because control flow graphs are restricted to container context, and I got quite good results with the current approach.
Well, let's keep future improvements for the future. I'm merging the branch as-is. It's already a huge improvement, no need to wait for the perfect solution.
This PR contains a number of commits aimed at improving code quality surrounding [potential] switch statements. It primarily addresses two issues currently producing unnecessary gotos.
- SwitchDetection.UseCSharpSwitch is over-aggressive, resulting in the replacement of condition trees with much poorer looking switch statements. A number of heuristics are added to better discern which representation to use.
- LoopDetection.DetectSwitchBody, FindExitPoint and the associated loop partitioning code are unaware of the interaction between switch statements and continue, resulting in poor exit point selection when cases can leave the switch body via a branch to a continue block.
The test cases added with each commit are the best place to look for a concise before/after. Unfortunately the number of test cases required to ensure correctness would be massive, so we have to resort to inspecting the round trip tests.
The only switch related gotos remaining in the round trip tests are in NRefactory.MonoCSharp/Tokenizer|Convert|MemberCore and Newtonsoft.Json.Utilities/ConvertUtils, all containing "goto case" statements. I have a solution based on condition tree flow analysis, but it is comparatively massive for addressing such a small target, so I've left it for future review.
I'd be happy to discuss wider design decisions via PM or some form of instant messaging if this is a lot to review.
Edit: I am unsatisfied with the quality of the LoopDetection changes, but decided to leave invasive refactoring to the maintainers.