Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break out of quantification loop if there is no forward progress #560

Merged
merged 4 commits into from
Jul 11, 2022

Conversation

rctcwyvrn
Copy link
Contributor

@rctcwyvrn rctcwyvrn commented Jul 7, 2022

If we have an inner node of a quantification that doesn't progress our position in the input, we simply loop forever #558

The cases where this can happen range from nefarious (purposefully adding groups around instructions that cannot normally be quantified) to cases that are fairly easy to accidentally hit by making a typo (like the original case in the issue where there was an empty alternation case).

The fix is to add a check to quantification to ensure that our position has progressed, however this comes at a performance cost so this PR also adds an optimization to skip these checks if we can ensure that the inner node will have forward progress

Resolves #558 and resolves #542
rdar://96461197

@rctcwyvrn rctcwyvrn requested a review from milseman July 7, 2022 20:34
@rctcwyvrn
Copy link
Contributor Author

rctcwyvrn commented Jul 7, 2022

Additional edge cases to consider

  • (?:\d|(?i)){3}a a non-progressing adjustment to matching options. The adjusted matching options stay within that scope so it shouldn't affect the outside, we don't need to match it a specific number of times
  • A non-progressing custom consumer/matcher that contains state and expects to be called a certain number of times. Is this part of the API that we provide? Should we only break out of the quantification if it's both unlimited and not progressing? Actually I don't see a reason why we shouldn't do that, I'm gonna add it

@hamishknight
Copy link
Contributor

Nice! Does this also resolve #542?

@rctcwyvrn
Copy link
Contributor Author

Hmmmm it doesn't right now because we assume quantifications will result in forward progress if the inner node has forward progress, but I guess that isn't true if the quantification has a min-trips of 0. I'll add that case

Sources/_StringProcessing/ByteCodeGen.swift Outdated Show resolved Hide resolved
@@ -120,11 +130,15 @@ extension Processor.Registers {

self.values = Array(
repeating: SentinelValue(), count: info.values)
self.positions = Array(
repeating: Processor.Registers.sentinelIndex,
count: info.positions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future: Why the start index instead of the provided sentinel? Should or could we construct an invalid index and at least assert that we're never loading that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SentinelValue only works on the value registers since they are [Any]. For now we only emit position registers in this one case, if/when we have more complicated cases in the future we'll want some kinda validation like that

@@ -208,4 +208,40 @@ extension RegexTests {
expectProgram(for: "[abc]", semanticLevel: .unicodeScalar, doesNotContain: [.matchBitset])
expectProgram(for: "[abc]", semanticLevel: .unicodeScalar, contains: [.consumeBy])
}

func testQuantificationForwardProgressCompile() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with these for now, but do note that the kinds of tests can become extremely fragile and are likely to get dropped if control flow is ever overhauled. Make sure anything important is represented by both functional tests and more precise unit testing.

E.g. you can test the guaranteesForwardProgress query on regexes, and that wouldn't be fragile.

@rctcwyvrn
Copy link
Contributor Author

@swift-ci test

@rctcwyvrn rctcwyvrn merged commit 33acdeb into swiftlang:main Jul 11, 2022
rctcwyvrn added a commit to rctcwyvrn/swift-experimental-string-processing that referenced this pull request Jul 12, 2022
…ftlang#560)

This fixes infinite loops when we loop over an internal node that does not have any forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (possibly non-progressing inner node + unlimited quantification)
hamishknight pushed a commit to hamishknight/swift-experimental-string-processing that referenced this pull request Jul 19, 2022
…ftlang#560)

This fixes infinite loops when we loop over an internal node that does not have any forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (possibly non-progressing inner node + unlimited quantification)
hamishknight pushed a commit to hamishknight/swift-experimental-string-processing that referenced this pull request Jul 21, 2022
…ftlang#560)

This fixes infinite loops when we loop over an internal node that does not have any forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (possibly non-progressing inner node + unlimited quantification)
hamishknight pushed a commit to rctcwyvrn/swift-experimental-string-processing that referenced this pull request Jul 21, 2022
…ftlang#560)

This fixes infinite loops when we loop over an internal node that does not have any forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (possibly non-progressing inner node + unlimited quantification)
hamishknight pushed a commit to hamishknight/swift-experimental-string-processing that referenced this pull request Jul 21, 2022
…ftlang#560)

This fixes infinite loops when we loop over an internal node that does not have any forward progress. Also included is an optimization to only emit the check/break instructions if we have a case that might result in an infinite loop (possibly non-progressing inner node + unlimited quantification)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Null alternation causes infinite loop (a*)* spins the matching engine forever
3 participants