Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterative generic math changes to match API review #69391

Merged
merged 11 commits into from
May 19, 2022

Conversation

tannergooding
Copy link
Member

This updates GetShortestBitLength to return an int, exposes the WriteBigEndian, WriteExponentBigEndian, WriteSignificandBigEndian APIs, and updates System.Char to explicitly implement the numeric interfaces.

@ghost ghost assigned tannergooding May 16, 2022
@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

if (destination.Length >= sizeof(byte))
{
byte value = m_value;
Unsafe.WriteUnaligned(ref MemoryMarshal.GetReference(destination), value);
Copy link
Member

@stephentoub stephentoub May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this result in better codegen than just destination[0] = m_value;? I'd have expected the guard above (which could also be if (!destination.IsEmpty)) to remove the bounds check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, but its simpler to implement and compare the code when they are all roughly the same size/shape so I've mostly just been doing write once, copy/paste, small tweak.

I can update if you'd prefer it do the simpler thing here.

/// <summary>Tries to write the current value, in big-endian format, to a given span.</summary>
/// <param name="destination">The span to which the current value should be written.</param>
/// <param name="bytesWritten">The number of bytes written to <paramref name="destination" />.</param>
/// <returns><c>true</c> if the value was succesfully written to <paramref name="destination" />; otherwise, <c>false</c>.</returns>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the by; my understanding is that <c>true</c> and <c>false</c> should actually be <see langword="true" /> and <see langword="false"/> so that they capitalize when changing to the VB version. (Same with null => Nothing)

Carlos' tool probably handles those as special-cased; but thought I'd share for future reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlossanlop is this handled or should I be explicitly using see langword ?


i++;
}
while ((part == 0) && (i < bits.Length));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What spins here, and do we have boundary tests for it? (e.g. something does the do and exits to the other loop, something else runs the body twice, something runs thrice, and maybe 4 isn't possible?)

(A comment in the code to help future maintainers would be helpful).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sure what you're asking for here? This handles the conversion from one's complement to two's complement.

The first loop handles 0x0000_0000 as the one's complement of that is 0x0000_0000. The second loop handles the remaining bits in the value.

  • 0x0000_0000_0000_0000_0000_0000 becomes 0x0000_0000_0000_0000_0000_0000
  • 0x0000_0000_0000_0000_0000_0001 becomes 0xFFFF_FFFF_FFFF_FFFF_FFFF_FFFF
  • 0x0000_0000_0000_0001_0000_0000 becomes 0x0000_0000_FFFF_FFFF_0000_0000

This handles what is logically ~x + 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand what values of a BigInteger might cause the do/while to run more than once. Given that there's all of 1s complement, 2s complement, little endian, and big endian present, I can't follow the data flow in my head.

  • Negative values -1 .. int.MinValue + 1 all fit in _sign and _bits is null, so none of those are relevant.
  • int.MinValue produces a one element _bits value, so doesn't loop.
  • int.MinValue - 1 (0xFFFFFFFF_7FFFFFFF) is, I believe, stored as { 0x00000000, 0x80000000 }
    • The first time through the loop we load the 0. So now do ~0 + 1, which I guess goes back to 0.
      • Yay, this value looped.
    • The second time through we load the 0x8000_0000 and produce, uh, 0x8000_0000?
      • It's not zero, so we'd exit, but we're also out of data, so we'd exit anyways.

So perhaps the answer is "anything in the tests ending in multiples of 4 zero-byte values (BE) went through the top loop (trailing_zero_bytes_count / 4) + 1 times".

If that's the case, I'd expect to see tests that

  • Don't end in 4 zeros, but are at least 8 bytes (do+break+second_loop)
  • End in 4 zeros and are 8 bytes (do+do+return)
  • End in 4 zeros and are more than 8 bytes (do+do+break+second_loop)
  • End in 8 zeros and are 12 bytes (do+do+do+return)
  • End in 8 zeros and are more than 12 bytes (do+do+do+break+second_loop)

Ideally with each of the second_loop cases having a one/more_than_one case (and super ideally as one/two/more_than_two, since sometimes "two" is special)

If I understand the states now, I don't see any tests that produce 12 or more bytes, so there's nothing that runs that top loop 3 or more times.


The thing I'd hope for here is a comment talking about some of the boundary cases. i.e. a particular value that will run the top loop twice, then the bottom loop twice, but one more (or less?) would have a different execution characteristic (which I'm not feeling knowledgeable enough to predict if that means 1/3 or 3/1, or a black hole, or what). And those same values being codified in tests.

Copy link
Member Author

@tannergooding tannergooding May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this method in particular, we have tests covering everything but the 12 + byte case because it requires significant more setup to do and represent.

I can certainly add it, but its not interesting IMO. This logic is the same as we have in WriteLittleEndian and that is used in DangerousMakeTwosComplement, etc. The only difference is that it writes from the end of the buffer to the beginning, rather than from the beginning to the end and the current tests ensure that we are computing the correct results with both loops executing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is the same as we have in WriteLittleEndian

Which looks to be equally under-tested.

and that is used in DangerousMakeTwosComplement, etc.

To me, the fact that there are three copies of the code means there need to be more tests, not fewer, so that someone who changes something about only two of the three is likely to see failures from the remaining one.

Having read the comments in DangerousMakeTwosComplement I now better understand the reason for two distinct pieces of logic (the first loop is the +1 of the overall (~x + 1), and then also any carry values from the lower significance segments), but I genuinely didn't get that impression when reading the TryWriteLittleEndian and TryWriteBigEndian code blocks... it just came across as magic.

I still think that tests in the 12-16 byte range are useful for both TryWriteBigEndian and TryWriteLittleEndian, and even a comment as sparse as what DangerousMakeTwosComplement has about the carry bit would go a long way toward maintainability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. There was indeed an issue with a specific edge case value due to it not accounting for needing an extra "part" to hold the sign.

In particular, signed.MinValue - 1 is represented in one's complement as 0x8000_0000, 0x0000_0001, swapped to two's complement this is 0x7FFF_FFFF, 0xFFFF_FFFF which is of course signed.MaxValue instead. We need it to be 0xFFFF_FFFF, 0x7FFF_FFFF, 0xFFFF_FFFF so that the sign is still tracked instead.


Assert.True(FloatingPointHelper<double>.TryWriteSignificandBigEndian(double.NegativeInfinity, destination, out bytesWritten));
Assert.Equal(8, bytesWritten);
Assert.Equal(new byte[] { 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, destination.ToArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: We have an AssertExtensions.SequenceEquals<T>(ROSpan<T>, ROSPan<T>) for spans, if you don't like all the ToArray calls in your tests.

https://github.com/dotnet/runtime/blob/main/src/libraries/Common/tests/TestUtilities/System/AssertExtensions.cs#L435-L476

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know for future changes. I might do that as a later cleanup. As for now, the current logic works and allows easily copying and tweaking logic between the 15-20 types that need it.

@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2022
@tannergooding tannergooding deleted the generic-math-2 branch November 11, 2022 15:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants