Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexOutOfRangeException serializing large strings #103155

Open
MaceWindu opened this issue Jun 7, 2024 · 3 comments
Open

IndexOutOfRangeException serializing large strings #103155

MaceWindu opened this issue Jun 7, 2024 · 3 comments

Comments

@MaceWindu
Copy link

Description

Json serializer could throw IndexOutOfRangeException exception on large strings serialization. Looks like string should be in 100Mb-160Mb to fail. Larger strings fail with expected string too big exception.

Reproduction Steps

using System.Text;
using System.Text.Json;

internal class Program
{
	static void Main(string[] args)
	{
		var rnd = new Random();
		while (true)
		{
			var length = rnd.Next(100_000_000, 160_000_000);
			var sb = new StringBuilder();

			while (sb.Length < length)
			{
				var allowSurrogate = length - sb.Length > 1;
				_ = sb.Append(Char(rnd));
			}

			try
			{
				_ = JsonSerializer.Serialize(sb.ToString(), JsonSerializerOptions.Default);
			}
			catch (ArgumentException ex) when (ex.Message.Contains("too large and not supported"))
			{
				// ignore expected errors
				Console.WriteLine($"Too big: {ex.Message}");
			}
			//catch (IndexOutOfRangeException)
			//{
			//	Console.WriteLine($"Crashed!!! String size: {sb.Length}");
			//}
		}

		static string Char(Random rnd)
		{
			if (rnd.Next(0, 10) == 0)
			{
				while (true)
				{
					var chr = ((char)(ushort)rnd.Next(0xd800, 0xdbff)).ToString()
						+ ((char)(ushort)rnd.Next(0xdc00, 0xdfff)).ToString();
					if (char.IsSurrogatePair(chr, 0))
					{
						return chr;
					}
				}
			}

			while (true)
			{
				var chr = (char)(ushort)rnd.Next(ushort.MaxValue);

				if (chr is < (char)0xD800 or > (char)0xDFFF)
				{
					return chr.ToString();
				}
			}
		}
	}
}

Expected behavior

No exceptions if string size in allowed limits and ArgumentException when limit reached (as it happens now for strings > ~160Mb).

Actual behavior

Unhandled exception. System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Text.Json.Utf8JsonWriter.WriteStringMinimized(ReadOnlySpan`1 escapedValue)
   at System.Text.Json.Utf8JsonWriter.WriteStringEscapeValue(ReadOnlySpan`1 value, Int32 firstEscapeIndexVal)
   at System.Text.Json.Utf8JsonWriter.WriteStringEscape(ReadOnlySpan`1 value)
   at System.Text.Json.Utf8JsonWriter.WriteStringValue(ReadOnlySpan`1 value)
   at System.Text.Json.Serialization.Converters.StringConverter.Write(Utf8JsonWriter writer, String value, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryWrite(Utf8JsonWriter writer, T& value, JsonSerializerOptions options, WriteStack& state)
   at System.Text.Json.Serialization.JsonConverter`1.WriteCore(Utf8JsonWriter writer, T& value, JsonSerializerOptions options, WriteStack& state)
   at System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.Serialize(Utf8JsonWriter writer, T& rootValue, Object rootValueBoxed)
   at System.Text.Json.JsonSerializer.WriteString[TValue](TValue& value, JsonTypeInfo`1 jsonTypeInfo)
   at Program.Main(String[] args)

Regression?

No response

Known Workarounds

As we actually don't need such large strings serialized, we are using custom string converter which trims them to sane length.

Configuration

Runtime: 8.0.5
Arch: Windows x64

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jun 7, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@elgonzo
Copy link

elgonzo commented Jun 7, 2024

Simplified repro code:

int strLength = 120_000_000;
var str = new string((char) 0x07f, strLength);
var json = JsonSerializer.Serialize(str, JsonSerializerOptions.Default);

The problem appears to be:

int maxRequired = (escapedValue.Length * JsonConstants.MaxExpansionFactorWhileTranscoding) + 3;

which is overflowing for large values of escapedValue.Length (as is the case when serializing very large strings with lots of characters to escape), resulting in maxRequired either being negative number or a positive number that is possibly smaller than required, thus the internal memory buffer remaining at whatever size it currently is or being resized to a size that's too small and therefore leading to the IndexOutOfRangeException.

I guess there needs to be a check (somewhere) that ensures that the provided escapedValue buffer is not larger than (int.MaxValue - 3) / JsonConstants.MaxExpansionFactorWhileTranscoding to be able to fit the serialized value in a maximally sized output buffer (assuming the output buffer isn't able to grow beyond 2 GB and that counting the UTF-8 octets to write for optimally sizing the output buffer is cost-prohibitive).

@eiriktsarpalis
Copy link
Member

Can reproduce as well. I suspect we might be able to solve this following a chunking approach, elements of which are being implemented in this PR: #101356

@eiriktsarpalis eiriktsarpalis added bug and removed untriaged New issue has not been triaged by the area owner labels Jun 7, 2024
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants