Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serde byte/int/long array serialization (Issue #27 / #32) #51

Closed
wants to merge 4 commits into from

Conversation

Schuwi
Copy link
Contributor

@Schuwi Schuwi commented Jun 14, 2020

This PR aims to implement serde serialization support to serialize i8, i32 and i64 sequence types to NBT ByteArray, IntegerArray and LongArray respectively.
This implementation checks the type of the first sequence element to determine which NBT List/Array type to use.
This means of course that serializing to a NBT List of NBT Byte/Integer/Long types is not possible anymore, but as discussed in #27 this doesn't seem to be used in practice. Also if required this could probably be implemented as some kind of option later.

I ran some rudimentary tests by serializing a Minecraft chunk with https://github.com/feather-rs/feather including Lists and LongArrays and checking the result in an NBT viewer. Everything serialized as expected.

This would resolve #27 and #32.

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 14, 2020

Oh, yeah kinda forgot to run cargo test, heh...

@Schuwi Schuwi marked this pull request as draft June 14, 2020 20:53
@atheriel
Copy link
Collaborator

I’m very happy to see work on this. It looks like the failures are due to handling of bested lists; do you think it’s possible to handle that with your current approach? Or does the serializer not have enough information at that point?

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 14, 2020

Okay, the tests (including array serialization tests) should now complete successfully, though it just dawned on me that this whole approach is flawed. Empty arrays can't be identified on serialization because no elements are written to identify the element type with. So every empty sequence is serialized as an empty list right now...

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 15, 2020

Alright. Here is how it looks:

  • Sequences are generally untyped in serde and thus the element type can only be inferred by the Serializer by looking at the first element's type, which is impossible for zero-length sequences (see also Sequence and map types needed serde-rs/serde#607)
  • Explicitly tagging a Rust array type as i8/i32/i64-NBT-array on the consumer side (i.e. the crate implementing hematite_nbt) is theoretically possible via #[serde(serialize_with="hematite_nbt::xxxx")] but doesn't really help us because this serialize_with function (see https://serde.rs/field-attrs.html) only receives a generic S: Serializer parameter. This doesn't allow for any "custom" communication to the specific hematite_nbt Serializer to tell it to serialize an i8/i32/i64-NBT-array next.

In conclusion:

The current implementation of serde isn't designed to allow for explicitely typed sequences and this is unlikely to change in the future (see the wontfix tag on serde-rs/serde#607).

My proposal:

Using a dirty (not-so-)little hack.

As NBT doesn't support tuples, our Serializer::serialize_tuple_struct functions are currently "unused" (i.e. they just throw an error upon invocation). This method has a very interesting signature:
fn serialize_tuple_struct(self, name: &'static str, len: usize) -> Result<Self::SerializeTupleStruct, Self::Error>
which is basically the same signature as Serializer::serialize_seq plus an extra &'static str (and usize instead of Option<usize> but our sequences are always sized so that distinction doesn't matter).

Now my idea is to (ab)use this method by implementing three serialize_with functions (i8_array, i32_array and i64_array) which are defined generically for their corresponding IntoIterator<ixx> doing something like:

let seq = serializer.serialize_tuple_struct("i8", iter.size_hint().1.unwrap());
for elem in iter {
    seq.serialize_field(&elem);
}
seq.end();

Thus the serialize_tuple_struct method can check the value of the name parameter and serialize an array according to the type.

Downsides to this approach:

Surprisingly few actually. The serialize_tuple_struct implementation can't be called "accidentally" (via the #[derive(Serialize)] macro) because i8, i32 and i64 are reserved keywords and thus can't be used as struct names.
The only downside I can think of right now:

  • This would probably prevent serialization to other data formats using the same #[derive(Serialize)]'d data structure, because the i8/i32/i64-array fields tagged as such will be serialized as tuple structs, if even supported by the used serializer.

I'll try to implement this and see how it works out. It should actually be less work than the current approach but optimistic assessments have been shown to be incorrect in the past.

@atheriel
Copy link
Collaborator

I’m not sure how empty lists are serialized by Minecraft, it may not matter what tag is set in those cases. Do you know it generated data that could confirm this?

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 15, 2020

I’m not sure how empty lists are serialized by Minecraft, it may not matter what tag is set in those cases. Do you know it generated data that could confirm this?

Looking at some 1.13.2 vanilla chunk data right now I can confirm I have found an empty long array (in this case Chunk:Level/Structures/References/Monument). Though I have no idea whether the vanilla client would accept an empty List here instead.

Edit: Manually changing this exact empty array to an empty List and loading the chunk in the vanilla 1.13.2 client resulted in no unusual console output and inspecting the chunk data afterwards the value in question was back to an empty long array.

@caelunshun
Copy link
Contributor

caelunshun commented Jun 15, 2020

I’m not sure how empty lists are serialized by Minecraft, it may not matter what tag is set in those cases. Do you know it generated data that could confirm this?

In most cases using an empty list should be fine, since from my tests vanilla will ignore a tag if it has the wrong type. In @Schuwi's Monument case above, the client expected a long array but found a list, so it skipped properly deserializing that field without throwing any errors. (This can be observed by, for example, changing the type of the Palette tag for chunk data—vanilla will just skip deserializing that chunk section.)

It's possible that not all vanilla deserializers will ignore errors in this manner, so I would advise caution using the empty TAG_List approach. Though it will work in all cases I've tried, it could cause bugs in the future.

Also, most of the other NBT libraries I've looked at won't be able to interpret an empty list as an empty TAG_Long_Array or TAG_Int_Array. So going with this approach might break compatibility with some non-vanilla implementations.

The serialize_tuple_struct implementation can't be called "accidentally" (via the #[derive(Serialize)] macro) because i8, i32 and i64 are reserved keywords and thus can't be used as struct names.

Note that the r# syntax can allow a keyword to be used as an identifier. For example, this compiles and will have the struct named i32:

struct r#i32 {}

I doubt anyone will name their types like this, but to be sure you might want to use a more obscure special value, such as __hematite_nbt_i32_array__.

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 15, 2020

Alright, I implemented my proposed serialize_with approach (https://github.com/Schuwi/hematite_nbt/tree/explicit-array-serializing, working implementation) but I bumped into a new problem with that approach.
serialize_with can't be used on nested collections. That's not really our fault, but rather a limitation of serde which has an open issue (serde-rs/serde#723) but it doesn't look like any solution will be implemented soon.
Other than that I find this approach still quite promising, so I'd be happy if anyone can come up with a workaround for this limitation. I haven't really found one yet...

@atheriel
Copy link
Collaborator

The serialize_with workaround seems like a clever solution to me, it's unfortunate there is a roadblock. When I originally implemented this stuff, I did encounter many of these limitations of serde myself. I think this is ultimately because NBT does not map perfectly to the serde data model, so we're just stuck trying to find reasonable workarounds.

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 17, 2020

Alright, I fixed up the code and implemented the nested lists using a workaround.
I would propose to merge it for now (other branch, will make a new PR shortly). I think this approach is the way to go and the rest (proper nested lists support) is up to serde to implement. Also, as it is entirely optional, nobody is forced to use it and it breaks no existing code.

@Schuwi
Copy link
Contributor Author

Schuwi commented Jun 17, 2020

Closed in favor of #52

@Schuwi Schuwi closed this Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Papercut: nbt-serde always writes List tags as opposed to ByteArray/IntArray tags
3 participants