Support casting to a KnownLayout type with a user-provided length computed dynamically #1289

Open
joshlf opened this issue May 17, 2024 · 3 comments

Comments

@joshlf
Member

joshlf commented May 17, 2024

See also: #1290, #1328, #5 (comment)

Requirements

  • Length field that encodes the number of elements
  • Length field that encodes the number of bytes (where each element might be multiple bytes long)
  • Length field that encodes the number of bytes of a fixed-size header plus a variable number of elements (e.g. IPv4 total length field)
  • Length field that should be interpreted as a number of quanta, but these quanta are not equal to the number of trailing elements (e.g. IPv4 IHL field)
  • Length field that encodes the length of multiple variable-length sections simultaneously. The element sizes of each section may be different, but the number of elements in each section is the same. (See Discord discussion for context.)
  • Must play nicely with packet::BufferView::take_front and friends
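
To make the first four requirements concrete, here is a rough sketch of how each length-field convention could be reduced to a count of trailing elements. `LengthKind` and `trailing_elems` are hypothetical names invented for illustration; nothing like them exists in zerocopy today, and the multi-section case is not covered.

```rust
/// How a wire-format length field relates to the trailing slice.
/// Hypothetical type for illustration only; not part of zerocopy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LengthKind {
    /// The field is the number of trailing elements.
    Elements,
    /// The field is the number of bytes occupied by the trailing elements.
    Bytes,
    /// The field is the number of bytes of the fixed-size header plus the
    /// trailing elements (e.g. IPv4 total length, UDP length).
    BytesIncludingHeader { header_len: usize },
    /// The field counts quanta of `quantum` bytes covering the header plus
    /// the trailing elements (e.g. IPv4 IHL, which counts 4-byte words).
    Quanta { quantum: usize, header_len: usize },
}

/// Converts a raw length-field value into a trailing element count.
/// Returns `None` on overflow, if the field is smaller than the header,
/// or if the body is not a whole number of elements.
fn trailing_elems(kind: LengthKind, field: usize, elem_size: usize) -> Option<usize> {
    if elem_size == 0 {
        return None; // assumption: zero-sized elements are out of scope here
    }
    let body_bytes = match kind {
        LengthKind::Elements => return Some(field),
        LengthKind::Bytes => field,
        LengthKind::BytesIncludingHeader { header_len } => field.checked_sub(header_len)?,
        LengthKind::Quanta { quantum, header_len } => {
            field.checked_mul(quantum)?.checked_sub(header_len)?
        }
    };
    if body_bytes % elem_size != 0 {
        return None;
    }
    Some(body_bytes / elem_size)
}
```

For example, an IPv4 IHL of 6 with a 20-byte fixed header would map to 4 trailing option bytes, and a UDP length of 12 with an 8-byte header would map to 4 body bytes.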

Details

Some formats have an explicit length field, and some use cases with these formats require parsing a subset of the available bytes based on that length field. We'd like to write something like:

#[derive(KnownLayout, FromBytes)]
#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

#[derive(KnownLayout, FromBytes)]
struct UdpPacket {
    header: UdpHeader,
    body: [u8],
}

Unfortunately, every conversion we permit today requires the number of bytes to parse to be known ahead of time: either it is simply the entire source byte slice, or it is computed from a fixed number of trailing slice elements.

Ideally, we could support an API that permits the caller to specify how to extract the length and then use that to determine the number of bytes to parse.

One idea: Do one parsing pass, then allow the user to provide a callback which extracts the length field. Finally, re-parse using the extracted length. There may be multiple axes we need to consider:

  • What is the type of the length field? Can we just require this callback to return a usize, or do we need to support other numeric types?
  • Will the length field always describe the byte length of the trailing field? What if it's a count of elements that are larger than 1 byte in size each? What if it instead describes the overall length of the packet, including the header?
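
The two-pass idea can be sketched without zerocopy at all, using plain byte slices. Everything here is hypothetical and for illustration only (`Parsed`, `parse_with_length`, `udp_body_len`, `HEADER_LEN`); a real API would hand back a typed `&UdpPacket` rather than slices. The sketch assumes the callback returns the body length in bytes, and answers the "includes the header?" question inside the callback itself.

```rust
/// Size of the fixed UDP header in bytes.
const HEADER_LEN: usize = 8;

/// Hypothetical stand-in for a parsed packet; a real implementation would
/// produce a typed reference instead of raw slices.
#[derive(Debug, PartialEq, Eq)]
struct Parsed<'a> {
    header: &'a [u8],
    body: &'a [u8],
    /// Bytes left over after the packet.
    rest: &'a [u8],
}

/// Pass 1 parses only the fixed-size header; the callback extracts the body
/// length from it; pass 2 re-parses using that length.
fn parse_with_length<'a>(
    source: &'a [u8],
    body_len: impl FnOnce(&[u8]) -> Option<usize>,
) -> Option<Parsed<'a>> {
    if source.len() < HEADER_LEN {
        return None;
    }
    // Pass 1: header only, zero trailing elements.
    let (header, tail) = source.split_at(HEADER_LEN);
    // User-provided extraction of the length field.
    let len = body_len(header)?;
    if len > tail.len() {
        return None;
    }
    // Pass 2: take exactly `len` body bytes; anything else is left over.
    let (body, rest) = tail.split_at(len);
    Some(Parsed { header, body, rest })
}

/// UDP's length field is big-endian and counts header plus body bytes.
fn udp_body_len(header: &[u8]) -> Option<usize> {
    let total = u16::from_be_bytes([header[4], header[5]]) as usize;
    total.checked_sub(HEADER_LEN)
}
```

Returning `Option<usize>` from the callback is one possible answer to the first question above: the callback does its own widening from the wire type and can reject nonsensical values, such as a UDP length smaller than the header.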
@kupiakos
Contributor

kupiakos commented May 20, 2024

(also mentioned in #5 (comment))

This feature could be achieved rather naturally by extending the validation routine to also extract length information. What if the result of is_bit_valid isn't a simple bool, but rather:

enum BitValidity {
    /// This `&T` contains invalid bits for the type.
    Invalid,

    /// This `&T` contains valid bits for the type.
    Valid,

    /// This `&T` would contain valid bits if the tail slice were truncated to this many elements.
    ///
    /// If the tail slice already contains exactly this many elements, this is semantically identical to returning `Valid`.
    ValidIfTruncatedTo(usize),
}

That way ref_from_prefix and friends could consume the correct amount of data based on e.g. a length in the header of a &T. Exact-matching conversions would reject the input if the validator returns ValidIfTruncatedTo with a length different from the length derived from the bytes provided, and prefix-matching conversions would truncate the input to the given length. I can extend this design into something more detailed if desired.
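
As a rough illustration of the exact-matching versus prefix-matching behavior described above, the following sketch resolves a validator result to the number of tail elements a conversion would use. `resolve_exact` and `resolve_prefix` are hypothetical helper names, and rejecting a truncation length larger than what is available is my assumption rather than something specified above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BitValidity {
    Invalid,
    Valid,
    ValidIfTruncatedTo(usize),
}

/// Exact-matching conversion: the caller provided exactly `provided_elems`
/// tail elements, so any other requested truncation length is an error.
fn resolve_exact(validity: BitValidity, provided_elems: usize) -> Option<usize> {
    match validity {
        BitValidity::Invalid => None,
        BitValidity::Valid => Some(provided_elems),
        BitValidity::ValidIfTruncatedTo(n) if n == provided_elems => Some(n),
        BitValidity::ValidIfTruncatedTo(_) => None,
    }
}

/// Prefix-matching conversion: up to `available_elems` tail elements fit in
/// the source, and the validator may ask for fewer.
fn resolve_prefix(validity: BitValidity, available_elems: usize) -> Option<usize> {
    match validity {
        BitValidity::Invalid => None,
        BitValidity::Valid => Some(available_elems),
        BitValidity::ValidIfTruncatedTo(n) if n <= available_elems => Some(n),
        // Assumption: asking for more elements than are available is rejected.
        BitValidity::ValidIfTruncatedTo(_) => None,
    }
}
```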

@joshlf
Member Author

joshlf commented Oct 4, 2024

I was toying around with this and got something to work that might be the seed of a good API:

// Method on `FromBytes`
fn ref_from_bytes_with_length_field(
    source: &[u8],
    elems: impl FnOnce(&Self) -> usize,
) -> Result<&Self, CastError<&[u8], Self>>
where
    Self: KnownLayout<PointerMetadata = usize> + Immutable,
{
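    // First pass: view only the fixed-size prefix (zero trailing elements).
    // A prefix cast is used because `source` is normally longer than that.
    let (slf, _rest) = Self::ref_from_prefix_with_elems(source, 0)?;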
    let count = elems(slf);
    Self::ref_from_bytes_with_elems(source, count)
}

@joshlf
Member Author

joshlf commented Oct 6, 2024

Consolidating #1328 into this issue. Authored by @jswrenn.


See also: #5 (comment)

Addresses #1289. Discussion of alternative solutions should occur there, or in their own issues.

Many formats include a length field in their header that describes the length of a variable-sized body; e.g., UDP:

#[derive(FromBytes, KnownLayout, Immutable)]
#[repr(C)]
struct Packet {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
    body: [u8],
}

Such types are awkward to parse correctly in zerocopy. Either the length must be parsed first, and only then can FromBytes::try_ref_from_prefix_with_elems be invoked with that length; or the entire buffer must be parsed with try_ref_from_prefix and truncated afterwards.

We could potentially simplify this with a #[length] field attribute; e.g.:

#[derive(FromBytes, KnownLayout, Immutable)]
#[repr(C)]
struct Packet {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    #[zerocopy::length]
    length: [u8; 2],
    checksum: [u8; 2],
    body: [u8],
}

...such that FromBytes::ref_from_prefix would respect the length attribute.

Some considerations:

  • This attribute should accept a function for cases in which the length in bytes must be computed
  • This approach should also work on TryFromBytes, where excess data might be invalid
  • Should we attempt to support the case where the length is factored into a different, fixed-sized header type?

@kupiakos on May 20 2024

If the design in #1289 (comment) were to be implemented, this attribute could have the validator always return ValidIfTruncatedTo(length_specified) assuming the other fields are validated.


@kupiakos on May 20 2024

This should also consider:

  • Whether the length is in bytes or number of elements, and
  • Whether this length includes the size of the header itself or not.
