Conditionally deserialize sub-struct based on visited values #1470

Closed
tyranron opened this issue Jan 30, 2019 · 6 comments

@tyranron
Contributor

I need to deserialize the following list:

[{
  "t": "item.created",
  "v": 1,
  "data": {
    "id": 100,
    "name": "some"
  }
}, {
  "t": "item.deleted",
  "v": 1,
  "data": {
    "at": "1971-02-02T00:00:00Z"
  }
}]

into Vec<ItemEvent>, where ItemEvent is defined as follows:

trait Event {
    fn event_type(&self) -> &'static str;
    fn event_version(&self) -> u8;
}

#[derive(Deserialize)]
struct Created {
    id: u32,
    name: String,
}
impl Event for Created {
    fn event_type(&self) -> &'static str {"item.created"}
    fn event_version(&self) -> u8 {1}
}

#[derive(Deserialize)]
struct Deleted {
    at: DateTime<Utc>,
}
impl Event for Deleted {
    fn event_type(&self) -> &'static str {"item.deleted"}
    fn event_version(&self) -> u8 {1}
}

enum ItemEvent {
    Created(Created),
    Deleted(Deleted),
}
impl Event for ItemEvent {
    fn event_type(&self) -> &'static str {
        match self {
            ItemEvent::Created(ev) => ev.event_type(),
            ItemEvent::Deleted(ev) => ev.event_type(),
        }
    }
    fn event_version(&self) -> u8 {
        match self {
            ItemEvent::Created(ev) => ev.event_version(),
            ItemEvent::Deleted(ev) => ev.event_version(),
        }
    }
}

I've checked/tried the following things with no success:

  1. Adjacent tagging does not fit: I have two tag fields, and the tag values come from the trait implementation, not from the enum.
  2. As I need to inspect the t/v fields during deserialization, a Visitor definitely has to be used.
  3. All the Visitor implementation examples I've found:
    • either know the exact deserialization type, so visit_map() just ends with Ok(MyType{});
    • or forward deserialization of the whole MapAccess received in visit_map().

I've ended up with something like the following:

impl<'de> Deserialize<'de> for ItemEvent {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        #[derive(Deserialize)]
        #[serde(field_identifier)]
        enum Field {
            #[serde(rename = "t")]
            Type,
            #[serde(rename = "v")]
            Version,
            #[serde(rename = "data")]
            Data,
        };

        const FIELDS: &'static [&'static str] = &["t", "v", "data"];

        struct ItemEventVisitor;

        impl<'de> Visitor<'de> for ItemEventVisitor {
            type Value = ItemEvent;

            fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
                f.write_str("ItemEvent")
            }

            fn visit_map<M>(self, map: M) -> Result<ItemEvent, M::Error>
            where
                M: de::MapAccess<'de>,
            {
                let (mut t, mut v) = (None, None);
                let mut data = None;
                while let Some(key) = map.next_key()? {
                    match key {
                        Field::Type => {
                            if t.is_some() {
                                return Err(de::Error::duplicate_field("t"));
                            }
                            t = Some(map.next_value()?);
                        }
                        Field::Version => {
                            if v.is_some() {
                                return Err(de::Error::duplicate_field("v"));
                            }
                            v = Some(map.next_value()?);
                        }
                        Field::Data => {
                            if data.is_some() {
                                return Err(de::Error::duplicate_field("data"));
                            }
                            data = Some(map.next_value()?);
                        }
                    }
                }
                let t = t.ok_or_else(|| de::Error::missing_field("t"))?;
                let v = v.ok_or_else(|| de::Error::missing_field("v"))?;
                let data =
                    data.ok_or_else(|| de::Error::missing_field("data"))?;
                Ok(match (t, v) {
                    ("item.created", 1) => {
                        ItemEvent::Created(Created::deserialize(
                            de::value::MapAccessDeserializer::new(data),
                        )?)
                    },
                    ("item.deleted", 1) => {
                        ItemEvent::Deleted(Deleted::deserialize(
                            de::value::MapAccessDeserializer::new(data),
                        )?)
                    },
                    _ => return Err(de::Error::custom("🔥"));
                })
            }
        }

        deserializer.deserialize_struct("ItemEvent", FIELDS, ItemEventVisitor)
    }
}

which does not actually compile.

The question is: how can one forward deserialization inside visit_map(), but only for one field rather than for the whole received MapAccess?

@dtolnay
Member

dtolnay commented Jan 31, 2019

I would write this as:

impl<'de> Deserialize<'de> for ItemEvent {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        #[derive(Deserialize, Debug)]
        enum EventType {
            #[serde(rename = "item.created")]
            Created,
            #[serde(rename = "item.deleted")]
            Deleted,
        }

        #[derive(Deserialize)]
        struct EventHelper {
            t: EventType,
            v: u32,
            data: serde_json::Value,
        }

        let helper = EventHelper::deserialize(deserializer)?;
        match (helper.t, helper.v) {
            (EventType::Created, 1) => Created::deserialize(helper.data)
                .map(ItemEvent::Created)
                .map_err(de::Error::custom),
            (EventType::Deleted, 1) => Deleted::deserialize(helper.data)
                .map(ItemEvent::Deleted)
                .map_err(de::Error::custom),
            (t, v) => Err(de::Error::custom(format!(
                "unrecognized version v={} for event {:?}",
                v, t
            ))),
        }
    }
}

@tyranron
Contributor Author

@dtolnay thanks, but wouldn't that work only for JSON, since serde_json::Value is used? Is there a way to make it work for an arbitrary format?

@dtolnay
Member

dtolnay commented Jan 31, 2019

You could use serde_value::Value in place of serde_json::Value.

@svanharmelen

@dtolnay I have a somewhat similar use case, but for quite a big struct with a lot of aliases.

So I was wondering if there is a way to do something like this without having to copy the whole struct as a helper struct (as it's really pretty big)?

Using the same struct inside the deserialize method would create an infinite loop, right? So could this be done with a type alias or newtype (I'm not familiar with these, so I have no clue whether that could work)?

Thanks!!

@svanharmelen

Maybe it's good to elaborate a little on how my use case is similar. I have a struct with a field containing an enum of enums, and which enum should be used depends on another field.

So my idea was (following the above approach) to deserialize the whole struct, but use a temporary Value for the enum field. Once I have access to the other field, I can then deserialize the remaining field with the correct enum.

So something like this:

#[derive(Serialize, Deserialize, Clone, Debug)]
#[serde(untagged)]
pub enum Model {
    Model1(Serie1),
    Model2(Serie2),
}

#[derive(Serialize_repr, Deserialize_repr, Clone, Debug)]
#[allow(non_camel_case_types)]
#[repr(i8)]
pub enum Serie1 {
    #[serde(rename(deserialize = "SubModel 1"))]
    SubModel1 = 0,
    #[serde(rename(deserialize = "SubModel 2"))]
    SubModel2 = 1,
}

#[derive(Serialize_repr, Deserialize_repr, Clone, Debug)]
#[repr(i8)]
pub enum Serie2 {
    #[serde(rename(deserialize = "Model 123"))]
    Model1 = 0,
    #[serde(rename(deserialize = "Model 234"))]
    Model2 = 1,
    #[serde(rename(deserialize = "Model 345"))]
    Model3 = 2,
}

struct SomeDevice {
    driver: String,
    model: Model,
}

Of course this is very simplified, but I think the idea is clear. The issue is that multiple sub-enums have a variant that represents 0, so the first match will always be used instead of the correct one (which belongs to the specific driver).

I hope this solution can also be used for this use case, but of course any other pointers are more than welcome as well! Thanks!

@svanharmelen

@dtolnay do you prefer that I open a new support issue for this instead?
