Loss of precision in serde deserialization due to f64 conversion #279

bchallenor · 2020-10-17T23:05:54Z

I am seeing a loss of precision in serde deserialization that I believe is due to an intermediate f64 conversion.

I'm trying to deserialize from CSV, so I'm using rust-csv as the serde backend. rust-csv's implementation of deserialize_any will try to guess the type of the CSV field based on its contents, so if the field looks like a float, it will end up creating an f64 and passing it to visit_f64 on DecimalVisitor. At this point, we lose precision, and even for representable numbers, we lose track of the input scale (e.g. trailing zeros).

Here is a patch to add a test exhibiting the problem:

diff --git a/Cargo.toml b/Cargo.toml
index f607209..ef5dbb3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -24,4 +24,5 @@ tokio-postgres = { default-features = false, optional = true, version = "0.5" }
 bincode = "1.3"
 bytes = "0.5"
+csv = "1.1"
 futures = "0.3"
 rand = "0.7"
diff --git a/src/serde_types.rs b/src/serde_types.rs
index 00b975d..2bea1fc 100644
--- a/src/serde_types.rs
+++ b/src/serde_types.rs
@@ -141,4 +141,36 @@ mod test {
     }

+    #[test]
+    #[cfg(not(feature = "serde-bincode"))]
+    fn deserialize_valid_decimal_csv() {
+        let data = [
+            ("amount\r\n1.0\r\n", "1.0"),
+            ("amount\r\n1.0000000000000001\r\n", "1.0000000000000001"),
+            ("amount\r\n1.234\r\n", "1.234"),
+            ("amount\r\n1234\r\n", "1234"),
+            ("amount\r\n1234.56\r\n", "1234.56"),
+            ("amount\r\n1.23456e3\r\n", "1234.56"),
+        ];
+        for &(serialized, value) in data.iter() {
+            let mut reader = csv::ReaderBuilder::new().from_reader(serialized.as_bytes());
+            let result = reader.deserialize().next().unwrap();
+            assert_eq!(
+                true,
+                result.is_ok(),
+                "expected successful deseralization for {}. Error: {:?}",
+                serialized,
+                result.err().unwrap()
+            );
+            let record: Record = result.unwrap();
+            assert_eq!(
+                value,
+                record.amount.to_string(),
+                "expected: {}, actual: {}",
+                value,
+                record.amount.to_string()
+            );
+        }
+    }
+
     #[test]
     #[should_panic]

This problem is caused by an unfortunate interaction of (otherwise useful) features in the two libraries: firstly, the field type inference in rust-csv, and secondly, the support for deserialize_any in rust-decimal. So I'm not sure where this problem would best be solved:

A feature in rust-decimal to force deserialization via string. If I change the impl of Deserialize for Decimal to call deserialize_str instead of deserialize_any, the above test passes (but the equivalent JSON test fails, because serde_json needs to be able to deserialize int/float values).

A feature in rust-csv to force deserialize_any to unconditionally invoke visit_str, rather than trying to guess the type.

The former has precedent in that rust-decimal already has the serde-float feature to do exactly this - it's just limited to bincode right now. Enabling this feature for CSV would unfortunately break support for int and float representations in JSON, so an app that uses both CSV and JSON would be forced to use a string representation of Decimal in JSON as well.

The text was updated successfully, but these errors were encountered:

paupino · 2020-10-18T18:35:27Z

Out of curiosity, does enabling the serde-bincode feature resolve things for you? I think this feature is poorly named - it doesn't actually require bincode even though it appears that way. If it does fix it, I could alias this feature name and make it more generic.

bchallenor · 2020-10-18T21:26:05Z

Yes, enabling serde-bincode does fix the test above, thanks!

I agree the feature might be more discoverable if it were renamed. Perhaps serde-str, or serde-string?

Interestingly, it breaks another use case I have for rust-csv + rust-decimal involving #[serde(untagged)] enums, but given that serde-bincode breaks it, I think we can assume that the only reason it worked before is that there is another codepath in rust-csv that is converting to f64. So I quite like that serde-bincode revealed this (by breaking deserialization).

What do you think about extending the conditional compilation to the methods of DecimalVisitor as well? To have guaranteed "arbitrary precision or fail" semantics, it would be nice if serde-string meant that visit_f64 was not even implemented. Something like this:

diff --git a/src/serde_types.rs b/src/serde_types.rs
index 75ed3ef..2520158 100644
--- a/src/serde_types.rs
+++ b/src/serde_types.rs
@@ -5,5 +5,5 @@ use num_traits::FromPrimitive;
 use serde::{self, de::Unexpected};

-#[cfg(not(feature = "serde-bincode"))]
+#[cfg(feature = "serde-any")]
 impl<'de> serde::Deserialize<'de> for Decimal {
     fn deserialize<D>(deserializer: D) -> Result<Decimal, D::Error>
@@ -15,5 +15,5 @@ impl<'de> serde::Deserialize<'de> for Decimal {
 }

-#[cfg(all(feature = "serde-bincode", not(feature = "serde-float")))]
+#[cfg(feature = "serde-string")]
 impl<'de> serde::Deserialize<'de> for Decimal {
     fn deserialize<D>(deserializer: D) -> Result<Decimal, D::Error>
@@ -25,5 +25,5 @@ impl<'de> serde::Deserialize<'de> for Decimal {
 }

-#[cfg(all(feature = "serde-bincode", feature = "serde-float"))]
+#[cfg(feature = "serde-float")]
 impl<'de> serde::Deserialize<'de> for Decimal {
     fn deserialize<D>(deserializer: D) -> Result<Decimal, D::Error>
@@ -44,4 +44,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(feature = "serde-any")]
     fn visit_i64<E>(self, value: i64) -> Result<Decimal, E>
     where
@@ -54,4 +55,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(feature = "serde-any")]
     fn visit_u64<E>(self, value: u64) -> Result<Decimal, E>
     where
@@ -64,4 +66,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(any(feature = "serde-any", feature = "serde-float"))]
     fn visit_f64<E>(self, value: f64) -> Result<Decimal, E>
     where
@@ -71,4 +74,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(any(feature = "serde-any", feature = "serde-string"))]
     fn visit_str<E>(self, value: &str) -> Result<Decimal, E>
     where

paupino · 2020-12-05T00:22:12Z

For now I've added a PR to try the aliasing approach however would like to be able to reproduce the failure case you mentioned. If you have any example snippets then please feel free to share and I'll add some tests to fiddle around with!

#279: Alias serde-bincode -> serde-str to remove bincode confusion

bchallenor mentioned this issue Oct 18, 2020

document usage of #[serde(flatten)] more thoroughly BurntSushi/rust-csv#151

Open

paupino added this to the v1.9 milestone Nov 10, 2020

paupino added a commit that referenced this issue Dec 5, 2020

#279: Alias serde-bincode -> serde-str to remove bincode confusion

90d2862

paupino added a commit that referenced this issue Dec 5, 2020

Merge pull request #290 from paupino/issue/279

dcb57eb

#279: Alias serde-bincode -> serde-str to remove bincode confusion

paupino closed this as completed Dec 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Loss of precision in serde deserialization due to f64 conversion #279

Loss of precision in serde deserialization due to f64 conversion #279

bchallenor commented Oct 17, 2020 •

edited

Loading

paupino commented Oct 18, 2020

bchallenor commented Oct 18, 2020

paupino commented Dec 5, 2020

Loss of precision in serde deserialization due to f64 conversion #279

Loss of precision in serde deserialization due to f64 conversion #279

Comments

bchallenor commented Oct 17, 2020 • edited Loading

paupino commented Oct 18, 2020

bchallenor commented Oct 18, 2020

paupino commented Dec 5, 2020

bchallenor commented Oct 17, 2020 •

edited

Loading