Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loss of precision in serde deserialization due to f64 conversion #279

Closed
bchallenor opened this issue Oct 17, 2020 · 3 comments
Closed

Loss of precision in serde deserialization due to f64 conversion #279

bchallenor opened this issue Oct 17, 2020 · 3 comments
Milestone

Comments

@bchallenor
Copy link

bchallenor commented Oct 17, 2020

I am seeing a loss of precision in serde deserialization that I believe is due to an intermediate f64 conversion.

I'm trying to deserialize from CSV, so I'm using rust-csv as the serde backend. rust-csv's implementation of deserialize_any will try to guess the type of the CSV field based on its contents, so if the field looks like a float, it will end up creating an f64 and passing it to visit_f64 on DecimalVisitor. At this point, we lose precision, and even for representable numbers, we lose track of the input scale (e.g. trailing zeros).

Here is a patch to add a test exhibiting the problem:

diff --git a/Cargo.toml b/Cargo.toml
index f607209..ef5dbb3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -24,4 +24,5 @@ tokio-postgres = { default-features = false, optional = true, version = "0.5" }
 bincode = "1.3"
 bytes = "0.5"
+csv = "1.1"
 futures = "0.3"
 rand = "0.7"
diff --git a/src/serde_types.rs b/src/serde_types.rs
index 00b975d..2bea1fc 100644
--- a/src/serde_types.rs
+++ b/src/serde_types.rs
@@ -141,4 +141,36 @@ mod test {
     }

+    #[test]
+    #[cfg(not(feature = "serde-bincode"))]
+    fn deserialize_valid_decimal_csv() {
+        let data = [
+            ("amount\r\n1.0\r\n", "1.0"),
+            ("amount\r\n1.0000000000000001\r\n", "1.0000000000000001"),
+            ("amount\r\n1.234\r\n", "1.234"),
+            ("amount\r\n1234\r\n", "1234"),
+            ("amount\r\n1234.56\r\n", "1234.56"),
+            ("amount\r\n1.23456e3\r\n", "1234.56"),
+        ];
+        for &(serialized, value) in data.iter() {
+            let mut reader = csv::ReaderBuilder::new().from_reader(serialized.as_bytes());
+            let result = reader.deserialize().next().unwrap();
+            assert_eq!(
+                true,
+                result.is_ok(),
+                "expected successful deseralization for {}. Error: {:?}",
+                serialized,
+                result.err().unwrap()
+            );
+            let record: Record = result.unwrap();
+            assert_eq!(
+                value,
+                record.amount.to_string(),
+                "expected: {}, actual: {}",
+                value,
+                record.amount.to_string()
+            );
+        }
+    }
+
     #[test]
     #[should_panic]

This problem is caused by an unfortunate interaction of (otherwise useful) features in the two libraries: firstly, the field type inference in rust-csv, and secondly, the support for deserialize_any in rust-decimal. So I'm not sure where this problem would best be solved:

  1. A feature in rust-decimal to force deserialization via string. If I change the impl of Deserialize for Decimal to call deserialize_str instead of deserialize_any, the above test passes (but the equivalent JSON test fails, because serde_json needs to be able to deserialize int/float values).
  1. A feature in rust-csv to force deserialize_any to unconditionally invoke visit_str, rather than trying to guess the type.

The former has precedent in that rust-decimal already has the serde-float feature to do exactly this - it's just limited to bincode right now. Enabling this feature for CSV would unfortunately break support for int and float representations in JSON, so an app that uses both CSV and JSON would be forced to use a string representation of Decimal in JSON as well.

@paupino
Copy link
Owner

paupino commented Oct 18, 2020

Out of curiosity, does enabling the serde-bincode feature resolve things for you? I think this feature is poorly named - it doesn't actually require bincode even though it appears that way. If it does fix it, I could alias this feature name and make it more generic.

@bchallenor
Copy link
Author

Yes, enabling serde-bincode does fix the test above, thanks!

I agree the feature might be more discoverable if it were renamed. Perhaps serde-str, or serde-string?

Interestingly, it breaks another use case I have for rust-csv + rust-decimal involving #[serde(untagged)] enums, but given that serde-bincode breaks it, I think we can assume that the only reason it worked before is that there is another codepath in rust-csv that is converting to f64. So I quite like that serde-bincode revealed this (by breaking deserialization).

What do you think about extending the conditional compilation to the methods of DecimalVisitor as well? To have guaranteed "arbitrary precision or fail" semantics, it would be nice if serde-string meant that visit_f64 was not even implemented. Something like this:

diff --git a/src/serde_types.rs b/src/serde_types.rs
index 75ed3ef..2520158 100644
--- a/src/serde_types.rs
+++ b/src/serde_types.rs
@@ -5,5 +5,5 @@ use num_traits::FromPrimitive;
 use serde::{self, de::Unexpected};

-#[cfg(not(feature = "serde-bincode"))]
+#[cfg(feature = "serde-any")]
 impl<'de> serde::Deserialize<'de> for Decimal {
     fn deserialize<D>(deserializer: D) -> Result<Decimal, D::Error>
@@ -15,5 +15,5 @@ impl<'de> serde::Deserialize<'de> for Decimal {
 }

-#[cfg(all(feature = "serde-bincode", not(feature = "serde-float")))]
+#[cfg(feature = "serde-string")]
 impl<'de> serde::Deserialize<'de> for Decimal {
     fn deserialize<D>(deserializer: D) -> Result<Decimal, D::Error>
@@ -25,5 +25,5 @@ impl<'de> serde::Deserialize<'de> for Decimal {
 }

-#[cfg(all(feature = "serde-bincode", feature = "serde-float"))]
+#[cfg(feature = "serde-float")]
 impl<'de> serde::Deserialize<'de> for Decimal {
     fn deserialize<D>(deserializer: D) -> Result<Decimal, D::Error>
@@ -44,4 +44,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(feature = "serde-any")]
     fn visit_i64<E>(self, value: i64) -> Result<Decimal, E>
     where
@@ -54,4 +55,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(feature = "serde-any")]
     fn visit_u64<E>(self, value: u64) -> Result<Decimal, E>
     where
@@ -64,4 +66,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(any(feature = "serde-any", feature = "serde-float"))]
     fn visit_f64<E>(self, value: f64) -> Result<Decimal, E>
     where
@@ -71,4 +74,5 @@ impl<'de> serde::de::Visitor<'de> for DecimalVisitor {
     }

+    #[cfg(any(feature = "serde-any", feature = "serde-string"))]
     fn visit_str<E>(self, value: &str) -> Result<Decimal, E>
     where

@paupino
Copy link
Owner

paupino commented Dec 5, 2020

For now I've added a PR to try the aliasing approach however would like to be able to reproduce the failure case you mentioned. If you have any example snippets then please feel free to share and I'll add some tests to fiddle around with!

paupino added a commit that referenced this issue Dec 5, 2020
#279: Alias serde-bincode -> serde-str to remove bincode confusion
@paupino paupino closed this as completed Dec 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants