Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overflow when casting timestamps prior to the epoch #3512

Closed
alamb opened this issue Jan 11, 2023 · 9 comments · Fixed by #3519
Closed

Overflow when casting timestamps prior to the epoch #3512

alamb opened this issue Jan 11, 2023 · 9 comments · Fixed by #3519
Assignees
Labels
arrow Changes to the arrow crate bug

Comments

@alamb
Copy link
Contributor

alamb commented Jan 11, 2023

Describe the bug
When a string such as '1677-06-14T07:29:01.256' is casted to a timestamp (via string_to_timestamp_nanos) it panics in debug builds and gives a wonky answer in release builds.

To Reproduce
Run this test in arrow-cast/src/display.rs

    #[test]
    fn string_to_timestamp_old() {
        assert_eq!(
            123,
            parse_timestamp("1677-06-14T07:29:01.256").unwrap()
        );
    }

failures:

---- parse::tests::string_to_timestamp_old stdout ----
thread 'parse::tests::string_to_timestamp_old' panicked at 'attempt to multiply with overflow', /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.23/src/naive/datetime/mod.rs:426:21
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: core::panicking::panic
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:115:5
   3: chrono::naive::datetime::NaiveDateTime::timestamp_nanos
             at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.23/src/naive/datetime/mod.rs:426:21
   4: arrow_cast::parse::string_to_timestamp_nanos
             at ./src/parse.rs:102:19
   5: arrow_cast::parse::tests::parse_timestamp
             at ./src/parse.rs:544:22
   6: arrow_cast::parse::tests::string_to_timestamp_old
             at ./src/parse.rs:471:13
   7: arrow_cast::parse::tests::string_to_timestamp_old::{{closure}}
             at ./src/parse.rs:468:5
   8: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
   9: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    parse::tests::string_to_timestamp_old

Expected behavior
It should not panic; It should return an error

Additional context
Found in datafusion apache/datafusion#4875 by @DDtKey

The actual panic is in chrono, it appears to be this line

// without a timezone specifier as a local time, using T as a separator
// Example: 2020-09-08T13:42:29.190855
if let Ok(ts) = NaiveDateTime::parse_from_str(s, "%Y-%m-%dT%H:%M:%S%.f") {
return Ok(ts.timestamp_nanos());
}

@comphead
Copy link
Contributor

@alamb can I take this?

@comphead
Copy link
Contributor

https://github.com/chronotope/chrono/blob/16e976d5d551fe6b0801c5799284beb8010f52fc/src/naive/datetime/mod.rs#L438

This is known limitation.

/// # Panics
///
/// Note also that this does reduce the number of years that can be
/// represented from ~584 Billion to ~584 years. The dates that can be
/// represented as nanoseconds are between 1677-09-21T00:12:44.0 and
/// 2262-04-11T23:47:16.854775804.
///
/// (If this is a problem, please file an issue to let me know what domain
/// needs nanosecond precision over millennia, I'm curious.)

Nice message from authors :)

@DDtKey
Copy link
Contributor

DDtKey commented Jan 12, 2023

@comphead
To be honest, this is a direct vulnerability to allow through the input data to cause a panic (and probably stop the application).

IMO, need to avoid this and at least return an error instead 🤔

@comphead
Copy link
Contributor

comphead commented Jan 12, 2023

@alamb, @tustvold how do you like the idea to return Err if requested timestamp string is beyond the limits?

@tustvold
Copy link
Contributor

tustvold commented Jan 12, 2023

That would be consistent with how we handle overflow when casting in other places, or if the string is not valid

@alamb
Copy link
Contributor Author

alamb commented Jan 12, 2023

I agree -- an error would be my preferred behavior here -- I think we can just inline the implementation of to_timestamp from chrono and check for overflow when multiplying by 1_000_000_000;

@alamb
Copy link
Contributor Author

alamb commented Jan 12, 2023

Updated the expected behavior to "return error"

@tustvold
Copy link
Contributor

I think we can just inline the implementation of to_timestamp from chrono

Most methods in chrono now have non-panicking variants, so it is likely just a case of finding the right one to call

@tustvold tustvold added the arrow Changes to the arrow crate label Jan 27, 2023
@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow'} from #3519

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants