Commit b594d90

Jefffrey and tustvold authored
Enhance Date64 type documentation (#5323)
* Enhance Date64 type documentation

* Update arrow-schema/src/datatype.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* Update arrow-schema/src/datatype.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* Update arrow-schema/src/datatype.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

---------

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
1 parent ce58932 commit b594d90

1 file changed: +23 -2 lines changed

arrow-schema/src/datatype.rs

@@ -145,10 +145,31 @@ pub enum DataType {
     /// ```
     Timestamp(TimeUnit, Option<Arc<str>>),
     /// A signed 32-bit date representing the elapsed time since UNIX epoch (1970-01-01)
-    /// in days (32 bits).
+    /// in days.
     Date32,
     /// A signed 64-bit date representing the elapsed time since UNIX epoch (1970-01-01)
-    /// in milliseconds (64 bits). Values are evenly divisible by 86400000.
+    /// in milliseconds.
+    ///
+    /// According to the specification (see [Schema.fbs]), this should be treated as the number of
+    /// days, in milliseconds, since the UNIX epoch. Therefore, values must be evenly divisible by
+    /// `86_400_000` (the number of milliseconds in a standard day).
+    ///
+    /// The reason for this is for compatibility with other language's native libraries,
+    /// such as Java, which historically lacked a dedicated date type
+    /// and only supported timestamps.
+    ///
+    /// Practically, validation that values of this type are evenly divisible by `86_400_000` is not enforced
+    /// by this library for performance and usability reasons. Date64 values will be treated similarly to the
+    /// `Timestamp(TimeUnit::Millisecond, None)` type, in that its values will be printed showing the time of
+    /// day if the value does not represent an exact day, and arithmetic can be done at the millisecond
+    /// granularity to change the time represented.
+    ///
+    /// Users should prefer using Date32 to cleanly represent the number of days, or one of the Timestamp
+    /// variants to include time as part of the representation, depending on their use case.
+    ///
+    /// For more details, see [#5288](https://github.com/apache/arrow-rs/issues/5288).
+    ///
+    /// [Schema.fbs]: https://github.com/apache/arrow/blob/main/format/Schema.fbs
     Date64,
     /// A signed 32-bit time representing the elapsed time since midnight in the unit of `TimeUnit`.
     /// Must be either seconds or milliseconds.
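
The new Date64 documentation says values should be multiples of `86_400_000` but that the library does not enforce this. Callers who need the guarantee may therefore want to check or normalize raw values themselves. The following is a minimal sketch of how that could look using only the Rust standard library. The names `MS_PER_DAY`, `is_whole_day`, `date32_to_date64`, `date64_to_date32`, and `floor_to_day` are hypothetical helpers written for illustration. They are not part of arrow-rs and operate on plain `i32`/`i64` values rather than Arrow arrays.

```rust
/// Milliseconds in a standard day, the divisor named in the Date64 docs.
/// Hypothetical constant for this sketch; not an arrow-rs item.
const MS_PER_DAY: i64 = 86_400_000;

/// Returns true if a raw Date64 value lies exactly on a day boundary.
fn is_whole_day(date64_ms: i64) -> bool {
    date64_ms % MS_PER_DAY == 0
}

/// Widens a Date32 value (days since the UNIX epoch) to a Date64 value
/// that satisfies the divisibility rule. The multiplication cannot
/// overflow because any i32 times 86_400_000 fits in an i64.
fn date32_to_date64(days: i32) -> i64 {
    i64::from(days) * MS_PER_DAY
}

/// Narrows a Date64 value to Date32. Returns None when the value carries
/// a time-of-day component or the day count does not fit in an i32.
fn date64_to_date32(date64_ms: i64) -> Option<i32> {
    if !is_whole_day(date64_ms) {
        return None;
    }
    i32::try_from(date64_ms / MS_PER_DAY).ok()
}

/// Rounds a Date64 value down to the start of its day. `div_euclid`
/// floors toward negative infinity, so pre-1970 values move to the
/// earlier midnight instead of toward zero. Returns None if the result
/// would fall below i64::MIN.
fn floor_to_day(date64_ms: i64) -> Option<i64> {
    date64_ms.div_euclid(MS_PER_DAY).checked_mul(MS_PER_DAY)
}
```

For example, `date64_to_date32(86_400_001)` yields `None` because the value is one millisecond past midnight, while `floor_to_day(-1)` yields `Some(-86_400_000)`, the midnight that starts 1969-12-31. Whether to reject or truncate such values is a design choice left to the caller; the documentation in this commit only states that the library itself does neither.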

0 commit comments
