-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add TextWriter #11789
feat: Add TextWriter #11789
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@kewang1024 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
b0db7b7
to
b4cc637
Compare
@kewang1024 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
CMakeLists.txt
Outdated
add_library(gflags_gflags INTERFACE | ||
velox/dwio/text/writer/BufferedWriter.cpp | ||
velox/dwio/text/writer/BufferedWriter.h | ||
velox/dwio/text/tests/writer/TextWriterTest.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems off.
add_test( | ||
NAME velox_text_writer_test | ||
COMMAND velox_text_writer_test | ||
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be simplified as add_test(velox_text_writer_test velox_text_writer_test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
BufferedWriter::BufferedWriter( | ||
std::unique_ptr<dwio::common::FileSink> sink, | ||
std::shared_ptr<memory::MemoryPool> pool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add &&
and const &
, respectively, to avoid object copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was written this way to be more consistent with our codebase
velox/velox/dwio/parquet/writer/Writer.cpp
Lines 221 to 222 in c046524
Writer::Writer( | |
std::unique_ptr<dwio::common::FileSink> sink, |
velox/velox/dwio/parquet/writer/Writer.cpp
Lines 41 to 42 in c046524
ArrowDataBufferSink( | |
std::unique_ptr<dwio::common::FileSink> sink, |
|
||
#pragma once | ||
|
||
namespace facebook::velox::text { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about facebook::velox::dwio::text
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for orc/parquet, their namespace doesn't have dwio
namespace facebook::velox::parquet { |
velox/velox/dwio/orc/reader/OrcReader.h
Line 22 in c046524
namespace facebook::velox::orc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to have dwio namespace there. We probably need to fix that cc @majetideepak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kewang1024 thanks for the change % comments.
|
||
#pragma once | ||
|
||
namespace facebook::velox::text { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
b4cc637
to
d219184
Compare
|
||
#pragma once | ||
|
||
namespace facebook::velox::text { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to have dwio namespace there. We probably need to fix that cc @majetideepak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kewang1024 overall looks good. thanks for the update!
std::vector<std::vector<std::string>> result = readFile(filePath); | ||
EXPECT_EQ(result.size(), 3); | ||
EXPECT_EQ(result[0].size(), 2); | ||
EXPECT_EQ(result[0][0], "1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider to have a fuzzer test by using (in fbcode/velox/vector/fuzzer/VectorFuzzer.h)
TypePtr randType(
FuzzerGenerator& rng,
const std::vector<TypePtr>& scalarTypes,
int maxDepth = 5);
to generate random type data with variable size vector and verify the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also thought about that, but as of now we're not able to read the data back
we need to do it in a followup after we move text reader into velox
2df38a9
to
6862c7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kewang1024 LGTM. Thanks and please merge after address comments.
eac8374
to
c17a2ce
Compare
c17a2ce
to
ded0c59
Compare
@kewang1024 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@kewang1024 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@kewang1024 merged this pull request in 0605101. |
velox_dwio_arrow_parquet_writer_lib | ||
velox_dwio_arrow_parquet_writer_util_lib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why does text writer need to depend on arrow parquet writer lib ?
This creates problems in presto_server build because parquet arrow are not built natively on the mac.
(Another miss on velox part is that parquet does not build on the mac, and thus velox_dwio_arrow_parquet_writer_lib is not built , along with obviously velox_dwio_text_writer) .
Summary: Add TextWriter support for primitive data types. - [ ] Add support for complex types - [ ] Add TextReader support - [ ] Add fuzzer test after we have TextReader Pull Request resolved: facebookincubator#11789 Reviewed By: xiaoxmeng Differential Revision: D66943043 Pulled By: kewang1024 fbshipit-source-id: 82cd558c3602b8f1b30adbceec313296a22613bc
Add TextWriter support for primitive data types.