feat: Add TextWriter #11789

Closed

kewang1024
Contributor

@kewang1024 commented Dec 9, 2024

Add TextWriter support for primitive data types.

- [ ] Add support for complex types
- [ ] Add TextReader support
- [ ] Add fuzzer test after we have TextReader

@facebook-github-bot added the CLA Signed label Dec 9, 2024

netlify bot commented Dec 9, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: ded0c59
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/67739899fd7a4700081c50b4

@facebook-github-bot
Contributor

@kewang1024 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

CMakeLists.txt Outdated
add_library(gflags_gflags INTERFACE
velox/dwio/text/writer/BufferedWriter.cpp
velox/dwio/text/writer/BufferedWriter.h
velox/dwio/text/tests/writer/TextWriterTest.cpp
Contributor

This seems off.

velox/dwio/text/CMakeLists.txt Outdated (resolved)
add_test(
NAME velox_text_writer_test
COMMAND velox_text_writer_test
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
Contributor

Could be simplified as add_test(velox_text_writer_test velox_text_writer_test).

Contributor Author


BufferedWriter::BufferedWriter(
std::unique_ptr<dwio::common::FileSink> sink,
std::shared_ptr<memory::MemoryPool> pool)
Contributor

Add && and const &, respectively, to avoid object copies.
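A minimal sketch of the suggested signature, assuming empty hypothetical stand-ins `FileSink` and `MemoryPool` in place of `dwio::common::FileSink` and `memory::MemoryPool`, and a hypothetical class name `BufferedWriterSketch` (not the class from this PR): the `std::unique_ptr` is taken by rvalue reference and the `std::shared_ptr` by const reference, and each is stored in a member.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for dwio::common::FileSink and memory::MemoryPool.
struct FileSink {};
struct MemoryPool {};

// Hypothetical class illustrating the reviewer's suggested parameter types.
class BufferedWriterSketch {
 public:
  // `sink` binds to the caller's rvalue; ownership is transferred into sink_
  // by the member initializer. `pool` binds by reference and is copied once,
  // into pool_ (one reference-count increment).
  BufferedWriterSketch(
      std::unique_ptr<FileSink>&& sink,
      const std::shared_ptr<MemoryPool>& pool)
      : sink_(std::move(sink)), pool_(pool) {}

  const FileSink* sink() const {
    return sink_.get();
  }

  const std::shared_ptr<MemoryPool>& pool() const {
    return pool_;
  }

 private:
  std::unique_ptr<FileSink> sink_;
  std::shared_ptr<MemoryPool> pool_;
};
```

The caller still writes std::move(sink); with &&, the pointer moves directly from the caller's variable into sink_ without an intermediate parameter object.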

Contributor

+1

Contributor Author

@kewang1024 Dec 27, 2024

It was written this way to be consistent with the rest of our codebase, for example:

Writer::Writer(
std::unique_ptr<dwio::common::FileSink> sink,

ArrowDataBufferSink(
std::unique_ptr<dwio::common::FileSink> sink,
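For comparison, a sketch of the pass-by-value style the author cites from Writer::Writer and ArrowDataBufferSink. It again assumes hypothetical stand-in types and a hypothetical class name `ByValueWriterSketch`: each parameter is taken by value and then moved into its member.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for dwio::common::FileSink and memory::MemoryPool.
struct FileSink {};
struct MemoryPool {};

// Hypothetical class illustrating the by-value "sink parameter" style.
class ByValueWriterSketch {
 public:
  // The constructor owns both parameters and moves them into the members,
  // so no further copy happens inside the constructor.
  ByValueWriterSketch(
      std::unique_ptr<FileSink> sink,
      std::shared_ptr<MemoryPool> pool)
      : sink_(std::move(sink)), pool_(std::move(pool)) {}

  const FileSink* sink() const {
    return sink_.get();
  }

  const std::shared_ptr<MemoryPool>& pool() const {
    return pool_;
  }

 private:
  std::unique_ptr<FileSink> sink_;
  std::shared_ptr<MemoryPool> pool_;
};
```

Neither style deep-copies the sink or the pool. For std::unique_ptr, && saves one pointer move compared with by-value. For std::shared_ptr, by-value-then-move and const& each cost one reference-count increment when the caller passes an lvalue, and by-value avoids the increment entirely when the caller passes an rvalue.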


#pragma once

namespace facebook::velox::text {
Contributor

nit: how about facebook::velox::dwio::text?

Contributor

+1

Contributor Author

@kewang1024 Dec 27, 2024

But for ORC/Parquet, their namespaces don't include dwio:

namespace facebook::velox::parquet {

namespace facebook::velox::orc {

Contributor

It might be better to have the dwio namespace there as well. We probably need to fix that. cc @majetideepak
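If the dwio namespace were adopted, a move could look like the sketch below. It assumes C++17 nested namespace definitions and uses a namespace alias so the old facebook::velox::text spelling keeps compiling during a migration; both the alias and the formatName() function are hypothetical illustrations, not something proposed in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical layout following the reviewer's suggestion: text-format code
// lives in facebook::velox::dwio::text (C++17 nested namespace definition).
namespace facebook::velox::dwio::text {

// Hypothetical function, used only to show lookup through both spellings.
inline std::string formatName() {
  return "text";
}

} // namespace facebook::velox::dwio::text

// Hypothetical transition aid: a namespace alias so existing code that spells
// facebook::velox::text still compiles. An alias cannot be reopened to add
// declarations, so new code must be declared under the dwio spelling.
namespace facebook::velox {
namespace text = dwio::text;
} // namespace facebook::velox
```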

Contributor

@xiaoxmeng left a comment

@kewang1024 Thanks for the change, modulo the comments.



velox/dwio/text/writer/TextWriter.h Outdated (4 resolved threads)
velox/dwio/text/writer/TextWriter.cpp Outdated (5 resolved threads)
@xiaoxmeng mentioned this pull request Dec 29, 2024
@kewang1024 changed the title from "[Draft. Don't review] Text writer" to "Add Text writer" Dec 29, 2024
@kewang1024 marked this pull request as ready for review December 29, 2024 07:15
@kewang1024 changed the title from "Add Text writer" to "feat: Add Text writer" Dec 29, 2024
@kewang1024 changed the title from "feat: Add Text writer" to "feat: Add TextWriter" Dec 29, 2024
@kewang1024 requested a review from xiaoxmeng December 29, 2024 07:18


velox/dwio/text/writer/BufferedWriterSink.h Outdated (4 resolved threads)
velox/dwio/text/writer/TextWriter.cpp Outdated (5 resolved threads)
Contributor

@xiaoxmeng left a comment

@kewang1024 Overall looks good. Thanks for the update!

velox/dwio/text/tests/writer/TextWriterTest.cpp Outdated (resolved)
velox/dwio/text/writer/TextWriter.cpp (resolved)
std::vector<std::vector<std::string>> result = readFile(filePath);
EXPECT_EQ(result.size(), 3);
EXPECT_EQ(result[0].size(), 2);
EXPECT_EQ(result[0][0], "1");
Contributor

Consider adding a fuzzer test using (in fbcode/velox/vector/fuzzer/VectorFuzzer.h)

TypePtr randType(
    FuzzerGenerator& rng,
    const std::vector<TypePtr>& scalarTypes,
    int maxDepth = 5);

to generate data of random types in vectors of variable size, and verify the result.
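A rough sketch of the write, read back, compare loop the reviewer describes. It assumes a hypothetical plain-text layout (one row per line, fields joined by a single delimiter, no escaping) and hypothetical writeRows/readRows helpers; it does not use Velox's VectorFuzzer or randType(). Instead, std::mt19937 picks random row counts, column counts, and primitive values. The check is only meaningful for the real writer once a text reader exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row model: every primitive value already rendered as text.
using Row = std::vector<std::string>;

// Hypothetical writer: one row per line, fields joined by `delim`.
// No escaping, so generated values must not contain `delim` or '\n'.
std::string writeRows(const std::vector<Row>& rows, char delim) {
  std::string out;
  for (const Row& row : rows) {
    for (std::size_t i = 0; i < row.size(); ++i) {
      if (i > 0) {
        out += delim;
      }
      out += row[i];
    }
    out += '\n';
  }
  return out;
}

// Splits one line on `delim`, keeping empty fields (including a trailing one).
Row splitFields(const std::string& line, char delim) {
  Row row;
  std::size_t start = 0;
  for (;;) {
    const std::size_t pos = line.find(delim, start);
    if (pos == std::string::npos) {
      row.push_back(line.substr(start));
      return row;
    }
    row.push_back(line.substr(start, pos - start));
    start = pos + 1;
  }
}

// Hypothetical reader mirroring writeRows (similar in spirit to readFile()).
std::vector<Row> readRows(const std::string& text, char delim) {
  std::vector<Row> rows;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    rows.push_back(splitFields(line, delim));
  }
  return rows;
}

// Random integers, booleans, and short lowercase strings (possibly empty).
std::vector<Row> randomRows(
    std::mt19937& rng, std::size_t numRows, std::size_t numCols) {
  std::uniform_int_distribution<int> kind(0, 2);
  std::uniform_int_distribution<std::int64_t> anyInt(
      std::numeric_limits<std::int64_t>::min(),
      std::numeric_limits<std::int64_t>::max());
  std::uniform_int_distribution<int> len(0, 8);
  std::uniform_int_distribution<int> letter('a', 'z');
  std::vector<Row> rows(numRows, Row(numCols));
  for (Row& row : rows) {
    for (std::string& value : row) {
      switch (kind(rng)) {
        case 0:
          value = std::to_string(anyInt(rng));
          break;
        case 1:
          value = (kind(rng) == 0) ? "true" : "false";
          break;
        default:
          value.assign(static_cast<std::size_t>(len(rng)), ' ');
          for (char& c : value) {
            c = static_cast<char>(letter(rng));
          }
      }
    }
  }
  return rows;
}

// One fuzz iteration: random shape and values, write, read back, compare.
bool roundTrips(unsigned seed, char delim) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> rowsDist(0, 50);
  std::uniform_int_distribution<std::size_t> colsDist(1, 10);
  const std::size_t numCols = colsDist(rng);
  const std::vector<Row> rows = randomRows(rng, rowsDist(rng), numCols);
  return readRows(writeRows(rows, delim), delim) == rows;
}
```

Keeping at least one column per row matters here: a zero-column row would be written as an empty line and read back as one empty field.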

Contributor Author

I've also thought about that, but for now we can't read the data back.
We need to do it in a follow-up after we move the text reader into Velox.

@kewang1024 force-pushed the text-writer-new branch 3 times, most recently from 2df38a9 to 6862c7b December 30, 2024 20:18
Contributor

@xiaoxmeng left a comment

@kewang1024 LGTM. Thanks, and please merge after addressing the comments.

velox/dwio/text/tests/writer/FileReaderUtil.h (resolved)
velox/dwio/text/writer/TextWriter.h (resolved)
velox/dwio/text/writer/TextWriter.h Outdated (3 resolved threads)
velox/dwio/text/writer/TextWriter.cpp Outdated (4 resolved threads)
@kewang1024 force-pushed the text-writer-new branch 6 times, most recently from eac8374 to c17a2ce December 31, 2024 06:31
@facebook-github-bot
Contributor

@kewang1024 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@facebook-github-bot
Contributor

@kewang1024 merged this pull request in 0605101.

Comment on lines +19 to +20
velox_dwio_arrow_parquet_writer_lib
velox_dwio_arrow_parquet_writer_util_lib
Contributor

Curious: why does the text writer need to depend on the Arrow Parquet writer lib?
This creates problems in the presto_server build because Parquet and Arrow are not built natively on the Mac.
(Another miss on the Velox side is that Parquet does not build on the Mac, so velox_dwio_arrow_parquet_writer_lib is not built, and consequently neither is velox_dwio_text_writer.)

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
Summary:
Add TextWriter support for primitive data types.

- [ ] Add support for complex types
- [ ] Add TextReader support
- [ ] Add fuzzer test after we have TextReader

Pull Request resolved: facebookincubator#11789

Reviewed By: xiaoxmeng

Differential Revision: D66943043

Pulled By: kewang1024

fbshipit-source-id: 82cd558c3602b8f1b30adbceec313296a22613bc
Labels: CLA Signed, Merged
5 participants