[native] Table writer 2: Add Presto write protocols #18377

Merged: 1 commit merged on Nov 12, 2022


@gggrace14 gggrace14 commented Sep 21, 2022

Add PrestoNoCommitWriteProtocol and PrestoTaskCommitWriteProtocol,
which extend the write protocols of the Hive connector and make commit()
return the output format that Presto expects from the table writer.
Register the two write protocols during server start.
The table writer picks the matching protocol
based on the CommitStrategy it is configured with.
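
The registration and lookup flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the Velox or Presto code: WriteProtocol, NoCommitProtocol, TaskCommitProtocol, and WriteProtocolRegistry are hypothetical stand-ins, and only the CommitStrategy values kNoCommit and kTaskCommit come from the PR.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; not the Velox or Presto classes.
enum class CommitStrategy { kNoCommit, kTaskCommit };

class WriteProtocol {
 public:
  virtual ~WriteProtocol() = default;
  virtual CommitStrategy commitStrategy() const = 0;
  virtual std::string name() const = 0;
};

class NoCommitProtocol : public WriteProtocol {
 public:
  CommitStrategy commitStrategy() const override {
    return CommitStrategy::kNoCommit;
  }
  std::string name() const override { return "no-commit"; }
};

class TaskCommitProtocol : public WriteProtocol {
 public:
  CommitStrategy commitStrategy() const override {
    return CommitStrategy::kTaskCommit;
  }
  std::string name() const override { return "task-commit"; }
};

class WriteProtocolRegistry {
 public:
  // Intended to be called once per protocol during server start.
  void registerProtocol(std::shared_ptr<WriteProtocol> protocol) {
    assert(protocol != nullptr);
    const CommitStrategy strategy = protocol->commitStrategy();
    protocols_[strategy] = std::move(protocol);
  }

  // Called by the writer with the commit strategy it was configured with.
  std::shared_ptr<WriteProtocol> get(CommitStrategy strategy) const {
    auto it = protocols_.find(strategy);
    if (it == protocols_.end()) {
      throw std::out_of_range("No write protocol registered for strategy");
    }
    return it->second;
  }

 private:
  std::map<CommitStrategy, std::shared_ptr<WriteProtocol>> protocols_;
};
```

Keying the registry by commit strategy keeps the writer unaware of the concrete protocol class; it only needs to know which strategy it was given.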

@gggrace14 gggrace14 force-pushed the insert branch 4 times, most recently from d39e15a to 47a898e Compare September 26, 2022 17:16
@gggrace14 gggrace14 force-pushed the insert branch 3 times, most recently from 596d4f9 to 78198b6 Compare October 7, 2022 20:33
@gggrace14 gggrace14 marked this pull request as ready for review October 7, 2022 20:43
@gggrace14 gggrace14 requested a review from a team as a code owner October 7, 2022 20:43
@gggrace14 gggrace14 force-pushed the insert branch 2 times, most recently from 964c0eb to c84c999 Compare October 14, 2022 23:34
@gggrace14 gggrace14 changed the title [native] Support INSERT INTO unpartitioned table [native] Add Presto write protocols Oct 14, 2022
@gggrace14 gggrace14 changed the title [native] Add Presto write protocols [native] Table write 2: Add Presto write protocols Oct 14, 2022
@gggrace14 gggrace14 force-pushed the insert branch 5 times, most recently from 9e68793 to c069b87 Compare October 21, 2022 08:22
@gggrace14 gggrace14 changed the title [native] Table write 2: Add Presto write protocols [native] Table writer: Add Presto write protocols Oct 21, 2022
@gggrace14 gggrace14 changed the title [native] Table writer: Add Presto write protocols [native] Table writer 2: Add Presto write protocols Oct 25, 2022
@gggrace14 gggrace14 force-pushed the insert branch 3 times, most recently from 6c33e23 to 80b2b95 Compare October 27, 2022 22:25
@gggrace14 gggrace14 force-pushed the insert branch 3 times, most recently from ca52b64 to 39ca0c5 Compare November 2, 2022 04:55
@mbasmanova
Contributor

@gggrace14 There are CI failures. Are these related?

@mbasmanova mbasmanova left a comment

@gggrace14 Looks good overall. Some comments below.

    case protocol::TableType::TEMPORARY:
      return connector::hive::LocationHandle::TableType::kTemporary;
    default:
      throw std::invalid_argument("Unknown table type");
Contributor

Use VELOX_USER_CHECK and include tableType in the error message (see toJsonString helper function) to simplify troubleshooting.

Contributor Author

Revised here and in other places. I found that this file already uses VELOX_UNSUPPORTED for this case.
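
As a rough illustration of the suggestion to include the offending value in the error message, here is a self-contained sketch. The enums ProtocolTableType and HiveTableType and the function toHiveTableType are hypothetical stand-ins for the types in the diff, and std::invalid_argument is used only to keep the example free of Velox headers; per the reply above, the PR itself uses VELOX_UNSUPPORTED.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for protocol::TableType and
// connector::hive::LocationHandle::TableType.
enum class ProtocolTableType { NEW, EXISTING, TEMPORARY };
enum class HiveTableType { kNew, kExisting, kTemporary };

HiveTableType toHiveTableType(ProtocolTableType tableType) {
  switch (tableType) {
    case ProtocolTableType::NEW:
      return HiveTableType::kNew;
    case ProtocolTableType::EXISTING:
      return HiveTableType::kExisting;
    case ProtocolTableType::TEMPORARY:
      return HiveTableType::kTemporary;
    default:
      // Report the unexpected value so the failure can be traced back to
      // the plan fragment that produced it.
      throw std::invalid_argument(
          "Unsupported table type: " +
          std::to_string(static_cast<int>(tableType)));
  }
}
```

An error that names the bad value is usually enough to tell a protocol mismatch apart from a corrupted plan without attaching a debugger.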

      return connector::hive::LocationHandle::WriteMode::
          kDirectToTargetExistingDirectory;
    default:
      throw std::invalid_argument("Unknown write mode");
Contributor

ditto

@@ -1662,6 +1717,7 @@ VeloxQueryPlanConverter::toVeloxQueryPlan(
node->columnNames,
insertTableHandle,
outputType,
connector::WriteProtocol::CommitStrategy::kNoCommit,
Contributor

Why is no-commit protocol hard-coded here?

Contributor Author

This is where the system sets the commit strategy it is going to use. For Presto, that is CommitStrategy::kNoCommit.

I also discussed this with MJ previously. For PrestoSpark, he will change this code to add a condition that sets kTaskCommit for PrestoSpark and kNoCommit for Presto.
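
The planned condition might look roughly like the sketch below. It is an assumption about the shape of that follow-up change, not code from this PR: the Runtime enum and chooseCommitStrategy function are hypothetical, and only the two CommitStrategy values come from the discussion.

```cpp
// Hypothetical stand-in for connector::WriteProtocol::CommitStrategy.
enum class CommitStrategy { kNoCommit, kTaskCommit };

// Hypothetical: how the plan converter might be told which system it runs in.
enum class Runtime { kPresto, kPrestoSpark };

// PrestoSpark commits per task; Presto does not commit in the writer.
CommitStrategy chooseCommitStrategy(Runtime runtime) {
  return runtime == Runtime::kPrestoSpark ? CommitStrategy::kTaskCommit
                                          : CommitStrategy::kNoCommit;
}
```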

class PrestoNoCommitWriteProtocol
    : public velox::connector::hive::HiveNoCommitWriteProtocol {
 public:
  ~PrestoNoCommitWriteProtocol() override {}
Contributor

Use = default, or remove this and let the compiler generate the destructor.
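
A minimal sketch of the two options, using hypothetical stand-in classes (BaseProtocol, ExplicitDefault, ImplicitDestructor) rather than the real Hive and Presto protocol classes. In both cases the derived destructor stays virtual because the base class declares a virtual destructor.

```cpp
#include <type_traits>

// Hypothetical base class standing in for the Hive write protocol.
class BaseProtocol {
 public:
  virtual ~BaseProtocol() = default;
};

// Option 1: spell out the destructor, but default it instead of writing {}.
class ExplicitDefault : public BaseProtocol {
 public:
  ~ExplicitDefault() override = default;
};

// Option 2: declare nothing and let the compiler generate the destructor.
class ImplicitDestructor : public BaseProtocol {};

static_assert(std::has_virtual_destructor<ExplicitDefault>::value,
              "defaulted override is still virtual");
static_assert(std::has_virtual_destructor<ImplicitDestructor>::value,
              "implicit destructor inherits virtual-ness from the base");
```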

std::make_shared<PrestoNoCommitWriteProtocol>());
}

// private:
Contributor

remove commented out code

rowWrittenVector = std::dynamic_pointer_cast<FlatVector<int64_t>>(
    BaseVector::create(BIGINT(), 1, pool));
rowWrittenVector->set(0, commitInfo.numWrittenRows());
columns.emplace_back(rowWrittenVector);
Contributor

nit: consider shortening

columns.emplace_back(BaseVector::createConstant(commitInfo.numWrittenRows(), 1, pool))

Contributor Author

Good to know


vector_size_t numOutputRows = 1;
FlatVectorPtr<int64_t> rowWrittenVector;
FlatVectorPtr<StringView> fragmentsVector;
Contributor

fragmentsVector and commitContextVector are used only in the 'else' branch. Consider moving them there.

commitInfo.connectorCommitInfo());
numOutputRows = hiveCommitInfo.writerParameters().size() + 1;

// Set rows column
Contributor

add period at the end of the sentence; ditto other places

numOutputRows = hiveCommitInfo.writerParameters().size() + 1;

// Set rows column
rowWrittenVector = std::dynamic_pointer_cast<FlatVector<int64_t>>(
Contributor

use BaseVector::create<FlatVector<int64_t>>(...)

ditto other places

return std::make_shared<RowVector>(
    pool,
    commitInfo.outputType(),
    BufferPtr(nullptr),
Contributor

nit: just "nullptr"

Contributor Author

Good to know

@gggrace14 gggrace14 force-pushed the insert branch 3 times, most recently from b8f071f to f0d3a97 Compare November 3, 2022 16:20
@mbasmanova mbasmanova left a comment

@gggrace14 Looks good to me. Please make sure all CI checks are green before landing.

("targetFileName", hiveWriterParameters->targetFileName())
("fileSize", 0)))
("rowCount", numWrittenRows)
("inMemoryDataSizeInBytes", 0)
Contributor

https://fburl.com/18vim7k4.

Is this accessible outside of Meta?

}
};

class PrestoHiveTaskCommitWriteProtocol
Contributor

Since we are inside presto::protocol namespace, "Presto" prefix can be removed.

Contributor Author

Ah, I see

@gggrace14 gggrace14 merged commit c329c38 into prestodb:master Nov 12, 2022
@gggrace14 gggrace14 deleted the insert branch November 12, 2022 07:14
@wanglinsong wanglinsong mentioned this pull request Jan 12, 2023