schemeboard: pass describe-result as an opaque payload #2083

Merged
merged 4 commits into from
Feb 29, 2024

Conversation

@ijon ijon commented Feb 19, 2024

Changelog entry

Make schemeboard replicas consume less CPU, especially when processing rapid updates for tables with a huge number of partitions.

Changelog category

  • Performance improvement

Additional information

Schema information for a path exists in the form of a DescribeSchemeResult object: schemeshard generates these objects and publishes them to the schemeboard, and the schemeboard notifies scheme-caches on the nodes about path info changes. So schemeshard generates DescribeSchemeResult and scheme-cache serves DescribeSchemeResult to the consumers, but the schemeboard components in between do not require the full content of a TEvDescribeSchemeResult to operate efficiently.

This update enables the schemeboard to pass DescribeSchemeResult through as an opaque payload rather than as a fully parsed protobuf object, which reduces unnecessary memory management and serialization/deserialization overhead.
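
A minimal sketch of the pass-through idea, not the actual YDB code. The names `TOpaqueUpdate` and `TReplicaSketch` are hypothetical, and the sketch assumes a replica needs only a path and a version to decide what to keep, while the serialized describe-result is stored and forwarded as raw bytes without ever being parsed:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical update as seen by a replica: a few routing fields
// plus the serialized DescribeSchemeResult kept as raw bytes.
struct TOpaqueUpdate {
    std::string Path;
    uint64_t Version = 0;
    std::string Payload;  // serialized describe-result, never parsed here
};

// Hypothetical replica: keeps the newest payload per path.
class TReplicaSketch {
public:
    // Returns true if the update was accepted (strictly newer version).
    // Ignoring stale versions is an assumption made for this sketch.
    bool Apply(TOpaqueUpdate update) {
        auto it = Entries.find(update.Path);
        if (it != Entries.end() && it->second.Version >= update.Version) {
            return false;  // stale update, ignore
        }
        const std::string path = update.Path;  // copy the key before moving
        Entries[path] = std::move(update);
        return true;
    }

    // Returns the stored bytes to forward to subscribers, or nullptr
    // if the path is unknown.
    const std::string* PayloadFor(const std::string& path) const {
        auto it = Entries.find(path);
        return it == Entries.end() ? nullptr : &it->second.Payload;
    }

private:
    std::map<std::string, TOpaqueUpdate> Entries;
};
```

In this shape the cost of handling an update on the replica is a map lookup and a string move, independent of how many partitions the payload describes.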

github-actions bot commented Feb 19, 2024

2024-02-19 20:51:31 UTC Pre-commit check for 927b476 has started.
2024-02-19 20:51:34 UTC Build linux-x86_64-release-asan is running...
🔴 2024-02-19 21:20:28 UTC Build failed. see the build logs.
2024-02-19 21:20:40 UTC Tests are running...
🔴 2024-02-19 22:59:08 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14605 14361 0 36 168 40

github-actions bot commented Feb 19, 2024

2024-02-19 20:51:22 UTC Pre-commit check for 927b476 has started.
2024-02-19 20:51:24 UTC Build linux-x86_64-relwithdebinfo is running...
🔴 2024-02-19 21:16:44 UTC Build failed. see the build logs.
2024-02-19 21:16:58 UTC Tests are running...
🔴 2024-02-19 22:51:53 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
67617 56624 0 15 10943 35

@ijon ijon force-pushed the schemeboard-opaque-describeresult branch from be95526 to ba5f107 Compare February 20, 2024 17:45

github-actions bot commented Feb 20, 2024

2024-02-20 17:48:51 UTC Pre-commit check for 0f1ab83 has started.
2024-02-20 17:48:56 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-20 18:16:37 UTC Build successful.
2024-02-20 18:16:52 UTC Tests are running...
🔴 2024-02-20 19:52:50 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
67709 56707 0 16 10945 41

github-actions bot commented Feb 20, 2024

2024-02-20 17:53:47 UTC Pre-commit check for 0f1ab83 has started.
2024-02-20 17:53:50 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-20 18:20:34 UTC Build successful.
2024-02-20 18:20:48 UTC Tests are running...
🔴 2024-02-20 20:00:21 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14687 14478 0 36 137 36

@ijon ijon force-pushed the schemeboard-opaque-describeresult branch from ba5f107 to 4760387 Compare February 22, 2024 10:36

github-actions bot commented Feb 22, 2024

2024-02-22 10:52:19 UTC Pre-commit check for e9e8f61 has started.
2024-02-22 10:52:22 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-22 11:20:19 UTC Build successful.
2024-02-22 11:20:30 UTC Tests are running...
🔴 2024-02-22 12:53:16 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
67911 56922 0 2 10951 36

github-actions bot commented Feb 22, 2024

2024-02-22 10:58:52 UTC Pre-commit check for e9e8f61 has started.
2024-02-22 10:58:55 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-22 11:26:39 UTC Build successful.
2024-02-22 11:26:54 UTC Tests are running...
🔴 2024-02-22 13:08:24 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14826 14606 0 21 165 34

@ijon ijon force-pushed the schemeboard-opaque-describeresult branch from 4760387 to 4488b95 Compare February 28, 2024 10:46
ijon added 2 commits February 28, 2024 13:47
PathOwner field was replaced by PathOwnerId a long time ago.
Change type of `{TEvUpdate,TEvNotify}.DescribeSchemeResult` from transparent
`TEvDescribeSchemeResult` to opaque `bytes` and support that throughout
Populator, Replica, Subscriber actors.

A properly typed TEvDescribeSchemeResult induces additional overhead, because
the message is automatically serialized and deserialized when transferred over
the wire.
This performance cost is usually negligible.
But in specific situations, particularly when rapidly updating partitioning
information for tables with a huge number of shards, this overhead could lead
to significant issues. Schemeboard replicas could get overloaded and become
unresponsive to further requests. This is problematic, especially considering
the schemeboard subsystem's critical role in servicing all databases within
a cluster, which makes it a single point of failure (SPOF).

The core realization is that the schemeboard components do not require
the full content of a TEvDescribeSchemeResult message to operate efficiently.
Instead, only a limited set of fields (path, path-id, version and info about
subdomain/database) is required for processing.
And a whole TEvDescribeSchemeResult could be passed through as an opaque payload.

Changing the field type from TEvDescribeSchemeResult to bytes without changing
the field number is safe: protobuf encodes both embedded messages and bytes
as length-delimited fields (wire type 2), so the value on the wire stays
the same.
Thus, older implementations will interpret the payload as
a TEvDescribeSchemeResult message and proceed with deserialization as usual.
And newer implementations will recognize the data as a binary blob and will
deserialize it explicitly only when necessary.

KIKIMR-14948
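
The wire-compatibility argument in the commit message above can be illustrated with a hand-rolled encoder. This is a sketch under stated assumptions, not code from the PR: it does not use the protobuf library, and the helper names (`AppendVarint`, `EncodeLengthDelimited`, `DecodeLengthDelimited`) are hypothetical. It relies only on the documented protobuf encoding, where a field tag is `(field_number << 3) | wire_type` and both `bytes` and embedded messages use wire type 2 (a varint length followed by the body):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append an unsigned integer as a protobuf base-128 varint.
void AppendVarint(std::string& out, uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Encode a length-delimited field (wire type 2). The same bytes are
// produced whether the schema declares the field as `bytes` or as an
// embedded message whose serialized form is `body`.
std::string EncodeLengthDelimited(uint32_t fieldNumber, const std::string& body) {
    std::string out;
    AppendVarint(out, (static_cast<uint64_t>(fieldNumber) << 3) | 2);
    AppendVarint(out, body.size());
    out += body;
    return out;
}

// Read a varint starting at `pos`; returns false on truncated input.
bool ReadVarint(const std::string& in, size_t& pos, uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const uint8_t byte = static_cast<uint8_t>(in[pos++]);
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            return true;
        }
    }
    return false;
}

// Read the field as an opaque blob: check the tag, read the length,
// and copy the body without interpreting it.
bool DecodeLengthDelimited(const std::string& in, uint32_t fieldNumber, std::string& body) {
    size_t pos = 0;
    uint64_t tag = 0;
    uint64_t len = 0;
    const uint64_t expectedTag = (static_cast<uint64_t>(fieldNumber) << 3) | 2;
    if (!ReadVarint(in, pos, tag) || tag != expectedTag) {
        return false;
    }
    if (!ReadVarint(in, pos, len) || len > in.size() - pos) {
        return false;
    }
    body.assign(in, pos, static_cast<size_t>(len));
    return true;
}
```

A reader that treats the field as `bytes` gets back exactly the serialized inner message, which it can parse later or forward untouched. A reader that treats the same field as a message parses those same bytes immediately.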
@ijon ijon force-pushed the schemeboard-opaque-describeresult branch from 4488b95 to 56b61d3 Compare February 28, 2024 10:52

github-actions bot commented Feb 28, 2024

2024-02-28 10:55:05 UTC Pre-commit check for 16a7df3 has started.
2024-02-28 10:55:07 UTC Build linux-x86_64-relwithdebinfo is running...
🔴 2024-02-28 11:18:01 UTC Build failed. see the build logs.
2024-02-28 11:18:16 UTC Tests are running...
🔴 2024-02-28 11:55:02 UTC Test run completed, no test results found for commit 56b61d3. Please check build logs.
2024-02-28 11:55:05 UTC Check cancelled

github-actions bot commented Feb 28, 2024

2024-02-28 10:55:13 UTC Pre-commit check for 16a7df3 has started.
2024-02-28 10:55:17 UTC Build linux-x86_64-release-asan is running...
🔴 2024-02-28 11:18:06 UTC Build failed. see the build logs.
2024-02-28 11:18:18 UTC Tests are running...
🔴 2024-02-28 11:55:00 UTC Test run completed, no test results found for commit 56b61d3. Please check build logs.
2024-02-28 11:55:03 UTC Check cancelled

github-actions bot commented Feb 28, 2024

2024-02-28 10:55:19 UTC Pre-commit check for 16a7df3 has started.
2024-02-28 10:55:21 UTC Build linux-x86_64-release-cmake14 is running...
🔴 2024-02-28 11:18:51 UTC Build failed. see the build logs.

@ijon ijon marked this pull request as ready for review February 28, 2024 11:01
@ijon ijon requested review from CyberROFL and snaury February 28, 2024 11:02

github-actions bot commented Feb 28, 2024

2024-02-28 11:56:17 UTC Pre-commit check for 68c485d has started.
2024-02-28 11:56:19 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-28 11:58:31 UTC Build successful.
2024-02-28 11:58:40 UTC Tests are running...
🔴 2024-02-28 12:55:23 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
67914 56994 0 10 10889 21

github-actions bot commented Feb 28, 2024

2024-02-28 11:57:27 UTC Pre-commit check for 68c485d has started.
2024-02-28 11:57:29 UTC Build linux-x86_64-release-cmake14 is running...
🟢 2024-02-28 11:59:43 UTC Build successful.

github-actions bot commented Feb 28, 2024

2024-02-28 11:58:32 UTC Pre-commit check for 68c485d has started.
2024-02-28 11:58:35 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-28 12:01:20 UTC Build successful.
2024-02-28 12:01:33 UTC Tests are running...
🔴 2024-02-28 13:11:38 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14878 14745 0 26 83 24

github-actions bot commented Feb 28, 2024

2024-02-28 14:19:29 UTC Pre-commit check for c4e209f has started.
2024-02-28 14:19:31 UTC Build linux-x86_64-release-cmake14 is running...
🟢 2024-02-28 14:40:29 UTC Build successful.

github-actions bot commented Feb 28, 2024

2024-02-28 14:20:46 UTC Pre-commit check for c4e209f has started.
2024-02-28 14:20:48 UTC Build linux-x86_64-release-asan is running...
🟢 2024-02-28 14:47:07 UTC Build successful.
2024-02-28 14:47:22 UTC Tests are running...
🔴 2024-02-28 16:29:51 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
14879 14722 0 32 103 22

github-actions bot commented Feb 28, 2024

2024-02-28 14:20:49 UTC Pre-commit check for c4e209f has started.
2024-02-28 14:20:51 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-02-28 14:45:21 UTC Build successful.
2024-02-28 14:45:30 UTC Tests are running...
🔴 2024-02-28 16:17:25 UTC Some tests failed, follow the links below.

Test history

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
68015 57097 0 8 10890 20

ijon commented Feb 29, 2024

Test failures (in the linux-x86_64-relwithdebinfo build configuration) have known causes and are not related to this PR.

Comment on lines +40 to +44

// Context->SetLogPriority(NKikimrServices::SCHEME_BOARD_REPLICA, NLog::PRI_DEBUG);
// Context->SetLogPriority(NKikimrServices::SCHEME_BOARD_SUBSCRIBER, NLog::PRI_DEBUG);
// Context->SetLogPriority(NKikimrServices::TX_PROXY_SCHEME_CACHE, NLog::PRI_DEBUG);
// Context->SetLogPriority(NKikimrServices::FLAT_TX_SCHEMESHARD, NLog::PRI_DEBUG);

This is unnecessary.

@ijon ijon self-assigned this Feb 29, 2024
@ijon ijon linked an issue Feb 29, 2024 that may be closed by this pull request
@ijon ijon merged commit 3819aed into ydb-platform:main Feb 29, 2024
3 of 5 checks passed
ijon added a commit to ijon/ydb that referenced this pull request Mar 1, 2024
Cherry-pick 3819aed from main (ydb-platform#2083).

ijon added a commit that referenced this pull request Mar 5, 2024
Cherry-pick 3819aed from main (#2083).

@shnikd shnikd mentioned this pull request Mar 26, 2024
Development

Successfully merging this pull request may close these issues.

Do not deserialize describe-scheme on schemeboard replicas
3 participants