-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: dynamic configuration change #13660
Conversation
Codecov Report
@@ Coverage Diff @@
## master #13660 +/- ##
================================================
- Coverage 80.0635% 79.9129% -0.1507%
================================================
Files 473 473
Lines 116440 115801 -639
================================================
- Hits 93226 92540 -686
- Misses 15924 15973 +49
+ Partials 7290 7288 -2 |
Signed-off-by: Ryan Leung <rleungx@gmail.com>
9c70ce2
to
76a534d
Compare
There are also several scenarios to consider, it is recommended to add to the workflow:
In addition, it is best to describe how the workdflow is in the normal machines and kubernetes environment. |
- New cluster. Both TiDB and TiKV use the default configuration and send it to | ||
PD to complete the registration. The registration needs to establish the mapping | ||
relationship between the component ID, version and local configuration. For | ||
customized requirements, such as modifying the size of block cache. It needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does the customization happen? I think it should be happened before TiKV and TiDB to register themselves. Because some configurations won't take effect after restart.
Global global = 2; | ||
} | ||
string name = 3; | ||
string value = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All values are string?
files of those components can be removed and don't need to learn how those | ||
tools works. It reduces administrative costs. For configuration options that | ||
cannot be modified dynamically, we still can change it using this unified way, | ||
but we need to wait for the next restart after modification to take effect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a mechanism to tell user if they need to restart the cluster to make the modification to take effect.
The functions of each interface are as follows: | ||
|
||
*Create* is used to register a configuration to PD when the components start. | ||
*Get* is used to get the complete configuration of the component periodically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a Watch
API be helpful? Client could watch the configuration version and only Get
on version change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can directly use the version to decide if we need to return the configuration?
|
||
*Create* is used to register a configuration to PD when the components start. | ||
*Get* is used to get the complete configuration of the component periodically | ||
from PD and decide if the component configuration need to update. *Update* is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does Update
perform concurrency control?
These two types are used to distinguish whether the configuration is shared by | ||
components. For example, the label configuration of TiKV is individual for each | ||
TiKV instance. So the type should be local. Each instance here is uniquely identified | ||
by the *component_id*, which can be obtained by hashing *IP: port*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know little about the behavior convention of components, but the IP
of a logical instance would change in many scenarios, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the registered address, in k8s the instance IP may change when upgraded but we can register a persistent address for each component.
registers or queries the component ID. By comparing the version carried by the | ||
request with the version stored in PD, it determines whether to return the | ||
configuration of the component in the response. After receiving the reply, TiDB | ||
or TiKV decides whether to update the configuration or not after comparing with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose PD lost the configurations in a disaster, can operator make configuration changes after recovery? Seems like that the changes will be ignored by component because the version in PD is less than version in component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the PD recover tool will recover the cluster-id and also pick a big enough alloc-id which needs bigger than the current allocated id. So I guess when storing configuration in PD, the corresponding config version needs to be handled the same way.
or TiKV decides whether to update the configuration or not after comparing with | ||
the version stored in the component. | ||
|
||
- Delete the node. PD can directly delete the corresponding component ID, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the API of deleting a node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are two ways to delete a node. One is adding a delete API. Another one is using the TTL mechanism.
|
||
- Add a new component or restart the component. The initialization of | ||
configuration calls the *Create* method. After receiving the request, PD first | ||
registers or queries the component ID. By comparing the version carried by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does a new component or a restarted component determine which version to send in the request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the version of each component also needs to persist itself.
Signed-off-by: Ryan Leung <rleungx@gmail.com>
1 similar comment
/label test |
These labels are not found test. |
No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md |
/label test,wip |
These labels are not found test,wip. |
2 similar comments
I'm going to close this PR since it's hasn't been updated for a long time, feel free to reopen it if you are planning to continue this PR in the future. Thanks for your contribution. |
Summary
This proposal proposes a unified way to manage the configuration options of TiDB, TiKV, and PD by storing them in PD and support dynamically change the configuration options by using the same way among the different components which can greatly improve usability.
Motivation
Here are some reasons why we need to do it:
pd-ctl
,tikv-ctl
, andSQL
, resulting in poor usability. For better usability, provide a unified way to modify them dynamically.