Move metadata #280
Conversation
Codecov Report
@@             Coverage Diff              @@
##               main     #280      +/-   ##
============================================
- Coverage     77.39%   77.07%    -0.33%
- Complexity     1509     1550       +41
============================================
  Files           196      198        +2
  Lines          7831     8217      +386
  Branches       1246     1317       +71
============================================
+ Hits           6061     6333      +272
- Misses         1095     1172       +77
- Partials        675      712       +37

Continue to review full report at Codecov.
add routing to get request
@@ -48,11 +48,12 @@ data class StepMetaData(
    }

    override fun toXContent(builder: XContentBuilder, params: ToXContent.Params): XContentBuilder {
        return builder.startObject(STEP)
Previously we didn't output step metadata as JSON. The change here is to be consistent with the ActionMetadata code (link).
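For readers skimming the thread, here is a minimal sketch of the kind of toXContent output being discussed, written against the Elasticsearch XContent API; the simplified data class and the field names (name, start_time) are assumptions for illustration, not the PR's actual code.

```kotlin
import org.elasticsearch.common.xcontent.ToXContent
import org.elasticsearch.common.xcontent.ToXContentFragment
import org.elasticsearch.common.xcontent.XContentBuilder

// Simplified, hypothetical StepMetaData: fields and constant names are assumptions.
data class StepMetaData(
    val name: String,
    val startTime: Long
) : ToXContentFragment {

    override fun toXContent(builder: XContentBuilder, params: ToXContent.Params): XContentBuilder {
        // Write the step metadata under its own "step" object so it shows up in the
        // explain output / config index document the same way action metadata does.
        builder.startObject(STEP)
            .field(NAME, name)
            .field(START_TIME, startTime)
        return builder.endObject()
    }

    companion object {
        const val STEP = "step"
        const val NAME = "name"
        const val START_TIME = "start_time"
    }
}
```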
Moving this comment to this PR from the other one. High-level comment - I'm worried about perf issues from moving metadata into the config index. Job Scheduler has a postIndex subscriber on the config index which is executed every time a doc is added/updated. Imagine someone with 10k indices running every 1 minute: that's 20k updates per minute (each ISM execution does 2 metadata updates, for starting and closing a transaction), all triggering that callback, which then has to check whether the doc is a job type to schedule (which it isn't). Alternatively we could consider another ISM index purely for metadata, which wouldn't have this listener attached... we could put both rollup and ISM metadata in there instead of the config index. Some testing would be good to see if this is a real concern.
val bulkRequest = BulkRequest().add(deleteRequests)
val bulkResponse: BulkResponse = client.suspendUntil { bulk(bulkRequest, it) }
bulkResponse.forEach {
    if (it.isFailed) logger.error("Failed to clear ManagedIndexMetadata for [index=${it.index}]. " +
Are there any particular failure reasons in which we want to collect the failures and throw an exception with a retryCause to trigger the retryPolicy on them here?
Something similar to this
I think since this is just removing metadata, we may not need to do that. @dbbaughe, what's your opinion?
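For context, a rough sketch of the pattern being referenced - collect the failed bulk items and surface the retryable ones as an exception that an enclosing retry policy can react to. Treating TOO_MANY_REQUESTS as the only retryable status is an assumption for illustration; the code in this PR just logs the failures.

```kotlin
import org.elasticsearch.ElasticsearchException
import org.elasticsearch.action.bulk.BulkResponse
import org.elasticsearch.rest.RestStatus

// Sketch: gather retryable bulk failures and throw once so a caller-side retry
// policy (e.g. exponential backoff) can pick them up.
fun throwIfRetryableFailures(bulkResponse: BulkResponse) {
    val retryable = bulkResponse.items.filter { it.isFailed && it.status() == RestStatus.TOO_MANY_REQUESTS }
    if (retryable.isNotEmpty()) {
        val reasons = retryable.joinToString(", ") { it.failureMessage }
        throw ElasticsearchException("Retryable failures while clearing ManagedIndexMetadata: $reasons")
    }
}
```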
val request = NodesInfoRequest().clear().addMetric("plugins")
client.execute(NodesInfoAction.INSTANCE, request, object : ActionListener<NodesInfoResponse> {
    override fun onResponse(response: NodesInfoResponse) {
        flag = false
Probably should not reset the flag at the start. What happens when:
1. The cluster has multiple versions when a new node is added, this logic is executed, and it sets the flag to true.
2. As the job is running, another node is added, which sets the flag back to false while the check is in progress, and the running job goes through.
Small race condition, but it could happen.
That's some good thinking! I moved it to:
if (versionSet.size > 1) {
flag = true
} else flag = false
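A sketch of the race-free shape this check can take, with the flag assigned exactly once after the node versions have been collected (equivalent to the if/else above, written as a single assignment). The class name and the extractPluginVersions helper are hypothetical placeholders for the real plugin-info parsing.

```kotlin
import org.elasticsearch.action.ActionListener
import org.elasticsearch.action.admin.cluster.node.info.NodesInfoAction
import org.elasticsearch.action.admin.cluster.node.info.NodesInfoRequest
import org.elasticsearch.action.admin.cluster.node.info.NodesInfoResponse
import org.elasticsearch.client.Client

// Sketch only: the flag is never reset to false up front, so a concurrently
// running job cannot observe a transient false while the sweep is in progress.
class SkipExecutionSketch(private val client: Client) {

    @Volatile var flag: Boolean = false
        private set

    fun sweepISMPluginVersion() {
        val request = NodesInfoRequest().clear().addMetric("plugins")
        client.execute(NodesInfoAction.INSTANCE, request, object : ActionListener<NodesInfoResponse> {
            override fun onResponse(response: NodesInfoResponse) {
                val versionSet = extractPluginVersions(response) // hypothetical helper
                // Single assignment once all versions are known.
                flag = versionSet.size > 1
            }

            override fun onFailure(e: Exception) {
                // Leave the flag unchanged if we cannot tell (assumption for the sketch).
            }
        })
    }

    private fun extractPluginVersions(response: NodesInfoResponse): Set<String> {
        // Placeholder: the real code collects the index management plugin version
        // reported by each node in the cluster.
        return emptySet()
    }
}
```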
Can you post examples of any API response additions/changes and/or a history document example too?
PolicyRetryInfoMetaData.RETRY_INFO -> retryInfo = PolicyRetryInfoMetaData.parse(xcp)
TRANSITION_TO -> transitionTo = if (xcp.currentToken() == Token.VALUE_NULL) null else xcp.text()
StateMetaData.STATE -> {
    // check null for invalid policy situation
Can you explain more about this situation?
Not sure what this comment means, but since we now save state metadata even when it's null, we should have corresponding parsing to deal with null state metadata.
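A sketch of what that null-tolerant branch could look like, mirroring the TRANSITION_TO handling in the snippet above; the variable name is illustrative and the real parse code may differ.

```kotlin
StateMetaData.STATE -> {
    // An explicit null is saved for the invalid-policy case, so keep it as null
    // instead of failing the parse.
    state = if (xcp.currentToken() == Token.VALUE_NULL) null else StateMetaData.parse(xcp)
}
```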
private val listOfMetadata: MutableList<ManagedIndexMetaData> = mutableListOf()
private val listOfIndexToMetadata: MutableList<Pair<Index, ManagedIndexMetaData>> = mutableListOf()
private val mapOfItemIdToIndex: MutableMap<Int, Index> = mutableMapOf()
private lateinit var clusterState: ClusterState
Generic comment for the whole file:
What happens when you have ManagedIndexMetadata from the cluster state and ManagedIndexMetadata from the config index (i.e. the migration started but failed on the delete part)?
Also, what happens when only the cluster state has the ManagedIndexMetadata (i.e. the migration hasn't happened yet)?
Now we use MetadataService to move metadata, so ISM only relies on metadata saved in / moved to the config index. If metadata is still in cluster state, ISM will wait until it gets moved to the config index.
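A sketch of the gating behaviour described in that reply - the runner only trusts config-index metadata and waits while metadata still lives only in cluster state. Both helpers below are hypothetical stand-ins for the real lookups, not the PR's actual code.

```kotlin
// Sketch only: placeholder helpers for the config-index get and cluster-state lookup.
class MetadataGateSketch {

    fun metadataReadyToRun(indexUuid: String): Boolean {
        // ISM only trusts metadata that has been written to / moved into the config index.
        if (configIndexHasMetadata(indexUuid)) return true
        // Metadata still only in cluster state: skip this run and let
        // MetadataService move it over first.
        // If it exists in neither place, the index is treated as freshly managed.
        return !clusterStateHasMetadata(indexUuid)
    }

    private fun configIndexHasMetadata(indexUuid: String): Boolean = TODO("placeholder lookup")
    private fun clusterStateHasMetadata(indexUuid: String): Boolean = TODO("placeholder lookup")
}
```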
updateManagedIndexConfigStartTime(managedIndexConfig)

// check no job has been run
wait { assertEquals(null, getExistingManagedIndexConfig(indexName).policy) }
So in the best case (the successful path) this just runs the assert block every 100ms for 10 seconds to verify nothing ran? Why not just sleep the thread for 10 seconds and assert it never ran, instead of adding a new function?
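For comparison, the simpler form being suggested might look like the following; it reuses the getExistingManagedIndexConfig helper already called above, and the 10-second window is the one discussed in the comment.

```kotlin
// Sketch of the suggested alternative: give the scheduler a fixed window, then
// assert the job never ran, instead of polling the assertion every 100ms.
Thread.sleep(10_000L)
assertEquals(
    "Job should not have run while execution is skipped",
    null,
    getExistingManagedIndexConfig(indexName).policy
)
```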
    finishFlag = false; runningLock = false
    return
}
if (counter++ > 2) {
It looks like this if check is looking for this code path being hit a certain number of times. If I'm not mistaken, since counter++ is a postfix increment, it returns the original value and then increments, so if we hit if (clusterStateMetadata.isEmpty()) 4 times in a row (since the else sets counter back to 0), we enter this if statement. Is there a reason that was the condition chosen for this?
I wanted to choose 3. No specific reason for this; I just want a way to cancel the scheduled job at some point. Please let me know if you have any suggestions.
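A sketch that makes the cut-off explicit: pre-increment and compare against a named constant, so the branch fires on the 3rd consecutive empty sweep (the postfix version above fires on the 4th). The cancel call and sweep callback are illustrative placeholders, not the PR's code.

```kotlin
// Consecutive empty sweeps tolerated before giving up on the migration job
// (illustrative value, matching the intent of "choose 3").
const val MAX_EMPTY_SWEEPS = 3

var emptySweepCounter = 0

fun onSweep(clusterStateMetadata: Map<String, Any>) {
    if (clusterStateMetadata.isEmpty()) {
        // ++counter >= MAX_EMPTY_SWEEPS fires on the 3rd consecutive empty sweep,
        // whereas counter++ > 2 only fires on the 4th.
        if (++emptySweepCounter >= MAX_EMPTY_SWEEPS) {
            cancelMoveMetadataJob() // hypothetical: stop rescheduling the migration job
        }
    } else {
        emptySweepCounter = 0
        // keep moving the remaining metadata ...
    }
}

fun cancelMoveMetadataJob() {
    // Placeholder for cancelling the scheduled move-metadata task.
}
```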
@@ -138,6 +139,9 @@ class ManagedIndexCoordinator(
    indexStateManagementEnabled = it
    if (!indexStateManagementEnabled) disable() else enable()
}
clusterService.clusterSettings.addSettingsUpdateConsumer(METADATA_SERVICE_ENABLED) {
Seems like it keeps scheduling in the other functions even if disabled?
Yeah, I added a metadataServiceEnabled variable to save this setting and use it to return early if it's false.
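A sketch of the shape that fix could take: keep the dynamic setting in a volatile field and return early from the scheduling path when it is off. The class name, setting key, and method names below are illustrative, not the PR's actual code.

```kotlin
import org.elasticsearch.cluster.service.ClusterService
import org.elasticsearch.common.settings.Setting

class MetadataServiceToggleSketch(clusterService: ClusterService) {

    // Mirrors the dynamic cluster setting; updated by the consumer registered below.
    @Volatile private var metadataServiceEnabled = true

    init {
        clusterService.clusterSettings.addSettingsUpdateConsumer(METADATA_SERVICE_ENABLED) {
            metadataServiceEnabled = it
        }
    }

    fun maybeScheduleMoveMetadata() {
        // Early return so a disabled cluster never (re)schedules the migration job.
        if (!metadataServiceEnabled) return
        // ... schedule / reschedule the move-metadata job here ...
    }

    companion object {
        // Illustrative setting definition; the real setting lives in the plugin's settings class.
        val METADATA_SERVICE_ENABLED: Setting<Boolean> = Setting.boolSetting(
            "opendistro.index_state_management.metadata_service.enabled",
            true,
            Setting.Property.NodeScope,
            Setting.Property.Dynamic
        )
    }
}
```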
Issue #, if available:
#207
Description of changes:
This is to make sure metadata in the cluster state will never be newer than metadata in the config index, so that we can safely move metadata from cluster state to the config index when the node is running this version of ISM.
Tests:
- MetadataRegressionIT (SkipExecution flag):
  ./gradlew mixedCluster --tests "com.amazon.opendistroforelasticsearch.indexmanagement.indexstatemanagement.MetadataRegressionIT.test new node skip execution when old node exist in cluster" -PnumNodes=2 -DmixCluster=true
- SkipExecutionTests (SkipExecution flag)

Example Response:
- metadata saved in config index
- Explain API output
- history index content
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.