[BUG] Test org.opensearch.indices.replication.SegmentReplicationIT.testReplicaHasDiffFilesThanPrimary is flaky #6885

Closed
kotwanikunal opened this issue Mar 29, 2023 · 3 comments · Fixed by #6979 or #7513
Labels: bug (Something isn't working), distributed framework, flaky-test (Random test failure that succeeds on second run)

Comments

@kotwanikunal

Describe the bug

To Reproduce
Steps to reproduce the behavior:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationIT.testReplicaHasDiffFilesThanPrimary" -Dtests.seed=40450506CE0A9D4A -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-PE -Dtests.timezone=Antarctica/Vostok -Druntime.java=19

Host/Environment (please complete the following information):

  • Jenkins Build Server


@kotwanikunal added the bug (Something isn't working) and untriaged labels Mar 29, 2023
@kotwanikunal added the flaky-test (Random test failure that succeeds on second run) and untriaged labels and removed the untriaged label Mar 29, 2023
@dreamer-89

Failing seed

./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationIT.testReplicaHasDiffFilesThanPrimary" -Dtests.seed=40450506CE0A9D4A -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-PE -Dtests.timezone=Antarctica/Vostok -Druntime.java=19

org.opensearch.indices.replication.SegmentReplicationIT > testReplicaHasDiffFilesThanPrimary FAILED
    java.lang.AssertionError: Count is 8 hits but 10 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:
        at __randomizedtesting.SeedInfo.seed([40450506CE0A9D4A:B7A9A677146A423D]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:303)
        at org.opensearch.indices.replication.SegmentReplicationBaseIT.assertDocCounts(SegmentReplicationBaseIT.java:100)
        at org.opensearch.indices.replication.SegmentReplicationIT.lambda$testReplicaHasDiffFilesThanPrimary$6(SegmentReplicationIT.java:747)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1060)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1033)
        at org.opensearch.indices.replication.SegmentReplicationIT.testReplicaHasDiffFilesThanPrimary(SegmentReplicationIT.java:747)
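
The trace above shows the doc-count check running inside a retry helper (assertBusy) and still seeing 8 of 10 hits when the wait ran out. As a rough illustration of that polling pattern, here is a minimal, self-contained sketch. The class BusyAssertSketch and both helpers are hypothetical stand-ins written for this example; they are not the OpenSearch test framework's implementation, and the backoff values are assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BusyAssertSketch {

    /**
     * Hypothetical stand-in for an assertBusy-style helper: re-run the check
     * until it stops throwing AssertionError or the time budget is used up.
     */
    public static void assertBusy(Runnable check, long maxWait, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the check passed
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // out of time: surface the last failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 50); // assumed backoff cap
        }
    }

    /** Hypothetical hit-count check with a message shaped like the one in the trace. */
    public static void assertHitCount(long actual, long expected) {
        if (actual != expected) {
            throw new AssertionError("Count is " + actual + " hits but " + expected + " was expected.");
        }
    }

    public static void main(String[] args) {
        // Simulated replica: reports 8 docs on the first two polls, then 10.
        AtomicInteger polls = new AtomicInteger();
        assertBusy(() -> assertHitCount(polls.incrementAndGet() >= 3 ? 10 : 8, 10), 5, TimeUnit.SECONDS);
        System.out.println("replica caught up after " + polls.get() + " polls");
    }
}
```

A helper like this only hides the lag if the replica catches up within the time budget. If it never does, the last assertion error is rethrown, which is the shape of the failure reported above.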

@saratvemulapalli

Reopening this: the test is still flaky and is failing again. See: #7302 (comment)

@mch2

mch2 commented May 10, 2023

The linked PR fixes part of this, namely the failure that occurs when replicas are not yet in sync, but I am still seeing failures with seed -Dtests.seed=788A6B8A3DB5D6B4.
It now fails with:

May 11, 2023 7:02:39 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[#79,opensearch[node_t2][refresh][T#1],5,TGRP-SegmentReplicationIT]
java.lang.AssertionError: afterRefresh called but not beforeRefresh
	at __randomizedtesting.SeedInfo.seed([788A6B8A3DB5D6B4]:0)
	at org.opensearch.index.shard.IndexShard$RefreshMetricUpdater.afterRefresh(IndexShard.java:4315)
	at org.apache.lucene.search.ReferenceManager.notifyRefreshListenersRefreshed(ReferenceManager.java:275)
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:182)
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)
	at org.opensearch.index.engine.NRTReplicationEngine.refresh(NRTReplicationEngine.java:336)
	at org.opensearch.index.shard.IndexShard.refresh(IndexShard.java:1255)
	at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.lambda$shardOperationOnReplica$1(TransportShardRefreshAction.java:110)
	at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342)
	at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnReplica(TransportShardRefreshAction.java:109)
	at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnReplica(TransportShardRefreshAction.java:57)
	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:758)
	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.onResponse(TransportReplicationAction.java:729)
	at org.opensearch.index.shard.IndexShard.lambda$innerAcquireReplicaOperationPermit$41(IndexShard.java:3873)
	at org.opensearch.action.ActionListener$3.onResponse(ActionListener.java:130)
	at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:309)
	at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:254)
	at org.opensearch.index.shard.IndexShard.lambda$acquireReplicaOperationPermit$39(IndexShard.java:3801)
	at org.opensearch.index.shard.IndexShard.innerAcquireReplicaOperationPermit(IndexShard.java:3921)
	at org.opensearch.index.shard.IndexShard.acquireReplicaOperationPermit(IndexShard.java:3795)
	at org.opensearch.action.support.replication.TransportReplicationAction.acquireReplicaOperationPermit(TransportReplicationAction.java:1204)
	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:849)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at org.opensearch.action.support.replication.TransportReplicationAction.handleReplicaRequest(TransportReplicationAction.java:697)
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106)
	at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
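
The assertion in this trace guards an invariant of refresh listeners: every afterRefresh notification is expected to follow a matching beforeRefresh. The sketch below illustrates that invariant and one hypothetical interleaving that breaks it, a listener registered while a refresh is already in flight. It is not a diagnosis of this failure. RefreshListenerSketch, TimingListener, and RefreshSource are invented for the example; only the two callback names mirror Lucene's refresh listener interface, and no Lucene or OpenSearch code is used.

```java
import java.util.ArrayList;
import java.util.List;

class RefreshListenerSketch {

    /** Simplified stand-in for a refresh listener with paired callbacks. */
    interface RefreshListener {
        void beforeRefresh();
        void afterRefresh(boolean didRefresh);
    }

    /** Hypothetical metric listener that times each refresh. */
    static final class TimingListener implements RefreshListener {
        private Long startNanos; // null means no refresh is in flight
        private int completed;

        @Override
        public void beforeRefresh() {
            if (startNanos != null) {
                throw new AssertionError("beforeRefresh called twice");
            }
            startNanos = System.nanoTime();
        }

        @Override
        public void afterRefresh(boolean didRefresh) {
            if (startNanos == null) {
                throw new AssertionError("afterRefresh called but not beforeRefresh");
            }
            startNanos = null;
            completed++;
        }

        int completedRefreshes() {
            return completed;
        }
    }

    /** Hypothetical refresh source that notifies listeners around some work. */
    static final class RefreshSource {
        private final List<RefreshListener> listeners = new ArrayList<>();

        void addListener(RefreshListener listener) {
            listeners.add(listener);
        }

        void refresh(Runnable work) {
            for (RefreshListener l : listeners) {
                l.beforeRefresh();
            }
            work.run(); // anything registered here missed beforeRefresh
            for (RefreshListener l : listeners) {
                l.afterRefresh(true);
            }
        }
    }

    public static void main(String[] args) {
        RefreshSource source = new RefreshSource();
        TimingListener early = new TimingListener();
        source.addListener(early);
        source.refresh(() -> {});
        System.out.println("completed refreshes: " + early.completedRefreshes());

        // Assumed interleaving: a second listener is added mid-refresh.
        TimingListener late = new TimingListener();
        try {
            source.refresh(() -> source.addListener(late));
        } catch (AssertionError e) {
            System.out.println("late listener tripped: " + e.getMessage());
        }
    }
}
```

In this sketch the late listener only ever sees the second half of the pair, so its state check fails. Whether the real failure follows this path or another one, such as concurrent refreshes sharing one listener, cannot be determined from the trace alone.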

@mch2 self-assigned this May 11, 2023
github-project-automation bot moved this from In Progress to Done in Segment Replication May 17, 2023