
[Bug] RssShuffleDataIterator.cleanup may be called multiple times #1818

Closed
3 tasks done
wForget opened this issue Jun 21, 2024 · 0 comments · Fixed by #1819
Labels: flaky test (a flaky test)

Comments

@wForget (Member) commented on Jun 21, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

I got an `IllegalReferenceCountException` when running unit tests locally, which was caused by `RssShuffleDataIterator.cleanup` being called multiple times.

```
[2024-06-21 20:37:15.230] [Executor task launch worker for task 3.0 in stage 1.0 (TID 13)] [ERROR] TaskContextImpl.logError - Error in TaskCompletionListener
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
	at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:83)
	at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:148)
	at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
	at org.apache.uniffle.common.netty.buffer.NettyManagedBuffer.release(NettyManagedBuffer.java:59)
	at org.apache.uniffle.common.ShuffleDataResult.release(ShuffleDataResult.java:112)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.close(ShuffleReadClientImpl.java:332)
	at org.apache.spark.shuffle.reader.RssShuffleDataIterator.cleanup(RssShuffleDataIterator.java:219)
	at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.lambda$new$0(RssShuffleReader.java:289)
	at scala.Function0.apply$mcV$sp(Function0.scala:39)
	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
	at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.lambda$new$1(RssShuffleReader.java:313)
	at org.apache.spark.TaskContextImpl.$anonfun$markTaskCompleted$1(TaskContextImpl.scala:124)
	at org.apache.spark.TaskContextImpl.$anonfun$markTaskCompleted$1$adapted(TaskContextImpl.scala:124)
	at org.apache.spark.TaskContextImpl.$anonfun$invokeListeners$1(TaskContextImpl.scala:137)
	at org.apache.spark.TaskContextImpl.$anonfun$invokeListeners$1$adapted(TaskContextImpl.scala:135)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:135)
	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:124)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
```
```
[2024-06-21 20:37:15.234] [Executor task launch worker for task 3.0 in stage 1.0 (TID 13)] [ERROR] Executor.logError - Exception in task 3.0 in stage 1.0 (TID 13)
org.apache.spark.util.TaskCompletionListenerException: refCnt: 0, decrement: 1

Previous exception in task: null
	java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	java.base/java.lang.reflect.Method.invoke(Method.java:566)
	io.netty.util.internal.CleanerJava9.freeDirectBuffer(CleanerJava9.java:88)
	io.netty.util.internal.PlatformDependent.freeDirectBuffer(PlatformDependent.java:521)
	org.apache.uniffle.common.util.RssUtils.releaseByteBuffer(RssUtils.java:422)
	org.apache.uniffle.client.impl.ShuffleReadClientImpl.close(ShuffleReadClientImpl.java:335)
	org.apache.spark.shuffle.reader.RssShuffleDataIterator.cleanup(RssShuffleDataIterator.java:219)
	org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.lambda$new$0(RssShuffleReader.java:289)
	scala.Function0.apply$mcV$sp(Function0.scala:39)
	org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
	org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
	org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:324)
	org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155)
	org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
	org.apache.spark.shuffle.reader.RssShuffleReader.read(RssShuffleReader.java:180)
	org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	org.apache.spark.scheduler.Task.run(Task.scala:131)
	org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	java.base/java.lang.Thread.run(Thread.java:834)
	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:145)
	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:124)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
```
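The `refCnt: 0, decrement: 1` message comes from Netty's reference counting: a buffer whose count has already reached zero is released a second time. The stand-in below simulates that behavior with a plain `AtomicInteger` (it is an illustration of the failure mode, not Netty's or Uniffle's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleReleaseDemo {
    // Hypothetical stand-in for a reference-counted buffer: it starts with
    // refCnt = 1; release() decrements it, and releasing past zero throws,
    // mirroring the "refCnt: 0, decrement: 1" error in the trace above.
    static class RefCountedBuffer {
        private final AtomicInteger refCnt = new AtomicInteger(1);

        boolean release() {
            int current = refCnt.getAndDecrement();
            if (current <= 0) {
                refCnt.incrementAndGet(); // roll back the over-decrement before failing
                throw new IllegalStateException("refCnt: " + current + ", decrement: 1");
            }
            return current == 1; // true when the buffer is fully released
        }
    }

    public static void main(String[] args) {
        RefCountedBuffer buf = new RefCountedBuffer();
        if (!buf.release()) throw new AssertionError("first release should deallocate");
        boolean threw = false;
        try {
            buf.release(); // second release, as when cleanup() runs twice
        } catch (IllegalStateException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("second release must fail");
        System.out.println("double release rejected as expected");
    }
}
```

This is why both a `CompletionIterator` callback and a `TaskCompletionListener` invoking the same `cleanup` path leads to the exception: the second caller decrements a count that is already zero.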

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@rickyma added the `flaky test` label on Jun 21, 2024
zuston pushed a commit that referenced this issue Jun 25, 2024
…tiple times (#1819)

### What changes were proposed in this pull request?

Avoid calling `RssShuffleDataIterator.cleanup` multiple times.

### Why are the changes needed?

Fix: #1818

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
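The fix direction described above — making `cleanup` safe to call more than once — can be sketched with a compare-and-set guard. All names here are illustrative, not Uniffle's actual implementation:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class IdempotentCleanupDemo {
    // Illustrative iterator whose cleanup() may be invoked from more than one
    // place (e.g. a completion callback and a task-completion listener), so it
    // must release its resources exactly once. compareAndSet guarantees only
    // the first caller performs the release, even under concurrent calls.
    static class ShuffleIterator {
        private final AtomicBoolean cleaned = new AtomicBoolean(false);
        int releaseCalls = 0; // counts how often the real release would run

        void cleanup() {
            if (cleaned.compareAndSet(false, true)) {
                releaseCalls++; // stands in for closing the read client / buffers
            }
        }
    }

    public static void main(String[] args) {
        ShuffleIterator it = new ShuffleIterator();
        it.cleanup(); // e.g. from the completion callback
        it.cleanup(); // e.g. from the task-completion listener
        if (it.releaseCalls != 1) {
            throw new AssertionError("cleanup released " + it.releaseCalls + " times");
        }
        System.out.println("cleanup released resources exactly once");
    }
}
```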
zhengchenyu pushed a commit that referenced this issue Aug 2, 2024
…tiple times (#1819)
