[VL] Task killed by Yarn #6947

Open
FelixYBW opened this issue Aug 20, 2024 · 13 comments
Labels
bug, tracker, triage

Comments

@FelixYBW
Contributor

Backend

VL (Velox)

Bug description

One common issue with Gluten is that tasks get killed by Yarn. Currently some memory allocations in Gluten, such as those made by std::vector, aren't tracked by Spark's memory management; they are counted into the executor overhead memory instead. Some of these allocations can be large, so we should create a proxy allocator for such memory allocation.
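
A rough sketch of what such a proxy allocator could look like (TrackedAllocator and reportToSpark below are illustrative names, not existing Gluten APIs): an STL-compatible allocator that reports every allocation before forwarding to malloc, so containers like std::vector no longer bypass the accounting.

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical hook that would forward the byte delta to Spark's memory
// manager (e.g. through JNI); here it only discards the value.
void reportToSpark(long bytes) { (void)bytes; }

// STL-compatible proxy allocator: every allocation made through it is
// reported, then forwarded to malloc/free.
template <typename T>
struct TrackedAllocator {
  using value_type = T;
  TrackedAllocator() = default;
  template <typename U>
  TrackedAllocator(const TrackedAllocator<U>&) {}

  T* allocate(std::size_t n) {
    reportToSpark(static_cast<long>(n * sizeof(T)));
    if (auto* p = static_cast<T*>(std::malloc(n * sizeof(T)))) {
      return p;
    }
    throw std::bad_alloc();
  }

  void deallocate(T* p, std::size_t n) {
    reportToSpark(-static_cast<long>(n * sizeof(T)));
    std::free(p);
  }
};

template <typename T, typename U>
bool operator==(const TrackedAllocator<T>&, const TrackedAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const TrackedAllocator<T>&, const TrackedAllocator<U>&) { return false; }

// Usage: the vector's backing buffer is now visible to the tracker.
std::vector<int, TrackedAllocator<int>> trackedVec(1024);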

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

FelixYBW added the bug and triage labels on Aug 20, 2024
zhztheplayer changed the title from "[VL] task killed by Yarn" to "[VL] Task killed by Yarn" on Aug 21, 2024
@Yohahaha
Contributor

We may need to provide the ability to proxy malloc/free in Gluten and inject this proxy into Velox's MemoryAllocator.

@FelixYBW
Contributor Author

Related to #6960. The global memory allocation isn't tracked either, and it has no limit now. It's most likely the root cause.

Thanks @zhztheplayer and @marin-ma

@FelixYBW
Contributor Author

PR6988 added a limit on global memory allocation: 0.75 × the overhead memory. But Velox has no control over global memory allocation, so with PR6988 the workaround is to set a large overhead memory; otherwise you will see the "Exceeded memory allocator limit" exception, though the task is no longer killed by Yarn.
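
For example (the value is illustrative only; tune it per workload):

--conf spark.executor.memoryOverhead=6g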

@FelixYBW
Contributor Author

FelixYBW commented Sep 20, 2024

FelixYBW@fbfe0f3

Here is a simple way to detect this: it outputs any memory allocation larger than 1 MB that doesn't come from a memory pool. Disable jemalloc, set executor cores to 1, and you will get output like:

allocated 67108864
libvelox.so(+0x15b4ff5) [0x7f3301cf2ff5]
libvelox.so(+0x15b51eb) [0x7f3301cf31eb]
libgluten.so(_Znwm+0x19) [0x7f3307be9ea9]
libvelox.so(_ZNSt6vectorISt10shared_ptrIN8facebook5velox4exec15WindowPartitionEESaIS5_EE17_M_realloc_insertIJS5_EEEvN9__gnu_cxx17__normal_iteratorIPS5_S7_EEDpOT_+0x80) [0x7f3305130180]
libvelox.so(_ZN8facebook5velox4exec24RowsStreamingWindowBuild18addPartitionInputsEb+0x1cf) [0x7f330512f18f]
libvelox.so(_ZN8facebook5velox4exec24RowsStreamingWindowBuild8addInputESt10shared_ptrINS0_9RowVectorEE+0x419) [0x7f330512f799]
libvelox.so(_ZN8facebook5velox4exec6Window8addInputESt10shared_ptrINS0_9RowVectorEE+0x53) [0x7f3304bda5b3]
libvelox.so(_ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE+0xfb9) [0x7f3304b5e0f9]
libvelox.so(_ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE+0xbb) [0x7f3304b5f63b]
libvelox.so(_ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE+0x384) [0x7f3303124514]
libvelox.so(_ZN6gluten24WholeStageResultIterator4nextEv+0x78) [0x7f3301ce78f8]
libgluten.so(Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext+0x139) [0x7f33071c99c9]

In this example, the root cause is

windowPartitions_.push_back(std::make_shared<WindowPartition>(
       pool_, data_.get(), inversedInputChannels_, sortKeyInfo_));

The solution is to use memory::StlAllocator<char*> with std::vector and std::allocate_shared.
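
Roughly, the fix could look like this (sketch only; the exact StlAllocator constructor may differ, check the Velox API):

// Sketch: make both the vector's buffer and the shared_ptr control block
// come from the operator's memory pool via memory::StlAllocator.
using PartitionPtr = std::shared_ptr<WindowPartition>;
std::vector<PartitionPtr, memory::StlAllocator<PartitionPtr>> windowPartitions_{
    memory::StlAllocator<PartitionPtr>(*pool_)};

windowPartitions_.push_back(std::allocate_shared<WindowPartition>(
    memory::StlAllocator<WindowPartition>(*pool_),
    pool_, data_.get(), inversedInputChannels_, sortKeyInfo_));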

@FelixYBW
Contributor Author

The other places:
https://github.com/FelixYBW/velox/blob/97f6d25518f3fea264e8382ac66a8d2479ff98f0/velox/exec/SortBuffer.cpp#L277

  auto spillRows = std::vector<char*>(
      sortedRows_.begin() + numOutputRows_, sortedRows_.end());

https://github.com/FelixYBW/velox/blob/97f6d25518f3fea264e8382ac66a8d2479ff98f0/velox/exec/SortBuffer.cpp#L114

sortedRows_.resize(numInputRows_);
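
A possible fix for the spillRows copy, in the same spirit (sketch only):

// Sketch: give the temporary vector a pool-backed allocator so the copied
// row pointers are charged to the query memory pool instead of global malloc.
std::vector<char*, memory::StlAllocator<char*>> spillRows(
    sortedRows_.begin() + numOutputRows_,
    sortedRows_.end(),
    memory::StlAllocator<char*>(*pool_));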

@FelixYBW
Contributor Author

and here:
https://github.com/FelixYBW/velox/blob/a322893d54e87acbe6a0f84c88d9dbbbd5441a5f/velox/exec/SpillFile.cpp#L145

  IOBufOutputStream out(
      *pool_, nullptr, std::max<int64_t>(64 * 1024, batch_->size()));

It's a temporary buffer and is released after the flush.

@FelixYBW
Contributor Author

Another known BKM: the malloc implementation also matters. To boost performance, jemalloc holds on to some memory, which is counted into the overhead memory by Yarn as well.

@FelixYBW
Contributor Author

FelixYBW commented Sep 25, 2024

https://github.com/FelixYBW/velox/blob/97f6d25518f3fea264e8382ac66a8d2479ff98f0/velox/exec/Spiller.cpp#L419

Looks like timsort needs multiple memory allocations. In this case each allocation is ~3 MB:

allocate 3964904 0x55d7c6b55880
libvelox.so(+0x15b6105) [0x7f8acf3e7105]
libvelox.so(+0x15b643b) [0x7f8acf3e743b]
libgluten.so(_Znwm+0x19) [0x7f8ad52e0ea9]
libvelox.so(_ZNSt6vectorIPcSaIS0_EE13_M_assign_auxISt13move_iteratorIN9__gnu_cxx17__normal_iteratorIPS0_S_IS0_N8facebook5velox6memory12StlAllocatorIS0_EEEEEEEEvT_SG_St20forward_iterator_tag+0x5e) [0x7f8ad07fed9e]
libvelox.so(+0x29c6ac7) [0x7f8ad07f7ac7]
libvelox.so(_ZN8facebook5velox4exec7Spiller12ensureSortedERNS2_8SpillRunE+0x2685) [0x7f8ad07faed5]
libvelox.so(_ZN8facebook5velox4exec7Spiller10writeSpillEi+0x60) [0x7f8ad07fb540]
libvelox.so(+0x29ca72b) [0x7f8ad07fb72b]
libvelox.so(_ZNSt17_Function_handlerIFSt10unique_ptrIN8facebook5velox4exec7Spiller11SpillStatusESt14default_deleteIS5_EEvEZNS2_6memory28createAsyncMemoryReclaimTaskIS5_EESt10shared_ptrINS2_11AsyncSourceIT_EEESt8functionIFS0_ISE_S6_ISE_EEvEEEUlvE_E9_M_invokeERKSt9_Any_data+0x58) [0x7f8ad07fb938]
libvelox.so(_ZN8facebook5velox11AsyncSourceINS0_4exec7Spiller11SpillStatusEE4moveEv+0x226) [0x7f8ad07fe916]
libvelox.so(_ZN8facebook5velox4exec7Spiller8runSpillEb+0x46c) [0x7f8ad07f5fbc]
libvelox.so(_ZN8facebook5velox4exec7Spiller5spillEPKNS1_20RowContainerIteratorE+0x88) [0x7f8ad07f68b8]
libvelox.so(_ZN8facebook5velox4exec10SortBuffer10spillInputEv+0x40) [0x7f8ad07e1d70]
libvelox.so(_ZN8facebook5velox4exec7OrderBy7reclaimEmRNS0_6memory15MemoryReclaimer5StatsE+0xf8) [0x7f8ad2293828]
libvelox.so(+0x2986fdd) [0x7f8ad07b7fdd]
libvelox.so(_ZN8facebook5velox6memory15MemoryReclaimer3runERKSt8functionIFlvEERNS2_5StatsE+0x59) [0x7f8acf5ac699]
libvelox.so(_ZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE+0x27d) [0x7f8ad07bc7dd]
libvelox.so(_ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE+0x57a) [0x7f8acf5ac21a]
libvelox.so(_ZN8facebook5velox4exec23ParallelMemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS3_15MemoryReclaimer5StatsE+0xd05) [0x7f8ad07b7845]
libvelox.so(_ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE+0x57a) [0x7f8acf5ac21a]
libvelox.so(_ZN8facebook5velox4exec4Task15MemoryReclaimer11reclaimTaskERKSt10shared_ptrIS2_EmmRNS0_6memory15MemoryReclaimer5StatsE+0xed) [0x7f8ad081759d]
libvelox.so(_ZN8facebook5velox4exec4Task15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE+0x189) [0x7f8ad0817c19]
libvelox.so(_ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE+0x57a) [0x7f8acf5ac21a]
libvelox.so(_ZN6gluten20ListenableArbitrator14shrinkCapacityEmbb+0xc8) [0x7f8acf3fd938]
libvelox.so(_ZN6gluten24WholeStageResultIterator14spillFixedSizeEl+0x2fa) [0x7f8acf3dbfca]
libgluten.so(Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeSpill+0x125) [0x7f8ad48c0e75]

@FelixYBW
Contributor Author

FelixYBW commented Sep 26, 2024

window related fix1: facebookincubator/velox#11077

sort related fix: facebookincubator/velox#11129

@FelixYBW
Contributor Author

FelixYBW commented Sep 29, 2024

@FelixYBW
Contributor Author

A basic design in Velox is that before spill, memory is allocated in the Spark memory pool (offheap size). During spill, memory is allocated in the global memory pool (overhead).

@FelixYBW
Contributor Author

These three configurations have a big impact on the offheap and overhead memory usage:

spark.gluten.sql.columnar.backend.velox.spillWriteBufferSize
spark.gluten.sql.columnar.backend.velox.MaxSpillRunRows
spark.gluten.sql.columnar.backend.velox.maxSpillFileSize

spillWriteBufferSize controls the buffer size when spill writes data to disk. It looks like it also controls the read buffer size when spilled data is fetched back. Each file must have one buffer allocated in offheap memory, so if the size is too large it will report an OOM error triggered by getOutput.

MaxSpillRunRows controls the batch size of a spill run. The bigger the number, the more overhead memory is allocated, because during spill all memory allocation is overhead memory. The smaller the number, the more spill files.

maxSpillFileSize controls the size of each spill file. The smaller the number, the more spill files.
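
For example (the values below are illustrative only; tune them per workload):

--conf spark.gluten.sql.columnar.backend.velox.spillWriteBufferSize=1048576
--conf spark.gluten.sql.columnar.backend.velox.MaxSpillRunRows=3000000
--conf spark.gluten.sql.columnar.backend.velox.maxSpillFileSize=1073741824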

#8025
