From 36472a70b3f19ff49831c29bbe244211948cf955 Mon Sep 17 00:00:00 2001
From: Jameson Nash <vtjnash@gmail.com>
Date: Mon, 6 Jan 2025 13:58:32 -0500
Subject: [PATCH] gc: fix assertion / ASAN violation in gc_big_object_link (#56944)

We somehow got (un)lucky: `DFS!` at Compiler/src/ssair/domtree.jl:184
happened to have previously stored exactly the same value as this pointer
in this particular memory location, so this branch on `undef` hit exactly
the right value to fail. What are the odds?

The odds against this seem to be somewhere around 2^60 to 1 each time, so
it is impressive that we hit it even once. But we did, and the proof is
here, seen on a CI run and caught in rr:
https://buildkite.com/julialang/julia-master/builds/43366#019425d7-67fd-4f33-a025-6d7cd6181649

```
From worker 6: julia: /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:492: gc_big_object_link: Assertion `node->header != gc_bigval_sentinel_tag' failed.
2025-01-02 07:47:22 UTC From worker 6:
2025-01-02 07:47:22 UTC From worker 6: [3877] signal 6 (-6): Aborted
2025-01-02 07:47:22 UTC From worker 6: in expression starting at none:1
2025-01-02 07:47:22 UTC From worker 6: gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC From worker 6: abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC From worker 6: unknown function (ip: 0x7fb9a4b5040e) at /lib/x86_64-linux-gnu/libc.so.6
2025-01-02 07:47:22 UTC From worker 6: __assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC From worker 6: gc_big_object_link at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:492 [inlined]
2025-01-02 07:47:22 UTC From worker 6: gc_setmark_big at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.c:276
2025-01-02 07:47:22 UTC From worker 6: jl_gc_big_alloc_inner at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:491
```
---
 src/gc-stock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gc-stock.c b/src/gc-stock.c
index c1bc1d64ae199..1f6b58d71da3a 100644
--- a/src/gc-stock.c
+++ b/src/gc-stock.c
@@ -423,7 +423,7 @@ STATIC_INLINE void jl_batch_accum_free_size(jl_ptls_t ptls, uint64_t sz) JL_NOTS
 
 // big value list
 
-// Size includes the tag and the tag is not cleared!!
+// Size includes the tag and the tag field is undefined on return (must be set before the next GC safepoint)
 STATIC_INLINE jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz)
 {
     maybe_collect(ptls);
@@ -448,6 +448,9 @@ STATIC_INLINE jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz)
     memset(v, 0xee, allocsz);
 #endif
     v->sz = allocsz;
+#ifndef NDEBUG
+    v->header = 0; // must be initialized (and not gc_bigval_sentinel_tag) or gc_big_object_link assertions will get confused
+#endif
     gc_big_object_link(ptls->gc_tls.heap.young_generation_of_bigvals, v);
     return jl_valueof(&v->header);
 }
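
For readers unfamiliar with the failure mode, the following is a minimal standalone sketch of the pattern this patch addresses, not the actual Julia GC code. All names (`sketch_bigval_t`, `sketch_link`, `sketch_big_alloc`, `SKETCH_SENTINEL_TAG`) and the sentinel value are hypothetical stand-ins. It assumes a list whose head node is marked with a sentinel tag and a link routine that asserts a new node does not carry that tag. Because `malloc` returns uninitialized memory, the assertion would read an undefined `header` unless the allocator writes it first, which is what the `#ifndef NDEBUG` store does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel value; the real gc_bigval_sentinel_tag differs. */
#define SKETCH_SENTINEL_TAG ((uintptr_t)0xdeadaa03)

/* Hypothetical, simplified stand-in for a big-object list node. */
typedef struct sketch_bigval {
    struct sketch_bigval *next;
    struct sketch_bigval *prev;
    size_t sz;
    uintptr_t header; /* tag field; only the list head holds the sentinel */
} sketch_bigval_t;

/* Set up an empty list whose head is marked with the sentinel tag. */
static void sketch_list_init(sketch_bigval_t *head)
{
    head->next = NULL;
    head->prev = NULL;
    head->sz = 0;
    head->header = SKETCH_SENTINEL_TAG;
}

/* Insert node directly after the head. The second assertion reads
 * node->header, so that field must hold a defined value by now. */
static void sketch_link(sketch_bigval_t *head, sketch_bigval_t *node)
{
    assert(head->header == SKETCH_SENTINEL_TAG);
    assert(node->header != SKETCH_SENTINEL_TAG);
    node->prev = head;
    node->next = head->next;
    if (node->next != NULL)
        node->next->prev = node;
    head->next = node;
}

/* Allocate a node with `payload` extra bytes and link it into the list. */
static sketch_bigval_t *sketch_big_alloc(sketch_bigval_t *head, size_t payload)
{
    size_t allocsz = sizeof(sketch_bigval_t) + payload;
    sketch_bigval_t *v = (sketch_bigval_t *)malloc(allocsz);
    if (v == NULL)
        return NULL;
    v->sz = allocsz;
#ifndef NDEBUG
    /* Without this store the assertion in sketch_link would read
     * whatever bytes malloc happened to hand back. */
    v->header = 0;
#endif
    sketch_link(head, v);
    return v;
}
```

In this sketch a caller would run `sketch_list_init(&head)` once and then call `sketch_big_alloc(&head, n)` for each object. The store is guarded by `NDEBUG` because only the assertion reads the field at that point; in a release build the caller is still expected to set the tag itself, as the updated comment on `jl_gc_big_alloc_inner` states.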