From 36472a70b3f19ff49831c29bbe244211948cf955 Mon Sep 17 00:00:00 2001
From: Jameson Nash <vtjnash@gmail.com>
Date: Mon, 6 Jan 2025 13:58:32 -0500
Subject: [PATCH] gc: fix assertion / ASAN violation in gc_big_object_link
 (#56944)

We somehow just got (un)lucky: `DFS!` at
Compiler/src/ssair/domtree.jl:184 had previously stored exactly the
same value as `gc_bigval_sentinel_tag` in this particular memory
location, so the assertion's branch on the uninitialized (`undef`)
`header` field saw exactly the value needed to fail. What are the odds?

Seen on a CI run (with rr).

The odds of this happening seem to be somewhere around 2^60 to 1
against on each allocation, so it is impressive that we hit it even once.

But we did, and the proof is here, caught in rr:

https://buildkite.com/julialang/julia-master/builds/43366#019425d7-67fd-4f33-a025-6d7cd6181649
```
      From worker 6:	julia: /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:492: gc_big_object_link: Assertion `node->header != gc_bigval_sentinel_tag' failed.
2025-01-02 07:47:22 UTC	      From worker 6:
2025-01-02 07:47:22 UTC	      From worker 6:	[3877] signal 6 (-6): Aborted
2025-01-02 07:47:22 UTC	      From worker 6:	in expression starting at none:1
2025-01-02 07:47:22 UTC	      From worker 6:	gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC	      From worker 6:	abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC	      From worker 6:	unknown function (ip: 0x7fb9a4b5040e) at /lib/x86_64-linux-gnu/libc.so.6
2025-01-02 07:47:22 UTC	      From worker 6:	__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
2025-01-02 07:47:22 UTC	      From worker 6:	gc_big_object_link at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:492 [inlined]
2025-01-02 07:47:22 UTC	      From worker 6:	gc_setmark_big at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.c:276
2025-01-02 07:47:22 UTC	      From worker 6:	jl_gc_big_alloc_inner at /cache/build/tester-amdci5-10/julialang/julia-master/src/gc-stock.h:491
```
---
 src/gc-stock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gc-stock.c b/src/gc-stock.c
index c1bc1d64ae199..1f6b58d71da3a 100644
--- a/src/gc-stock.c
+++ b/src/gc-stock.c
@@ -423,7 +423,7 @@ STATIC_INLINE void jl_batch_accum_free_size(jl_ptls_t ptls, uint64_t sz) JL_NOTS
 
 // big value list
 
-// Size includes the tag and the tag is not cleared!!
+// Size includes the tag and the tag field is undefined on return (must be set before the next GC safepoint)
 STATIC_INLINE jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz)
 {
     maybe_collect(ptls);
@@ -448,6 +448,9 @@ STATIC_INLINE jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz)
     memset(v, 0xee, allocsz);
 #endif
     v->sz = allocsz;
+#ifndef NDEBUG
+    v->header = 0; // must be initialized (and not gc_bigval_sentinel_tag) or gc_big_object_link assertions will get confused
+#endif
     gc_big_object_link(ptls->gc_tls.heap.young_generation_of_bigvals, v);
     return jl_valueof(&v->header);
 }