SWT cannot yet pass the yklua test suite #1528

Open
ltratt opened this issue Dec 24, 2024 · 11 comments
@ltratt
Contributor

ltratt commented Dec 24, 2024

I thought SWT was capable of running -- though rather slowly! -- all the things that HWT can. But on a fresh clone with a debug build, if I try running yklua's test suite I get:

$ cd yklua/tests
$ YKD_OPT=0 YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" all.lua
...
***** FILE 'cstack.lua'*****
testing stack overflow detection
testing stack overflow in message handling
.final count:   250037
testing recursion inside pattern matching
testing stack-overflow in recursive 'gsub'
final count:    197
testing stack-overflow in recursive 'gsub' with metatables
final count:    99
testing limits in coroutines inside deep calls
final count:    196
chain of 'coroutine.close'
thread '<unnamed>' panicked at ykrt/src/compile/jitc_yk/codegen/x64/mod.rs:2012:30:
ConstPtr(0)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at ykrt/src/mt.rs:229:39:
called `Result::unwrap()` on an `Err` value: Any { .. }
fatal runtime error: failed to initiate panic, error 5
Aborted

This is immediately a bit fishy, because I would expect swt to create identical traces to hwt, but here we seem to be encountering a null pointer. However, just in case, I implemented support for constant pointers in this experimental branch. With YKD_OPT=0 this no longer panics, but it corrupts yklua's state, leading to a Lua-level backtrace:

...
***** FILE 'cstack.lua'*****
testing stack overflow detection
testing stack overflow in message handling
.final count:   250037
testing recursion inside pattern matching
testing stack-overflow in recursive 'gsub'
final count:    197
testing stack-overflow in recursive 'gsub' with metatables
final count:    99
testing limits in coroutines inside deep calls
final count:    196
chain of 'coroutine.close'
../src/lua: cstack.lua:105: attempt to call a boolean value (field 'resume')
stack traceback:
        cstack.lua:105: in main chunk
        (...tail calls...)
        all.lua:181: in main chunk
        [C]: in ?

Can it be that we never solved the unmappable problem (see e.g. #1095 or #980)? Or is there some other problem in swt? Either way, we should almost certainly fix this, because the longer we can't run swt on our test suites, the more likely that deeper harder-to-fix problems are going to creep in.

CC @ptersilie.

@ltratt
Contributor Author

ltratt commented Jan 28, 2025

I am not sure if this is still (or ever was!) an issue or not. @Pavel-Durov did you replicate?

@Pavel-Durov
Contributor

> I am not sure if this is still (or ever was!) an issue or not. @Pavel-Durov did you replicate?

I haven't tried replicating it, but I can give it a go :)

@ltratt
Contributor Author

ltratt commented Jan 28, 2025

Thanks!

@Pavel-Durov
Contributor

Yes, I was able to reproduce it as well.

yk build (commit 94e9c56):

YKB_TRACER=swt cargo build

yklua build:

 YK_BUILD_TYPE=debug YKB_TRACER=swt make -j "$(nproc)"

yklua test:

 RUST_BACKTRACE=full YKD_OPT=0 YKD_SERIALISE_COMPILATION=1 ../src/lua -e"_U=true" all.lua

Output:

***** FILE 'cstack.lua'*****
testing stack overflow detection
testing stack overflow in message handling
.final count: 	250037
testing recursion inside pattern matching
testing stack-overflow in recursive 'gsub'
final count: 	197
testing stack-overflow in recursive 'gsub' with metatables
final count: 	99
testing limits in coroutines inside deep calls
final count: 	196
chain of 'coroutine.close'
../src/lua: cstack.lua:105: attempt to call a boolean value (field 'resume')
stack traceback:
	cstack.lua:105: in main chunk
	(...tail calls...)
	all.lua:181: in main chunk
	[C]: in ?
.>>> closing state <<<

@ltratt
Contributor Author

ltratt commented Jan 29, 2025

I suspect it's worth running shrinkray on cstack.lua w/swt: we'll probably get a tiny test case in a few minutes. @Pavel-Durov have you used shrinkray yet?

@Pavel-Durov
Contributor

Pavel-Durov commented Feb 1, 2025

> Can it be that we never solved the unmappable problem (see e.g. #1095 or #980)?

I looked into the changes since the last time I ran the yklua tests.

We might not have tested the 5.4.6 import, because the previous 5.4.4 import works with current yk.

The diff between the two versions of the cstack.lua test is:

do    -- bug since 5.4.0
  local count = 0
  print("chain of 'coroutine.close'")
  -- create N coroutines forming a list so that each one, when closed,
  -- closes the previous one. (With a large enough N, previous Lua
  -- versions crash in this test.)
  local coro = false
  for i = 1, 1000 do
    local previous = coro
    coro = coroutine.create(function()
      local cc <close> = setmetatable({}, {__close=function()
        count = count + 1
        if previous then
          assert(coroutine.close(previous))
        end
      end})
      coroutine.yield()   -- leaves 'cc' pending to be closed
    end)
    assert(coroutine.resume(coro))  -- start it and run until it yields
  end
  local st, msg = coroutine.close(coro)
  assert(not st and string.find(msg, "C stack overflow"))
  print("final count: ", count)
end

@ltratt
Contributor Author

ltratt commented Feb 1, 2025

If we shrinkray that, does it reduce the size of the problem?

@Pavel-Durov
Contributor

I ran shrinkray on it using this script.

It completes:

Reduction completed!
Deleted 722 Bytes out of 824 Bytes (87.62% reduction) in 1 minute and 18.75 seconds

but it produces Lua code with an infinite loop:

do for A = 0, 1 / 0 do
    A = coroutine.create(function() coroutine.yield() end)
    assert(coroutine.resume(A))
  end end

which runs forever when I run it as:

RUST_BACKTRACE=full YKD_OPT=0 YKD_SERIALISE_COMPILATION=1  ../src/lua ./cstack.shrinked.lua

I'm sure that I'm doing something wrong, just not sure what it is... 🤔

@ltratt
Contributor Author

ltratt commented Feb 1, 2025

Yes, your reduction script probably isn't looking for the right error. It does take a bit of getting used to.
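
As a rough illustration of "looking for the right error", here is a minimal sketch of an interestingness test. It is not the script used in this thread. It assumes the reducer calls the script with the candidate file as its first argument and treats exit status 0 as "still interesting". It also assumes the failure to preserve is the Lua-level error shown earlier, and that GNU `timeout` is available. The names `LUA`, `WANTED` and `is_interesting` are made up for this example.

```sh
#!/bin/sh
# Hypothetical interestingness test (a sketch, not the script from the thread).

# Assumption: path to the swt-built yklua binary. An absolute path is safer
# if the reducer runs candidates from a different working directory.
LUA="${LUA:-../src/lua}"

# Assumption: the error text that marks a candidate as "still failing".
WANTED="${WANTED:-attempt to call a boolean value}"

# is_interesting FILE
# Exits 0 only if running FILE prints the wanted error message.
is_interesting() {
    # `timeout` stops candidates that loop forever; such a run prints no
    # matching error, so it is reported as not interesting.
    out=$(YKD_OPT=0 YKD_SERIALISE_COMPILATION=1 timeout 10 "$LUA" "$1" 2>&1) || true
    printf '%s\n' "$out" | grep -qF "$WANTED"
}

# When invoked with a candidate file, report the verdict via the exit status.
if [ "$#" -gt 0 ]; then
    is_interesting "$1"
fi
```

Matching on the specific message, rather than on any non-zero exit status, is what keeps the reducer from drifting towards an unrelated failure or a program that simply never terminates.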

@Pavel-Durov
Contributor

It looks like this is the minimal reduction result:

Deleted 40 Bytes out of 141 Bytes (28.37% reduction) in 1 minute and 4.73 seconds

do
  for A = 0, 60 do
    A = coroutine.create(function() coroutine.yield() end)
    assert(coroutine.resume(A))
  end
end

@ltratt
Contributor Author

ltratt commented Feb 2, 2025

Ah, actually, I think the ConstPtr error is a genuine todo. I think I fixed this on master, so if you're not up to date, try merging master in. If that doesn't help, I have a branch elsewhere which implements this (but is now blocked on a fix elsewhere).
