[Bug] Unable to reproduce results on 4*A10 machines #1

Open
yyccli opened this issue Nov 1, 2023 · 0 comments
yyccli commented Nov 1, 2023

Hi~ First, thanks for your work on DL compilers.
I'm trying to reproduce your results on a 4*A10 machine (I don't have an A100 machine right now :< ). The A10's compute capability is 8.6, so I think it should work.

I tried both building from scratch and using the provided docker image (hguyue1/alcop:latest), and I ran into different problems with each.

build from scratch

  1. The docker image nvidia/cuda:11.4.0-cudnn8-devel-ubuntu20.04 seems to be invalid now; it should be nvidia/cuda:11.4.**3**-cudnn8-devel-ubuntu20.04 (docker hub link).
  2. It seems ALCOP depends on one of your own libraries, named my_cutlass, but it can no longer be found under your GitHub account, so I commented out this code.
image
  3. Now I can build TVM and run the tests. The baseline works and I get the results below (I changed the problem size to 2048,2048,2048).
image
  4. Pipelining fails (the problem size is 512,512,512; other shapes like 4096,2048 hit the same problem).
image

I'm quite new to CUDA and DL compilers, so I cannot locate this error. Maybe it has something to do with shared memory size, but I'm not sure...
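
To make the shared-memory guess concrete, here is a rough sketch of the budget check I have in mind. The per-block limits are the opt-in maximums listed in the CUDA C++ Programming Guide (163 KB for compute capability 8.0, 99 KB for 8.6). The tile shape, stage count, and helper names are hypothetical and chosen only for illustration; they are not taken from ALCOP's actual schedule, so this shows how such a failure could arise, not that it is the cause here.

```python
# Rough shared-memory budget check for a software-pipelined GEMM tile.
# Limits: maximum shared memory per thread block (opt-in), in KB,
# as listed in the CUDA C++ Programming Guide.
MAX_SMEM_PER_BLOCK_KB = {
    "8.0": 163,  # e.g. A100
    "8.6": 99,   # e.g. A10
}


def pipelined_smem_bytes(tile_m, tile_n, tile_k, stages, dtype_bytes=2):
    """Bytes needed to keep `stages` buffered copies of the A and B tiles.

    Assumption: each pipeline stage holds one (tile_m x tile_k) tile of A
    and one (tile_k x tile_n) tile of B; dtype_bytes=2 corresponds to fp16.
    """
    a_tile = tile_m * tile_k
    b_tile = tile_k * tile_n
    return stages * (a_tile + b_tile) * dtype_bytes


def fits(compute_capability, smem_bytes):
    """True if the request is within the per-block limit for that GPU."""
    return smem_bytes <= MAX_SMEM_PER_BLOCK_KB[compute_capability] * 1024


if __name__ == "__main__":
    # Hypothetical schedule: 128x128x64 tiles, 4 pipeline stages, fp16.
    need = pipelined_smem_bytes(128, 128, 64, stages=4)
    for cc in ("8.0", "8.6"):
        print(f"sm_{cc.replace('.', '')}: need {need} bytes, fits = {fits(cc, need)}")
```

With these made-up numbers the request is 128 KB, which is within the 8.0 limit but above the 8.6 limit, so a schedule tuned for A100 could be rejected on A10 even though both are Ampere GPUs.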

The full error trace for pipelining is attached.
alcop_pipelining_error_log.txt

building from provided docker images

  1. I get the same problem as above, plus a core dump (both baseline and pipelining trigger the core dump).

baseline:
image

pipelining:
image
image

I hope you can take a look at this. Thanks!
My email address is liyangcheng.lyc@alibaba-inc.com if you would prefer to get in touch by email. :)
