[Docker][CI][RISC-V] Build riscv-isa-sim (spike) in ci_riscv Docker image to enable RISC-V unit testing #12534

Merged: 6 commits merged into apache:main from PhilippvK:feature_docker_spike on Sep 15, 2022

Conversation

@PhilippvK (Contributor) commented Aug 22, 2022

Context

See the discussion at https://discuss.tvm.apache.org/t/coordination-of-risc-v-integration-in-tvm/13133

Summary

  1. Update Dockerfile.ci_riscv to download a RISC-V toolchain and compile the Spike simulator, including the proxy kernel (a rough build sketch follows this list)

    • Uses SiFive's GCC toolchain for embedded targets (see below)
    • Two variants of the proxy kernel (pk, pk64) are built to support rv32gc as well as rv64gc targets
  2. Introduce an AOTTestRunner to use Spike in unit tests (currently rv32gc only)

  3. Enable the Spike-based AOTTestRunner for AoT/CRT-related test cases
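
For illustration, a rough sketch of those build steps (repository URLs, paths, and configure flags are assumptions based on the upstream riscv-isa-sim and riscv-pk READMEs; the actual install scripts may differ):

export RISCV=/opt/riscv
git clone https://github.com/riscv-software-src/riscv-isa-sim
git clone https://github.com/riscv-software-src/riscv-pk

# Spike (riscv-isa-sim); the simulated ISA is selected at runtime via --isa
mkdir riscv-isa-sim/build && cd riscv-isa-sim/build
../configure --prefix=$RISCV
make -j$(nproc) && make install && cd ../..

# Proxy kernel, e.g. the 32-bit variant (pk); requires a RISC-V GCC on PATH
mkdir riscv-pk/build && cd riscv-pk/build
../configure --prefix=$RISCV --host=riscv64-unknown-elf --with-arch=rv32gc --with-abi=ilp32d
make -j$(nproc) && make install
# repeat with --with-arch=rv64gc --with-abi=lp64d for the 64-bit variant (pk64)

# run an rv32gc binary under Spike + pk (the pk install path is an assumption)
spike --isa=rv32gc $RISCV/riscv64-unknown-elf/bin/pk prog.elf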

Future work

I have a follow-up ready based on this PR, which enables the use of Spike for MicroTVM deployment (Project API server).

Questions

  • Why not use the existing CSI-NN GCC toolchain?

That toolchain (Xuantie-900-gcc-linux) mainly targets 64-bit devices capable of running Linux. While it could be used for 32-bit bare-metal targets as well (using the -static compile flag), it is missing relevant multilib arches, such as rv32gc/ilp32d, to be usable. Thus, I use SiFive's GCC (which currently does not support the RVV 1.0 extension). A quick way to check for the relevant multilibs is sketched below.
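
For illustration, this is how such a multilib check might look (riscv64-unknown-elf-gcc stands in for whichever toolchain driver is being evaluated; test.c is a hypothetical source file):

# list the arch/abi multilib combinations the toolchain ships with
riscv64-unknown-elf-gcc -print-multi-lib
# linking a trivial 32-bit bare-metal binary fails at link time
# if no matching rv32gc/ilp32d multilib is present
riscv64-unknown-elf-gcc -march=rv32gc -mabi=ilp32d test.c -o test.elf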

  • Should we pin the version of Spike being built, or just use the latest development branch?

cc @Mousius @areusch @driazati @gigiblender

@areusch (Contributor) left a comment:

I retriggered the PR due to what I think was a network issue, and just have one question here.

cc @alter-xp can you take a look at the compiler used here and see if it would also work for you (or whether there's one in Zephyr we should consider)?

@@ -65,6 +65,17 @@
},
)

AOT_SPIKE_RUNNER = AOTTestRunner(
Contributor:
Just curious: I think this shouldn't be necessary if we are using the Project API. Did you need to use the AOTTestRunner infra?

Contributor (Author):
AFAIK, AOTTestRunner and the MicroTVM Project API are just two different approaches. The latter is more powerful (it allows autotuning and profiling) but comes with some overhead (due to RPC communication), while the test runner is very minimalistic but good enough for effective unit testing.

I went with the AOTTestRunner approach to run the existing AoT/CRT-related test cases in a straightforward fashion. I am not sure if this was one of your goals? Feel free to propose which tests should be run using Spike instead.

Contributor:
Ah, gotcha; it's probably fine to take that approach. I think we do want to migrate the unit tests to the Project API at some point (the advantage is that it forces the set of information passed from compiler to runtime to be presented in Model Library Format, thereby ensuring that we don't need side-channeled information beyond that format). It might be easier to leave the tests as-is for now, but I'm wondering if you foresee any challenges with using the Project API with Spike? cc @mehrdadh @mkatanbaf @gromero

@mehrdadh (Member) commented Sep 1, 2022

I very much encourage using the Project API from the start, unless there's a blocker. Initially, AOTTestRunner was introduced because at that point we didn't have host-driven AOT and multi-model support in the MLF artifact. We're going to remove AOTTestRunner in the near future and consolidate all testing into a single flow.

@areusch (Contributor) commented Aug 22, 2022

Turns out the network issue was actually a forgotten step in adding the image, which I've now hopefully rectified.

@PhilippvK (Contributor, Author):
> Turns out the network issue was actually a forgotten step in adding the image, which I've now hopefully rectified.

@areusch it is probably a fault on my side: it seems the Spike test runner is enabled for every Docker image instead of only in ci_riscv. I will try to come up with a followup.

@PhilippvK (Contributor, Author):

@areusch

> cc @alter-xp can you take a look at the compiler used here and see if it would also work for you (or whether there's one in Zephyr we should consider)?

I got in touch with @alter-xp a while ago regarding the multilib limitations in their toolchain, but, if I remember correctly, I did not get a response.

What we have to consider is that, in the end, we will have to support both bare-metal (most likely 32-bit MCUs) and Linux-driven (e.g. the Allwinner D1 based on the C906 core) RISC-V targets.

The bare-metal targets will most likely just use the MicroTVM platforms (Zephyr, Arduino, Spike), while the larger chips should follow the usual TVM (RPC server) flow, similar to AArch64 targets such as Raspberry Pis.

Having a single toolchain for both strategies is desirable to keep maintenance low, but not a requirement for now IMHO.

@alter-xp (Contributor) commented Aug 23, 2022

@PhilippvK @areusch

> cc @alter-xp can you take a look at the compiler used here and see if it would also work for you (or whether there's one in Zephyr we should consider)?

For bare metal we have another newlib toolchain. Its instruction support is the same as that of the Linux tools, so it may solve the compilation problem here.

> I got in touch with @alter-xp a while ago regarding the multilib limitations in their toolchain, but, if I remember correctly, I did not get a response.

I'm sorry I didn't fully understand your question at that time. You can try this tool: Xuantie-900-gcc-elf-newlib-x86_64-V2.6.0-20220715.tar.gz

> Having a single toolchain for both strategies is desirable to keep maintenance low, but not a requirement for now IMHO.

It should be common practice in the industry to provide two sets of compilation tools, one for bare metal and one for Linux-driven targets.

@areusch (Contributor) commented Aug 23, 2022

@alter-xp @PhilippvK gotcha; it's fine to have two toolchains in ci_riscv if they have a defined purpose. Could we document that purpose in the install scripts?

@alter-xp (Contributor):

@areusch Currently, there is only one Xuantie compiler in ci_riscv; I previously overlooked the bare-metal case. I can add the other toolchain to the image, but it seems better to add it directly in this PR, since this PR needs to use it. What do you think, @PhilippvK?

@areusch (Contributor) commented Aug 24, 2022

One piece of advice: we don't quite have the debug flow well-polished for the type of PR that both adds a dependency to a Docker container and then tries to use that dependency; the issue is that you can't pull the Docker image built by the CI. For that reason, I suggest splitting the addition and the use of a new dependency into two PRs (happy to accept the first without the second, as long as we're reasonably confident in landing the second).

@PhilippvK (Contributor, Author):

Sorry for the delay, I was not at home for the past few days. I will update the PR later this week.

@alter-xp Thank you for providing Xuantie-900-gcc-elf-newlib-x86_64-V2.6.0-20220715.tar.gz. It seems to support the relevant extensions for the baseline TVM integration, and it does make sense to have both the Linux and newlib toolchains built from a similar codebase. However, I have two questions (unrelated to this PR) after briefly trying it out myself:

  • While the P extension seems to be somehow supported (intrinsics are available), I could not link a 32-bit program (-march=rv32gcp -mabi=ilp32d) using these intrinsics, while linking 64-bit programs (-march=rv64gcp -mabi=lp64d) works. Am I doing something wrong, or is the 32-bit version of the RVP extension currently not supported?
  • I was also wondering why the p extension is not listed in the output of riscv64-unknown-elf-gcc -print-multi-lib.

It would be great if you could help me out with that!

@alter-xp (Contributor) commented Sep 1, 2022

Hi @PhilippvK,

> While the P extension seems to be somehow supported (intrinsics are available), I could not link a 32-bit program (-march=rv32gcp -mabi=ilp32d) using these intrinsics, while linking 64-bit programs (-march=rv64gcp -mabi=lp64d) works. Am I doing something wrong, or is the 32-bit version of the RVP extension currently not supported?

To link a 32-bit program with P extensions, you can try -march=rv32gcp_zpn_zpsfoperand -mabi=ilp32d. For this case, the compiler may only support this exact option at present. A sketch of the full invocation is shown below.
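
For illustration, the full compile-and-link invocation might look like this (main.c is a hypothetical source file using the P-extension intrinsics):

riscv64-unknown-elf-gcc -march=rv32gcp_zpn_zpsfoperand -mabi=ilp32d main.c -o main.elf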

> I was also wondering why the p extension is not listed in the output of riscv64-unknown-elf-gcc -print-multi-lib.

Since the file size needs to be taken into account when releasing the package, there are no libraries for all instruction combinations. The latest version of the GCC source code will be open-sourced in about two weeks, which may be helpful to you.

@PhilippvK force-pushed the feature_docker_spike branch from 869f95c to 0025677 on September 7, 2022 13:44
@PhilippvK (Contributor, Author) commented Sep 7, 2022

Here is a short update:

  • I dropped the commits related to the SpikeAOTTestRunner and will look into testing via the MicroTVM Project API in a future PR. Thanks @areusch and @mehrdadh for letting me know that the currently used AoT test infrastructure is to be deprecated soon.
  • The bare-metal toolchain used in the image is now the Xuantie900 version proposed by @alter-xp.
  • For consistency, I refactored the CSI-NN2 install script to move the installation of the Linux toolchain into a separate script. As this has a few consequences, I would like to ask @alter-xp to review these changes:
    • CSI-NN is no longer included in Dockerfile.ci_cortexm
    • Both toolchains are now installed into /opt/riscv instead of /opt/csi-nn2/
    • The script/download_toolchain.sh script is no longer used. This also means that the installed version of the toolchain has to be kept in sync with the version used by CSI-NN2 by updating the TVM CI script.
    • The download URLs will hopefully be replaced with more stable ones once the toolchain is open-sourced
    • After moving the GCC download out of the CSI-NN2 script, it might make sense to do the same with the QEMU installation (a rough sketch of such a standalone script follows this list). What do you think @alter-xp?
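
For illustration only, such a standalone QEMU install script could look roughly like this (the script name, upstream URL, and configure flags are assumptions; the CI may build the Xuantie QEMU fork instead):

# e.g. docker/install/ubuntu_install_riscv_qemu.sh (hypothetical name)
git clone https://github.com/qemu/qemu
cd qemu
./configure --prefix=/opt/riscv --target-list=riscv32-linux-user,riscv64-linux-user
make -j$(nproc) && make install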

@areusch (Contributor) left a comment:

Thanks @PhilippvK!

@areusch (Contributor) left a comment:

@PhilippvK looks like the PR is missing a file

@areusch previously requested changes Sep 7, 2022:

Removing approval until the missing file is in.

@PhilippvK force-pushed the feature_docker_spike branch from 0025677 to 7c9ab89 on September 7, 2022 20:37
@PhilippvK (Contributor, Author):

> Removing approval until the missing file is in.

File is added now. Thanks for the hint.

@PhilippvK force-pushed the feature_docker_spike branch from 7c9ab89 to 5fe5fe7 on September 7, 2022 20:39
@areusch (Contributor) left a comment:

Thanks @PhilippvK!

mkdir build
cd build
../configure --prefix=$RISCV --with-isa=RV32IMAC
make -j`nproc`
Contributor:

Slight preference for nproc - 1, in case you're building locally. (A sketch of that variant follows.)
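
For illustration, the suggested variant (assumes GNU coreutils nproc and a machine with more than one core, since make -j0 is invalid):

make -j$(($(nproc) - 1))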

Contributor (Author):

Oops, I forgot to fix this... sorry for that, we can change it in a later commit.

@areusch dismissed their stale review on September 7, 2022 20:42 ("file added")

Comment on lines 44 to 52
RISCV_GCC_RELEASE="1.12.1"
RISCV_GCC_VERSION="2.4.0"
RISCV_GCC_KERNEL_VERSION="5.10.4"
RISCV_GCC_DATE="20220428"
RISCV_GCC_ARCH="x86_64"
RISCV_GCC_BASE="Xuantie-900-gcc-linux-${RISCV_GCC_KERNEL_VERSION}-glibc-${RISCV_GCC_ARCH}-V${RISCV_GCC_VERSION}-${RISCV_GCC_DATE}"
RISCV_GCC_EXT="tar.gz"
RISCV_GCC_URL="https://github.com/T-head-Semi/csi-nn2/releases/download/v${RISCV_GCC_RELEASE}/${RISCV_GCC_BASE}.${RISCV_GCC_EXT}"
DOWNLOAD_PATH="/tmp/${RISCV_GCC_BASE}.tar.gz"
@alter-xp (Contributor) commented Sep 8, 2022:

RISCV_GCC_VERSION="2.6.0"
RISCV_GCC_ID="1659325511536"
RISCV_GCC_KERNEL_VERSION="5.10.4"
RISCV_GCC_DATE="20220715"
RISCV_GCC_ARCH="x86_64"
RISCV_GCC_BASE="Xuantie-900-gcc-linux-${RISCV_GCC_KERNEL_VERSION}-glibc-${RISCV_GCC_ARCH}-V${RISCV_GCC_VERSION}-${RISCV_GCC_DATE}"
RISCV_GCC_EXT="tar.gz"
RISCV_GCC_URL="https://occ-oss-prod.oss-cn-hangzhou.aliyuncs.com/resource//${RISCV_GCC_ID}/${RISCV_GCC_BASE}.${RISCV_GCC_EXT}"
DOWNLOAD_PATH="/tmp/${RISCV_GCC_BASE}.tar.gz"

You can use this code to update the toolchain version here, to make it consistent with the newlib version.
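
For context, a sketch of the download/extract steps that would typically follow these variables (the exact commands and archive layout are assumptions; the real script may differ):

curl -fLo "${DOWNLOAD_PATH}" "${RISCV_GCC_URL}"
mkdir -p /opt/riscv
tar -xzf "${DOWNLOAD_PATH}" -C /opt/riscv --strip-components=1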

Contributor (Author):

Great, this is what I was looking for!

@PhilippvK force-pushed the feature_docker_spike branch from 5fe5fe7 to de32558 on September 8, 2022 05:55
@alter-xp (Contributor) commented Sep 8, 2022

Hi @PhilippvK,

> • For consistency, I refactored the CSI-NN2 install script to move the installation of the Linux toolchain into a separate script. As this has a few consequences, I would like to ask @alter-xp to review these changes:
>
>   • CSI-NN is no longer included in Dockerfile.ci_cortexm
>   • Both toolchains are now installed into /opt/riscv instead of /opt/csi-nn2/
>   • The script/download_toolchain.sh script is no longer used. This also means that the installed version of the toolchain has to be kept in sync with the version used by CSI-NN2 by updating the TVM CI script.
>   • The download URLs will hopefully be replaced with more stable ones once the toolchain is open-sourced

These changes don't have much effect on me. I was also recently planning to separate the installation script here, so I'm very glad you did it.

> • After moving the GCC download out of the CSI-NN2 script, it might make sense to do the same with the QEMU installation. What do you think @alter-xp?

I also intend to do this. Should we also do it in this PR?

@PhilippvK force-pushed the feature_docker_spike branch from de32558 to c64a400 on September 9, 2022 00:18
@PhilippvK (Contributor, Author):

> I also intend to do this. Should we also do it in this PR?

I added it to this PR. Please check if the changes are okay or if I am missing something @alter-xp.

@PhilippvK force-pushed the feature_docker_spike branch from c64a400 to f17e2b4 on September 9, 2022 12:16
@PhilippvK (Contributor, Author):

> I also intend to do this. Should we also do it in this PR?

@alter-xp done :)

@areusch merged commit f5517d4 into apache:main on Sep 15, 2022
@areusch (Contributor) commented Sep 15, 2022

Thanks @PhilippvK @alter-xp! We'll work on moving to newly-built Docker images so you can start using this.

@PhilippvK (Contributor, Author):

> The latest version of the GCC source code will be open-sourced in about two weeks, which may be helpful to you.

@alter-xp Is there any update on this?

@alter-xp (Contributor):

> > The latest version of the GCC source code will be open-sourced in about two weeks, which may be helpful to you.
>
> @alter-xp Is there any update on this?

Hi @PhilippvK. Because of some other factors, the release of the GCC source code is delayed; the latest timetable is before December. Once the code is ready, I will inform you here.

xinetzone pushed a commit to daobook/tvm that referenced this pull request on Nov 25, 2022: [Docker][CI][RISC-V] Build riscv-isa-sim (spike) in ci_riscv Docker image to enable RISC-V unit testing (apache#12534)

* Remove CSI-NN from ci_cortexm docker image

* [Docker] [RISC-V] Split up CSI-NN2 installation script into several files

[Docker] [RISC-V] move gcc toolchain installation out of csi-nn2 script

[Docker] [RISC-V] move qemu installation out of csi-nn2 script

* use updated version of qemu

* [Docker] [RISC-V] Install newlib (baremetal) gcc toolchain

* [Docker] [RISC-V] Install spike simulator

* [Docker] move initialization of timezone and DEBIAN_FRONTEND to ubuntu_install_core.sh script
@alter-xp (Contributor) commented Dec 5, 2022

Hi @PhilippvK, the new GCC source code has been released; you can find it here. I hope this will help you.
