MDBF-793 - Retire MSAN clang-15 builder, upgrade to clang-20 #562

Open: wants to merge 109 commits into base: dev

RazvanLiviuVarzaru
Collaborator

@RazvanLiviuVarzaru RazvanLiviuVarzaru commented Sep 12, 2024

Replace MSAN CLANG-15 builder with CLANG 19 on Debian 12.

@RazvanLiviuVarzaru RazvanLiviuVarzaru marked this pull request as ready for review September 12, 2024 12:46
@RazvanLiviuVarzaru
Collaborator Author

@grooverdan
I think this closes #482.

RazvanLiviuVarzaru and others added 23 commits November 28, 2024 15:25
- replace debian 11 - clang 15 with debian 12 - clang 19

For msan.Dockerfile
 - starting from clang 18 libunwind is added to ENABLE_RUNTIMES
 - make the dockerfile work for both bookworm / bullseye editions (LLVM repository)
 - doc Makefile is not present in bookworm for gmp
 - bookworm has newer aclocal / automake -> cracklib2 fix
 - clang 19 needs libclang-19-dev and libllvmlibc-19-dev installed
This is to aid consumption by developers
and have a ready available MSAN container.

Technically we don't need to purge from the image
as it's a build stage, but we do to keep the size a bit smaller.
Also use update-alternatives to provide clang/clang++ links.

With a 6 monthly new release cycle of clang. Keeping it such
that a rebuild of image is sufficient to re-deploy rather than
a master restart will facilitate more frequent updates and
a fixed builder url.
Requested by Marko to make this a more realistic test
of the codebase delivered to users.

No -DWITH_DBUG_TRACE=OFF exists in RelWithDebInfo mode
make clang/clang++ alternates as soon as installed
Unit tests run all ok.

WITH_SAFE_MALLOC=OFF is default in RelWithDebInfo mode.
Also convenient CFLAGS and MSAN environment
variables.
to non-instrumented things.

Otherwise these sorts of error in testing:

 LD_LIBRARY_PATH=/msan-libs/ /usr/bin/ctest
/usr/bin/ctest: symbol lookup error: /msan-libs/libgmp.so.10: undefined symbol: __msan_va_arg_overflow_size_tls
And the --force is for debian:11 compat where the files exist in the tarball.
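
The symbol lookup error in the commit message above appears when a non-instrumented binary such as ctest picks up the instrumented libraries via LD_LIBRARY_PATH. A minimal sketch of keeping that variable scoped to instrumented processes only, assuming a hypothetical helper named child_env (this is not code from the PR; only the /msan-libs path comes from the commit message):

```python
import os

# Path taken from the commit message above.
MSAN_LIBS = "/msan-libs"


def child_env(instrumented, base=None):
    """Build the environment for a child process.

    Hypothetical helper: only instrumented binaries get the MSAN
    libraries on LD_LIBRARY_PATH; everything else keeps the base
    environment untouched.
    """
    env = dict(os.environ if base is None else base)
    if instrumented:
        existing = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = (
            MSAN_LIBS if not existing else f"{MSAN_LIBS}:{existing}"
        )
    return env
```

The resulting dictionary could then be passed as env= to subprocess.run for the instrumented test binary, while tools like ctest run with the unmodified environment.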
@grooverdan grooverdan force-pushed the feature/msan-clang-19 branch from 73cd2d3 to 6703791 Compare February 26, 2025 05:01
Collaborator Author

@RazvanLiviuVarzaru RazvanLiviuVarzaru Feb 27, 2025

If I understand correctly, this is a Dockerfile that can be built alone, right?
It took me a while to figure out that you are actually using COPY --from=rr to transfer the binary to the MSAN container image.

Maybe it's useful to say in the header of this Dockerfile:

  • that this behaves like a stage in a multi-stage Dockerfile after the contents of the base Dockerfile (debian.Dockerfile) are added on top of it (order is important)
  • that it can be built alone if needed
  • that in the multi-stage setup it acts like a .fragment.Dockerfile, and sharing the RR binary is achieved using the COPY --from=stage syntax
  • that in the same multi-stage setup FROM $BASE_IMAGE matters; I guess it's important that building RR and building MSAN happen on the same OS/version

Member

Yes, it can be built alone.

Multi-stage doesn't explicitly add on top. It just provides the artifacts for COPY --from=stage.

The requirements for the later stage are a bit less. Hopefully I've covered all of these in the updated header of the file.

util.BuilderConfig(
name="amd64-debug-msan-clang",
workernames=workers["x64-bbw-docker-debug-msan-clang"],
tags=["Debian", "clang", "msan", "debug"],
Collaborator Author

@RazvanLiviuVarzaru RazvanLiviuVarzaru Feb 27, 2025

If the builder name doesn't have the version of clang in the name and if the tags do not give the user a clue, then the user's last resort is to study the logs.

It's useful to have an easy to spot place for telling the clang-version even if that means we will update the configuration once in a while.

In Grid view, builder name is the easiest indicator.

Collaborator Author

Suggestion:

  • amd64-msan-clang-20
  • amd64-msan-clang-20-debug

What are the benefits?

  • debug word at the end as in other builders, also draws attention
  • builder history wise we keep apples and pears in separate baskets e.g. clang-20 full run history in amd64-msan-clang-20, future clang-21 history in amd64-msan-clang-21
  • transitioning: a future amd64-msan-clang-21 can run in parallel with amd64-msan-clang-20 until it is stable, and we can then discontinue the latter smoothly (just a protected branches switch)
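
To illustrate the naming scheme, here is a minimal sketch that derives both builder names from the clang version, so a clang-21 pair can be added next to the clang-20 pair. The helper msan_builder_names is hypothetical and not part of the buildbot configuration:

```python
def msan_builder_names(clang_version, arch="amd64"):
    """Return the regular and debug MSAN builder names for one clang version.

    Hypothetical helper; the name pattern follows the suggestion above.
    """
    base = f"{arch}-msan-clang-{clang_version}"
    # The debug builder keeps "debug" as the trailing word.
    return [base, f"{base}-debug"]


# During a transition both versions could be defined side by side.
builders = msan_builder_names(20) + msan_builder_names(21)
```

Keeping the version in the name means each clang release keeps its own builder history, at the cost of a configuration update per upgrade.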

Member

The convention build type/compiler/extra tests wasn't fully clear.

Ok, I accept these reasons.

Reminder: both Marko and I are almost of the opinion that msan-debug might be producing more timeouts for a minimal addition of coverage (on non-released work). The issue noted on Zulip means it currently doesn't even bootstrap on debug, so look at dropping this if it's too much trouble; the debug builder obviously won't be in protected branches.

Member

Given that an update to clang-21 may still occur within the support cycle of debian-12, I've updated the image tag to include -20.

- image: debian:11
platforms: linux/amd64
branch: 10.11
tag: debian11-msan-clang-16
Collaborator Author

@RazvanLiviuVarzaru RazvanLiviuVarzaru Feb 27, 2025

I think you should highlight the approach to transitioning. I know it was discussed briefly, but please confirm whether my understanding is right.

  1. you are discontinuing the build of debian-11-msan-clang-16 container image because it is incompatible with the patches in msan.fragment.Dockerfile
  2. you don't touch the msan-clang-16 builder definition in master-docker-nonstandard because we want to keep it alive while we validate the new msan-clang-20 builder

If 1 & 2 are true, I think it's worth having a commit state that clearly.

Maybe you already mentioned this, but it is very hard for me to go through 101 commits. On that note, if you can, please make the commit stack smaller: group the commits and make them highlight the important decisions of this patch. Otherwise it will be very hard for anyone to follow them.

4-5 commits if possible, would be perfect.

Member

1 & 2 are correct. I've included these and the other transition plan details in the MDBF.

I know the 101 commits (now more) are as horrendous as the development process was. It's not something I aimed to do or will aim to do again, and I concur with 4-5 being ideal.

I have tried to rebase this down and have failed so far. I'd like to think these commits are well documented and change a limited set of files minimally, so as to be unobtrusive. I'd like to think the terse commit messages still convey enough meaning to be searchable and useful for a direct read via git blame, without a commit-by-commit bisectable review, consistent with most recent commits.

So can I please ask for forgiveness here and accept the large number of commits as is?

Collaborator Author

Sorry Daniel, if you want, I can help you group them.

My general idea is:

1 commit for msan.Dockerfile and .yml workflow

  • this could be an introductory commit where you first highlight the transition to clang-20 and debian-12

1 commit for the instrumented libs shell script:

  • this is a follow up of the first commit, the flow is natural and you basically show the implementation of RUN ./msan.instrumentedlibs.sh
  • I saw many commits that patch various instrumented libs in this file. These can be grouped under a single commit. You already gave precious details for each library in the file itself, so you can keep the short version in the commit message.

1 commit for all factory changes:

  • there are many commits only affecting the factory
  • in this commit you can also include changes to constants.py
  • this is also a natural transition where you can highlight how the factory changes are coupled with msan.Dockerfile changes
  • here is also a place to explain the Debug / RelwithDebInfo builders

1 commit for RR

  • highlighting that this is a "human helper" and a few words about its implementation

Some commits are "fix that, and other", "revert", "typo" and so on. So the changes in these can be easily grouped under the above categories.

I apologize again for insisting but I hope you understand my reasoning.

canStartBuild=canStartBuild,
properties={"c_compiler": "clang", "cxx_compiler": "clang++", "build_type": "RelWithDebInfo"},
locks=getLocks,
factory=f_msan_build,
Collaborator Author

@RazvanLiviuVarzaru RazvanLiviuVarzaru Feb 27, 2025

Take into account that the f_msan_build factory in master-docker-nonstandard-2 is no longer equal to the f_msan_build factory in master-docker-nonstandard (where the clang-16 builder is).

We should take extra care when transitioning and mention the plan. Things to note:

  • a protected branches builder should have backup workers. It's OK that these 2 builders currently only have apexis, since they are not protected. MSAN Clang-16, for example, has hz-bbw1 / 4 / 5 as worker nodes.
  • at transition time, the f_msan_build factory that clang-16 uses is no longer valid (right?), so we can move f_msan_build from your patch into common_factories.py and, of course, adjust the worker pool of these 2 builders when they become protected.

Maybe @cvicentiu has better ideas here?

Member

Looks reasonable; however, a transition plan is probably better placed in the MDBF task.

image: ${{ matrix.image }}
platforms: ${{ matrix.platforms }}
tag: ${{ matrix.tag }}
branch: ${{ matrix.branch }}
clang_version: ${{ matrix.clang_version }}
nogalera: ${{ matrix.nogalera }}
files:
Collaborator Author

@RazvanLiviuVarzaru RazvanLiviuVarzaru Feb 27, 2025

I don't think files: is needed; it adds extra complexity.
In bintars, for example, we just do COPY ci_build_images/scripts/* /scripts/ without any files: specification in the workflow.

Collaborator Author

And as @cvicentiu mentioned, for future us, please add as many detailed comments as you can to the library instrumentation script.

Member

This is built with uses: bbw_build_container_template.yml, so the additional file complexity is needed to fit in with what the template expects.

I did include as much information as I know in the library instrumentation script.

# Marker to make it possible to build a dev msan builder
# from the nightly clang versions as they are in a differently
# name repo
ENV CLANG_DEV_VERSION=21
Collaborator Author

@RazvanLiviuVarzaru RazvanLiviuVarzaru Feb 27, 2025

Is this just for local testing?
So when we move to clang 21, we just set CLANG_VERSION=21 and set CLANG_DEV_VERSION to whatever is currently in the nightly builds repository?

If yes, it's worth adding a comment on how we handle upgrades.

Member

Local testing, or if we ever want to use a dev version for BB testing.

CLANG_VERSION is the version you want. CLANG_DEV_VERSION is the version listed at https://apt.llvm.org/ under the "development" branch (which is in a state of flux, as clang-20 is at the end of its rc cycle).

file comment updated.
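
As an illustration of why the marker is needed, here is a minimal sketch of picking the apt suite name from the two versions. It assumes the naming pattern listed on apt.llvm.org, where the development suite carries no version suffix; the function llvm_apt_suite is hypothetical and is not how msan.Dockerfile is necessarily written:

```python
def llvm_apt_suite(codename, clang_version, clang_dev_version):
    """Pick the apt.llvm.org suite for the requested clang version.

    Assumption: released versions live in a suite suffixed with the
    version number, while the development version lives in the
    unsuffixed suite for the same distribution codename.
    """
    if clang_version == clang_dev_version:
        return f"llvm-toolchain-{codename}"
    return f"llvm-toolchain-{codename}-{clang_version}"


# With CLANG_DEV_VERSION=21, clang 20 resolves to the versioned suite.
suite = llvm_apt_suite("bookworm", 20, 21)
```

Under this reading, an upgrade means bumping CLANG_VERSION and, once the nightly branch moves on, bumping CLANG_DEV_VERSION to match what apt.llvm.org lists as development.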

Adjust to the resources on apexis
RazvanLiviuVarzaru added a commit to RazvanLiviuVarzaru/buildbot-r that referenced this pull request Mar 3, 2025
Part of MDBF-993.
As per Vicențiu Ciorbaru's request:

last-N-failed only reports its status (it is not in branch protection),
and Vicențiu Ciorbaru often reported sporadic failures for it.

debian-11-msan will be replaced by a clang-20 builder in MariaDB#562,
so there is no need to update GITHUB_STATUS_BUILDERS for it in this Pull Request.
pkg-config \
python3-pexpect \
unzip \
zlib1g-dev \
Collaborator

This list of packages, if it is exactly the same as the one installed, could be kept in a variable; this would ease maintenance.

6 participants