
5. Troubleshooting, Testing and Debugging

fabnicol edited this page Sep 26, 2021 · 3 revisions

Troubleshooting

Foreword

Experience shows that many build failures result from:

  • insufficient allocated RAM (parameter mem, 8GB by default)
  • too many CPU cores allocated to the VM (parameter ncpus, a third of total CPU threads by default).

The two parameters are not independent. For a modern PC with 16GB of RAM, the defaults are a relatively safe choice. However, even a moderate increase in the number of threads allocated to the VM should come with a corresponding increase in allocated RAM: experience shows that 2 GB per extra thread is a minimum. A VM that runs short of RAM may crash without giving any clear reason. The occasional error message, if any, will report that the build was killed by a user interruption although it was not, which can be confusing.
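The rule of thumb above can be turned into a quick estimate. The sketch below is a hypothetical helper (min_mem_gb is not part of MKG). It assumes that MKG's default ncpus is a third of the host's threads rounded down (at least 1), and it reports a figure in GB: check the usage section for the unit the mem parameter actually expects.

```shell
#!/bin/sh
# Hypothetical helper, not part of MKG: minimum RAM (in GB) to allocate
# when raising ncpus, following "8 GB by default, plus at least 2 GB per
# extra thread". Assumes the default ncpus is total_threads / 3, rounded down.
min_mem_gb() {
  ncpus=$1   # threads you intend to give the VM
  total=$2   # total CPU threads on the host, e.g. "$(nproc)"
  default_ncpus=$(( total / 3 ))
  [ "$default_ncpus" -lt 1 ] && default_ncpus=1
  extra=$(( ncpus - default_ncpus ))
  # Fewer threads than the default never require less than the 8 GB default
  [ "$extra" -lt 0 ] && extra=0
  echo $(( 8 + 2 * extra ))
}
```

For example, on a host with 12 threads the assumed default is 4, so `min_mem_gb 6 12` suggests 8 + 2 × 2 = 12 GB.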

Another common cause of failure is changes in the portage tree, notably in USE flags or keywords (and, more occasionally, in licensing). Users should closely check error messages related to these changes. Such issues are usually easy to fix by adjusting the files ebuilds.list.use and ebuilds.list.accept_keywords at the root of the source directory (or within the VM). If you do so, please report the fix in the Issues section of this site.

Stalled machines

Machine stalling is another issue, sometimes caused by VirtualBox limitations or bugs. Most stalls occur at the end of the build procedure, sometimes while compacting (this is why compact is not the default setting).
If stalling happens after all jobs have actually completed, the fix is simple: power off the VM and let the script proceed:

# VBoxManage controlvm name_of_vm poweroff

Otherwise you will have to restart the VM using either the VirtualBox graphical interface or:

# VBoxManage controlvm name_of_vm reset

Then try the debugging techniques outlined below.
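To tell whether VirtualBox still considers the VM running before powering it off, you can query its state. In this sketch, vm_state and poweroff_if_running are hypothetical helpers, not part of MKG; they rely on `VBoxManage showvminfo <vm> --machinereadable`, which prints a `VMState="..."` line. A stalled VM still reports running, so the state alone does not tell you whether the jobs have completed: that judgment remains yours.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MKG.

# Extracts the state (running, poweroff, paused, ...) from the output of
# `VBoxManage showvminfo <vm> --machinereadable`, read on standard input.
vm_state() {
  sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# Powers the VM off only if VirtualBox still reports it as running.
# Call it only once you have checked that all jobs have completed.
poweroff_if_running() {
  vm=$1
  state=$(VBoxManage showvminfo "$vm" --machinereadable | vm_state)
  if [ "$state" = "running" ]; then
    VBoxManage controlvm "$vm" poweroff
  else
    echo "$vm is not running (state: ${state:-unknown})"
  fi
}
```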

Testing

Testing portage dependencies is advised if it has not been done recently.
By default, as of tag v2.3, MKG tests package dependencies before any build operation. The job will not proceed if the tests fail, which avoids most of the time lost to failing virtual machines.
This step takes about 3 to 10 minutes. It can be deactivated by adding test_emerge=false to the command line.
It is also possible to only test-run MKG, to make sure that the combination of USE, keyword and package specifications in the ebuilds.list.* files is consistent with the current portage tree. Once these sanity checks have run, MKG exits with a diagnosis of possible inconsistencies; it also stops when the tests pass. It is up to the user to fix possible portage tree issues using the messages displayed by the test run and the Gentoo Installation Handbook (notably the section When Portage is complaining).
To enable test runs, add the option test_only to the command line.
This is advised if MKG has not been run for more than a few days:

# ./mkg test_only

As of tag v2.3, MKG can also be test-run remotely on GitHub Actions, using the workflow YAML script under .github/workflows in the repository and triggering the workflow_dispatch event. Testing takes about half an hour and, if successful, ends as follows:

[Screenshot: MKG test GH Actions completion]

Note that you cannot test-run MKG with test_only and at the same time use the GitHub Actions preprocessed Gentoo install ISO file (see the usage section of this wiki). For this reason, test_only automatically sets use_mkg_workflow to false.
Below is a commented excerpt of the console output for this invocation:

| Output | Comment |
|---|---|
| Mar 11 22:35:18 fab: [INF] Fetching live CD... | Downloading the Gentoo minimal installation CD |
| Mar 11 22:35:30 fab: [INF] Fetching stage3 tarball... | Downloading the stage3 archive |
| Mar 11 22:35:36 fab: [INF] Testing whether packages will be emerged... | Launching the test |
| Mar 11 22:35:36 fab: [INF] Unmounting host filesystem | Cleaning up the mnt and mnt2 mount directories |
| Mar 11 22:35:42 fab: [INF] Moving stage3.tar.xz to mnt2/squashfs-root | Moving files to the chroot target directory |
| Mar 11 22:35:54 fab: [MSG] Using CFLAGS=-march=native -O2 | CFLAGS value in use |
| [INF] Merging portage tree... | Now testing for portage tree conflicts |
| * Generating 2 locales (this might take a while) with 12 jobs | Locale setting |
| app-misc/pax-utils: 1.2.9 1.2.6 none (...) | Some packages may occasionally be uninstalled |
| >>> Unmerging (1 of 3) app-misc/pax-utils-1.2.9... | ... |
| [MSG] Using profile=default/linux/amd64/17.1/desktop/plasma | Selected profile |
| [INF] Updating cmake... | cmake must be built first |
| >>> Installing (5 of 5) dev-util/cmake-3.18.5::gentoo | ... |
| [INF] Updating python. Please wait... | Working around a circular dependency involving Python |
| >>> Installing (1 of 1) dev-lang/python-3.9.1-r1::gentoo | ... |
| [INF] Testing update of world set... | Now updating the @world set |
| Calculating dependencies ..... done! | ... |
| [ebuild N ] dev-qt/qtchooser-66 USE="-test" | |
| (...) | List of ebuild updates... |
| [Possible message about conflicts] | If none, the @world update test passed. |
| [INF] Testing whether packages may be merged... | Now testing installation of the packages in ebuilds.list.complete or ebuilds.list.minimal |
| These are the packages that would be merged, in order: | |
| Calculating dependencies ... .... done! | |
| [ebuild N ] dev-qt/qtchooser-66 USE="-test" | |
| (...) | Long list of ebuild dependencies... |
| [Possible message about conflicting dependencies] | If none, this test passed. |
| Mar 11 22:43:59 fab: [MSG] Portage tests were passed. | All is fine |
| Mar 11 22:43:59 fab: [INF] Unmounting host filesystem | Cleaning up |
| Mar 11 22:44:02 fab: [INF] Removing mount directory | ... |
| Mar 11 22:44:02 fab: [MSG] Gentoo building process ended. | End of test: OK |
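When the console output of a test run has been saved (for instance with `./mkg test_only 2>&1 | tee log`), it can be scanned for the status lines listed in this excerpt. The helpers below are hypothetical, not part of MKG: they simply match three-character tags such as [INF] or [MSG] and the "Portage tests were passed." message, assuming these strings are unchanged in your MKG version.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MKG.

# Prints only MKG status lines, i.e. those carrying a tag such as [INF] or [MSG].
# Emerge lines like "[ebuild N ]" are not matched (more than 3 characters).
mkg_messages() {
  grep '\[...\]' "$1"
}

# Succeeds if the log contains the final message of a successful test run.
mkg_test_passed() {
  grep -q '\[MSG\] Portage tests were passed' "$1"
}
```

For example: `mkg_test_passed log && echo "Portage tests passed"`.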

Debugging

Quick check

When building has stopped, a quick check procedure should first be used to ensure that the building process exited with code 0:

# ./mkg exitcode vm=... no_run

If the exit code reported by the command is non-zero, proceed as indicated below, choosing between VM debugging (you need to know how to handle VirtualBox) and shared root debugging (easier, but you will need a decent installation of qemu on your platform).

You can combine exitcode with any Gentoo OS building command line (but not with from_device or from_iso). In this case, the exit code of the VM process is printed when MKG exits. Conversely, add no_run to exitcode when you do not want to start a new build.

Debugging in the VM

Some users may want to debug the running VM themselves. This is only possible if MKG is not run in silent mode (that is, without the command line option gui=false). You may follow these steps:

  • stop the VM if still running.

  • check the parameters and launch it again.

  • select your keyboard locale about 30 seconds after boot when requested.

  • enter the appropriate replies in the network settings menus (usually OK if you are connected to a wired network)

  • interrupt with Ctrl + C within 5 seconds after the last menu. Run ifconfig to check your network connection.

  • mount /dev/sda4 to /mnt/gentoo and chroot into it:

    # mount /dev/sda4 /mnt/gentoo && cd /mnt/gentoo
    # for i in proc sys dev dev/pts run; do mount -B /$i /mnt/gentoo/$i; done
    # chroot /mnt/gentoo

  • proceed with your debugging session. Preferably use nano as an editor (I've had occasional issues with vim)

  • at the end of the session, run exit and (preferably) unmount the mount points before shutting down the VM.
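The mount, chroot and cleanup steps above can be wrapped in two small functions. This is a hypothetical sketch, not part of MKG, to be run as root in the live system; like the steps above, it assumes the Gentoo root partition is /dev/sda4. It unmounts the bind mounts in reverse order so that dev/pts is released before dev. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MKG. DRY_RUN=1 prints instead of running.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Mounts the Gentoo root, bind-mounts the pseudo filesystems, then chroots.
enter_chroot() {
  root=${1:-/mnt/gentoo}
  run mount /dev/sda4 "$root" || return
  for i in proc sys dev dev/pts run; do
    run mount -B "/$i" "$root/$i" || return
  done
  run chroot "$root"
}

# To be called after `exit`: unmounts in the reverse order of mounting.
leave_chroot() {
  root=${1:-/mnt/gentoo}
  for i in run dev/pts dev sys proc; do
    run umount "$root/$i"
  done
  run umount "$root"
}
```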

Shared root debugging

You may prefer the convenience of debugging in a shared host directory, on which the guest virtual disk is mounted.
Be aware that this may raise security issues, which are aggravated if the VM is running and/or write permissions are granted (normally the connection should crash in this case).
Add the following options to the command line:

  • # ./mkg vm=your_vm_name share_root="r" ["w" for write permissions, "r" for read-only]

Use this to allow mounting of the virtual machine VDI disk root to a host directory, with write permissions if "w" is specified and read-only if "r" is specified.

  • # ./mkg vm=your_vm_name share_root=... shared_dir=/path/to/host/shared/dir

Use this to replace the default mount point /vdi with a custom one.
It is advisable to use the above options once the virtual machine has been stopped.
You may however also use them on launch (with share_root=r). In this case, a 15-minute delay is enforced, which covers the time the machine needs to launch and partition the virtual VDI disk, plus a safety margin. This delay is not a bug: the exact time of partitioning is not known in advance and depends on hardware.
While running, the following logs are generated by the building process:

  • While running mkvm_chroot.sh:adjust_environment()

    • partition_log: partitioning operations
    • emerge.build: basic tools, updating @world, keyboard, timezone, locale
  • While running mkvm_chroot.sh:build_kernel()

    • /kernel.log: kernel source code install
    • /usr/src/linux/kernel.log: kernel build
  • While running mkvm_chroot.sh:install_software()

    • log_install_software.log: installation of software listed in ebuilds.list.minimal or ebuilds.list.complete
    • Rlibs.log: installation of R libraries for the full distribution install (empty for the minimal one).
  • While running mkvm_chroot.sh:global_config()

    • sddm.log: SDDM configuration for the Plasma desktop master branch.
    • grub.log: grub install log.
    • useradd.log: created only if the user could not be added.
  • While running mkvm_chroot.sh:finalize()

    • log_uninstall.log: uninstalled software
  • res.log: exit code of mkvm_chroot.sh.
    The following bits of the exit value indicate where errors occurred:

    • odd number (code & 1 != 0): issue with adjust_environment()
    • code & 2 != 0: issue with build_kernel()
    • code & 4 != 0: issue with install_software()
    • code & 8 != 0: issue with global_config()
    • code & 16 != 0: issue with finalize()
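The bit tests above can be applied by hand, or with a small helper such as the hypothetical decode_res below (not part of MKG). It assumes res.log contains only the numeric exit code, for instance as read from the shared directory (default /vdi, so possibly `decode_res "$(cat /vdi/res.log)"`; the exact location of res.log is an assumption).

```shell
#!/bin/sh
# Hypothetical helper, not part of MKG: lists the mkvm_chroot.sh functions
# flagged by the bits of the exit code stored in res.log.
decode_res() {
  code=$1
  case $code in
    ''|*[!0-9]*) echo "not a numeric exit code: $code" >&2; return 2 ;;
  esac
  [ "$code" -eq 0 ] && { echo "no error"; return 0; }
  [ $(( code & 1 ))  -ne 0 ] && echo "adjust_environment()"
  [ $(( code & 2 ))  -ne 0 ] && echo "build_kernel()"
  [ $(( code & 4 ))  -ne 0 ] && echo "install_software()"
  [ $(( code & 8 ))  -ne 0 ] && echo "global_config()"
  [ $(( code & 16 )) -ne 0 ] && echo "finalize()"
  return 0
}
```

For example, a code of 6 (2 + 4) points to both build_kernel() and install_software().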

Filing an issue

Before filing an issue, please check that you are using the latest GitHub source code, unless repository changes since your version are clearly irrelevant to your case.
For issue reports to be useful for debugging, please follow the procedure indicated below as closely as possible:

  • Minimally:
    • your command line
    • the date and version of the stage3 archive, the Gentoo install ISO and the Clonezilla ISO files
    • versioning references (tag or commit, date)
    • the output of:

      # grep \\[...\\] /var/log/syslog > log

      if syslogd was previously installed (which is often the case).
      Alternatively, rerun your command with tee:

      # [your mkg command] 2>&1 | tee log

      and attach the file log to your post.

  • If one of your virtual machines fails, first check that guestfish is installed.

Run your job again, adding compact=false debug_mode verbose to the command line and without ISO output.
Once the new job has ended, plug in an empty or freely erasable USB key or external disk (minimum size: 55 GB).
Check the device name (say sdX) of this mass storage device using fdisk -l. Do not mount it.
Note: all data on this mass storage device will be erased and replaced with the downloaded virtual machine disk.
Now run:

# guestfish --progress-bars --ro -a [your VDI disk] run : download /dev/sda /dev/sdX && sync

This will take between 3 and 5 minutes.
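Because the guestfish command overwrites the whole target device, it can be worth checking that none of its partitions is mounted before running it (a mounted partition may belong to the host's own disk). The helper below is a hypothetical sketch, not part of MKG: it reads the kernel mount table, /proc/mounts by default, and matches any entry whose source starts with the device path, so /dev/sdb would also match a device named /dev/sdba.

```shell
#!/bin/sh
# Hypothetical safety check, not part of MKG.
# Succeeds (exit 0) if any partition of the given device is mounted.
device_is_mounted() {
  dev=$1                      # e.g. /dev/sdX
  mounts=${2:-/proc/mounts}   # mount table to inspect
  awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$mounts"
}
```

For example: `if device_is_mounted /dev/sdX; then echo "unmount /dev/sdX first"; fi`.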

Mount the device partitions, for example to /mnt and /mnt2:

# mount /dev/sdX4 /mnt && mount /dev/sdX2 /mnt2 && cd /mnt

You will see a number of files at the root of /mnt. Report the full list of files:

# ls -al * > files.log

Create a tarball of the log and ebuild files, and of the syslogs (if any):

# tar cJvf debug.tar.xz *log *build* [var/log/syslog*]

Now cd to /mnt2 and report the state of kernel builds:

# cd /mnt2 && ls -al * > boot.log

Attach files.log, boot.log and debug.tar.xz to your post.
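The collection steps above can be grouped in one function. collect_debug_files is hypothetical, not part of MKG; it assumes the root partition is mounted read-write at /mnt and the boot partition at /mnt2, as in the mount command above, and it adds var/log/syslog* to the tarball only if such files exist.

```shell
#!/bin/sh
# Hypothetical helper, not part of MKG: produces files.log and debug.tar.xz
# in the root partition and boot.log in the boot partition.
collect_debug_files() {
  root=${1:-/mnt}
  boot=${2:-/mnt2}
  (
    cd "$root" || exit 1
    ls -al * > files.log
    # Log and ebuild files; tar fails if a pattern matches nothing
    set -- *log *build*
    # Only include syslogs when they are present
    [ -n "$(ls var/log/syslog* 2>/dev/null)" ] && set -- "$@" var/log/syslog*
    tar cJf debug.tar.xz "$@"
  ) || return 1
  ( cd "$boot" && ls -al * > boot.log ) || return 1
}
```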