5. Troubleshooting, Testing and Debugging
Experience shows that many build failures result from:
- insufficient allocated RAM (parameter mem, 8GB by default)
- too many CPU cores allocated to the VM (parameter ncpus, a third of total CPU threads by default).
The two parameters are not independent, and the defaults are relatively safe for a modern PC with 16GB of RAM. However, even a moderate increase in the number of threads allocated to the VM should come with an associated increase of allocated RAM (from experience, 2 GB per extra thread is a minimal requirement). A VM that runs short of RAM will crash without indicating adequate reasons. An occasional error message (if any) will refer to the build being killed by user interruption, while it was not, which may be confusing.
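As a rough illustration of this rule of thumb, the shell sketch below estimates a minimum RAM allocation for a chosen thread count. The function name suggest_mem_gb and its arguments are hypothetical and not part of MKG; the only figures taken from the text are the 8 GB default, the default of a third of the host threads, and 2 GB per extra thread.

```shell
#!/bin/bash
# Hypothetical helper (not part of MKG): estimate the minimum RAM, in GB,
# to allocate to the VM for a chosen number of threads.
# Assumptions from the text: 8 GB suits the default thread count
# (a third of host threads) and each extra thread needs at least 2 GB more.
suggest_mem_gb() {
    local host_threads=$1 ncpus=$2
    local default_ncpus=$(( host_threads / 3 ))
    if (( default_ncpus < 1 )); then
        default_ncpus=1
    fi
    local extra=$(( ncpus - default_ncpus ))
    if (( extra < 0 )); then
        extra=0
    fi
    echo $(( 8 + 2 * extra ))
}

# Example: a 12-thread host defaults to 4 threads; asking for 6 adds 2 threads.
suggest_mem_gb 12 6
```

On a real host, the first argument could be taken from nproc. The result is only a lower bound to compare against the value you pass as mem.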
Another common cause of failure arises from changes in the portage tree, notably in the set of use or keyword parameters (more occasionally, licensing changes). Users are invited to closely check error messages related to these feature changes. It is a fairly easy fix to adjust files ebuilds.list.use and ebuilds.list.accept_keywords at the root of the source directory (or within the VM) to fix these issues. In this event, please report the fix in the Issues section of this site.
Machine stalling is another issue that may sometimes arise from VirtualBox software limitations or bugs. Most occurrences of stalling happen at the end of the building procedure, sometimes on compacting (this is why compact is not the default value).
If stalling happens when all jobs have actually been completed, the fix is simple: stop the VM and let the script proceed:
# VBoxManage controlvm name_of_vm poweroff
Otherwise you will have to restart the VM using either the VirtualBox graphical interface or:
# VBoxManage controlvm name_of_vm reset
Then have a go at the debugging techniques outlined below.
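Before choosing between poweroff and reset, it can help to confirm the state that VirtualBox reports for the VM. The sketch below extracts the VMState field from the output of VBoxManage showvminfo with the --machinereadable option, read on standard input. The function name vm_state is hypothetical, and the sample input is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: extract the VM state from the machine-readable
# output of `VBoxManage showvminfo <vm> --machinereadable` read on stdin.
vm_state() {
    sed -n 's/^VMState="\([^"]*\)"$/\1/p'
}

# Illustrative sample input, not the real output of a given VM:
printf 'name="name_of_vm"\nVMState="running"\n' | vm_state
```

In practice the input would come from a pipe such as VBoxManage showvminfo name_of_vm --machinereadable | vm_state.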
Testing portage dependencies is advised if it has not been performed recently.
By default, as of tag v2.3, MKG tests package dependencies as a prior step to all building operations.
The job will not proceed if tests fail for some reason, avoiding most of the causes of time loss due to failing virtual machines.
This step takes about 3 to 10 minutes. It can be deactivated by adding test_emerge=false to the command line.
It is also possible to just test-run MKG to make sure that the combination of use, keyword and package specifications in files ebuilds.list.* is coherent with the portage tree of the day. Once these sanity checks are performed, MKG exits with a diagnosis of possible inconsistencies; it also stops if tests are passed.
It is up to the user to fix possible portage tree issues using the messages displayed by the test run and the Gentoo Installation handbook (notably section When Portage is complaining).
To enable test runs, add option test_only to the command line.
It is advised to do so if MKG has not been run for more than a few days:
# ./mkg test_only
As of tag v2.3, it is possible to remotely test-run MKG this way on GitHub Actions, using the workflow YAML script under .github/workflows in the repository, by triggering the workflow_dispatch event. Testing takes about half an hour and, if successful, ends as follows:
Note that you cannot test-run MKG using test_only and at the same time use the GitHub Actions preprocessed Gentoo install ISO file (see usage section of this wiki). So test_only sets use_mkg_workflow to false automatically.
Below is a commented excerpt of the console output for this invocation:
Output | Comment |
---|---|
Mar 11 22:35:18 fab: [INF] Fetching live CD... | Downloading the Gentoo minimal installation CD |
Mar 11 22:35:30 fab: [INF] Fetching stage3 tarball... | Downloading stage3 archive |
Mar 11 22:35:36 fab: [INF] Testing whether packages will be emerged... | Launching the test |
Mar 11 22:35:36 fab: [INF] Unmounting host filesystem | Cleaning up mnt, mnt2 mount directories |
Mar 11 22:35:42 fab: [INF] Moving stage3.tar.xz to mnt2/squashfs-root | Moving files to the chroot target directory |
Mar 11 22:35:54 fab: [MSG] Using CFLAGS=-march=native -O2 | Mentioning CFLAGS value |
[INF] Merging portage tree... | Now testing for portage tree conflicts |
* Generating 2 locales (this might take a while) with 12 jobs | Locale setting |
app-misc/pax-utils: 1.2.9 1.2.6 none (...) | Some packages may occasionally be uninstalled |
>>> Unmerging (1 of 3) app-misc/pax-utils-1.2.9... | ... |
[MSG] Using profile=default/linux/amd64/17.1/desktop/plasma | Selected profile |
[INF] Updating cmake... | cmake must first be built |
>>> Installing (5 of 5) dev-util/cmake-3.18.5::gentoo | ... |
[INF] Updating python. Please wait... | Working around a circular dependency involving Python |
>>> Installing (1 of 1) dev-lang/python-3.9.1-r1::gentoo | ... |
[INF] Testing update of world set... | Now on to updating @world set |
Calculating dependencies ..... done! | ... |
[ebuild N ] dev-qt/qtchooser-66 USE="-test" | |
(...) list of ebuild updates | ... |
[Possible message about conflicts] | If none, @world update was passed. |
[INF] Testing whether packages may be merged... | Now on to installing packages in file ebuilds.list.complete or minimal |
These are the packages that would be merged, in order: | |
Calculating dependencies ... .... done! | |
[ebuild N ] dev-qt/qtchooser-66 USE="-test" | |
(...) long list of ebuild dependencies | ... |
[Possible message about conflicting dependencies] | If none, this test was passed. |
Mar 11 22:43:59 fab: [MSG] Portage tests were passed. | All is fine |
Mar 11 22:43:59 fab: [INF] Unmounting host filesystem | Cleaning up |
Mar 11 22:44:02 fab: [INF] Removing mount directory | ... |
Mar 11 22:44:02 fab: [MSG] Gentoo building process ended. | End of test: OK. |
When building has stopped, a quick check procedure should first be used to ensure that the building process exited with code 0:
# ./mkg exitcode vm=... no_run
If the exit code indicated by the output message of the command is non-zero, then proceed as indicated below, choosing between VM debugging (you need to know how to handle VirtualBox) or shared root debugging (easier, but you will need a decent installation of qemu on your platform).
You can combine exitcode with any Gentoo OS building command line (so not with from_device or from_iso). In this case, the exit code of the VM process will be printed on exiting MKG. Use no_run in combination with exitcode in the reverse case, when you do not want to start a building process again.
Some users may want to debug the running VM themselves. This is only possible if MKG is not run in silent mode (so without command line option gui=false). You may follow these steps:
- stop the VM if still running.
- check the parameters and launch it again.
- select your keyboard locale about 30 seconds after boot when requested.
- enter the appropriate replies in the network settings menus (usually OK if you are connected to a wired network).
- interrupt with Ctrl + C within 5 seconds after the last menu. Run ifconfig to check your network connection.
- mount /dev/sda4 to /mnt/gentoo and chroot into it:
# mount /dev/sda4 /mnt/gentoo && cd /mnt/gentoo
# for i in proc sys dev dev/pts run; do mount -B /$i /mnt/gentoo/$i; done
# chroot /mnt/gentoo
- proceed with your debugging session. Preferably use nano as an editor (I've had occasional issues with vim).
- at the end of the session, run exit and (preferably) unmount the mount points before shutting down the VM.
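The bind mounts created in the mount step are best released in reverse order, since dev/pts sits below dev. The following is a minimal dry-run sketch that only prints the umount commands; the function name print_umounts is hypothetical.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the umount commands that undo the
# bind mounts set up for the chroot, innermost mount point first.
print_umounts() {
    local root=$1 points=(proc sys dev dev/pts run) i
    for (( i = ${#points[@]} - 1; i >= 0; i-- )); do
        echo "umount $root/${points[i]}"
    done
    # Finally release the root partition itself.
    echo "umount $root"
}

print_umounts /mnt/gentoo
```

The printed commands are meant to be run from the live CD shell after leaving the chroot and changing out of /mnt/gentoo, otherwise the last umount will report that the target is busy.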
You may prefer the convenience of debugging under a shared host directory, to which the guest virtual disk will be mounted.
Be aware that this may cause security issues, which are aggravated if the VM is running and/or write permissions are granted (normally the connection should crash in this case).
Add the following options to command line:
# ./mkg vm=your_vm_name share_root="r" ["w" for write permissions, "r" for read-only]
Use this to allow mounting of the virtual machine VDI disk root to a host directory, with write permissions if "w" is specified and read-only if "r" is specified.
# ./mkg vm=your_vm_name share_root=... shared_dir=/path/to/host/shared/dir
Use this to replace the default /vdi value with a custom mount point.
It is advisable to use the above options when the virtual machine has been stopped.
You may however use them on launch too (use share_root=r in this case). A 15 minute delay will then be enforced, which is the time required by the machine to launch and partition the virtual VDI disk (plus a safety margin). Do not consider this delay to be a bug: the exact time of partitioning is not known and depends on hardware.
While running, the following logs are generated by the building process:
- While running mkvm_chroot.sh:adjust_environment()
  - partition_log: partitioning operations
  - emerge.build: basic tools, updating @world, keyboard, timezone, locale
- While running mkvm_chroot.sh:build_kernel()
  - /kernel.log: kernel source code install
  - /usr/src/linux/kernel.log: kernel build
- While running mkvm_chroot.sh:install_software()
  - log_install_software.log: installation of software listed in ebuilds.list.minimal or ebuilds.list.complete
  - Rlibs.log: installation of R libraries for the full distribution install (empty for the minimal one).
- While running mkvm_chroot.sh:global_config()
  - sddm.log: SDDM configuration for the Plasma desktop master branch.
  - grub.log: grub install log.
  - useradd.log: only if adding user was not possible.
- While running mkvm_chroot.sh:finalize()
  - log_uninstall.log: uninstalled software
- res.log: exit code of mkvm_chroot.sh.
  The following exit values may indicate where errors appeared:
  - odd number: issue with adjust_environment()
  - (code & 2) != 0: issue with build_kernel()
  - (code & 4) != 0: issue with install_software()
  - (code & 8) != 0: issue with global_config()
  - (code & 16) != 0: issue with finalize()
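The exit value can be read as a bit field, one bit per stage of mkvm_chroot.sh. The sketch below decodes it, assuming each stage sets exactly one of the bits 1, 2, 4, 8 and 16 as listed above; the function name decode_exit_code is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list the mkvm_chroot.sh stages flagged in an exit
# code, assuming each stage sets one bit (1, 2, 4, 8, 16).
decode_exit_code() {
    local code=$1 bit=1 stage
    for stage in adjust_environment build_kernel install_software \
                 global_config finalize; do
        if (( code & bit )); then
            echo "$stage"
        fi
        bit=$(( bit << 1 ))
    done
}

# Example: 5 = 1 + 4, so two stages are reported.
decode_exit_code 5
```

If res.log contains only the numeric code (an assumption, not checked here), its content can be passed directly as the argument.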
Before filing an issue, please check that you are using the latest Github source code, unless repository changes since your version are clearly irrelevant to your case.
For issue reports to be useful for debugging, please follow the procedure indicated below as closely as possible:
- Minimally:
  - your command line
  - the date and version of the stage3 archive, the Gentoo install ISO and the CloneZilla ISO files
  - versioning references (tag or commit, date)
  - the output of:
# grep \\[...\\] /var/log/syslog > log
    if syslogd was previously installed (which is often the case).
    Alternatively, rerun with tee:
# [your mkg command] 2>&1 | tee log
    and attach log to your post.
- If one of your virtual machines fails, first check that you have installed guestfish.
Please run your job again with compact=false debug_mode verbose added to the command line, without an ISO output.
Once the new job has ended, plug in an empty or freely erasable USB key or external disk (minimum size: 55 GB).
Check the device label (say sdX) of this mass storage device using fdisk -l. Do not mount it.
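Since the download overwrites the whole device, a quick size check beforehand may save a failed transfer. The sketch below tests a size in bytes against the 55 GB minimum; the function name device_big_enough is hypothetical, and 1 GB is assumed to mean 10^9 bytes.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeed only if a device size in bytes
# (e.g. as printed by `lsblk -bdno SIZE /dev/sdX`) reaches the minimum.
# Assumption: 1 GB is taken as 10^9 bytes; the default minimum is 55 GB.
device_big_enough() {
    local size_bytes=$1 min_gb=${2:-55}
    (( size_bytes >= min_gb * 1000000000 ))
}

# Example with an illustrative size of 64 GB:
if device_big_enough 64000000000; then
    echo "device is large enough"
else
    echo "device is too small"
fi
```

The size argument would normally come from lsblk run against the device you identified with fdisk -l.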
Now run:
# guestfish --progress-bars --ro -a [your VDI disk] run : download /dev/sda /dev/sdX && sync
This will take between 3 and 5 minutes.
Note: All data on your mass storage device will be erased and replaced with the downloaded virtual machine.
Mount the device partitions, for example to /mnt and /mnt2:
# mount /dev/sdX4 /mnt && mount /dev/sdX2 /mnt2 && cd /mnt
You will see a number of files at the root of /mnt. Report the full list of files:
# ls -al * > files.log
Create a tarball of the log and ebuild files, and the syslogs (if any):
# tar cJvf debug.tar.xz *log *build* [var/log/syslog*]
Now cd to /mnt2 and report the state of kernel builds:
# cd /mnt2 && ls -al * > boot.log
Attach files.log, boot.log and debug.tar.xz to your post.