feat: Conditionally remove networkd online dependency on Ubuntu #5772

Merged
merged 8 commits into from
Oct 17, 2024

Conversation

@TheRealFalcon TheRealFalcon commented Oct 2, 2024

Proposed Commit Message

feat: Conditionally remove networkd online dependency on Ubuntu

Traditionally, cloud-init-network.service (previously
cloud-init.service) waited for network connectivity (via systemd
service ordering) before running. This has caused
cloud-init-network.service to block boot for a significant amount of
time. For the vast majority of boots, this network connectivity
isn't required.

This commit removes the ordering
After=systemd-networkd-wait-online.service, but checks the datasource
and user data in the init-local timeframe to see if network
connectivity will be necessary in the init network timeframe.
If so, when the init network service starts, it will call
systemd-networkd-wait-online manually in the same manner that the
systemd-networkd-wait-online.service does to wait for network
connectivity.

This commit affects Ubuntu only because of the number of service
orderings and network renderers possible, along with the downstream
synchronization needed. However, a new overridable method in the
Distro class should make this optimization trivial to implement for
any other distro.
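
As a rough illustration of the overridable-method idea described above, here is a minimal sketch. The class names, the `wait_for_network` signature, the injected `run` callable, and the bare command list are all assumptions made for illustration; they are not taken from the cloud-init code in this PR.

```python
from typing import Callable, List

# Hypothetical type: something that executes a command given as an argv list.
Runner = Callable[[List[str]], None]


class Distro:
    """Base distro: network ordering is assumed to be handled by systemd."""

    def wait_for_network(self, run: Runner) -> None:
        # Default is a no-op, so distros that keep the systemd ordering
        # behave exactly as before.
        return None


class UbuntuLike(Distro):
    """Distro that dropped the After= ordering and must wait explicitly."""

    def wait_for_network(self, run: Runner) -> None:
        # Assumption: the real arguments would mirror what the
        # systemd-networkd-wait-online.service unit uses.
        run(["systemd-networkd-wait-online"])


def init_network_stage(
    distro: Distro, network_required: bool, run: Runner
) -> None:
    # network_required is assumed to have been computed in the init-local
    # timeframe from the datasource and user data.
    if network_required:
        distro.wait_for_network(run)
```

Passing the runner in keeps the sketch testable without systemd; a real implementation would presumably shell out directly.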



TheRealFalcon commented Oct 3, 2024

Take 2...

I added a commit that, instead of using a drop-in, calls the systemd-networkd-wait-online binary with the exact same args that the service uses, based on 'systemctl cat'. This removes the need for a costly daemon-reload in the cases where we need to wait on the network.
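
For readers following along, one way to recover a service's command line from 'systemctl cat' output could look like the sketch below. The function name and the sample unit text are hypothetical, and this is not necessarily how the commit does it; it only handles plain `ExecStart=` lines and the empty-assignment reset that drop-ins use.

```python
import shlex
from typing import List


def exec_start_commands(unit_text: str) -> List[List[str]]:
    """Return the effective ExecStart commands from `systemctl cat` output."""
    commands: List[List[str]] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("ExecStart="):
            continue
        value = line[len("ExecStart="):].strip()
        if not value:
            # An empty assignment resets any ExecStart entries seen so far.
            commands = []
            continue
        commands.append(shlex.split(value))
    return commands


# Illustrative sample only; real output depends on the installed unit files.
SAMPLE = """\
# /usr/lib/systemd/system/systemd-networkd-wait-online.service
[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online
# /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --timeout=30
"""

print(exec_start_commands(SAMPLE))
```

In practice the text would come from something like `subprocess.run(["systemctl", "cat", "systemd-networkd-wait-online.service"], capture_output=True, text=True, check=True).stdout`, and each resulting argv list could then be executed directly, with no daemon-reload involved.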

I haven't touched integration tests since the first commit, so they will have known failures.


@blackboxsw blackboxsw left a comment


Thanks @TheRealFalcon for working on this. Quick first-pass review, as you are actively coding and developing tests for this functionality.

@TheRealFalcon TheRealFalcon force-pushed the wait branch 2 times, most recently from f88d8a9 to ed3a50e Compare October 3, 2024 19:48
@TheRealFalcon

@blackboxsw , thanks for the review. I applied your comments and also made a change to move the wait into the activators.

@blackboxsw blackboxsw added the packaging Supplemental package review requested label Oct 3, 2024
@TheRealFalcon TheRealFalcon marked this pull request as ready for review October 8, 2024 19:41
@blackboxsw blackboxsw self-assigned this Oct 10, 2024
@TheRealFalcon TheRealFalcon force-pushed the wait branch 2 times, most recently from 483f2b6 to bbf16f9 Compare October 16, 2024 13:26
@TheRealFalcon

updated based on comments. Integration tests still need to be updated.

The biggest change is to write .skip-network rather than .wait-for-network,
so that if the file is ever not written where it should be, the
default behavior is to wait as we always have.
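
The fail-safe reasoning here can be sketched as follows. Only the `.skip-network` name comes from the comment above; the directory handling and the function names are hypothetical.

```python
from pathlib import Path

SKIP_MARKER = ".skip-network"  # file name from the comment above


def record_decision(run_dir: Path, network_required: bool) -> None:
    """Local stage: leave a marker only when waiting can be skipped."""
    marker = run_dir / SKIP_MARKER
    if network_required:
        # Make sure no stale marker survives from an earlier decision.
        marker.unlink(missing_ok=True)
    else:
        marker.touch()


def should_wait(run_dir: Path) -> bool:
    """Network stage: wait unless the local stage explicitly opted out."""
    return not (run_dir / SKIP_MARKER).exists()
```

With this polarity, a missing marker (for example, if the local stage failed before writing anything) means the network stage still waits, which matches the historical behavior.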
@TheRealFalcon TheRealFalcon force-pushed the wait branch 2 times, most recently from 8940070 to 45d48ac Compare October 17, 2024 13:11

@blackboxsw blackboxsw left a comment


Content looks good, is limited to Ubuntu, and behaves well across upgrade testing.

Ran through some performance samples on Azure across clean reboots to assess boot speed impact. The result is a negligible boot speed impact across the upgrade, well within the standard deviation of the samples, and attributable to platform behavior differences. In some cases I was able to see a 1.2 second reduction in time to SSH, but that wasn't represented in all cases. Generally, the degraded timing of init-local/search-Azure appears to be due to latency in dhcpcd responses, which is platform performance related and not related to the cloud-init changes in this branch.

Performance samples on 15 clean reboot runs on Azure of daily PPA vs this branch

```
--------------------- Performance Deltas Encountered ---------------------------------
| Control Avg/Stdev | Upgr. Avg/Stdev | Avg delta | Delta type and service name
--------------------------------------------------------------------------------------
| 021.33s/0.86s | 021.59s/0.29s | +00.26s | *** client_time_to_ssh
| 014.29s/0.66s | 014.45s/0.59s | +00.16s | *** time_systemd_userspace
| 010.34s/3.79s | 012.24s/3.71s | +01.90s | *** time_cloudinit_total
| 000.98s/0.65s | 001.35s/0.65s | +00.37s | ***DEGRADED stage/init-local/search-Azure
| 000.94s/0.65s | 001.30s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/crawl_metadata
| 000.84s/0.65s | 001.22s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/obtain-dhcp-lease
| 000.85s/0.65s | 001.22s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/_setup_ephemeral_networking
| 001.08s/0.65s | 001.44s/0.65s | +00.37s | ***DEGRADED stage/init-local
| 000.98s/0.65s | 001.35s/0.65s | +00.37s | ***DEGRADED stage/azure-ds/_get_data

------------------- Control image --------------------------
| Avg/Stdev | Max | Min | Metric Name
-----------------------------------------------------------------------
| 021.33s/0.86s | 021.87s | 018.38s | client_time_to_ssh
| 021.91s/0.61s | 022.38s | 019.89s | client_time_to_cloudinit_done
| 001.43s/0.02s | 001.48s | 001.40s | time_systemd_kernel
| 014.29s/0.66s | 015.44s | 013.19s | time_systemd_userspace
| 010.34s/3.79s | 017.19s | 005.59s | time_cloudinit_total
| 004.05s/0.12s | 004.43s | 003.95s | cloud-config.service
| 004.03s/0.13s | 004.45s | 003.92s | snapd.seeded.service
| 004.00s/0.08s | 004.21s | 003.90s | dev-sda1.device
| 003.85s/0.12s | 004.23s | 003.70s | snapd.service
| 004.43s/0.79s | 005.75s | 003.77s | walinuxagent-network-setup.service
| 003.21s/0.12s | 003.49s | 002.99s | apport.service
| 003.38s/0.36s | 003.83s | 002.81s | rsyslog.service
| 002.96s/0.32s | 003.84s | 002.78s | networkd-dispatcher.service
| 002.65s/0.34s | 003.50s | 002.39s | udisks2.service
| 001.85s/0.05s | 001.92s | 001.74s | polkit.service
| 001.57s/0.06s | 001.73s | 001.45s | chrony.service
| 001.36s/0.18s | 001.53s | 001.13s | systemd-fsck@dev-disk-by\x2dlabel-BOOT.service
| 001.37s/0.06s | 001.48s | 001.33s | systemd-fsck@dev-disk-by\x2duuid-CD06\x2d6D44.service
| 001.34s/0.03s | 001.38s | 001.30s | cloud-init-main.service
| 001.25s/0.07s | 001.38s | 001.14s | systemd-logind.service
| 001.09s/0.06s | 001.20s | 001.00s | cloud-init-network.service
| 001.72s/0.48s | 002.35s | 001.02s | cloud-init-local.service
| 001.70s/0.25s | 001.88s | 001.53s | ModemManager.service
| 001.47s/0.00s | 001.47s | 001.47s | systemd-fsck@dev-disk-cloud-azure_resource\x2dpart1.service
| 001.06s/0.00s | 001.06s | 001.06s | secureboot-db.service
| 002.24s/0.17s | 002.75s | 002.00s | modules-config/config-grub_dpkg
| 000.44s/0.01s | 000.46s | 000.42s | init-network/config-mounts
| 000.18s/0.05s | 000.28s | 000.12s | modules-config/config-apt_configure
| 000.84s/0.65s | 002.04s | 000.07s | stage/azure-ds/obtain-dhcp-lease
| 000.85s/0.65s | 002.04s | 000.08s | stage/azure-ds/_setup_ephemeral_networking
| 000.94s/0.65s | 002.13s | 000.17s | stage/azure-ds/crawl_metadata
| 000.98s/0.65s | 002.17s | 000.21s | stage/azure-ds/_get_data
| 000.98s/0.65s | 002.18s | 000.21s | stage/init-local/search-Azure
| 001.08s/0.65s | 002.27s | 000.31s | stage/init-local
| 000.23s/0.09s | 000.37s | 000.10s | stage/init-network/config-ssh
| 000.98s/0.08s | 001.13s | 000.85s | stage/init-network
| 002.89s/0.17s | 003.40s | 002.70s | stage/modules-config
| 000.17s/0.03s | 000.29s | 000.15s | stage/modules-final

------------------- Updated cloud-init image --------------------------
| Avg/Stdev | Max | Min | Metric Name
-----------------------------------------------------------------------
| 021.59s/0.29s | 021.99s | 020.72s | client_time_to_ssh
| 022.10s/0.30s | 022.50s | 021.22s | client_time_to_cloudinit_done
| 001.42s/0.02s | 001.44s | 001.39s | time_systemd_kernel
| 014.45s/0.59s | 015.19s | 013.46s | time_systemd_userspace
| 012.24s/3.71s | 016.89s | 006.58s | time_cloudinit_total
| 004.10s/0.26s | 004.96s | 003.87s | dev-sda1.device
| 003.93s/0.21s | 004.08s | 003.20s | cloud-config.service
| 003.91s/0.19s | 004.03s | 003.25s | snapd.seeded.service
| 004.26s/0.59s | 005.61s | 003.75s | walinuxagent-network-setup.service
| 003.72s/0.20s | 003.86s | 003.02s | snapd.service
| 003.08s/0.21s | 003.18s | 002.32s | apport.service
| 003.24s/0.43s | 003.58s | 002.15s | rsyslog.service
| 002.75s/0.18s | 002.91s | 002.12s | networkd-dispatcher.service
| 002.43s/0.23s | 002.59s | 001.62s | udisks2.service
| 001.79s/0.13s | 001.98s | 001.35s | polkit.service
| 001.92s/0.33s | 002.38s | 001.28s | cloud-init-local.service
| 001.54s/0.10s | 001.63s | 001.21s | chrony.service
| 001.37s/0.05s | 001.47s | 001.31s | cloud-init-main.service
| 001.24s/0.05s | 001.35s | 001.16s | systemd-logind.service
| 001.36s/0.13s | 001.50s | 001.15s | systemd-fsck@dev-disk-by\x2dlabel-BOOT.service
| 001.12s/0.19s | 001.63s | 001.01s | cloud-init-network.service
| 001.46s/0.11s | 001.57s | 001.34s | systemd-fsck@dev-disk-by\x2duuid-CD06\x2d6D44.service
| 001.33s/0.16s | 001.44s | 001.21s | systemd-fsck@dev-disk-cloud-azure_resource\x2dpart1.service
| 002.12s/0.13s | 002.29s | 001.69s | modules-config/config-grub_dpkg
| 000.43s/0.01s | 000.45s | 000.41s | init-network/config-mounts
| 000.20s/0.05s | 000.26s | 000.11s | modules-config/config-apt_configure
| 001.22s/0.65s | 002.07s | 000.25s | stage/azure-ds/obtain-dhcp-lease
| 001.22s/0.65s | 002.07s | 000.26s | stage/azure-ds/_setup_ephemeral_networking
| 001.30s/0.65s | 002.16s | 000.34s | stage/azure-ds/crawl_metadata
| 001.35s/0.65s | 002.20s | 000.39s | stage/azure-ds/_get_data
| 001.35s/0.65s | 002.21s | 000.39s | stage/init-local/search-Azure
| 001.44s/0.65s | 002.29s | 000.49s | stage/init-local
| 000.23s/0.17s | 000.78s | 000.07s | stage/init-network/config-ssh
| 000.98s/0.17s | 001.54s | 000.81s | stage/init-network
| 002.76s/0.18s | 002.92s | 002.15s | stage/modules-config
| 000.17s/0.03s | 000.27s | 000.15s | stage/modules-final
```

Comment on lines +192 to +206
    (
        mock.Mock(),
        textwrap.dedent(
            """\
            #cloud-config
            write_files:
            - source:
                uri: http://example.com
                headers:
                  Authorization: Basic stuff
                  User-Agent: me
            """
        ),
        True,
    ),

Let's add a counter-case with uri: /somepath and assert False
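
To make the intent of the counter-case concrete, here is a small sketch of the kind of distinction it would exercise. The helper `source_needs_network` is hypothetical and is not the function under test in this PR; it only shows why `http://example.com` and `/somepath` should produce different answers.

```python
from urllib.parse import urlparse


def source_needs_network(uri: str) -> bool:
    """Hypothetical check: only http(s) sources require connectivity."""
    return urlparse(uri).scheme in ("http", "https")


# (uri, expected) pairs mirroring the existing case and the requested one.
CASES = [
    ("http://example.com", True),
    ("/somepath", False),
]

for uri, expected in CASES:
    assert source_needs_network(uri) is expected
```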

@TheRealFalcon

Test case added and no more comments left. I'm going to merge this.

@TheRealFalcon TheRealFalcon merged commit e30549e into canonical:main Oct 17, 2024
22 checks passed
@TheRealFalcon TheRealFalcon deleted the wait branch October 17, 2024 23:08