Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datadog-agent/7.61.0 package update #39388

Closed
wants to merge 12 commits into from

Conversation

octo-sts[bot]
Copy link
Contributor

@octo-sts octo-sts bot commented Jan 13, 2025

Signed-off-by: wolfi-bot <121097084+wolfi-bot@users.noreply.github.com>
@octo-sts octo-sts bot added request-version-update request for a newer version of a package automated pr P1 This label indicates our scanning found High, Medium or Low CVEs for these packages. labels Jan 13, 2025
Copy link
Contributor Author

octo-sts bot commented Jan 13, 2025

⚠️ EXPERIMENTAL

Please use 👍 or 👎 on this comment to indicate if you agree or disagree with the recommendation.

To provide more detailed feedback please comment on the recommendation prefixed with /ai-verify:

e.g. /ai-verify partially helpful but I also added bash to the build environment

Gen AI suggestions to solve the build error:

Based on the build error log, here's my analysis and suggested fix:

• Detected Error: "No idea what '--no-bundle' is!"

• Error Category: Build Configuration

• Failure Point: During the system-probe build step when running:

unshare --user --map-root-user \
  invoke -e system-probe.build \
  --strip-object-files \
  --no-bundle \
  --bundle-ebpf

• Root Cause Analysis:
The --no-bundle flag appears to be invalid or no longer supported in the current version of the Datadog Agent build system.

• Suggested Fix:
Remove the --no-bundle flag from the build command. Update the build step to:

- runs: |
    unshare --user --map-root-user \
      invoke -e system-probe.build \
      --strip-object-files \
      --bundle-ebpf

• Explanation:
The error suggests that the --no-bundle option is not recognized by the build system. Looking at the recent Datadog Agent changelog and build documentation, this flag may have been deprecated or removed. The core functionality we need - building the system probe with eBPF support - is still maintained through the --bundle-ebpf flag.

• Additional Notes:

  • The rest of the build configuration appears correct, including the kernel headers setup and clang installation
  • Other flags like --strip-object-files and --bundle-ebpf are still valid
  • This change shouldn't affect the final package functionality since we're still bundling the eBPF components

• References:

@octo-sts octo-sts bot added the ai/skip-comment Stop AI from commenting on PR label Jan 13, 2025
…rations

Signed-off-by: Aditvil-Dev <aditya.tirmanwar@chainguard.dev>
@Aditvil-Dev
Copy link
Member

change in .gitlab/binary_build/system_probe.yml https://github.com/DataDog/datadog-agent/compare/7.60.1...7.61.0

- inv -e system-probe.build --strip-object-files --no-bundle
+ inv -e system-probe.build --strip-object-files

@octo-sts octo-sts bot added bincapz/pass bincapz/pass Bincapz (aka. malcontent) scan didn't detect any CRITICALs on the scanned packages. manual/review-needed labels Jan 13, 2025
Signed-off-by: RJ Sampson <rj.sampson@chainguard.dev>
@Aditvil-Dev Aditvil-Dev removed their assignment Jan 15, 2025
…tregration core.

Signed-off-by: Debasish Biswas <debasishbsws.dev@gmail.com>
@debasishbsws debasishbsws removed the ai/skip-comment Stop AI from commenting on PR label Jan 17, 2025
@debasishbsws
Copy link
Member

After some research, it appears that the copied .venv library does not include all the standard library functions or the necessary embedded Python files and libraries.

…ython module can find the std modules present in /usr/lib/python3.xx

Signed-off-by: Debasish Biswas <debasishbsws.dev@gmail.com>
@debasishbsws debasishbsws requested a review from a team January 17, 2025 12:02
Copy link
Member

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exacty requesting changes, but I would like us to resolve my one concern before we land.

@@ -566,6 +569,7 @@ test:
# cannot use vars.destd here. https://github.com/chainguard-dev/melange/issues/1402
# setting PATH here has no effect.
mypath: /opt/datadog-agent/bin/agent:/opt/datadog-agent/embedded/bin
PYTHONPATH: /opt/datadog-agent/embedded/lib/python${{vars.py-version}}/site-packages:/usr/lib/python${{vars.py-version}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced by this test workaround. To me, it implies that users would need to set PYTHONPATH for the package to work as expected, which shouldn't be required. Am I misunderstanding something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set the PYTHONPATH environment variable in the image configuration when required by the software. This is necessary as the application may not use the default Python locations and could reside elsewhere, such as /opt/datadog-agent/....

Similarly, we need to set the PYTHONPATH in the test environment, as we do in other package tests.

In this case, the PYTHONPATH in the image configuration appears to be incorrect. Which needs to be fixed, I’ve started a Slack thread to better understand how Datadog handles its Python environment. and how we can dynamically set the python version in image config as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By testing the older version locally it seems this python path issue is with this new version only and the older version are working just fine.

@OddBloke
Copy link
Member

I've spent a bit more time looking into the issue here. I'm still not really sure why the issue has cropped up since the last upstream release, though DataDog/datadog-agent@f631ce2 is the most likely culprit.

The issue we are now running into is that the agent code expects a full Python installation to be embedded into /opt/datadog-agent/embedded, and so uses that directory as PYTHONHOME (https://github.com/DataDog/datadog-agent/blob/main/pkg/collector/python/init.go#L403). PYTHONHOME sets the root of the Python installation, meaning that the standard library will be searched for under that path. A virtualenv is not intended to be used as a PYTHONHOME, which is the cause of the encodings import failure: encodings is in /usr/lib/python3.12/, and not anywhere under the PYTHONHOME which the agent sets. This is why setting the PYTHONPATH in the test environment fixes the problem, but means that any executions of the agent will need the same PYTHONPATH setting in order to find the standard library.

One thing I tested is removing the embedded directory altogether. In this case, the agent successfully instantiates the system Python (which is configured correctly and so can import encodings) but then can't import any of the Datadog modules (unsurprisingly, of course, as they're installed into the embedded directory).

It seems to me that we have a few options here: we can:

  1. continue mimicking a full embedded Python installation using a virtualenv in /opt/datadog-agent/embedded, with the PYTHONPATH workaround for now and perhaps other issues in future due to divergence from upstream,
  2. actually place a full embedded Python installation into .../embedded, aligning us with upstream,
  3. install the Datadog Python packages into the system Python and remove the .../embedded directory altogether, which has some upstream support but isn't how they ship, though is at least a normal Python environment which should make addressing any future issues somewhat simpler, or
  4. craft and carry a patch which modifies the agent's behaviour to not expect a full Python installation in .../embedded

It's not clear to me which of these is the correct option for our use cases.

Signed-off-by: Debasish Biswas <debasishbsws.dev@gmail.com>
@debasishbsws
Copy link
Member

@OddBloke IMO I think suggestion no 2 seems more correct which align with the upstream and may reduce problem in the future as well.

Copying all the site-packages from the /usr/lib/python${{vars.py-version}}/site-packages to the .../embedded python before the build should work.

@cmwilson21 cmwilson21 added the help wanted Extra attention is needed label Jan 21, 2025
@philroche
Copy link
Member

@OddBloke @debasishbsws I favour option 3

install the Datadog Python packages into the system Python and remove the .../embedded directory altogether, which has some upstream support but isn't how they ship, though is at least a normal Python environment which should make addressing any future issues somewhat simpler, or

This means that we can use packaged python libs too where possible

Copying all the site-packages from the /usr/lib/python${{vars.py-version}}/site-packages to the .../embedded python before the build should work.

This will likely not end well for us. system packages and venvs are not designed to be copied around like this. See https://virtualenv.pypa.io/en/legacy/userguide.html#making-environments-relocatable specifically

Normally environments are tied to a specific path. That means that you cannot move an environment around or copy it to another computer. You can fix up an environment to make it relocatable with the command:

The --relocatable option currently has a number of issues, and is not guaranteed to work in all circumstances. It is possible that the option will be deprecated in a future version of virtualenv.

@cmwilson21 cmwilson21 assigned hbh7 and unassigned debasishbsws Jan 23, 2025
Signed-off-by: hbh7 <hunter.harris@chainguard.dev>
@hbh7
Copy link
Contributor

hbh7 commented Jan 23, 2025

After getting caught up on the conversation around this, sounds like we're going to try option 3 and get rid of the virtualenv and the embedded folder, and installing everything using wolfi-specific python packages where possible and installing everything else alongside otherwise. I've started working on this.

@hbh7 hbh7 force-pushed the wolfictl-d82f9495-65fe-4bbe-9a34-b2f0dd8fa083 branch from f8fd1c1 to 5452198 Compare January 27, 2025 22:08
Copy link
Member

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start, thank you! I've noticed and analysed one pretty serious issue, details in the comments.

Comment on lines +353 to +354
mkdir -p ${{targets.contextdir}}/usr/lib/python${{vars.py-version}}
cp -r /usr/lib/python${{vars.py-version}}/site-packages ${{targets.contextdir}}/usr/lib/python${{vars.py-version}}/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to avoid these lines, I think. I see other packages doing pip install --prefix=/usr --root=${{targets.contextdir}} which I think would solve it.

- name: Ensure the agent integration subcommand works
runs: |
PATH=$mypath:$PATH
# TODO: mkdir and ln shouldn't be required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look at the code and, unfortunately, I don't think there's a way around this (without patching). getCommandPython is the relevant function once you follow the codepath down (briefly: freezeCmd -> func list -> func pip -> func getCommandPython).

It uses the relative path from the agent binary to locate the expected embedded Python installation (which is what the symlink we're installing here mimics).

You'll notice a useSysPython flag to the function. Unfortunately, this requires passing -p to agent integration freeze which is intended for developer/power-user usage so isn't listed in help text, and means that anyone following standard instructions will see failures. So I think we'll probably need to ship this directory and the symlink in order to have agent integration work as expected.

However, our integration issues do not end here! agent integration remove works as expected:

$ /opt/datadog-agent/bin/agent/agent integration remove datadog-zk
Found existing installation: datadog-zk 6.0.0
Uninstalling datadog-zk-6.0.0:
  Successfully uninstalled datadog-zk-6.0.0

but installation doesn't:

$ /opt/datadog-agent/bin/agent/agent integration install datadog-zk==6.0.0
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/__main__.py", line 9, in <module>
    sys.exit(download())
             ^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/cli.py", line 147, in download
    run_downloader(tuf_downloader, standard_distribution_name, version, ignore_python_version)
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/cli.py", line 138, in run_downloader
    wheel_abspath = tuf_downloader.download(wheel_relpath)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/download.py", line 316, in download
    target_abspath = self._download_with_tuf_in_toto(target_relpath)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/download.py", line 300, in _download_with_tuf_in_toto
    self.__download_and_verify_in_toto_metadata(target_relpath, target_abspath, target)
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/download.py", line 289, in __download_and_verify_in_toto_metadata
    self.__in_toto_verify(inspection_packet, target_relpath)
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/download.py", line 264, in __in_toto_verify
    self.__handle_in_toto_verification_exception(target_relpath, e)
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/download.py", line 245, in __handle_in_toto_verification_exception
    raise e
  File "/usr/lib/python3.12/site-packages/datadog_checks/downloader/download.py", line 262, in __in_toto_verify
    verifylib.in_toto_verify(root_layout, root_layout_pubkeys, substitution_parameters=root_layout_params)
  File "/usr/lib/python3.12/site-packages/in_toto/verifylib.py", line 1613, in in_toto_verify
    inspection_link_dict = run_all_inspections(layout, persist_inspection_links)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/in_toto/verifylib.py", line 227, in run_all_inspections
    link = in_toto.runlib.in_toto_run(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/in_toto/runlib.py", line 539, in in_toto_run
    byproducts = execute_link(link_cmd_args, record_streams)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/in_toto/runlib.py", line 318, in execute_link
    process = subprocess.run(
              ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.12/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'python'

We can work around this by installing python3<3.13 (which has a python symlink), but we then get:

$ /opt/datadog-agent/bin/agent/agent integration install datadog-zk==6.0.0
/usr/lib/python3.12/site-packages/datadog_checks/downloader/data/repo/targets/simple/datadog-zk/datadog_zk-6.0.0-py2.py3-none-any.whl
Processing /usr/lib/python3.12/site-packages/datadog_checks/downloader/data/repo/targets/simple/datadog-zk/datadog_zk-6.0.0-py2.py3-none-any.whl
ERROR: Cannot install datadog-zk 6.0.0 (from /usr/lib/python3.12/site-packages/datadog_checks/downloader/data/repo/targets/simple/datadog-zk/datadog_zk-6.0.0-py2.py3-none-any.whl) because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested datadog-zk 6.0.0 (from /usr/lib/python3.12/site-packages/datadog_checks/downloader/data/repo/targets/simple/datadog-zk/datadog_zk-6.0.0-py2.py3-none-any.whl)
    The user requested (constraint) datadog-zk

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
Error: error installing wheel /usr/lib/python3.12/site-packages/datadog_checks/downloader/data/repo/targets/simple/datadog-zk/datadog_zk-6.0.0-py2.py3-none-any.whl: error running command: exit status 1

As you can see, the conflict described there doesn't make a great deal of sense. This is particularly true because python3 -m pip install /usr/lib/python3.12/site-packages/datadog_checks/downloader/data/repo/targets/simple/datadog-zk/datadog_zk-6.0.0-py2.py3-none-any.whl does work. Looking at the installation code it passes --constraint to pip using the constraints file we installed: if I also pass --constraint /opt/datadog-agent/final_constraints-py3.txt to my pip invocation, then I see the same error message as from integration install.

Looking at the constraints file, I see:

$ grep datadog-zk /opt/datadog-agent/final_constraints-py3.txt
datadog-zk @ file:///home/integrations/zk

That's the path which we use during build, which doesn't exist in the installed environment.

So, after all this, I think we need to try a build with the pip arguments I mention above, and see if that fixes this problem. (We should probably also add remove/install to our testing!)

@octo-sts octo-sts bot closed this Jan 29, 2025
Copy link
Contributor Author

octo-sts bot commented Jan 29, 2025

superseded by #40819

@octo-sts octo-sts bot deleted the wolfictl-d82f9495-65fe-4bbe-9a34-b2f0dd8fa083 branch January 30, 2025 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automated pr bincapz/pass bincapz/pass Bincapz (aka. malcontent) scan didn't detect any CRITICALs on the scanned packages. help wanted Extra attention is needed manual/review-needed P1 This label indicates our scanning found High, Medium or Low CVEs for these packages. request-version-update request for a newer version of a package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants