
ldap_search: callback utf-8 error "surrogates not allowed" on ansible-core 2.14 #5704

Closed
1 task done
bluikko opened this issue Dec 19, 2022 · 16 comments · Fixed by #6475 or #7264
Labels
bug This issue/PR relates to a bug has_pr module module net_tools plugins plugin (any type)

Comments

@bluikko
Contributor

bluikko commented Dec 19, 2022

Summary

Searching for computer objects in Active Directory with the ldap_search module fails to print the search results when ansible-playbook runs in -vv mode; instead, a WARNING about a UTF-8 error "surrogates not allowed" is printed.

My guess is that some of the binary data included in the objects returned by the query is not compatible with the UTF-8 encoding the callback plugin expects.

The same query works on ansible-core 2.11 and prints the search results; some attributes in the objects look like binary data.

Issue Type

Bug Report

Component Name

ldap_search

Ansible Version

$ ansible --version
ansible [core 2.14.1]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /ansible/lib64/python3.9/site-packages/ansible
  ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
  executable location = /ansible/bin/ansible
  python version = 3.9.14 (main, Nov  7 2022, 00:00:00) [GCC 11.3.1 20220421 (Red Hat 11.3.1-2)] (/ansible/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Community.general Version

$ ansible-galaxy collection list community.general
# /usr/share/ansible/collections/ansible_collections
Collection        Version
----------------- -------
community.general 6.1.0

Configuration

$ ansible-config dump --only-changed
DEFAULT_STDOUT_CALLBACK(/etc/ansible/ansible.cfg) = community.general.yaml

OS / Environment

EL9.1, Python 3.9

Steps to Reproduce

Query computer objects from Active Directory and run ansible-playbook -vv to make the task print the search results to screen.

On ansible-core 2.11 the output includes binary-looking data such as:

    objectGUID: |-
      þ-ÎDBÎâªØÓ¥
    objectSid: "[...]E¹Sîà¹'3Ô[...]"

Expected Results

The task should print the search results in the same way it does on ansible-core 2.11.

Actual Results

[WARNING]: Failure using method (v2_runner_on_ok) in callback plugin (<ansible_collections.community.general.plugins.callback.yaml.CallbackModule object at 0x7f4071cb7b20>): 'utf-8' codec can't encode characters in position 6-9: surrogates not allowed

Instead of the search results, only the above warning is printed. Otherwise the task seems to work correctly: the results are registered.

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@ansibullbot
Collaborator

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.


@ansibullbot
Collaborator

@ansibullbot ansibullbot added bug This issue/PR relates to a bug module module net_tools plugins plugin (any type) labels Dec 19, 2022
@stefanDeveloper

I get the same error when I try to output the results.

@felixfontein
Collaborator

In general, Ansible does not handle binary output well; modules should not return binary output. This is a bit tricky since the ldap_search module doesn't know what it returns, and Base64-encoding everything makes the module harder to work with.

One could add an option to the module that allows specifying a list of fields that should be Base64-encoded. Then you could make sure that objectGUID and objectSid (from the example in the issue itself) are Base64-encoded, but other fields (like the imaginary fields objectName and objectDescription, which contain text and no binary data) are not.
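A rough sketch of such an option in Python follows. encode_selected and its base64_fields argument are hypothetical names, not part of the module, and the sketch assumes each search result maps attribute names to lists of bytes values, which is how python-ldap returns them. Fields that are not listed are decoded as UTF-8 with invalid bytes replaced.

```python
import base64


def encode_selected(entry, base64_fields):
    """Hypothetical helper: Base64-encode only the attributes in base64_fields.

    Assumes entry maps attribute names to lists of raw bytes values.
    """
    result = {}
    for attr, values in entry.items():
        if attr in base64_fields:
            # Binary attributes become ASCII-only text that is losslessly reversible.
            result[attr] = [base64.b64encode(v).decode('ascii') for v in values]
        else:
            # Text attributes are decoded as UTF-8; invalid bytes become U+FFFD.
            result[attr] = [v.decode('utf-8', errors='replace') for v in values]
    return result


entry = {
    'cn': [b'PC01'],
    'objectGUID': [b'\xfe\x2d\xce\x44'],
}
encoded = encode_selected(entry, ['objectGUID'])
print(encoded)
```

Base64 keeps the exact value recoverable (base64.b64decode returns the original bytes), while the text fields stay readable in the task output.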

Alternatively (or at the same time), the yaml callback could be fixed to behave better with binary data. @stefanDeveloper are you also using the yaml callback, or are you using another one?

@felixfontein
Collaborator

!component +plugins/callback/yaml.py

@ansibullbot
Collaborator

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.


@bluikko
Contributor Author

bluikko commented Mar 2, 2023

This used to work in 2.11; I don't know what has changed.

I do not care much about the binary data, but some output is necessary. I do not know if or how this could work in practice, but IMO the easiest solution would be to check whether the data is binary and, if so, print "(binary data)" or something similar.
Edit: this would move the problem from ansible-core/the callback plugin to the module, which in practice might be easier to get released, workflow-wise.
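The placeholder idea can be sketched in a few lines of Python. This is an illustration only, not code from the module: describe_value is a hypothetical helper, and "binary" is approximated as "bytes that do not decode as UTF-8".

```python
def describe_value(value, placeholder='(binary data)'):
    """Hypothetical helper: return text for a value, or a placeholder if it is binary.

    "Binary" here means bytes that are not valid UTF-8.
    """
    if isinstance(value, bytes):
        try:
            return value.decode('utf-8')
        except UnicodeDecodeError:
            return placeholder
    return value


print(describe_value(b'PC01'))
print(describe_value(b'\xfe\x2d\xce\x44'))
```

This heuristic lets through binary values that happen to be valid UTF-8 (for example, all-ASCII bytes), so it guarantees printable output rather than a correct binary/text classification.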

@felixfontein
Collaborator

Did anyone check whether this also happens with the default callback? The only examples in this issue where anything more concrete is mentioned are with the community.general.yaml callback. That would help to decide where this error actually comes from (ansible-core or the yaml callback).

@felixfontein
Collaborator

The problem seems to arise with all callbacks (that use Display) and can easily be reproduced by running ansible localhost -m command -a 'dd if=/dev/urandom of=/dev/stdout bs=1 count=20'. With that, it's easy to verify that this worked with ansible-core 2.13 but no longer works with ansible-core 2.14. I guess it is related to all the encoding/locale changes mentioned in https://github.com/ansible/ansible/blob/stable-2.14/changelogs/CHANGELOG-v2.14.rst.
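The "surrogates not allowed" part of the warning can be reproduced in plain Python, without Ansible. This is a minimal sketch that assumes the raw output bytes are decoded with Python's surrogateescape error handler (the \udcXX code points in the warnings suggest this); it shows that such text cannot then be written with the strict UTF-8 codec.

```python
# Assumption: undecodable output bytes are decoded with 'surrogateescape',
# which maps each bad byte to a lone surrogate U+DC80..U+DCFF instead of failing.
raw = b'caf\xe9'  # Latin-1 'é', which is not valid UTF-8
text = raw.decode('utf-8', errors='surrogateescape')
print(repr(text))  # 'caf\udce9'

try:
    # Writing that text with the strict UTF-8 codec is the step that fails.
    text.encode('utf-8')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)
```

The lone surrogate U+DCE9 produced for byte 0xE9 is the same '\udce9' that appears in the LaTeX example further down in this thread.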

@RomyxBaps

Hello,
I have the same problem here.
In a playbook I launch the latex binary, and I get an error/warning on stdout:

[WARNING]: Failure using method (v2_runner_on_failed) in callback plugin (<ansible_collections.community.general.plugins.callback.yaml.CallbackModule object at 0x7f57d7166cd0>): 'utf-8' codec can't encode character '\udce9' in position 9543: surrogates not allowed

@felixfontein
Collaborator

I think it's time to create an issue in ansible-core for this, since this is an ansible-core problem, not really a community.general problem.

@felixfontein
Collaborator

I've created ansible/ansible#80258 for this. Please add more examples that are not related to ldap_search there and not here :)

@felixfontein
Collaborator

According to ansible/ansible#80258 (comment), the module must not return binary data as text, so we need to adjust the module's output. I would probably use code similar to that in ansible/ansible#80258 (comment):

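    from ansible.module_utils.common.text.converters import to_bytes, to_text

    data = to_bytes(data, encoding='utf-8')  # surrogate-escaped text back to the original raw bytes
    data = to_text(data, 'utf-8', errors='replace')  # invalid UTF-8 sequences become U+FFFD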

Probably we should have a better way to output such data, but that can happen later (or rather, once someone has a good idea how to actually do it).
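For readers without ansible-core at hand, the same two steps can be sketched with the standard library alone. This assumes that to_bytes with its default error handling behaves like encoding with surrogateescape (turning lone surrogates back into the original bytes); sanitize is a hypothetical name.

```python
def sanitize(data):
    """Hypothetical stdlib equivalent of the to_bytes/to_text round trip."""
    # Lone surrogates (from surrogateescape decoding) turn back into raw bytes.
    raw = data.encode('utf-8', errors='surrogateescape')
    # Invalid UTF-8 sequences become U+FFFD, so the result is always encodable.
    return raw.decode('utf-8', errors='replace')


mangled = b'caf\xe9'.decode('utf-8', errors='surrogateescape')
clean = sanitize(mangled)
print(clean.encode('utf-8'))
```

The trade-off is the one the PR calls out as breaking: the output is always printable, but the original bytes can no longer be recovered from it.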

@felixfontein
Collaborator

I created a PR to fix ldap_search: #6475. I would be glad if some of you could test it!

The PR does two things:

  • Force all string output to be UTF-8 (by mangling non-UTF-8 binary data); this is a breaking change;
  • Allow specifying attributes that will be Base64-encoded (use this if you need the exact value or want nicer output).

Comments welcome! Since this is a breaking change, I want to get it into community.general 7.0.0 (to be released in about a week), and it won't get backported to 6.x.y or earlier.

@ursetto

ursetto commented Sep 14, 2023

This PR didn't fix the issue because _normalize_string only operates on string values, but python-ldap always returns bytes values by design. https://www.python-ldap.org/en/python-ldap-3.4.3/bytes_mode.html

    def _normalize_string(val, convert_to_base64):
        if isinstance(val, string_types):

Thus base64_attributes has no effect; additionally, no output is converted to UTF-8, and illegal characters are not replaced.

@felixfontein
Collaborator

I guess it should have been isinstance(val, (string_types, binary_type)) then.
