cukinia: filter junitxml output to only include valid xml 1.0 chars #66

Merged
merged 2 commits into from
Nov 20, 2023

markfeathers
Contributor

While running some hardware tests that use cukinia_cmd/cukinia_run_dir, I ran a script that included "btattach" output. This output contains characters that are invalid in XML 1.0, and Jenkins refuses to parse the resulting file.

The exact btattach output can be reproduced with:
tests/unicode/filtertest.sh

#!/bin/sh -e

# printf is used here because `echo -e` is not portable across /bin/sh implementations
printf '\033[0;93m[CHG]\033[0m\n'
>&2 printf '\033[0;93m[CHG]\033[0m\n'

unicode.conf:

verbose cukinia_run_dir tests/unicode

run:

./cukinia -f junitxml -o output.xml unicode.conf && xmllint output.xml

This will return:

output.xml:5: parser error : Unregistered error message
      <system-err><![CDATA[[BAR]]]></system-err>
                           ^
output.xml:5: parser error : PCDATA invalid Char value 27
      <system-err><![CDATA[[BAR]]]></system-err>
                           ^
output.xml:5: parser error : PCDATA invalid Char value 27
      <system-err><![CDATA[[BAR]]]></system-err>
                                       ^
output.xml:5: parser error : Sequence ']]>' not allowed in content
      <system-err><![CDATA[[BAR]]]></system-err>
                                        ^
output.xml:6: parser error : Unregistered error message
      <system-out><![CDATA[[FOO]]]></system-out>
                           ^
output.xml:6: parser error : PCDATA invalid Char value 27
      <system-out><![CDATA[[FOO]]]></system-out>
                           ^
output.xml:6: parser error : PCDATA invalid Char value 27
      <system-out><![CDATA[[FOO]]]></system-out>
                                       ^
output.xml:6: parser error : Sequence ']]>' not allowed in content
      <system-out><![CDATA[[FOO]]]></system-out>

Jenkins will refuse to parse this with:

org.dom4j.DocumentException: Error on line 11 of document  : An invalid XML character (Unicode: 0x1) was found in the CDATA section.
	at org.dom4j.io.SAXReader.read(SAXReader.java:511)
	at org.dom4j.io.SAXReader.read(SAXReader.java:392)

This commit uses tr to filter to only valid characters. I tested this with a few other examples:
Byte values 0-255:

#!/bin/bash

for i in $(seq 0 255); do
    printf "\\$(printf '%03o' $i)"
done

Random:

#!/bin/sh

dd bs=1M count=10 if=/dev/urandom

Both of these pass after adding the tr filter. The downside of this method is that it is overly aggressive: it strips Unicode that might otherwise be valid in XML 1.0/junitxml, but I'm not sure how to filter more carefully using only busybox / sh. I'm curious whether anyone has thoughts on a better way to handle this.
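For illustration, a byte-level filter of this kind could look like the sketch below. It is not the code from this commit: the function name `xml10_ascii_filter` is hypothetical, and the character set is an assumption that keeps only tab, LF, CR and printable ASCII, which are always valid in XML 1.0. Everything else, including all multi-byte UTF-8 sequences, is dropped, which is exactly the "overly aggressive" trade-off described above.

```shell
#!/bin/sh
# Hypothetical helper, not the PR's actual code.
# Keeps only tab (\11), LF (\12), CR (\15) and printable ASCII (\40-\176);
# every other byte, including ESC (\33) and all UTF-8 bytes, is deleted.
xml10_ascii_filter() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Example: the ANSI colour sequence loses its ESC bytes but keeps the text.
printf '\033[0;93m[CHG]\033[0m\n' | xml10_ascii_filter
# prints: [0;93m[CHG][0m
```

Setting `LC_ALL=C` makes tr treat its input strictly as bytes, so the result should not depend on the locale of the machine running the tests.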

The other solution I see is to require the tests to limit their own output. I could certainly make my test script silent if it succeeds, but the most valuable output is probably when it fails in surprising ways, where I might want to see the unexpected output.

@ch-perry

Hello Mark,

I agree that cukinia should output a valid XML file even if stdout or stderr outputs binary data.

Would tr -dc [:print:] work with unicode?

Please include your test files in ./tests if possible.

@markfeathers markfeathers force-pushed the filter-junitxml-chars branch 2 times, most recently from b99bc4f to 0cba180 Compare August 25, 2023 23:31
Prevent outputting characters invalid in xml 1.0 by limiting output to
printable characters.
@markfeathers markfeathers force-pushed the filter-junitxml-chars branch from 0cba180 to e546036 Compare August 25, 2023 23:35
@markfeathers
Contributor Author

That filtered the XML as well and is a lot simpler. The tests I had add a lot of random text, which might make the existing test more difficult to use, so I added a separate config.

cd tests/xml
../../cukinia -f junitxml -o test.xml xml.conf && xmllint --noout test.xml && echo $?
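As a rough sketch of what the `[:print:]` filter does to a stream (the helper names below are hypothetical and this is not necessarily how the merged commit wires it in): in the C locale `[:print:]` covers only printable ASCII, so escape bytes and UTF-8 bytes are removed, and so are newlines and tabs. Adding `\n` to the kept set is one possible variation if line breaks should survive.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the merged code may differ.

# Keep printable ASCII only. Newlines and tabs are NOT in [:print:],
# so multi-line output is joined into a single line.
print_only() {
    LC_ALL=C tr -dc '[:print:]'
}

# Variation (an assumption, not from the PR): also keep line feeds.
print_keep_newlines() {
    LC_ALL=C tr -dc '[:print:]\n'
}

printf 'a\033b\nc\n' | print_only; echo
# prints: abc
printf 'a\033b\nc\n' | print_keep_newlines
# prints: ab
#         c
```

Quoting the set as `'[:print:]'` avoids the shell expanding it as a glob if a matching single-character file name happens to exist in the current directory.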

@markfeathers markfeathers force-pushed the filter-junitxml-chars branch from e546036 to 0f95890 Compare August 25, 2023 23:43
@ch-perry
Copy link

@joufella: any opinion on this?

It would be nice to have a "works all the time" type of output, without having to specify what type of output we're expecting for each command.

tr -dc [:print:] will work but will truncate unicode characters. I'll note that hexdump could be useful here as it has some interesting formatting options that would allow us to keep the non-printable characters instead of truncating them:

_c
           Output characters in the default character set. Non-printing characters are displayed in
           three-character, zero-padded octal, except for those representable by standard escape notation
           (see above), which are displayed as two-character strings.
_p
           Output characters in the default character set. Non-printing characters are displayed as a single
           '.'.
_u
           Output US ASCII characters, with the exception that control characters are displayed using the
           following, lower-case, names. Characters greater than 0xff, hexadecimal, are displayed as
           hexadecimal strings.
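A minimal sketch of the hexdump idea, assuming a hexdump that supports `-e` format strings (util-linux and BSD hexdump do; availability in a given busybox build is an assumption). The function name `mask_nonprintable` is hypothetical. It uses the `_p` conversion quoted above, so each non-printing byte becomes a single `.` instead of being removed; note that newlines are non-printing too and are also turned into `.`.

```shell
#!/bin/sh
# Hypothetical helper: replace every non-printing byte with '.'
# -v disables the squeezing of repeated lines into '*'.
# '1/1 "%_p"' means: one unit of one byte, printed with the _p conversion.
mask_nonprintable() {
    hexdump -v -e '1/1 "%_p"'
}

# Example: the ESC byte is shown as '.' rather than dropped.
if command -v hexdump >/dev/null 2>&1; then
    printf 'a\033b' | mask_nonprintable; echo
    # prints: a.b
fi
```

Swapping `%_p` for `%_c` would keep more information about the original bytes, at the cost of a longer and less readable output.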

@ch-perry ch-perry merged commit e2f0912 into savoirfairelinux:master Nov 20, 2023
@markfeathers markfeathers deleted the filter-junitxml-chars branch May 3, 2024 17:22