grep and sed stopped handling non-ASCII patterns #6429

Closed
bigConifer opened this issue Feb 20, 2021 · 14 comments · Fixed by #8069
Labels
bug report Something is not working properly

Comments

@bigConifer

bigConifer commented Feb 20, 2021

Problem description
When grep or sed is run with a pattern containing a non-ASCII character, no matches are found in the text.
The same issue was reported in #6401 last week.
I checked older versions and found that this bug was introduced in Termux v0.92 (almost exactly one year ago).
In prior versions it worked as expected!

Steps to reproduce



I ran each of these commands in a clean Termux installation without upgrading the packages (or, where perl is needed, with only perl installed in addition).

Expected behavior
In other environments grep and sed have no problems with non-ASCII characters inside regular expressions.
Workarounds using perl or perl-regexp are normally not necessary and in some cases not feasible at all.
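
A minimal sanity check for the expected behavior might look like the sketch below. It assumes a UTF-8 locale, and the sample text is made up for illustration; it is not the reporter's original reproduction.

```shell
# Hypothetical sample text, chosen only because it contains non-ASCII characters.
sample='Grüße aus Köln'

# grep: count the lines matching a pattern that contains a non-ASCII character.
# A working grep reports 1 matching line; the bug described here would give 0.
count=$(printf '%s\n' "$sample" | grep -c 'ö')

# sed: substitute a non-ASCII character that appears in the regex.
replaced=$(printf '%s\n' "$sample" | sed 's/ü/ue/')

printf 'grep matches: %s\nsed result:   %s\n' "$count" "$replaced"
```

If either tool fails to handle the pattern, the count stays at 0 or the sed output is identical to the input.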

Additional information

LANG=en_US.UTF-8

Packages architecture: 
arm
Android version: 
9
@bigConifer
Author

bigConifer commented Feb 20, 2021

Maybe it was caused by a commit like this 970391d (build-essential: keep bare minimum in dependencies & move others to recommends).

@ghost

ghost commented Feb 20, 2021

Maybe it was caused by a commit like this 970391d (build-essential: keep bare minimum in dependencies & move others to recommends).

This has nothing to do with the build-essential package. That package and its dependencies are not involved in cross-compilation.

ghost added the "bug report" label Feb 20, 2021
@landfillbaby
Member

it's a gnulib issue... looking into it...

@xtkoba
Contributor

xtkoba commented Oct 10, 2021

This looks like a duplicate of #5171.

@xtkoba
Contributor

xtkoba commented Oct 10, 2021

FYI, perl (#6341) and lftp (#5479) reportedly also have problems handling non-ASCII characters. I have no idea how closely they are related to this issue, though.

@bigConifer
Author

This looks like a duplicate of #5171

Yeah, you're right. However, this issue is more an attempt to narrow down the source of the bug to a change between Termux releases v0.90 and v0.92.
When you run the bootstrapped grep and sed binaries from Termux v0.90, they still handle UTF-8 patterns as expected.
So I would think that either the January 2020 upgrade of the upstream sources (636fe8d, 30119e1) broke something that needs to be addressed by a patch, or a change to the build environment is responsible for the bug (https://github.com/termux/termux-packages/commits/master?branch=master&path[]=scripts&since=2019-10-10&until=2020-02-08).

@bigConifer
Author

@xeffyr, @landfillbaby, I'm not sure if this one is related, but what about b20be8a (docker image: install locales and configure en_US.UTF-8 as default)?
Is there a reason why the locale was changed back and forth (8818368, 65512d6)?

@bigConifer
Author

@xtkoba, I think at least #6341 can be closed if no one else is able to reproduce it.

@landfillbaby
Member

landfillbaby commented Oct 10, 2021 via email

@ghost

ghost commented Oct 10, 2021

not sure if this one is related, but what about b20be8a (docker image: install locales and configure en_US.UTF-8 as default)?

That commit was about warnings regarding an unconfigured locale when running some utilities. It has nothing to do with package compilation, and build-environment locales do not affect Termux packages anyway.

Is there a reason why the locale was changed back and forth (8818368, 65512d6)?

I don't know what was happening in 2016, but most likely it was about dropping "unneeded" steps from the Dockerfile. Locales on their own aren't needed if you don't use the Docker image for anything more than running the package compiler.

@stale

stale bot commented Nov 26, 2021

This issue/PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "wontfix" label Nov 26, 2021
@xtkoba
Contributor

xtkoba commented Nov 26, 2021

I have noticed that this behavior disappears from sed when it is configured with --host=aarch64-linux (instead of --host=aarch64-linux-android).

Several configure cache variables (confvars) may be guessed differently for a *-android host.

For sed:

$ grep -A 1 -- -android sed-4.8/configure | grep -v "# Guess "
            linux*-android*) gl_cv_func_memchr_works="guessing no" ;;
--
           *-android*)    gl_cv_func_ungetc_works="guessing yes" ;;
--
          *-android*)    gl_cv_pthread_rwlock_rdlock_prefer_writer="guessing no" ;;
--
              linux*-android*) gl_cv_func_wcrtomb_works="guessing no";;
--
           *-android*) # implemented using dup3(), which fails if oldfd == newfd
             gl_cv_func_dup2_works="guessing no" ;;
--
              linux*-android*) gl_cv_func_setlocale_works="guessing no";;

For grep:

$ grep -A 1 -- -android grep-3.7/configure | grep -v "# Guess "
          linux*-android*) gl_cv_func_memchr_works="guessing no" ;;
--
           linux*-android*)      gl_cv_func_snprintf_retval_c99="guessing yes";;
--
           linux*-android*)      gl_cv_func_snprintf_truncation_c99="guessing yes";;
--
         *-android*) # implemented using dup3(), which fails if oldfd == newfd
           gl_cv_func_dup2_works="guessing no" ;;
--
          *-android*)    gl_cv_pthread_rwlock_rdlock_prefer_writer="guessing no" ;;
--
              linux*-android*) gl_cv_func_wcrtomb_works="guessing no";;
--
              linux*-android*) gl_cv_func_setlocale_works="guessing no";;
--
           linux*-android*) gl_cv_func_snprintf_size1="guessing yes" ;;
--
           linux*-android*) gl_cv_func_printf_positions="guessing yes";;

I guess some of them affect the behavior. I'm going to narrow them down.
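
One way to narrow them down is to override a single guessed cache variable per configure run, since autoconf-generated configure scripts accept VAR=VALUE arguments for cache variables. The sketch below only prints candidate command lines from the sed-4.8 output quoted above; it does not run configure. Forcing a variable to "yes" is an assumption made for the experiment, and nothing in this thread establishes that doing so is correct for Android.

```shell
# Lines copied from the sed-4.8/configure output quoted above.
guesses='
linux*-android*) gl_cv_func_memchr_works="guessing no" ;;
*-android*)    gl_cv_func_ungetc_works="guessing yes" ;;
*-android*)    gl_cv_pthread_rwlock_rdlock_prefer_writer="guessing no" ;;
linux*-android*) gl_cv_func_wcrtomb_works="guessing no";;
gl_cv_func_dup2_works="guessing no" ;;
linux*-android*) gl_cv_func_setlocale_works="guessing no";;
'

# Keep only the names of the variables that configure guesses as "no".
candidates=$(printf '%s\n' "$guesses" |
  sed -n 's/.*\(gl_cv_[a-z0-9_]*\)="guessing no".*/\1/p')

# Print one configure invocation per candidate, overriding just that variable,
# so each build differs from the failing one in exactly one guess.
for var in $candidates; do
  printf './configure --host=aarch64-linux-android %s=yes\n' "$var"
done
```

Building once per printed line and re-running a non-ASCII pattern test against each result would show which guess, if any, changes the behavior.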

stale bot removed the "wontfix" label Nov 26, 2021