Linux: Encoding problems with non-UTF-8 locale #6287

Closed
fmkaiser opened this issue Jan 9, 2018 · 9 comments
Labels: p4-low (Low priority), ReadyToTest (QA, please validate the fix/enhancement), type:bug

Comments

fmkaiser commented Jan 9, 2018

Expected behaviour

Files with special characters in the name are synchronized correctly.

Actual behaviour

With the 2.4.0 Linux ownCloud client and the locale set to LANG=C,
objects on the server with special characters such as umlauts in their names
are downloaded with each special character replaced by a question mark (?),
and on the next sync they are renamed on the server as well.

That's especially bad if the files are in a folder someone else has shared with you, because the rename on the server then affects everyone who has access to that folder.

Steps to reproduce

  1. Start ownCloud client v2.4 under Linux with

LANG=C owncloud --logwindow

  2. Create a folder with special characters in its name on the server
  3. Watch the sync log

Server configuration

Doesn't seem to matter much, tested with ownCloud 9.0.8 and 10.0.3.

Client configuration

Client version: 2.4.0 (problem did not occur with 2.3.3)

Operating system: openSUSE Leap 42.3

OS language: LANG=C

Qt version used by client package (Linux only, see also Settings dialog): Qt 5.6.2

Client package: owncloud-client-2.4.0-8911.1.x86_64 (ownCloud)

Logs

Client log - the interesting parts:

# filename is "täßt.txt" - apologies to all German speakers for the spelling
01-09 10:50:43:896 [ info sync.accessmanager ]: 6 "PROPFIND" "https://staging.sis.rzg.mpg.de/remote.php/dav/files/test-fek/umlauttest" has X-Request-ID "2576f085-78e2-49b1-a59e-bb6aa77a3063"
01-09 10:50:44:031 [ info sync.csync.updater ]: file: umlauttest/t??t.txt, instruction: INSTRUCTION_NEW <<=
01-09 10:50:44:202 [ info sync.database ]: Updating file record for path: "umlauttest/t\xC3\xA4\xC3\x9Ft.txt" inode: 107843249
01-09 10:50:46:381 [ info sync.propagator ]: Starting INSTRUCTION_RENAME propagation of "umlauttest/t??t.txt" by OCC::PropagateRemoteMove(0x56329b1622f0)
01-09 10:50:46:382 [ info sync.accessmanager ]: 6 "MOVE" "https://staging.sis.rzg.mpg.de/remote.php/dav/files/test-fek/umlauttest/t??t.txt" has X-Request-ID "e6c9de7a-b8ce-4e73-8bb5-a7c0ceeb55c7"

Full log: https://gist.github.com/fmkaiser/4f18c2b75b042c1f9d728be87c2f2e07
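For reference, the escaped bytes in the sync.database line are simply the UTF-8 encoding of the name: ä (U+00E4) becomes C3 A4 and ß (U+00DF) becomes C3 9F, while the updater and propagator lines print the same name as "t??t.txt". The minimal sketch below, in standard C++ without Qt, only illustrates that encoding; the function name encodeUtf8 is hypothetical and this is not the client's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a sequence of Unicode code points as UTF-8.
// Covers U+0000..U+10FFFF; surrogates are not rejected (illustration only).
std::string encodeUtf8(const std::u32string &text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: e.g. ä, ö, ß
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Encoding U"umlauttest/t\u00E4\u00DFt.txt" this way yields exactly the bytes "umlauttest/t\xC3\xA4\xC3\x9Ft.txt" shown in the database log line.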

ckamm self-assigned this Jan 10, 2018
ckamm added this to the 2.4.1 milestone Jan 10, 2018
ckamm (Contributor) commented Jan 10, 2018

Thank you for the report. I'm attempting to reproduce it now. Possibly related to one of these changes: 72809ef or bf2b089.

ckamm added the type:bug, sev1-critical and p2-high (Escalation, on top of current planning, release blocker) labels Jan 10, 2018
ckamm (Contributor) commented Jan 10, 2018

Yep, can reproduce!

ckamm (Contributor) commented Jan 10, 2018

It's neither of the two commits mentioned.

It looks like the remote file "tö" gets inserted into the database as "t\xC3\xB6" (the UTF-8 encoding of "tö", which is correct) but is stored on the filesystem as "t?". The next sync then reads it, notices that the local name and the database name differ, and propagates a rename to the server.
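A heavily simplified sketch of that second sync, assuming (as the inode in the sync.database log line suggests) that the database record is matched to the file on disk by inode. DbRecord, Rename and detectLocalRename are hypothetical names, not the client's real types; the sketch only shows that comparing the stored UTF-8 name with the lossy on-disk name produces a spurious rename.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical database record: the path as received from the server,
// stored as UTF-8 bytes.
struct DbRecord {
    std::string path;
};

// Hypothetical rename instruction: `from` is the recorded (server) name,
// `to` is the name actually found on disk.
struct Rename {
    std::string from;
    std::string to;
};

// Looks up the record for a file found on disk by its inode. If the on-disk
// name differs from the recorded one, the file looks as if it had been
// renamed locally, so a rename would be propagated to the server.
std::optional<Rename> detectLocalRename(const std::map<std::uint64_t, DbRecord> &db,
                                        std::uint64_t inode,
                                        const std::string &localPath)
{
    auto it = db.find(inode);
    if (it == db.end() || it->second.path == localPath)
        return std::nullopt; // unknown file or unchanged name: no rename
    return Rename{it->second.path, localPath};
}
```

With a record for "umlauttest/t\xC3\xB6" and the file "umlauttest/t?" found on disk under the same inode, detectLocalRename returns a rename from the correct name to the mangled one, which is the kind of MOVE request seen in the log above.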

ckamm (Contributor) commented Jan 10, 2018

I don't yet see why this would be a new issue in 2.4: we pass the correct path to QFile::rename (I'm testing by renaming a remote file). It receives "t\xC3\xB6", but the file that arrives on the filesystem is named "t?". In fact, QFile::encodeName("tö") == "t?", and it should have been like this for 2.3 too. I'll test that now.
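To illustrate what the LANG=C conversion does to such names, here is a hypothetical stand-in written in standard C++. It assumes, based on the observed QFile::encodeName("tö") == "t?" result and the "t??t.txt" log lines, that each character the ASCII locale cannot represent becomes exactly one '?'. It is not Qt's implementation, and encodeForAsciiLocale is a made-up name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for encoding a file name in a plain-ASCII locale
// such as LANG=C: every code point ASCII cannot represent is replaced by
// a single '?'. The original character is lost and cannot be recovered.
std::string encodeForAsciiLocale(const std::u32string &name)
{
    std::string out;
    out.reserve(name.size());
    for (char32_t cp : name)
        out += cp < 0x80 ? static_cast<char>(cp) : '?';
    return out;
}
```

Under this model U"t\u00F6" becomes "t?" and U"umlauttest/t\u00E4\u00DFt.txt" becomes "umlauttest/t??t.txt", matching what the thread and the log show.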

ckamm (Contributor) commented Jan 10, 2018

Yes, same behavior with the 2.3 branch.

ckamm added the sev2-high label and removed the p2-high (Escalation, on top of current planning, release blocker) and sev1-critical labels Jan 10, 2018
ckamm (Contributor) commented Jan 10, 2018

I think we should ignore files that can't be encoded for the filesystem without producing invalid characters, as @ogoffart intended with 72809ef.
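One way to implement such a check without relying on a separate "can this be encoded?" query is a round trip: encode the name for the filesystem, decode it again and compare the result with the original. The sketch below assumes an ASCII-only locale codec like the one in effect under LANG=C; toLocale, fromLocale, survivesRoundTrip and namesToIgnore are hypothetical names, and this is not the code of 72809ef or of the later fix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical ASCII-locale codec standing in for the locale codec used
// under LANG=C. Unrepresentable code points become '?' when encoding.
std::string toLocale(const std::u32string &name)
{
    std::string out;
    for (char32_t cp : name)
        out += cp < 0x80 ? static_cast<char>(cp) : '?';
    return out;
}

// Decoding: non-ASCII bytes map to U+FFFD (replacement character).
std::u32string fromLocale(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char c : bytes)
        out += c < 0x80 ? static_cast<char32_t>(c) : U'\uFFFD';
    return out;
}

// A name is safe to create on disk only if encoding it for the filesystem
// and decoding it again gives back exactly the same name.
bool survivesRoundTrip(const std::u32string &name)
{
    return fromLocale(toLocale(name)) == name;
}

// Collect the server-side names the client would ignore under this
// hypothetical policy instead of creating mangled local copies.
std::vector<std::u32string> namesToIgnore(const std::vector<std::u32string> &names)
{
    std::vector<std::u32string> ignored;
    for (const auto &n : names)
        if (!survivesRoundTrip(n))
            ignored.push_back(n);
    return ignored;
}
```

A round trip also avoids a naive "does the encoded name contain '?'" test, which would wrongly reject names such as "what?.txt" that legitimately contain a question mark.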

ckamm (Contributor) commented Jan 10, 2018

We have tracked this down to a bug in QTextCodec::canEncode and will add a workaround for 2.4.1. Possibly the upstream bug https://bugreports.qt.io/browse/QTBUG-6925.

ckamm added a commit that referenced this issue Jan 10, 2018
There's an upstream bug where QTextCodec::canEncode returns true even
though it should be false. This works around that issue and adds a test.

The original work was done in 72809ef

See #6287, #5676, #5719
See https://bugreports.qt.io/browse/QTBUG-6925
ckamm added the ReadyToTest (QA, please validate the fix/enhancement) label Jan 10, 2018
fmkaiser (Author) commented

@ckamm Thanks for the quick response!

Maybe the local state database is rebuilt from the server in 2.4?
The question marks in the file names definitely appeared after the upgrade to 2.4 (from 2.3.3).
However, the files with umlauts had been in my workspace for a while and were likely downloaded with a sync client older than 2.3. The 2.3 client probably didn't touch them as long as they weren't changed on the server.

guruz changed the title from "Encoding problems with non-UTF-8 locale" to "Linux: Encoding problems with non-UTF-8 locale" Feb 12, 2018
felixboehm (Contributor) commented

Please test and reopen if there is still an issue. Thanks!


3 participants