Fix import errors with unicode filenames #830

jkarni · 2013-08-29T20:47:37Z

@nedbat @singingwolfboy - Despite Ned's helpful lesson, I have no clue whether this is the right way to go about things, but courses with unicode filenames can't be imported, even though (to the best of my knowledge) there is no reason for us not to support them.

jkarni · 2013-08-29T20:52:10Z

@chrisndodge This seems like it's the same issue you were looking into [STUD-680]. With these changes I can import courses with unicode filenames, but I'm worried something else might go wrong.

nedbat · 2013-08-30T13:46:20Z

common/lib/xmodule/xmodule/modulestore/xml_importer.py

@@ -31,7 +31,7 @@ def import_static_content(modules, course_loc, course_data_path, static_content_
            try:
                content_path = os.path.join(dirname, filename)
                if verbose:
-                    log.debug('importing static content {0}...'.format(content_path))
+                    log.debug('importing static content {0}...'.format(content_path.encode("utf-8")))


This would be better as: log.debug('importing static content %s...', content_path)

Why? Isn't using str.format() preferable to using the %s old-style string formatting?

Best practice for log.debug and its ilk is to defer the formatting to the logging module itself. It accepts %-style formatting. In this case, it would also solve the unicode issues without an explicit encode on our part.
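A minimal sketch of the difference between the two calls, written for Python 3 (so it does not reproduce the Python 2 encode errors this PR is about). `CountingPath`, the logger name, and the sample path are hypothetical; the point is only that `str.format()` renders the message before `log.debug` runs, while `%`-style arguments are rendered by the logging module and only when the record is actually emitted.

```python
import io
import logging


class CountingPath:
    """Hypothetical stand-in for a path; counts how often it is rendered."""

    def __init__(self, value):
        self.value = value
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return self.value


stream = io.StringIO()
log = logging.getLogger("import_sketch")  # hypothetical logger name
log.addHandler(logging.StreamHandler(stream))
log.propagate = False
log.setLevel(logging.INFO)  # DEBUG records are dropped at this level

eager = CountingPath("static/\u00fcnic\u00f6de.png")
lazy = CountingPath("static/\u00fcnic\u00f6de.png")

# Eager: the message is built even though the record is then discarded.
log.debug("importing static content {0}...".format(eager))

# Deferred: the argument is never rendered because DEBUG is disabled.
log.debug("importing static content %s...", lazy)
renders_while_disabled = (eager.renders, lazy.renders)

# Once DEBUG is enabled, the logging module does the formatting itself.
log.setLevel(logging.DEBUG)
log.debug("importing static content %s...", lazy)
```

With DEBUG disabled, only the `str.format()` call pays for rendering the path; the deferred call does no formatting work until the level is lowered.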

chrisndodge · 2013-08-30T19:04:23Z

Please add some test files with unicode characters to the common/test/data test courses, in particular for courses that go through the import/export/reimport path. Thanks.

I think your suspicion is likely correct in that this is probably a partial fix. Ultimately I think we need to make Location be urlencoded rather than doing the underscore hack.
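To illustrate the trade-off, here is a rough sketch. It assumes the "underscore hack" means replacing every disallowed character with `_`; the actual rule used by Location is not shown in this thread, so `INVALID_CHARS`, `underscore_clean`, and `url_clean` are all hypothetical.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical rule: anything outside ASCII letters, digits, "_", "." and "-"
# counts as invalid.
INVALID_CHARS = re.compile(r"[^\w.-]", re.ASCII)


def underscore_clean(name):
    """Lossy: every disallowed character collapses to an underscore."""
    return INVALID_CHARS.sub("_", name)


def url_clean(name):
    """Reversible: disallowed characters are percent-encoded as UTF-8."""
    return quote(name, safe="")


first = "caf\u00e9.png"   # e with acute accent
second = "caf\u00e8.png"  # e with grave accent

collide = underscore_clean(first) == underscore_clean(second)
round_trip = unquote(url_clean(first)) == first
```

Under this assumption, two different unicode filenames map to the same cleaned name with underscore replacement, while percent-encoding keeps them distinct and can be decoded back to the original.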

jkarni · 2013-08-30T19:18:26Z

Yeah, I think this might be enough to get the import to stop failing, but likely not enough for the assets and pages to be accessible.

jkarni · 2013-09-03T17:46:05Z

Wrote some test cases for various potential unicode issues. I'm finding a number of other similar or related problems; I guess we shouldn't merge a partial fix if it might mean unknown behavior later, so this probably shouldn't be merged.

jkarni · 2013-09-04T17:39:56Z

@nedbat @chrisndodge Added some tests that check that courses with unicode filenames still get imported, even if some files don't. And if we decide to surface the errors/exceptions (with or without backing off of import), the exception throwing and catching stuff now works.

chrisndodge · 2013-09-04T17:51:38Z

Hey, thanks for the sample course content & tests. Hate to be a nit-pick, but it seems like you just copied the toy course and added just a bit of stuff, but most of that course has nothing to do with unicode stuff. Can I suggest you prune away some of the non-relevant content in the "unicode" test course?

jkarni · 2013-09-04T18:04:31Z

Sure. The only thing I wanted was for some non-unicode filenames to be loaded and tested for after exceptions were raised for the unicode filenames. I can try to figure out the load order so I can prune the course and still get that.

jkarni · 2013-09-04T19:26:55Z

Build failing since diff-cover/diff-quality don't currently support unicode filenames. Opened a PR for diff-cover, and Will is reviewing/updating Jenkins, but merge will have to wait for that.

singingwolfboy · 2013-09-05T13:36:16Z

common/lib/xmodule/xmodule/modulestore/xml_importer.py

@@ -3,6 +3,7 @@
 import mimetypes
 from path import path

+from nose.tools import set_trace


This import is no longer necessary

jkarni · 2013-09-10T17:03:20Z

@chrisndodge @nedbat Made the fixes and cleaned up the test course - let me know if there are other issues or if this is good to merge

chrisndodge · 2013-09-11T05:19:51Z

@jkarni just noticing that the build failed... :-(

cahrens · 2013-09-11T15:23:21Z

@chrisndodge It only failed because of the known LTI issue (with the version of oauthlib). I'll kick off a manual run.

chrisndodge · 2013-09-12T14:35:08Z

common/lib/xmodule/xmodule/modulestore/tests/test_mongo.py

            """
-            return courses[2].tabs[index]['name']
+            return courses[4].tabs[index]['name']


This type of indexing is always going to break as we add more test courses. I wonder if we should just add a helpful method such as find_test_course_by_name(courses, 'toy').

This is a nice-to-have....
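A possible shape for such a helper, as a sketch only. `Course` is a hypothetical stand-in for the real course descriptor, and the helper assumes ids follow the `org/name/run` pattern seen in this test file.

```python
from collections import namedtuple

# Hypothetical stand-in for a course descriptor; only `id` matters here.
Course = namedtuple("Course", ["id"])


def find_test_course_by_name(courses, name):
    """
    Return the first course whose id has `name` as its middle segment
    (org/name/run), or None if no course matches.
    """
    for course in courses:
        parts = course.id.split("/")
        if len(parts) == 3 and parts[1] == name:
            return course
    return None


courses = [
    Course("edX/simple_with_draft/2012_Fall"),
    Course("edX/test_import_course/2012_Fall"),
    Course("edX/toy/2012_Fall"),
]

# The lookup no longer depends on where 'toy' sits in the list.
toy = find_test_course_by_name(courses, "toy")
```

Adding another test course changes the list order but not the result of the lookup, which is the failure mode the positional index has.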

chrisndodge · 2013-09-12T14:39:02Z

I believe @brianhw also did some unicode compatibility work. Do you want to run this by him as well?

brianhw · 2013-09-12T14:58:34Z

I don't think I have much to add to this -- looks good.

singingwolfboy · 2013-09-12T18:54:32Z

common/lib/xmodule/xmodule/modulestore/tests/test_mongo.py

-        assert_equals(courses[1].id, 'edX/simple_with_draft/2012_Fall')
-        assert_equals(courses[2].id, 'edX/test_import_course/2012_Fall')
-        assert_equals(courses[3].id, 'edX/toy/2012_Fall')
+        assert_equals(len(courses), 5)


Why does the list of courses in the initdb method contain four courses ('toy', 'simple', 'simple_with_draft', and 'test_unicode') but this test expects there to be five? Can we calculate what the number should be in the initdb method and save it somewhere, rather than hardcoding it into this test? (Is that even a good idea?)

The fifth course gets imported in a separate call to import_from_xml since its params are different.

Not sure about calculating the number: we obviously shouldn't just do len(courses) since that's what we're trying to test. We could create a wrapper around import_from_xml that adds to a global object; but that wouldn't work if the import_from_xml call is supposed to fail...
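One way to avoid the hardcoded number without using `len(courses)` is to declare the course names once and derive the expectation from those declarations. This is only a sketch: the constant and function names are hypothetical, and naming `test_import_course` as the separately imported course is an assumption based on the ids in the diff above.

```python
# Hypothetical constants; nothing with these names exists in the PR.
DEFAULT_COURSES = ["toy", "simple", "simple_with_draft", "test_unicode"]

# Imported through a separate import_from_xml call with different params
# (assumed to be test_import_course).
SEPARATELY_IMPORTED_COURSES = ["test_import_course"]


def expected_course_count():
    """
    Return the number of courses the test should find, derived from the
    declared lists rather than from the store being tested.
    """
    return len(DEFAULT_COURSES) + len(SEPARATELY_IMPORTED_COURSES)
```

The test would then compare `len(courses)` against `expected_course_count()`. This still does not cover the case raised above where an import is expected to fail.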

singingwolfboy · 2013-09-12T18:56:58Z

Aside from my one comment about hardcoding the test value, this looks good to me.

nedbat · 2013-09-12T19:00:58Z

common/lib/xmodule/xmodule/modulestore/tests/test_mongo.py

+
+    def get_course_by_id(self, name):
+        """ Utility function that returns the first course with id `name`, or
+        None if there are none.


Please follow the docstring formatting guidelines here: https://edx-wiki.atlassian.net/wiki/display/ENG/Python+Guidelines

In our codebase I most commonly see the first line start at the same line as the triple-quotes, but the link suggests otherwise. Does it matter? If so, which way do I go?

Do it the way the wiki page shows. We need to be more consistent, it's true.
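On the "does it matter?" question, a small sketch comparing the two layouts. It assumes the wiki's layout puts the summary on the line after the opening triple-quotes, which is what the question above implies; the function names are hypothetical. As far as tooling goes, `inspect.getdoc` normalizes both forms to the same text, so the choice is about consistency rather than behavior.

```python
import inspect


def same_line_style():
    """ Utility function that returns the first course with id `name`, or
    None if there are none.
    """


def next_line_style():
    """
    Utility function that returns the first course with id `name`, or
    None if there are none.
    """


# Both layouts clean up to the same two-line docstring.
same_doc = inspect.getdoc(same_line_style)
next_doc = inspect.getdoc(next_line_style)
```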

chrisndodge · 2013-09-17T19:51:15Z

+1 from me

Julian Arni added 5 commits September 12, 2013 15:47

Fix import errors with unicode filenames

1813b22

Add tests for imports with unicode filenames

1cfad39

Fix unicode errors on exceptions and logging

d54b197

Use upside-down english for unicode text

0208506

Remove unnecessary encode

af64621

Julian Arni added 3 commits September 12, 2013 15:48

Bump diff-cover version

9d3395e

Cleaner tests

3e62c5e

Utility functions for finding courses by id

2aaade0

jkarni pushed a commit that referenced this pull request Sep 18, 2013

Merge pull request #830 from edx/jkarni/fix/unicode-import

0e2d833

Fix import errors with unicode filenames

jkarni merged commit 0e2d833 into master Sep 18, 2013

jkarni deleted the jkarni/fix/unicode-import branch September 18, 2013 14:23

pomegranited pushed a commit to open-craft/edx-platform that referenced this pull request Aug 7, 2017

Merge pull request openedx#830 from edx-solutions/MCKIN-5419-fix-gwv2

7dffb09

Fix MCKIN-5419 Group Work v2 Installation Issue

andrey-canon pushed a commit to eduNEXT/edx-platform that referenced this pull request Jul 6, 2018

Merge pull request openedx#830 from eduNEXT/and/sga0.8.2-3

e0b4114

Update sga

kluo pushed a commit to kluo/edx-platform that referenced this pull request Oct 20, 2018

Merge pull request openedx#830

7702f15

* origin/html-template/grade-me-button: Remove HTML template version of the GradeMe Button

baby636 mentioned this pull request Nov 5, 2022

[Snyk] Fix for 1 vulnerabilities baby636/edx-platform#33

Open

baby636 mentioned this pull request Dec 26, 2022

[Snyk] Fix for 1 vulnerabilities baby636/edx-platform#41

Open
