Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix import errors with unicode filenames #830

Merged
merged 8 commits into from
Sep 18, 2013
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cms/djangoapps/contentstore/views/import_export.py
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ def wfile(filename, dirname):
sf.write("Extracting")

tar_file = tarfile.open(temp_filepath)
tar_file.extractall(course_dir + '/')
tar_file.extractall((course_dir + '/').encode('utf-8'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what people will expect for a directory name from a non-ascii course title? This should prevent errors, but may still produce odd results.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean non-unicode dirnames will raise issues elsewhere? I think that's probably true, but presumably we should fix those issues too.


with open(status_file, 'w+') as sf:
sf.write("Verifying")
Expand Down
52 changes: 42 additions & 10 deletions common/lib/xmodule/xmodule/modulestore/tests/test_mongo.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,10 @@
# pylint: disable=E0611
from nose.tools import assert_equals, assert_raises, \
assert_not_equals, assert_false
from itertools import ifilter
# pylint: enable=E0611
import pymongo
import logging
from uuid import uuid4

from xblock.fields import Scope
Expand All @@ -19,6 +21,7 @@

from xmodule.modulestore.tests.test_modulestore import check_path_to_location

log = logging.getLogger(__name__)

HOST = 'localhost'
PORT = 27017
Expand Down Expand Up @@ -59,7 +62,7 @@ def initdb():
#
draft_store = DraftModuleStore(HOST, DB, COLLECTION, FS_ROOT, RENDER_TEMPLATE, default_class=DEFAULT_CLASS)
# Explicitly list the courses to load (don't want the big one)
courses = ['toy', 'simple', 'simple_with_draft']
courses = ['toy', 'simple', 'simple_with_draft', 'test_unicode']
import_from_xml(store, DATA_DIR, courses, draft_store=draft_store, static_content_store=content_store)

# also test a course with no importing of static content
Expand All @@ -86,6 +89,19 @@ def setUp(self):
def tearDown(self):
pass

def get_course_by_id(self, name):
"""
Returns the first course with `id` of `name`, or `None` if there are none.
"""
courses = self.store.get_courses()
return next(ifilter(lambda x: x.id == name, courses), None)

def course_with_id_exists(self, name):
"""
Returns true iff there exists some course with `id` of `name`.
"""
return (self.get_course_by_id(name) is not None)

def test_init(self):
'''Make sure the db loads, and print all the locations in the db.
Call this directly from failing tests to see what is loaded'''
Expand All @@ -100,12 +116,12 @@ def test_mongo_modulestore_type(self):
def test_get_courses(self):
'''Make sure the course objects loaded properly'''
courses = self.store.get_courses()
assert_equals(len(courses), 4)
courses.sort(key=lambda c: c.id)
assert_equals(courses[0].id, 'edX/simple/2012_Fall')
assert_equals(courses[1].id, 'edX/simple_with_draft/2012_Fall')
assert_equals(courses[2].id, 'edX/test_import_course/2012_Fall')
assert_equals(courses[3].id, 'edX/toy/2012_Fall')
assert_equals(len(courses), 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the list of courses in the initdb method contain four courses ('toy', 'simple', 'simple_with_draft', and 'test_unicode') but this test expects there to be five? Can we calculate what the number should be in the initdb method and save it somewhere, rather than hardcoding it into this test? (Is that even a good idea?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fifth course gets imported in a separate call to import_from_xml since its params are different.

Not sure about calculating the number: we obviously shouldn't just do len(courses) since that's what we're trying to test. We could create a wrapper around import_from_xml that adds to a global object; but that wouldn't work if the import_from_xml call is supposed to fail...

assert self.course_with_id_exists('edX/simple/2012_Fall')
assert self.course_with_id_exists('edX/simple_with_draft/2012_Fall')
assert self.course_with_id_exists('edX/test_import_course/2012_Fall')
assert self.course_with_id_exists('edX/test_unicode/2012_Fall')
assert self.course_with_id_exists('edX/toy/2012_Fall')

def test_loads(self):
assert_not_equals(
Expand All @@ -120,6 +136,22 @@ def test_loads(self):
self.store.get_item("i4x://edX/toy/video/Welcome"),
None)

def test_unicode_loads(self):
assert_not_equals(
self.store.get_item("i4x://edX/test_unicode/course/2012_Fall"),
None)
# All items with ascii-only filenames should load properly.
assert_not_equals(
self.store.get_item("i4x://edX/test_unicode/video/Welcome"),
None)
assert_not_equals(
self.store.get_item("i4x://edX/test_unicode/video/Welcome"),
None)
assert_not_equals(
self.store.get_item("i4x://edX/test_unicode/chapter/Overview"),
None)


def test_find_one(self):
assert_not_equals(
self.store._find_one(Location("i4x://edX/toy/course/2012_Fall")),
Expand Down Expand Up @@ -153,15 +185,15 @@ def test_get_courses_has_no_templates(self):
)

def test_static_tab_names(self):
courses = self.store.get_courses()

def get_tab_name(index):
"""
Helper function for pulling out the name of a given static tab.

Assumes the information is desired for courses[1] ('toy' course).
Assumes the information is desired for courses[4] ('toy' course).
"""
return courses[2].tabs[index]['name']
course = self.get_course_by_id('edX/toy/2012_Fall')
return course.tabs[index]['name']

# There was a bug where model.save was not getting called after the static tab name
# was set set for tabs that have a URL slug. 'Syllabus' and 'Resources' fall into that
Expand Down
10 changes: 6 additions & 4 deletions common/lib/xmodule/xmodule/modulestore/xml.py
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ def fallback_name(orig_name=None):
# Didn't load properly. Fall back on loading as an error
# descriptor. This should never error due to formatting.

msg = "Error loading from xml. " + str(err)[:200]
msg = "Error loading from xml. " + unicode(err)[:200]
log.warning(msg)
# Normally, we don't want lots of exception traces in our logs from common
# content problems. But if you're debugging the xml loading code itself,
Expand Down Expand Up @@ -317,7 +317,8 @@ def try_load_course(self, course_dir):
try:
course_descriptor = self.load_course(course_dir, errorlog.tracker)
except Exception as e:
msg = "ERROR: Failed to load course '{0}': {1}".format(course_dir, str(e))
msg = "ERROR: Failed to load course '{0}': {1}".format(course_dir.encode("utf-8"),
unicode(e))
log.exception(msg)
errorlog.tracker(msg)

Expand Down Expand Up @@ -493,8 +494,9 @@ def _load_extra_content(self, system, course_descriptor, category, path, course_
module.save()
self.modules[course_descriptor.id][module.location] = module
except Exception, e:
logging.exception("Failed to load {0}. Skipping... Exception: {1}".format(filepath, str(e)))
system.error_tracker("ERROR: " + str(e))
logging.exception("Failed to load %s. Skipping... \
Exception: %s", filepath, unicode(e))
system.error_tracker("ERROR: " + unicode(e))

def get_instance(self, course_id, location, depth=0):
"""
Expand Down
2 changes: 1 addition & 1 deletion common/lib/xmodule/xmodule/modulestore/xml_importer.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ def import_static_content(modules, course_loc, course_data_path, static_content_
try:
content_path = os.path.join(dirname, filename)
if verbose:
log.debug('importing static content {0}...'.format(content_path))
log.debug('importing static content %s...', content_path)

fullname_with_subpath = content_path.replace(static_dir, '') # strip away leading path from the name
if fullname_with_subpath.startswith('/'):
Expand Down
2 changes: 1 addition & 1 deletion common/lib/xmodule/xmodule/seq_module.py
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ def definition_from_xml(cls, xml_object, system):
except Exception as e:
log.exception("Unable to load child when parsing Sequence. Continuing...")
if system.error_tracker is not None:
system.error_tracker("ERROR: " + str(e))
system.error_tracker("ERROR: " + unicode(e))
continue
return {}, children

Expand Down
26 changes: 26 additions & 0 deletions common/lib/xmodule/xmodule/tests/test_import.py
Original file line number Diff line number Diff line change
Expand Up @@ -368,6 +368,32 @@ def test_colon_in_url_name(self):
html = modulestore.get_instance(course_id, loc)
self.assertEquals(html.display_name, "Toy lab")

def test_unicode(self):
"""Check that courses with unicode characters in filenames and in
org/course/name import properly. Currently, this means: (a) Having
files with unicode names does not prevent import; (b) if files are not
loaded because of unicode filenames, there are appropriate
exceptions/errors to that effect."""

print("Starting import")
modulestore = XMLModuleStore(DATA_DIR, course_dirs=['test_unicode'])
courses = modulestore.get_courses()
self.assertEquals(len(courses), 1)
course = courses[0]

print("course errors:")

# Expect to find an error/exception about characters in "®esources"
expect = "Invalid characters in '®esources'"
errors = [(msg.encode("utf-8"), err.encode("utf-8"))
for msg, err in
modulestore.get_item_errors(course.location)]

self.assertTrue(any(expect in msg or expect in err
for msg, err in errors))
chapters = course.get_children()
self.assertEqual(len(chapters), 3)

def test_url_name_mangling(self):
"""
Make sure that url_names are only mangled once.
Expand Down
1 change: 1 addition & 0 deletions common/test/data/test_unicode/about/end_date.html
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
TBD
8 changes: 8 additions & 0 deletions common/test/data/test_unicode/chapter/simple_html.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<chapter display_name="åñ html ƒile">
<sequential display_name="åñ html ƒile">
<html display_name="åñ html ƒile">
<p> This is upside down text:</p>
<p>˙ʇxəʇ uʍop əpısdn sı sıɥʇ</p>
</html>
</sequential>
</chapter>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loving it!

How did you get upside down text?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://www.sunnyneo.com/upsidedowntext.php?word=This+is+a+story+all+about+how%252C+my+life+got+flipped%252C+turned+upside+down.

seems like a good solution to the issue @nedbat - we should be able to tell whether the unicode characters are right.

1 change: 1 addition & 0 deletions common/test/data/test_unicode/course.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
<course org="edX" course="test_unicode" url_name="2012_Fall"/>
8 changes: 8 additions & 0 deletions common/test/data/test_unicode/course/2012_Fall.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<course>
<chapter display_name="Overview">
<video url_name="Welcome" youtube_id_1_0="p2Q6BrNhdh8" display_name="Welcome"/>
</chapter>
<chapter url_name="sîmple_√ideo"/>
<chapter url_name="simple_html"/>
<chapter url_name="vertical_container"/>
</course>
23 changes: 23 additions & 0 deletions common/test/data/test_unicode/policies/2012_Fall.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"course/2012_Fall": {
"graceperiod": "2 days 5 hours 59 minutes 59 seconds",
"start": "2015-07-17T12:00",
"display_name": "Üñîçø∂e †es† Course",
"graded": "true",
"tabs": [
{"type": "courseware"},
{"type": "course_info", "name": "Course Info"},
{"type": "static_tab", "url_slug": "syllabus", "name": "ßyllabus"},
{"type": "static_tab", "url_slug": "resources", "name": "®esources"},
{"type": "discussion", "name": "∂iscussion"},
{"type": "wiki", "name": "∑iki"},
{"type": "progress", "name": "πrogress"}
]
},
"chapter/Overview": {
"display_name": "O√erview"
},
"video/Welcome": {
"display_name": "Welcome"
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<sequential>
<vertical filename="vertical_test" slug="vertical_test" />
<html slug="unicode">…</html>
</sequential>
10 changes: 10 additions & 0 deletions common/test/data/test_unicode/static/unicø∂é.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@

Upside down unicode text (http://www.sunnyneo.com/upsidedowntext.php?)

˙ʇʇɐld uoɯəl plos əɥs ˙pəʌıl əuɹʎq ʎʇʇəq əɹəɥʍ pɐoɹ əɥʇ uʍop əɯɐɔ ʍoɔ-ooɯ əɥʇ
˙ooʞɔnʇ ʎqɐq sɐʍ əɥ ˙əɔɐɟ ʎɹıɐɥ ɐ pɐɥ əɥ :ssɐlƃ ɐ ɥƃnoɹɥʇ ɯıɥ ʇɐ pəʞool ɹəɥʇɐɟ
sıɥ :ʎɹoʇs ʇɐɥʇ ɯıɥ ploʇ ɹəɥʇɐɟ sıɥ

˙˙˙ooʞɔnʇ ʎqɐq pəɯɐu ʎoq əlʇʇıl suəɔıu ɐ ʇəɯ pɐoɹ əɥʇ ƃuolɐ uʍop ƃuıɯoɔ sɐʍ ʇɐɥʇ
ʍoɔ-ooɯ sıɥʇ puɐ pɐoɹ əɥʇ ƃuolɐ uʍop ƃuıɯoɔ ʍoɔ-ooɯ ɐ sɐʍ əɹəɥʇ sɐʍ ʇı əɯıʇ pooƃ
ʎɹəʌ ɐ puɐ əɯıʇ ɐ uodn əɔuo
1 change: 1 addition & 0 deletions common/test/data/test_unicode/tabs/®esources.html
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
<div>resources!</div>
3 changes: 3 additions & 0 deletions common/test/data/test_unicode/vertical/vertical_test.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
<vertical>
<video url_name="separate_file_video"/>
</vertical>
2 changes: 1 addition & 1 deletion requirements/edx/github.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@
# Our libraries:
-e git+https://github.com/edx/XBlock.git@aa0d60627#egg=XBlock
-e git+https://github.com/edx/codejail.git@0a1b468#egg=codejail
-e git+https://github.com/edx/diff-cover.git@v0.2.2#egg=diff_cover
-e git+https://github.com/edx/diff-cover.git@v0.2.3#egg=diff_cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, why the new dependency version?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because diff-cover had an issue with diffing unicode filenames (the regex that was being used would break); I sent a PR for that and it's been merged with a version bump.

-e git+https://github.com/edx/js-test-tool.git@v0.0.7#egg=js_test_tool