-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix import errors with unicode filenames #830
Conversation
@chrisndodge This seems like it's the same issue you were looking into [STUD-680]. With these changes I can import courses with unicode filenames, but I'm worried something else might go wrong. |
@@ -31,7 +31,7 @@ def import_static_content(modules, course_loc, course_data_path, static_content_ | |||
try: | |||
content_path = os.path.join(dirname, filename) | |||
if verbose: | |||
log.debug('importing static content {0}...'.format(content_path)) | |||
log.debug('importing static content {0}...'.format(content_path.encode("utf-8"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better as: log.debug('importing static content %s...', content_path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Isn't using str.format()
preferable to using the %s
old-style string formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best practice for log.debug and its ilk is to defer the formatting to the logging module itself. It accepts %-style formatting. In this case, it would also solve the unicode issues without an explicit encode on our part.
Please add some test files with unicode characters to common/test/data test courses. In particular for courses that do an imprt/export/reimport paths. Thx. I think your suspicion is likely correct in that this is probably a partial fix. Ultimately I think we need to make Location be urlencoded rather than doing the underscore hack. |
Yeah, I think this might be enough to get the import to stop falling, but
|
Wrote some tests cases for various potential unicode issues. I'm finding a number of other similar or related problems ; I guess we shouldn't merge a partial fix if it might mean unknown behavior later, so this probably shouldn't be merged. |
@nedbat @chrisndodge Added some tests that check that courses with unicode filenames still get imported, even if some files don't. And if we decide to surface the errors/exceptions (with or without backing off of import), the exception throwing and catching stuff now works. |
Hey, thanks for the sample course content & tests. Hate to be a nit-pick, but it seems like you just copied the toy course and added just a bit of stuff, but most of that course has nothing to do with unicode stuff. Can I suggest you prune away some of the non-relevant content in the "unicode" test course? |
Sure. Only thing I kind of wanted was for there to be non-unicode filenames
|
Build failing since diff-cover/diff-quality don't currently support unicode filenames. Opened a PR for diff-cover, and Will is reviewing/updating Jenkins, but merge will have to wait for that. |
@@ -3,6 +3,7 @@ | |||
import mimetypes | |||
from path import path | |||
|
|||
from nose.tools import set_trace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import is no longer necessary
@chrisndodge @nedbat Made the fixes and cleaned up the test course - let me know if there are other issues or if this is good to merge |
@jkarni just noticing that the build failed... :-( |
@chrisndodge It only failed because of the LTI known issue (with the version of oathlib). I'll kick off a manual run. |
""" | ||
return courses[2].tabs[index]['name'] | ||
return courses[4].tabs[index]['name'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of indexing is always going to break as we add more test courses. I wonder if we should just add a helpful method such as find_test_course_by_name(courses, 'toy').
This is a nice-to-have....
I believe @brianhw also did some unicode compatibility work. Do you want to brush this by him as well? |
I don't think I have much to add to this -- looks good. |
assert_equals(courses[1].id, 'edX/simple_with_draft/2012_Fall') | ||
assert_equals(courses[2].id, 'edX/test_import_course/2012_Fall') | ||
assert_equals(courses[3].id, 'edX/toy/2012_Fall') | ||
assert_equals(len(courses), 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the list of courses in the initdb
method contain four courses ('toy', 'simple', 'simple_with_draft', and 'test_unicode') but this test expects there to be five? Can we calculate what the number should be in the initdb
method and save it somewhere, rather than hardcoding it into this test? (Is that even a good idea?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fifth course gets imported in a separate call to import_from_xml
since its params are different.
Not sure about calculating the number: we obviously shouldn't just do len(courses)
since that's what we're trying to test. We could create a wrapper around import_from_xml
that adds to a global object; but that wouldn't work if the import_from_xml
call is supposed to fail...
Aside from my one comment about hardcoding the test value, this looks good to me. |
|
||
def get_course_by_id(self, name): | ||
""" Utility function that returns the first course with id `name`, or | ||
None if there are none. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the docstring formatting guidelines here: https://edx-wiki.atlassian.net/wiki/display/ENG/Python+Guidelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our codebase I most commonly see the first line start at the same line as the triple-quotes, but the link suggests otherwise. Does it matter? If so, which way do I go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do it the way the wiki page shows. We need to be more consistent, it's true.
+1 from me |
Fix import errors with unicode filenames
Fix MCKIN-5419 Group Work v2 Installation Issue
* origin/html-template/grade-me-button: Remove HTML template version of the GradeMe Button
…lment tracks (openedx#828)" (openedx#830) This reverts commit 45d5141. Co-authored-by: Simon Chen <schen@edX-C02FW0GUML85.local>
@nedbat @singingwolfboy - Despite Ned's helpful lesson, I've no clue whether this is the right way about things, but courses with unicode file names can't be imported even though there isn't (to the best of my knowledge) a reason why for our not supporting them.