Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix import errors with unicode filenames #830

Merged
merged 8 commits into from
Sep 18, 2013
Merged

Conversation

jkarni
Copy link

@jkarni jkarni commented Aug 29, 2013

@nedbat @singingwolfboy - Despite Ned's helpful lesson, I've no clue whether this is the right way about things, but courses with unicode file names can't be imported even though there isn't (to the best of my knowledge) a reason why for our not supporting them.

@jkarni
Copy link
Author

jkarni commented Aug 29, 2013

@chrisndodge This seems like it's the same issue you were looking into [STUD-680]. With these changes I can import courses with unicode filenames, but I'm worried something else might go wrong.

@@ -31,7 +31,7 @@ def import_static_content(modules, course_loc, course_data_path, static_content_
try:
content_path = os.path.join(dirname, filename)
if verbose:
log.debug('importing static content {0}...'.format(content_path))
log.debug('importing static content {0}...'.format(content_path.encode("utf-8")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better as: log.debug('importing static content %s...', content_path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Isn't using str.format() preferable to using the %s old-style string formatting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best practice for log.debug and its ilk is to defer the formatting to the logging module itself. It accepts %-style formatting. In this case, it would also solve the unicode issues without an explicit encode on our part.

@chrisndodge
Copy link
Contributor

Please add some test files with unicode characters to common/test/data test courses. In particular for courses that do an imprt/export/reimport paths. Thx.

I think your suspicion is likely correct in that this is probably a partial fix. Ultimately I think we need to make Location be urlencoded rather than doing the underscore hack.

@jkarni
Copy link
Author

jkarni commented Aug 30, 2013

Yeah, I think this might be enough to get the import to stop falling, but
likely not enough for the assets and pages to be accessible.
On Aug 30, 2013 3:04 PM, "chrisndodge" notifications@github.com wrote:

Please add some test files with unicode characters to common/test/data
test courses. In particular for courses that do an imprt/export/reimport
paths. Thx.

I think your suspicion is likely correct in that this is probably a
partial fix. Ultimately I think we need to make Location be urlencoded
rather than doing the underscore hack.


Reply to this email directly or view it on GitHubhttps://github.com/edx/edx-platform/pull/830#issuecomment-23582712
.

@jkarni
Copy link
Author

jkarni commented Sep 3, 2013

Wrote some tests cases for various potential unicode issues. I'm finding a number of other similar or related problems ; I guess we shouldn't merge a partial fix if it might mean unknown behavior later, so this probably shouldn't be merged.

@jkarni
Copy link
Author

jkarni commented Sep 4, 2013

@nedbat @chrisndodge Added some tests that check that courses with unicode filenames still get imported, even if some files don't. And if we decide to surface the errors/exceptions (with or without backing off of import), the exception throwing and catching stuff now works.

@chrisndodge
Copy link
Contributor

Hey, thanks for the sample course content & tests. Hate to be a nit-pick, but it seems like you just copied the toy course and added just a bit of stuff, but most of that course has nothing to do with unicode stuff. Can I suggest you prune away some of the non-relevant content in the "unicode" test course?

@jkarni
Copy link
Author

jkarni commented Sep 4, 2013

Sure. Only thing I kind of wanted was for there to be non-unicode filenames
loaded and tested for after exceptions were raised for unicode filename. I
can try to figure out the load order so I can prune the course and still
get that.
On Sep 4, 2013 1:52 PM, "chrisndodge" notifications@github.com wrote:

Hey, thanks for the sample course content & tests. Hate to be a nit-pick,
but it seems like you just copied the toy course and added just a bit of
stuff, but most of that course has nothing to do with unicode stuff. Can I
suggest you prune away some of the non-relevant content in the "unicode"
test course?


Reply to this email directly or view it on GitHubhttps://github.com/edx/edx-platform/pull/830#issuecomment-23809740
.

@jkarni
Copy link
Author

jkarni commented Sep 4, 2013

Build failing since diff-cover/diff-quality don't currently support unicode filenames. Opened a PR for diff-cover, and Will is reviewing/updating Jenkins, but merge will have to wait for that.

@@ -3,6 +3,7 @@
import mimetypes
from path import path

from nose.tools import set_trace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import is no longer necessary

@jkarni
Copy link
Author

jkarni commented Sep 10, 2013

@chrisndodge @nedbat Made the fixes and cleaned up the test course - let me know if there are other issues or if this is good to merge

@chrisndodge
Copy link
Contributor

@jkarni just noticing that the build failed... :-(

@cahrens
Copy link

cahrens commented Sep 11, 2013

@chrisndodge It only failed because of the LTI known issue (with the version of oathlib). I'll kick off a manual run.

"""
return courses[2].tabs[index]['name']
return courses[4].tabs[index]['name']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of indexing is always going to break as we add more test courses. I wonder if we should just add a helpful method such as find_test_course_by_name(courses, 'toy').

This is a nice-to-have....

@chrisndodge
Copy link
Contributor

I believe @brianhw also did some unicode compatibility work. Do you want to brush this by him as well?

@brianhw
Copy link
Contributor

brianhw commented Sep 12, 2013

I don't think I have much to add to this -- looks good.

assert_equals(courses[1].id, 'edX/simple_with_draft/2012_Fall')
assert_equals(courses[2].id, 'edX/test_import_course/2012_Fall')
assert_equals(courses[3].id, 'edX/toy/2012_Fall')
assert_equals(len(courses), 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the list of courses in the initdb method contain four courses ('toy', 'simple', 'simple_with_draft', and 'test_unicode') but this test expects there to be five? Can we calculate what the number should be in the initdb method and save it somewhere, rather than hardcoding it into this test? (Is that even a good idea?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fifth course gets imported in a separate call to import_from_xml since its params are different.

Not sure about calculating the number: we obviously shouldn't just do len(courses) since that's what we're trying to test. We could create a wrapper around import_from_xml that adds to a global object; but that wouldn't work if the import_from_xml call is supposed to fail...

@singingwolfboy
Copy link
Contributor

Aside from my one comment about hardcoding the test value, this looks good to me.


def get_course_by_id(self, name):
""" Utility function that returns the first course with id `name`, or
None if there are none.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the docstring formatting guidelines here: https://edx-wiki.atlassian.net/wiki/display/ENG/Python+Guidelines

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our codebase I most commonly see the first line start at the same line as the triple-quotes, but the link suggests otherwise. Does it matter? If so, which way do I go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do it the way the wiki page shows. We need to be more consistent, it's true.

@chrisndodge
Copy link
Contributor

+1 from me

jkarni pushed a commit that referenced this pull request Sep 18, 2013
Fix import errors with unicode filenames
@jkarni jkarni merged commit 0e2d833 into master Sep 18, 2013
@jkarni jkarni deleted the jkarni/fix/unicode-import branch September 18, 2013 14:23
pomegranited pushed a commit to open-craft/edx-platform that referenced this pull request Aug 7, 2017
Fix MCKIN-5419 Group Work v2 Installation Issue
andrey-canon pushed a commit to eduNEXT/edx-platform that referenced this pull request Jul 6, 2018
kluo pushed a commit to kluo/edx-platform that referenced this pull request Oct 20, 2018
* origin/html-template/grade-me-button:
  Remove HTML template version of the GradeMe Button
Sujeet1379 pushed a commit to chandrudev/edx-platform that referenced this pull request Nov 17, 2022
…lment tracks (openedx#828)" (openedx#830)

This reverts commit 45d5141.

Co-authored-by: Simon Chen <schen@edX-C02FW0GUML85.local>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants