feat: Video editor supports transcripts [FC-0076] #36058

Open
wants to merge 25 commits into base: master

Contributor

@ChrisChV ChrisChV commented Dec 25, 2024

Description

  • Add error handler on save video to avoid creating sjson
  • Support transcripts without edx_video_id in definition_to_xml
  • When copying a video from a library to a course: Create a new edx_video_id
  • Save transcripts as static assets in a video in a library when adding a new transcript.
  • Delete transcripts as static assets in a video in a library when deleting transcripts.
  • Support download transcript in a video in a library.
  • Support replace transcript in a video in a library.
  • Support updating transcripts in video in a library.
  • Refactor the code of downloading YouTube transcripts to enable this feature in libraries.
  • Support copy from a library to a course and a course to a library.
  • Which edX user roles will this change impact? "Course Author"

Supporting information

Testing instructions

Follow the testing instructions at: openedx/frontend-app-authoring#1596

Deadline

No rush

Other information

* Add error handler on save video to avoid create sjson
* Support transcripts without edx_video_id in definition_to_xml
@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Dec 25, 2024

openedx-webhooks commented Dec 25, 2024

Thanks for the pull request, @ChrisChV!

This repository is currently maintained by @openedx/wg-maintenance-edx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@ChrisChV ChrisChV marked this pull request as draft December 25, 2024 21:16
@ChrisChV ChrisChV changed the title feat: Video editor supports transcripts feat: Video editor supports transcripts [FC-0076] Dec 25, 2024
@mphilbrick211 mphilbrick211 added the FC Relates to an Axim Funded Contribution project label Dec 27, 2024
Contributor

@pomegranited pomegranited left a comment

Hi @ChrisChV, this is working well for the most part. Good job dealing with the old transcript code!

But I found a bug with the upstream/downstream syncing, and left a few nits/change requests too.

cms/djangoapps/contentstore/helpers.py (Outdated)
@@ -81,13 +84,17 @@ def link_video_to_component(video_component, user):
    edx_video_id = clean_video_id(video_component.edx_video_id)
    if not edx_video_id:
        edx_video_id = create_external_video(display_name='external video')

    if isinstance(video_component.usage_key, UsageKeyV2):
        return edx_video_id
Contributor

I don't understand why we're returning early here. Could you add a comment to clarify?

Contributor Author

Updated e4f7c72

Contributor

Makes sense. But I wonder, should we still be calling create_external_video and returning an edx_video_id at all, if it's not going to be saved into the video block? Doesn't that create some stranded video data in VAL?

cms/djangoapps/contentstore/views/transcripts_ajax.py (Outdated)
openedx/core/djangoapps/content_libraries/api.py (Outdated)
Comment on lines 511 to 512
except AttributeError:
    pass
Contributor

Why does this error need to be caught now? Seems a little dangerous.
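
To illustrate the concern, here is a minimal, self-contained sketch of how a silent `except AttributeError: pass` can hide an unrelated bug. The `Block` class and both helper functions are hypothetical stand-ins and are not taken from this PR; the code at lines 511 to 512 may look quite different.

```python
class Block:
    """Hypothetical stand-in for an object that may lack an optional attribute."""

    def __init__(self, transcripts=None):
        # Only set the attribute when a value is supplied.
        if transcripts is not None:
            self.transcripts = transcripts


def count_transcripts_broad(block):
    """Swallows every AttributeError, including ones caused by real bugs."""
    try:
        # Deliberate typo: dict has no `.item()` method, so this raises
        # AttributeError, and the handler below silently hides it.
        return len(list(block.transcripts.item()))
    except AttributeError:
        pass
    return 0


def count_transcripts_narrow(block):
    """Handles only the expected case: the attribute is missing."""
    transcripts = getattr(block, "transcripts", None)
    if transcripts is None:
        return 0
    return len(list(transcripts.items()))
```

In the broad version the typo goes unnoticed and the function reports zero transcripts; the narrow version only tolerates the missing attribute, so any other mistake surfaces as an exception.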

Contributor Author

Updated 5685f16

xmodule/video_block/video_handlers.py (Outdated)
xmodule/video_block/video_handlers.py (Outdated)
@@ -10,6 +10,7 @@
import re
Contributor

I'm seeing a bug when I sync a LibraryBlock video with transcripts from an upstream video.

Steps to reproduce:

  1. Create a library video with transcripts (here, I imported them from the example YouTube video).
  2. Publish the library video.
  3. Copy it to the clipboard.
  4. Paste into a course.
    Note that the transcripts are displaying fine here.
  5. Re-edit the library video, and replace a transcript. (Here, I replaced the English one; I don't know whether replacing others causes the same issue.)
  6. Return to the course LibraryBlock, and refresh to see the "updates available" button. Click it.
    Note that the upstream video preview shows its transcripts fine, but the downstream (course) video preview doesn't show its transcripts anymore.
  7. Accept changes.
    Note that the course video no longer shows its transcripts, but if you edit it, you can see they're still there.
Syncing.upstream.video.breaks.transcripts.mp4

Contributor Author

I think this is related to openedx/modular-learning#246

Contributor

@ChrisChV That could very well be. However, I don't think it's resolved by @DanielVZ96's #36173, though it's also possible that I didn't resolve the merge conflicts accurately. Cf. my merged branch.

Contributor Author

@pomegranited To be safe, I will wait until #36173 is ready to fix this bug.

Contributor

No worries @ChrisChV, thank you for keeping an eye on this issue.

Contributor

Now that that other PR is merged, any update on this?

Contributor

@pomegranited pomegranited left a comment

👍 Thank you for making those changes @ChrisChV ! Code looks and works great.

  • I tested this using the testing instructions from feat: Enable transcripts for video library [FC-0076] frontend-app-authoring#1596.
    I also tested "duplicating" video blocks with transcripts in courses, and they worked too.
  • I read through the code
  • I checked for accessibility issues by using my keyboard to navigate
  • Includes documentation -- good code comments
  • User-facing strings are extracted for translation N/A

Contributor

@DanielVZ96 DanielVZ96 left a comment

👍

  • I tested this
  • I read through the code
  • I checked for accessibility issues

@@ -299,13 +302,21 @@ def import_staged_content_from_user_clipboard(parent_key: UsageKey, request) ->
        tags=user_clipboard.content.tags,
    )

    usage_key = new_xblock.scope_ids.usage_id
    if usage_key.block_type == 'video':
Contributor

This overlaps a bit with the changes in #36173. Since there are some refactors to these functions here, I'll wait for this PR to merge and then update #36173 with your changes.

Contributor

Since that PR merged first, does this one need to be updated?

Contributor

@bradenmacdonald bradenmacdonald left a comment

This is a big PR so I haven't finished reviewing yet, but here are a couple of questions so far.

@@ -299,13 +302,21 @@ def import_staged_content_from_user_clipboard(parent_key: UsageKey, request) ->
        tags=user_clipboard.content.tags,
    )

    usage_key = new_xblock.scope_ids.usage_id
Contributor

Suggested change
- usage_key = new_xblock.scope_ids.usage_id
+ usage_key = new_xblock.usage_key

If you want, there is now a simpler way to get this :)

Comment on lines +633 to +650
    if usage_key.block_type == 'video':
        # Adding transcripts to VAL using the new edx_video_id
        language_code = next((k for k, v in block.transcripts.items() if v == filename), None)
        if language_code:
            sjson_subs = Transcript.convert(
                content=data,
                input_format=Transcript.SRT,
                output_format=Transcript.SJSON
            ).encode()
            create_or_update_video_transcript(
                video_id=block.edx_video_id,
                language_code=language_code,
                metadata={
                    'file_format': Transcript.SJSON,
                    'language_code': language_code
                },
                file_data=ContentFile(sjson_subs),
            )
Contributor

This new code does not seem to match the docstring of the function: "Import a single staged static asset file into the course, unless it already exists." It also doesn't use staged_content_id nor file_data_obj.

I think the code is fine but it should be moved out of _import_file_into_course and into a new helper function like _import_transcripts
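
As a rough sketch of what such a split could look like, the snippet below isolates the language lookup from the diff into its own function and wraps the save step in a small helper. The names `find_language_for_filename`, `import_transcript`, and the `save_transcript` callback are hypothetical; the callback stands in for the SRT-to-SJSON conversion and the `create_or_update_video_transcript` call, which are not reproduced here.

```python
from typing import Callable, Mapping, Optional


def find_language_for_filename(transcripts: Mapping[str, str], filename: str) -> Optional[str]:
    """Return the language code whose transcript file is `filename`, if any."""
    # Same lookup as in the diff: transcripts maps language code -> filename.
    return next((lang for lang, name in transcripts.items() if name == filename), None)


def import_transcript(
    transcripts: Mapping[str, str],
    filename: str,
    data: bytes,
    save_transcript: Callable[[str, bytes], None],
) -> bool:
    """
    Hypothetical helper: store `data` as a transcript if `filename` is listed
    in `transcripts`. Returns True if a transcript was saved.
    """
    language_code = find_language_for_filename(transcripts, filename)
    if language_code is None:
        # Not a transcript file for this block; nothing to do.
        return False
    save_transcript(language_code, data)
    return True
```

Keeping the lookup separate would also let `_import_file_into_course` stay focused on what its docstring describes.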

Contributor

@bradenmacdonald bradenmacdonald left a comment

@ChrisChV Nice work on a very complex and ugly part of the code 👏🏻. I have a few small changes to request but I think this is just about good to go.

        output_format=Transcript.SRT
    ).encode()

    filename = f"static/{edx_video_id}-{language_code}.srt"
Contributor

Do we need to put the edx_video_id in the filename? Because transcript-{language_code}.srt would be a much nicer name.
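
To make the two naming options concrete, here is a small sketch. The function `transcript_asset_name` is hypothetical, and keeping the `static/` prefix for the shorter name is an assumption, since the comment only gives the file name itself.

```python
from typing import Optional


def transcript_asset_name(language_code: str, edx_video_id: Optional[str] = None) -> str:
    """Build a static asset path for an SRT transcript.

    With `edx_video_id`, this mirrors the f-string in the diff; without it,
    it produces the shorter name suggested in the review comment.
    """
    prefix = edx_video_id if edx_video_id else "transcript"
    return f"static/{prefix}-{language_code}.srt"
```

The shorter form only stays unique if each video block keeps its assets in its own namespace, which is the open question here.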

Comment on lines +719 to +724
lib_api.require_permission_for_library_key(
    context_key,
    request.user,
    lib_api.lib_permissions.CAN_EDIT_THIS_CONTENT_LIBRARY
)
return xblock_api.load_block(usage_key, request.user)
Contributor

There is already a public API for this:

Suggested change
- lib_api.require_permission_for_library_key(
-     context_key,
-     request.user,
-     lib_api.lib_permissions.CAN_EDIT_THIS_CONTENT_LIBRARY
- )
- return xblock_api.load_block(usage_key, request.user)
+ return xblock_api.load_block(usage_key, request.user, check_permission=CheckPerm.CAN_EDIT)

Comment on lines +1955 to +1957

# Allow content library permissions to be used in the public API
lib_permissions = permissions
Contributor

See comment above - we shouldn't expose the permissions as part of the API because they're considered internal to the content libraries feature. But you can use the xblock_api.load_block(..., check_permission=...) API to require permissions when you load the block.

@@ -741,6 +741,48 @@ def test_export_to_xml(self, mock_val_api):
course_id=self.block.scope_ids.usage_id.context_key,
)

def test_export_to_xml_without_video_id(self):
"""
Test that we write the correct XML without video_id on export.
Contributor

It's a bit unclear from this wording - is this testing "that we write the correct XML for a block that doesn't have a video_id" or is this testing that "the correct XML should not include video_id when exported" ?

delete_video_transcript(video_id=edx_video_id, language_code=language)
def _studio_transcript_upload(self, request):
"""
Upload transcript. Usedn in "POST" method in `studio_transcript`
Contributor

Suggested change
- Upload transcript. Usedn in "POST" method in `studio_transcript`
+ Upload transcript. Used in "POST" method in `studio_transcript`

remove_subs_from_store(self.transcripts.pop(language, None), self, language)
def _studio_transcript_delete(self, request):
"""
Delete transcript. Usedn in "DELETE" method in `studio_transcript`
Contributor

Suggested change
- Delete transcript. Usedn in "DELETE" method in `studio_transcript`
+ Delete transcript. Used in "DELETE" method in `studio_transcript`


def _studio_transcript_get(self, request):
"""
Get transcript. Usedn in "GET" method in `studio_transcript`
Contributor

Suggested change
- Get transcript. Usedn in "GET" method in `studio_transcript`
+ Get transcript. Used in "GET" method in `studio_transcript`

Labels
FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U
Projects
Status: Waiting on Author
6 participants