Extract text content from blocks #3057

Merged
thomasdax98 merged 15 commits into feature/seo-tag-generation from extract-text-content
Feb 10, 2025

@thomasdax98 (Member) commented Jan 9, 2025

Description

Add extractTextContents method to blocks

extractTextContents extracts plain text from blocks. This is particularly useful for operations such as search indexing or passing the content to LLM-based tasks.

The method is optional for now, but implementing it is recommended for all blocks and documents. By default, it returns:

  • if the state is a string: the string itself
  • otherwise: an empty array
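The default described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `TextExtractableBlock`, `defaultExtractTextContents` and `extractTextContentsWithDefault` are hypothetical names introduced here, and wrapping the string in a one-element array is an assumption made so that both branches return the same type.

```typescript
// Hypothetical minimal block shape; the real block interface has more members.
interface TextExtractableBlock<State> {
    name: string;
    // Optional, as described in this PR.
    extractTextContents?: (state: State) => string[];
}

// Assumed default: a string state yields that string, anything else yields nothing.
function defaultExtractTextContents(state: unknown): string[] {
    return typeof state === "string" ? [state] : [];
}

// Uses the block's own implementation if present, otherwise falls back to the default.
function extractTextContentsWithDefault<State>(block: TextExtractableBlock<State>, state: State): string[] {
    return block.extractTextContents?.(state) ?? defaultExtractTextContents(state);
}

// A block whose state is a plain string does not need an override.
const headlineBlock: TextExtractableBlock<string> = { name: "Headline" };

// A block with an object state opts in by implementing the method itself.
const teaserBlock: TextExtractableBlock<{ title: string; imageId: string }> = {
    name: "Teaser",
    extractTextContents: (state) => [state.title],
};

console.log(extractTextContentsWithDefault(headlineBlock, "Hello"));
console.log(extractTextContentsWithDefault(teaserBlock, { title: "News", imageId: "abc" }));
```

With a fallback like this, only blocks whose state is not a plain string and that actually contain text would need their own implementation.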

This PR implements extractTextContents for all factories and blocks in the library, and I checked all blocks in Demo. As the Demo shows, overriding the default implementation should not be necessary in most cases.

Acceptance criteria

Example data + result

Test content (blocks)
[{"name":"StandaloneHeading","visible":true,"output":{"heading":{"eyebrow":{"draftContent":{"blocks":[{"key":"9kctd","text":"Where Performance Meets Elegance","type":"unstyled","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}},"headline":{"draftContent":{"blocks":[{"key":"c9lii","text":"The Ultimate Driving Experience","type":"header-one","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}},"htmlTag":"H2"},"textAlignment":"left"},"additionalFields":{"userGroup":"All"}},{"name":"RichText","visible":true,"output":{"draftContent":{"blocks":[{"key":"a0aj2","text":"Gliding effortlessly down the road, the sleek black sedan is more than just a car—it’s a statement of innovation and luxury. Its aerodynamic design, sculpted to perfection, not only enhances its visual appeal but also improves fuel efficiency. Beneath the hood, a powerful yet whisper-quiet engine delivers a seamless blend of speed and control, making every journey an exhilarating experience.","type":"paragraph-standard","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}},{"key":"a0to6","text":"Inside, the cabin welcomes passengers with premium leather seats, ambient lighting, and a high-tech digital dashboard that puts control at the driver’s fingertips. Advanced safety features, including adaptive cruise control and lane-keeping assist, ensure that every ride is as secure as it is comfortable.","type":"paragraph-standard","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}},{"key":"5ivc8","text":"Whether navigating city streets or embracing the open highway, this car redefines what it means to drive. It’s not just about getting from one place to another—it’s about enjoying every moment of the journey.","type":"paragraph-standard","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}},"additionalFields":{"userGroup":"All"}},{"name":"FullWidthImage","visible":true,"output":{"image":{"attachedBlocks":[{"type":"pixelImage","props":{"damFileId":"bac65cc6-353d-40e5-acd6-2d7c82f41edb"}}],"activeType":"pixelImage"},"content":{"visible":false}},"additionalFields":{"userGroup":"All"}},{"name":"StandaloneCallToActionList","visible":true,"output":{"callToActionList":{"blocks":[{"key":"dc9fd671-a05e-436c-8ae9-435b1d405a84","visible":true,"props":{"textLink":{"link":{"attachedBlocks":[{"type":"internal","props":{"targetPageId":"ce1afa91-64f9-449c-b17b-ad747ed3e8f0"}}],"activeType":"internal"},"text":"My button"},"variant":"Contained"}}]},"alignment":"left"},"additionalFields":{"userGroup":"All"}},{"name":"TextImage","visible":true,"output":{"text":{"draftContent":{"blocks":[{"key":"61otq","text":"My text image text","type":"paragraph-standard","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}},"image":{"attachedBlocks":[{"type":"pixelImage","props":{"damFileId":"2d281f60-6d82-486e-ae3a-301e3e0827a1"}}],"activeType":"pixelImage"},"imagePosition":"left","imageAspectRatio":"4x3"},"additionalFields":{"userGroup":"All"}},{"name":"Layout","visible":true,"output":{"layout":"layout1","media1":{"attachedBlocks":[{"type":"image","props":{"attachedBlocks":[{"type":"pixelImage","props":{}}],"activeType":"pixelImage"}}],"activeType":"image"},"text1":{"draftContent":{"blocks":[{"key":"cnkq","text":"","type":"unstyled","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}},"media2":{"attachedBlocks":[{"type":"image","props":{"attachedBlocks":[{"type":"pixelImage","props":{}}],"activeType":"pixelImage"}}],"activeType":"image"},"text2":{"draftContent":{"blocks":[{"key":"779h3","text":"","type":"unstyled","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}}},"additionalFields":{"userGroup":"All"}},{"name":"Columns","visible":true,"output":{"layout":"9-9","columns":[{"key":"77367703-c61c-4f2b-8eaf-5c69e442ac2c","visible":true,"props":{"blocks":[{"key":"a3df6ed1-126e-4e2d-9ce4-4f60507565bc","visible":true,"type":"accordion","props":{"blocks":[{"key":"245e35fe-54db-4486-8499-a1932b78e3d8","visible":true,"props":{"title":"My accordion item","content":{"blocks":[{"key":"0edac65f-d1e2-4268-a102-d453f8b98f11","visible":true,"type":"richtext","props":{"draftContent":{"blocks":[{"key":"219a1","text":"Text in accordion","type":"paragraph-standard","depth":0,"inlineStyleRanges":[],"entityRanges":[],"data":{}}],"entityMap":{}}}}]},"openByDefault":false}}]}}]}},{"key":"12d94f34-19c6-46f3-b19c-c41c96ed10ac","visible":true,"props":{"blocks":[]}}]},"additionalFields":{"userGroup":"All"}}]
Result
[
    "<p>Where Performance Meets Elegance</p>",
    "<h1>The Ultimate Driving Experience</h1>",
    "<p class=\"paragraph-standard\">Gliding effortlessly down the road, the sleek black sedan is more than just a car—it’s a statement of innovation and luxury. Its aerodynamic design, sculpted to perfection, not only enhances its visual appeal but also improves fuel efficiency. Beneath the hood, a powerful yet whisper-quiet engine delivers a seamless blend of speed and control, making every journey an exhilarating experience.</p>\n<p class=\"paragraph-standard\">Inside, the cabin welcomes passengers with premium leather seats, ambient lighting, and a high-tech digital dashboard that puts control at the driver’s fingertips. Advanced safety features, including adaptive cruise control and lane-keeping assist, ensure that every ride is as secure as it is comfortable.</p>\n<p class=\"paragraph-standard\">Whether navigating city streets or embracing the open highway, this car redefines what it means to drive. It’s not just about getting from one place to another—it’s about enjoying every moment of the journey.</p>",
    "A digitally-rendered image of a large, white, multi-stage rocket with two side boosters, viewed from a low angle against a clear blue sky with clouds below.",
    "Impressive space rocket soaring high above the clouds",
    "My button",
    "Test 2",
    "<p class=\"paragraph-standard\">My text image text</p>",
    "An astronaut in a detailed space suit is set against a backdrop of stars and space, with the reflection of another astronaut and the Earth visible in the visor of the helmet.",
    "Astronaut in space suit with reflection of Earth in helmet visor against a starry background",
    "My accordion item",
    "<p class=\"paragraph-standard\">Text in accordion</p>"
]
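Since the rich text entries in the result above still contain HTML markup, a consumer such as a search indexer may want to reduce them to plain text first. The following is a rough sketch of that step, not part of this PR: `stripTags` and `toSearchText` are hypothetical helpers, and the regex-based tag removal is a simplification that neither decodes HTML entities nor handles malformed markup.

```typescript
// Naive tag removal: replaces every tag with a space, then collapses whitespace.
function stripTags(html: string): string {
    return html
        .replace(/<[^>]*>/g, " ")
        .replace(/\s+/g, " ")
        .trim();
}

// Turns the string array returned by extractTextContents into one text per line,
// dropping entries that are empty after the markup is removed.
function toSearchText(textContents: string[]): string {
    return textContents
        .map(stripTags)
        .filter((text) => text.length > 0)
        .join("\n");
}

const sample = [
    "<p>Where Performance Meets Elegance</p>",
    "<h1>The Ultimate Driving Experience</h1>",
    "My button",
    '<p class="paragraph-standard">My text image text</p>',
];

console.log(toSearchText(sample));
```

For LLM-based tasks it may be preferable to keep the markup instead, since tags like `<h1>` carry structural hints about the content.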

A real use case is implemented in #2614

Further information

@thomasdax98 thomasdax98 force-pushed the extract-text-content branch 2 times, most recently from 51d5035 to 4723b9d January 10, 2025 11:25
Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@thomasdax98 thomasdax98 changed the title from "Draft: Extract text content from blocks" to "Extract text content from blocks" Feb 3, 2025
@thomasdax98 thomasdax98 marked this pull request as ready for review February 3, 2025 09:06
@auto-assign auto-assign bot requested a review from johnnyomair February 3, 2025 09:06
return [];
}

return block?.extractTextContents?.(blockState.props) ?? [];
Collaborator

Shouldn't this extract the content for all child blocks?

Member Author

No, I don't think so because the non-active blocks are basically invisible and shouldn't be considered as the content (same as invisible OptionalBlock).

This could actually be a problem for the translation use case, where you'd want to include invisible content as well. Maybe we need a setting for that in extractTextContents.

Collaborator

Yes, maybe we need to add a flag indicating if the extracted text is currently visible.
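One way the flag suggested here could look is sketched below. Nothing in it is part of this PR: the `ExtractedText` shape, `extractOneOfTextContents` and the `label` props are hypothetical, and only the `attachedBlocks`/`activeType` state shape is taken from the test content in the description.

```typescript
// Hypothetical result shape: each extracted text carries a visibility flag.
interface ExtractedText {
    text: string;
    visible: boolean;
}

// State shape as seen in the test content (attachedBlocks + activeType).
interface OneOfBlockState {
    attachedBlocks: Array<{ type: string; props: unknown }>;
    activeType?: string;
}

type TextExtractor = (props: unknown) => string[];

// Extracts text from every attached block and marks only the active one as visible.
function extractOneOfTextContents(
    state: OneOfBlockState,
    extractors: Record<string, TextExtractor | undefined>,
): ExtractedText[] {
    const result: ExtractedText[] = [];
    for (const { type, props } of state.attachedBlocks) {
        const texts = extractors[type]?.(props) ?? [];
        for (const text of texts) {
            result.push({ text, visible: type === state.activeType });
        }
    }
    return result;
}

const linkState: OneOfBlockState = {
    attachedBlocks: [
        { type: "internal", props: { label: "Read more" } },
        { type: "external", props: { label: "Visit partner site" } },
    ],
    activeType: "internal",
};

// Illustrative extractor; real child blocks would bring their own implementation.
const readLabel: TextExtractor = (props) => [(props as { label: string }).label];
const extracted = extractOneOfTextContents(linkState, { internal: readLabel, external: readLabel });

// A search index could keep only visible entries; a translation job could take all of them.
const visibleTexts = extracted.filter((entry) => entry.visible).map((entry) => entry.text);
console.log(visibleTexts, extracted.length);
```

Returning the flag alongside the text leaves the decision to the caller, at the cost of changing the return type from a plain string array.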

return [];
}

return decoratedBlock.extractTextContents?.(state.block) ?? [];
Collaborator

The block's visibility isn't considered here... is this correct?

Member Author

True, added in e2146b8

-> maybe we need an option to include invisible content for translation
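The visibility check and the option floated here could be combined roughly like this. It is only a sketch under assumptions: `includeInvisibleContent` is a hypothetical option name that does not exist in this PR, and `OptionalBlockState` is a simplified stand-in for the real state type, modelled on the `state.block` access in the snippet above.

```typescript
// Simplified state: a visibility flag plus the wrapped block's state.
interface OptionalBlockState<ChildState> {
    visible: boolean;
    block?: ChildState;
}

// Hypothetical option, not part of the PR.
interface ExtractTextOptions {
    includeInvisibleContent?: boolean;
}

function extractOptionalTextContents<ChildState>(
    state: OptionalBlockState<ChildState>,
    extractChild: ((childState: ChildState) => string[]) | undefined,
    options: ExtractTextOptions = {},
): string[] {
    // Nothing to extract if the wrapped block has no state yet.
    if (state.block === undefined) return [];
    // Invisible content is skipped unless the caller explicitly asks for it.
    if (!state.visible && !options.includeInvisibleContent) return [];
    return extractChild?.(state.block) ?? [];
}

const hiddenTeaser: OptionalBlockState<string> = { visible: false, block: "Hidden teaser" };
const asList = (text: string): string[] => [text];

// Search indexing: invisible text is skipped.
console.log(extractOptionalTextContents(hiddenTeaser, asList));
// Translation: invisible text is included on request.
console.log(extractOptionalTextContents(hiddenTeaser, asList, { includeInvisibleContent: true }));
```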

@thomasdax98 thomasdax98 changed the base branch from main to feature/seo-tag-generation February 10, 2025 09:36
Collaborator

@johnnyomair johnnyomair left a comment

We'll probably need an API for text visibility. But since this targets a feature branch, we can merge this PR and add it in a follow-up PR.

@thomasdax98
Member Author

> We'll probably need an API for text visibility. But since this targets a feature branch, we can merge this PR and add it in a follow-up PR.

Follow-up Task: https://vivid-planet.atlassian.net/browse/COM-1620

@thomasdax98 thomasdax98 merged commit 7f72e82 into feature/seo-tag-generation Feb 10, 2025
11 checks passed
@thomasdax98 thomasdax98 deleted the extract-text-content branch February 10, 2025 15:43