Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] master from mikf:master #109

Merged
merged 5 commits into from
Jan 3, 2025
Merged

[pull] master from mikf:master #109

merged 5 commits into from
Jan 3, 2025

Conversation

pull[bot]
Copy link

@pull pull bot commented Jan 3, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

Dominik and others added 5 commits January 3, 2025 08:18
The structure of content is like this:

```
<div class="post-content" data-role="post_content-text">
                <div class="trix-content">
                    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
                    <html>
                        <body>
                            <div>
                                Unspeakable thing are written here<br />
                                <br />
                                haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
                                &lt;3
                            </div>
                        </body>
                    </html>
                </div>
            </div>
            <div class="post-uploads
```

Currently we extract content with:

```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```

I propose we just take the body parts:

```
extr('<body>', '</body>')
```

which only happen when surrounding actual content.

It is then easier to use it in the filename content with the `!H`
formatter: `content[:160]!H}`. Otherwise the content currently extracted
can't be decoded with it.
@pull pull bot added the ⤵️ pull label Jan 3, 2025
@pull pull bot merged commit 1dcb40b into bradenhilton:master Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants