Efficient iteration and replacement of nodes #195
Comments
Got some progress with a custom `indexed_ifilter`:

```python
from itertools import chain

import mwparserfromhell
# Assumed source of FLAGS: the module-level default used by the filter methods.
from mwparserfromhell.wikicode import FLAGS


def indexed_ifilter(wikicode, recursive=True, matches=None, flags=FLAGS,
                    forcetype=None):
    """Iterate over nodes and their corresponding indices and parents.

    The arguments are interpreted as for :meth:`ifilter`. For each tuple
    ``(parent, i, node)`` yielded by this method, ``parent`` is the direct
    parent wikicode of ``node`` and ``parent.index(node) == i``.
    """
    match = wikicode._build_matcher(matches, flags)
    if recursive:
        restrict = forcetype if recursive == wikicode.RECURSE_OTHERS else None

        def getter(node):
            for parent, ch in wikicode._get_children(node, restrict=restrict, contexts=True, parent=wikicode):
                # The index is looked up within the direct parent only.
                i = parent.index(ch)
                yield (parent, i, ch)
        inodes = chain(*(getter(n) for n in wikicode.nodes))
    else:
        inodes = ((wikicode, i, node) for i, node in enumerate(wikicode.nodes))
    for parent, i, node in inodes:
        if (not forcetype or isinstance(node, forcetype)) and match(node):
            yield (parent, i, node)


def expand_2(wikicode):
    for parent, i, template in indexed_ifilter(wikicode, forcetype=mwparserfromhell.nodes.template.Template, recursive=wikicode.RECURSE_OTHERS):
        if template.has(1):
            replacement = template.get(1).value
            # Expand nested templates inside the replacement first.
            expand_2(replacement)
        else:
            replacement = ""
        assert parent.get(i) is template, (parent.nodes[i], i, template)
        parent.nodes.pop(i)
        parent.insert(i, replacement)
```

Which performs like this:
And second attempt:

```python
def parented_ifilter(wikicode, recursive=True, matches=None, flags=FLAGS,
                     forcetype=None):
    """Iterate over nodes and their corresponding parents.

    The arguments are interpreted as for :meth:`ifilter`. For each tuple
    ``(parent, node)`` yielded by this method, ``parent`` is the direct
    parent wikicode of ``node``.
    """
    match = wikicode._build_matcher(matches, flags)
    if recursive:
        restrict = forcetype if recursive == wikicode.RECURSE_OTHERS else None

        def getter(node):
            for parent, ch in wikicode._get_children(node, restrict=restrict, contexts=True, parent=wikicode):
                yield (parent, ch)
        inodes = chain(*(getter(n) for n in wikicode.nodes))
    else:
        inodes = ((wikicode, node) for node in wikicode.nodes)
    for parent, node in inodes:
        if (not forcetype or isinstance(node, forcetype)) and match(node):
            yield (parent, node)


def expand_3(wikicode):
    for parent, template in parented_ifilter(wikicode, forcetype=mwparserfromhell.nodes.template.Template, recursive=wikicode.RECURSE_OTHERS):
        if template.has(1):
            replacement = template.get(1).value
            expand_3(replacement)
        else:
            replacement = ""
        parent.replace(template, replacement, recursive=False)
```

Which is more than 10 times faster than the original: 😉
I put the whole benchmark here for future reference.
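To illustrate why replacing through the direct parent is so much cheaper than a search from the top level, here is a small toy model. It does not use mwparserfromhell at all: `Node`, `replace_from_root` and `replace_in_parent` are hypothetical stand-ins made up for this sketch, and the counters only count how many nodes each strategy inspects.

```python
class Node:
    """Hypothetical stand-in for a wikicode container (not the real API)."""

    def __init__(self, children=None):
        self.children = list(children or [])


def replace_from_root(root, target, new, counter):
    """Depth-first search from the root; counter[0] counts inspected nodes."""
    for i, child in enumerate(root.children):
        counter[0] += 1
        if child is target:
            root.children[i] = new
            return True
        if replace_from_root(child, target, new, counter):
            return True
    return False


def replace_in_parent(parent, target, new, counter):
    """Scan only the direct parent's children; no recursion."""
    for i, child in enumerate(parent.children):
        counter[0] += 1
        if child is target:
            parent.children[i] = new
            return True
    return False


def build():
    """100 containers with 10 leaves each."""
    return Node(Node(Node() for _ in range(10)) for _ in range(100))


# Replace the very last leaf, once per strategy, on identical trees.
root_a = build()
target_a = root_a.children[-1].children[-1]
count_root = [0]
replace_from_root(root_a, target_a, Node(), count_root)

root_b = build()
parent_b = root_b.children[-1]
target_b = parent_b.children[-1]
count_parent = [0]
replace_in_parent(parent_b, target_b, Node(), count_parent)

print(count_root[0], count_parent[0])
```

In this toy tree the search from the root inspects every node before reaching the last leaf, while the parent-local scan only touches the ten siblings. The real numbers on a wiki page depend on its structure, so this is only meant to show the shape of the difference.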
Consider the following problem: we want to replace all templates in a wikicode with their first argument. (This is basically a very simplified template expansion problem (#100) which I'm working on.)
The obvious way to solve this is this code:
However, it is very inefficient if you run it on a moderately complex real-world wiki page like Wireless network configuration on the Arch wiki (output from line_profiler):
The problem is that the position of the template in the wikicode is known to the `ifilter_templates` iterator, but it cannot be reused by the `replace` method, so it has to iterate from the start...

There is a `Wikicode._indexed_ifilter` method, but it doesn't immediately help with this problem, because it returns only the top-level index and not a multiindex like `(i, j, k)` where the returned node is the k-th child of the j-th child of the i-th child of the top-level wikicode.

Can you think of an efficient solution? I don't mind doing the iteration manually, but of course it would be nice to provide a nice interface for it from mwparserfromhell itself at some point.
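To make the multiindex idea concrete, here is a minimal sketch on plain nested lists rather than real wikicode. The names `iter_with_path` and `replace_at` are hypothetical and not part of mwparserfromhell; the nested list merely stands in for a wikicode tree.

```python
from typing import Iterator, List, Tuple, Union

# Hypothetical stand-in for nested wikicode: leaves are strings,
# containers are lists.
Tree = List[Union[str, "Tree"]]
Path = Tuple[int, ...]


def iter_with_path(tree: Tree, prefix: Path = ()) -> Iterator[Tuple[Path, str]]:
    """Yield (path, leaf) pairs, where path is the multiindex of the leaf."""
    for i, child in enumerate(tree):
        path = prefix + (i,)
        if isinstance(child, list):
            yield from iter_with_path(child, path)
        else:
            yield path, child


def replace_at(tree: Tree, path: Path, new: str) -> None:
    """Replace the node addressed by path without searching the tree."""
    parent = tree
    for i in path[:-1]:
        parent = parent[i]
    parent[path[-1]] = new


tree = ["a", ["b", ["c", "d"]], "e"]
paths = {leaf: path for path, leaf in iter_with_path(tree)}
replace_at(tree, paths["d"], "D")
print(paths["d"], tree)
```

Note that the recorded paths stay valid only while replacements are one-for-one. As soon as a node is replaced by several nodes (or by none), the indices of its later siblings shift, which is one reason tracking the direct parent instead of a full index path may be the more robust choice.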