Support for expanding templates. #100

Open
eggpi opened this issue Apr 26, 2015 · 6 comments

@eggpi

eggpi commented Apr 26, 2015

It would be nice if the application could optionally specify, say, a directory containing templates (one per file), and have parse expand those recursively in the wikicode as needed, in addition to parser functions and other magic words, before returning the final AST.

I realize this would be quite a lot of work, but it would go a long way towards MediaWiki compatibility. This feature already exists in the smc.mw parser, from which I stole the template directory idea. Thoughts?

@earwig
Owner

earwig commented Apr 27, 2015

I don't know how I feel about this. You can do something similar yourself by iterating over the templates after parse has been called and using code.replace if the template name matches some particular string. The advantage to template expansions being available during parse time is that the parser can correctly handle the issues we see in #40, but it would also make rebuilding the original wikitext more difficult (or impossible?) depending on what kind of modifications you make.

I'm not too sure I like the idea of asking people to put a bunch of wikicode files in a directory, since template expansions are somewhat dynamic, and how are you going to decide which templates to put in the directory? I think a better idea is passing a callback function to parse that takes a template name and returns either its wikitext contents or None if the template is not to be expanded. That way, you can easily implement your directory lookup feature, or someone else could hook the parser up to the API without the parser itself needing to know anything about how MediaWiki's API works.
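Here is a stdlib-only sketch of the callback idea. It works on raw wikitext rather than on the parser's node tree. `expand_templates`, `get_wikitext` and `directory_lookup` are hypothetical names, not part of mwparserfromhell. The sketch does three things:

- It repeatedly expands innermost `{{...}}` calls.
- It fills `{{{arg|default}}}` references.
- It leaves a call alone when the callback returns `None`.

It deliberately ignores many MediaWiki rules: name normalization, `|` inside already-expanded arguments, nested argument defaults, and parser functions. It also blanks missing arguments, whereas MediaWiki would keep them as literal `{{{1}}}`.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Innermost template call: "{{" ... "}}" with no braces in between.
CALL_RE = re.compile(r"\{\{([^{}]*)\}\}")
# Argument reference inside a template body: {{{name}}} or {{{name|default}}}.
ARG_RE = re.compile(r"\{\{\{([^{}|]*)(?:\|([^{}]*))?\}\}\}")

# Callback: template name -> wikitext, or None to leave the call unexpanded.
GetWikitext = Callable[[str], Optional[str]]


def split_call(inner: str) -> Tuple[str, Dict[str, str]]:
    """Split 'Name|a|k=v' into ('Name', {'1': 'a', 'k': 'v'})."""
    name, *parts = inner.split("|")
    args: Dict[str, str] = {}
    position = 1
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:  # named argument: surrounding whitespace is trimmed
            args[key.strip()] = value.strip()
        else:  # positional argument: kept verbatim
            args[str(position)] = part
            position += 1
    return name.strip(), args


def substitute_args(body: str, args: Dict[str, str]) -> str:
    """Fill {{{name|default}}}; missing arguments become the default or ''."""
    def fill(match: re.Match) -> str:
        default = match.group(2) if match.group(2) is not None else ""
        return args.get(match.group(1).strip(), default)
    return ARG_RE.sub(fill, body)


def expand_templates(text: str, get_wikitext: GetWikitext, max_passes: int = 40) -> str:
    """Expand innermost calls pass by pass until nothing changes."""
    def expand(match: re.Match) -> str:
        name, args = split_call(match.group(1))
        body = get_wikitext(name)
        if body is None:
            return match.group(0)  # leave this call untouched
        return substitute_args(body, args)

    for _ in range(max_passes):  # cap guards against self-transcluding templates
        new_text = CALL_RE.sub(expand, text)
        if new_text == text:
            break
        text = new_text
    return text


def directory_lookup(directory: Path) -> GetWikitext:
    """Callback reading '<Name>.wiki' files from a directory (hypothetical layout)."""
    def get_wikitext(name: str) -> Optional[str]:
        path = directory / (name.replace(" ", "_") + ".wiki")
        return path.read_text(encoding="utf-8") if path.is_file() else None
    return get_wikitext
```

Any mapping works as a callback through its `get` method, for example `expand_templates(text, {"Greet": "Hello, {{{1|world}}}!"}.get)`. The directory idea becomes `expand_templates(text, directory_lookup(Path("templates")))`. An API-backed callback would plug in the same way.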

What are some concrete use cases for this feature?

@eggpi
Author

eggpi commented Apr 27, 2015

> I think a better idea is passing a callback function to parse that takes a template name and returns either its wikitext contents or None if the template is not to be expanded.

I completely agree; in fact, that's pretty much what smc.mw does under the hood. A template directory was just the most straightforward way I found of describing the feature without dwelling too much on its implementation :)

> What are some concrete use cases for this feature?

My use case is better template support in my pet project, CitationHunt. Basically, I'm parsing Wikipedia dumps looking for short snippets lacking citations. I use mwparserfromhell to help find {{Citation needed}} templates and strip the extra wikicode from them, and I'm considering extracting templates from the dumps as well and expanding some (all?) of them in the snippets so they don't look broken.

I can definitely see your point that this could be done in the application, and my use case is simple enough that it might be doable. On the other hand, robust support for this, including parser functions, looks far from trivial, so it could be helpful to have the parser optionally handle it.

@earwig
Owner

earwig commented Apr 27, 2015

Can you clarify the parser function thing - do you mean expanding parser function logic within the parser (e.g. the parser knows to convert {{#if:test|foo|bar}} into foo) or just that these are passed to the callback function in the same way as normal templates? Honestly, I think the former is beyond the scope of the parser, but the latter has limitations (it won't be able to handle things like the above example).
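To show what "the former" would involve, here is a stdlib-only sketch that evaluates `#if` on raw wikitext. `evaluate_if` is a hypothetical helper. It handles only `{{#if:test|then}}` and `{{#if:test|then|else}}`, where the condition counts as true when it is non-empty after trimming whitespace. It resolves nested calls from the inside out. Any call it cannot match, including one with extra arguments, is left untouched.

```python
import re

# {{#if:condition|then}} or {{#if:condition|then|else}} with no nested braces.
IF_RE = re.compile(r"\{\{\s*#if:([^{}|]*)\|([^{}|]*)(?:\|([^{}|]*))?\}\}")


def evaluate_if(text: str) -> str:
    """Evaluate innermost #if calls repeatedly until none are left."""
    def choose(match: re.Match) -> str:
        condition, then_branch, else_branch = match.groups()
        if condition.strip():
            return then_branch.strip()
        return (else_branch or "").strip()

    # Each substitution shortens the text, so this loop terminates.
    while True:
        new_text = IF_RE.sub(choose, text)
        if new_text == text:
            return text
        text = new_text
```

A plain name-to-wikitext callback cannot do this, because the result depends on the call's arguments rather than on the template name alone.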

Either way, the callback concept for expanding (regular) templates seems reasonable, so I'll implement that part.

@eggpi
Author

eggpi commented Apr 27, 2015

It would be great to have the former, as it would make it easy to support some very common but otherwise simple templates like IPAc-en, but I'm fine without it if you think it would be of little general value. I suspect that simple expansion plus a few heuristics for dealing with broken text (such as removing empty parentheses) could fix quite a lot of my broken snippets anyway.
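Here is a rough sketch of the "remove empty parentheses" heuristic. It assumes the snippet is already plain text after its templates were dropped or expanded. `tidy_parentheses` is a hypothetical helper, and its patterns are illustrative, not exhaustive.

```python
import re

# "( )" or "( ; )" left behind when every template inside was removed,
# together with the whitespace before it.
EMPTY_PARENS_RE = re.compile(r"\s*\(\s*[,;:]*\s*\)")
# "(; born 1990)" -> "(born 1990)": punctuation orphaned at the start.
LEADING_PUNCT_RE = re.compile(r"\(\s*[,;:]+\s*")


def tidy_parentheses(snippet: str) -> str:
    snippet = EMPTY_PARENS_RE.sub("", snippet)
    return LEADING_PUNCT_RE.sub("(", snippet)
```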

> Either way, the callback concept for expanding (regular) templates seems reasonable, so I'll implement that part.

Great, thanks! And thanks for being so responsive! 👍

@lahwaacz
Contributor

I'm working on this for my wiki-scripts project - see the ws.parser_helpers.template_expansion module and the accompanying tests. The main function, expand_templates, takes a content-getter callback function which should return the content of the requested page. Implementing the callback function for tests was trivial, now I'm working on getting the content from the custom local SQL database (see lahwaacz/wiki-scripts#35 if you're interested).

I hope that I managed to cover most of the MediaWiki peculiarities regarding transclusion (or at least added notes in the code). There is one particular issue that bugs me, though: according to the Transclusion#Transclusion markup documentation, the <onlyinclude> tags should be parsed even inside <nowiki> tags, so that e.g. <nowiki><onlyinclude>My content</onlyinclude></nowiki> produces My content upon transclusion. So the parsing of <nowiki> tags is context-sensitive, which is bizarre enough. One solution would be to ignore such cases, but I'm interested in what you think would be the best way to handle them properly.
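One way to sidestep the context-sensitivity is to resolve the transclusion tags on the raw page text before any parsing. This reproduces the documented rule that `<onlyinclude>` takes effect even inside `<nowiki>`. The sketch below assumes well-formed, attribute-free tags and does not handle unclosed ones. It also gives `<noinclude>` and `<includeonly>` the same raw-text treatment, which may not match MediaWiki inside `<nowiki>`. `transclusion_text` is a hypothetical helper, not part of mwparserfromhell or wiki-scripts.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE
ONLYINCLUDE_RE = re.compile(r"<onlyinclude>(.*?)</onlyinclude>", FLAGS)
NOINCLUDE_RE = re.compile(r"<noinclude>.*?</noinclude>", FLAGS)
INCLUDEONLY_TAG_RE = re.compile(r"</?includeonly>", re.IGNORECASE)


def transclusion_text(source: str) -> str:
    """Return the part of a page's raw wikitext that would be transcluded."""
    # Scanning raw text finds <onlyinclude> even inside <nowiki>.
    sections = ONLYINCLUDE_RE.findall(source)
    if sections:  # only the <onlyinclude> sections are transcluded
        source = "".join(sections)
    source = NOINCLUDE_RE.sub("", source)  # <noinclude> content is dropped
    # <includeonly> content is kept on transclusion; only the tags go.
    return INCLUDEONLY_TAG_RE.sub("", source)
```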

@nyurik
Contributor

nyurik commented Dec 31, 2019

I also hacked together a recursive template expander as part of my Wiktionary parsing code (lexicator creates Wikidata lexemes), but due to Lua usage and complexity, I am starting to rely on it less. Instead, I ask MediaWiki to do this work for me using action=parse (code) and artificially construct a large list of template calls (see resolvers in the readme).

For example, the "Movie" template is used on many pages, and internally it calls the "Infobox" template with some useful strings that I would like to use after all the preparation done by the Movie template/Lua code. I do not want to parse the HTML generated by Infobox, so I substitute the Infobox template with my own version using the templatesandboxtext parameter, which simply outputs the arguments as list items (lists are easy to parse). To optimize the process and reduce server load, I generate multiple calls to the Movie template inside a single action=parse API call.
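Here is a sketch of how such a batched request could be assembled with the standard library. It assumes the target wiki has the TemplateSandbox extension, which provides the `templatesandboxtitle`/`templatesandboxtext` parameters to `action=parse`. Parameter names should be checked against that wiki's api.php help. The `BATCH-ITEM-<n>` markers, the helper name, and the stand-in Infobox wikitext are all hypothetical, and nothing is sent over the network.

```python
from typing import Dict, List
from urllib.parse import urlencode


def build_batched_parse_request(
    calls: List[str], sandbox_title: str, sandbox_text: str
) -> Dict[str, str]:
    """Build action=parse parameters that render many template calls at once.

    Each call is preceded by a hypothetical "BATCH-ITEM-<n>" marker so its
    output can be located in the returned HTML.
    """
    text = "\n\n".join(f"BATCH-ITEM-{i}\n\n{call}" for i, call in enumerate(calls))
    return {
        "action": "parse",
        "format": "json",
        "formatversion": "2",
        "contentmodel": "wikitext",
        "prop": "text",
        "text": text,
        # TemplateSandbox: render `sandbox_title` using `sandbox_text` instead.
        "templatesandboxtitle": sandbox_title,
        "templatesandboxtext": sandbox_text,
    }


params = build_batched_parse_request(
    ["{{Movie|title=A}}", "{{Movie|title=B}}"],
    sandbox_title="Template:Infobox",
    # Hypothetical stand-in that lists the arguments it receives.
    sandbox_text="\n* title={{{title|}}}\n* caption={{{caption|}}}\n",
)
body = urlencode(params)  # send as a POST body to the wiki's api.php
```

POST is the safer choice here because a large batch quickly exceeds practical URL lengths.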

4 participants