(i)filter_templates chokes on templates containing HTML markup #115

Closed
ghost opened this issue Aug 4, 2015 · 2 comments


ghost commented Aug 4, 2015

I thought I'd replace some of my ol' trusty (and nasty) regexes and stumbled on this bug when attempting to parse The Only Way Is Essex (series 11).

In [15]: print(text)
{{Use dmy dates|date=April 2014}}
{{Use British English|date=April 2014}}
{{Infobox television season
| show_name        = [[The Only Way Is Essex]]
| season_name      = <span style="color: white;">The Only Way Is Essex<br />'' Series 11</span>
| bgcolour         = #58FA58
| image            =
| caption          = Series 11 title card
| country          = [[United Kingdom]]
| network          = [[ITV2]]
| first_aired      = 23 February 2014
| last_aired       = 2 April 2014
| num_episodes     = 12
| dvd_release_date = TBA
| prev_season      = [[The Only Way Is Essex (series 10)|Series 10]]
| next_season      = [[The Only Way Is Essex (series 12)|Series 12]]
}}

The '''eleventh series''' of the [[United Kingdom|British]] semi-reality television programme ''[[The Only Way Is Essex]]'' was confirmed on 30 January 2014 when it had been announced that it had renewed for a further three series, the eleventh, twelfth and thirteenth.<ref>http://www.digitalspy.co.uk/tv/s143/the-only-way-is-essex/news/a547651/the-only-way-is-essex-to-air-3-series-this-year-itv2-says.html</ref> The series premiered on 23 February 2014 with a 60-minute special, and was followed by another 11 episodes. This was also the first series not to include [[Lucy Mecklenburgh]] after her departure during the Christmas special after the [[The Only Way Is Essex (series 10)|tenth series]]. The series also saw the return of Lydia Bright.

In [16]: mwparserfromhell.parse(text).filter_templates()
Out[16]:
['{{Use dmy dates|date=April 2014}}',
 '{{Use British English|date=April 2014}}']

In [17]: print(text2)
{{Use dmy dates|date=April 2014}}
{{Use British English|date=April 2014}}
{{Infobox television season
| show_name        = [[The Only Way Is Essex]]
| season_name      =
| bgcolour         = #58FA58
| image            =
| caption          = Series 11 title card
| country          = [[United Kingdom]]
| network          = [[ITV2]]
| first_aired      = 23 February 2014
| last_aired       = 2 April 2014
| num_episodes     = 12
| dvd_release_date = TBA
| prev_season      = [[The Only Way Is Essex (series 10)|Series 10]]
| next_season      = [[The Only Way Is Essex (series 12)|Series 12]]
}}

The '''eleventh series''' of the [[United Kingdom|British]] semi-reality television programme ''[[The Only Way Is Essex]]'' was confirmed on 30 January 2014 when it had been announced that it had renewed for a further three series, the eleventh, twelfth and thirteenth.&lt;ref&gt;http://www.digitalspy.co.uk/tv/s143/the-only-way-is-essex/news/a547651/the-only-way-is-essex-to-air-3-series-this-year-itv2-says.html&lt;/ref&gt; The series premiered on 23 February 2014 with a 60-minute special, and was followed by another 11 episodes. This was also the first series not to include [[Lucy Mecklenburgh]] after her departure during the Christmas special after the [[The Only Way Is Essex (series 10)|tenth series]]. The series also saw the return of Lydia Bright.

In [18]: mwparserfromhell.parse(text2).filter_templates()
Out[18]:
['{{Use dmy dates|date=April 2014}}',
 '{{Use British English|date=April 2014}}',
 '{{Infobox television season\n| show_name        = [[The Only Way Is Essex]]\n| season_name      = \n| bgcolour         = #58FA58\n| image            =  \n| caption          = Series 11 title card\n| country          = [[United Kingdom]]\n| network          = [[ITV2]]\n| first_aired      = 23 February 2014\n| last_aired       = 2 April 2014\n| num_episodes     = 12\n| dvd_release_date = TBA\n| prev_season      = [[The Only Way Is Essex (series 10)|Series 10]]\n| next_season      = [[The Only Way Is Essex (series 12)|Series 12]]\n}}']

earwig commented Aug 4, 2015

Right here:

<br />'' Series 11</span>

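The opening italics marker ('') is never closed inside the season_name parameter, so the parser gets confused and thinks the italics extend to the next '' it finds, at the beginning of ''[[The Only Way Is Essex]]'' in the article body. That italic span runs across the end of the infobox, which breaks the template.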

Until #40 gets addressed, you can work around this by calling mwparserfromhell.parse(text, skip_style_tags=True) instead.
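
For anyone landing here later, a minimal sketch of that workaround on a trimmed-down copy of the wikitext from this report. The `SAMPLE` string and the `template_names` helper are made up for illustration and are not part of the library; the only library calls used are `mwparserfromhell.parse(..., skip_style_tags=...)` and `filter_templates()`. It assumes mwparserfromhell is installed, and how the default (non-skipping) parse behaves may differ between versions.

```python
# Sketch of the workaround; assumes the third-party mwparserfromhell
# package is installed (pip install mwparserfromhell).
try:
    import mwparserfromhell
except ImportError:  # keep the sketch importable without the package
    mwparserfromhell = None

# Hypothetical, trimmed-down version of the wikitext in this report:
# the '' inside season_name is never closed within the template.
SAMPLE = (
    "{{Infobox television season\n"
    '| season_name = <span style="color: white;">The Only Way Is Essex'
    "<br />'' Series 11</span>\n"
    "| bgcolour    = #58FA58\n"
    "}}\n"
    "\n"
    "The series of ''[[The Only Way Is Essex]]'' was confirmed.\n"
)


def template_names(text, skip_style_tags=True):
    """Return the stripped names of all templates found in `text`.

    Illustrative helper (not a library function). With
    skip_style_tags=True the parser leaves '' and ''' as plain text
    instead of trying to pair them up as italic/bold tags.
    """
    if mwparserfromhell is None:
        raise RuntimeError("mwparserfromhell is not installed")
    code = mwparserfromhell.parse(text, skip_style_tags=skip_style_tags)
    return [str(t.name).strip() for t in code.filter_templates()]


if __name__ == "__main__" and mwparserfromhell is not None:
    # Workaround: style tags skipped, so the infobox should be found.
    print(template_names(SAMPLE))
    # Default parse, for comparison; the result may vary by version.
    print(template_names(SAMPLE, skip_style_tags=False))
```

The trade-off is that italic and bold markup no longer shows up as tag nodes in the parsed tree, which is fine if you only care about templates.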

@earwig earwig closed this as completed Aug 4, 2015
@earwig earwig self-assigned this Aug 4, 2015

ghost commented Aug 4, 2015

Cheers.

@earwig earwig removed their assignment Dec 30, 2016