-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support symbol '-' in standard_ref() #344
Comments
I tried to make a solution for the gerenal case and came up with the patch below. But it turned out to be more difficult than I expected :). I downloaded this page and used the following grep -oP '<a href="https://timsong-cpp.github.io/cppwp/std23/[^#]+#\K[^"]+' 14882_\ Index.html > sections.txt As you can see, there are many interesting examples that we don't support yet, especially the ones that end with
should be displayed as I don't think we can solve that with a single regex (but maybe we can postprocess the match). But there are only 60 click here to see the patchdiff --git a/quiz/templatetags/quiz_extras.py b/quiz/templatetags/quiz_extras.py
index 4b637ab..0295d9f 100644
--- a/quiz/templatetags/quiz_extras.py
+++ b/quiz/templatetags/quiz_extras.py
@@ -18,21 +18,36 @@ def format_reference(match):
full_link = "https://timsong-cpp.github.io/cppwp/n4659/" + section_name
if paragraph_number:
full_link = full_link + "#" + paragraph_number
- return "<em><a href=\"" + full_link + "\">" + full_reference + "</a></em>"
+ return f"*[{full_reference}]({full_link})*"
def standard_ref(text):
- section_name = u'(\[(?P<section_name>[\w:]+(\.[\w:]+)*)\])'
- possible_paragraph = u'(¶(?P<paragraph>\d+(\.\d+)*))*'
- regex = re.compile('§(' + section_name + possible_paragraph + ')')
- return re.sub(regex, format_reference, text)
+ regex = re.compile(
+ r'''
+ §\[ # every ref starts its section part with §[
+ (?P<section_name> #
+ [^]]+ # at least one character other than ]
+ ) #
+ \] # every ref ends its section part with ]
+ (?:¶ # ref may start an optional paragraph part with ¶
+ (?P<paragraph> #
+ \S* # any number of non-whitespace chars
+ \w # supported sections must end with a word char
+ ) #
+ )? #
+ ''',
+ re.VERBOSE,
+ )
+ return regex.sub(format_reference, text)
@register.filter(needs_autoescape=True)
def to_html(text, autoescape=None):
return mark_safe(
- standard_ref(
- markdown.markdown(text, extensions=['nl2br', 'pymdownx.superfences'])))
+ markdown.markdown(
+ standard_ref(text), extensions=['nl2br', 'pymdownx.superfences']
+ )
+ ) |
Thanks for investigating all of this! I'm very busy this week so don't have time to dig into it. Something that supports most cases and isn't too complicated sounds good to me. But I'd like to see some examples of the types of links we match, and some things we shouldn't match. However, instead of regular unit tests, it can often be nice to just have a comment in the code with a link to regex101, like this https://regex101.com/r/Wxld8e/2 |
I believe we should parse the refs first and only then convert markdown to html. In that case, this is the list of possible refs we might encounter. tmp=''
for index in 'general' 'grammar' 'header' 'library' 'concept' 'impldef'; do
tmp+=$(grep -oP '<a href="https://timsong-cpp.github.io/cppwp/std23/\K[^"\s]+(?=")' ${index}.html | python3 -c "import sys,html,urllib.parse; print(urllib.parse.unquote(html.unescape(sys.stdin.read())))")
tmp+=$'\n'
done
echo "${tmp%?}" | sed -e 's/^/§[/' | sed -e 's/#/]¶/' | sort | uniq | head -c -1 > possible_refs.txt And this is how the suggested regex performs on the dataset (1711/20945 are unsupported): |
And here are the characters I think we can safely support:
, leaving 158/20945 unsupported (https://regex101.com/r/hf2P3X/2). Alternatively, we can try to enumerate the postfixes that the valid reference can't have like this:
, leaving 131/20945 unsupported (https://regex101.com/r/hf2P3X/3). |
I prefer the first one, the second one matches more things but is just so hard to read. Want to make a PR? |
All currently existing refs in cppquiz are supported, 297 out of 20945 in C++23 index are unsupported (160 containing an asterisk, 131 ending in a right parenthesis, 4 ending in a quotation mark, 2 ending in an explamation mark). Fixes knatten#344.
All currently existing refs in cppquiz are supported, 297 out of 20945 in C++23 index are unsupported (160 containing an asterisk, 131 ending in a right parenthesis, 4 ending in a quotation mark, 2 ending in an explamation mark). Fixes knatten#344.
All currently existing refs in cppquiz are supported, 297 out of 20945 in C++23 index are unsupported (160 containing an asterisk, 131 ending in a right parenthesis, 4 ending in a quotation mark, 2 ending in an explamation mark). Fixes knatten#344.
All currently existing refs in cppquiz are supported, 297 out of 20945 in C++23 index are unsupported (160 containing an asterisk, 131 ending in a right parenthesis, 4 ending in a quotation mark, 2 ending in an explamation mark). Fixes knatten#344.
All currently existing refs in cppquiz are supported, 297 out of 20945 in C++23 index are unsupported (160 containing an asterisk, 131 ending in a right parenthesis, 4 ending in a quotation mark, 2 ending in an exclamation mark). Fixes knatten#344.
We should support the symbol
-
in our references. For example,should not be rendered as
The text was updated successfully, but these errors were encountered: