YARD fails to recognize code listings in AsciiDoc documents (no links nor syntax highlight) #1239

Closed
skalee opened this issue Apr 15, 2019 · 0 comments · Fixed by #1276

Consider the following example output (I added a border for clarity):

[Screenshot: issue2a]

Rendering issues are apparent:

  • some code samples are not in callouts
  • some code samples are not highlighted
  • in some code samples there are no links to classes and methods

The problematic document

The following piece of AsciiDoc was used to generate the first section of the document shown above:

=== Ruby snippet (language specified)

[source,ruby]
----
two = 1 + 1
eleven = "1" + "1"
raise if Failure.problem
----

It translates to the following HTML:

<div class="sect2">
<h3 id="_ruby_snippet_language_specified">Ruby snippet (language specified)</h3>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-ruby" data-lang="ruby">two = 1 + 1
eleven = "1" + "1"
raise if Failure.problem</code></pre>
</div>
</div>
</div>

The full document is in this Gist: https://gist.github.com/skalee/fe1a52a797f7c821ae354a59f7812fd9.

Why it fails

The YARD::Templates::Helpers::HtmlHelper#parse_codeblocks method is responsible for post-processing the HTML output produced by the markup processor. It enables code highlighting, adds links to particular Ruby methods and classes, and finally normalizes the HTML document so that it can be styled with CSS:

# Parses code blocks out of html and performs syntax highlighting
# on code inside of the blocks.
#
# @param [String] html the html to search for code in
# @return [String] highlighted html
# @see #html_syntax_highlight
def parse_codeblocks(html)
  html.gsub(%r{<pre\s*(?:lang="(.+?)")?>(?:\s*<code\s*(?:class="(.+?)")?\s*>)?(.+?)(?:</code>\s*)?</pre>}m) do
    string = $3
    # handle !!!LANG prefix to send to html_syntax_highlight_LANG
    language, = parse_lang_for_codeblock(string)
    language ||= $1 || $2 || object.source_type
    if options.highlight
      string = html_syntax_highlight(CGI.unescapeHTML(string), language)
    end
    classes = ['code', language].compact.join(' ')
    %(<pre class="#{classes}"><code class="#{language}">#{string}</code></pre>)
  end
end

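Unfortunately, the regular expression in this method fails to match code listings produced by Asciidoctor when they are annotated with the name of a programming language. The expression allows only an optional lang attribute on <pre>, so the class="highlight" attribute that Asciidoctor emits there already prevents a match.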
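
The mismatch can be reproduced outside YARD. The snippet below is a minimal sketch: it copies the regular expression from parse_codeblocks as quoted above and tries it against two inputs. The constant and variable names are mine and exist only for this illustration.

```ruby
# Regular expression copied verbatim from parse_codeblocks (see above).
CODEBLOCK_RE = %r{<pre\s*(?:lang="(.+?)")?>(?:\s*<code\s*(?:class="(.+?)")?\s*>)?(.+?)(?:</code>\s*)?</pre>}m

# A shape the expression handles: bare <pre>, <code> with only a class.
plain_html = %(<pre><code class="ruby">two = 1 + 1</code></pre>)

# The shape Asciidoctor produced for the [source,ruby] listing above.
asciidoctor_html = %(<pre class="highlight"><code class="language-ruby" data-lang="ruby">two = 1 + 1</code></pre>)

plain_match = CODEBLOCK_RE.match(plain_html)
asciidoctor_match = CODEBLOCK_RE.match(asciidoctor_html)

# The first is a MatchData, the second is nil.
puts plain_match.inspect
puts asciidoctor_match.inspect
```

Because gsub finds nothing to replace in the second case, the listing passes through parse_codeblocks untouched, which would explain the missing highlighting and links.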

Other considerations

The most obvious solution would be to make that regular expression more liberal. However:

  • This regular expression is overcomplicated already.
  • HTML attributes may contain things other than language names, for example <pre class="highlight"> in the example above. Also, a language name may be decorated with additional specifiers, as in <code class="language-ruby"> in the same example.

Therefore, some smarter detection should be discussed. I have two proposals.

Option 1: Regular expression should be markup-processor-specific.

The parse_codeblocks(html) method is called only from htmlify(text, markup = options.markup). Therefore, an additional argument can be added: parse_codeblocks(html, markup).

Then, a brand new regular expression can be introduced for the Asciidoctor markup processor. Other markup processors can have their own dedicated regular expressions if they need any. The current regular expression will be used as a default generic one.
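
To make Option 1 concrete, here is a rough sketch of a per-markup lookup with a generic fallback. Everything in it is hypothetical: the constants, the find_codeblocks helper, the :asciidoc key, and the Asciidoctor pattern itself are assumptions made for illustration and are not part of YARD. The generic pattern is the current expression with named groups added.

```ruby
# Hypothetical sketch of Option 1. None of these names exist in YARD.

# The current expression, with named groups instead of $1..$3.
GENERIC_CODEBLOCK = %r{<pre\s*(?:lang="(?<pre_lang>.+?)")?>(?:\s*<code\s*(?:class="(?<code_lang>.+?)")?\s*>)?(?<code>.+?)(?:</code>\s*)?</pre>}m

# Assumed Asciidoctor shape: arbitrary attributes on <pre>, and the
# language (if any) in the data-lang attribute of <code>.
ASCIIDOCTOR_CODEBLOCK = %r{<pre[^>]*>\s*<code(?:[^>]*\bdata-lang="(?<code_lang>[^"]+)")?[^>]*>(?<code>.+?)</code>\s*</pre>}m

# Markup-specific patterns; anything not listed uses the generic one.
CODEBLOCK_PATTERNS = { asciidoc: ASCIIDOCTOR_CODEBLOCK }.freeze

# Returns [language, code] pairs found in html for the given markup type.
def find_codeblocks(html, markup)
  pattern = CODEBLOCK_PATTERNS.fetch(markup, GENERIC_CODEBLOCK)
  blocks = []
  html.scan(pattern) do
    captures = Regexp.last_match.named_captures
    blocks << [captures["pre_lang"] || captures["code_lang"], captures["code"]]
  end
  blocks
end

asciidoctor_html = %(<pre class="highlight"><code class="language-ruby" data-lang="ruby">two = 1 + 1</code></pre>)
plain_html = %(<pre><code class="ruby">two = 1 + 1</code></pre>)

asciidoc_blocks = find_codeblocks(asciidoctor_html, :asciidoc)
generic_blocks = find_codeblocks(plain_html, :markdown)
p asciidoc_blocks
p generic_blocks
```

In a real patch the lookup would sit inside parse_codeblocks(html, markup) and feed the existing highlighting code. The point here is only that each markup processor can get a small, readable pattern of its own.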

Option 2: Single regular expression with much smarter detection

A very generic regular expression can recognize different HTML attributes and prioritize them, either by value relevance (looking for a value which resembles a programming language name, like ruby, language-ruby, code-ruby), or by name relevance (looking for attributes with names data-lang, lang, class in that order).
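
The prioritization described for Option 2 could look roughly like the sketch below. It is an illustration only: detect_language, tag_attributes, and both constants are hypothetical names, the list of prefixes is a guess based on the examples above, and skipping undecorated class values on <pre> is my own assumption about how to avoid treating "highlight" as a language.

```ruby
# Hypothetical sketch of Option 2. None of these names exist in YARD.

# Attribute names to consult, in the order proposed above.
LANGUAGE_ATTRIBUTES = %w[data-lang lang class].freeze

# Prefixes that may decorate a language name (assumed list).
LANGUAGE_PREFIX = /\A(?:language|lang|code)-/

# Extracts name="value" pairs from an opening tag into a Hash.
def tag_attributes(tag)
  tag.scan(/([\w-]+)="([^"]*)"/).to_h
end

# Picks a language from the opening <pre> and <code> tags, or nil.
def detect_language(pre_tag, code_tag)
  # <code> is checked first; a bare class is trusted only on <code>.
  sources = [[code_tag, true], [pre_tag, false]]
  LANGUAGE_ATTRIBUTES.each do |name|
    sources.each do |tag, trust_bare_class|
      tokens = tag_attributes(tag).fetch(name, "").split
      decorated = tokens.find { |token| token.match?(LANGUAGE_PREFIX) }
      return decorated.sub(LANGUAGE_PREFIX, "") if decorated
      next if tokens.empty?
      # Assumption: an undecorated class on <pre> is a styling hook.
      next if name == "class" && !trust_bare_class
      return tokens.first
    end
  end
  nil
end

asciidoctor_lang = detect_language('<pre class="highlight">', '<code class="language-ruby" data-lang="ruby">')
class_only_lang = detect_language('<pre class="highlight">', '<code class="language-ruby">')
no_lang = detect_language('<pre class="highlight">', '<code>')
p asciidoctor_lang, class_only_lang, no_lang
```

Even in this small form the heuristics need special cases, which is the main cost of this option compared with Option 1.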


Possibly the same problem as in #781, although that bug report was very unclear.

I can help implement either option, but I think some discussion is required first. Personally, I am leaning towards option 1.

I have read the Contributing Guide.
