YARD fails to recognize code listings in AsciiDoc documents (no links nor syntax highlight) #1239

skalee · 2019-04-15T05:20:32Z

Consider the following example output (I added a border for clarity):

Rendering issues are apparent:

some code samples are not in callouts
some code samples are not highlighted
in some code samples there are no links to classes and methods

The problematic document

The following piece of AsciiDoc was used to generate the first section of the above document:

=== Ruby snippet (language specified)

[source,ruby]
----
two = 1 + 1
eleven = "1" + "1"
raise if Failure.problem
----

It translates to the following HTML:

<div class="sect2">
<h3 id="_ruby_snippet_language_specified">Ruby snippet (language specified)</h3>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-ruby" data-lang="ruby">two = 1 + 1
eleven = "1" + "1"
raise if Failure.problem</code></pre>
</div>
</div>
</div>

The full document is in this Gist: https://gist.github.com/skalee/fe1a52a797f7c821ae354a59f7812fd9.

Why it fails

The YARD::Templates::Helpers::HtmlHelper#parse_codeblocks method is responsible for post-processing the HTML output produced by the markup processor. It enables code highlighting, adds links to particular Ruby methods and classes, and finally normalizes the HTML document so that it can be styled with CSS:

yard/lib/yard/templates/helpers/html_helper.rb

Lines 624 to 643 in 12f56cf

    
    # Parses code blocks out of html and performs syntax highlighting
    # on code inside of the blocks.
    #
    # @param [String] html the html to search for code in
    # @return [String] highlighted html
    # @see #html_syntax_highlight
    def parse_codeblocks(html)
      html.gsub(%r{<pre\s*(?:lang="(.+?)")?>(?:\s*<code\s*(?:class="(.+?)")?\s*>)?(.+?)(?:</code>\s*)?</pre>}m) do
        string = $3
        # handle !!!LANG prefix to send to html_syntax_highlight_LANG
        language, = parse_lang_for_codeblock(string)
        language ||= $1 || $2 || object.source_type
        if options.highlight
          string = html_syntax_highlight(CGI.unescapeHTML(string), language)
        end
        classes = ['code', language].compact.join(' ')
        %(<pre class="#{classes}"><code class="#{language}">#{string}</code></pre>)
      end
    end

Unfortunately, the regular expression in this method fails to match code listings produced by Asciidoctor when they are annotated with the name of a programming language.
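
To make the failure concrete, here is a minimal standalone check of the expression quoted above. The constant and method names are mine, and the plain listing is a made-up input of the shape the expression expects; only the regular expression itself and the Asciidoctor markup come from this report.

```ruby
# Copied verbatim from parse_codeblocks as quoted above.
CODEBLOCK_RE = %r{<pre\s*(?:lang="(.+?)")?>(?:\s*<code\s*(?:class="(.+?)")?\s*>)?(.+?)(?:</code>\s*)?</pre>}m

# Illustrative input (an assumption): a bare <pre> with a bare language class on <code>.
PLAIN_LISTING = %(<pre><code class="ruby">two = 1 + 1</code></pre>)

# Shortened version of the Asciidoctor listing shown earlier.
ASCIIDOCTOR_LISTING =
  %(<pre class="highlight"><code class="language-ruby" data-lang="ruby">two = 1 + 1</code></pre>)

# Reports whether the expression matches and which language it captured.
def describe_match(html)
  match = CODEBLOCK_RE.match(html)
  return 'no match' unless match

  "match, language: #{match[1] || match[2] || 'none'}"
end

puts describe_match(PLAIN_LISTING)
puts describe_match(ASCIIDOCTOR_LISTING)
```

The first call reports a match with the language ruby, the second reports no match. The expression allows only an optional lang attribute directly after `<pre`, so the class="highlight" attribute prevents the match before the `<code>` tag is even considered.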

Other considerations

The most obvious solution would be to make that regular expression more liberal. However:

This regular expression is overcomplicated already.
HTML attributes may contain things other than language names, for example <pre class="highlight"> in the example above. Also, the language name may be decorated with additional specifiers, for example <code class="language-ruby"> in the same example.

Therefore, some smarter detection should be discussed. I have two proposals.

Option 1: Regular expression should be markup-processor-specific.

The parse_codeblocks(html) method is called only from htmlify(text, markup = options.markup). Therefore, an additional argument can be added: parse_codeblocks(html, markup).

Then, a brand new regular expression can be introduced for the Asciidoctor markup processor. Other markup processors can have their own dedicated regular expressions if they need any. The current regular expression will be used as a default generic one.
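
A rough sketch of how Option 1 could look, written as a standalone function rather than a patch to HtmlHelper. Everything here is an assumption for illustration: the CODEBLOCK_PATTERNS table, the :asciidoc key, the Asciidoctor pattern (derived only from the HTML shown above), and the simplified default pattern that stands in for the current expression. Syntax highlighting and the !!!LANG prefix handling are left out.

```ruby
# Hypothetical per-processor patterns. Each one exposes the named groups
# `lang` (optional) and `code`.
CODEBLOCK_PATTERNS = {
  # Assumed shape of Asciidoctor listings, based only on the HTML in this report.
  asciidoc: %r{<pre class="highlight"><code(?: class="language-[^"]*")?(?: data-lang="(?<lang>[^"]*)")?>(?<code>.+?)</code></pre>}m
}.freeze

# Simplified stand-in for the current generic expression (not the real one).
DEFAULT_CODEBLOCK_PATTERN =
  %r{<pre>(?:\s*<code(?:\s+class="(?<lang>[^"]+)")?\s*>)?(?<code>.+?)(?:</code>\s*)?</pre>}m

# Picks the pattern for the given markup, falls back to the generic one,
# and rewrites every matched listing into the normalized form.
def parse_codeblocks(html, markup)
  pattern = CODEBLOCK_PATTERNS.fetch(markup, DEFAULT_CODEBLOCK_PATTERN)
  html.gsub(pattern) do
    match = Regexp.last_match
    language = match[:lang]
    classes = ['code', language].compact.join(' ')
    %(<pre class="#{classes}"><code class="#{language}">#{match[:code]}</code></pre>)
  end
end

asciidoctor_html =
  %(<pre class="highlight"><code class="language-ruby" data-lang="ruby">two = 1 + 1</code></pre>)
puts parse_codeblocks(asciidoctor_html, :asciidoc)
```

The appeal of this layout is that each pattern only has to describe one processor's output, so it can stay strict and readable. The cost is one more pattern to maintain per processor whose output differs from the default.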

Option 2: Single regular expression with much smarter detection

A very generic regular expression can recognize different HTML attributes and prioritize them, either by value relevance (looking for a value which resembles a programming language name, like ruby, language-ruby, code-ruby), or by name relevance (looking for attributes with names data-lang, lang, class in that order).
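
A rough sketch of the name-relevance variant of Option 2, again standalone and with made-up helper names (tag_attributes, detect_language). It assumes the opening `<code>` and `<pre>` tags have already been extracted as strings and that attribute values are double-quoted. It checks data-lang, then lang, then class, and strips a language- or code- prefix from the value it finds.

```ruby
# Attribute names in descending priority, as proposed above.
LANGUAGE_ATTRIBUTES = %w[data-lang lang class].freeze

# Decorations that may precede the language name in an attribute value.
LANGUAGE_PREFIX = /\A(?:language|code)-/

# Hypothetical helper: collects name="value" pairs from an opening tag.
def tag_attributes(tag)
  tag.scan(/([\w-]+)="([^"]*)"/).to_h
end

# Hypothetical helper: returns the first non-empty value found, trying the
# attribute names in priority order and the tags in the order given.
def detect_language(*tags)
  attributes = tags.map { |tag| tag_attributes(tag) }
  LANGUAGE_ATTRIBUTES.each do |name|
    attributes.each do |attrs|
      value = attrs[name]
      return value.sub(LANGUAGE_PREFIX, '') if value && !value.empty?
    end
  end
  nil
end

puts detect_language('<code class="language-ruby" data-lang="ruby">', '<pre class="highlight">')
```

Passing the `<code>` tag first matters: with only class attributes present, class="language-ruby" on `<code>` wins over class="highlight" on `<pre>`. Name relevance alone is not enough, though. A listing with just `<pre class="highlight"><code>` would be reported as the language "highlight", so a real implementation would also need some value-relevance check, such as a list of known language names.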

Possibly the same problem as in #781, although that bug report was very unclear.

I can help implement either, but discussion is required IMO. Personally, I am leaning towards option 1.

I have read the Contributing Guide.

skalee mentioned this issue Apr 18, 2019

Correct code highlight in rubydoc.info riboseinc/enmail#133

Closed

skalee mentioned this issue May 6, 2019

Add tests to cover integration with various markup renderers #1241

Closed

skalee mentioned this issue Aug 30, 2019

Fix Asciidoc syntax highlight #1276

Merged

lsegal closed this as completed in #1276 Nov 26, 2019
