Replace pegdown with modern markdown parser #81

Open
sirthias opened this issue Dec 13, 2016 · 21 comments

Comments

@sirthias
Contributor

paradox currently builds upon pegdown as the underlying markdown parser. Although pegdown has a number of serious issues, it comes with quite a large feature set and has long been one of the few "go-to" markdown-parsing solutions on the JVM.

Unfortunately, pegdown is essentially unmaintained, with crucial bugs not being fixed. Its parsing performance is also relatively poor. Parser runtime can sometimes even become exponential, which means that the parser either appears to "hang" completely or aborts processing after a time-out.

These deficiencies, along with the availability of newer, more modern and better maintained alternatives, should be sufficient motivation to consider a switch to another underlying parser solution.

I'd recommend we look at commonmark-java and flexmark-java and evaluate which one works better.
From his involvement with pegdown I know @vsch (the author of flexmark-java) as being very friendly and responsive, so I'd trust him with responsible maintenance for the foreseeable future.
commonmark-java on the other hand is maintained by Atlassian, which is certainly not bad either.

My impression is that either alternative will provide a much better foundation for all future work on paradox than pegdown.

@jonas
Contributor

jonas commented Dec 13, 2016

Any thoughts on Laika, @sirthias? I saw you played with it at one point. Its directive format is a bit different, but it has a PDF renderer.

Also it looks like flexmark-java requires Java 8. Right now Paradox only requires Java 7 AFAIK.

@vsch

vsch commented Dec 13, 2016

@jonas, I am the author of flexmark-java and in discussion with @sirthias about pegdown end of life maintenance.

Java 8 language-level features will be removed from flexmark-java in the next few days, since they also prevent flexmark-java from supporting Android. So this is not going to be an issue.

If you do decide to migrate to flexmark-java and are interested, I can offer active help in migrating paradox, since I had to make this trip for my plugin: https://github.com/vsch/idea-multimarkdown, which was intimately dependent on pegdown's every quirk and feature.

@sirthias
Contributor Author

sirthias commented Dec 13, 2016

Yes, I've played with Laika. It's not a bad project either, but it has a lot fewer users than commonmark-java or flexmark-java.

For example, flexmark-java is the basis of the Markdown Navigator plugin for JetBrains IDEs, with a significant number of commercial users and almost 2 million downloads last year.
So it's bound to be quite a bit more mature than Laika, and with better prospects for stable maintenance going forward.

Also, quite importantly, flexmark matches pegdown in terms of feature set (like tables, footnotes, etc.), which Laika doesn't.

@jonas
Contributor

jonas commented Dec 13, 2016

@vsch I don't know if the Java version matters, but I noted it because of Paradox's sbt plugin. Related to #14, do you have any thoughts on the complexity of implementing a LaTeX or PDF renderer for flexmark-java?

@sirthias Makes sense.

@vsch

vsch commented Dec 13, 2016

@jonas, Java version matters for Android, which is a showstopper for some pegdown users who are experiencing issues with the ASM dependency in pegdown. So I have to address this. Lambda use can easily be replaced with anonymous classes, and in IntelliJ IDEA this carries no extra cost in typing or source-code real estate, since the IDE collapses them to lambda-looking code for display.

I haven't looked at the difficulty of rendering PDF or LaTeX, but I don't see an issue, since it is a matter of walking the AST and generating the required output. The flexmark AST is source based like pegdown's, but unlike pegdown's it is linked-list based, with parent links. If anything, doing it for flexmark will be an order of magnitude simpler, plus it would be a very nice addition to its feature set.
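To make the "walk the AST and generate output" idea concrete, here is a minimal sketch in plain Java. It does not use flexmark-java's real classes: `AstNode`, `LatexSketch`, and the node type names are hypothetical stand-ins for a linked-list AST with parent links, and the LaTeX mapping covers only a few node types.

```java
// Hypothetical minimal AST node (not a flexmark-java class): children form a
// singly linked sibling list, and every node keeps a link to its parent.
final class AstNode {
    final String type;   // e.g. "document", "heading", "paragraph", "emphasis", "text"
    final String text;   // literal text for "text" nodes, otherwise ""
    AstNode parent, firstChild, lastChild, next;

    AstNode(String type, String text) {
        this.type = type;
        this.text = text;
    }

    // Appends a child at the end of the sibling list and sets its parent link.
    AstNode append(AstNode child) {
        child.parent = this;
        if (lastChild == null) {
            firstChild = child;
        } else {
            lastChild.next = child;
        }
        lastChild = child;
        return this;
    }
}

final class LatexSketch {
    // Renders the subtree rooted at node as a LaTeX fragment.
    static String render(AstNode node) {
        StringBuilder out = new StringBuilder();
        render(node, out);
        return out.toString();
    }

    private static void render(AstNode node, StringBuilder out) {
        switch (node.type) {
            case "heading":
                out.append("\\section{");
                children(node, out);
                out.append("}\n\n");
                break;
            case "paragraph":
                children(node, out);
                out.append("\n\n");
                break;
            case "emphasis":
                out.append("\\emph{");
                children(node, out);
                out.append("}");
                break;
            case "text":
                out.append(escape(node.text));
                break;
            default:
                // "document" and unknown node types: render the children only.
                children(node, out);
        }
    }

    // Follows the sibling links of the child list, in document order.
    private static void children(AstNode node, StringBuilder out) {
        for (AstNode c = node.firstChild; c != null; c = c.next) {
            render(c, out);
        }
    }

    // Escapes a few LaTeX special characters; deliberately not exhaustive.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if ("#$%&_{}".indexOf(ch) >= 0) {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }
}
```

A PDF backend could reuse the same traversal and swap only the per-node output, which is the reason a renderer-agnostic AST matters here.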

Are there any existing open source implementations of Markdown to PDF or LaTeX converters that I can use as a stepping stone, since I have not worked with either format before?

@sirthias
Contributor Author

@vsch Laika implements a Markdown-to-PDF generator whose implementation might be interesting. It's pure Scala though.

@vsch

vsch commented Dec 13, 2016

@sirthias, thanks, I will take a look at it. I am no expert in Scala but learned to be conversant with it out of necessity.

@jonas
Contributor

jonas commented Dec 13, 2016

I'd be interested in helping out with PDF support if Paradox makes the move. Laika uses https://xmlgraphics.apache.org/ which should in theory keep everything JVM-based, although I've never had much luck with XSL-FO. I know that @eed3si9n has used Pandoc to generate PDFs via LaTeX by simply concatenating all Markdown files together. This, however, requires that each file is written with this in mind. In any case I don't think PDF support is a blocker for moving to a more modern parser.

@vsch One last question: I don't know if flexmark-java has support for custom directives, but in any case would it be possible to extend the parser to support the current directive syntax?

@ref:[Inline directive](b.md)

@@ snip [Leaf block directive](../scala/Obj.scala)

@@@ note
Container block directive
@@@
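As a rough illustration of the three directive shapes above, the following sketch classifies a single line with regular expressions. It is not how Paradox or flexmark-java actually parse directives; `DirectiveSketch`, `classify`, and the returned labels are hypothetical names, and the patterns assume exactly the syntax shown in the examples (optional whitespace after the `@@` and `@@@` markers).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DirectiveSketch {
    // Inline directive anywhere in a line: @name:[label](source)
    private static final Pattern INLINE =
        Pattern.compile("@(\\w+):\\[([^\\]]*)\\]\\(([^)]*)\\)");
    // Leaf block directive on a line of its own: @@ name [label](source)
    private static final Pattern LEAF =
        Pattern.compile("^@@\\s*(\\w+)\\s*\\[([^\\]]*)\\]\\(([^)]*)\\)\\s*$");
    // Container block directive: opening line "@@@ name", closing line "@@@"
    private static final Pattern OPEN = Pattern.compile("^@@@\\s*(\\w+)\\s*$");
    private static final Pattern CLOSE = Pattern.compile("^@@@\\s*$");

    // Returns a short label describing what kind of directive, if any, the line holds.
    static String classify(String line) {
        Matcher m;
        if (CLOSE.matcher(line).matches()) {
            return "container-close";
        }
        if ((m = OPEN.matcher(line)).matches()) {
            return "container-open:" + m.group(1);
        }
        if ((m = LEAF.matcher(line)).matches()) {
            return "leaf:" + m.group(1) + ":" + m.group(3);
        }
        if ((m = INLINE.matcher(line)).find()) {
            return "inline:" + m.group(1) + ":" + m.group(3);
        }
        return "markdown";
    }
}
```

The container patterns are tried first because a leaf pattern that tolerated a third `@` would otherwise swallow the `@@@` markers.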

@vsch

vsch commented Dec 13, 2016

@jonas, any help will be greatly appreciated. It has been just me and the crickets behind me, working on this like mad, for the last year. 😄

flexmark does not have custom directives but has a very flexible extension API, so I don't see it as being hard to add. If you can tell me what you need, along with any constraints that exist for the above elements, I can add it as an extension.

Assuming that the above examples are additions on top of markdown and do not affect markdown syntax processing, they can easily be added as a node processor: an extension that walks the AST and removes, adds, modifies, rearranges, or creates nodes based on whatever the code decides to do.

If the @@@ note can contain any markdown text spanning multiple paragraphs, then it would require a block parser extension that takes care of splitting the content out of the stream so its markers do not accidentally span markdown inlines or other markdown block elements. Either way, it is a standard, easy extension to implement.
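The "splitting the content out of the stream" step can be sketched independently of any parser library. The code below is only an illustration under simplifying assumptions: it works line by line, does not handle nested containers, and the names `ContainerSplitSketch`, `Chunk`, and `split` are hypothetical rather than part of the flexmark-java extension API.

```java
import java.util.ArrayList;
import java.util.List;

final class ContainerSplitSketch {
    // One chunk of input: plain markdown (name == null) or the body of a container directive.
    static final class Chunk {
        final String name;                            // directive name, or null for plain markdown
        final List<String> lines = new ArrayList<>();

        Chunk(String name) {
            this.name = name;
        }
    }

    // Splits the input into alternating markdown and container chunks, so that a
    // container body can be parsed as markdown on its own, without its markers.
    static List<Chunk> split(List<String> input) {
        List<Chunk> chunks = new ArrayList<>();
        Chunk current = new Chunk(null);
        for (String line : input) {
            String trimmed = line.trim();
            boolean insideContainer = current.name != null;
            if (!insideContainer && trimmed.startsWith("@@@") && trimmed.length() > 3) {
                // Opening marker such as "@@@ note": flush pending markdown, start a container.
                if (!current.lines.isEmpty()) {
                    chunks.add(current);
                }
                current = new Chunk(trimmed.substring(3).trim());
            } else if (insideContainer && trimmed.equals("@@@")) {
                // Closing marker: the container body is complete.
                chunks.add(current);
                current = new Chunk(null);
            } else {
                current.lines.add(line);
            }
        }
        // Emit trailing markdown, or an unterminated container, as the last chunk.
        if (current.name != null || !current.lines.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Blank lines inside the container stay in its chunk, so a body with several paragraphs survives intact.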

If there are any options you would like to see in the extension, such as rendering or parsing options, please let me know. My aim in flexmark is to offer configuration options for every extension that eliminate the need to write code for 99% of use cases.

@sirthias
Contributor Author

sirthias commented Dec 14, 2016

Just FYI here: I have just added a deprecation note to pegdown's README officially recommending flexmark-java as the best replacement.

@eed3si9n
Contributor

@sirthias Thanks for starting this discussion. You're the markdown parsing expert, so if you're saying flexmark-java is the way to go, I'm all for it.

@vsch

vsch commented Dec 21, 2016

@jonas, the flexmark-java Java language level has been downgraded to 7.

A convenience class that converts pegdown extension flags to flexmark-java config options has been added. It makes migration easier. https://github.com/vsch/flexmark-java#pegdown-migration-helper

If you can give me the specs for custom directives I will add the extension to the next release.

@jroper
Member

jroper commented Apr 7, 2017

I'd just like to say, while I'm definitely not against migrating, I don't think the current scenario is that grim. We've been using pegdown for many years now serving documentation on playframework.com. All markdown is parsed and rendered on the fly on each request to the documentation. Performance has never been an issue for our needs, and issues such as hanging have never manifested as being a problem. Aside from the initial contribution I made to support extensions, I don't think we've ever had a need for upstream changes or bugfixes in pegdown that I'm aware of.

Aside from support for custom extensions, whatever library we select will need to have the following features:

  • The ability to walk the markdown AST for purposes other than HTML rendering. Neither Play nor Lagom has migrated to Paradox yet, but eventually we will, and one feature we will be porting is our validation code. This walks the pegdown AST and validates things like internal links (including anchor links), validates extensions (that source code snippets exist), detects orphan pages, enforces certain linking conventions (e.g., enforces that javadoc links are to the frames version, i.e. index.html?com/example/MyClass.html) and even validates external links. This validation (excluding external links) is run by CI, and we consider it crucial to maintaining the integrity of our documentation.
  • The ability to customise HTML rendering, such as applying a custom heading anchor link convention, or the ability to modify links and anchor names to work when rendering all docs as a single HTML page (for conversion to PDF). Pegdown supports this by extending the ToHtmlSerializer and overriding specific methods.
  • GFM support, including things like the handling of line breaks.
  • Table support.

@andyczerwonka

andyczerwonka commented Aug 12, 2017

All, as I started hunting for solutions for #98, I quickly found myself here while researching markdown parsers. As per @jroper's comment above, the ability to support PDF is clearly affected by which markdown parser is chosen. Shall we close #98, since I think this item covers it?

@pvlugter
Member

Switching to flexmark-java sounds good to me too. And the feature set sounds great: AST with post processing support, detailed source positions in the AST, extensible at multiple levels, faster parser.

@dwijnand
Member

Switching to a parser that allows generating a PDF would solve #14.

@wsargent
Contributor

wsargent commented May 26, 2018

I'm also interested in using code fences (aka "verbatim") with scalafiddle out of the box by specifying it as "scala scalafiddle", which is something that should be available according to the commonmark spec: https://spec.commonmark.org/0.27/#example-111

See jrblevin/markdown-mode#184 for an example of this kind of usage.

I am aware there's a FiddleDirective; I just like using tut inline blocks more, and right now I have to do the following:

<div data-scalafiddle>
<pre>
def sum(a: Int, b: Int) = a + b

println(sum(2, 2))
</pre>
</div>

and add integration.js by hand.
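For illustration, here is a sketch of how a renderer might interpret an info string such as "scala scalafiddle": the first word is treated as the language, as the CommonMark spec describes for typical use, and the remaining words as extra attributes. `InfoStringSketch` and its members are hypothetical names, not an existing Paradox or flexmark-java API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InfoStringSketch {
    final String language;          // first word of the info string, or "" if it is empty
    final List<String> attributes;  // remaining words, e.g. ["scalafiddle"]

    InfoStringSketch(String language, List<String> attributes) {
        this.language = language;
        this.attributes = attributes;
    }

    // Splits the info string (the text after the opening fence) on whitespace.
    static InfoStringSketch parse(String infoString) {
        String info = infoString.trim();
        if (info.isEmpty()) {
            return new InfoStringSketch("", new ArrayList<String>());
        }
        String[] words = info.split("\\s+");
        List<String> rest = new ArrayList<>(Arrays.asList(words).subList(1, words.length));
        return new InfoStringSketch(words[0], rest);
    }

    // True when the block is Scala code that asked for ScalaFiddle integration.
    boolean wantsScalaFiddle() {
        return language.equals("scala") && attributes.contains("scalafiddle");
    }
}
```

A custom renderer could check `wantsScalaFiddle()` and emit the `data-scalafiddle` wrapper shown above, instead of requiring it to be written by hand.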

@jonas
Contributor

jonas commented Aug 21, 2018

I will start looking at this over the coming weeks.

Vladimir and I have discussed meeting in person if something blocks my progress. Thanks a lot @vsch for showing this level of support!

@pvlugter
Member

Awesome. Thanks @jonas and @vsch.

@nafg

nafg commented Jan 31, 2022

Any updates?

@nafg

nafg commented Jan 31, 2022

I suggest adopting Ornate's markdown handling
