Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline PUML #968

Closed
wants to merge 2 commits into from
Closed

Inline PUML #968

wants to merge 2 commits into from

Conversation

crphang
Copy link
Contributor

@crphang crphang commented Jan 1, 2020

Resolves #912

Extremely raw implementation here. This also breaks existing PUML which will be addressed in later commits.

Here's a demo:

image

image

Problems with current implementation:

  1. The markdown-it library was hacked and was doing file generation of plantUML and I really don't think it should be doing it.
  2. Currently I commented the file based PlantUML plugin handler as it removes all puml tags.
  3. The logic of file based and inline puml are handled in very different places. This can be made better.
  4. This uses Fenced Code blocks to input PUML but not written in an extensible way.

Going forward:

  1. I believe I would still make use on fenced code blocks to retrieve the content of the PUML file. This helps to ensure that the content is properly escaped. A pluggable handler which allows fenced blocks to handle other kind of content, with PlantUML being only 1 type of it.

This fits in nicely considering fenced block already handling other special types of content like diffs:

+ new
- old
  1. Merge the user experience of both PUML files:
<puml src="filename" />

<puml inline>
```puml
...
</puml>
  1. Allow user to specify output file name for diagrams, and also generate temporary but consistent file names (MD5 hash of content).
  2. Refactor to perform the generation of PlantUML diagram in a seperate. Currently it is done in the markdown-it library itself. Markdown-it will only be used to return the content of the generated HTML and metadata if necessary.

@yamgent
Copy link
Member

yamgent commented Jan 2, 2020

Thanks for working on this. I think the code block thing is a nice trick for us, but it does make the syntax less usable on the author side (to them it seems like an addition that is not intuitive at first look, since it is different from how they use components).

Is it possible for us to do this "code block" thing behind the scenes, so that the author does not have to manually do it themselves, while still allowing us to escape it properly for later processing by markdown-it?

@damithc
Copy link
Contributor

damithc commented Jan 2, 2020

Is it possible for us to do this "code block" thing behind the scenes, so that the author does not have to manually do it themselves, while still allowing us to escape it properly for later processing by markdown-it?

Perhaps something like this?

```puml-inline
...
``` 

@yamgent
Copy link
Member

yamgent commented Jan 2, 2020

Ok I just went back and read through the old puml PRs and realise the difficulty in doing the "behind the scene" method that I suggested, because of how HTML just generally behaves.

Perhaps something like this?

```puml-inline
...
```

We can explore this way. The HTML tag seems to be rather redundant non-aesthetic wise, if we have to escape the content anyway (which makes it seem un-HTML).

@damithc
Copy link
Contributor

damithc commented Jan 2, 2020

Here's a demo:

image

If alternatives are too hard, I don't mind resorting to this solution; while it is a bit more work for the author, it is still easier than having to create an extra puml file.

@crphang crphang changed the title WIP: Inline PUML Inline PUML Feb 3, 2020
@crphang
Copy link
Contributor Author

crphang commented Feb 3, 2020

Checked that this PR resolves: #936 on top of #912.

Refactored code to handle both file PUML and inline PUML. Currently inline PUML works well without any new lines. For example, the below inline PUML would render fine:

<puml>
```puml
@startuml
alice -> bob ++ : hello
bob -> bob ++ : self call
bob -> bib ++  #005500 : hello
bob -> george ** : create
return done
return rc
bob -> george !! : delete
return success
@enduml
</puml>

However, there would be issues when we add newlines in the PUML as codefences are not completely escaped when nested as an immediate children under any element as there would be <p> tags added in the middle replacing the newlines and also <pre><code>... at the end.

This behavior is consistent for all codefences when nested under a tag. For example:

<span>
```
Some code fenced

Newline
```
</span>

will not work properly as well.

@yamgent @damithc is this a known issue?

Currently I am thinking to improve the behavior by either:

  1. Resolving the above issue separately if possible.
  2. Change the approach from using markdown-it-attr to inject additional user provided inputs instead (like outputFileName in this case), which removes the problematic tag for inline-puml. For example, the user experience becomes just:
```puml
...
```

The change here should be minimal in terms of the main logic, but I need to investigate further on how to pass in additional context (cwf, resultPath) and registering widgetHandlers to markdown-it renderer in a more generic way.

@acjh
Copy link
Contributor

acjh commented Feb 3, 2020

Add an empty line between HTML tag and opening code fence.

@crphang
Copy link
Contributor Author

crphang commented Feb 3, 2020

Add an empty line between HTML tag and opening code fence.

Yes, that workaround usually works, but I think it won't work when we rely on a syntax like below to inject user-input (as with this PR as well):

<tag user-input>
```tag
```

This is also discussed #473 (comment).

Was wondering whether it is a design choice/known issue to use this workaround? I guess if so, do you think it might be better to inject context (user-input) through the markdown-it-attr?

@acjh
Copy link
Contributor

acjh commented Feb 3, 2020

You mean the following doesn't work?

<tag user-input>

```tag
```
</tag>

It is in the GitHub Flavored Markdown Spec: https://github.github.com/gfm/#example-157

Example 157

<div>

*Emphasized* text.

</div>
<div>
<p><em>Emphasized</em> text.</p>
</div>

Example 158

<div>
*Emphasized* text.
</div>
<div>
*Emphasized* text.
</div>

@crphang
Copy link
Contributor Author

crphang commented Feb 4, 2020

Huge thanks for pointing that out. Leave a space does not work out of the box. I added a additional handler to make it work.

  static extractAndGenerateCodeBlock(element, codeLanguage = '') {
    let codeBlockElement;
    element.children.forEach((child) => {
      if (child.name === 'pre') {
        const nestedChild = child.children[0];
        [codeBlockElement] = nestedChild.children;
        codeBlockElement.data = `\`\`\`${codeLanguage}\n${codeBlockElement.data}\n\`\`\``;
      }
    });

    return codeBlockElement;
  }

This is because when leaving a space, initial md.render has already rendered the code fence as <pre><code>..., so to make this work, we retrieve the code block and regenerate it with the correct context and handler.

This keeps the behavior consistent as described https://github.github.com/gfm/#example-157 and also allows us to add additional context as well.

What do you think about this approach? If it is okay, I would proceed to refactor, add more edge case handling, additional tests and user guide for this.

@openorclose
Copy link
Contributor

Currently I am thinking to improve the behavior by either:

1. Resolving the above issue separately if possible.

2. Change the approach from using markdown-it-attr to inject additional user provided inputs instead (like outputFileName in this case), which removes the problematic tag for inline-puml. For example, the user experience becomes just:
```puml
...

The change here should be minimal in terms of the main logic, but I need to investigate further on how to pass in additional context (cwf, resultPath) and registering widgetHandlers to markdown-it renderer in a more generic way.

Can we try approach 2 instead?

@crphang
Copy link
Contributor Author

crphang commented Feb 5, 2020

Can we try approach 2 instead?

I'm on the fence for both approaches.

Approach 1
<tag user-input>

```tag
...
```
</tag>
Approach 2
```tag {user-input=x}
...
```

While approach 2 is less verbose, approach 1 is a more familiar way to add user-input rather than introducing markdown-it-attr for components (especially when comparing inline and src puml).

@yamgent @acjh what do you guys think?

@crphang crphang force-pushed the inline-puml branch 2 times, most recently from 48145c8 to 1e5d0b0 Compare February 11, 2020 15:21
@crphang
Copy link
Contributor Author

crphang commented Feb 11, 2020

Ready for review. Keeping this PR simpler as it primarily aims to resolve #936 and #912.

Future work: I explored adding PUML to be handled in componentParser, but decided to keep that separate for now. The reason is because I believe each "widget" (PUML/Latex/...) can be made more abstract, defining preprocess and parsing interface and this PR might not be best to introduce this refactor.

@ang-zeyu
Copy link
Contributor

ang-zeyu commented Feb 15, 2020

Hi @crphang,
I'm working on an api that would allow plugins to define certain html tag types which the parser would ignore, without the use of markdown code blocks.
This should allow us to keep plugin code to plugins strictly for better cohesion, which should be nice for #984 ( we can easily add more plugins for more sorts of diagrams, etc., without touching preprocess or anything else )

It seems to be pretty feasible for both htmlParser2 and markdown-it.
Would you want to base your changes on top of it?

The essential idea is to treat these tags the same way these parsers treat <script> tags, which allows all sorts of weird symbols anyway.

Its in a very early stage though, so if this gives you some ideas on how to do so, do let me know if you want to do it instead!

@crphang
Copy link
Contributor Author

crphang commented Feb 15, 2020

Hey @ang-zeyu, such an API would definitely help! One of the main problems I faced with plugins was that it introduced quite a bit of side effects, and we dont have as much control over the components as we do in parser.

I don't exactly think that introducing new diagrams will require "polluting" preprocess going forward. Definitely like your work on componentParser and refactoring preprocess. For adding new widgets, im looking to abstract it more and adding preprocess/parse handlers to each widget (will be in the scope of another PR #984)

Could you share the branch? I could take a further look. My current belief is that widgets as defined going forward are more easily maintained and beneficial for us developers to provide official supported components. Plugins offer extensibility in user defined components and functionality.

@ang-zeyu
Copy link
Contributor

Sure, i'll try to clean it up and push a functional one for htmlparser2 over tonight

@ang-zeyu
Copy link
Contributor

ang-zeyu commented Feb 15, 2020

ang-zeyu@3f1e5dc These are the changes to htmlparser2 required roughly ( still very much a WIP, but it works for simple use cases like <abc> <thisShouldAppearAsText> </abc> )

It should be much simpler for markdown-it due to its plugin system and use of regex patterns

In addition to this, what needs to be added would be

  • an additional field in plugin options allowing plugins to return an array of tag names to "ignore". These tag names would be set as the specialTagNames array in the commit above during initial site generation
  • markdown-it plugin to patch its behaviour
  • much more polishing and fixing for htmlparser2 patch

I don't exactly think that introducing new diagrams will require "polluting" preprocess going forward. Definitely like your work on componentParser and refactoring preprocess. For adding new widgets, im looking to abstract it more and adding preprocess/parse handlers to each widget (will be in the scope of another PR #984)

Hmm, reason I mentioned this was because I saw changes were needed in your code to _preprocess and _parse; As we want to add more and more diagrams, e.g. <line-graph> ... </line-graph> the amount of code inside these core markbind functions may be quite a lot.

We can alleviate this with something like #1026, but I feel its better to provide a sufficiently robust and consistent api to plugins which should make plugin development easier.
( we would have a clearly defined interface for plugins to operate, and an equally clear api contract for the core code to fufill )

This would also consolidate all optional functionality in plugin files, rather having such optional functionality coupled to the internal processes of parser which is easily subject to change.

One of the main problems I faced with plugins was that it introduced quite a bit of side effects, and we dont have as much control over the components as we do in parser.

Plugins certainly don't have as much control, but we can always add more to the api cautiously when really needed. ( such as "escaping" text inside the certain tags like this PR )

I'm not really aware of any side effects though 😮 I think this is one of the main benefits of a plugin system, where plugins may only interact with the provided content. A plugin can return a malformed html content string, but that would be the fault of the plugin development, not the core code I think. This could just as easily occur when plugins are coupled to the internal processes of the core code as well.

Just my 2 cents though, might be good to get some more opinions in 😄

@crphang
Copy link
Contributor Author

crphang commented Feb 16, 2020

I'm not really aware of any side effects though

What I meant was that the manipulation of the DOM is rather orthogonal from the main parsing process with plugins.

Hmm, reason I mentioned this was because I saw changes were needed in your code to _preprocess and _parse; As we want to add more and more diagrams, e.g. <line-graph> ... </line-graph> the amount of code inside these core markbind functions may be quite a lot.

Yes, certainly not a good idea to keep adding to the core logic. I just made a refactor to make it clearer what I was trying to achieve going forward.

ang-zeyu/markbind@3f1e5dc

Looks really good if it properly escapes from htmlParser and we can go with:

<component>
content
</component>

Certainly something we could explore further. Any rough estimate how long it would take for that POC to be completed? I am thinking of decoupling the two PR. I believe that the user experience without using code blocks to escape content will be better, but would want to start adding the other components. Perhaps we could add this change without modifying the user guide (since the changes are not backward compatible), so that other components in #984 have something to build on since both are rather pluggable. And migrating back to plugins once it is tested should not be too difficult.

@ang-zeyu
Copy link
Contributor

What I meant was that the manipulation of the DOM is rather orthogonal from the main parsing process with plugins.

Just looked at your refactor, I think I have a better idea now, thanks!

I'm all for this as well, I think it'd be good to have plugins tie in directly to the per element _parse / _preprocess process. This should keep performance good as we add more and more plugins as well, since we don't have to parse the entire content another round in each plugin.

Since its a major architectural addon for plugins, might be really good to have more opinions in first though @yamgent @acjh @openorclose @yash-chowdhary

And migrating back to plugins once it is tested should not be too difficult.

Just to clarify this, would the user still be able to turn off widgets eventually ( site config )?

Any rough estimate how long it would take for that POC to be completed? I am thinking of decoupling the two PR.

It should take about a week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow inline plantUML code when possible
6 participants