Support Root-Relative Links (Absolute Links)? #57

Closed
karlhorky opened this issue Aug 14, 2020 · 10 comments
Labels
💪 phase/solved Post is done

Comments

@karlhorky

Hi @wooorm ! 👋

Subject of the feature

What do you think about supporting root-relative links such as in the example below?

[abc](/docs/non-existent/)
[abc](/docs/exists/#non-existent2)

I suppose this could be configured with a configuration option, maybe named like basedir.

Problem

On large sites with a lot of moving files or large directory structures, updating relative links can be a pain.

Root relative links can also avoid any ambiguity that can be caused by using the same path ending and file name between multiple files.

Expected behaviour

It would be nice for these links to also be checked.

Alternatives

gatsby-remark-check-links, but:

  1. It's Gatsby-specific
  2. It runs as a Gatsby plugin and only warns in the console :(

gh-action-check-broken-links, but:

  1. It's wrapped in a GitHub Action
  2. It assumes everything is in a pages directory
@wooorm
Member

wooorm commented Aug 22, 2020

interesting!

This project is very much meant for content hosted on GitHub/GitLab/Bitbucket: repositories whose markdown files link to other markdown files in the same repository. I think we even support such URLs, but in the sense that /remarkjs/remark-validate-links/blob/main/readme.md and /remarkjs/remark-validate-links both point to the readme here! So adding a feature like that probably also comes with the need for other features (e.g., swapping md for html, or the dirname), and I’m wondering whether that makes sense in this project or should be something else?

@karlhorky
Author

karlhorky commented Aug 22, 2020

> /remarkjs/remark-validate-links/blob/main/readme.md and /remarkjs/remark-validate-links point to the readme here

Aha, I guess these are root-relative links to GitHub then? These I would actually count as "external links", but I can see how they have all of the characteristics of a normal root-relative link.

I think that the 95% case can probably be handled with a general resolution scheme, with a configuration option of baseDir or similar that would be removed from the link.

For example, this configuration...

module.exports = {
  plugins: [
    [
      "remark-validate-links",
      {
        baseDir: '/remarkjs/remark-validate-links',
      },
    ],
  ],
}

...along with this link in Markdown...

[link](/remarkjs/remark-validate-links/docs/configuration)

...would search for the following files in the repo to see if it will be a broken link:

/docs/configuration.md
/docs/configuration/index.md

This would allow for a lot of cases for Markdown projects, both:

  1. for rendering on GitHub/GitLab/Bitbucket
  2. separate from these platforms (eg. Gatsby, Next.js, 11ty, etc)

However, since your library has a focus on these platforms, there could be two or more special cases added for these platforms:

1. Add check for readme.md files

Since these platforms use readme.md instead of index.md, add a single extra check:

/docs/configuration.md
/docs/configuration/index.md
+/docs/configuration/readme.md
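
A minimal sketch of the lookup proposed above, assuming a hypothetical helper named `candidateFiles` (nothing like it exists in remark-validate-links today). It strips the proposed `baseDir` from a root-relative link and lists the repository paths that would be checked, including the extra `readme.md` case:

```javascript
// Hypothetical helper: `candidateFiles` and `baseDir` are illustrative names,
// not an existing remark-validate-links API.
const path = require('node:path')

function candidateFiles(link, baseDir) {
  // Drop any `?query` or `#fragment`; only the path selects a file.
  const pathname = link.split(/[?#]/)[0]

  // Links outside `baseDir` are not resolved by this scheme.
  if (pathname !== baseDir && !pathname.startsWith(baseDir + '/')) return []

  // Remove the prefix and any trailing slash: `/docs/configuration`.
  const rest = pathname.slice(baseDir.length).replace(/\/+$/, '')

  // A link to `baseDir` itself can only be a directory index.
  if (rest === '') return ['/index.md', '/readme.md']

  // A link that already names a file is checked as-is.
  if (path.posix.extname(rest)) return [rest]

  return [
    rest + '.md',
    path.posix.join(rest, 'index.md'),
    path.posix.join(rest, 'readme.md')
  ]
}

console.log(
  candidateFiles(
    '/remarkjs/remark-validate-links/docs/configuration',
    '/remarkjs/remark-validate-links'
  )
)
```

The function only builds the candidate list; checking whether any of those files exists on disk would be a separate step.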

2. Add check for non-blob paths

In your first example, there was a GitHub-specific blob path (/remarkjs/remark-validate-links/blob/main/readme.md).

If it's a link like your example, pointing at the default branch, then it can probably be special-cased to look for /readme.md (remove the GitHub-specific /\/blob\/[^/]+\//, after removing the baseDir).

However, if it is not the default branch, this is more complicated, because it takes into account branches and even commits. For this, it probably makes sense to try to resolve this as an external link (or just not support these links with proprietary branch / commit strings in them).

For example, this configuration...

module.exports = {
  plugins: [
    [
      "remark-validate-links",
      {
        baseDir: '/remarkjs/remark-validate-links',
      },
    ],
  ],
}

...along with this link in Markdown...

[link](/remarkjs/remark-validate-links/blob/non-default-branch/readme.md)

...would check the following external links:

https://github.com/remarkjs/remark-validate-links/blob/non-default-branch/readme.md
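
The two cases above could be sketched roughly as follows. Everything here is an assumption for illustration: `classifyLink`, `defaultBranch`, and `hostname` are hypothetical names, and the pattern only recognizes branch names without slashes.

```javascript
// Hypothetical sketch; `classifyLink`, `defaultBranch`, and `hostname` are
// assumed names, not options that remark-validate-links provides.
const blobPrefix = /^\/blob\/([^/]+)\//

function classifyLink(link, options) {
  const {baseDir, defaultBranch, hostname} = options

  // Only links under `baseDir` are handled here.
  if (!link.startsWith(baseDir + '/')) return {type: 'unknown'}

  const rest = link.slice(baseDir.length)
  const match = blobPrefix.exec(rest)

  // No `/blob/<branch>/` segment: a plain root-relative path.
  if (!match) return {type: 'local', path: rest}

  // Default branch: strip the segment and check the file on disk.
  if (match[1] === defaultBranch) {
    return {type: 'local', path: '/' + rest.slice(match[0].length)}
  }

  // Any other branch or commit: fall back to checking an external URL.
  return {type: 'external', url: 'https://' + hostname + link}
}

const options = {
  baseDir: '/remarkjs/remark-validate-links',
  defaultBranch: 'main',
  hostname: 'github.com'
}

console.log(classifyLink('/remarkjs/remark-validate-links/blob/main/readme.md', options))
console.log(classifyLink('/remarkjs/remark-validate-links/blob/non-default-branch/readme.md', options))
```

Branch names that contain a slash would be misread by this pattern, so a real implementation would need some other way to tell branch from path.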

@karlhorky
Author

Oh, just reading in find.js, I see you're already trying to do some resolution of GitHub/etc root-relative links, with no support for branches, got it :)

// Absolute paths: `/wooorm/test/blob/main/directory/example.md`.
if (value.charAt(0) === slash) {
  if (!config.hostname) {
    return
  }

  // Create a URL.
  value = https + slashes + config.hostname + value
}

try {
  url = new URL(value)
} catch (_) {}

// URLs: `https://github.com/wooorm/test/blob/main/directory/example.md`.
if (url) {
  // Exit if we don’t have hosted Git info or this is not a URL to the repo.
  if (
    !config.prefix ||
    !config.hostname ||
    (url.protocol !== https && url.protocol !== http) ||
    url.hostname !== config.hostname ||
    url.pathname.slice(0, config.prefix.length) !== config.prefix
  ) {
    return
  }

  value = url.pathname.slice(config.prefix.length)

  // Things get interesting here: branches: `foo/bar/baz` could be `baz` on
  // the `foo/bar` branch, or, `baz` in the `bar` directory on the `foo`
  // branch.
  // Currently, we’re ignoring this and just not supporting branches.
  value = value.split(slash).slice(1).join(slash)

@karlhorky
Author

So I suppose the changes proposed by this issue would have to be somewhere in here:

// Absolute paths: `/wooorm/test/blob/main/directory/example.md`.
if (value.charAt(0) === slash) {
  if (!config.hostname) {
    return
  }

  // Create a URL.
  value = https + slashes + config.hostname + value
}

try {
  url = new URL(value)
} catch (_) {}

// URLs: `https://github.com/wooorm/test/blob/main/directory/example.md`.
if (url) {
  // Exit if we don’t have hosted Git info or this is not a URL to the repo.
  if (
    !config.prefix ||
    !config.hostname ||
    (url.protocol !== https && url.protocol !== http) ||
    url.hostname !== config.hostname ||
    url.pathname.slice(0, config.prefix.length) !== config.prefix
  ) {
    return
  }

Specifically, to add a separate branch that would resolve root-relative links to file paths with baseDir or similar config setting.

@wooorm
Member

wooorm commented Aug 25, 2020

Say I have a package.json like this:

{
  "name": "example",
  "scripts": {
    "test": "remark ."
  },
  "remarkConfig": {
    "plugins": [
      [
        "validate-links",
        {
          "repo": "x/y"
        }
      ]
    ]
  },
  "devDependencies": {
    "remark-cli": "^8.0.1",
    "remark-validate-links": "^10.0.2"
  }
}

A readme.md file:

[a](/x/y/blob/main/docs/non-existent/)
[b](/x/y/blob/main/docs/exists/#readme)
[c](/x/y/blob/main/docs/exists/#non-existent2)

And a docs/exists/readme.md file:

Hi!

Now, running npm test yields:

$ nr test

> example@ test /Users/tilde/Projects/oss/example
> remark .

docs/exists/readme.md: no issues found
readme.md
  1:1-1:39  warning  Link to unknown file: `docs/non-existent`                  missing-file             remark-validate-links
  3:1-3:47  warning  Link to unknown heading in `docs/exists`: `non-existent2`  missing-heading-in-file  remark-validate-links

⚠ 2 warnings

As you can see, the links are checked. remark-validate-links correctly sees that non-existent does not exist, that docs/exists is a folder with a readme.md, but that it does not have a #non-existent2 header.

This is to say, we do support full URLs with a hostname (https://github.com/x/y/blob/main/docs/exists), absolute paths (/x/y/blob/main/docs/exists), relative paths, and whatnot too.
What this project does is resolve all URLs as if the markdown were rendered on GitHub, GitLab, or Bitbucket, and check whether they would work there.
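
To make the existing behavior concrete, here is a simplified, self-contained sketch modeled on the `find.js` excerpt quoted earlier in this thread. The function name `toRepoPath` and the `config` values (`hostname: 'github.com'`, `prefix: '/x/y/blob/'`) are assumptions chosen to match the `x/y` example; the real plugin derives its configuration from the repository and does more than this.

```javascript
// Simplified sketch based on the quoted `find.js` excerpt.
// `toRepoPath` and the `config` values below are illustrative assumptions.
function toRepoPath(value, config) {
  // Absolute paths are first turned into full URLs on the configured host.
  if (value.charAt(0) === '/') {
    value = 'https://' + config.hostname + value
  }

  let url

  try {
    url = new URL(value)
  } catch {
    // Not an absolute URL (for example a relative path): not handled here.
    return undefined
  }

  // Ignore anything that is not an http(s) URL into this repository.
  if (
    (url.protocol !== 'https:' && url.protocol !== 'http:') ||
    url.hostname !== config.hostname ||
    !url.pathname.startsWith(config.prefix)
  ) {
    return undefined
  }

  // `main/docs/exists/`: the first segment is assumed to be the branch.
  const rest = url.pathname.slice(config.prefix.length)

  // Drop the branch segment, leaving `docs/exists/`.
  return rest.split('/').slice(1).join('/')
}

const config = {hostname: 'github.com', prefix: '/x/y/blob/'}

console.log(toRepoPath('/x/y/blob/main/docs/exists/#readme', config))
console.log(toRepoPath('https://github.com/x/y/blob/main/docs/exists', config))
```

Because only the first path segment is dropped, a branch name containing a slash would be split incorrectly, which is the limitation the quoted comment in `find.js` describes.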


What you mention here:

> ...along with this link in Markdown...
>
> [link](/remarkjs/remark-validate-links/docs/configuration)
>
> ...would search for the following files in the repo to see if it will be a broken link:
>
> /docs/configuration.md
> /docs/configuration/index.md
>
> This would allow for a lot of cases for Markdown projects, both:

That sort of works as shown above, but it does that with readme.md files. (And, blob/branch is needed).


So, I’d say the question of supporting root-relative links (absolute links) is already solved, just in a different way than you expected. Thus, to come back to the original problem:

> On large sites with a lot of moving files or large directory structures, updating relative links can be a pain.
>
> Root relative links can also avoid any ambiguity that can be caused by using the same path ending and file name between multiple files.

Can you tell me more about this?

@karlhorky
Author

karlhorky commented Aug 25, 2020

Sure. Here's an example:

  1. A user has a website or blog built with MD/MDX files in a src/pages directory (eg. on Gatsby, Next.js, 11ty, etc)
  2. The MD/MDX files link amongst themselves. These links will become links on the built website:
    To read more, check out [the docs](/lib-2/docs) about `lib-2`!
    This link above would link to the built result of the file /src/pages/lib-2/docs.md. The configuration here would probably look something like: baseDir: '/src/pages'
  3. Optional (to illustrate ambiguity): This website or blog has a structure that is repeated across multiple pages (eg. a docs.md file or docs folder inside many different folders). It's clearer and less ambiguous to use root-relative links here such as /lib-2/docs.
  4. Optional (to illustrate files / directories that move): This is a fast-moving project, so the structure of the site ends up changing every 6 months to 2 years.
  5. Users want to be able to be certain that their links work and continue to work

With any sort of renaming or structure changes, it can lead to a lot of error-prone manual work, if there is no automation such as a link checker involved (and users want the guarantees mentioned in point 5).

Does this clarify anything?

@karlhorky
Author

karlhorky commented Nov 1, 2023

I ended up building this CLI tool in the meantime:

https://github.com/upleveled/mdx-local-link-checker

This supports links to local MDX files, along with the following features:

  1. checking a specific folder at a path
  2. ignoring anything matching a provided glob pattern
  3. checking a specific folder with a different root-relative path

# Check the current directory with no ignore patterns
mdx-local-link-checker

# Check the src/pages folder, ignoring anything in a
# folder called "books" (at any depth)
mdx-local-link-checker src/pages src/pages "/books/**"

# Check the src/pages folder, ignoring anything in a
# folder called "books" or "slide-decks" (at any depth)
mdx-local-link-checker src/pages src/pages "/(books|slide-decks)/**"

# Check only the docs folder with the src/pages
# folder set to be the base path (for root-relative
# links such as "/docs/router")
mdx-local-link-checker src/pages/docs src/pages

@karlhorky
Author

karlhorky commented Nov 1, 2023

So in case urlConfig or some other configuration setting would be changed in remark-validate-links (unlikely, given that this project is GitHub / GitLab / Bitbucket focused), maybe it could work like this:

Config:

baseDir: '/src/pages',
extension: 'mdx',

<repository root>/src/pages/lib-1/index.mdx

To read more, check out [the docs](/lib-2/docs) about `lib-2`!

The link above would be resolved to the file <repository root>/src/pages/lib-2/docs.mdx

(maybe the extension config is superfluous and could just be replaced by a number of supported extensions which would be checked eg. .md, .mdx, maybe also .html)
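
A rough sketch of how that resolution might look, using a list of supported extensions in place of a single `extension` setting. The function `resolveRootRelative` and the option names `root`, `baseDir`, and `extensions` are hypothetical; this is not an existing remark-validate-links API.

```javascript
// Hypothetical sketch: `resolveRootRelative`, `root`, `baseDir`, and
// `extensions` are assumed names for illustration only.
const fs = require('node:fs')
const path = require('node:path')

function resolveRootRelative(link, options) {
  const {root, baseDir, extensions} = options

  // Drop `?query` / `#fragment` and any trailing slash: `/lib-2/docs`.
  const pathname = link.split(/[?#]/)[0].replace(/\/+$/, '')

  // `<root>/src/pages/lib-2/docs`, still without an extension.
  const base = path.join(root, baseDir, pathname)

  // Try `docs.<ext>` and `docs/index.<ext>` for every supported extension.
  const candidates = []
  for (const extension of extensions) {
    candidates.push(
      base + '.' + extension,
      path.join(base, 'index.' + extension)
    )
  }

  // The first candidate that exists on disk, or `undefined` for a broken link.
  return candidates.find((file) => fs.existsSync(file))
}

const resolved = resolveRootRelative('/lib-2/docs', {
  root: process.cwd(),
  baseDir: '/src/pages',
  extensions: ['md', 'mdx']
})

console.log(resolved === undefined ? 'broken link' : resolved)
```

Checking headings inside the resolved file (for links such as `/lib-2/docs#usage`) would need an additional pass over the parsed document and is left out here.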

@LuchoTurtle

Having this feature would be extremely useful, especially in monorepos where some markdowns reference other projects. Any news on this? 👀

@wooorm
Member

wooorm commented Oct 31, 2024

Absolute paths are now supported like GitHub again, per #75.
For arbitrary websites (that is, not markdown files on GitHub/GitLab/Bitbucket), use remark-lint-no-dead-urls.

@wooorm wooorm added 💪 phase/solved Post is done and removed 🦋 type/enhancement This is great to have 🙉 open/needs-info This needs some more info labels Oct 31, 2024
@wooorm wooorm closed this as completed Oct 31, 2024