Skip to content
This repository has been archived by the owner on May 3, 2023. It is now read-only.

Invalidates regular expressions when parsing the file #2

Closed
jonaslagoni opened this issue Nov 13, 2021 · 6 comments
Closed

Invalidates regular expressions when parsing the file #2

jonaslagoni opened this issue Nov 13, 2021 · 6 comments

Comments

@jonaslagoni
Copy link

jonaslagoni commented Nov 13, 2021

While the file does not contain any $ref it fails to get the schema regardless, so it is the minimal reproducible example:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "patternProperties": {
    "^x-[\\w\\d\\.\\-\\_]+$": true
  }
}

Code that is being run:

const Bundler = require("@hyperjump/json-schema-bundler");


(async () => {
  try {
    // Get the initial schema to pass to the bundler
    const main = await Bundler.get(`file://${__dirname}/../asyncapi.json`);

    // The bundler will fetch from the file system, network, or internal schemas as
    // needed to build to bundle.
    const bundle = await Bundler.bundle(main);

    console.log(bundle);
  } catch (error) {
    console.log(error);
  }
})();

The error being thrown:

SyntaxError: Invalid regular expression: /^x-[\w\d\.\-\_]+$/: Invalid escape
    at new RegExp (<anonymous>)
    at /x/bundler/node_modules/@hyperjump/json-schema/lib/keywords/patternProperties.js:7:50
    at Array.map (<anonymous>)
    at /xbundler/node_modules/@hyperjump/pact/lib/map.js:4:55
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
@jdesrosiers
Copy link
Contributor

This is a bug in the schema, not in this library. As the error is telling you, the regular expression is invalid. Apparently node.js doesn't like it when you try to escape a character that shouldn't be escaped. In this case, that character appears to be _. Browsers appear to be more tolerant of this error.

@jonaslagoni
Copy link
Author

Ha! I just assumed the regular expression was accurate, sorry! And actually, it is all the escaped characters that are troublesome 🧐 Weird none of the online validation tools does not complain about these...

On a side note, why does a bundler need to parse regular expressions? 🤔

@jdesrosiers
Copy link
Contributor

Weird none of the online validation tools does not complain about these...

This won't have anything to do with the implementation because implementations don't implement their own regex engine. They use the one built into the platform they are running on. That's why all implementations will throw an error with this schema when running on node.js but not when running in a browser. It's because of the difference between the RegExp implementation used in browsers vs the one used in node.js. It has nothing to do with the JSON Schema implementation itself.

why does a bundler need to parse regular expressions?

It doesn't, but that functionality came by default. Hyperjump JSC is built specifically to facilitate building tools like this. It compiles the schema into an AST that I can then use to build whatever tooling I want (validator, bundler, code gen, etc). By using that library to generate an AST, I was able to skip the hardest part of this implementation. Which is why I was able to put this together in an afternoon. But, that's also why there are some things in the AST building process that have nothing to do with bundlers. Those things are there for the benefit of other tools I've built or might build using the same AST (including a validator).

@jonaslagoni
Copy link
Author

jonaslagoni commented Nov 15, 2021

Hyperjump JSC is built specifically to facilitate building tools like this. It compiles the schema into an AST that I can then use to build whatever tooling I want (validator, bundler, code gen, etc).

Makes sense! Thanks for the clarification 🙂

Regarding the regex, I still can't wrap my head around why this ain't working, and what we can do to fix it, or where the problem lies 😆 This is the regex we try to build: https://regexr.com/69hcv, however when described in a JSON file we need to escape the \ chars.

If I run a small node example:

// First test
const firstReg = new RegExp('^x-[\\w\\d\\.\\-\\_]+$', 'g');
const firstRes = firstReg.test('x-something');
console.log(firstRes);

// Second test
const secondReg = new RegExp('^x-[\w\d\.\-\_]+$', 'g');
const secondRes = secondReg.test('x-something');
console.log(secondRes);

It produces:

true 
false

The second regex is what the bundler tries to pass to RegExp, so if we change the regex in the JSON Schema to ^x-[\\\\w\\\\d\\\\.\\\\-\\\\_]+$ (to try and get the second option) it no longer matches accurately in browsers, however, works in your bundler, at least it is accepted in your implementation 🧐

So I am left puzzled 😆

@jdesrosiers
Copy link
Contributor

I just figured out that browsers aren't working differently than node. The difference is the use of the u flag (for unicode support) in the regular expression (which is a SHOULD requirement in JSON Schema). When you use the u flag you get the error. When you don't it's more tolerant for some reason.

Your failing test case should be, new RegExp("^x-[\\w\\d\\.\\-\\_]+$", "u"). This regex has two unnecessary escapings: . and _, but _ appears to be the one that it's complaining about. This works, new RegExp("^x-[\\w\\d\\.\\-_]+$", "u"), but this is better new RegExp("^x-[\\w\\d.\\-_]+$", "u"). You could also avoid escaping the - by putting it at the end so it isn't ambiguous whether it's the character - or a range specifier, new RegExp("^x-[\\w\\d._-]+$", "u").

@jonaslagoni
Copy link
Author

Thanks, @jdesrosiers! That should clear it up 🙇

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants