Skip to content

Latest commit

 

History

History
414 lines (336 loc) · 11.2 KB

Readme.md

File metadata and controls

414 lines (336 loc) · 11.2 KB

js-regex

What is it?

js-regex is a fluent regex builder for JavaScript. Its aim is to make the writing and maintenance of complicated regexes less taxing and error-prone.

Features

js-regex has a mix of features that make it especially appealing, when compared to writing raw regexs or using other builder libraries, for building complicated regexes:

  • Macros
    • Macros are basically named sequences
    • That can be registered for a particular builder instance, or across all js-regex objects
    • That are added onto the current regex as a single term by using .macro(registeredName)
  • Named Capture Groups
    • When using exec and similar functions, you don't get an array of the matches
    • Instead, you get an object with the match property (representing the entire match of the regex)
    • Along with a property for each named property group you gave to .capture(...)
  • Minimal Generated Expressions
    • Some regex builder libraries have a habit of wrapping almost everything you add in a non-capture group ((?:<stuff here>))
    • The above works, and is easy to make correct
    • But js-regex has the goal of not doing so whenever actually possible
  • Named Backreferences
    • Ignore the pumping lemma with your non-regular language expressions
    • Backreferences, in brief, allow you to refer to a previous captured group, and say that that text has to repeat itself exactly

Why?

Let's suppose that you've been asked to figure out why the following regex isn't working:

(SH|RE|MF)-((?:197[1-9]|19[89]\d|[2-9]\d{3})-(?:0[1-9]|1[012])-(?:0[1-9]|[12]\d|3[01]))-((?!0{5})\d{5})

If you're experienced with regexes, it's certainly possible to gain an understanding of it, but it takes longer than it should.

This is one example regex that has been built with this library; see below to see this example translated into a js-regex equivalent, or simply read on to go through most of the API before jumping into the complex examples.

Tests

In addition to the usage documented below, with a matching test suite here, there's a fair number of other test cases here.

At the time of writing, js-regex has more test code than executable code, and this is likely to remain the case:

wc -l regex.js
851 regex.js

wc -l test/cases/*
  160 test/cases/alt_syntax.js
   30 test/cases/any.js
  163 test/cases/capture.js
   30 test/cases/flags.js
   18 test/cases/literals.js
   68 test/cases/macros.js
  180 test/cases/or.js
  427 test/cases/readme_cases.js
  299 test/cases/repeat.js
   46 test/cases/sequence.js
   20 test/cases/states.js
    9 test/cases/test.js
 1450 total

Usage

Simple usage with peek()

regex()
    .literals('abc')
    .peek();        // Will return 'abc'

Never stop chaining!

regex()
    .literals('abc')
    .call(function (curNode) {
        console.log(this === curNode); // Will print true
        console.log(curNode.peek());   // Will print 'abc'
    })
    .literals('def')
    .call(function (curNode) {
        console.log(curNode.peek());   // Will print 'abcdef'
    });

Special Flags

regex()
    .f.digit()
    .f.whitespace()
    .peek();       // Will return '\d\s'

Capture Groups

regex()
    .literals('aaa')
      .capture()
    .peek();        // Will return '(aaa)'

Repeating

regex()
    .literals('aaa')
      .repeat()
    .peek();        // Will return '(?:aaa)*'

regex()
    .literals('aaa')
    .call(function (curNode) {
        console.log(curNode.peek()); // Will print 'aaa'
    })
      .repeat(1, 3)
    .peek();                         // Will return '(?:aaa){1,3}'

Simple Grouping

regex()
    .sequence()
        .literals('aaa')
        .f.digit()
        .literals('bbb')
    .endSequence()
      .repeat()
    .peek();            // Will return '(?:aaa\dbbb)*'

regex().sequence('aaa', regex.flags.digit(), 'bbb')
    .repeat()
    .peek();            // Will return '(?:aaa\dbbb)*'

Character Sets

regex()
    .any('abcdefg')
    .peek();       // Will return '[abcdefg]'

regex()
    .any()
        .literals('abc')
        .f.digit()
    .endAny()
    .peek();            // Will return '[abc\d]'

regex()
    .none()
        .literals('abc')
        .f.whitespace()
    .endNone()
    .peek();            // Will return '[^abc\s]'

Or

regex()
    .either()
        .literals('abc')
        .literals('def')
    .endEither()
    .peek();             // Will return 'abc|def'

regex()
    .either('abc', regex.any('def'))
    .peek();             // Will return 'abc|[def]'

Macros

regex.create(); // Alternate form of regex()

regex
    .addMacro('any-quote') // Adding a global macro for single or double quote
        .any('\'"')
    .endMacro()
    .create()
        .macro('any-quote')
        .f.dot()
          .repeat()
        .macro('any-quote')
        .peek();           // Will return '['"].*['"]'

regex
    .addMacro('quote')
        .any('\'"')
    .endMacro()
    .create()
        .addMacro('quote') // Local macros override global ones
            .literal('"')  //  Here, restricting to double quote only
        .endMacro()
        .macro('quote')
        .f.dot()
          .repeat()
        .macro('quote')
        .peek();           // Will return '".*"'

Followed By

regex()
    .literals('aaa')
      .followedBy('bbb')
    .peek();            // Will return 'aaa(?=bbb)'

regex()
    .literals('ccc')
      .notFollowedBy('ddd')
    .peek();               // Will return 'ccc(?!ddd)

Named Capture Groups and Exec

regex()
    .flags.anything()
      .repeat()
      .capture('preamble')
    .either('cool!', 'awesome!')
      .capture('exclamation')
    .call(function (rb) {
        // Would print '(.*)(cool!|awesome!)'
        console.log(rb.peek());

        // Would print 'this is '
        console.log(rb.exec('this is cool!  isn\'t it?').preamble);
        // Would print 'cool!'
        console.log(rb.exec('this is cool!  isn\'t it?').exclamation);

        // Would print 'this is also '
        console.log(rb.exec('this is also awesome!').preamble);
        // Would print 'awesome!'
        console.log(rb.exec('this is also awesome!').exclamation);
    });

Named Backreferences

You know how JS regular expressions are more powerful than regular languages? You can reference previous capture terms. js-regex supports this:

regex()
    .flags.anything()
      .repeat(1)
      .capture('anything')
    .literal('-')
    .reference('anything')
    .call(function (rb) {
        // Would print '(.+)-\1'
        console.log(rb.peek());

        // Would print 'whatever'
        console.log(rb.exec('whatever-whatever').anything);

        // Would print false
        console.log(rb.test('whatever-whatev'));
    });

Complicated Regexes

Example 1

How quickly can you figure out what this is supposed to represent?

regex()
    .addMacro('0-255')
        .either()
            .sequence()
                .literals('25')
                .anyFrom('0', '5')
            .endSequence()
            .sequence()
                .literal('2')
                .anyFrom('0', '4')
                .anyFrom('0', '9')
            .endSequence()
            .sequence()
                .any('01').optional()
                .anyFrom('0', '9')
                .anyFrom('0', '9').optional()
            .endSequence()
        .endEither()
    .endMacro()
    .macro('0-255').capture()
    .literal('.')
    .macro('0-255').capture()
    .literal('.')
    .macro('0-255').capture()
    .literal('.')
    .macro('0-255').capture()
    .peek();

(Hint: it's described here, in the fourth section on the page.)

(Also note: this example uses the 'verbose' usage form, always closing portions with endXXX(); the Readme tests cover the same using an alternate form)

Business Logic Regex

So our 'business logic' regex looks like this:

(SH|RE|MF)-((?:197[1-9]|19[89]\d|[2-9]\d{3})-(?:0[1-9]|1[012])-(?:0[1-9]|[12]\d|3[01]))-((?!0{5})\d{5})

Written in human terms, that would be: one of three department codes, a dash, a YYYY-MM-DD date (after Jan 1, 1971), a dash, then a non 00000 5 digit number.

In converting this regex to use js-regex, we make use of macros to define the department code, the date, and the trailing number. Note that most of this example is spent setting up the date regex - if your situation called for many dates being used in the application, the cost of setting up this most complicated portion of the regex would only need to be done once, after which it would be usable in other circumstances with no code changes, and far greater readability.

Anyway, let's take a look:

regex
    // Setting up our macros...
    .addMacro('dept-prefix', regex.either('SH', 'RE', 'MF'))
    .addMacro('date',
        regex.either(
            regex.sequence(
                '197',
                regex.anyFrom('1', '9')),
            regex.sequence(
                '19',
                regex.any('89'),
                regex.flags.digit()),
            regex.sequence(
                regex.anyFrom('2', '9'),
                regex.flags.digit().repeat(3, 3))),
        '-',
        regex.either(
            regex.sequence(
                '0',
                regex.anyFrom('1', '9')),
            regex.sequence(
                '1',
                regex.any('012'))),
        '-',
        regex.either(
            regex.sequence(
                '0',
                regex.anyFrom('1', '9')),
            regex.sequence(
                regex.any('12'),
                regex.flags.digit()),
            regex.sequence(
                '3',
                regex.any('01'))))
    .addMacro('issuenum',
        regex.notFollowedBy()
            .literal('0')
                .repeat(5, 5),
        regex.flags.digit()
            .repeat(5, 5))
    // Macros are setup, let's create our actual regex now:
    .create()
        .macro('dept-prefix').capture()
        .literal('-')
        .macro('date').capture()
        .literal('-')
        .macro('issuenum').capture()
        .peek(); // Returns the string shown above this code example

Conclusion

Perhaps this library piques your interest. If so, cool! Let me know! Just make sure that nothing on the issues page scares you before jumping in and actually using it.

Really, Really Experimental Methods

Simple Testing

test() is still kinda pointless.

regex()
    .literal('a')
    .test('a');   // Will return true

Simple Replacing

Needs more tests.

regex()
    .literals('abc')
    .replace('abc', function () {
        return 'def';
    });              // Will return 'def'