Re: RegExp.escape()


Re: RegExp.escape()

Benjamin Gruenbaum
Ok, with a ton of help from Domenic I've put up http://benjamingr.github.io/RexExp.escape/

Less cool coloring but more links and motivating examples and so on at https://github.com/benjamingr/RexExp.escape

As this is my first attempt at this sort of thing - any non-bikeshed feedback would be appreciated :)

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: RegExp.escape()

Mark S. Miller-2
Nice! Inspired

  // Based on
  function re(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''));
  }
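
For readers following along, here is a minimal usage sketch of the tag above. The function body is copied from the message; `userInput` and the sample strings are hypothetical, chosen only to show that substituted data ends up being matched literally.

```javascript
// The tag from the message above, repeated so this snippet runs on its own.
function re(template, ...subs) {
  const parts = [];
  const numSubs = subs.length;
  for (let i = 0; i < numSubs; i++) {
    parts.push(template.raw[i]);
    // Backslash-escape every character that has special meaning in a pattern.
    parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
  }
  parts.push(template.raw[numSubs]);
  return RegExp(parts.join(''));
}

const userInput = 'a.b*';            // hypothetical untrusted data
const pattern = re`^${userInput}$`;

console.log(pattern.source);         // ^a\.b\*$
console.log(pattern.test('a.b*'));   // true
console.log(pattern.test('aXbbb'));  // false
```

The literal parts of the template (`^` and `$` here) keep their regexp meaning; only the substituted value is escaped.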



On Fri, Jun 12, 2015 at 5:48 PM, Benjamin Gruenbaum <[hidden email]> wrote:
Ok, with a ton of help from Domenic I've put up http://benjamingr.github.io/RexExp.escape/

Less cool coloring but more links and motivating examples and so on at https://github.com/benjamingr/RexExp.escape

As this is my first attempt at this sort of thing - any non-bikeshed feedback would be appreciated :)





--
    Cheers,
    --MarkM


Re: RegExp.escape()

C. Scott Ananian
On Sat, Jun 13, 2015 at 1:51 AM, Mark S. Miller <[hidden email]> wrote:
Nice! Inspired

  // Based on
  function re(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''));
  }

A slight tweak allows you to pass flags:
```
function re(flags, ...args) {
  if (typeof template !== 'string') {
     // no flags given
     return re(undefined)(flags, ...args);
  }
  return function(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  };
}
```

Use like this:
```
var r = re('i')`cAsEiNsEnSiTiVe`;
```
  --scott


Re: RegExp.escape()

Mark S. Miller-2
Good idea, but infinite recursion bug. Fixed:

  function re(first, ...args) {
    let flags = first;
    function tag(template, ...subs) {
      const parts = [];
      const numSubs = subs.length;
      for (let i = 0; i < numSubs; i++) {
        parts.push(template.raw[i]);
        parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
      }
      parts.push(template.raw[numSubs]);
      return RegExp(parts.join(''), flags);
    }
    if (typeof first === 'string') {
      return tag;
    } else {
      flags = void 0;  // Should this be '' ?
      return tag(first, ...args);
    }
  }
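
A short sketch of how the fixed version can be called both ways. The function is copied from the message; the sample strings are hypothetical. On the `// Should this be '' ?` question: the `RegExp` constructor treats `undefined` flags as the empty string, so either choice gives the same result here.

```javascript
// The fixed tag from the message above, repeated so this snippet runs on its own.
function re(first, ...args) {
  let flags = first;
  function tag(template, ...subs) {
    const parts = [];
    const numSubs = subs.length;
    for (let i = 0; i < numSubs; i++) {
      parts.push(template.raw[i]);
      parts.push(subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'));
    }
    parts.push(template.raw[numSubs]);
    return RegExp(parts.join(''), flags);
  }
  if (typeof first === 'string') {
    return tag;               // called as re('i')`...`
  } else {
    flags = void 0;           // called directly as re`...`
    return tag(first, ...args);
  }
}

const withFlags = re('i')`hello ${'w.rld'}`;
const noFlags = re`hello ${'w.rld'}`;

console.log(withFlags.flags);                // i
console.log(withFlags.test('HELLO W.RLD'));  // true
console.log(withFlags.test('hello world'));  // false
console.log(noFlags.flags === '');           // true
```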



Re: RegExp.escape()

Mark S. Miller-2
Perfection?

  function re(first, ...args) {
    let flags = first;
    function tag(template, ...subs) {
      const parts = [];
      const numSubs = subs.length;
      for (let i = 0; i < numSubs; i++) {
        parts.push(template.raw[i]);
        const subst = subs[i] instanceof RegExp ? subs[i].source :
            subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
        parts.push(subst);
      }
      parts.push(template.raw[numSubs]);
      return RegExp(parts.join(''), flags);
    }
    if (typeof first === 'string') {
      return tag;
    } else {
      flags = void 0;  // Should this be '' ?
      return tag(first, ...args);
    }
  }



Re: RegExp.escape()

Mark S. Miller-2
The point of this last variant is that data gets escaped but RegExp objects do not -- allowing you to compose RegExps:   re`${re1}|${re2}*|${data}`
But this requires one more adjustment:

 
  function re(first, ...args) {
    let flags = first;
    function tag(template, ...subs) {
      const parts = [];
      const numSubs = subs.length;
      for (let i = 0; i < numSubs; i++) {
        parts.push(template.raw[i]);
        const subst = subs[i] instanceof RegExp ?
            `(?:${subs[i].source})` :
            subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
        parts.push(subst);
      }
      parts.push(template.raw[numSubs]);
      return RegExp(parts.join(''), flags);
    }
    if (typeof first === 'string') {
      return tag;
    } else {
      flags = void 0;  // Should this be '' ?
      return tag(first, ...args);
    }
  }
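
To see why the `(?:...)` wrapper matters when a RegExp's source is spliced into a larger pattern, here is a small sketch using the plain `RegExp` constructor. The names `alt`, `unwrapped` and `wrapped` and the test strings are hypothetical.

```javascript
const alt = /a|b/;

// Splicing the raw source lets `|` split the *whole* pattern in two.
const unwrapped = new RegExp('^x' + alt.source + 'y$');       // ^xa|by$
// Wrapping in a non-capturing group keeps the alternation local.
const wrapped = new RegExp('^x(?:' + alt.source + ')y$');     // ^x(?:a|b)y$

console.log(unwrapped.test('xazzz'));  // true  (matches the "^xa" alternative)
console.log(wrapped.test('xazzz'));    // false
console.log(wrapped.test('xby'));      // true
```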



RE: RegExp.escape()

Domenic Denicola

All of these should be building on top of RegExp.escape :P

 

From: es-discuss [mailto:[hidden email]] On Behalf Of Mark S. Miller
Sent: Saturday, June 13, 2015 02:39
To: C. Scott Ananian
Cc: Benjamin Gruenbaum; es-discuss
Subject: Re: RegExp.escape()

 

The point of this last variant is that data gets escaped but RegExp objects do not -- allowing you to compose RegExps:   re`${re1}|${re2}*|${data}`
But this requires one more adjustment:

 
>   function re(first, ...args) {
>     let flags = first;
>     function tag(template, ...subs) {
>       const parts = [];
>       const numSubs = subs.length;
>       for (let i = 0; i < numSubs; i++) {
>         parts.push(template.raw[i]);
>         const subst = subs[i] instanceof RegExp ?
>             `(?:${subs[i].source})` :
>             subs[i].replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
>         parts.push(subst);
>       }
>       parts.push(template.raw[numSubs]);
>       return RegExp(parts.join(''), flags);
>     }
>     if (typeof first === 'string') {
>       return tag;
>     } else {
>       flags = void 0;  // Should this be '' ?
>       return tag(first, ...args);
>     }
>   }



Re: RegExp.escape()

Mark S. Miller-2
On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <[hidden email]> wrote:

All of these should be building on top of RegExp.escape :P


I am not yet agreeing or disagreeing with this. Were both to become std, clearly they should be consistent with each other. At the time I wrote this, it had not occurred to me that the tag itself might be stdized at the same time as RegExp.escape. Now that this possibility has been proposed, I am realizing lots of flaws with my polyfill. It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.

* The big one is that the literal template parts that are taken to represent the regexp pattern fragments being expressed should be syntactically valid *fragments*, in the sense that it makes semantic sense to inject data between these fragments. Escaping the data + validating the overall result does not do this. For example:

    const data = ':x';
    const rebad = RegExp.tag`(?${data})`;
    console.log(rebad.test('x')); // true

is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can consistently reflect these platform extensions.

* Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.

* ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegate to RegExpSubclass.tag seems weird.

* The instanceof check in the polyfill prevents it from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?
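
The first bullet can be reproduced without any tag at all. The sketch below uses a hypothetical helper, `escapeForRegExp`, built from the same character class as the polyfill; it is not the proposed `RegExp.escape`.

```javascript
// Hypothetical helper using the polyfill's character class.
const escapeForRegExp = (s) => s.replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');

const data = ':x';
// ':' is not a syntax character, so the data passes through unchanged...
const source = '(?' + escapeForRegExp(data) + ')';
console.log(source);                        // (?:x)
// ...and fuses with the literal "(?" into a non-capturing group.
console.log(new RegExp(source).test('x'));  // true

// The literal fragment on its own is not a valid pattern.
let fragmentError = null;
try {
  new RegExp('(?');
} catch (e) {
  fragmentError = e;
}
console.log(fragmentError instanceof SyntaxError);  // true
```

Validating only the final joined string accepts this pattern, which is why the message argues that each literal fragment needs to be checked on its own.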


 

 

--
    Cheers,
    --MarkM


Re: RegExp.escape()

Jordan Harband
Would it help subclassing to have the list of syntax characters/code points be on a well-known-symbol property? Like `RegExp.prototype[@@syntaxCharacters] = Object.freeze('^$\\.*+?()[]{}|'.split(''));` or something? Then @exec could reference that, and similarly `RegExp.escape` and `RegExpSubclass.escape` could reference it as well?
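
A rough sketch of how that idea might look, under loud assumptions: no `@@syntaxCharacters` well-known symbol exists in the language, so an ordinary local `Symbol` stands in for it, and `SketchRegExp` and `DashRegExp` are hypothetical classes invented for illustration.

```javascript
// Stand-in for the suggested well-known symbol (hypothetical, not part of the language).
const syntaxCharacters = Symbol('syntaxCharacters');

class SketchRegExp extends RegExp {
  // Class-side escape that consults the character list on the receiver's prototype.
  static escape(text) {
    const chars = new Set(this.prototype[syntaxCharacters]);
    return Array.from(String(text), (ch) => (chars.has(ch) ? '\\' + ch : ch)).join('');
  }
}
SketchRegExp.prototype[syntaxCharacters] =
    Object.freeze('^$\\.*+?()[]{}|'.split(''));

// A subclass with an extended grammar only has to override the list.
class DashRegExp extends SketchRegExp {}
DashRegExp.prototype[syntaxCharacters] =
    Object.freeze([...SketchRegExp.prototype[syntaxCharacters], '-']);

console.log(SketchRegExp.escape('a.b-c'));  // a\.b-c
console.log(DashRegExp.escape('a.b-c'));    // a\.b\-c
```

Because `escape` is inherited on the class side, `this` is the subclass when it is called as `DashRegExp.escape(...)`, so the subclass's own list is picked up.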


Re: RegExp.escape()

Benjamin Gruenbaum
In reply to this post by Mark S. Miller-2
On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <[hidden email]> wrote:
> On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <[hidden email]> wrote:
>> All of these should be building on top of RegExp.escape :P
>
> It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.

That was a big part of making a proposal out of it - to find these things :)

> the overall result does not do this. For example:
>
>     const data = ':x';
>     const rebad = RegExp.tag`(?${data})`;
>     console.log(rebad.test('x')); // true
>
> is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can consistently reflect these platform extensions.

This is a good point. I considered whether or not `-` should be included for a similar reason. I think it is reasonable to only include syntax identifiers and expect users to deal with parts of patterns of more than one character themselves (by wrapping the string with `()` in the constructor). This is what practically every other language does.

That said - I'm very open to allowing implementations to escape _more_ than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?

I'm also open to `.tag` wrapping with `()` to avoid these issues but I'm not sure if we have a way in JavaScript to not make a capturing group out of it.

> * Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.

I think that doing this should be an eventual target but I don't think adding a single much-asked-for static function to the RegExp function would be a good place to start. I think the committee first needs to agree about how this form of modularisation should be done - there are much bigger targets first and I would not like to see this proposal tied to and held back by that (useful) goal.

> * ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegate to RegExpSubclass.tag seems weird.

Right, but it makes sense that `escape` does not play in this game since it is a static method that takes a string argument - I'm not sure how it could use @exec.

> * The instanceof below prevents this polyfill from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?

This is hacky, but in my code I just did `argument.exec ? treatAsRegExp : treatAsString`.
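
A minimal sketch of that duck-typing check, for illustration only. The helper name `toPatternSource` is hypothetical, and the check is slightly stricter than a bare truthiness test on `argument.exec`; it is not code from the proposal.

```javascript
// Hypothetical helper: decide how a substitution is spliced into a pattern.
function toPatternSource(argument) {
  // Duck typing: anything with an `exec` method is treated as a regexp.
  // Unlike `instanceof RegExp`, this also accepts regexps from another frame.
  if (argument != null && typeof argument.exec === 'function') {
    return '(?:' + argument.source + ')';
  }
  // Everything else is treated as data and escaped.
  return String(argument).replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

console.log(toPatternSource(/a|b/));   // (?:a|b)
console.log(toPatternSource('a|b'));   // a\|b

// The heuristic is easy to fool: this plain object is not a RegExp at all.
const impostor = { exec() { return null; }, source: '.*' };
console.log(toPatternSource(impostor));  // (?:.*)
```

The last call shows the trade-off: the check works across frames, but any object that happens to have `exec` and `source` gets its `source` injected unescaped.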
 

 

 


Re: RegExp.escape()

Mark S. Miller-2
On Sat, Jun 13, 2015 at 11:39 AM, Benjamin Gruenbaum <[hidden email]> wrote:
> On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <[hidden email]> wrote:
>> On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <[hidden email]> wrote:
>>> All of these should be building on top of RegExp.escape :P
>>
>> It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.
>
> That was a big part of making a proposal out of it - to find these things :)

Indeed! Much appreciated.

>> the overall result does not do this. For example:
>>
>>     const data = ':x';
>>     const rebad = RegExp.tag`(?${data})`;
>>     console.log(rebad.test('x')); // true
>>
>> is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can consistently reflect these platform extensions.
>
> This is a good point. I considered whether or not `-` should be included for a similar reason. I think it is reasonable to only include syntax identifiers and expect users to deal with parts of patterns of more than one character themselves (by wrapping the string with `()` in the constructor). This is what practically every other language does.
>
> That said - I'm very open to allowing implementations to escape _more_ than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?
>
> I'm also open to `.tag` wrapping with `()` to avoid these issues but I'm not sure if we have a way in JavaScript to not make a capturing group out of it.

Better or different escaping is not the issue of this first bullet. The issue is validating that a fragment is a valid fragment for that regexp grammar. For the std grammar, "(?" is not a valid fragment, and the tag should have rejected the template with an error on that basis alone.

>> * Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.
>
> I think that doing this should be an eventual target but I don't think adding a single much-asked-for static function to the RegExp function would be a good place to start. I think the committee first needs to agree about how this form of modularisation should be done - there are much bigger targets first and I would not like to see this proposal tied to and held back by that (useful) goal.

I agree, but this will be true for any individual proposal.

Perhaps we need a sacrificial "first penguin through the ice" proposal whose *only* purpose is to arrive as a std import rather than a std primordial.
(Just kidding.)

>> * ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegate to RegExpSubclass.tag seems weird.
>
> Right, but it makes sense that `escape` does not play in this game since it is a static method that takes a string argument - I'm not sure how it could use @exec.

I agree that defining a class-side method to delegate to an instance-side method is unpleasant. But because we have class-side inheritance, static methods should be designed with this larger game in mind.

>> * The instanceof below prevents this polyfill from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?
>
> This is hacky, but in my code I just did `argument.exec ? treatAsRegExp : treatAsString`.

Yes, as with instanceof, that's the difference between the quality needed in a polyfill for personal use vs a proposed std.



--
    Cheers,
    --MarkM


Re: RegExp.escape()

Mark Miller-2
In reply to this post by Jordan Harband
Perhaps. I encourage you to draft a possible concrete proposal.


On Sat, Jun 13, 2015 at 11:30 AM, Jordan Harband <[hidden email]> wrote:
Would it help subclassing to have the list of syntax characters/code points be on a well-known-symbol property? Like `RegExp.prototype[@@syntaxCharacters] = Object.freeze('^$\\.*+?()[]{}|'.split(''));` or something? Then @exec could reference that, and similarly `RegExp.escape` and `RegExpSubclass.escape` could reference it as well?





--
Text by me above is hereby placed in the public domain

  Cheers,
  --MarkM


Re: RegExp.escape()

Benjamin Gruenbaum
In reply to this post by Mark S. Miller-2
What about that part in particular?

> That said - I'm very open to allowing implementations to escape _more_ than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?

If we go with `.escape` (and not the tag at this stage), should implementations that extend the regexp syntax (which is apparently allowed?) to add identifiers also be allowed to add identifiers to escape?

This sounds like the biggest barrier at this point from what I understand. I'm also considering a bit of `as if` to allow implementations to, for example, not escape some characters inside `[...]` as long as the end result is the same. 
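
To make the `-` and `[...]` concern concrete, here is a small sketch. Both helpers are hypothetical: `escapeSyntaxOnly` escapes just the `SyntaxCharacter` set, while `escapeWithDash` additionally escapes `-`. Neither is the proposed `RegExp.escape`.

```javascript
// Hypothetical helpers for comparison.
const escapeSyntaxOnly = (s) => s.replace(/[\^$\\.*+?()[\]{}|]/g, '\\$&');
const escapeWithDash = (s) => s.replace(/[\^$\\.*+?()[\]{}|-]/g, '\\$&');

const data = 'a-c';

// Inside a character class, an unescaped '-' turns the data into a range.
const asRange = new RegExp('^[' + escapeSyntaxOnly(data) + ']$');   // ^[a-c]$
const asLiteral = new RegExp('^[' + escapeWithDash(data) + ']$');   // ^[a\-c]$

console.log(asRange.test('b'));    // true  (the data now means "a through c")
console.log(asLiteral.test('b'));  // false
console.log(asLiteral.test('-'));  // true
```

Outside a character class the two helpers produce patterns that match the same strings, which is the kind of context-dependent latitude the `as if` wording would be covering.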



On Sat, Jun 13, 2015 at 9:57 PM, Mark S. Miller <[hidden email]> wrote:
On Sat, Jun 13, 2015 at 11:39 AM, Benjamin Gruenbaum <[hidden email]> wrote:
On Sat, Jun 13, 2015 at 9:07 PM, Mark S. Miller <[hidden email]> wrote:
On Sat, Jun 13, 2015 at 9:17 AM, Domenic Denicola <[hidden email]> wrote:

All of these should be building on top of RegExp.escape :P


It's funny how, by considering it as leading to a proposal, I quickly saw deep flaws that I was previously missing.


That was a big part of making a proposal out of it - to find these things :)

Indeed! Much appreciated.

 
 
the overall result does not do this. For example:

    const data = ':x';
    const rebad = RegExp.tag`(?${data})`;
    console.log(rebad.test('x')); // true

is nonsense. Since the RegExp grammar can be extended per platform, the same argument that says we should have the platform provide RegExp.escape says we should have the platform provide RegExp.tag -- so that they can conisistently reflect these platform extensions.


This is a good point, I considered whether or not `-` should be included for a similar reason. I think it is reasonable to only include syntax identifiers and expect users to deal with parts of patterns of more than one characters themselves (by wrapping the string with `()` in the constructor). This is what every other language does practically.

That said - I'm very open to allowing implementations to escape _more_ than `SyntaxCharacter` in their implementations and to even recommend that they do so in such a way that is consistent with their regular expressions. What do you think about doing that?

I'm also open to `.tag` wrapping with `()` to avoid these issues but I'm not sure if we have a way in JavaScript to not make a capturing group out of it.

Better or different escaping is not the issue of this first bullet; the issue is validating that a fragment is a valid fragment for that regexp grammar. For the std grammar, "(?" is not a valid fragment and the tag should have rejected the template with an error on that basis alone.


 
 
* Now that we have modules, I would like to see us stop having each proposal for new functionality come at the price of further global namespace pollution. I would like to see us transition towards having most new std library entry points be provided by std modules. I understand why we haven't yet, but something needs to go first.


I think that doing this should be an eventual target but I don't think adding a single much-asked-for static function to the RegExp function would be a good place to start. I think the committee first needs to agree about how this form of modularisation should be done - there are much bigger targets first and I would not like to see this proposal tied and held back by that (useful) goal.

I agree, but this will be true for any individual proposal. 

Perhaps we need a sacrificial "first penguin through the ice" proposal whose *only* purpose is to arrive as a std import rather than a std primordial.
(Just kidding.)
 
 
* ES6 made RegExp subclassable with most methods delegating to a common @exec method, so that a subclass only needs to consistently override a small number of things to stay consistent. Neither RegExpSubclass.escape nor RegExpSubclass.tag can be derived from aRegExpSubclass[@exec]. Because of the first bullet, RegExpSubclass.tag also cannot be derived from RegExpSubclass.escape. But having RegExpSubclass.escape delegate to RegExpSubclass.tag seems weird.


Right but it makes sense that `escape` does not play in this game since it is a static method that takes a string argument - I'm not sure how it could use @exec.

I agree that defining a class-side method to delegate to an instance-side method is unpleasant. But because we have class-side inheritance, static methods should be designed with this larger game in mind.

 
 
* The instanceof below prevents this polyfill from working cross-frame. Also, when doing RegExpSubclass1.tag`xx${aRegExpSubclass2}yy`, where RegExpSubclass2.source produces a regexp grammar string that RegExpSubclass1 does not understand, I have no idea what the composition should do other than reject with an error. But what if the strings happen to be mutually valid but with conflicting meaning between these subclasses?

This is hacky, but in my code I just did `argument.exec ? treatAsRegExp : treatAsString`.

Yes, as with instanceof, that's the difference between the quality needed in a polyfill for personal use vs a proposed std.



--
    Cheers,
    --MarkM


Re: RegExp.escape()

C. Scott Ananian
To throw some more paint on the bikeshed:

The "instanceof RegExp" and "RegExp(...)" parts of the "perfect" implementation of `RegExp.tag` should also be fixed to play nicely with species.

I think Allen and I would say that you should *not* use the species pattern for instantiating the new regexp (because this is a factory), but you should be doing `new this(...)` to create the result, instead of `RegExp(...)`.  (Domenic might disagree, but this is the pattern the ES6 spec is currently consistent with.)

The `instanceof RegExp` test might also be reviewed.  It might be okay, but perhaps you want to invoke a `toRegExp` method instead, or just look at `source`, so that we used duck typing instead of a fixed inheritance chain.  You could even define `String#toRegExp` and have that handle proper escaping.  This pattern might not play as nicely with subtyping, though, so perhaps using `this.escape(string)` (returning an instance of `this`) is in fact preferable.  Everything other than a string might be passed through `new this(somethingelse).source` which could cast it from a "base RegExp" to a subclass as necessary. You could handle whatever conversions are necessary in the constructor for your subclass.

If we did want to use the species pattern, the best way (IMO) would be to expose the fundamental alternation/concatenation/etc operators.  For example, `RegExp.prototype.concat(x)` would use the species pattern to produce a result, and would also handle escaping `x` if it was a string.  The set of instance methods needed is large but not *too* large: `concat`, `alt`, `mult`, and `group` (with options) might be sufficient.
  --scott
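
As a rough sketch of the combinator idea above: the code below uses free functions named `concat` and `alt` rather than adding anything to `RegExp.prototype`, and the helper names (`escapeSyntax`, `toSource`, `combine`) are assumptions made for illustration only. It looks up `Symbol.species` on the first operand's constructor, escapes string operands, and wraps regexp operands in a non-capturing group.

```js
// Hypothetical helper: escapes only the ES SyntaxCharacter set.
const escapeSyntax = (text) => text.replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');

// Strings are treated as literal text; regexps contribute their source,
// wrapped in a non-capturing group so precedence is preserved.
const toSource = (part) =>
  typeof part === 'string' ? escapeSyntax(part) : `(?:${part.source})`;

function combine(separator, first, rest) {
  // Instance-style operation, so the result type comes from species.
  const Species = first.constructor[Symbol.species] || RegExp;
  const source = [first, ...rest].map(toSource).join(separator);
  return new Species(source, first.flags);
}

const concat = (first, ...rest) => combine('', first, rest);
const alt = (first, ...rest) => combine('|', first, rest);

console.log(concat(/a+/, '.', /b/).source); // (?:a+)\.(?:b)
console.log(alt(/cat/i, 'a|b').source);     // (?:cat)|a\|b
```

The flags of the first operand are carried over to the result; how conflicting flags between operands should be reconciled is left open here.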



Re: RegExp.escape()

Allen Wirfs-Brock

On Jun 13, 2015, at 1:18 PM, C. Scott Ananian wrote:

To throw some more paint on the bikeshed:

The "instanceof RegExp" and "RegExp(...)" parts of the "perfect" implementation of `RegExp.tag` should also be fixed to play nicely with species.

I think Allen and I would say that you should *not* use the species pattern for instantiating the new regexp (because this is a factory), but you should be doing `new this(...)` to create the result, instead of `RegExp(...)`.  (Domenic might disagree, but this is the pattern the ES6 spec is currently consistent with.)

Absolute, `new this(...)` is a pattern that everyone who wants to create inheritable "static" factory methods needs to learn.  It's how such a factory says: I need to create an instance of the constructor I was invoked upon.

`species` is very different.  It is how an instance method says: I need to create a new instance that is similar to this instance, but lacks its specialized behavior.
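
A small sketch of the two patterns side by side. `MyArray`, `Pattern`, `LoudPattern` and the static `literal` method are hypothetical names used only for illustration; `Array.of`, `Array.prototype.map` and `Symbol.species` are standard ES6.

```js
class MyArray extends Array {
  // Instance methods such as map() consult Symbol.species for their result type.
  static get [Symbol.species]() { return Array; }
}

const made = MyArray.of(1, 2, 3);       // static factory: constructs via its `this` value
const mapped = made.map((n) => n * 2);  // instance method: constructs via species

console.log(made instanceof MyArray);   // true
console.log(mapped instanceof MyArray); // false
console.log(Array.isArray(mapped));     // true

// Hypothetical RegExp factory written in the `new this(...)` style.
class Pattern extends RegExp {
  static literal(text) {
    // Escape the SyntaxCharacter set, then build an instance of whichever
    // constructor the method was invoked upon.
    return new this(text.replace(/[\\^$*+?.()|[\]{}]/g, '\\$&'));
  }
}
class LoudPattern extends Pattern {}

const dot = LoudPattern.literal('a.b');
console.log(dot instanceof LoudPattern); // true
console.log(dot.test('axb'));            // false
```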


The `instanceof RegExp` test might also be reviewed.  It might be okay, but perhaps you want to invoke a `toRegExp` method instead, or just look at `source`, so that we used duck typing instead of a fixed inheritance chain.  You could even define `String#toRegExp` and have that handle proper escaping.  This pattern might not play as nicely with subtyping, though, so perhaps using `this.escape(string)` (returning an instance of `this`) is in fact preferable.  Everything other than a string might be passed through `new this(somethingelse).source` which could cast it from a "base RegExp" to a subclass as necessary. You could handle whatever conversions are necessary in the constructor for your subclass.

Originally, ES6 had a @@isRegExp property that was used to brand objects that could be used in contexts where RegExp instances were expected. It was used by methods like String.prototype.match/split/search/replace to determine if the "pattern" argument was a "regular expression" rather than a string.  Later @@isRegExp was eliminated and replaced with @@match, @@search, @@split, and @@replace because we realized that the corresponding methods didn't depend upon the full generality of regular expressions, but only upon the more specific behaviors.  When we did that, we also decided that we would use the presence of a @@match property as the brand to identify regular-expression-like objects.  This is captured in the ES6 spec by the IsRegExp abstract operation http://people.mozilla.org/~jorendorff/es6-draft.html#sec-isregexp which is used at several places within the ES6 spec.

So, the proper ES6 way to do a cross-realm friendly check for RegExp-like behavior is to check for the existence of a property whose key is Symbol.match.
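
A user-land approximation of that check might look like the sketch below. The function name `isRegExpLike` is an assumption, and the final fallback only approximates the spec's internal-slot test, which script code cannot observe directly.

```js
// Sketch modeled on the IsRegExp abstract operation.
function isRegExpLike(value) {
  const type = typeof value;
  if (value === null || (type !== 'object' && type !== 'function')) return false;

  // A defined Symbol.match property acts as the brand, truthy or falsy.
  const matcher = value[Symbol.match];
  if (matcher !== undefined) return Boolean(matcher);

  // Assumption: the spec checks an internal slot here; this string tag test
  // is only a stand-in and can be spoofed via Symbol.toStringTag.
  return Object.prototype.toString.call(value) === '[object RegExp]';
}

const branded = { [Symbol.match]: true }; // not a RegExp, but opts in
const optedOut = /a/;
optedOut[Symbol.match] = false;           // a RegExp that opts out

console.log(isRegExpLike(/a/));      // true
console.log(isRegExpLike('a'));      // false
console.log(isRegExpLike(branded));  // true
console.log(isRegExpLike(optedOut)); // false
```

Because the check reads a well-known symbol rather than a prototype chain, it gives the same answer for a regexp created in another frame.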


If we did want to use the species pattern, the best way (IMO) would be to expose the fundamental alternation/concatenation/etc operators.  For example, `RegExp.prototype.concat(x)` would use the species pattern to produce a result, and would also handle escaping `x` if it was a string.  The set of instance methods needed is large but not *too* large: `concat`, `alt`, `mult`, and `group` (with options) might be sufficient.

yes.

Again, the key difference is whether we are talking about a request to a constructor object or a request to an instance object.

MyArraySubclass.of(1,2,3,4)    //I expect to get an instance of MyArraySubclass, not of Array or some other Array-like species

aMyArraySubclassObject.map(from=>f(from))  //I expect to get something with Array behavior but it may not be an instance of MyArraySubclass

Allen

Re: RegExp.escape()

Allen Wirfs-Brock

On Jun 13, 2015, at 4:16 PM, Allen Wirfs-Brock wrote:


On Jun 13, 2015, at 1:18 PM, C. Scott Ananian wrote:

To throw some more paint on the bikeshed:

The "instanceof RegExp" and "RegExp(...)" parts of the "perfect" implementation of `RegExp.tag` should also be fixed to play nicely with species.

I think Allen and I would say that you should *not* use the species pattern for instantiating the new regexp (because this is a factory), but you should be doing `new this(...)` to create the result, instead of `RegExp(...)`.  (Domenic might disagree, but this is the pattern the ES6 spec is currently consistent with.)

Err- - Absolutely
Absolute, `new this(...)` is a pattern that everyone who wants to create inheritable "static" factory methods needs to learn.  It's how such a factory says: I need to create an instance of the constructor I was invoked upon.

`species` is very different.  It is how an instance method says: I need to create a new instance that is similar to this instance, but lacks its specialized behavior.


The `instanceof RegExp` test might also be reviewed.  It might be okay, but perhaps you want to invoke a `toRegExp` method instead, or just look at `source`, so that we used duck typing instead of a fixed inheritance chain.  You could even define `String#toRegExp` and have that handle proper escaping.  This pattern might not play as nicely with subtyping, though, so perhaps using `this.escape(string)` (returning an instance of `this`) is in fact preferable.  Everything other than a string might be passed through `new this(somethingelse).source` which could cast it from a "base RegExp" to a subclass as necessary. You could handle whatever conversions are necessary in the constructor for your subclass.

Originally, ES6 had a @@isRegExp property that was used to brand objects that could be used in contexts where RegExp instances were expected. It was used by methods like String.prototype.match/split/search/replace to determine if the "pattern" argument was a "regular expression" rather than a string.  Later @@isRegExp was eliminated and replaced with @@match, @@search, @@split, and @@replace because we realized that the corresponding methods didn't depend upon the full generality of regular expressions, but only upon the more specific behaviors.  When we did that, we also decided that we would use the presence of a @@match property as the brand to identify regular-expression-like objects.  This is captured in the ES6 spec by the IsRegExp abstract operation http://people.mozilla.org/~jorendorff/es6-draft.html#sec-isregexp which is used at several places within the ES6 spec.

So, the proper ES6 way to do a cross-realm friendly check for RegExp-like behavior is to check for the existence of a property whose key is Symbol.match.


If we did want to use the species pattern, the best way (IMO) would be to expose the fundamental alternation/concatenation/etc operators.  For example, `RegExp.prototype.concat(x)` would use the species pattern to produce a result, and would also handle escaping `x` if it was a string.  The set of instance methods needed is large but not *too* large: `concat`, `alt`, `mult`, and `group` (with options) might be sufficient.

yes.

Again, the key difference is whether we are talking about a request to a constructor object or a request to an instance object.

MyArraySubclass.of(1,2,3,4)    //I expect to get an instance of MyArraySubclass, not of Array or some other Array-like species

aMyArraySubclassObject.map(from=>f(from))  //I expect to get something with Array behavior but it may not be an instance of MyArraySubclass

Allen
Re: RegExp.escape()

Benjamin Gruenbaum
In reply to this post by Benjamin Gruenbaum
As a cross-cutting concern I'd like the feedback of more people on https://github.com/benjamingr/RegExp.escape/issues/29 

Basically we've got to make a design choice of readable output vs. potentially safer output. 


Re: RegExp.escape()

Bucaran
Hi.

I have never written a proposal before, but I would love if it was possible to do the following in JavaScript:

```js
// This code exposes a function that, when called bound to an `object`, inserts a method in that `object`.
// The following are 3 ways to do this, the last one being my proposal. Note that I am using CommonJS
// require style to include modules, but that is simply to illustrate how I came up with this. Using `export default function…`
// does not really address my use case, but just for completeness I have included an example using the new `import`
// `export` system as well.

// Disclaimer: This is a simple plugin-like system example, but notice I am not checking whether the
// method already exists in `object`. You can ignore this.


// Idiomatic JavaScript

module.exports = function () {
  let method = require("./lib/method")
  this.method = function (...args) {
    return args.map((arg) => method(arg))
  }
}

// Clumsy, but without using `let` variables

module.exports = function () {
  this.method = function (...args) {
    return (function (method) {
      return args.map((arg) => method(arg))
    }(
      require("./lib/method")
    ))
  }
}

// My proposal is to allow blocks to receive arguments (similar to the non-standard let blocks) and
// whatever you return inside the block is the return value of the enclosing function. Also, there
// should be no need to write `return` as it will always return the result of the last expression.
// (The block syntax below is the proposed form, not valid JavaScript today.)

module.exports = function () {
  this.method = function (...args) {
    (method) { args.map((arg) => method(arg)) }(require("./lib/method"))
  }
}


// Just for completeness, but not really related to what I am proposing (although it
// does solve this particular case very neatly) using import & exports

import method from "./method"

export default function () {
  this.method = function (...args) {
    return args.map((arg) => method(arg))
  }
}

```

If you think this is a good or bad idea, please let me know your comments and observations.

Regards
J


On Jun 20, 2015, at 8:07 PM, Benjamin Gruenbaum <[hidden email]> wrote:

As a cross-cutting concern I'd like the feedback of more people on https://github.com/benjamingr/RegExp.escape/issues/29 

Basically we've got to make a design choice of readable output vs. potentially safer output. 

Re: RegExp.escape()

Benjamin Gruenbaum
In reply to this post by Benjamin Gruenbaum
Why is this a comment on the RegExp.escape discussion?

Re: RegExp.escape()

Benjamin Gruenbaum
In reply to this post by Benjamin Gruenbaum
I'd like to give https://github.com/benjamingr/RegExp.escape/issues/29 another week, please. If you have a strong opinion, voice it; after that we'll settle on a hopefully final API for RegExp.escape in terms of the escaped parts.

Some of the debated issues, so you won't have to read the whole thread:

 - Numeric literals are escaped at the start of the string to not interfere with back references to capturing groups (yes/no)
 - Hex characters ([0-9a-fA-F]) are escaped at the start of the string to not interfere with Unicode escape sequences (yes/no)
 - `/` is escaped to support passing a RegExp string to eval (yes/no)

And so on.
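
To illustrate the first two points, here is a sketch of what can go wrong when an escaped string is appended right after an existing escape sequence. `escapeLeadingHex` is a hypothetical helper written for this example and is not the behavior the proposal has settled on; the results assume a non-Unicode-mode regexp in an engine that follows the Annex B grammar.

```js
// Hypothetical helper: escapes the SyntaxCharacter set and additionally
// rewrites a leading hex digit as a \xHH escape.
function escapeLeadingHex(text) {
  const escaped = text.replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');
  return escaped.replace(
    /^[0-9a-fA-F]/,
    (ch) => '\\x' + ch.charCodeAt(0).toString(16)
  );
}

// Intended: "a", a back reference to it, then a literal "0".
const naiveDigit = new RegExp('(a)\\1' + '0');                  // source: (a)\10
const safeDigit = new RegExp('(a)\\1' + escapeLeadingHex('0')); // source: (a)\1\x30

console.log(naiveDigit.test('aa0')); // false: \10 is no longer read as \1 then "0"
console.log(safeDigit.test('aa0'));  // true

// A leading hex digit can complete a preceding \u escape in the same way.
const naiveHex = new RegExp('\\u004' + '1');                    // source: \u0041
const safeHex = new RegExp('\\u004' + escapeLeadingHex('1'));   // source: \u004\x31

console.log(naiveHex.test('A')); // true: the pattern now means "A"
console.log(safeHex.test('A'));  // false
```

The trade-off discussed in the issue is visible here: the safer output (`\x30` instead of `0`) is harder to read.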

On Sat, Jun 20, 2015 at 2:07 PM, Benjamin Gruenbaum <[hidden email]> wrote:
As a cross-cutting concern I'd like the feedback of more people on https://github.com/benjamingr/RegExp.escape/issues/29 

Basically we've got to make a design choice of readable output vs. potentially safer output. 


