Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

Claude Pache
Hi,

About the bad old nonstandard RegExp functionalities:

* `RegExp.prototype.compile()` — currently in Annex B;
* `RegExp.$1`, `RegExp.leftContext`, etc. — currently unspecced.

Although we could probably not get rid of them for plain regexps, I think it is a good idea to disable them for proper subclasses of RegExp (e.g., `class MyRegExp extends RegExp {}`).

Basically, the reason is that these methods break encapsulation (they operate on the raw regexp), leading to potential bugs when using them. Moreover `RegExp.$1` and friends have the additional troublesome property of relying on a global state that could be unexpectedly modified. For concrete examples of what could go wrong, see the subclass-restriction-motivation.md subpage in the proposal linked below.

Here is a link to a possible specification for the regexp statics (RegExp.$1, ...), that includes the disabling of bad legacy features for proper subclasses of RegExp (and for some edge-cases around cross-realm interactions):


What do you think? Are implementations willing to try?

—Claude


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

Andrea Giammarchi-2
My 2 cents.

I always had the feeling people complaining about `RegExp.$1` and friends never really used them.

For instance, your example:

```js
/(a)/.exec('a')
Object.keys(bar)
RegExp.$1
```

might have side effects but it's also made up and, I believe, not a real-world concern.
If you use `re.exec` you address it, otherwise you go `re.test` while if you use `re.test` you (or at least *me*) are aware of possible side effects and will use the `RegExp.$1` or other property *instantly* after.

Following just a simple example:

```js
if (/^item-(\d+)$/.test(id)) {
  let num = parseInt(RegExp.$1, 10);
  // the rest of the code
}
```

I'm not saying these properties are cool, always safe, or anything, I'm just saying there are few useful cases for them.

However, while I agree the problem is that these modify the globally shared constructor, I also think that having these magic properties available in a subclass only, would be probably the key to solve pretty much all problems described in that page.

```js
reg.exp = class extends RegExp {};
function reg(source, ...flags) {
  return typeof source === 'string' ?
    new reg.exp(...[source, ...flags]) :
    new reg.exp(source.source, source.flags);
}

if (reg(/^item-(\d+)$/).test(id)) {
  let num = parseInt(reg.exp.$1, 10);
  // the rest of the code

  RegExp.$1 === reg.exp.$1; // false
  // the extended RegExp didn't modify
  // the global RegExp
}
```

Of course this would still suffer same, or very similar, problems in case `re.class` is exported as module and consumed by many different libraries, but I would be surprised if subclassing `RegExp` will create a limited subset or, at least, I wouldn't call that an extend.

After all, if "safe" is the main concern, writing `class SRegExp extends RegExp {}` on top of a module that uses RegExp in various ways doesn't look like a big piece of extra code to write, it's like a function declaration and it will make your code immune from global RegExp gotchas.

Or ... doesn't it?

Best Regards









On Tue, Apr 26, 2016 at 7:20 PM, Claude Pache <[hidden email]> wrote:
Hi,

About the bad old nonstandard RegExp functionalities:

* `RegExp.prototype.compile()` — currently in Annex B;
* `RegExp.$1`, `RegExp.leftContext`, etc. — currently unspecced.

Although we could probably not get rid of them for plain regexps, I think it is a good idea to disable them for proper subclasses of RegExp (e.g., `class MyRegExp extends RegExp {}`).

Basically, the reason is that these methods break encapsulation (they operate on the raw regexp), leading to potential bugs when using them. Moreover `RegExp.$1` and friends have the additional troublesome property of relying on a global state that could be unexpectedly modified. For concrete examples of what could go wrong, see the subclass-restriction-motivation.md subpage in the proposal linked below.

Here is a link to a possible specification for the regexp statics (RegExp.$1, ...), that includes the disabling of bad legacy features for proper subclasses of RegExp (and for some edge-cases around cross-realm interactions):


What do you think? Are implementations willing to try?

—Claude


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss



_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

Claude Pache

> Le 27 avr. 2016 à 06:50, Andrea Giammarchi <[hidden email]> a écrit :
>
> My 2 cents.
>
> I always had the feeling people complaining about `RegExp.$1` and friends never really used them.
>
> For instance, your example:
>
> ```js
> /(a)/.exec('a')
> Object.keys(bar)
> RegExp.$1
> ```
>
> might have side effects but it's also made up and, I believe, not a real-world concern.
> If you use `re.exec` you address it, otherwise you go `re.test` while if you use `re.test` you (or at least *me*) are aware of possible side effects and will use the `RegExp.$1` or other property *instantly* after.
>
> Following just a simple example:
>
> ```js
> if (/^item-(\d+)$/.test(id)) {
>   let num = parseInt(RegExp.$1, 10);
>   // the rest of the code
> }
> ```
>
> I'm not saying these properties are cool, always safe, or anything, I'm just saying there are few useful cases for them.

An important point in my made-up example with `Object.keys()`, is that such an apparently innocuous function call might be hidden deep in the subclass implementation. That was an illustration of a way (among others) how a user-defined subclass could make the built-in RegExp.$1 feature brittle or buggy. (And no, I haven't taken time to find a realistic example, and I won't.)

>
> However, while I agree the problem is that these modify the globally shared constructor, I also think that having these magic properties available in a subclass only, would be probably the key to solve pretty much all problems described in that page.
>
> ```js
> reg.exp = class extends RegExp {};
> function reg(source, ...flags) {
>   return typeof source === 'string' ?
>     new reg.exp(...[source, ...flags]) :
>     new reg.exp(source.source, source.flags);
> }
>
> if (reg(/^item-(\d+)$/).test(id)) {
>   let num = parseInt(reg.exp.$1, 10);
>   // the rest of the code
>
>   RegExp.$1 === reg.exp.$1; // false
>   // the extended RegExp didn't modify
>   // the global RegExp
> }
> ```
>
> Of course this would still suffer same, or very similar, problems in case `re.class` is exported as module and consumed by many different libraries, but I would be surprised if subclassing `RegExp` will create a limited subset or, at least, I wouldn't call that an extend.

Of course, subclasses of RegExp are free to define their own `reg.exp.$1` static properties, insomuch that they are free to implement features that other people judge horrific.

In any case, it might be good that `reg.exp.$1` does not inherit from `RegExp.$1` by default (as it is currently the case), in case the semantics and implementation of `reg.exp` renders the value of the inherited property misleading.


>
> After all, if "safe" is the main concern, writing `class SRegExp extends RegExp {}` on top of a module that uses RegExp in various ways doesn't look like a big piece of extra code to write, it's like a function declaration and it will make your code immune from global RegExp gotchas.
>
> Or ... doesn't it?

I think that safety is not the main concern here. I mean, safety is better served by completely removing RegExp.$1 and friends, including for plain regexps, which, although it couldn't be achieved by default, can be made possible by leaving to the user the ability to remove completely the API (e.g., by making RegExp.$1 configurable and deletable) and by limiting cross-realm leaks. I.e., by rendering the entire environment safe instead of leaving that task to individual modules.

—Claude

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

Andrea Giammarchi-2
What I am saying is that `class MyRE extends RegExp {}` should *not* inherit any magic property from global `RegExp` but *also* should *not* pollute the global `RegExp` when used.

Right now, the code I've tested on Chrome, suffers pollution either ways: `MyRE.$1` is `RegExp.$1` and vice versa, even if I'm using `new MyRE()` when testing or matching my own regular expressions.

Having same features improved on the global pollution side looks like a better win than removing features one might or not like.

Best Regards



On Wed, Apr 27, 2016 at 2:34 PM, Claude Pache <[hidden email]> wrote:

> Le 27 avr. 2016 à 06:50, Andrea Giammarchi <[hidden email]> a écrit :
>
> My 2 cents.
>
> I always had the feeling people complaining about `RegExp.$1` and friends never really used them.
>
> For instance, your example:
>
> ```js
> /(a)/.exec('a')
> Object.keys(bar)
> RegExp.$1
> ```
>
> might have side effects but it's also made up and, I believe, not a real-world concern.
> If you use `re.exec` you address it, otherwise you go `re.test` while if you use `re.test` you (or at least *me*) are aware of possible side effects and will use the `RegExp.$1` or other property *instantly* after.
>
> Following just a simple example:
>
> ```js
> if (/^item-(\d+)$/.test(id)) {
>   let num = parseInt(RegExp.$1, 10);
>   // the rest of the code
> }
> ```
>
> I'm not saying these properties are cool, always safe, or anything, I'm just saying there are few useful cases for them.

An important point in my made-up example with `Object.keys()`, is that such an apparently innocuous function call might be hidden deep in the subclass implementation. That was an illustration of a way (among others) how a user-defined subclass could make the built-in RegExp.$1 feature brittle or buggy. (And no, I haven't taken time to find a realistic example, and I won't.)

>
> However, while I agree the problem is that these modify the globally shared constructor, I also think that having these magic properties available in a subclass only, would be probably the key to solve pretty much all problems described in that page.
>
> ```js
> reg.exp = class extends RegExp {};
> function reg(source, ...flags) {
>   return typeof source === 'string' ?
>     new reg.exp(...[source, ...flags]) :
>     new reg.exp(source.source, source.flags);
> }
>
> if (reg(/^item-(\d+)$/).test(id)) {
>   let num = parseInt(reg.exp.$1, 10);
>   // the rest of the code
>
>   RegExp.$1 === reg.exp.$1; // false
>   // the extended RegExp didn't modify
>   // the global RegExp
> }
> ```
>
> Of course this would still suffer same, or very similar, problems in case `re.class` is exported as module and consumed by many different libraries, but I would be surprised if subclassing `RegExp` will create a limited subset or, at least, I wouldn't call that an extend.

Of course, subclasses of RegExp are free to define their own `reg.exp.$1` static properties, insomuch that they are free to implement features that other people judge horrific.

In any case, it might be good that `reg.exp.$1` does not inherit from `RegExp.$1` by default (as it is currently the case), in case the semantics and implementation of `reg.exp` renders the value of the inherited property misleading.


>
> After all, if "safe" is the main concern, writing `class SRegExp extends RegExp {}` on top of a module that uses RegExp in various ways doesn't look like a big piece of extra code to write, it's like a function declaration and it will make your code immune from global RegExp gotchas.
>
> Or ... doesn't it?

I think that safety is not the main concern here. I mean, safety is better served by completely removing RegExp.$1 and friends, including for plain regexps, which, although it couldn't be achieved by default, can be made possible by leaving to the user the ability to remove completely the API (e.g., by making RegExp.$1 configurable and deletable) and by limiting cross-realm leaks. I.e., by rendering the entire environment safe instead of leaving that task to individual modules.

—Claude



_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

Mark S. Miller-2
In reply to this post by Andrea Giammarchi-2
Sorry, none of these examples are persuasive. Such non-local causality is a nightmare on many grounds. If we did want to provide equivalent functionality somehow, we should find a non-stupid way to package it.

I like Claude's proposal as stated: Annex B normative *optional*, configurable and actually deletable, no contamination of cross-realm or subclasses. The only reason to admit even that is legacy. Claude's proposal is a clean proposal that accommodates legacy while minimizing further impact.


On Tue, Apr 26, 2016 at 9:50 PM, Andrea Giammarchi <[hidden email]> wrote:
My 2 cents.

I always had the feeling people complaining about `RegExp.$1` and friends never really used them.

For instance, your example:

```js
/(a)/.exec('a')
Object.keys(bar)
RegExp.$1
```

might have side effects but it's also made up and, I believe, not a real-world concern.
If you use `re.exec` you address it, otherwise you go `re.test` while if you use `re.test` you (or at least *me*) are aware of possible side effects and will use the `RegExp.$1` or other property *instantly* after.

Following just a simple example:

```js
if (/^item-(\d+)$/.test(id)) {
  let num = parseInt(RegExp.$1, 10);
  // the rest of the code
}
```

I'm not saying these properties are cool, always safe, or anything, I'm just saying there are few useful cases for them.

However, while I agree the problem is that these modify the globally shared constructor, I also think that having these magic properties available in a subclass only, would be probably the key to solve pretty much all problems described in that page.

```js
reg.exp = class extends RegExp {};
function reg(source, ...flags) {
  return typeof source === 'string' ?
    new reg.exp(...[source, ...flags]) :
    new reg.exp(source.source, source.flags);
}

if (reg(/^item-(\d+)$/).test(id)) {
  let num = parseInt(reg.exp.$1, 10);
  // the rest of the code

  RegExp.$1 === reg.exp.$1; // false
  // the extended RegExp didn't modify
  // the global RegExp
}
```

Of course this would still suffer same, or very similar, problems in case `re.class` is exported as module and consumed by many different libraries, but I would be surprised if subclassing `RegExp` will create a limited subset or, at least, I wouldn't call that an extend.

After all, if "safe" is the main concern, writing `class SRegExp extends RegExp {}` on top of a module that uses RegExp in various ways doesn't look like a big piece of extra code to write, it's like a function declaration and it will make your code immune from global RegExp gotchas.

Or ... doesn't it?

Best Regards









On Tue, Apr 26, 2016 at 7:20 PM, Claude Pache <[hidden email]> wrote:
Hi,

About the bad old nonstandard RegExp functionalities:

* `RegExp.prototype.compile()` — currently in Annex B;
* `RegExp.$1`, `RegExp.leftContext`, etc. — currently unspecced.

Although we could probably not get rid of them for plain regexps, I think it is a good idea to disable them for proper subclasses of RegExp (e.g., `class MyRegExp extends RegExp {}`).

Basically, the reason is that these methods break encapsulation (they operate on the raw regexp), leading to potential bugs when using them. Moreover `RegExp.$1` and friends have the additional troublesome property of relying on a global state that could be unexpectedly modified. For concrete examples of what could go wrong, see the subclass-restriction-motivation.md subpage in the proposal linked below.

Here is a link to a possible specification for the regexp statics (RegExp.$1, ...), that includes the disabling of bad legacy features for proper subclasses of RegExp (and for some edge-cases around cross-realm interactions):


What do you think? Are implementations willing to try?

—Claude


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss



_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss




--
    Cheers,
    --MarkM

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Disabling nonstandard RegExp functionalities for proper subclasses of RegExp

Claude Pache
In reply to this post by Andrea Giammarchi-2

> Le 27 avr. 2016 à 15:46, Andrea Giammarchi <[hidden email]> a écrit :
>
> What I am saying is that `class MyRE extends RegExp {}` should *not* inherit any magic property from global `RegExp` but *also* should *not* pollute the global `RegExp` when used.
>
> Right now, the code I've tested on Chrome, suffers pollution either ways: `MyRE.$1` is `RegExp.$1` and vice versa, even if I'm using `new MyRE()` when testing or matching my own regular expressions.
>
> Having same features improved on the global pollution side looks like a better win than removing features one might or not like.
>
> Best Regards

Right. I have just amended my proposal so that MyRE.$1 is undefined by default.

In the same time, I have simplified the semantics: more precisely, I have removed attempts to be smart that resulted in being more risky w.r.t backward compatibility. It is more important to have something that is agreed to be implemented.

—Claude
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss