Unicode non-character being treated as space on Firefox/Chrome


Unicode non-character being treated as space on Firefox/Chrome

Gareth Heyes
Hi all

Not sure if this is a bug or not. A noncharacter (U+FFFE) is being treated as whitespace even though it isn't defined as whitespace. Edge and Safari treat it as an invalid character.

```javascript
// U+FFFE before `alert`, between `alert` and `(1)`, and after `(1)`
eval("\uFFFEalert\uFFFE(1)\uFFFE");
```

In case the characters get mangled:
```javascript
eval("alert"+String.fromCharCode(65534)+"(1)");
```

Cheers
Gareth
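A minimal sketch for checking how a given engine handles the character, assuming an environment where `new Function` is allowed (a Content Security Policy can block it). `parsesAsSeparator` is a hypothetical helper, not part of any API: it asks the parser to compile the `alert<ch>(1)` pattern from the post without running it, so `alert` does not need to exist. The answer is only meaningful for characters that cannot continue an identifier or form a token on their own, which holds for U+FFFE.

```javascript
// Hypothetical helper: does this engine's parser accept `ch` between the
// tokens `alert` and `(1)`? If so, `ch` is being treated as a separator
// (whitespace or a line terminator); otherwise compilation throws SyntaxError.
function parsesAsSeparator(ch) {
  try {
    // new Function only compiles the body here; the function is never called.
    new Function("alert" + ch + "(1)");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // e.g. EvalError when a Content Security Policy forbids new Function
  }
}

const U_FFFE = String.fromCharCode(0xfffe);
console.log("U+FFFE treated as separator:", parsesAsSeparator(U_FFFE));
```

Going by the WhiteSpace definition cited later in the thread, a conforming engine should report `false` here, as Edge and Safari did.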

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: Unicode non-character being treated as space on Firefox/Chrome

Michał Wadas
I believe the Unicode specification makes this undefined behaviour:

In effect, noncharacters can be thought of as application-internal private-use code points. Unlike the private-use characters discussed in Section 16.5, Private-Use Characters, which are assigned characters and which are intended for use in open interchange, subject to interpretation by private agreement, noncharacters are permanently reserved (unassigned) and have no interpretation whatsoever outside of their possible application-internal private uses.

http://www.unicode.org/versions/Unicode6.0.0/ch16.pdf
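For reference, the set of noncharacters is small and fixed: the 32 code points U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE and U+FFFF, U+1FFFE and U+1FFFF, and so on up to U+10FFFE and U+10FFFF), 66 in total. A minimal membership test, using a hypothetical helper name `isNoncharacter`:

```javascript
// Hypothetical helper: true if `cp` is a Unicode noncharacter.
function isNoncharacter(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) return false;
  // The contiguous block inside Arabic Presentation Forms-A.
  if (cp >= 0xfdd0 && cp <= 0xfdef) return true;
  // U+xxFFFE and U+xxFFFF in every plane: the low 16 bits are FFFE or FFFF,
  // which is the same as requiring bits 1..15 to all be set.
  return (cp & 0xfffe) === 0xfffe;
}

// Sanity check: count every noncharacter in the code space.
let count = 0;
for (let cp = 0; cp <= 0x10ffff; cp++) {
  if (isNoncharacter(cp)) count++;
}
console.log(count); // 66
```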




Re: Unicode non-character being treated as space on Firefox/Chrome

Mark S. Miller-2
What is the relevant ECMAScript standards text that would delegate to this? Even if Unicode implies an undefined case, ECMAScript should not. If ECMAScript behavior for such cases is undefined, we should define it.



--
    Cheers,
    --MarkM


Re: Unicode non-character being treated as space on Firefox/Chrome

Gareth Heyes
On 25 May 2017 at 14:04, Mark S. Miller <[hidden email]> wrote:
> What is the relevant ECMAScript standards text that would delegate to this? Even if Unicode implies an undefined case, ECMAScript should not. If ECMAScript behavior for such cases is undefined, we should define it.

Looking at the spec, it seems undefined: 0xFFFE isn't defined as a whitespace character. That is probably why different browsers behave differently.


Re: Unicode non-character being treated as space on Firefox/Chrome

Domenic Denicola

We should probably move this to a GitHub issue then, so ES can have clarity on it.


If it helps, I am pretty sure (although I should double-check) that HTML treats such noncharacters as conformance errors (i.e. external tools like validators will warn you about them), but does not let them impact the processing model; they are passed through as-is.
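The behaviour described above (flag noncharacters as conformance errors, but pass the text through unchanged) can be sketched as a warning-only scan. This is a hypothetical illustration, not code from the HTML specification or from any real validator; `findNoncharacters` and its `{ index, codePoint }` result shape are assumptions.

```javascript
// Hypothetical validator-style pass: report noncharacters without altering the text.
// Returns [{ index, codePoint }], where index is the UTF-16 offset into `text`.
function findNoncharacters(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) { // for...of iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    const isNonchar = (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
    if (isNonchar) hits.push({ index, codePoint: cp });
    index += ch.length; // 1 for BMP characters, 2 for surrogate pairs
  }
  return hits;
}

const input = "a\uFFFEb";
for (const { index, codePoint } of findNoncharacters(input)) {
  console.warn(`noncharacter U+${codePoint.toString(16).toUpperCase()} at offset ${index}`);
}
// `input` itself is untouched and can be processed as-is.
```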



Re: Unicode non-character being treated as space on Firefox/Chrome

Allen Wirfs-Brock
Clause 10.1:

ECMAScript code is expressed using Unicode. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars.
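To make "where permitted by the ECMAScript grammars" concrete: inside a string literal or a comment, U+FFFE is an ordinary source character, so the following should compile in any conforming engine. It is only between tokens, as in the original example, that the character has no grammatical role. A minimal sketch using `eval` on source strings built with `String.fromCharCode`:

```javascript
const U_FFFE = String.fromCharCode(0xfffe);

// Inside a string literal: U+FFFE is just another SourceCharacter.
const str = eval('"' + U_FFFE + '"');
console.log(str.length, str.charCodeAt(0).toString(16)); // 1 fffe

// Inside a comment: also permitted, and simply ignored.
const value = eval("/*" + U_FFFE + "*/ 42");
console.log(value); // 42
```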


https://tc39.github.io/ecma262/#sec-white-space defines exactly which code points are treated as WhiteSpace by the ECMAScript grammar. It does not include unassigned code points, such as the noncharacter U+FFFE, in the set of valid WhiteSpace.
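The production behind that link can be sketched as a predicate on code points, assuming the WhiteSpace table as specified at the time: <TAB> (U+0009), <VT> (U+000B), <FF> (U+000C), <SP> (U+0020), <NBSP> (U+00A0), <ZWNBSP> (U+FEFF), plus <USP>, meaning any other code point in general category Zs (Space_Separator). `isECMAScriptWhiteSpace` is a hypothetical name, and the `\p{Zs}` regular-expression escape assumes an ES2018 or later engine. U+FFFE has general category Cn (unassigned), so it falls outside every branch.

```javascript
// Hypothetical helper: is `cp` in the ECMAScript WhiteSpace set?
// Line terminators (LF, CR, U+2028, U+2029) are a separate production
// and are deliberately not included here.
function isECMAScriptWhiteSpace(cp) {
  switch (cp) {
    case 0x0009: // <TAB>
    case 0x000b: // <VT>
    case 0x000c: // <FF>
    case 0x0020: // <SP>
    case 0x00a0: // <NBSP>
    case 0xfeff: // <ZWNBSP>
      return true;
  }
  // <USP>: any code point with General_Category = Space_Separator (Zs).
  return /^\p{Zs}$/u.test(String.fromCodePoint(cp));
}

console.log(isECMAScriptWhiteSpace(0xfffe)); // false
console.log(isECMAScriptWhiteSpace(0x3000)); // true (IDEOGRAPHIC SPACE, Zs)
```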

Allen




Re: Unicode non-character being treated as space on Firefox/Chrome

Mark S. Miller-2
Allen, I'm very glad to hear that it is unambiguous after all.

Gareth, could you file bugs against the non-conforming browsers? Thanks for finding this!




Re: Unicode non-character being treated as space on Firefox/Chrome

Gareth Heyes
On 25 May 2017 at 17:02, Mark S. Miller <[hidden email]> wrote:
> Allen, I'm very glad to hear that it is unambiguous after all.
> Gareth, could you file bugs against the non-conforming browsers? Thanks for finding this!

Yeah, sure, I'll file the bugs now.
