How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

csf178@gmail.com
Hi everyone,

I noticed that both "IdentifierName" and "Identifier" appear in the
syntactic grammar of ES5. It seems a lexer cannot decide whether a
token is an "IdentifierName" or an "Identifier" during the lexing
phase.

A similar problem is "get". "get" is not a keyword, but it is used like
a keyword in object literals. It can also be used as an "IdentifierName"
or an "Identifier".

To solve these issues, I guess we could treat "IdentifierName" as a
syntactic symbol instead of a token type, using the following
production:

IdentifierName ::
    Identifier
    Keyword
    FutureReservedWord
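
A minimal JavaScript sketch of this idea. It assumes the lexer emits a single token kind (called `IdentifierName` here) for every word, and that the parser classifies the text afterwards. The function names `lexWords` and `classify` are hypothetical, and the toy lexer ignores strings, numbers, comments and Unicode escapes.

```javascript
// ES5 Keyword and (non-strict) FutureReservedWord lists.
const KEYWORDS = new Set([
  "break", "case", "catch", "continue", "debugger", "default", "delete",
  "do", "else", "finally", "for", "function", "if", "in", "instanceof",
  "new", "return", "switch", "this", "throw", "try", "typeof", "var",
  "void", "while", "with",
]);
const FUTURE_RESERVED = new Set([
  "class", "const", "enum", "export", "extends", "import", "super",
]);

// Hypothetical lexer: every word becomes one IdentifierName token; any other
// non-space character becomes a one-character Punctuator.
function lexWords(source) {
  const tokens = [];
  const re = /\s*([A-Za-z_$][\w$]*|\S)/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    const text = m[1];
    const type = /^[A-Za-z_$]/.test(text) ? "IdentifierName" : "Punctuator";
    tokens.push({ type, text });
  }
  return tokens;
}

// Parser-side reading of
//   IdentifierName :: Identifier | Keyword | FutureReservedWord
// (ES5's ReservedWord also covers null, true and false.)
function classify(token) {
  if (token.type !== "IdentifierName") return token.type;
  if (KEYWORDS.has(token.text)) return "Keyword";
  if (FUTURE_RESERVED.has(token.text)) return "FutureReservedWord";
  if (token.text === "null" || token.text === "true" || token.text === "false") {
    return "Literal";
  }
  return "Identifier";
}

// "if" after "." is still an IdentifierName token; only the parser decides
// whether a keyword is acceptable in that position.
console.log(lexWords("o.if = get;").map(classify));
// Identifier, Punctuator, Keyword, Punctuator, Identifier, Punctuator
```

The design choice here is that the lexer never needs context: the Identifier/Keyword split becomes a property the parser checks per position.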


/Shaofei Cheng
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Brendan Eich-3
SpiderMonkey, at least, uses feed-forward from parser to lexer, a
one-bit mode flag:

http://mxr.mozilla.org/mozilla-central/search?string=TSF_KEYWORD_IS_NAME
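
The general feed-forward technique can be sketched as follows: the parser sets a one-bit flag on the lexer before requesting a token in a position where any IdentifierName is allowed (for example right after `.`), and the lexer then returns reserved words as plain names. This is not SpiderMonkey's implementation; the `Lexer` class, the `keywordIsName` flag and the abbreviated keyword set are hypothetical.

```javascript
// Abbreviated keyword list for the sketch.
const KEYWORDS = new Set(["function", "if", "in", "new", "return", "this", "var"]);

// Hypothetical lexer with a one-bit mode flag fed forward from the parser.
class Lexer {
  constructor(source) {
    this.source = source;
    this.pos = 0;
    this.keywordIsName = false; // when true, reserved words lex as names
  }

  next() {
    const m = /^\s*([A-Za-z_$][\w$]*|\S)?/.exec(this.source.slice(this.pos));
    this.pos += m[0].length;
    const text = m[1];
    if (text === undefined) return { type: "EOF" };
    if (!/^[A-Za-z_$]/.test(text)) return { type: "Punctuator", text };
    const isKeyword = KEYWORDS.has(text) && !this.keywordIsName;
    return { type: isKeyword ? "Keyword" : "Name", text };
  }
}

// Parser fragment for a chain like a.b.c: the parser knows that any
// IdentifierName may follow ".", so it sets the flag for exactly one token.
// (For brevity, the token that ends the chain is consumed and dropped.)
function parseMemberChain(lexer) {
  const parts = [lexer.next().text];
  for (let tok = lexer.next(); tok.type === "Punctuator" && tok.text === "."; tok = lexer.next()) {
    lexer.keywordIsName = true;
    const name = lexer.next();
    lexer.keywordIsName = false;
    if (name.type !== "Name") throw new SyntaxError("expected a name after '.'");
    parts.push(name.text);
  }
  return parts;
}

console.log(parseMemberChain(new Lexer("a.if.new"))); // [ 'a', 'if', 'new' ]
console.log(new Lexer("if").next().type);             // Keyword
```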

/be


Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Allen Wirfs-Brock
In reply to this post by csf178@gmail.com
If I were going to move something to the syntactic grammar, it would probably be the current definition of Identifier:

Identifier :
    IdentifierName but not ReservedWord

You might then lex all IdentifierNames (including ReservedWords) as IdentifierName tokens, and treat every occurrence of a keyword terminal in the syntactic grammar as shorthand for "an IdentifierName matching this specific keyword". For example:

PropertyAssignment :
   get PropertyName ( ) { FunctionBody }

could be interpreted as:

PropertyAssignment :
   IdentifierName PropertyName ( ) { FunctionBody }

with the static semantic restriction that the text of IdentifierName must be "get"
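
Under this reading the keyword check moves out of the lexer and into a static semantic rule. A minimal sketch of that rule, assuming a hand-written parser whose tokens look like `{ type, text }`; the AST node fields (`prefix`, `name`, `body`) and the function name are hypothetical:

```javascript
// Hypothetical AST node produced by the generic production
//   PropertyAssignment : IdentifierName PropertyName ( ) { FunctionBody }
// The grammar accepts any word in the first slot; this static semantic
// check then requires that word's text to be exactly "get".
function checkGetterPrefix(node) {
  const prefix = node.prefix;
  if (prefix.type !== "IdentifierName" || prefix.text !== "get") {
    throw new SyntaxError(`expected "get", found "${prefix.text}"`);
  }
  return node;
}

// Reserved words are ordinary IdentifierName tokens here, so a getter
// named "if" passes as well.
const ok = {
  prefix: { type: "IdentifierName", text: "get" },
  name: { type: "IdentifierName", text: "if" },
  body: [],
};
const bad = { ...ok, prefix: { type: "IdentifierName", text: "got" } };

checkGetterPrefix(ok); // returns the node unchanged
try {
  checkGetterPrefix(bad);
} catch (e) {
  console.log(e.message); // expected "get", found "got"
}
```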


Allen



Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

csf178@gmail.com
Although it's been a while since this discussion, I tried Allen's idea in my parser and still found a conflict.

Consider the following rules:

PropertyAssignment :
    IdentifierName PropertyName ( ) { FunctionBody }

PropertyAssignment :
    PropertyName : AssignmentExpression

PropertyName :
    IdentifierName

When the parser gets an "IdentifierName", it needs to decide whether to reduce it to "get" or to "PropertyName". For LR parsers there is no way to make this decision.

I would suggest another way:

IdentifierName ::
    FutureReservedWord
    Keyword
    Identifier
    SpecialWord

SpecialWord ::
    get
    set
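
A sketch of how this proposal might look in code, with abbreviated word lists and hypothetical names. One assumption not stated in the proposal: because "get" and "set" remain usable as variable names, positions that require an Identifier would also have to accept a SpecialWord token.

```javascript
// Abbreviated lists; a real lexer would use the full ES5 tables.
const SPECIAL_WORDS = new Set(["get", "set"]);
const KEYWORDS = new Set(["function", "if", "in", "new", "return", "this", "var"]);
const FUTURE_RESERVED = new Set(["class", "const", "enum", "export", "extends", "import", "super"]);

// Lexer side: each word gets exactly one of the four kinds.
function wordKind(text) {
  if (SPECIAL_WORDS.has(text)) return "SpecialWord";
  if (KEYWORDS.has(text)) return "Keyword";
  if (FUTURE_RESERVED.has(text)) return "FutureReservedWord";
  return "Identifier";
}

// IdentifierName :: FutureReservedWord | Keyword | Identifier | SpecialWord
const IDENTIFIER_NAME_KINDS = new Set(["FutureReservedWord", "Keyword", "Identifier", "SpecialWord"]);
function isIdentifierName(kind) {
  return IDENTIFIER_NAME_KINDS.has(kind);
}

// Assumption: where the grammar needs an Identifier (e.g. `var get;`),
// the parser must accept SpecialWord as well.
function isIdentifierLike(kind) {
  return kind === "Identifier" || kind === "SpecialWord";
}

console.log(["get", "if", "class", "foo"].map(wordKind));
// SpecialWord, Keyword, FutureReservedWord, Identifier
```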

2012/5/3 Allen Wirfs-Brock <[hidden email]>

Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Michael Dyck
程劭非 (Shaofei Cheng) wrote:

> Though it's a little too long since this discussion, I've tried Allen's
> idea in my parser and still find conflicting.
>
> Consider the following rules:
>
> PropertyAssignment :
>     IdentifierName PropertyName ( ) { FunctionBody }
>
> PropertyAssignment :
>     PropertyName : AssignmentExpression
>
> PropertyName :
>     IdentifierName
>
> when a parser get “IdentifierName” it need to decide reduce
> the IdentifierName into "get" or “PropertyName”.

Strictly speaking, it wouldn't reduce IdentifierName to "get", because
there isn't a production
     "get" : IdentifierName
Instead, it's a shift-reduce conflict.

> For LR parsers there is no way to do these things.

There's no way only if you're talking about an LR(0) parser. But an LR(1)
parser would have 1 token of lookahead to resolve the conflict:
  -- if the next token is ":", reduce IdentifierName to PropertyName;
  -- if the next token is an IdentifierName, shift that.
  -- if the next token is anything else, syntax error.
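
These three cases amount to one row of an LR(1) action table: the state reached right after shifting an IdentifierName at the start of a PropertyAssignment. A sketch of that row as a function of the lookahead token, limited (as stated) to the productions quoted above; the function name and the `{ type, text }` token shape are assumptions:

```javascript
// State: an IdentifierName has just been shifted at the start of a
// PropertyAssignment. One token of lookahead selects the action.
function actionAfterIdentifierName(lookahead) {
  if (lookahead.type === "Punctuator" && lookahead.text === ":") {
    return "reduce"; // IdentifierName -> PropertyName, then expect ":"
  }
  if (lookahead.type === "IdentifierName") {
    return "shift"; // the first word was an accessor prefix such as "get"
  }
  return "error";
}

console.log(actionAfterIdentifierName({ type: "Punctuator", text: ":" }));     // reduce
console.log(actionAfterIdentifierName({ type: "IdentifierName", text: "x" })); // shift
console.log(actionAfterIdentifierName({ type: "Punctuator", text: "=" }));     // error
```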

Similarly, an LL(1) parser wouldn't be able to decide between the two
alternatives for PropertyAssignment, but an LL(2) parser could.

(All of this is ignoring the effects of other productions.)
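
The LL(2) remark can be sketched as a recursive-descent decision that peeks at two tokens before committing to one of the two PropertyAssignment alternatives. Both alternatives begin with an IdentifierName, so one token cannot separate them. The function name and `{ type, text }` token shape are hypothetical, and other productions are ignored:

```javascript
// Pick a PropertyAssignment alternative by peeking at tokens[pos] and
// tokens[pos + 1] without consuming them.
function choosePropertyAssignment(tokens, pos) {
  const first = tokens[pos];
  const second = tokens[pos + 1];
  if (!first || first.type !== "IdentifierName" || !second) {
    throw new SyntaxError("expected a property assignment");
  }
  if (second.type === "Punctuator" && second.text === ":") {
    return "PropertyName : AssignmentExpression";
  }
  if (second.type === "IdentifierName") {
    return "IdentifierName PropertyName ( ) { FunctionBody }";
  }
  throw new SyntaxError(`unexpected "${second.text}" after "${first.text}"`);
}

const getter = [
  { type: "IdentifierName", text: "get" },
  { type: "IdentifierName", text: "x" },
  { type: "Punctuator", text: "(" },
];
console.log(choosePropertyAssignment(getter, 0));
// IdentifierName PropertyName ( ) { FunctionBody }
```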

-Michael



Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Brendan Eich-3
Michael Dyck wrote:
> There's no way only if you're talking about an LR(0) parser. But an
> LR(1) parser would have 1 token of lookahead to resolve the conflict:
>  -- if the next token is ":", reduce IdentifierName to PropertyName;
>  -- if the next token is an IdentifierName, shift that.
>  -- if the next token is anything else, syntax error.

Don't forget method definition shorthand syntax, if the next token is
"(". Then the method is named "get", of course.

/be
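
With ES6 method shorthand there are three readings of a member that starts with the word `get`, and the token after it decides among them. A sketch of that decision (hypothetical function name, `{ type, text }` tokens, other productions ignored), followed by the three forms as a current engine accepts them:

```javascript
// Decide what a member beginning with the word "get" means from the token
// that follows it.
function classifyGetMember(next) {
  if (next.type === "Punctuator" && next.text === ":") return "data property named get";
  if (next.type === "Punctuator" && next.text === "(") return "method named get";
  if (next.type === "IdentifierName") return "getter";
  return "syntax error";
}

console.log(classifyGetMember({ type: "Punctuator", text: "(" })); // method named get

// The same three cases in source form:
const data = { get: 1 };                    // data property named "get"
const method = { get() { return 2; } };     // method named "get"
const accessor = { get x() { return 3; } }; // getter for "x"
console.log(data.get, method.get(), accessor.x); // 1 2 3
```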

Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Michael Dyck
Brendan Eich wrote:
>
> Don't forget method definition shorthand syntax, if the next token is
> "(". Then the method is named "get", of course.

Yup. Like I said, I was ignoring the effect of other productions.
(That's what the OP appeared to be doing too.)

-Michael

Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

gaz Heyes
On 4 February 2013 21:56, Michael Dyck <[hidden email]> wrote:
> Brendan Eich wrote:
> >
> > Don't forget method definition shorthand syntax, if the next token is "(". Then the method is named "get", of course.

I find the syntax of set/get confusing:
({'get'x(){return 123;}}).x


Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Brendan Eich-3
gaz Heyes wrote:
> I find the syntax of set/get confusing:

What's confusing?

> ({'get'x(){return 123;}}).x

That's not legal ES5.

/be

Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

gaz Heyes
On 4 February 2013 23:44, Brendan Eich <[hidden email]> wrote:
> What's confusing?

The fact that you can have an object property without a colon and a function without a function keyword. Then a property descriptor uses a completely new syntax to define the same thing. Why?
Object.defineProperty(window,'x',{set:alert});
x=1;

To me this seems hacked together.

> > ({'get'x(){return 123;}}).x
>
> That's not legal ES5.

Some engines support it though and I'm pretty sure Firefox did at some point.


Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Rick Waldron



On Tue, Feb 5, 2013 at 3:19 AM, gaz Heyes <[hidden email]> wrote:
> On 4 February 2013 23:44, Brendan Eich <[hidden email]> wrote:
> > What's confusing?
>
> The fact that you can have an object property without a colon and a function without a function keyword.

ES6 concise methods will make this the norm:

let o = {
  meaning() {
    return 42;
  }
};

o.meaning(); // 42

 
> Then a property descriptor uses a completely new syntax to define the same thing. Why?
> Object.defineProperty(window,'x',{set:alert});
> x=1;


What part is "new syntax"? Property descriptors are just object literal syntax—did you mean "different syntax"?
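
To make the comparison concrete, here is a small sketch (using a plain object rather than `window`) showing that an accessor written in a literal and one installed with `Object.defineProperty` end up as the same kind of property. `Object.defineProperty` defaults `enumerable` and `configurable` to `false`, so the sketch sets them explicitly to match the literal; the property name `last` is only for illustration.

```javascript
// Accessor defined with literal syntax.
const viaLiteral = { set x(v) { this.last = v; } };

// The same accessor defined through a property descriptor, which is itself
// an ordinary object literal.
const viaDescriptor = {};
Object.defineProperty(viaDescriptor, "x", {
  set(v) { this.last = v; },
  enumerable: true,   // defineProperty would default this to false
  configurable: true, // likewise
});

viaLiteral.x = 1;
viaDescriptor.x = 1;
console.log(viaLiteral.last, viaDescriptor.last); // 1 1

// Both report an accessor descriptor with a setter and no getter.
const d1 = Object.getOwnPropertyDescriptor(viaLiteral, "x");
const d2 = Object.getOwnPropertyDescriptor(viaDescriptor, "x");
console.log(typeof d1.set, d1.get, typeof d2.set, d2.get); // function undefined function undefined
```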

 

> To me this seems hacked together.
>
> > > ({'get'x(){return 123;}}).x
> >
> > That's not legal ES5.
>
> Some engines support it though and I'm pretty sure Firefox did at some point.

I think Brendan was referring to the quotes, i.e., 'get'. Remove them for legal syntax:

({ get x() { return 123; } }).x 
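
One way to check which form an engine accepts is to compile each through the `Function` constructor, which throws a `SyntaxError` for source that does not parse, without running the body. A sketch against a current engine; the thread suggests some older engines were more permissive. The helper name `parses` is hypothetical.

```javascript
// Returns true if `source` parses as a function body, false on SyntaxError.
function parses(source) {
  try {
    new Function(source); // compiles the body but does not run it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("return ({ get x() { return 123; } }).x;")); // true
console.log(parses("return ({'get'x(){return 123;}}).x;"));     // false

// The legal form evaluates to the getter's result.
console.log(({ get x() { return 123; } }).x); // 123
```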


Rick



Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Brandon Benvie
Indeed, and once ES6 is in use, I expect code like this to be fairly common (I think it's supposed to be Object.define, right?):

Object.define(x, {
  get a(){},
  set a(v){},
  get b(){},
  c(){}
});

Instead of most current descriptor stuff (since enumerability and configurability are rarely desired to be false).
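
`Object.define` and `Object.mixin` were only proposals under discussion here, so the following is a hypothetical `define` helper sketching the semantics described: copy each own property's descriptor, so accessors are transferred rather than invoked, and properties written in a literal stay enumerable and configurable. It relies on `Object.getOwnPropertyDescriptors`, which was standardized later (ES2017). The backing field `_a` is illustrative.

```javascript
// Hypothetical helper: copy own properties from `source` to `target`,
// preserving getters/setters instead of calling them.
function define(target, source) {
  return Object.defineProperties(target, Object.getOwnPropertyDescriptors(source));
}

const x = {};
define(x, {
  get a() { return this._a; },
  set a(v) { this._a = v; },
  get b() { return "b"; },
  c() { return "c"; },
});

x.a = 5;
console.log(x.a, x.b, x.c()); // 5 b c
console.log(Object.getOwnPropertyDescriptor(x, "b").enumerable); // true
```

A plain value copy such as `Object.assign` would instead call the getter `a` once and store its result on the target as a data property.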

On Tuesday, February 5, 2013, Rick Waldron wrote:




Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Rick Waldron



On Tue, Feb 5, 2013 at 11:55 AM, Brandon Benvie <[hidden email]> wrote:
> Indeed, and given use of ES6, I expect things like this wouldn't be very uncommon (I think is supposed to be Object.define right?):

Nothing there yet, though I suspect Object.mixin() will have more traction.


Rick
 


Re: How can a lexer decide a token to be "get", "IdentifierName" or "Identifier" ?

Allen Wirfs-Brock
Right, I think "mixin" is winning over "define" as the name. Same semantics in either case.

Allen

On Feb 5, 2013, at 9:03 AM, Rick Waldron wrote:



