On 02/01/2017 10:25, Dmitry Soshnikov wrote:
> As mentioned, a cover grammar is usually a product of the grammar design process itself (as in the ES6 spec itself). I'm not aware of automatic transformations for this (if you find any, please let me know).
Cover grammars are an ugly hack that we had to add when there was no other practical choice. Avoid them as much as possible. They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.
When designing the grammar, the preferences are:
- Use standard LR(1) productions
- Use parametrized productions
- Use lookahead restrictions if parametrized productions would introduce too many parameters into rules
- Use a cover grammar if the grammar can't be reasonably expressed in any other way. They're a last resort with lots of problems.
Lookahead restrictions fit very well into an LR(1) engine as long as they're limited to only one token, and that's what I've implemented in the validator. You need to be very careful with them if looking more than one token ahead because lexing of the tokens can vary based on context. For example, if the next few characters in front of the cursor are )/abc/i+, then what is the second token? What is the third token? It's actually context-dependent.
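[The `)/abc/i+` example can be made concrete with a toy lexer. This is a sketch, not the spec's algorithm: only the two goal-symbol names come from the spec, and the token classes are simplified. The parser's choice of lexical goal decides whether the second token is the regexp literal `/abc/i` or the division punctuator `/`.]

```python
import re

def lex(src, goal):
    """Toy lexer for the characters ")/abc/i+".  `goal` stands for the
    lexical goal symbol the parser selects for the token after ")"
    (goal names from the spec; everything else here is simplified)."""
    tokens, i = [], 0
    while i < len(src):
        if src[i] == "/" and goal == "InputElementRegExp":
            m = re.match(r"/[^/]+/[A-Za-z]*", src[i:])   # regexp literal
            tokens.append(m.group(0)); i += m.end()
            goal = "InputElementDiv"   # after a regexp literal, "/" is division
        elif (m := re.match(r"\+\+|[)/+]", src[i:])):    # punctuators, greedy
            tokens.append(m.group(0)); i += m.end()
        else:
            m = re.match(r"[A-Za-z]+", src[i:])
            tokens.append(m.group(0)); i += m.end()
    return tokens

# If the ")" closed e.g. an if-condition, a statement follows; a regexp may start:
print(lex(")/abc/i+", "InputElementRegExp"))  # [')', '/abc/i', '+']
# If the ")" closed a parenthesized expression, "/" is the division operator:
print(lex(")/abc/i+", "InputElementDiv"))     # [')', '/', 'abc', '/', 'i', '+']
```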
The same problem is even worse for cover grammars.
Waldemar
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
|
|
On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>
> Lookahead restrictions fit very well into an LR(1) engine
Again: Great, but how? E.g., do you pre-process the grammar, modify the
construction of the automaton, and/or modify the operation of the parser?
> as long as they're limited to only one token, and that's what I've
> implemented in the validator.
So when the grammar has a prohibited lookahead-sequence with more than one
token (in ExpressionStatement, IterationStatement, and ExportDeclaration),
does the validator just use the first token?
> You need to be very careful with them if looking more than one token
> ahead because lexing of the tokens can vary based on context. For
> example, if the next few characters in front of the cursor are )/abc/i+,
> then what is the second token? What is the third token? It's actually
> context-dependent.
But the context-dependentness of lexing is a parse-time problem, not a
validation-time problem, right? The syntactic level can just assume a stream
of (correctly lexed) input elements. Or do you validate deeper than the
syntactic grammar?
(And it seems to me that multi-token lookahead-restrictions would be hard
for a validator to handle even if lexing *wasn't* context-dependent, but
maybe that depends on how you handle them.)
-Michael
On 17-02-02 06:56 PM, Dmitry Soshnikov wrote:
>
> Just to double-check, by "parametrized productions" you mean the
> `No<lookahead>` extra productions?
See the 'Grammar Notation' section of ES6+, at the paragraph beginning
"A production may be parameterized by a subscripted annotation".
They were introduced in draft 20 of ES6, and *replaced* the use of the
"NoIn" productions of the previous draft (and ES3-5).
(which, strictly speaking, weren't "No<lookahead> productions", because they
didn't express a restriction on lookahead)
-Michael
On 02/02/2017 15:56, Dmitry Soshnikov wrote:
> On Thu, Feb 2, 2017 at 3:23 PM, Waldemar Horwat <[hidden email]> wrote:
>
> On 02/01/2017 10:25, Dmitry Soshnikov wrote:
>
> As mentioned, a cover grammar is usually a product of the grammar design process itself (as in the ES6 spec itself). I'm not aware of automatic transformations for this (if you find any, please let me know).
>
>
> Cover grammars are an ugly hack that we had to add when there was no other practical choice. Avoid them as much as possible. They are only used in situations where lookahead restrictions and parametrized grammar rules do not work in any practical way.
>
>
> Yeah, absolutely agreed. The reason I used a cover grammar in that example, to parse `{` as a `Block` vs. an `ObjectLiteral` and to handle `ExpressionStatement`, is to keep the resulting grammar short and not introduce a bunch of `NoCurly`, etc., extra productions for this. That said, this ugly hack also forces you to do post-processing overhead: either validation of the nodes, or even re-interpretation (rewriting) to other nodes. In my case I have to actually check that all nodes between `{` and `}` are properties, or, vice versa, statements, based on the expression/statement position.
Don't use a cover grammar to unify blocks with object literals. That's a really bad idea, and you'll likely get it wrong. If the `{` starts a block and the next token is a keyword such as `if`, then it's parsed as a keyword. If the `{` starts an object literal and the next token is a keyword, then it's parsed as an IdentifierName. As we extend the language, the expansions of these can later diverge to the point where you won't know whether a `/` starts a regular expression or is a division symbol.
Waldemar
On 02/03/2017 08:17, Michael Dyck wrote:
> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>
>> Lookahead restrictions fit very well into an LR(1) engine
>
> Again: Great, but how? E.g., do you pre-process the grammar, modify the construction of the automaton, and/or modify the operation of the parser?
For each state × token combination, the automaton describes what happens when you're in state S and see a token T. The lookahead restrictions remove possible transitions; without them there would be ambiguities where a given state × token combination would want to do two incompatible things.
That's different from parametrized rules, which simply macro-expand into lots of individual rules.
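[The macro-expansion of parameterized rules can be sketched for one flag. This is a cut-down, illustrative version of the spec's notation, not its algorithm: `[+P]` / `[~P]` guard an alternative, and a trailing `[?P]` on a symbol passes the flag through.]

```python
def expand(lhs, param, alternatives):
    """Macro-expand a production parameterized by one flag into plain
    productions, one for the flag's presence and one for its absence."""
    out = {}
    for present in (False, True):
        name = lhs + ("_" + param if present else "")
        rules = []
        for alt in alternatives:
            body, skip = [], False
            for sym in alt:
                if sym == f"[+{param}]":
                    skip = not present            # keep only when flag present
                elif sym == f"[~{param}]":
                    skip = present                # keep only when flag absent
                elif sym.endswith(f"[?{param}]"):
                    base = sym[: -len(f"[?{param}]")]
                    body.append(base + ("_" + param if present else ""))
                else:
                    body.append(sym)
            if not skip:
                rules.append(body)
        out[name] = rules
    return out

# A cut-down RelationalExpression with the spec's [In] parameter:
grammar = expand("RelationalExpression", "In", [
    ["ShiftExpression"],
    ["RelationalExpression[?In]", "<", "ShiftExpression"],
    ["[+In]", "RelationalExpression[?In]", "in", "ShiftExpression"],
])
print(grammar["RelationalExpression"])     # no "in" alternative
print(grammar["RelationalExpression_In"])  # includes the "in" alternative
```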
>> as long as they're limited to only one token, and that's what I've
>> implemented in the validator.
>
> So when the grammar has a prohibited lookahead-sequence with more than one token (in ExpressionStatement, IterationStatement, and ExportDeclaration), does the validator just use the first token?
My LR(1) validator can't actually handle the case of multiple tokens of lookahead, and I didn't feel like doing an LR(2) grammar. I had to check these by hand.
>> You need to be very careful with them if looking more than one token
>> ahead because lexing of the tokens can vary based on context. For
>> example, if the next few characters in front of the cursor are )/abc/i+,
>> then what is the second token? What is the third token? It's actually
>> context-dependent.
>
> But the context-dependentness of lexing is a parse-time problem, not a validation-time problem, right?
No.
> The syntactic level can just assume a stream of (correctly lexed) input elements.
No! It's impossible to create a stream of correctly lexed input elements without doing syntactic level parsing. See the example I gave above.
> Or do you validate deeper than the syntactic grammar?
Yes. The syntactic grammar controls the lexer. The validator looks for problems such as the syntactic grammar giving the lexer contradictory instructions. An example would be any syntactic grammar automaton state where one outgoing syntactic transition would swallow a regexp token and another outgoing syntactic transition from the same state would swallow a / token. If any such state exists, the grammar is broken.
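[That validator check can be sketched over a made-up automaton; here a state is reduced to just the set of terminals it can shift, and the state names are invented for illustration.]

```python
def check_lexing_consistency(states):
    """Flag automaton states that give the lexer contradictory instructions:
    one outgoing transition shifts a regexp literal while another shifts a
    "/" punctuator.  (Toy model: a state is just its shiftable terminals.)"""
    return [name for name, shiftable in states.items()
            if "RegularExpressionLiteral" in shiftable and "/" in shiftable]

states = {
    "after-expression": {"/", "+", ";"},                    # "/" is division here
    "statement-start": {"RegularExpressionLiteral", "Identifier", "{"},
    "broken": {"RegularExpressionLiteral", "/"},            # contradictory
}
print(check_lexing_consistency(states))  # ['broken']
```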
> (And it seems to me that multi-token lookahead-restrictions would be hard for a validator to handle even if lexing *wasn't* context-dependent, but maybe that depends on how you handle them.)
>
> -Michael
>
On 17-02-03 05:32 PM, Waldemar Horwat wrote:
> On 02/03/2017 08:17, Michael Dyck wrote:
>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>
>>> Lookahead restrictions fit very well into an LR(1) engine
>>
>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>> construction of the automaton, and/or modify the operation of the parser?
>
> For each state × token combination, the automaton describes what happens
> when you're in state S and see a token T. The lookahead restrictions remove
> possible transitions; without them there would be ambiguities where a given
> state × token combination would want to do two incompatible things.
Okay, so do you generate the automaton (ignoring lookahead restrictions) and
then remove transitions (using lookahead restrictions)? Or do you integrate
the lookahead-restrictions into the generation of the automaton?
> That's different from parametrized rules, which simply macro-expand into
> lots of individual rules.
Yup.
>> But the context-dependentness of lexing is a parse-time problem, not a
>> validation-time problem, right?
>
> No.
>
>> The syntactic level can just assume a stream of (correctly lexed) input
>> elements.
(I should have said "*Validation* of the syntactic level [etc]")
> No! It's impossible to create a stream of correctly lexed input elements
> without doing syntactic level parsing.
I quite agree. I didn't mean to suggest otherwise. What I mean is that, once
you've generated the automaton for the syntactic grammar, you can just look
at each state's set of expected terminals, and from that deduce the goal
symbol that the lexer will need to use to get the next token when the parser
is in that state. The point being that you can do that *after* generating
the syntactic automaton. So the context-dependentness of lexing doesn't have
to affect the process of generating the syntactic automaton.
(That's assuming an LR(1)-ish parser, and an approach where you don't try to
combine the syntactic and lexical grammars to generate a single
[scannerless] automaton. Which may not be your approach.)
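[The after-the-fact deduction described above amounts to a small function from a state's expected terminals to a goal symbol. A sketch, simplified to two of the spec's goal symbols (ES also distinguishes goals for template literal parts):]

```python
def lexical_goal(expected_terminals):
    """Pick a lexical goal symbol for a parser state after the syntactic
    automaton has been generated, purely from the terminals the state
    expects.  Assumes the validator has already ruled out states that
    expect both a regexp literal and a "/" punctuator."""
    if "RegularExpressionLiteral" in expected_terminals:
        return "InputElementRegExp"   # a "/" here would start a regexp
    return "InputElementDiv"          # a "/" here is the division operator

print(lexical_goal({"Identifier", "RegularExpressionLiteral", "("}))
print(lexical_goal({"/", "+", ";"}))
```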
> The validator looks for problems such as the syntactic grammar giving the
> lexer contradictory instructions. An example would be any syntactic
> grammar automaton state where one outgoing syntactic transition would
> swallow a regexp token and another outgoing syntactic transition from the
> same state would swallow a / token. If any such state exists, the grammar
> is broken.
Yup, the spec asserts the non-existence of such states ("syntactic grammar
contexts").
-Michael
On 02/04/2017 07:20, Michael Dyck wrote:
> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>> On 02/03/2017 08:17, Michael Dyck wrote:
>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>
>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>
>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>> construction of the automaton, and/or modify the operation of the parser?
>>
>> For each state × token combination, the automaton describes what happens
>> when you're in state S and see a token T. The lookahead restrictions remove
>> possible transitions; without them there would be ambiguities where a given
>> state × token combination would want to do two incompatible things.
>
> Okay, so do you generate the automaton (ignoring lookahead restrictions) and then remove transitions (using lookahead restrictions)? Or do you integrate the lookahead-restrictions into the generation of the automaton?
It's integrated. You can't generate a valid automaton without the lookahead restrictions.
>> That's different from parametrized rules, which simply macro-expand into
>> lots of individual rules.
>
> Yup.
>
>
>>> But the context-dependentness of lexing is a parse-time problem, not a
>>> validation-time problem, right?
>>
>> No.
>>
>>> The syntactic level can just assume a stream of (correctly lexed) input
>>> elements.
>
> (I should have said "*Validation* of the syntactic level [etc]")
>
>> No! It's impossible to create a stream of correctly lexed input elements
>> without doing syntactic level parsing.
>
> I quite agree. I didn't mean to suggest otherwise. What I mean is that, once you've generated the automaton for the syntactic grammar, you can just look at each state's set of expected terminals, and from that deduce the goal symbol that the lexer will need to use to get the next token when the parser is in that state. The point being that you can do that *after* generating the syntactic automaton. So the context-dependentness of lexing doesn't have to affect the process of generating the syntactic automaton.
That's correct.
> (That's assuming an LR(1)-ish parser, and an approach where you don't try to combine the syntactic and lexical grammars to generate a single [scannerless] automaton. Which may not be your approach.)
The parser and lexer stay separate, other than the lexer providing tokens to the parser and the parser selecting one of several top-level lexer goal symbols for lexing the next token. I do not use any kind of unified parser-lexer grammar; that could run into issues such as the syntactic grammar making lexing non-greedy: a+++++b lexing as a ++ + ++ b instead of the correct a ++ ++ + b (which would then generate a syntax error at the syntactic level).
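[The `a+++++b` point is checkable with a toy maximal-munch lexer restricted to identifiers and "+" runs:]

```python
import re

def greedy_lex(src):
    """Maximal-munch lexing of identifier/"+" runs: at each position take
    the longest matching token, as the lexical grammar requires."""
    tokens, i = [], 0
    while i < len(src):
        if src[i:i + 2] == "++":
            tokens.append("++"); i += 2
        elif src[i] == "+":
            tokens.append("+"); i += 1
        else:
            m = re.match(r"[A-Za-z]+", src[i:])
            tokens.append(m.group(0)); i += m.end()
    return tokens

print(greedy_lex("a+++++b"))  # ['a', '++', '++', '+', 'b'], a syntax error.
# A unified parser-lexer grammar could instead "helpfully" find
# ['a', '++', '+', '++', 'b'], which parses: exactly the non-greedy
# lexing warned about above.
```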
>> The validator looks for problems such as the syntactic grammar giving the
>> lexer contradictory instructions. An example would be any syntactic
>> grammar automaton state where one outgoing syntactic transition would
>> swallow a regexp token and another outgoing syntactic transition from the
>> same state would swallow a / token. If any such state exists, the grammar
>> is broken.
>
> Yup, the spec asserts the non-existence of such states ("syntactic grammar contexts").
>
> -Michael
Waldemar
On 17-02-06 07:32 PM, Waldemar Horwat wrote:
> On 02/04/2017 07:20, Michael Dyck wrote:
>> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>>> On 02/03/2017 08:17, Michael Dyck wrote:
>>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>>
>>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>>
>>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>>> construction of the automaton, and/or modify the operation of the parser?
>>>
>>> For each state × token combination, the automaton describes what happens
>>> when you're in state S and see a token T. The lookahead restrictions remove
>>> possible transitions; without them there would be ambiguities where a given
>>> state × token combination would want to do two incompatible things.
>>
>> Okay, so do you generate the automaton (ignoring lookahead restrictions)
>> and then remove transitions (using lookahead restrictions)? Or do you
>> integrate the lookahead-restrictions into the generation of the automaton?
>
> It's integrated. You can't generate a valid automaton without the lookahead
> restrictions.
So, when you're making LR items for a production with a
lookahead-restriction (l-r), do you:
-- treat the l-r as a (nullable) symbol, so that you get an item with the
dot before the l-r, and one with the dot after, or
-- treat the l-r as occurring between two adjacent symbols, so that you get
an item where the dot is "at" the l-r, or
-- something else?
-Michael
Just out of curiosity, what's the best way to handle async arrow
function arguments vs `async` function calls? Is it term rewriting, or
is there a better way of handling that? I'm much more familiar with
recursive-descent and LL parsers than LR or shift-reduce ones, so I'm
not as readily familiar with how to handle these cases.
-----
Isiah Meadows
[hidden email]
On 02/07/2017 06:39, Michael Dyck wrote:
> On 17-02-06 07:32 PM, Waldemar Horwat wrote:
>> On 02/04/2017 07:20, Michael Dyck wrote:
>>> On 17-02-03 05:32 PM, Waldemar Horwat wrote:
>>>> On 02/03/2017 08:17, Michael Dyck wrote:
>>>>> On 17-02-02 06:23 PM, Waldemar Horwat wrote:
>>>>>>
>>>>>> Lookahead restrictions fit very well into an LR(1) engine
>>>>>
>>>>> Again: Great, but how? E.g., do you pre-process the grammar, modify the
>>>>> construction of the automaton, and/or modify the operation of the parser?
>>>>
>>>> For each state × token combination, the automaton describes what happens
>>>> when you're in state S and see a token T. The lookahead restrictions remove
>>>> possible transitions; without them there would be ambiguities where a given
>>>> state × token combination would want to do two incompatible things.
>>>
>>> Okay, so do you generate the automaton (ignoring lookahead restrictions)
>>> and then remove transitions (using lookahead restrictions)? Or do you
>>> integrate the lookahead-restrictions into the generation of the automaton?
>>
>> It's integrated. You can't generate a valid automaton without the lookahead
>> restrictions.
>
> So, when you're making LR items for a production with a lookahead-restriction (l-r), do you:
>
> -- treat the l-r as a (nullable) symbol, so that you get an item with the dot before the l-r, and one with the dot after, or
>
> -- treat the l-r as occurring between two adjacent symbols, so that you get an item where the dot is "at" the l-r, or
>
> -- something else?
It's something else — it's directly tied into the generation of the automaton states. Each automaton state contains a collection of possible places in the expansions of grammar rules that the state can represent. Following a terminal symbol T from a state A leads to a state B. A lookahead restriction prevents state B's collection from including expansions that would have been prohibited by the lookahead restriction on T. If that generates an inconsistency (for example, if there are two ways to get to an identical state, one with a lookahead restriction and one without), the grammar validation fails.
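[That transition-filtering step can be sketched in a toy item model. This is a deliberate simplification of the description above: it shows only how a restriction kills items when following a symbol out of a state, not the closure computation, and the encoding of restrictions as inline markers is invented for illustration.]

```python
# Toy model: an item is (production, dot); a production is a tuple of
# symbols, where a ("la-not", frozenset) marker stands between two grammar
# symbols, as in the spec's "[lookahead ∉ {`{`, `function`}]".

def goto(items, symbol):
    """Successor item set for `symbol`, honoring lookahead restrictions:
    an item whose dot sits at a restriction prohibiting `symbol`
    contributes nothing to the successor state."""
    succ = []
    for prod, dot in items:
        if (dot < len(prod) and isinstance(prod[dot], tuple)
                and prod[dot][0] == "la-not"):
            if symbol in prod[dot][1]:
                continue          # restriction kills this item for `symbol`
            dot += 1              # restriction satisfied: step past the marker
        if dot < len(prod) and prod[dot] == symbol:
            succ.append((prod, dot + 1))
    return succ

# ExpressionStatement : [lookahead ∉ {"{", "function"}] Expression ";"
expr_stmt = (("la-not", frozenset({"{", "function"})), "Expression", ";")
# Block : "{" StatementList "}"
block = ("{", "StatementList", "}")

items = [(expr_stmt, 0), (block, 0)]
print(goto(items, "{"))           # only the Block item survives
print(goto(items, "Expression"))  # only the ExpressionStatement item advances
```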
Waldemar
On 17-02-07 06:02 PM, Waldemar Horwat wrote:
>
> Lookahead restrictions fit very well into an LR(1) engine [...]
> it's directly tied into the generation of the automaton states. Each
> automaton state contains a collection of possible places in the
> expansions of grammar rules that the state can represent. Following a
> terminal symbol T from a state A leads to a state B. A lookahead
> restriction prevents state B's collection from including expansions that
> would have been prohibited by the lookahead restriction on T. If that
> generates an inconsistency (for example, if there are two ways to get to
> an identical state, one with a lookahead restriction and one without),
> the grammar validation fails.
So that sounds like you form the closure of state A in the standard way, and
then use the lookahead-restriction to limit the transitions out of A (and
the items that A contributes to the kernel of B). But it seems to me that
forming the closure is also affected by the lookahead-restriction. Is that
true in your validator?
-Michael