Machine readable specifications

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Machine readable specifications

gaz Heyes
Hi all

I'd like to discuss a radical change on how JS specifications and others are constructed. I suggest int based rules are used to define language behaviour. I know this works with last state and next state tracking and expected states too maybe other behaviour could be defined this way as well. The first part of the specification should define a list of ints that correspond to the state starting from 0 with 0 being a start state of Nothing. The initial states should be specified as follows:

initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement", 3:"FunctionStatementIdentifier" ...

Once every state is defined as an int, each state can be used in a lookup table to determine the allowed last state and/or expected states. The lookup table is used for error messages by the parser to convert the corresponding int to human readable form. To define that FunctionStatement can follow Nothing but FunctionExpression can't follow nothing and FunctionStatementIdentifier is expected after FunctionStatement.

rules = {
        //FunctionStatement State
         2:    {
                   lastStates: {
                                          //FunctionStatement is allowed after last state "Nothing"
                                          0: true
                                    },
                   expectedStates: {
                                          //FunctionStatement expects the next state to be FunctionStatementIdentifier
                                          3:true
                                   }
                }
};

"lastStates" and "expectedStates" would also be ints but I added the text for clarity, true could be shortened to 1 for compression. To use these rules the parser can simply check the state machine which was the last state and if the next state is valid. E.g.
lastState = 0;//Last state was Nothing
state = 2;//FunctionStatement state
if(rules[state].lastStates[lastState]) {
  // FunctionStatement is allowed to follow Nothing
}

lastState = 0;//Last state was Nothing
state = 1;//FunctionExpression state
if(!rules[state].lastStates[lastState]) {
   //FunctionExpression is not allowed to follow Nothing
   error();
}

The same technique could be used for expected states since you'd just need to lookup the next state with the assigned expected state from the last state.
Once a machine readable specification has been done we could then generate a human readable form of it based on the rules and it could be checked that it conforms to what was intended. Both specifications (human readable and machine readable) could be used by the implementer, when a specification changes the parsers could automatically update based on the machine readable form. Emulation for older parsers could be added within the browser itself by using the new specification rules in a parser shim.

Thoughts?

Cheers

Gareth

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Alex Russell-4
I expect that what you'll hear from implementers is that parsing isn't the hard bit of a modern JS engine -- it's certainly not the thorniest part of Traceur, and it doesn't do _most_ of the work a JIT-ing engine would.

If you would like to concretely improve the situation, you might ask Allen and folks from engine teams what would simplify their jobs before proposing a solution.

On Friday, March 22, 2013, gaz Heyes wrote:
Hi all

I'd like to discuss a radical change on how JS specifications and others are constructed. I suggest int based rules are used to define language behaviour. I know this works with last state and next state tracking and expected states too maybe other behaviour could be defined this way as well. The first part of the specification should define a list of ints that correspond to the state starting from 0 with 0 being a start state of Nothing. The initial states should be specified as follows:

initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement", 3:"FunctionStatementIdentifier" ...

Once every state is defined as an int, each state can be used in a lookup table to determine the allowed last state and/or expected states. The lookup table is used for error messages by the parser to convert the corresponding int to human readable form. To define that FunctionStatement can follow Nothing but FunctionExpression can't follow nothing and FunctionStatementIdentifier is expected after FunctionStatement.

rules = {
        //FunctionStatement State
         2:    {
                   lastStates: {
                                          //FunctionStatement is allowed after last state "Nothing"
                                          0: true
                                    },
                   expectedStates: {
                                          //FunctionStatement expects the next state to be FunctionStatementIdentifier
                                          3:true
                                   }
                }
};

"lastStates" and "expectedStates" would also be ints but I added the text for clarity, true could be shortened to 1 for compression. To use these rules the parser can simply check the state machine which was the last state and if the next state is valid. E.g.
lastState = 0;//Last state was Nothing
state = 2;//FunctionStatement state
if(rules[state].lastStates[lastState]) {
  // FunctionStatement is allowed to follow Nothing
}

lastState = 0;//Last state was Nothing
state = 1;//FunctionExpression state
if(!rules[state].lastStates[lastState]) {
   //FunctionExpression is not allowed to follow Nothing
   error();
}

The same technique could be used for expected states since you'd just need to lookup the next state with the assigned expected state from the last state.
Once a machine readable specification has been done we could then generate a human readable form of it based on the rules and it could be checked that it conforms to what was intended. Both specifications (human readable and machine readable) could be used by the implementer, when a specification changes the parsers could automatically update based on the machine readable form. Emulation for older parsers could be added within the browser itself by using the new specification rules in a parser shim.

Thoughts?

Cheers

Gareth

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Andreas Rossberg-4
In reply to this post by gaz Heyes
On 22 March 2013 12:51, gaz Heyes <[hidden email]> wrote:

> I'd like to discuss a radical change on how JS specifications and others are
> constructed. I suggest int based rules are used to define language
> behaviour. I know this works with last state and next state tracking and
> expected states too maybe other behaviour could be defined this way as well.
> The first part of the specification should define a list of ints that
> correspond to the state starting from 0 with 0 being a start state of
> Nothing. The initial states should be specified as follows:
>
> initalStates = {0:"Nothing",1:"FunctionExpression", 2:"FunctionStatement",
> 3:"FunctionStatementIdentifier" ...

Aren't you confusing "machine readable specification" with "machine
readable syntax specification"? Syntax is only a small part of a
language spec, and quite frankly, the least interesting one by far.
Also, I don't see the benefit of your format. The EcmaScript spec
actually goes to quite some length to give a grammar that is LR(1).
The whole point is to keep it de facto "machine readable". But unlike
yours, it also is human readable.

(FWIW, there are at least two serious efforts of actually formalising
and mechanising the JavaScript language, i.e. its _semantics_, which
is a much more interesting endeavor. Techniques for formalised
language specs have made a great leap over the last decade, and I
sincerely hope we'll see them becoming mainstream at some point. But
as usual, the mainstream is very conservative, probably for valid
reasons.)

/Andreas
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Wes Garland
In reply to this post by Alex Russell-4
The specification is hard enough to digest for average developers now;
I'd hate to point one of my guys at a machine-readable document when
he's having trouble with some corner-case.
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

gaz Heyes
In reply to this post by Andreas Rossberg-4
On 22 March 2013 12:33, Andreas Rossberg <[hidden email]> wrote:
Aren't you confusing "machine readable specification" with "machine
readable syntax specification"? Syntax is only a small part of a
language spec, and quite frankly, the least interesting one by far.
Also, I don't see the benefit of your format. The EcmaScript spec
actually goes to quite some length to give a grammar that is LR(1).
The whole point is to keep it de facto "machine readable". But unlike
yours, it also is human readable.

That was the reason to discuss, I think it could be applied to more than just syntax rules. You don't like it and that's fine with me I just wanted to post my ideas.

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

gaz Heyes
In reply to this post by Wes Garland
On 22 March 2013 12:36, Wes Garland <[hidden email]> wrote:
The specification is hard enough to digest for average developers now;
I'd hate to point one of my guys at a machine-readable document when
he's having trouble with some corner-case.

You wouldn't do that, you'd point them to the human readable form. The machine version is specifically designed to be read by machines with low overhead to update functionality without touching code.

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Wes Garland
On 22 March 2013 08:41, gaz Heyes <[hidden email]> wrote:
> You wouldn't do that, you'd point them to the human readable form. The
> machine version is specifically designed to be read by machines with low
> overhead to update functionality without touching code.

Is there a formalized way to translate between the human and
machine-readable specifications?  If they are found to have different
meanings (not hard to imagine), which specification would be
considered authoritative?

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

gaz Heyes
On 22 March 2013 12:57, Wes Garland <[hidden email]> wrote:
Is there a formalized way to translate between the human and
machine-readable specifications?  If they are found to have different
meanings (not hard to imagine), which specification would be
considered authoritative?

The human readable form would be generated from the machine readable form. The human form should be a descriptive representation of the actual rules defined by the machine readable form. There should be no confusion only human error in understanding the descriptive text. The machine readable form should be authoritative since the idea is to implement whatever code based on the rules from the machine readable specification.

To translate what I wrote in the initial mail into human form would be something like:

Function Statement can follow Nothing and then expects Function Statement Identifier to follow.
and
Function Expression cannot follow Nothing.


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Andreas Rossberg-4
In reply to this post by gaz Heyes
On 22 March 2013 13:37, gaz Heyes <[hidden email]> wrote:

> On 22 March 2013 12:33, Andreas Rossberg <[hidden email]> wrote:
>>
>> Aren't you confusing "machine readable specification" with "machine
>> readable syntax specification"? Syntax is only a small part of a
>> language spec, and quite frankly, the least interesting one by far.
>> Also, I don't see the benefit of your format. The EcmaScript spec
>> actually goes to quite some length to give a grammar that is LR(1).
>> The whole point is to keep it de facto "machine readable". But unlike
>> yours, it also is human readable.
>
> That was the reason to discuss, I think it could be applied to more than
> just syntax rules. You don't like it and that's fine with me I just wanted
> to post my ideas.

You may be underestimating the challenge ;). If you are aiming for
more than just syntax then you certainly wouldn't want a low-level
encoding like the one you suggest. The standard approach to
formalising a language is through a structured operational semantics
over an abstract syntax, and the standard approach to making that
accessible to a machine is encoding it in an established theorem
prover or language like Coq or Isabelle or Agda or Twelf.

/Andreas
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Michael Dyck
In reply to this post by Andreas Rossberg-4
Andreas Rossberg wrote:
>
> FWIW, there are at least two serious efforts of actually formalising
> and mechanising the JavaScript language, i.e. its _semantics_, which
> is a much more interesting endeavor.

Could you give (or point me to) details of these projects?

I've done some work along these lines (though I'm not sure it qualifies
as "serious"), so I'm wondering if I'm just reinventing the work of
others.

-Michael


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Sam Tobin-Hochstadt
On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <[hidden email]> wrote:

> Andreas Rossberg wrote:
>>
>>
>> FWIW, there are at least two serious efforts of actually formalising
>> and mechanising the JavaScript language, i.e. its _semantics_, which
>> is a much more interesting endeavor.
>
>
> Could you give (or point me to) details of these projects?
>
> I've done some work along these lines (though I'm not sure it qualifies
> as "serious"), so I'm wondering if I'm just reinventing the work of
> others.

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: http://www.jswebtools.org/
Another, unnamed semantics, by Ankur Taly and others: http://jssec.net/

S5, which covers (almost) all of ES5, is still under development, and
passes a large portion of Test262.

Sam
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Mark S. Miller-2
[+arjun, +joe, +ankur, +philippa, +gareth]


On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <[hidden email]> wrote:
On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <[hidden email]> wrote:
> Andreas Rossberg wrote:
>>
>>
>> FWIW, there are at least two serious efforts of actually formalising
>> and mechanising the JavaScript language, i.e. its _semantics_, which
>> is a much more interesting endeavor.
>
>
> Could you give (or point me to) details of these projects?
>
> I've done some work along these lines (though I'm not sure it qualifies
> as "serious"), so I'm wondering if I'm just reinventing the work of
> others.

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: http://www.jswebtools.org/
Another, unnamed semantics, by Ankur Taly and others: http://jssec.net/


Both semantics have been used as the basis for further work by other groups, such as the JSCert group you link to above. (Btw, I am in London right now in order to meet with them this Monday and Tuesday. I'll also be doing a presentation on Dr. SES at Imperial on Monday.)

 


S5, which covers (almost) all of ES5, is still under development, and
passes a large portion of Test262.

Sam
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss



--
    Cheers,
    --MarkM

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Machine readable specifications

Mark S. Miller-2



On Sat, Mar 23, 2013 at 3:36 PM, Mark S. Miller <[hidden email]> wrote:
[+arjun, +joe, +ankur, +philippa, +gareth]


On Fri, Mar 22, 2013 at 11:34 PM, Sam Tobin-Hochstadt <[hidden email]> wrote:
On Fri, Mar 22, 2013 at 6:15 PM, Michael Dyck <[hidden email]> wrote:
> Andreas Rossberg wrote:
>>
>>
>> FWIW, there are at least two serious efforts of actually formalising
>> and mechanising the JavaScript language, i.e. its _semantics_, which
>> is a much more interesting endeavor.
>
>
> Could you give (or point me to) details of these projects?
>
> I've done some work along these lines (though I'm not sure it qualifies
> as "serious"), so I'm wondering if I'm just reinventing the work of
> others.

The two projects Andreas is referring to are:

S5 and LambdaJS, by Arjun Guha and others: http://www.jswebtools.org/
Another, unnamed semantics, by Ankur Taly and others: http://jssec.net/


Both semantics have been used as the basis for further work by other groups, such as the JSCert group you link to above.

My mistake. You didn't link directly to JSCert, but rather the page you did link to does. JSCert is at http://jscert.org/

 
(Btw, I am in London right now in order to meet with them this Monday and Tuesday. I'll also be doing a presentation on Dr. SES at Imperial on Monday.)

 


S5, which covers (almost) all of ES5, is still under development, and
passes a large portion of Test262.

Sam
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss



--
    Cheers,
    --MarkM



--
    Cheers,
    --MarkM

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss