SML vs Ocaml for ECMA script spec.

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

SML vs Ocaml for ECMA script spec.

Stephen Weeks
I am happy to hear about the decision to use SML to specify ES4 and
think it is a good one.  I have a few comments.


I don't understand why meta-language continuations are necessary for
the spec.  If performance really isn't an issue, why not express
stacks/continuations as ordinary data structures and do whatever
manipulations you want on them directly?  It would probably make the
spec more clear to not rely on meta-language features like call/cc.
Illuminating reading for this kind of thing is Reynold's "Definitional
Interpreters for Higher-Order Programming Languages":

  http://mlton.org/References#Reynolds98_2

If you do decide to use meta-language continuations, I suspect that
exceptions and threads really are enough.  You should consider using
(a very small subset of) MLton's thread interface:

  http://mlton.org/MLtonThread

  MLton.Thread provides access to MLton's user-level thread
  implementation (i.e. not OS-level threads).  Threads are lightweight
  data structures that represent a paused computation. Runnable
  threads are threads that will begin or continue computing when
  switched to.  MLton.Thread does not include a default scheduling
  mechanism, but it can be used to implement both preemptive and
  non-preemptive threads.

In order to specify ES4's yield, all you probably need is Thread.new
and Thread.switch.  Importantly, MLton.Thread.switch is constant time,
unlike MLton.Cont.throw, which takes time proportional to the size of
the stack:

  http://mlton.org/MLtonCont

Plus, it is trivial to implement Thread.{new,switch} on top of a
constant-time implementation of callcc/throw like SML/NJ has (and not
vice versa).


Now, for a few comments on the choice of SML implementation.

SML does have the benefit of a number of active implementations (eight
or so):

  http://mlton.org/StandardMLImplementations

However, SML/NJ is the least conforming to the Definition.

  http://mlton.org/DefinitionOfStandardML
  http://mlton.org/SMLNJDeviations

If you want to develop code that matches the Definition and is
portable among SML implementations, you should use at least one of the
other implementations from the start of the project (Hamlet and MLton
being the most pedantic about following the Definition).  I can not
understate the number of times I have heard people who started a
project with SML/NJ encounter porting problems because they
unknowingly used some aspect of SML/NJ that is not SML and is not
supported by other SML implementations.


One feature missing from SML/NJ that may be relevant in providing an
executable spec is exact binary<->decimal conversions.  For example,
if you try

  Real.fmt StringCvt.EXACT Real.Math.pi

in SML/NJ you will get

  uncaught exception Fail [Fail: RealFormat: fmtReal: EXACT not supported]

On the other hand, if you try this in MLton, you will get

  0.3141592653589793E1

MLton uses gdtoa for binary<->decimal conversions.  I don't know of
any other SML implementation that does.


Another feature missing from SML/NJ, but is present in MLton, the ML
Kit, and Poly/ML, is the generation of standalone executables.  This
could be useful for shipping a binary package containing the
executable spec.

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Dave Herman-2
> I don't understand why meta-language continuations are necessary for
> the spec.  If performance really isn't an issue, why not express
> stacks/continuations as ordinary data structures and do whatever
> manipulations you want on them directly?  It would probably make the

We are looking for a middle ground between precision and readability. A
direct-style interpreter has the benefit of being clearest and requiring
no whole-interpreter modifications. Explicit continuations or evaluation
contexts risk polluting a lot of the spec with mostly irrelevant detail.

Clearly, as you point out, the dangers in using features of the
meta-language to model aspects of the object language are a) bugs due to
impedance mismatches between the two and b) misunderstanding due to
readers not understanding the implicit features of ML.

I think the benefits outweight the risk: the features of ML that we are
using are few, and pretty standard. Exceptions and handlers are, at
their core, simple and widely understood, and ML encodes only their very
basic features. There's no notion of "finally", for example. To be sure,
call/cc is a monster, but as Brendan has been saying we can at least
"keep the monster in a box" by only using it in the one place where it's
needed, not throughout the spec.

By contrast, explicit evaluation contexts would be needed nearly
everywhere, and would often be superfluous.

Another nice aspect of this decision is that it's not hard to write a
translator that takes an ML program with effects and translates effects
into a pure ML program written in monadic style. Again, since our
effects are few, this requires the definition of only one monad
(something like exceptions + continuations + state). This may prove
useful if we need to translate the evaluator into a form more amenable
to theorem proving. We're not there yet, though. :)

> If you do decide to use meta-language continuations, I suspect that
> exceptions and threads really are enough.  You should consider using
> (a very small subset of) MLton's thread interface:

I've already written an example implementation of yield in terms of
Concurrent ML threads using SML/NJ. I guarantee sequentiality via a
simple protocol through a single channel shared between the main thread
and the generator thread.

Certainly a pre-emptive threading system would make the sequentiality
manifest. Either way, it's a relatively small part of the spec -- again,
the monster's in a box. :)

You're also quite right that using callcc for yield might be swatting
flies with a bazooka. I'm experimenting with the two to see which I find
clearer. Ah, the beauty of signatures and functors! It's such a pleasure
working with ML.

> Now, for a few comments on the choice of SML implementation.

I really appreciate all your input. In fact, I imagine we will be
*almost* ML-implementation agnostic. Given your warnings, maybe that's
naive -- I'm sure you have far more experience with this than I. But
right now we need to move forward, so for the moment I think we're going
to go with SML/NJ. However, the libraries are remarkably similar, and
the vast majority of our code will just be using the simple, core
features of SML. So porting to a different implementation at some point
isn't out of the question.

Thanks,

Dave

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Stephen Weeks
> We are looking for a middle ground between precision and
> readability. A direct-style interpreter has the benefit of being
> clearest and requiring no whole-interpreter modifications. Explicit
> continuations or evaluation contexts risk polluting a lot of the
> spec with mostly irrelevant detail.

Quite reasonable.

> I've already written an example implementation of yield in terms of
> Concurrent ML threads using SML/NJ. I guarantee sequentiality via a
> simple protocol through a single channel shared between the main thread
> and the generator thread.

If your primary goal is a clean spec, throwing in Concurrent ML adds
yet another level of semantic complexity.  I appreciate the argument
of keeping the complexity localized, but by using Concurrent ML you've
now added a meta-language semantic complexity that pervades all of the
semantics (the meta-language now has preemptive threads).  I suspect
all that is necessary for ES4 is non-preemptive threads (one example
is MLton.Thread.{new,switch}), which have a much simpler semantics.
My point is that it would be nice to keep the monster in the box in
terms of both the code defining ES4 *and* the explanation of the meta
language.  CML might be too heavyweight.

To be more precise, and to speak in SML code, which is always more
fun, I would think a structure matching the following signature is all
that's needed.

  signature THREAD = sig
 
     type 'a t
 
     val new: ('a -> unit) -> 'a t
     val switch: ('a t -> 'b t * 'b) -> 'a
 
  end

One can implement this signature using continuations, but it is also
easily implementable using MLton.Thread.  The point being that you can
give a simple semantics to the meta-language extended with structure
Thread: THREAD, and then implement structure Thread using whatever
lower-level mechanism the particular SML implementation provides.  And
readers of the ES4 spec don't have to understand the low-level stuff.

Here, e.g., is how I might implement THREAD using a variant of
SMLofNJ.Cont.

  signature CONT = sig
 
     type 'a t
 
     val callcc: ('a t -> 'a) -> 'a
     val throw: 'a t * 'a -> 'b
 
  end
 
  functor ContToThread (structure Cont: CONT
                        val die: string -> 'a): THREAD = struct
 
     structure Thread = struct
        datatype 'a t =
           Dead
         | New of 'a -> unit
         | Paused of 'a Cont.t
     end
 
     datatype z = datatype Thread.t
 
     datatype 'a t = T of 'a Thread.t ref
 
     val func: (unit -> unit) option ref = ref NONE
 
     val start: unit Cont.t option ref = ref NONE
       
     val () = Cont.callcc (fn k => start := SOME k)
 
     val start = valOf (!start)
 
     val () =
        case !func of
            NONE => ()
          | SOME f => let
               datatype t = Exit | Raise of exn
            in
               case (f (); Exit) handle e => Raise e of
                  Exit => die "thread exited"
                | Raise _ => die "thread had unhandled exception"
            end
 
     fun new f = T (ref (New f))
       
     fun switch f =
        Cont.callcc
        (fn k => let
           val (T r, b) = f (T (ref (Paused k)))
        in
           case !r of
              Dead => die "switch to dead thread"
            | New f => (func := SOME (fn () => f b); Cont.throw (start, ()))
            | Paused k => Cont.throw (k, b)
        end)
 
  end

I haven't used callcc in years, so there are likely errors.  But my
main point is that (I hope) you can improve your spec by containing
the semantic extensions to the meta language.

> Certainly a pre-emptive threading system would make the sequentiality
> manifest.

I'm not sure what your point is here.

> In fact, I imagine we will be *almost* ML-implementation
> agnostic. Given your warnings, maybe that's naive -- I'm sure you
> have far more experience with this than I. But right now we need to
> move forward, so for the moment I think we're going to go with
> SML/NJ.

Going with SML/NJ is a fine choice.  It's often what I use.  I am
simply saying that you should also compile with another compiler
regularly, and from the start.  It's not that hard.  And if you don't,
you will shoot yourself in the foot, and not find out until much
later, after you have lost a lot of blood.

And if you do settle on MLton for your alternate implementation, you
will get the added benefit of its type error messages, which many
people find significantly better than SML/NJ's.  And since
type-checking is a big part of an SML programmers life :-), this can
be a big win.

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Dave Herman-2
> One can implement this signature using continuations, but it is also
> easily implementable using MLton.Thread.  The point being that you can
> give a simple semantics to the meta-language extended with structure
> Thread: THREAD, and then implement structure Thread using whatever
> lower-level mechanism the particular SML implementation provides.  And
> readers of the ES4 spec don't have to understand the low-level stuff.

It's an interesting idea, that we could pretend to be building on top of
a native cooperative-threading semantics, which is actually implemented
in terms of callcc. (I think it's what you were trying to say in your
first email but I didn't pick up on it.) So casual readers have one
level of abstraction they can use to think about it, and semanticists
can see the lower level definition.

OTOH, I'm not sure the extra level of abstraction is necessary.

>> Certainly a pre-emptive threading system would make the sequentiality
>> manifest.
>
> I'm not sure what your point is here.

Just basically the same thing you said: using a concurrent meta-language
complicates the underlying semantics.

> Going with SML/NJ is a fine choice.  It's often what I use.  I am
> simply saying that you should also compile with another compiler
> regularly, and from the start.  It's not that hard.  And if you don't,
> you will shoot yourself in the foot, and not find out until much
> later, after you have lost a lot of blood.

Good advice, thanks. I'll give MLton a try.

Thanks,
Dave

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Stephen Weeks
> we could pretend to be building on top of a native
> cooperative-threading semantics, which is actually implemented in
> terms of callcc.

Yes, that's a nice way of summarizing my point.

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Daniel C. Wang-2
In reply to this post by Dave Herman-2
BTW is there a way to run MLton in a "check the syntax and type mode only".
That would be a nice feature if you're just checking conformance with the SML spec.
 
Also, I would hope the SML community at large steps up to support this effort. If the preliminary spec is published via CVS/Subversion. I think it would be good for all the SML implementations to try to add that spec to their regressions. This would proivde the widest possible level of conformance, and a strong gurantee that the spec runs on every/many SML implemntation.
 
For that reason the idea of using Threads rather than call-cc might encourage more SML implementors to add the spec to their regressions.
 
As for the style of the semantics. In an ideal world there would be a high-level "natural" big-step semantics. This big-step semanitcs is best for your "casual" progammer to undersand the spec. A very detaled "small-steps" semantics would be useful and avoid the need for adding things like threads/call-cc.
 
Such a small-steps semantics maybe out of the charter of the current standard, but there is nothing preventing some energetic folks to maintain a small-step semantics in parallel with the "offical" spec. If the small-step semantics is more unambigous, one might consider that the "offical" spec and the big-step semantics the advisor commentary about it, or vice versa. But that decsion can be sorted out at the end of the day.
 
The other great thing about using any programming langauge as a spec. rather than a paper spec, is that we have great tools and models for collaborating via programming languages. So, I hope everyone views this as a great opporutinty to collaborate by investing resources and expertise when they can.
 
 


 
On 10/24/06, Dave Herman <[hidden email]> wrote:
> One can implement this signature using continuations, but it is also
> easily implementable using MLton.Thread.  The point being that you can
> give a simple semantics to the meta-language extended with structure
> Thread: THREAD, and then implement structure Thread using whatever
> lower-level mechanism the particular SML implementation provides.  And
> readers of the ES4 spec don't have to understand the low-level stuff.

It's an interesting idea, that we could pretend to be building on top of
a native cooperative-threading semantics, which is actually implemented
in terms of callcc. (I think it's what you were trying to say in your
first email but I didn't pick up on it.) So casual readers have one
level of abstraction they can use to think about it, and semanticists
can see the lower level definition.

OTOH, I'm not sure the extra level of abstraction is necessary.

>> Certainly a pre-emptive threading system would make the sequentiality
>> manifest.
>
> I'm not sure what your point is here.

Just basically the same thing you said: using a concurrent meta-language
complicates the underlying semantics.

> Going with SML/NJ is a fine choice.  It's often what I use.  I am
> simply saying that you should also compile with another compiler
> regularly, and from the start.  It's not that hard.  And if you don't,
> you will shoot yourself in the foot, and not find out until much
> later, after you have lost a lot of blood.

Good advice, thanks. I'll give MLton a try.

Thanks,
Dave
_______________________________________________
Es4-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es4-discuss

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Stephen Weeks
> BTW is there a way to run MLton in a "check the syntax and type mode only".
> That would be a nice feature if you're just checking conformance with the
> SML spec.

Yes, "mlton -stop tc foo.sml" will stop the compiler after type
checking is done.  This is documented at

  http://mlton.org/CompileTimeOptions

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Dave Herman-2
In reply to this post by Daniel C. Wang-2
> As for the style of the semantics. In an ideal world there would be a
> high-level "natural" big-step semantics. This big-step semanitcs is best
> for your "casual" progammer to undersand the spec. A very detaled
> "small-steps" semantics would be useful and avoid the need for adding
> things like threads/call-cc.

There's no reason a big-step operational semantics should be more
readable than an ML evaluator. We were heading in the direction of a
small-step semantics with explicit evaluation contexts, but decided that
a direct-style interpreter would be a bit more abstract and more accessible.

> Such a small-steps semantics maybe out of the charter of the current
> standard, but there is nothing preventing some energetic folks to
> maintain a small-step semantics in parallel with the "offical" spec. If

Yes, I've already prototyped a small-step semantics implemented in the
term rewriting language Stratego. One of the interesting side effects of
a small-step interpreter is that you get an automatic stepper
(debugger), as opposed to just an evaluator.

> the small-step semantics is more unambigous, one might consider that the
> "offical" spec and the big-step semantics the advisor commentary about
> it, or vice versa. But that decsion can be sorted out at the end of the
> day.

Well, no-- the working group has decided on ML as the spec language for
very good reasons we've already gone into. We aren't going to develop
two semantics in parallel on a tight deadline. :)

Dave

Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Daniel C. Wang-2
I was suggesting that those who were interested in different
specification techniques do it on their own time, and leave you guys
alone. :) When the next rev comes out people can flog it out or perhaps
have a process to include more detailed specs as clarifying
informational addenda...  If I remember correctly RS5Rn is in that
style. The prose is the "official" standard and the semantics was
"informational".

It might be good to have such parallel efforts, as it might help to
debug the spec you guys are developing. It's my experience that you can
never believe anything is done right, unless you do it two times in
different ways.

In particular proving type-safety is best done with a small-steps
semantics. However, I don't think you guys want to tackle that in the
spec. :) But it would be good to have something closely in sync with the
spec so those in the research community interested in doing such a thing
would have a good base to work from.

Dave Herman wrote:
>
> Well, no-- the working group has decided on ML as the spec language
> for very good reasons we've already gone into. We aren't going to
> develop two semantics in parallel on a tight deadline. :)
>
> Dave
>


Reply | Threaded
Open this post in threaded view
|

Re: SML vs Ocaml for ECMA script spec.

Dave Herman-2
> I was suggesting that those who were interested in different
> specification techniques do it on their own time, and leave you guys
> alone. :) When the next rev comes out people can flog it out or perhaps
> have a process to include more detailed specs as clarifying
> informational addenda...  If I remember correctly RS5Rn is in that
> style. The prose is the "official" standard and the semantics was
> "informational".

I see.

> It might be good to have such parallel efforts, as it might help to
> debug the spec you guys are developing. It's my experience that you can
> never believe anything is done right, unless you do it two times in
> different ways.

Sure.

> In particular proving type-safety is best done with a small-steps
> semantics. However, I don't think you guys want to tackle that in the

Indeed-- however, it's hard to prove your type system sound if it isn't.
;) According to the current draft spec, ES4 has a dynamic type system
with a "verification phase" that may reject some programs. However,
there is no Milner-style guarantee if the static checker blesses your
program; any expression may still generate a runtime type error.

Basic language safety (in the sense of Appel's "sermon on safety" in the
Tiger book) is ensured via dynamic types. With a spec in ML, this
amounts roughly to proving that every case analysis is exhaustive
(conveniently proved by the ML compiler!), and perhaps strengthened by a
hand proof showing that certain "can't-happen" cases never happen. Type
soundness, though, doesn't hold.

(For the very latest on what type soundness means in a language with
mixed static and dynamic types, I recommend Sam Tobin-Hochstadt and
Matthias Felleisen's upcoming paper at DLS:

     http://www.ccs.neu.edu/home/samth/

The basic refinement of Milner's statement is something along the lines
of "programs with well-typed modules only go wrong due to untyped
modules." However, we don't quite have such a system.)

Dave