ModuleDeclarationInstantiation behaviour after failure

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
Fuzz testing has recently turned up some interesting behaviour involving
ModuleDeclarationInstantiation.

What currently happens is that ModuleDeclarationInstantiation always
initialises the source text module record's Environment slot, even if it
subsequently fails with an error (e.g. failure to resolve import).  This
leaves the environment partially initialised.

Then if it is called a second time for that module (maybe in the course
of recursively performing ModuleDeclarationInstantiation starting from a
different module) it will succeed without doing anything.  This can lead
to failure further on when code is executed without the expected
bindings being present.

Is this expected?

It seems to me that a subsequent call should either attempt to create
the environment again (maybe all dependencies are now available?) or
fail immediately in the same way.

The former could be accomplished by only setting the Environment slot at
the end after the environment has been successfully initialised.

Jon
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Caridy Patino
Moving step 7 to the end is not an option since the primary use of that step is to support circular references.

As for recovering from errors, this is in the realm of the loader and the registry to prevent subsequence attempts to instantiate a source text module record that failed already. There is not such things are a recovery from a failure in a module, just like there is none on a script. A testament of this is that a module record never holds a reference to another module record, it always resolve it when needed.

/caridy

> On Jul 6, 2016, at 6:02 AM, Jon Coppeard <[hidden email]> wrote:
>
> Fuzz testing has recently turned up some interesting behaviour involving
> ModuleDeclarationInstantiation.
>
> What currently happens is that ModuleDeclarationInstantiation always
> initialises the source text module record's Environment slot, even if it
> subsequently fails with an error (e.g. failure to resolve import).  This
> leaves the environment partially initialised.
>
> Then if it is called a second time for that module (maybe in the course
> of recursively performing ModuleDeclarationInstantiation starting from a
> different module) it will succeed without doing anything.  This can lead
> to failure further on when code is executed without the expected
> bindings being present.
>
> Is this expected?
>
> It seems to me that a subsequent call should either attempt to create
> the environment again (maybe all dependencies are now available?) or
> fail immediately in the same way.
>
> The former could be accomplished by only setting the Environment slot at
> the end after the environment has been successfully initialised.
>
> Jon
> _______________________________________________
> es-discuss mailing list
> [hidden email]
> https://mail.mozilla.org/listinfo/es-discuss

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jason Orendorff
On Thu, Jul 7, 2016 at 12:09 PM, Caridy Patiño <[hidden email]> wrote:
Moving step 7 to the end is not an option since the primary use of that step is to support circular references.

Not sure this is helpful, but back when I was working with Dave on this stuff, it was supposed to work something like this:

* The loader determines when a set of new modules is ready to link. This only happens after resolving all requested modules for the whole set.

* Check that all requested modules actually have the export bindings we require of them. If not, it's an error.

* Linking. We create module Environments and all their bindings, including import bindings. Observably, this is an atomic step and it can't fail. We already checked for all possible link errors.

* Next, we evaluate module bodies, etc.

So in our design, if code ran in a module environment, that environment was fully linked. It looks like ModuleDeclarationInstantiation does not have this guarantee. If true, that's a serious bug.

> As for recovering from errors, this is in the realm of the loader and the registry to prevent subsequence attempts to instantiate a source text module record that failed already. [...]

It's fine to have an error sometimes shoot down a whole graph of modules. The problem here is that the failed Environments can be exposed later. The spec says you're even supposed to run code in them, right? But forging ahead with partially-initialized data structures seems as obviously undesirable in a spec as in running code -- that's begging for bugs -- and it's an unreasonable implementation burden too.

-j


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Allen Wirfs-Brock


On Jul 7, 2016 1:11 PM, Jason Orendorff <[hidden email]> wrote:
>
> On Thu, Jul 7, 2016 at 12:09 PM, Caridy Patiño <[hidden email]> wrote:
>>
>> Moving step 7 to the end is not an option since the primary use of that step is to support circular references.
>
>
> Not sure this is helpful, but back when I was working with Dave on this stuff, it was supposed to work something like this:
>
> * The loader determines when a set of new modules is ready to link. This only happens after resolving all requested modules for the whole set.
>
> * Check that all requested modules actually have the export bindings we require of them. If not, it's an error.
>
> * Linking. We create module Environments and all their bindings, including import bindings. Observably, this is an atomic step and it can't fail. We already checked for all possible link errors.

Yup, that's also how the spec. is structured.

>
> * Next, we evaluate module bodies, etc.
>
> So in our design, if code ran in a module environment, that environment was fully linked. It looks like ModuleDeclarationInstantiation does not have this guarantee. If true, that's a serious bug.

It isn't true. No user code runs while linking/instantiating a graph of modules. Evaluation is a separate step that only happens if the linking/instantiation occurs without error.

>
> > As for recovering from errors, this is in the realm of the loader and the registry to prevent subsequence attempts to instantiate a source text module record that failed already. [...]

Yup

>
> It's fine to have an error sometimes shoot down a whole graph of modules. The problem here is that the failed Environments can be exposed later. The spec says you're even supposed to run code in them, right? But forging ahead with partially-initialized data structures seems as obviously undesirable in a spec as in running code -- that's begging for bugs -- and it's an unreasonable implementation burden too.

The spec. Only generates errors and does try to define error recovery. That's up to the implementation. But I would expect implementations to discard any module records it created during a linking phase that throws errors.

Allen


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
On 07/07/2016 22:33, Allen Wirfs-Brock wrote:
> The spec. Only generates errors and does try to define error recovery.
> That's up to the implementation.

OK, that makes sense.  This situation would clearly be a bug in the
module loader.  Maybe it would be worth making ES robust against such
bugs though.

> But I would expect implementations to
> discard any module records it created during a linking phase that throws
> errors.

This wasn't obvious to me from reading the spec, although there is an
assertion in ModuleEvaluation that ModuleDeclarationInstantation has
completed successfully first that would be invalidated in this case.

Thanks,

Jon
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
In reply to this post by Caridy Patino
On 07/07/2016 18:09, Caridy Patiño wrote:
> Moving step 7 to the end is not an option since the primary use of that step is to support circular references.

Ah, of course.  Thanks,

Jon
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jason Orendorff
In reply to this post by Allen Wirfs-Brock
On Thu, Jul 7, 2016 at 4:33 PM, Allen Wirfs-Brock <[hidden email]> wrote:
The spec. Only generates errors and does try to define error recovery. That's up to the implementation. But I would expect implementations to discard any module records it created during a linking phase that throws errors.

That makes sense. Thanks.

This explains why I didn't understand it. Originally we specified a loader too, and the loader's behavior on error was a little different from this. Not necessarily better, though.

-j

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
In reply to this post by Allen Wirfs-Brock
On 07/07/2016 22:33, Allen Wirfs-Brock wrote:
> I would expect implementations to
> discard any module records it created during a linking phase that throws
> errors.

I think it's not trivial to know which module records to discard.

Thinking out loud:

A module loader may be simultaneously loading multiple top-level modules
which have overlapping dependency graphs.  So an imported module may be
created to satisfy the dependencies of more than one top-level module
that is currently being loaded.

We can't throw away all dependencies on failure because that could
discard successfully loaded modules that are in use by a previously
loaded module.  A subsequent load of another module could then re-load a
different version of these modules.

Really we only need to throw away the module which failed to link and
every ancestor module up to the root of the tree, i.e. those for which
instantiation has started but did not complete.  But a module loader
doesn't have a way to work this out.

One solution might be to have the loader maintain an 'instantiated' flag
for each module which is initially false and to set this for all
uninstantiated descendants on successful instantiation.  Then we could
throw away uninstantiated descendants on failure.  (This would discard
more than necessary, but that doesn't matter).

Does that make sense?  I think this would work but it feels like it's
making the loader do extra work to compute state that could more simply
be stored in the module record itself.

Jon
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Allen Wirfs-Brock

On Jul 14, 2016, at 3:51 AM, Jon Coppeard <[hidden email]> wrote:

On 07/07/2016 22:33, Allen Wirfs-Brock wrote:
I would expect implementations to
discard any module records it created during a linking phase that throws
errors.

I think it's not trivial to know which module records to discard.

Thinking out loud:

A module loader may be simultaneously loading multiple top-level modules
which have overlapping dependency graphs.  So an imported module may be
created to satisfy the dependencies of more than one top-level module
that is currently being loaded.

In the spec, module instantiation is always initiated by a TopLevelModuleEvaluationJob. Such jobs are serially executed to completion.  The spec. explicitly says that an implementation may pre-instantiate and link modules ( it could even happen during a build process) as long as error reporting is deferred to the actual TopLevelModuleEvaluationJob (of course a build time linker would also report build time errors).

So, form the spec perspective there is no semantics of “simultaneously loading” multiple top-level modules.  To conform to those semantics, an implementation that wants to simultaneously eagerly instantiate and link modules has to do it in a manner that preserves the serial instantiation semantics.



We can't throw away all dependencies on failure because that could
discard successfully loaded modules that are in use by a previously
loaded module.  A subsequent load of another module could then re-load a
different version of these modules.

Really we only need to throw away the module which failed to link and
every ancestor module up to the root of the tree, i.e. those for which
instantiation has started but did not complete.  But a module loader
doesn't have a way to work this out.

One solution might be to have the loader maintain an 'instantiated' flag
for each module which is initially false and to set this for all
uninstantiated descendants on successful instantiation.  Then we could
throw away uninstantiated descendants on failure.  (This would discard
more than necessary, but that doesn't matter).

Does that make sense?  I think this would work but it feels like it's
making the loader do extra work to compute state that could more simply
be stored in the module record itself.

I’m just thinking out load too,  But here is how I would approach it.

Module Records are created by ParseModule (often indirectly via HostResolveImportedModule) and subsequently retrieved by HostResolveImportModule. 

I would treat steps 3-5 of [TopLevelModuleEvaluationJob](https://tc39.github.io/ecma262/#sec-toplevelmoduleevaluationjob) as an atomic transaction that either succeeds or fails.  During such a transaction, any new module record that is created is considered a “pending module record”. When a transition successfully completes, all of its pending module records permanently become “committed module records” but if a transaction fails its pending module records are discarded.  During a transaction, HostResoveImportedModule uses both the committed and pending module record sets to resolve import requests.  

Regarding storing state (for example, committed or pending) in module records:  Module records (like most data structures in the spec) are just abstrations used by the specification to describe the semantics and observable state changes.  An implementation is free to represent those abstractions any way it wants and certainly can incorporate additional unobservable implementation state.

Allen




_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
Thank you for the detailed response.

To start with I should say that although I'm working on an
implementation of the basic module loader defined by the HTML spec for
<script type="module">, a goal is also to validate that spec.  As it
stands it suffers from the issue we are discussing and attempts to
simultaneously load multiple top-level modules but does not correctly
handle errors in instantiation.

In the browser, this simultaneous loading is important for performance
reasons.  Using modules already has the potential to create many more
server round trips as dependencies are fetched, so browsers will want to
parallelise these fetches as much as possible.  This means parsing
modules as soon as they have been fetched to extract their dependencies
and starting to fetch their dependencies straight away, all across
multiple module trees.

I think it's important to outline how this is going to work in the HTML
spec itself, especially since the algorithm chosen can have subtle
effects on the behaviour.

The approach you suggested above would work but I think it would have
the disadvantage that while while the transaction for one module tree
was in progress it would delay modules from a second module tree being
parsed (and their dependencies fetched).  Did I understand that correctly?

Another similar approach that also uses the committed state might be to
parse all modules as they become available and discard any uncommitted
dependencies on instantiation error.  Dependencies of a top-level module
are marked as committed on successful load.  I think this would be
equivalent to some serial execution of TopLevelModuleEvaluationJob for
each top-level module.

However that still has the disadvantage that too many modules are thrown
away on error.  This means that some modules might have to be re-fetched
later if they were used by a sequent load, even if they were present
when that load started.  My concern is mainly that handling this corner
case would complicate the HTML spec.

I guess what I'd really like is some support for this simultaneous
loading in ES.  Removing the restriction on discarding modules after
failure and making ModuleDeclarationInstantiation fail if called again
after failure would be one way.  Another would be to report to the
caller which modules failed to instantiate.  Either way the problem is
that it's hard to handle errors from ModuleDeclarationInstantiation if
you don't know which modules failed.

Jon

On 14/07/2016 17:20, Allen Wirfs-Brock wrote:

>
>> On Jul 14, 2016, at 3:51 AM, Jon Coppeard <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> On 07/07/2016 22:33, Allen Wirfs-Brock wrote:
>>> I would expect implementations to
>>> discard any module records it created during a linking phase that throws
>>> errors.
>>
>> I think it's not trivial to know which module records to discard.
>>
>> Thinking out loud:
>>
>> A module loader may be simultaneously loading multiple top-level modules
>> which have overlapping dependency graphs.  So an imported module may be
>> created to satisfy the dependencies of more than one top-level module
>> that is currently being loaded.
>
> In the spec, module instantiation is always initiated by a
> TopLevelModuleEvaluationJob. Such jobs are serially executed to
> completion.  The spec. explicitly says that an implementation may
> pre-instantiate and link modules ( it could even happen during a build
> process) as long as error reporting is deferred to the actual
> TopLevelModuleEvaluationJob (of course a build time linker would also
> report build time errors).
>
> So, form the spec perspective there is no semantics of “simultaneously
> loading” multiple top-level modules.  To conform to those semantics, an
> implementation that wants to simultaneously eagerly instantiate and link
> modules has to do it in a manner that preserves the serial instantiation
> semantics.
>
>
>>
>> We can't throw away all dependencies on failure because that could
>> discard successfully loaded modules that are in use by a previously
>> loaded module.  A subsequent load of another module could then re-load a
>> different version of these modules.
>>
>> Really we only need to throw away the module which failed to link and
>> every ancestor module up to the root of the tree, i.e. those for which
>> instantiation has started but did not complete.  But a module loader
>> doesn't have a way to work this out.
>>
>> One solution might be to have the loader maintain an 'instantiated' flag
>> for each module which is initially false and to set this for all
>> uninstantiated descendants on successful instantiation.  Then we could
>> throw away uninstantiated descendants on failure.  (This would discard
>> more than necessary, but that doesn't matter).
>>
>> Does that make sense?  I think this would work but it feels like it's
>> making the loader do extra work to compute state that could more simply
>> be stored in the module record itself.
>
> I’m just thinking out load too,  But here is how I would approach it.
>
> Module Records are created by ParseModule (often indirectly via
> HostResolveImportedModule) and subsequently retrieved by
> HostResolveImportModule.
>
> I would treat steps 3-5 of
> [TopLevelModuleEvaluationJob](https://tc39.github.io/ecma262/#sec-toplevelmoduleevaluationjob)
> as an atomic transaction that either succeeds or fails.  During such a
> transaction, any new module record that is created is considered a
> “pending module record”. When a transition successfully completes, all
> of its pending module records permanently become “committed module
> records” but if a transaction fails its pending module records are
> discarded.  During a transaction, HostResoveImportedModule uses both the
> committed and pending module record sets to resolve import requests.  
>
> Regarding storing state (for example, committed or pending) in module
> records:  Module records (like most data structures in the spec) are
> just abstrations used by the specification to describe the semantics and
> observable state changes.  An implementation is free to represent those
> abstractions any way it wants and certainly can incorporate additional
> unobservable implementation state.
>
> Allen
>
>
>
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jason Orendorff
How about this: Linking happens as eagerly as possible whenever a module arrives off the network. This means that very often we'll be linking one module at a time. In any case, every link set will be a strongly connected component of ready-to-link modules in the dependency graph; and therefore if linking fails, we should discard all the modules that we were trying to link.

When a link error or any other error happens, we immediately discard the failed module and everything that depends on it (potentially many modules in various states, etc.). Implementing this efficiently means tracking, for each module, "all the stuff that's depending/waiting on me". But the loader has to do this anyway, right?

More complete description of this proposal, in case that's not clear: https://gist.github.com/jorendorff/fc5bad969137402caa10bb3570b3f202

On Fri, Jul 15, 2016 at 9:32 AM, Jon Coppeard <[hidden email]> wrote:
Another similar approach that also uses the committed state might be to
parse all modules as they become available and discard any uncommitted
dependencies on instantiation error.

What I'm proposing doesn't discard in-flight dependencies, but allows them to finish loading.

However that still has the disadvantage that too many modules are thrown
away on error.

Agreed.
 
I guess what I'd really like is some support for this simultaneous
loading in ES.

At the least, I think it makes sense to mark Module Records for modules that failed to parse or link as permanently bad, and assert as we go that we are only ever working on good Module Records. It would be a nice clarification. (I suspect TC39 would accept patches for this, even though the status quo seems correct in every way -- it's just a matter of someone doing the work.)

-j

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Allen Wirfs-Brock
In reply to this post by Jon Coppeard

On Jul 15, 2016, at 7:32 AM, Jon Coppeard <[hidden email]> wrote:

Thank you for the detailed response.

To start with I should say that although I'm working on an
implementation of the basic module loader defined by the HTML spec for
<script type="module">, a goal is also to validate that spec.  As it
stands it suffers from the issue we are discussing and attempts to
simultaneously load multiple top-level modules but does not correctly
handle errors in instantiation.

Can you point me at the parts of the HTML spec. that involved here?  ModuleDeclarationInstantiation is a semantics of the ES spec. and it isn’t clear to me why the HTML spec. would need to invoke it. From you description, I suspect there may be some impedance mismatches between the specs.


In the browser, this simultaneous loading is important for performance
reasons.  Using modules already has the potential to create many more
server round trips as dependencies are fetched, so browsers will want to
parallelise these fetches as much as possible.  This means parsing
modules as soon as they have been fetched to extract their dependencies
and starting to fetch their dependencies straight away, all across
multiple module trees.

Shouldn't be a problem.  Just like some  browsers now  eagerly (or lazily) parse scripts on a separate thread they should be able to parse individual module files as they fetch them and cache parse results (eg, ASTs). Even share cached parse results across multiple windows.  Fetching and mechanism of how a parsed module or script is internally represented is outside of the scope of the ES spec. Eager parsing is explicitly allowed for in the spec. For example, see the NOTE in https://tc39.github.io/ecma262/#sec-toplevelmoduleevaluationjob. Basically an implementation can cache or do anything it wants as long as the fact it is doing so is not observable to ES code. The non-observability requirement is the reason it says that error reporting most be deferred until the explicit parse step in the spec. is reached.


I think it's important to outline how this is going to work in the HTML
spec itself, especially since the algorithm chosen can have subtle
effects on the behaviour.

The approach you suggested above would work but I think it would have
the disadvantage that while while the transaction for one module tree
was in progress it would delay modules from a second module tree being
parsed (and their dependencies fetched).  Did I understand that correctly?

That’s certainly not the intent.  Note that https://tc39.github.io/ecma262/#sec-hostresolveimportedmodule is required to be idempotent if it completes normally. An implication of this is that any complete and valid closed subgraph of a module tree should be reusable and sharable among other module trees, even if the first module tree to process it fails because other subtree are incomplete.  Processing multiple module trees in parallel certainly complicates caching and error recovery. But should be possible.  The ES spec. really doesn’t care, as long as the implementation specific quirks aren’t observable to ES code.

(BTW, a loader API that includes the ability of ES code to programmatically introduce additional modules will also complicate eager or parallel processing.  If that is possible, then failure of eager processing of some module trees might be a temporary condition, and failure means that the processing of the module tree needs to be deferred until the API causes something to change.    In other words, statically fetched modules that have dependencies upon dynamically created modules are tricky )


Another similar approach that also uses the committed state might be to
parse all modules as they become available and discard any uncommitted
dependencies on instantiation error.  Dependencies of a top-level module
are marked as committed on successful load.  I think this would be
equivalent to some serial execution of TopLevelModuleEvaluationJob for
each top-level module.

All you really need to throw away are newly created Module Environment Records as they are the entities that are observable to the ES evaluation semantics. But remember to distingish the implementation’s internal data structures from the specification’s abstract data structures such as Module Records, and Module Environment Records. The implementation is free to use any valid memorization it can come up with. 

Note that the possibility of eager linking/instantiation is why module evaluation is a separate step and not included in module instantiation.  The job queue for TopLevelEvaluationJobs is there to enable the host environment to specify the serial execution order of top level module (and script) evaluations.




However that still has the disadvantage that too many modules are thrown
away on error.  This means that some modules might have to be re-fetched
later if they were used by a sequent load, even if they were present
when that load started.  My concern is mainly that handling this corner
case would complicate the HTML spec.

Don’t see why refetching would be required (assuming the remote resource doesn’t change).  Might have to relink subtrees that had earlier had linkage failures.

I guess what I'd really like is some support for this simultaneous
loading in ES.  Removing the restriction on discarding modules after
failure and making ModuleDeclarationInstantiation fail if called again
after failure would be one way.  Another would be to report to the
caller which modules failed to instantiate.  Either way the problem is
that it's hard to handle errors from ModuleDeclarationInstantiation if
you don't know which modules failed.

My sense is that you are seeing requirements/restrictions in the ES spec. that aren't there (or aren’t intended if they are).

We should probably focus on individual issues, one at a time.

Allen


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
In reply to this post by Jason Orendorff
OK, this is great and I think it solves the problems of what to throw
away and not throwing away too much.

> Implementing this efficiently means
> tracking, for each module, "all the stuff that's depending/waiting on
> me". But the loader has to do this anyway, right?

Yes, this already has to happen to know when we can evaluate the
top-level module.

Thanks,

Jon

On 15/07/2016 18:31, Jason Orendorff wrote:

> How about this: Linking happens as eagerly as possible whenever a module
> arrives off the network. This means that very often we'll be linking one
> module at a time. In any case, every link set will be a strongly
> connected component of ready-to-link modules in the dependency graph;
> and therefore if linking fails, we should discard all the modules that
> we were trying to link.
>
> When a link error or any other error happens, we immediately discard the
> failed module and everything that depends on it (potentially many
> modules in various states, etc.). Implementing this efficiently means
> tracking, for each module, "all the stuff that's depending/waiting on
> me". But the loader has to do this anyway, right?
>
> More complete description of this proposal, in case that's not clear:
> https://gist.github.com/jorendorff/fc5bad969137402caa10bb3570b3f202
>
> On Fri, Jul 15, 2016 at 9:32 AM, Jon Coppeard <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Another similar approach that also uses the committed state might be to
>     parse all modules as they become available and discard any uncommitted
>     dependencies on instantiation error.
>
>
> What I'm proposing doesn't discard in-flight dependencies, but allows
> them to finish loading.
>
>     However that still has the disadvantage that too many modules are thrown
>     away on error.
>
>
> Agreed.
>  
>
>     I guess what I'd really like is some support for this simultaneous
>     loading in ES.
>
>
> At the least, I think it makes sense to mark Module Records for modules
> that failed to parse or link as permanently bad, and assert as we go
> that we are only ever working on good Module Records. It would be a nice
> clarification. (I suspect TC39 would accept patches for this, even
> though the status quo seems correct in every way -- it's just a matter
> of someone doing the work.)
>
> -j
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
In reply to this post by Allen Wirfs-Brock
On 15/07/2016 18:36, Allen Wirfs-Brock wrote:

> Can you point me at the parts of the HTML spec. that involved here?

The script tag is defined here:

https://html.spec.whatwg.org/multipage/scripting.html#the-script-element

The mechanism for loading modules starts here:

https://html.spec.whatwg.org/multipage/webappapis.html#fetch-a-module-script-tree

And HostResolveImportedModule is defined here:

https://html.spec.whatwg.org/multipage/webappapis.html#integration-with-the-javascript-module-system

> ModuleDeclarationInstantiation is a semantics of the ES spec. and it
> isn’t clear to me why the HTML spec. would need to invoke it.

The HTML spec needs some way of linking and executing modules.  It's not
clear (to me) which parts of ES can be "called into" and referred to
externally from another spec.

I guess at this point HTML is providing the source texts required by
RunJobs step 2:

"In an implementation dependent manner, obtain the ECMAScript source
texts ... for zero or more ECMAScript scripts and/or ECMAScript modules"

I'm still not sure how I would refer to that exactly from an external spec.

Further to this, if we do pre-instantiation of modules then we need some
way of making this happen before TopLevelModuleEvaluationJob runs.  I
don't think saying something like "use an implementation specific way to
pre-instantiate a module" would be that useful, whereas saying "call
ModuleDeclarationInstantiation for the module record" makes it clear
what is supposed to happen.

> (BTW, a loader API that includes the ability of ES code to
> programmatically introduce additional modules will also complicate eager
> or parallel processing.  If that is possible, then failure of eager
> processing of some module trees might be a temporary condition, and
> failure means that the processing of the module tree needs to be
> deferred until the API causes something to change.    In other words,
> statically fetched modules that have dependencies upon dynamically
> created modules are tricky )

I totally agree and thankfully we are not doing that yet!

> My sense is that you are seeing requirements/restrictions in the ES
> spec. that aren't there (or aren’t intended if they are).

I think you're right.  I'm not sure exactly what I had in mind but it
seems the answer is "do anything you like as long as it is not
observable to script", which is a lower bar than I realised.

Thanks,

Jon

On 15/07/2016 18:36, Allen Wirfs-Brock wrote:

>
>> On Jul 15, 2016, at 7:32 AM, Jon Coppeard <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>> Thank you for the detailed response.
>>
>> To start with I should say that although I'm working on an
>> implementation of the basic module loader defined by the HTML spec for
>> <script type="module">, a goal is also to validate that spec.  As it
>> stands it suffers from the issue we are discussing and attempts to
>> simultaneously load multiple top-level modules but does not correctly
>> handle errors in instantiation.
>
> Can you point me at the parts of the HTML spec. that involved here?
>  ModuleDeclarationInstantiation is a semantics of the ES spec. and it
> isn’t clear to me why the HTML spec. would need to invoke it. From you
> description, I suspect there may be some impedance mismatches between
> the specs.
>
>>
>> In the browser, this simultaneous loading is important for performance
>> reasons.  Using modules already has the potential to create many more
>> server round trips as dependencies are fetched, so browsers will want to
>> parallelise these fetches as much as possible.  This means parsing
>> modules as soon as they have been fetched to extract their dependencies
>> and starting to fetch their dependencies straight away, all across
>> multiple module trees.
>
> Shouldn't be a problem.  Just like some  browsers now  eagerly (or
> lazily) parse scripts on a separate thread they should be able to parse
> individual module files as they fetch them and cache parse results (eg,
> ASTs). Even share cached parse results across multiple windows.
>  Fetching and mechanism of how a parsed module or script is internally
> represented is outside of the scope of the ES spec. Eager parsing is
> explicitly allowed for in the spec. For example, see the NOTE
> in https://tc39.github.io/ecma262/#sec-toplevelmoduleevaluationjob. Basically
> an implementation can cache or do anything it wants as long as the fact
> it is doing so is not observable to ES code. The non-observability
> requirement is the reason it says that error reporting most be deferred
> until the explicit parse step in the spec. is reached.
>
>>
>> I think it's important to outline how this is going to work in the HTML
>> spec itself, especially since the algorithm chosen can have subtle
>> effects on the behaviour.
>>
>> The approach you suggested above would work but I think it would have
>> the disadvantage that while while the transaction for one module tree
>> was in progress it would delay modules from a second module tree being
>> parsed (and their dependencies fetched).  Did I understand that correctly?
>
> That’s certainly not the intent.  Note
> that https://tc39.github.io/ecma262/#sec-hostresolveimportedmodule is
> required to be idempotent if it completes normally. An implication of
> this is that any complete and valid closed subgraph of a module tree
> should be reusable and sharable among other module trees, even if the
> first module tree to process it fails because other subtree are
> incomplete.  Processing multiple module trees in parallel certainly
> complicates caching and error recovery. But should be possible.  The ES
> spec. really doesn’t care, as long as the implementation specific quirks
> aren’t observable to ES code.
>
> (BTW, a loader API that includes the ability of ES code to
> programmatically introduce additional modules will also complicate eager
> or parallel processing.  If that is possible, then failure of eager
> processing of some module trees might be a temporary condition, and
> failure means that the processing of the module tree needs to be
> deferred until the API causes something to change.    In other words,
> statically fetched modules that have dependencies upon dynamically
> created modules are tricky )
>
>>
>> Another similar approach that also uses the committed state might be to
>> parse all modules as they become available and discard any uncommitted
>> dependencies on instantiation error.  Dependencies of a top-level module
>> are marked as committed on successful load.  I think this would be
>> equivalent to some serial execution of TopLevelModuleEvaluationJob for
>> each top-level module.
>
> All you really need to throw away are newly created Module Environment
> Records as they are the entities that are observable to the ES
> evaluation semantics. But remember to distingish the implementation’s
> internal data structures from the specification’s abstract data
> structures such as Module Records, and Module Environment Records. The
> implementation is free to use any valid memorization it can come up with.
>
> Note that the possibility of eager linking/instantiation is why module
> evaluation is a separate step and not included in module instantiation.
>  The job queue for TopLevelEvaluationJobs is there to enable the host
> environment to specify the serial execution order of top level module
> (and script) evaluations.
>
>>
>> However that still has the disadvantage that too many modules are thrown
>> away on error.  This means that some modules might have to be re-fetched
>> later if they were used by a sequent load, even if they were present
>> when that load started.  My concern is mainly that handling this corner
>> case would complicate the HTML spec.
>
> Don’t see why refetching would be required (assuming the remote resource
> doesn’t change).  Might have to relink subtrees that had earlier had
> linkage failures.
>>
>> I guess what I'd really like is some support for this simultaneous
>> loading in ES.  Removing the restriction on discarding modules after
>> failure and making ModuleDeclarationInstantiation fail if called again
>> after failure would be one way.  Another would be to report to the
>> caller which modules failed to instantiate.  Either way the problem is
>> that it's hard to handle errors from ModuleDeclarationInstantiation if
>> you don't know which modules failed.
>
> My sense is that you are seeing requirements/restrictions in the ES
> spec. that aren't there (or aren’t intended if they are).
>
> We should probably focus on individual issues, one at a time.
>
> Allen
>
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Allen Wirfs-Brock

On Jul 18, 2016, at 7:04 AM, Jon Coppeard <[hidden email]> wrote:

On 15/07/2016 18:36, Allen Wirfs-Brock wrote:



ModuleDeclarationInstantiation is a semantics of the ES spec. and it
isn’t clear to me why the HTML spec. would need to invoke it.

The HTML spec needs some way of linking and executing modules.  It's not
clear (to me) which parts of ES can be "called into" and referred to
externally from another spec.

The intent is that after the “host” has fetched a module (and its dependencies) it enqueue a TopLevelModuleEvaluationJob. But, as stated there, much of the work of that TopLevelModuleEvaluationJob can be performed eagerly. 




I guess at this point HTML is providing the source texts required by
RunJobs step 2:

"In an implementation dependent manner, obtain the ECMAScript source
texts ... for zero or more ECMAScript scripts and/or ECMAScript modules”

That’s the “fetch” process, it’s completely host/implementation defined

RunJobs is dealing with an initial set of prefetched sequentially evaluated scripts and modules.  This would probably include the inline sequentially evaluated scripts/modules within an HTML file. It wouldn’t include any deferred scripts/modules.  A host handles deferred or dynamically identified scripts/modules  (eg, via dynamically inserting a script tag into the DOM)  by enqueuing additional Script/TopLevelModuleEvaluationJobs after RubJobs has already occurred.


I'm still not sure how I would refer to that exactly from an external spec.

The external spec.describes the process of fetching the source text and then either says that it performs RunJobs (if hit is initiating a new ES engine instantiation) or performs EnqueueJob to create a ScriptEvaluationJob or a TopLevelModuleEvaluationJob.


Further to this, if we do pre-instantiation of modules then we need some
way of making this happen before TopLevelModuleEvaluationJob runs.  I
don't think saying something like "use an implementation specific way to
pre-instantiate a module" would be that useful, whereas saying "call
ModuleDeclarationInstantiation for the module record" makes it clear
what is supposed to happen.

In a loader spec. that is expect to have multiple implementations (eg, the HTML module loader) you want to be careful to not over-specify things that should be implementation details. For example, whether or not modules are (fully) eagerly or asynchronously parsed should be something an implementation should be able to decide. 

The only real requirement on a TopLevelEvaluationJob is the one implied by step 8.a of https://tc39.github.io/ecma262/#sec-moduledeclarationinstantiation “Before instantiating a module, all of the modules it requested must be available.” By “module” here, it is really talking about the source code of a module. So, before enqueuing such a job the host needs to ensure that all the transitively referenced source modules are available.  An implementation could perform a full  parse of the module (as specified by https://tc39.github.io/ecma262/#sec-parsemodule) but an implementation might also work out a simpler way to identify a module’s dependencies that doesn’t require a full parse/static semantics analysis. So, you want to be careful not to transitively “call” ParseModule in your spec. to satisfy that requirement. Instead, you might use language like: “Determine, as if by recursively invoking PaarseModule, that all module source texts imported by the top level module are available.”

The primary interface from the ES spec. back into a loader is https://tc39.github.io/ecma262/#sec-hostresolveimportedmodule .  ES expect to get a Module Record for a fully parsed/analyzed source module back from that.  Something that isn’t said there (but probably should be) is that if the referenced module is is a source text module (there can be other, host/implementation defined kinds of modules that are defined using the ES Module grammar) then ParseModule must be “called” to produce the corresponding Source Text Module Record.

I would expect any  host spec. for MostResolveImportedModule to roughly follow this pattern:
  1.   Use referencingModule and specifier to obtain a host internal module id, mid.
  2.   If the host known module registry has an entry for mid, then return the Module Record associated with mid in the registry. 
  3.   If mid identifies a source text module, then
    1. Assert: The source text of the module has already been fetched
    2. Let src be the source text identified by mid.
    3. Let modRec be ?ParseModule(src, currentRealm, hostProvided).
    4. Create an entry in the host known module registry for mid that associates mid with modRec.
    5. Return modred
  4.   handle other implementation specified kinds of modules
Finally, I want to emphasize one more time that Module Records, as defined in the ES spec, are just an abstraction of the spec. They may or may not correspond to actual  descrete records in an implementation. The data structures that a loader uses to track modules are not necessarily concrete Module Records and may have additional semantics that are specific to the loader/host environment. The only requirements on the association between the loader’s concept of a “module” and ES’s Module Record abstraction are those given in https://tc39.github.io/ecma262/#sec-hostresolveimportedmodule 

...
I think you're right.  I'm not sure exactly what I had in mind but it
seems the answer is "do anything you like as long as it is not
observable to script", which is a lower bar than I realized.

YES!


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
On 18/07/2016 19:04, Allen Wirfs-Brock wrote:
> The intent is that after the “host” has fetched a module (and its
> dependencies) it enqueue a TopLevelModuleEvaluationJob.

OK, so calling EnqueueJob with a TopLevelModuleEvaluationJob is how this
is supposed to happen.

That doesn't give an indication of whether the job succeeded though
(except for the fact that it would call HostReportErrors some time
later).  Is it possible to to find out whether this was successful
without relying on implementation-specific behaviour?

> In a loader spec. that is expect to have multiple implementations (eg,
> the HTML module loader) you want to be careful to not over-specify
> things that should be implementation details.

I agree.  My concern is that since this is a complex and subtle area
under-specifying this could lead to differences in behaviour between
implementations.

> The primary interface from the ES spec. back into a loader
> is https://tc39.github.io/ecma262/#sec-hostresolveimportedmodule .  ES
> expect to get a Module Record for a fully parsed/analyzed source module
> back from that.  Something that isn’t said there (but probably should
> be) is that if the referenced module is is a source text module (there
> can be other, host/implementation defined kinds of modules that are
> defined using the ES Module grammar) then ParseModule must be “called”
> to produce the corresponding Source Text Module Record.
>
> I would expect any  host spec. for MostResolveImportedModule to roughly
> follow this pattern:
>
>  1.   Use /referencingModule/ and /specifier/ to obtain a host internal
>     module id, /mid/.
>  2.   If the host known module registry has an entry for /mid/, then
>     return the Module Record associated with /mid/ in the registry.
>  3.   If /mid/ identifies a source text module, then
>      1. Assert: The source text of the module has already been fetched
>      2. Let /src/ be the source text identified by /mid/.
>      3. Let /modRec/ be ?ParseModule(/src/, /currentRealm/, /hostProvided/).
>      4. Create an entry in the host known module registry for /mid/ that
>         associates /mid/ with /modRec/.
>      5. Return /modred/
>  4.   handle other implementation specified kinds of modules

Thanks, it's useful to see that written down.

Jon


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

RE: ModuleDeclarationInstantiation behaviour after failure

Domenic Denicola
From: es-discuss [mailto:[hidden email]] On Behalf Of Jon Coppeard


> OK, so calling EnqueueJob with a TopLevelModuleEvaluationJob is how this is
> supposed to happen.
>
> That doesn't give an indication of whether the job succeeded though (except
> for the fact that it would call HostReportErrors some time later).  Is it possible
> to to find out whether this was successful without relying on
> implementation-specific behaviour?

It's already been discussed elsewhere how the ES job formalism is not sufficient for HTML. There are plans to refactor it which have previously reached agreement but nobody has taken the time to implement them. (Part of the refactoring was already done, by moving a lot of the job machinery into ES's RunJobs, which HTML explicitly states user agents must never use.)

In the meantime, what we can do (and are doing, in the current spec) is just copying the steps of TopLevelModuleEvaluationJob into HTML's script execution machinery. That allows us to detect failures.
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Allen Wirfs-Brock
In reply to this post by Jon Coppeard

On Jul 20, 2016, at 8:47 AM, Jon Coppeard <[hidden email]> wrote:

On 18/07/2016 19:04, Allen Wirfs-Brock wrote:
The intent is that after the “host” has fetched a module (and its
dependencies) it enqueue a TopLevelModuleEvaluationJob. 

OK, so calling EnqueueJob with a TopLevelModuleEvaluationJob is how this
is supposed to happen.

That doesn't give an indication of whether the job succeeded though
(except for the fact that it would call HostReportErrors some time
later).  Is it possible to to find out whether this was successful
without relying on implementation-specific behavior?


“Is it possible to to find out whether this was successful” For whom to find out?  If you are talking about the implementation (or host specs) HostReportError is the mechanism the ES spec. uses to indicate that a point has been reached where extra lingual errors (i.e., errors that are not directly visible to ES code as exceptions) may be make manifest by the implementation. 

(Note that while https://tc39.github.io/ecma262/#sec-toplevelmoduleevaluationjob allows for eager parsing, linking, and even ModuleDeclarationInstantiations but also says that in the eager case “reporting” of the error must be deferred unto the appropriate step of TopLevelModuleEvaluation.)

This primarily is about TopLevelModuleEvaluationJobs that are initiated by the “host”.  In an implementation exposes a module loader with an API that allows an ES program to dynamically load modules (essentially dynamically create TopLevelModuleEvaluationJobs) then then HostReportError action for such a job would presumably trigger whatever sort of error reporting mechanism exposed via the API (probably reject a promise).

Allen



_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: ModuleDeclarationInstantiation behaviour after failure

Jon Coppeard
In reply to this post by Domenic Denicola
Ah thanks, that explains what's going on there then.

On 20/07/2016 17:06, Domenic Denicola wrote:

> From: es-discuss [mailto:[hidden email]] On Behalf Of Jon Coppeard
>
>
>> OK, so calling EnqueueJob with a TopLevelModuleEvaluationJob is how this is
>> supposed to happen.
>>
>> That doesn't give an indication of whether the job succeeded though (except
>> for the fact that it would call HostReportErrors some time later).  Is it possible
>> to to find out whether this was successful without relying on
>> implementation-specific behaviour?
>
> It's already been discussed elsewhere how the ES job formalism is not sufficient for HTML. There are plans to refactor it which have previously reached agreement but nobody has taken the time to implement them. (Part of the refactoring was already done, by moving a lot of the job machinery into ES's RunJobs, which HTML explicitly states user agents must never use.)
>
> In the meantime, what we can do (and are doing, in the current spec) is just copying the steps of TopLevelModuleEvaluationJob into HTML's script execution machinery. That allows us to detect failures.
>
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss