Suggestions to triple quoted strings proposal

Re: Suggestions to triple quoted strings proposal

Brendan Eich-2
On Dec 13, 2006, at 10:59 AM, Stepan Koltsov wrote:

> Brendan, or anybody else who wants multiline strings should to behave
> like in Python,
>
> Could you please write complex-enough example of code with TQS? In
> that example string constant should be declared inside method inside
> class. There is no good example at
> http://developer.mozilla.org/es4/proposals/triple_quotes.html .

You're right there's no good example, but the Python docs have  
examples, and real code has even more compelling examples. Two  
arguments here:

1.  "Be like Python, reuse brainprint from JS hackers who know Python  
and Python hackers learning JS".  This is non-trivial.  It's not just  
"marketing".  It makes the world better to avoid defining """  
differently in ES4/JS2 from Python.

2.  "Be like Python, stand on its shoulders and reuse the experience  
that informed its design decisions and defaults".  This is certainly  
a gamble, since JS is not Python, and Python ain't perfect (JS is far  
from perfect).  But with some care (e.g., eliminating GeneratorExit  
in the JS Pythonic generators available now in Firefox 2, and going  
into ES4), it can pay off.  There's probably value here, unless  
Python has failed to heed negative feedback on non-stripping """.

3.  Quote means verbatim contents modulo escapes and special case for  
embedded newlines, i.e. literal.  Trimming or stripping does not fit  
under the notion of "literal".  Bob and I have made this point, it's  
about intuition more than optimizing for the common case.
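
To make point 3 concrete with Python's existing behaviour: a triple-quoted literal declared inside a method keeps the newline after the opening quotes and the source indentation verbatim, and stripping is a separate, explicit step. This is only a sketch; the class and method names are hypothetical, and `textwrap.dedent` is the standard-library helper.

```python
import textwrap


class Report:  # hypothetical class, for illustration only
    def usage(self):
        # The literal keeps the newline after the opening quotes and
        # the eight spaces of source indentation on every line.
        return """
        Usage: thingy [OPTIONS]
            -h    Display this usage message
        """

    def usage_stripped(self):
        # Stripping is an ordinary expression applied to the literal.
        return textwrap.dedent(self.usage()).lstrip("\n")


raw = Report().usage()
clean = Report().usage_stripped()
```

Here `raw` still starts with a newline followed by the method's indentation, while `clean` starts at "Usage:" and keeps only the relative indentation of the option line.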

> I used to write in Python, I hated its """ behaviour. I asked people
> who use Python, and they generally agreed with me.

Were they writing doc strings or data?  We have
http://developer.mozilla.org/es4/proposals/documentation.html for
documentation, that is, Java doc-comments with simpler embedded
"markup" syntax.

> I'm afraid, that if you keep TQS "simple", they won't be very usable:
> in 99% cases users will be forced to manually strip spaces and leading
> newline. In 1% cases string constant will be defined outside block, or
> amount of spaces will not matter.
>
> I have no other arguments :)

This is the crux of the matter.  My counter-argument is number 2,  
above.  If your Python experience were more common, something would  
have been done.  But I could be wrong.

Can you say more about what these """ strings contained in Python  
(doc vs. data, etc.).  More context, real examples?

/be


Re: Suggestions to triple quoted strings proposal

Brendan Eich-2
On Dec 13, 2006, at 1:15 PM, Brendan Eich wrote:

> On Dec 13, 2006, at 10:59 AM, Stepan Koltsov wrote:
>
>> Brendan, or anybody else who wants multiline strings should to behave
>> like in Python,
>>
>> Could you please write complex-enough example of code with TQS? In
>> that example string constant should be declared inside method inside
>> class. There is no good example at
>> http://developer.mozilla.org/es4/proposals/triple_quotes.html .
>
> You're right there's no good example, but the Python docs have  
> examples, and real code has even more compelling examples. Two  
> arguments here:

Of course, I revised the list to make three:

> 1.  "Be like Python, reuse brainprint from JS hackers who know  
> Python and Python hackers learning JS".  This is non-trivial.  It's  
> not just "marketing".  It makes the world better to avoid defining  
> """ differently in ES4/JS2 from Python.
>
> 2.  "Be like Python, stand on its shoulders and reuse the  
> experience that informed its design decisions and defaults".  This  
> is certainly a gamble, since JS is not Python, and Python ain't  
> perfect (JS is far from perfect).  But with some care (e.g.,  
> eliminating GeneratorExit in the JS Pythonic generators available  
> now in Firefox 2, and going into ES4), it can pay off.  There's  
> probably value here, unless Python has failed to heed negative  
> feedback on non-stripping """.
>
> 3.  Quote means verbatim contents modulo escapes and special case  
> for embedded newlines, i.e. literal.  Trimming or stripping does  
> not fit under the notion of "literal".  Bob and I have made this  
> point, it's about intuition more than optimizing for the common case.

But this is not meant to puff up the case for Pythonic """ -- point 3  
is pretty strong by itself.  Anyway, as you say the crucial question  
is: what's the most common use-case?

/be



Re: Suggestions to triple quoted strings proposal

Stepan Koltsov-2
In reply to this post by Bob Ippolito
On 12/13/06, Bob Ippolito <[hidden email]> wrote:
> On 12/13/06, Stepan Koltsov <[hidden email]> wrote:
> > I've read proposal of triple quoted strings at
...
> > And I have two suggestions.
...
> Eh. String should just grow a method to do that. Literals are literals
> and should be treated as such.

We can call them "heredocs". It is a term from the bash man page.

I've found two operators in the bash man page: "<<" and "<<-". "<<"
ignores the first newline. "<<-" also (surprise!) strips "all leading tab
characters from input lines and the line containing delimiter"
(quoting the bash man page). The man page goes on:

===
This allows here-documents within shell scripts to be indented in a
natural fashion.
===

Wow! This is what I want for ES4.

Example of bash script:

===
#!/bin/bash -e

if true; then
	# Indent with tabs: <<- strips leading tabs only, not spaces.
	cat <<- FEOF
	line 1
	line 2
	FEOF
fi

cat << EOF
line 3
line 4
EOF
===

It prints:

===
line 1
line 2
line 3
line 4
===
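
For comparison, the effect of "<<-" can be approximated in Python as an ordinary function applied to a string. This is a sketch of the man-page description (strip every leading tab from each line), not of bash's implementation; `strip_leading_tabs` is a hypothetical name.

```python
def strip_leading_tabs(text):
    """Remove every leading tab from each line, as `<<-` is described to do."""
    return "".join(line.lstrip("\t") for line in text.splitlines(keepends=True))


indented = "\tline 1\n\t\tline 2\n    spaces stay\n"
result = strip_leading_tabs(indented)
```

Only tabs are removed, so space-indented lines are left alone, and relative tab indentation is flattened rather than preserved.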


I'm going to dig Python libraries to find the "most common use-case".

--
Stepan


Re: Suggestions to triple quoted strings proposal

Bob Ippolito
On 12/14/06, Stepan Koltsov <[hidden email]> wrote:

> On 12/13/06, Bob Ippolito <[hidden email]> wrote:
> > On 12/13/06, Stepan Koltsov <[hidden email]> wrote:
> > > I've read proposal of triple quoted strings at
> ...
> > > And I have two suggestions.
> ...
> > Eh. String should just grow a method to do that. Literals are literals
> > and should be treated as such.
>
> We can call them "heredocs". It is term from bash man page.
>
> I've found two operators: "<<" and "<<-" in bash man page. "<<"
> ignores first newline. "<<-" also (surprise!) strips "all leading tab
> characters from input lines and the line containing delimiter" (from
> bash javadoc). From that man page:
>
> ===
> This allows here-documents within shell scripts to be indented in a
> natural fashion.
> ===
>
> Wow! This is what I want for ES4.
>
> Example of bash script:
>
> ===
> #!/bin/bash -e
>
> if true; then
>     cat <<- FEOF
>     line 1
>     line 2
>     FEOF
> fi
>
> cat << EOF
> line 3
> line 4
> EOF
> ===
>
> It prints:
>
> ===
> line 1
> line 2
> line 3
> line 4
> ===
>
>
> I'm going to dig Python libraries to find the "most common use-case".

The most common use case in Python is definitely documentation and
doctests, in which case it doesn't really matter how the indentation
works because it's getting processed by some module before the user
looks at it anyway. With those use cases, there's plenty of
opportunity and little hassle to appropriately mangle the string.

pydoc and doctest would benefit from automatic detabbing, but they
would save exactly one expression each. The downside is huge though;
automatic detabbing would mean that the detabbing stuff would need to
happen in C (the compiler can't use textwrap because it might not
exist in compiled form!), which would be a LOT more than one
expression, duplicated code (compiler in C, textwrap in Python) and
would become an unnecessary maintenance burden.
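
In current Python, the "one expression" of post-processing can look roughly like the sketch below, which uses the standard-library `inspect.cleandoc`. The literal is held in an ordinary variable (an assumption made for the example) so the result does not depend on how a given Python version stores `__doc__`.

```python
import inspect

# A docstring-shaped literal: first line flush with the quotes,
# later lines carrying the indentation of the surrounding code.
raw = """Open a connection.

    The body is indented in the source,
    and that indentation is part of the literal.
    """

# One expression removes the common margin and the trailing blank line.
cleaned = inspect.cleandoc(raw)
```

The library call does the mangling at the point of use, which is the trade-off described above: the literal stays literal and the tool pays for one extra expression.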

-bob


Immediate closing of iterators

Jeff Thompson-5
In reply to this post by Brendan Eich-2
This is a follow-up to a bugzilla discussion at:
https://bugzilla.mozilla.org/show_bug.cgi?id=349326

var was_closed = false;
function gen() {
         try {
                 yield 1;
         } finally {
                 was_closed = true;
         }
}

for (var i in gen())
         break;
print("was_closed=" + was_closed);

Right now, this prints "was_closed=false" because when it breaks out of the for loop,
Javascript does not close the iterator, and does not execute the finally clause to
set was_closed true.  I want to argue that Javascript should close the iterator:
* No guarantee is made that the 'finally' clause will ever be
   executed, so if you need it to close a file, etc. it might never happen.
* This seems the opposite of how 'finally' works, which usually means that you can
   be *sure* it will execute, even if the 'try' block throws an exception, etc.
* Python 2.5 and C# both close the iterator and execute the 'finally' clause (as one would expect)
* In fact, before version 2.5, Python gives a compiler error for this code:
   >>> 'yield' not allowed in a 'try' block with a 'finally' clause
   I guess this is because before 2.5, Python did not always close the iterator when
   leaving a for loop from an exception, etc., so they didn't let you write code that
   wouldn't work like you expect.

So, could the spec require Javascript to always close the iterator when leaving a loop?
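
For comparison, here is the same test in Python (2.5 or later semantics, written in Python 3 syntax). It is a sketch that closes the generator deterministically with `contextlib.closing` instead of relying on when the collector runs; the `log` list is just a stand-in for `was_closed`.

```python
from contextlib import closing

log = []


def gen():
    try:
        yield 1
    finally:
        # Runs when the generator is closed, not only when it is exhausted.
        log.append("closed")


# closing() calls it.close() when the with-block is left, even via break.
with closing(gen()) as it:
    for i in it:
        break
```

After the `with` block, the `finally` clause has run, regardless of which garbage collector the implementation uses.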

Thanks,
- Jeff



Re: Re: Suggestions to triple quoted strings proposal

Stepan Koltsov-2
In reply to this post by Brendan Eich-2
Hi, again.

I've looked at the sources of Python itself (checked out from
http://svn.python.org/projects/python/trunk). Arguably, nobody writes
Python "better" than the Python developers.

I've written a script that counts usages of multiline strings in the Python sources.

(Script is actually a Java program. I code in Java 10 hours a day, I
do it really fast :)

Of course, most TQS in Python are used as docstrings (and doctests).
There are 8214 multiline strings in Python sources.

First, I looked at the first newline after the opening TQS.

There are 907 uses of multiline strings (that are not docstrings) in
the Python sources. Only 1/9 of multiline strings store data.

In 368 of the 907 cases, the opening triple quotes are followed by a
backslash and a newline.

=== real example from Doc/lib/minidom-example.py
document = """\
<slideshow>
...
"""
===

That is more than 1/3.

In 342 of the 907 cases, the opening triple quotes are followed by a newline.
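
A count of this kind is easy to reproduce. The following is a rough sketch using Python's standard `tokenize` module; it is not the Java program mentioned above, it does not separate docstrings from data, and `classify_multiline_strings` and `SAMPLE` are hypothetical names.

```python
import io
import tokenize


def classify_multiline_strings(source):
    """Count multiline triple-quoted literals by what follows the opening quotes."""
    counts = {"backslash_newline": 0, "newline": 0, "other": 0}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Drop string prefixes such as r, b or u before inspecting the quotes.
        body = tok.string.lstrip("rRbBuUfF")
        if not (body.startswith('"""') or body.startswith("'''")):
            continue
        if "\n" not in body:
            continue  # triple-quoted, but on a single line
        rest = body[3:]
        if rest.startswith("\\\n"):
            counts["backslash_newline"] += 1
        elif rest.startswith("\n"):
            counts["newline"] += 1
        else:
            counts["other"] += 1
    return counts


SAMPLE = (
    'a = """\\\n'
    'first\n"""\n'
    'b = """\n'
    'second\n"""\n'
    'c = """third\n'
    'line"""\n'
    'd = "single"\n'
)
counts = classify_multiline_strings(SAMPLE)
```

The three multiline literals in `SAMPLE` fall into one category each, and the single-line string is ignored.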


I have no numbers that show that leading spaces should be stripped by
the lexer. I don't know what to measure. I can show an extract from the
sources:

http://mx1.ru/~yozh/js2/nds-indent.txt

This file contains real-world examples of data stored in multiline
strings that are declared inside blocks. I repeat: the code looks
dirty.

I also have a file

http://mx1.ru/~yozh/js2/nds.txt

It contains all fragments with TQS that are not docstrings.

Script sources can be found at

http://mx1.ru/~yozh/js2/dig-tqs.zip

(you can look inside, if you think that my script produced wrong files)

On 12/14/06, Brendan Eich <[hidden email]> wrote:

> On Dec 13, 2006, at 10:59 AM, Stepan Koltsov wrote:
>
> > Brendan, or anybody else who wants multiline strings should to behave
> > like in Python,
> >
> > Could you please write complex-enough example of code with TQS? In
> > that example string constant should be declared inside method inside
> > class. There is no good example at
> > http://developer.mozilla.org/es4/proposals/triple_quotes.html .
>
> You're right there's no good example, but the Python docs have
> examples,

BTW, the Python docs have no good examples of multiline strings.

The language reference has no example. The Python tutorial has something...
ehh... not nice:

print """
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""

This prints the text surrounded by empty lines (the first because of
the leading newline, the last because the print statement adds its own
newline).
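
The effect can be checked directly. This sketch uses Python 3's `print` function rather than the print statement quoted above, and captures the output in a buffer; the `tidy` variant is the backslash idiom counted earlier.

```python
import io
from contextlib import redirect_stdout

usage = """
Usage: thingy [OPTIONS]
     -h                        Display this usage message
"""

buffer = io.StringIO()
with redirect_stdout(buffer):
    print(usage)  # print appends one more newline of its own

printed = buffer.getvalue()

# A backslash after the opening quotes drops the leading newline.
tidy = """\
Usage: thingy [OPTIONS]
     -h                        Display this usage message
"""
```

`printed` begins with an empty line and ends with two newlines, while `tidy` begins at "Usage:".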

> and real code has even more compelling examples. Two
> arguments here:
>
> 1.  "Be like Python, reuse brainprint from JS hackers who know Python
> and Python hackers learning JS".  This is non-trivial.  It's not just
> "marketing".  It makes the world better to avoid defining """
> differently in ES4/JS2 from Python.
>
> 2.  "Be like Python, stand on its shoulders and reuse the experience
> that informed its design decisions and defaults".  This is certainly
> a gamble, since JS is not Python, and Python ain't perfect (JS is far
> from perfect).  But with some care (e.g., eliminating GeneratorExit
> in the JS Pythonic generators available now in Firefox 2, and going
> into ES4), it can pay off.  There's probably value here, unless
> Python has failed to heed negative feedback on non-stripping """.

BTW, there were no design decisions when Guido developed the first
version of Python 15 years ago as a "hobby" programming project (quote
from Wikipedia).

A long time ago I asked Python developers about their view of
multiline strings. They answered that the behaviour is proper, and
that even if it were not, it is too late to change it.

> 3.  Quote means verbatim contents modulo escapes and special case for
> embedded newlines, i.e. literal.  Trimming or stripping does not fit
> under the notion of "literal".  Bob and I have made this point, it's
> about intuition more than optimizing for the common case.
>
> > I used to write in Python, I hated its """ behaviour. I asked people
> > who use Python, and they generally agreed with me.
>
> Were they writing doc strings or data? We have http://
> developer.mozilla.org/es4/proposals/documentation.html for
> documentation, that is, Java doc-comments with simpler embedded
> "markup" syntax.

I asked about data. I think, documentation format is not very important.

Personally, I prefer javadoc/doxygen style over docstrings.

> > I'm afraid, that if you keep TQS "simple", they won't be very usable:
> > in 99% cases users will be forced to manually strip spaces and leading
> > newline. In 1% cases string constant will be defined outside block, or
> > amount of spaces will not matter.
> >
> > I have no other arguments :)
>
> This is the crux of the matter.  My counter-argument is number 2,
> above.  If your Python experience were more common, something would
> have been done.  But I could be wrong.
>
> Can you say more about what these """ strings contained in Python
> (doc vs. data, etc.).  More context, real examples?

Any questions?

--
Stepan


Re: Immediate closing of iterators

Brendan Eich-2
In reply to this post by Jeff Thompson-5
On Dec 14, 2006, at 9:01 PM, Jeff Thompson wrote:

> This is a follow-up to a bugzilla discussion at:
> https://bugzilla.mozilla.org/show_bug.cgi?id=349326

Jeff, thanks for writing.  I'll try to add relevant detail below,  
since this question is complicated by a bug you hit when testing in  
the js shell built from SpiderMonkey sources, I think.  Plus, the  
whole topic is complicated, period.

> var was_closed = false;
> function gen() {
>         try {
>                 yield 1;
>         } finally {
>                 was_closed = true;
>         }
> }
>
> for (var i in gen())
>         break;
> print("was_closed=" + was_closed);
>
> Right now, this prints "was_closed=false" because when it breaks  
> out of the for loop,
> Javascript does not close the iterator, and does not execute the  
> finally clause to
> set was_closed true.

Only if you run the above in the js shell, which shows up a bug  
(https://bugzilla.mozilla.org/show_bug.cgi?id=363917).  I just fixed  
that bug in my build, and with this variation on your testcase:

var was_closed = false;
function gen() {
         try {
                 yield 1;
         } finally {
                 was_closed = true;
                 print("closed!");
         }
}

for (var i in gen())
         break;
print("was_closed=" + was_closed);

gc();  // <=== force a GC here

I get this output:

was_closed=false
global
closed!

The gc() call is needed, otherwise the shell does a final GC after  
its global object has become unreachable, and any generator scoped by  
an unreachable global object is not closed, which is intentional (see  
below for why).

This HTML version of your test:

<textarea id="t" rows="4"></textarea>
<script type="application/javascript;version=1.7">
var tarea = document.getElementById('t');
var print = function (s) { t.value += s + '\n'; }
var was_closed = false;
function gen() {
         try {
                 yield 1;
         } finally {
                 was_closed = true;
                 print("closed!");
         }
}

for (var i in gen())
         break;
print("was_closed=" + was_closed);

function toString() { return "global"; }
print(this);
</script>

works as expected (if not as desired), with no forced GC (Firefox  
runs a GC soon enough after page load that you can see the "closed!"  
text in the textarea appear, but off the page-load critical path).

This may seem cold comfort, since the finalization happens well after  
the loop terminates.

IOW, what's wanted is not a guarantee that close always runs the  
generator (a finally clause).  If ES4 did require that, trivial  
denial-of-service attacks exist; this is why JS1.7 doesn't close  
generators that are unreachable if their scope is unreachable.

What's wanted is that close run promptly *only* in the case of a
for-in loop where no reference to the iterator-generator escapes to
the heap.

This is a reasonable thing to want, and it's what
https://bugzilla.mozilla.org/show_bug.cgi?id=349326 requests.  The
question is, should ES4 require such prompt finalization in the case
where the for-in loop creates the iterator and it never escapes to the
heap?

Here's another variation on your testcase:

var was_closed = false;
function gen() {
         try {
                 yield 1;
         } finally {
                 was_closed = true;
                 print("closed!");
         }
}

var i = gen().next();
print("was_closed=" + was_closed);

With the js shell bugfix, but without the explicit gc() call at the  
end, this too will fail to close the iterator returned by gen().  But  
there's no for-in loop here, so if you think this case shows a bug,  
then you are not just asking for for-in loops to close non-escaping  
generator-iterators -- you are asking all ES4 implementations to use  
reference counting or something equivalent, which promptly finalizes  
unreachable generator-iterators.  That's a harsher requirement for a  
standard to make on all implementations.
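
CPython is one example of the reference-counting behaviour described here. The sketch below mirrors the testcase above; under CPython the temporary generator is finalized as soon as `next()` returns, but that timing is an implementation detail and not something other Python implementations promise. The `gc.collect()` call is included only to give non-reference-counting collectors a chance to run.

```python
import gc

log = []


def gen():
    try:
        yield 1
    finally:
        log.append("closed")


first = next(gen())  # the generator-iterator is a temporary, never stored
gc.collect()         # harmless under CPython; other collectors may need it
```

No loop is involved, so prompt closing here depends entirely on how the implementation reclaims unreachable objects.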

>   I want to argue that Javascript should close the iterator:
> * No guarantee is made that the 'finally' clause will ever be
>   executed, so if you need it to close a file, etc. it might never  
> happen.

You can't count on deferred scripted functions such as timeouts  
running in the browser.  The page may be unloaded and the timeout  
canceled.  The situation with close hooks (finally clauses in  
generators) is entirely analogous.

Also (and this is really an aside -- I'm not arguing about the
for-in-loops-should-close-non-escaping-generator-iterators point):
finalization should never be required to promptly free scarce
resources that have been explicitly allocated by a program.  ECMA
requires some kind of GC, but not promptly finalizing GC.

Note that finalization is never canceled, but finally clauses in a  
scripted finalize hook (that's what a generator that yields from a  
try with a finally is) may have to be canceled, just as timeouts may  
be canceled.

> * This seems the opposite of how 'finally' works, which usually  
> means that you can
>   be *sure* it will execute, even if the 'try' block throws an  
> exception, etc.

Python actually does not guarantee that finally clauses in all  
generators run -- if the generator misbehaves by yielding while being  
closed, then an outer finally may not be run.  This is considered the  
best way to deal with a misbehaving generator.  The ES4 design avoids  
this by throwing an exception from yield during close, which runs  
finallys on the way out if uncaught.
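
The Python behaviour for a misbehaving generator can be observed directly. In this sketch (the generator name is hypothetical), the inner `finally` yields while the generator is being closed, so `close()` raises `RuntimeError` and the outer `finally` has not run at that point.

```python
log = []


def stubborn():
    try:
        try:
            yield 1
        finally:
            yield 2  # misbehaves: yields while being closed
    finally:
        log.append("outer finally")


g = stubborn()
next(g)
try:
    g.close()
except RuntimeError as exc:
    error = str(exc)
```

The generator is left suspended at the second `yield`; whether and when the outer `finally` eventually runs depends on a later close or finalization.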

Anyway, the js shell bug aside, the issue for es4-discuss is not  
whether finally must always run (it can't or DOS attacks are trivial  
using generators); it's whether for-in loops should promptly finalize  
non-escaping iterators.

> * Python 2.5 and C# both close the iterator and execute the  
> 'finally' clause (as one would expect)
> * In fact, before version 2.5, Python gives a compiler error for  
> this code:
>   >>> 'yield' not allowed in a 'try' block with a 'finally' clause
>   I guess this is because before 2.5, Python did not always close  
> the iterator when
>   leaving a for loop from an exception, etc., so they didn't let  
> you write code that
>   wouldn't work like you expect.

Right; we prototyped this but then tracked 2.5; later, we eliminated  
GeneratorExit (see the python-dev thread mentioned above).

> So, could the spec require Javascript to always close the iterator  
> when leaving a loop?

So long as it's a non-escaping generator-iterator created by the  
loop, then the spec could mandate that.  It requires some extra work  
by non-reference-counting implementations.  They need to keep track  
of such generator-iterators across nested loops in each live function  
or script activation, and close each generator-iterator as control  
exits its loop.

We'll talk about it in the current TG1 meeting, which finishes
tomorrow (day 3).  More then, or here in this thread if TG1'ers prefer.

/be



Re: Immediate closing of iterators

Jeff Thompson-5
Brendan Eich wrote:
> What's wanted is that close run promptly *only* in the case of a for-in
> loop where no reference to the iterator-generator escapes to the heap.
>
> This is a reasonable thing to want, and it's what
> https://bugzilla.mozilla.org/show_bug.cgi?id=349326 requests.  

Yes, as you say, I'm only talking about the case of the for-in loop.
Thanks for looking into this.

- Jeff



Re: Immediate closing of iterators

Brendan Eich-2
On Dec 15, 2006, at 11:34 AM, Jeff Thompson wrote:

> Brendan Eich wrote:
>> What's wanted is that close run promptly *only* in the case of a  
>> for-in loop where no reference to the iterator-generator escapes  
>> to the heap.
>> This is a reasonable thing to want, and it's what https://
>> bugzilla.mozilla.org/show_bug.cgi?id=349326 requests.
>
> Yes, as you say, I'm only talking about the case of the for-in loop.
> Thanks for looking into this.

TG1 members here today agreed to specify normative prompt close if  
the for-in's generator-iterator can't escape -- yay!

I'll comment in the Mozilla bug.

/be



Re: Immediate closing of iterators

Brendan Eich-2
In reply to this post by Brendan Eich-2
On Dec 14, 2006, at 11:51 PM, Brendan Eich wrote:

> <textarea id="t" rows="4"></textarea>
> <script type="application/javascript;version=1.7">
> var tarea = document.getElementById('t');
> var print = function (s) { t.value += s + '\n'; }

Heh; this works, but careful readers may wonder why, since the print  
function uses t.value, not tarea.value. Turns out Firefox emulates an  
IE DOM quirk, only if the name lookup would fail, where elements  
given ids (or names? I forget) reflect as global properties.  Sick,  
but we found too many pages counting on it without detecting non-IE  
user agent.

/be



Re: Immediate closing of iterators

Chris Hansen-5
In reply to this post by Jeff Thompson-5
>> So, could the spec require Javascript to always close the iterator
>> when leaving a loop?
>
> So long as it's a non-escaping generator-iterator created by the
> loop, then the spec could mandate that.  It requires some extra work
> by non-reference-counting implementations.  They need to keep track
> of such generator-iterators across nested loops in each live function
> or script activation, and close each generator-iterator as control
> exits its loop.

Unless the definition of "created by the loop" is very strict won't
this effectively mandate a full GC whenever you leave a for-in loop?
Even reference counting implementations would have to detect
unreachable cycles in the object graph.


-- Chris


Re: Immediate closing of iterators

Brendan Eich-2
On Dec 20, 2006, at 3:02 AM, Chris Hansen wrote:

>>> So, could the spec require Javascript to always close the iterator
>>> when leaving a loop?
>>
>> So long as it's a non-escaping generator-iterator created by the
>> loop, then the spec could mandate that.  It requires some extra work
>> by non-reference-counting implementations.  They need to keep track
>> of such generator-iterators across nested loops in each live function
>> or script activation, and close each generator-iterator as control
>> exits its loop.
>
> Unless the definition of "created by the loop" is very strict won't
> this effectively mandate a full GC whenever you leave a for-in loop?
> Even reference counting implementations would have to detect
> unreachable cycles in the object graph.

function gen() {
   yield 1; yield 2
}
for (let i in gen())
   print(i)

The generator function gen, when invoked to the right of 'in' in the  
loop head, constructs a generator-iterator that is referenced only by  
the implementation (a stack slot in typical implementations).  Its  
reference can't be discovered by any meta-object protocol.  It should  
die when the loop completes, abruptly or normally.

/be



Re: Immediate closing of iterators

Chris Hansen-5
> function gen() {
>    yield 1; yield 2
> }
> for (let i in gen())
>    print(i)
>
> The generator function gen, when invoked to the right of 'in' in the
> loop head, constructs a generator-iterator that is referenced only by
> the implementation (a stack slot in typical implementations).  Its
> reference can't be discovered by any meta-object protocol.  It should
> die when the loop completes, abruptly or normally.

That's true, the question is: how do you make sure that it's _only_ on
the stack and not stored in some instance variable or closure if you
don't run a full GC?

function gen1() {
  yield 1; yield 2;
}

function gen2() {
  globalVar = gen1();
  return globalVar;
}

for (let i in gen2()) {
  print(i);
}

In this case the generator shouldn't be closed.  If gen2 had been defined as

function gen2() {
  return gen1();
}

I would expect that it should.
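
Why the escaping case matters can be shown with the same shape of code in Python, where a for loop does not close its iterator on exit. This is only an analogy, not ES4 semantics; `escaped` stands in for globalVar.

```python
escaped = None  # stands in for globalVar in the example above


def gen1():
    yield 1
    yield 2


def gen2():
    global escaped
    escaped = gen1()
    return escaped


for i in gen2():
    break

# The loop is done, but the generator is still reachable and resumable.
second = next(escaped)

# Had the loop closed it on exit, the holder of `escaped` would see this:
escaped.close()
after_close = next(escaped, "exhausted")
```

Closing on loop exit would therefore be observable through the other reference, which is why the loop needs to know that no such reference exists.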


-- Chris


Re: Immediate closing of iterators

Brendan Eich-2
On Dec 20, 2006, at 3:49 PM, Chris Hansen wrote:

> In this case the generator shouldn't be closed.  If gen2 had been  
> defined as
>
> function gen2() {
>  return gen1();
> }
>
> I would expect that it should.

This is, as you say, asking for a full reference graph analysis.  We  
don't propose that.

Consider two cases:

1. The Mozilla bug Jeff Thompson mentioned,
https://bugzilla.mozilla.org/show_bug.cgi?id=349326, wants only a
generator-iterator created "under the hood" to be closed promptly:

   for (i in o)
     break;

where o denotes some object that's not an iterator -- an object that  
does not have an iterator::get method at all, or whose iterator::get  
returns a different object (o is the iterable, the returned object is  
its iterator).

In this case, the for-in loop can tell whether it is calling a
well-known iterator::get native method, or the default used when there
is no o.iterator::get, whose result is newborn and where the result
can't escape.  The returned iterator could be a generator-iterator:

   o.iterator::get = function () { yield 1; yield 2 }

The internal Generator class's iterator::get method (called  
__iterator__ in JS1.7 in Firefox, for want of full namespace support)  
is immutable (in ECMA-262 terms a generator-iterator's iterator::get  
property has the ReadOnly and DontDelete attributes set).  This  
method returns its |this| parameter, since a generator is an iterator.

The for-in loop can tell whether o's iterator::get (mutable, but it
doesn't matter for this analysis) references a generator-function.
It knows that the generator-function will return a new
generator-iterator that can't escape, because the Generator class
prototype's iterator::get method can't be replaced or shadowed, so its
return value can't be spied on.  Therefore in this case, the
implementation can guarantee prompt close without a full heap scan.

2. To handle the

   for (i in gen())
     break;

case, the implementation would need to recognize gen, whether via  
static analysis in strict mode or in any sufficiently optimizing  
implementation, or dynamically in standard mode or in cases where  
static analysis can't decide, as a generator function.  At this point  
the analysis for case 1 applies.

Anything else (function returning generator-iterator that might have  
escaped to a global, etc.) defeats the prompt close promise.

Does this work for the cases users care about?  I believe it does,  
but welcome counterexamples.
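
The rule in case 2 can be modelled, very loosely, in Python. This sketch is an analogy and not a description of any ES4 or SpiderMonkey implementation: `for_in_first` is a hypothetical helper, and `inspect.isgeneratorfunction` plays the role of "recognize gen as a generator function".

```python
import inspect

log = []


def for_in_first(callee):
    """Take the first value, then leave the loop early (like `break`).

    Closes the iterator promptly only when `callee` is itself a
    generator function, so its result is known to be newborn.
    Returns (first_value, closed_promptly).
    """
    iterator = callee()
    newborn = inspect.isgeneratorfunction(callee)
    first = None
    for first in iterator:
        break
    if newborn:
        iterator.close()
    return first, newborn


def gen():
    try:
        yield 1
    finally:
        log.append("closed")


def delegate():
    # An ordinary function that merely returns a generator-iterator.
    return gen()


direct = for_in_first(gen)
indirect = for_in_first(delegate)
```

The direct call qualifies for the prompt close and the delegating call does not, because nothing about `delegate` itself shows that its result is fresh. Under CPython the second generator still gets closed by reference counting once it is dropped, so the flag only models the rule, not the collector.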

/be


Re: Immediate closing of iterators

Chris Hansen-5
This means that you can't even delegate a call that returns a
generator without voiding this guarantee?

class CollectionWrapper {
  var collection = ...some collection...;
  function gen() {
    return this.collection.gen();
  }
}

var c = new CollectionWrapper();
for (i in c.gen())
  break;

If the guarantee is voided so easily, and by operations that users
will perceive as completely trivial, I think people should be very
reluctant to ever rely on it.  And then why even have it?


-- Chris

On 12/21/06, Brendan Eich <[hidden email]> wrote:

> On Dec 20, 2006, at 3:49 PM, Chris Hansen wrote:
>
> > In this case the generator shouldn't be closed.  If gen2 had been
> > defined as
> >
> > function gen2() {
> >  return gen1();
> > }
> >
> > I would expect that it should.
>
> This is, as you say, asking for a full reference graph analysis.  We
> don't propose that.
>
> Consider two cases:
>
> 1. The Mozilla bug Jeff Thompson mentioned, https://
> bugzilla.mozilla.org/show_bug.cgi?id=349326, wants only a generator-
> iterator created "under the hood" to be closed promptly:
>
>    for (i in o)
>      break;
>
> where o denotes some object that's not an iterator -- an object that
> does not have an iterator::get method at all, or whose iterator::get
> returns a different object (o is the iterable, the returned object is
> its iterator).
>
> In this case, the for-in loop can tell whether it is calling a well-
> known iterator::get native method, or the default used when there is
> no o.iterator::get, whose result is newborn and where the result
> can't escape.  The returned iterator could be a generator-iterator:
>
>    o.iterator::get = function () { yield 1; yield 2 }
>
> The internal Generator class's iterator::get method (called
> __iterator__ in JS1.7 in Firefox, for want of full namespace support)
> is immutable (in ECMA-262 terms a generator-iterator's iterator::get
> property has the ReadOnly and DontDelete attributes set).  This
> method returns its |this| parameter, since a generator is an iterator.
>
> The for-in loop can tell whether o's iterator::get (mutable, but it
> doesn't matter for this analysis) references a generator-function.
> It knows that the generator-function will return a new generator-
> iterator that can't escape, because the Generator class prototype's
> iterator::get method can't be replaced or shadowed, so its return
> value can't be spied on.  Therefore in this case, the implementation
> can guarantee prompt close without a full heap scan.
>
> 2. To handle the
>
>    for (i in gen())
>      break;
>
> case, the implementation would need to recognize gen, whether via
> static analysis in strict mode or in any sufficiently optimizing
> implementation, or dynamically in standard mode or in cases where
> static analysis can't decide, as a generator function.  At this point
> the analysis for case 1 applies.
>
> Anything else (function returning generator-iterator that might have
> escaped to a global, etc.) defeats the prompt close promise.
>
> Does this work for the cases users care about?  I believe it does,
> but welcome counterexamples.
>
> /be
>


Re: Immediate closing of iterators

Brendan Eich-2
On Dec 21, 2006, at 3:18 AM, Chris Hansen wrote:

> This means that you can't even delegate a call that returns a
> generator without voiding this guarantee?
>
> class CollectionWrapper {
>  var collection = ...some collection...;
>  function gen() {
>    return this.collection.gen();
>  }
> }

You're right, the case 2 analysis I wrote up in my last message was  
too restrictive.  If you put a type annotation on function gen, then  
the guarantee should hold:

class CollectionWrapper {
  var collection = ...some collection...;
  function gen():Generator.<*,*,*> {
    return this.collection.gen();
  }
}

(You could use narrower types than * if appropriate.)

From http://developer.mozilla.org/es4/proposals/iterators_and_generators.html:

"A function containing a yield expression is a generator function,  
which when called binds formal parameters to actual arguments but  
evaluates no part of its body. Instead, it returns a generator-
iterator of nominal type Generator:

class Generator.<O, I, E> {
   public function send(i: I) : O;
   public function next() : O;
   public function throw(e : E) : O;
   public function close() : void;
};
"

So the guarantee has two conditions: that the object to the right of  
'in' in the 'for-in' loop is not an iterator (case 1 in my last  
message); or if it is, that it is a generator-iterator (case 2,  
revised).  A generator-iterator is an instance of Generator.<O,I,E>.

In this light, case 1 is really just an optimization to allow  
finalization of any non-generator iterator created for an iterable o  
given for (i in o), where o has an iterator::get method that's not a  
generator-function. It's not just an early out to calling close,  
because the iterator may not have a close hook (may not be a  
Generator); the benefit is that it can be finalized promptly. This is  
informative not normative, since case 2 would uphold the guarantee. I  
wasn't clear last time, sorry about that.
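
A minimal sketch of the two cases, written in present-day JavaScript
rather than the ES4 syntax above. Assumptions: for-of stands in for
for-in over an iterator, Symbol.iterator stands in for iterator::get,
and a finally block stands in for whatever cleanup close() would run.
The names gen, o, and log are hypothetical. In both loops the cleanup
runs before the statement after the loop, with no heap scan involved.

```javascript
// Log of cleanup events, so the order of operations is observable.
const log = [];

// A generator function; its finally block plays the role of a close hook.
function* gen(label) {
  try {
    yield 1;
    yield 2;
  } finally {
    log.push(label + " closed");
  }
}

// Case 1: o is an iterable, not an iterator. The loop asks it for a
// fresh generator-iterator that no other code holds a reference to.
const o = { [Symbol.iterator]: () => gen("case 1") };
for (const i of o) {
  break; // abrupt exit: the loop closes the iterator it created
}
log.push("after loop 1");

// Case 2: the loop is handed the result of a direct generator call.
for (const i of gen("case 2")) {
  break;
}
log.push("after loop 2");

console.log(log.join(", "));
// prints "case 1 closed, after loop 1, case 2 closed, after loop 2"
```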

/be


Re: Immediate closing of iterators

Chris Hansen-5
> You're right, the case 2 analysis I wrote up in my last message was
> too restrictive.  If you put a type annotation on function gen, then
> the guarantee should hold:
>
> class CollectionWrapper {
>   var collection = ...some collection...;
>   function gen():Generator.<*,*,*> {
>     return this.collection.gen();
>   }
> }

But if the rule is less restrictive I would claim that my original
objection applies: you need to run a full gc whenever a loop exits
abruptly to determine whether or not the generator is reachable:

class CollectionWrapper {
  var collection = ...some collection...;
  function gen():Generator.<*,*,*> {
    var result = this.collection.gen();
    if (someCondition) myGlobal = result;
    return result;
  }
}


-- Chris


Re: Immediate closing of iterators

Brendan Eich-2
On Dec 21, 2006, at 9:55 AM, Chris Hansen wrote:

>> You're right, the case 2 analysis I wrote up in my last message was
>> too restrictive.  If you put a type annotation on function gen, then
>> the guarantee should hold:
>>
>> class CollectionWrapper {
>>   var collection = ...some collection...;
>>   function gen():Generator.<*,*,*> {
>>     return this.collection.gen();
>>   }
>> }
>
> But if the rule is less restrictive I would claim that my original
> objection applies: you need to run a full gc whenever a loop exits
> abruptly to determine whether or not the generator is reachable:

Sorry, caffeinated now.  The outcomes to avoid are:

1. Requiring a full gc on loop exit, obviously a non-starter for  
performance reasons.
2. Requiring reference counting of all implementations.
3. Requiring static escape analysis of all implementations.

What I was groping for with the revision to case 2 is this: a way for
the type system to promise no escape, and for all implementations to
trivially check it, when a delegate returns the result of a
generator-function call.  This type-checking can't require 3, so the
form of the delegate would have to be restricted to linear flow, if
not to a bare return of a generator call.

Given the constraints, what's better: simple rules for direct  
generator calls, or special-casing in the type system just to allow  
delegation with guarantee of prompt close?

/be


Re: Immediate closing of iterators

Chris Hansen-5
In C#, as I understand it, they sidestep this problem by (I'll try to
formulate this in terms of ES4) simply not allowing you to iterate
over generators. You can iterate over objects with an iterator::get
method but the returned object is owned by the loop and if it is a
generator it will be closed on loop exit whether or not others have
references to it.  Also, a generator doesn't have an iterator::get
method since that would complicate the question of who "owns" it.
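
A rough sketch of the "loop owns the iterator" rule, in present-day
JavaScript rather than ES4 (assumption: for-of stands in for the
for-in loop discussed here). It models only the first half of the
idea, closing on loop exit regardless of other references; unlike the
C# design, generators in this sketch are themselves iterable. The
names items, collection, wrapper, and seen are hypothetical, and
myGlobal is borrowed from the earlier example.

```javascript
let myGlobal = null;  // a reference that outlives the loop
let closed = false;   // set by the generator's cleanup code

function* items() {
  try {
    yield "a";
    yield "b";
    yield "c";
  } finally {
    closed = true;    // runs when the generator is closed
  }
}

const collection = { gen: items };

// A delegate that leaks the generator before returning it.
const wrapper = {
  gen() {
    const result = collection.gen();
    myGlobal = result;
    return result;
  },
};

const seen = [];
for (const x of wrapper.gen()) {
  seen.push(x);
  break;              // the loop closes the generator here
}

// The escaped reference now points at a finished generator.
const afterLoop = myGlobal.next();
console.log(seen, closed, afterLoop.done);
// prints [ 'a' ] true true
```

Because the loop closes unconditionally, no reachability question has
to be answered at loop exit; the cost is that any holder of the
escaped reference finds the generator already finished.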

That only "solves" Jeff's problem by disallowing it (ta-daa! ;-) but
it does away with the need for any kind of finalization, prompt or
not.  In my experience (from Java) GC finalization is something you
want to steer well clear of.

What do you think?


-- Chris



Re: Immediate closing of iterators

Brendan Eich-2
On Dec 21, 2006, at 12:04 PM, Chris Hansen wrote:

> In C#, as I understand it, they sidestep this problem by (I'll try to
> formulate this in terms of ES4) simply not allowing you to iterate
> over generators. You can iterate over objects with an iterator::get
> method but the returned object is owned by the loop and if it is a
> generator it will be closed on loop exit whether or not others have
> references to it.

That's not Pythonic, but it certainly is simple.  I like it.

>   Also, a generator doesn't have an iterator::get
> method since that would complicate the question of who "owns" it.

This is not a problem in the ES4 proposal, or in Python.  Ownership  
of storage and close are coupled only to guarantee that close happens  
eventually, even if the client code fails to call gen.close()  
explicitly.  More below.
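
To make the distinction concrete, here is a small sketch in
present-day JavaScript, not ES4. Assumptions: return() stands in for
gen.close(), and a finally block stands in for the close hook; the
names resource, events, g1, and g2 are hypothetical. Unlike the
proposal under discussion, this sketch has no collector-driven close,
so the second client's cleanup never runs. That gap is what the
"happens eventually" guarantee is meant to cover.

```javascript
const events = [];

function* resource() {
  events.push("open");
  try {
    yield "data";
  } finally {
    events.push("close hook ran");
  }
}

// Client that closes explicitly.
const g1 = resource();
g1.next();      // runs the body up to the first yield
g1.return();    // explicit close: the finally block runs now
events.push("g1 done");

// Client that forgets to close; nothing in this sketch runs the hook.
const g2 = resource();
g2.next();
events.push("g2 abandoned");

console.log(events.join(", "));
// prints "open, close hook ran, g1 done, open, g2 abandoned"
```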

> That only "solves" Jeff's problem by disallowing it (ta-daa! ;-) but
> it does away with the need for any kind of finalization, prompt or
> not.  In my experience (from Java) GC finalization is something you
> want to steer well clear of.

Finalization is definitely two-phase in systems that have to support
close (which might resurrect the generator) and then release its
storage.  Those of us burdened with GC-based memory management for
ES/JS/AS implementations have to dance with the GC here.

The extensions in SpiderMonkey were not pretty, but became better
after some over-general wrong turns were undone.  The current code is
quite concrete: *only* generators participate in the close phase;
scheduling close operations requires cooperation from the browser
embedding, to prevent trivial denial-of-service attacks (similar to
those possible already via setTimeout).  IOW, a browser embedding may
cancel outstanding close ops in some cases (e.g., the document whose
script created the generator has been unloaded already).

For ref-counting implementations, there's still the possibility of a
reference cycle.  So CPython, e.g., has to run close from its cycle
collector, if it didn't already get called by the client code.
CPython doesn't worry about DoS attacks from generators that resurrect
themselves or spawn more generators from their close hooks, AFAIK.

> What do you think?

It's a great simplifying change, but it doesn't avoid the need to add
a close protocol to one's GC, as noted above (there's no way around
that problem, so I'm not dinging your suggestion, just trying to
separate issues).  I'll run it up the flagpole with SpiderMonkey
hackers and look for comments here from other ES4 parties.

/be

