Re: [Json] Response to Statement from W3C TAG

Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock

On Dec 4, 2013, at 11:39 PM, Carsten Bormann wrote:

On 05 Dec 2013, at 06:08, Tim Bray <[hidden email]> wrote:

FWIW, I have never understood what the ECMAnauts mean by the word “semantics” in this context, so I have no idea whether I agree with this statement.

As one of the contributors to ECMA-404 I'd be happy to elaborate.


You know this, but just for the record: we could be applying the meaning we have for these terms in CS.

Yes, that is indeed the starting point.  However, TC39 is largely composed of language designers and language implementors so the meaning of "semantics" we use is generally the one used within that branch of CS.


The syntax just tells you which sequences of symbols are part of the language.
(This is what we have ABNF for; ECMA-404 uses racetracks plus some English language that maps the characters to the tokens in the syntax-level racetracks for value, object, and array, and to the English language components of the token-level racetracks for number and string.)

Agreed.  I would state this as:  the syntax tells you which sequences of symbols form valid statements within the language.

Language designers also use the term "static semantics".  The static semantics of a language are a set of rules that further restrict which sequences of symbols form valid statements within the language.  For example, a rule that the 'member' names must be disjoint within an 'object' production could be a static semantic rule (however, there is intentionally no such rule in ECMA-404).

The line between syntax and static semantics can be fuzzy.  Static semantic rules are typically used to express rules that cannot technically be expressed using the chosen syntactic formalism, or rules which are simply inconvenient to express using that formalism.  For example, the editor of ECMA-404 chose to simplify the RR-track expression of the JSON syntax by using static semantic rules for whitespace rather than incorporating them into the RR diagrams.

Another form of static semantic rule is an equivalence that states when two or more different sequences of symbols must be considered equivalent.  For example, the rules that state equivalences between escape sequences and individual code points within a JSON 'string'.  Such equivalences are not strictly necessary at this level, but it simplifies the specification of higher-level semantics if equivalent symbol sequences can be normalized at this level of specification.

When we talk about the "semantics" of a language (rather than "static semantics") we are talking about attributing meaning (in some domain and context) to well-formed (as specified via syntax and static semantics) statements expressed in that language.

ECMA-404 intentionally restricts itself to specifying the syntax and static semantics of the JSON language.  More below on why.


Semantics is needed to describe e.g. that some whitespace is “insignificant” (not contributing to the semantics), describe the intended interpretation of escape sequences in strings,
Yes these are static semantic rules (although whitespace rules could be expressed using syntactic formalisms).

that the sequences of symbols enabled by the production “number” are to be interpreted in base 10,
Yes, ECMA-404 includes this as a static semantic statement, although it arguably could be classified as a semantic statement above the level of static semantics.  Whether "77" is semantically interpreted as the mathematical value 63 or 77 isn't really relevant to whether "77" is a well-formed JSON number.
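
To make the distinction concrete, here is a rough sketch (my own transcription of the number grammar into a regular expression, not normative text) of checking well-formedness separately from assigning a value:

    // Well-formedness: does the text match the JSON 'number' grammar?
    var JSON_NUMBER = /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/;
    JSON_NUMBER.test("77");    // true  -- a well-formed JSON number
    JSON_NUMBER.test("077");   // false -- leading zeros are not allowed
    // Assigning the mathematical value 77 is a separate, semantic step,
    // e.g. via the ECMAScript binding:
    Number("77");              // 77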

or that “the order of the values is significant” in arrays (which seems to be intended to contrast them to JSON objects, where ECMA-404 weasels out of saying whether the order is significant).

ECMA-404 removed the statement "an object is an unordered collection..." that exists in RFC 4627.  Arguably, ECMA-404 should not have made the statement "the order of the values is significant" WRT arrays.  I'll file a bug ticket on that.  The reason that neither of these statements is appropriate at this level of specification is that they are pure semantic statements that have no impact upon determining whether a sequence of symbols is a well-formed JSON text.

Objectively, the members of a JSON 'object' do occur in a specific order and a semantic interpreter of an object might ascribe meaning to that ordering.  Similarly, a JSON 'array' also has an objectively observable ordering of its contained values. It is again up to a semantic interpreter as to whether or not it ascribes meaning to that ordering.


ECMA-404 does quite a bit of the latter, so indeed I also have trouble interpreting such a statement.

So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid describing JSON beyond the level of static semantics. 

ECMA-404 sees JSON as "a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications".

There are many possible semantics and categories of semantics that can be applied to well-formed statements expressed using the JSON syntax.

One type of semantics is a language binding that specifies how a JSON text might be translated into the data types and structures of some particular programming language or runtime environment. The translation of a JavaScript string encoding of a JSON text into JavaScript objects and values by JSON.parse is one specific example of this kind of semantic application of JSON.  But there are many languages that can be supported by such language bindings, and there is not necessarily a best or canonical JSON binding for any language.
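
For instance, the ECMAScript binding maps a JSON text onto ECMAScript values roughly as follows (any ES5 engine should behave this way, but the mapping shown is just one binding among many):

    var text = '{"name": "JSON", "specs": [404, 4627]}';
    var obj = JSON.parse(text);
    obj.name;        // "JSON"  (JSON string -> ECMAScript String)
    obj.specs[0];    // 404     (JSON number -> ECMAScript Number)
    // A binding for another language would map the same text onto that
    // language's own types (hash tables, lists, big integers, etc.).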

Another form of semantics imposes schema-based meaning and restrictions upon a well-formed JSON text.  A schema explicitly defines an application-level meaning for the elements of some specific subset of well-formed JSON texts. It might require only certain forms of JSON values, provide specific meaning to JSON numbers or strings that occur in specified positions, require the occurrence of certain object members, apply meaning to the ordering of object members or array elements, etc. This is probably the most common form of semantics applied to JSON and is used by almost all real-world JSON use cases.
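
A sketch of what such a schema-level check might look like when layered on a language binding (the schema and the function name are purely illustrative, not taken from any specification):

    // Hypothetical schema: an object with a numeric "id" and a string "label".
    function conformsToRecordSchema(value) {
      return value !== null &&
             typeof value === "object" &&
             !Array.isArray(value) &&
             typeof value.id === "number" &&
             typeof value.label === "string";
    }
    conformsToRecordSchema(JSON.parse('{"id": 1, "label": "ok"}'));  // true
    conformsToRecordSchema(JSON.parse('[1, "ok"]'));                 // false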

The problem with trying to standardize JSON semantics is that the various semantics that can usefully be imposed upon JSON are often mutually incompatible with each other. At a trivial level, we see this with issues like the size of numbers or duplicate object member keys.  It is very hard to decide whose semantics are acceptable and whose are not.

What we can do is draw a bright line just above the level of static semantics. This is what ECMA-404 attempts to do. It defines a small set of structuring elements that can be recursively composed and represented in a textual encoding. It provides a common vocabulary upon which various semantics can be overlaid, and nothing else.  The intent of ECMA-404 is to provide the definitive specification of the syntax and static semantics of the JSON format that can be used by higher-level semantic specifications.

Allen Wirfs-Brock
ECMA-262 project editor


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock

On Dec 5, 2013, at 11:34 PM, Carsten Bormann wrote:

Allen,

thank you a lot for this elaborate response.  It really helps me understand the different points of view that have emerged.  I’ll go ahead and insert my personal point of view in this response to maybe make that more understandable as well (I’m not speaking for the JSON WG at all here, of course).  Maybe you can relay this to es-discuss so both mailing lists benefit from it.

Did your reply bounce from es-discuss?  I won't elide any of your comments below, just in case.

If you or anybody else knows of actual bugs or ambiguities in ECMA-404, the best way to communicate that to TC39 and the ECMA-404 project editor is to open a ticket at bugs.ecmascript.org.  Product: "ECMA-404 JSON", Component: "1st Edition".


The syntax just tells you which sequences of symbols are part of the language.
(This is what we have ABNF for; ECMA-404 uses racetracks plus some English language that maps the characters to the tokens in the syntax-level racetracks for value, object, and array, and to the English language components of the token-level racetracks for number and string.)

Agreed.  I would state this as:  the syntax tells you which sequences of symbols form valid statements within the language.

Language designers also use the term "static semantics".  The static semantics of a language are a set of rules that further restrict which sequences of symbols form valid statements within the language.

Right, "static semantics" is used to form a subset of what we arbitrarily call "syntax", further restricting which sequences of symbols are in the language.

For example, a rule that the 'member' names must be disjoint within an 'object' production could be a static semantic rule (however, there is intentionally no such rule in ECMA-404).

Thanks, it is interesting to hear that this was a deliberate omission.

The line between syntax and static semantics can be fuzzy.  Static semantic rules are typically used to express rules that cannot technically be expressed using the chosen syntactic formalism, or rules which are simply inconvenient to express using that formalism.  For example, the editor of ECMA-404 chose to simplify the RR-track expression of the JSON syntax by using static semantic rules for whitespace rather than incorporating them into the RR diagrams.

No, that isn't static semantics.  The racetracks don't have a useful meaning (i.e., express a different, more restricted syntax) without the English language rules about whitespace.  (More specifically, three of the racetracks operate on a different domain than the other two, without that having been made explicit.)  Static semantics can only serve to restrict the set of syntactically valid symbol sequences.  Accepting whitespace is on the syntax level.  (Then ignoring it is indeed semantics.)

I think we're quibbling here about unimportant points. Multi-level specification is a common practice for language specifications; for example, using regular expressions to define the lexical productions (the tokens) and a BNF grammar to define the syntactic level. It is also common practice to use prose to describe the role of whitespace at the lexical level.  For example, see http://www.ecma-international.org/ecma-262/5.1/#sec-5.1.2

The important point is whether or not ECMA-404 underspecifies the language, is ambiguous, or has any other errors. If it does, please file bug reports so corrections can be made in a revised edition.


Another form of static semantic rule is an equivalence that states when two or more different sequences of symbols must be considered equivalent.  For example, the rules that state equivalences between escape sequences and individual code points within a JSON 'string'.  Such equivalences are not strictly necessary at this level, but it simplifies the specification of higher-level semantics if equivalent symbol sequences can be normalized at this level of specification.

It may be convenient to lump this under static semantics (the static semantics may need to rely on such rules), but we are now in the area of semantic interpretation, no longer in the area of what should be strictly syntax but has been split into “syntax" and "static semantics" for notational convenience.

I disagree. It is useful at the syntactic/static semantic level to specify that two symbol sequences must be equivalent for semantic purposes.  And we can do this without providing any actual semantics for the symbol sequences.  For example:

We can say that
   "abc"
and
   "\u0061\u0062\u0063"
must be assigned identical semantics without actually specifying what those semantics are. Whether you prefer to call it static semantics or something else, it is independent of any specific semantic domain and is reasonably at the level of concerns addressed by ECMA-404.
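
For instance, in the ECMAScript binding the two parse to identical values (a quick check in any ES5 engine):

    JSON.parse('"abc"') === JSON.parse('"\\u0061\\u0062\\u0063"');  // true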


When we talk about the "semantics" of a language (rather than "static semantics") we are talking about attributing meaning (in some domain and context) to well-formed (as specified via syntax and static semantics) statements expressed in that language.

Exactly.

ECMA-404 intentionally restricts itself to specifying the syntax and static semantics of the JSON language.  More below on why.

If that was the intention, that didn’t work out too well.

specific bugs please...


Semantics is needed to describe e.g. that some whitespace is “insignificant” (not contributing to the semantics), describe the intended interpretation of escape sequences in strings,
Yes these are static semantic rules (although whitespace rules could be expressed using syntactic formalisms).

The syntax allows the whitespace.  The semantics tells you it doesn’t make a difference with respect to the meaning.  (OK, if you lump in semantic equivalence under static semantics, you can say the above, but this muddies the terms.)

that the sequences of symbols enabled by the production “number” are to be interpreted in base 10,
Yes, ECMA-404 includes this as a static semantic statement, although it arguably could be classified as a semantic statement above the level of static semantics.  Whether "77" is semantically interpreted as the mathematical value 63 or 77 isn't really relevant to whether "77" is a well-formed JSON number.

ECMA-404 indeed does not provide the full semantics of its numbers, just saying that they are “represented in base 10”, appealing to a deeply rooted common understanding of what that means (which by the way has been codified in ECMA-63 and then ISO 6093).  Note that there is no meaning of “represented in” outside of the domain of semantics — the text clearly is about mapping the abstract (semantic) concept of a number to its base-10 representation using JSON’s syntax.  It seems that this phrasing is a remnant from a time when the semantics was intended to be part of the specification.

"represented in base 10" probably would be better stated as "represented as a sequence of decimal digits" which would eliminate the semantic implication. 

Yes, there are remnants in ECMA-404 (and in RFC 4627bis) from the days when the JSON format and its language binding to ECMAScript tended to be equated. One of the things we should be trying to do is eliminate those remnants.

or that “the order of the values is significant” in arrays (which seems to be intended to contrast them to JSON objects, where ECMA-404 weasels out of saying whether the order is significant).

ECMA-404 removed the statement "an object is an unordered collection..." that exists in RFC 4627.

Indeed, it is again interesting to note that this was an intentional change from the existing JSON specifications.

Arguably, ECMA-404 should not have made the statement "the order of the values is significant" WRT arrays.  I'll file a bug ticket on that.  The reason that neither of these statements is appropriate at this level of specification is that they are pure semantic statements that have no impact upon determining whether a sequence of symbols is a well-formed JSON text.

Well, in your definition of static semantics that includes semantic equivalence, the statement is appropriate.  It is, however, somewhat random whether ECMA-404 provides statements about semantic equivalence or not; it is certainly not trying for any completeness.

More specifics, please.  I don't see how semantic equivalence enters into this discussion of arrays. What equivalences come into play?  As I said above, I think the existence of the phrase "the order of the values is significant" is a bug.  "Significant" to what?  Certainly the intent wasn't to forbid a schema-level semantics from considering [1,2] and [2,1] as being equivalent in some particular field position.


Objectively, the members of a JSON 'object' do occur in a specific order and a semantic interpreter of an object might ascribe meaning to that ordering.  Similarly, a JSON 'array' also has an objectively observable ordering of its contained values. It is again up to a semantic interpreter as to whether or not it ascribes meaning to that ordering.

It is also up to a semantic interpreter as to whether it interprets base-10 numbers from left to right or from right to left.  However, I would argue that some of the potential interpretations are violating the principle of least surprise.  More so, JSON in the real world benefits from a significant amount of common interpretation.

Agreed.  I believe we have this today, at this level.

A reasonable way to capture this at the specification level is to define a generic “JSON data model” and define the semantic processing that leads up to this, but then of course leave it up to the application how to interpret the elements of the JSON data model.  A JSON array would be delivered as an ordered sequence to the (application-independent) JSON data model, but the application could still interpret that information as a set or as a record structure, depending on application context.

What do you mean by "delivered" in your second sentence?  It sounds like you are either talking about a language binding or perhaps a JSON parser interface. The former is clearly in the realm that I classify as semantics, and I would expect any reasonable parser-based interface to preserve all ordering relationships that exist in the parsed text.

As another example, the JSON to ECMAScript language binding defined by ECMA-262 implicitly defines an ordering of the properties of the ECMAScript objects that are created corresponding to JSON objects, even though RFC 4627 said that an object is an unordered collection of name/value pairs.  It just falls out of the ECMAScript data model.

We could try to say that all semantics applied to the JSON format MUST preserve the ordering of JSON array elements. But it seems unnecessary and in some cases excessively restrictive.

Defining a complete and universal "JSON data model" is hard.  It is possible to define a normative JSON syntax without providing such a model, and that is the direction ECMA-404 has taken. If somebody wants to attempt to define such a data model, they are welcome to write a spec layered above ECMA-404 and to demonstrate its utility.

In practice, JSON is almost useless without schema level semantic agreement between the producer and consumer of a JSON text. Most of the issues we are discussing here are easily subsumed by such schema level agreements.


ECMA-404 does quite a bit of the latter, so indeed I also have trouble interpreting such a statement.

So back to "semantics" and why ECMA-404 tries (perhaps imperfectly) to avoid describing JSON beyond the level of static semantics.

ECMA-404 sees JSON as "a text format that facilitates structured data interchange between all programming languages. JSON is syntax of braces, brackets, colons, and commas that is useful in many contexts, profiles, and applications".

I hope by now it should be clear that ECMA-404 is neither very successful in focusing on the syntax only, nor is it a particularly good specification of that syntax due to its mix of English language and racetrack graphics.  (I still like it as a tutorial for the syntax.)

No, it isn't clear.  Specific bugs would help clarify. I didn't choose to use racetracks in the specification and I might not have made that choice myself. But I will defend it as a valid formalism and one that is well understood.  A grammar expressed using ovals and arrows is just as valid as one expressed using ASCII characters. 

It's silly to be squabbling over such notational issues, and counter-productive if such squabbles result in multiple different normative standards for the same language/format.

TC39 would likely be receptive to a request to add to ECMA-404 an informative annex with a BNF grammar for JSON (even ABNF, even though it isn't TC39's normal BNF conventions). Asking is likely to produce better results than throwing stones.


There are many possible semantics and categories of semantics that can be applied to well-formed statements expressed using the JSON syntax.

The problem with this approach is that much of the interoperability of JSON stems from implementations having derived a common data model.  Some of this is in the spec (RFC 4627), some of it has been derived by implementers drawing analogies with its ancestor JavaScript, and some of it stems from the fact that the syntax is simply suggestive of a specific data model.

Much more would be gained in documenting that (common) data model (including documenting the differences that have ensued, and selecting some deviations as canonical and others as mistakes) than from retracting some of the explicit semantics (while keeping some of them as well as the implicit ones weakly in place).

This is where I disagree.  Do you have any examples of interoperability problems occurring at this level? As I said above, successful JSON interoperability is most dependent upon schema-level semantic agreement and good language bindings. In practice, those levels can easily encompass the sort of data model issues you seem to be concerned about.

However, I don't think TC39 wants to put any barriers in front of somebody trying to specify such a data model or models.  We tried to avoid such barriers by not including unnecessary semantic restrictions in ECMA-404.


One type of semantics is a language binding that specifies how a JSON text might be translated into the data types and structures of some particular programming language or runtime environment. The translation of a JavaScript string encoding of a JSON text into JavaScript objects and values by JSON.parse is one specific example of this kind of semantic application of JSON.  But there are many languages that can be supported by such language bindings, and there is not necessarily a best or canonical JSON binding for any language.

Leaving out the common data model and going directly from syntax to language binding is a recipe for creating interoperability problems.  The approach “works” from the view of a single specific language (and thus may seem palatable for a group of experts in a specific language, such as TC39), but it is not aiding in interoperability of JSON at large.

Examples? The language expertise within TC39 certainly extends beyond just ECMAScript and that expertise informs the consensus decisions we make.

A counter-example we have actually discussed is any limitation on the number of digits in a JSON number.  While some applications of JSON might want to limit the precision, others have a need for arbitrarily large digit sequences.  Such restrictions and allowances must be dealt with at the schema specification level, so there is no need to arbitrarily restrict precision at the format/language level of specification.
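
For example, the ECMAScript binding happens to map JSON numbers onto IEEE-754 doubles, so very long digit sequences lose precision there, while a binding to a language with arbitrary-precision integers need not lose anything; which behavior is acceptable is a binding or schema decision, not a format one:

    // ECMAScript's binding uses IEEE-754 doubles, so precision is finite:
    JSON.parse("9007199254740993") === 9007199254740992;  // true (rounded)
    // The text is nevertheless a well-formed JSON number; a binding with
    // big integers could preserve every digit.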

That's it for now. I think I've already addressed any substantive issues you raise below.

Happy to continue the conversation. 

Allen


Another form of semantics imposes schema-based meaning and restrictions upon a well-formed JSON text.  A schema explicitly defines an application-level meaning for the elements of some specific subset of well-formed JSON texts. It might require only certain forms of JSON values, provide specific meaning to JSON numbers or strings that occur in specified positions, require the occurrence of certain object members, apply meaning to the ordering of object members or array elements, etc. This is probably the most common form of semantics applied to JSON and is used by almost all real-world JSON use cases.

Again, leaving out the common data model and leaping from the syntax to a specific application semantics negates all the real-world advantages of having a common data interchange format.

The problem with trying to standardize JSON semantics is that the various semantics that can usefully be imposed upon JSON are often mutually incompatible with each other. At a trivial level, we see this with issues like the size of numbers or duplicate object member keys.  It is very hard to decide whose semantics are acceptable and whose are not.

Completely agree.  The next step in the evolution of JSON should have been to actually do this hard work based on the experience we have after a decade of usage, instead of punting on it.

What we can do is draw a bright line just above the level of static semantics. This is what ECMA-404 attempts to do. It defines a small set of structuring elements that can be recursively composed and represented in a textual encoding. It provides a common vocabulary upon which various semantics can be overlaid, and nothing else.  The intent of ECMA-404 is to provide the definitive specification of the syntax and static semantics of the JSON format that can be used by higher-level semantic specifications.

It might have been a good idea to do just that as a first step, but rushing out ECMA-404 with little feedback from the wider community has apparently compromised the quality of the result.  As it stands, the seven-year old RFC 4627 continues to fulfill this very objective in a better way.

Grüße, Carsten


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock

On Dec 6, 2013, at 12:56 PM, Nico Williams wrote:

> On Fri, Dec 06, 2013 at 11:50:13AM -0800, Allen Wirfs-Brock wrote:
>> In practice, JSON is almost useless without schema level semantic
>> agreement between the producer and consumer of a JSON text. Most of
>
> Yes.
>
>> the issues we are discussing here are easily subsumed by such schema
>> level agreements.
>
> Hmmm, well, there has to be some meta-schema.
>
> That arrays preserve order is meta-schema for JSON, else we'd have no
> interop -- and this is critical for comparisons, so specifying this bit
> of meta-schema/ semantics enables very important semantics: arrays can
> be compared for equivalence without having to sort them (which would
> require further specification of collation for all JSON values!).

What "array" are you talking about?  The 'array' symbol sequence in a JSON text? A language-specific array-like data structure generated from such a symbol sequence by the parser for a specific JSON language binding?  A domain data structure generated by a schema-aware parser?

Why shouldn't a schema be allowed to consider the following to be semantically equivalent:
      {"unordered-list": [0,1]}
and
      {"unordered-list": [1,0]}

Besides, we already agreed above that if you don't have schema-level agreement then JSON is almost useless.  So why not just let schema specifications or schema language specifications handle this?
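
For illustration, a schema-aware consumer could compare that member as a set; the schema element name and the helper below are made up for this example:

    // Hypothetical schema rule: compare "unordered-list" as a set of numbers.
    function sameUnorderedList(a, b) {
      var x = a["unordered-list"].slice().sort();
      var y = b["unordered-list"].slice().sort();
      return x.length === y.length &&
             x.every(function (v, i) { return v === y[i]; });
    }
    sameUnorderedList(JSON.parse('{"unordered-list": [0,1]}'),
                      JSON.parse('{"unordered-list": [1,0]}'));  // true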

>
> That whitespace (outside strings) is not significant may be expressed
> syntactically or semantically, but this has to be universally agreed if
> we'll have any chance of interoperating.

ECMA-404 states where insignificant whitespace is allowed. Is there any disagreement about this?
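
As a quick illustration of what that allowance means in practice, whitespace around the structural tokens is accepted and then ignored by a conforming parser; this should hold in any ES5 engine:

    JSON.parse(' { "a" : [ 1 , true , null ] } ').a[1];  // true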

>
> That object names (keys) are not dups is trickier: for on-line
> processing there may be no way to detect dups or do anything about them,
> but for many common implementations object name uniqueness is very much
> a requirement.  So here, finally, we have a bit of semantics that could
> be in the spec but could be left out (we spent a lot of time on the
> current consensus for RFC4627bis, and I think it's safe to say that
> we're all happy with it)

It should be left out, both because of legacy precedent and because it can be dealt with by a language binding or schema semantics specification.

But I think we are already in agreement on leaving this out at the static semantic level.

>
> That objects name order is irrelevant and non-deterministic is widely
> assumed/accepted (though often users ask that JSON filters preserve
> object name order from input to output).  (And, of course, for on-line
> encoders and parsers name order could well be made significant, but
> building schemas that demand ordered names means vastly limiting the
> world of JSON tooling that can be used to interoperably implement such
> schemas.)

Object name ordering is significant to widely used JSON language bindings (e.g., the ECMA-262 JSON parser).  But again, this is a semantic issue.

Because ECMA-404 is trying to restrict itself to describing the space of well-formed JSON texts, there really is nothing to say about object name ordering at that level. It's a semantic issue.

>
> Similarly for numbers, the *interoperable* number ranges and precisions
> are not really syntactic (they could be expressed via syntax, but it'd
> be a pain to do it that way).
>
> I think it's clear that we have consensus in the IETF JSON WG for:
>
> - whitespace semantics (not significant outside strings)
This is a syntactic issue that is covered by ECMA-404.
> - array element order semantics (elements are "ordered")
> - object name dups/order semantics (names SHOULD be unique, but interop
>   considerations described; name/value pairs are "unordered")
> - no real constraints on numeric values but interoperable
>   range/precision described
The rest are semantic issues that ECMA-404 does not want to address.  The one place it arguably oversteps, by saying for arrays that "the order of the values is significant", really has no associated semantics. This is one place where I prefer the current draft language in RFC 4627bis clause 5 over the corresponding language in ECMA-404. The Introduction to 4627bis (is the intro normative?) says "an array is an ordered sequence" and "an object is an unordered collection", but I don't see any actual contextual meaning given to either "ordered" or "unordered" within the document.

>
> If ECMA-404 differs in any way that does not impose more/different
> semantics, then maybe we don't care as far as RFC4627bis goes. If
> ECMA-404 does impose more/different semantics then we'll care a great
> deal.
If it does, that's unintended and a correctable bug in ECMA-404.
>  Since ECMA-404 targets just the syntax and minimal semantics,
> it's probably just fine for RFC4627bis to reference ECMA-404, but since
> RFC4627bis would be specifying a bit more semantics, we'd probably not
> want to make that reference be normative, at least not with some text
> explaining that it's normative only because we believe that the JSON
> syntax given in both docs are equivalent.

The semantics you want to specify can be layered upon a normative reference to ECMA-404. Rather than having competing and potentially divergent specifications, we should be looking at a clean separation of concerns.

The position stated by TC39 is that ECMA-404 already exists as a normative specification of the JSON syntax, and we have requested that RFC4627bis normatively reference it as such and that any restatement of ECMA-404 subject matter be marked as informative.  We think that dueling normative specifications would be a bad thing. Seeing that the form of expression used by ECMA-404 seems to be an issue for some JSON WG participants, I have suggested that TC39 could probably be convinced to revise ECMA-404 to include a BNF-style formalism for the syntax.  If there is interest in this alternative, I'd be happy to champion it within TC39.

Allen

Re: [Json] Response to Statement from W3C TAG

Carsten Bormann
On 07 Dec 2013, at 12:55, Nico Williams <[hidden email]> wrote:

> And we all now seem to agree
> that the ABNF in draft-ietf-json-rfc4627bis-08 is equivalent to the
> syntax in ECMA-404.

Yes, we like to believe that.
The thing that worries me is that nobody knows whether that is actually true.

(At least I’d hope someone who is comfortable with the description methods in ECMA-404 makes a serious pass at establishing this equivalence, even when it’s ultimately not possible to actually prove it.  That someone will not be me.)

Grüße, Carsten


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Dec 7, 2013, at 3:55 AM, Nico Williams wrote:

> On Fri, Dec 06, 2013 at 03:00:31PM -0800, Allen Wirfs-Brock wrote:
>> On Dec 6, 2013, at 12:56 PM, Nico Williams wrote:
>>> ...
>
>> Why shouldn't a schema be allowed to consider the following to be semantically equivalent:
>>      {"unordered-list": [0,1]}
>> and
>>      {"unordered-list": [1,0]}
>
> A *schema* is so allowed.
>
> However, if a schema is also to be allowed to treat them as distinct
> then the *meta-schema* must treat them as distinct.  I.e., no matter
> what generic programming language bindings of JSON one uses, the above
> two JSON texts must produce equivalent results when parsed!

"Equivalent" according to what definition?

The most basic form of parsing translator, beyond a simple recognizer that reports valid/invalid, is a translator that produces a parse tree. So let's assume that we create such a parse tree generator using the 4627bis grammar.  The parse trees for the two JSON arrays shown above will be different.  As you correctly state, if they weren't, then any downstream semantics could not apply different meanings to them.  So, in what sense are you saying that the results of parsing (in this case the parse trees) must be equivalent?
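
As a concrete illustration, even a generic binding with no schema in sight yields observably different results for the two texts; in an ES5 engine:

    JSON.parse('{"unordered-list": [0,1]}')["unordered-list"][0];  // 0
    JSON.parse('{"unordered-list": [1,0]}')["unordered-list"][0];  // 1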

>
> The application is clearly free to then re-order those arrays' elements,
> or to compare them as equivalent.  The application cannot consider them
> not equivalent if the parsers/encoders don't either.

Similarly, the JSON texts:
    {"1":  1, "2": 2}
and
   {"2":  2, "1": 1}

or the JSON texts:
   {"a": 1, "a": 2}
and
   {"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the parser in order for downstream semantics to be applied.  And, in the real world, this ordering can be quite significant.  For example, for both of these cases, the standard JSON to JavaScript language binding produces observably different results.
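
For instance, in that binding (any ES5 engine should reproduce this):

    JSON.parse('{"a": 1, "a": 2}').a;  // 2 -- the last occurrence wins
    JSON.parse('{"a": 2, "a": 1}').a;  // 1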

I think that if we cut through the rhetoric we are probably in agreement.   Within a JSON text, there is a clearly observable ordering to both the values that are elements of a JSON array and to the members of a JSON object.   Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

If we agree to that, then at the level of defining JSON syntax it seems that an assertion that JSON arrays are ordered is redundant (the grammar already tells us that) and an assertion that the members of a JSON object are unordered is incorrect.

Where we seem to disagree is on whether or where any such ordering requirements might be imposed. My contention is that they don't belong in a syntactic-level specification such as ECMA-404, but do belong in downstream specifications for data models, language bindings, or application-level schemas.

>
>> Besides, we already agreed above that if you don't have schema-level
>> agreement then JSON is almost useless.  So why not just let schema
>> specifications or schema language specifications handle this.
>
> Because generic filters/tools/apps exist that would be non-conformant if
> they have any expectation about array order preservation in *parsers*
> and *encoders* of related tooling.
>
> I.e., I very much expect these jq filters to repeat their inputs as-is
> without re-ordering any arrays in them (but they may definitely change
> things like whitespace and it may re-order names in objects):

No, this is where we diverge.  Ordering of names within objects can and does, in the real world, have significance. A generic tool that changes member order within a JSON object will break things.

>
>    jq .
>    jq '[.[]]'
>
> I also expect all of the C JSON libraries I know and have written code
> to (I think that's four C JSON libraries) to preserve array order when
> parsing / encoding JSON texts.  It'd be extremely strange to not be able
> to implement a JSON-using application that cares about array order!

and similarly for JSON-using applications that care about object member order

>
>>> That whitespace (outside strings) is not significant may be expressed
>>> syntactically or semantically, but this has to be universally agreed if
>>> we'll have any chance of interoperating.
>>
>> ECMA-404 states where insignificant whitespace is allowed. Is there
>> any disagreement about this?
>
> No.  I was listing some cases where there can be significant differences
> in the "syntax only" vs. "syntax and [some] semantics" approaches.
>
> If ECMA TC39 were to insist that arrays in JSON texts do not denote
> order, that parsers may re-order array elements at will, say, then I
> suspect I'd bet this WG would just... note that difference and move on.
> There's no chance, I think, that the IETF would accept such a departure
> from RFC4627 (which says that an array "is an ordered sequence of zero
> or more values").  The proposal that the original RFC title be restored
> is much less controversial than the idea that JSON arrays are not
> ordered.

Hopefully, it is now clear that this is not what I'm arguing for.  Any statement about array ordering is redundant because the grammar already covers that. The only harm is in somebody misconstruing it to be a requirement about downstream semantics.  However, the same is true about object members.  Your assertion that a generic filter is free to reorder members is a good example of how a statement about ordering, at this level of specification, can be misconstrued.

... [snipping back and forth that I think is already addressed above]
>>
>> Object name ordering is significant to widely used JSON language
>> bindings (eg, the ECMA-262 JSON parser).  But again this is a semantic
>> issue.
>
> But there's no general requirement that object name order be preserved.
> Or at least I don't see you asserting that there is.  (But if you were,
> you'd care a lot about this semantic issue, and you'd want that bit of
> semantics specified somewhere, surely.)

It's hopefully clear by now that, yes, I am asserting that object name order is important.

And I do care about the semantic issues.  They just don't belong in a syntactic-level specification of the JSON format such as ECMA-404. A problem I see with RFC4627bis is that it conflates a syntactic-level specification with just a little bit of a semantic data model. It is neither a pure syntactic specification nor a complete data model.

> That specific programming language bindings/APIs/implementations make
> object name significant (or preserve it) does not impose a requirement
> to preserve object name order on other implementations that don't do so
> today.  A great many implementations use hash tables to represent
> objects internally, and they lose any other object name ordering.
>
>> Because ECMA-404 is trying to restrict itself to describe the space of
>> well-formed JSON text there really is nothing to say about object name
>> ordering at that level. It's a semantic issue.
>
> Of course.  And RFC4627 does deal with semantics.  It is appropriate for
> RFC4627bis to do so as well.  Even if we agreed to drop all RFC2119
> language as to semantics we'd still keep interoperability notes about
> the old (and widely-deployed) semantics.
>
>> The semantics you want to specify can be layered upon a normative
>> reference to ECMA-404. Rather have competing and potentially
>> divergence specifications we should be looking a clean separation of
>> concerns.
>
> We already have a clean separation in RFC4627bis: there's the ABNF
> (syntax) and everything else (semantics).  And we all now seem to agree
> that the ABNF in draft-ietf-json-rfc4627bis-08 is equivalent to the
> syntax in ECMA-404.  If the title of RFC4627 is restored then what ECMA
> concerns remain?

Multiple normative definitions of the same material.  Whether they are equivalent is a matter of interpretation and opinion that can lead to confusion and possibly divergence over time. A solution to this was requested in the TC39 feedback.  RFC4627bis should normatively reference ECMA-404 WRT the syntax and static semantics of JSON. If it chooses to also restate the ECMA-404 grammar in a different notation (i.e., ABNF), that material should be designated as informative, with ECMA-404 serving as the normative specification of that material.

>
>> The position stated by TC39 that ECMA-404 already exists as a
>> normative specification of the JSON syntax and we have requested that
>> RFC4627bis normatively reference it as such and that any restatement
>> of ECMA-404 subject matter should be marked as informative.  We think
>> that dueling normative specifications would be a bad thing. Seeing
>> that the form of expression used by ECMA-404 seems to be a issue for
>> some JSON WG participants I have suggested that TC39 could probably be
>> convinced to revise ECMA-404 to include a BNF style formalism for the
>> syntax.  If there is interest in this alternative I'd be happy to
>> champion it within TC39.
>
> Is there an assertion that ECMA-404 and draft-ietf-json-rfc4627bis-08
> disagree as to syntax?  I don't think so.  There's a concern that they
> might, and the easiest way to resolve that concern is to use the same
> syntax specification in both cases.  It would help a lot if TC39 were to
> publish an ABNF syntax for JSON texts, but even without that it's pretty
> clear that the two documents do not disagree as to syntax.

Then I think we should be close to agreement.  Does the JSON WG wish to formally request that TC39 add an ABNF specification to a new edition of ECMA-404?  Would RFC4627bis then normatively reference ECMA-404?

Allen

Re: [Json] Response to Statement from W3C TAG

Carsten Bormann
On 07 Dec 2013, at 19:05, Allen Wirfs-Brock <[hidden email]> wrote:

> Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.

That would be a major breaking change.  The JSON WG is chartered not to do those.

If the purpose of removing semantics from the specification is to create a derivative of JSON where this matters, I can finally have my binary data in JSON.  You see, I have proposed for a while that any string that is immediately preceded by two newlines is interpreted as a base64url representation of a binary string instead of a text string.  Problem solved.

If this usage of whitespace seems somehow revolting, maybe you get an idea of how unacceptable reducing the definition of JSON to its syntax is.  Interoperability requires more than common syntax.

In JSON, objects are unordered collections or sets of name/value pairs.  It says so right there on json.org (“sets”), and it says so in RFC 4627 (“collections”)*).  We may not like it, but it has been a promise for a decade.  We need to heed it.  (Another promise was that JSON doesn’t change**).)

Data interchange formats where this is not the case may be using the JSON syntax, but aren’t JSON.

Grüße, Carsten

*) (The difference is unfortunate, but a fact that we need to deal with.)

**) Which can’t be strictly true, as JSON is as much defined by the collection of its implementations as by its specification.  But that’s just limiting the extent of the promise, not giving us a free get out of jail card.


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock

On Dec 7, 2013, at 12:30 PM, Carsten Bormann wrote:

> On 07 Dec 2013, at 19:05, Allen Wirfs-Brock <[hidden email]> wrote:
>
>> Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.
>
> That would be a major breaking change.  The JSON WG is chartered not to do those.

It is also a major breaking change if downstream semantics can't depend upon the ordering of object members.  In particular, it means that the standard built-in ECMAScript JSON parsers as well as the classic JavaScript eval-based processing will be non-conforming.  The latter is particularly puzzling as that was the original basis upon which JSON was defined.

In fact, the only place that either the current RFC 4627bis draft or the original RFC 4627 says anything about object name/value pairs being "unordered" is their introductions.  The 4627bis language appears to have been directly copied from the original RFC.  It isn't clear whether or not the introduction to 4627bis is intended to be normative.  If it is, then I note that it also says (in both the new and old documents) that JSON's design goals were for it to be "a subset of JavaScript".  The syntactic elements of JavaScript that correspond to the JSON object syntax do have a specified ordering semantics.

When we prepared ECMA-404 we concluded that characterizing JSON objects as unordered was a mistake in the original RFC.  The original author did not object to this interpretation.

>
> If the purpose of removing semantics from the specification is to create a derivative of JSON where this matters,

No, the purpose is to ensure that the specification remains compatible with the most widely deployed JSON parsers; specifically, the ECMA-262 conforming parsers that are implemented by the JavaScript engines in all major browsers.

> I can finally have my binary data in JSON.  You see, I have proposed for a while that any string that is immediately preceded by two newlines is interpreted as a base64url representation of a binary string instead of a text string.  Problem solved.
>
> If this usage of whitespace seems somehow revolting, maybe you get an idea of how unacceptable reducing the definition of JSON to its syntax is.  Interoperability requires more than common syntax.
>
> In JSON, objects are unordered collections or sets of name/value pairs.  It says so right there on json.org (“sets”), and it says so in RFC 4627 (“collections”)*).  We may not like it, but it has been a promise for a decade.  We need to heed it.  (Another promise was that JSON doesn’t change**).)

You also need to look at objective reality and consider the possibility that the informal (and non-normative) text on both the json.org website and in the original RFC never actually matched reality.

JSON is derived from JavaScript (whose standard is ECMA-262), and since 2009 ECMA-262 (and its clone ISO/IEC 16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.

>
> Data interchange formats where this is not the case may be using the JSON syntax, but aren’t JSON.

I disagree with this conclusion, but I think you are approaching an important point of possible agreement.  The JSON syntax is used in many ways and for many purposes, and it is worthy of independent standardization.  That is what ECMA-404 does.  The JSON WG is certainly free (actually, encouraged) to issue a normative standard that addresses interchange requirements for the MIME type application/json.  But that should be viewed only as a spec for application/json interchange, not as the one and only JSON specification.

Allen

Re: [Json] Response to Statement from W3C TAG

Bjoern Hoehrmann
In reply to this post by Allen Wirfs-Brock
* Allen Wirfs-Brock wrote:
>On Dec 7, 2013, at 3:55 AM, Nico Williams wrote:
>> However, if a schema is also to be allowed to treat them as distinct
>> then the *meta-schema* must treat them as distinct.  I.e., no matter
>> what generic programming language bindings of JSON one users, the above
>> two JSON texts must produce equivalent results when parsed!
>
>"Equivalent" according to what definition?

I suspect intended was "must not produce".

>And I do care about the semantic issues.  They just don't belong in a
>syntactic level specification of the JSON format such as ECMA-404. A
>problem I see with the RFC4627bis is that it conflates a syntactic level
>specification with a just little bit of semantic data model. It is
>neither a pure syntactic specification nor a complete data model.

  JSON_texts = {     x | x is a JSON text }

  JSON_diffs = { (a,b) | a and b are elements of JSON_texts and
                         a is significantly different from b }

A pure specification in your sense above defines only membership in the
`JSON_texts` set. ECMA-404 is not pure in this sense because it defines
that e.g. `("[]", "[ ]")` is not a member of `JSON_diffs`.

ECMA-404 does not define that

  ('{"x":1,"y":2}', '{"y":2,"x":1}')

is not a member of `JSON_diffs`. Right? It says the white space in the
example is insignificant, but it does not say order of key-value-pairs
in objects is insignificant. Carsten Bormann gave other examples like
ECMA-404's definition of equivalent escape sequences.

Readers of ECMA-404 might assume that it gives a complete description
of what people developing and operating JSON-based systems agree are
significant differences. They might build systems that rely on the order
of key-value-pairs in objects because of this, for instance

  http://wiki.apache.org/solr/UpdateJSON#Solr_3.1_Example

Systems like ECMAScript's `JSON.stringify` API cannot ordinarily create
such JSON texts and would be unable to interact with such a system. That
is something the IETF JSON Working Group wishes to avoid; accordingly,
they provide a more complete definition of the `JSON_diffs` equivalence
relation that better reflects rough consensus and running code of the
JSON community.
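
For example, a round trip through the standard ECMAScript APIs silently collapses the duplicate name (illustrative of the kind of text such a system depends on):

    JSON.stringify(JSON.parse('{"add": {"id": 1}, "add": {"id": 2}}'));
    // '{"add":{"id":2}}' -- the repeated "add" member cannot survive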

I believe the combination of impurity and incompleteness in ECMA-404 is
harmful to the JSON community.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Re: [Json] Response to Statement from W3C TAG

Carsten Bormann
In reply to this post by Allen Wirfs-Brock

On 08 Dec 2013, at 00:05, Allen Wirfs-Brock <[hidden email]> wrote:

>
> On Dec 7, 2013, at 12:30 PM, Carsten Bormann wrote:
>
>> On 07 Dec 2013, at 19:05, Allen Wirfs-Brock <[hidden email]> wrote:
>>
>>> Parsing must preserve this order, in both cases, because downstream semantics may be dependent upon the orderings.
>>
>> That would be a major breaking change.  The JSON WG is chartered not to do those.
>
> It is also a major breaking change if downstream semantics can't depend upon the ordering of object members.  

Wait a minute.  It can’t be a breaking change because it is not a change.

JSON parsers are free to implement extensions (section 4), so none of the JavaScript extensions make them non-conforming JSON parsers.
Many JSON parsers won’t implement these extensions, and many JSON generators won’t be able to generate them, so arguing they have become part of JSON because they are in one parser doesn’t quite work.

> When we prepared ECMA-404 we concluded that characterizing JSON objects as unordered was a mistake in the original RFC.  

Silently making this breaking change is a nice illustration for the process issues that might make some of us a bit reluctant to use ECMA-404 as a normative reference, even if it were turned into a technically superior spec.

> The original author did not object to this interpretation.

It is, however, still on json.org, so we seem to have a bit of a communication problem here.

> JSON is derived from JavaScript

“Was originally derived” would be closer; after JavaScript changed, JSON is not even a subset of JavaScript any more.
And that historical ancestry doesn’t make JavaScript specifications the specification for JSON.

> (whose standard is ECMA-262) and since 2009, ECMA-262 (and its clone ISO/IEC-16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.

As it is free to do; that doesn’t change JSON the data interchange format though.

Grüße, Carsten


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Dec 7, 2013, at 4:55 PM, John Cowan wrote:

Allen Wirfs-Brock scripsit:

Similarly, the JSON texts:
  {"1":  1, "2": 2}
and
  {"2":  2, "1": 1}

or the JSON texts:
  {"a": 1, "a": 2}
and
  {"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the
parser in order for downstream semantics to be applied.

I cannot accept this statement without proof.  Where in the ECMAScript
definition does it say this?

First, the console output from an experiment run in the developer console of Firefox 27:

17:22:24.873 var jsonText1 = '{"a": 1, "a": 2}';
17:22:25.244 undefined                                     <----- ignore these, they are a console artifact
17:22:50.107 console.log(jsonText1);
17:22:50.124 undefined
17:22:50.125 "{"a": 1, "a": 2}"    <----- note that the console doesn't properly escape embedded quotes
17:23:45.060 var jsonText2 = '{"a": 2, "a": 1}';
17:23:45.062 undefined
17:24:18.594 console.log(jsonText2);
17:24:18.649 undefined
17:24:18.649 "{"a": 2, "a": 1}"
17:25:31.540 var parsedText1 = JSON.parse(jsonText1);
17:25:31.577 undefined
17:26:36.429 console.log(parsedText1.a)
17:26:36.568 undefined
17:26:36.569 2
17:27:13.754 var parsedText2 = JSON.parse(jsonText2);
17:27:13.882 undefined
17:27:37.533 console.log(parsedText2.a)
17:27:37.660 undefined
17:27:37.661 1

Note that the value of the 'a' property on the JavaScript object produced by JSON.parse is either 1 or 2 depending upon the ordering of the member definitions with duplicate names.   I'll leave it to you to try using your favorite browser.  However, I'm confident that you will see the same result as this is what ECMA-262, 5th Edition requires.  I happen to be fairly familiar with that document, so I can explain how that is:

1) JSON.parse is specified by the algorithms in section 15.12.2 http://www.ecma-international.org/ecma-262/5.1/#sec-15.12.2 starting with the first algorithm in that section.
2) Step 2 of that algorithm requires validating the input string against the JSON grammar provided in http://www.ecma-international.org/ecma-262/5.1/#sec-15.12.1
3) If the input text cannot be recognized by a parser for that grammar, an exception must be thrown at that point.
4) If the input text is recognized by the parser, then step 3 says to parse and evaluate the input text as if it were ECMAScript source code.  The result of that evaluation is what is normally returned from the function.  The ECMAScript parsing and evaluation rules can be used in this manner because a well-formed JSON text (as verified in step 2) is a subset of an ECMAScript PrimaryExpression http://www.ecma-international.org/ecma-262/5.1/#sec-11.1 .
5) The text of a JSON object definition will be parsed and evaluated as if it were an ECMAScript ObjectLiteral as specified at http://www.ecma-international.org/ecma-262/5.1/#sec-11.1.5 The evaluation semantics are specified by the algorithms that follow the BNF in that section.
6) Note that the body of an ObjectLiteral is described by the PropertyNameAndValueList production, which produces a comma-separated list of PropertyAssignment productions.
7) The PropertyAssignments of a PropertyNameAndValueList are evaluated in left-to-right order, as specified by the 4th algorithm in that section.
8) As each PropertyAssignment is evaluated, it performs a [[DefineOwnProperty]] operation upon the result object using the property name and value provided by the PropertyAssignment.
9) [[DefineOwnProperty]] is defined in http://www.ecma-international.org/ecma-262/5.1/#sec-8.12.9 . It is a fairly complex operation, but the short story is that if a property of that name does not already exist, one is created and assigned the associated value. If a property of that name does already exist, the existing value is overwritten with the current value.

In other words, ECMA-262 explicitly specifies that when multiple occurrences of the same member name occur in a JSON object, the value associated with the last (right-most) occurrence is used. Order matters.

A similar analysis applies to the first example.  
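
For anyone who wants to see the effect without walking through the spec, here is a minimal sketch (assuming any ES5-conformant engine, e.g. a current browser console) of the last-occurrence-wins behavior that steps 8 and 9 produce:

    // Duplicate member names: the right-most occurrence wins under ES5 15.12.2.
    var dup1 = JSON.parse('{"a": 1, "a": 2}');
    console.log(dup1.a);   // 2

    var dup2 = JSON.parse('{"a": 2, "a": 1}');
    console.log(dup2.a);   // 1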

Allen

Re: [Json] Response to Statement from W3C TAG

Bjoern Hoehrmann
* Allen Wirfs-Brock wrote:

>On Dec 7, 2013, at 4:55 PM, John Cowan wrote:
>> Allen Wirfs-Brock scripsit:
>>
>>> Similarly, the JSON texts:
>>>   {"1":  1, "2", 2}
>>> and
>>>   {"2":  2, "1": 1}
>>>
>>> or the JSON texts:
>>>   {"a": 1, "a": 2}
>>> and
>>>   {"a": 2, "a": 1}
>>>
>>> have an ordering of the object members that must be preserved by the
>>> parser in order for downstream semantics to be applied.
>>
>> I cannot accept this statement without proof.  Where in the ECMAscript
>> definition does it say this?

>In other words, ECMA-262 explicitly specifies that when multiple
>occurrences of the same member name occur in a JSON object, the value
>associated with the last (right-most) occurrence is used. Order matters.
>
>A similar analysis applies to the first example.  

Your analysis does not demonstrate that `JSON.parse` preserves ordering.
I am confident that even in the current ES6 draft `JSON.stringify` does
not preserve ordering even if `JSON.parse` somehow did. It's based on
`Object.keys` which does not define ordering as currently proposed. If
you can re-create the key-value-pair order in your first example from
the output of `JSON.parse` without depending on implementation-defined
behavior, seeing the code for that would be most instructive.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock

On Dec 7, 2013, at 6:39 PM, Bjoern Hoehrmann wrote:

* Allen Wirfs-Brock wrote:
On Dec 7, 2013, at 4:55 PM, John Cowan wrote:
Allen Wirfs-Brock scripsit:

Similarly, the JSON texts:
 {"1":  1, "2", 2}
and
 {"2":  2, "1": 1}

or the JSON texts:
 {"a": 1, "a": 2}
and
 {"a": 2, "a": 1}

have an ordering of the object members that must be preserved by the
parser in order for downstream semantics to be applied.

I cannot accept this statement without proof.  Where in the ECMAscript
definition does it say this?

In other words, ECMA-262 explicitly specifies that when multiple
occurrences of the same member name occur in a JSON object, the value
associated with the last (right-most) occurrence is used. Order matters.

A similar analysis applies to the first example.  

Your analysis does not demonstrate that `JSON.parse` preserves ordering.
I am confident that even in the current ES6 draft `JSON.stringify` does
not preserve ordering even if `JSON.parse` somehow did. It's based on
`Object.keys` which does not define ordering as currently proposed. If
you can re-create the key-value-pair order in your first example from
the output of `JSON.parse` without depending on implementation-defined
behavior, seeing the code for that would be most instructive.

You are correct that ES5 does not define the for-in enumeration order.  But it does say that the Object.keys ordering must be the same as the for-in enumeration order, and there is a de facto standard for a partial enumeration order that all browsers implement.

"The common behavior subset here is: for objects with no properties  
that look like array indices, and no enumerable prototype properties,  
for..in enumeration returns properties in insertion order. That  
particular behavior is a de facto standard and required for Web  
compatibility. A future standard should specify at least that much. "

"We did identify one situation where enumeration order will be the same across all major implementation that are currently in use (including IE6):

The enumeration order of an object's properties will be the order in which the properties were added if all the following conditions hold:
The object has no inherited enumerable properties
The object has no array indexed properties
No properties have been deleted
No property has had its attributes modified or been changed from a data property to an accessor property or visa versa
"

Also see https://mail.mozilla.org/pipermail/es-discuss/2011-March/012965.html and the other discussion history you can find in the es-discuss archives.

In practice, JavaScript implementations do have a standard enumeration order that applies in the cases most commonly encountered when parsing and generating JSON text.  Applications do depend upon that ordering.
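
As a concrete illustration (and only a sketch: it leans on the de facto insertion-order enumeration described above, not on anything ES5 normatively guarantees), here is how the member order of a parsed JSON text can be recovered:

    // Relies on the de facto insertion-order enumeration for non-index keys;
    // ES5 itself leaves the order implementation-defined.
    var parsed = JSON.parse('{"b": 1, "a": 2, "c": 3}');
    Object.keys(parsed).forEach(function (key) {
        console.log(key, parsed[key]);   // "b" 1, then "a" 2, then "c" 3 in practice
    });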

Allen




Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Dec 7, 2013, at 7:22 PM, John Cowan wrote:

> Allen Wirfs-Brock scripsit:
>
>> In other words, ECMA-262 explicitly specifies that when multiple
>> occurrences of the same member name occur in a JSON object, the value
>> associated with the last (right-most) occurrence is used. Order matters.
>
> Okay, I concede that order matters *when* there are duplicate names.
> I still deny that it matters otherwise.

In reality, the ordering also defines the JavaScript for-in enumeration order over the JS object properties generated by JSON.parse.

Try this in your favorite browser:

    var jText = '{"b": 1, "a": 2, "c": 3}';
    for (var key in JSON.parse(jText)) console.log(key);

You will get as output:
     b
     a
     c

I can assure you that code exists on the web that depends, whether intentionally or not, on this ordering.  Past experience among browser implementations is that sites break when attempts are made to change this ordering.

Allen



Re: [Json] Response to Statement from W3C TAG

Carsten Bormann
In reply to this post by Carsten Bormann
On 08 Dec 2013, at 10:58, Martin J. Dürst <[hidden email]> wrote:

> There are two description methods in ECMA-404: text and "railroad" diagrams.

There are actually two quite different kinds of racetracks (“railroad diagrams”): the first three are at the parser level and the last two at the scanner level.  This split is explained nowhere (it just happens to be the only one of the many potential interpretations of ECMA-404 that yields a result close to current practice), and you have to piece together from section 4 what the interface between the two levels might be.

> The textual descriptions are in some cases quite precise, but in some other cases, leave quite a bit of ambiguity. And stuff like "It may have an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally + (U+002B) or – (U+002D)." (in particular the first clause of that sentence) doesn't make much sense.

I read them mainly as a way to give meaning to the bubbles in the racetracks.
All the statements that define the semantics of the data are really errata anyway, we have learned.
(If one were to do the semantics properly for section 8, one could simply reference ISO 6093.
RFC 4627 does that implicitly by saying "The representation of numbers is similar to that used in most programming languages.".)

> As for the railroad tracks, besides just floating in the spec without references, the notation is also not at all explained. If one took the most straightforward and obvious interpretation (that's not how standards work, but anyway), it's not too difficult to come up with a formally precise way of converting each of them into a diagram for a finite state machine. From there, conversion to the ABNF, or showing equivalence, on a quite formal level, shouldn't be too much of a problem.

Yes, please.  I was asking for someone to do that work, maybe in the process generating some guidance on how to read ECMA-404.  It won’t be me.

Grüße, Carsten


Re: [Json] Response to Statement from W3C TAG

Bjoern Hoehrmann
In reply to this post by Carsten Bormann
* Martin J. Dürst wrote:
>The textual descriptions are in some cases quite precise, but in some
>other cases, leave quite a bit of ambiguity. And stuff like "It may have
>an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally
>+ (U+002B) or – (U+002D)." (in particular the first clause of that
>sentence) doesn't make much sense. If e.g. 1.2 has an exponent of 10,
>it's going to be 6.1917 or so, not at all what this notation is usually
>used for.

Apparently in `x²` 2 is "an exponent of" x. That does not make much
sense to me either, but it does appear to be a common English idiom.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Re: [Json] Response to Statement from W3C TAG

Carsten Bormann
In reply to this post by Allen Wirfs-Brock
Hi Allen,

here are some replies to your messages that I promised.  I opted not to
use line-by-line responses; I hope they are easier to read this way.
Two technical, and two more general points.

* Processing model

You are presenting a processing model that is based on a generic
parser that creates an AST, followed by application-specific
post-processing.  This is pretty much how XML worked.

One of the major advances of JSON was that it has a data model (even
if it is somewhat vaguely specified — implementers were still quick to
pick it up).  JSON implementations typically go right from the
characters to a representation of that data model that is appropriate
for the platform, and encoders typically go all the way back in one
step.  Interoperability happens as a result of this processing.
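
To put that in JavaScript terms (a sketch only; the same point applies to most platform bindings): the built-in parser maps characters directly onto platform values, with no exposed AST and no separate post-processing pass.

    // Characters go straight to platform values; no intermediate AST is exposed.
    var value = JSON.parse('{"n": 1.5, "ok": true, "list": [1, 2, 3]}');
    console.log(typeof value.n);             // "number"
    console.log(Array.isArray(value.list));  // true
    console.log(JSON.stringify(value));      // and back to characters in one step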

That's a major reason why it is so important to think about JSON in
terms of its data model.  The IETF JSON WG has elected not to flesh
out the model any more for 4627bis than it is in RFC 4627 (which I
personally don't agree with, but doing so would have been harder work).
Dismantling what is there in the way of a data model, and thus falling
back into an XML mindset, would be a major regression, though.

* Description techniques

You are right that programming language designers have long been
accustomed to two-level grammars (scanner/parser).  One thing that
RFC 4627 got right was not doing this, but using single-level
ABNF.  (Technically, there still is a UTF-8 decoder below the ABNF,
but that is a rather well-understood, separable machine.)  JSON is
simple enough to enable a single-level description, and RFC 5234 ABNF
provides a rigorous yet understandable way to do this.  There are
tools that operate on that ABNF and produce useful results, because it
has a defined meaning.

Let me be very frank here, because I get the impression that previous
statements about this were too diplomatic to be clear.

There is no way on earth that anyone can argue that the description of
the JSON syntax in ECMA-404 is in any way superior to that in RFC
4627.  By a large margin.  This is not about possibly finding bugs in
the hodge-podge that ECMA-404 is; thank you for offering to do ECMA's
work here, but I'm not biting.  This is about making sure from a start
that the spec is right.  Making 4627bis reference ECMA-404 would be a
major regression.  There is no reason for incurring this regression
seven years after it already was done right.  The IETF isn't known for
doing things that are unjustifiable on a technical level.

* Stewardship

You mention that there is significant programming language expertise
in TC39.  I couldn't agree more (actually, the previous sentence is a
wild understatement), and I can say that I have been using ECMA-262
(ES3 is the last version with which I have significant experience) as
a model for programming language standards.

My point was not at all about lack of experience; it is about
attention.  By its nature, TC39's work on JSON will always focus on
the narrower needs of JavaScript.  That makes TC39 less qualified for
the stewardship of a standard that transcends language barriers than
one might like to admit.  I'm not going to illustrate this point
further; there is ample illustration in the mailing list archives.

* Way forward

As always, only the chairs can speak for the JSON WG, and even they
need to confirm any needed consensus in the WG beforehand.  But I
think I can say that we are still only guessing what TC39 is trying to
achieve with the sudden creation of ECMA-404.  I think we need to have
a frank discussion about the objectives of further work on JSON.  The
JSON WG has a charter that defines its objectives, which focus
on stability and interoperability.  I'd like to understand TC39's
objectives with respect to JSON, so we can find out whether there is
common ground or not.

Grüße, Carsten


Re: [Json] Response to Statement from W3C TAG

Jorge Chamorro Bieling
On 08/12/2013, at 16:26, Carsten Bormann wrote:

>
> * Way forward
>
> As always, only the chairs can speak for the JSON WG, and even they
> need to confirm any needed consensus in the WG beforehand.  But I
> think I can say that we are still only guessing what TC39 is trying to
> achieve with the sudden creation of ECMA-404.  I think we need to have
> a frank discussion about the objectives of further work on JSON.  The
> JSON WG has a charter that defines its objectives, which is focusing
> on stability and interoperability.  I'd like to understand TC39's
> objectives with respect to JSON, so we can find out whether there is
> common ground or not.


Here's a message from the inventor of JSON himself, telling exactly what ECMA-404 is "trying to achieve". Hope it helps:


Begin forwarded message:

> From: Douglas Crockford <[hidden email]>
> Date: 13 de junio de 2013 17:50:33 GMT+02:00
> To: "[hidden email]" <[hidden email]>
> Subject: [Json] Two Documents
> content-type: text/plain; charset="us-ascii"; Format="flowed"
>
> The confusion and controversy around this work is due to a mistake that I
> made in RFC 4627. The purpose of the RFC, which is clearly indicated
> in the title, was to establish a MIME type. I also gave a description of
> the JSON Data Interchange Format. My mistake was in conflating the two,
> putting details about the MIME type into the description of the format. My
> intention was to add clarity. That obviously was not the result.
>
> JSON is just a format. It describes a syntax of brackets and commas that
> is useful in many contexts, profiles, and applications. JSON is agnostic
> about all of that stuff. JSON shouldn't even care about character encoding.
> Its only dependence on Unicode is in the hex numbers used in the \u notation.
> JSON can be encoded in ASCII or EBCDIC or even Hollerith codes. JSON can
> be used in contexts where there is no character encoding at all, such as
> paper documents and marble monuments.
>
> There are uses of JSON however in which such choices matter, and where
> behavior needs to be attached to or derived from the syntax. That is
> important stuff, and it belongs in different documents. Such documents
> will place necessary restrictions on JSON's potential. No such document
> can fit all applications, which causes much of the controversy we've seen
> here. One size cannot fit all. JSON the format is universal. But real
> applications require reasonable restrictions.
>
> So we should be working on at least two documents, which is something we have
> discussed earlier. The first is The JSON Data Interchange Format, which is
> a simple grammar. The second is a best practices document, which recommends
> specific conventions of usage.
>
> _______________________________________________
> json mailing list
> [hidden email]
> https://www.ietf.org/mailman/listinfo/json


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Dec 8, 2013, at 7:44 AM, John Cowan wrote:

> Allen Wirfs-Brock scripsit:
>
>> You are correct that ES5 does not define the for-in enumeration order.
>> But it does say that the Object.keys ordering must be the same as
>> the for-in enumeration order, and there is a de facto standard for a partial
>> enumeration order that all browsers implement.
>
> In short, half a dozen or so JSON implementations in a JavaScript
> environment agree.  That hardly means that all other JSON implementations
> in whatever environment should be dragged along with them.
>

Right, from an interoperability perspective, the half dozen or so user agents used by essentially the entire world to run web applications really aren't significant at all...

Allen

Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Dec 7, 2013, at 11:00 PM, Martin J. Dürst wrote:

> On 2013/12/08 9:49, John Cowan wrote:
>> Tim Bray scripsit:
>>
>>> I assume all parties to the discussion know that in 100% of all
>>> programming-language libraries in use for dealing with JSON as
>>> transmitted on the wire, JSON objects are loaded into hash tables
>>> or dicts or whatever your language idiom is, and there is no way
>>> for software using those libraries to discover what order they were
>>> serialized in,
>>
>> Well, no, not 100%.  In Lisp-family languages, JSON objects are often
>> deserialized to ordered constructs.  Nevertheless:
>
> Similarly, as of somewhere around version 1.9.x or 2.0, Hash entries in Ruby are ordered, and one would assume that the original order in JSON would be reflected in the order of the Hash entries.
>
>>> any suggestion that this order should be considered significant for
>>> application usage would be regarded as insane.
>>
>> +1 to that.
>
> +1 here, too.

Millions of web developers write code with these sorts of dependencies.  Not because they are insane; more often it is because they are unaware of the pitfalls.  However, it's not an interoperability issue if they are writing web application code that is only intended to run in a web browser and all browsers behave the same on that code.

More broadly, the JSON language-binding parsers that I'm most familiar with do not generate a high-fidelity view of all valid JSON texts that they are presented with. It would be a mistake to depend upon such parsers to interchange data using JSON schemas that assign meaning to the ordering of object members.  However, that would not necessarily be the case for an application that is using a streaming JSON parser.

Consider this informal description of a data schema that is representable using JSON.

Conversation Schema:  A Conversation is a JSON text consisting of a single JSON object. The object may have an arbitrary number of members. The members represent a natural language conversation where the key part of each member identifies a participant in the conversation and the value part of each member is a JSON string value that captures a statement by the associated participant. Multiple members may have the same key value, corresponding to multiple statements by the same participant. The order of the members corresponds to the order in which the statements were made during the conversation.

And here is an example of such a JSON text:

------------start JSON text-------------
{
"allenwb":  "there is an objectively observable order to the members of a JSON object",
"JSON WG participant 1":  "It would be insane to depend upon that ordering",
"allenwb":  "not if there is agreement between a producer and consumer on the meaning of the ordering",
"JSON WG participant 2":  "But JSON.parse and similar language bindings don't preserve order",
"allenwb":  "A streaming JSON parser would naturally preserve member order",
"JSON WG participant 2": "I din't think there are any such parsers",
"allenwb": "But someone might decide to create one, and if they do it will expose object members, in order",
"allenwb": "Plus, in this particular case the schema is so simple the application developer might well design to write a custom, schema specific streaming parser."
}
-----------end JSON text-------
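
To make the closing remark in the example concrete, here is a hypothetical sketch of such a schema-specific parser. The function name and regular expression are illustrative only (nothing like this is specified by ECMA-404 or RFC 4627), and it assumes the input strictly matches the Conversation Schema: a single flat object whose member values are all JSON strings.

    // Hypothetical schema-specific parser: yields [speaker, statement] pairs
    // in the order they appear in the JSON text, duplicate keys included.
    // Assumes a single flat object whose member values are all JSON strings.
    function parseConversation(jsonText) {
        var memberRe = /"((?:[^"\\]|\\.)*)"\s*:\s*"((?:[^"\\]|\\.)*)"/g;
        var pairs = [];
        var match;
        while ((match = memberRe.exec(jsonText)) !== null) {
            pairs.push([match[1], match[2]]);
        }
        return pairs;
    }

    // Each statement is reported in conversation order:
    // parseConversation(conversationText).forEach(function (pair) {
    //     console.log(pair[0] + ": " + pair[1]);
    // });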

Allen


Re: [Json] Response to Statement from W3C TAG

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Dec 7, 2013, at 11:09 PM, Martin J. Dürst wrote:

> [Somebody please forward this message to [hidden email], in case it gets rejected.]
>
> On 2013/12/08 8:05, Allen Wirfs-Brock wrote:
>
>>> In JSON, objects are unordered collections or sets of name/value pairs.  It says so right there on json.org (“sets”), and it says so in RFC 4627 (“collections”)*).  We may not like it, but it has been a promise for a decade.  We need to heed it.  (Another promise was that JSON doesn’t change**).)
>>
>> You also need to look at objective reality and consider the possibility that the informal (and non-normative) text on both the json.org website and in the original RFC never actually matched reality.
>>
>> JSON is derived from JavaScript (whose standard is ECMA-262) and since 2009, ECMA-262 (and its clone ISO/IEC-16262) has included a normative specification for parsing JSON text that includes an ordering semantics for object members.
>
> RFC 4627 was published in July 2006, so the ECMA-262 version of 2009 may not be very relevant.
>
My understanding was that one of the reasons for activating the JSON WG was the perceived need for a JSON grammar that could be normatively referenced.  RFC 4627 (2006) was not a normative document.  ECMA-262, 5th Edition (2009), aka ISO/IEC-16262-3 (2011), is a normative document.  So is ECMA-404.

Allen
