JSON specification WAS: Re: JSON Duplicate Keys

gaz Heyes
Hey all

Since Doug is here I thought I'd use the opportunity to discuss the broken JSON spec.

When ES5 introduced line/paragraph separators as valid newlines in JavaScript, this broke the JSON specification. The code sample used within the RFC is also broken, since line/paragraph separators would cause eval to fail. It needs to be fixed IMO. While taking this opportunity to fix the specification by forcing line/paragraph separators into Unicode escapes, e.g. \u2028 and \u2029, we should also force "<" into \x3c and ">" into \x3e. I propose the following changes:

1. Remove the awful invalid code sample.
2. In the string section, instead of "Any character may be escaped." there should be some exceptions: < and > must be encoded with hex/Unicode escapes, line/paragraph separators must be encoded as Unicode escapes, and finally the right-to-left and left-to-right marks should also be required to be encoded as Unicode escapes.
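
As a rough illustration of what item 2 would mean for a serializer, here is a minimal sketch. The helper name stringifyEscaped is hypothetical and not part of any spec or library. It assumes \uXXXX escapes throughout, because the JSON grammar defines no \x escape, so \u003c stands in for \x3c.

```javascript
// Hypothetical helper, for illustration only: serialize with JSON.stringify,
// then rewrite the characters named in the proposal as \uXXXX escapes.
function stringifyEscaped(value) {
  // < > U+2028 (line sep) U+2029 (para sep) U+200E (LRM) U+200F (RLM)
  var risky = /[<>\u2028\u2029\u200e\u200f]/g;
  // In JSON.stringify output these characters can only occur inside
  // string tokens, so a global replace leaves the structure untouched.
  return JSON.stringify(value).replace(risky, function (ch) {
    var hex = ch.charCodeAt(0).toString(16);
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

var input = { html: "<b>hi</b>", text: "line\u2028break" };
var output = stringifyEscaped(input);

// The escaped text is still valid JSON and parses back to the same data.
var roundTripped = JSON.parse(output);
```

The result contains only escaped forms of the listed characters, and any conforming JSON parser reads it back unchanged, which is why the escaping would not affect existing consumers.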

I think we also need to discuss the keywords present in JSON keys, for example:
{"__proto__":[]}
Should __proto__ be a banned keyword in JSON keys? I think giving data the ability to control its type could break things when it's used elsewhere. There is also an argument for requiring hex escapes for characters lower than 0x09 but I guess a lot of people will be against that.
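
One way this concern can show up, sketched under the assumption of an engine with ES2015 object-literal semantics (current V8, for example): the same text behaves differently depending on whether it goes through JSON.parse or is evaluated as JavaScript source. The variable names are illustrative only.

```javascript
var text = '{"__proto__":[]}';

// JSON.parse creates an ordinary own property named "__proto__";
// the prototype of the resulting object is left alone.
var parsed = JSON.parse(text);
var parsedHasOwnKey = Object.prototype.hasOwnProperty.call(parsed, "__proto__");
var parsedProtoIsDefault = Object.getPrototypeOf(parsed) === Object.prototype;

// Evaluating the same text as an object literal sets the prototype
// instead, so the data has changed what kind of object it is.
var evaluated = eval("(" + text + ")");
var evaluatedHasOwnKey = Object.prototype.hasOwnProperty.call(evaluated, "__proto__");
var evaluatedProtoIsArray = Array.isArray(Object.getPrototypeOf(evaluated));
```

Here parsedHasOwnKey and parsedProtoIsDefault are true, while evaluatedHasOwnKey is false and evaluatedProtoIsArray is true.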


Thanks

Gareth

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: JSON specification WAS: Re: JSON Duplicate Keys

Jeremy Darling
The only thing that limiting key names in a data description language does is force people to escape those keys when they need to use them. As an example, Mongo uses $ to denote special actions taken by the DB and does not allow any keys in the BSON (yes, I know it isn't JSON) packet to start with a $. So if you need (or want) to store something like an Aggregation Pipeline inside Mongo, you have to escape your keys that start with $. This also means that when you pull your documents back out, you have to unescape those sequences.

Thinking about this for JSON, that means that if you wanted to serialize a complete object (prototype, setters, getters, etc.) you would then have to escape all of those as well. Since JavaScript doesn't really have a standard symbol for stating "This is a control key", devs would have to code for all possible keywords in their escape/unescape sequence.

Or the spec would have to be amended to contain some type of "isReservedKey" statement.  This just introduces a whole new can of worms and more breaking points.

At the end of the day, JSON is what JSON is: a data description and transport language. As such it should be as flexible as possible, within reason. Duplicate keys should never have been allowed in the spec in the first place (almost all key/value stores don't allow them), or, if they were allowed, they should have created arrays of values when decomposed to JS objects, much like most frameworks do for duplicate keys in HTTP headers.
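
For reference, a small sketch of what the built-in JSON.parse does with a duplicate key today: the later value silently replaces the earlier one instead of being collected into an array. The variable names are illustrative only.

```javascript
var text = '{"a":1,"a":2}';
var parsed = JSON.parse(text);

// Only one "a" survives, and it holds the last value seen.
var keys = Object.keys(parsed);
var lastWins = parsed.a === 2;
var notCollected = !Array.isArray(parsed.a);
```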

I don't exactly understand your point in bullet 2. Why do < and > have to be encoded? They are perfectly valid values inside of a string today, and again many libraries make use of this functionality for transporting HTML, XML, and text fragments. To Crockford's point, "should not break" becomes the issue with changing what should and should not be encoded.

Or did I misunderstand your points?

Just my two cents,
 - Jeremy


On Thu, Jun 6, 2013 at 8:00 AM, gaz Heyes <[hidden email]> wrote:
> [...]

Re: JSON specification WAS: Re: JSON Duplicate Keys

gaz Heyes
On 6 June 2013 14:25, Jeremy Darling <[hidden email]> wrote:
> Thinking about this for JSON, that means that if you wanted to
> serialize a complete object (prototype, setters, getters, etc.) you
> would then have to escape all of those as well. Since JavaScript
> doesn't really have a standard symbol for stating "This is a control
> key", devs would have to code for all possible keywords in their
> escape/unescape sequence.
>
> Or the spec would have to be amended to contain some type of
> "isReservedKey" statement. This just introduces a whole new can of
> worms and more breaking points.

Yeah, I agree it's a can of worms, but I wanted to throw it out there for discussion. As we get more keywords and functionality it could bite us in the butt in a few years' time.

> I don't exactly understand your point in bullet 2. Why do < and >
> have to be encoded? They are perfectly valid values inside of a
> string today, and again many libraries make use of this functionality
> for transporting HTML, XML, and text fragments. To Crockford's point,
> "should not break" becomes the issue with changing what should and
> should not be encoded.

For two reasons: 1) some browsers will render JSON as HTML (sniffing); 2) inline JSON data will be parsed in a different order than intended. For example, < will be parsed first as HTML and then as JavaScript. Encoding < and > will have no effect on passing XML/HTML data but will prevent those issues.
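
The inline case can be sketched with plain string handling. This is a simplification that assumes an HTML parser ends a script element at the first "</script>" it meets. The inlineScript helper and the payload are hypothetical, and \u003c / \u003e are used because JSON has no \x escape.

```javascript
var payload = { comment: "</script><script>alert(1)</script>" };
var closeTag = "</script>";

// Hypothetical helper: wrap a JSON text in an inline script block.
function inlineScript(jsonText) {
  return "<script>var data = " + jsonText + ";" + closeTag;
}

var rawJson = JSON.stringify(payload);
var safeJson = rawJson.replace(/</g, "\\u003c").replace(/>/g, "\\u003e");

var rawHtml = inlineScript(rawJson);
var safeHtml = inlineScript(safeJson);

// Raw: the first closing tag sits inside the data, so the HTML parser
// would cut the script short before JavaScript ever sees the string.
var rawEndsEarly = rawHtml.indexOf(closeTag) < rawHtml.length - closeTag.length;

// Escaped: the only closing tag is the real one at the very end.
var safeEndsAtClose = safeHtml.indexOf(closeTag) === safeHtml.length - closeTag.length;

// The escaped text still carries exactly the same data.
var sameData = JSON.parse(safeJson).comment === payload.comment;
```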

Re: JSON specification WAS: Re: JSON Duplicate Keys

Brendan Eich
In reply to this post by gaz Heyes
gaz Heyes wrote:
> When ES5 introduced line/para separators for valid new lines in
> JavaScript this broke the JSON specification

ES3, not ES5, way back in 1999, introduced LINE_SEPARATOR and
PARA_SEPARATOR as line terminators. See ECMA-262 Edition 3, 7.3.

/be

Re: JSON specification WAS: Re: JSON Duplicate Keys

gaz Heyes
On 7 June 2013 11:43, Brendan Eich <[hidden email]> wrote:
> ES3, not ES5, way back in 1999, introduced LINE_SEPARATOR and
> PARA_SEPARATOR as line terminators. See ECMA-262 Edition 3, 7.3.

Oh that makes it better then =)

Re: JSON specification WAS: Re: JSON Duplicate Keys

Brendan Eich
gaz Heyes wrote:
> On 7 June 2013 11:43, Brendan Eich <[hidden email]> wrote:
>
>     ES3, not ES5, way back in 1999, introduced LINE_SEPARATOR and
>     PARA_SEPARATOR as line terminators. See ECMA-262 Edition 3, 7.3.
>
>
> Oh that makes it better then =)

No, it's still crappy, but accuracy counts :-/. Also the age of the
standard counts.

/be