Re: [Json] Encoding detection (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Anne van Kesteren
On Thu, Nov 21, 2013 at 1:35 PM, Henri Sivonen <[hidden email]> wrote:
> UTF-32 harms JSON interchange, because Gecko removed all UTF-32
> support throughout the engine (other engines probably did, too, but
> I'm too busy to check) and, therefore, XHR responseType = "json"
> doesn't support UTF-32.

XHR's responseType = "json" only supports UTF-8 (optionally with a
leading BOM), across the board.
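
As a rough illustration of what "UTF-8 only, optionally with a leading BOM" means for a consumer, here is a minimal Python sketch. The function name `parse_json_utf8` is hypothetical, and rejecting malformed UTF-8 outright (rather than substituting replacement characters, as a browser decoder may do) is an assumption of this sketch, not something stated in the thread.

```python
import json

def parse_json_utf8(body: bytes):
    """Parse a JSON response body, treating it strictly as UTF-8.

    The "utf-8-sig" codec decodes UTF-8 and drops one leading BOM
    (EF BB BF) if present. No other encoding is attempted.
    """
    text = body.decode("utf-8-sig")  # raises UnicodeDecodeError on non-UTF-8 input
    return json.loads(text)
```

With this approach a UTF-16 or UTF-32 body is never "detected"; it simply fails to decode or fails to parse.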


--
http://annevankesteren.nl/
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Allen Wirfs-Brock

On Nov 22, 2013, at 8:39 AM, Tim Bray wrote:

> I’ve been using JSON for quite a few years, but hardly ever in either a to-browser or from-browser role; what I care about is mostly its use in RESTful APIs generally and identity APIs specifically.  In those scenarios, it would be seen as wildly inappropriate to use anything but UTF-8; I’ve never actually seen anything else.  In practice, it would be very unlikely for anyone to deploy UTF-16 or any other non-UTF-8 flavor in a non-browser scenario.
>
> Having said that, I’m still, hundreds of messages later, not 100% sure what our draft should say about BOMs :(

You should say that it is not actually an issue of the JSON format, whose grammar clearly defines the handling of the U+FEFF code point.  Rather, it is an upstream data interchange issue that should be dealt with in exactly the same way as any other data interchange on a similar channel.  Say whatever you think is appropriate about BOMs in the transmission of data conforming to the "application/json" MIME type.  Just be clear that whatever you decide has nothing to do with the abstract, grammar-based interpretation of the actual JSON payload.

Allen

Bjoern Hoehrmann
* Matt Miller (mamille2) wrote:

>There does seem to be rough consensus that using an encoding other than
>UTF-8 can have interoperability issues.  There also seems to be rough
>consensus that the current text and table in section 8.1 for detecting
>the encoding will be inaccurate (and potentially harmful).
>
>That appears to mean the approach with the most consensus is to remove
>the encoding detection entirely, leaving only:
>
>""""
>   JSON text SHALL be encoded in Unicode.  The default encoding is
>   UTF-8.
>""""

Neither of the quoted statements means anything as far as I can tell.
The encoding detection rules are a vital part of the specification and
cannot be removed without replacement. I am not aware of any argument
that the text "will be" inaccurate or harmful.
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Allen Wirfs-Brock
In reply to this post by Allen Wirfs-Brock

On Nov 25, 2013, at 1:37 AM, Martin J. Dürst wrote:



> On 2013/11/23 1:54, Allen Wirfs-Brock wrote:
>
>> You should say that it is not actually an issue of the JSON format, whose grammar clearly defines the handling of the U+FEFF code point.  Rather, it is an upstream data interchange issue that should be dealt with in exactly the same way as any other data interchange on a similar channel.  Say whatever you think is appropriate about BOMs in the transmission of data conforming to the "application/json" MIME type.  Just be clear that whatever you decide has nothing to do with the abstract, grammar-based interpretation of the actual JSON payload.
>
> That works for ECMA-404. It does not work for the IETF draft, because it is extremely relevant for application/json, which is part of that draft.
>
> Regards,    Martin.

It still seems pretty clear.  Anything fed as input into an actual parser for the JSON grammar as standardized by ECMA-404 must not contain any U+FEFF code points other than as part of JSON string values.  If the application/json wire format chooses to use BOMs, then they must be removed before processing by a standard JSON parser.  From a JSON parser perspective, I think that's pretty much all you need to say.
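
The separation described here, where the wire layer removes a BOM and the parser only sees text the grammar can describe, could be sketched in Python roughly as follows. The helper names `strip_leading_bom` and `parse_wire_text` are hypothetical and exist only for illustration.

```python
import json

BOM = "\ufeff"

def strip_leading_bom(text: str) -> str:
    """Remove a single leading U+FEFF, leaving any later ones alone."""
    return text[1:] if text.startswith(BOM) else text

def parse_wire_text(text: str):
    # The transport layer's BOM is removed first; the parser itself
    # only ever sees text that the JSON grammar can describe.
    return json.loads(strip_leading_bom(text))
```

A U+FEFF inside a JSON string value is untouched by this step, so `parse_wire_text('["\ufeffx"]')` still yields a string that begins with U+FEFF. Python's own `json.loads` happens to behave like the strict parser described above: it rejects a `str` that starts with U+FEFF.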

allen

Bjoern Hoehrmann
In reply to this post by Anne van Kesteren
* Nico Williams wrote:
>We must not require encoding detection functionality in parsers.  We
>must not forbid it either.  We might need to say that encodings other
>than UTF-8/16/32 may not be reliably detected, therefore they are highly
>discouraged, even forbidden except where protocols specifically call for
>them.

When I pass a fully conforming UTF-8 encoded application/json entity to
a fully conforming JSON parser I do not want the parser to do something
funny like interpreting the document as if it were Windows-1252 encoded.
I am amazed how many people here think a parser that does that should
not be considered broken.
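
To make the failure mode concrete, the following Python sketch decodes the same conforming UTF-8 octets twice, once correctly and once as Windows-1252. The sample document is made up for illustration; both decodings parse without error, which is what makes the wrong one dangerous.

```python
import json

# A conforming UTF-8 encoded JSON document containing one non-ASCII
# character: U+00F6 (o with diaeresis), written as an escape in the source.
entity = '{"name": "Bj\u00f6rn"}'.encode("utf-8")

as_utf8 = json.loads(entity.decode("utf-8"))
as_cp1252 = json.loads(entity.decode("cp1252"))  # the "funny" interpretation

# as_utf8["name"] has 5 characters; as_cp1252["name"] has 6, because the
# two UTF-8 octets C3 B6 were read as two separate Windows-1252 characters.
```

No exception is raised in either case, so the caller cannot tell from the parser alone that the second result is corrupted.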
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Bjoern Hoehrmann
* Nico Williams wrote:

>On Tue, Nov 26, 2013 at 09:15:38PM +0100, Bjoern Hoehrmann wrote:
>> * Nico Williams wrote:
>> >We must not require encoding detection functionality in parsers.  We
>> >must not forbid it either.  We might need to say that encodings other
>> >than UTF-8/16/32 may not be reliably detected, therefore they are highly
>> >discouraged, even forbidden except where protocols specifically call for
>> >them.
>>
>> When I pass a fully conforming UTF-8 encoded application/json entity to
>> a fully conforming JSON parser I do not want the parser to do something
>> funny like interpreting the document as if it were Windows-1252 encoded.
>> I am amazed how many people here think a parser that does that should
>> not be considered broken.
>
>You missed the point.

"We must require encoding detection functionality in parsers. We must
forbid encoding detection functionality beyond that. We must say that
encodings other than UTF-8/16/32 are forbidden in any and all cases."
is how I would modify what you said above (with some caveats).

Note that I am talking about labeled sequences of octets,
application/json entities, not paintings on a cave wall that look
similar to JSON text in a strange font. In a labeled sequence of octets
I can tell for sure whether there are invisible characters in it if I
know the encoding.

There are two forms to consider. One is the labeled sequence of octets
that we call "application/json entity". The other is a sequence of
Unicode scalar values. That is the alphabet of the ABNF grammar in the
specification. If you have anything else, then the specification does
not apply to your situation.

>If you wanted to forbid non-Unicode, non-UTF encodings, then you'd be
>preventing such a shell, and for what reason?  If you only mean that
>auto-detection of encoding should not even be mentioned, I'm fine with
>that, and I've already said so earlier.

Above I said that there are two forms to consider. Encoding detection
is what allows us to convert the "application/json entity" form into
the "sequence of Unicode scalar values" form. We need the latter form
in order to apply the ABNF grammar. Imagine you receive this:

  HTTP/1.1 200 OK
  Content-Type: application/json
  ...

  ABCD...

There would be at least two specifications that apply here, the HTTP
and the application/json specification. Would you like them to say
that you are on your own, "ABCD..." could mean anything? I would like
them to say "ABCD..." is an array containing the integer zero three
times, like `[0,0,0]`. I can build robust software based on that.

I cannot build robust software based on "well, maybe it's EBCDIC?
Have you tried GB 18030? UTF-7 might be worth a try otherwise. Are
you sure this matters at all?"
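
The conversion from the "application/json entity" form to the "sequence of Unicode scalar values" form can be sketched in Python. This sketch uses the null-octet pattern table from RFC 4627, section 3, which relies on the first two characters of a JSON text being ASCII; whether the draft's section 8.1 table is identical is an assumption. The function names are hypothetical, and BOM handling is deliberately left out.

```python
import json

def detect_json_encoding(data: bytes) -> str:
    """Guess the UTF of a JSON entity from the nulls in its first four octets."""
    head = data[:4]
    if len(head) == 4:
        nulls = tuple(b == 0 for b in head)
        if nulls == (True, True, True, False):
            return "utf-32-be"   # 00 00 00 xx
        if nulls == (False, True, True, True):
            return "utf-32-le"   # xx 00 00 00
        if nulls == (True, False, True, False):
            return "utf-16-be"   # 00 xx 00 xx
        if nulls == (False, True, False, True):
            return "utf-16-le"   # xx 00 xx 00
    return "utf-8"               # xx xx xx xx, or too short to tell

def decode_json_entity(data: bytes):
    # Octets -> Unicode text -> value described by the JSON grammar.
    return json.loads(data.decode(detect_json_encoding(data)))
```

The point of a fixed table like this is that the outcome is deterministic: the same octets always map to the same text, and nothing outside UTF-8/16/32 is ever tried.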
--
Björn Höhrmann · mailto:[hidden email] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 