JSON.canonicalize()


Re: Summary of Input. Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-18 20:23, Mike Samuel wrote:
>     It is possible that I don't understand what you are asking for here since I have no experience with toJSON.
>
>     Based on this documentation
>     https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
>     JSON.canonicalize() would though work out of the box (when integrated in the JSON object NB...) since it would inherit all the functionality (and 99% of the code) of JSON.stringify()
>
>
> JSON.stringify(new Date()) has specific semantics because Date.prototype.toJSON has specific semantics.
> As currently written, JSON.canonicalize(new Date()) === JSON.canonicalize({})

It seems that you (deliberately?) misunderstand what I'm writing above.

JSON.canonicalize(new Date()) would do exactly the same thing as JSON.stringify(new Date()) since it apparently only returns a string.

Again, the sample code I provided is a bare bones solution with the only purpose showing the proposed canonicalization algorithm in code as a complement to the written specification.

Anders
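
For readers following the Date question, here is a minimal sketch of the difference under discussion. The `naiveCanonicalize` helper is hypothetical and is not the draft's sample code; it is assumed to walk own enumerable properties only and never consult `toJSON()`, which is the behavior Mike describes.

```javascript
// Hypothetical helper for illustration only: serializes by walking own
// enumerable properties in sorted order and never calls toJSON().
function naiveCanonicalize(value) {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return '[' + value.map(naiveCanonicalize).join(',') + ']';
  }
  const members = Object.keys(value).sort().map(
    (key) => JSON.stringify(key) + ':' + naiveCanonicalize(value[key]));
  return '{' + members.join(',') + '}';
}

const date = new Date(0);

// JSON.stringify calls Date.prototype.toJSON, which yields an ISO 8601 string.
const viaStringify = JSON.stringify(date); // '"1970-01-01T00:00:00.000Z"'

// A Date has no own enumerable properties, so the naive walk sees an empty object.
const viaNaive = naiveCanonicalize(date); // '{}'
```

A canonicalizer built on top of `JSON.stringify()` would inherit the first behavior; one written from scratch along the lines of the sketch would not.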
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: Summary of Input. Re: JSON.canonicalize()

Mike Samuel


On Sun, Mar 18, 2018, 4:00 PM Anders Rundgren <[hidden email]> wrote:
> On 2018-03-18 20:23, Mike Samuel wrote:
> >     It is possible that I don't understand what you are asking for here since I have no experience with toJSON.
> >
> >     Based on this documentation
> >     https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
> >     JSON.canonicalize() would though work out of the box (when integrated in the JSON object NB...) since it would inherit all the functionality (and 99% of the code) of JSON.stringify()
> >
> >
> > JSON.stringify(new Date()) has specific semantics because Date.prototype.toJSON has specific semantics.
> > As currently written, JSON.canonicalize(new Date()) === JSON.canonicalize({})
>
> It seems that you (deliberately?) misunderstand what I'm writing above.
>
> JSON.canonicalize(new Date()) would do exactly the same thing as JSON.stringify(new Date()) since it apparently only returns a string.

Where in the spec do you handle this case?

> Again, the sample code I provided is a bare bones solution with the only purpose showing the proposed canonicalization algorithm in code as a complement to the written specification.

Understood.  AFAICT neither the text nor the instructional code treat Dates differently from an empty object.

> Anders


Re: Summary of Input. Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-18 21:06, Mike Samuel wrote:

>
>
> On Sun, Mar 18, 2018, 4:00 PM Anders Rundgren <[hidden email]> wrote:
>
>     On 2018-03-18 20:23, Mike Samuel wrote:
>      >     It is possible that I don't understand what you are asking for here since I have no experience with toJSON.
>      >
>      >     Based on this documentation
>      > https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
>      >     JSON.canonicalize() would though work out of the box (when integrated in the JSON object NB...) since it would inherit all the functionality (and 99% of the code) of JSON.stringify()
>      >
>      >
>      > JSON.stringify(new Date()) has specific semantics because Date.prototype.toJSON has specific semantics.
>      > As currently written, JSON.canonicalize(new Date()) === JSON.canonicalize({})
>
>     It seems that you (deliberately?) misunderstand what I'm writing above.
>
>     JSON.canonicalize(new Date()) would do exactly the same thing as JSON.stringify(new Date()) since it apparently only returns a string.
>
>
> Where in the spec do you handle this case?

It doesn't, it only describes a canonicalization algorithm.

Integration of the canonicalization algorithm in the ES JSON object might cost as much as 5 lines of code + some refactoring.

Anders

>
>     Again, the sample code I provided is a bare bones solution with the only purpose showing the proposed canonicalization algorithm in code as a complement to the written specification.
>
>
> Understood.  AFAICT neither the text nor the instructional code treat Dates differently from an empty object.
>
>
>     Anders
>


Re: Summary of Input. Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Mike Samuel
On 2018-03-18 20:23, Mike Samuel wrote:

>
>              F.Y.I: Using ES6 serialization methods for JSON primitive types is headed for standardization in the IETF.
>         https://www.ietf.org/mail-archive/web/jose/current/msg05716.html
>
>              This effort is backed by one of the main authors behind the current de-facto standard for Signed and Encrypted JSON, aka JOSE.
>              If this in your opinion is a bad idea, now is the right time to shoot it down :-)
>
>
>         Does this main author prefer your particular JSON canonicalization scheme to
>         others?
>
>
>     This proposal does [currently] not rely on canonicalization but on ES6 "predictive parsing and serialization".
>
>
>         Is this an informed opinion based on flaws in the others that make them less suitable for
>         JOSE's needs that are not present in the scheme you back?
>
>
>     A JSON canonicalization scheme has AFAIK never been considered in the relevant IETF groups (JOSE+JSON).
>     On the contrary, it has been dismissed as a daft idea.
>
>     I haven't yet submitted my [private] I-D. I'm basically here for collecting input and finding possible collaborators.
>
>
>         If so, please provide links to their reasoning.
>         If not, how is their backing relevant?
>
>
>     If the ES6/JSON.stringify() way of serializing JSON primitives becomes an IETF standard backed by Microsoft, it may have an impact on the "market".
>
>
> If you can't tell us anything concrete about your backers, what they back, or why they back it, then why bring it up?

Who they are, What they back, and Why they back it (Rationale), is in the referred document above.
Here is a nicer HTML variant of the I-D: https://tools.ietf.org/id/draft-erdtman-jose-cleartext-jws-00.html

Anders

Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Mike Samuel
On 2018-03-18 20:15, Mike Samuel wrote:

> I and others have been trying to move towards consensus on what a hashable form of
> JSON should look like.
>
> We've identified key areas including
> * property ordering,
> * number canonicalization,
> * string normalization,
> * whether the input should be a JS value or a string of JSON,
> * and others
>
> but, as in this case, you seem to be arguing both sides of a position to support your
> proposal when you could just say "yes, the proposal could be adjusted along this
> dimension and still provide what's required."

For good or for worse, my proposal is indeed about leveraging ES6's take on JSON including limitations, {bugs}, and all.
I'm not backing from that position because then things get way more complex and probably never even happen.

Extending [*] the range of "Number" is pretty much (in practical terms) the same thing as changing JSON itself.

"Number" is indeed mindless crap but it is what it is.

OTOH, the "Number" problem was effectively solved some 10 years ago through putting stuff in "strings".
Using JSON Schema or "Old School" strongly typed programmatic solutions of the kind I use, this actually works great.

Anders

*] The RFC gives you the right to do that but existing implementations do not.
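
To make the two positions on "Number" concrete, here is a minimal sketch. The member name `n` is made up for the example, and the use of `BigInt` assumes an ES2020 runtime; neither comes from the proposal.

```javascript
// 2^53 + 1 cannot be represented exactly as an IEEE 754 double.
const parsed = JSON.parse('{"n": 9007199254740993}');
const roundTripped = JSON.stringify(parsed); // the last digit changes

// Carrying the value as a string keeps every digit; the application layer
// decides how to interpret it (BigInt here, assuming an ES2020 runtime).
const asString = JSON.parse('{"n": "9007199254740993"}');
const exact = BigInt(asString.n);
```

After the round trip, `roundTripped` is `{"n":9007199254740992}`, while `exact` still holds the original integer. This is the "putting stuff in strings" workaround in its simplest form.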

Re: JSON.canonicalize()

Mike Samuel


On Sun, Mar 18, 2018, 4:50 PM Anders Rundgren <[hidden email]> wrote:
> On 2018-03-18 20:15, Mike Samuel wrote:
> > I and others have been trying to move towards consensus on what a hashable form of
> > JSON should look like.
> >
> > We've identified key areas including
> > * property ordering,
> > * number canonicalization,
> > * string normalization,
> > * whether the input should be a JS value or a string of JSON,
> > * and others
> >
> > but, as in this case, you seem to be arguing both sides of a position to support your
> > proposal when you could just say "yes, the proposal could be adjusted along this
> > dimension and still provide what's required."
>
> For good or for worse, my proposal is indeed about leveraging ES6's take on JSON including limitations, {bugs}, and all.
> I'm not backing from that position because then things get way more complex and probably never even happen.
>
> Extending [*] the range of "Number" is pretty much (in practical terms) the same thing as changing JSON itself.

Your proposal is limiting Number; my alternative is not extending Number.

> "Number" is indeed mindless crap but it is what it is.
>
> OTOH, the "Number" problem was effectively solved some 10 years ago through putting stuff in "strings".
> Using JSON Schema or "Old School" strongly typed programmatic solutions of the kind I use, this actually works great.
>
> Anders
>
> *] The RFC gives you the right to do that but existing implementations do not.


Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-18 21:53, Mike Samuel wrote:
>     For good or for worse, my proposal is indeed about leveraging ES6's take on JSON including limitations, {bugs}, and all.
>     I'm not backing from that position because then things get way more complex and probably never even happen.
>
>     Extending [*] the range of "Number" is pretty much (in practical terms) the same thing as changing JSON itself.
>
>
> Your proposal is limiting Number; my alternative is not extending Number.


Quoting earlier messages from you:

   "Your proposal is less interoperable because you are quoting a SHOULD,
    interpreting it as MUST and saying inputs MUST fit into an IEEE 754 double without loss of precision.
    This makes it strictly less interoperable than a proposal that does not have that constraint"

   "JSON does not have numeric precision limits.  There are plenty of systems that use JSON
    that never involve JavaScript and which pack int64s"

Well, it took a while figuring this out.  No harm done.  Nobody died.

I think we can safely put this thread to rest now; you want to fix a problem that was fixed more than 10 years ago through other measures [*].

Thanx,
Anders

*] Cryptographic systems using JSON exchange integers that are 256 bits long and more
    Business systems using JSON exchange long decimal numbers
    Scientific systems cramming 80-bit IEEE-754 into "Number" may exist, but then we are probably talking about research projects using forked/home-grown JSON software

"Number" was never sufficient and will (IMO MUST) remain in its crippled form, at least if we stick to mainstream.





Re: JSON.canonicalize()

Mike Samuel
In reply to this post by Anders Rundgren-2
How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
    x,
    (_, x) => {
      if (x && typeof x === 'object' && !Array.isArray(x)) {
        const sorted = {}
        for (let key of Object.getOwnPropertyNames(x).sort()) {
          sorted[key] = x[key]
        }
        return sorted
      }
      return x
    })


The proposal says "in lexical (alphabetical) order."
If "lexical order" differs from the lexicographic order that sort uses, then
the above could be adjusted to pass a comparator function.
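
As a quick check of the replacer-based approach above, the following sketch repeats it under a hypothetical name, `canonicalizeSketch`, so that the built-in `JSON` object is left untouched. The sample objects are made up for illustration.

```javascript
// Same replacer-based approach as above, under a hypothetical name so that
// the built-in JSON object is not modified.
const canonicalizeSketch = (value) => JSON.stringify(value, (_, v) => {
  if (v && typeof v === 'object' && !Array.isArray(v)) {
    const sorted = {};
    for (const key of Object.getOwnPropertyNames(v).sort()) {
      sorted[key] = v[key];
    }
    return sorted;
  }
  return v;
});

// Insertion order no longer matters, at any nesting depth.
const first = canonicalizeSketch({ b: 1, a: { d: 2, c: 3 } });
const second = canonicalizeSketch({ a: { c: 3, d: 2 }, b: 1 });
// Both are '{"a":{"c":3,"d":2},"b":1}'

// Integer-like keys are enumerated in ascending numeric order by the engine,
// regardless of insertion order, so this is '{"9":2,"10":1}' even though
// "10" sorts before "9" as a string.
const integerKeys = canonicalizeSketch({ 10: 1, 9: 2 });
```

The last case may be one place where rebuilding a plain object differs from a pure string ordering of property names, so it seems worth checking against whatever order the proposal intends.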

Applied to your example input,

JSON.canonicalize({
    "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
    "other":  [null, true, false],
    "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
  }) ===
      String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
// proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}


The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:

"""
If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to 0x005c (\) or 0x0022 (") which MUST be serialized as \\ and \" respectively.
"""
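
The rule quoted above can be compared with what `JSON.stringify()` already does for the "escaping" value from the example input. This is only a sketch; the variable names are made up.

```javascript
// The "escaping" value from the example input, written with the same escapes.
const input = "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/";
const serialized = JSON.stringify(input);

// Control characters are escaped, the quote and backslash get a backslash
// prefix, and the Euro sign is emitted "as is".
const expected = String.raw`"€$\u000f\nA'B\"\\\\\"/"`;
const matches = serialized === expected; // true
```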

So I think the "\u20ac" should actually be "€" and the implementation above matches your proposal.


On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <[hidden email]> wrote:
> Dear List,
>
> Here is a proposal that I would be very happy getting feedback on since it builds on ES but is not (at all) limited to ES.
>
> The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.
>
> The JSON canonicalization scheme (including ES code for emulating it), is described in:
> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>
> Current workspace: https://github.com/cyberphone/json-canonicalization
>
> Thanx,
> Anders Rundgren

Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-19 14:34, Mike Samuel wrote:

> How does the transform you propose differ from?
>
> JSON.canonicalize = (x) => JSON.stringify(
>      x,
>      (_, x) => {
>        if (x && typeof x === 'object' && !Array.isArray(x)) {
>          const sorted = {}
>          for (let key of Object.getOwnPropertyNames(x).sort()) {
>            sorted[key] = x[key]
>          }
>          return sorted
>        }
>        return x
>      })

Probably not at all.  You are the JS guru, not me :-)

>
> The proposal says "in lexical (alphabetical) order."
> If "lexical order" differs from the lexicographic order that sort uses, then
> the above could be adjusted to pass a comparator function.

I hope (and believe) that this is just a terminology problem.

> Applied to your example input,
>
> JSON.canonicalize({
>      "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>      "other":  [null, true, false],
>      "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>    }) ===
>        String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
> // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>
>
> The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:

If you look under the result you will find a pretty sad explanation:

         "Note: \u20ac denotes the Euro character, which not
          being ASCII, is currently not displayable in RFCs"

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates:
https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
https://cyberphone.github.io/doc/security/browser-json-canonicalization.html

Anders

>
> """
> If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to 0x005c (\) or 0x0022 (") which MUST be serialized as \\ and \" respectively.
> """
>
> So I think the "\u20ac" should actually be "€" and the implementation above matches your proposal.
>
>
> On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <[hidden email]> wrote:
>
>     Dear List,
>
>     Here is a proposal that I would be very happy getting feedback on since it builds on ES but is not (at all) limited to ES.
>
>     The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.
>
>     The JSON canonicalization scheme (including ES code for emulating it), is described in:
>     https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>
>     Current workspace: https://github.com/cyberphone/json-canonicalization
>
>     Thanx,
>     Anders Rundgren
>
>


Re: JSON.canonicalize()

Mike Samuel


On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <[hidden email]> wrote:
> On 2018-03-19 14:34, Mike Samuel wrote:
> > How does the transform you propose differ from?
> >
> > JSON.canonicalize = (x) => JSON.stringify(
> >      x,
> >      (_, x) => {
> >        if (x && typeof x === 'object' && !Array.isArray(x)) {
> >          const sorted = {}
> >          for (let key of Object.getOwnPropertyNames(x).sort()) {
> >            sorted[key] = x[key]
> >          }
> >          return sorted
> >        }
> >        return x
> >      })
>
> Probably not at all.  You are the JS guru, not me :-)
>
> > The proposal says "in lexical (alphabetical) order."
> > If "lexical order" differs from the lexicographic order that sort uses, then
> > the above could be adjusted to pass a comparator function.
>
> I hope (and believe) that this is just a terminology problem.

I think you're right. http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
is where it's specified.  After checking that no custom comparator is present:
  1. Let xString be ToString(x).
  2. ReturnIfAbrupt(xString).
  3. Let yString be ToString(y).
  4. ReturnIfAbrupt(yString).
  5. If xString < yString, return −1.
  6. If xString > yString, return 1.
  7. Return +0.

(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then
  1. If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
  2. If px is a prefix of py, return true.
  3. Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
  4. Let m be the integer that is the code unit value at index k within px.
  5. Let n be the integer that is the code unit value at index k within py.
  6. If m < n, return true. Otherwise, return false.

Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons that use different code
unit sizes can compute different results for the same semantic string value.  Between UTF-8 and UTF-32
you should see no difference, but UTF-16 can differ from those given supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 strings if that's what you intend.
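
A small sketch of that difference follows. The two sample strings and the `compareByCodePoint` comparator are made up for illustration; the comparator is assumed to stand in for the order a byte-wise UTF-8 or UTF-32 comparison would give.

```javascript
// U+FF61 is a single UTF-16 code unit (0xFF61); U+10000 is encoded as a
// surrogate pair whose first code unit is 0xD800.
const bmp = '\uFF61';
const supplementary = '\u{10000}';

// The default sort compares UTF-16 code units: 0xD800 < 0xFF61.
const byCodeUnit = [bmp, supplementary].sort();

// Hypothetical comparator that compares by code point instead, which is the
// order a byte-wise UTF-8 or UTF-32 comparison would produce.
function compareByCodePoint(a, b) {
  const as = Array.from(a, (ch) => ch.codePointAt(0));
  const bs = Array.from(b, (ch) => ch.codePointAt(0));
  const shared = Math.min(as.length, bs.length);
  for (let i = 0; i < shared; i++) {
    if (as[i] !== bs[i]) return as[i] - bs[i];
  }
  return as.length - bs.length;
}
const byCodePoint = [bmp, supplementary].sort(compareByCodePoint);
```

Here `byCodeUnit` puts the supplementary character first, while `byCodePoint` puts it last, so two implementations that disagree on the unit of comparison would emit the same properties in a different order.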

 
> > Applied to your example input,
> >
> > JSON.canonicalize({
> >      "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
> >      "other":  [null, true, false],
> >      "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
> >    }) ===
> >        String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
> > // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
> >
> >
> > The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:
>
> If you look under the result you will find a pretty sad explanation:
>
>         "Note: \u20ac denotes the Euro character, which not
>          being ASCII, is currently not displayable in RFCs"

Cool.
 
> After 30 years with RFCs, we can still only use ASCII :-( :-(
>
> Updates:
> https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
> https://cyberphone.github.io/doc/security/browser-json-canonicalization.html

If this can be implemented in a small amount of library code, what do you need from TC39?

 
> Anders
>
> > """
> > If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to 0x005c (\) or 0x0022 (") which MUST be serialized as \\ and \" respectively.
> > """
> >
> > So I think the "\u20ac" should actually be "€" and the implementation above matches your proposal.
> >
> >
> > On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <[hidden email]> wrote:
> >
> >     Dear List,
> >
> >     Here is a proposal that I would be very happy getting feedback on since it builds on ES but is not (at all) limited to ES.
> >
> >     The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.
> >
> >     The JSON canonicalization scheme (including ES code for emulating it), is described in:
> >     https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
> >
> >     Current workspace: https://github.com/cyberphone/json-canonicalization
> >
> >     Thanx,
> >     Anders Rundgren






Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-19 15:17, Mike Samuel wrote:

>
>
> On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <[hidden email]> wrote:
>
>     On 2018-03-19 14:34, Mike Samuel wrote:
>
>         How does the transform you propose differ from?
>
>         JSON.canonicalize = (x) => JSON.stringify(
>               x,
>               (_, x) => {
>                 if (x && typeof x === 'object' && !Array.isArray(x)) {
>                   const sorted = {}
>                   for (let key of Object.getOwnPropertyNames(x).sort()) {
>                     sorted[key] = x[key]
>                   }
>                   return sorted
>                 }
>                 return x
>               })
>
>
>     Probably not all.  You are the JS guru, not me :-)
>
>
>         The proposal says "in lexical (alphabetical) order."
>         If "lexical order" differs from the lexicographic order that sort uses, then
>         the above could be adjusted to pass a comparator function.
>
>
>     I hope (and believe) that this is just a terminology problem.
>
>
> I think you're right. http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
> is where it's specified.  After checking that no custom comparator is present:
>
>  1. Let xString be ToString(x).
>  2. ReturnIfAbrupt(xString).
>  3. Let yString be ToString(y).
>  4. ReturnIfAbrupt(yString).
>  5. If xString < yString, return −1.
>  6. If xString > yString, return 1.
>  7. Return +0.
>
>
> (<) and (>) do not themselves bring in any locale-specific collation rules.
> They bottom out on http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison
>
> If both px and py are Strings, then
>
>  1. If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
>  2. If px is a prefix of py, return true.
>  3. Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
>  4. Let m be the integer that is the code unit value at index k within px.
>  5. Let n be the integer that is the code unit value at index k within py.
>  6. If m < n, return true. Otherwise, return false.
>
> Those code unit values are UTF-16 code unit values per
> http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type
>
> each element in the String is treated as a UTF-16 code unit value
>
> As someone mentioned earlier in this thread, lexicographic string comparisons that use different code
> unit sizes can compute different results for the same semantic string value.  Between UTF-8 and UTF-32
> you should see no difference, but UTF-16 can differ from those given supplementary codepoints.
>
> It might be worth making explicit that your lexical order is over UTF-16 strings if that's what you intend.

Right, it is actually already in 3.2.3:

   Property strings to be sorted depend on that strings are represented
   as arrays of 16-bit unsigned integers where each integer holds a single
   UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
   comparisons, independent of locale settings.

This maps "natively" to JS and Java.  Probably to .NET as well.
Other systems may need a specific comparator.



>
>         Applied to your example input,
>
>         JSON.canonicalize({
>               "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>               "other":  [null, true, false],
>               "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>             }) ===
>                 String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
>         // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>
>
>         The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:
>
>
>     If you look a under the result you will find a pretty sad explanation:
>
>              "Note: \u20ac denotes the Euro character, which not
>               being ASCII, is currently not displayable in RFCs"
>
>
> Cool.
>
>     After 30 years with RFCs, we can still only use ASCII :-( :-(
>
>     Updates:
>     https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
>     https://cyberphone.github.io/doc/security/browser-json-canonicalization.html
>
>
> If this can be implemented in a small amount of library code, what do you need from TC39?

At this stage probably nothing; the BIG issue is the algorithm, which I took the liberty of airing in this forum.
To date, all efforts to create a JSON canonicalization standard have been shot down or abandoned.

Anders

>
>     Anders
>
>
>         """
>         If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to 0x005c (\) or 0x0022 (") which MUST be serialized as \\ and \" respectively.
>         """
>
>         So I think the "\u20ac" should actually be "€" and the implementation above matches your proposal.
>
>
>         On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <[hidden email]> wrote:
>
>              Dear List,
>
>              Here is a proposal that I would be very happy getting feedback on since it builds on ES but is not (at all) limited to ES.
>
>              The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.
>
>              The JSON canonicalization scheme (including ES code for emulating it), is described in:
>              https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>
>              Current workspace: https://github.com/cyberphone/json-canonicalization
>
>              Thanx,
>              Anders Rundgren
>
>
>
>


Re: JSON.canonicalize()

Mike Samuel


On Mon, Mar 19, 2018 at 10:30 AM, Anders Rundgren <[hidden email]> wrote:
On 2018-03-19 15:17, Mike Samuel wrote:


On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <[hidden email]> wrote:

    On 2018-03-19 14:34, Mike Samuel wrote:

        How does the transform you propose differ from?

        JSON.canonicalize = (x) => JSON.stringify(
              x,
              (_, x) => {
                if (x && typeof x === 'object' && !Array.isArray(x)) {
                  const sorted = {}
                  for (let key of Object.getOwnPropertyNames(x).sort()) {
                    sorted[key] = x[key]
                  }
                  return sorted
                }
                return x
              })


    Probably not all.  You are the JS guru, not me :-)


        The proposal says "in lexical (alphabetical) order."
        If "lexical order" differs from the lexicographic order that sort uses, then
        the above could be adjusted to pass a comparator function.


    I hope (and believe) that this is just a terminology problem.


I think you're right. http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
is where it's specified.  After checking that no custom comparator is present:

 1. Let xString be ToString(x).
 2. ReturnIfAbrupt(xString).
 3. Let yString be ToString(y).
 4. ReturnIfAbrupt(yString).
 5. If xString < yString, return −1.
 6. If xString > yString, return 1.
 7. Return +0.


(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then

 1. If py is a prefix of px, return false. (A String value p is a prefix of String value q if q can be the result of concatenating p and some other String r. Note that any String is a prefix of itself, because r may be the empty String.)
 2. If px is a prefix of py, return true.
 3. Let k be the smallest nonnegative integer such that the code unit at index k within px is different from the code unit at index k within py. (There must be such a k, for neither String is a prefix of the other.)
 4. Let m be the integer that is the code unit value at index k within px.
 5. Let n be the integer that is the code unit value at index k within py.
 6. If m < n, return true. Otherwise, return false.

Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons that use different code
unit sizes can compute different results for the same semantic string value.  Between UTF-8 and UTF-32
you should see no difference, but UTF-16 can differ from those given supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 strings if that's what you intend.
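
To make the code-unit point concrete, here is a rough sketch (not part of the proposal). The two sample keys and the helper name compareByCodePoint are made up for illustration. It sorts the same pair of keys once with the default UTF-16 code unit comparison and once by Unicode code point, which is the order a byte-wise UTF-8 or UTF-32 comparison would give.

```javascript
// Two one-character keys: U+FF5E sits in the BMP, U+1F600 is a supplementary
// code point stored in UTF-16 as the surrogate pair D83D DE00.
const bmp = '\uFF5E';
const astral = '\u{1F600}';

// The default sort compares UTF-16 code units: 0xD83D < 0xFF5E,
// so the supplementary character sorts first.
const byCodeUnit = [bmp, astral].sort();

// Hypothetical comparator that orders by Unicode code point instead.
function compareByCodePoint(a, b) {
  const ia = a[Symbol.iterator](); // string iterators step by code point
  const ib = b[Symbol.iterator]();
  for (;;) {
    const ca = ia.next();
    const cb = ib.next();
    if (ca.done || cb.done) {
      // A string that is a prefix of the other sorts first.
      return (ca.done ? 0 : 1) - (cb.done ? 0 : 1);
    }
    const diff = ca.value.codePointAt(0) - cb.value.codePointAt(0);
    if (diff !== 0) return diff;
  }
}

// By code point, 0xFF5E < 0x1F600, so the BMP character sorts first.
const byCodePoint = [bmp, astral].sort(compareByCodePoint);

console.log(byCodeUnit[0] === astral, byCodePoint[0] === bmp);
```

The two orders only disagree when a supplementary code point is compared against a BMP code point in the range U+E000 to U+FFFF, which is why the difference is easy to miss in testing.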

Right, it is actually already in 3.2.3:

My apologies.  I missed that.

  Property strings to be sorted depend on that strings are represented
  as arrays of 16-bit unsigned integers where each integer holds a single
  UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
  comparisons, independent of locale settings.

This maps "natively" to JS and Java.  Probably to .NET as well.
Other systems may need a specific comparator.

Yep.  Off the top of my head:
Go and Rust use UTF-8.
Python3 is UTF-16, Python2 is usually UTF-16 but may be UTF-32 depending on sizeof(wchar) when compiling the interpreter.
C++ as is its wont is all of them.

 

        Applied to your example input,

        JSON.canonicalize({
              "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
              "other":  [null, true, false],
              "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
            }) ===
                String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
        // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}


        The canonicalized example from section 3.2.3 seems to conflict with the text of 3.2.2:


    If you look under the result you will find a pretty sad explanation:

             "Note: \u20ac denotes the Euro character, which not
              being ASCII, is currently not displayable in RFCs"


Cool.

    After 30 years with RFCs, we can still only use ASCII :-( :-(

    Updates:
    https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
    https://cyberphone.github.io/doc/security/browser-json-canonicalization.html


If this can be implemented in a small amount of library code, what do you need from TC39?

At this stage probably nothing; the BIG issue is the algorithm, which I took the liberty of airing in this forum.
To date, all efforts to create a JSON canonicalization standard have been shot down or abandoned.

Like I said, I think the hashing use case is worthwhile.


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: JSON.canonicalize()

Michael J. Ryan
In reply to this post by Anders Rundgren-2
JSON is UTF-8 ... As far as 16-bit code points go, there are still astral character pairs.  Binary data should be encoded to avoid this, such as with base-64.
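
A minimal sketch of that suggestion, assuming an environment where the btoa/atob globals exist (browsers and recent Node versions). The helper names bytesToBase64 and base64ToBytes are invented for this example. The point is only that the JSON string then holds plain ASCII, so surrogate pairs and normalization never come into play.

```javascript
// Hypothetical helpers: carry raw bytes in a JSON string as base-64 text.
function bytesToBase64(bytes) {
  // btoa expects a "binary string" with one character per byte (0..255).
  return btoa(String.fromCharCode(...bytes));
}

function base64ToBytes(text) {
  return Uint8Array.from(atob(text), (ch) => ch.charCodeAt(0));
}

// 0xD8 0x00 would look like a lone surrogate if packed directly into a
// UTF-16 string; as base-64 it is just ASCII letters, digits, '+' and '/'.
const payload = Uint8Array.of(0xd8, 0x00, 0xff);
const json = JSON.stringify({ data: bytesToBase64(payload) });
const restored = base64ToBytes(JSON.parse(json).data);

console.log(json, Array.from(restored));
```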

On Fri, Mar 16, 2018, 09:23 Mike Samuel <[hidden email]> wrote:


On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <[hidden email]> wrote:
Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

What does "identical contents" mean in the context of numbers?  JSON intentionally avoids specifying any precision for numbers.  

JSON.stringify(1/3) === '0.3333333333333333'

What would happen with JSON from systems that allow higher precision?
I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
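
For what it is worth, a quick sketch of what an ES engine does with that input, assuming canonicalization goes through JSON.parse and therefore through IEEE 754 doubles. The variable names are only for illustration.

```javascript
// Append one more digit than JSON.stringify(1/3) emits.
const text = JSON.stringify(1 / 3) + '3';

// Parsing maps the literal to the nearest double, which is the same double
// that 1/3 evaluates to, so the extra digit cannot survive.
const parsed = JSON.parse(text);
const roundTripped = JSON.stringify(parsed);

console.log(text, roundTripped, parsed === 1 / 3);
```

A producer that works with higher-precision numbers would therefore see its value change when it passes through an ES-based canonicalizer.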



 
However, unicode normalization allows multiple representations of "the same" string, which defeats this security property.  Depending on your implementation language

We shouldn't normalize unicode in strings that contain packed binary data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar values and any system that assumes the latter will break often.
 
and use, a string with precomposed accents could compare equal to a string with separated accents, even though the canonical JSON or hash differed.  In an extreme case (with a weak hash function, say MD5), this can be used to break security by re-encoding all strings in multiple variants until a collision is found.  This is just a slight variant on the fact that JSON allows multiple ways to encode a character using escape sequences.  You've already taken the trouble to disambiguate this case; security-conscious applications should take care to perform unicode normalization as well, for the same reason.
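
A small sketch of the precomposed versus separated accent case, using the standard String.prototype.normalize. The sample strings are chosen only for illustration.

```javascript
// The same visible text "é" in two encodings.
const precomposed = '\u00e9';  // single code point U+00E9
const decomposed = 'e\u0301';  // 'e' followed by combining acute accent U+0301

// Serialized as-is, the two JSON texts differ, so their hashes would too.
const serializedDiffer =
  JSON.stringify(precomposed) !== JSON.stringify(decomposed);

// After NFC normalization both serialize to the same text.
const normalizedMatch =
  JSON.stringify(precomposed.normalize('NFC')) ===
  JSON.stringify(decomposed.normalize('NFC'));

console.log(serializedDiffer, normalizedMatch);
```

Whether normalization belongs in the canonicalizer or in the application is exactly the open question in this thread; the sketch only shows the mismatch.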

Similarly, if you don't offer a verifier to ensure that the input is in "canonical JSON" format, then an attacker can try to create collisions by violating the rules of canonical JSON format, whether by using different escape sequences, adding whitespace, etc.  This can be used to make JSON which is "the same" appear "different", violating the intent of the canonicalization.  Any security application of canonical JSON will require a strict mode for JSON.parse() as well as a strict mode for JSON.stringify().
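
One way such a verifier could look, as a rough sketch only. It reuses the replacer-based canonicalize posted elsewhere in this thread as a stand-in for the proposed algorithm, and isCanonical is a hypothetical name. The check is simply "parse, re-canonicalize, compare with the input text".

```javascript
// Stand-in canonicalizer based on the replacer sketch from this thread.
// Caveat: engines enumerate integer-like keys ("9", "10") in numeric order
// regardless of insertion order, so this sketch is unsuitable for such keys.
function canonicalize(value) {
  return JSON.stringify(value, (_, v) => {
    if (v && typeof v === 'object' && !Array.isArray(v)) {
      const sorted = {};
      for (const key of Object.getOwnPropertyNames(v).sort()) {
        sorted[key] = v[key];
      }
      return sorted;
    }
    return v;
  });
}

// Hypothetical strict check: accept only text that is already canonical.
function isCanonical(text) {
  let value;
  try {
    value = JSON.parse(text);
  } catch (e) {
    return false; // not even valid JSON
  }
  return canonicalize(value) === text;
}

console.log(isCanonical('{"a":1,"b":2}'));   // key order and spacing match
console.log(isCanonical('{"b":2,"a":1}'));   // keys out of order
console.log(isCanonical('{"a": 1}'));        // extra whitespace
console.log(isCanonical('{"a":"\\u0041"}')); // needless escape sequence
```

A receiver would run a check like this before hashing, rejecting anything that does not round-trip byte for byte.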

Given the dodginess of "identical" w.r.t. non-integral numbers, shouldn't endpoints be re-canonicalizing before hashing anyway?  Why would one want to ship the canonical form over the wire if it loses precision?

 
  --scott

On Fri, Mar 16, 2018 at 4:48 AM, Anders Rundgren <[hidden email]> wrote:
On 2018-03-16 08:52, C. Scott Ananian wrote:
See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at least
mention unicode normalization of strings.

Yes, I could add that unicode normalization of strings is out of scope for this specification.


You probably should also specify a validator: it doesn't matter if you emit canonical JSON if you can tweak the hash of the value by feeding non-canonical JSON as an input.

Pardon me, but I don't understand what you are writing here.

The only "raison d'être" of hash functions is to provide collision-safe checksums.

thanx,
Anders


   --scott

On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <[hidden email]> wrote:

    Dear List,

    Here is a proposal that I would be very happy getting feedback on since it builds on ES but is not (at all) limited to ES.

    The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.

Why should canonicalize take a replacer?  Hasn't replacement already happened?

 
    The JSON canonicalization scheme (including ES code for emulating it), is described in:
    https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html

    Current workspace: https://github.com/cyberphone/json-canonicalization

    Thanx,
    Anders Rundgren
    _______________________________________________
    es-discuss mailing list
    [hidden email]
    https://mail.mozilla.org/listinfo/es-discuss





_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss