JSON.canonicalize()

Re: JSON.canonicalize()

Mike Samuel


On Sun, Mar 18, 2018 at 10:08 AM, Richard Gibson <[hidden email]> wrote:
On Sunday, March 18, 2018, Anders Rundgren <[hidden email]> wrote:
On 2018-03-16 20:24, Richard Gibson wrote:
Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is not a JSON canonicalization solution.


Richard, I may be wrong but AFAICT, our respective canonicalization schemes are in fact principally IDENTICAL.

In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.
 
That the number serialization provided by JSON.stringify() is unacceptable is not generally taken as a fact.  I also think it looks a bit weird, but that's just a matter of esthetics.  Compatibility is an entirely different issue.

I concede this point. The modified algorithm is sufficient, but note that a canonicalization scheme will remain static even if ECMAScript changes.

Does this mean that the language below would need to be fixed at a specific version of Unicode or that we would need to cite a specific version for
canonicalization but might allow a higher version for String.prototype.normalize and in future versions of the spec require it?

"""
A conforming implementation of ECMAScript must interpret source text input in conformance with the Unicode Standard, Version 5.1.0 or later
"""

and in ECMA 404

"""
For undated references, the latest edition of the referenced document (including any amendments) applies. ISO/IEC 10646, Information Technology – Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode Standard http://www.unicode.org/versions/latest.
"""


Sorting on Unicode Code Points is of course "technically 100% right" but strictly put not necessary.

Certain scenarios call for different systems to _independently_ generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.
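For illustration, a minimal sketch (not from the thread) of why this matters: plain JSON.stringify emits members in insertion order, so two equivalent objects can produce different texts.

    // Equivalent data, different insertion order, different serialization:
    JSON.stringify({ a: 1, b: 2 }); // '{"a":1,"b":2}'
    JSON.stringify({ b: 2, a: 1 }); // '{"b":2,"a":1}'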

Code points include orphaned surrogates in a way that scalar values do not, right?  So both "\uD800" and "\uD800\uDC00" are single codepoints.
It seems like a strict prefix of a string should still sort before that string but prefix transitivity in general does not hold: "\uFFFF" < "\uD800\uDC00" && "\uFFFF" > "\uD800".
That shouldn't cause problems for hashability but I thought I'd raise it just in case.
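For anyone who wants to verify this, here is a minimal sketch of a code point comparator (codePointCompare is a hypothetical helper, not part of any proposal in this thread); note that JavaScript's built-in < compares UTF-16 code units, not code points.

    // Compare two strings by Unicode code point (the string iterator yields
    // one code point at a time, treating a lone surrogate as its own code point).
    function codePointCompare(a, b) {
      const ai = a[Symbol.iterator](), bi = b[Symbol.iterator]();
      for (;;) {
        const x = ai.next(), y = bi.next();
        if (x.done || y.done) return (x.done ? 0 : 1) - (y.done ? 0 : 1);
        const d = x.value.codePointAt(0) - y.value.codePointAt(0);
        if (d !== 0) return d;
      }
    }

    codePointCompare('\uFFFF', '\uD800\uDC00') < 0; // true: U+FFFF < U+10000
    codePointCompare('\uFFFF', '\uD800') > 0;       // true: U+FFFF > U+D800
    '\uFFFF' < '\uD800\uDC00';                      // false: code-unit order disagrees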

 
Your claim about uppercase Unicode escapes is incorrect; there is no such requirement:
https://tools.ietf.org/html/rfc8259#section-7
I don't recall ever making a claim about uppercase Unicode escapes, other than observing that it is the preferred form for examples in the JSON RFCs... what are you talking about?



Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Richard Gibson
On 2018-03-18 15:08, Richard Gibson wrote:
On Sunday, March 18, 2018, Anders Rundgren <[hidden email]> wrote:
On 2018-03-16 20:24, Richard Gibson wrote:
Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a fully general solution such as [1]. Both its number serialization [2] and string serialization [3] specify aspects that harm compatibility (the former having arbitrary branches dependent upon the value of numbers, the latter being capable of producing invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for exchange outside of a closed ecosystem [4]). JSON is a general language-agnostic interchange format, and ECMAScript JSON.stringify is not a JSON canonicalization solution.


Richard, I may be wrong but AFAICT, our respective canonicalization schemes are in fact principally IDENTICAL.

In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.

Here it gets interesting...  What in JSON cannot be expressed through JS and JSON.stringify()?

 
That the number serialization provided by JSON.stringify() is unacceptable is not generally taken as a fact.  I also think it looks a bit weird, but that's just a matter of esthetics.  Compatibility is an entirely different issue.

I concede this point. The modified algorithm is sufficient, but note that a canonicalization scheme will remain static even if ECMAScript changes.

Agreed.


Sorting on Unicode Code Points is of course "technically 100% right" but strictly put not necessary.

Certain scenarios call for different systems to _independently_ generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.

Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).


Your claim about uppercase Unicode escapes is incorrect; there is no such requirement:
https://tools.ietf.org/html/rfc8259#section-7
I don't recall ever making a claim about uppercase Unicode escapes, other than observing that it is the preferred form for examples in the JSON RFCs... what are you talking about?

You're right, I found it in https://gibson042.github.io/canonicaljson-spec/#changelog

Thanx,
Anders



Re: JSON.canonicalize()

C. Scott Ananian
On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren <[hidden email]> wrote:
Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Because there are JavaScript strings which do not form valid UTF-16 code units.  For example, the one-character string '\uD800'. On the input validation side, there are 8-bit strings which can not be decoded as UTF-8.  A complete sorting spec needs to describe how these are to be handled. For example, something like WTF-8: http://simonsapin.github.io/wtf-8/
  --scott
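As a concrete illustration of the encoding-side problem (a sketch, assuming a WHATWG TextEncoder is available):

    // A lone surrogate cannot be encoded as well-formed UTF-8:
    new TextEncoder().encode('\uD800'); // Uint8Array [239, 191, 189], i.e. U+FFFD
    encodeURIComponent('\uD800');       // throws URIError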
 


Re: JSON.canonicalize()

Michał Wadas
In reply to this post by Anders Rundgren-2
JSON supports arbitrary precision numbers that can't be properly represented as 64 bit floats. This includes numbers like eg. 1e9999 or 1/1e9999.
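A quick illustration of the loss (a sketch; the exact values follow from binary64 rounding):

    JSON.parse('1e9999');                 // Infinity (overflows binary64)
    JSON.stringify(JSON.parse('1e9999')); // 'null' (Infinity is not valid JSON)
    JSON.parse('1e-9999');                // 0 (underflows to zero)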




Re: JSON.canonicalize()

Mike Samuel
In reply to this post by C. Scott Ananian


On Sun, Mar 18, 2018 at 10:43 AM, C. Scott Ananian <[hidden email]> wrote:
On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren <[hidden email]> wrote:
Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Because there are JavaScript strings which do not form valid UTF-16 code units.  For example, the one-character string '\uD800'. On the input validation side, there are 8-bit strings which can not be decoded as UTF-8.  A complete sorting spec needs to describe how these are to be handled. For example, something like WTF-8: http://simonsapin.github.io/wtf-8/

Let's get terminology straight.
"\uD800" is a valid string of UTF-16 code units.   It is also a valid string of codepoints.  It is not a valid string of scalar values.

http://www.unicode.org/glossary/#code_point: Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆.
http://www.unicode.org/glossary/#code_unit: The minimal bit combination that can represent a unit of encoded text for processing or interchange.
http://www.unicode.org/glossary/#unicode_scalar_value: Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive.


Re: JSON.canonicalize()

Mike Samuel
In reply to this post by Michał Wadas


On Sun, Mar 18, 2018 at 10:47 AM, Michał Wadas <[hidden email]> wrote:
JSON supports arbitrary precision numbers that can't be properly represented as 64 bit floats. This includes numbers like eg. 1e9999 or 1/1e9999.

I posted this on the summary thread but not here.

https://gist.github.com/mikesamuel/20710f94a53e440691f04bf79bc3d756 is structured as a string to string transform, so doesn't lose precision when round-tripping, e.g. Python bigints and Java BigDecimals.

It also avoids a space explosion for 1e9999 which might help blunt timing attacks as discussed earlier in this thread.
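To make the string-to-string idea concrete, here is a minimal sketch for number tokens only (canonicalizeNumberToken is hypothetical and far simpler than the gist; it normalizes the literal's text without ever converting it to a binary64, and it deliberately does not unify mantissa/exponent variants):

    // Normalize the text of a JSON number literal without parsing it to a float.
    function canonicalizeNumberToken(token) {
      const m = /^(-?)(\d+)(?:\.(\d+))?(?:[eE]([+-]?\d+))?$/.exec(token);
      if (!m) throw new SyntaxError('not a JSON number: ' + token);
      let [, sign, int, frac = '', exp = '0'] = m;
      int = int.replace(/^0+(?=\d)/, '');         // drop leading zeros: 007 -> 7
      frac = frac.replace(/0+$/, '');             // drop trailing zeros: 1.500 -> 1.5
      const e = parseInt(exp, 10);
      if (int === '0' && frac === '') return '0'; // -0, 0.0, 0e99 all become 0
      return sign + int + (frac ? '.' + frac : '') + (e ? 'e' + e : '');
    }

    canonicalizeNumberToken('1e9999'); // '1e9999' (exact, and still six characters)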

 


Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Michał Wadas
On 2018-03-18 15:47, Michał Wadas wrote:
> JSON supports arbitrary precision numbers that can't be properly
> represented as 64 bit floats. This includes numbers like eg. 1e9999 or 1/1e9999.

rfc7159:
    Since software that implements
    IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
    generally available and widely used, good interoperability can be
    achieved by implementations that expect no more precision or range
    than these provide, in the sense that implementations will
    approximate JSON numbers within the expected precision

If interoperability is not an issue you are free to do whatever you find useful.
Catering to a 0.001% customer base with standards is something I gladly leave to others.

The de-facto standard, featured in any number of applications, is putting unusual/binary/whatever stuff in text strings.
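A one-line example of that convention (the field name is made up; BigInt assumes an engine that supports it):

    // A 64-bit value travels as a JSON string and is decoded at the endpoint:
    const msg = JSON.parse('{"id":"9223372036854775807"}');
    const id = BigInt(msg.id); // 9223372036854775807n, exact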

Anders


Re: JSON.canonicalize()

Mike Samuel
Interop with systems that use 64b ints is not a .001% issue.



Re: JSON.canonicalize()

Richard Gibson
In reply to this post by Anders Rundgren-2
On Sun, Mar 18, 2018 at 10:29 AM, Mike Samuel <[hidden email]> wrote:
Does this mean that the language below would need to be fixed at a specific version of Unicode or that we would need to cite a specific version for
canonicalization but might allow a higher version for String.prototype.normalize and in future versions of the spec require it?

"""
A conforming implementation of ECMAScript must interpret source text input in conformance with the Unicode Standard, Version 5.1.0 or later
"""

and in ECMA 404

"""
For undated references, the latest edition of the referenced document (including any amendments) applies. ISO/IEC 10646, Information Technology – Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode Standard http://www.unicode.org/versions/latest.
"""

I can't see why either would have to change. JSON canonicalization should produce a JSON text in UTF-8, using JSON escape sequences only for double quote, backslash, and ASCII control characters U+0000 through U+001F (which are not valid in JSON strings) and unpaired surrogates U+D800 through U+DFFF (which are not conforming UTF-8). The algorithm doesn't need to know whether any given code point has a UCS assignment.
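A sketch of that escaping rule (quoteCanonical is hypothetical; it assumes the short two-character escapes where RFC 8259 defines them, with lowercase hex otherwise):

    // Quote a string for canonical JSON: escape only '"', '\', control
    // characters, and lone surrogates; emit everything else literally.
    function quoteCanonical(s) {
      const short = { 8: '\\b', 9: '\\t', 10: '\\n', 12: '\\f', 13: '\\r' };
      let out = '"';
      for (let i = 0; i < s.length; i++) {
        const c = s.charCodeAt(i);
        const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
        if (c === 0x22) out += '\\"';
        else if (c === 0x5c) out += '\\\\';
        else if (c < 0x20)
          out += short[c] || '\\u' + c.toString(16).padStart(4, '0');
        else if (c >= 0xd800 && c <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
          out += s[i] + s[i + 1]; i++;  // paired surrogates: emit the character
        } else if (c >= 0xd800 && c <= 0xdfff)
          out += '\\u' + c.toString(16).padStart(4, '0'); // lone surrogate
        else out += s[i];
      }
      return out + '"';
    }

    quoteCanonical('a"\u0007\uD800'); // produces: "a\"\u0007\ud800"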

Code points include orphaned surrogates in a way that scalar values do not, right?  So both "\uD800" and "\uD800\uDC00" are single codepoints.
It seems like a strict prefix of a string should still sort before that string but prefix transitivity in general does not hold: "\uFFFF" < "\uD800\uDC00" && "\uFFFF" > "\uD800".
That shouldn't cause problems for hashability but I thought I'd raise it just in case.

IMO, "\uD800\uDC00" should never be emitted because a proper canonicalization would be "𐀀" (character sequence U+0022 QUOTATION MARK, U+10000 LINEAR B SYLLABLE B008 A, U+0022 QUOTATION MARK; octet sequence 0x22, 0xF0, 0x90, 0x80, 0x80, 0x22).

As for sorting, using the represented code points makes sense to me, but is not the only option (e.g., another option is using the literal characters of the JSON text such that "Z" < "\"" < "\\" < "\u0000" < "\u001F" < "\uD800" < "\uDC00" < "^" < "x" < "ä" < "가" < "A" < "🔥" < "🙃"). Any specification of a total deterministic ordering would suffice; it's just that some are less intuitive than others.

On Sun, Mar 18, 2018 at 10:30 AM, Anders Rundgren <[hidden email]> wrote:
On 2018-03-18 15:08, Richard Gibson wrote:
In that they have the same goal, yes. In that they both achieve that goal, no. I'm not married to choices like exponential notation and uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.

Here it gets interesting...  What in JSON cannot be expressed through JS and JSON.stringify()?

JSON can express arbitrary numbers, but ECMAScript JSON.stringify is limited to those with an exact IEEE 754 binary64 representation.

And probably more importantly (though not a gap with respect to JSON specifically), it emits octet sequences that don't conform to UTF-8 when serializing unpaired surrogates.
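Two quick checks of those limitations as of ES2018 (note that the later "well-formed JSON.stringify" proposal changes the surrogate case to emit an escape instead):

    JSON.stringify('\uD800');         // '"\uD800"', a raw unpaired surrogate
    JSON.stringify(9007199254740993); // '9007199254740992' (2^53 + 1 is not a binary64)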

Certain scenarios call for different systems to _independently_ generate equivalent data structures, and it is a necessary property of canonical serialization that it yields identical results for equivalent data structures. JSON does not specify significance of object member ordering, so member ordering does not distinguish otherwise equivalent objects, so canonicalization MUST specify member ordering that is deterministic with respect to all valid data.

Violently agree but do not understand (I guess I'm just dumb...) why (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal (although the result would differ).

Any specification of a total deterministic ordering would suffice. Relying upon 16-bit code units would impose a greater burden on systems that do not use such representations internally, but is not fundamentally broken.


Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Mike Samuel
On 2018-03-18 16:47, Mike Samuel wrote:
> Interop with systems that use 64b ints is not a .001% issue.

Certainly not, but using "Number" for dealing with such data would never be considered by, for example, the IETF.

This discussion (at least from my point of view) is about creating stuff that fits into standards.

Anders



Re: Summary of Input. Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Mike Samuel
On 2018-03-18 15:13, Mike Samuel wrote:

>
>
> On Sun, Mar 18, 2018 at 2:14 AM, Anders Rundgren <[hidden email]> wrote:
>
>     Hi Guys,
>
>     Pardon me if you think I was hyperbolic,
>     The discussion got derailed by the bogus claims about hash functions' vulnerability.
>
>
> I didn't say I "think" you were being hyperbolic.  I asked whether you were.
>
> You asserted a number that seemed high to me.
> I demonstrated it was high by a factor of at least 25 by showing an implementation that
> used 80 lines instead of the 2000 you said was required.
>
> If you're going to put out a number as a reason to dismiss an argument, you should own it
> or retract it.
> Were you being hyperbolic?  (Y/N)
N.
To be completely honest, I have only considered full-blown serializers, and they typically come in the mentioned size.

Your solution has existed for a couple of days; we may need a little more time to think about it :-)


> Your claim and my counterclaim are in no way linked to hash function vulnerability.
> I never weighed in on that claim and have already granted that hashable JSON is a
> worthwhile use case.

Great!  So we can finally put that argument to rest.


>
>     F.Y.I: Using ES6 serialization methods for JSON primitive types is headed for standardization in the IETF.
>     https://www.ietf.org/mail-archive/web/jose/current/msg05716.html
>
>     This effort is backed by one of the main authors behind the current de-facto standard for Signed and Encrypted JSON, aka JOSE.
>     If this is in your opinion is a bad idea, now is the right time to shoot it down :-)
>
>
> Does this main author prefer your particular JSON canonicalization scheme to
> others?

This proposal [currently] does not rely on canonicalization, but on ES6 "predictive parsing and serialization".


> Is this an informed opinion based on flaws in the others that make them less suitable for
> JOSE's needs that are not present in the scheme you back?

A JSON canonicalization scheme has AFAIK never been considered in the relevant IETF groups (JOSE+JSON).
On the contrary, it has been dismissed as a daft idea.

I haven't yet submitted my [private] I-D. I'm basically here for collecting input and finding possible collaborators.

>
> If so, please provide links to their reasoning.
> If not, how is their backing relevant?

If the ES6/JSON.stringify() way of serializing JSON primitives becomes an IETF standard backed by Microsoft, it may have an impact on the "market".

>
>     This effort also exploits the ability of JSON.parse() and JSON.stringify() to honor object "Creation Order".
>
>     JSON.canonicalize() would be a "Sorting" alternative to "Creation Order", offering certain advantages, the most important one being that deployment impact is limited to JSON serializers.
>
>     The ["completely broken"] sample code was only submitted as a proof-of-concept. I'm sure you JS gurus can do this way better than I :-)
>
>
> This is a misquote.  No-one has said your sample code was completely broken.
> Neither your sample code nor the spec deals with toJSON.  At some point you're
> going to have to address that if you want to keep your proposal moving forward.

It is possible that I don't understand what you are asking for here since I have no experience with toJSON.

Based on this documentation
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
JSON.canonicalize() would, though, work out of the box (when integrated into the JSON object, NB...) since it would inherit all the functionality (and 99% of the code) of JSON.stringify().
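A sketch of that wrapper idea (using the JSON.canonicalize name from this thread; not a complete implementation). Because the replacer runs after toJSON has been applied, Date and friends keep working unchanged. One caveat: array-index-like keys such as "9" and "10" always serialize in ascending numeric order, which does not match code point order.

    // Canonical member ordering layered on JSON.stringify:
    JSON.canonicalize = function (value) {
      return JSON.stringify(value, (key, val) => {
        if (val && typeof val === 'object' && !Array.isArray(val)) {
          const sorted = {};
          // Note: sort() here orders keys by UTF-16 code unit, not code point.
          for (const k of Object.keys(val).sort()) sorted[k] = val[k];
          return sorted;
        }
        return val;
      });
    };

    JSON.canonicalize({ b: new Date(0), a: 1 });
    // '{"a":1,"b":"1970-01-01T00:00:00.000Z"}'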

> No amount of JS guru-ry is going to save your sample code from a specification bug.
>
>
>     Creating an alternative based on [1,2,3] seems like a rather daunting task.
>
>
> Maybe if you spend more time laying out the criteria on which a successful proposal
> should be judged, we could move towards consensus on this claim.

Since you have already slashed my proposal, there is probably not much consensus to be found...

Anders


>
> As it is, I have only your say so but I have reason to doubt your evaluation
> of task complexity unless you were being hyperbolic before.

It is a free world, you may doubt my competence, motives, whatever.


Re: JSON.canonicalize()

Mike Samuel
In reply to this post by Anders Rundgren-2
A definition of canonical that is not tied to JavaScript's current range of values would fit into more standards than the proposal as it stands.




Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-18 18:40, Mike Samuel wrote:
> A definition of canonical that is not tied to JavaScript's current range of values would fit into more standards than the proposal as it stands.
Feel free to submit an Internet-Draft that addresses more generic Number handling.
My guess is that it would be rejected due to [quite valid] interoperability concerns.

It would probably fall into the same category as "Fixing JSON", which has not happened either.
https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON

Anders



Re: JSON.canonicalize()

Mike Samuel
I think you misunderstood the criticism.  JSON does not have numeric precision limits.  There are plenty of systems that use JSON that never involve JavaScript and which pack int64s.
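A minimal illustration of the problem (values chosen at the top of the int64 range):

    JSON.parse('9223372036854775807'); // 9223372036854775808 (rounded up)
    JSON.parse('9223372036854775807') ===
      JSON.parse('9223372036854775806'); // true: two distinct int64s collide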

On Sun, Mar 18, 2018, 1:55 PM Anders Rundgren <[hidden email]> wrote:
On 2018-03-18 18:40, Mike Samuel wrote:
> A definition of canonical that is not tied to JavaScript's current range of values would fit into more standards than the proposal as it stands.
Feel free submitting an Internet-Draft which addresses a more generic Number handling.
My guess is that it would be rejected due to [quite valid] interoperability concerns.

It would probably fall in the same category as "Fixing JSON" which has not happened either.
https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON

Anders


Re: Summary of Input. Re: JSON.canonicalize()

C. Scott Ananian
In reply to this post by Anders Rundgren-2
On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren <[hidden email]> wrote:
> Scott A:
> https://en.wikipedia.org/wiki/Security_level
> "For example, SHA-256 offers 128-bit collision resistance"
> That is, the claims that there are cryptographic issues w.r.t. Unicode Normalization are (fortunately) incorrect.
> Well, if you actually do normalize Unicode, signatures would indeed break, so you don't.

Where do you specify SHA-256 signatures in your standard?

If one were to use MD5 signatures, they would indeed break in the way I describe.

It is good security practice to assume that currently-unbroken algorithms may eventually break in similar ways to discovered flaws in older algorithms.  But in any case, it is simply not good practice to allow multiple valid representations of content, if your aim is for a "canonical" representation.
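For instance (illustrative): the same user-perceived text has two code-point-level encodings, which serialize and hash differently unless one form is mandated:

    const nfc = '\u00e9';      // "é" precomposed (NFC)
    const nfd = 'e\u0301';     // "é" decomposed (NFD)
    nfc === nfd                                      // false
    JSON.stringify(nfc) === JSON.stringify(nfd)      // false: two representations
    nfc.normalize('NFC') === nfd.normalize('NFC')    // true once normalized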
  --scott



Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Mike Samuel
On 2018-03-18 19:04, Mike Samuel wrote:
> I think you misunderstood the criticism.  JSON does not have numeric
> precision limits.  

I think I understood that, yes.

> There are plenty of systems that use JSON that never
> involve JavaScript and which pack int64s.

Sure, but if these systems use the "Number" type they belong to a proprietary world where disregarding recommendations and best practices is OK.

BTW, this is an ECMAScript mailing list, why push non-JS-compliant ideas here?

Anders


Re: Summary of Input. Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by C. Scott Ananian
On 2018-03-18 19:08, C. Scott Ananian wrote:

> Where do you specify SHA-256 signatures in your standard?
>
> If one were to use MD5 signatures, they would indeed break in the way I describe.
>
> It is good security practice to assume that currently-unbroken algorithms may eventually break in similar ways to discovered flaws in older algorithms.  But in any case, it is simply not good practice to allow multiple valid representations of content, if your aim is for a "canonical" representation.

Other people could chime in on this since I have already declared my position on this topic.  BTW, my proposal comes without cryptographic algorithms.

Does Unicode Normalization [naturally] belong to the canonicalization issue we are currently discussing?  I didn't see any of that in Richard's and Mike's specs, at least.

Anders


Re: JSON.canonicalize()

Mike Samuel
In reply to this post by Anders Rundgren-2


On Sun, Mar 18, 2018 at 2:18 PM, Anders Rundgren <[hidden email]> wrote:
> On 2018-03-18 19:04, Mike Samuel wrote:
>> I think you misunderstood the criticism.  JSON does not have numeric precision limits.
>
> I think I understood that, yes.
>
>> There are plenty of systems that use JSON that never
>> involve JavaScript and which pack int64s.
>
> Sure, but if these systems use the "Number" type they belong to a proprietary world where disregarding recommendations and best practices is OK.

No.  They are simply not following a SHOULD recommendation.
I think you have a variance mismatch in your argument.

> BTW, this is an ECMAScript mailing list, why push non-JS-compliant ideas here?

Let's review.

You asserted "This discussion (at least from my point of view), is about creating stuff that fits into standards."

I agreed and pointed out that not tying the definition to JavaScript's current value limitations would allow it to fit into
standards that do not assume those limitations.

You leveled this criticism: "My guess is that it would be rejected due to [quite valid] interoperability concerns."
Implicit in that is that when one standard specifies that an input MUST have a property that conflicts with
an output that a conforming implementation MAY or SHOULD produce, you have an interoperability concern.


But, you are trying to argue that your proposal is more interoperable because it works for fewer inputs in fewer contexts
and, if it were ported to other languages, would reject JSON that is parseable without loss of precision in those languages.
How you can say with a straight face that being non-runtime-agnostic makes a proposal more interoperable is beyond me.


Here's where variance comes in.
MUST on output makes a standard more interoperable.
MAY on input makes a standard more interoperable.

SHOULD and SHOULD NOT do not justify denying service.
They are guidelines that should be followed absent a compelling reason -- specific rules trump the general.


Your proposal is less interoperable because you are quoting a SHOULD,
interpreting it as MUST and saying inputs MUST fit into an IEEE 754 double without loss of precision.

This makes it strictly less interoperable than a proposal that does not have that constraint.


ECMAScript SHOULD encourage interoperability since it is often a glue language.

At the risk of getting meta-,
TC39 SHOULD prefer library functions that provide service for arbitrary inputs in their range.
TC39 SHOULD prefer library functions that MUST NOT, by virtue of their semantics,
lose precision silently.


Your proposal fails to be more interoperable inasmuch as it reproduces
    JSON.stringify(JSON.parse('1e1000')) === 'null'
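To spell out why (illustrative, any JS engine):

    JSON.parse('1e1000')        // Infinity: the value overflows binary64
    JSON.stringify(Infinity)    // 'null': non-finite numbers serialize as null

so a perfectly valid JSON number round-trips to a different JSON value.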


There is simply no need to convert a JSON string to JavaScript values in order to hash it.
There is simply no need to specify this in terms of JavaScript values when a runtime-agnostic
implementation that takes a string and produces a string provides the same value.
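To make that concrete, here is a minimal sketch of such a string-to-string canonicalizer (illustrative only, not a worked-out proposal: it assumes well-formed JSON input, and it sorts members by their escaped key text where a real spec would define ordering on decoded code units or code points; the name canonicalizeJsonText is just for the sketch):

    // Sketch: canonicalize JSON text without materializing numbers as doubles.
    // Number/true/false/null tokens are copied through verbatim, so '1e1000'
    // and 64-bit ints survive untouched.
    function canonicalizeJsonText(text) {
      let i = 0;
      const skipWs = () => { while (' \t\n\r'.includes(text[i])) i++; };
      function parseValue() {
        skipWs();
        if (text[i] === '{') return parseObject();
        if (text[i] === '[') return parseArray();
        if (text[i] === '"') return parseString();
        const start = i;                      // raw token: number or literal
        while (i < text.length && !',]} \t\n\r'.includes(text[i])) i++;
        return text.slice(start, i);
      }
      function parseString() {
        const start = i++;                    // opening quote
        while (text[i] !== '"') i += (text[i] === '\\' ? 2 : 1);
        i++;                                  // closing quote
        return text.slice(start, i);
      }
      function parseObject() {
        i++;                                  // '{'
        const members = [];
        skipWs();
        while (text[i] !== '}') {
          skipWs();
          const key = parseString();
          skipWs(); i++;                      // ':'
          members.push([key, parseValue()]);
          skipWs();
          if (text[i] === ',') { i++; skipWs(); }
        }
        i++;                                  // '}'
        members.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
        return '{' + members.map(([k, v]) => k + ':' + v).join(',') + '}';
      }
      function parseArray() {
        i++;                                  // '['
        const items = [];
        skipWs();
        while (text[i] !== ']') {
          items.push(parseValue());
          skipWs();
          if (text[i] === ',') i++;
        }
        i++;                                  // ']'
        return '[' + items.join(',') + ']';
      }
      return parseValue();
    }

    canonicalizeJsonText('{ "b": 1e1000, "a": 9007199254740993 }')
    // returns '{"a":9007199254740993,"b":1e1000}' -- no precision loss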


This is all getting very tedious though.
I and others have been trying to move towards consensus on what a hashable form of 
JSON should look like.

We've identified key areas including
* property ordering,
* number canonicalization,
* string normalization,
* whether the input should be a JS value or a string of JSON,
* and others

but, as in this case, you seem to be arguing both sides of a position to support your
proposal when you could just say "yes, the proposal could be adjusted along this
dimension and still provide what's required."


If you plan on putting a proposal before TC39, are you willing to move on any of these,
or are you asking for a YES/NO vote on a proposal that is largely the same as what
you've presented?


If the former, then acknowledge that there is a range of options and collect feedback
instead of sticking to "the presently drafted one is good enough."
If the latter, then I vote NO because I think the proposal in its current form is a poor
solution to the problem.

That's not to say that you've done bad work.
Most non-incremental stage 0 proposals are poor, and the process is designed to
integrate the ideas of people in different specialties to turn poor solutions to interesting
problems into robust solutions to a wider range of problems than originally envisioned.



Re: Summary of Input. Re: JSON.canonicalize()

C. Scott Ananian
In reply to this post by Anders Rundgren-2
IMO it belongs, at the level of a SHOULD recommendation when the data represented is intended to be a Unicode string. (But not a MUST because neither Javascript's 16-bit strings nor the 8-bit JSON representation necessarily represent Unicode strings.)
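E.g. (illustrative):

    const lone = '\uD800';        // unpaired surrogate: a legal ECMAScript string
    lone.length                   // 1: strings are just sequences of code units
    encodeURIComponent(lone)      // throws URIError: it has no UTF-8 encoding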

But I've said this already.
  --scott

On Sun, Mar 18, 2018, 2:48 PM Anders Rundgren <[hidden email]> wrote:
> Does Unicode Normalization [naturally] belong to the canonicalization issue we are currently discussing?  I didn't see any of that in Richard's and Mike's specs, at least.
>
> Anders



Re: Summary of Input. Re: JSON.canonicalize()

Mike Samuel
In reply to this post by Anders Rundgren-2


On Sun, Mar 18, 2018 at 12:50 PM, Anders Rundgren <[hidden email]> wrote:
> On 2018-03-18 15:13, Mike Samuel wrote:
>> On Sun, Mar 18, 2018 at 2:14 AM, Anders Rundgren <[hidden email]> wrote:
>>> Hi Guys,
>>>
>>> Pardon me if you think I was hyperbolic,
>>> The discussion got derailed by the bogus claims about hash functions' vulnerability.
>>
>> I didn't say I "think" you were being hyperbolic.  I asked whether you were.
>>
>> You asserted a number that seemed high to me.
>> I demonstrated it was high by a factor of at least 25 by showing an implementation that
>> used 80 lines instead of the 2000 you said was required.
>>
>> If you're going to put out a number as a reason to dismiss an argument, you should own it
>> or retract it.
>> Were you being hyperbolic?  (Y/N)
>
> N.
> To be completely honest I have only considered full-blown serializers, and they typically come in the mentioned size.
>
> Your solution has existed a couple of days; we may need a little bit more time thinking about it :-)

Fair enough.

>> Your claim and my counterclaim are in no way linked to hash function vulnerability.
>> I never weighed in on that claim and have already granted that hashable JSON is a
>> worthwhile use case.
>
> Great!  So we can finally put that argument to rest.

No.  I don't disagree with you, but I don't speak for whoever did.

>>> F.Y.I: Using ES6 serialization methods for JSON primitive types is headed for standardization in the IETF.
>>> https://www.ietf.org/mail-archive/web/jose/current/msg05716.html
>>>
>>> This effort is backed by one of the main authors behind the current de-facto standard for Signed and Encrypted JSON, aka JOSE.
>>> If this is, in your opinion, a bad idea, now is the right time to shoot it down :-)
>>
>> Does this main author prefer your particular JSON canonicalization scheme to
>> others?
>
> This proposal does [currently] not rely on canonicalization but on ES6 "predictive parsing and serialization".
>
>> Is this an informed opinion based on flaws in the others that make them less suitable for
>> JOSE's needs that are not present in the scheme you back?
>
> A JSON canonicalization scheme has AFAIK never been considered in the relevant IETF groups (JOSE+JSON).
> On the contrary, it has been dismissed as a daft idea.
>
> I haven't yet submitted my [private] I-D. I'm basically here for collecting input and finding possible collaborators.
>
>> If so, please provide links to their reasoning.
>> If not, how is their backing relevant?
>
> If the ES6/JSON.stringify() way of serializing JSON primitives becomes an IETF standard backed by Microsoft, it may have an impact on the "market".

If you can't tell us anything concrete about your backers, what they back, or why they back it, then why bring it up?

>>> This effort also exploits the ability of JSON.parse() and JSON.stringify() to honor object "Creation Order".
>>>
>>> JSON.canonicalize() would be a "Sorting" alternative to "Creation Order", offering certain advantages, with limited deployment impact on JSON serializers being the most important one.
>>>
>>> The ["completely broken"] sample code was only submitted as a proof-of-concept. I'm sure you JS gurus can do this way better than I :-)
>>
>> This is a misquote.  No-one has said your sample code was completely broken.
>> Neither your sample code nor the spec deals with toJSON.  At some point you're
>> going to have to address that if you want to keep your proposal moving forward.
>
> It is possible that I don't understand what you are asking for here since I have no experience with toJSON.
>
> Based on this documentation
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
> JSON.canonicalize() would though work out of the box (when integrated in the JSON object NB...) since it would inherit all the functionality (and 99% of the code) of JSON.stringify().

JSON.stringify(new Date()) has specific semantics because Date.prototype.toJSON has specific semantics.
As currently written, JSON.canonicalize(new Date()) === JSON.canonicalize({}).
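To illustrate the gap (runnable in any engine):

    // JSON.stringify consults Date.prototype.toJSON, so a serializer that
    // only walks own enumerable properties sees an empty object instead:
    JSON.stringify(new Date(0))    // '"1970-01-01T00:00:00.000Z"'
    Object.keys(new Date(0))       // []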

 


>> No amount of JS guru-ry is going to save your sample code from a specification bug.
>>
>>> Creating an alternative based on [1,2,3] seems like a rather daunting task.
>>
>> Maybe if you spend more time laying out the criteria on which a successful proposal
>> should be judged, we could move towards consensus on this claim.
>
> Since you have already slashed my proposal there is probably not so much consensus to find...

I didn't mean to slash anything.

I like parts of your proposal and dislike others.  I talk more about the bits that I don't like
because that's the purpose of this list.
For example, I like that it treats strings as sequences of UTF-16 code units instead of trying
to normalize strings that may not encode human-readable text.

