JSON.canonicalize()

JSON.canonicalize()

Anders Rundgren-2
Dear List,

Here is a proposal that I would be very happy to get feedback on, since it builds on ES but is not (at all) limited to ES.

The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.

The JSON canonicalization scheme (including ES code for emulating it), is described in:
https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html

Current workspace: https://github.com/cyberphone/json-canonicalization

Thanx,
Anders Rundgren
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

Re: JSON.canonicalize()

C. Scott Ananian
See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at least mention unicode normalization of strings.  You probably should also specify a validator: it doesn't matter if you emit canonical JSON if you can tweak the hash of the value by feeding non-canonical JSON as an input.
  --scott




Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-16 08:52, C. Scott Ananian wrote:
> See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at least
> mention unicode normalization of strings.

Yes, I could add that unicode normalization of strings is out of scope for this specification.


> You probably should also specify a validator: it doesn't matter if you emit
> canonical JSON if you can tweak the hash of the value by feeding non-canonical
> JSON as an input.

Pardon me, but I don't understand what you are writing here.

The only "raison d'être" of hash functions is to provide collision-safe checksums.

thanx,
Anders


Re: JSON.canonicalize()

C. Scott Ananian
Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

However, unicode normalization allows multiple representations of "the same" string, which defeats this security property.  Depending on your implementation language and use, a string with precomposed accents could compare equal to a string with separated accents, even though the canonical JSON or hash differed.  In an extreme case (with a weak hash function, say MD5), this can be used to break security by re-encoding all strings in multiple variants until a collision is found.  This is just a slight variant on the fact that JSON allows multiple ways to encode a character using escape sequences.  You've already taken the trouble to disambiguate this case; security-conscious applications should take care to perform unicode normalization as well, for the same reason.
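
As a small illustration of the normalization point, the following sketch uses only the built-in String.prototype.normalize() and JSON.stringify(); the variable names are made up for the example.

```javascript
// Two encodings of "é": precomposed (U+00E9), and "e" followed by a
// combining acute accent (U+0301).
const precomposed = '\u00e9';
const decomposed = 'e\u0301';

// They are different code unit sequences, so their JSON text (and any
// hash computed over that text) differs.
const sameJson = JSON.stringify(precomposed) === JSON.stringify(decomposed); // false

// After normalization to NFC the two strings compare equal.
const sameAfterNfc =
    precomposed.normalize('NFC') === decomposed.normalize('NFC'); // true
```

Whether an application treats these two strings as "the same" is exactly the question debated in the rest of the thread.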

Similarly, if you don't offer a verifier to ensure that the input is in "canonical JSON" format, then an attacker can try to create collisions by violating the rules of canonical JSON format, whether by using different escape sequences, adding whitespace, etc.  This can be used to make JSON which is "the same" appear "different", violating the intent of the canonicalization.  Any security application of canonical JSON will require a strict mode for JSON.parse() as well as a strict mode for JSON.stringify().
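
One way to picture such a verifier is to re-serialize the parsed input and compare it with the text that was received. This is only a sketch: the `canonicalize` helper below is a simplified stand-in (sorted property names, no whitespace, JSON.stringify() for primitives), not the scheme from the draft, and `isCanonical` is a hypothetical name.

```javascript
// Simplified stand-in serializer: sorted property names, no whitespace.
function canonicalize(value) {
    if (value === null || typeof value !== 'object') {
        return JSON.stringify(value);
    }
    if (Array.isArray(value)) {
        return '[' + value.map((item) => canonicalize(item)).join(',') + ']';
    }
    return '{' + Object.keys(value).sort()
        .map((key) => JSON.stringify(key) + ':' + canonicalize(value[key]))
        .join(',') + '}';
}

// Accepts the input only if it is already in canonical form.
function isCanonical(jsonText) {
    let parsed;
    try {
        parsed = JSON.parse(jsonText);
    } catch (error) {
        return false; // not JSON at all
    }
    return canonicalize(parsed) === jsonText;
}

isCanonical('{"a":1,"b":[true,null]}'); // true
isCanonical('{"b":1,"a":2}');           // false: property order
isCanonical('{"a": 1}');                // false: extra whitespace
isCanonical('"\\u0041"');               // false: needless escape for "A"
```

Rejecting anything that does not round-trip byte for byte closes off the alternative escapes and whitespace mentioned above, at the cost of refusing otherwise valid JSON.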
  --scott


Re: JSON.canonicalize()

Mike Samuel


On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <[hidden email]> wrote:
> Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).

What does "identical contents" mean in the context of numbers?  JSON intentionally avoids specifying any precision for numbers.  

JSON.stringify(1/3) === '0.3333333333333333'

What would happen with JSON from systems that allow higher precision?
I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
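
For reference, here is what happens when the longer literal passes through an ECMAScript Number. This is a sketch of the built-in behavior only; it assumes a canonicalizer that parses numbers into IEEE 754 doubles, and the variable names are illustrative.

```javascript
const shortText = JSON.stringify(1 / 3); // '0.3333333333333333'
const longText = shortText + '3';        // one more digit of precision

// Both literals parse to the same double...
const sameNumber = JSON.parse(longText) === JSON.parse(shortText); // true

// ...so re-serializing the longer one yields the shorter text again.
const roundTripped = JSON.stringify(JSON.parse(longText)); // equals shortText
```

A producer with higher-precision numbers would therefore see its extra digits silently dropped by such a canonicalizer.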



 
> However, unicode normalization allows multiple representations of "the same" string, which defeats this security property.  Depending on your implementation language

We shouldn't normalize unicode in strings that contain packed binary data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar values and any system that assumes the latter will break often.

> and use, a string with precomposed accents could compare equal to a string with separated accents, even though the canonical JSON or hash differed.  In an extreme case (with a weak hash function, say MD5), this can be used to break security by re-encoding all strings in multiple variants until a collision is found.  This is just a slight variant on the fact that JSON allows multiple ways to encode a character using escape sequences.  You've already taken the trouble to disambiguate this case; security-conscious applications should take care to perform unicode normalization as well, for the same reason.
>
> Similarly, if you don't offer a verifier to ensure that the input is in "canonical JSON" format, then an attacker can try to create collisions by violating the rules of canonical JSON format, whether by using different escape sequences, adding whitespace, etc.  This can be used to make JSON which is "the same" appear "different", violating the intent of the canonicalization.  Any security application of canonical JSON will require a strict mode for JSON.parse() as well as a strict mode for JSON.stringify().

Given the dodginess of "identical" w.r.t. non-integral numbers, shouldn't endpoints be re-canonicalizing before hashing anyway?  Why would one want to ship the canonical form over the wire if it loses precision?

 
> The request is for a complement to the ES "JSON" object called canonicalize() which would have identical parameters to the existing stringify() method.

Why should canonicalize take a replacer?  Hasn't replacement already happened?

 






Re: JSON.canonicalize()

C. Scott Ananian
On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <[hidden email]> wrote:
> On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <[hidden email]> wrote:
> > Canonical JSON is often used to imply a security property: two JSON blobs with identical contents are expected to have identical canonical JSON forms (and thus identical hashed values).
>
> What does "identical contents" mean in the context of numbers?  JSON intentionally avoids specifying any precision for numbers.
>
> JSON.stringify(1/3) === '0.3333333333333333'
>
> What would happen with JSON from systems that allow higher precision?
> I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
>
> > However, unicode normalization allows multiple representations of "the same" string, which defeats this security property.  Depending on your implementation language
>
> We shouldn't normalize unicode in strings that contain packed binary data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar values and any system that assumes the latter will break often.

Both of these points are made on the URL I originally cited: http://wiki.laptop.org/go/Canonical_JSON
 --scott



Re: JSON.canonicalize()

Carsten Bormann
In reply to this post by Mike Samuel
On Mar 16, 2018, at 16:23, Mike Samuel <[hidden email]> wrote:
>
> JSON strings are strings of UTF-16 code-units

No.

(You are confusing this with JavaScript strings.)

Grüße, Carsten


Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by C. Scott Ananian
On 2018-03-16 16:38, C. Scott Ananian wrote:
> Canonical JSON is often used to imply a security property: two JSON blobs
> with identical contents are expected to have identical canonical JSON
> forms (and thus identical hashed values).

Right.

> However, unicode normalization allows multiple representations
> of "the same" string, which defeats this security property.

This is an aspect that I believe belongs to the "application" level.  This specification is only about "on the wire" format.

Rationale: if this was a part of the SPECIFICATION it would either be ignored (=useless) or be a showstopper (=dead) due to complexity.

If applications using the received data want to address this issue they can for example call
https://msdn.microsoft.com/en-us/library/windows/desktop/dd318671(v=vs.85).aspx
and reject if they want.

Or always normalize: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
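
A sketch of what such application-level handling could look like, applied to already parsed data. The function names `normalizeStrings` and `isNormalized` are hypothetical; only String.prototype.normalize() is an existing API.

```javascript
// Returns a copy of a parsed JSON value with every string, including
// property names, normalized to the given Unicode normalization form.
// Note: two property names that normalize to the same string collapse
// into one entry (the later one wins).
function normalizeStrings(value, form = 'NFC') {
    if (typeof value === 'string') {
        return value.normalize(form);
    }
    if (Array.isArray(value)) {
        return value.map((item) => normalizeStrings(item, form));
    }
    if (value !== null && typeof value === 'object') {
        return Object.fromEntries(
            Object.entries(value).map(([key, item]) =>
                [key.normalize(form), normalizeStrings(item, form)]));
    }
    return value; // numbers, booleans, null
}

// The "reject" alternative: test a single string instead of rewriting it.
function isNormalized(text, form = 'NFC') {
    return text === text.normalize(form);
}

const cleaned = normalizeStrings({ 'e\u0301': ['e\u0301', 1] });
// cleaned has the single property "\u00e9" holding ["\u00e9", 1]
```

Either variant runs after parsing, which keeps the on-the-wire format itself free of any normalization requirement.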


> Depending on your implementation language and use, a string with
> precomposed accents could compare equal to a string with separated
> accents, even though the canonical JSON or hash differed.  

I don't want to go there for the reasons mentioned.


> In an extreme case (with a weak hash function, say MD5), this can be
> used to break security by re-encoding all strings in multiple variants
> until a collision is found.  This is just a slight variant on the fact
> that JSON allows multiple ways to encode a character using escape sequences.
> You've already taken the trouble to disambiguate this case; security-conscious
> applications should take care to perform unicode normalization as well, for the same reason.

If you are able to break the hash function all bets are off anyway because then you can presumably change *any* part of the object and it would still appear authentic.

Escape normalization: If you don't do this normalization, signatures would typically break and that's not really a "security" (=attacker) problem; it is rather a "nuisance" of the same caliber as a server not responding.


> Similarly, if you don't offer a verifier to ensure that the input is
> in "canonical JSON" format, then an attacker can try to create collisions
> by violating the rules of canonical JSON format, whether by using different
> escape sequences, adding whitespace, etc.  This can be used to make JSON which
> is "the same" appear "different", violating the intent of the canonicalization.

Again, if the hash function is broken, there's nothing to do except maybe cry :-(

This is a Unicode problem, not a cryptographic problem.


> Any security application of canonical JSON will require a strict mode for
> JSON.parse() as well as a strict mode for JSON.stringify().

Indeed, you ALWAYS must verify that input data conforms to the agreed conventions.

Anyway, feel free to push a different JSON canonicalization scheme!

Here is another: http://gibson042.github.io/canonicaljson-spec/
It claims that you should support "lone surrogates" (invalid Unicode), which the JDK, for example, doesn't.
I don't go there either.
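
For readers unfamiliar with the term, a lone surrogate is a UTF-16 surrogate code unit that is not part of a valid high/low pair. A minimal sketch of a check an application could run before accepting a string; `hasLoneSurrogate` is a hypothetical helper, not part of either specification.

```javascript
// Returns true if the string contains a surrogate code unit that is not
// part of a well-formed high/low surrogate pair.
function hasLoneSurrogate(text) {
    for (let i = 0; i < text.length; i++) {
        const unit = text.charCodeAt(i);
        if (unit >= 0xd800 && unit <= 0xdbff) {
            // High surrogate: must be followed by a low surrogate.
            // charCodeAt() yields NaN past the end, which fails the test.
            const next = text.charCodeAt(i + 1);
            if (next >= 0xdc00 && next <= 0xdfff) {
                i++; // skip the low half of a valid pair
                continue;
            }
            return true;
        }
        if (unit >= 0xdc00 && unit <= 0xdfff) {
            return true; // low surrogate without a preceding high one
        }
    }
    return false;
}

hasLoneSurrogate('\ud83d\ude00');          // false: a valid pair
hasLoneSurrogate(JSON.parse('"\\ud800"')); // true: JSON.parse accepts the escape
```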

Anders


Re: JSON.canonicalize()

Mike Samuel
In reply to this post by C. Scott Ananian


On Fri, Mar 16, 2018 at 12:44 PM, C. Scott Ananian <[hidden email]> wrote:
> Both of these points are made on the URL I originally cited: http://wiki.laptop.org/go/Canonical_JSON

Thanks, I see
"""
Floating point numbers are not allowed in canonical JSON. Neither are leading zeros or "minus 0" for integers.
"""
which answers my question.

I also see
"""
A previous version of this specification required strings to be valid unicode, and relied on JSON's \u escape. This was abandoned as it doesn't allow representing arbitrary binary data in a string, and it doesn't preserve the identity of non-canonical unicode strings.
"""
which addresses my question.

I also see
"""
It is suggested that unicode strings be represented as the UTF-8 encoding of unicode Normalization Form C (UAX #15). However, arbitrary content may be represented as a string: it is not guaranteed that string contents can be meaningfully parsed as UTF-8.
"""
which seems to be mixing concerns about the wire format used to encode JSON as octets and NFC which would apply to the text of the JSON string.


If that confusion is cleaned up, then it seems a fine subset of JSON to ship over the wire with a JSON content-type.


It is entirely unsuitable to embedding in HTML or XML though.
IIUC, with an implementation based on this

  JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
  JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`

The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.

   JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`

It also is not suitable for use internally within systems that internally use cstrings.

  JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
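
The kind of "web-safe" output alluded to here can be sketched as a post-processing step over JSON.stringify(). The helper name `toWebSafeJson` and the exact set of escaped characters are assumptions for illustration; the result is still valid JSON that parses to the same value, but it is no longer the canonical form.

```javascript
// Escapes characters that are harmless in JSON but troublesome when the
// text is embedded in HTML/XML or evaluated as JavaScript source.
function toWebSafeJson(value) {
    return JSON.stringify(value)
        .replace(/</g, '\\u003c')      // avoids "</script>" and "<!--"
        .replace(/>/g, '\\u003e')      // avoids "]]>" and "-->"
        .replace(/\u2028/g, '\\u2028') // line separator
        .replace(/\u2029/g, '\\u2029'); // paragraph separator
}

const safe = toWebSafeJson('</script>'); // "\u003c/script\u003e" (with quotes)
const same = JSON.parse(safe) === '</script>'; // true
```

These characters can only occur inside string literals in the output of JSON.stringify(), so replacing them globally does not disturb the JSON structure.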



Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-16 18:04, Mike Samuel wrote:

> It is entirely unsuitable to embedding in HTML or XML though.
> IIUC, with an implementation based on this
>
>    JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
>    JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`

I don't know what you are trying to prove here :-)


> The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.
>
>     JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
>
> It also is not suitable for use internally within systems that internally use cstrings.
>
>    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>

JSON.canonicalize() would be [almost] identical to JSON.stringify()

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // Returns true

"Emulator":

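// Emulates the proposed JSON.canonicalize(): like JSON.stringify(), but
// object properties are emitted in sorted order and without whitespace.
var canonicalize = function(object) {

    var buffer = '';
    serialize(object);
    return buffer;

    function serialize(object) {
        if (object !== null && typeof object === 'object') {
            if (Array.isArray(object)) {
                // Arrays keep their element order.
                buffer += '[';
                let next = false;
                object.forEach((element) => {
                    if (next) {
                        buffer += ',';
                    }
                    next = true;
                    serialize(element);
                });
                buffer += ']';
            } else {
                // Objects: property names are sorted (default sort(),
                // i.e. by UTF-16 code units) before serialization.
                buffer += '{';
                let next = false;
                Object.keys(object).sort().forEach((property) => {
                    if (next) {
                        buffer += ',';
                    }
                    next = true;
                    buffer += JSON.stringify(property);
                    buffer += ':';
                    serialize(object[property]);
                });
                buffer += '}';
            }
        } else {
            // Primitives use the JSON.stringify() representation as is.
            buffer += JSON.stringify(object);
        }
    }
};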

Re: JSON.canonicalize()

Mike Samuel


On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <[hidden email]> wrote:
> On 2018-03-16 18:04, Mike Samuel wrote:
> > It is entirely unsuitable to embedding in HTML or XML though.
> > IIUC, with an implementation based on this
> >
> >   JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
> >   JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>
> I don't know what you are trying to prove here :-)

Only that canonical JSON is useful in a very narrow context.
It cannot be embedded in an HTML script tag.
It cannot be embedded in an XML or HTML foreign content context without extra care.
If it contains a string literal that embeds a NUL it cannot be embedded in XML period even if extra care is taken.

> > The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.
> >
> >     JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
> >
> > It also is not suitable for use internally within systems that internally use cstrings.
> >
> >    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>
> JSON.canonicalize() would be [almost] identical to JSON.stringify()

You're correct.  Many JSON producers have a web-safe version, but the JavaScript builtin does not.
My point is that JSON.canonicalize undoes those web-safety tweaks.

> JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // Returns true
>
> "Emulator":
>
> var canonicalize = function(object) {
>
>     var buffer = '';
>     serialize(object);

I thought canonicalize took in a string of JSON and produced the same.  Am I wrong?
"Canonicalize" to my mind means a function that returns the canonical member of an
equivalence class given any member from that same equivalence class, so is always 'a -> 'a.

>     return buffer;
>
>     function serialize(object) {
>         if (object !== null && typeof object === 'object') {

JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you
can avoid this complexity. 
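
A sketch of the string-to-string shape suggested here, under the assumption that the input is parsed first and then re-serialized with sorted property names. `serializeParsed` and `canonicalizeText` are hypothetical names, and the serializer is a simplified stand-in rather than the draft's algorithm.

```javascript
// Serializes a value that came out of JSON.parse(): only null, booleans,
// numbers, strings, arrays and plain objects can occur here, so there are
// no Dates, symbols, functions or undefined values to special-case.
function serializeParsed(value) {
    if (value === null || typeof value !== 'object') {
        return JSON.stringify(value);
    }
    if (Array.isArray(value)) {
        return '[' + value.map((item) => serializeParsed(item)).join(',') + ']';
    }
    return '{' + Object.keys(value).sort()
        .map((key) => JSON.stringify(key) + ':' + serializeParsed(value[key]))
        .join(',') + '}';
}

// JSON text in, JSON text out; throws SyntaxError on invalid input.
function canonicalizeText(jsonText) {
    return serializeParsed(JSON.parse(jsonText));
}

// Date.prototype.toJSON has already run by the time the text exists.
const text = JSON.stringify({ when: new Date(0), b: 1 });
const canonical = canonicalizeText(text);
// canonical === '{"b":1,"when":"1970-01-01T00:00:00.000Z"}'
```

With this shape the function is idempotent: feeding its output back in returns the same text.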

>             if (Array.isArray(object)) {
>                 buffer += '[';
>                 let next = false;
>                 object.forEach((element) => {
>                     if (next) {
>                         buffer += ',';
>                     }
>                     next = true;
>                     serialize(element);
>                 });
>                 buffer += ']';
>             } else {
>                 buffer += '{';
>                 let next = false;
>                 Object.keys(object).sort().forEach((property) => {
>                     if (next) {
>                         buffer += ',';
>                     }
>                     next = true;
>                     buffer += JSON.stringify(property);

I think you need a symbol check here.  JSON.stringify(Symbol.for('foo')) === undefined

>                     buffer += ':';
>                     serialize(object[property]);
>                 });
>                 buffer += '}';
>             }
>         } else {
>             buffer += JSON.stringify(object);

This fails to distinguish non-integral numbers from integral ones, and produces non-standard output
when object === undefined.  Again, not a problem if the input is required to be valid JSON.
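
The built-in behavior behind these remarks can be checked directly. The snippet below only exercises JSON.stringify() and Object.keys(); the variable names are illustrative.

```javascript
// The emulator's fallback branch does `buffer += JSON.stringify(object)`.
// For undefined and for symbols JSON.stringify() returns undefined, and
// string concatenation turns that into the text "undefined".
const appended = '' + JSON.stringify(undefined);        // 'undefined'
const symbolResult = JSON.stringify(Symbol.for('foo')); // undefined

// The built-in serializer instead omits such values in objects and
// writes null for them in arrays.
const builtinObject = JSON.stringify({ a: undefined, b: 1 }); // '{"b":1}'
const builtinArray = JSON.stringify([undefined]);             // '[null]'

// Object.keys() never returns symbol-keyed properties, so within the
// emulator the property names handed to JSON.stringify() are strings.
const keys = Object.keys({ [Symbol.for('foo')]: 1, a: 2 });   // ['a']
```

So, as far as the use of Object.keys() goes, symbols can only reach the emulator as property values, where they hit the same fallback branch as undefined.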
 
>         }
>     }
> };



Re: JSON.canonicalize()

C. Scott Ananian
In reply to this post by Anders Rundgren-2


On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <[hidden email]> wrote:
> On 2018-03-16 18:04, Mike Samuel wrote:
> > It is entirely unsuitable to embedding in HTML or XML though.
> > IIUC, with an implementation based on this
> >
> >   JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
> >   JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>
> I don't know what you are trying to prove here :-)

He wants to ship it as application/json and have it be safe if the browser happens to ignore the MIME type and interpret it as HTML or XML, I believe.  Mandatory encoding of < as an escape would make the output "safe" for such use.  I'm not convinced this is in-scope, but it's an interesting case to consider when determining which characters ought to be escaped.

(I think he's writing `JSON.canonicalize(JSON.stringify(...))` where he means to write `JSON.canonicalize(...)`, at least if I understand the proposed API correctly.)

> > The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.
> >
> >     JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`

I'm not sure about this, but I think he's saying you can't just `eval` the canonical JSON output, because newlines appear literally, not escaped. I believe I actually ran into some compatibility issues with this back when I was playing around with canonical JSON as well; certain JSON parsers wouldn't accept "JSON" with embedded literal newlines.

OTOH, I don't think anyone should be encouraged to eval JSON!  As noted previously, there should be a strict parse function to go along with the strict serialize function.

> > It also is not suitable for use internally within systems that internally use cstrings.
> >
> >    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`

A literal NUL character is unrepresentable in a naive C implementation.  You need to use Pascal-style strings in your low-level implementation.  This is an important consideration for non-JavaScript use.  In my page I noted, "Because only two byte values are escaped, be aware that JSON-encoded data may contain embedded control characters and nulls."  A similar warning is at least called for here.

On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <[hidden email]> wrote:
> I also see
> """
> It is suggested that unicode strings be represented as the UTF-8 encoding of unicode Normalization Form C (UAX #15). However, arbitrary content may be represented as a string: it is not guaranteed that string contents can be meaningfully parsed as UTF-8.
> """
> which seems to be mixing concerns about the wire format used to encode JSON as octets and NFC which would apply to the text of the JSON string.

Yes, it is rather unfortunate that we have only one datatype here and a bit of an impedance mismatch.  JSON serialization is usually considered literally as a byte-stream, but JavaScript wants to parse those bytes as some encoding (usually UTF-8) of a UTF-16 string.

My suggestion is just to make this very plain in a SHOULD comment to the potential implementor.  If the underlying data is unicode string data, it SHOULD be represented as the UTF-8 encoding of unicode Normalization Form C (UAX #15).   However, the consumer should be aware that the data may be binary bits and not interpretable as a valid UTF-8 string.

Re:
> Escape normalization: If you don't do this normalization, signatures would typically break and that's not really a "security" (=attacker) problem; it is rather a "nuisance" of the same caliber as a server not responding.

Consider signatures for malware detection.  If an attacker can trivially modify their (in this example) JSON-encoded payload so that it is still "canonical" and still passes whatever input verifier exists (so much easier if there is not strict parsing!), then they can bypass your signature-based detection system.  That's a security problem.

Both sides must be true: equal hashes should mean equal content (to high probability) and unequal hashes should mean different content.  Otherwise there is a security problem.
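
The point can be sketched with Node's `crypto` module. Here `JSON.stringify(JSON.parse(text))` is only a stand-in for a real canonicalizer (it does no property sorting), and the helper names `sha256Hex` and `isAlreadyNormalized` are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 of a string's UTF-8 bytes, as hex.
function sha256Hex(text) {
    return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// Two JSON texts that parse to the same value ("\u0041" is "A").
const textA = '{"name":"A"}';
const textB = '{"name":"\\u0041"}';

// Hashing the raw texts: equal content, unequal hashes.
console.log(sha256Hex(textA) === sha256Hex(textB)); // false

// Re-serializing through a single serializer removes the escape variation.
const reA = JSON.stringify(JSON.parse(textA));
const reB = JSON.stringify(JSON.parse(textB));
console.log(sha256Hex(reA) === sha256Hex(reB));     // true

// A strict verifier would refuse input that is not already in that form.
const isAlreadyNormalized = (text) => JSON.stringify(JSON.parse(text)) === text;
console.log(isAlreadyNormalized(textA)); // true
console.log(isAlreadyNormalized(textB)); // false
```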
 --scott



Re: JSON.canonicalize()

C. Scott Ananian
In reply to this post by Mike Samuel
And just to be clear: I'm all for standardizing a canonical JSON form.  In addition to my 11-year-old attempt, there have been countless others, and still no *standard*.  I just want us to learn from the previous attempts and try to make something at least as good as everything which has come before, especially in terms of the various non-obvious considerations which individual implementors have discovered the hard way over the years.
  --scott

On Fri, Mar 16, 2018 at 1:46 PM, Mike Samuel <[hidden email]> wrote:


On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <[hidden email]> wrote:
On 2018-03-16 18:04, Mike Samuel wrote:

It is entirely unsuitable to embedding in HTML or XML though.
IIUC, with an implementation based on this

   JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`

I don't know what you are trying to prove here :-)

Only that canonical JSON is useful in a very narrow context.
It cannot be embedded in an HTML script tag.
It cannot be embedded in an XML or HTML foreign content context without extra care.
If it contains a string literal that embeds a NUL it cannot be embedded in XML period even if extra care is taken.

 

The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.

    JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`

It also is not suitable for use internally within systems that internally use cstrings.

   JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`


JSON.canonicalize() would be [almost] identical to JSON.stringify()

You're correct.  Many JSON producers have a web-safe version, but the JavaScript builtin does not.
My point is that JSON.canonicalize undoes those web-safety tweaks.
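
For readers unfamiliar with such "web-safe" tweaks, a minimal sketch follows. The function name `stringifyWebSafe` and the exact set of escaped characters are assumptions for illustration; real producers differ in what they escape.

```javascript
// Hypothetical helper: JSON.stringify plus extra escapes for characters
// that are risky when the output is pasted into HTML, XML or script source.
function stringifyWebSafe(value) {
    return JSON.stringify(value)
        .replace(/</g, '\\u003c')
        .replace(/>/g, '\\u003e')
        .replace(/\u2028/g, '\\u2028')
        .replace(/\u2029/g, '\\u2029');
}

const risky = '</script>]]>\u2028';
const safe = stringifyWebSafe(risky);

console.log(safe.includes('<'));             // false
console.log(JSON.parse(safe) === risky);     // true: same value
console.log(safe === JSON.stringify(risky)); // false: different text
```

The last line is the crux: the web-safe text and the plain serialization denote the same value but differ as strings, so a hash has to be computed over one agreed form rather than over whatever happens to be embedded in a page.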

 
JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // Returns true

"Emulator":

var canonicalize = function(object) {

    var buffer = '';
    serialize(object);

I thought canonicalize took in a string of JSON and produced the same.  Am I wrong?
"Canonicalize" to my mind means a function that returns the canonical member of an
equivalence class given any member from that same equivalence class, so is always 'a -> 'a.
 
    return buffer;

    function serialize(object) {
        if (object !== null && typeof object === 'object') {

JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you
can avoid this complexity. 

            if (Array.isArray(object)) {
                buffer += '[';
                let next = false;
                object.forEach((element) => {
                    if (next) {
                        buffer += ',';
                    }
                    next = true;
                    serialize(element);
                });
                buffer += ']';
            } else {
                buffer += '{';
                let next = false;
                Object.keys(object).sort().forEach((property) => {
                    if (next) {
                        buffer += ',';
                    }
                    next = true; 
                    buffer += JSON.stringify(property);

I think you need a symbol check here.  JSON.stringify(Symbol.for('foo')) === undefined
 
                    buffer += ':';
                    serialize(object[property]);
                });
                buffer += '}';
            }
        } else {
            buffer += JSON.stringify(object);

This fails to distinguish non-integral numbers from integral ones, and produces non-standard output
when object === undefined.  Again, not a problem if the input is required to be valid JSON.
 
        }
    }
};
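
A sketch of the JSON_string -> JSON_string shape suggested in the comments above. It is not the emulator from the draft: the names `serializeParsed` and `canonicalizeText` are hypothetical, and numbers pass through IEEE 754 doubles, so precision is limited accordingly.

```javascript
// Serialize a value obtained from JSON.parse. Only null, booleans, numbers,
// strings, arrays and plain objects can occur, so no toJSON, undefined or
// symbol handling is needed.
function serializeParsed(value) {
    if (Array.isArray(value)) {
        return '[' + value.map(serializeParsed).join(',') + ']';
    }
    if (value !== null && typeof value === 'object') {
        const members = Object.keys(value).sort().map(
            (key) => JSON.stringify(key) + ':' + serializeParsed(value[key]));
        return '{' + members.join(',') + '}';
    }
    return JSON.stringify(value);
}

// Hypothetical text-to-text entry point: JSON string in, JSON string out.
function canonicalizeText(jsonText) {
    return serializeParsed(JSON.parse(jsonText));
}

console.log(canonicalizeText('{ "b": [1, 2.50, 1e2], "a": null }'));
// {"a":null,"b":[1,2.5,100]}
```

Because the input must already be valid JSON, feeding the output back in returns it unchanged, which matches the 'a -> 'a reading of "canonicalize".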



Re: JSON.canonicalize()

Anders Rundgren-2
In reply to this post by Mike Samuel
On 2018-03-16 18:46, Mike Samuel wrote:

>
>
> On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <[hidden email] <mailto:[hidden email]>> wrote:
>
>     On 2018-03-16 18:04, Mike Samuel wrote:
>
>         It is entirely unsuitable to embedding in HTML or XML though.
>         IIUC, with an implementation based on this
>
>             JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
>         JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>
>
>     I don't know what you are trying to prove here :-)
>
>
> Only that canonical JSON is useful in a very narrow context.
> It cannot be embedded in an HTML script tag.
> It cannot be embedded in an XML or HTML foreign content context without extra care.
> If it contains a string literal that embeds a NUL it cannot be embedded in XML period even if extra care is taken.

If we stick to browsers, JSON.canonicalize() would presumably be used with WebCrypto, WebSocket etc.

Node.js is probably a more important target.

Related stuff:
https://tools.ietf.org/id/draft-erdtman-jose-cleartext-jws-00.html
JSON signatures without canonicalization.

>
>
>         The output of JSON.canonicalize would also not be in the subset of JSON that is also a subset of JavaScript's PrimaryExpression.
>
>              JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
>
>         It also is not suitable for use internally within systems that internally use cstrings.
>
>             JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>
>
>     JSON.canonicalize() would be [almost] identical to JSON.stringify()
>
>
> You're correct.  Many JSON producers have a web-safe version, but the JavaScript builtin does not.
> My point is that JSON.canonicalize undoes those web-safety tweaks.
>
>     JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // Returns true
>
>     "Emulator":
>
>     var canonicalize = function(object) {
>
>          var buffer = '';
>          serialize(object);
>
>
> I thought canonicalize took in a string of JSON and produced the same.  Am I wrong?

Yes, it is just a variant of JSON.stringify().

> "Canonicalize" to my mind means a function that returns the canonical member of an
> equivalence class given any member from that same equivalence class, so is always 'a -> 'a.

This is rather a canonicalizing serializer.

>
>          return buffer;
>
>          function serialize(object) {
>              if (object !== null && typeof object === 'object') {
>
>
> JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
> because Date.prototype.toJSON exists.
>
> If you operate as a JSON_string -> JSON_string function then you
> can avoid this complexity.
>
>                  if (Array.isArray(object)) {
>                      buffer += '[';
>                      let next = false;
>                      object.forEach((element) => {
>                          if (next) {
>                              buffer += ',';
>                          }
>                          next = true;
>                          serialize(element);
>                      });
>                      buffer += ']';
>                  } else {
>                      buffer += '{';
>                      let next = false;
>                      Object.keys(object).sort().forEach((property) => {
>                          if (next) {
>                              buffer += ',';
>                          }
>                          next = true;
>
>                          buffer += JSON.stringify(property);
>
>
> I think you need a symbol check here.  JSON.stringify(Symbol.for('foo')) === undefined
>
>                          buffer += ':';
>                          serialize(object[property]);
>                      });
>                      buffer += '}';
>                  }
>              } else {
>                  buffer += JSON.stringify(object);
>
>
> This fails to distinguish non-integral numbers from integral ones, and produces non-standard output
> when object === undefined.  Again, not a problem if the input is required to be valid JSON.

Well, a proper implementation would build on JSON.stringify() with property sorting as the only enhancement.
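
One way to sketch "JSON.stringify() plus property sorting" with today's API is a replacer function that swaps each plain object for a copy whose keys were inserted in sorted order. The name `stringifySorted` is hypothetical, and this is an approximation rather than the proposed built-in.

```javascript
// Hypothetical sketch: reuse JSON.stringify and only reorder properties.
// JSON.stringify itself still handles toJSON, undefined members, etc.
function stringifySorted(value) {
    return JSON.stringify(value, (key, val) => {
        if (val !== null && typeof val === 'object' && !Array.isArray(val)) {
            const sorted = {};
            for (const k of Object.keys(val).sort()) {
                sorted[k] = val[k];
            }
            return sorted;
        }
        return val;
    });
}

console.log(stringifySorted({ b: 1, a: { d: new Date(0), c: undefined } }));
// {"a":{"d":"1970-01-01T00:00:00.000Z"},"b":1}
```

A known limitation of this trick: ECMAScript objects enumerate integer-like keys in ascending numeric order regardless of insertion order, so `{"10":1,"9":2}` comes out as `{"9":2,"10":1}` rather than in string-sorted order. An implementation that must sort such keys as strings needs to emit the members itself.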

>
>              }
>          }
>     };
>
>


Re: JSON.canonicalize()

Mike Samuel
In reply to this post by C. Scott Ananian


On Fri, Mar 16, 2018 at 1:54 PM, C. Scott Ananian <[hidden email]> wrote:
And just to be clear: I'm all for standardizing a canonical JSON form.  In addition to my 11-year-old attempt, there have been countless others, and still no *standard*.  I just want us to learn from the previous attempts and try to make something at least as good as everything which has come before, especially in terms of the various non-obvious considerations which individual implementors have discovered the hard way over the years.

I think the hashing use case is an important one.  At the risk of bikeshedding, "canonical" seems to overstate the usefulness.  Many assume that the canonical form of something is usually the one you use in preference to any other equivalent.

If the integer-only restriction is relaxed (see below), then
* The proposed canonical form seems useful as an input to strong hash functions.
* It seems usable as a complete message body, but not preferable due to potential loss of precision.
* It seems usable but not preferable as a long-term storage format.
* It seems a source of additional risk when used in conjunction with other common web languages.

If that is correct, would people be averse to marketing this as "hashable JSON" instead of "canonical JSON"?

------

Numbers

There seem to be 3 main forks in the design space w.r.t. numbers.  I'm sure
cscott has thought of more, but I list these to make clear why I think
canonical JSON is not very useful as a wire/storage format.

1. Integers only
    PROS: avoids floating point equality issues that have bedeviled many systems
    CONS: can support only a small portion of the JSON value space
    CONS: small loss of precision risk with integers encoded from Decimal values.
        For example, won't roundtrip Java BigDecimals.
2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
    using a fixed threshold for scientific notation.
    PROS: supports whole JSON value-space
    CONS: less useful for hashing
    CONS: risks loss of precision when decoders decide based on presence of
       decimal point whether to represent as double or int.
3. Preserve textual representation.
    PROS: avoids loss of precision
    PROS: can support whole JSON value-space
    CONS: not very useful for hashing

It seems that there is a tradeoff between usefulness for hashing and the ability to
support the whole JSON value-space.
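
The tradeoff can be seen with the number handling ECMAScript already has. This sketch only shows what `JSON.parse` and `JSON.stringify` do with doubles; whether that coincides in every detail with the number rules of the draft is an assumption.

```javascript
// Number-to-string conversion erases spelling differences in the input,
// which is what makes the result usable for hashing.
const spellings = ['1', '1.0', '1.00e0', '10e-1', '0.1e1'];
console.log(spellings.map((s) => JSON.stringify(JSON.parse(s))));
// [ '1', '1', '1', '1', '1' ]

// Large magnitudes switch to exponent notation at a fixed threshold.
console.log(JSON.stringify(1e20)); // 100000000000000000000
console.log(JSON.stringify(1e21)); // 1e+21

// The cost: values outside double precision do not round-trip.
console.log(JSON.stringify(JSON.parse('9007199254740993')));
// 9007199254740992
```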

Recommending this as a wire / storage format further complicates that tradeoff.

Regardless of which fork is chosen, there are some risks with the current design.
For example, 1e100000 takes up some space in memory.  This might allow timing attacks.
Imagine an attacker can get Alice to embed 1e100000 or another number in her JSON.
Alice sends that message to Bob over an encrypted channel.  Bob converts the JSON to
canonical JSON.  If Bob refuses some JSON payloads over a threshold size or the
time to process is noticeably different for 1e100000 vs 1e1 then the attacker can
tell, via traffic analysis alone, when Alice communicates with Bob.
We should avoid that in-memory blowup if possible.
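
A small sketch of the size difference at stake, plus one possible guard. The helper `parseFiniteOnly` is hypothetical; it bounds the work done but does not by itself remove the side channel described above, since a rejection can be observable too.

```javascript
// With IEEE 754 doubles the huge exponent costs nothing: it overflows...
console.log(JSON.parse('1e100000'));                   // Infinity
// ...and JSON.stringify would then silently turn it into null.
console.log(JSON.stringify(JSON.parse('[1e100000]'))); // [null]

// An arbitrary-precision integer for the same value needs 100001 digits.
console.log((10n ** 100000n).toString().length);       // 100001

// Hypothetical guard: refuse numbers that overflow a double instead of
// expanding them or letting them degrade to null.
function parseFiniteOnly(jsonText) {
    return JSON.parse(jsonText, (key, val) => {
        if (typeof val === 'number' && !Number.isFinite(val)) {
            throw new RangeError('number out of range at key ' + JSON.stringify(key));
        }
        return val;
    });
}

console.log(parseFiniteOnly('{"n":1e1}')); // { n: 10 }
```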




 

Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-16 19:30, Mike Samuel wrote:
> 2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
>      using a fixed threshold for scientific notation.
>      PROS: supports whole JSON value-space
>      CONS: less useful for hashing
>      CONS: risks loss of precision when decoders decide based on presence of
>         decimal point whether to represent as double or int.

Have you actually looked into the specification?
https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
ES6 has all it takes.

Anders



Re: JSON.canonicalize()

Mike Samuel


On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <[hidden email]> wrote:
On 2018-03-16 19:30, Mike Samuel wrote:
2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
     using a fixed threshold for scientific notation.
     PROS: supports whole JSON value-space
     CONS: less useful for hashing
     CONS: risks loss of precision when decoders decide based on presence of
        decimal point whether to represent as double or int.

Have you actually looked into the specification?
https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
ES6 has all it takes.

Yes, but other notions of canonical equivalence have been mentioned here
so reasons to prefer one to another seem in scope.



Re: JSON.canonicalize()

Anders Rundgren-2
On 2018-03-16 19:51, Mike Samuel wrote:

>
>
> On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <[hidden email] <mailto:[hidden email]>> wrote:
>
>     On 2018-03-16 19:30, Mike Samuel wrote:
>
>         2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
>               using a fixed threshold for scientific notation.
>               PROS: supports whole JSON value-space
>               CONS: less useful for hashing
>               CONS: risks loss of precision when decoders decide based on presence of
>                  decimal point whether to represent as double or int.
>
>
>     Have you actually looked into the specification?
>     https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2 <https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2>
>     ES6 has all it takes.
>
>
> Yes, but other notions of canonical equivalence have been mentioned here
> so reasons to prefer one to another seem in scope.

Availability beats perfection anytime.  This is the VHS (if anybody remembers that old story) of canonicalization and I don't feel too bad about that :-)

Anders


Re: JSON.canonicalize()

Mike Samuel


On Fri, Mar 16, 2018 at 3:03 PM, Anders Rundgren <[hidden email]> wrote:

Availability beats perfection anytime.  This is the VHS (if anybody remembers that old story) of canonicalization and I don't feel too bad about that :-)

Perhaps.  Any thoughts on my question about the merits of "Hashable" vs "Canonical"? 


Re: JSON.canonicalize()

C. Scott Ananian
I think the horse is out of the barn re hashable-vs-canonical.  It has (independently) been invented and named canonical JSON many many times, starting 11 years ago.


"Content Addressable JSON" is a variant of your "hashable JSON" proposal, though.  But the "canonicals" seem to vastly outnumber the "hashables".

My question for Anders is: do you actually plan to incorporate any feedback into changes to your proposal?  Or were you really just looking for us to validate your work, not actually contribute to it?
 --scott
