Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Henri Sivonen-2
Japanese *email* is often encoded as ISO-2022-JP, and Web browsers
also support ISO-2022-JP even though Shift_JIS and EUC-JP are the more
common Japanese legacy encodings on the *Web*. The two UTF-16 variants
and ISO-2022-JP are the only remaining encodings in the Web Platform
that encode non-Basic Latin characters to bytes that represent Basic
Latin characters in ASCII.
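
A minimal sketch of that property, assuming Python's built-in
`iso2022_jp`, `utf-16-le` and `utf-8` codecs as stand-ins for the Web
Platform encoders (the sample string is arbitrary):

```python
# Sample text containing no Basic Latin characters at all.
text = "日本語"

# ISO-2022-JP output is 7-bit only: escape sequences plus byte pairs
# that would read as printable ASCII if the escapes were ignored.
iso_bytes = text.encode("iso2022_jp")
iso_is_7bit = all(b < 0x80 for b in iso_bytes)

# UTF-16 code units can likewise contain bytes in the ASCII range.
utf16_bytes = text.encode("utf-16-le")
utf16_ascii_range = bytes(b for b in utf16_bytes if b < 0x80)

# By contrast, UTF-8 never uses ASCII-range bytes for non-ASCII text.
utf8_bytes = text.encode("utf-8")
utf8_ascii_range = bytes(b for b in utf8_bytes if b < 0x80)

print(iso_bytes, iso_is_7bit)
print(utf16_ascii_range, utf8_ascii_range)
```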

There exists an extension of ISO-2022-JP called ISO-2022-JP-2. The
ISO-2022-JP decoder (not encoder) in Gecko supports ISO-2022-JP-2
features, which include the use of characters from JIS X 0212, KS X
1001 (better known as the repertoire for EUC-KR), GB 2312, ISO-8859-1
and ISO-8859-7. The reason originally given for adding ISO-2022-JP-2
support to Gecko was: "I want to add a ISO-2022-JP-2 charset decoder
to Mozilla."[1]
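
For orientation, the ISO-2022-JP-2 additions are selected by escape
sequences listed in RFC 1554. A rough sketch of spotting them in raw
bytes follows; the name `uses_jp2_extensions` is made up for
illustration, and this is not meant to reflect how Gecko's decoder is
structured:

```python
# Escape sequences that RFC 1554 (ISO-2022-JP-2) adds on top of
# plain ISO-2022-JP (RFC 1468).
JP2_ONLY_ESCAPES = {
    b"\x1b$A": "GB 2312",
    b"\x1b$(C": "KS C 5601 (KS X 1001)",
    b"\x1b$(D": "JIS X 0212",
    b"\x1b.A": "ISO-8859-1 high half (G2)",
    b"\x1b.F": "ISO-8859-7 high half (G2)",
}

def uses_jp2_extensions(data):
    """Return names of the ISO-2022-JP-2-only sets designated in data."""
    # In well-formed input ESC (0x1B) never occurs inside character
    # data, so a plain substring search cannot produce a false match.
    return [name for esc, name in JP2_ONLY_ESCAPES.items() if esc in data]

plain = "日本語".encode("iso2022_jp")
# Hand-built sample: designate KS C 5601, one character, back to ASCII.
korean = b"\x1b$(CGQ\x1b(B"

print(uses_jp2_extensions(plain))   # []
print(uses_jp2_extensions(korean))  # ['KS C 5601 (KS X 1001)']
```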

Other browsers don't support this extension, so it clearly can't be a
requirement for the Web Platform, and the Encoding Standard doesn't
include the ISO-2022-JP-2 extension in its definition for the
ISO-2022-JP decoder. Bringing our ISO-2022-JP decoder to compliance[2]
would, therefore, involve removing ISO-2022-JP-2 support.

The only known realistic source of ISO-2022-JP-2 data is Apple's Mail
application under some circumstances, which may impact Thunderbird and
SeaMonkey.

Are there any objections to removing the ISO-2022-JP-2 functionality
from mozilla-central?

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=72468
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=715833
--
Henri Sivonen
[hidden email]
https://hsivonen.fi/
_______________________________________________
dev-apps-thunderbird mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-apps-thunderbird

Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Jörg Knobloch
On 30/11/2015 16:38, Henri Sivonen wrote:
> Are there any objections to removing the ISO-2022-JP-2 functionality
> from mozilla-central?

Hello,

I am currently repairing long-standing issues with CJK e-mail in
general and Japanese e-mail using ISO-2022-JP in particular; see the
bugs listed below.

While working on these bugs I learned that ISO-2022-JP is still widely
used for e-mail in Japan, especially in more conservative circles like
the banking sector.

As far as I can see, there are no objections to removing the
ISO-2022-JP-2 variant as long as ISO-2022-JP itself is maintained.

In Thunderbird, users can send (encode) and view (decode) e-mail using
ISO-2022-JP, but not ISO-2022-JP-2.

Also note:
https://dxr.mozilla.org/mozilla-central/source/intl/uconv/nsTextToSubURI.cpp#163

Jorg K.

https://bugzilla.mozilla.org/show_bug.cgi?id=1225864 - M-C
https://bugzilla.mozilla.org/show_bug.cgi?id=1225904 - C-C
https://bugzilla.mozilla.org/show_bug.cgi?id=653342 - C-C

Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Jonas Sicking-2
In reply to this post by Henri Sivonen-2
On Mon, Nov 30, 2015 at 7:38 AM, Henri Sivonen <[hidden email]> wrote:
> Other browsers don't support this extension, so it clearly can't be a
> requirement for the Web Platform

Generally speaking, I don't think this reasoning is entirely accurate.
We know that there's lots of browser-specific code paths out there, so
just because other browsers don't support a given feature doesn't mean
that removing it from gecko won't affect our users.

Getting telemetry data is generally a better way to go.

That said, I know nothing about the specific encodings involved here,
so maybe there are other reasons to believe that this won't affect our
users.

/ Jonas

Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Adam Roach
In reply to this post by Henri Sivonen-2
On 11/30/15 09:38, Henri Sivonen wrote:
> The only known realistic source of ISO-2022-JP-2 data is Apple's Mail
> application under some circumstances, which may impact Thunderbird and
> SeaMonkey.

Does this mean it might interact with webmail services as well? Or do
they tend to do server-side transcoding from the received encoding to
something like UTF-8?

--
Adam Roach
Principal Platform Engineer
[hidden email]
+1 650 903 0800 x863

Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Andrew Sutherland-5
On Mon, Nov 30, 2015, at 01:24 PM, Adam Roach wrote:
> Does this mean it might interact with webmail services as well? Or do
> they tend to do server-side transcoding from the received encoding to
> something like UTF-8?

They do server-side decoding.  It would take a tremendous amount of
effort to expose the underlying character set directly to the browser,
given that the MIME part also has a transfer encoding applied (base64
or quoted-printable), may use higher-level features like
format=flowed, and may need multipart/related cid-protocol transforms.
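
A minimal sketch of the server-side step, assuming Python's standard
`email` package in place of whatever a real webmail backend uses; the
sample message is constructed by hand purely for illustration:

```python
import base64
from email import message_from_bytes, policy

# Build a tiny message whose body is ISO-2022-JP wrapped in base64.
body = "日本語のメール".encode("iso2022_jp")
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-2022-jp\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body) + b"\r\n"
)

msg = message_from_bytes(raw, policy=policy.default)

# get_content() undoes the transfer encoding and then decodes the
# bytes using the declared charset, yielding a Unicode string.
text = msg.get_content()

# The server can now emit UTF-8 to the browser, which never sees
# the original charset.
utf8_for_browser = text.encode("utf-8")
print(text)
```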

Andrew

Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Henri Sivonen-2
In reply to this post by Jörg Knobloch
On Mon, Nov 30, 2015 at 7:09 PM, Jörg Knobloch <[hidden email]> wrote:
> As far as I can see, there are no objections to removing the
> ISO-2022-JP-2 variant as long as ISO-2022-JP itself is maintained.

Cool. This intent is indeed scoped to the -2 stuff only.

On Mon, Nov 30, 2015 at 7:21 PM, Jonas Sicking <[hidden email]> wrote:
> On Mon, Nov 30, 2015 at 7:38 AM, Henri Sivonen <[hidden email]> wrote:
>> Other browsers don't support this extension, so it clearly can't be a
>> requirement for the Web Platform
>
> Generally speaking, I don't think this reasoning is entirely accurate.
> We know that there's lots of browser-specific code paths out there, so
> just because other browsers don't support a given feature doesn't mean
> that removing it from gecko won't affect our users.

What you say is a concern in the general case, yes. In the specific
case of encodings, it is *very* unlikely that sites would serve
ISO-2022-JP-2 on a per-browser basis.

First of all, if a site has the capability to vary the character
encoding of the HTML it generates, that capability has been
fundamentally useless for the past decade: Just use the capability to
generate UTF-8 for everyone and then quit varying the output.

But Web authors (as a collective, no insult to specific clueful Web
authors implied) are known to do fundamentally useless per-browser
things, so maybe the above isn't convincing. Let's consider what a
hypothetical site that serves ISO-2022-JP-2 to Firefox would serve to
other browsers:

If the site wanted to use JIS X 0212 characters, it could send EUC-JP
to other browsers. But if you can generate EUC-JP, there's no upside to
be had from sending ISO-2022-JP-2 to Firefox. Firefox has had EUC-JP
all this time. There's no reason for anyone to have done this instead
of using EUC-JP for all browsers.

If the site wanted to use Chinese, Korean, Western Latin and Greek
characters alongside Japanese ones, it could send UTF-8 to other
browsers, but again, if you can send UTF-8, just send UTF-8 to
everyone, including Firefox.

What if the site sends ISO-2022-JP + numeric character references to
other browsers? This implies the capability to do more complex
re-encoding, since you need Unicode scalars for the NCRs. However, if
the site depended on ISO-2022-JP inheriting into CSS and JS, NCRs
could be a silly workaround for a setup that just sending UTF-8 would
break. But it seems unlikely that someone is clueful enough to design
for this while not being the sort of author who addresses the problem
with UTF-8.
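
A small sketch of why NCR output presupposes Unicode-aware tooling,
assuming Python's `iso2022_jp` codec and its `xmlcharrefreplace` error
handler as a stand-in for such a site's pipeline (the sample text is
arbitrary):

```python
import html

original = "日本語 / 한국어"  # Hangul is outside plain ISO-2022-JP

# Unencodable characters become decimal NCRs built from their
# Unicode scalar values, so the pipeline must already hold Unicode.
encoded = original.encode("iso2022_jp", errors="xmlcharrefreplace")

# A browser decodes the bytes as ISO-2022-JP, then the HTML parser
# resolves the NCRs.
decoded_markup = encoded.decode("iso2022_jp")
rendered = html.unescape(decoded_markup)

print(decoded_markup)
print(rendered == original)
```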

Finally, one might postulate that an Emacs MULE user created some
content without caring about non-*nix users, back when "everyone" on
*nix used Gecko, or that an insecurely implemented email-to-Web archive
contains messages sent from Apple's Mail. However, we aren't in the
business of supporting hypothetical Gecko-only fringe content.
(Firefox 43 removes support for the previously Firefox-only
Unicode-at-On feature of Big5, for example.)

> Getting telemetry data is generally a better way to go.

ISO-2022-JP in general affects just 0.05% of Firefox release channel
sessions. It's reasonable to assume the number to be higher for users
who read Japanese and even lower for everyone else. We don't have
telemetry for the -2 subfeature, but it has to be *tiny*. Note that
even the less popular ones of the many Cyrillic legacy encodings
affect a larger proportion of sessions *each* (except KOI8-U).

On Mon, Nov 30, 2015 at 8:24 PM, Adam Roach <[hidden email]> wrote:
> On 11/30/15 09:38, Henri Sivonen wrote:
>
> The only known realistic source of ISO-2022-JP-2 data is Apple's Mail
> application under some circumstances, which may impact Thunderbird and
> SeaMonkey.
>
>
> Does this mean it might interact with webmail services as well?

No, for the reasons others gave. The change would be relevant to the
Gaia email app to the extent it's relevant to anyone at all.

--
Henri Sivonen
[hidden email]
https://hsivonen.fi/

Re: Intent to unship: ISO-2022-JP-2 support in the ISO-2022-JP decoder

Henri Sivonen-2
In reply to this post by Henri Sivonen-2
On Mon, Nov 30, 2015 at 5:38 PM, Henri Sivonen <[hidden email]> wrote:

> Japanese *email* is often encoded as ISO-2022-JP, and Web browsers
> also support ISO-2022-JP even though Shift_JIS and EUC-JP are the more
> common Japanese legacy encodings on the *Web*. The two UTF-16 variants
> and ISO-2022-JP are the only remaining encodings in the Web Platform
> that encode non-Basic Latin characters to bytes that represent Basic
> Latin characters in ASCII.
>
> There exists an extension of ISO-2022-JP called ISO-2022-JP-2. The
> ISO-2022-JP decoder (not encoder) in Gecko supports ISO-2022-JP-2
> features, which include the use of characters from JIS X 0212, KS X
> 1001 (better known as the repertoire for EUC-KR), GB 2312, ISO-8859-1
> and ISO-8859-7. The reason originally given for adding ISO-2022-JP-2
> support to Gecko was: "I want to add a ISO-2022-JP-2 charset decoder
> to Mozilla."[1]
>
> Other browsers don't support this extension, so it clearly can't be a
> requirement for the Web Platform, and the Encoding Standard doesn't
> include the ISO-2022-JP-2 extension in its definition for the
> ISO-2022-JP decoder. Bringing our ISO-2022-JP decoder to compliance[2]
> would, therefore, involve removing ISO-2022-JP-2 support.
>
> The only known realistic source of ISO-2022-JP-2 data is Apple's Mail
> application under some circumstances, which may impact Thunderbird and
> SeaMonkey.
>
> Are there any objections to removing the ISO-2022-JP-2 functionality
> from mozilla-central?
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=72468
> [2] https://bugzilla.mozilla.org/show_bug.cgi?id=715833
> --
> Henri Sivonen
> [hidden email]
> https://hsivonen.fi/

Code implementing the above-quoted intent has landed.

--
Henri Sivonen
[hidden email]
https://hsivonen.fi/