Re: Intent to remove: UTF-16 encoders from the XUL platform

Philipp Kewisch-2
Hello Thunderbird Folks,

if this is an issue for Thunderbird we should speak up, and/or
contribute support for UTF-16 in encoding-rs. Maybe UTF-16 is used for
some languages in email?

Please follow up to m.d.a.thunderbird if it is not relevant for the
platform folks; otherwise use the thread on m.d.platform.

Philipp

On 6/1/16 8:31 PM, Henri Sivonen wrote:

> UTF-16 encoders are not part of the Web Platform. Therefore, it
> doesn't make sense to implement them in a Web-oriented encoding
> library that has the Encoding Standard as its conformance target. As a
> result, I intend not to add UTF-16 encoders to encoding-rs
> (https://github.com/hsivonen/encoding-rs).
>
> As a result, once the time comes to replace uconv with encoding-rs
> (see https://groups.google.com/d/msg/mozilla.dev.platform/sefrg5Of8tw/_WK7Vtk9AAAJ
> ), UTF-16 encoders will get removed from the XUL platform as a side
> effect.
>
> Note: "Encoders" above means components that output a byte stream for
> interchange. It's always a bad idea to use UTF-16 for interchange.
> There's still going to be support for conversions where the output is an
> application-internal sequence of char16_t. (Those are pretty vital to
> Gecko!)
>

_______________________________________________
dev-apps-thunderbird mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-apps-thunderbird

Re: Intent to remove: UTF-16 encoders from the XUL platform

R Kent James
I don't know what the issues are here. Is there any way we can figure
out if we are using the code that they are threatening to remove?

Kent James


Re: Intent to remove: UTF-16 encoders from the XUL platform

TRC-2
Do we need the UTF-16
--
Ron K.
Thunderbird user since May, 2003

Re: Intent to remove: UTF-16 encoders from the XUL platform

TRC-2
In reply to this post by R Kent James
Don't we need the UTF-16 for the CJK languages?

Ron K.
Thunderbird user since May, 2003


Re: Intent to remove: UTF-16 encoders from the XUL platform

Joshua Cranmer 🐧
In reply to this post by R Kent James
On 6/1/2016 6:29 PM, R Kent James wrote:
> I don't know what the issues are here. Is there any way we can figure
> out if we are using the code that they are threatening to remove?

This impacts the TextEncoder API primarily, which isn't used in mailnews
code outside of RFC 2047 encoding (where we use UTF-8 exclusively
anyways) AFAIK. The goal seems to also be to remove the encoder accessed
via nsIUnicodeEncoder, which we don't seem to use at all judging from a
search for UTF-16. The only case I wanted it was where I was
implementing SASL NTLM authentication, but UTF-16 encoding is simple
enough to implement in base JS as it is. Note that we don't support
composing to UTF-16 (it really is a nasty charset to use for a message,
since it's not a superset of ASCII).
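
To illustrate how little code a hand-written UTF-16 encoder needs, here is a
minimal sketch. It is in Python rather than the JS mentioned above, the
function name encode_utf16le is hypothetical, and little-endian output without
a BOM is an assumption made for the example. It also shows why UTF-16 is not
a superset of ASCII: plain ASCII text does not come out as ASCII bytes.

```python
def encode_utf16le(text: str) -> bytes:
    """Encode text as UTF-16LE (no BOM), building surrogate pairs by hand."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # A lone surrogate is not a valid scalar value.
            raise ValueError("lone surrogate cannot be encoded")
        if cp >= 0x10000:
            # Astral code point: split into a high and a low surrogate.
            cp -= 0x10000
            units = (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF))
        else:
            units = (cp,)
        for unit in units:
            out += unit.to_bytes(2, "little")
    return bytes(out)


sample = "Hi \U0001F427"
print(encode_utf16le(sample).hex())

# ASCII input gains a zero byte per character, so the result is not ASCII.
print(encode_utf16le("Hi") == b"Hi")
```

The output can be checked against Python's built-in "utf-16-le" codec, which
should agree byte for byte.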

Speaking of which, we should probably look at trimming our compose
charset support to UTF-8, ISO-8859-1, ISO-2022-JP, Big5, and GB18030 (I
think that's the minimal set we need). At the very least, we should look
at killing all non-CJK charsets except for ISO-8859-1 and UTF-8.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist


Re: Intent to remove: UTF-16 encoders from the XUL platform

ISHIKAWA,chiaki
In reply to this post by TRC-2
Ron K. wrote:
> Don't we need the UTF-16 for the CJK languages?
>
> Ron K.
> Thunderbird user since May, 2003
>

As for Japanese, support for UTF-8 and ISO-2022-JP is required, but
support for UTF-16 is not.
I think Korean and Chinese do not need UTF-16 either.

TIA



Re: Intent to remove: UTF-16 encoders from the XUL platform

Henri Sivonen-2
In reply to this post by Joshua Cranmer 🐧
On Thu, Jun 2, 2016 at 6:04 PM, Joshua Cranmer 🐧 <[hidden email]> wrote:
> Speaking of which, we should probably look at trimming our compose charset
> support to UTF-8, ISO-8859-1, ISO-2022-JP, Big5, and GB18030 (I think that's
> the minimal set we need). At the very least, we should look at killing all
> non-CJK charsets except for ISO-8859-1 and UTF-8.

I know why UTF-8 and ISO-2022-JP are on that list, but what's the
rationale for ISO-8859-1, Big5 and GB18030 being on that list?

ISO-8859-1 as an encoding distinct from windows-1252 is on track for
removal from m-c (https://bugzilla.mozilla.org/show_bug.cgi?id=1071470
but in practice will happen as part of
https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs).
(windows-1252 will remain, of course, and "ISO-8859-1" will remain as
one of its labels.)
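
A small sketch of where the two encodings differ, using Python's "latin-1"
and "cp1252" codecs as stand-ins (an assumption for illustration; Python's
cp1252 is not identical to the Encoding Standard's windows-1252, which also
assigns the five bytes that Python leaves undefined). The difference is
confined to bytes 0x80-0x9F; the rest of the upper half decodes identically.

```python
raw = bytes([0x80, 0x93, 0x94])

# windows-1252 assigns printable characters to these bytes...
as_1252 = raw.decode("cp1252")
# ...while ISO-8859-1 maps them to C1 control characters.
as_8859_1 = raw.decode("latin-1")

print([hex(ord(c)) for c in as_1252])
print([hex(ord(c)) for c in as_8859_1])

# From 0xA0 upward the two encodings agree.
upper = bytes(range(0xA0, 0x100))
print(upper.decode("cp1252") == upper.decode("latin-1"))
```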

--
Henri Sivonen
[hidden email]
https://hsivonen.fi/

Re: Intent to remove: UTF-16 encoders from the XUL platform

Joshua Cranmer 🐧
In reply to this post by Joshua Cranmer 🐧
On 8/25/2016 3:22 AM, Henri Sivonen wrote:
> On Thu, Jun 2, 2016 at 6:04 PM, Joshua Cranmer 🐧 <[hidden email]> wrote:
>> Speaking of which, we should probably look at trimming our compose charset
>> support to UTF-8, ISO-8859-1, ISO-2022-JP, Big5, and GB18030 (I think that's
>> the minimal set we need). At the very least, we should look at killing all
>> non-CJK charsets except for ISO-8859-1 and UTF-8.
> I know why UTF-8 and ISO-2022-JP are on that list, but what's the
> rationale for ISO-8859-1, Big5 and GB18030 being on that list?

I don't know if Big5 or GB18030 should be on the list, my knowledge of
practices in the Sinosphere is pretty poor. Someone did post a comment
on a bug suggesting that software in China had to support GB18030,
although I can't substantiate that comment or analyze its consequences.
ISO-8859-1(/Windows-1252) is included largely because, if there ever is
a reason to change a charset manually, that is the charset you're going
to change to (outside of the jp/UTF-8 pissing match).

> ISO-8859-1 as an encoding distinct from windows-1252 is on track for
> removal from m-c (https://bugzilla.mozilla.org/show_bug.cgi?id=1071470
> but in practice will happen as part of
> https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs).
> (windows-1252 will remain, of course, and "ISO-8859-1" will remain as
> one of its labels.)

I'm using ISO-8859-1 as a synonym for Windows-1252 here. In theory, a
mode that outputs pure ISO-8859-1 and kicks up to UTF-8 if not possible
would be preferable, but I don't care enough to do extra work to make it
happen.
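
For what it's worth, the "pure ISO-8859-1, else UTF-8" mode could look
roughly like the sketch below. This is Python rather than anything in the
mailnews code, and the function name pick_compose_charset is hypothetical;
it only illustrates the fallback logic, not how compose actually works.

```python
from typing import Tuple


def pick_compose_charset(body: str) -> Tuple[str, bytes]:
    """Return (charset label, encoded body), preferring pure ISO-8859-1."""
    try:
        # Succeeds only if every character is in the range U+0000..U+00FF.
        return "ISO-8859-1", body.encode("latin-1")
    except UnicodeEncodeError:
        # Anything outside that range kicks the whole message up to UTF-8.
        return "UTF-8", body.encode("utf-8")


print(pick_compose_charset("café")[0])
print(pick_compose_charset("price: 5 €")[0])
```

The euro sign is a useful test case: windows-1252 can represent it, but pure
ISO-8859-1 cannot, so this sketch falls back to UTF-8 for it.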


--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist


Re: Intent to remove: UTF-16 encoders from the XUL platform

Henri Sivonen-2
On Thu, Aug 25, 2016 at 6:23 PM, Joshua Cranmer 🐧 <[hidden email]> wrote:
> On 8/25/2016 3:22 AM, Henri Sivonen wrote:
>> On Thu, Jun 2, 2016 at 6:04 PM, Joshua Cranmer 🐧 <[hidden email]> wrote:
>>> Speaking of which, we should probably look at trimming our compose charset
>>> support to UTF-8, ISO-8859-1, ISO-2022-JP, Big5, and GB18030 (I think that's
>>> the minimal set we need). At the very least, we should look at killing all
>>> non-CJK charsets except for ISO-8859-1 and UTF-8.
>> I know why UTF-8 and ISO-2022-JP are on that list, but what's the
>> rationale for ISO-8859-1, Big5 and GB18030 being on that list?
>
> I don't know if Big5 or GB18030 should be on the list, my knowledge of
> practices in the Sinosphere is pretty poor. Someone did post a comment on a
> bug suggesting that software in China had to support GB18030, although I
> can't substantiate that comment or analyze its consequences.


I've seen zero evidence supporting the notion that Big5 would be more
important to retain as an output option than e.g. ISO-8859-2 or KOI8-R.

In the case of GB18030, there seem to be more concerns (based mostly on
stories scaring American enterprise vendors to pay for certification
consultation AFAICT) than solid facts around. It's worth noting that
GB18030 specifies an encoding and a mandatory-to-support subset-of-Unicode
repertoire. I'd expect people to care more about the mandatory-to-support
repertoire (including support for astral characters) than to care about the
encoding on the wire being GB18030.
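
To make the repertoire-versus-encoding distinction concrete, here is a short
sketch using Python's built-in codecs (chosen only for illustration). The same
astral character can be carried on the wire as either GB18030 or UTF-8, while
the older GB2312 cannot represent it at all.

```python
penguin = "\U0001F427"  # an astral (non-BMP) character

# Both encodings cover the full Unicode repertoire, so both round-trip.
as_gb18030 = penguin.encode("gb18030")
as_utf8 = penguin.encode("utf-8")
print(len(as_gb18030), len(as_utf8))
print(as_gb18030.decode("gb18030") == as_utf8.decode("utf-8"))

# GB2312 has a much smaller repertoire and rejects the character.
try:
    penguin.encode("gb2312")
    gb2312_ok = True
except UnicodeEncodeError:
    gb2312_ok = False
print(gb2312_ok)
```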

As an anecdote about the importance of the encoding (as opposed to
repertoire), when I checked a few years ago, the site of the agency
overseeing GB18030 certification used GB2312, and today the site uses
UTF-8: http://www.cesi.cn/index.html

>
> ISO-8859-1(/Windows-1252) is included largely because, if there ever is a
> reason to change a charset manually, that is the charset you're going to
> change to (outside of the jp/UTF-8 pissing match).


I don't really see value in offering a way to manually request
windows-1252. It seems that receiving MUAs are able to deal with UTF-8 just
fine and users who might do the override out of very old habit are likely
to object to windows-1252 as opposed to "real" ISO-8859-1 anyway.

--
Henri Sivonen
[hidden email]
https://hsivonen.fi/