EAI support in Thunderbird

EAI support in Thunderbird

Joshua Cranmer 🐧
In the process of doing background research for a blog post I'm writing,
I started looking at what we need to do to be fully compliant with EAI
specifications. To my knowledge, no other major email client supports
EAI (i.e., no major webmail clients, not Apple Mail, Outlook, etc.), so
if we implemented it, we would be one of the first, which would be a
very nice item to add to our feature list. The downside is that email
servers also don't appear to be moving quickly to implement it, so I may
end up patching Dovecot and Exim to test this functionality (I'm
currently setting up a full email distribution network on my desktop
computer using VMs).

EAI, which stands for Email Address Internationalization, is the ability
to use full Unicode in email addresses, which are historically pure
ASCII. So an email address like 🐧@☃.net [1] could be used instead of
[hidden email]. It is specified in RFCs 6530-6533, RFC 6783, and
RFC 6855-6858. I'll pause a moment to let you read those RFCs.

.

.

.

.

.

Since you probably didn't take the time to read them, I'll summarize the
key changes that need to be made in all of our major components to
follow the new specifications properly. The EAI specifications break
things down into two categories (ASCII and non-ASCII), but it is better
to break it down into three categories (ASCII, IDN, and EAI). The
distinguishing characteristic is whether encoding into Punycode can make
the email address all-ASCII.
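
The three-way split can be sketched in a few lines of Python. This is
only an illustration: the helper name classify_address and the category
labels are made up here, and it assumes the address has already been
separated from any display name.

```python
def classify_address(address: str) -> str:
    """Return "ascii", "idn", or "eai" for a bare address (hypothetical helper)."""
    # Split on the last "@" so that unusual local parts stay intact.
    local, _, domain = address.rpartition("@")
    if not local.isascii():
        # Punycode only applies to domain names, so a non-ASCII local part
        # can never be made all-ASCII: this is a true EAI address.
        return "eai"
    if not domain.isascii():
        # Only the domain is non-ASCII; Punycode can make the address ASCII.
        return "idn"
    return "ascii"
```

Under this sketch, 🐧@☃.net lands in the EAI category, while an address
such as user@bücher.example only needs IDN handling.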

Parsing:
1. Headers are now UTF-8 instead of ASCII (header names remain ASCII,
however).
2. The local parts of email addresses (everything before the @) have no
restrictions except that they cannot contain the control characters of
Unicode. Not even a normalization requirement.
3. Many message/* types have a "global" variant that supports non-ASCII
characters.
4. message/global can have a content-transfer-encoding of base64 or
quoted-printable.
--> Changes we need to make:
A. Modify the header parser to prefer parsing as UTF-8 if possible
before falling back into the charset we guess it is. I already have a
patch that effectively does this, just awaiting review.
B. Add message/global and friends as aliases for message/rfc822 and
friends in our parser. Since we already handle 8-bit characters in
headers, there should be no need to differentiate between the two of
them (and I would not be surprised if use of message/global ends up
being rarer than hoped).
C. Comparing two addresses for equality usually means doing a
case-insensitive comparison right now. Since this isn't quite
sufficient (for example, the Unicode and Punycode forms of an IDN domain
name the same mailbox but differ as strings), it would probably be a
good idea to add a method somewhere that asks if two email addresses are
equivalent and use that instead of ad hoc comparison code.
[ We already support base64/qp'd message/rfc822, since that does come up
in practice. ]
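
Changes A and C could look roughly like the following Python sketch. The
function names, the windows-1252 fallback, and the exact equivalence
rules are assumptions made for illustration. In particular, applying NFC
normalization to the local part is a choice made here, not something the
EAI specifications require. Python's built-in "idna" codec implements
IDNA2003.

```python
import unicodedata


def decode_header_value(raw: bytes, guessed_charset: str = "windows-1252") -> str:
    """Prefer UTF-8; fall back to the guessed charset only if UTF-8 fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(guessed_charset, errors="replace")


def _comparison_key(address: str):
    local, _, domain = address.rpartition("@")
    try:
        # Unicode and Punycode spellings of a domain map to the same bytes.
        domain_key = domain.encode("idna").lower()
    except UnicodeError:
        # Not encodable as IDNA2003: compare the raw text instead.
        domain_key = domain.casefold().encode("utf-8")
    # Assumption: keep today's case-insensitive local-part comparison,
    # plus NFC so that composed/decomposed spellings compare equal.
    local_key = unicodedata.normalize("NFC", local).casefold()
    return (local_key, domain_key)


def addresses_equivalent(a: str, b: str) -> bool:
    """Hypothetical single entry point for "are these the same address?"."""
    return _comparison_key(a) == _comparison_key(b)
```

With this sketch, User@BÜCHER.example and user@xn--bcher-kva.example
compare as equivalent, which a plain case-insensitive string comparison
would miss.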

IMAP:
1. You can opt-in to using UTF-8 instead of modified-UTF-7 for mailbox
names (\o/).
2. You have to opt-in to the server not downgrading EAI messages to RFC
2047/RFC 2231 encodings.
3. Authentication protocols let you specify non-ASCII usernames and
passwords. Interestingly, this only applies to SASL authentication
commands for IMAP.
--> Changes we need to make:
A. Identify if the server supports the new features in capabilities and
use them if possible.
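
As a rough sketch of that capability check: RFC 6855 defines the
UTF8=ACCEPT and UTF8=ONLY capabilities, and the client opts in with
ENABLE UTF8=ACCEPT. The function name, the returned dictionary, and the
sample capability strings are hypothetical.

```python
def imap_utf8_plan(capability_data: str):
    """Decide whether to opt in to UTF-8 based on a CAPABILITY response.

    Returns None when the server does not advertise the RFC 6855 features.
    """
    caps = {word.upper() for word in capability_data.split()}
    utf8_only = "UTF8=ONLY" in caps
    # UTF8=ONLY implies that UTF8=ACCEPT is supported as well.
    utf8_accept = utf8_only or "UTF8=ACCEPT" in caps
    if not (utf8_accept and "ENABLE" in caps):
        return None
    return {
        # Command to send after authenticating.
        "enable_command": "ENABLE UTF8=ACCEPT",
        # Once enabled, mailbox names are UTF-8 rather than modified UTF-7.
        "utf8_mailbox_names": True,
        # The server refuses to serve clients that do not enable UTF-8.
        "server_requires_utf8": utf8_only,
    }
```

A server advertising "IMAP4rev1 ENABLE UTF8=ACCEPT IDLE" would yield a
plan, while one advertising only "IMAP4rev1 IDLE" would return None and
leave the existing code paths in use.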

POP:
1. Similar changes to auth, and ability to request not downgrading UTF-8.
2. Server messages can be localized (they can be too in IMAP, but that
is a separate RFC).
--> Changes we need to make:
A. Similar to IMAP.
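
The POP3 side might be sketched the same way. RFC 6856 adds a UTF8
capability (with an optional USER argument for UTF-8 credentials) and a
LANG capability to the CAPA response. The function name and the shape of
the returned plan are assumptions made for this illustration.

```python
def pop3_utf8_plan(capa_lines):
    """Inspect CAPA response lines for the RFC 6856 extensions."""
    plan = {
        "send_utf8_command": False,  # issue "UTF8" to avoid downgraded messages
        "utf8_credentials": False,   # USER/PASS may carry UTF-8
        "lang_supported": False,     # localized server messages via LANG
    }
    for line in capa_lines:
        words = line.upper().split()
        if not words:
            continue
        if words[0] == "UTF8":
            plan["send_utf8_command"] = True
            plan["utf8_credentials"] = "USER" in words[1:]
        elif words[0] == "LANG":
            plan["lang_supported"] = True
    return plan
```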

IDN changes we need to make:
A. Avoid displaying Punycode variants of an IDN, unless showing the
Unicode form would enable a homograph attack. Reusing Firefox's policy
here should be sufficient.
B. I want to keep the Punycode variant of the domain name an
implementation detail hidden in MIME parsing/MIME writing as much as
possible.
C. We should make sure that users can specify servers and accounts with
IDN [read: have tests for this sort of stuff].
D. Account autoconfiguration needs to properly support IDN domain names.
For logic that attempts to use user@host as a login name, we probably
need to make sure that it tries the Unicode variant of the hostname
instead of the Punycode variant.
E. Make sure that the address book and compose support IDN names without
complaining about errors in the format of email addresses.
F. I need to coordinate a bit with smontagu or other people who actually
understand IDN to make sure that I properly understand all consequences.
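
For point D, a minimal Python sketch of the conversion and of the login
name ordering follows. It relies on Python's built-in "idna" codec,
which implements IDNA2003 rather than IDNA2008. The helper names are
hypothetical, and no homograph policy is applied here.

```python
def to_punycode(host: str) -> str:
    """Return the all-ASCII (Punycode) form of a host name."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Return the Unicode display form of a host name."""
    return to_punycode(host).encode("ascii").decode("idna")


def login_candidates(local: str, host: str):
    """Login names to try for user@host: Unicode host first, then Punycode."""
    candidates = [f"{local}@{to_unicode(host)}", f"{local}@{to_punycode(host)}"]
    # Drop the duplicate that appears when the host is plain ASCII.
    return list(dict.fromkeys(candidates))
```

Whichever spelling the user typed, the Unicode form is tried first and
the Punycode form is kept as a fallback.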

Changes to compose and SMTP:
I want to change the basic model of how compose works. The model I've
been toying with is that we change the composition code to essentially
generate a fully internationalized email message, and
then convert that to a lesser form by downgrading (i.e., using RFC
2047/2231). An ancillary change that would need to be made is to
deemphasize the current model, where every header is stored and
manipulated in full string form [2], in favor of one that keeps things
in a more easy-to-work-with structured form that treats the MIME format
as a minor implementation detail. Supporting EAI properly requires
feedback from SMTP about what capabilities it supports (particularly in
being able to use 8BITMIME and SMTPUTF8). The EAI specifications give no
guidance on what to do, so the following algorithm is roughly what I'm
proposing be implemented eventually:

For every email address [3], associate a trivalent value for "can
receive EAI." The three values are yes, no, and don't know. "Don't know"
means we turn to a hidden preference (defaulting to false, eventually to
be flipped to true when support is more widespread) to answer the
question. For anyone
with an EAI address, this value is fixed to yes; for everyone else, it
defaults to don't know.
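
The trivalent value and its resolution could be modeled as below. This
is a sketch only: the enum, the function name, and the parameter
standing in for the hidden preference are all hypothetical, and an "EAI
address" is taken here to mean one with a non-ASCII local part.

```python
from enum import Enum


class EaiSupport(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "don't know"


def can_receive_eai(address: str, stored: EaiSupport,
                    assume_when_unknown: bool = False) -> bool:
    """Resolve the stored trivalent value into a yes/no answer.

    assume_when_unknown stands in for the hidden preference, which
    defaults to false.
    """
    local, _, _domain = address.rpartition("@")
    if not local.isascii():
        # Anyone with an EAI address can necessarily receive EAI mail.
        return True
    if stored is EaiSupport.UNKNOWN:
        return assume_when_unknown
    return stored is EaiSupport.YES
```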

When sending a message, we first figure out if the SMTP server supports
EAI (there would be a hidden preference which could force the answer to
yes). If it does not, we need to attempt to obtain ASCII addresses for
all recipients. This would be done, I presume, by looking in the address
book for other email addresses. If we can't find an ASCII address, we
report inability to send. Sending without SMTP EAI support proceeds by
converting all EAI addresses to their alternative ASCII addresses and
all IDN addresses to their Punycode variants, and then sending a
downgraded message.
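
A minimal sketch of that recipient downgrade, in Python. The function
name is hypothetical, and the ascii_alternates dictionary stands in for
whatever address book lookup ends up being used. Python's "idna" codec
(IDNA2003) does the Punycode conversion.

```python
def downgrade_recipients(recipients, ascii_alternates):
    """Return all-ASCII addresses for every recipient, or raise ValueError.

    ascii_alternates maps an EAI address to an ASCII address for the same
    person (a stand-in for an address book lookup).
    """
    downgraded = []
    for address in recipients:
        local, _, domain = address.rpartition("@")
        if not local.isascii():
            # EAI address: only a separately known ASCII address will do.
            alternate = ascii_alternates.get(address)
            if alternate is None:
                raise ValueError(f"no ASCII alternative for {address}")
            downgraded.append(alternate)
        elif not domain.isascii():
            # IDN address: the domain can be rewritten mechanically.
            downgraded.append(local + "@" + domain.encode("idna").decode("ascii"))
        else:
            downgraded.append(address)
    return downgraded
```

The ValueError corresponds to the "report inability to send" case. The
message headers would still need their own RFC 2047/2231 downgrade,
which this sketch does not cover.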

If the SMTP server supports EAI, we need to figure out which recipients
can support it. If at least one recipient can't support EAI, we attempt
to downgrade it as above. If that fails, due to lacking an alternate
ASCII address, instead of aborting the send, we prepare two copies of
the message. The original copy is sent to everyone who supports EAI. A
downgraded variant that replaces problematic email addresses with
invalid email address markers is sent to everyone who doesn't support EAI.
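
The decision for an EAI-capable SMTP server could be sketched like this.
The names SendPlan and plan_send are hypothetical. The supports_eai
callable and the all_downgradable flag stand in for the per-recipient
"can receive EAI" value and the result of the address book lookup.

```python
from dataclasses import dataclass


@dataclass
class SendPlan:
    original_to: list           # recipients who get the untouched EAI message
    downgraded_to: list         # recipients who get the downgraded copy
    uses_invalid_markers: bool  # True if EAI addresses had no ASCII alternate


def plan_send(recipients, supports_eai, all_downgradable: bool) -> SendPlan:
    """Plan a send through an SMTP server that supports EAI."""
    legacy = [r for r in recipients if not supports_eai(r)]
    if not legacy:
        # Everyone can take the message as composed.
        return SendPlan(list(recipients), [], False)
    if all_downgradable:
        # Every EAI address has an ASCII alternate: one downgraded message.
        return SendPlan([], list(recipients), False)
    # Downgrade failed: send two copies instead of aborting.
    capable = [r for r in recipients if supports_eai(r)]
    return SendPlan(capable, legacy, True)
```

Only the last branch produces two copies, and it is the only one where
the downgraded copy carries invalid email address markers.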


Everything except the composition step is fairly straightforward and is
probably worth implementing prior to TB 31. Implementing the composition
step probably requires gutting compose and starting all over. It does
require some design points for the address book/ensemble though (so
mconley better not have given up reading this post yet), and it is
probably worth coordinating with Gaia folks about their plans in this
regard.

Questions/comments/thoughts/concerns/inquiries/trolls/flames/rebuttals/ideas?

[1] Pedants will note that ☃.net is not valid under IDNA2008 rules, but
the .net registrar appears to still be following IDNA2003 rules for the
time being...
[2] This makes mailing lists in compose both extremely fragile and
amazingly time consuming. Also, don't ask how many times we convert the
header between UTF-8 and UTF-16.
[3] This is meant to mirror the HTML send preference for contacts, which
is proposed to move into a separate, not-address-book facility. I'd like
to store this value in a similar, separate, not-address-book facility,
although UI may need to be driven from the address-book.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

_______________________________________________
dev-apps-thunderbird mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-apps-thunderbird

Re: EAI support in Thunderbird

peterpasschier
Great, I'm very interested in this area, but without clients (Gmail doesn't accept non-ASCII either) it's a non-starter.

Re: EAI support in Thunderbird

insensitiveclod42
In reply to this post by Joshua Cranmer 🐧
This is by far the best overview that I've found, so far, about what it takes to properly do EAI.
I recently registered 'トトロ.みんな' as a test to see just how far 'the internet' is in supporting all that's required to make it work.
Gmail/Google has declared their intent to support IDN/EAI in their products, and I can indeed mail from/to my Gmail account from/to there without too much fuss. Too bad that it displays Punycode versions in the 'From:' headers, but so be it.

However, using postmaster@トトロ.みんな as an 'identity' in Thunderbird fails completely. I have to enter the Punycode version of the address or it'll barf.
There's yet a bit of work to be done, indeed; however, an overview of exactly what that work is is greatly appreciated and very, very welcome.

Thank you for this!
