In the process of doing background research for a blog post I'm writing,
I started looking at what we need to do to be fully compliant with the EAI specifications. To my knowledge, no other major email client supports EAI (i.e., no major webmail clients, not Apple Mail, Outlook, etc.), so if we implemented it, we would be one of the first, which would give us a very nice addition to our feature list. The downside is that email servers don't appear to be moving quickly to implement it either, so I may end up patching Dovecot and Exim to test this functionality (I'm currently setting up a full email distribution network on my desktop computer using VMs).

EAI, which stands for Email Address Internationalization, is the ability to use full Unicode in email addresses, which have historically been pure ASCII. So an email address like 🐧@☃.net [1] could be used instead of [hidden email]. It is specified in RFCs 6530-6533, RFC 6783, and RFCs 6855-6858.

I'll pause a moment to let you read those RFCs. . . . . . .

Since you probably didn't take the time to read them, I'll summarize the key changes that need to be made in all of our major components to follow the new specifications properly. The EAI specifications break things down into two categories (ASCII and non-ASCII), but it is better to break it down into three categories (ASCII, IDN, and EAI). The distinguishing characteristic is whether encoding into Punycode can make the email address all-ASCII: an IDN address has a non-ASCII domain but an ASCII local part, so it can; an EAI address has a non-ASCII local part, so it cannot.

Parsing:
1. Headers are now UTF-8 instead of ASCII (header names remain ASCII, however).
2. The local parts of email addresses (everything before the @) have no restrictions except that they cannot contain Unicode control characters. There is not even a normalization requirement.
3. Many message/* types have a "global" variant that supports non-ASCII characters.
4. message/global can have a content-transfer-encoding of base64 or quoted-printable.

--> Changes we need to make:
A. Modify the header parser to prefer parsing as UTF-8 if possible before falling back to the charset we guess it is.
I already have a patch that effectively does this; it is just awaiting review.
B. Add message/global and friends as aliases for message/rfc822 and friends in our parser. Since we already handle 8-bit characters in headers, there should be no need to differentiate between the two of them (and I would not be surprised if use of message/global ends up being rarer than hoped).
C. Comparing two addresses for equality usually means doing a case-insensitive comparison right now. Since this isn't quite sufficient, it would probably be a good idea to add a method somewhere that asks if two email addresses are equivalent and use that instead of global code.
[ We already support base64/qp'd message/rfc822, since that does come up in practice. ]

IMAP:
1. You can opt in to using UTF-8 instead of modified UTF-7 for mailbox names (\o/).
2. You have to opt in to the server not downgrading EAI messages to RFC 2047/RFC 2231 encodings.
3. Authentication protocols let you specify non-ASCII usernames and passwords. Interestingly, this only applies to SASL authentication commands for IMAP.

--> Changes we need to make:
A. Identify if the server supports the new features in capabilities and use them if possible.

POP:
1. Similar changes to auth, and the ability to request not downgrading UTF-8.
2. Server messages can be localized (they can be in IMAP too, but that is a separate RFC).

--> Changes we need to make:
A. Similar to IMAP.

IDN changes we need to make:
A. Avoid displaying Punycode variants of an IDN, unless it causes homograph attacks. Reusing Firefox's policy here should be sufficient.
B. I want to keep the Punycode variant of the display name an implementation detail hidden in MIME parsing/MIME writing as much as possible.
C. We should make sure that users can specify servers and accounts with IDN [read: have tests for this sort of stuff].
D. Account autoconfiguration needs to properly support IDN domain names.
For logic that attempts to use user@host as a login name, we probably need to make sure that it tries the Unicode variant of the hostname instead of the Punycode variant.
E. Make sure that the address book and compose support IDN names without complaining about errors in the format of email addresses.
F. I need to coordinate a bit with smontagu or other people who actually understand IDN to make sure that I properly understand all consequences.

Changes to compose and SMTP:

I want to change the basic model of how compose works. The basic model I've been playing with in my mind is that we change the composition code to essentially generate a fully internationalized email message, and then convert that to a lesser form by downgrading (i.e., using RFC 2047/2231). An ancillary change that would need to be made is to deemphasize the current model, where every header is stored and manipulated in full string form [2], in favor of one that keeps things in an easier-to-work-with structured form that treats the MIME format as a minor implementation detail.

Supporting EAI properly requires feedback from SMTP about what capabilities it supports (particularly in being able to use 8BITMIME and SMTPUTF8). The EAI specifications give no guidance on what to do, so the following algorithm is roughly what I'm proposing be implemented eventually:

For every email address [3], associate a trivalent value for "can receive EAI." The three values are yes, no, and don't know. "Don't know" means we turn to a hidden preference (defaulting to false, eventually to be flipped to true when support is more widespread) to answer the question. For anyone with an EAI address, this value is fixed to yes; for everyone else, it defaults to no.

When sending a message, we first figure out if the SMTP server supports EAI (there would be a hidden preference which could force the answer to yes). If it does not, we need to attempt to obtain ASCII addresses for all recipients.
This would be done, I presume, by looking in the address book for other email addresses. If we can't find an ASCII address, we report inability to send. Sending without SMTP EAI support proceeds by converting all EAI addresses to alternative ASCII addresses and all IDN addresses to their Punycode variants, and then sending a downgraded message.

If the SMTP server supports EAI, we need to figure out which recipients can support it. If at least one recipient can't support EAI, we attempt to downgrade the message as above. If that fails because an alternate ASCII address is missing, then instead of aborting the send, we prepare two copies of the message. The original copy is sent to everyone who supports EAI. A downgraded variant that replaces problematic email addresses with invalid email address markers is sent to everyone who doesn't support EAI.

Everything except the composition step is fairly straightforward and is probably worth implementing prior to TB 31. Implementing the composition step probably requires gutting compose and starting all over. It does require some design points for the address book/ensemble though (so mconley better not have given up reading this post yet), and it is probably worth coordinating with Gaia folks about their plans in this regard.

Questions/comments/thoughts/concerns/inquiries/trolls/flames/rebuttals/ideas?

[1] Pedants will note that ☃.net is not valid under IDNA2008 rules, but the .net registrar appears to still be following IDNA2003 rules for the time being...
[2] This makes mailing lists in compose both extremely fragile and amazingly time consuming. Also, don't ask how many times we convert the header between UTF-8 and UTF-16.
[3] This is meant to mirror the HTML send preference for contacts, which is proposed to move into a separate, not-address-book facility. I'd like to store this value in a similar, separate, not-address-book facility, although the UI may need to be driven from the address book.
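
To make the sending algorithm above a bit more concrete, here is a rough Python sketch of the decision logic. It is only an illustration under assumptions: the names (CanReceiveEai, Recipient, plan_send, the "downgraded-with-invalid-markers" label) are all made up for this example and do not correspond to any existing Thunderbird code, the address-book lookup is reduced to an optional ascii_alternative field, and Python's built-in "idna" codec (which follows IDNA2003) stands in for whatever Punycode conversion we would really use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class CanReceiveEai(Enum):
    """Hypothetical trivalent per-address value."""
    YES = 1
    NO = 2
    DONT_KNOW = 3


@dataclass
class Recipient:
    address: str
    # Stand-in for "look in the address book for another address".
    ascii_alternative: Optional[str] = None
    can_receive_eai: CanReceiveEai = CanReceiveEai.NO


def is_eai_address(address: str) -> bool:
    # EAI (as opposed to IDN) means the local part itself is non-ASCII.
    return not address.rpartition("@")[0].isascii()


def supports_eai(r: Recipient, assume_unknown: bool) -> bool:
    if is_eai_address(r.address):
        return True  # fixed to "yes" for anyone with an EAI address
    if r.can_receive_eai is CanReceiveEai.DONT_KNOW:
        return assume_unknown  # the hidden preference decides
    return r.can_receive_eai is CanReceiveEai.YES


def ascii_address(r: Recipient) -> Optional[str]:
    """All-ASCII address for r, or None if we cannot find one."""
    if is_eai_address(r.address):
        return r.ascii_alternative
    local, _, domain = r.address.rpartition("@")
    # IDN domains become Punycode; ASCII domains pass through unchanged.
    return local + "@" + domain.encode("idna").decode("ascii")


def plan_send(recipients: List[Recipient], smtp_supports_eai: bool,
              assume_unknown: bool = False) -> List[Tuple[str, List[str]]]:
    """Return (message variant, envelope recipients) pairs to send."""
    downgraded = [ascii_address(r) for r in recipients]
    can_downgrade_all = all(a is not None for a in downgraded)

    if not smtp_supports_eai:
        if not can_downgrade_all:
            raise ValueError("no ASCII address for some recipient")
        return [("downgraded", downgraded)]

    capable = [r for r in recipients if supports_eai(r, assume_unknown)]
    incapable = [r for r in recipients if not supports_eai(r, assume_unknown)]
    if not incapable:
        return [("original", [r.address for r in recipients])]
    if can_downgrade_all:
        return [("downgraded", downgraded)]
    # Two copies: the original for EAI-capable recipients, and a downgraded
    # variant with invalid-address markers for everyone else.
    return [
        ("original", [r.address for r in capable]),
        ("downgraded-with-invalid-markers",
         [ascii_address(r) for r in incapable]),
    ]
```

For example, plan_send([Recipient("alice@example.com"), Recipient("🐧@example.net")], smtp_supports_eai=True) would yield two copies, because the second recipient has no ASCII alternative, while the same call with smtp_supports_eai=False would report an inability to send.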
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
_______________________________________________
dev-apps-thunderbird mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-apps-thunderbird
Great, I'm very interested in this area, but without clients (Gmail doesn't accept non-ASCII either) it's a non-starter.
In reply to this post by Joshua Cranmer 🐧
This is by far the best overview I've found of what it takes to do EAI properly.
I recently registered 'トトロ.みんな' as a test to see just how far 'the internet' is in supporting all that's required to make it work. Gmail/Google has declared their intent to support IDN/EAI in their products, and I can indeed mail from/to my Gmail account from/to there without too much fuss. Too bad that it displays Punycode versions in the 'From:' headers, but so be it.

However, using postmaster@トトロ.みんな as an 'identity' in Thunderbird fails completely. I have to enter the Punycode version of the address or it'll barf. There's still a bit of work to be done, indeed; however, an overview of exactly what that work is is greatly appreciated and very, very welcome.

Thank you for this!