Removing libmime functionality


Joshua Cranmer-2
libmime, as you may or may not know, is a very old module. Comments
indicate that its original genesis is at least early 1996, and it has no
references to RFC 2045 but a smattering to RFC 1521 and 1341, even older
variations of MIME. Time has moved on in the 17 years since its genesis,
and I think it's worth considering removing functionality and features
that appear to be almost nonexistent in the modern email world. In my
rewrite of libmime in JS, I am not planning on providing support for
these features at all, and I am willing to countenance wholesale removal
in the current C implementation.

The candidate features are:
* multipart/appledouble - The last bug filed here was filed at the
beginning of 2011, where it was noted that we appear to be mangling
the output data, but no progress has been observed on the bug, and it's
mentioned that few apps even bother with it. I do recall seeing
appledouble crop up in earlier code archaeology as being a potential
security hole.
* x-sun-attachment - I see no open bugs mentioning this, and the name
does not even follow any semblance of MIME rules. I suspect this has
been unused for a long time.
* BinHex support - There's an open bug on this not working correctly on
Mac (though it apparently works on Linux), filed in 2006, with very
little indication that anyone still cares about it not working.
* text/enriched and text/richtext - The RFCs here actually go so far as
to say that they are a temporary solution. We don't generate these
emails (there is, unsurprisingly, an open bug here, filed in 2006, about
doing so), and we don't even parse them properly: we ignore features
(the bug to implement these was WONTFIX'd in 2008), and one of the
translations is to an HTML tag that hasn't been handled by Gecko...
since 2002 (with no one apparently noticing this issue).

I also have lower-level technical features whose utility I am dubious of:
* The forceCharset parameter to RFC 2047 decoding (which overrides the
charset declared in the =? ?= tokens).
* Split and Header display parameters to nsIMimeStreamConverter
* text/xml and text/plain MIME emitters
* rot13, headers= [note: this is not header=] magic MIME URL parameters
* attempts to parse pre-libmime part numbers
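
To make the forceCharset item concrete, here is a minimal Python sketch of RFC 2047 decoding with an optional charset override. It is not libmime's code: `decode_rfc2047` and `force_charset` are hypothetical names, and the standard library's `email.header.decode_header` stands in for the real decoder.

```python
from email.header import decode_header


def decode_rfc2047(value, force_charset=None):
    """Decode RFC 2047 encoded-words in a header value.

    force_charset is a hypothetical stand-in for the forceCharset idea:
    when set, it replaces the charset declared inside each =?...?= token.
    """
    pieces = []
    for data, declared in decode_header(value):
        if isinstance(data, str):
            # No encoded-words were found; the value comes back unchanged.
            pieces.append(data)
            continue
        # Unencoded segments have no declared charset; assume ASCII for them.
        charset = force_charset or declared or "ascii"
        pieces.append(data.decode(charset, errors="replace"))
    return "".join(pieces)


# A sender labels Windows-1252 bytes as ISO-8859-1; the override repairs it.
mislabeled = "=?iso-8859-1?q?=80_5?="
as_declared = decode_rfc2047(mislabeled)
as_forced = decode_rfc2047(mislabeled, force_charset="cp1252")
```

With the declared charset, byte 0x80 decodes to a control character; with the override it decodes to the euro sign. That kind of repair is the only case the override serves in this sketch.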


If you disagree with removing support for these features, please speak
up. Also speak up if you have any other questions, comments, inquiries,
or concerns.

_______________________________________________
dev-apps-thunderbird mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-apps-thunderbird

Re: Removing libmime functionality

Axel Hecht-2
My concern would be the ability of reading archived data.

What happens after this change if you hit an email that uses these
features? Worse still, could there be emails relying on bugs in our impl?

Axel


Re: Removing libmime functionality

Arivald
In reply to this post by Joshua Cranmer-2
On 2013-01-15 06:38, Joshua Cranmer wrote:
> libmime, as you may or may not know, is a very old module. Comments
> indicate that its original genesis is at least early 1996, and it has no
> references to RFC 2045 but a smattering to RFC 1521 and 1341, even older
> variations of MIME. Time has moved on in the 17 years since its genesis,
> and I think it's worth considering removing functionality and features
> that appear to be almost nonexistent in the modern email world. In my
> rewrite of libmime in JS, I am not planning on providing support for
> these features at all, and I am willing to countenance wholesale removal
> in the current C implementation.

Why JS? Emails with attachments can be large; are you sure JS
performance is good enough?

I think it would be better to write it in C++ and move the heaviest
processing off the main thread. As I remember, all JS is executed on
the main thread, and there are already too many problems with that
(the infamous TB "pauses"...).

It would also be good to make it "pluggable", meaning that support
for any MIME feature could be added through some interfaces. This would
allow anyone to add support for any old or new MIME feature. Ideally,
plug-ins could be written in JS.

--
Arivald



Re: Removing libmime functionality

Jonathan Protzenko
In reply to this post by Joshua Cranmer-2
First of all, congratulations on your heroic efforts to get rid of
libmime. Having hacked on it significantly, I think I can truly
appreciate your work :).

A great deal of complexity stems from the fact that libmime has
provisions for old, buggy email clients; I remember reading some
comments about special-casing for one special version of Navigator that
used to send malformed emails... so yay for removing support for old
emails!

A few questions.
- What is the transition plan? One thing we could do is, whenever gloda
indexes a message, have it decoded both by your library and the original
libmime, and see whether the two disagree. That would be a good test for
your library, and it would only affect the jsmimeemitter, not the
regular message display component.
- Have you talked this over with Patrick Brunschwig (Enigmail author)?
There are some people out there who definitely need to be able to plug
into your infrastructure to provide support for extra MIME parts.
- Does your new parser create a MimeMessage as in
mailnews/db/gloda/modules/mimemsg.js?
- If so, do you have plans for creating a MimeMessage → HTML renderer?

Thanks for all your efforts!

jonathan



Re: Removing libmime functionality

Joshua Cranmer-2
In reply to this post by Joshua Cranmer-2
On 1/15/2013 3:44 AM, Jonathan Protzenko wrote:
> - What is the transition plan? One thing we could do is, whenever gloda
> indexes a message, have it decoded both by your library and the original
> libmime, and see whether the two disagree. That would be a good test for
> your library, and it would only affect the jsmimeemitter, not the
> regular message display component.

There are approximately 5 distinct entry points to libmime:
- nsIMimeHeaders (an XPCOM wrapper around the MimeHeaders, er, struct)
- nsIMimeConverter (effectively an exposed RFC 2047 encoding/decoding
library, plus other noscript methods that I have a patch to replace with
a more natural C++ API)
- nsIMsgHeaderParser (effectively a parser for To: and related headers)
- the stream converter
- Gloda
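
For readers unfamiliar with what a parser for To: and related headers has to do, here is a minimal Python sketch. It does not use nsIMsgHeaderParser; the standard library's `email.utils.getaddresses` stands in for it, and `parse_address_header` is a hypothetical wrapper name.

```python
from email.utils import getaddresses


def parse_address_header(value):
    """Split an address header into (display name, address) pairs."""
    return getaddresses([value])


# The quoted comma is the classic case that naive split(",") parsers get wrong.
recipients = parse_address_header(
    '"Doe, Jane" <jane@example.com>, bob@example.org'
)
```

The first recipient keeps its display name "Doe, Jane" intact, while the second has an empty display name. Ad-hoc parsers tend to get exactly this kind of detail wrong.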

The first three interfaces I plan to unconditionally replace with my
implementation, and I already have WIPs for two of them. It turns out
that we actually have pretty decent coverage of these interfaces in
tests, so passing our test suite on these interfaces makes me relatively
confident in my implementation. For gloda and the stream converter, my
plan is to provide a transition by providing alternate implementations
that can be controlled with a preference. My original goal was to have a
prototype in the tree by the time we ship the TB 24 branch, but it looks
like I will slip that schedule.

There are also other places where people should be using libmime but
aren't because, well, they can't, and as a result they code up their own
ad-hoc parsers; my plan is to switch these over as what I have starts
working. The cases I can think of off the top of my head are the
fakeservers, nsMsgBodySearch, and nsMsgDBFolder::GetMessageTextFromStream.

> - Have you talked this over with Patrick Brunschwig (Enigmail author)?
> There are some people out there who definitely need to be able to plug
> into your infrastructure to provide support for extra MIME parts.

No, not yet--I have not yet prototyped this stage, but the needs of
Enigmail are one of the factors that drive my design decisions.
> - Does your new parser create a MimeMessage as in
> mailnews/db/gloda/modules/mimemsg.js?
No. Gloda's model of a mime message is similar to my own, but it is not
quite sufficient for my needs, and I need to investigate how gloda deals
with some of libmime's magic better [1], particularly for the case of
uuencode and yEnc message bodies. Generating a mimemsg.js-style MimeMessage
from my own MIME parser takes fewer than 100 lines of code at present.
> - If so, do you have plans for creating a MimeMessage → HTML renderer?
My plans are to cleanly divide the MIME parser into three separate stages:
1. MIME structure parser
2. MIME structure -> body and attachments view
3. Actually displaying a message in the UI

(although keeping 2 and 3 separate is proving harder than anticipated)
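
The three-stage split can be pictured with a small Python sketch. This is not the planned JS design: the function names are hypothetical, and Python's `email` package plays the role of the structure parser in stage 1.

```python
from email import message_from_string

RAW = (
    "From: a@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="data.bin"\n'
    "\n"
    "abc\n"
    "--XX--\n"
)


def parse_structure(raw):
    """Stage 1: turn raw text into a tree of MIME parts."""
    return message_from_string(raw)


def body_and_attachments(tree):
    """Stage 2: flatten the tree into body parts and attachment names."""
    bodies, attachments = [], []
    for part in tree.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_disposition() == "attachment":
            attachments.append(part.get_filename())
        else:
            bodies.append((part.get_content_type(), part.get_payload()))
    return bodies, attachments


def display(bodies, attachments):
    """Stage 3: render the view; a real UI would build DOM nodes instead."""
    lines = [text.strip() for _, text in bodies]
    lines += ["[attachment: %s]" % name for name in attachments]
    return "\n".join(lines)


bodies, attachments = body_and_attachments(parse_structure(RAW))
print(display(bodies, attachments))
```

Stage 2 returns plain data with no UI knowledge, which is what would let an indexer such as gloda reuse stages 1 and 2 without stage 3.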

[1] One key example here is that gloda lets libmime convert some parts
to HTML but not others--text/enriched being the dominant example
here--which is actually what prompted my featurectomy proposal in the
first place.

Re: Removing libmime functionality

Joshua Cranmer-2
In reply to this post by Axel Hecht-2
On 1/15/2013 2:01 AM, Axel Hecht wrote:
> My concern would be the ability of reading archived data.
>
> What happens after this change if you hit an email that uses these
> features? Worse still, could there be emails relying on bugs in our impl?

text/enriched would degrade to text/plain (so you would see the
formatting tags), but I suspect that whatever little use it now sees is
mostly paired with a text/plain or text/html part in multipart/alternative,
which we would prefer over it. I'm not as familiar with the other
formats, but multipart/appledouble would have a spurious second
attachment for the resource fork. We don't appear to even decode BinHex
anymore (it may have been partially removed some time ago), so people
may not even notice a loss there compared to current versions. In the
case of x-sun-attachment, we would probably just show something
akin to viewing a MIME message in a non-MIME-compliant email client.
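
The multipart/alternative fallback described above can be sketched in a few lines of Python. `pick_alternative` is a hypothetical helper, not libmime code. It follows the RFC 2046 convention that alternatives are listed in order of increasing faithfulness, so the last supported part wins.

```python
def pick_alternative(part_types, supported):
    """Choose which multipart/alternative part to display.

    Returns an index into part_types, or None when nothing is displayable.
    The last part whose type is supported is preferred.
    """
    for index in range(len(part_types) - 1, -1, -1):
        if part_types[index] in supported:
            return index
    return None


parts = ["text/plain", "text/enriched"]
# Before removal: text/enriched is understood, so it is chosen.
with_enriched = pick_alternative(parts, {"text/plain", "text/enriched", "text/html"})
# After removal: the text/plain sibling is shown instead.
without_enriched = pick_alternative(parts, {"text/plain", "text/html"})
```

Only a message whose sole body part is text/enriched would fall through to the raw, tag-bearing display.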

Re: Removing libmime functionality

Joshua Cranmer-2
In reply to this post by Arivald
On 1/15/2013 3:01 AM, Arivald wrote:

> On 2013-01-15 06:38, Joshua Cranmer wrote:
>> libmime, as you may or may not know, is a very old module. Comments
>> indicate that its original genesis is at least early 1996, and it has no
>> references to RFC 2045 but a smattering to RFC 1521 and 1341, even older
>> variations of MIME. Time has moved on in the 17 years since its genesis,
>> and I think it's worth considering removing functionality and features
>> that appear to be almost nonexistent in the modern email world. In my
>> rewrite of libmime in JS, I am not planning on providing support for
>> these features at all, and I am willing to countenance wholesale removal
>> in the current C implementation.
>
> Why JS? Emails with attachments can be large; are you sure JS
> performance is good enough?
>
> I think it would be better to write it in C++ and move the heaviest
> processing off the main thread. As I remember, all JS is executed on
> the main thread, and there are already too many problems with that
> (the infamous TB "pauses"...).
>
> It would also be good to make it "pluggable", meaning that support
> for any MIME feature could be added through some interfaces. This would
> allow anyone to add support for any old or new MIME feature. Ideally,
> plug-ins could be written in JS.
>

There are several reasons to prefer JS:
1. This allows for better future-proofing of our code with respect to
changes in Gecko. Web-compatible APIs are much less likely to change
under us than internal XPCOM APIs, and also more likely to see
performance improvements.
2. Writing JS that can run with content privileges allows us to share
this code with Gaia.
3. JS provides much more flexible APIs for several components, in
particular easier string processing and easy-to-use hashtables.
4. It is actually easier to use multiple threads in JS than it is in
C++, since C++ tempts you to use main-thread-only XPCOM interfaces. For
example, libmime presently reads about 50 preferences in several
different places, which cannot be done off the main thread [1].
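
One general way around the preference problem in point 4 is to read every setting once on the main thread and hand the parser an immutable snapshot. The Python sketch below illustrates that pattern only. `ParserOptions`, its fields and the dict-based preference store are all hypothetical, and none of it reflects how libmime or the JS rewrite is actually structured.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserOptions:
    """Immutable snapshot of the settings a parser needs (names hypothetical)."""
    prefer_plaintext: bool
    max_header_length: int


def read_options_on_main_thread(prefs):
    # The only place that touches the main-thread-only preference store.
    return ParserOptions(
        prefer_plaintext=prefs.get("prefer_plaintext", False),
        max_header_length=prefs.get("max_header_length", 998),
    )


def count_long_headers(raw_headers, options):
    # Pure function of its arguments, so it is safe to run on any thread.
    return sum(1 for line in raw_headers if len(line) > options.max_header_length)


prefs = {"max_header_length": 20}  # stand-in for a real preference service
options = read_options_on_main_thread(prefs)
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(
        count_long_headers, ["Subject: hi", "X-Long: " + "a" * 40], options
    )
    long_headers = future.result()
```

The worker never reaches back into the preference store, so nothing in it is tied to the main thread.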

I have given a great deal of thought to how to design the JS MIME
parser, and key goals do include extensibility and minimizing
unnecessary translation work.

[1] Libmime is not the reason we must do everything on the main thread.
That reason is actually the database, which is an inherently
single-threaded implementation whose use is very pervasive in the
backend. That said, this implementation may be enough to let us do
various indexing tasks off the main thread.

Re: Removing libmime functionality

Jonathan Kamens-4
In reply to this post by Joshua Cranmer-2
Would love to see us get rid of libmime.

Is the text/plain emitter used when a message has only HTML
and the user sets View | Message Body As | Plain Text? If so,
how will that behave if the text/plain emitter is gone?

Re: Removing libmime functionality

Joshua Cranmer-2
On 1/18/2013 10:34 AM, Jonathan Kamens wrote:
> Would love to see us get rid of libmime.
>
> Is the text/plain emitter used when a message has only HTML
> and the user sets View | Message Body As | Plain Text? If so,
> how will that behave if the text/plain emitter is gone?
That part of the original post was more directed at developers and add-on
authors, so the terminology may be confusing. The emitters live in the
offshoot directory mime/emitters/src and are the bridge between the
internal libmime functionality and the actual output mechanisms. Message
display (and most libmime functionality, in fact) uses the HTML emitter;
View | Message Body As | Plain Text affects which converter gets used to
translate a text/plain part.
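
To picture the "bridge" role of an emitter, here is a deliberately simplified Python sketch. The class and method names are hypothetical and do not mirror the real emitter interfaces. The point is only that the parser-side driver stays the same while the output format is swapped.

```python
import html


class HtmlEmitter:
    """Writes parsed content as HTML (hypothetical interface)."""

    def header(self, name, value):
        return "<b>%s:</b> %s<br>" % (html.escape(name), html.escape(value))

    def body(self, text):
        return "<pre>%s</pre>" % html.escape(text)


class PlainEmitter:
    """Writes parsed content as plain text (hypothetical interface)."""

    def header(self, name, value):
        return "%s: %s" % (name, value)

    def body(self, text):
        return text


def render(headers, text, emitter):
    """Parser-side driver: it knows nothing about the output format."""
    chunks = [emitter.header(name, value) for name, value in headers]
    chunks.append(emitter.body(text))
    return "\n".join(chunks)


headers = [("Subject", "a < b")]
as_html = render(headers, "hi", HtmlEmitter())
as_text = render(headers, "hi", PlainEmitter())
```

In this picture, removing the plain-text emitter removes one output backend. It does not change how a text/plain part inside a message is converted for display.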