nsIMsgLocalMailFolder addMessage produces weirdly encoded message source


karel.gudera
Hello,

I have the following code:

var messageSource = "From - Thu Dec 29 19:20:20 2016\r\n" +
                    "Subject: Příliš žluťoučký kůň\r\n" +
                    "Content-Type: text/plain; charset=UTF-8\r\n" +
                    "\r\n" +
                    "Body: Příliš žluťoučký kůň\r\n" +
                    "\r\n";
folder.QueryInterface(Components.interfaces.nsIMsgLocalMailFolder);
folder.addMessage(messageSource);

This produces a message with the following source:

----------------------------------------
From - Thu Dec 29 19:20:20 2016
Subject: PYília ~lueou
ký koH
Content-Type: text/plain; charset=UTF-8

Body: PYília ~lueou
ký koH
----------------------------------------

This is displayed incorrectly in the UI. I believe the cause is that the
folder's default encoding is ISO-8859-1 while JavaScript strings are UTF-16, so the source goes through a UTF-16 -> ISO-8859-1 conversion.

Then I tried the following:

I took the utf8Encode function from https://gist.github.com/chrisveness/bcb00eb717e6382c5608 and changed the call to:

folder.addMessage(utf8Encode(messageSource));

This produced the following source:

--------------------------------------
From - Thu Dec 29 19:20:20 2016
Subject: Příliš žluťoučký kůň
Content-Type: text/plain; charset=UTF-8

Body: Příliš žluťoučký kůň
--------------------------------------

This is displayed correctly in the UI, and in fact 1378 of 1382 other random messages
are also displayed fine (yes, I checked them myself).
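
For reference, the same kind of conversion can be sketched with the standard TextEncoder API. This is only a minimal sketch and not the gist's code: toUtf8ByteString is a hypothetical helper name, and the availability of TextEncoder in the calling scope is an assumption.

```javascript
// Hypothetical helper (not a Thunderbird API): returns a "byte string" in
// which every character code is a single UTF-8 byte (0..255).
function toUtf8ByteString(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// "Subject: Příliš" written with escapes so the snippet survives any file encoding.
const subject = "Subject: P\u0159\u00edli\u0161\r\n";
const encoded = toUtf8ByteString(subject);

// Every char code now fits in one byte, so keeping only the low byte of each
// character later on no longer loses information.
const allBytes = [...encoded].every(ch => ch.charCodeAt(0) <= 0xff);
```

The result would be passed as folder.addMessage(toUtf8ByteString(messageSource)), which only makes sense when the message really declares charset=UTF-8.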

This is a piece of messageSource that I can't get to display correctly (the messageSource variable always holds raw MIME data):

--frontier
Content-type: text/html; charset=ISO-8859-2

p��li� �lu�ou�k� k��

So how should I handle it? How does Thunderbird process MIME headers like that?
_______________________________________________
dev-apps-thunderbird mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-apps-thunderbird

Re: nsIMsgLocalMailFolder addMessage produces weirdly encoded message source

Jörg Knobloch
On 29/12/2016 23:17, [hidden email] wrote:
> This is a piece of messageSource, that I can't get to display correctly (I always have raw mime data in messageSource variable):
>
> --frontier
> Content-type: text/html; charset=ISO-8859-2
>
> p��li� �lu�ou�k� k��
>
> So how should I handle it? How does Thunderbird process MIME headers like that?
I don't really understand the question. ISO-8859-2 is an 8-bit
single-byte charset. Interpreting a byte with the high bit set as
Unicode (UTF-8) will fail and result in invalid Unicode characters, commonly
shown as �
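
A small sketch of that failure mode, using the standard TextDecoder API. The byte values are the ISO-8859-2 encoding of "příliš"; picking that word as the sample is an assumption based on the text used earlier in the thread.

```javascript
// "příliš" encoded as ISO-8859-2: ř = 0xF8, í = 0xED, š = 0xB9.
const latin2Bytes = Uint8Array.from([0x70, 0xf8, 0xed, 0x6c, 0x69, 0xb9]);

// Decoding those bytes as UTF-8 cannot succeed: each high-bit byte is an
// invalid sequence and is replaced by U+FFFD (shown as �).
const misdecoded = new TextDecoder("utf-8").decode(latin2Bytes);

// Count how many replacement characters were produced.
const replacementCount = [...misdecoded].filter(ch => ch === "\ufffd").length;
```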

Raw Unicode (UTF-8) is also admissible in mail headers, see RFC 6532.

Thunderbird has MIME libraries that are pretty good these days. So if
the message doesn't display correctly, it's because it's not encoded
properly.

Jörg.



Re: nsIMsgLocalMailFolder addMessage produces weirdly encoded message source

karel.gudera
On Thursday, 29 December 2016 at 23:46:49 UTC+1, Jörg Knobloch wrote:

> On 29/12/2016 23:17, [hidden email] wrote:
> > This is a piece of messageSource, that I can't get to display correctly (I always have raw mime data in messageSource variable):
> >
> > --frontier
> > Content-type: text/html; charset=ISO-8859-2
> >
> > p��li� �lu�ou�k� k��
> >
> > So how should I handle it? How does Thunderbird process MIME headers like that?
> I don't really understand the question. ISO-8859-2 is an 8-bit
> single-byte charset. Interpreting a byte with the high bit set as
> unicode will fail and result in invalid unicode characters commonly
> shown as �
>
> Raw unicode (UTF-8) is also admissible in mail headers, see RFC 6532.
>
> Thunderbird has MIME libraries that are pretty good these days. So if
> the message doesn't display correctly, it's because it's not encoded
> properly.
>
> Jörg.

I just want to know how to correctly add a message to a folder when I have the original MIME content. When I do folder.addMessage(originalMime), the resulting message source differs from what I get when I download the same message via IMAP/POP3. It doesn't seem that the MIME content is processed the same way.

Re: nsIMsgLocalMailFolder addMessage produces weirdly encoded message source

Jörg Knobloch
On 30/12/2016 00:18, [hidden email] wrote:
> I just want to know how to correctly add a message to a folder when I have the original MIME content. When I do folder.addMessage(originalMime), the resulting message source differs from what I get when I download the same message via IMAP/POP3. It doesn't seem that the MIME content is processed the same way.

Looks like you end up calling

NS_IMETHODIMP
nsMsgLocalMailFolder::AddMessage(const char *aMessage, nsIMsgDBHdr **aHdr)

So the body you pass in is just a simple byte stream. If your result is
meant to be UTF-8 encoded, it's a good idea to convert your string to
UTF-8 before passing it in.

I guess that if you want to encode your message as ISO-8859-2, you need
to make sure that the byte stream you're passing in is in fact valid in
ISO-8859-2.

I don't know where the "mime content" comes from. But if you retrieved
it through some JS interface, it will be a JS string (which is
internally encoded as UTF-16 if I'm not mistaken). That needs to be
encoded with the charset of the message.
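
If the raw MIME data is available as bytes (for example in a Uint8Array), one way to keep it intact is to build a string with exactly one character per byte. The following is a minimal sketch under two assumptions: bytesToByteString is a hypothetical helper, not a Thunderbird API, and addMessage keeps only the low byte of each character, which is what the garbled output earlier in the thread suggests. The last step only simulates that truncation.

```javascript
// Hypothetical helper: one character per byte, so char codes stay in 0..255.
function bytesToByteString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// "příliš" encoded as ISO-8859-2: ř = 0xF8, í = 0xED, š = 0xB9.
const latin2Bytes = Uint8Array.from([0x70, 0xf8, 0xed, 0x6c, 0x69, 0xb9]);
const byteString = bytesToByteString(latin2Bytes);

// Simulate keeping only the low byte of every character: because no char
// code exceeds 0xFF, the original bytes come back unchanged.
const roundTripped = Uint8Array.from(byteString, ch => ch.charCodeAt(0) & 0xff);
```

With this approach no charset conversion happens in JS at all; the bytes keep whatever encoding each MIME part declares.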

Take a look at the hoops we go through when encoding strings, for example
here:
https://dxr.mozilla.org/comm-central/rev/f63a0b7059c7e99e7a3435d22a0d0cb9fa35857f/mailnews/compose/test/unit/test_longLines.js#41

BTW, there are plenty of simple examples of addMessage, but most use
pure ASCII:

https://dxr.mozilla.org/comm-central/search?q=addMessage&redirect=false

Jörg.


Re: nsIMsgLocalMailFolder addMessage produces weirdly encoded message source

karel.gudera
It's funny that every time I post a question somewhere, I get the problem solved a few hours later, even though I had been struggling with it for 3 days. The problem was that I corrupted the MIME content by base64 encoding/decoding it. I have no idea why that breaks it, but never mind. Anyway, thank you for your response and for pointing me in the right direction.

Re: nsIMsgLocalMailFolder addMessage produces weirdly encoded message source

Joshua Cranmer 🐧
On 12/29/2016 5:39 PM, Jörg Knobloch wrote:

> On 30/12/2016 00:18, [hidden email] wrote:
>> I just want to know how to correctly add a message to a folder when I
>> have the original MIME content. When I do
>> folder.addMessage(originalMime), the resulting message source differs
>> from what I get when I download the same message via IMAP/POP3. It
>> doesn't seem that the MIME content is processed the same way.
>
> Looks like you end up calling
>
> NS_IMETHODIMP
> nsMsgLocalMailFolder::AddMessage(const char *aMessage, nsIMsgDBHdr
> **aHdr)
>
> So the body you pass in is just a simple byte stream. If your result
> is meant to be UTF-8 encoded, it's a good idea to convert your string
> to UTF-8 before passing it in.

Well, actually the thing to look at is:
<https://dxr.mozilla.org/comm-central/source/mailnews/local/public/nsIMsgLocalMailFolder.idl#73>,
since you're calling it from JS.

Parameters of type string or ACString are interpreted as bytestrings
when called from JS, which is to say that each character is converted by
dropping the high byte (i.e., ISO-8859-1). A parameter of type
AUTF8String is converted to UTF-8, and wstring and AString retain
UTF-16. The confusing part is that all of these parameters are treated
as the same string type in JS, and ACString and AUTF8String have the
same C++ representation.

Yes, it is a mess.
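
The "dropping the high byte" behaviour can be imitated in plain JS to see where the garbled source at the top of the thread comes from. This is a minimal sketch: dropHighByte is a hypothetical function that only simulates the conversion described above; it is not how the XPCOM bridge is implemented.

```javascript
// Hypothetical simulation: keep only the low 8 bits of every UTF-16 code unit.
function dropHighByte(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) & 0xff);
  }
  return out;
}

// "Příliš žluťoučký kůň", written with escapes so the file encoding does not matter.
const original = "P\u0159\u00edli\u0161 \u017elu\u0165ou\u010dk\u00fd k\u016f\u0148";
const mangled = dropHighByte(original);

// ř (U+0159) becomes "Y" (0x59), š (U+0161) becomes "a" (0x61),
// ž (U+017E) becomes "~" (0x7E), and č (U+010D) becomes a carriage
// return (0x0D), which explains the stray line break in the stored source.
```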

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
