Display unicode character in subject line using ThunderBird

wing328hk
Hi,

I'm using ThunderBird and found that the following subject line is
corrupted:

Subject: =?UTF-8?Q?Out=20of=20Office=20=E7=A7=81=E3=81=AF=E3?=

=?UTF-8?Q?=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84?=
 <

=?UTF-8?Q?=E5=86=8D=E5=A4=9A=E5=81=9A=E4=B8=80=E9=BB=9E=EF=BC=8C=E6=88=90?=
 =?UTF-8?Q?=E5=8A=9F=E5=B0=B1=E5=A4=9A=E4=B8=80=E9=BB=9E=EF=BC=81?=>

(Originally, the text above is aligned properly. Please copy above text
into notepad or vi to see the aligned version)

The character at the end of the second line, which is =84, is corrupted.
However, Outlook 2003 can display the above subject line without any
problem.

Does anyone know how to solve the problem?

Is it a bug in Thunderbird, or does the above subject line not comply
with the MIME standard (RFC 2047?)

Thanks,
Wing

_______________________________________________
mozilla-general mailing list
[hidden email]
http://mail.mozilla.org/listinfo/mozilla-general

Re: Display unicode character in subject line using ThunderBird

gwtc
[hidden email] wrote:

> Hi,
>
> I'm using ThunderBird and found that the following subject line is
> corrupted:
>
> Subject: =?UTF-8?Q?Out=20of=20Office=20=E7=A7=81=E3=81=AF=E3?=
>
> =?UTF-8?Q?=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84?=
>  <
>
> =?UTF-8?Q?=E5=86=8D=E5=A4=9A=E5=81=9A=E4=B8=80=E9=BB=9E=EF=BC=8C=E6=88=90?=
>  =?UTF-8?Q?=E5=8A=9F=E5=B0=B1=E5=A4=9A=E4=B8=80=E9=BB=9E=EF=BC=81?=>
>
> (Originally, the text above is aligned properly. Please copy above text
> into notepad or vi to see the aligned version)
>
> The character at the end of second line, which is =84, is corrupted.
> However, Outlook2003 can display the above subject line without any
> problem.
>
> Does anyone know how to solve the problem?
>
> Is it a bug in thunderbird? or the above subject line doesn't comply
> with the MIME standard (RFC2047?) ?
>
> Thanks,
> Wing
>
Try fiddling around with the character encoding under the View menu.

Re: Display unicode character in subject line using ThunderBird

Ralph Fox-2
In reply to this post by wing328hk
On 4 Jan 2006 06:41:31 -0800, in message
 <[hidden email]>, [hidden email] wrote:

> Hi,
>
> I'm using ThunderBird and found that the following subject line is
> corrupted:
>
> Subject: =?UTF-8?Q?Out=20of=20Office=20=E7=A7=81=E3=81=AF=E3?=
>
> =?UTF-8?Q?=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84?=
>  <
>
> =?UTF-8?Q?=E5=86=8D=E5=A4=9A=E5=81=9A=E4=B8=80=E9=BB=9E=EF=BC=8C=E6=88=90?=
>  =?UTF-8?Q?=E5=8A=9F=E5=B0=B1=E5=A4=9A=E4=B8=80=E9=BB=9E=EF=BC=81?=>
>
> (Originally, the text above is aligned properly. Please copy above text
> into notepad or vi to see the aligned version)
>
> The character at the end of second line, which is =84, is corrupted.
> However, Outlook2003 can display the above subject line without any
> problem.
>
> Does anyone know how to solve the problem?
>
> Is it a bug in thunderbird? or the above subject line doesn't comply
> with the MIME standard (RFC2047?) ?

 
1.  What looks corrupted to me is a different character, which is split
    between two =?charset?Q?...?= parts.  

    =E3=82=AA  has been split into =E3 (not valid UTF-8) and
    =82=AA (not valid UTF-8).

    AIUI this does not comply with RFC2047.


2.  The =84 at the end of the second line is not a character on its own
    but only part of the UTF-8 data for the character =E3=81=84.
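
    A minimal Python sketch of the two points above (not part of the
    original message). It checks which of the byte sequences decode as
    UTF-8 on their own; the helper name is_valid_utf8 is made up for
    illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 on its own."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Byte sequences taken from the Subject header quoted in this thread.
whole = bytes.fromhex("E382AA")   # =E3=82=AA, the character that was split
head = bytes.fromhex("E3")        # what ends the first encoded-word
tail = bytes.fromhex("82AA")      # what starts the second encoded-word
last = bytes.fromhex("E38184")    # =E3=81=84, the character ending in =84

print(is_valid_utf8(whole), whole.decode("utf-8"))   # True オ
print(is_valid_utf8(head), is_valid_utf8(tail))      # False False
print(is_valid_utf8(last), last.decode("utf-8"))     # True い
```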


--
Cheers,
Ralph

"Curiosity skilled the cat."

Re: Display unicode character in subject line using ThunderBird

M Cowperthwaite
In reply to this post by wing328hk
[hidden email] wrote:

> I'm using ThunderBird and found that the following subject line is
> corrupted:
>
> Subject: =?UTF-8?Q?Out=20of=20Office=20=E7=A7=81=E3=81=AF=E3?=
>  =?UTF-8?Q?=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84?=
>  <
>  =?UTF-8?Q?=E5=86=8D=E5=A4=9A=E5=81=9A=E4=B8=80=E9=BB=9E=EF=BC=8C=E6=88=90?=
>  =?UTF-8?Q?=E5=8A=9F=E5=B0=B1=E5=A4=9A=E4=B8=80=E9=BB=9E=EF=BC=81?=>
>
> The character at the end of second line, which is =84, is corrupted.
> However, Outlook2003 can display the above subject line without any
> problem.
>
> Does anyone know how to solve the problem?
>
> Is it a bug in thunderbird? or the above subject line doesn't comply
> with the MIME standard (RFC2047?) ?

It's not the =84 byte -- between the '=20' in the first line and the '<'
standing alone on the third line, there are only a few characters --
under TB, I'm seeing two Chinese characters plus the 'unknown character'
glyph -- encoded by a total of 27 octets.  These first two characters
apparently have Unicode (decimal) values of: 31169, 12399
   (http://www.pinyin.info/tools/converter/chars2uninumbers.html)
which are hex 79C1, 306F.  According to
   http://www.unicode.org/reports/tr17/index.html
these two characters should take three octets each under UTF-8, which
would imply that there should be nine characters displayed between the
"Out of Office " and "<".  The last two lines are comprised of 39 octets
total (plus a trailing '>') and are displayed as 13 Chinese characters
in the subject line -- again, three octets per character.
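
The octet arithmetic above can be rechecked with a short Python sketch
(not part of the original message). It counts the =XX escapes with a
value of 0x80 or more in each of the four encoded parts and divides by
three; the function name non_ascii_octets is made up for illustration.

```python
import re

# The four encoded parts of the Subject header quoted in this thread.
WORDS = [
    "=?UTF-8?Q?Out=20of=20Office=20=E7=A7=81=E3=81=AF=E3?=",
    "=?UTF-8?Q?=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84?=",
    "=?UTF-8?Q?=E5=86=8D=E5=A4=9A=E5=81=9A=E4=B8=80=E9=BB=9E=EF=BC=8C=E6=88=90?=",
    "=?UTF-8?Q?=E5=8A=9F=E5=B0=B1=E5=A4=9A=E4=B8=80=E9=BB=9E=EF=BC=81?=",
]

def non_ascii_octets(word: str) -> int:
    """Count the =XX escapes whose value is 0x80 or more in one part."""
    payload = word[len("=?UTF-8?Q?"):-len("?=")]
    escapes = re.findall(r"=([0-9A-F]{2})", payload)
    return sum(1 for h in escapes if int(h, 16) >= 0x80)

counts = [non_ascii_octets(w) for w in WORDS]
print(counts)                         # [7, 20, 21, 18]
print((counts[0] + counts[1]) // 3)   # 9 characters before the '<'
print((counts[2] + counts[3]) // 3)   # 13 characters inside the '<...>'
```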

The first line has seven octets -- so
   =E7=A7=81  is the first character and
   =E3=81=AF  is the second, leaving
   =E3        at the end of that RFC2047 atom.
I think this is the problem: it's not legal to split a character's
octets across atoms.  Each atom is supposed to be self-decodable, and
under UTF-8, a single byte of E3 is not a valid character.  Presumably
the 20-octet second line is entirely illegal as well, since it doesn't
have the initial E3 to properly set up the decoding.  See:
   http://mozillanews.org/bugzilla_warning.php3?id=286551

And in fact, if I tweak the header to move the =E3 at the end of the
first atom to the beginning of the second atom, I get a display of nine
Chinese characters before the '<' with no corruption.
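
The effect of that tweak can be reproduced with a small Python sketch
(not part of the original message). It decodes each part strictly on
its own, which is an assumption about how a strict client behaves and
not a description of Thunderbird's actual code; q_decode and
decode_word_by_word are made-up helper names.

```python
import re

def q_decode(payload: str) -> bytes:
    """Turn the payload of one Q-encoded part into raw bytes."""
    payload = payload.replace("_", " ")   # in Q encoding "_" means a space
    text = re.sub(r"=([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), payload)
    return text.encode("latin-1")

def decode_word_by_word(payloads):
    """Decode each payload separately as strict UTF-8; None marks a failure."""
    result = []
    for payload in payloads:
        try:
            result.append(q_decode(payload).decode("utf-8"))
        except UnicodeDecodeError:
            result.append(None)
    return result

# First two parts as they appear in the quoted header.
original = [
    "Out=20of=20Office=20=E7=A7=81=E3=81=AF=E3",
    "=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84",
]
# Same octets, with the trailing =E3 moved to the start of the second part.
tweaked = [
    "Out=20of=20Office=20=E7=A7=81=E3=81=AF",
    "=E3=82=AA=E3=83=95=E3=82=A3=E3=82=B9=E3=81=AB=E3=81=AA=E3=81=84",
]

print(decode_word_by_word(original))   # [None, None]
print(decode_word_by_word(tweaked))
```

A client that joins the raw bytes of adjacent parts before decoding
would accept both versions, since the combined octets are identical.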

The question is, what mail client generated the message with that Subject?

--
Michael Cowperthwaite

To send mail, remove 'Z's from the poster's email address.

Re: Display unicode character in subject line using ThunderBird

wing328hk
Thanks for the reply. The subject line should be displayed as follows:

Out of Office 私はオフィスにない  <
再多做一點,成功就多一點!>

(I posted this message through Google and found that the Chinese
characters are forced onto a new line instead of the whole subject
staying on one single line)

The mail header is generated by a Perl script, which handles
out-of-office replies. The subject line is created in the following
steps:

1) Extract the subject line, which can be encoded in any character set
such as UTF-8, ISO-2022-JP and more, from the client's email

2) Convert the extracted subject line to UTF-8

3) Concatenate the out-of-office subject line specified by the sender
with the converted subject line in step 2 as follows:

Sender's subject line <subject line from client's email>

4) Format the subject line in step 3 according to RFC2047 using the
encode function in Perl library

encode('MIME-Q', $subject_line)

I would appreciate it if someone could show me the correct way to do the
above. In the meantime, I'll go through the links you mentioned above.
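
For illustration only, here is a minimal Python sketch of steps 3 and 4
that never splits a character between two encoded parts. It is not the
Perl library's implementation and not a fix for it; encode_subject,
q_encode_char and decode_word are made-up names, and the sketch encodes
the whole subject, including the '<' and '>', instead of leaving them
as plain text. The 75-character limit per encoded part comes from
RFC 2047.

```python
import re

PREFIX, SUFFIX = "=?UTF-8?Q?", "?="

def q_encode_char(ch):
    """Q-encode one character, keeping all of its UTF-8 octets together."""
    if ch.isascii() and (ch.isalnum() or ch in "!*+-/"):
        return ch
    return "".join(f"={octet:02X}" for octet in ch.encode("utf-8"))

def encode_subject(text, max_len=75):
    """Split text into encoded parts, breaking only between characters."""
    budget = max_len - len(PREFIX) - len(SUFFIX)
    words, current = [], ""
    for ch in text:
        piece = q_encode_char(ch)
        if current and len(current) + len(piece) > budget:
            words.append(PREFIX + current + SUFFIX)
            current = ""
        current += piece
    if current:
        words.append(PREFIX + current + SUFFIX)
    return words

def decode_word(word):
    """Decode a single encoded part on its own, as strict UTF-8."""
    payload = word[len(PREFIX):-len(SUFFIX)]
    raw = re.sub(r"=([0-9A-F]{2})",
                 lambda m: chr(int(m.group(1), 16)), payload)
    return raw.encode("latin-1").decode("utf-8")

own_subject = "Out of Office 私はオフィスにない"
client_subject = "再多做一點,成功就多一點!"
subject = f"{own_subject} <{client_subject}>"    # step 3

words = encode_subject(subject)                  # step 4
print("Subject: " + "\r\n ".join(words))

# Every part decodes by itself, and together they give back the subject.
assert "".join(decode_word(w) for w in words) == subject
```

Because the loop only ever appends whole characters, each part holds an
integral number of characters, which is what the replies above say the
corrupted header lacks.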

Thanks and regards,
Wing

Re: Display unicode character in subject line using ThunderBird

M Cowperthwaite
[hidden email] wrote:
> [...]
>
> 4) Format the subject line in step 3 according to RFC2047 using the
> encode function in Perl library
>
> encode('MIME-Q', $subject_line)

Looks like   encode()   needs to be fixed.

--
Michael Cowperthwaite

To send mail, remove 'Z's from the poster's email address.

Re: Display unicode character in subject line using ThunderBird

wing328hk
Thanks for your reply.

You mean there is a bug in encode? If yes, where can I report the bug?

Wing

Re: Display unicode character in subject line using ThunderBird

M Cowperthwaite
[hidden email] wrote:
> You mean there is a bug in encode? If yes, where can I report the bug?

I don't know; it's a problem with that Perl library, not with Mozilla.

--
Michael Cowperthwaite

To send mail, remove 'Z's from the poster's email address.