localized DTD and Properties files - ANSI 8bit or UTF8?

localized DTD and Properties files - ANSI 8bit or UTF8?

Vito Smolej-3
The question in the subject line is, I hope, simple enough to answer. Take
the Slovenian (SL) string for CancelBtn as an example:

ANSI coding:         Prekli\u010di
UTF8:                   Prekliči

My personal preference is, of course, UTF8. I thought I'd better ask,
because the original files are just plain 8-bit ANSI.

TiA

smo
_______________________________________________
dev-l10n mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-l10n

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Andras Timar
On 2010-02-10 at 7:26, smo wrote:

> The question in the subj. is simple enough I hope to answer. The
> difference in SL for CancelBtn for example:
>
> ANSI coding:         Prekli\u010di
> UTF8:                   Prekliči
>
> My personal preference of course UTF8. I thought I'd better ask,
> because the original files are just plain 8bit ANSI code.
>
> TiA
>
> smo
Hi,

You can use UTF8 in Mozilla projects. Originally .dtd files used UTF8
and .properties files used escaped Unicode (which you called ANSI).
About 2-3 years ago the parser of .properties files was extended in
Mozilla code, therefore UTF8 can be used everywhere.
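
To illustrate that the two spellings from the question name the same string, here is a minimal Python sketch. It uses Python's own "unicode_escape" codec as a stand-in for an escape-aware parser; that is an assumption made for illustration and is not the Mozilla .properties parser.

```python
# The two spellings from the question, as bytes on disk.
escaped_bytes = b"Prekli\\u010di"          # pure ASCII, the escape is spelled out
utf8_bytes = "Prekliči".encode("utf-8")    # the character itself, stored as UTF-8

# Decode each form to a text string.
from_escaped = escaped_bytes.decode("unicode_escape")  # stand-in for an escape-aware parser
from_utf8 = utf8_bytes.decode("utf-8")

# Both forms carry the same text; only the on-disk representation differs.
assert from_escaped == from_utf8
print(from_escaped, len(escaped_bytes), len(utf8_bytes))
```

The escaped form is longer on disk and unreadable to a human reviewer, while the UTF-8 form shows the word as it will appear in the UI.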

Best regards,
Andras

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Axel Hecht
On 10.02.10 09:49, Andras Timar wrote:

> Hi,
>
> You can use UTF8 in Mozilla projects. Originally .dtd files used UTF8
> and .properties files used escaped Unicode (which you called ANSI).
> About 2-3 years ago the parser of .properties files was extended in
> Mozilla code, therefore UTF8 can be used everywhere.
>

Basically yes, though there are some slight differences in the details.

Java (where we borrowed .properties from) defines those to be mac-roman
encoded, IIRC. I.e., not ASCII. There was a long-standing bug in our
implementation that actually treated them as utf-8 instead.

At some point, I declared that bug to be a feature, screw compat with Java.
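
Why the assumed encoding matters can be shown with a small Python sketch: the same bytes decode to different text depending on which encoding the reader assumes. The codec names below are Python's, chosen here only for illustration; nothing in this sketch is Mozilla or Java code.

```python
# Bytes of a localized value as a UTF-8 aware editor would save them.
raw = "Prekliči".encode("utf-8")

# The same bytes, read under three different assumptions about the file.
as_utf8 = raw.decode("utf-8")            # the intended text
as_latin1 = raw.decode("latin-1")        # every byte becomes one character
as_mac_roman = raw.decode("mac_roman")   # again one character per byte

# Only the UTF-8 reading recovers the original word; the single-byte
# readings turn the two-byte 'č' into two unrelated characters.
for label, text in [("utf-8", as_utf8), ("latin-1", as_latin1), ("mac-roman", as_mac_roman)]:
    print(label, repr(text))
```

A parser that reads the file as UTF-8 therefore lets localizers type the characters directly, which is the behaviour described above.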

Axel

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Vito Smolej-3
In reply to this post by Andras Timar
On 10 feb., 09:49, Andras Timar <[hidden email]> wrote:

> Hi,
>
> You can use UTF8 in Mozilla projects. Originally .dtd files used UTF8
> and .properties files used escaped Unicode (which you called ANSI).
> About 2-3 years ago the parser of .properties files was extended in
> Mozilla code, therefore UTF8 can be used everywhere.
>
> Best regards,
> Andras

I stand corrected: that was sloppy language on my part, I did not take care
to look for the proper term.


Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Vito Smolej-3
In reply to this post by Axel Hecht
On 10 feb., 13:19, Axel Hecht <[hidden email]> wrote:

> Basically yes, though there are some slight difference in the details.
>
> Java (where we borrowed .properties from) defines those to be mac-roman
> encoded, IIRC. I.e., not ASCII. There was a long standing bug in our
> implementation that actually treated them as utf-8 instead.
>
> At some point, I declared that bug to be a feature, screw compat with java.
>
> Axel

Hi Axel:

I can then assume mac-roman / escaped ASCII will not disappear any
time soon?

This would make it even simpler for me, as much as I prefer UTF8. You may
remember Simon raising his brows at some \u010d etc. characters in the SL
material localized so far. If I do not need to correct that, I'll gladly
give up on UTF8, for now.

smo

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Axel Hecht
On 10.02.10 20:24, smo wrote:

> Hi Axel:
>
> I can then assume mac-roman / escaped ASCII will not disappear some
> time soon?
>
> This  would make it even simpler for me - as much as I prefer UTF8..
> You may
> remember Simon raising his brows on some \010d etc characters in the
> SL material
> localized so far. If I do not need to correct that I'll gladly give up
> on UTF8 - for now;
>

I strongly suggest using utf-8. It's just less risky and eases our
technical reviews down the line.

Axel

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Tim Chien (MozTW)
Dear all,

This conversation reminds me that I should post my escape-fix script, so
here it is:
http://github.com/timdream/escape-fix

It basically rewrites all the \uXXXX escapes in your files into real UTF-8
characters. Please contact me if you have any problems.
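
For readers who want to see the idea, here is a minimal Python sketch of such a rewrite. It is not the escape-fix script linked above; the function name and the choice to leave ASCII-range escapes untouched (so that escaped whitespace or line breaks keep their meaning) are assumptions made for this illustration.

```python
import re

# Match either an escaped backslash (left alone) or a \uXXXX escape.
_ESCAPE = re.compile(r"\\\\|\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace \\uXXXX escapes with the characters they name (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        if match.group(1) is None:
            return match.group(0)      # an escaped backslash, keep as is
        code_point = int(match.group(1), 16)
        if code_point < 0x80:
            return match.group(0)      # keep ASCII escapes such as \u0020
        return chr(code_point)

    converted = _ESCAPE.sub(replace, text)
    # Join surrogate pairs (two escapes for one character outside the BMP).
    return converted.encode("utf-16", "surrogatepass").decode("utf-16")


print(unescape_unicode("CancelBtn=Prekli\\u010di"))
```

The result would then be written back to disk as UTF-8. A lone, unpaired surrogate escape makes the final decode step raise an error, which seems preferable to silently writing a broken file.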

Regards,


Tim


Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Simon Paquet-2
In reply to this post by Vito Smolej-3
smo wrote on 10. Feb 2010:

>> Java (where we borrowed .properties from) defines those to be
>> mac-roman encoded, IIRC. I.e., not ASCII. There was a long
>> standing bug in our implementation that actually treated them
>> as utf-8 instead.
>>
>> At some point, I declared that bug to be a feature, screw
>> compat with java.
>
> I can then assume mac-roman / escaped ASCII will not disappear
> some time soon?
>
> This would make it even simpler for me - as much as I prefer UTF8.
> You may remember Simon raising his brows on some \010d etc
> characters in the SL material localized so far. If I do not need
> to correct that I'll gladly give up on UTF8 - for now;

Hi Vito,

like Axel, I would strongly suggest moving to UTF-8. If you need
help converting all your .properties files, there is the tool
that Tim just posted about, and I'm sure other people here would be
willing to help you out as well.

Cya
Simon

--
Thunderbird/Calendar Localisation (L10n) Coordinator
Thunderbird l10n blog:       http://thunderbird-l10n.blogspot.com
Calendar website maintainer: http://www.mozilla.org/projects/calendar
Calendar developer blog:     http://weblogs.mozillazine.org/calendar

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Vito Smolej-3
In reply to this post by Axel Hecht
On 10 feb., 13:19, Axel Hecht <[hidden email]> wrote:


>
> Java (where we borrowed .properties from) defines those to be mac-roman
> encoded, IIRC. I.e., not ASCII. There was a long standing bug in our
> implementation that actually treated them as utf-8 instead.
>
> At some point, I declared that bug to be a feature, screw compat with java.
>
> Axel

Hi Axel:
we (i.e. OmegaT developers) are more orthodox than the pope himself:
for files of the properties type OmegaT allows the input to be in UTF8 too,
but inexorably produces IIRC targets (gulp). Thank God there's an
alternative file prototype (intended for simple-minded INI files), and
there I have no problem producing UTF8 output.

So Simon / the repository will get a nice fat shipment one of these days.

Regards, and thanks to everybody for the prompt and to-the-point hints,
suggestions and answers.

smo

Re: localized DTD and Properties files - ANSI 8bit or UTF8?

Dwayne Bailey
In reply to this post by Simon Paquet-2
On Thu, 2010-02-11 at 11:54 +0100, Simon Paquet wrote:

> smo wrote on 10. Feb 2010:
>
> >> Java (where we borrowed .properties from) defines those to be
> >> mac-roman encoded, IIRC. I.e., not ASCII. There was a long
> >> standing bug in our implementation that actually treated them
> >> as utf-8 instead.
> >>
> >> At some point, I declared that bug to be a feature, screw
> >> compat with java.
> >
> > I can then assume mac-roman / escaped ASCII will not disappear
> > some time soon?
> >
> > This would make it even simpler for me - as much as I prefer UTF8.
> > You may remember Simon raising his brows on some \010d etc
> > characters in the SL material localized so far. If I do not need
> > to correct that I'll gladly give up on UTF8 - for now;
>
> Hi Vito,
>
> like Axel, I would strongly suggest to move to UTF-8. If you need
> help in converting all your .properties files, there is the tool
> that Tim just posted about and I'm sure other people here would be
> willing to help you out here as well.

Axel, I'm pretty sure it's Latin1; everything else gets escaped.
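
As a rough illustration of "Latin1, everything else gets escaped", here is a Python sketch that writes a string that way. The function name is hypothetical, and this is not Java's or moz2po's actual writer; real writers may escape more characters than this one does.

```python
def escape_for_latin1(text: str) -> bytes:
    """Keep Latin-1 characters, spell everything else as \\uXXXX (hypothetical helper)."""
    pieces = []
    for char in text:
        code_point = ord(char)
        if code_point < 0x100:
            pieces.append(char)                      # representable in Latin-1
        elif code_point < 0x10000:
            pieces.append(f"\\u{code_point:04x}")    # one escape is enough
        else:
            # Outside the BMP: split into a UTF-16 surrogate pair, two escapes.
            offset = code_point - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            pieces.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(pieces).encode("latin-1")


print(escape_for_latin1("Prekliči"))
```

A German or French string survives this mostly readable, while a Slovenian one already picks up escapes, and a non-Latin script turns into escapes almost entirely.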

Like everyone else says, just please use UTF-8; for 1000 reasons it's
just better :)  Axel declaring his bug a feature was the best thing for
localising these files!  The difficulty of localising, reviewing, etc.
\uNNNN is the best reason to get rid of this.  We're localisers, not
machines.

moz2po has handled \uNNNN escapes forever, since we hit this issue early
with Venda translations.  The code that does the unescaping is here
(line #286):
http://translate.svn.sourceforge.net/viewvc/translate/src/trunk/translate/misc/quote.py?view=markup

So please use the tool suggested earlier to get rid of those escapes, or
work with moz2po ;)

--
Dwayne Bailey
Associate             Research Director        +27 12 460 1095 (w)
Translate.org.za      ANLoc                    +27 83 443 7114 (c)


