LOCALIZATION NOTE, DONT_TRANSLATE and more

Re: Serbian l10n

Filip Miletic
Axel Hecht wrote:
> So, from my point of view, there is a significant cost per locale in
> terms of build resources, QA resources, ftp-and-mirror resources,
[...]
> I'd say that having the cyrillic version sounds like the one that I
> would suggest, as going from there to latin is easier than the other way
> around, plus being a tad more official.

My interest is to provide the cyrillic translation, so I am in principle
willing to stop there.

However, leaving it at that would beyond doubt inspire criticism from
the users. Therefore, I want to at least negotiate the way to add the
latin translation to the cyrillic one in the future, in case it is not
possible to do so from the beginning.

I understand your concern about inserting the new locales and the burden
it incurs on the release schedule and logistics. But I also admit I kind
of thought that's the whole point of localization: trade the user
convenience off for the extra resources used. Could you be so kind as to
explain what complications these two new locales would introduce? Or
better yet, if there's a document, point me to it so I can read it?

> short term solution there. That is, we don't have the resources to
> develop a technical solution, let alone deploy and maintain it.

Would pushing this task towards the L10N team not lift some burden off
of you? Think of it as one team that provides two locales. I cannot see
a reason why your development process would not support it.

From where I stand, there is no (special) problem to deliver two locales
instead of one. In that case you should not care what process the L10N
team uses to make them.

> That said, I'm curious about how the mapping of cyrillic to latin works
> technically, so that I can give a better guesstimate on the cost.

Given a file containing cyrillic UTF-8 encoded text as input, a sed
script along the following lines is used to output an equivalent file in
latin script.

#! /bin/sed -f
s/а/a/g
s/б/b/g
# ...
# 60 lines in total. Each letter of the cyrillic
# alphabet maps to either a one-character or
# a two-character latin string.
# ...
s/Ш/Š/g
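
For readers who want to try the idea outside sed, here is a minimal sketch of the same per-letter mapping in Python. It is not the script used above: the names LOWER, UPPER, TABLE and to_latin are hypothetical, the table is the standard Serbian Cyrillic-to-Latin correspondence, and mapping uppercase digraphs to title case ("Љ" to "Lj") is an assumption about what the sed script does.

```python
# Hypothetical sketch of a Serbian Cyrillic -> Latin mapping.
# Names and the title-case choice for digraphs are assumptions,
# not taken from the sed script in this thread.

LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Uppercase Cyrillic letters map to capitalized Latin strings,
# so a digraph such as "Љ" becomes "Lj" rather than "LJ".
UPPER = {cyr.upper(): lat.capitalize() for cyr, lat in LOWER.items()}

# str.maketrans accepts single-character keys and string values
# of any length, which covers the two-character cases.
TABLE = str.maketrans({**LOWER, **UPPER})


def to_latin(text: str) -> str:
    """Replace every Serbian Cyrillic letter; leave other characters alone."""
    return text.translate(TABLE)
```

With this table, to_latin("Шума") gives "Šuma". A per-letter mapping of this kind turns the all-caps "ЉУБАВ" into "LjUBAV", so all-caps strings would need extra handling if that matters.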

hth,
f
_______________________________________________
dev-l10n mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-l10n

Re: Serbian l10n

Axel Hecht
Filip Miletic wrote:

> Axel Hecht wrote:
>> So, from my point of view, there is a significant cost per locale in
>> terms of build resources, QA resources, ftp-and-mirror resources,
> [...]
>> I'd say that having the cyrillic version sounds like the one that I
>> would suggest, as going from there to latin is easier than the other way
>> around, plus being a tad more official.
>
> My interest is to provide the cyrillic translation, so I am in principle
> willing to stop there.
>
> However, leaving it at that would beyond doubt inspire criticism from
> the users. Therefore, I want to at least negotiate the way to add the
> latin translation to the cyrillic one in the future, in case it is not
> possible to do so from the beginning.
>
> I understand your concern about inserting the new locales and the burden
> it incurs on the release schedule and logistics. But I also admit I kind
> of thought that's the whole point of localization: trade the user
> convenience off for the extra resources used. Could you be so kind as to
> explain what complications these two new locales would introduce? Or
> better yet, if there's a document, point me to it so I can read it?

QA and trademark approval are one cost, build resources are another, and
administrative effort during release and the like has to be spent again
for each locale.

I will investigate whether we can support derived localizations at some
point in the future, but that is not a short-term goal. And, to be
frank, I'm not going to drive that on behalf of the latin script of the
serbian locale. We could use something like this for the japanese mac
locale, though, which is why I'm going to look in that direction.
Without any timeline, though.

>> short term solution there. That is, we don't have the resources to
>> develop a technical solution, let alone deploy and maintain it.
>
> Would pushing this task towards the L10N team not lift some burden off
> of you? Think of it as one team that provides two locales. I cannot see
> a reason why your development process would not support it.
>
> From where I stand, there is no (special) problem to deliver two locales
> instead of one. In that case you should not care what process the L10N
> team uses to make them.
>
>> That said, I'm curious about how the mapping of cyrillic to latin works
>> technically, so that I can give a better guesstimate on the cost.
>
> Given a file containing cyrillic UTF-8 encoded text as input, a sed
> script along the following lines is used to output an equivalent file in
> latin script.
>
> #! /bin/sed -f
> s/а/a/g
> s/б/b/g
> # ...
> # 60 lines in total. Each letter of the cyrillic
> # alphabet maps to either a one-character or
> # a two-character latin string.
> # ...
> s/Ш/Š/g

This only holds for UTF-8 encoded files, so there is a limit to this
algorithm. That doesn't make it impossible, just more involved.
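
The thread does not say which files are not UTF-8 or how they are encoded. As a rough sketch only, assuming such a file uses a known legacy encoding (ISO-8859-5 and Windows-1251 are used here purely as examples), one way to make the mapping encoding-aware is to decode first, map the characters, and then write the result in an encoding that can hold the Latin letters. This is in Python rather than sed, and the names CYRILLIC, LATIN, TABLE and transliterate_bytes are hypothetical.

```python
# Hypothetical sketch: transliterate input that is not UTF-8 encoded.
# The encodings in the usage note are examples, not facts about the
# files discussed in this thread.

CYRILLIC = "абвгдђежзијклљмнњопрстћуфхцчџш"
LATIN = "a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š".split()

# Build a code point -> string table covering both cases.
# Uppercase digraphs become title case ("Љ" -> "Lj"), an assumption.
TABLE = {}
for cyr, lat in zip(CYRILLIC, LATIN):
    TABLE[ord(cyr)] = lat
    TABLE[ord(cyr.upper())] = lat.capitalize()


def transliterate_bytes(data: bytes, source_encoding: str,
                        target_encoding: str = "utf-8") -> bytes:
    """Decode with the declared encoding, map letters, re-encode."""
    text = data.decode(source_encoding)
    return text.translate(TABLE).encode(target_encoding)
```

For example, transliterate_bytes("Шума".encode("iso8859_5"), "iso8859_5") returns the UTF-8 bytes of "Šuma". The output encoding defaults to UTF-8 because the Cyrillic code pages cannot represent letters such as š, č and đ.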

Axel