Ways to Count

Gervase Markham
The current discussion has raised the question of how many ways there
are to count language coverage for a bit of software.

I can think of:

1) Percentage of the Internet population covered in native language
   (this is what my spreadsheet attempts to do)

2) Percentage of the world population covered in native language
   (this doesn't take into account internet penetration, but you could
   say "hey, they have a browser waiting for them when they get online")

3) Percentage of the world/internet population covered by any language
   they speak
   (Not sure where you'd find the right figures to do this)

4) Number of languages covered
   (this is what counting packs does, although it runs into trouble with
   the definition of "language", and the fact that you give a language
   with 10,000 speakers the same weight as one with 100,000,000)

5) Percentage of countries covered, by official language
   (This might be a good proxy for method 3, because you would hope that
   everyone in a country speaks at least one of the official languages)


Can anyone think of any more?

Gerv

_______________________________________________
dev-l10n mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-l10n

Re: Ways to Count

Michael Wolf-2
Gervase Markham wrote:

> The current discussion has raised the question of how many ways there
> are to count language coverage for a bit of software.
>
> I can think of:
>
> 1) Percentage of the Internet population covered in native language
>    (this is what my spreadsheet attempts to do)
>
> 2) Percentage of the world population covered in native language
>    (this doesn't take into account internet penetration, but you could
>    say "hey, they have a browser waiting for them when they get online")
>
> 3) Percentage of the world/internet population covered by any language
>    they speak
>    (Not sure where you'd find the right figures to do this)

Where do you get reliable figures for the internet and/or world population?
What's the use of such figures? What use is it to know that the
Sorbian languages cover 0.01% of the world population, assuming 6
billion people? What's the advantage of knowing that? IMHO those figures
will be meaningless; they will be too vague.

>
> 4) Number of languages covered
>    (this is what counting packs does, although it runs into trouble with
>    the definition of "language", and the fact that you give a language
>    with 10,000 speakers the same weight as one with 100,000,000)
>
> 5) Percentage of countries covered, by official language
>    (This might be a good proxy for method 3, because you would hope that
>    everyone in a country speaks at least one of the official languages)

You should combine the number of languages with the number of locales, to
include varieties of a language.

Why a percentage? A percentage has statistical significance only, without
real use. Better is the number of covered and uncovered countries,
combined with locales, because in a lot of countries more than
one language is spoken.


Michael

Re: Ways to Count

user-41
Michael Wolf wrote:

> Where do you get [reliable] figures for internet and/or world population?

The blog message referenced in the previous subject ("Language Analysis
for FF3") mentions that the figures are taken from the CIA World Factbook.
Probably this page: Internet users (by country)
https://www.cia.gov/library/publications/the-world-factbook/fields/2153.html

> What's the use of such figures? What use is it to know that the
> Sorbian languages cover 0.01% of the world population, assuming 6
> billion people? What's the advantage of knowing that? IMHO those figures
> will be meaningless; they will be too vague.

It is one way to get a rough estimate of which populations are underserved
and represent possible opportunities, to help prioritize future
community-building efforts.  (The Mozilla Manifesto says Mozilla works for
public benefit; the public includes all populations of the world.)


> Why percentage? A percentage has statistical significance only, without
> real use.

Firefox gets free publicity in the news when it gains market share
percentage points.  One way these figures can be used is to find
opportunities where it may gain most market share.


> Better is the number of covered countries and not covered
> countries combined with locales because in a lot of countries more than
> one language is spoken.

Better for what purpose?

Maybe a marketing-oriented person likes comparing feature checklists.
Locale checklists without populations seem simpler in that case.  But they
don't provide much guidance on how to focus resources to fill the
empty check-boxes.

One possible fear: The Mozilla community has limited human resources to
reach out and help support new localization communities, so for the
purpose of allocating time to candidate communities, one consideration
is population.  This doesn't mean that the long tail of smaller
communities is to be excluded, but smaller communities may get less
attention.

(One way to allocate is by population: if, say, five 1% locales are
waiting for reviews and two 2.5% locales don't have localizations yet,
then maybe roughly half the person-hours could be spent on the reviews
and half on helping localization communities get
started in the underserved locales.)
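
The allocation described above can be sketched in a few lines of Python. This is only an illustration: the task labels, the `allocate_hours` helper, and the 40-hour budget are assumptions made for the example; only the 1% and 2.5% shares come from the message.

```python
# Hypothetical backlog: (kind of work, population share in percent).
# The shares mirror the example in the message; the labels are made up.
backlog = [("review", 1.0)] * 5 + [("new locale", 2.5)] * 2


def allocate_hours(backlog, total_hours):
    """Split total_hours across kinds of work in proportion to population share."""
    total_share = sum(share for _, share in backlog)
    hours = {}
    for kind, share in backlog:
        hours[kind] = hours.get(kind, 0.0) + total_hours * share / total_share
    return hours


# 40 person-hours is an arbitrary budget chosen for the illustration.
hours = allocate_hours(backlog, total_hours=40)
for kind, value in hours.items():
    print(f"{kind}: {value:.1f} hours")
```

Five locales at 1% and two at 2.5% each add up to 5% of the population, so the budget splits evenly between reviews and new locales.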

Re: Ways to Count

Axel Hecht
In reply to this post by Gervase Markham
Gervase Markham wrote:

> The current discussion has raised the question of how many ways there
> are to count language coverage for a bit of software.
>
> I can think of:
>
> 1) Percentage of the Internet population covered in native language
>    (this is what my spreadsheet attempts to do)
>
> 2) Percentage of the world population covered in native language
>    (this doesn't take into account internet penetration, but you could
>    say "hey, they have a browser waiting for them when they get online")
>
> 3) Percentage of the world/internet population covered by any language
>    they speak
>    (Not sure where you'd find the right figures to do this)
>
> 4) Number of languages covered
>    (this is what counting packs does, although it runs into trouble with
>    the definition of "language", and the fact that you give a language
>    with 10,000 speakers the same weight as one with 100,000,000)
>
> 5) Percentage of countries covered, by official language
>    (This might be a good proxy for method 3, because you would hope that
>    everyone in a country speaks at least one of the official languages)
>
>
> Can anyone think of any more?

I don't really think that getting more metrics is worthwhile. For status
reports on how our localization coverage is growing, the metrics should
yield more or less similar answers.

If you have more specific questions, it's probably a good idea to pick a
well-suited metric to answer that question, but that will likely be a
different metric for each.

Expect that the data we have might not help in getting answers to those
questions, independent of the metric we use. The data we have might be
just too vague, or changing.

And then there's still Churchill.

Axel

Re: Ways to Count

Robert Kaiser
In reply to this post by Gervase Markham
Gervase Markham wrote:
> 1) Percentage of the Internet population covered in native language
>
> 2) Percentage of the world population covered in native language

Both of those usually assume people have only one native language, which
applies to the majority of people but the rest might be significant when
looking at 2-10% of the (internet) population not being covered and
trying to find out about differences there.

Of course, counting gets hard when you want or need to respect people
with multiple native languages and still respect what their first choice
is between them.

Robert Kaiser

Re: Ways to Count

Gervase Markham
Robert Kaiser wrote:
> Both of those usually assume people have only one native language, which
> applies to the majority of people but the rest might be significant when
> looking at 2-10% of the (internet) population not being covered and
> trying to find out about differences there.
>
> Of course, counting gets hard when you want or need to respect people
> with multiple native languages and still respect what their first choice
> is between them.

So you agree that the people you say have multiple native languages
still have a first choice? Then what's the problem? :-)

The last US census (I think) replaced the question about native language
with one about "language spoken at home". Which seems like a better
question, to which (for the vast majority of people) there is just one
answer. Can we in future assume that this is what we mean when we say
"native language"? :-)

Gerv

Re: Ways to Count

Gervase Markham
In reply to this post by Axel Hecht
Axel Hecht wrote:
> I don't really think that getting more metrics is worthwhile. For status
> reports on how our localization coverage is growing, the metrics should
> yield more or less similar answers.

I think that's clearly not the case.

If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and
Amanaye, then by the "number of language packs" metric, it would seem
that our coverage has grown by 10%. But by the "% population covered"
metric, it would hardly have grown at all.
(Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)
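
To make the gap between the two metrics concrete, here is a minimal Python sketch. Every number in it is a made-up placeholder: the figure of 60 existing packs is chosen only so that six additions come to 10%, and the language names and speaker counts are hypothetical, not taken from the Wikipedia list.

```python
# All figures are placeholders for illustration, not real data.
existing_packs = 60                 # assumed number of language packs already shipped
total_population = 1_000_000_000    # assumed size of the population being measured

# Six hypothetical small languages with assumed speaker counts.
new_languages = {
    "lang_a": 500,
    "lang_b": 15,
    "lang_c": 3,
    "lang_d": 20_000,
    "lang_e": 10,
    "lang_f": 200,
}

# Metric 4: growth in the number of languages covered.
pack_growth = len(new_languages) / existing_packs

# Metric 1 or 2: growth in the share of the population covered.
population_growth = sum(new_languages.values()) / total_population

print(f"language packs:     +{pack_growth:.1%}")
print(f"population covered: +{population_growth:.4%}")
```

Under these assumptions the pack count grows by a tenth while the population share moves by a few thousandths of a percentage point.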

> If you have more specific questions, it's probably a good idea to pick a
> well-suited metric to answer that question, but that will likely be a
> different metric for each.

Right. So that would be an argument for having good figures for several
different metrics.

> Expect that the data we have might not help in getting answers to those
> questions, independent of the metric we use. The data we have might be
> just too vague, or changing.

That's possible, but requires proof. If you have issues with the data
I'm using, please raise them :-)

Gerv

Re: Ways to Count

Axel Hecht
Gervase Markham wrote:

> Axel Hecht wrote:
>> I don't really think that getting more metrics is worthwhile. For status
>> reports on how our localization coverage is growing, the metrics should
>> yield more or less similar answers.
>
> I think that's clearly not the case.
>
> If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and
> Amanaye, then by the "number of language packs" metric, it would seem
> that our coverage has grown by 10%. But by the "% population covered"
> metric, it would hardly have grown at all.
> (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)

So "less similar" in this case. Btw, "Population covered" is not going
to change significantly anymore, "population not covered" might, though.

>> If you have more specific questions, it's probably a good idea to pick a
>> well-suited metric to answer that question, but that will likely be a
>> different metric for each.
>
> Right. So that would be an argument for having good figures for several
> different metrics.
>
>> Expect that the data we have might not help in getting answers to those
>> questions, independent of the metric we use. The data we have might be
>> just too vague, or changing.
>
> That's possible, but requires proof. If you have issues with the data
> I'm using, please raise them :-)

Do you have errors for your data? Time they were measured? Trends? And
errors in trends? If we had all that, take your metric, and do some
happy propagation of uncertainty,
http://en.wikipedia.org/wiki/Propagation_of_uncertainty.

In general, I'd expect the errors to be larger for smaller languages
than for big ones, bigger for poorer countries than for thoroughly
industrialized (or post-industrialized) ones.
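
As a rough sketch of what such a propagation could look like for a coverage total, assume that each language's speaker count carries an independent absolute error; the errors on a sum then add in quadrature. The speaker counts, error values, and the `sum_with_uncertainty` helper below are hypothetical.

```python
import math

# Hypothetical (speakers, absolute uncertainty) pairs; not real data.
estimates = [
    (5_000_000, 500_000),
    (2_000_000, 400_000),
    (20_000, 10_000),
]


def sum_with_uncertainty(estimates):
    """Return the total and its uncertainty, assuming independent errors."""
    total = sum(value for value, _ in estimates)
    # Independent absolute errors on a sum combine in quadrature.
    sigma = math.sqrt(sum(err ** 2 for _, err in estimates))
    return total, sigma


total, sigma = sum_with_uncertainty(estimates)
print(f"covered speakers: {total:,} +/- {sigma:,.0f}")
```

The independence assumption is a simplification; census errors that share a common cause would need covariance terms as well.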

Axel

Re: Ways to Count

Simos Xenitellis-4
In reply to this post by Gervase Markham
I think it would be useful to look for the populations (whether
Internet-connected or not) for which no version of Firefox exists that
caters to their needs in terms of language.

Take the Thai language, for example: there is no native-language
build of F3 at mozilla.com. There is http://www.firefoxthai.com/
for F2 (which uses the Firefox logos), but no indication of Firefox 3.
What I think would be useful is statistics that show the percentage
of the population of Thailand that speaks only Thai and cannot find a
suitable Firefox 3.

For the purposes of L10n, it would be good to see which populations
(sorted by size) are affected by the lack of a version of F3
in any language they speak/read.

Simos

On Sat, Jun 14, 2008 at 9:33 AM, Gervase Markham <[hidden email]> wrote:

> The current discussion has raised the question of how many ways there
> are to count language coverage for a bit of software.
>
> I can think of:
>
> 1) Percentage of the Internet population covered in native language
>   (this is what my spreadsheet attempts to do)
>
> 2) Percentage of the world population covered in native language
>   (this doesn't take into account internet penetration, but you could
>   say "hey, they have a browser waiting for them when they get online")
>
> 3) Percentage of the world/internet population covered by any language
>   they speak
>   (Not sure where you'd find the right figures to do this)
>
> 4) Number of languages covered
>   (this is what counting packs does, although it runs into trouble with
>   the definition of "language", and the fact that you give a language
>   with 10,000 speakers the same weight as one with 100,000,000)
>
> 5) Percentage of countries covered, by official language
>   (This might be a good proxy for method 3, because you would hope that
>   everyone in a country speaks at least one of the official languages)
>
>
> Can anyone think of any more?
>
> Gerv
>

Re: Ways to Count

Gervase Markham
In reply to this post by Gervase Markham
Simos Xenitellis wrote:
> For the purposes of L10n, it would be good to see which populations
> (sorted by size) are affected by the lack of a version of F3
> in any language they speak/read.

This is my method (3). The problem is that I don't know of a source of
data which can provide the necessary figures.

For a given country, you would need to know data something like this:

UK: All Languages Spoken:

English only: 47.3%
English and Bengali: 4.3%
Bengali only: 0.14%
English and Polish: 1.6%
Polish, Latvian and Russian: 0.006%
...

Then, you could look at the languages and go through ticking off the
groups for which you had at least one hit.

It would be a very long and very detailed list. I don't know if such
data is available even for countries with very good censuses.
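
If such data did exist, the ticking-off step could be sketched as below. The shares are the illustrative ones from the example above (the list is partial, so they do not add up to 100%); the `covered_share` helper and the sets of shipped languages are assumptions made for the sketch.

```python
# Illustrative shares from the example above (percent of population).
# Each key is the set of languages one group speaks.
groups = {
    frozenset({"English"}): 47.3,
    frozenset({"English", "Bengali"}): 4.3,
    frozenset({"Bengali"}): 0.14,
    frozenset({"English", "Polish"}): 1.6,
    frozenset({"Polish", "Latvian", "Russian"}): 0.006,
}


def covered_share(groups, shipped):
    """Sum the shares of groups that speak at least one shipped language."""
    return sum(share for langs, share in groups.items() if langs & shipped)


# Hypothetical sets of shipped localizations.
print(f"{covered_share(groups, {'English'}):.3f}")
print(f"{covered_share(groups, {'English', 'Polish'}):.3f}")
```

Adding Polish on top of English only picks up the one group that speaks no English, which is exactly the distinction that per-language speaker totals cannot show.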

Gerv

Re: Ways to Count

Gervase Markham
In reply to this post by Axel Hecht
Axel Hecht wrote:

> Gervase Markham wrote:
>> Axel Hecht wrote:
>>> I don't really think that getting more metrics is worthwhile. For status
>>> reports on how our localization coverage is growing, the metrics should
>>> yield more or less similar answers.
>>
>> I think that's clearly not the case.
>>
>> If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and
>> Amanaye, then by the "number of language packs" metric, it would seem
>> that our coverage has grown by 10%. But by the "% population covered"
>> metric, it would hardly have grown at all.
>> (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)
>
> So "less similar" in this case.

:-) "More or less similar" is an idiom; it doesn't mean "either more
similar or less similar", it means "quite similar".

> Btw, "Population covered" is not going
> to change significantly anymore, "population not covered" might, though.

I don't understand what you mean by that. If "population covered" +
"population not covered" = 100%, how can "population covered" not change
if "population not covered" changes?

> Do you have errors for your data? Time they were measured? Trends? And
> errors in trends? If we had all that, take your metric, and do some
> happy propagation of uncertainty,
> http://en.wikipedia.org/wiki/Propagation_of_uncertainty.
>
> In general, I'd expect the errors to be larger for smaller languages
> than for big ones,

But such errors have less effect on the overall result, because the
absolute numbers are smaller. If I say the population whose native
language is Alsatian is 10,000, when in fact it's 20,000, that's not
going to produce a noticeable error when the internet population is
about 1 billion.
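
As a quick check of that arithmetic, using the figures from the paragraph above (the variable names are just for illustration):

```python
internet_population = 1_000_000_000   # "about 1 billion", as stated above
assumed_speakers = 10_000             # the figure used in the calculation
actual_speakers = 20_000              # the supposed true figure

# Error in the coverage percentage caused by the wrong speaker count.
error_in_points = (actual_speakers - assumed_speakers) / internet_population * 100
print(f"{error_in_points:.4f} percentage points")
```

A 100% error on the language's own figure shifts the overall percentage by about a thousandth of a point.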

> bigger for poorer countries than for thoroughly
> industrialized (or post-industrialized) ones.

You are probably right there.

But the question is: if your data is not perfect, do you just give up,
or do you work with the data you have? When I said "If you have issues",
I didn't mean "List all of the statistical problems it might have", I
meant "provide better data if you have some, otherwise let's go with
what we've got".

Gerv

Re: Ways to Count

Simos Xenitellis-4
In reply to this post by Gervase Markham
You can try the Ethnologue website,
http://www.ethnologue.com/

For example, for Thailand, the page is
http://www.ethnologue.com/show_country.asp?name=TH
which says that the vast majority of the population (>93%) speaks Thai
or dialects of Thai.

Simos

On Mon, Jun 16, 2008 at 11:43 AM, Gervase Markham <[hidden email]> wrote:

> Simos Xenitellis wrote:
>> For the purposes of L10n, it would be good to see which populations
>> (sorted by size) are affected by the lack of a version of F3
>> in any language they speak/read.
>
> This is my method (3). The problem is that I don't know of a source of
> data which can provide the necessary figures.
>
> For a given country, you would need to know data something like this:
>
> UK: All Languages Spoken:
>
> English only: 47.3%
> English and Bengali: 4.3%
> Bengali only: 0.14%
> English and Polish: 1.6%
> Polish, Latvian and Russian: 0.006%
> ...
>
> Then, you could look at the languages and go through ticking off the
> groups for which you had at least one hit.
>
> It would be a very long and very detailed list. I don't know if such
> data is available even for countries with very good censuses.
>
> Gerv

Re: Ways to Count

Axel Hecht
In reply to this post by Gervase Markham
Gervase Markham wrote:

> Axel Hecht wrote:
>> Gervase Markham wrote:
>>> Axel Hecht wrote:
>>>> I don't really think that getting more metrics is worthwhile. For status
>>>> reports on how our localization coverage is growing, the metrics should
>>>> yield more or less similar answers.
>>> I think that's clearly not the case.
>>>
>>> If we added support for Anfillo, Ainu, Amurdag, Alsatian, Abenaki and
>>> Amanaye, then by the "number of language packs" metric, it would seem
>>> that our coverage has grown by 10%. But by the "% population covered"
>>> metric, it would hardly have grown at all.
>>> (Source: http://en.wikipedia.org/wiki/List_of_endangered_languages)
>> So "less similar" in this case.
>
> :-) "More or less similar" is an idiom; it doesn't mean "either more
> similar or less similar", it means "quite similar".

To me, it means equivalent norms; that is, when one grows, the other grows,
and vice versa.

>> Btw, "Population covered" is not going
>> to change significantly anymore, "population not covered" might, though.
>
> I don't understand what you mean by that. If "population covered" +
> "population not covered" = 100%, how can "population covered" not change
> if "population not covered" changes?

If you count in %, both have to change.

On the other hand, say we take 90% for one and 10% for the other: an
additional language with 2% changes one by a mere 2.2%, while it changes
the other by a whopping 20%.

Thus, relatively, covered population is hardly going to change by any
language we get, uncovered population on the other hand might.
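
The relative changes in that example work out as follows (a small sketch; the variable names are only for illustration):

```python
covered, uncovered, added = 90.0, 10.0, 2.0   # percentages from the example

# Relative change of each figure when 2 points move from uncovered to covered.
relative_gain = 100 * added / covered      # growth of the covered share
relative_drop = 100 * added / uncovered    # shrinkage of the uncovered share

print(f"covered:   +{relative_gain:.1f}%")
print(f"uncovered: -{relative_drop:.1f}%")
```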

>> Do you have errors for your data? Time they were measured? Trends? And
>> errors in trends? If we had all that, take your metric, and do some
>> happy propagation of uncertainty,
>> http://en.wikipedia.org/wiki/Propagation_of_uncertainty.
>>
>> In general, I'd expect the errors to be larger for smaller languages
>> than for big ones,
>
> But such errors have less effect on the overall result, because the
> absolute numbers are smaller. If I say the population whose native
> language is Alsatian is 10,000, when in fact it's 20,000, that's not
> going to produce a noticeable error when the internet population is
> about 1 billion.

It does when you start comparing it to other small languages. Which you
to some extent did in your blog post.

>> bigger for poorer countries than for thoroughly
>> industrialized (or post-industrialized) ones.
>
> You are probably right there.
>
> But the question is: if your data is not perfect, do you just give up,
> or do you work with the data you have? When I said "If you have issues",
> I didn't mean "List all of the statistical problems it might have", I
> meant "provide better data if you have some, otherwise let's go with
> what we've got".

Neither. You have to ask appropriate questions for your data to answer,
and if it's really noisy or uncertain data, you have to ask questions
that work well with fuzzy answers.

For example "Which language should Microsoft do next?" is likely a bogus
question, given that your answer was Balochi. That is probably affected
by each and every disclaimer you gave on the assumptions you made, and
likely has uncertain initial data, too. I'm running on the assumption
that the next runner-up wasn't at just 0.10%.

Axel

Re: Ways to Count

Robert Kaiser
In reply to this post by Gervase Markham
Gervase Markham wrote:
> So you agree that the people you say have multiple native languages
> still have a first choice? Then what's the problem? :-)

The problem is that you can reach them equally well in all of those
languages, but they still prefer one. So if you ask "how many people do
we reach?" you only have to serve them any of those languages (e.g.
Sorbs, who all grow up bilingually). That e.g. means you can wipe out
minority languages whose native speakers are all bilingual from your
statistics (like Sorbian, or I guess also Gaelic), and you can wipe out
language variants, as Canadians, British, Irish, and South African
people probably can all be reached with en-US.
If you want to serve them in their preferred language, then you need to
look into variants as well as minority languages whose speakers are all
bilingual, and the picture gets both more difficult and more
interesting.

So, the big problem is that you can't say "Microsoft doesn't reach
Sorbs" just because they don't offer Sorbian, as they reach them pretty
well with German (probably as well as one reaches Brits with en-US,
actually). What you can say is that we serve Sorbs better by offering
that language than by German, but we actually reach them with both.

> The last US census (I think) replaced the question about native language
> with one about "language spoken at home". Which seems like a better
> question, to which (for the vast majority of people) there is just one
> answer. Can we in future assume that this is what we mean when we say
> "native language"? :-)

So, you mean French _and_ German _and_ their Austrian dialect for a
friend of mine, as she speaks all three of them at home? ;-)

Robert Kaiser

Re: Ways to Count

Gervase Markham
In reply to this post by Axel Hecht
Axel Hecht wrote:
> For example "Which language should Microsoft do next?" is likely a bogus
> question, given that your answer was Balochi.

That is, I think, because I worked that out by eye and I can't count.
The right answer is Belarusian.

Gerv

Re: Ways to Count

Gervase Markham
In reply to this post by Robert Kaiser
Robert Kaiser wrote:

> Gervase Markham wrote:
>> So you agree that the people you say have multiple native languages
>> still have a first choice? Then what's the problem? :-)
>
> The problem is that you can reach them equally well in all of those
> languages, but they still prefer one. So if you ask "how many people do
> we reach?" you only have to serve them any of those languages (e.g.
> Sorbs, who all grow up bilingually). That e.g. means you can wipe out
> minority languages whose native speakers are all bilingual from your
> statistics (like Sorbian, or I guess also Gaelic),

I think Axel and Pascal would eat me alive if I tried that.

> and you can wipe out
> language variants, as Canadians, British, Irish, and South African
> people probably can all be reached with en-US.

I did make that optimisation.

> If you want to serve them in their preferred language, then you need to
> look into variants as well as minority languages whose speakers are all
> bilingual, and the picture gets both more difficult and more
> interesting.

I think that if you talk about preferred language, the picture gets less
difficult. "What language do you speak at home?" is a question that
anyone can answer, and many countries have stats for. "What are all the
languages you speak?" is not a question I've found data for, for any
country.

> So, the big problem is that you can't say "Microsoft doesn't reach
> Sorbs" just because they don't offer Sorbian, as they reach them pretty
> well with German (probably as well as one reaches Brits with en-US,
> actually).

Quite so. Which is why I don't say that Microsoft doesn't reach Sorbs. :-)

> What you can say is that we serve Sorbs better by offering
> that language than by German, but we actually reach them with both.

That is true. We are again back to my point that I am putting 90% and
100% solutions in the same basket and contrasting them with the 0% solution.

> So, you mean French _and_ German _and_ their Austrian dialect for a
> friend of mine, as she speaks all three of them at home? ;-)

:-P

Gerv

Re: Ways to Count

Gervase Markham
In reply to this post by Gervase Markham
Simos Xenitellis wrote:
> For example, for Thailand, the page is
> http://www.ethnologue.com/show_country.asp?name=TH
> which says that the vast majority of the population (>93%) speaks Thai
> or dialects of Thai.

Right. But it doesn't tell us what you wanted to know, which was
"statistics that show the percentage of the population of Thailand that
speaks *only* Thai".

Gerv

Re: Ways to Count

Michael Wolf-2
In reply to this post by Robert Kaiser
Robert Kaiser wrote:
> So, you mean French _and_ German _and_ their Austrian dialect for a
> friend of mine, as she speaks all three of them at home? ;-)

Yes, and I know a German who speaks Upper Sorbian, Lower Sorbian,
Esperanto and Lithuanian. He is a professor of Sorbian studies in Leipzig.
AFAIK he speaks Lithuanian with his wife at home, though she is an Esperantist
as well. :-)

Michael

Re: Ways to Count

Axel Hecht
In reply to this post by Gervase Markham
Gervase Markham wrote:
> Axel Hecht wrote:
>> For example "Which language should Microsoft do next?" is likely a bogus
>> question, given that your answer was Balochi.
>
> That is, I think, because I worked that out by eye and I can't count.
> The right answer is Belarusian.

Huh? Anyway, looking at the snapshot, MS has neither bal nor be, to which
you attribute 0.511% and 0.452%, respectively, which is a 13% difference.
10-ish % seems to be *very* low as an error bar, so I don't see how your
data should make a call on whether it'd be bal or be.

Thus, "Which locale should Microsoft do next" is an ill-posed question
for the data you have.
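
One way to see why the data cannot separate the two: give each figure the same relative error bar and check whether the resulting ranges overlap. The sketch below uses the two percentages quoted above; the `intervals_overlap` helper and the choice of a symmetric 10% relative error are assumptions made for illustration.

```python
def intervals_overlap(a, b, rel_error):
    """True if a and b, each with the same relative error, have overlapping ranges."""
    lo_a, hi_a = a * (1 - rel_error), a * (1 + rel_error)
    lo_b, hi_b = b * (1 - rel_error), b * (1 + rel_error)
    return lo_a <= hi_b and lo_b <= hi_a


bal, be = 0.511, 0.452   # percentages quoted above

print(f"relative difference: {(bal - be) / be:.0%}")
print("ranges overlap at 10% error:", intervals_overlap(bal, be, rel_error=0.10))
```

With a 10% error bar on each figure the two ranges already overlap, and any larger error only widens the overlap.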

Axel

Re: Ways to Count

Gervase Markham
Axel Hecht wrote:
> Huh? Anyway, looking at the snapshot, MS neither has bal nor be, which
> you attribute 0.511 and 0.452 % to, resp.,

Try the new one. There was a bug: bal is now 0.047%, and be is 0.450% -
a factor of 10 difference.

Next after Belarusian is Oriya: 5,478,000 vs. 2,085,318, so the gap is more than half.
With the data I have, Belarusian is the clear answer. Of course, if you
want to improve the data, that could possibly change. :-)

Gerv
