Getting a Tier-X for localized builds. X=1


Getting a Tier-X for localized builds. X=1

Axel Hecht
Hi,

right now, localized builds, and in particular localized Nightlies, are
part of our job exclusion profile, i.e., they're hidden from view.

On the release channel, 60% of our users use those builds, and that's a
weird combination. Even on aurora, it's 50%.

I think we should make our localized nightlies Tier 1 builds.

There's benefit in doing that today, in terms of tracking
infrastructure failures and catching regressing landings. More on that at the end.

Reading through
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy, I recognize
some shortcomings. Some of them are actual challenges in our status
quo. Some of them might just call for bending our policies, I think.

The first one on the list there is "has an owner".

And this is the first and foremost one we just need to bend. (I'm dismissing
the "active" part.)

L10n builds have several owners, and not accommodating that might be
the prime reason why we haven't assigned a tier to l10n builds.

I think we need to deal with the distributed ownership, and find peace
with that.

The current Windows Nightly bustage is a great example,
https://bugzilla.mozilla.org/show_bug.cgi?id=1239074. There's an update
to NSIS, and Nightlies broke. The fix has nothing to do with NSIS, but
(we hope) it's a missing '\' in buildbot configs. Finding this out was a
joint effort of some 5 folks from various groups,
http://logs.glob.uno/?c=mozilla%23releng&s=12+Jan+2016&e=12+Jan+2016&h=pike#c217466.

We should see that shared ownership as a strength instead of a blocker.

The next section is titled "Usable job logs". My brain is at odds with
this one, as I can't decipher the current logs. But I also strongly
believe that that's tied to how my brain works with written text. We
need way more stakeholders than me to work out what this requirement means.

The remaining sections are about what to do when things break.

I do think that we need to back out stuff if it breaks localized builds
in general.

I also think that our history shows that that only happens really
rarely, so I'm not convinced that we win by making things like inbound
or try a prerequisite. There's a bunch of constructive things we can do
right now with shared ownership that will make progress.

That said, I do think that getting try and inbounds to expose l10n
failures (beyond what l10n-check does) is good.

That's why I think we should make our l10n nightlies tier 1 now.

I really expect to get some help from sheriffs immediately.
I also think that we're not getting our requirements figured out and
assigned unless we actually commit to a particular tier.

To show some common failures these days and their respective actions:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=7f6dc6f3589b&filter-job_group_symbol=L10n&exclusion_profile=false 
shows some infra failures and some broken builds for desktop. Some of
those got retriggered, mostly for hg time-out reasons. I think this is
something that'd be OK for Sheriffs to handle, in particular if it were
part of the standard display.
It also shows some per-locale bustages for Android, which are somewhat
easy to detect from the logs. It's easy to see that you don't want to
retrigger, though I have a hard time detecting what the actual problem
is. But those can be fixed by the localizers, as
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=cb11e926ac33&filter-job_group_symbol=L10n&exclusion_profile=false 
shows. N4 going green on Android is related to Romansh fixing stuff.
Tough to find out, but the initial error clearly looked like "locale
busted".

The thing that's sad is that there are some jobs here that should have
been retriggered, and weren't. And I wonder if there's an infra
problem that should've been flagged when it started, and not when I
happened to look at it. hg timeouts worry me, tbh.

And then there are rare occasions where the builds just break.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=531d1f6d1cde&exclusion_profile=false&filter-job_group_symbol=L10n 
is the most recent example, showing that Windows Nightlies are
broken. The logs show that configure fails.

Those are hard to resolve, but they're also easy to distinguish from
infra or locale problems, and for those, filing them as early as we can
is key. And getting them attention through shared ownership will certainly
help. Sheriffs can help here initially with other bugs they've seen, and
then loop in various stakeholders or trigger IRC/mailing list
conversations.

We've been doing l10n builds without a tier for some 10 years, and with
our focus on quality, getting them a tier can only help. I strongly
favor getting them a tier and then fixing the problems.

Axel

Re: Getting a Tier-X for localized builds. X=1

Kyle Huey-2
IMO the most daunting obstacle is one that you do not mention at all:
l10n builds are not scheduled on every push.

- Kyle

Re: Getting a Tier-X for localized builds. X=1

Axel Hecht
In reply to this post by Axel Hecht
On 14/01/16 01:22, Kyle Huey wrote:
> IMO the most daunting obstacle is one that you do not mention at all:
> l10n builds are not scheduled on every push.

I actually don't think this is an obstacle. Or should be.

I also don't agree that we're not doing l10n builds per push. We do,
they're the l10n-check target in browser/locales, and they're run on try
and other automation,
https://dxr.mozilla.org/mozilla-central/source/build/moz-automation.mk#37.

They're catching some errors, but not others. For example, the try runs
on https://bugzilla.mozilla.org/show_bug.cgi?id=1215694 failed for much
of its development cycle. And once they were green, the only thing that
was left busted were manifests in language packs.

In contrast, we have dozens of locale/platform combinations failing
each day for mercurial timeouts. Like, 3 failed chunks easily adds up to that.
All that's required to get those builds out to people is to retrigger in
the treeherder UI.

So we can benefit today.

Having a meaningful test suite for localized builds would be great. But
blocking on that hasn't proven to be constructive for the past ten years.
https://groups.google.com/d/msg/mozilla.dev.automation/RbgbaShQb_Y/u7Tsnhj5FgAJ 
has thoughts on that, too.

Axel


Re: Getting a Tier-X for localized builds. X=1

Gregory Szorc-3
In reply to this post by Axel Hecht

We absolutely need visibility of l10n automation in TreeHerder.

Bustage in l10n automation should be subject to the same backout policy as
everything else. I suppose this means making it a Tier 1 supported job (or
whatever terminology we need to use).

A major obstacle to moving forward is that many l10n jobs only run
periodically. e.g. l10n Nightlies. What we really need to start doing is
running these jobs (or at least a representative subset of them) on every
build (or at least intelligently scheduled). We'd produce an l10n nightly on
every push; we just wouldn't publish it until the actual Nightly build. This
gives us the automation coverage and confidence that l10n automation is working
properly. We should also extend this same strategy to other
release-oriented jobs, such as partner repacks. We can't be waiting until
the next uplift or even the next Nightly to discover a regression in l10n
or packaging. Regressions need to be detected soon after the commit that
introduced them.
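
To make "intelligently scheduled" a bit more concrete, here's a toy sketch of
the kind of heuristic such a scheduler could apply. The watched path prefixes
and the every-Nth-push fallback are made-up assumptions for illustration, not
how our automation actually decides today:

# Illustrative sketch only: decide whether to run l10n repack jobs for a push.
# The watched paths and the every-Nth-push fallback are made-up heuristics.
L10N_SENSITIVE_PREFIXES = (
    "browser/locales/",
    "mobile/android/locales/",
    "toolkit/locales/",
    "build/",  # packaging/repack machinery
)

def should_run_l10n(changed_files, push_number, every_nth=10):
    """Run l10n jobs if the push touches l10n-sensitive files,
    or as a periodic safety net on every Nth push."""
    if any(f.startswith(L10N_SENSITIVE_PREFIXES) for f in changed_files):
        return True
    return push_number % every_nth == 0

print(should_run_l10n(["layout/base/nsFrame.cpp"], push_number=7))            # False
print(should_run_l10n(["browser/locales/en-US/browser.dtd"], push_number=7))  # True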

Re: Getting a Tier-X for localized builds. X=1

Ryan VanderMeulen
In reply to this post by Axel Hecht
The other big issue I see is that l10n builds can break at any time due
to changes from the multitude of external repos where the strings are
actually stored (i.e. someone lands a busted string in some locale's
repo, next day's l10n nightlies break as a result). This was an issue we
pushed back hard against in the early B2G days because it makes
sheriffing a huge pain. The solution for B2G was in-tree manifests
automatically updated via the B2G Bumper Bot. Not sure what we could do
for l10n builds.
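
Purely for illustration, one option in the bumper-bot spirit would be an
in-tree manifest pinning each locale repo to a known-good revision, updated by
a bot. The sketch below is hypothetical; the manifest name, repo layout, and
locale list are assumptions, not an existing mechanism:

# Hypothetical sketch of an l10n "bumper": pin each locale repository to its
# current tip in an in-tree JSON manifest, so repacks consume known revisions
# instead of whatever is on tip at build time.
import json
import subprocess

BASE = "https://hg.mozilla.org/l10n-central"

def current_tip(locale):
    # `hg identify -i <url>` prints the remote repository's tip changeset id.
    out = subprocess.check_output(["hg", "identify", "-i", "%s/%s" % (BASE, locale)])
    return out.decode().strip()

def write_manifest(locales, path="l10n-changesets.json"):
    pins = {locale: current_tip(locale) for locale in locales}
    with open(path, "w") as f:
        json.dump(pins, f, indent=2, sort_keys=True)
    return pins

# A bot would run this periodically and land the resulting diff, much like the
# B2G Bumper Bot did for its manifests.
# write_manifest(["de", "fr", "rm"])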

-Ryan


Re: Getting a Tier-X for localized builds. X=1

Ryan VanderMeulen
And I guess there's a related question there as to why we can't
better validate string changes before they get checked into the repo (or
at least immediately after)?


Re: Getting a Tier-X for localized builds. X=1

Axel Hecht
In general, yes, we should make localizations infallible, and I wish we
could.

In my considerations for the next really big shift for l10n, that'll
come, or the one after. Updates here soon.

In the particular case of the failing Android builds, the scenario was
the following:

<!ENTITY my-bad "This is &something; aweful">

The sad truth is, I haven't found a reliable way to say whether
&something; resolves to something in the document that's including
the DTD, or whether it's gonna produce an XML parsing error.

In the case of Romansh, the localizer was using &apos;, which resolves
in HTML docs, but not in the plain XML that Android's strings.xml is.

Right now, these are "Warnings". Some of those are OK, some are not.
Like, if a localizer uses &brandShortName; instead of
&brandShorterName;, is that bad enough to drop their localization from
the build? This is a problem of one edge case vs the other, and I don't
have strong data to tell us whether there's one side to pick.
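
To illustrate the shape of the check (this is not the actual compare-locales
code; names and regexes are simplified for the example): a naive scan can flag
references that neither XML predefines nor en-US declares, but it can't settle
the cases above, because it doesn't know what the including document defines
or how the string ends up being consumed:

# Naive, illustrative check: flag entity references in a localized DTD that
# are neither XML-predefined nor declared by en-US. NOT compare-locales.
import re

ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+"([^"]*)"')
ENTITY_REF = re.compile(r'&([\w.-]+);')
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

def unknown_references(en_us_dtd, locale_dtd):
    """Map each suspicious localized entity to the unknown references it uses."""
    known = XML_PREDEFINED | {name for name, _ in ENTITY_DECL.findall(en_us_dtd)}
    problems = {}
    for name, value in ENTITY_DECL.findall(locale_dtd):
        bad = [ref for ref in ENTITY_REF.findall(value) if ref not in known]
        if bad:
            problems[name] = bad
    return problems

print(unknown_references(
    '<!ENTITY ok "fine">',
    '<!ENTITY my-bad "This is &something; aweful">',
))
# {'my-bad': ['something']} -- note that &apos; would pass this check, even
# though it was exactly what broke the Android strings.xml case above.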

I don't expect that we'll get past this particular problem unless we

- move to a localization infrastructure which has stronger semantics of
context of localization
- move to a localization infrastructure which can recover from unknown
references at runtime (extra candy for doing so on native platforms)

Also, if we got meaningful runtime testing for localized builds, we
could do stuff. "Meaningful" being both in scope and in educating a
non-technical localizer as to what the fix is.

The cases where localizers introduce problems are actually pretty rare
these days, though.

....

On 14/01/16 02:06, Ryan VanderMeulen wrote:
> And I guess there's a related question there as to why we can't
> better validate string changes before they get checked into the repo (or
> at least immediately after)?

The validation part that we can do is actually part of l10n-merge: in
cases where a bot can make the call that a string is bad, it removes it
from the localized file.

So a thing like

<!ENTITY oh-my "&doh">

is just gonna be dropped, and if "oh-my" is a string that en-US has,
it'd be added to the localized file from en-US at build time.
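
As a toy sketch of that merge behavior, assuming a plain name-to-value mapping
and a caller-supplied badness check (the real logic lives in compare-locales
and is considerably more involved):

# Toy sketch of l10n-merge as described above: bad or missing localized
# strings fall back to en-US at build time. Not the real implementation.
def merge_entities(en_us, locale, is_bad):
    """en_us/locale: {entity name: value}; is_bad(value) decides what to drop."""
    merged = {}
    for name, en_value in en_us.items():
        loc_value = locale.get(name)
        if loc_value is None or is_bad(loc_value):
            merged[name] = en_value   # dropped or missing: take en-US
        else:
            merged[name] = loc_value  # keep the translation
    return merged

# "&doh" references nothing anyone declares, so oh-my falls back to en-US.
print(merge_entities(
    {"oh-my": "oh my"},
    {"oh-my": "&doh"},
    is_bad=lambda value: "&doh" in value,  # stand-in for a real validity check
))
# {'oh-my': 'oh my'}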

Axel


Re: Getting a Tier-X for localized builds. X=1

Mike Hommey
In reply to this post by Gregory Szorc-3
On Wed, Jan 13, 2016 at 05:00:36PM -0800, Gregory Szorc wrote:

> We absolutely need visibility of l10n automation in TreeHerder.
>
> Bustage in l10n automation should be subject to the same backout policy as
> everything else. I suppose this means making it a Tier 1 supported job (or
> whatever terminology we need to use).
>
> A major obstacle to moving forward is that many l10n jobs only run
> periodically. e.g. l10n Nightlies. What we really need to start doing is
> running these jobs (or at least a representative subset of them) on every
> build (or at least intelligently scheduled). We produce an l10n nightly, we
> just don't publish it until the actual Nightly build. This gives us the
> automation coverage and confidence that l10n automation is working
> properly. We should also extend this same strategy to other
> release-oriented jobs, such as partner repacks. We can't be waiting until
> the next uplift or even the next Nightly to discover a regression in l10n
> or packaging. Regressions need to be detected soon after the commit that
> introduced them.

Worse, we can't wait to discover regressions until things hit beta (that
happens a *lot*).

Either way, saying that things breaking l10n builds should be backed out
is nice, but the problem is that in many cases, the answer to "what
should be backed out?" is not trivial, and you don't know if the backout
worked until the next nightly...

This, to me, is what makes l10n builds hard to make tier-1 with the
current state of things.

Mike

Re: Getting a Tier-X for localized builds. X=1

Justin Dolske-2
In reply to this post by Ryan VanderMeulen
On 1/13/16 5:00 PM, Ryan VanderMeulen wrote:
> The other big issue I see is that l10n builds can break at any time due
> to changes from the multitude of external repos where the strings are
> actually stored (i.e. someone lands a busted string in some locale's
> repo, next day's l10n nightlies break as a result).

I'm not sure precisely what Axel's proposal entails, in part because I
don't know what the L10N tests are currently doing. But it sounds like a
chunk of it is just making sure that the L10N machinery is working in
general? As a strawman, could we just pick one well-maintained locale,
and run everything against that?

In other words, "can we produce _a_ localized build" is something that
sounds important to have in mozilla-central CI testing, whereas "is
locale X working" sounds more like something that should happen with CI
testing against the L10N repo. (At least to start with?)

Justin

Re: Getting a Tier-X for localized builds. X=1

Steve Fink-4
In reply to this post by Mike Hommey
On 01/13/2016 07:09 PM, Mike Hommey wrote:

> Worse, we can't wait to discover regressions when things hit beta (that
> happens a *lot*)
>
> Either way, saying that things breaking l10n builds should be backed out
> is nice, but the problem is that in many cases, the answer to "what
> should be backed out?" is not trivial, and you don't know if the backout
> worked until the next nightly...
>
> This, to me, is what makes l10n builds hard to make tier-1 with the
> current state of things.

Yes. I don't see what declaring l10n builds to be "tier 1" would do
right now, other than weakening the meaning of tier 1.

The main meaning of "tier 1" in my mind is whether things get backed out
for breaking them. Which only makes sense if it's possible to detect
that a push has in fact broken them. A once per day build really does
not allow that, imho; you'd need to train all the sheriffs to be experts
in diagnosing l10n breakages. So I would think it would be more
productive to figure out what parts of l10n testing *can* be made tier 1.

Products don't belong to a tier. Specific *jobs* do.

We totally want to get better at this. I've maintained non tier 1 code,
and it's amazing how many ways jobs can start breaking if you're not
keeping on top of them all the time. (Or rather, if you don't have
sheriffs keeping on top of them!) For something we're shipping, it's
crazy to be in that state.

What testing is feasible to run per-checkin? What useful jobs can we
carve out?

As RyanVM said, is there more checkin automation that would prevent some
classes of breakage from reaching the tree?

How can we identify what change broke things? (eg, would a bumper bot work?)

Are l10n breakages distinguishable from the logs? It's not about the
logs being nicely usable by humans -- we're far from that -- but rather
that the sheriffs don't have to open up each and every failing job's log
and manually read through it to figure out what's going on. For build
failures, you're already good -- treeherder will display those failures
properly. But if, for example, an &some; XML entity breaks things, is the
error message in a pattern recognizable by the current log scanners and
if not, does the error message need to change or is there a pattern we
can check for in the scanner?
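
For illustration, the kind of pattern check I'm asking about might look like
the sketch below; the regexes are guesses at what l10n and infra failures
could emit, not the actual Treeherder log parser or its failure-line database:

# Illustrative only: classify a job log by matching a few failure patterns.
import re

PATTERNS = [
    ("locale-busted", re.compile(r"undefined entity|not well-formed", re.I)),
    ("infra-hg", re.compile(r"abort:.*(timed out|connection reset)", re.I)),
    ("build-busted", re.compile(r"configure: error", re.I)),
]

def classify(log_text):
    """Return the set of failure classes whose pattern matches the log."""
    return {label for label, pattern in PATTERNS if pattern.search(log_text)}

print(classify("checking for nsinstall... configure: error: something failed"))
# {'build-busted'}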

Going beyond, I have long speculated about having a tier 2 (tier 1.5?)
classification for jobs that *don't* run on every push, using some
additional mechanisms beyond what we have now to make them visible
without being distracting. But I'd rather not divert the focus with that.


Re: Getting a Tier-X for localized builds. X=1

Chris Peterson-12
In reply to this post by Justin Dolske-2
On 1/13/16 7:21 PM, Justin Dolske wrote:
> In other words, "can we produce _a_ localized build" is something that
> sounds important to have in mozilla-central CI testing, whereas "is
> locale X working" sounds more like something that should happen with CI
> testing against the L10N repo. (At least to start with?)

I have a naive question: why do we have l10n builds?

Why is l10n a build-time step? Can we ship a universal build that
includes all language strings in mozilla-central? Most OS X
applications, including Chrome, include strings for multiple languages,
selecting a language at startup to match the system locale.
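
For what that startup selection amounts to, here's a small sketch of locale
negotiation against whatever the package ships -- purely illustrative, not
code from Firefox or Chrome:

def pick_locale(requested, available, fallback="en-US"):
    """Pick the best shipped locale for a requested one, e.g. 'de-AT'."""
    if requested in available:
        return requested                      # exact match
    language = requested.split("-")[0]
    for candidate in available:
        if candidate.split("-")[0] == language:
            return candidate                  # same language, e.g. 'de'
    return fallback                           # last resort

print(pick_locale("de-AT", ["en-US", "de", "fr", "ja"]))  # -> "de"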

Re: Getting a Tier-X for localized builds. X=1

Chris AtLee-3
In reply to this post by Steve Fink-4
I think having a separate l10n repack per push for a small set of
relatively stable locales would go a long way towards detecting bustage
early.
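
For illustration only, the kind of selection that could drive those jobs --
the names and the locale list below are made up, not an actual
buildbot/mozharness config:

# Hypothetical: a small, relatively stable locale set repacked on every push,
# leaving the full set to the nightly jobs.
PER_PUSH_LOCALES = ["de", "fr", "ja", "zh-TW"]

def locales_for(job_kind, shipped_locales):
    """Pick the locales a repack job covers."""
    if job_kind == "nightly":
        return shipped_locales     # the full list, as today
    return PER_PUSH_LOCALES        # quick signal on try/inbound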

https://bugzilla.mozilla.org/show_bug.cgi?id=848284 was filed a while back
to get l10n jobs available on Try and Inbound. I've taken a quick poke at
this, but haven't had a chance to test it. If we want per-push repacks on
inbound, we may want to take a different approach than the one I've taken
anyway.

Is there someone who could help debug any issues that come up if we decide
to tackle this?

Cheers,
Chris


Re: Getting a Tier-X for localized builds. X=1

Chris AtLee-3
In reply to this post by Chris Peterson-12
I'd very much like to ship language-agnostic builds and have strings
delivered independently from the release process, as a system addon or
somesuch. That would dramatically simplify our release process!


Re: Getting a Tier-X for localized builds. X=1

Mike Hommey
On Thu, Jan 14, 2016 at 01:36:36PM -0500, Chris AtLee wrote:
> I'd very much like to ship language-agnostic builds and have strings
> delivered independently from the release process, as a system addon or
> somesuch. That would dramatically simplify our release process!

Yes, using a system addon for locale would simplify a lot of things,
starting with repacks. And making en-US a locale like any other would
ensure that things work properly with langpacks. There are technical
details to figure out to make that work if we want to go that path,
though (as in, within gecko, iirc there are a few things that currently
don't work with langpacks)

Mike

Re: Getting a Tier-X for localized builds. X=1

Axel Hecht
In reply to this post by Chris AtLee-3
On 14/01/16 22:45, Mike Hommey wrote:

> On Thu, Jan 14, 2016 at 01:36:36PM -0500, Chris AtLee wrote:
>> I'd very much like to ship language-agnostic builds and have strings
>> delivered independently from the release process, as a system addon or
>> somesuch. That would dramatically simplify our release process!
>
> Yes, using a system addon for locale would simplify a lot of things,
> starting with repacks. And making en-US a locale like any other would
> ensure that things work properly with langpacks. There are technical
> details to figure out to make that work if we want to go that path,
> though (as in, within gecko, iirc there are a few things that currently
> don't work with langpacks)
>
> Mike
>

Doing this is part of a rewrite of Firefox and Gecko, basically.

That doesn't make it bad, but it makes it expensive.

The current l10n infra makes l10n infallible. Separating l10n from the
build means that l10n might fail, and if it fails, it needs to fall back
to something (like en-US). Which is additional IO, and thus needs to be
async. Which violates all gecko APIs.
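
To sketch what that fallback costs (the layout and lookup API below are made
up for illustration; this is not how gecko loads strings):

from pathlib import Path

def load_bundle(root, locale, bundle):
    """Parse a .properties-style file into a dict; missing file -> empty dict."""
    path = Path(root) / "locales" / locale / (bundle + ".properties")
    strings = {}
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                strings[key.strip()] = value.strip()
    return strings

def get_string(root, locale, bundle, key):
    # Try the requested locale, then fall back to en-US. Every miss is another
    # file read, which is exactly the extra IO that wants to be async.
    for candidate in (locale, "en-US"):
        value = load_bundle(root, candidate, bundle).get(key)
        if value is not None:
            return value
    raise KeyError(key)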

L10n.js/l20n.js in gaia land are way ahead of the curve, but we also
rewrote a bunch of gaia to support runtime fallback and modern l10n apis.

And then there's the UX question about how to do installers that
actually offer language choice that's no worse than the web.

'cause if we do per-locale installers, we just do per-locale builds,
there's no difference in wall clock time and complexity of the system.

Let's make this part of the thread end here? It's all part of a
different plan, and won't help us in a timeframe that's relevant to this
thread.

Axel

Re: Getting a Tier-X for localized builds. X=1

Mike Hommey
On Thu, Jan 14, 2016 at 11:00:33PM +0100, Axel Hecht wrote:

> On 14/01/16 22:45, Mike Hommey wrote:
> >On Thu, Jan 14, 2016 at 01:36:36PM -0500, Chris AtLee wrote:
> >>I'd very much like to ship language-agnostic builds and have strings
> >>delivered independently from the release process, as a system addon or
> >>somesuch. That would dramatically simplify our release process!
> >
> >Yes, using a system addon for locale would simplify a lot of things,
> >starting with repacks. And making en-US a locale like any other would
> >ensure that things work properly with langpacks. There are technical
> >details to figure out to make that work if we want to go that path,
> >though (as in, within gecko, iirc there are a few things that currently
> >don't work with langpacks)
> >
> >Mike
> >
>
> Doing this is part of a rewrite of Firefox and Gecko, basically.
>
> That doesn't make it bad, but it makes it expensive.
>
> The current l10n infra makes l10n infallible. Separating l10n from the build
> means that l10n might fail, and if it fails, it needs to fall back to
> something (like en-US). Which is additional IO, and thus needs to be async.
> Which violates all gecko APIs.
>
> L10n.js/l20n.js in gaia land are way ahead of the curve, but we also rewrote
> a bunch of gaia to support runtime fallback and modern l10n apis.
>
> And then there's the UX question about how to do installers that actually
> offer language choice that's no worse than the web.
>
> 'cause if we do per-locale installers, we just do per-locale builds, there's
> no difference in wall clock time and complexity of the system.
>
> Let's make this part of the thread end here? It's all part of a different
> plan, and won't help us in a timeframe that's relevant to this thread.

I think you're overthinking this. en-US would stay in m-c, and built by
default, but it would be a system addon instead of intermixed with other
things. Then repacks become building a system addon for a different
language, and replacing the en-US addon with it.
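
As a rough sketch of what the repack step would then reduce to (the layout
and file names below are made up for illustration, not what the build system
produces today):

import shutil
from pathlib import Path

def repack(build_dir, locale, langpack_xpi):
    """Swap the bundled en-US language add-on for another locale's.

    Assumes a hypothetical layout where language packs ship as
    browser/features/langpack-<locale>.xpi inside an unpacked build.
    """
    features = Path(build_dir) / "browser" / "features"
    en_us = features / "langpack-en-US.xpi"
    if en_us.exists():
        en_us.unlink()
    shutil.copy(langpack_xpi, features / ("langpack-%s.xpi" % locale))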

Mike

Re: Getting a Tier-X for localized builds. X=1

Nicholas Alexander
On Thu, Jan 14, 2016 at 2:42 PM, Mike Hommey <[hidden email]> wrote:

> On Thu, Jan 14, 2016 at 11:00:33PM +0100, Axel Hecht wrote:
> > On 14/01/16 22:45, Mike Hommey wrote:
> > >On Thu, Jan 14, 2016 at 01:36:36PM -0500, Chris AtLee wrote:
> > >>I'd very much like to ship language-agnostic builds and have strings
> > >>delivered independently from the release process, as a system addon or
> > >>somesuch. That would dramatically simplify our release process!
> > >
> > >Yes, using a system addon for locale would simplify a lot of things,
> > >starting with repacks. And making en-US a locale like any other would
> > >ensure that things work properly with langpacks. There are technical
> > >details to figure out to make that work if we want to go that path,
> > >though (as in, within gecko, iirc there are a few things that currently
> > >don't work with langpacks)
> > >
> > >Mike
> > >
> >
> > Doing this is part of a rewrite of Firefox and Gecko, basically.
> >
> > That doesn't make it bad, but it makes it expensive.
> >
> > The current l10n infra makes l10n infallible. Separating l10n from the
> build
> > means that l10n might fail, and if it fails, it needs to fall back to
> > something (like en-US). Which is additional IO, and thus needs to be
> async.
> > Which violates all gecko APIs.
> >
> > L10n.js/l20n.js in gaia land are way ahead of the curve, but we also
> rewrote
> > a bunch of gaia to support runtime fallback and modern l10n apis.
> >
> > And then there's the UX question about how to do installers that actually
> > offer language choice that's no worse than the web.
> >
> > 'cause if we do per-locale installers, we just do per-locale builds,
> there's
> > no difference in wall clock time and complexity of the system.
> >
> > Let's make this part of the thread end here? It's all part of a different
> > plan, and won't help us in a timeframe that's relevant to this thread.
>
> I think you're overthinking this. en-US would stay in m-c, and built by
> default, but it would be a system addon instead of intermixed with other
> things. Then repacks become building a system addon for a different
> language, and replacing the en-US addon with it.
>

This isn't feasible for Fennec builds, which need to do some Java and
Android toolchain work to process resources into the APK.  However, that
work is already quite special (and not computationally expensive), so I'm
sure we can make it happen at repack time.  Fennec doesn't currently use
langpacks for its Gecko-level string resources, but perhaps it could?

Nick

Re: Getting a Tier-X for localized builds. X=1

Axel Hecht
In reply to this post by Steve Fink-4
On 14/01/16 19:34, Chris AtLee wrote:

> I think having a separate l10n repack per push for a small set of
> relatively stable locales would go a long way towards detecting bustage
> early.
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=848284 was filed a while back
> to get l10n jobs available on Try and Inbound. I've taken a quick poke at
> this, but haven't had a chance to test it. If we want per-push repacks on
> inbound, we may want to take a different approach than the one I've taken
> anyway.
>
> Is there someone who could help debug any issues that come up if we decide
> to tackle this?

I believe that we should tackle this. And I'm happy to help.

I've also proven that I can be of only limited help, given how often
I've broken the tree this past week, for two days, finally even backing
out my code. More on that in a separate reply.

Broadly speaking:

I think there would be two things to tackle first: one is to break out
from repacking Nightlies. It sounded like a good idea at the time, but
not any more. The other is to have repacks for just a select locale or
two, and not all locales.

The good news is that we have artifact builds now, which is really
what repacks are, or can be: have someone else do the compile, unwrap
it, do your thing, be happy. I'm not so sure about the choice of routes
(with fx-team first), mac being mac64..., or requiring pushlog data. But
that should be stuff we can refactor.

Given that I feel like hyatt on a t-shirt today (and I'm not sure if
that was actually hyatt or someone else), I'm wondering if it'd make
sense to start an alternative script on try/inbound, and then slowly
deploy that to nightly/beta/release?

In particular when we get to PRETTY_PACKAGE_NAMES, the current code in
artifacts.py probably breaks?

Axel


Re: Getting a Tier-X for localized builds. X=1

Mike Hommey
On Thu, Jan 21, 2016 at 04:45:40PM +0800, Axel Hecht wrote:

> On 14/01/16 19:34, Chris AtLee wrote:
> >I think having a separate l10n repack per push for a small set of
> >relatively stable locales would go a long way towards detecting bustage
> >early.
> >
> >https://bugzilla.mozilla.org/show_bug.cgi?id=848284 was filed a while back
> >to get l10n jobs available on Try and Inbound. I've taken a quick poke at
> >this, but haven't had a chance to test it. If we want per-push repacks on
> >inbound, we may want to take a different approach than the one I've taken
> >anyway.
> >
> >Is there someone who could help debug any issues that come up if we decide
> >to tackle this?
>
> I believe that we should tackle this. And I'm happy to help.
>
> I've also proven that I can be of only limited help, given how often I've
> broken the tree this past week, for two days, finally even backing out my code.
> More on that in a separate reply.
>
> Broadly speaking:
>
> I think there would be two things to tackle first: one is to break out from
> repacking Nightlies. It sounded like a good idea at the time, but not any
> more.

I disagree with this. Specifically, the problem is not with the idea of
repacking nightlies. That is actually sound. The problem is with the
implementation. The build jobs are independent of nightlies and start
from a possibly unrelated tree, driven by buildbot scripts (now
mozharness, thankfully, aiui) that drive make scripts. That means
configure must run before we even know what tree corresponds to the
nightly (the makefiles tell us), then the tree gets updated, configure
reruns because the tree changed, and only then do makefile contraptions
replicating part of the build system build langpacks, which are finally
fed to the actual thing that does the repack. And that's probably a
simplified view of the mess this actually is.

The thing is, if the jobs started from the result of the nightly jobs,
with the right tree, like, you know, test jobs do, the first part of
the horror show would go away. A proper build system to build a
langpack would kill another part. What remains is applying the langpack
to a new build, and that part, afaik, mostly works.
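
Put as an outline (all helper names below are made up; they only mirror the
steps above):

# Hypothetical outline; none of these helpers exist.

def build_langpack(source_rev, locale):
    # Stub standing in for "a proper build system to build a langpack" from
    # the exact revision the nightly was built from.
    return "langpack-%s.xpi" % locale

def apply_langpack(package_path, langpack, locale):
    # Stub standing in for repackaging the unmodified en-US package with the
    # langpack applied; the part that already mostly works.
    return package_path.replace("en-US", locale)

def repack_from_nightly(package_path, source_rev, locale):
    """Repack what the nightly job already produced, instead of rebuilding.

    The revision is an input, so the job never runs configure just to
    discover which tree the nightly corresponds to.
    """
    langpack = build_langpack(source_rev, locale)
    return apply_langpack(package_path, langpack, locale)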

Mike

Re: Getting a Tier-X for localized builds. X=1

Chris AtLee-3
On 21 January 2016 at 05:37, Mike Hommey <[hidden email]> wrote:

> On Thu, Jan 21, 2016 at 04:45:40PM +0800, Axel Hecht wrote:
> > On 14/01/16 19:34, Chris AtLee wrote:
> > >I think having a separate l10n repack per push for a small set of
> > >relatively stable locales would go a long way towards detecting bustage
> > >early.
> > >
> > >https://bugzilla.mozilla.org/show_bug.cgi?id=848284 was filed a while
> back
> > >to get l10n jobs available on Try and Inbound. I've taken a quick poke
> at
> > >this, but haven't had a chance to test it. If we want per-push repacks
> on
> > >inbound, we may want to take a different approach than the one I've
> taken
> > >anyway.
> > >
> > >Is there someone who could help debug any issues that come up if we
> decide
> > >to tackle this?
> >
> > I believe that we should tackle this. And I'm happy to help.
> >
> > I've also proven that I can be only of limited help, given how often I've
> > broken the tree for two days past week. Finally even backing out my code.
> > More on that in a separate reply.
> >
> > Broadly speaking:
> >
> > I think there would be two things to tackle first, one is to break out
> from
> > repacking Nightlies. It sounded like a good idea at the time, but not any
> > more.
>
> I disagree with this. Specifically, the problem is not with the idea of
> repacking nightlies. That is actually sound. The problem is with the
> implementation. Build jobs that are independent of nightlies, starting
> from a possibly unrelated tree, driven by buildbot scripts (now
> mozharness, thankfully, aiui) that drive make scripts, which means
> configure must run before even knowing what the tree corresponding to
> the nightly is because the makefiles will tell, then update the tree,
> rerun configure because the tree changed, and then start running
> makefile contraptions replicating part of the build system to build
> langpacks that are then fed to the actual thing that does the repack.
> And that's probably a simplified view of the mess this actually is.
>
> The thing is, if the jobs started from the result of the nightly jobs,
> with the right tree, like, you know, test jobs do, the first part of
> the horror show would go away. A proper build system to build a
> langpack would kill another part. Then remains applying the langpack
> to a new build, and that part, afaik, works, mostly.
>


I'm not sure what you mean by this. This is how l10n repacks happen
generally right now* - they are triggered after nightly builds finish, and
consume the nightly builds as input. The problem (as I see it) is that this
pretty much happens *only* for nightly builds, which makes it impossible to
test on try or catch bustage on inbound.

Cheers,
Chris

* we also do l10n repacks whenever one of the l10n repositories changes -
but I'm not sure if those are valuable to consider for this discussion?