Optimizing what runs on which push

dmitchell
Background:
 https://bugzilla.mozilla.org/show_bug.cgi?id=1359942

As jobs move to taskcluster, we have an improved opportunity to do some smarter scheduling of what jobs to run on what sort of push.  Of course, it's a thorny subject: optimizing away a task that should run may let a bad push show green, while a subsequent push bears responsibility for the orange it introduces.

One of the more common expectations is that pushes that only change a directory affecting one platform should not cause other platforms' tasks to run.

In the bug above, I have proposed a method of identifying pushes "affecting" a particular platform, and Greg has raised some concerns about the generality of my solution.  I'm happy to generalize, but I would like to keep the process in motion rather than let the perfect be the enemy of the good.

To that end, I'd like some further feedback on implementing this sort of optimization support.

If there's sufficient interest, then this is probably something we could set up a time to talk about in SFO in June.

Dustin

Re: Optimizing what runs on which push

Benjamin Smedberg
Dustin, I am very interested in following up on this. I believe that our current strategy of running almost every test on every checkin is unsustainable long-term, and we're going to have to move to a model where expensive tests are typically run less frequently. In order to make this successful, we're going to have to touch a lot of moving pieces: sheriffing, autobisection, code coverage, writing more unit tests versus integration tests, and so on, and so we need to carefully consider the order in which we start making changes and the tooling required.

I'm not sure that SFO is the right time for this though, especially if it would distract from quantum work.

--BDS


Re: Optimizing what runs on which push

dmitchell
Thanks Ben --

I agree this is probably not the time to go all-in on this, if it's a
major project.  But I think it's a good time to come to some consensus
on direction, so that other changes we make along the way support, or
at least do not preclude, solving it later.  In particular, I think
we're in a situation where by not discussing our goals we have
different people making contradictory changes.  The part I'm concerned
about -- the in-tree scheduling machinery -- is still fairly actively
developed, so it's going to evolve in some direction, and IMHO it's
best that be one, positive direction.

Dustin

Re: Optimizing what runs on which push

dmitchell
In reply to this post by Benjamin Smedberg
Greg sent some previous threads on this topic along, and I've been in
some other related conversations.  The topic of optimization overlaps
substantially with try, so I'll mix that in too.

There seems to be a reluctance to talk about this face-to-face, but
there's some urgency in that we're spending a *lot* of cash right now
doing unnecessary builds, since the current in-tree implementation is
worse than the ad-hoc thing we had in Buildbot.  So I'll outline a
proposal here and we can all just agree on it! ;)

For try, we will want three variants:
 - "there is no try, only do" -- just run the tasks appropriate for
the push (machines figure out what to do)
 - "try this" -- an explicit list of desired task labels; this would
be supported by a trychooser-like UI to generate the list, but the
trivial case of "please run this exact task" would be easy to type
directly. No surprises. Selected tasks will never be optimized away.
 - "trying my patience" -- the legacy try syntax, which seems to
rankle everyone I talk to about it

As far as optimization:

We support two kinds of optimization: replacement, where we find a
completed task that produced outputs we can re-use; and skipping,
where we decide a task need not be performed at all.  Future plans for
replacement are clear, so we will limit consideration to skipping
optimization.  We'll further ignore SETA: it just causes tasks to be
skipped except every seventh run, and doesn't apply on try.

So what we're left with is a list of files that have been changed in
this push, and a pile of tasks.  The proposal is this:

At the "job description" level, each task specifies when it should be
executed as a disjunctive list - if any match, the task is executed.
That can either be by specifying specific file patterns, or by
specifying an "affected" key/value.  Something like

when:
  - files-changed: ['foo/bar/**/*.js']
  - change-affects: ['platform:windows', 'platform:macosx']
..or..
when:
  - files-changed: ['**/*.rst']
  - change-affects: ['job:docs']
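
For illustration, here's roughly how I'd expect the decision task to
evaluate such a clause -- a sketch only, not real taskgraph code, and
`changed_files` / `push_affects` are hypothetical inputs computed from
the push:

import fnmatch

def when_matches(when, changed_files, push_affects):
    # `when` is the disjunctive list above: if any entry matches, run the task.
    for entry in when:
        patterns = entry.get('files-changed', [])
        # fnmatch is a stand-in; real '**' handling needs proper path matching.
        if any(fnmatch.fnmatch(f, p) for f in changed_files for p in patterns):
            return True
        if any(tag in push_affects for tag in entry.get('change-affects', [])):
            return True
    return False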

We then add a specification of what "affects" means to the moz.build files:

with Files('**/*.rst'):
  AFFECTS += ['job:docs']
with Files('**/*.js'):
  AFFECTS += ['job:eslint']

The tricky bit is platforms[*], where everything-except-these affects
a particular platform.  As Greg has said, we want to avoid manually
decorating common files like `build/**` with every possible platform,
as that violates the loose coupling between those components.
Instead, let's let computers do that for us, using AFFECTS_ONLY:

with Files('mobile/android/**'):
  AFFECTS_ONLY += ['platform:android']
with Files('browser/**'):
  AFFECTS_ONLY += ['platform:desktop']
with Files('stylo/**'):  # or, per bholley, more specific patterns!
  AFFECTS_ONLY += ['platform:stylo']

then do some post-processing to translate that so that *all* files
affect platform:android except those which have AFFECTS_ONLY for some
other platform.  So `stylo/foo` would have AFFECTS set to
['platform:stylo'], but `build/foo` would have ['platform:android',
'platform:desktop', 'platform:stylo'].
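
A rough sketch of that post-processing step (the names here are
invented for illustration; ALL_PLATFORMS would come from wherever we
enumerate the platform tags):

ALL_PLATFORMS = {'platform:android', 'platform:desktop', 'platform:stylo'}

def effective_affects(affects, affects_only):
    # `affects` and `affects_only` are the values accumulated from the
    # moz.build Files() blocks matching a given path.
    result = set(affects)
    platform_only = {t for t in affects_only if t.startswith('platform:')}
    # A file scoped by AFFECTS_ONLY affects just those platforms;
    # everything else affects every platform.
    result |= platform_only or ALL_PLATFORMS
    return result

With the examples above, effective_affects(set(), {'platform:stylo'})
gives {'platform:stylo'} for `stylo/foo`, while an empty AFFECTS_ONLY
for `build/foo` yields all three platforms.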

The details of implementing this efficiently are just that -- details.

Note that this gets us the necessary condition that every file affects
*something*, avoiding the case where a push to a particular file might
trigger no tasks.

So let's bikeshed on this - both the try division and the "affects"
stuff - a little, and then we can re-evaluate the level of agreement
and see if we should sit down in SFO.

Dustin

[*] This is the 72nd meaning of the term "platform" - I'm open to better ideas

Re: Optimizing what runs on which push

Justin Wood
Here are my thoughts:

* High level:
** I like the idea of letting the cringe-worthy `try:` syntax become
obsolete as a bad idea going forward
** I like the idea of annotations of affects in moz.build files.

I want to avoid "AFFECTS" ending up with 30 different spellings of
"windows" (e.g. win10, win64, windows10, 'win 10', etc.).

We should still (imo) support a command line syntax for try until
something better exists, maybe `try: -v2` or something?  This would
require some thought anyway.

We will need a way to normalize the list of `job:` tags and such as
they relate to AFFECTS* in moz.build, both to avoid adding bogus ones
and to make it easy to notice that a suitable one already exists when
adding new things.

A way to express AND and OR on the taskcluster `when` clause could
also help, e.g. run job:docs only when taskcluster/**.rst is touched,
or run a python integration test only on windows when the only file
touched is a mozbase file that is only ever used on windows, so we
don't need to run the linux/osx mozbase tests.
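
Purely as illustration -- this shape is hypothetical, not something
taskcluster supports today -- I'm picturing something like:

when = [
    # entries are still OR'd together as in Dustin's proposal, but an
    # 'all-of' entry only matches when every sub-condition matches
    {'all-of': [
        {'files-changed': ['taskcluster/**.rst']},
        {'change-affects': ['job:docs']},
    ]},
]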

To summarize, I'm a fan of any endeavor to make this more intuitive
and easier. And I'm not strongly against any color for the shed,
especially if someone other than me is painting it.
~Justin Wood (Callek)

Re: Optimizing what runs on which push

Andrew Halberstadt
In reply to this post by dmitchell
I'm very interested in the try aspect of this (please CC me on relevant bugs).

I believe there is a relatively simple way taskcluster can wash its hands of try syntax completely and instead just use a list of tasks (or other similar data structure). We could still provide a legacy try syntax to people, but the syntax parser would be on the client side (e.g. in a mach_command). If taskcluster does this, we can build all sorts of crazy trychooser thingamajigs. See this blog post for specific implementation steps as well as what a "fuzzyfinder" trychooser might look like: https://ahal.ca/blog/2017/fuzzy-try-chooser/
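
Very roughly, the client-side piece could be a translation step like the sketch below. It's illustrative only; the crude token matching, the file name, and how the resulting list gets handed to the decision task are all placeholders:

import json

def legacy_syntax_to_labels(try_message, all_labels):
    # Parse just enough of "try: -b do -p linux64 -u mochitest"-style syntax
    # to pick labels out of the full set; real parsing would be smarter.
    tokens = {t.lstrip('-') for t in try_message.replace('try:', '').split()}
    return [label for label in all_labels
            if any(token and token in label for token in tokens)]

def write_task_list(labels, path='try_task_config.json'):
    # Hand the explicit list to the decision task; whether that's a file in
    # the push, the commit message, or something else is an open question.
    with open(path, 'w') as f:
        json.dump({'tasks': sorted(labels)}, f, indent=2)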

I agree with your 3 variants for try; I just don't think they should be implemented under /taskcluster.

Re: Optimizing what runs on which push

dmitchell
Andrew, I apologize for not giving credit -- the "try this" variant is
intended to be exactly what you've proposed in that blog entry.

I like the idea of doing the processing client-side.  I think we will
still want awareness of the three options in taskcluster/, though:
 - "there is no try" is basically a normal push
 - "try this" amounts to a simple target_task method (return
task.label in config.params['requested_tasks'] or something like that)
 - "trying my patience" would be legacy, so re-implementing it
elsewhere is probably not productive.  It could potentially feed into
the "try this" implementation.
I've copied you on more bugs than you probably want, and will continue
to do so :)

Callek, I think this would be command-line accessible at any rate.
Whether it's accessible with a command line embedded in a commit
message is an open question, but certainly the trivial "I want to run
the task labeled foo-bar" will be a one-liner, either via mach, an hg
extension, or simple hg syntax.

I would like to avoid arbitrary logic expressions, as they can be hard
to compose, and they start to move the AFFECTS logic from moz.build
into task definitions.  I think we should make AFFECTS sufficiently
expressive to handle that, and I'm not sure it's there yet.  For
example, as proposed we might end up with winversion:10, :7, etc. if
we have code specific to those versions.  But I think we'll find a
balance: get too fine-grained in AFFECTS and you'll start making
mistakes and failing to run important tasks.

To the hypotheticals you raised:

> if we want a set of jobs:docs but only when taskcluster/**.rst is touched

I think this would be

with Files('taskcluster/**.rst'):
  AFFECTS += ['job:tc-docs']

> python integration test run on only windows, when the only file touched is a mozbase file only ever used on windows, we won't need to run linux/osx mozbase tests.

I think the idea is that there is a mozbase integration test which
runs in a task on each platform, and mozbase/integrationtest/winthing.py
is only ever imported on Windows. That seems too fine-grained: as you
suggest, the only push that annotation would affect is one that changes
*only* winthing.py. If the push also changes other files in
mozbase/integrationtest, then we will run the test on all platforms.
And if any non-test code is changed, we'll run the entire test suite
for that change, or at least a large portion of it.  I'm not sure how
best to represent the latter two pushes (just mozbase integration
tests, and only test code having changed).
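
For concreteness, the fine-grained annotation you describe would
presumably look something like this (file name taken from your
hypothetical), which is exactly the granularity I'd rather avoid:

with Files('mozbase/integrationtest/winthing.py'):
  AFFECTS_ONLY += ['platform:windows']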

Dustin


Re: Optimizing what runs on which push

Mike Hommey
Please make it so that it's possible to figure out why some job has run
or not. Currently, it's near impossible.

Case in point: I recently had two try pushes run a tc[tier2](l10n) job,
that happens to be busted for reasons entirely unrelated to my changes,
and I have no idea why it was triggered. It gets better: another try
push of mine with about the same changes did *not* trigger it.

Also, on multiple occasions, I've had to wonder why jobs have not been
started that I was expecting to have.

Mike

Re: Optimizing what runs on which push

dmitchell
2017-05-26 18:29 GMT-04:00 Mike Hommey <[hidden email]>:
> Please make it so that it's possible to figure out why some job has run
> or not. Currently, it's near impossible.

Yep. According to my list, that's reason #34 to call the legacy try
parser[*] "trying my patience".

Dustin

[*] reason #13 is that there are actually six distinct try parsers
with distinct behaviors

Re: Optimizing what runs on which push

Mike Hommey

Another issue is that some jobs simply don't have any trigger in the try
syntax at all (at least, not documented on trychooser), and happen by
"chance", for some definition of chance that varies by job (and the
"optimizations" are partially responsible for this, e.g. SM jobs).

Mike

Re: Optimizing what runs on which push

Gregory Szorc-3
In reply to this post by dmitchell
I still owe a proper reply to everything in this thread. But as I'm preparing to send out another Firefox developer survey, I'm looking at the old one we conducted and there are some results that seemingly justify doing work to intelligently run things based on what changed.

One of the questions on the last survey was "Thinking of running automated tests, rank the following potential improvements in terms of their impact on your productivity." "Determine and run relevant tests based on what source files have been modified" was one of the most wanted improvements - right up there with "make try runs really fast so I can effectively iterate on automated tests using try instead."

Another question was "Rank the following issues according to how they impact your ability to debug tests locally." "Tests take too long to run / stop me from using my computer" was one of the "most impact" elements. Although I concede the "stop me from using my computer" bit could be contributing a bit of weight, since front-end developers were 2:1 more likely than platform developers to answer "most impact." This could be attributed to many mochitests requiring focus and thus rendering your machine unusable.

If you want, I can try to tease out the developer impact of running fewer tests in the next survey.


Re: Optimizing what runs on which push

Mike Hommey
On Tue, May 30, 2017 at 05:25:20PM -0700, Gregory Szorc wrote:

> One of the questions on the last survey was "Thinking of running automated
> tests, rank the following potential improvements in terms of their impact
> on your productivity." "Determine and run relevant tests based on what
> source files have been modified" was one of the most wanted improvements -
> right up there with "make try runs really fast so I can effectively iterate
> on automated tests using try instead."

FWIW, I recently added a unit test for Firefox. On try, I essentially
had to run the whole corresponding test suite (browser-chrome), instead
of just the block that contains the test, because it's almost impossible
to figure out which one it's going to run in.

Making /that/ less painful would go a long way.

Mike

Re: Optimizing what runs on which push

dmitchell
I think this topic is big enough already without broadening it into
"how can we make automation better".  But getting some data from the
survey sounds great! Maybe it makes sense to get down to the core
question we have here:

When you push to try, how often do you want:
 * to run every job relevant to the changes you have made
   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
 * to run a specific job or set of jobs
   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
 * to run all jobs for one or more platforms
   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always

Or something like that?


Re: Optimizing what runs on which push

Benjamin Smedberg
I don't know if I'm the typical use-case, but the big problem for me is that when I change something such as plugins, the jobs as currently bucketed don't help much. There are reftests, crashtests, mochitest-plain, and mochitest-browser-chrome which test plugin code paths, and the splitting means that I pretty much need to run all of those suites on try to get adequate coverage.

There is a facility in the tree for tagging tests (at least some suites) and running only tests with certain tags. However, that doesn't help on try very much because you can't tell try to run only a certain set of tags.

Your ultimate goal is to save $$, and my ultimate goal is to reduce the time to results to make the developer cycle faster (let's measure round-trip times in minutes instead of hours). I don't believe that running subsets of the current jobs will solve either of those goals. Maybe it's a step along the path, but I don't see how that fits together yet.

Related to this, you need to remember one of the primary functions of try is to validate changes before landing on inbound/autoland, and so reduce the backout rate from sheriffs. Running subsets of tests will increase the backout rate. I think that's probably ok, but we need to be aware of this social/workflow impact as it's not just a technical decision.

--BDS


Re: Optimizing what runs on which push

dmitchell
2017-05-31 9:26 GMT-04:00 Benjamin Smedberg <[hidden email]>:
> Your ultimate goal is to save $$, and my ultimate goal is to reduce the time
> to results to make the developer cycle faster (let's measure round-trip
> times in minutes instead of hours). I don't believe that running subsets of
> the current jobs will solve either of those goals. Maybe it's a step along
> the path, but I don't see how that fits together yet.

We share the goal of getting the most appropriate results to
developers as quickly as possible with the resources we have
available. What we're discussing here is only a part of the work
required (other parts being hyperchunking, faster job startup, and
recompiling only when necessary, to name a few).

One of the longest-lived approaches to getting results efficiently has
been try syntax, and the admonition not to use `-p all`.  The
experience of trying to figure out what syntax to use instead has been
pretty awful, and suffers from both under-estimation (failing to run a
job that would be orange) and over-estimation (running unnecessary
jobs).  It's a double-whammy: time lost figuring out try syntax, then
time lost over a backout due to a missed job in try.

So the proposal addresses three issues:
 - frustrating and ineffective try user interface
 - high load and consequent long wait times
 - elevated backout rate

> Related to this, you need to remember one of the primary functions of try is
> to validate changes before landing on inbound/autoland, and so reduce the
> backout rate from sheriffs. Running subsets of tests will increase the
> backout rate. I think that's probably ok, but we need to be aware of this
> social/workflow impact as it's not just a technical decision.

You'll note that a consequence of the proposal would be to *decrease*
the backout rate.  To achieve this, we must be careful about two
things:
 1. Be conservative in what we decide to skip (better to run a
trivially green job than skip an orange)
 2. Provide tools that figure out what needs to run in try, to avoid
the double-whammy described above

#1 is why I built optimization around "skipping" instead of "running"
-- the burden of proof should be on the task configuration to say when
a job can be safely skipped, rather than trying to enumerate all the
conditions in which the job should be run.  #2 is behind the "there is
no try, just do"[*] behavior in the proposal.  But "will this push
break things" is not the only use-case for try, so we need "try this"
functionality (requesting specific jobs) for most of the other
use-cases.
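
To make that default explicit -- a sketch, not the real optimizer
code, and the names are placeholders:

def should_run(task, push_affects):
    # Default is to run.  A task is skipped only when it declares what it
    # cares about *and* the push provably doesn't touch any of it.
    cares_about = task.get('change-affects')
    if cares_about is None:
        return True  # no declaration: never skip
    return bool(set(cares_about) & set(push_affects))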

Dustin

[*] Maybe "just try it" is a better name..

Re: Optimizing what runs on which push

Gregory Szorc-3
In reply to this post by Benjamin Smedberg
On Wed, May 31, 2017 at 6:26 AM, Benjamin Smedberg <[hidden email]> wrote:
> I don't know if I'm the typical use-case, but the big problem for me is that when I change something such as plugins, the jobs as currently bucketed don't help much. There are reftests, crashtests, mochitest-plain, and mochitest-browser-chrome which test plugin code paths, and the splitting means that I pretty much need to run all of those suites on try to get adequate coverage.

Exactly.

FWIW my ideal end state is that try syntax is eliminated or relegated to a <5% use case because the tools figure out the optimal set of what to run based on what changed.
 

> There is a facility in the tree for tagging tests (at least some suites) and running only tests with certain tags. However, that doesn't help on try very much because you can't tell try to run only a certain set of tags.

Not supporting this is a bug IMO.

Of course, you can argue that the tagging system is a subset or one-off of a properly designed "change impacts" system. The difference is the tagging system exists today, so it provides end-user benefit today.
 

> Your ultimate goal is to save $$, and my ultimate goal is to reduce the time to results to make the developer cycle faster (let's measure round-trip times in minutes instead of hours). I don't believe that running subsets of the current jobs will solve either of those goals. Maybe it's a step along the path, but I don't see how that fits together yet.

I agree we should largely leave money out of the discussion. While the cost to run the CI is significant, it is still relatively cheap compared to people time (developers cost ~1000x more than many EC2 instances). The people time that will be saved from an efficient development cycle will dwarf the money savings from reduced platform consumption. And deploying a more efficient CI pipeline will naturally reduce operational costs. So we should keep focused on the people impact.

That being said, we also need to take care to not drastically increase our cost to run CI because it is a non-negligible expense. What's happening now is groups like Stylo and Quantum want one-off build configurations. We also have "Go Faster" efforts to decouple development of some features from core Firefox, leading to more one-off build configurations. Running most of the jobs most of the time with N+1 build configurations quickly increases our CI operational costs. And more intelligently running things based on what changed can keep costs in check, avoiding most discussions about budget, value, etc.
 

> Related to this, you need to remember one of the primary functions of try is to validate changes before landing on inbound/autoland, and so reduce the backout rate from sheriffs. Running subsets of tests will increase the backout rate. I think that's probably ok, but we need to be aware of this social/workflow impact as it's not just a technical decision.

Agreed. Anecdotally, I find it more frustrating to be backed out the farther down the release pipeline the changeset is. My threshold for getting more than inconvenienced (read: mildly frustrated) is when things run OK on autoland/inbound then fail on central.

I also agree we should be concerned about sheriff impact. FWIW, we would like most backouts to be automated. But this requires a way to identify when a changeset is good. This is actually a hard problem. We were planning to implement an API on Treeherder to determine this. However, this project seemed to have gotten lost as part of recent reorgs. My guess is it will surface again sometime in the next year as part of overall {sheriff happiness, development cycle, autoland} work.
 



Re: Optimizing what runs on which push

Andrew Halberstadt
On Wed, May 31, 2017 at 1:31 PM, Gregory Szorc <[hidden email]> wrote:

> > There is a facility in the tree for tagging tests (at least some suites) and running only tests with certain tags. However, that doesn't help on try very much because you can't tell try to run only a certain set of tags.

> Not supporting this is a bug IMO.

> Of course, you can argue that the tagging system is a subset or one-off of a properly designed "change impacts" system. The difference is the tagging system exists today, so it provides end-user benefit today.
 

Actually, you can run tags on try with:

./mach try <syntax> --and --tag <tag1> --tag <tag2>

You can also do test paths:
./mach try <syntax> --and <test path>

The --and takes the intersection of <syntax> and <tag> rather than
the union. I agree that it isn't as intuitive as it could be, and there are
probably many edge cases for which it falls apart. The |mach try|
command in general could use a lot of TLC.

On Wed, May 31, 2017 at 1:31 PM, Gregory Szorc <[hidden email]> wrote:
On Wed, May 31, 2017 at 6:26 AM, Benjamin Smedberg <[hidden email]> wrote:
I don't know if I'm the typical use-case, but the big problem for me is that when I change something such as plugins, the jobs as currently bucketed don't help much. There are reftests, crashtests, mochitest-plain, and mochitest-browser-chrome which test plugin code paths, and the splitting means that I pretty much need to run all of those suites on try to get adequate coverage.

Exactly.

FWIW my ideal end state is try syntax is eliminated or relegated to a <5% use case because the tools figure out the optimal set of what to run based on what changed.
 

There is a facility in the tree for tagging tests (at least some suites) and running only tests with certain tags. However, that doesn't help on try very much because you can't tell try to run only a certain set of tags.

Not supporting this is a bug IMO.

Of course, you can argue that the tagging system is a subset or one-off of a properly designed "change impacts" system. The difference is the tagging system exists today, so it provides end-user benefit today.
 

> Your ultimate goal is to save $$, and my ultimate goal is to reduce the time to results to make the developer cycle faster (let's measure round-trip times in minutes instead of hours). I don't believe that running subsets of the current jobs will solve either of those goals. Maybe it's a step along the path, but I don't see how that fits together yet.

I agree we should largely leave money out of the discussion. While the cost to run the CI is significant, it is still relatively cheap compared to people time (developers cost ~1000x more than many EC2 instances). The people time that will be saved from an efficient development cycle will dwarf the money savings from reduced platform consumption. And deploying a more efficient CI pipeline will naturally reduce operational costs. So we should keep focused on the people impact.

That being said, we also need to take care to not drastically increase our cost to run CI because it is a non-negligible expense. What's happening now is groups like Stylo and Quantum want one-off build configurations. We also have "Go Faster" efforts to decouple development of some features from core Firefox, leading to more one-off build configurations. Running most of the jobs most of the time with N+1 build configurations quickly increases our CI operational costs. And more intelligently running things based on what changed can keep costs in check, avoiding most discussions about budget, value, etc.
 

> Related to this, you need to remember one of the primary functions of try is to validate changes before landing on inbound/autoland, and so reduce the backout rate from sheriffs. Running subsets of tests will increase the backout rate. I think that's probably ok, but we need to be aware of this social/workflow impact as it's not just a technical decision.

Agreed. Anecdotally, I find it more frustrating to be backed out the farther down the release pipeline the changeset is. My threshold for getting more than inconvenienced (read: mildly frustrated) is when things run OK on autoland/inbound then fail on central.

I also agree we should be concerned about sheriff impact. FWIW, we would like most backouts to be automated. But this requires a way to identify when a changeset is good. This is actually a hard problem. We were planning to implement an API on Treeherder to determine this. However, this project seemed to have gotten lost as part of recent reorgs. My guess is it will surface again sometime in the next year as part of overall {sheriff happiness, development cycle, autoland} work.
 


On Wed, May 31, 2017 at 8:47 AM, Dustin Mitchell <[hidden email]> wrote:
I think this topic is big enough already without broadening it into
"how can we make automation better".  But getting some data from the
survey sounds great! Maybe it makes sense to get down to the core
question we have here:

When you push to try, how often do you want:
 * to run every job relevant to the changes you have made
   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
 * to run a specific job or set of jobs
   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
 * to run all jobs for one or more platforms
   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always

Or something like that?




_______________________________________________
dev-builds mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-builds
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing what runs on which push

Benjamin Smedberg
In reply to this post by dmitchell
On Wed, May 31, 2017 at 11:08 AM, Dustin Mitchell <[hidden email]> wrote:
 

> So the proposal addresses three issues:
>  - frustrating and ineffective try user interface
>  - high load and consequent long wait times
>  - elevated backout rate


Let me try and repeat back the assumptions/assertions I've heard, to see if I understand.

#1 try user interface is a big problem
#2 some people are running too many try jobs
#3 too many try jobs causes long wait times
#4 too many try jobs costs too much money
#5 some people aren't running enough try jobs
#6 not enough try jobs is causing an elevated backout rate
#7 a computer can do a better job today than people can at picking jobs
#8 which will simultaneously solve the problem of over-running and under-running jobs
#9 which will improve the patch sticking/backout rate



I'm perhaps missing the key part of your proposal, which is the actual logic the system uses to automagically determine which jobs to run. I see a lot of description about AFFECTS in the build tree, but it's not clear to me who is adding these annotations and how they are maintained.

Manual annotations aren't cheap: they aren't cheap to add, and they certainly aren't cheap to maintain. I don't know the exact dollar figures we're talking about here, but it doesn't take much developer time to swamp the cost of running more jobs.

There are also perverse incentives in annotations: the cost of excluding too little is that we keep running more tests than we need to, but the cost of excluding too much is that we don't run enough tests before final commit. The natural reaction is going to be erring on the side of excluding very little.

Is there a feedback loop where we know via code coverage or some other magic which files are run as part of which tests? That would avoid the massive pitfalls of manual annotations (but create its own set of headaches).
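
As a thought experiment only -- none of this exists today, and the file format, data, and function names below are invented for illustration -- such a feedback loop might boil down to inverting a per-test coverage dump into a file-to-tests map:

import json
from collections import defaultdict

def build_file_to_tests(coverage_dump_path):
    """Invert a per-test coverage report ({test: [source files it executes]})
    into a map of source file -> set of tests that exercise it."""
    with open(coverage_dump_path) as f:
        per_test_coverage = json.load(f)
    file_to_tests = defaultdict(set)
    for test, files in per_test_coverage.items():
        for source_file in files:
            file_to_tests[source_file].add(test)
    return file_to_tests

def tests_affected_by(changed_files, file_to_tests):
    """Union of tests known (via coverage) to execute any changed file.
    Files with no coverage data contribute nothing here, so a real
    system needs a conservative fallback (run everything) for them."""
    affected = set()
    for path in changed_files:
        affected |= file_to_tests.get(path, set())
    return affected

The headaches are the ones you'd expect: the coverage data goes stale as soon as the tree moves, and any file with no recorded coverage has to fall back to "run everything" or we reintroduce the risk of green-but-broken pushes.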

 

> Related to this, you need to remember one of the primary functions of try is
> to validate changes before landing on inbound/autoland, and so reduce the
> backout rate from sheriffs. Running subsets of tests will increase the
> backout rate. I think that's probably ok, but we need to be aware of this
> social/workflow impact as it's not just a technical decision.

> You'll note that a consequence of the proposal would be to *decrease*
> the backout rate.  To achieve this, we must be careful about two
> things:
>  1. Be conservative in what we decide to skip (better to run a
> trivially green job than skip an orange)
>  2. Provide tools that figure out what needs to run in try, to avoid
> the double-whammy described above
>
> #1 is why I built optimization around "skipping" instead of "running"
> -- the burden of proof should be on the task configuration to say when
> a job can be safely skipped, rather than trying to enumerate all the
> conditions in which the job should be run.  #2 is behind the "there is
> no try, just do"[*] behavior in the proposal.  But "will this push
> break things" is not the only use-case for try, so we need "try this"
> functionality (requesting specific jobs) for most of the other
> use-cases.

I think I don't believe your assertion. I think that given the way our code is structured, there is no way for a machine in the typical case to run a smaller subset of jobs than a human, and so the machine is going to have to run more jobs than a human in most cases in order to reduce the backout rate. And that this will ultimately increase the $$/machine time. Or the machine will have to guess with a smaller subset, and have a higher backout rate later.

As an example: typically I develop either on a Windows machine or a Linux machine. Sometimes both. So often what I do is run the relevant tests locally (dom/plugins and a few others). Then I run a try push on the other platforms only (Mac, maybe Android depending). I see many of our core developers adopting similar strategies.

I hope you can prove me wrong, but I'm deeply skeptical that the tradeoffs involved here are going to get us *both* reduced machine consumption and more thorough testing, without a significantly larger scope.

--BDS


_______________________________________________
dev-builds mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-builds
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing what runs on which push

Andrew Halberstadt
Speaking for myself, I agree that it seems unlikely we'll be able to both reduce machine consumption and increase test coverage, and I think that's fine. When I read dustin's proposal, I see:

#1. The interface to try is terrible, we should fix that
#2. There is some scheduling we could be doing automatically, we should be smarter about that

Each of those things is a worthy goal independent of the other, and regardless of how much of a change in machine consumption or backout rates we end up with at the end of the day.

I understand your concerns about #2 and the need to either manually maintain affected files or stand up complicated code coverage calculations. But I don't think anyone is proposing we exclusively rely on automated scheduling. In my mind, the only things we should automatically schedule are things we are "very damn sure" about. For example, the recent change to only run jsreftests if there was a change to the JS engine. Or self-contained unittests only applicable to a particular module of the tree. If there is any doubt about whether a task could be affected by code in other parts of the tree, then that should immediately disqualify it from automatic scheduling.
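
To make that concrete, a "very damn sure" rule could be little more than a hand-audited allowlist -- the suite names, path prefixes, and helper below are purely illustrative, not an existing configuration:

# Hand-audited, deliberately tiny allowlist: for a few suites we claim
# to know exactly which source paths can affect them.  The prefixes
# here are examples, not a real audit.
AFFECTING_PATHS = {
    "jsreftest": ("js/src/", "js/public/"),
}

def can_auto_skip(suite, changed_files):
    """Skip a suite automatically only when we are very damn sure:
    it has a known set of affecting paths and nothing in the push
    touches any of them.  Any doubt at all means the suite runs."""
    prefixes = AFFECTING_PATHS.get(suite)
    if prefixes is None:
        return False  # not eligible for automatic decisions
    return not any(path.startswith(prefixes) for path in changed_files)

The important property is the default: a suite with no entry never gets skipped automatically, and the suite only gets skipped when nothing in the push touches the paths we believe can affect it.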

On another note, there actually is a chance that #1 can reduce machine consumption without decreasing test coverage. That's because with the current try syntax it's often really hard to schedule only the exact tasks a developer wants. If we can make the try interface both easier to use and more precise, then there will be fewer "wasted" tasks (i.e. tasks the developer didn't want, but ended up getting anyway).

-Andrew

_______________________________________________
dev-builds mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-builds
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing what runs on which push

Dustin Mitchell
2017-06-01 10:54 GMT-04:00 Andrew Halberstadt <[hidden email]>:
> Speaking for myself I agree that it seems unlikely we'll be able to both
> reduce machine consumption and increase test coverage, and I think that's
> fine. When I read dustin's proposal, I see:
>
> #1. The interface to try is terrible, we should fix that
> #2. There is some scheduling we could be doing automatically, we should be
> smarter about that

Thanks, that's a great summary :) I agree that they are independent,
but they generally bleed together in discussions so we should probably
address them both at the same time.

For #2, there's a clear risk of over-doing it, introducing complexity
and failure modes.  Let's not do that.  There's also a risk of
optimizing the wrong thing -- for example, being really careful about
when an eslint job runs is silly, because eslint is fast and easy.  We
will see better value in optimizing for constrained resources --
specifically, tasks that have to run on Mozilla-owned hardware for
which we have a finite supply, such as OS X, especially if we can
optimize those jobs away on integration branches.  And if we stick to
the "very damn sure" rule, we can avoid increasing the backout rate.
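
As a sketch of that prioritization (the worker-type names and the helper are made up, not something in taskgraph today):

# Hypothetical worker-type names for pools where capacity is scarce;
# elastic cloud pools are cheap enough that skipping them buys little.
SCARCE_WORKER_TYPES = {"t-osx-1010", "t-win10-hw"}

def worth_optimizing(task, project):
    """Apply skip logic only to tasks on scarce hardware, and only on
    integration branches, where a wrong guess still gets caught later."""
    if project not in ("autoland", "mozilla-inbound"):
        return False
    return task.get("worker-type") in SCARCE_WORKER_TYPES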

In talking to Ben, I realized this was proposed as if it were a major
project that would take a few engineers a few months. That's my
mistake -- rather, this is a design proposal to guide work that is
already underway (we are optimizing already) or planned (such as
ahal's try work).  The idea is to make sure that this work proceeds in
a consistent, positive direction.

So! I'll put this into a google doc to which we can all refer when our
work touches on optimization or try, and post the result here for
review.

Thanks for the comments so far :)

Dustin
_______________________________________________
dev-builds mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-builds