Reminder: Please add "regression" keyword to FX OS bugs that are regressions


Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Marcia Knous-2
If you are seeing bugs that are caused by regressions, please be sure you are actively adding the keyword "regression" to all of your bugs. Of course, you must then also add the regressionwindow-wanted keyword to request a search for a window. But please start by adding "regression", as triage is on the lookout for these keywords.

If you don't know if it's a regression or not, you can always ask in the bug.

Thanks in advance for contributing to the testing of FX OS devices!
_______________________________________________
dev-quality mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-quality

Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Kevin Brosnan
On Android we will liberally add the regressionwindow-wanted keyword to
bugs that we suspect are regressions. This helps to make sure that things
are not overlooked. It does mean watching the bare regressionwindow-wanted
keyword. It may make sense to prioritize speculative regressions below
known regressions.
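
One way to picture that ordering is a small sort over bug records. This is only a sketch: the record layout, the bug ids, and the `triage_rank` helper are hypothetical and do not come from Bugzilla or any triage tool.

```python
def triage_rank(keywords):
    """Return a sort rank for a bug's keyword set; lower ranks are triaged first."""
    if "regression" in keywords:
        return 0  # known regression
    if "regressionwindow-wanted" in keywords:
        return 1  # suspected regression: window requested, not yet confirmed
    return 2      # everything else


# Hypothetical bug records, used only to exercise the ordering.
bugs = [
    {"id": 1, "keywords": {"regressionwindow-wanted"}},
    {"id": 2, "keywords": {"regression", "regressionwindow-wanted"}},
    {"id": 3, "keywords": set()},
]

ordered = sorted(bugs, key=lambda bug: triage_rank(bug["keywords"]))
ordered_ids = [bug["id"] for bug in ordered]
print(ordered_ids)
```

Bugs carrying only the bare regressionwindow-wanted keyword land between confirmed regressions and everything else.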

Kevin

On Thu, Oct 30, 2014 at 7:01 AM, Marcia Knous <[hidden email]> wrote:

> If you are seeing bugs that are caused by regressions, please be sure you
> are actively adding the keyword "regression" to all of your bugs. Of
> course, you must then also add the regressionwindow-wanted keyword to
> request a search for a window. But please start by adding "regression",
> as triage is on the lookout for these keywords.
>
> If you don't know if it's a regression or not, you can always ask in the
> bug.
>
> Thanks in advance for contributing to the testing of FX OS devices!

Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Geo Mealer
I won’t speak for your specific process, but I get a little concerned about always leading with a window. They’re pretty expensive, comparatively speaking, and not always necessary. It strikes me as one of those things where, economically speaking, it’s better to try to debug without it and take the delay for a dev having to ask later in the exceptional case than to add the effort to the normal case. Mozilla is pretty unusual in my experience to expect quite so much bisection/windowing from QA, and I wonder if it isn’t an overall productivity sap.

I’m going to get nerdy about this to explore that concept. (TL;DR: I compare the two cases systematically to eliminate common steps and understand what has to take more time than what to make a bisect-always strategy work. Jump to CONCLUSION below to skip all this)

Note: I realize that we don’t actually window everything, but I’ve seen pushes towards windowing all *actionable* or QA-reported bugs. It doesn’t really matter. You can think of “always” as “the subset of bugs for which we’d always window” and the logic stands up. The bigger the subset, the more relevant this is. So, pushes towards doing more mean this sort of thing should be weighed.

C: number of issues too complex to efficiently debug without windowing
S: number of issues simple enough to efficiently debug without windowing

W: time to window
L: average bugzilla request/response lag time
D(simple): time to debug simple issue w/o a window
D(fail): time to realize you can’t debug an issue w/o window because it’s too complex
D(post): time to debug an issue w/ window

Every time:

(S+C) * (W + L + D(post))

In other words, the total number of bugs, regardless of complexity, multiplied by the time to window, a request/response lag (here’s the results), and time to debug a windowed bug.

This is equivalent to:

S * (W + L + D(post)) + C * (W + L + D(post))

which is

SW + SL + SD(post) + CW + CL + CD(post)

Freeze that for a sec.

Conversely, by request:

(S * D(simple)) + (C * (D(fail) + L + W + L + D(post)))

In other words, the total number of simple bugs multiplied by the time to debug a simple bug + the number of complex bugs multiplied by time to try and fail to debug it, a lag (need window), a window, another lag (here’s the results), and the time to debug a windowed bug.

This is equivalent to:

SD(simple) + CD(fail) + CL + CW + CL + CD(post)
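
The two totals can be checked numerically. Below is a minimal Python sketch of the model exactly as written above; the function names and the sample values (in hours) are illustrative assumptions, not measurements from any project.

```python
def cost_always(S, C, W, L, D_post):
    """Total time when every bug is windowed up front: (S+C) * (W + L + D(post))."""
    return (S + C) * (W + L + D_post)


def cost_by_request(S, C, W, L, D_simple, D_fail, D_post):
    """Total time when a window is produced only after a dev asks for one."""
    simple_part = S * D_simple
    # Complex bugs: failed debug attempt, lag, window, lag, windowed debug.
    complex_part = C * (D_fail + L + W + L + D_post)
    return simple_part + complex_part


# Hypothetical inputs, in hours, chosen only to exercise the formulas.
example = dict(S=80, C=20, W=4.0, L=8.0, D_post=2.0)
always = cost_always(**example)
by_request = cost_by_request(D_simple=2.0, D_fail=0.5, **example)
print(always, by_request)
```

With these made-up numbers the by-request total is much smaller, but the result depends entirely on the inputs chosen.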

We can take out the common steps between the two (windowing, reporting and debugging all the complex bugs) because we’ll have to do those no matter what. So that leaves us comparing:

SW + SL + SD(post) -and- SD(simple) + CD(fail) + CL.

In other words, we’re comparing:

(always) the number of simple bugs * time to window, lag for dev to receive results, and debug a windowed bug

and

(by request) the number of simple bugs * time to debug a simple bug; plus the number of complex bugs times the amount of time to fail to debug and realize it needs windowing plus request lag.

Assuming unwindowed simple bugs (SD(simple)) and windowed simple bugs (SD(post)) take roughly the same amount of time to debug we can remove that commonality too. So that’s now:

SW + SL -and- CD(fail) + CL

which is

S(W + L) -and- C(D(fail) + L)

So, in English that leaves a time/effort comparison between:

(always) number of simple bugs * time to window and report

and

(by request) number of complex bugs * time to realize it needs windowing and request
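
The reduced comparison also gives a break-even point. The sketch below assumes, as above, that D(simple) and D(post) are roughly equal; the helper names and sample values are hypothetical. Normalizing so that S + C = 1 turns S(W + L) = C(D(fail) + L) into a fraction of complex bugs.

```python
def always_is_cheaper(S, C, W, L, D_fail):
    """Reduced comparison: is S(W + L) smaller than C(D(fail) + L)?"""
    return S * (W + L) < C * (D_fail + L)


def break_even_complex_fraction(W, L, D_fail):
    """Fraction of bugs that must be complex before always-windowing wins.

    Solving S(W + L) = C(D_fail + L) with S + C = 1 gives
    C = (W + L) / ((W + L) + (D_fail + L)).
    """
    window_and_report = W + L
    fail_and_request = D_fail + L
    return window_and_report / (window_and_report + fail_and_request)


# Hypothetical values in hours: 4h window, 8h lag, 0.5h to give up and ask.
fraction = break_even_complex_fraction(W=4.0, L=8.0, D_fail=0.5)
print(round(fraction, 3))
print(always_is_cheaper(S=80, C=20, W=4.0, L=8.0, D_fail=0.5))
print(always_is_cheaper(S=40, C=60, W=4.0, L=8.0, D_fail=0.5))
```

Under these assumed numbers, more than about 59% of bugs would have to be too complex to debug without a window before windowing everything pays off.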

CONCLUSION:

So, for windowing proactively to be superior, either we need to assume correctly there are significantly more bugs that can’t reasonably be debugged without a window than bugs that can be; or the time to fail a debug session and request needs to be significantly higher than the time to window and report.

Thing is, the dev has full control over how much time they spend before requesting a window: they can probably identify most complex-enough bugs by eyeballing Bugzilla and immediately request (i.e. the time is trivial); even if they can’t, they can define their own threshold to fail a debug session and then request, keeping their own investment on D(fail) to a minimum. Since we know windowing is usually pretty expensive, it seems easy to keep the threshold quite a bit cheaper.

There’s also result rot, where a window potentially bleeds value over time as additional changes happen to the related code if the bug isn’t immediately addressed. This is especially important in multifactor system bugs like concurrency, performance, etc. where the behavior might change completely and need a different fix than the original patch implies. That can weigh against doing a bisection much earlier than the subsequent debugging session will happen.

So at the very least, I’m pretty certain proactive windowing should happen only A) on bugs that can be windowed quicker than the dev can determine whether or not they actually need a window (so probably no intermittent, performance, or other hard-to-window bugs) and B) once the bug is actually assigned for immediate work.

I will say that the exception is if you’re looking to immediately back out the offending code ASAP, not flag it to fix by a dev later. Then a proactive window is required to identify the patch to back out, because even debugging doesn’t necessarily give you that and because you can’t afford any of the request lags. However, it’s way expensive to sheriff that way, to the point of not making sense as a strategy—that happening should be a red flag “smell” for increasing automation.  

But as a debugging tool, I think windowing is a poor first resort. In the normal case, the dev who will actually be fixing the bug in the immediate term (not their boss, not the PM) should -at least- eyeball the bug first and request the window themselves once they think they can’t just fix it with the information they already have.

Geo

On Nov 4, 2014, at 3:18 PM, Kevin Brosnan <[hidden email]> wrote:

> On Android we will liberally add the regressionwindow-wanted keyword to
> bugs that we suspect are regressions. This helps to make sure that things
> are not overlooked. It does mean watching the bare regressionwindow-wanted
> keyword. It may make sense to prioritize speculative regressions below
> known regressions.
>
> Kevin

Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Benjamin Smedberg

On 11/7/2014 7:44 PM, Geo Mealer wrote:

[snip a lot of good stuff!]

> I will say that the exception is if you’re looking to immediately back out the offending code ASAP, not flag it to fix by a dev later. Then a proactive window is required to identify the patch to back out, because even debugging doesn’t necessarily give you that and because you can’t afford any of the request lags. However, it’s way expensive to sheriff that way, to the point of not making sense as a strategy—that happening should be a red flag “smell” for increasing automation.
>
> But as a debugging tool, I think windowing is a poor first resort. In the normal case, the dev who will actually be fixing the bug in the immediate term (not their boss, not the PM) should -at least- eyeball the bug first and request the window themselves once they think they can’t just fix it with the information they already have.

I mostly agree with your conclusion! I think we over-use regression
windows in lots of cases. I will note several things which may change
the equation:

0) I think we should be a lot more willing to do backouts for
regressions. Especially where we're trading off regressions for
features, engineers may have a tendency to put regressions onto the back
burner, and being able to just back things out for serious regressions
is an important tool in the quality arsenal.

1) There may be value in finding regression windows for larger units
(e.g. entire release windows). Knowing that a bug is present in FF34 but
not in FF33 helps release drivers decide whether to track/block a
release on a particular bug. That takes a lot less time than finding a
window down to the nightly or especially down to a particular cset.

2) Bugs often don't start out with a clear owner. One of the reasons QA
uses regression windows right now is to figure out which person caused a
regression so that they can give the bug to that person. I often will go
through a nightly regression window for QA and identify which bugs might
have caused a regression, just to narrow down the set of potential
"victims" to something reasonable.

3) Some of the regressionwindow-wanted requests aren't for windows down
to a particular cset, but just by nightly. So e.g. if we see a new crash
signature, we try to find out which nightly the crash started in, and
then link up to a pushlog for that nightly. That's significantly less
effort than bisecting down to a particular cset. Also because these
often don't have clear STR, finding a window by nightly is the best we
can get.
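
The difference in effort between these granularities can be roughed out with the usual binary-search bound. This is a sketch under stated assumptions: the bug reproduces deterministically, exactly one build in the range is the first bad one, and the build counts below are hypothetical.

```python
def bisection_steps(candidates):
    """Worst-case number of builds to test to find the first bad build.

    `candidates` is the number of builds that could be the first bad one.
    For n >= 1, (n - 1).bit_length() equals ceil(log2(n)).
    """
    if candidates < 1:
        raise ValueError("need at least one candidate build")
    return (candidates - 1).bit_length()


# Hypothetical counts: about 42 nightlies in a release cycle,
# and about 300 changesets inside one nightly's pushlog.
nightly_steps = bisection_steps(42)
cset_steps = bisection_steps(300)
print(nightly_steps, cset_steps)
```

The step counts are of similar size, but each nightly step is usually a download and install, while each changeset step may need a build of its own, so the cost per step is what separates the two.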

--BDS


Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Boris Zbarsky
On 11/10/14, 3:32 PM, Benjamin Smedberg wrote:
> 2) Bugs often don't start out with a clear owner. One of the reasons QA
> uses regression windows right now is to figure out which person caused a
> regression so that they can give the bug to that person.

Indeed.

Also, and this is important, doing a regression window on nightlies is
something that can be done by pretty much anyone, and generally doesn't
require, for a single bug, a huge time commitment.  Especially if you
have a local cache of recentish nightlies.

We have volunteer QA who do these, and it's very helpful in terms of
getting the right developer's eyes on the bug in a timely manner.

-Boris

Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Geo Mealer
I think you'd be surprised how long a regression window can take. It really depends on the project and the bug.

Large cache of nightlies, desktop-simple install, always-reproducible bug, not too bad. Still a lot longer than just filing the bug, and I still think it should be eyeballed first, but not bad.

Build-your-own? Lot longer.

Mobile phone install? Longer.

Mobile phone reflash? A -heck- of a lot longer.

Intermittent? Add a multiplier. Intermittents are identified with "didn't happen in N repeated iterations." Probability being what it is, that's not even perfect.

Concurrency? Probably intermittent. Might not be deterministic at all.

Performance? Unless it's a regression so vast that it's a ton bigger than noise, no chance of a high quality result. Can't make a clear yes/no decision like you need for windowing.
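
To put a number on the intermittent case: if a bad build shows the bug with probability p on each attempt, and attempts are independent, then N clean attempts still happen on a bad build with probability (1 - p)^N. A minimal sketch follows; the independence assumption, the function names, and the sample rates are illustrative, not measured.

```python
import math


def false_good_probability(p, n):
    """Chance that a bad build passes n attempts in a row without reproducing."""
    return (1.0 - p) ** n


def iterations_needed(p, alpha):
    """Smallest n such that a bad build passes n clean attempts with probability <= alpha."""
    if not (0.0 < p < 1.0) or not (0.0 < alpha < 1.0):
        raise ValueError("p and alpha must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - p))


# Hypothetical: the bug reproduces 1 time in 10, and we accept a 5% chance
# of wrongly calling a bad build good.
needed = iterations_needed(p=0.1, alpha=0.05)
print(needed, false_good_probability(0.1, needed))
```

That count applies to every build that gets marked good, so it multiplies the number of bisection steps, and a single wrong call can still send the window to the wrong range.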

The idea that all our windows are fungible tasks is nice, but again, depending on project that might not be true (it's not for FxOS because most people don't have devices, for example) and there's generally a lot of pressure on QA to be the people picking up the fungible tasks.

Volunteer QA? Sure. But they could still conceivably be doing something better with their time. QA tasks in general aren't very fungible, because everyone thinks QA will do them. When was the last time you personally followed up a qawanted?

I take Benjamin's points--they mostly go to variations in "how long does it take" or "how quickly do we need to react", so I think they complement my conclusions as much as they modify them. They also add "just how fine a window do you need?" which I think is a terrific question to be asked.

However, I don't think mavening is a very good reason. The bug already has external manifestation, or else it wouldn't have been caught. I think we have a systemic issue in that everyone owns the internals but way fewer people feel like they own the externals. Upshot is everyone wants it windowed to a root cause before they'll touch it, which is...unusual, in my experience.

One good thing I can say about the FxOS project in this regard is that it's significantly more cut and dried: if it's externally visible, there's probably a Gaia team to toss it at for mavening. If *they* think they need a regression window to do that, then I think that's one of many good reasons to do so.

But I still think someone should ask, and I think they should at least eyeball the issue first. This idea that we do it more and more by assumption at the very least should be measured. I think it's a bad path.

----- Original Message -----

> From: "Boris Zbarsky" <[hidden email]>
> To: [hidden email]
> Sent: Monday, November 10, 2014 2:35:14 PM
> Subject: Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions
>
> On 11/10/14, 3:32 PM, Benjamin Smedberg wrote:
> > 2) Bugs often don't start out with a clear owner. One of the reasons QA
> > uses regression windows right now is to figure out which person caused a
> > regression so that they can give the bug to that person.
>
> Indeed.
>
> Also, and this is important, doing a regression window on nightlies is
> something that can be done by pretty much anyone, and generally doesn't
> require, for a single bug, a huge time commitment.  Especially if you
> have a local cache of recentish nightlies.
>
> We have volunteer QA who do these, and it's very helpful in terms of
> getting the right developer's eyes on the bug in a timely manner.
>
> -Boris

Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Boris Zbarsky
On 11/10/14, 6:17 PM, Geo Mealer wrote:
> Large cache of nightlies, desktop-simple install, always-reproducible bug, not too bad.

Right.

> Build-your-own? Lot longer.
>
> Mobile phone install? Longer.
>
> Mobile phone reflash? A -heck- of a lot longer.

Yep, in this case doing the regression window thing as a default triage
tool may well no longer be the right tradeoff.  You're right that our
attitudes towards it may be biased by the historical situation with
desktop Firefox.

> Intermittent? Add a multiplier.

Finding a regression range for an intermittent is not likely to be a
good time investment unless other avenues of attacking it have been
exhausted, I agree.

> Volunteer QA? Sure. But they could still conceivably be doing something better with their time.

True.

> When was the last time you personally followed up a qawanted?

I don't do it purposefully, but the last time I personally found a
regression range or created a testcase for a bug that had qawanted on it
was likely sometime in the last two weeks.  If you ignore the
requirement to have "qawanted", then definitely in the last several
days, since I try to proactively triage bugs before they end up in the
qawanted bucket.

> However, I don't think mavening is a very good reason.

I'm not sure what you mean by "mavening" here.

> But I still think someone should ask, and I think they should at least eyeball the issue first.

The question is who this "someone" should be.

Historically, again in desktop-browser-land, we had the following
conditions:

1)  A fair number of people who can install and run nightlies and hence
find regression ranges on nightlies, and are willing to do so.

2)  Several people who have good overall knowledge of the codebase and
can map from a short enough regression range (historically a day, since
it mapped well to nightlies, though recently a day contains a heck of a
lot of checkins) and an observed problem to a likely list of possible
checkins that caused the problem.

3)  A fair number of bugs for which the root cause is unclear at first.

The combination of those three factors made regression range finding a
useful exercise.  We don't have all three factors for all our projects
now, as you point out.

For web content regressions (which are typically observable on desktop),
we do still have these factors.  Particularly #3; it's often hard to
tell whether the observed symptom ("site doesn't work") is a JS bug, a
DOM bug, a layout bug, or a networking bug.  Sometimes a triager who
knows enough about how those possibilities might manifest can take a
good guess.  That takes a lot more skill than finding a regression range
and needinfoing Benjamin, say, so the latter can be done by more people
and hence might be a good investment if we have people who are willing
to do that but not more involved tasks.

I'd love it if all the people we have who are willing to do bug triage
had such skill, of course.  What sometimes happens instead is that a
component is picked at random and the bug is thrown in there with the
idea that the development team responsible for that component will
figure out where it should really live.  This works ok in a few
components that are very actively triaged by their developers; poorly in
many others.

In an ideal world, we would have people doing whatever tasks they have
comparative advantage in, to maximize our overall productivity (modulo
obvious things like people wanting to learn new things and whatnot).
How best to do that is a good question, of course.  Finding regression
ranges is one possible thing people can do, but you're absolutely
correct that it's easy to find cases where it's just not worth the
effort as a front-line triage tool.

-Boris

Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions

Anthony Hughes-3
I want to try to avoid hijacking this thread too much but I want to throw this out there.

Would it be useful to have a regressionwindow-needed keyword?
-needed for bugs of certain urgency/impact or where a window is required to move the bug forward
-wanted for bugs of less importance or where a window is desired but not necessarily critical

Just a thought,

Anthony Hughes
Senior Test Engineer
Mozilla Corporation


----- Original Message -----

> From: "Boris Zbarsky" <[hidden email]>
> To: [hidden email]
> Sent: Monday, November 10, 2014 7:18:49 PM
> Subject: Re: Reminder: Please add "regression" keyword to FX OS bugs that are regressions