A/B Testing in Firefox for Android

Mark Finkle-2
We have decided to start running A/B Testing [1] in Firefox for Android.
These experiments are intended to optimize specific outcomes, as well as
inform our long-term design decisions. We want to create the best Firefox
experience we can, and these experiments will help.

The system will also allow us to throttle the release of features, called
staged rollout, so we can monitor new features in a controlled manner
across a large userbase and a fragmented device ecosystem. If we need to
roll back a feature for some reason, we'd have the ability to do that.

Technical details:
* Switchboard is used to control experiment segmenting and staged rollout.
* Telemetry is used to collect metrics about an experiment.
* FHR (Firefox Health Report) is used to track active experiments so we can
correlate them with application usage.

== What is Switchboard? ==

Switchboard is an open source SDK for doing A/B testing and staged rollouts
[2][3]. It connects to a server component, which maintains a list of active
experiments.

The SDK does create a UUID, which is stored on the device. The UUID is sent
to the server, which uses it to "bucket" the client, but the UUID is never
stored on the server. In fact, the server does not store any data. The
server we are using is being hosted by Mozilla.
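
A minimal sketch of how this kind of stateless bucketing might work. It assumes the UUID is hashed with CRC32 (which comes up later in this thread) and reduced modulo 100; the bucket count and the name `bucket_for` are illustrative assumptions, not Switchboard's confirmed code:

```python
import uuid
import zlib

NUM_BUCKETS = 100  # assumed bucket count, for illustration only


def bucket_for(client_uuid: str) -> int:
    """Map a client UUID to a stable bucket in [0, NUM_BUCKETS)."""
    return zlib.crc32(client_uuid.encode("utf-8")) % NUM_BUCKETS


# The device generates the UUID once and stores it locally.
client_uuid = str(uuid.uuid4())
print(bucket_for(client_uuid))
```

Because the bucket is a pure function of the UUID, a server could compute it per request without storing anything.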

We decided to start using Switchboard because it's simple, open source,
saves no data and can be hosted by Mozilla. Thanks to the KeepSafe folks
for releasing Switchboard.

== Planning Experiments ==

The Mobile Product and UX teams are the primary drivers for creating
experiments, but as is common on the Mobile team, ideas can come from
anywhere. We have been working with the Mozilla Growth team, getting a
better understanding of how to design the experiments and analyze the
metrics. UX researchers will also have input into the experiments.

Once Product and UX complete the experiment design, Development would land
code in Firefox to implement the desired variations of the experiment.
Development would also land code in the Switchboard server to control the
configuration of the experiment: Is it active? How are the variations
distributed across the user population?
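
To make the two configuration questions concrete, here is a hedged sketch of what an experiment entry could look like. The `Experiment` class, its fields, and the experiment and variation names are hypothetical; the real server configuration format may differ:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Experiment:
    name: str
    active: bool
    # (variation name, percentage of users); percentages should sum to 100.
    variations: List[Tuple[str, int]]


def variation_for(exp: Experiment, bucket: int) -> Optional[str]:
    """Return the variation for a bucket in [0, 100), or None if inactive."""
    if not exp.active:
        return None
    upper = 0
    for name, percent in exp.variations:
        upper += percent
        if bucket < upper:
            return name
    return None


onboarding = Experiment("onboarding", True, [("control", 50), ("new-flow", 50)])
print(variation_for(onboarding, 10))  # control
print(variation_for(onboarding, 75))  # new-flow
```

In this sketch, declaring a winner would amount to changing `variations` to a single entry with 100%.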

Since we use Telemetry to collect metrics on the experiments, the Beta
channel is likely the best place to run them. Telemetry is on
by default on Nightly, Aurora and Beta; and Beta is the largest userbase of
those three channels.

Once we decide which variation of the experiment is the "winner", we'll
change the Switchboard server configuration for the experiment so that 100%
of the userbase will flow through the winning variation.

Yes, a small percentage of the Release channel has Telemetry enabled, but
it might be too small to be useful for experimentation. Time will tell.

Note: Switchboard itself will be enabled on all channels. It collects no
data and gives us a "code-free" way of staging rollouts. It is much less risky
and time-consuming than uplifting patches that need to land on branches at
specific times.
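
A rough sketch of why a staged rollout needs no client code change, assuming (hypothetically) that a feature is gated on a CRC32-derived bucket compared against a server-controlled percentage. The function name and the client identifiers are made up for illustration:

```python
import zlib


def is_enabled(client_uuid: str, rollout_percent: int) -> bool:
    """True if this client falls inside the current rollout percentage."""
    bucket = zlib.crc32(client_uuid.encode("utf-8")) % 100
    return bucket < rollout_percent


clients = [f"client-{i}" for i in range(1000)]
for percent in (0, 10, 50, 100):
    enabled = sum(is_enabled(c, percent) for c in clients)
    print(f"{percent:3d}% rollout -> {enabled} of {len(clients)} clients")
```

Raising the percentage only ever adds clients to the enabled set, and setting it back to 0 backs the feature out without shipping a new build.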

== What's Happening Now? ==

Switchboard has already landed in Nightly [4] and is currently behind a
Nightly build flag. Once we feel comfortable, we'll let it ride the trains.
Our first experiment will likely be testing a new onboarding experience [5].

[1] https://en.wikipedia.org/wiki/A/B_testing
[2] https://github.com/KeepSafe/Switchboard
[3] http://keepsafe-engineering.tumblr.com/post/28437940369/easy-mobile-ab-testing
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1196897
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1199859
_______________________________________________
dev-planning mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-planning

Re: A/B Testing in Firefox for Android

Martin Thomson
This seems like a good idea.

Has anyone considered contributing a change to switchboard that would
allow the experiment policy to be downloaded to clients and the A/B
decision made there?  That removes any possibility of creating another
potential tracking mechanism.

On Thu, Sep 3, 2015 at 11:16 AM, Mark Finkle <[hidden email]> wrote:

> The SDK does create a UUID, which is stored on the device. The UUID is sent
> to the server, which uses it to "bucket" the client, but the UUID is never
> stored on the server. In fact, the server does not store any data. The
> server we are using is being hosted by Mozilla.

Re: A/B Testing in Firefox for Android

Axel Hecht
In reply to this post by Mark Finkle-2
Can testers self-select which part of the experiment they're seeing?

I'm thinking about l10n testers in particular, but also in general.

Axel

Re: A/B Testing in Firefox for Android

Mark Finkle-2
In reply to this post by Martin Thomson
I added a patch which is in review now [1]. It allows the experiment's
segmenting configuration to be used locally, instead of being sent from the
server.

We only intend to use this when the experiment is in a code path that
happens very early in application startup. We lose all ability to
dynamically alter the configuration and code path if we use this approach.
Any changes must be landed in the client application.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1201384

On Thu, Sep 3, 2015 at 2:25 PM, Martin Thomson <[hidden email]> wrote:

> This seems like a good idea.
>
> Has anyone considered contributing a change to switchboard that would
> allow the experiment policy to be downloaded to clients and the A/B
> decision made there?  That removes any possibility of creating another
> potential tracking mechanism.

Re: A/B Testing in Firefox for Android

Martin Thomson
On Thu, Sep 3, 2015 at 12:21 PM, Mark Finkle <[hidden email]> wrote:
> We only intend to use this when the experiment is in a code path that
> happens very early in application startup. We lose all ability to
> dynamically alter the configuration and code path if we use this approach.
> Any changes must be landed in the client application.


I'm not sure that I understand your concern here.  If you were to
publish the low and high values for a given experiment, then you do
commit to using CRC32 (and low and high markers), but that's not a
problem in and of itself.  After all, you could include an indicator
that describes how the buckets are calculated if you wanted to allow
for some flexibility.
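
For illustration, a sketch of what such a published rule could look like. The field names (`hash`, `low`, `high`), the 0-100 bucket range, and the SHA-256 alternative are all assumptions made up for this example:

```python
import hashlib
import zlib

# Hypothetical registry of bucketing methods a published rule could name.
BUCKETERS = {
    "crc32": lambda s: zlib.crc32(s.encode("utf-8")) % 100,
    "sha256": lambda s: int(hashlib.sha256(s.encode("utf-8")).hexdigest(), 16) % 100,
}


def matches(rule: dict, client_uuid: str) -> bool:
    """Check a published rule of the form {'hash', 'low', 'high'} locally."""
    bucket = BUCKETERS[rule["hash"]](client_uuid)
    return rule["low"] <= bucket < rule["high"]


rule = {"hash": "crc32", "low": 0, "high": 50}  # assumed field names
print(matches(rule, "example-client-uuid"))
```

A client that sees an indicator it does not recognize could simply skip the experiment.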

If the concern is that you won't be able to update rapidly, I'd
suggest that you might want to look at pushing updates rather than
rely on clients polling.

Re: A/B Testing in Firefox for Android

Eric Rescorla
I don't follow what the issue is here either with using client-side
decisioning.

-Ekr

Re: A/B Testing in Firefox for Android

Mark Finkle-2
How would we change the decision?

How would we do staged rollouts or backouts?

Re: A/B Testing in Firefox for Android

Mark Finkle-2
In reply to this post by Martin Thomson
On Thu, Sep 3, 2015 at 4:11 PM, Martin Thomson <[hidden email]> wrote:

> If the concern is that you won't be able to update rapidly, I'd
> suggest that you might want to look at pushing updates rather than
> rely on clients polling.
>

Pushing updates is exactly what we want to avoid. Pushing updates is what
we do right now, and it's not without issues. Pushing updates also makes it
almost impossible to manage staged rollouts, or quick backouts. Going faster
is part of our goals too.

Re: A/B Testing in Firefox for Android

Eric Rescorla
On Thu, Sep 3, 2015 at 5:24 PM, Mark Finkle <[hidden email]> wrote:

> Pushing updates is exactly what we want to avoid. Pushing updates is what
> we do right now, and it's not without issues. Pushing updates also makes it
> almost impossible to manage staged rollouts, or quick backouts. Going faster
> is part of our goals too.


I'm not following the concern.

The usual way to do this is to have the server publish a manifest that tells
the client "if you are in this range of the UUID space, then behave this
way". You then use exactly the same retrieval policies you currently
do but instead of publishing per-client instructions, you publish the
manifest.
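
A minimal sketch of that flow, under assumptions: the manifest format, the experiment names, and the CRC32-modulo-100 position are invented for the example and are not how Switchboard currently works.

```python
import json
import zlib

# What the server might publish: identical for every client, so the
# request does not need to carry the UUID. Field names are assumptions.
MANIFEST_JSON = """
{
  "onboarding-a": {"low": 0, "high": 50},
  "onboarding-b": {"low": 50, "high": 100}
}
"""


def active_experiments(manifest_json: str, client_uuid: str) -> list:
    """Return the experiments whose UUID range contains this client."""
    position = zlib.crc32(client_uuid.encode("utf-8")) % 100
    manifest = json.loads(manifest_json)
    return [name for name, r in manifest.items() if r["low"] <= position < r["high"]]


print(active_experiments(MANIFEST_JSON, "example-client-uuid"))
```

The server can still change ranges at any time; clients pick up the new manifest on their next fetch, just as they would pick up new per-client instructions.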

-Ekr

Re: A/B Testing in Firefox for Android

Mark Finkle-2
OK, I understand what you're saying now. Switchboard doesn't currently work that
way, but after bug 1201384 lands we could consider converting the code to
do that.

Re: A/B Testing in Firefox for Android

Martin Thomson
In reply to this post by Mark Finkle-2
On Thu, Sep 3, 2015 at 5:24 PM, Mark Finkle <[hidden email]> wrote:
> Pushing updates is exactly what we want to avoid. Pushing updates is what we
> do right now, and it's not without issues. Pushing updates also makes it
> almost impossible to manage staged rollouts, or quick backouts. Going faster
> is part of our goals too.

I'm not talking about pushing software updates, just updates to the
A/B rules.  And by push, I mean this:
https://developer.mozilla.org/en-US/docs/Web/API/Push_API