Upcoming Cache Work

Upcoming Cache Work

Nick Hurley
Hi All,

In preparation for discussing and working on the disk cache during our
upcoming Necko work week, I wanted to start some discussion with a
wider audience about directions we can take to make our disk cache
ready for the present (and the future). Right now, I'm focusing
on (1) making the cache more performant, and (2) increasing
stability and preventing data loss under certain circumstances. With that
in mind, here is my initial list of ideas (in no particular order) of
things we can/should do. I would love to hear what others have to say:
different ideas, reasons my ideas suck (or are great), or
anything I've missed.

(A) Reduce lock contention. Currently the entire cache is protected
by one big lock, which effectively makes our cache single-threaded. My
first step is to start making our locking finer-grained.
[1] is the first bug in this work, with more to follow as more
opportunities for finer-grained locking are identified. Eventually, it
would be nice to turn some of our locks into multiple-reader/single-
writer locks for the data structures where that makes sense (the global
cache metadata, for example).
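For what it's worth, the multiple-reader/single-writer idea can be sketched roughly as follows. (Python here purely for illustration; the real thing would of course be a C++ lock inside Necko, perhaps built on NSPR primitives.)

```python
import threading

class ReadWriteLock:
    """Toy reader/writer lock: many concurrent readers, one exclusive
    writer. (No writer preference, so a steady stream of readers can
    starve a writer -- a real implementation would address that.)"""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        # Readers grab the condition's lock only briefly to bump the
        # count; they block here while a writer holds it.
        with self._cond:
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # wake a waiting writer

    def acquire_write(self):
        # Hold the condition's lock for the whole write: new readers and
        # other writers both block on it until release_write().
        self._cond.acquire()
        while self._readers > 0:
            self._cond.wait()

    def release_write(self):
        self._cond.release()
```

Read-mostly structures like the global cache metadata would take the read side on lookups and the write side only on updates.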

(B) Reduce round-trips between the main thread and the cache thread.
This would mean pageloads aren't pushed to the tail of the queue each
time they need to get more data from the cache (among other things).
Discussion on this particular topic has already started at [2].

(C) Increase parallelism within the cache. Currently all cache I/O
happens serially on one thread, even if we're trying to read/write
more than one element in the cache at a time. We should improve this
situation by having more than one thread available for reading/writing
cache entries (this is dependent on having finer-grained locking of
some sort).

(D) Make changes to the on-disk cache format less likely to require a
wholesale deletion of the cache. Discussion has started in [3] with
some suggested strategies for achieving this goal: versioning each
cache entry independently of the cache as a whole; keeping a bitmask
of "options" used in a particular cache entry (enabling us to ignore
entries with options we don't recognize, or that are missing options
we require); or making the vast majority of the metadata more freeform
(similar to the bitmask solution, but text-based and allowing
key/value pairs).
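To make the bitmask idea concrete, here is a sketch (Python for illustration only; every name and bit value below is hypothetical, not the real cache format):

```python
# Hypothetical option bits -- purely illustrative values.
OPT_COMPRESSED = 0b001
OPT_HAS_ALT_DATA = 0b010
OPT_PINNED = 0b100

KNOWN_OPTIONS = OPT_COMPRESSED | OPT_HAS_ALT_DATA | OPT_PINNED
REQUIRED_OPTIONS = OPT_COMPRESSED   # say this build insists on compression

def entry_is_usable(entry_options: int) -> bool:
    """An entry survives a format change iff it has no option bits we
    don't recognize and is not missing any bit we require."""
    unknown = entry_options & ~KNOWN_OPTIONS
    missing = REQUIRED_OPTIONS & ~entry_options
    return unknown == 0 and missing == 0
```

A newer build that adds a feature simply widens KNOWN_OPTIONS; an older build seeing the new bit skips that one entry instead of wiping the whole cache.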

(E) Make crash recovery more robust. There is a long-standing bug [4]
about the fact that we have precisely one dirty bit for the entire
cache; if we ever crash (or shut down uncleanly), the entire cache is
deleted the next time the browser starts up.

(F) Store entries more efficiently. One incremental improvement
might be to store an entry's metadata together with the entry when the
entry is not stored in a block file. Currently, the entry and its
metadata are stored in two separate files, which can affect things
like directory seek times in a very full cache.

That's the list I've created for myself as a starting point. Of course
none of this is The One Final Answer, and priorities of these (and
other suggestions) are still up in the air. Comments are highly
desired so we can get our cache in way better shape.

Thanks!
-Nick

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=717761
[2] https://groups.google.com/d/topic/mozilla.dev.tech.network/RTqPxjy-Eq0/discussion
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=715752
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=105843
_______________________________________________
dev-tech-network mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-network

Re: Upcoming Cache Work

Patrick McManus
Nick - it's an awesome list you have!

I've been keeping an Evernote with some ideas about this - so let me
just dump my notes out here in their unfleshed-out glory as input to
your project. They might not be helpful to you - I won't be hurt, but I
did want to share them. I'll admit some of this is thinking big instead
of just fixing a weakness, but if it's an area where we can leapfrog
ahead, surely that's worth discussing.

* Reduce the random-access nature of our I/O by replacing it with block
reads and writes that can minimize seeks and make use of read-ahead.
Instead of one I/O event per URI, maybe organize things temporally or by
their embedded-object relationships. I like FIFO data structures for
this - they are great at minimizing contention. Consider whether a
little duplication of smallish things (js) is really so bad if it saves
seek time.
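As a rough illustration of that temporal/FIFO layout (Python, and heavily simplified -- a real version would write fixed-size segments to disk rather than hold a buffer in memory):

```python
class AppendLog:
    """Toy append-only segment: entries are laid out in arrival (FIFO)
    order, so entries written together -- e.g. a page and its
    subresources -- end up physically adjacent and can be read back
    with one sequential pass instead of one seek per URI."""

    def __init__(self):
        self._buf = bytearray()   # stands in for one on-disk segment
        self._index = {}          # key -> (offset, length)

    def write(self, key: str, data: bytes):
        self._index[key] = (len(self._buf), len(data))
        self._buf += data         # always appended at the tail

    def read(self, key: str) -> bytes:
        off, n = self._index[key]
        return bytes(self._buf[off:off + n])
```

A single sequential read of such a segment could then service a whole page load.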

* Similarly, apply the linked-object concept to replacement policy.

* Bring concepts of cost and value into the replacement-policy decision
(both for mem and disk). By cost I mean that fetching a large .swf from
another continent has a different transfer time than fetching a 300-byte
icon from the local LAN - LRU is probably not the right way to compare
those two things bidding for the last disk block. By value I mean that a
hit on some objects is simply more valuable than a hit on others - html,
css, and js, for instance, are imo more important to the page-load
experience than images. (Which is not to say that you want to store js
for rarely used pages at the cost of evicting images of highly used
ones, but that, all other things being equal, they are more valuable.)

* Consider reducing the size of the data under management (and therefore
bounding the I/O and memory) by caching only the first mumble bytes of
image resources - enough to give layout hints. Maybe do this as an L2
disk cache after the full object has been evicted from the smaller,
full-object L1 cache.

* Invent self-organizing cache-sharing LAN protocols. A privacy
nightmare in a coffee shop, but a killer feature in a classroom - imo
this absolutely has the potential to radically step up performance in a
classroom setting.

* SPDY server push needs some way to create a cache scoped to a document
as temporary storage for the pushed data, while reusing the "is this a
hit" logic for subresources of the document - it would be great to have
an in-memory cache that allowed reuse for this kind of thing, as well as
promotion of used URIs to the general cache. I can explain more in person.

hth

On 1/24/2012 7:02 PM, Nick Hurley wrote:

> [...]


Re: Upcoming Cache Work

Taras Glek
On 1/24/2012 5:39 PM, Patrick McManus wrote:

> [...]
> * Reduce random access nature of I/O by replacing with block reads and
> writes that can minimize seek and utilize read ahead.. instead of 1 IO
> event per URI, maybe organize things temporally or by their embedded
> object relationships. I like fifo data structures for this - they are
> great at minimizing contention. Consider if a little duplication of
> smallish things (js) is really so bad if it saves seek times.

I agree with everything that has been suggested (except for the cache-
sharing idea; that can be done in an add-on).

We absolutely need to get rid of the hashtable-on-disk pattern of the
cache block files. It has insane overhead. As mentioned elsewhere, we
should change the format to have better data locality and read-ahead.

We should stop assuming that the disk cache is faster than the network
(bug 725423).

Taras

Re: Upcoming Cache Work

Josh Aas
A high-level summary of our plan from the work week:

First (now) we'll go after some high-priority issues that can be resolved without changing the on-disk format - main-thread locking issues and bug 725423, for example. After that, we'll make sure we have a good system in place for handling on-disk format version changes. Then we'll start making changes to the on-disk format itself.

Re: Upcoming Cache Work

Henri Sivonen
Are there any plans to include the referring origin in the cache key
to address the cache probing history leak that was demoed at
http://lcamtuf.coredump.cx/cachetime/ ?

--
Henri Sivonen
[hidden email]
http://hsivonen.iki.fi/

Re: Upcoming Cache Work

Brian Smith
Henri Sivonen wrote:
> To: [hidden email]
> Sent: Monday, February 20, 2012 3:48:44 AM
> Subject: Re: Upcoming Cache Work
>
> Are there any plans to include the referring origin in the cache key
> to address the cache probing history leak that was demoed at
> http://lcamtuf.coredump.cx/cachetime/ ?

There is no plan. I suggest you file a bug (if there isn't one already). It would be tricky to do, as we would have to avoid major performance regressions along the way.

Cheers,
Brian

Re: Upcoming Cache Work

Zack Weinberg
On 2012-02-28 4:06 PM, Brian Smith wrote:
> Henri Sivonen wrote:
>> Are there any plans to include the referring origin in the cache
>> key to address the cache probing history leak that was demoed at
>> http://lcamtuf.coredump.cx/cachetime/ ?
>
> There is no plan. I suggestion you file a bug (if there isn't one
> already).

Please cc: me on the bug.  I may be able to help and/or scare up some
student help.

> It would be tricky to do, as we would have to avoid major
> performance regressions by doing so.

To some extent it *has* to regress performance, because this is a timing
attack: when the attack site tries to iframe something, it has to
*appear* to not be in the cache, even if it is, and that means delaying
the load.

I have no idea whether this is feasible with our cache architecture, but
in principle it would be possible to record the original load time for
each cached item.  Then, when we see one of these cross-domain loads
that has to appear to be a miss, we could delay the load for that much
time (with some randomization) but not actually hit the network.  That
might mitigate secondary performance hits from having to waste a network
channel on stuff we already have but can't admit to having.  On the
other hand, an even sneakier attacker might be able to find a way to
detect *this* behavior... but that's the arms race for ya.
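The record-and-replay idea could look something like this (a Python sketch; `fetch_from_network` and the cache shape are stand-ins, not real Necko interfaces):

```python
import random
import time

def masked_fetch(url, cache, fetch_from_network):
    """Serve a cross-origin load that must *look* like a cache miss:
    replay roughly the original load time (with jitter), then hand back
    the cached body without touching the network."""
    entry = cache.get(url)
    if entry is None:
        start = time.monotonic()
        body = fetch_from_network(url)
        cache[url] = {"body": body, "load_s": time.monotonic() - start}
        return body
    # Pretend it's a miss: sleep about as long as the first real load
    # took, +/- 20% so repeated probes can't trivially average it away.
    time.sleep(entry["load_s"] * random.uniform(0.8, 1.2))
    return entry["body"]
```

(As noted, a sneakier attacker could still probe for the randomization itself, so this is mitigation rather than a fix.)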

zw

Re: Upcoming Cache Work

Patrick McManus
I'm going to suggest that dev-tech-network is the wrong level at which
to be making this decision. (Ironic, I know - I'm always pushing stuff
at d.t.n.)

But in this case the decision should be driven, imo, by the product
teams (performance regression!) and the privacy teams. They are in the
position to manage the tradeoff. If they conclude that this should be
done as long as the hit is within some cost boundary, then d.t.n seems
like the right place to work on making that happen, fill out a product
feature, etc. I would ask those teams to really consider, at least in
the initial pass, the question of "how much of a performance hit is this
worth to you, up front?" rather than punting it back and asking "how
much would this cost?"


On Tue, 2012-02-28 at 21:56 -0800, Zack Weinberg wrote:

> [...]



Re: Upcoming Cache Work

Jason Duell
I think delaying every single cached load the browser does to mitigate
this attack is a poor first line of defense. To be practical, the
approach requires a fair amount of browser-specific DOM-foolery®: see
the comments in the source at
http://lcamtuf.coredump.cx/cachetime/firefox.html. Perhaps we can go
after the hack that sidesteps timer-resolution clamping, or some of the
other browser-specific stuff. Most attacks are also likely to request,
then cancel without rendering, a lot of URIs from domains that are
otherwise not represented on the page: if we detect that's going on, we
could take countermeasures (delaying cache loads just for that page,
warning the user, etc.).

Also, this attack is a variation on an issue that's been known since at
least 2000, with the main difference that this version is
non-destructive (i.e. it doesn't change the cache, so it can be repeated
without messing up its own results). If we can change our code so that
changing src= mid-load calls some new form of Cancel that causes the
URI to be loaded and cached anyway, we can make this attack
cache-destructive once again, and be back to the status quo of the past
10 years. I'm not sure how urgent even that level of mitigation is,
though: it seems to me that most users in the world who have real
reason to fear the consequences of their browser history being exposed
are being tracked by other means (their ISP/government is spying on
them). Perhaps I'm under-reacting.

Jason


On 02/29/2012 05:35 AM, Patrick McManus wrote:

> [...]


Re: Upcoming Cache Work

Christopher Blizzard
On 2/29/2012 5:35 AM, Patrick McManus wrote:

> [...]

I don't really have an opinion here from a product standpoint, except to
say that implementing something that causes a pretty serious negative
performance change isn't very appealing.

--Chris

Re: Upcoming Cache Work

Brian Smith
Zack Weinberg wrote:
> To some extent it *has* to regress performance, because this is a
> timing attack: when the attack site tries to iframe something, it
> has to *appear* to not be in the cache, even if it is, and that
> means delaying the load.

First, we should separate out good vs. bad subresource loads, and common vs. uncommon.

I don't think anybody would object to slowing down "bad" loads. But if you were able to identify them as bad, then why not just block them to start with? Perhaps, then, we need to implement features that allow sites to say things like "only allow cross-origin requests for this resource from origins X, Y, Z." That would seem to solve this problem (albeit only for sites that are proactive about it) and others (e.g. it would prevent some DoS attacks too).

Cheers,
Brian