local storage vs cache

Taras Glek-3
Hi,
I've been looking at why people use DOM local storage
(http://hacks.mozilla.org/2012/03/there-is-no-simple-solution-for-local-storage/
has some good discussion). A lot of the use seems to come from the fact
that developers don't trust the network cache.

Perhaps we could provide some APIs to
* group items (this would also be a big win for disk locality)
* prioritize cache items
* query if an item is in cache, plus the ability to request items out of
  cache without a network request (even if headers are expired)
* place an item in cache (e.g. a generated image)
* evict an item from cache
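
One way to picture the API surface listed above is a small in-memory model. The sketch below is illustrative only: `SketchCache` and all of its method names are hypothetical, not an existing or proposed browser interface, and the `expired` flag merely stands in for stale HTTP headers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    body: bytes
    group: str = "default"  # hypothetical grouping key (could guide disk locality)
    priority: int = 0       # hypothetical hint: higher means "evict me later"
    expired: bool = False   # stands in for expired HTTP headers


@dataclass
class SketchCache:
    """In-memory stand-in for the network cache; every method is hypothetical."""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def put(self, url: str, body: bytes, group: str = "default", priority: int = 0) -> None:
        # "place an item in cache", e.g. a generated image
        self.entries[url] = Entry(body, group, priority)

    def has(self, url: str) -> bool:
        # "query if an item is in cache"
        return url in self.entries

    def get_cached_only(self, url: str) -> Optional[bytes]:
        # Return the body without any network request, even if it is expired.
        entry = self.entries.get(url)
        return entry.body if entry else None

    def evict(self, url: str) -> bool:
        # "evict an item from cache"; True if something was removed
        return self.entries.pop(url, None) is not None

    def group_members(self, group: str) -> List[str]:
        # "group items": list the URLs that share a group
        return sorted(u for u, e in self.entries.items() if e.group == group)
```

A real design would also have to decide how priorities interact with the browser's own eviction policy, which this sketch does not model.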

Here is a post to start thinking about this:
http://www.garfieldtech.com/blog/caching-tng

I know historically we were reluctant to expose detailed cache info to
the web, perhaps it's time to reconsider some aspects of that?

The nice thing about the disk cache is that it has expiry logic and is
supposed to efficiently cache large blobs (something that
IndexedDB/LS [databases in general] struggle with).

Btw, it turns out IndexedDB isn't a good successor to LS in its current
form because it's slow, requires a prompt, and doesn't have a good
data-cleanup story.

Taras

ps. This isn't a concrete feature request, just something to consider
while we look at how to evolve the network cache.
_______________________________________________
dev-tech-network mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-network

Re: local storage vs cache

Patrick McManus
On Mon, 2012-03-05 at 12:17 -0800, Taras Glek wrote:

> * query if an item is in cache

I was at a meeting of devops folks last week and I was asked for that
one a lot. An awful lot :)

Additionally: the ability to asynchronously fetch and place items in the
cache for future use if they aren't currently fresh. (It wasn't clear
to me exactly why manipulating the DOM for link prefetch didn't do that,
though I was assured it didn't.)

It seems to me that grouping and prioritizing are probably
implementation details, and that we would be better off not tying our
hands by exposing them. If we did a decent job with them it would be a
non-issue. Maybe I'm wrong.




Re: local storage vs cache

Jonas Sicking-2
On Mon, Mar 5, 2012 at 12:30 PM, Patrick McManus <[hidden email]> wrote:

> On Mon, 2012-03-05 at 12:17 -0800, Taras Glek wrote:
>
>> * query if an item is in cache
>
> I was at a meeting of devops folks last week and I was asked for that
> one a lot. An awful lot :)
>
> Additionally - ability to asynchronously fetch and place items in the
> cache for future use if they weren't currently fresh. (it wasn't clear
> to me exactly why manipulating the dom for link prefetch didn't do that
> - though I was assured it didn't.)
>
> It seems to me that grouping and prioritizing are probably
> implementation details, and that we would be better off not tying our
> hands by exposing them. If we did a decent job with them it would be a
> non-issue. Maybe I'm wrong.

We should definitely be able to come up with an API which provides
some cache control for pages. My first question is, what are the use
cases? I.e. what is it that people want to do?

Is it as simple as facebook knowing that you'll likely need resource
at url X in a few seconds, and so want to make us start fetching it
now?

Or do they need to pin a resource in the cache because the user
experience would be really bad if they had to hit the network (I.e.
the resource is an image that is part of an animation or a mouse-over
effect, or is used in a game).

I'd like to hear more about why people want to query if an item exists
in the cache. Would it be OK that, even if they get a "yes" answer, the
resource might have been evicted by the time they use it? Last time
I checked, people didn't quite realize this. We could of course
provide a way to hand back a cache token which would keep the item
pinned as long as the token was kept alive.
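
To make the query-then-evicted race and the token idea concrete, here is a minimal sketch. It assumes a toy in-memory cache; `PinnableCache`, `PinToken`, and `evict_unpinned` are hypothetical names for illustration, not an existing API.

```python
class PinToken:
    """Keeps one cache entry pinned until release() is called."""

    def __init__(self, cache, url):
        self._cache = cache
        self._url = url
        self._released = False

    def release(self):
        # Releasing twice is harmless; only the first call drops the pin.
        if not self._released:
            self._released = True
            self._cache._pins[self._url] -= 1


class PinnableCache:
    def __init__(self):
        self._entries = {}  # url -> body
        self._pins = {}     # url -> number of live tokens

    def put(self, url, body):
        self._entries[url] = body
        self._pins.setdefault(url, 0)

    def has(self, url):
        # A "yes" here is only a snapshot; it says nothing about later.
        return url in self._entries

    def pin(self, url):
        """Return a token if the entry is cached, else None."""
        if url not in self._entries:
            return None
        self._pins[url] += 1
        return PinToken(self, url)

    def evict_unpinned(self):
        """Simulate cache pressure: drop every entry that has no live token."""
        for url in [u for u, count in self._pins.items() if count == 0]:
            del self._entries[url]
            del self._pins[url]
```

With this model, a page that only calls `has()` can see its entry disappear before use, while a page that holds a token keeps the entry until it calls `release()`.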

The article Taras is linking to is actually asking for something very
simple: being able to update the expiration of an already-cached
resource. But looking at the comments, it doesn't seem like everyone
thinks it's a good idea. And further down in the comments he
acknowledges that he might not understand the HTTP cache as well as he
gives the impression of doing.


In general, one problem that I feel that we often run into is that
people simply don't understand the features that we already have for
cache control. In particular people seem to have a terrible time
setting their cache headers correctly. This is of course not meant to
blow off the problem. If the features are too complex to use then we
need better features.

/ Jonas

Re: local storage vs cache

Brian Smith-31
In reply to this post by Patrick McManus
Patrick McManus wrote:
> Additionally - ability to asynchronously fetch and place items in the
> cache for future use if they weren't currently fresh.

AppCache does this, more or less.

> (it wasn't clear to me exactly why manipulating the dom for link
> prefetch didn't do that - though I was assured it didn't.)

If it doesn't work, then my guess is that link prefetch is only processed during HTML parsing, so that prefetch links added by JS would be ignored.

Taras's blog post makes some good points. But, the examples he chose were not great as far as best practices are concerned. For example, if whitehouse.gov makes the browser revalidate 95 resources every time it is loaded, that is not HTTP's fault; that is the developer's fault. A new API isn't going to help that developer. People who build high-performance websites know how to make the browser avoid those requests. Many of these techniques have even been automated into things like mod_pagespeed.
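
As a rough illustration of the header-side fix, the sketch below decides whether a cached response can be reused without a revalidation request, based only on `Cache-Control`. It is a deliberate simplification: it ignores `Expires`, heuristic freshness, and request directives, and the function names are made up for this example.

```python
def parse_cache_control(header):
    """Split a Cache-Control header value into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        name = name.strip().lower()
        if name:
            directives[name] = value.strip().strip('"')
    return directives


def needs_revalidation(cache_control, age_seconds):
    """True if the cache has to contact the server before reusing the entry."""
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives or "no-store" in directives:
        return True
    max_age = directives.get("max-age", "")
    if not max_age.isdigit():
        # Simplification: without max-age this sketch always revalidates.
        return True
    # Fresh only while the entry's age is below its freshness lifetime.
    return age_seconds >= int(max_age)
```

Under these assumptions, 95 resources served with `no-cache` cost 95 conditional requests per page load, while the same resources served with a long `max-age` (and versioned URLs for updates) cost none until they expire.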

Even when you have to do revalidation, at least in theory revalidating 95 entries should be really fast if you have ((SPDY or (TLS compression and full HTTP pipelining)) AND a smart server). The fact that that describes about 0% of the web is the main problem, AFAICT. And most of the remaining work on correcting that has to happen server-side, not client-side.

If I were a web app developer, in the short term I would try putting as much in AppCache as possible, for browsers that don't prompt for AppCache. This should work well unless/until other people start doing so, if the browser de-prioritizes the eviction of AppCache-cached resources. (Once everybody does this, then such browsers will have to garbage collect your AppCache-cached resources just like non-AppCache resources, AFAICT.) Though long-term, AFAICT, AppCache doesn't really solve any problems unless it is used as the manifest for explicitly-installed web apps.

I wouldn't doubt that some IndexedDB implementations are slow. But, I think IndexedDB can be made fast if it is slow now. There's no reason that an IndexedDB implementation should be slower than a persistent HTTP cache. I would think it would be easier to make an IndexedDB implementation faster than a disk cache than to make a disk cache faster than IndexedDB. What operations are slow in Gecko's IndexedDB implementation?

One thing that I think is missing in IndexedDB is a way to indicate which entries can be safely garbage collected by the browser. Right now, the browser has to choose to throw away none of the data for a site, or all of it. This means it can't automatically allow a site to use up to 100MB (say) of space, and then selectively delete some of it as needed. I think this is where some kind of cache API like Taras suggested may be helpful.
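
A minimal sketch of that idea, assuming a per-site byte quota and a per-entry "evictable" flag. `QuotaStore` and `QuotaExceeded` are hypothetical names, and evicting oldest-first is just one possible policy; nothing here is part of IndexedDB.

```python
class QuotaExceeded(Exception):
    """Raised when the quota cannot be met by dropping evictable entries."""


class QuotaStore:
    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self._items = {}  # key -> (data, evictable), kept in insertion order

    def keys(self):
        return list(self._items)

    def put(self, key, data, evictable):
        """Store data, dropping the oldest evictable entries if over quota."""
        others = {k: v for k, v in self._items.items() if k != key}
        used = sum(len(d) for d, _ in others.values()) + len(data)
        victims = []
        for k, (d, is_evictable) in others.items():  # oldest first
            if used <= self.quota_bytes:
                break
            if is_evictable:
                victims.append(k)
                used -= len(d)
        if used > self.quota_bytes:
            # Nothing is modified when the write cannot fit.
            raise QuotaExceeded(key)
        for k in victims:
            del self._items[k]
        self._items.pop(key, None)  # re-insert so the key counts as newest
        self._items[key] = (data, evictable)
```

Entries written with `evictable=False` (an unsent draft, say) are never dropped by the store; only the site itself can remove them.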

FWIW, this weekend I reviewed the use of the nsICache* API by dozens of extensions on AMO and AFAICT, most if not all of them were prone to race conditions, at least in theory.

Cheers,
Brian

Re: local storage vs cache

Jonas Sicking-2
On Mon, Mar 5, 2012 at 7:47 PM, Brian Smith <[hidden email]> wrote:

> Patrick McManus wrote:
>> Additionally - ability to asynchronously fetch and place items in the
>> cache for future use if they weren't currently fresh.
>
> AppCache does this, more or less.
>
>> (it wasn't clear to me exactly why manipulating the dom for link
>> prefetch didn't do that - though I was assured it didn't.)
>
> If it doesn't work, then my guess is that link prefetch is only processed during HTML parsing, so that prefetch links added by JS would be ignored.
>
> Taras's blog post makes some good points. But, the examples he chose were not great as far as best practices are concerned. For example, if whitehouse.gov makes the browser revalidate 95 resources every time it is loaded, that is not HTTP's fault; that is the developer's fault. A new API isn't going to help that developer. People who build high-performance websites know how to make the browser avoid those requests. Many of these techniques have even been automated into things like mod_pagespeed.
>
> Even when you have to do revalidation, at least in theory revalidating 95 entries should be really fast if you have ((SPDY or (TLS compression and full HTTP pipelining)) AND a smart server). The fact that that describes about 0% of the web is the main problem, AFAICT. And most of the remaining work on correcting that has to happen server-side, not client-side.
>
> If I were a web app developer, in the short term I would try putting as much in AppCache as possible, for browsers that don't prompt for AppCache. This should work well unless/until other people start doing so, if the browser de-prioritizes the eviction of AppCache-cached resources. (Once everybody does this, then such browsers will have to garbage collect your AppCache-cached resources just like non-AppCache resources, AFAICT.) Though long-term, AFAICT, AppCache doesn't really solve any problems unless it is used as the manifest for explicitly-installed web apps.
>
> I wouldn't doubt that some IndexedDB implementations are slow. But, I think IndexedDB can be made fast if it is slow now. There's no reason that an IndexedDB implementation should be slower than a persistent HTTP cache. I would think it would be easier to make an IndexedDB implementation faster than a disk cache than to make a disk cache faster than IndexedDB. What operations are slow in Gecko's IndexedDB implementation?
>
> One thing that I think is missing in IndexedDB is a way to indicate which entries can be safely garbage collected by the browser. Right now, the browser has to choose to throw away none of the data for a site, or all of it. This means it can't automatically allow a site to use up to 100MB (say) of space, and then selectively delete some of it as needed. I think this is where some kind of cache API like Taras suggested may be helpful.
>
> FWIW, this weekend I reviewed the use of the nsICache* API by dozens of extensions on AMO and AFAICT, most if not all of them were prone to race conditions, at least in theory.

Hmm.. I actually disagree with you here.

AppCache is a feature to cache a whole "web app" locally. It isn't
designed to be a "normal HTTP cache" enhancement. I.e. if a user is
browsing around on your site and you want to either ensure that
there's no network request for stylesheets/scripts/images which appear
on most pages, or prefetch the next page that you think the user is
going to browse to, then trying to get the AppCache to do this for you
would be just a hack. I don't think that we should try to bend the
AppCache into supporting that use case.

As for using IndexedDB, I think that's too complex a solution
for pages. It means that you can't simply stick <img
src="somepic.jpg"> in your markup. Instead you have to use the DOM and
a pile of JS in order to dynamically create an image element which
loads data from IndexedDB. It'd work, but it means dramatically
changing how people write web pages, IMHO for the worse.

/ Jonas

Re: local storage vs cache

Brian Smith-31
Jonas Sicking wrote:
> On Mon, Mar 5, 2012 at 7:47 PM, Brian Smith <[hidden email]>
> > If I were a web app developer, in the short term I would try
> > putting as much in AppCache as possible, for browsers that don't
> > prompt for AppCache.

> Hmm.. I actually disagree with you here.
>
> I don't think that we should try to bend the AppCache into
> supporting that usecase.

We don't disagree here. My point is that if/when we remove the prompts for AppCache, we should expect that web developers will use AppCache in this way. In particular, if you're a web developer, and if putting resources in AppCache means that your resources will get evicted with a lower priority than some other website's normal resources, then why wouldn't you (ab)use it? Just because that's not what it's for?

> As for using IndexedDB, I think that's too much of a complex solution
> for pages. It means that you can't simply stick <img
> src="somepic.jpg"> in your markup. Instead you have to use the DOM
> and a pile of JS in order to dynamically create an image element
> which loads data from IndexedDB. It'd work, but it means
> dramatically changing how people write web pages, IMHO for the
> worse.

It is hard to talk about this without some specific use cases. I think a lot of use cases are already handled in a pretty good way using XHR for preloading things into the cache, if you're willing to make a separate (hopefully pipelined/multiplexed) HTTP request for each one. Getting that to work efficiently is more of a networking connection management problem than a disk cache or database problem, AFAICT.

Perhaps we should look at how we'd reimplement Thunderbird as a modern HTML5 application (ignoring the lack of support for non-HTTP/non-websocket networking) and see how we'd efficiently cache resources with it. (We will need to implement such an email client for B2G anyway.)

- Brian

Re: local storage vs cache

Jonas Sicking-2
On Mon, Mar 5, 2012 at 9:10 PM, Brian Smith <[hidden email]> wrote:

> Jonas Sicking wrote:
>> On Mon, Mar 5, 2012 at 7:47 PM, Brian Smith <[hidden email]>
>> > If I were a web app developer, in the short term I would try
>> > putting as much in AppCache as possible, for browsers that don't
>> > prompt for AppCache.
>
>> Hmm.. I actually disagree with you here.
>>
>> I don't think that we should try to bend the AppCache into
>> supporting that usecase.
>
> We don't disagree here. My point is that if/when we remove the prompts for AppCache, we should expect that web developers will use AppCache in this way. In particular, if you're a web developer, and if putting resources in AppCache means that your resources will get evicted with a lower priority than some other website's normal resources, then why wouldn't you (ab)use it? Just because that's not what it's for?

Yes! I definitely agree with this. It's an especially bad problem
because things will look great right now, when only a few sites use
the AppCache, and so authors will push for adding all sorts of features
to AppCache. But a year from now, when everyone does, we'll be back
here with the same problem set, not having actually solved anything
but having wasted effort adding features that make the web platform
more complex.

Maybe one solution would be to not AppCache a website unless we see
that the user goes there a lot. So if you visit a site for the first
time, or the first time in a few weeks, then we simply would ignore
the manifest attribute. But if we see that a user has visited the site
5 times the past week, we could stick it in the AppCache.
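
The visit-frequency heuristic described above could look roughly like this. The function name, the source of the visit timestamps, and the default thresholds (5 visits in 7 days, taken from the example in the text) are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_honor_manifest(visits, now, min_visits=5, window=timedelta(days=7)):
    """Decide whether to honor a site's manifest attribute.

    visits: datetimes of the user's past visits to the site (hypothetical
    input; a browser would draw this from its history).
    """
    recent = [v for v in visits if now - window <= v <= now]
    return len(recent) >= min_visits
```

A first-time visit, or a return after a few weeks away, falls below the threshold, so the manifest is ignored until the site has been visited often enough.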

>> As for using IndexedDB, I think that's too much of a complex solution
>> for pages. It means that you can't simply stick <img
>> src="somepic.jpg"> in your markup. Instead you have to use the DOM
>> and a pile of JS in order to dynamically create an image element
>> which loads data from IndexedDB. It'd work, but it means
>> dramatically changing how people write web pages, IMHO for the
>> worse.
>
> It is hard to talk about this without some specific use cases. I think a lot of use cases are already handled in a pretty good way using XHR for preloading things into the cache, if you're willing to make a separate (hopefully pipelined/multiplexed) HTTP request for each one. Getting that to work efficiently is more of a networking connection management problem than a disk cache or database problem, AFAICT.
>
> Perhaps we should look at how we'd reimplement Thunderbird as a modern HTML5 application (ignoring the lack of support for non-HTTP/non-websocket networking) and see how we'd efficiently cache resources with it. (We will need to implement such an email client for B2G anyway.)

I think the Thunderbird scenario is a somewhat simpler one. The
solution there is basically AppCache. The part that feels more complex
is sites like facebook or cnn.com, where the user will be visiting a
large number of URLs, each with a large number of resources.

/ Jonas

Re: local storage vs cache

Brian Smith-31
Jonas Sicking wrote:
> I think the thunderbird scenario is a somewhat simpler one. The
> solution there is basically AppCache.

What about the caching of all the emails? And the handling of unsent drafts in offline mode? You definitely don't want to automatically garbage collect an unsent email.

- Brian

Re: local storage vs cache

Jonas Sicking-2
On Mon, Mar 5, 2012 at 9:49 PM, Brian Smith <[hidden email]> wrote:
> Jonas Sicking wrote:
>> I think the thunderbird scenario is a somewhat simpler one. The
>> solution there is basically AppCache.
>
> What about the caching of all the emails? And the handling of unsent drafts in offline mode? You definitely don't want to automatically garbage collect an unsent email.

For "data" like that I would normally say IndexedDB is the solution.
You're not going to fetch those using <img src=...> anyway so there's
no point in having them represented as a URL.

Though possibly things are different for a webmail app like gmail.

/ Jonas

Re: local storage vs cache

Patrick McManus
In reply to this post by Jonas Sicking-2
On Mon, 2012-03-05 at 19:01 -0800, Jonas Sicking wrote:

> We should definitely be able to come up with an API which provides
> some cache control for pages. My first question is, what are the use
> cases? I.e. what is it that people want to do?
>

I heard that cache detection might be used to choose between low- and
high-bandwidth page elements, especially if the high-res ones could be
asynchronously brought into the cache for future use.
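
A toy sketch of that use case, assuming the page can ask "is this URL cached?" and can queue a background fetch. Both capabilities are hypothetical here and are modeled with a plain set and list.

```python
def choose_image(cached_urls, low_url, high_url, prefetch_queue):
    """Pick the image variant to display right now.

    cached_urls: set of URLs currently in the cache (stand-in for a cache query).
    prefetch_queue: list of URLs to fetch asynchronously for future use.
    """
    if high_url in cached_urls:
        return high_url
    # Not cached: show the cheap variant now and warm the cache for next time.
    if high_url not in prefetch_queue:
        prefetch_queue.append(high_url)
    return low_url
```

Because the query result is only a snapshot, the high-res entry could still be evicted between the check and the load; in that case the page simply pays for a network fetch.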

> Is it as simple as facebook knowing that you'll likely need resource
> at url X in a few seconds, and so want to make us start fetching it
> now?
>

That is an additional part of it, yes, but so is the straight-up query.

> Or do they need to pin a resource in the cache because the user
> experience would be really bad if they had to hit the network (I.e.
> the resource is an image that is part of an animation or a mouse-over
> effect, or is used in a game).

I asked this. The answer was that pinning and {cache query, async load}
are separate. They seemed to feel that localStorage (etc.) was
acceptable for pinning. They wanted the HTTP cache to do the rest for
them, and were willing to abuse localStorage only if they couldn't get
the results they wanted any other way.

>
> I'd like to hear more about why people want to query if an item exists
> in the cache. Would it be OK that, even if they get a "yes" answer, the
> resource might have been evicted by the time they use it? Last time
> I checked, people didn't quite realize this. We could of course
> provide a way to hand back a cache token which would keep the item
> pinned as long as the token was kept alive.

I double-checked this, and at least the 2 people that responded to me
understood there was a time-of-check/time-of-use problem and they were
OK with that.

>
> The article Taras is linking to is actually asking for something very
> simple: being able to update the expiration of an already-cached
> resource. But looking at the comments, it doesn't seem like everyone
> thinks it's a good idea. And further down in the comments he
> acknowledges that he might not understand the HTTP cache as well as he
> gives the impression of doing.
>

Cache invalidation is pretty interesting, especially in a "logout" kind
of scenario.

>
> In general, one problem that I feel that we often run into is that
> people simply don't understand the features that we already have for
> cache control. In particular people seem to have a terrible time
> setting their cache headers correctly. This is of course not meant to
> blow off the problem. If the features are too complex to use then we
> need better features.
>

Well, the meeting I was talking about was the velocity group you hooked
me up with. The attendees there were certainly aware of how caching is
specified to work (which I agree does not make them typical) and they
still had the distinct impression that hit rates were much poorer than
they would expect, though they had no hard data on that front. I'll
send a separate summary post to d.t.n later on with the generalized
feedback from that group.




Re: local storage vs cache

Randell Jesup-4
In reply to this post by Taras Glek-3
>Hi,
>I've been looking at why people use dom local
>storage(http://hacks.mozilla.org/2012/03/there-is-no-simple-solution-for-local-storage/
>has some good discussion). A lot of use seems to come from the fact that
>developers don't trust the network cache.

In addition to the discussion that occurred, I should note that the
cache is invalidated on a crash/unclean shutdown/power loss/etc. See
bug 105843 (yes, an almost 11-year-old bug).

--
Randell Jesup, Mozilla Corp
remove ".news" for personal email