SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Randell Jesup-3
[ This seemed long for in-bug discussion...  (And it's a pain to edit
there.)  We can move it back there if people wish. ]

As identified in bug 669034 (and bug 669603), we have a serious problem
with bloat in sessionstore.js caused largely by websites storing large
amounts of data in the dom sessionstorage.  See
http://dev.w3.org/html5/webstorage/#dom-sessionstorage for more info.
(My apologies for the naming confusion between "sessionstore.js" and
"sessionstorage".  Not my doing. :-)

For example, in a fresh profile browsing to google.com and typing two
words causes a 366KB sessionstore.js file.  Having Google anywhere in
the history of a tab typically uses 200-700KB of data.  sessionstore.js
files of 5, 10, 20 and even 50MB are not unheard of.

This is made worse by the fact that sessionstore.js is a single, huge
JSON object.  That means that on every save (and saves occur a few
seconds after you navigate, scroll, or type), you get a burst of
activity to gather all this info, serialize it to JSON (taking a LOT of
temporary memory, and with larger ones causing GCs), and write it all to
disk (which is at least handled as a background task).  The memory-use
spikes are clearly visible (temporary usage of 3x+ the file size).
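To make the cost of the single-object format concrete, here is a rough
Python sketch. The session shape is hypothetical and far simpler than the
real sessionstore.js schema; it only shows that with one big JSON object,
a change to one tab re-serializes every tab's sessionstorage blob, whereas
a per-tab layout would serialize just the tab that changed.

```python
import json


def make_session(num_tabs, storage_bytes):
    # Hypothetical, simplified session: one window, each tab holding a single
    # history entry plus a sessionstorage blob of storage_bytes characters.
    return {
        "windows": [{
            "tabs": [
                {
                    "entries": [{"url": f"https://example.com/{i}"}],
                    "storage": {"https://example.com": {"key": "x" * storage_bytes}},
                }
                for i in range(num_tabs)
            ]
        }]
    }


def monolithic_save_bytes(session):
    # Single JSON object: any change means serializing every tab again.
    return len(json.dumps(session).encode("utf-8"))


def per_tab_save_bytes(session, dirty_tab):
    # Per-tab layout: only the tab that changed is serialized.
    tab = session["windows"][0]["tabs"][dirty_tab]
    return len(json.dumps(tab).encode("utf-8"))


session = make_session(num_tabs=20, storage_bytes=300_000)
print(monolithic_save_bytes(session))            # a bit over 6,000,000
print(per_tab_save_bytes(session, dirty_tab=3))  # a bit over 300,000
```

With 20 tabs each holding ~300KB of storage, scrolling one tab costs a
6MB+ serialization in the single-object format versus ~300KB per tab.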

This also causes BAD delays in UI performance - starting with stutters,
and working up to 5-15 second freezes(!)

In larger profiles/sessionstores, this leads to NS_ERROR_OUT_OF_MEMORY
when trying to save the session after the browser has been running a
little while, perhaps due to VM fragmentation or allocations that are
just too large (I haven't tracked down the source yet, as I don't have a
debugger set up on the machine where I run into it most).  At that point
it stops saving data, but keeps trying, and so keeps causing freezes.

We should also consider what the target for sessionstore in Electrolysis
is, and if possible ease that work by hitting the right solution (or one
on the path to what E10s needs) now.

Options:
a) Convert sessionstore to SQLite, totally.

    Updates can be relatively small, though a single tab might have
    several sessionstorage-heavy items in it and so be moderately large
    (a few MB).

    Variation: unchanged history items may be stored independently and
    so minor scrolling/navigation changes might be very small.  Note
    that we'd want to reference history pages by absolute depth, not
    relative.

    Downside: downgrade/revert will be bad (though support could be
    rolled in over 2 releases to ease the problem).  Could cause
    problems for Aurora/Beta users.

b) Separate sessionstorage data from the rest of the session restore
    data.  The sessionstorage data is much larger and doesn't change as
    often; store it in a separate file (JSON or SQLite) only when it
    changes (keep a dirty bit).

    Danger of synchronization issues between sessionstore.js and the new
    file.

    If it's a JSON file, it will be one big file and will need to be
    rewritten pretty often, negating the gains.

c) Don't store sessionstorage data if it's larger than X.  I think there
    are lots of problems with this: it would cause no end of user
    confusion/frustration, annoy web authors, and hurt our 'brand'.

d) Shift overhead from the write side to the startup read

    Use a pseudo-journal, as was mentioned.  Write only dirty tabs, and
    write them with a version number that increments.  On read, load
    only the most recent version of each tab.  This will require
    occasional rewrites in order to discard older updates, or a more
    sophisticated, memory-allocator-like file structure that overwrites
    older updates with ones that fit in the (unused) space.

    Plus: fast, low/very-low overhead for the most common operation
          (write after typing or scrolling).
    Minus: extra disk usage, more complex housekeeping, slightly slower
           startup.

    Added possibility: write updates as deltas.  Saves space/time and
    reduces housekeeping; probably slows reading slightly (though smaller
    files help); makes writing more complex.

    This option amounts to a poor-man's filesystem (or if you prefer, a
    poor-man's DB).

e) Push the processing of the per-tab data over to a background
    thread/process.

    May play into work with Electrolysis.  Not necessarily at odds with
    other options above.  Fetching data from each tab would be the
    primary UI-blocker, and it would only block long enough to pass the
    data for one tab to the sessionstore background process.  At the
    cost of memory, could keep data for tabs around and fetch data only
    from those that indicate dirty state.
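A minimal Python sketch of option (e), with a thread and a queue standing
in for the background thread/process. The class name and the write_fn
sink are hypothetical. The UI side only enqueues one tab's data; JSON
serialization and IO happen on the worker. The caller has to hand over a
snapshot it won't mutate afterwards, which corresponds to the "block long
enough to pass the data for one tab" cost described above.

```python
import json
import queue
import threading


class BackgroundSessionWriter:
    """UI thread enqueues per-tab data; a worker serializes and writes it."""

    def __init__(self, write_fn):
        self._queue = queue.Queue()
        self._write_fn = write_fn  # hypothetical sink, e.g. a per-tab file writer
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tab_changed(self, tab_id, tab_snapshot):
        # Called on the UI thread: just a queue put, no serialization or IO.
        self._queue.put((tab_id, tab_snapshot))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                return
            tab_id, tab_snapshot = item
            self._write_fn(tab_id, json.dumps(tab_snapshot))

    def shutdown(self):
        # Items are processed in order, so everything queued before the
        # sentinel is written before the worker exits.
        self._queue.put(None)
        self._worker.join()


written = {}
writer = BackgroundSessionWriter(lambda tab_id, text: written.__setitem__(tab_id, text))
writer.tab_changed(1, {"url": "https://example.com/"})
writer.shutdown()
print(written)  # {1: '{"url": "https://example.com/"}'}
```

Keeping cached per-tab data and only enqueueing tabs that report a dirty
state, as suggested above, would sit in front of tab_changed().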

Others?  Any ideas out there?

Several of these options (e, f, etc) could be implemented using a
memory-mapped file (or rather a pair of files that it flips between for
crash-safety).  This would allow easy caching/reordering/modification of
the session data, though those sorts of operations may cause significant
disk IO.
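One way the crash-safe pair of files could work, sketched in Python under
stated assumptions: plain file writes stand in for the memory mapping,
and the generation counter plus CRC32 header are illustrative choices,
not an existing format. Each save goes to the file holding the older
copy, so a crash mid-write can only damage that copy, and the reader
picks the highest generation whose checksum verifies.

```python
import os
import struct
import zlib

# Hypothetical header: generation number (u64) and CRC32 of the payload (u32).
HEADER = struct.Struct("<QI")


def _slot_paths(base):
    return (base + ".a", base + ".b")


def _read_slot(path):
    """Return (generation, payload) if the file is complete and valid, else None."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return None
    if len(raw) < HEADER.size:
        return None
    generation, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if zlib.crc32(payload) != crc:
        return None  # torn or partial write: ignore this copy
    return generation, payload


def load_latest(base):
    """Pick the valid copy with the highest generation; (0, None) if none."""
    valid = [s for s in map(_read_slot, _slot_paths(base)) if s is not None]
    return max(valid, key=lambda s: s[0]) if valid else (0, None)


def save_generation(base, payload):
    """Write to the slot not holding the newest valid copy, leaving it intact."""
    generation, _ = load_latest(base)
    new_generation = generation + 1
    target = _slot_paths(base)[new_generation % 2]
    with open(target, "wb") as f:
        f.write(HEADER.pack(new_generation, zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())
    return new_generation
```

A real implementation would cache the current generation instead of
re-reading both files on every save; it's re-read here to keep the
sketch short.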

P.S.  It's nice to be back (I was heavily active, and later a driver, in
the 0.9-1.2ish days), and nice to sink my teeth into a few of these
meaty problems.

--
Randell Jesup, Mozilla Corporation
Remove ".news" for personal email
_______________________________________________
dev-performance mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-performance

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Benjamin Smedberg
On 7/19/2011 10:51 AM, Randell Jesup wrote:

> [ This seemed long for in-bug discussion...  (And it's a pain to edit
> there.)  We can move it back there if people wish. ]
>
> As identified in bug 669034 (and bug 669603), we have a serious
> problem with bloat in sessionstore.js caused largely by websites
> storing large amounts of data in the dom sessionstorage.  See
> http://dev.w3.org/html5/webstorage/#dom-sessionstorage for more info.
> (My apologies for the naming confusion between "sessionstore.js" and
> "sessionstorage".  Not my doing. :-)
>
> For example, in a fresh profile browsing to google.com and typing two
> words causes a 366KB sessionstore.js file.  Having Google anywhere in
> the history of a tab typically uses 200-700KB of data.  
> sessionstore.js files of 5, 10, 20 and even 50MB are not unheard of.
The reason we are storing "session" data in a file is because we want
session restore to work, right? Otherwise we wouldn't need to save this
data...
> e) Push the processing of the per-tab data over to a background
>    thread/process.
>
>    May play into work with Electrolysis.  Not necessarily at odds with
>    other options above.  Fetching data from each tab would be the
>    primary UI-blocker, and it would only block long enough to pass the
>    data for one tab to the sessionstore background process.  At the
>    cost of memory, could keep data for tabs around and fetch data only
>    from those that indicate dirty state.
There is no reason why session restore should ever have to block the UI
in an Electrolysis world. What should happen is that when a page
navigates (or session data changes), the content process sends the
change over via a message. Because we are targeting multiple content
processes, this data will be received at various times by chrome.

The policy for content processes (enforced by the message-passing
system) is that the chrome process must not (cannot) block on content.
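A small Python sketch of that push model, with a queue.SimpleQueue
standing in for the IPC channel (the class name and message shape are
hypothetical). Content processes put (tab_id, update) messages; the
chrome side applies whatever has arrived using non-blocking reads and
never asks a content process for its state.

```python
import queue


class ChromeSessionCache:
    """Chrome-side cache of per-tab session state, fed by pushed messages."""

    def __init__(self):
        self.inbox = queue.SimpleQueue()  # stands in for the IPC channel
        self.tabs = {}

    def drain(self):
        # Apply every message received so far; get_nowait() never blocks.
        while True:
            try:
                tab_id, update = self.inbox.get_nowait()
            except queue.Empty:
                return
            self.tabs.setdefault(tab_id, {}).update(update)

    def snapshot(self):
        # What a save would serialize: built from cached data only.
        self.drain()
        return {tab_id: dict(state) for tab_id, state in self.tabs.items()}


cache = ChromeSessionCache()
# Content processes would send these over IPC; here we enqueue directly.
cache.inbox.put((1, {"url": "https://example.com/"}))
cache.inbox.put((1, {"scroll": "0,120"}))
print(cache.snapshot())  # {1: {'url': 'https://example.com/', 'scroll': '0,120'}}
```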

Offhand, sqlite sounds like a good choice, because most of the other
options seem to basically be emulating a database/filesystem, and
we've already got one available with built-in async IO and the other
features you really want for this. I don't think that downgrade issues
are particularly relevant, because we don't normally treat saved
sessions as "precious" data.
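For a sense of what option (a) might look like, here is a rough sketch
using Python's sqlite3 module. The two-table schema is hypothetical; it
follows the variation from the original post of keying history entries
by absolute depth, so a scroll or a single navigation only rewrites the
tab row and the entries that changed. The calls are synchronous for
brevity and don't show the async IO mentioned above.

```python
import json
import sqlite3

# Hypothetical schema: per-tab state, plus history entries keyed by absolute depth.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tabs (
    tab_id INTEGER PRIMARY KEY,
    state  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS history (
    tab_id INTEGER NOT NULL,
    depth  INTEGER NOT NULL,
    entry  TEXT NOT NULL,
    PRIMARY KEY (tab_id, depth)
);
"""


def open_store(path):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_tab(conn, tab_id, state, changed_entries, history_length):
    """Rewrite the tab row and only the history entries that changed.

    changed_entries maps absolute depth -> entry dict. Entries at depth >=
    history_length are dropped (e.g. after going back and navigating anew).
    """
    with conn:  # one transaction per tab update
        conn.execute("INSERT OR REPLACE INTO tabs (tab_id, state) VALUES (?, ?)",
                     (tab_id, json.dumps(state)))
        conn.execute("DELETE FROM history WHERE tab_id = ? AND depth >= ?",
                     (tab_id, history_length))
        conn.executemany(
            "INSERT OR REPLACE INTO history (tab_id, depth, entry) VALUES (?, ?, ?)",
            [(tab_id, depth, json.dumps(e)) for depth, e in changed_entries.items()])


def close_tab(conn, tab_id):
    with conn:
        conn.execute("DELETE FROM history WHERE tab_id = ?", (tab_id,))
        conn.execute("DELETE FROM tabs WHERE tab_id = ?", (tab_id,))


def load_tab(conn, tab_id):
    row = conn.execute("SELECT state FROM tabs WHERE tab_id = ?", (tab_id,)).fetchone()
    if row is None:
        return None
    state = json.loads(row[0])
    state["entries"] = [json.loads(e) for (e,) in conn.execute(
        "SELECT entry FROM history WHERE tab_id = ? ORDER BY depth", (tab_id,))]
    return state
```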

It sounds like you ought to own bug 516755?

--BDS

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Robert Kaiser
In reply to this post by Randell Jesup-3
Randell Jesup wrote:
> e) Push the processing of the per-tab data over to a background
> thread/process.

That sounds like a good idea to me in any case. We should try to get as
much stuff away from the main thread as possible to ensure good
responsiveness of UI.

> P.S. It's nice to be back

\o/ welcome back \o/

Robert Kaiser


--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Boris Zbarsky
In reply to this post by Randell Jesup-3
On 7/19/11 11:11 AM, Benjamin Smedberg wrote:
> because we don't normally treat saved sessions as "precious" data.

This is arguably a mistake....

I know I treat _my_ saved sessions as precious data.

-Boris

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Mike Shaver
In reply to this post by Randell Jesup-3
> e) Push the processing of the per-tab data over to a background
>   thread/process.
>
>   May play into work with Electrolysis.  Not necessarily at odds with
>   other options above.  Fetching data from each tab would be the
>   primary UI-blocker,

IMO, that's not really OK long term (e10s-term) -- the tabs will need
to report their state back to chrome, so that the blocking goes the
other way.

> Others?  Any ideas out there?

File-per-tab containing all the data for that tab, maybe?  Or two
files per tab, to avoid re-writing the big sessionstorage data every
time, if it doesn't change that often.  This would let us write out
the current tab more frequently than background tabs, as well.
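A rough Python sketch of the two-files-per-tab idea. The file names, the
60-second background interval, and the flush policy are assumptions for
illustration: the large sessionstorage file is rewritten only when the
caller reports a change, the selected tab's small state file is written
on every flush, and background tabs stay pending until their interval
has passed.

```python
import json
import os


class PerTabFiles:
    """Two files per tab: a small state file and a larger sessionstorage file."""

    def __init__(self, directory, background_interval=60.0):
        self.dir = directory
        self.background_interval = background_interval  # seconds; hypothetical policy
        self.pending = {}     # tab_id -> latest state not yet written
        self.last_write = {}  # tab_id -> time of the last state-file write

    def _write(self, name, obj):
        # Write to a temp file and rename, so readers never see a half-written file.
        path = os.path.join(self.dir, name)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(obj, f)
        os.replace(tmp, path)

    def update(self, tab_id, state, storage=None):
        self.pending[tab_id] = state
        if storage is not None:
            # The big sessionstorage data is only rewritten when it changed.
            self._write(f"tab-{tab_id}.storage.json", storage)

    def flush(self, selected_tab, now):
        """Write pending state: always for the selected tab, rate-limited otherwise."""
        written = []
        for tab_id, state in list(self.pending.items()):
            last = self.last_write.get(tab_id)
            due = (tab_id == selected_tab or last is None
                   or now - last >= self.background_interval)
            if due:
                self._write(f"tab-{tab_id}.json", state)
                self.last_write[tab_id] = now
                del self.pending[tab_id]
                written.append(tab_id)
        return written
```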

Mike

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Mike Shaver
In reply to this post by Boris Zbarsky
On Tue, Jul 19, 2011 at 11:41 AM, Boris Zbarsky <[hidden email]> wrote:
> On 7/19/11 11:11 AM, Benjamin Smedberg wrote:
>>
>> because we don't normally treat saved sessions as "precious" data.
>
> This is arguably a mistake....
>
> I know I treat _my_ saved sessions as precious data.

Across upgrades, though?

Mike

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Boris Zbarsky
In reply to this post by Boris Zbarsky
On 7/19/11 11:47 AM, Mike Shaver wrote:

> On Tue, Jul 19, 2011 at 11:41 AM, Boris Zbarsky<[hidden email]>  wrote:
>> On 7/19/11 11:11 AM, Benjamin Smedberg wrote:
>>>
>>> because we don't normally treat saved sessions as "precious" data.
>>
>> This is arguably a mistake....
>>
>> I know I treat _my_ saved sessions as precious data.
>
> Across upgrades, though?

Across an upgrade, yes.  Across downgrades, no, but that's because I
never downgrade.  ;)

-Boris

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Randell Jesup-3
In reply to this post by Randell Jesup-3
I cross-posted this at first since .performance seemed pretty quiet
recently, but I think it makes sense for future comments to all be there
if people don't mind.

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Randell Jesup-3
In reply to this post by Randell Jesup-3
On 7/19/2011 11:46 AM, Mike Shaver wrote:

>> e) Push the processing of the per-tab data over to a background
>>    thread/process.
>>
>>    May play into work with Electrolysis.  Not necessarily at odds with
>>    other options above.  Fetching data from each tab would be the
>>    primary UI-blocker,
>
> IMO, that's not really OK long term (e10s-term) -- the tabs will need
> to report their state back to chrome, so that the blocking goes the
> other way.

Right - I didn't know the E10s architecture yet (on my list...) ;-)

>> Others?  Any ideas out there?
>
> File-per-tab containing all the data for that tab, maybe?  Or two
> files per tab, to avoid re-writing the big sessionstorage data every
> time, if it doesn't change that often.  This would let us write out
> the current tab more frequently than background tabs, as well.

[ Hi Mike. Long time no chat  :-) ]

That can be done with many of these ideas, especially the DB-based ones.
And file-per-tab is pretty much the same as a DB; it's just
filesystem-as-dumb-DB.  :-)  Not that it's necessarily bad.

Kyle (in .platform - that will teach me to cross-post) and Taras
suggested LevelDB over SQLite (we don't really need much in the way of
complexity here, mostly just safety against crashes and large blob
storage - which may speak to files, since filesystems are good at that).

An intern has expressed interest in trying some of the options proposed.
I dropped some dumb test code in the bug to help.

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Joshua Cranmer-2
On 7/19/2011 2:29 PM, Randell Jesup wrote:
> Kyle (in .platform - that will teach me to cross-post) and Taras
> suggested LevelDB over SQLite (we don't really need much in the way of
> complexity here, mostly just safety against crashes and large blob
> storage - which may speak to files, since filesystems are good at that).
LevelDB is designed to sensibly handle values ranging from small sizes
up to multiple GB per value, or so I am told.
_______________________________________________
dev-performance mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-performance
Reply | Threaded
Open this post in threaded view
|

Re: SessionStore size - Bloat in sessionstore.js from Google and others causing major UI freezes

Taras Glek-3
In reply to this post by Randell Jesup-3
On 07/19/2011 02:29 PM, Randell Jesup wrote:

> On 7/19/2011 11:46 AM, Mike Shaver wrote:
>>> e) Push the processing of the per-tab data over to a background
>>> thread/process.
>>>
>>> May play into work with Electrolysis. Not necessarily at odds with
>>> other options above. Fetching data from each tab would be the
>>> primary UI-blocker,
>>
>> IMO, that's not really OK long term (e10s-term) -- the tabs will need
>> to report their state back to chrome, so that the blocking goes the
>> other way.
>
> Right - I didn't know the E10s architecture yet (on my list...) ;-)
>
>>> Others? Any ideas out there?
>>
>> File-per-tab containing all the data for that tab, maybe? Or two
>> files per tab, to avoid re-writing the big sessionstorage data every
>> time, if it doesn't change that often. This would let us write out
>> the current tab more frequently than background tabs, as well.
>
> [ Hi Mike. Long time no chat :-) ]
>
> That can be done with many of these ideas, especially the DB-based ones.
> And file-per-tab is pretty much the same as a DB, it's just
> filesystem-as-dumb-DB. :-) Not that it's necessarily bad.

So just to clarify: conceptually it may be the same as a DB. In practice,
keeping independent items in one file is a very bad idea. I.e., if you
have 7 tabs saved and then you close tab 3, you end up in an ugly
situation. A database will try to pretend it's OK at the expense of
serious fragmentation, requiring a vacuum, etc.

I think a sessionrestore directory with a JSON (or something else) file
per tab is the only sane solution here. Let the filesystem do what it's
good at.

The other benefit of file-per-tab is that they can be fsynced
independently without blocking io on other tab stores.
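A minimal Python sketch of the directory-per-session layout (the file
naming is hypothetical). Each tab's file is replaced atomically via temp
file, fsync, and rename, without rewriting any other tab's file; closing
a tab just deletes its file, so there is nothing to fragment or vacuum.
The directory fsync is POSIX-only and is skipped where os.O_DIRECTORY
doesn't exist.

```python
import json
import os


def write_tab(directory, tab_id, tab_state):
    """Durably replace one tab's file without touching any other tab's file."""
    path = os.path.join(directory, f"{tab_id}.json")
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(tab_state, f)
        f.flush()
        os.fsync(f.fileno())  # per-file sync; other tabs' files aren't rewritten
    os.replace(tmp, path)  # atomic rename over the previous version
    if hasattr(os, "O_DIRECTORY"):
        # On POSIX, fsync the directory so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def close_tab(directory, tab_id):
    """Closing a tab is a single unlink: no holes to compact, nothing to vacuum."""
    try:
        os.remove(os.path.join(directory, f"{tab_id}.json"))
    except FileNotFoundError:
        pass


def restore_all(directory):
    """Read every saved tab; leftover *.json.tmp files from a crash are ignored."""
    tabs = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(".json"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                tabs[name[: -len(".json")]] = json.load(f)
    return tabs
```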

>
> Kyle (in .platform - that will teach me to cross-post) and Taras
> suggested LevelDB over SQLite (we don't really need much in the way of
> complexity here, mostly just safety against crashes and large blob
> storage - which may speak to files, since filesystems are good at that).
>
> An intern has expressed interest in trying some of the options proposed.
> I dropped some dumb testcode in the bug to help.

The first step should be adding telemetry to record the sizes of the
JSON files and the time to read/write them.
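A sketch of those measurements in Python. The class and sample format
are hypothetical, and how the samples would be reported is left out
(this is not Firefox's Telemetry API). The write timing deliberately
includes JSON serialization, since that is part of the burst described
at the top of the thread.

```python
import json
import time


class SessionTelemetry:
    """Collect (operation, size in bytes, elapsed ms) samples for session I/O."""

    def __init__(self):
        self.samples = []

    def timed_write(self, path, session):
        start = time.perf_counter()
        # Timing includes serialization, not just the disk write.
        data = json.dumps(session).encode("utf-8")
        with open(path, "wb") as f:
            f.write(data)
        self.samples.append(("write", len(data), (time.perf_counter() - start) * 1000.0))

    def timed_read(self, path):
        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        session = json.loads(data)  # timing includes parsing
        self.samples.append(("read", len(data), (time.perf_counter() - start) * 1000.0))
        return session
```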

Taras