Long pauses caused by cycle collector

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Long pauses caused by cycle collector

L. David Baron
So since around when the patch to make the cycle collector stop
faulting was checked in [1] I've been noticing long (5-15 second)
pauses during my usage of trunk builds, during which the CPU was
pegged.  Since session restore landed, my typical browser usage
pattern has involved having about 50 windows open with about 200
tabs between them, so my use case is a little unusual.  I'm also
using a debug build.

On Friday I set up buttons on my panel to start and pause jprof
profiling on the trunk build that I was using, so I could diagnose
the cause of the pauses.  I just ran jprof over the resulting data,
and it confirms that the pauses are caused by the cycle collector.

Of 25076 hits in the profile, 21400 were within
nsCycleCollector::Collect.  (I think the delay in my ability to
pause the profiling when the hang ended can fully account for the
difference between those two numbers.)

The callees within nsCycleCollector::Collect are as follows:
 55479   0    21400 nsCycleCollector::Collect()
              16375 nsCycleCollector::ScanRoots()
               3622 nsCycleCollector::MarkRoots()
                582 nsCycleCollector::ForgetAll()
                450 nsCycleCollector::CollectWhite()
                290 nsBaseHashtable<nsClearingVoidPtrHashKey, PtrInfo, PtrInfo>::Clear()
                 63 nsXPConnect::FinishCycleCollection()
                  8 nsCycleCollector::Forget(nsISupports*)
                  7 .plt
                  3 nsDeque::Pop()
Note that the profile data have a strong bias towards what's
happening later in the pause, since there's also a delay (probably
more significant than the ending delay) before I start the
profiling.  This means that the BeginCycleCollection() calls (which
don't show up at all, but include a JS_GC call) and the MarkRoots()
call are probably significantly underestimated.

Of note:

EnsurePtrInfo and the Get/Put hash table operations it calls are 922
hits (mostly Get).

The top function for function time (*excluding* descendants) is
nsXPConnect::Traverse (2483 hits excluding descendants).

Notable descendants of nsXPConnect::Traverse (other than the
recursive traversal calls) are 1559 hits in XPCCallContext's
constructor, 1509 hits in its descructor, and 348 hits in
JSObjectRefcounts::Get (which is mostly the hash table Get).

A bunch of the time is spent in debugging code (note again that I'm
profiling a debug build, and it has XPCOM_MEM_LEAK_LOG set):  3128
hits in NS_LogAddRef_P, 2592 hits in NS_LogRelease_P, 3641 hits in
nsCycleCollectionParticipant::CheckForRightISupports, and 1021 hits
in JS_snprintf inside of nsXPConnect::Traverse.  But the total of
DEBUG code doesn't look like it's going to add up to a majority.

-David

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=368523

--
L. David Baron                                <URL: http://dbaron.org/ >
           Technical Lead, Layout & CSS, Mozilla Corporation

_______________________________________________
dev-performance mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-performance

attachment0 (196 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Long pauses caused by cycle collector

Graydon Hoare-3
L. David Baron wrote:
> So since around when the patch to make the cycle collector stop
> faulting was checked in [1] I've been noticing long (5-15 second)
> pauses during my usage of trunk builds, during which the CPU was
> pegged.  Since session restore landed, my typical browser usage
> pattern has involved having about 50 windows open with about 200
> tabs between them, so my use case is a little unusual.  I'm also
> using a debug build.

Yikes! I'm afraid I didn't test the collector under a 50-window, 200-tab
scenario. I didn't realize that was common. Still, let's look at what's
going on.

> On Friday I set up buttons on my panel to start and pause jprof
> profiling on the trunk build that I was using, so I could diagnose
> the cause of the pauses.  I just ran jprof over the resulting data,
> and it confirms that the pauses are caused by the cycle collector.

Thanks for doing this. It's always better to talk about numbers :)

> Of 25076 hits in the profile, 21400 were within
> nsCycleCollector::Collect.  (I think the delay in my ability to
> pause the profiling when the hang ended can fully account for the
> difference between those two numbers.)

I'd like to see the full profile if you have it; there are lots of
places where speed can be squeezed out, some quite trivially. Especially
in a debug build. Even without analyzing a profile I can point to some
obvious candidates:

  - As you noted, there's a JS_snprintf() on each object if the
    traversal through the JS graph, enabled in debug builds.

  - There are duplicate calls to traverse using a "safety callback" at
    lines 659 & 678 of nsCycleCollector.cpp. This debug code to catch
    both the parent and the child of a crashing traverse edge, rather
    than letting the pointer buffer and then crashing when we only see
    the child. This means we call traverse twice as often as we need to.
    Can be disabled.

  - The ClearingVoidPtrHashKey stuff can be degraded to VoidPtrHashKey
    if you like. "Clearing" is actually to help conservative
    heap-scanning debug tools like valgrind running on top of the
    collector avoid seeing pointers.

and less trivially:

  - The traversal algorithm itself runs twice: once in scanning phase,
    once in marking phase. The result of the first call to traverse() on
    a given pointer can be cached into a temporary edge table if we find
    we're spending too much time doing traverse logic.

> EnsurePtrInfo and the Get/Put hash table operations it calls are 922
> hits (mostly Get).

That doesn't sound unusual to me; the algorithm *is* storing thousands
(or in your case millions) of pointers in hashtables. Let's focus on
higher-level logic for a while before we get into hashtable optimization.

> The top function for function time (*excluding* descendants) is
> nsXPConnect::Traverse (2483 hits excluding descendants).

Excluding, huh. Can you isolate it to a particular statement?

> Notable descendants of nsXPConnect::Traverse (other than the
> recursive traversal calls) are 1559 hits in XPCCallContext's
> constructor, 1509 hits in its descructor, and 348 hits in
> JSObjectRefcounts::Get (which is mostly the hash table Get).

Oh! A new XPCCallContext is constructed for each XPConnect::Traverse
call, but that was just an instance of me copying "something XPConnect
functions do to get a JSContext*", not really clear reasoning. Reading
XPCCallContext's ctor, I see it does much more. Perhaps we can cache the
JSContext* used during the traversal at the beginning of cycle
collection, and reuse it between ::Traverse calls.

> A bunch of the time is spent in debugging code (note again that I'm
> profiling a debug build, and it has XPCOM_MEM_LEAK_LOG set):  3128
> hits in NS_LogAddRef_P, 2592 hits in NS_LogRelease_P, 3641 hits in
> nsCycleCollectionParticipant::CheckForRightISupports, and 1021 hits
> in JS_snprintf inside of nsXPConnect::Traverse.  But the total of
> DEBUG code doesn't look like it's going to add up to a majority.

Maybe not a majority, but it's a sizable chunk. Add this up:

     3128 hits in log addref
     2592 hits in log release
     3641 hits in CheckForRightISupports
     1021 hits in JS_snprintf
  -------------------------------------------
    10382 hits of pure debugging machinery

Of your ~20000 hits inside the collector, that's quite a bit. Assume
that we can cut the number of traverse calls further in half (removing
the safety callback) and there may only be ~5000 samples to whittle down
in the non-debug code paths. Possibly cut in half again if we decide to
cache the results of the scan traversal.

I'm willing to hack on this of course, if we can define configurations
and targets. Is it OK if the optimized build is fast and the debug build
is slow? How slow?

-graydon
_______________________________________________
dev-performance mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-performance
Reply | Threaded
Open this post in threaded view
|

Re: Long pauses caused by cycle collector

Boris Zbarsky
Graydon Hoare wrote:
> Oh! A new XPCCallContext is constructed for each XPConnect::Traverse
> call, but that was just an instance of me copying "something XPConnect
> functions do to get a JSContext*"

This is a persistent problem with XPConnect performance, actually...  If there's
a way we can cache the JSContext* (in the callback object?) that would be very nice.

-Boris
_______________________________________________
dev-performance mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-performance