So the current result is that the newer versions use about twice as much
memory. But before proceeding I'd like to know for sure where this
additional memory usage comes from.
I have a few hypotheses (in no particular order):
1. SpiderMonkey needs more memory for type inference information and
maybe other information required for the JIT compiler.
2. There's something wrong with my measurements, or there's a bug or a
difference in what "JS_GetGCParameter(Runtime, JSGC_BYTES);" returns
between v1.8.5 and the newer versions (see the sketch after this list).
3. There's a bug in my patch that causes it to keep memory rooted that
should be freed.
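
For reference, here's a minimal sketch of how I take the measurement from
hypothesis 2 (the function and the JSGC_BYTES key exist in both versions,
but what the returned number covers may have changed between them):

    #include <cstdio>
    #include "jsapi.h"

    // Query the size of the GC heap in bytes and print it.
    void ReportGCBytes(JSRuntime* rt)
    {
        uint32_t bytes = JS_GetGCParameter(rt, JSGC_BYTES);
        std::printf("GC heap: %u bytes (%.2f MiB)\n",
                    bytes, bytes / (1024.0 * 1024.0));
    }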
I've dumped the heap and used grep to get an idea of where the additional
rooted objects are coming from. This is the result:
grep string in heapdump     | 1.8.5 turn 1500 | v24 turn 1500 | v24 turn 6000
via CscriptValRooted        |           82729 |         82404 |        102373
via machine_stack           |           32747 |         11355 |          8152
via js::AutoValueVector     |               0 |         30956 |         46201
via interned_atom           |            3995 |          4176 |          4176
via ...(                    |             378 |         15290 |         32324
via self-hosting            |               0 |          2168 |          2168
via default compartment     |               0 |           820 |           807
via wrapper                 |               0 |           170 |           170
via length2-static-string   |               0 |          4096 |          4096
via                         |          119992 |        151849 |        200883
So the two interesting parts are "via js::AutoValueVector" and "via ...(".
For the second one, the full lines look something like this:
0x7f795b047830 shape via ...(0x7f795b061580 Object).shape(0x7f7958d20330 shape).parent(0x7f795b047768 shape).parent(0x7f795b047790 shape).parent(0x7f795b0477b8 shape).parent(0x7f795b0477e0
The js::AutoValueVector numbers make sense because my patch changes our
code to use that class instead of a custom AutoRooter class that uses
JS_SetExtraGCRoots to register a trace hook.
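
To make the comparison concrete, here's a rough sketch of the two
approaches; the names are illustrative, not our actual code, and (a)
targets the 1.8.5 API while (b) targets v24:

    #include <vector>
    #include "jsapi.h"

    // (a) 1.8.5-era: values kept alive by a trace hook that is registered
    // once with JS_SetExtraGCRoots and called by the GC during marking.
    struct ExtraRoots
    {
        std::vector<jsval> vals;
    };
    static ExtraRoots gExtraRoots;

    static void TraceExtraRoots(JSTracer* trc, void* data)
    {
        ExtraRoots* roots = static_cast<ExtraRoots*>(data);
        for (size_t i = 0; i < roots->vals.size(); ++i)
            JS_CALL_VALUE_TRACER(trc, roots->vals[i], "extra-root");
    }
    // At startup: JS_SetExtraGCRoots(rt, TraceExtraRoots, &gExtraRoots);

    // (b) v24: stack-scoped rooting; the vector's contents are traced
    // automatically until it goes out of scope.
    void UseAutoValueVector(JSContext* cx, jsval v)
    {
        js::AutoValueVector roots(cx);
        if (!roots.append(v))
            return; // OOM
        // ... roots[0] stays alive across operations that can trigger a GC
    }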
I have a few questions that would help me figure out what's happening:
1. Where are the objects rooted with the custom trace hook visible in the
v1.8.5 heapdump? The only grep string with a higher count in v1.8.5 is
"via machine_stack", but I'd say that doesn't make sense.
2. What about those shape objects ("via ...(")? Is this additional TI or
JIT compiler information?
3. At the moment I only have object counts. Is there a way to get
information about the memory usage of these objects?
Also, please tell me if you have any other input on this problem!
Re: Need help analyzing memory usage v24 vs v1.8.5
Not enabling the following options reduces memory usage a little bit, but
it's still much higher than with v1.8.5.
Another idea I have for testing is to use the old rooting approach again
instead of js::AutoValueVector. That would reveal whether the objects
rooted with the trace hook simply don't show up in the dump, or whether
the new approach somehow causes leaks.
Re: Need help analyzing memory usage v24 vs v1.8.5
I've figured out that if I call JS_GC often instead of JS_MaybeGC, memory
usage is about the same as with v1.8.5.
I didn't know that JS_MaybeGC now triggers an incremental GC slice
instead of "maybe" running a full GC. This behaviour change was the cause
of the different memory usage profiles.
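
A minimal sketch of the workaround, assuming the v24 API where JS_GC
takes the runtime and JS_MaybeGC a context:

    #include "jsapi.h"

    void CollectAtTurnEnd(JSRuntime* rt)
    {
        JS_GC(rt); // full, non-incremental GC, closer to 1.8.5 behaviour
        // JS_MaybeGC(cx) would instead only run an incremental slice.
    }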
Where I could still use help is with fine-tuning the GC settings.
The only documentation I could find is basically the comments in jsapi.h
for the JSGCParamKey enumeration.
These descriptions are difficult to understand because I don't know
enough about the GC internals and the terms used there.
Some information about the general GC behaviour and terms would be nice.
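
For illustration, this is the kind of tuning I mean (the keys are from
v24's JSGCParamKey enumeration; the concrete values are just guesses, not
recommendations):

    // Milliseconds of main-thread time budget per incremental slice.
    JS_SetGCParameter(rt, JSGC_SLICE_TIME_BUDGET, 10);
    // GCs closer together than this many ms count as "high frequency".
    JS_SetGCParameter(rt, JSGC_HIGH_FREQUENCY_TIME_LIMIT, 1000);
    // Heap growth factor for low-frequency GC, in percent (150 = 1.5x).
    JS_SetGCParameter(rt, JSGC_LOW_FREQUENCY_HEAP_GROWTH, 150);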
What I know so far (could be partially wrong though):
- Incremental GC is meant to split the time a full GC would take into
smaller slices spread more evenly over time. It's not meant to increase
performance, but to avoid lag spikes.
- There's marking and sweeping. Marking happens in the main thread and
sweeping in a background thread (right?).
- There's high and low frequency GC. By default, GC counts as high
frequency when it happens more than once a second.
- The difference between high and low frequency GC is only the heap
growth factor (right?). If I understand correctly, a growth factor of
150% on a 40 MB heap would mean the next GC triggers at around 60 MB.
- JSGC_HIGH_FREQUENCY_HIGH_LIMIT sets a limit at which a full GC happens,
similar to JSGC_MAX_BYTES. What's the difference? Is it just that the
former only applies to high frequency GC?
I've profiled memory usage a bit to figure out what the settings do (the
graph is hardly readable and the descriptions are a bit lacking; I did it
mostly for myself).
A few open questions at the moment:
- When do OOM errors happen and how can I avoid them?
They seem to happen even when there's memory that a GC could free.
- Why does the heap always grow until a full GC happens? I see this in my
profiles, and the comments and heap-growth settings seem to imply it. Can
I make it grow more slowly by increasing the time per incremental GC or
the frequency of incremental GCs?
- Why do I measure no more than about 0.002 ms for the JS_MaybeGC call,
even if I set JSGC_SLICE_TIME_BUDGET to 10? Does it really run that
short? (See the timing sketch after this list.)
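
For reference, this is roughly how I take that timing (my own harness
around the API call, not engine code):

    #include <chrono>
    #include <cstdio>
    #include "jsapi.h"

    void TimedMaybeGC(JSContext* cx)
    {
        std::chrono::steady_clock::time_point start =
            std::chrono::steady_clock::now();
        JS_MaybeGC(cx); // may run one incremental slice, or nothing at all
        double ms = std::chrono::duration<double, std::milli>(
            std::chrono::steady_clock::now() - start).count();
        std::printf("JS_MaybeGC: %.4f ms\n", ms);
    }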
If you want to have a look, you can download the latest patch from the
ticket I've linked above. I can give a short summary of the current state:
Performance was much worse at first, but we figured out that we had to
change the way AI players share data to avoid wrapping overhead. Boris
(bz) suggested an approach based on a module pattern, which worked quite
well and improved performance a lot.
The related change: http://trac.wildfiregames.com/ticket/2322
He also helped a lot by pointing out other issues and showing me how to
use the tracelogging tool, Ion Spew and other tools.
I suspect the main remaining problem is our entity collections.
They are used to manage entities such as units and buildings and need to
be accessed, iterated, filtered and updated a lot. They seem to be too
polymorphic for IonMonkey.
There are some details about this here:
Another open problem is the way the JIT compiler handles cloned objects
that are passed from the engine to JS. It apparently doesn't recognize
that the object has the same type each time, but sees it as a completely
different object. I don't know enough about the JIT compiler here, though.
There's a bug related to this problem here (but I should probably measure
again, because we didn't know about the wrapping issue at that point):
Last but not least, we need a better way than JS::NotifyAnimationActivity
to prevent garbage collection of JIT code. Hacking it into Runtime.h
helps quite a bit, as you can see on the profiling graph.
I've suspended my work on improving performance because I'd like to do
the upgrade first. I also concluded that improving our algorithms is far
more promising than trying to improve the JS engine (I'll still help as
well as I can if anyone wants to look at engine performance issues
related to it).