Reducing NSS's allocation rate

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Reducing NSS's allocation rate

Nicholas Nethercote
Hi,

I've been doing some heap allocation profiling and found that during
basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
live) heap allocations. We're talking gigabytes of allocations in
short browsing sessions. That is *insane*.

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
this. I've written several patches that fix problems, one of which has
r+ and is awaiting checkin; check the dependent bugs.

This is making Ryan Sleevi is nervous and he wanted me to post
something here about my plans. So here they are: I want to reduce
unnecessary allocations. I want to do so in a very non-intrusive
fashion: I'm aware that NSS is security-sensitive code, and TBH it's
not that enjoyable to read or modify. I will plug away at this as long
as there is low-hanging fruit to be found, which may not be that much
longer.

Nick
--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Ryan Sleevi
On Mon, November 10, 2014 6:51 pm, Nicholas Nethercote wrote:
>  Hi,
>
>  I've been doing some heap allocation profiling and found that during
>  basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
>  live) heap allocations. We're talking gigabytes of allocations in
>  short browsing sessions. That is *insane*.

Could you explain why it's insane? I guess that's sort of why I poked you
on the bug. Plenty of allocators are rather smart under such churn, and
NSS itself uses an Arena allocator designed to re-use some (but
understandably not all) allocations.

Is there a set of performance criteria you're measuring? For example,
we're spending X% of CPU in the allocator, and we believe we can reduce it
to Y%, which will improve test Z.

Not to be a pain and discourage someone from hacking on NSS (we always
need more NSS hackers), but I guess I'm just trying to understand the
complexity (and locally caching / lazy instantiating always adds some
degree of complexity, though hopefully minor) vs performance (which is
hopefully being measured) tradeoffs. Further, if we aren't continuously
monitoring this, meaning it's not a metric integrated as something we
watch, then it seems very easy that any improvements you make can quickly
regress, which would be unfortunate for your work.

--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Julien Pierre-3
In reply to this post by Nicholas Nethercote
Personally, I would like to encourage your efforts. If you are able to
move many of these allocations from heap-based with locks, to something
stack-based instead, this will improve NSS server performance
tremendously. I would be surprised if it was a significant boost to
client apps like Firefox, though.

On the other hand, I'm not sure that there is that much low-hanging
fruit, based on the stacks you list in the bug.
Many are dictated by the design of NSS, the PKCS#11 API, and the current
softoken implementation.
Working within these constraints is not so simple. Keep in mind that
many things you might want to change cannot be, in order to preserve NSS
API binary compatibility.

Julien

On 11/10/2014 18:51, Nicholas Nethercote wrote:

> Hi,
>
> I've been doing some heap allocation profiling and found that during
> basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
> live) heap allocations. We're talking gigabytes of allocations in
> short browsing sessions. That is *insane*.
>
> I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
> this. I've written several patches that fix problems, one of which has
> r+ and is awaiting checkin; check the dependent bugs.
>
> This is making Ryan Sleevi is nervous and he wanted me to post
> something here about my plans. So here they are: I want to reduce
> unnecessary allocations. I want to do so in a very non-intrusive
> fashion: I'm aware that NSS is security-sensitive code, and TBH it's
> not that enjoyable to read or modify. I will plug away at this as long
> as there is low-hanging fruit to be found, which may not be that much
> longer.
>
> Nick

--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Brian Smith-19
In reply to this post by Nicholas Nethercote
On Mon, Nov 10, 2014 at 6:51 PM, Nicholas Nethercote
<[hidden email]> wrote:
> I've been doing some heap allocation profiling and found that during
> basic usage NSS accounts for 1/3 of all of Firefox's cumulative (*not*
> live) heap allocations. We're talking gigabytes of allocations in
> short browsing sessions. That is *insane*.
>
> I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
> this. I've written several patches that fix problems, one of which has
> r+ and is awaiting checkin; check the dependent bugs.

In your analysis, it would be better to use a call stack trace depth
larger than 5 that allows us to see what non-NSS function is calling
into NSS.

The checks done in mozilla::pkix's CheckPublicKeySize can easily be
optimized. But, first check how often the call stack contains
CheckPublicKey vs VerifySignedData; CheckPublicKey can be optimized
even more than VerifySignedData.

My original plans for VerifySignedData was for it to have a cache
added to it, if/when performance testing showed that there was a
performance problem. It is likely that such a cache is important, even
without the heap thrashing that you are concerned about.

Also, there is already a bug on file about caching and coalescing SSL
server cert verification results in SSLServerCertVerification. This is
trickier than the type of caching you can do in VerifySignedData but
it is potentially a bigger win. Also, I think recent changes to
Gecko's connection management (the "parallelism to a new host
restricted to 1" bug being fixed) made it more important to do at
least the coalescing part.

Note that when bug 1036103 is fixed (which will be basically whenever
I get around to posting one more patch), it will be possible to avoid
any of the NSS CERT_* API during certificate verification, if people
are willing to do a little (probably quite a bit, actiually)
refactoring.

That that, except for the calls to
SECKEY_DecodeDERSubjectPublicKeyInfo and SECKEY_ExtractPublicKey in
CheckPublicKeySize, mozilla::pkix allocates no memory at all, ever
(once CheckNameConstraints is replaced, which is the thing that is one
patch away from happening).

Cheers,
Brian
--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Nicholas Nethercote
On Mon, Nov 10, 2014 at 8:53 PM, Brian Smith <[hidden email]> wrote:
>>
>> I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1095272 about
>> this. I've written several patches that fix problems, one of which has
>> r+ and is awaiting checkin; check the dependent bugs.
>
> In your analysis, it would be better to use a call stack trace depth
> larger than 5 that allows us to see what non-NSS function is calling
> into NSS.

I've attached to the bug a profile that uses a stack trace depth of 10.

Nick
--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Nicholas Nethercote
In reply to this post by Ryan Sleevi
On Mon, Nov 10, 2014 at 7:06 PM, Ryan Sleevi
<[hidden email]> wrote:
>
> Not to be a pain and discourage someone from hacking on NSS

My patches are in the following bugs:

https://bugzilla.mozilla.org/show_bug.cgi?id=1094650
https://bugzilla.mozilla.org/show_bug.cgi?id=1095307
https://bugzilla.mozilla.org/show_bug.cgi?id=1096741

I'm happy to hear specific criticisms.

Nick
--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Ryan Sleevi
On Tue, November 11, 2014 10:26 am, Nicholas Nethercote wrote:

>  On Mon, Nov 10, 2014 at 7:06 PM, Ryan Sleevi
>  <[hidden email]> wrote:
> >
> > Not to be a pain and discourage someone from hacking on NSS
>
>  My patches are in the following bugs:
>
>  https://bugzilla.mozilla.org/show_bug.cgi?id=1094650
>  https://bugzilla.mozilla.org/show_bug.cgi?id=1095307
>  https://bugzilla.mozilla.org/show_bug.cgi?id=1096741
>
>  I'm happy to hear specific criticisms.
>
>  Nick
>  --

Not trying to be a pain, but I don't think that's fair to position like
that. I'd rather the first question be answered - each one of your patches
adds more branches and more complexity. That part is indisputable, even if
it may be seen as small. However, what measures do we have to ensure that
this is meaningfully improving any objective measure of performance (other
than allocation churn, which allocators are exceptionally capable of
handling)? How do we ensure this doesn't regress? Otherwise, we're adding
complexity without any benefits. And I don't think there are - or at
least, I haven't seen, other than "reduces allocations".

--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Robert Relyea
On 11/11/2014 12:32 PM, Ryan Sleevi wrote:

> On Tue, November 11, 2014 10:26 am, Nicholas Nethercote wrote:
>>   On Mon, Nov 10, 2014 at 7:06 PM, Ryan Sleevi
>>   <[hidden email]> wrote:
>>> Not to be a pain and discourage someone from hacking on NSS
>>   My patches are in the following bugs:
>>
>>   https://bugzilla.mozilla.org/show_bug.cgi?id=1094650
>>   https://bugzilla.mozilla.org/show_bug.cgi?id=1095307
>>   https://bugzilla.mozilla.org/show_bug.cgi?id=1096741
>>
>>   I'm happy to hear specific criticisms.
>>
>>   Nick
>>   --
> Not trying to be a pain, but I don't think that's fair to position like
> that. I'd rather the first question be answered - each one of your patches
> adds more branches and more complexity.
OK, looking at that patches, I would challege that assertion. The patch
in  https://bugzilla.mozilla.org/show_bug.cgi?id=1094650 actually
simplifies the code. I do agree we need to make complexity versus reward
calculations, but most of these changes don't raise the complexity bar
very high, if at all.
> That part is indisputable, even if
> it may be seen as small.
Sorry, I just disputed it;).
>   However, what measures do we have to ensure that
> this is meaningfully improving any objective measure of performance (other
> than allocation churn, which allocators are exceptionally capable of
> handling)?
I agree that these changes should be targeted based on instrumentation.
I believe Nicholas has provided this in the bug.
>   How do we ensure this doesn't regress?
I think that depends on the change.  Certainly
https://bugzilla.mozilla.org/show_bug.cgi?id=1095307 is simple enough,
that the combination of review and normal testing should give us
confidence in the patch's correctness.


I'd say the same forhttps://bugzilla.mozilla.org/show_bug.cgi?id=1094650  .
The patch in bug  https://bugzilla.mozilla.org/show_bug.cgi?id=1096741  seems more risky, as it delays allocation. So it would require more testing and and a more dillegent review, and may actually cross the complexity/benefit bar for me. (also, the code in question is code meant to avoid cache attacks, we want to make sure we don't reintroduce a new back channel attack).

> Otherwise, we're adding
> complexity without any benefits. And I don't think there are - or at
> least, I haven't seen, other than "reduces allocations".
For me the bar is pretty low, particularly since I've seen patches that
actually simplify the code.  The 'use static until some threshold, the
malloc' is a pretty well established meme in NSS already, and is
particularly safe if we can see it's entire use over a single function.

The bigger issue is are we getting proper bang for the buck.  Here we
have an NSS user who is admittedly tuning for his particular workload,
but doing performance tuning. It's targeted by hitting the largest
offenders using instrumented tools to figure that out given the client's
workload, then knocking out the low hanging fruit. I think that's pretty
classic optimization strategy.

bob

--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto

smime.p7s (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Brian Smith-19
In reply to this post by Nicholas Nethercote
On Mon, Nov 10, 2014 at 9:04 PM, Nicholas Nethercote
<[hidden email]> wrote:
>> In your analysis, it would be better to use a call stack trace depth
>> larger than 5 that allows us to see what non-NSS function is calling
>> into NSS.
>
> I've attached to the bug a profile that uses a stack trace depth of 10.

Unfortunately, 10 isn't enough to see the non-NSS entry (one that
doesn't start with "security/nss/") for every case. However, it looks
like the data supports the types of changes that you are making and
also my suggestions for coalescing and caching results, as well as my
suggestion to avoid constructing CERTCertificate objects. Depending on
how much effort you're willing to invest in this, many (probably most)
of those allocations can be avoided. David Keeler is very familiar
with the code in security/certverifier and security/manager/ssl/src
that would be changed to implement the additional things I suggested,
so I suggest you talk to him about it.

Cheers,
Brian
--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto
Reply | Threaded
Open this post in threaded view
|

Re: Reducing NSS's allocation rate

Nicholas Nethercote
On Tue, Nov 11, 2014 at 11:10 PM, Brian Smith <[hidden email]> wrote:
>>
>> I've attached to the bug a profile that uses a stack trace depth of 10.
>
> Unfortunately, 10 isn't enough to see the non-NSS entry (one that
> doesn't start with "security/nss/") for every case.

I've now attached to the bug a profile that uses a stack trace depth
of 24. That's the maximum I have in this particular profile, and
definitely gets out of security/nss/ :)

Nick
--
dev-tech-crypto mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-tech-crypto