Proposal: Array.prototype.shuffle (Fisher-Yates)

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Proposal: Array.prototype.shuffle (Fisher-Yates)

John Terenzio
Many scripting languages, such as Ruby, PHP, and Python (via the random module), have a built-in for properly shuffling an array. I propose adding Array.prototype.shuffle to the ES spec. I think it should act like Array.prototype.reverse, shuffling the array in place and returning a reference to it. It should use the Fisher-Yates algorithm. See https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

Since this functionality is missing, some less-seasoned developers fall into the trap of using the "sort by random" method, which produces bias. See http://sroucheray.org/blog/2009/11/array-sort-should-not-be-used-to-shuffle-an-array/

In a perfect world, this would use a better random number generator than the existing Math.random, but for now an implementation like the one below would still be a vast improvement over having to implement manually and possibly incorrectly. See https://bugzilla.mozilla.org/show_bug.cgi?id=322529


I don't think adding this to the spec would cause many compatibility issues and the implementation should be very straightforward. Let me know what you think!

Best,
John

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Axel Rauschmayer
If it was up to me, I’d add most if not all Underscore.js functions to the standard library (possibly via a module). The one function I most often miss is _.range() [1], which can be used for all kinds of things, including for filling an array with values:

> Array.range(5).map(() => 'x')
[ 'x', 'x', 'x', 'x', 'x' ]

A more principled approach to determine candidates for ES6 inclusion: search large code bases that depend on Underscore and determine how much each of its functions is actually used.


On May 16, 2013, at 23:02 , John Terenzio <[hidden email]> wrote:

Many scripting languages, such as Ruby, PHP, and Python (via the random module), have a built-in for properly shuffling an array. I propose adding Array.prototype.shuffle to the ES spec. I think it should act like Array.prototype.reverse, shuffling the array in place and returning a reference to it. It should use the Fisher-Yates algorithm. See https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

Since this functionality is missing, some less-seasoned developers fall into the trap of using the "sort by random" method, which produces bias. See http://sroucheray.org/blog/2009/11/array-sort-should-not-be-used-to-shuffle-an-array/

In a perfect world, this would use a better random number generator than the existing Math.random, but for now an implementation like the one below would still be a vast improvement over having to implement manually and possibly incorrectly. See https://bugzilla.mozilla.org/show_bug.cgi?id=322529


I don't think adding this to the spec would cause many compatibility issues and the implementation should be very straightforward. Let me know what you think!

Best,
John
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

-- 
Dr. Axel Rauschmayer



_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Alexander Lichter
In reply to this post by John Terenzio
Hey all!

Though this proposal was created more than five years ago, it's still
something that should be standardized.

As the algorithm is known and an implementation wouldn't take ages,
bringing in Array.prototype.shuffle is a low-hanging fruit.

Furthermore, it helps to provide a "real" random shuffle mechanism,
keeping people away from trying to do implement their own (biased or
wrong) algorithm (see https://stackoverflow.com/a/18650169/3975480 for
example).

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Naveen Chawla
I don't think there's such a thing as "real random" in digital algos, just "pseudo random".

Apart from card games, what's the use case?

(I'm not in TC39 so I can't vote on this)

I think that this comes after `toObject(keyCallback)`, `takeWhile`, `skipWhile`, and `forEachReturningOriginalArray` in terms of usefulness! (and maybe even some others)

On Sun, 29 Apr 2018 at 18:28 Alexander Lichter <[hidden email]> wrote:
Hey all!

Though this proposal was created more than five years ago, it's still
something that should be standardized.

As the algorithm is known and an implementation wouldn't take ages,
bringing in Array.prototype.shuffle is a low-hanging fruit.

Furthermore, it helps to provide a "real" random shuffle mechanism,
keeping people away from trying to do implement their own (biased or
wrong) algorithm (see https://stackoverflow.com/a/18650169/3975480 for
example).

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Isiah Meadows-2
I think this would be better suited for a library function rather than
a language feature. I could see this also being useful also for
randomized displays, but that's about it. And I'm not sure what an
engine could provide here that a library couldn't - you can't really
get much faster than what's in the language (minus bounds checking,
but the likely frequent cache misses will eclipse that greatly), and
it's not unlocking any real new possibilities.

(Not in TC39, either, BTW.)
-----

Isiah Meadows
[hidden email]

Looking for web consulting? Or a new website?
Send me an email and we can get started.
www.isiahmeadows.com


On Sun, Apr 29, 2018 at 12:34 PM, Naveen Chawla <[hidden email]> wrote:

> I don't think there's such a thing as "real random" in digital algos, just
> "pseudo random".
>
> Apart from card games, what's the use case?
>
> (I'm not in TC39 so I can't vote on this)
>
> I think that this comes after `toObject(keyCallback)`, `takeWhile`,
> `skipWhile`, and `forEachReturningOriginalArray` in terms of usefulness!
> (and maybe even some others)
>
> On Sun, 29 Apr 2018 at 18:28 Alexander Lichter <[hidden email]> wrote:
>>
>> Hey all!
>>
>> Though this proposal was created more than five years ago, it's still
>> something that should be standardized.
>>
>> As the algorithm is known and an implementation wouldn't take ages,
>> bringing in Array.prototype.shuffle is a low-hanging fruit.
>>
>> Furthermore, it helps to provide a "real" random shuffle mechanism,
>> keeping people away from trying to do implement their own (biased or
>> wrong) algorithm (see https://stackoverflow.com/a/18650169/3975480 for
>> example).
>>
>> _______________________________________________
>> es-discuss mailing list
>> [hidden email]
>> https://mail.mozilla.org/listinfo/es-discuss
>
>
> _______________________________________________
> es-discuss mailing list
> [hidden email]
> https://mail.mozilla.org/listinfo/es-discuss
>
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Tab Atkins Jr.
On Sun, Apr 29, 2018 at 10:01 AM, Isiah Meadows <[hidden email]> wrote:
> I think this would be better suited for a library function rather than
> a language feature. I could see this also being useful also for
> randomized displays, but that's about it. And I'm not sure what an
> engine could provide here that a library couldn't - you can't really
> get much faster than what's in the language (minus bounds checking,
> but the likely frequent cache misses will eclipse that greatly), and
> it's not unlocking any real new possibilities.

Yes, this is very easily doable in userland, and a built-in will have
little if any speed benefit.

The point, tho, is that this is (a) a relatively common operation, and
(b) very, very easy to do *wrong*, as demonstrated by the SO link.
Thus it's a reasonable candidate for inclusion as a built-in anyway,
to avoid the footgun of trying to do it yourself and screwing it up.

~TJ
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Alexander Lichter
On 29.04.2018 18:34, Naveen Chawla wrote:
> I don't think there's such a thing as "real random" in digital algos,
> just "pseudo random".
You are right. I meant 'unbiased' pseudo randomness here.

> Apart from card games, what's the use case?
There are a lot of potential use cases. The best that comes into my mind
is sampling test data.


On 29.04.2018 19:01, Isiah Meadows wrote:
> I think this would be better suited for a library function rather than a language feature. I could see this also being useful also for
> randomized displays, but that's about it. And I'm not sure what an
> engine could provide here that a library couldn't - you can't really
> get much faster than what's in the language (minus bounds checking,
> but the likely frequent cache misses will eclipse that greatly), and
> it's not unlocking any real new possibilities.
As Tab Atkins Jr. already pointed out it's not about performance
benefits. It's about how error-prone custom shuffle implementations
are/can be.
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Isiah Meadows-2
BTW, I added this to my list of various proposed array additions (as a
weak one) [1].

I did do a little reading up and found that in general, there's a
major glitch that makes shuffling very hard to get right (issues even
Underscore's `_.shuffle` doesn't address), specifically that of the
size of the random number generator vs how many permutations it can
hit. Specifically, if you want any properly unbiased shuffles covering
all permutations of arrays larger than 34 entries (and that's not a
lot), you can't use any major engines' `Math.random` (which typically
have a max seed+period of 128 bits). You have to roll your own
implementation, and you need at least something like xorshift1024* [2]
(1024-bit max seed/period size) for up to 170 entries, MT19937
[3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080
entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the
minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))`
where `N` is the length of the array [5], and the only way you can
guarantee you can even generate all possible permutations of all
arrays (ignoring potential bias) is through a true hardware generator
(with its potentially infinite number of possibilities). Also, another
concern is that the loop's bottleneck is specifically the random
number generation, so you can't get too slow without resulting in a
*very* slow shuffle.

If you want anything minimally biased and much larger than that,
you'll need a cryptographically secure pseudorandom number generator
just to cover the possible states. (This is about where the
intersection meets between basic statistics and cryptography, since
cipher blocks are frequently that long.) But I'm leaving that as out
of scope of that proposal.

[1]: https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle
[2]: https://en.wikipedia.org/wiki/Xorshift#xorshift*
[3]: https://en.wikipedia.org/wiki/Mersenne_Twister
[4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
[5]: http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000

-----

Isiah Meadows
[hidden email]

Looking for web consulting? Or a new website?
Send me an email and we can get started.
www.isiahmeadows.com


On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <[hidden email]> wrote:

> On 29.04.2018 18:34, Naveen Chawla wrote:
>>
>> I don't think there's such a thing as "real random" in digital algos, just
>> "pseudo random".
>
> You are right. I meant 'unbiased' pseudo randomness here.
>
>> Apart from card games, what's the use case?
>
> There are a lot of potential use cases. The best that comes into my mind is
> sampling test data.
>
>
> On 29.04.2018 19:01, Isiah Meadows wrote:
>>
>> I think this would be better suited for a library function rather than a
>> language feature. I could see this also being useful also for
>> randomized displays, but that's about it. And I'm not sure what an
>> engine could provide here that a library couldn't - you can't really
>> get much faster than what's in the language (minus bounds checking,
>> but the likely frequent cache misses will eclipse that greatly), and
>> it's not unlocking any real new possibilities.
>
> As Tab Atkins Jr. already pointed out it's not about performance benefits.
> It's about how error-prone custom shuffle implementations are/can be.
>
> _______________________________________________
> es-discuss mailing list
> [hidden email]
> https://mail.mozilla.org/listinfo/es-discuss
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

J Decker
I can see that's certainly something that can be gotten wrong; the in-place sort is kind-of nice; but you end up hitting most numbers twice, and with a longer array (say a set of 75 bingo balls) you can move the same number multiple times.

which gives numbers at the end a higher chance of being at the start when all is done; and almost guaranteed to not be at the end.  I use sha2 as a stream of bits; and when it runs out of bits, invokes a callback to get more salt ( that is optional, although adding Date.now() is sufficient to increase the overall entropy slightly  ).   This can also be used as a generator procedural content generation, since you can specify the initial salt and subsequent salts,  or even copy the state, and fork two streams using slightly different progressive entropy.

And then to shuffle, for each number in the array, assign a random number; hang in a binary tree from least to most; then gather the tree back out.  requires additional memory approximately the 4xsize of the original array tough temporarily.  (don't seem to have a JS version, but the C version ports pretty easily)

This allows any number to result in any position equally.
The SHA2 hash generates very good random number characteristics; evaluating with diehard tests; chi-square etc.
Allowing more salt to be introduced periodically destroys any overall period (even if it is in the realm of 256 bits).

if you were shuffling a deck of cards (52) you only need random values that are 6-7 bits at a time...

It is slower than say http://www.pcg-random.org/using-pcg.html or mersenne, both of which I found to be very poor.  pcg generates zeros like 70% of the time.

While a good shuffle should be able to start from the initial state and generate a good shuffled result, it is slightly better to progressively shuffle the previous result array into new positions than to start from an initial state and compute a 1-off.

On Sun, Apr 29, 2018 at 3:27 PM, Isiah Meadows <[hidden email]> wrote:
BTW, I added this to my list of various proposed array additions (as a
weak one) [1].

I did do a little reading up and found that in general, there's a
major glitch that makes shuffling very hard to get right (issues even
Underscore's `_.shuffle` doesn't address), specifically that of the
size of the random number generator vs how many permutations it can
hit. Specifically, if you want any properly unbiased shuffles covering
all permutations of arrays larger than 34 entries (and that's not a
lot), you can't use any major engines' `Math.random` (which typically
have a max seed+period of 128 bits). You have to roll your own
implementation, and you need at least something like xorshift1024* [2]
(1024-bit max seed/period size) for up to 170 entries, MT19937
[3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080
entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the
minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))`
where `N` is the length of the array [5], and the only way you can
guarantee you can even generate all possible permutations of all
arrays (ignoring potential bias) is through a true hardware generator
(with its potentially infinite number of possibilities). Also, another
concern is that the loop's bottleneck is specifically the random
number generation, so you can't get too slow without resulting in a
*very* slow shuffle.

If you want anything minimally biased and much larger than that,
you'll need a cryptographically secure pseudorandom number generator
just to cover the possible states. (This is about where the
intersection meets between basic statistics and cryptography, since
cipher blocks are frequently that long.) But I'm leaving that as out
of scope of that proposal.

[1]: https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle
[2]: https://en.wikipedia.org/wiki/Xorshift#xorshift*
[3]: https://en.wikipedia.org/wiki/Mersenne_Twister
[4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
[5]: http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000

-----

Isiah Meadows
[hidden email]

Looking for web consulting? Or a new website?
Send me an email and we can get started.
www.isiahmeadows.com


On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <[hidden email]> wrote:
> On 29.04.2018 18:34, Naveen Chawla wrote:
>>
>> I don't think there's such a thing as "real random" in digital algos, just
>> "pseudo random".
>
> You are right. I meant 'unbiased' pseudo randomness here.
>
>> Apart from card games, what's the use case?
>
> There are a lot of potential use cases. The best that comes into my mind is
> sampling test data.
>
>
> On 29.04.2018 19:01, Isiah Meadows wrote:
>>
>> I think this would be better suited for a library function rather than a
>> language feature. I could see this also being useful also for
>> randomized displays, but that's about it. And I'm not sure what an
>> engine could provide here that a library couldn't - you can't really
>> get much faster than what's in the language (minus bounds checking,
>> but the likely frequent cache misses will eclipse that greatly), and
>> it's not unlocking any real new possibilities.
>
> As Tab Atkins Jr. already pointed out it's not about performance benefits.
> It's about how error-prone custom shuffle implementations are/can be.
>
> _______________________________________________
> es-discuss mailing list
> [hidden email]
> https://mail.mozilla.org/listinfo/es-discuss
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Tab Atkins Jr.
On Sun, Apr 29, 2018 at 4:31 PM, J Decker <[hidden email]> wrote:
> I can see that's certainly something that can be gotten wrong; the in-place
> sort is kind-of nice; but you end up hitting most numbers twice, and with a
> longer array (say a set of 75 bingo balls) you can move the same number
> multiple times.
> which gives numbers at the end a higher chance of being at the start when
> all is done; and almost guaranteed to not be at the end.

This is true of a "naive" in-place shuffle, which draws the "item to
be placed at the end" from the full array, rather than from just the
unshuffled portion. But the proper Fisher-Yates algorithm does not
have this bias, and gives a proper unbiased sort, as shown by a
multitude of links across the web, many of which can be found in the
SO answers linked from earlier in this thread.

> And then to shuffle, for each number in the array, assign a random number;
> hang in a binary tree from least to most; then gather the tree back out.

While this shuffling method (assign a random number to each value,
then sort by that) also works and is unbiased, it takes O(nlogn) time,
and may, as you note for your implementation, have significant space
costs as well.  Fisher-Yates is a faster (linear-time, bounded by the
cost of generating N random numbers) and more space-efficient unbiased
algorithm.

~TJ
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Isiah Meadows-2
In reply to this post by J Decker
Replying inline, so I can target specific points better.

-----

Isiah Meadows
[hidden email]

Looking for web consulting? Or a new website?
Send me an email and we can get started.
www.isiahmeadows.com


On Sun, Apr 29, 2018 at 7:31 PM, J Decker <[hidden email]> wrote:
> I can see that's certainly something that can be gotten wrong; the in-place
> sort is kind-of nice; but you end up hitting most numbers twice, and with a
> longer array (say a set of 75 bingo balls) you can move the same number
> multiple times.

I specifically mentioned this (the easy to screw up) in the link I
created and posted here.

>
> which gives numbers at the end a higher chance of being at the start when
> all is done; and almost guaranteed to not be at the end.  I use sha2 as a
> stream of bits; and when it runs out of bits, invokes a callback to get more
> salt ( that is optional, although adding Date.now() is sufficient to
> increase the overall entropy slightly  ).   This can also be used as a
> generator procedural content generation, since you can specify the initial
> salt and subsequent salts,  or even copy the state, and fork two streams
> using slightly different progressive entropy.
> https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js
>
> And then to shuffle, for each number in the array, assign a random number;
> hang in a binary tree from least to most; then gather the tree back out.
> requires additional memory approximately the 4xsize of the original array
> tough temporarily.  (don't seem to have a JS version, but the C version
> ports pretty easily)
> https://github.com/d3x0r/SACK/blob/master/src/utils/cardodds/shuffle.c

For an in-place shuffle, that's still not especially efficient. It's
still taking O(n log n) stack space + global space + time, even though
it doesn't require real heap allocation.

>
> This allows any number to result in any position equally.
> The SHA2 hash generates very good random number characteristics; evaluating
> with diehard tests; chi-square etc.
> Allowing more salt to be introduced periodically destroys any overall period
> (even if it is in the realm of 256 bits).

Basing it on a cryptographic primitive is what I was referring to for
larger arrays. You need one that powerful for large arrays if you hope
to maintain any sort of statistical soundness. However, for small
arrays (like a "deck" of cards), it is likely overkill.

>
> if you were shuffling a deck of cards (52) you only need random values that
> are 6-7 bits at a time...
>
> It is slower than say http://www.pcg-random.org/using-pcg.html or mersenne,
> both of which I found to be very poor.  pcg generates zeros like 70% of the
> time.

If it's generating zeros that frequently, don't use it. The website
itself looks a bit on the sketchy side to me, but that's just me.\* A
Mersenne Twister with a large state shouldn't give you too much
trouble for simple stuff, and if you take the SFMT variant [1], you
shouldn't have speed issues too much. Also, note that I did specify a
higher-quality iteration on that, WELL [2], which is similar, but
improves on both the statistical properties and speed of the
traditional Mersenne Twister.

\* If you're not cryptographically secure, don't compare yourself to
ones that are, especially trying to make yourself look better than
them. Also, it missed a few other high quality generators (like the
xorshift128 family and xorshiro128+, two of the most efficient,
effective fast PRNGs out there) in its online comparison.

[1]: https://en.wikipedia.org/wiki/Mersenne_Twister#SFMT
[2]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear

>
> While a good shuffle should be able to start from the initial state and
> generate a good shuffled result, it is slightly better to progressively
> shuffle the previous result array into new positions than to start from an
> initial state and compute a 1-off.

Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as
the RNG that powers it.)

>
> On Sun, Apr 29, 2018 at 3:27 PM, Isiah Meadows <[hidden email]>
> wrote:
>>
>> BTW, I added this to my list of various proposed array additions (as a
>> weak one) [1].
>>
>> I did do a little reading up and found that in general, there's a
>> major glitch that makes shuffling very hard to get right (issues even
>> Underscore's `_.shuffle` doesn't address), specifically that of the
>> size of the random number generator vs how many permutations it can
>> hit. Specifically, if you want any properly unbiased shuffles covering
>> all permutations of arrays larger than 34 entries (and that's not a
>> lot), you can't use any major engines' `Math.random` (which typically
>> have a max seed+period of 128 bits). You have to roll your own
>> implementation, and you need at least something like xorshift1024* [2]
>> (1024-bit max seed/period size) for up to 170 entries, MT19937
>> [3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080
>> entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the
>> minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))`
>> where `N` is the length of the array [5], and the only way you can
>> guarantee you can even generate all possible permutations of all
>> arrays (ignoring potential bias) is through a true hardware generator
>> (with its potentially infinite number of possibilities). Also, another
>> concern is that the loop's bottleneck is specifically the random
>> number generation, so you can't get too slow without resulting in a
>> *very* slow shuffle.
>>
>> If you want anything minimally biased and much larger than that,
>> you'll need a cryptographically secure pseudorandom number generator
>> just to cover the possible states. (This is about where the
>> intersection meets between basic statistics and cryptography, since
>> cipher blocks are frequently that long.) But I'm leaving that as out
>> of scope of that proposal.
>>
>> [1]:
>> https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle
>> [2]: https://en.wikipedia.org/wiki/Xorshift#xorshift*
>> [3]: https://en.wikipedia.org/wiki/Mersenne_Twister
>> [4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
>> [5]:
>> http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000
>>
>> -----
>>
>> Isiah Meadows
>> [hidden email]
>>
>> Looking for web consulting? Or a new website?
>> Send me an email and we can get started.
>> www.isiahmeadows.com
>>
>>
>> On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <[hidden email]> wrote:
>> > On 29.04.2018 18:34, Naveen Chawla wrote:
>> >>
>> >> I don't think there's such a thing as "real random" in digital algos,
>> >> just
>> >> "pseudo random".
>> >
>> > You are right. I meant 'unbiased' pseudo randomness here.
>> >
>> >> Apart from card games, what's the use case?
>> >
>> > There are a lot of potential use cases. The best that comes into my mind
>> > is
>> > sampling test data.
>> >
>> >
>> > On 29.04.2018 19:01, Isiah Meadows wrote:
>> >>
>> >> I think this would be better suited for a library function rather than
>> >> a
>> >> language feature. I could see this also being useful also for
>> >> randomized displays, but that's about it. And I'm not sure what an
>> >> engine could provide here that a library couldn't - you can't really
>> >> get much faster than what's in the language (minus bounds checking,
>> >> but the likely frequent cache misses will eclipse that greatly), and
>> >> it's not unlocking any real new possibilities.
>> >
>> > As Tab Atkins Jr. already pointed out it's not about performance
>> > benefits.
>> > It's about how error-prone custom shuffle implementations are/can be.
>> >
>> > _______________________________________________
>> > es-discuss mailing list
>> > [hidden email]
>> > https://mail.mozilla.org/listinfo/es-discuss
>> _______________________________________________
>> es-discuss mailing list
>> [hidden email]
>> https://mail.mozilla.org/listinfo/es-discuss
>
>
_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

J Decker

>
> While a good shuffle should be able to start from the initial state and
> generate a good shuffled result, it is slightly better to progressively
> shuffle the previous result array into new positions than to start from an
> initial state and compute a 1-off.

Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as
the RNG that powers it.)

 
Specifically re-1-off vs progressive shuffling?  Not on-hand; it was an evaluation I did 10 years ago; and while the difference wasn't that much, it was notable.  the rate for any single ball to be in a specific position took longer (by iteration count) when resetting the array to shuffle to initial conditions (1-N)


Fisher-Yates. is biased by the usage of the number generator; not just the generator.
I've been considering how to graphically/visually demonstrate it; but I'v failed so far.

for a list of 10 numbers; and a 'perfect' RNG.
For the last number, it's 90% guaranteed to evaluate it's shuffle twice.  (10% it won't move) It has a 10% chance to be in the last position, but only a 1.1% chance to be in the second to last position.  (1/10 * 1/9 = 1/90 = 0.011...) so it will NOT be the last position 90% but it will NOT be the second to last 98.9% of the time.  For an ideal shuffle, every original position will end up in every other position 10% of the time.

 
>
> On Sun, Apr 29, 2018 at 3:27 PM, Isiah Meadows <[hidden email]>
> wrote:
>>
>> BTW, I added this to my list of various proposed array additions (as a
>> weak one) [1].
>>
>> I did do a little reading up and found that in general, there's a
>> major glitch that makes shuffling very hard to get right (issues even
>> Underscore's `_.shuffle` doesn't address), specifically that of the
>> size of the random number generator vs how many permutations it can
>> hit. Specifically, if you want any properly unbiased shuffles covering
>> all permutations of arrays larger than 34 entries (and that's not a
>> lot), you can't use any major engines' `Math.random` (which typically
>> have a max seed+period of 128 bits). You have to roll your own
>> implementation, and you need at least something like xorshift1024* [2]
>> (1024-bit max seed/period size) for up to 170 entries, MT19937
>> [3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080
>> entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the
>> minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))`
>> where `N` is the length of the array [5], and the only way you can
>> guarantee you can even generate all possible permutations of all
>> arrays (ignoring potential bias) is through a true hardware generator
>> (with its potentially infinite number of possibilities). Also, another
>> concern is that the loop's bottleneck is specifically the random
>> number generation, so you can't get too slow without resulting in a
>> *very* slow shuffle.
>>
>> If you want anything minimally biased and much larger than that,
>> you'll need a cryptographically secure pseudorandom number generator
>> just to cover the possible states. (This is about where the
>> intersection meets between basic statistics and cryptography, since
>> cipher blocks are frequently that long.) But I'm leaving that as out
>> of scope of that proposal.
>>
>> [1]:
>> https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle
>> [2]: https://en.wikipedia.org/wiki/Xorshift#xorshift*
>> [3]: https://en.wikipedia.org/wiki/Mersenne_Twister
>> [4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
>> [5]:
>> http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000
>>
>> -----
>>
>> Isiah Meadows
>> [hidden email]
>>
>> Looking for web consulting? Or a new website?
>> Send me an email and we can get started.
>> www.isiahmeadows.com
>>
>>
>> On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <[hidden email]> wrote:
>> > On 29.04.2018 18:34, Naveen Chawla wrote:
>> >>
>> >> I don't think there's such a thing as "real random" in digital algos,
>> >> just
>> >> "pseudo random".
>> >
>> > You are right. I meant 'unbiased' pseudo randomness here.
>> >
>> >> Apart from card games, what's the use case?
>> >
>> > There are a lot of potential use cases. The best that comes into my mind
>> > is
>> > sampling test data.
>> >
>> >
>> > On 29.04.2018 19:01, Isiah Meadows wrote:
>> >>
>> >> I think this would be better suited for a library function rather than
>> >> a
>> >> language feature. I could see this also being useful also for
>> >> randomized displays, but that's about it. And I'm not sure what an
>> >> engine could provide here that a library couldn't - you can't really
>> >> get much faster than what's in the language (minus bounds checking,
>> >> but the likely frequent cache misses will eclipse that greatly), and
>> >> it's not unlocking any real new possibilities.
>> >
>> > As Tab Atkins Jr. already pointed out it's not about performance
>> > benefits.
>> > It's about how error-prone custom shuffle implementations are/can be.
>> >
>> > _______________________________________________
>> > es-discuss mailing list
>> > [hidden email]
>> > https://mail.mozilla.org/listinfo/es-discuss
>> _______________________________________________
>> es-discuss mailing list
>> [hidden email]
>> https://mail.mozilla.org/listinfo/es-discuss
>
>


_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss
Reply | Threaded
Open this post in threaded view
|

Re: Re: Proposal: Array.prototype.shuffle (Fisher-Yates)

Isiah Meadows-2
If you can come up with something better than what I have, feel free to file an issue against my repo. (I have minimal need, but I'd love to have something that works.)


-----

Just an item of note about algorithm choice: I strongly doubt TC39 reps would be okay with a cryptographic primitive being used in the spec, since it's pretty useless elsewhere, and I'm not sure about the legal aspect (export restrictions, etc.).

On Mon, Apr 30, 2018, 01:50 J Decker <[hidden email]> wrote:

>
> While a good shuffle should be able to start from the initial state and
> generate a good shuffled result, it is slightly better to progressively
> shuffle the previous result array into new positions than to start from an
> initial state and compute a 1-off.

Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as
the RNG that powers it.)

 
Specifically re-1-off vs progressive shuffling?  Not on-hand; it was an evaluation I did 10 years ago; and while the difference wasn't that much, it was notable.  the rate for any single ball to be in a specific position took longer (by iteration count) when resetting the array to shuffle to initial conditions (1-N)


Fisher-Yates. is biased by the usage of the number generator; not just the generator.
I've been considering how to graphically/visually demonstrate it; but I'v failed so far.

for a list of 10 numbers; and a 'perfect' RNG.
For the last number, it's 90% guaranteed to evaluate it's shuffle twice.  (10% it won't move) It has a 10% chance to be in the last position, but only a 1.1% chance to be in the second to last position.  (1/10 * 1/9 = 1/90 = 0.011...) so it will NOT be the last position 90% but it will NOT be the second to last 98.9% of the time.  For an ideal shuffle, every original position will end up in every other position 10% of the time.

 
>
> On Sun, Apr 29, 2018 at 3:27 PM, Isiah Meadows <[hidden email]>
> wrote:
>>
>> BTW, I added this to my list of various proposed array additions (as a
>> weak one) [1].
>>
>> I did do a little reading up and found that in general, there's a
>> major glitch that makes shuffling very hard to get right (issues even
>> Underscore's `_.shuffle` doesn't address), specifically that of the
>> size of the random number generator vs how many permutations it can
>> hit. Specifically, if you want any properly unbiased shuffles covering
>> all permutations of arrays larger than 34 entries (and that's not a
>> lot), you can't use any major engines' `Math.random` (which typically
>> have a max seed+period of 128 bits). You have to roll your own
>> implementation, and you need at least something like xorshift1024* [2]
>> (1024-bit max seed/period size) for up to 170 entries, MT19937
>> [3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080
>> entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the
>> minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))`
>> where `N` is the length of the array [5], and the only way you can
>> guarantee you can even generate all possible permutations of all
>> arrays (ignoring potential bias) is through a true hardware generator
>> (with its potentially infinite number of possibilities). Also, another
>> concern is that the loop's bottleneck is specifically the random
>> number generation, so you can't get too slow without resulting in a
>> *very* slow shuffle.
>>
>> If you want anything minimally biased and much larger than that,
>> you'll need a cryptographically secure pseudorandom number generator
>> just to cover the possible states. (This is about where the
>> intersection meets between basic statistics and cryptography, since
>> cipher blocks are frequently that long.) But I'm leaving that as out
>> of scope of that proposal.
>>
>> [1]:
>> https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle
>> [2]: https://en.wikipedia.org/wiki/Xorshift#xorshift*
>> [3]: https://en.wikipedia.org/wiki/Mersenne_Twister
>> [4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
>> [5]:
>> http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000
>>
>> -----
>>
>> Isiah Meadows
>> [hidden email]
>>
>> Looking for web consulting? Or a new website?
>> Send me an email and we can get started.
>> www.isiahmeadows.com
>>
>>
>> On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <[hidden email]> wrote:
>> > On 29.04.2018 18:34, Naveen Chawla wrote:
>> >>
>> >> I don't think there's such a thing as "real random" in digital algos,
>> >> just
>> >> "pseudo random".
>> >
>> > You are right. I meant 'unbiased' pseudo randomness here.
>> >
>> >> Apart from card games, what's the use case?
>> >
>> > There are a lot of potential use cases. The best that comes into my mind
>> > is
>> > sampling test data.
>> >
>> >
>> > On 29.04.2018 19:01, Isiah Meadows wrote:
>> >>
>> >> I think this would be better suited for a library function rather than
>> >> a
>> >> language feature. I could see this also being useful also for
>> >> randomized displays, but that's about it. And I'm not sure what an
>> >> engine could provide here that a library couldn't - you can't really
>> >> get much faster than what's in the language (minus bounds checking,
>> >> but the likely frequent cache misses will eclipse that greatly), and
>> >> it's not unlocking any real new possibilities.
>> >
>> > As Tab Atkins Jr. already pointed out it's not about performance
>> > benefits.
>> > It's about how error-prone custom shuffle implementations are/can be.
>> >
>> > _______________________________________________
>> > es-discuss mailing list
>> > [hidden email]
>> > https://mail.mozilla.org/listinfo/es-discuss
>> _______________________________________________
>> es-discuss mailing list
>> [hidden email]
>> https://mail.mozilla.org/listinfo/es-discuss
>
>

_______________________________________________
es-discuss mailing list
[hidden email]
https://mail.mozilla.org/listinfo/es-discuss