public abstract class FastLoadedDiceRollerDiscreteSampler extends Object implements SharedStateDiscreteSampler
n
values each with an associated relative weight. If all unique items
are assigned the same weight it is more efficient to use the DiscreteUniformSampler
.
Given a list L
of n
positive numbers,
where L[i]
represents the relative weight of the i
th side, FLDR returns
integer i
with relative probability L[i]
.
FLDR produces exact samples from the specified probability distribution.
i
is precisely equal to the
rational number L[i] / m
, where m
is the sum of L
.
L[i]
is converted to the
corresponding rational number p[i] / q[i]
where p[i]
is a positive integer and
q[i]
is a power of 2. The rational weights are then normalized (exactly) to sum to unity.
Note that if exact samples are not required then an alternative sampler that ignores very small relative weights may have improved sampling performance.
This implementation is based on the algorithm in:
Feras A. Saad, Cameron E. Freer, Martin C. Rinard, and Vikash K. Mansinghka. The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions. In AISTATS 2020: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research 108, Palermo, Sicily, Italy, 2020.
Sampling uses UniformRandomProvider.nextInt()
as the source of random bits.
Modifier and Type | Method and Description |
---|---|
static FastLoadedDiceRollerDiscreteSampler |
of(UniformRandomProvider rng,
double[] weights)
Creates a sampler.
|
static FastLoadedDiceRollerDiscreteSampler |
of(UniformRandomProvider rng,
double[] weights,
int alpha)
Creates a sampler.
|
static FastLoadedDiceRollerDiscreteSampler |
of(UniformRandomProvider rng,
long[] frequencies)
Creates a sampler.
|
abstract FastLoadedDiceRollerDiscreteSampler |
withUniformRandomProvider(UniformRandomProvider rng)
Create a new instance of the sampler with the same underlying state using the given
uniform random provider as the source of randomness.
|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
sample, samples, samples
public abstract FastLoadedDiceRollerDiscreteSampler withUniformRandomProvider(UniformRandomProvider rng)
withUniformRandomProvider
in interface SharedStateSampler<SharedStateDiscreteSampler>
rng
- Generator of uniformly distributed random numbers.public static FastLoadedDiceRollerDiscreteSampler of(UniformRandomProvider rng, long[] frequencies)
Note: The discrete distribution generating (DDG) tree requires (n + 1) * k
entries
where n
is the number of categories, k == ceil(log2(m))
and m
is the sum of the observed frequencies. An exception is raised if this cannot be allocated
as a single array.
For reference the sum is limited to Long.MAX_VALUE
and the value k
to 63.
The number of categories is limited to approximately ((2^31 - 1) / k) = 34,087,042
when the sum of frequencies is large enough to create k=63.
rng
- Generator of uniformly distributed random numbers.frequencies
- Observed frequencies of the discrete distribution.IllegalArgumentException
- if frequencies
is null or empty, a
frequency is negative, the sum of all frequencies is either zero or
above Long.MAX_VALUE
, or the size of the discrete distribution generating tree
is too large.public static FastLoadedDiceRollerDiscreteSampler of(UniformRandomProvider rng, double[] weights)
Weights are converted to rational numbers p / q
where q
is a power of 2.
The numerators p
are scaled to use a common denominator before summing.
All weights are used to create the sampler. Weights with a small magnitude relative
to the largest weight can be excluded using the constructor method with the
relative magnitude parameter alpha
(see of(UniformRandomProvider, double[], int)
).
rng
- Generator of uniformly distributed random numbers.weights
- Weights of the discrete distribution.IllegalArgumentException
- if weights
is null or empty, a
weight is negative, infinite or NaN
, the sum of all weights is zero, or the size
of the discrete distribution generating tree is too large.of(UniformRandomProvider, double[], int)
public static FastLoadedDiceRollerDiscreteSampler of(UniformRandomProvider rng, double[] weights, int alpha)
Weights are converted to rational numbers p / q
where q
is
a power of 2. The numerators p
are scaled to use a common
denominator before summing.
Note: The discrete distribution generating (DDG) tree requires
(n + 1) * k
entries where n
is the number of categories,
k == ceil(log2(m))
and m
is the sum of the weight numerators
q
. An exception is raised if this cannot be allocated as a single
array.
For reference the value k
is equal to or greater than the ratio of
the largest to the smallest weight expressed as a power of 2. For
Double.MAX_VALUE / Double.MIN_VALUE
this is ~2098. The value
k
increases with the sum of the weight numerators. A number of
weights in excess of 1,000,000 with values equal to Double.MAX_VALUE
would be required to raise an exception when the minimum weight is
Double.MIN_VALUE
.
Weights with a small magnitude relative to the largest weight can be
excluded using the relative magnitude parameter alpha
. This will set
any weight to zero if the magnitude is approximately 2alpha
smaller than the largest weight. This comparison is made using only
the exponent of the input weights. The alpha
parameter is ignored if
not above zero. Note that a small alpha
parameter will exclude more
weights than a large alpha
parameter.
The alpha parameter can be used to exclude categories that
have a very low probability of occurrence and will improve the construction
performance of the sampler. The effect on sampling performance depends on
the relative weights of the excluded categories; typically a high alpha
is used to exclude categories that would be visited with a very low probability
and the sampling performance is unchanged.
Implementation Note
This method creates a sampler with exact samples from the specified probability distribution. It is recommended to use this method:
If the weights are computed from empirical observations then it is
recommended to use the factory method
accepting frequencies
. This
requires the total number of observations to be representable as a long
integer.
Note that if all weights are scaled by a power of 2 to be integers, and
each integer can be represented as a positive 64-bit long value, then the
sampler created using this method will match the output from a sampler
created with the scaled weights converted to long values for the factory
method accepting frequencies
. This
assumes the sum of the integer values does not overflow.
It should be noted that the conversion of weights to rational numbers has
a performance overhead during construction (sampling performance is not
affected). This may be avoided by first converting them to integer values
that can be summed without overflow. For example by scaling values by
2^62 / sum
and converting to long by casting or rounding.
This approach may increase the efficiency of construction. The resulting sampler may no longer produce exact samples from the distribution. In particular any weights with a converted frequency of zero cannot be sampled.
rng
- Generator of uniformly distributed random numbers.weights
- Weights of the discrete distribution.alpha
- Alpha parameter.IllegalArgumentException
- if weights
is null or empty, a
weight is negative, infinite or NaN
, the sum of all weights is zero,
or the size of the discrete distribution generating tree is too large.of(UniformRandomProvider, long[])
Copyright © 2016–2022 The Apache Software Foundation. All rights reserved.