edu.usc.bg.base.generator
Class ZipfianGenerator

java.lang.Object
  extended by edu.usc.bg.base.generator.Generator
      extended by edu.usc.bg.base.generator.IntegerGenerator
          extended by edu.usc.bg.base.generator.ZipfianGenerator

public class ZipfianGenerator
extends IntegerGenerator

A generator of a zipfian distribution. It produces a sequence of items in which some items are more popular than others, according to a zipfian distribution. When you construct an instance of this class, you specify the number of items in the set to draw from, either by specifying an itemcount (so that the sequence is of items from 0 to itemcount-1) or by specifying a min and a max (so that the sequence is of items from min to max inclusive). After you construct the instance, you can change the number of items by calling nextInt(itemcount) or nextLong(itemcount).

Note that the popular items will be clustered together: item 0 is the most popular, item 1 the second most popular, and so on (or min is the most popular, min+1 the next most popular, etc.). If you don't want this clustering, and instead want the popular items scattered throughout the item space, use ScrambledZipfianGenerator instead.

Be aware: initializing this generator may take a long time if there are lots of items to choose from (e.g. over a minute for 100 million objects). This is because certain mathematical values need to be computed to properly generate a zipfian skew, and one of those values (zeta) is a sum over the sequence from 1 to n, where n is the itemcount. If you increase the number of items in the set, a new zeta can be computed incrementally, so it should be fast unless you have added millions of items. However, if you decrease the number of items, zeta is recomputed from scratch, so this can take a long time.

The algorithm used here is from "Quickly Generating Billion-Record Synthetic Databases", Jim Gray et al., SIGMOD 1994.
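
The sampling scheme described above can be sketched in a few lines. The following is a minimal, self-contained illustration of the approach from the Gray et al. paper, not the source of this class: the class name ZipfSketch, its fields, the seed parameter, and the value 0.99 used for the zipfian constant in the usage note are all assumptions made for the example. It assumes a zipfian constant strictly between 0 and 1.

```java
import java.util.Random;

/** Illustrative sketch only; names and structure are hypothetical. */
final class ZipfSketch {
    private final long min;      // smallest value returned
    private final long items;    // number of items, max - min + 1
    private final double theta;  // zipfian constant, assumed in (0, 1)
    private final double zetan;  // zeta over all items
    private final double alpha;
    private final double eta;
    private final Random random;

    ZipfSketch(long min, long max, double theta, long seed) {
        this.min = min;
        this.items = max - min + 1;
        this.theta = theta;
        // This is the expensive step: one term per item.
        this.zetan = zeta(items, theta);
        this.alpha = 1.0 / (1.0 - theta);
        double zeta2 = zeta(2, theta);
        this.eta = (1 - Math.pow(2.0 / items, 1 - theta)) / (1 - zeta2 / zetan);
        this.random = new Random(seed);
    }

    /** Sum of 1 / i^theta for i = 1..n. */
    static double zeta(long n, double theta) {
        double sum = 0.0;
        for (long i = 1; i <= n; i++) {
            sum += 1.0 / Math.pow(i, theta);
        }
        return sum;
    }

    /** Draw the next value; min is the most popular, then min + 1, and so on. */
    long next() {
        double u = random.nextDouble();
        double uz = u * zetan;
        if (uz < 1.0) {
            return min;                       // most popular item
        }
        if (uz < 1.0 + Math.pow(0.5, theta)) {
            return min + 1;                   // second most popular item
        }
        // Closed-form approximation for the remaining ranks.
        return min + (long) (items * Math.pow(eta * u - eta + 1, alpha));
    }
}
```

For instance, new ZipfSketch(0, 999, 0.99, 42L) would draw values from 0 to 999 with 0 returned most often. The loop inside zeta is what makes construction slow for very large item counts.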


Field Summary
static java.util.Vector<java.lang.Double> probabilities
           
static double ZIPFIAN_CONSTANT
           
 
Constructor Summary
ZipfianGenerator(long _items)
          Create a zipfian generator for the specified number of items.
ZipfianGenerator(long _items, double _zipfianconstant)
          Create a zipfian generator for the specified number of items using the specified zipfian constant.
ZipfianGenerator(long _min, long _max)
          Create a zipfian generator for items between min and max (inclusive).
ZipfianGenerator(long min, long max, double _zipfianconstant)
          Create a zipfian generator for items between min and max (inclusive) for the specified zipfian constant.
ZipfianGenerator(long min, long max, double _zipfianconstant, double _zetan)
          Create a zipfian generator for items between min and max (inclusive) for the specified zipfian constant, using the precomputed value of zeta.
 
Method Summary
static void main(java.lang.String[] args)
           
 double mean()
          Return the expected value (mean) of the values this generator will return.
 int nextInt()
          Return the next value, skewed by the Zipfian distribution.
 int nextInt(int itemcount)
          Generate the next item.
 long nextLong()
          Return the next value, skewed by the Zipfian distribution.
 long nextLong(long itemcount)
          Generate the next item as a long.
 
Methods inherited from class edu.usc.bg.base.generator.IntegerGenerator
lastInt, lastString, nextString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ZIPFIAN_CONSTANT

public static final double ZIPFIAN_CONSTANT
See Also:
Constant Field Values

probabilities

public static java.util.Vector<java.lang.Double> probabilities

Constructor Detail

ZipfianGenerator

public ZipfianGenerator(long _items)
Create a zipfian generator for the specified number of items.

Parameters:
_items - The number of items in the distribution.

ZipfianGenerator

public ZipfianGenerator(long _min,
                        long _max)
Create a zipfian generator for items between min and max (inclusive).

Parameters:
_min - The smallest integer to generate in the sequence.
_max - The largest integer to generate in the sequence.

ZipfianGenerator

public ZipfianGenerator(long _items,
                        double _zipfianconstant)
Create a zipfian generator for the specified number of items using the specified zipfian constant.

Parameters:
_items - The number of items in the distribution.
_zipfianconstant - The zipfian constant to use.

ZipfianGenerator

public ZipfianGenerator(long min,
                        long max,
                        double _zipfianconstant)
Create a zipfian generator for items between min and max (inclusive) for the specified zipfian constant.

Parameters:
min - The smallest integer to generate in the sequence.
max - The largest integer to generate in the sequence.
_zipfianconstant - The zipfian constant to use.

ZipfianGenerator

public ZipfianGenerator(long min,
                        long max,
                        double _zipfianconstant,
                        double _zetan)
Create a zipfian generator for items between min and max (inclusive) for the specified zipfian constant, using the precomputed value of zeta.

Parameters:
min - The smallest integer to generate in the sequence.
max - The largest integer to generate in the sequence.
_zipfianconstant - The zipfian constant to use.
_zetan - The precomputed zeta constant.
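
The value passed as _zetan can be written out explicitly. The formula below assumes the definition used in the Gray et al. paper cited in the class description; the symbols n, \theta and P are notation chosen for this note, not names from the API.

```latex
\zeta(n, \theta) = \sum_{i=1}^{n} \frac{1}{i^{\theta}},
\qquad n = \mathit{max} - \mathit{min} + 1,
\qquad P(\text{rank } i) = \frac{1}{i^{\theta}\,\zeta(n, \theta)}
```

Under this definition, a precomputed zeta is only meaningful for the same item count and zipfian constant it was computed with.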

Method Detail

nextInt

public int nextInt(int itemcount)
Generate the next item. This distribution is skewed toward lower integers; e.g. 0 will be the most popular, 1 the next most popular, etc.

Parameters:
itemcount - The number of items in the distribution.
Returns:
The next item in the sequence.

nextLong

public long nextLong(long itemcount)
Generate the next item as a long.

Parameters:
itemcount - The number of items in the distribution.
Returns:
The next item in the sequence.
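
The class description notes that growing the item count lets zeta be extended incrementally, while shrinking it forces a full recomputation. A minimal sketch of that bookkeeping follows. The class ZetaCache, the method zetaFor, and the field lastTermsAdded are hypothetical names invented for this example and are not part of this API.

```java
/** Illustrative sketch only; names are hypothetical. */
final class ZetaCache {
    private final double theta;  // zipfian constant
    private long count;          // number of items the cached sum covers
    private double zeta;         // sum of 1 / i^theta for i = 1..count
    long lastTermsAdded;         // terms computed by the most recent call

    ZetaCache(double theta) {
        this.theta = theta;
    }

    /** Return zeta for n items, reusing the cached sum when n has grown. */
    double zetaFor(long n) {
        if (n < count) {
            // Shrinking: the cached sum has too many terms, so start over.
            count = 0;
            zeta = 0.0;
        }
        lastTermsAdded = n - count;
        // Growing (or starting over): add only the missing terms.
        for (long i = count + 1; i <= n; i++) {
            zeta += 1.0 / Math.pow(i, theta);
        }
        count = n;
        return zeta;
    }
}
```

Going from 1000 to 1500 items costs 500 new terms in this sketch, whereas dropping back to 10 items recomputes all 10 terms from the beginning. The cost of shrinking therefore depends on the new item count, which is why it can be slow for large sets.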

nextInt

public int nextInt()
Return the next value, skewed by the Zipfian distribution. The 0th item will be the most popular, followed by the 1st, followed by the 2nd, etc. (Or, if min != 0, the min-th item is the most popular, the min+1th item the next most popular, etc.) If you want the popular items scattered throughout the item space, use ScrambledZipfianGenerator instead.

Specified by:
nextInt in class IntegerGenerator

nextLong

public long nextLong()
Return the next value, skewed by the Zipfian distribution. The 0th item will be the most popular, followed by the 1st, followed by the 2nd, etc. (Or, if min != 0, the min-th item is the most popular, the min+1th item the next most popular, etc.) If you want the popular items scattered throughout the item space, use ScrambledZipfianGenerator instead.
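
To illustrate what scattering the popular items means, the sketch below maps a clustered rank to a pseudo-random position in the item space by hashing it. This is an assumption about one way such scattering could be done, shown with a 64-bit FNV-1a hash; it is not confirmed to be how ScrambledZipfianGenerator works, and the names ScatterSketch, fnv1a64 and scatter are hypothetical.

```java
/** Illustrative sketch only; names are hypothetical. */
final class ScatterSketch {
    /** 64-bit FNV-1a hash over the eight bytes of the value. */
    static long fnv1a64(long value) {
        long hash = 0xCBF29CE484222325L;   // FNV offset basis
        for (int i = 0; i < 8; i++) {
            hash ^= (value & 0xFF);        // mix in the lowest byte
            hash *= 0x100000001B3L;        // multiply by the FNV prime
            value >>>= 8;                  // move to the next byte
        }
        return hash;
    }

    /** Map a zipfian rank (0 = most popular) to a value in [min, min + itemCount). */
    static long scatter(long rank, long min, long itemCount) {
        // floorMod keeps the result non-negative even when the hash is negative.
        return min + Math.floorMod(fnv1a64(rank), itemCount);
    }
}
```

With this mapping, ranks 0 and 1 no longer land on adjacent values by construction. A hash is not a permutation, so two ranks can map to the same item; that changes the exact per-item popularity slightly.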


main

public static void main(java.lang.String[] args)

mean

public double mean()
Description copied from class: IntegerGenerator
Return the expected value (mean) of the values this generator will return.

Specified by:
mean in class IntegerGenerator