Rating Data Stores

Figure 1. MongoDB's SoAR and Socialites Rating for a given workload

BG rates the performance of a system for processing interactive social networking actions by computing two values: Socialites and Social Action Rating (SoAR) using a pre-specified Service Level Agreement, SLA. An example SLA may require 95% of issued requests to observe a response time faster than 100 milliseconds with at most 0.01% observing unpredictable data for one hour. With such a criterion, BG computes two possible ratings for a system for a given workload:

  1. SoAR: Highest number of completed actions per second that satisfy the specified SLA.
  2. Socialites: Highest number of simultaneous threads that issue requests against the data store and satisfy the specified SLA. It quantifies the multithreading capability of the data store and whether it suffers from limitations, such as the convoy phenomenon, that diminish its throughput with a large number of simultaneous requests.
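The SLA check described above can be sketched in a few lines of Python. This is only an illustration of the definition, not BG's code: the `SLA` class, the `satisfies_sla` function and their field names are hypothetical, and the duration term of the SLA (one hour in the example) is assumed to be handled by whoever runs the experiment.

```python
from dataclasses import dataclass


@dataclass
class SLA:
    """Hypothetical container for the SLA terms described in the text."""
    alpha: float  # minimum percentage of requests that must be faster than beta
    beta: float   # response-time bound, in seconds
    tau: float    # maximum tolerated percentage of unpredictable data


def satisfies_sla(response_times, unpredictable_count, sla):
    """Return True if the observations of one experiment meet the SLA.

    response_times: per-request response times, in seconds.
    unpredictable_count: number of requests that observed unpredictable data.
    """
    total = len(response_times)
    if total == 0:
        return False  # nothing was measured, so nothing can be claimed
    fast = sum(1 for rt in response_times if rt < sla.beta)
    pct_fast = 100.0 * fast / total
    pct_unpredictable = 100.0 * unpredictable_count / total
    return pct_fast >= sla.alpha and pct_unpredictable <= sla.tau


# The example SLA from the text: 95% faster than 100 ms, at most 0.01% unpredictable.
example_sla = SLA(alpha=95.0, beta=0.100, tau=0.01)
```

An experiment passes only when both the response-time term and the unpredictable-data term hold.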

Given several systems and depending on the application, the one with the higher SoAR and Socialites ratings is the better system. Figure 1 shows MongoDB's SoAR and Socialites rating for a given workload. Its Socialites rating is around 1025 threads and its SoAR rating is around 36,000 actions/second, which is observed with about 260 simultaneous threads issuing requests against MongoDB.
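To make the two definitions concrete, the sketch below derives both ratings from a list of already-completed experiments. It assumes each experiment has been reduced to a hypothetical tuple of (thread count, actions per second, whether the SLA was satisfied); the function name and the tuple layout are illustrative and not part of BG.

```python
def compute_ratings(experiments):
    """Return (soar, socialites) from experiments that satisfied the SLA.

    experiments: iterable of (threads, actions_per_second, sla_ok) tuples.
    Returns (None, None) when no experiment satisfied the SLA.
    """
    passing = [(threads, tput) for threads, tput, ok in experiments if ok]
    if not passing:
        return None, None
    # SoAR: highest throughput among SLA-satisfying experiments.
    soar = max(tput for _, tput in passing)
    # Socialites: highest thread count among SLA-satisfying experiments.
    socialites = max(threads for threads, _ in passing)
    return soar, socialites
```

The two ratings generally come from different experiments: the thread count that maximizes throughput is often well below the largest thread count that still satisfies the SLA.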

The key advantage of these ratings is that they reduce the performance of a system to two numbers, simplifying communication of results, allowing definition of clear performance objectives and enabling comparative studies. BG rates a data store by imposing an increasing amount of load starting from a low load to a high load (T), emulating a mix of actions against the data store. It computes the percentage of actions (α confidence value) that observe a response time faster than Β and provides insights into the system behavior.

Figure 2. BG's rating architecture

These ratings can be used to identify the performance limits of a data store, to compare the performance of different data stores with one another, or both.

Rating Architecture

BG utilizes two software components to rate a data store: multiple BG Clients (BGClient) and one BGCoordinator (BGCoord). BGCoord issues commands to BGClients to either create BG's schema, construct a database and load it, or generate a workload for the data store. A BGListener on each BGClient node facilitates communication between BGCoord and its spawned BGClient. One may host multiple BGListeners on different ports on a node. A configuration file informs the BGCoord of the different BGListeners and their ports.

The BGCoord employs heuristic search techniques that conduct experiments, each with a fixed number of threads T, using the target data store to compute the SoAR and Socialites rating for a data store for a given workload. These threads are spread across the N BGClients. At the end of each experiment, each BGClient reports its observed number of unpredictable reads, and the percentage of requests that observed a response time equal to or faster than that required by the SLA, Β.
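The text does not say how the T threads are divided among the N BGClients. The sketch below shows one plausible policy, an even split with the remainder going to the first clients; this policy and the function name `split_threads` are assumptions made for illustration.

```python
def split_threads(total_threads, num_clients):
    """Spread total_threads across num_clients as evenly as possible.

    Assumed policy (not confirmed by the source): every client gets
    floor(T / N) threads and the first T mod N clients get one extra.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    if total_threads < 0:
        raise ValueError("total_threads must be non-negative")
    base, extra = divmod(total_threads, num_clients)
    return [base + 1 if i < extra else base for i in range(num_clients)]
```

Whatever the actual policy is, the per-client counts must add up to T so that the experiment imposes the intended load.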

BGCoord employs heuristic search to expedite the rating of a data store by conducting fewer experiments than an exhaustive search. These techniques make the following three assumptions about the behavior of a data store as a function of T:

  1. Throughput of a data store is either a square root function or a concave inverse parabola of the number of threads, see Figure 3.a.
  2. Average response time of a workload either remains constant or increases as a function of the number of threads, see Figure 3.b.
  3. Percentage of stale data produced by a data store either remains constant or increases as a function of the number of threads, see Figure 3.c.

These are reasonable assumptions that hold true in most cases. The heuristic used for computing the Socialites rating is guaranteed to compute the correct Socialites rating for a data store, whereas the SoAR heuristic computes the SoAR of a system with a ±10% margin of error.
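The source does not spell out the heuristics themselves. As an illustration of why assumptions 2 and 3 help, the sketch below uses a generic technique, doubling followed by binary search, which is not claimed to be BG's algorithm. If response time and staleness never improve as T grows, then once the SLA fails it keeps failing, so the largest passing T can be found with a logarithmic number of experiments. The callback `passes_sla` is hypothetical and stands in for running one full experiment with T threads.

```python
def find_socialites(passes_sla, max_threads=1 << 20):
    """Return the largest T in [1, max_threads] with passes_sla(T) true, or 0.

    Assumes monotonicity: if passes_sla(T) is false, it is false for all
    larger T. Each call to passes_sla represents one rating experiment.
    """
    if not passes_sla(1):
        return 0
    low, high = 1, 2  # low is known to pass
    # Phase 1: double T until the SLA fails or the cap is exceeded.
    while high <= max_threads and passes_sla(high):
        low = high
        high *= 2
    if high > max_threads:
        high = max_threads + 1  # sentinel treated as failing
    # Phase 2: binary search between the last pass and the first fail.
    while high - low > 1:
        mid = (low + high) // 2
        if passes_sla(mid):
            low = mid
        else:
            high = mid
    return low
```

Under the monotonicity assumption this search returns the exact boundary. Throughput is not monotonic in T (assumption 1 describes a curve that may peak and decline), so locating SoAR needs a different search, which is consistent with the text quoting a margin of error for SoAR only.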

Figure 3.a Throughput as a function of T
Figure 3.b RT as a function of T
Figure 3.c Amount of stale data as a function of T

Rating Runtime Parameters

The BGCoord takes as input a configuration file, ConfigFile, containing all the parameters required for rating a data store.

The ConfigFile contains two sets of parameters. The first set consists of parameters that are used by the BGCoord to compute the ratings for a data store. These parameters are detailed in the table below. The second set consists of parameters that are required by each BGClient to emulate a workload against a data store. These parameters are available here.

All parameters are specified as keyname=value pairs in the configuration file given as input to the BGCoord.
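For readers who want to inspect or pre-check a ConfigFile, a keyname=value file can be read with a few lines of Python. This is a minimal sketch and not BGCoord's own parser: the function name is hypothetical, and skipping blank lines and lines starting with `#` is an assumption about the file format.

```python
def parse_config(text):
    """Parse keyname=value lines into a dict of strings.

    Assumptions (not confirmed by the source): blank lines and lines
    starting with '#' are ignored; the first '=' separates key and value.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected keyname=value, got: {raw!r}")
        params[key.strip()] = value.strip()
    return params


# Illustrative values only; the key names come from the Rating Parameters table.
sample = """
numclients=2
ratingtype=1
threadcount=1
"""
print(parse_config(sample))
```

Values are kept as strings so that type checks can be applied separately for each parameter.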

Rating Parameters

Some of these parameters are optional. The optional parameters are highlighted in green.

numclients An Integer

Identifies the number of BGClients used to load and benchmark a data store.

ratingtype 0 or 1

0 for Socialites rating, 1 for SoAR rating.

loadbetweenrounds true/false

If set to true, the database is reconstructed between experiment rounds; otherwise, it is constructed once before the first round of experiments.

bgmode onetime/repeated

If set to onetime, the BGClients are restarted between rounds of experiments; otherwise, the BGClients are started only once and BGCoord exchanges messages with them.

With workloads involving updates, use one of the following combinations:

  1. loadbetweenrounds=true    bgmode=onetime
  2. loadbetweenrounds=false   bgmode=repeated
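The two permitted combinations can be checked before starting a long rating run. The helper below is a hypothetical pre-flight check written for illustration; BGCoord is not known to expose such a function.

```python
def update_combo_ok(loadbetweenrounds, bgmode):
    """Return True if the pair is one of the two combinations listed above.

    Both arguments are the raw string values from the ConfigFile.
    """
    allowed = {("true", "onetime"), ("false", "repeated")}
    pair = (loadbetweenrounds.strip().lower(), bgmode.strip().lower())
    return pair in allowed
```

For example, `update_combo_ok("true", "repeated")` returns False, flagging a pairing that the text does not list for update workloads.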

datastore String

Name of the data store client class.

exe String

This string identifies the execution command for each BGClient process spawned by the BGCoord. It has the following format:

java -cp $CLASSPATH$ edu.usc.bg.BGMainClass where the $CLASSPATH$ should be replaced with the path containing bg.jar and its required libraries.

Example: java -Xmx1G -cp C:/BG/build/bg.jar;C:/BG/db/mongodb/lib/* edu.usc.bg.BGMainClass

BGCoord assumes the path for BG's executable is the same for all the BGClients.
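The exe value is an ordinary java command line, so it can be assembled from its parts. The helper below is a hypothetical convenience for writing the ConfigFile, not part of BG. Note that the classpath separator is `;` on Windows (as in the example above) and `:` on Unix-like systems.

```python
def build_exe(classpath_entries, separator=";", jvm_options=()):
    """Assemble the exe string: java [options] -cp <classpath> edu.usc.bg.BGMainClass."""
    classpath = separator.join(classpath_entries)
    parts = ["java", *jvm_options, "-cp", classpath, "edu.usc.bg.BGMainClass"]
    return " ".join(parts)


# Reproduces the example given in the text.
print(build_exe(["C:/BG/build/bg.jar", "C:/BG/db/mongodb/lib/*"], jvm_options=["-Xmx1G"]))
```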

loadworkloadfile String

File containing the parameters of the social graph, such as usercount. The full path for the file can be specified.

workloadfile String

File containing the mix of actions emulated against the data store. The full path for the file can be specified.

threadcount Integer

The number of threads issuing actions against the data store in the first round of the heuristic search technique.

This value should be assigned carefully. With values larger than 1, depending on the data store and the mix of actions issued against it, the heuristic search technique may be misguided.

ratingunit Integer

Identifies the duration (in seconds) of each of the rating experiments conducted by the BGCoord's heuristic.

finalexecutiontime Integer

Identifies the duration (in seconds) of the final round of experiments, as specified by the SLA.

expectedlatency Double

Identifies the average response time (in seconds) for requests specified in the SLA requirement (Β).

expectedconfidence Double between 0 and 100

Identifies the percentage of actions (α) that should observe a response time lower than that specified by SLA (Β).

expectedstaleness Double between 0 and 100

Identifies the maximum acceptable percentage of unpredictable data (Τ) produced by a data store given by SLA.

numloadthreads Integer

Number of threads used by each BGClient to load the data store.

monitor Integer, multiple of 10

Identifies the monitoring interval (in seconds) for the behavior of a data store. Once set to t, statistics about the data store's behavior are output to the console every t seconds.

This value should be less than the duration of each rating experiment.
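Several of the constraints stated in the table are easy to verify mechanically. The sketch below checks three of them on an already-parsed parameter dictionary; the function name is hypothetical, and a check is skipped when its parameter is absent because the source does not make clear which parameters are optional.

```python
def check_rating_params(params):
    """Return a list of problems found; an empty list means none were found.

    params: dict mapping parameter names to their raw string values.
    """
    problems = []
    # Percentages must lie between 0 and 100.
    for key in ("expectedconfidence", "expectedstaleness"):
        if key in params:
            value = float(params[key])
            if not 0.0 <= value <= 100.0:
                problems.append(f"{key} must be between 0 and 100, got {value}")
    if "monitor" in params:
        monitor = int(params["monitor"])
        # monitor must be a multiple of 10.
        if monitor % 10 != 0:
            problems.append(f"monitor must be a multiple of 10, got {monitor}")
        # monitor must be shorter than each rating experiment (ratingunit).
        if "ratingunit" in params and monitor >= int(params["ratingunit"]):
            problems.append("monitor must be less than ratingunit")
    return problems
```

Running such a check first avoids discovering a misconfigured value only after the database has been loaded.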

In addition to the parameters provided in the table, BGCoord also needs to be aware of the nodes running BGClients and the ports their listeners are listening on. For this purpose, the following should be added to the end of the ConfigFile for each BGClient.


Here, ip is the IP address of the node hosting the BGClient, port is the port on which the BGListener for the BGClient is running, and usercount is the number of users in the social graph, which is the same for all the BGClients. These parameters should be specified in the same order as shown in the example.

One may run multiple BGClients on the same node by starting a separate BGListener process for each of them. These BGListeners should run on different ports.

For every BGListener, a configuration file is provided as input. This file contains the following:
which identifies which port the listener is running on.

Rating Output

The BGCoordinator produces two output files during its execution.
  1. FinalResultDataStoreX.txt: Maintains the overall results for each rating experiment as well as the final computed rating. Every line in this file is a comma-separated string consisting of currentTime, time, Objective, SLA Latency, SLA Confidence, numClients, ThreadCount, Throughput, ActThroughput, totalStaleness, TotalClientsSucceeded
    • currentTime: Shows the rating experiment's start time. This does not include the time it takes to load the data store between different rounds of the experiment.
    • time: The duration of each rating experiment. This includes the duration of the warmup phase as well as the validation phase.
    • Objective: Identifies whether a SoAR or a Socialites rating is being conducted.
    • SLA Latency: The average response time specified by the SLA.
    • SLA Confidence: The percentage of actions observing a response time below SLA latency, specified by the SLA.
    • numClients: Number of BGClients used to rate the data store.
    • ThreadCount: Total number of threads issuing requests against the data store.
    • Throughput: Observed throughput in terms of sessions/second. This includes the sessions that result in "noOps". A noOp occurs when a selected action cannot be issued against the data store. For example, user A may be selected by BG to accept a friend request, but this user may not have any pending friend requests, which results in a noOp.
    • ActThroughput: Observed throughput in terms of actions/second.
    • totalStaleness: Average percentage of stale data observed by a data store across all BGClients.
    • TotalClientsSucceeded: Number of BGClients satisfying the SLA requirement for an experiment.

  2. Results.txt: Shows details about each rating experiment for each of the BGClient nodes. This includes the load imposed by each BGClient in terms of number of threads, the staleness and the throughput (actions/second) observed, and the percentage of actions that satisfy the average response time specified by the SLA for each BGClient.
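Since each experiment line in FinalResultDataStoreX.txt lists the eleven fields named above in a fixed order, it can be turned into a labeled record for further analysis. The sketch below is illustrative: the function name is hypothetical, values are kept as strings because the source does not specify their formats, and it assumes no field contains a comma. Lines with a different number of fields, such as a final-rating line if it is formatted differently, raise an error.

```python
FIELDS = [
    "currentTime", "time", "Objective", "SLA Latency", "SLA Confidence",
    "numClients", "ThreadCount", "Throughput", "ActThroughput",
    "totalStaleness", "TotalClientsSucceeded",
]


def parse_result_line(line):
    """Map one comma-separated experiment line to a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# A made-up line for illustration only; it is not real BG output.
record = parse_result_line("t0,180,SoAR,0.1,95,2,16,1200.5,1100.0,0.0,2")
print(record["ThreadCount"], record["ActThroughput"])
```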