To evaluate a data store using BG, one must follow these eight steps:
The first six steps require development of software that implements BGClient. The last two consist of running BG to gather performance numbers from the target data store. Step 8 involves the use of a coordinator that manages BGClients and aggregates their obtained statistics.
We use the following terminology in this manual:
An evaluation consists of a target data store, TestDS, and an implementation of its client interface, TestDSClient. An implementation of TestDSClient realizes the BGClient for TestDS. TestDS is the focus of the evaluation, with the objective of establishing its SoAR and Socialites ratings. BG's ratings are valid as long as the BGClients do not become a bottleneck. This means the CPU, network, and disk of a node hosting a BGClient should not be fully utilized. Otherwise, one must increase the number of nodes used to host BGClients and repeat the rating experiment.
BG consists of 3 components:
BGCoord, a coordinator that computes the SoAR and Socialites rating of a data store by implementing both an exhaustive and a heuristic search technique. Its inputs include the number of BGClients to use for a rating and the tolerable Service Level Agreement (SLA): the percentage of requests (α) that must observe a response time lower than β for Δ time units, with no more than τ% of requests observing unpredictable data.
During the load phase, BGCoord computes the portion of data that must be loaded by each BGClient. During the experimentation phase, BGCoord computes the fraction of workload that should be issued by each BGClient. BGCoord launches the BGClients, communicates with each BGClient to provide it with the necessary input to get it started, and monitors their progress periodically. At the end of an experiment, BGCoord aggregates the results produced by the different BGClients, computes either the SoAR or Socialites ratings for TestDS, and reports them to BG's visualization deck for display.
BGClient, an implementation of BG's workload generator that embodies TestDSClient. TestDSClient is specific to TestDS and must implement BG's schema (createSchema method), populate the schema with data (insertEntity method), and implement those social actions that constitute the focus of an evaluation.
Either BGCoord or an evaluator may employ the participating BGClients to perform three tasks. First, to populate the target data store with the specified database characteristics (number of members, friends per member, resources per member). Second, to impose work on TestDS using the specified workload, which is provided either as part of an input file or as input arguments when launching BGClient (the latter is used when BGCoord launches BGClient). Third, to quantify TestDS's observed throughput, average response time and its confidence, and the amount of unpredictable data. These key metrics are written to the output once the experiment ends. If the BGClient is rating the target data store, it transmits the key metrics to BGCoord both periodically and at the end of the experiment.
BG Visualization deck enables a user to specify parameter settings for BGCoord and the BGClients, and start and monitor an experiment. It then displays the aggregate key metrics computed by BGCoord in collaboration with BGClient(s).
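The SLA that BGCoord takes as input can be illustrated with a minimal sketch. It assumes the response times observed during the Δ window are available in milliseconds and that the number of requests observing unpredictable data is known. The class and method names are hypothetical and are not part of BG.

```java
/** Hypothetical helper (not part of BG) illustrating the SLA test. */
final class SlaCheck {
    /**
     * @param responseTimesMs response times observed during the Δ window
     * @param unpredictable   number of requests that observed unpredictable data
     * @param alphaPercent    α: minimum percentage of requests faster than β
     * @param betaMs          β: response time bound in milliseconds
     * @param tauPercent      τ: maximum tolerable percentage of unpredictable requests
     */
    static boolean satisfies(long[] responseTimesMs, long unpredictable,
                             double alphaPercent, long betaMs, double tauPercent) {
        if (responseTimesMs.length == 0) {
            return false; // assumption: an empty window does not satisfy the SLA
        }
        long fast = 0;
        for (long t : responseTimesMs) {
            if (t < betaMs) {
                fast++;
            }
        }
        double fastPercent = 100.0 * fast / responseTimesMs.length;
        double unpredictablePercent = 100.0 * unpredictable / responseTimesMs.length;
        return fastPercent >= alphaPercent && unpredictablePercent <= tauPercent;
    }
}
```

For example, with α=75, β=100 ms and τ=1, a window with response times of 10, 20, 30 and 200 ms and no unpredictable data satisfies the SLA, because three of the four requests are faster than β.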
TestDSClient is a software component of BGClient. It consists of a set of interfaces that must be implemented by a programmer for the target data store, TestDS. It converts the workload generated by a BGClient to calls issued to TestDS.
We strongly encourage an evaluator to debug and test the implementation of TestDSClient prior to rating TestDS. It is a mistake to rate a data store using an erroneous implementation of TestDSClient interfaces. A developer may utilize the command line interface of BG to invoke each BG action (implemented by TestDSClient) and display the data provided by TestDS. Prior to using the command line interface, one must create the data store and populate it with data, see the next two sections.
To get started, download BG and unzip the folder to obtain its Java source code. Create a folder named TestDS inside BG/db. Create two folders named lib and src inside BG/db/TestDS. The lib folder will contain the .jar files specific to TestDS (typically its client component, e.g., the JDBC driver of a SQL system) and the src folder will contain the new TestDSClient.java file. The TestDSClient class must extend the edu.usc.bg.base.DB class and provide an implementation for all the abstract methods in that class.
Below, we describe the 8 steps in turn.
BG requires a pre-specified conceptual schema that must be implemented by the createSchema method of TestDSClient. This schema consists of three entity sets: users, resources and manipulations. The attributes for each entity set are as follows and MUST be implemented by TestDS, because BG actions retrieve the values of these attributes when computing the amount of unpredictable data produced by TestDS. The resulting schema might be implicit or explicit. With SQL systems, the schema is created explicitly. With some NoSQL systems such as RavenDB, the schema might be implicit and is realized when documents are created (during the load phase).
Note: One may configure BG to not have profile images for the users, using -p insertimage=false. In this case, both the "pic" and "tpic" attributes are redundant and can be skipped.
The schema must also capture the concept of friendship, where users A and B are friends if and only if user A is friends with user B and user B is friends with user A. A friendship between two users can only be created by one user initiating a friend request to the other user and the second user accepting it.
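The friendship rule above can be illustrated with a small in-memory model. This is only a sketch of the semantics the schema must capture, not a data store implementation; the class and its methods are hypothetical and are not part of BG.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model (not part of BG) of pending and confirmed friendships. */
final class FriendshipModel {
    private final Map<Integer, Set<Integer>> pending = new HashMap<>(); // invitee -> inviters
    private final Map<Integer, Set<Integer>> friends = new HashMap<>(); // user -> friends

    /** The inviter sends a request; the relationship is pending and one-directional. */
    void invite(int inviter, int invitee) {
        pending.computeIfAbsent(invitee, k -> new HashSet<>()).add(inviter);
    }

    /** The invitee accepts; the pending entry becomes a symmetric friendship. */
    boolean accept(int inviter, int invitee) {
        Set<Integer> inviters = pending.get(invitee);
        if (inviters == null || !inviters.remove(inviter)) {
            return false; // no such pending request, so no friendship is created
        }
        friends.computeIfAbsent(inviter, k -> new HashSet<>()).add(invitee);
        friends.computeIfAbsent(invitee, k -> new HashSet<>()).add(inviter);
        return true;
    }

    /** A and B are friends only if each appears in the other's friend set. */
    boolean areFriends(int a, int b) {
        return friends.getOrDefault(a, Collections.emptySet()).contains(b)
            && friends.getOrDefault(b, Collections.emptySet()).contains(a);
    }
}
```

In this model an invitation alone never makes two users friends; only a matching accept does, and it records the relationship in both directions.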
In the second step, fine-tune the schema of Step 1 by designing an implementation of BG's actions. This process helps refine the schema. The actions of TestDSClient implement these designs, see Step 6. A design must consider the information BG requires from each action. With read-only actions (e.g., View Profile), BG uses the returned results to detect whether the data store has produced unpredictable data. If an action fails to retrieve the correct results, BG penalizes the data store by counting the incorrect data as unpredictable data. Below, we describe the data that must be retrieved by each read-only action:
viewProfile must provide the following information for the specified userid:
In the absence of the first 3 fields, BG raises exceptions. Ensure the spelling of the key is consistent with the provided specification.
listFriends must provide the list of friends for a user. The result must be populated with the "userid" for every friend and all the other user attributes and their values for each friend. If profiles are configured with images, the thumbnail image for each friend must be returned to BG. Ensure the spelling of the key and its case is consistent with the provided specification for attribute names, e.g., userid may not be USERID.
viewFriendReq must provide the list of pending friendships for a user. For every invitation, the "userid" of the inviter and all other attributes of the inviter's profile should be inserted in the result hashmap. If the users are provided with images, the thumbnail image for each inviter should be retrieved from the data store and returned to BG. Ensure the spelling of the key is consistent with the provided specification.
viewTopKResources returns the top K resources posted on a user's profile. A hashmap is returned for every resource. This hashmap contains the resource id ("rid"), the "walluserid" (the unique identifier of the profile the resource has been created on) as well as all other resource attributes. Ensure the spelling of the keys is consistent with the provided specification.
viewCommentOnResource returns the manipulations/comments posted on a resource. A hashmap is returned for every manipulation. This hashmap contains the manipulation attributes as well as their values.
getInitialStats returns the initial statistics of the database, which contain the user count ("usercount"), average number of friends per user ("avgfriendsperuser"), average number of pending requests per user ("avgpendingperuser"), and average number of resources per user ("resourcesperuser"). Ensure the spelling of the keys is consistent with the provided specification.
getCreatedResources returns the unique identifier of the resources ("rid") that are created by a user. Ensure that the spelling of the key for each resource is consistent with the provided specifications.
queryPendingFriendshipIds returns the unique identifier of the users who have generated friend requests for a specific user. The information returned by this method is needed both for the correct execution of the BG benchmark and for computing the amount of unpredictable data.
queryConfirmedFriendshipIds returns the unique identifier of the users who are friends with a specific user. The information returned by this method is needed both for the correct execution of the BG benchmark and for computing the amount of unpredictable data.
Fine-tune the schema of Step 1 to ensure it can produce the output required by each BG action.
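Because BG looks up result keys by their exact spelling and case, a design can be checked early with a small helper such as the sketch below. It is a hypothetical utility, not part of BG, and it uses String values for simplicity; the value type used by the actual BG interfaces may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of BG) that reports keys missing from a result map. */
final class ResultKeyCheck {
    /** Returns the required keys that are absent; the comparison is case-sensitive. */
    static List<String> missingKeys(Map<String, String> result, String... requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            if (!result.containsKey(key)) {
                missing.add(key); // "USERID" does not satisfy a requirement for "userid"
            }
        }
        return missing;
    }
}
```

For instance, a viewTopKResources result could be checked with missingKeys(row, "rid", "walluserid") before it is returned to BG.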
Similar to YCSB, BG provides an "init" interface in the TestDSClient that a developer utilizes to initialize the client component of a data store.
Any code related to initializing the data store and creating a connection to the data store must be written inside the init method. The init() method is called for every BG thread. Hence, with T threads, each thread initializes an instance of the data store client separately and may create a separate connection to the data store. One may utilize synchronization primitives to implement concepts such as having the first thread (racing with the other T-1 threads) open the connection to TestDS.
The developer implements the "cleanup" interface of TestDSClient to clean up communication between the TestDS client and server components. BGClient invokes "cleanup" once it has completed execution. With T threads, each thread invokes the cleanup method. Using synchronization primitives, one may implement the concept of the last thread (executing concurrently with the other T-1 threads) shutting down TestDS in cleanup, see below.
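One possible way to realize the first-thread and last-thread concepts is a shared counter guarded by a lock, as in the sketch below. It is a hypothetical, stand-alone illustration: a real TestDSClient would extend edu.usc.bg.base.DB and place this logic in its own init and cleanup methods, and the boolean flag stands in for actually opening or closing a connection.

```java
/** Hypothetical sketch (not part of BG) of first-thread open and last-thread shutdown. */
final class SharedConnectionLifecycle {
    private static final Object LOCK = new Object();
    private static int activeThreads = 0;
    private static boolean open = false;

    /** Called once per thread; only the first caller opens the shared resource. */
    static void init() {
        synchronized (LOCK) {
            if (activeThreads == 0) {
                open = true; // placeholder for opening the connection to TestDS
            }
            activeThreads++;
        }
    }

    /** Called once per thread; only the last caller shuts the shared resource down. */
    static void cleanup() {
        synchronized (LOCK) {
            activeThreads--;
            if (activeThreads == 0) {
                open = false; // placeholder for closing the connection or stopping TestDS
            }
        }
    }

    static boolean isOpen() {
        synchronized (LOCK) {
            return open;
        }
    }
}
```

This sketch assumes that every thread calls init before any thread calls cleanup. If a thread could finish before another one starts, the counter would reach zero early and the shared resource would be closed and reopened.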
One implements the conceptual schema of BG by implementing the body of "createSchema" method in TestDSClient. Index structures are specified by implementing the "buildIndexes" method of TestDSClient. One should implement "buildIndexes" (instead of specifying indexes as a part of "createSchema") if it is faster to load TestDS without index structures. This is typically true of SQL systems. With load, an option forces BG to invoke the provided "buildIndexes" method once loading of the database is complete.
Both the createSchema and buildIndexes methods receive their input properties as arguments.
Once Steps 3 and 4 are completed, assuming all the external libraries specific to TestDS have already been added to the project, one may construct the schema by invoking the edu.usc.bg.base.BGMainClass with the following argument:
onetime -schema -db TestDS.TestDSClient
All other data store parameters, such as the connection URL and driver, can be added as properties by appending -p to the list of arguments as follows:
onetime -schema -db TestDS.TestDSClient -p TestDS.url=10.0.0.1:8767
If -p insertimage=true is added to the arguments, then the developer may implement createSchema to create the necessary attributes to support a profile image and a thumbnail image for each user entity.
One populates the schema of Step 4 by implementing the insertEntity, createFriendship, and inviteFriend methods of TestDSClient. Below, we describe these methods in detail followed by a description of the input parameter values required to load TestDS.
The arguments of the insertEntity method include the name of a target entity set, its primary key, a hash table containing attributes of the specified entity set and their values, whether the schema supports an image for the users, and the size of the profile image.
The target entity set has two possible values: users and resources. BG has made the arbitrary decision that only these two entity sets can be populated during the loading phase. Comments on resources can be created only during Steps 7 and 8 when evaluating a data store. One may configure BG with different profile image sizes: imagesize=1 corresponds to an image of size 1KB, imagesize=2 corresponds to an image of size 2KB and so on. An image of size imagesize=2 is always used as the thumbnail image.
The arguments of the createFriendship method include the unique identifiers of two users. This method populates the schema by creating a friendship relationship between the two identified users.
The arguments of the inviteFriend method include the unique identifier of the user generating a friend request and the unique identifier of the user receiving the friend request. This method populates the schema by creating pending friendship relationships between users.
In addition to these three methods, the developer must implement the getInitialStats method. This method is responsible for querying the data store statistics such as user count, number of friends per user, number of pending friendship requests per user and number of resources created by each user, populating a hashmap with the appropriate tuples ("usercount", "avgfriendsperuser", "avgpendingperuser", "resourcesperuser") and returning it to BG. A lack of these attributes in the specified format causes BG to raise exceptions.
The getInitialStats method is called both in the load and in the experimental phase. For the load phase it is called once the loading of data is completed and its returned value allows the developer to verify if the load phase succeeded. In the experimental phase it is called before the benchmark execution and its return value is used for validation and quantifying the amount of unpredictable data. For more information on quantifying unpredictable data please refer to Unpredictable data and validation.
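The shape of the hashmap returned by getInitialStats can be sketched as follows. The key names come from the description above; everything else is an assumption for illustration: the helper is hypothetical, it takes totals that a real client would obtain by querying TestDS, and the integer-valued strings are a simplification because the exact value format BG expects is not specified here.

```java
import java.util.HashMap;

/** Hypothetical helper (not part of BG) that assembles the initial statistics. */
final class InitialStatsSketch {
    /**
     * @param userCount       number of users in the data store
     * @param friendLinks     sum over all users of their confirmed friend counts
     * @param pendingRequests sum over all users of their pending friend requests
     * @param resources       total number of resources created by all users
     */
    static HashMap<String, String> buildStats(long userCount, long friendLinks,
                                              long pendingRequests, long resources) {
        HashMap<String, String> stats = new HashMap<>();
        long divisor = Math.max(userCount, 1); // avoid division by zero on an empty store
        stats.put("usercount", String.valueOf(userCount));
        stats.put("avgfriendsperuser", String.valueOf(friendLinks / divisor));
        stats.put("avgpendingperuser", String.valueOf(pendingRequests / divisor));
        stats.put("resourcesperuser", String.valueOf(resources / divisor));
        return stats;
    }
}
```

After a load, comparing these values with the usercount, friendcountperuser and resourcecountperuser settings gives a quick check that the load phase succeeded.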
The populate code can be tested using the onetime -load or onetime -loadindex command along with the load parameters. One may specify these either in a file that is provided as input to BG using -P filename or as runtime properties using -p. If a parameter is specified both as a property within the command line arguments and in the file, BG uses the value provided in the command line. If the same parameter has been set multiple times in the command line arguments, BG uses the last specified value.
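The precedence rules above can be sketched with java.util.Properties: file values are loaded first, and command line values are applied in order so that later ones overwrite earlier ones. This is an illustration of the described behavior, not BG's actual argument parser; the class and method names are hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch (not BG's parser) of the parameter precedence rules. */
final class ParamPrecedence {
    /**
     * @param fileProps properties read from the file given with -P
     * @param cliProps  "key=value" strings given with -p, in command line order
     */
    static Properties merge(Properties fileProps, String... cliProps) {
        Properties merged = new Properties();
        merged.putAll(fileProps); // lowest precedence: values from the file
        for (String kv : cliProps) {
            int eq = kv.indexOf('=');
            if (eq < 0) {
                continue; // this sketch simply ignores malformed entries
            }
            // Later -p entries overwrite earlier ones and any file value.
            merged.setProperty(kv.substring(0, eq), kv.substring(eq + 1));
        }
        return merged;
    }
}
```

With usercount=1000 in the file and -p usercount=500 -p usercount=200 on the command line, the merged value of usercount is 200.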
The core parameters used for the load phase include the number of users (usercount), number of friends per user (friendcountperuser), number of resources created by each user (resourcecountperuser), whether an image will be inserted for users (insertimage), the size of the image (imagesize), the path of the image being inserted, if any (imagepath), the percentage of confirmed friendships (confperc), and the data store properties.
The friendcountperuser should be an even number. If its value is set to an odd number BG will automatically convert it to the highest even number lower than the odd number specified.
The confperc parameter specifies the percentage of confirmed friendships per user, expressed as a value between 0 and 1. If set to 1, the load phase creates friendcountperuser friends per user using the createFriendship method. If set to 0, the load phase creates only pending friendships using the inviteFriend method. Because pending relationships are not symmetric, friendcountperuser/2 friend requests are generated per user.
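The arithmetic described for friendcountperuser and for confperc=0 can be written down as a short sketch. The helper below is hypothetical and only restates the rules above; it is not BG code.

```java
/** Hypothetical helper (not part of BG) restating the load-phase friend count rules. */
final class LoadCounts {
    /** An odd friendcountperuser is lowered to the nearest even number below it. */
    static int effectiveFriendCount(int friendCountPerUser) {
        return friendCountPerUser % 2 == 0 ? friendCountPerUser : friendCountPerUser - 1;
    }

    /** With confperc=0, each user generates friendcountperuser/2 friend requests. */
    static int pendingRequestsPerUser(int friendCountPerUser) {
        return effectiveFriendCount(friendCountPerUser) / 2;
    }
}
```

For example, friendcountperuser=11 is treated as 10, which yields 5 friend requests per user when confperc is 0.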
Depending on the data store, using multiple threads may enhance the performance of the load phase. It is important to conduct small-scale experiments with the data store to determine whether multiple threads reduce the load time. The number of threads can be controlled using the threadcount parameter as follows: -p threadcount=10. Conceptually, BG uses multiple threads by dividing the number of users by the number of threads. Next, it sets up each collection of users with the specified number of friends per user. The number of users in each collection must exceed the number of friends per user. In other words, the number of friends per user should always be lower than the number of users divided by the number of BG threads; otherwise, the load fails because it would be impossible to create the appropriate friendships in each collection.
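The constraint on threadcount can be checked before starting a load with a sketch such as the one below. The helper is hypothetical and not part of BG, and the use of integer division for the number of users per thread is an assumption.

```java
/** Hypothetical pre-load check (not part of BG) for the threadcount constraint. */
final class LoadThreadCheck {
    /** True if each thread's collection of users is larger than friendcountperuser. */
    static boolean isFeasible(int userCount, int friendCountPerUser, int threadCount) {
        if (threadCount <= 0) {
            throw new IllegalArgumentException("threadCount must be positive");
        }
        int usersPerThread = userCount / threadCount; // assumption: integer division
        return friendCountPerUser < usersPerThread;
    }
}
```

With usercount=1000, friendcountperuser=10 and threadcount=10, each thread handles 100 users and the load is feasible. With usercount=100 the same settings leave 10 users per thread, which is not enough for 10 friends per user.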
The onetime -load argument only loads the data store. The onetime -loadindex argument both populates the data store with data and creates index structures for the populated data. An example usage of -load is provided below:
onetime -load -db TestDS.TestDSClient -P C:/BG/workloads/populateDB -p TestDS.url=10.0.0.1:8767 -p insertimage=true -p imagesize=2 -p threadcount=10
Here, the populateDB file has the following structure:
BG supports 11 different social actions and can generate a mix of these actions. The implementation of each action is specific to a data store and its capabilities. Each of these methods is defined as an abstract method in the edu.usc.bg.base.DB class and must be overridden by the data store client, i.e., the TestDSClient class. Each method requires a predefined set of parameters as input and must support the requirements described in Step 2.
To run a mix of actions, the developer must specify the data store client using "db" parameter and the appropriate method calls will be loaded dynamically during the execution.
viewProfile: Emulates a user accessing her own or another user's profile. The method is provided with the unique identifier of the user accessing the profile (requesterID), the unique identifier of the user who owns the profile (profileOwnerID), whether images have been inserted for users (insertImage), whether the profile image should be stored in the file system (testMode), and a HashMap that will contain the profile attributes and their values.
The testMode flag is used for testing the store and retrieval of profile images.
listFriends: Emulates a user accessing her own or another user's list of friends and their profile information. The unique identifier of the user accessing the list of friends (requesterID) for a profile, the unique identifier of the profile being accessed for list of friends retrieval (profileOwnerID), whether images are inserted or not (insertImage), whether the retrieved images (thumbnail images for each friend in the list) need to be stored in the file system (testMode), and a Vector of HashMaps where every HashMap will contain the profile information for one friend are provided to this method.
viewFriendReq: Emulates a user viewing a list of all her pending invitations and each inviter's profile information. This action can be performed only by the owner of a profile. The unique identifier of the user accessing her list of pending invitations (profileOwnerID), whether images are inserted or not (insertImage), whether the retrieved images (thumbnail images for each inviter in the list) need to be stored in the file system (testMode), and a Vector of HashMaps where every HashMap will contain the profile information for one inviter are provided to this method.
acceptFriend: Emulates a user accepting a friend request generated by another user. The unique identifier of the user who generated the friend request (inviterID) and the unique identifier of the user who has received the friend request (inviteeID) and wants to accept it are provided to this method.
rejectFriend: Emulates a user rejecting a friend request generated by another user. The unique identifier of the user who generated the friend request (inviterID) and the unique identifier of the user who has received the friend request (inviteeID) and wants to reject it are provided to this method.
inviteFriend: Emulates a user extending a friend request to another user. The unique identifier of the user generating the friend request (inviterID) and the unique identifier of the user receiving the friend request (inviteeID) are provided to this method.
thawFriendship: Emulates a user removing a friend. As the friendship relationship is symmetric, once user A removes user B from her list of friends, then user A also will be removed from user B's list of friends. The unique identifier of the user who wants to remove a friend (friendid1) and the friend who is going to be removed (friendid2) are provided to the method.
viewTopKResources: Emulates a user retrieving the top k resources posted on her own profile or another user's profile. The unique identifier of the user displaying the resources (requesterID), the unique identifier of the user profile containing the resources (profileOwnerID), the number of resources that need to be displayed (k) and a Vector of HashMaps where each HashMap will contain a resource and its attributes are provided as inputs to this method.
viewCommentOnResource: Emulates a user displaying the comments/manipulations created on a resource posted on her own or another user's profile. The unique identifier of the user requesting to display the comments (requesterID), the owner of the profile on which the resource is posted (profileOwnerID), the unique identifier of the resource (resourceID) and a Vector of HashMaps where each HashMap will contain the attributes of a comment are provided to the method.
postCommentOnResource: Emulates a user posting a comment on a resource posted on her own or another user's profile. The unique identifier of the user creating the comment (commentCreatorID), the unique identifier of the profile on which the resource was posted (profileOwnerID), the unique identifier of the resource (resourceID), and a HashMap containing the comment attributes and their values are provided to the method.
delCommentOnResource: Emulates a user deleting a comment on a resource she owns. The unique identifier of the user deleting the comment (requesterID), the unique identifier of the resource (resourceID), and the unique identifier of the comment (manipulationID) are provided to the method.
A return value of 0 for a method indicates that the execution of the social action corresponding to that method was successful. A negative return value indicates an error and causes BG to terminate.
Every method implementation can be tested using the edu.usc.bg.FunctionCommandLine class. For testing the methods corresponding to a data store, the appropriate data store parameters as well as the data store interface layer (i.e. TestDSClient) need to be passed to the FunctionCommandLine class such as below:
-db TestDS.TestDSClient -p TestDS.url=10.0.0.1:8767
Next the FunctionCommandLine class will use the input parameters to connect to the data store. Once the connection is successful (this can be verified by seeing the "Connected" message in the console), every social action can be tested by entering the name of the social action and its required parameters. If the social action execution is successful, the output for that action will be displayed in the console; else the related error will be printed to the console. If a social action requires retrieving images, the images will be retrieved and stored in the current working directory.
For a list of social actions and their required parameters enter "help" in the console.
Prior to testing the action implementations, one must ensure that the data store schema is created and populated with the appropriate data.
Apart from the 11 social actions described above, there are three other methods that the data store interface developer must implement. BG uses the results of these methods both for valid execution of the benchmark and for quantifying unpredictable data. If they are not implemented, the experiment and its results may be invalid.
BG must be aware of the initial state of a data store before running a benchmark. This includes information about the resources that were created by each user and the number of comments produced on each of them, all the pending friendship relationships, and the confirmed friendship relationships in the data store. Failing to provide an appropriate implementation of these methods may result in exceptions in BG and errors in the final results. For example, if the load phase constructs friendships for users and BG does not query this information before the benchmarking phase, then BG may try to construct friendships that already exist, which may result in integrity constraint exceptions with relational databases.
Once Step 6 is completed and tested using the FunctionCommandLine class, BG can be used to run benchmarks against a data store and evaluate its performance. For this purpose, execute the edu.usc.bg.base.Client class with the onetime -t argument. One must also specify the data store interface class using -db, along with the data store properties and the benchmarking parameters. For example:
onetime -t -db TestDS.TestDSClient -P C:/BG/workloads/MixOfAction -p maxexecutiontime=30 -p usercount=1000 -p initapproach=querydata
The core benchmark parameters include the mix of social actions to be emulated, the number of users in the system, and the execution duration. The mix of social actions should be specified in a workload file and given as an argument to BG using -P, such as -P C:/BG/workloads/MixOfAction. This file has a format similar to the one below, where every property takes a value between 0 and 1 that identifies the fraction of that action in the workload.
The total sum of the values assigned to user actions and user activities should be 1.
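A workload mix can be sanity-checked against this rule before an experiment with a sketch like the one below. The helper is hypothetical and not part of BG; it takes the mix as a map from property name to value, and the property names used with it are placeholders rather than BG's actual workload keys.

```java
import java.util.Map;

/** Hypothetical check (not part of BG) that a workload mix is well formed. */
final class WorkloadMixCheck {
    /** True if every value lies in [0, 1] and the values sum to 1 within a tolerance. */
    static boolean sumsToOne(Map<String, Double> mix) {
        double sum = 0.0;
        for (double value : mix.values()) {
            if (value < 0.0 || value > 1.0) {
                return false;
            }
            sum += value;
        }
        return Math.abs(sum - 1.0) < 1e-9; // tolerance for floating-point rounding
    }
}
```

A mix of 0.9 for one action and 0.1 for another passes the check, while adding a third action with 0.1 makes the total 1.1 and fails it.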
If an image is inserted for the users, then -p insertimage=true should be passed as a runtime argument to ensure that BG retrieves the user images for related actions.
The number of users in the system (usercount) can be specified either as a command line argument using -p usercount=1000 or in the workload file by adding usercount=1000.
The duration of an experiment can be limited by specifying either an execution time or a number of actions. The execution time, in seconds, can be passed as a command line argument using -p maxexecutiontime=600. The number of operations to be executed before termination can be passed as a command line parameter using -p operationcount=1000. If both an execution time and an operation count are specified, BG continues execution until either limit is reached.
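The termination rule can be sketched as a single predicate that stops as soon as either limit is reached. The helper is hypothetical and not part of BG. Treating a non-positive limit as "not set" is an assumption made for this sketch.

```java
/** Hypothetical sketch (not part of BG) of the experiment termination rule. */
final class StopCondition {
    /**
     * @param elapsedSeconds   seconds since the experiment started
     * @param opsDone          number of operations issued so far
     * @param maxExecutionTime time limit in seconds; non-positive means no limit (assumption)
     * @param operationCount   operation limit; non-positive means no limit (assumption)
     */
    static boolean shouldStop(long elapsedSeconds, long opsDone,
                              long maxExecutionTime, long operationCount) {
        boolean timeUp = maxExecutionTime > 0 && elapsedSeconds >= maxExecutionTime;
        boolean opsUp = operationCount > 0 && opsDone >= operationCount;
        return timeUp || opsUp; // whichever limit is reached first ends the run
    }
}
```

With maxexecutiontime=600 and operationcount=1000, the run ends either after 600 seconds or after 1000 operations, whichever happens first.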
To force BG to query the initial state of the data store, -p initapproach=querydata should be passed as a command line argument. For a list of runtime parameters, please refer to BG Runtime Parameters.
Once BG completes execution, the benchmarking results will be displayed in the output or can be directed to a file. For a list of benchmark output parameters please refer to Benchmark Output.
BG also supports a warm-up phase. In the warm-up phase, BG issues only the read-only operations specified in the workload mix. The warm-up phase takes place right before the benchmark starts. Once it is completed, BG runs the benchmark by issuing both the read and write social actions specified in the workload mix against the data store. The number of warm-up operations and the number of threads issuing them can be specified using the -p warmup=10000 and -p warmupthreads=10 input parameters. The warm-up phase can be used in various scenarios. For example, it can warm up a server by reading data from disk into memory, which may result in higher performance for the data store.
BGClient decides how to pick the users to be emulated. BG supports three built-in distributions, which can be selected using the -p requestdistribution parameter. This parameter can be specified either in the workload file or as a runtime parameter on the command line.
BG emulates a DZipfian user distribution with a mean of 0.27 if the requestdistribution parameter is set to dzipfian and the zipfianmean parameter is set to 0.27, i.e., in the workload file as follows:
If the benchmarked workload contains both read and update operations, BG performs a validation phase to identify the amount of stale data. Establishing the initial state of the data store using either -p initapproach=querydata or -p initapproach=deterministic is required for BG to produce valid statistics on the amount of unpredictable data. With -p initapproach=deterministic, one may also need to specify the following parameters: usercount, resourcecountperuser, friendcountperuser, useroffset, confperc and numloadthreads. The values of these parameters should be those that were used for the loading phase (numloadthreads should be set to the threadcount used in the load phase).
BG logs every update operation performed against the data store and uses this information to quantify the amount of stale data once the experiment is complete.
Currently, there are two centralized implementations of the validation phase: one uses interval trees as in-memory data structures, and the other uses a relational database as a persistent store. The technique to use is selected with the "validationapproach" parameter.
If the validationapproach parameter is set to "INTERVAL", BG uses in-memory interval tree data structures to quantify the amount of unpredictable data.
If the validationapproach parameter is set to "RDBMS" then a relational database is used for quantifying the amount of stale data. In this case the "validation.driver", "validation.passwd", "validation.user" and "validation.url" need to be provided to BG as input parameters. For more details on unpredictable data and validation please refer to Unpredictable data and validation.
To rate a data store with N BGClients, we start N BGListeners. Each BGListener requires an input file which consists of
x is the port BGListener is listening on.
BGListener facilitates the communication between BGCoord and the BGClient (each BGClient and its BGListener run on the same node). Next, the rating parameters as well as the BGClient parameters, including the port on which each BGClient's listener is running, are provided as input to BGCoord. Finally, BGCoord uses these parameters to rate the data store, see here for more details.
BGCoord assumes that the data store is up and running. One can add the code to start and shut down a data store to BGCoord.