HBase Client Details

Author
Jia Li
Schema Description
  • Data is stored as tables: "members", "pending_friends", "confirmed_friends", "resource_owner", "resources" and "manipulations".
  • Each table consists of one or more column families as follows: members ("attributes", "thumbImg", "profilelImg"), pending_friends ("pendingFriends"), confirmed_friends ("confirmedFriends"), resource_owner ("resources"), resources ("resourceAttribute") and manipulations ("attributes").
  • Each column family may consist of one or more columns.
  • The profile aggregates "pendingcount", "friendcount" and "resourcecount" are stored as columns in the attributes column family of the members table.
  • A member's profile and thumbnail images are stored in their own column families in the members table.
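The layout above can be summarized as a small lookup structure. The following is a minimal sketch in plain Python, not the client's actual code: the table and column family names are copied from the description, while the `SCHEMA` dictionary, the `PROFILE_AGGREGATES` tuple and the `column` helper are hypothetical names introduced only for illustration.

```python
# Table -> column families, transcribed from the schema description above.
SCHEMA = {
    "members": ["attributes", "thumbImg", "profilelImg"],
    "pending_friends": ["pendingFriends"],
    "confirmed_friends": ["confirmedFriends"],
    "resource_owner": ["resources"],
    "resources": ["resourceAttribute"],
    "manipulations": ["attributes"],
}

# Profile aggregates kept as columns in the "attributes" family of "members".
PROFILE_AGGREGATES = ("pendingcount", "friendcount", "resourcecount")


def column(table, family, qualifier):
    """Return a 'family:qualifier' column name after checking it against SCHEMA.

    Hypothetical helper: it only validates names, it does not talk to HBase.
    """
    if family not in SCHEMA[table]:
        raise KeyError(f"{table!r} has no column family {family!r}")
    return f"{family}:{qualifier}"


aggregate_columns = [column("members", "attributes", q) for q in PROFILE_AGGREGATES]
print(aggregate_columns)
```

Keeping the aggregates as ordinary columns next to the other member attributes means a single row read on the members table can return them together with the rest of the profile.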
Index Structures
  • HBase uses an approach similar to Google's BigTable to look up data.
  • The row key of every table uniquely identifies a row in that table. The row key for the members, pending_friends, confirmed_friends and resource_owner tables is membrid. The row key for the resources table is the resourceid, and the row key for the manipulations table is the concatenation of resourceid and manipulationid.
  • Rows are sorted lexicographically by row key, with the lowest key appearing first in a table.
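To make the effect of lexicographic ordering concrete, here is a minimal sketch in plain Python that mimics sorted row keys with a list of byte strings. The description only states that the manipulations key is the concatenation of resourceid and manipulationid; the "-" separator, the UTF-8 string encoding, and the helper names `manipulation_row_key` and `scan_prefix` are assumptions made for this example.

```python
from bisect import bisect_left


def manipulation_row_key(resource_id, manipulation_id, sep="-"):
    # ASSUMPTION: separator and string encoding are hypothetical; the source
    # only says the key concatenates resourceid and manipulationid.
    return f"{resource_id}{sep}{manipulation_id}".encode("utf-8")


def scan_prefix(sorted_keys, prefix):
    """Return the contiguous run of keys that start with `prefix`."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the run of matches has ended
        matches.append(key)
    return matches


pairs = [(10, 1), (2, 7), (2, 3), (1, 5)]
# Python compares bytes objects byte by byte, i.e. lexicographically.
sorted_keys = sorted(manipulation_row_key(r, m) for r, m in pairs)
print(sorted_keys)

# All manipulations of resource 2 sit next to each other in key order.
print(scan_prefix(sorted_keys, b"2-"))
```

Note that resource 10 sorts before resource 2 because keys are compared as bytes rather than as numbers. Under the assumed key format, the separator also keeps a prefix scan for resource 1 from picking up rows that belong to resource 10.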
Database Load Time
(10 Load Threads)
  • 100,000 members, 100 friends per user, 0 resources per user, 12KB profile images and 2KB thumbnail images: 10 Minutes
  • 1,000,000 members, 100 friends per user, 0 resources per user, 12KB profile images and 2KB thumbnail images: 10 Hours
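The two load times can be turned into per-member rates with a little arithmetic. The sketch below is plain Python using only the figures listed above; it assumes 1 KB = 1024 bytes, and the names `LOAD_RUNS`, `IMAGE_BYTES_PER_MEMBER` and `load_rate` are hypothetical.

```python
# (members, load time in seconds), taken from the two bullets above.
LOAD_RUNS = [
    (100_000, 10 * 60),         # 10 minutes
    (1_000_000, 10 * 60 * 60),  # 10 hours
]

# ASSUMPTION: 1 KB = 1024 bytes; 12KB profile image + 2KB thumbnail per member.
IMAGE_BYTES_PER_MEMBER = (12 + 2) * 1024


def load_rate(members, seconds):
    """Average number of members loaded per second."""
    return members / seconds


for members, seconds in LOAD_RUNS:
    rate = load_rate(members, seconds)
    image_gib = members * IMAGE_BYTES_PER_MEMBER / 2**30
    print(f"{members:>9,} members: {rate:6.1f} members/sec, "
          f"{image_gib:5.2f} GiB of image data")
```

With these figures the average rate drops from roughly 167 members/sec to roughly 28 members/sec, so a tenfold increase in members costs about sixty times the load time. The source does not state the cause of this slowdown.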
SoAR Rating# †
100,000 members, 100 friends per user, 0 resources per user, 12KB profile images and 2KB thumbnail images
  • ViewProfile: 8302 actions/sec
  • ListFriends: 322 actions/sec
  • NewHighUpdateActions: 2554 actions/sec
Source
Download

# 100,000 members, 100 friends per user, 0 resources per user, 12KB profile images and 2KB thumbnail images.

† The reported performance numbers were obtained using instances with the following specifications: 1Gbps network card, Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16 GB of memory, Ubuntu 13.10. One instance hosted the data store while multiple BGClients generated the workload. Processor and network resources of the BGClients were not a bottleneck while obtaining these numbers. The experiments were conducted using Hadoop 2.2.0, HBase-Server 0.96.2, HBase-Client 0.96.2 and ZooKeeper 3.4.6.