Introduction
Enterprises often start with a single server setup. As the enterprise grows, so do the data and the number of requests for that data. A single server setup makes it difficult to manage the increasing load, which creates the need to scale out.
One solution would be to replicate to read servers: the writes go to the master and the reads go to the slaves.
Although this setup handles the increased read load, it still cannot handle the increasing write load. Trying to handle this by adding another master just compounds the problem: the write load must be repeated on every master (more work for the application, and each master is just as busy as if there were just one).
Thus we need to tackle the problem in a different way. Instead of focusing on our setup, we shift focus to our data. By splitting the data we can correspondingly distribute the load on the data.
Fabric sharding allows the rows of a MySQL table to be distributed over multiple database servers or physical locations.
MySQL Fabric Sharding - Basic Architecture
The basic architecture consists of:
- Fabric Aware Connector
- Fabric Server
- Server Groups, which store the shard data
MySQL Fabric Aware Connector
The connector contacts the Fabric server to find the location of a particular shard key. The connector also implements caching to avoid repeated round trips to the Fabric server to fetch the sharding information. By building this functionality into the connector, the architecture avoids the need for an extra hop to a proxy/routing node and so minimizes latency. In the future we may add the option of a standalone routing node to enable Fabric-unaware connectors to work with MySQL Fabric.
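As a concrete sketch, the snippet below uses the Fabric support in MySQL Connector/Python to open a Fabric-aware connection and route a write by shard key. The host, port, credentials, and the employees.Employee table are illustrative assumptions, not details of any real deployment:

    import mysql.connector
    from mysql.connector import fabric

    # Connect through the Fabric-aware connector. The connector asks the
    # Fabric server where each shard lives and caches the answer locally,
    # so repeated statements for known shard keys avoid extra round trips.
    conn = mysql.connector.connect(
        fabric={"host": "fabric.example.com", "port": 32274,
                "username": "admin", "password": "adminpass"},
        user="appuser", password="apppass", autocommit=True
    )

    # Name the sharded table and supply the shard key; the connector
    # resolves the key to the owning server group and routes the write there.
    conn.set_property(tables=["employees.Employee"], key=1234,
                      mode=fabric.MODE_READWRITE)

    cur = conn.cursor()
    cur.execute(
        "INSERT INTO employees.Employee (EmpID, EmpName, EmpAddress, EmpPhone) "
        "VALUES (1234, 'Jane Doe', '1 Main St', '555-0100')")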
MySQL Fabric Server
The Fabric server:
- Stores the sharding information
  - Range Based Sharding
  - Hash Based Sharding
- Handles Re-Sharding
  - Shard Moving
  - Shard Splitting
Server Groups - Storing the shard data
Each server group represents an HA configuration of MySQL Servers, with one of the servers acting as master and the others acting as slaves.
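From the application side, the access mode decides which server in the owning group handles a statement. The sketch below, again using MySQL Connector/Python's Fabric support with illustrative connection details and table names, shows a read that may be served by a slave in the shard's group:

    import mysql.connector
    from mysql.connector import fabric

    conn = mysql.connector.connect(
        fabric={"host": "fabric.example.com", "port": 32274,
                "username": "admin", "password": "adminpass"},
        user="appuser", password="apppass", autocommit=True
    )

    # MODE_READONLY lets the connector serve the read from a slave in the
    # server group that owns this shard key; MODE_READWRITE would force
    # the statement to that group's master instead.
    conn.set_property(tables=["employees.Employee"], key=1234,
                      mode=fabric.MODE_READONLY)
    cur = conn.cursor()
    cur.execute("SELECT EmpName, EmpPhone FROM employees.Employee WHERE EmpID = 1234")
    print(cur.fetchone())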
Sharding Schemes
The data can be partitioned using either a RANGE based or a HASH based sharding scheme. In the current version, the application must provide the sharding key, but this is something that we aim to make more transparent in the future.
Range Based Sharding
The Range based sharding scheme partitions a given table based on user defined ranges of the values of a column chosen as the sharding key.
For example, take the following Employee table schema:

EmpID | EmpName | EmpAddress | EmpPhone
The table can be partitioned on EmpID. Let us assume that EmpID ranges between 0 – 10000. We can create five partitions with lower_bounds = 0, 201, 401, 601, 801. The partition with lower_bound = 0 stores all the rows with EmpID between 0 – 200, and similarly the partition with lower_bound = 201 stores all the rows with EmpID between 201 – 400. The last partition, with lower_bound = 801, stores all the remaining rows (EmpID 801 – 10000).
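To make the range lookup concrete, here is a small Python sketch of the mapping (purely an illustration, not Fabric code; the group names are hypothetical) that resolves an EmpID to one of the five partitions above:

    import bisect

    # Lower bounds from the example above; each shard lives in a
    # (hypothetical) server group.
    LOWER_BOUNDS = [0, 201, 401, 601, 801]
    SHARD_GROUPS = ["GROUPID-1", "GROUPID-2", "GROUPID-3", "GROUPID-4", "GROUPID-5"]

    def shard_for(emp_id):
        """Return the server group whose range contains emp_id (0 <= emp_id <= 10000)."""
        # The row belongs to the last shard whose lower bound it reaches.
        index = bisect.bisect_right(LOWER_BOUNDS, emp_id) - 1
        return SHARD_GROUPS[index]

    print(shard_for(150))   # GROUPID-1 (range 0 - 200)
    print(shard_for(201))   # GROUPID-2 (range 201 - 400)
    print(shard_for(950))   # GROUPID-5 (range 801 - 10000)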
Hash Based Sharding
The Hash based sharding scheme uses the md5 cryptographic hash and builds on the underlying RANGE based sharding scheme. The lower_bounds for the underlying RANGE based sharding scheme are generated by applying the cryptographic hash to the group ID of the particular shard.
To decide whether a given row of a table belongs in a particular shard, we apply the md5 algorithm to the value of the column chosen as the shard key and compare the result with the lower_bound of the shard definition.
Hence, in the case of the Employee table schema, assume that we have a Fabric group with ID GROUPID-1. We set lower_bound = md5("GROUPID-1"). Now when we want to insert a row into this shard, we check that md5(EmpID for the row) >= md5("GROUPID-1") (and below the next shard's lower_bound, if one exists) before inserting.
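The following Python sketch illustrates that lookup. It is not the actual Fabric implementation: the group IDs are hypothetical, and details such as where a key that hashes below every lower_bound lands are assumptions of the sketch:

    import hashlib

    def md5_hex(value):
        """Uppercase hex MD5 digest, used for lower bounds and shard keys alike."""
        return hashlib.md5(str(value).encode("utf-8")).hexdigest().upper()

    # Hypothetical server groups; each shard's lower_bound is the MD5 hash
    # of its group ID, which turns HASH sharding into RANGE sharding over
    # the hash space.
    groups = ["GROUPID-1", "GROUPID-2", "GROUPID-3"]
    shards = sorted((md5_hex(g), g) for g in groups)   # (lower_bound, group ID)

    def shard_for(emp_id):
        """Pick the shard with the largest lower_bound <= md5(emp_id)."""
        key_hash = md5_hex(emp_id)
        # Assumption of this sketch: keys hashing below every lower_bound
        # wrap around to the shard with the highest lower_bound.
        chosen = shards[-1][1]
        for lower_bound, group in shards:
            if lower_bound <= key_hash:
                chosen = group
        return chosen

    print(shard_for(42))
    print(shard_for(4711))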
Handling Global Operations
In order to handle operations that need to be executed on all the shards, e.g. schema updates or updates on global tables, Fabric introduces the concept of a global group.
The Global Group stores the schema (IMPORTANT: not the data) of the tables that are sharded. It also stores, in full (IMPORTANT: including the data), the tables that need to be present on all the shards (e.g. a Pincode table that might need to participate in joins across all the shards).
When we need to fire a global operation, for example a schema update on a sharded table, we run this on the global group. The update automatically gets propagated across all the shards.
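A sketch of what this can look like from the application side, assuming MySQL Connector/Python's Fabric support and its global scope property (the connection details and table are illustrative): the DDL is directed at the global group, and Fabric then propagates it to every shard.

    import mysql.connector
    from mysql.connector import fabric

    conn = mysql.connector.connect(
        fabric={"host": "fabric.example.com", "port": 32274,
                "username": "admin", "password": "adminpass"},
        user="appuser", password="apppass", autocommit=True
    )

    # Address the global group instead of an individual shard; the schema
    # update runs there and is propagated to all the shards.
    conn.set_property(tables=["employees.Employee"],
                      scope=fabric.SCOPE_GLOBAL,
                      mode=fabric.MODE_READWRITE)

    cur = conn.cursor()
    cur.execute("ALTER TABLE employees.Employee ADD COLUMN EmpEmail VARCHAR(128)")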
Moving the Shard
When a given shard is overloaded, we may decide to move the shard into a new Fabric group (for example, the new group could be composed of servers running on more powerful hardware). While the shard move takes place we would like to ensure minimal impact on ongoing transactions.
Fabric allows moving a shard registered with the Fabric system into another Fabric group seamlessly.
Splitting the Shard
When the data in a shard grows larger than the size optimal for good performance, we can split the data in this shard across two server groups. While the shard split takes place we would like to ensure minimal impact on ongoing transactions. Fabric allows a shard registered with the Fabric system to be split into another Fabric group seamlessly.
Find a use case for starting with a single server (single shard setup) and scaling gradually as the application and its data grow here: http://vnwrites.blogspot.in/2013/09/mysqlfabric-sharding-example.html