VN: MySQL Fabric - Sharding

In this article we discuss a use case for starting with a single server (single shard setup) and scaling gradually as the application and its data grows. We also describe here the different aspects of creating shards using MySQL Fabric and what each of it means.

Note: Unlike the previous example the present example is not working with data that already exists.

Note: The example also covers how to modify the python application code to work with Fabric.

Prototyping the Application on a Single MySQL Shard Setup

The figure below depicts the difference in the initial setup between using and not using MySQL Fabric.

Note: For the initial setup the Global Group can contain just another MySQL Server running on the same machine.

Note: The application would also need changes in the way it uses the python connector, when it switches to FABRIC.

The Application

In order to keep the focus on working with Fabric, a very simple application is used as an example. The application basically registers and queries employee details. The schema itself is highly simplified and contains the bare minimum to cover the all the use cases in Fabric.

Application Schema:

The employee, department and the salary tables are sharded based on the emp_no that is used as the shard key.

The fuel_reimb represents a table that is often used (in joins queries) to find the fuel allowance for various employees. All the data in this table is often required in joins.

Connecting to the database:
We use the python connector to write into the database. The python connector is MySQL Fabric aware. We shall look into the details of using the Fabric aware python connector later in this blog.

Starting MySQL Fabric

Start a MySQL Server that acts as the state store.
Change fabric.cfg, for configuring Fabric, to use this MySQL Server as the state store.
Fabric persists metadata about the shard topology in a set of relation tables in the MySQL Server acting as the state store. This step creates the schemas of the tables in the state store.

Commands:

mysqlfabric manage setup

Starting the Fabric Server (contains an XML-RPC Server).

Commands:

mysqlfabric manage start

Creating the sharding topology

Creating the Global Group

The global group in a sharding setup is used to propagate schema updates to all tables in the sharding setup and updates to global tables throughout the sharding ecosystem. The global group contains,

The schema of the sharded tables
The global tables

Commands:

mysqlfabric group create GLOBAL_GROUP

mysqlfabric group add GLOBAL_GROUP localhost:13001 root ""

mysqlfabric group add GLOBAL_GROUP localhost:13002 root ""

Find the UUID of localhost:13001 and promote it as master

mysqlfabric group lookup_servers GLOBAL_GROUP

mysqlfabric group promote GLOBAL_GROUP <UUID of MySQL Server running on localhost:13001>

Creating the Server Group (for the first shard)

Each shard maps to a server group. Each server group is a set of MySQL servers in a HA configuration. The application starts with a single server group and adds more shards (server groups) as required. Let us create the first shard.

Commands:

mysqlfabric group create GROUPID1

mysqlfabric group add GROUPID1 localhost:13003 root ""

mysqlfabric group add GROUPID1 localhost:13004 root ""

mysqlfabric group lookup_servers GROUPID1

mysqlfabric group promote <UUID of MySQLServer running on localhost:13003>

Create a Sharding Definition

A sharding definition introduces a particular sharding scheme to which multiple tables (related by a common sharding key) can be mapped. A Sharding definition is composed of

Sharding scheme – RANGE, HASH
Global Group – Gets the global updates for all the shard tables mapped to this sharding definition. It also stores the global tables in this sharding definition.

Command

mysqlfabric sharding define RANGE GLOBAL_GROUP

Map the tables to the sharding definition

Each sharding definition is associated with a unique ID. This unique ID is generated when we create a sharding definition. In the above case this ID is 1.We map this unique ID to the table being sharded. Each mapping maps a table and the column containing the shard key to the sharding definition.

Commands:

mysqlfabric sharding add_mapping 1 employee.employee emp_no

mysqlfabric sharding add_mapping 1 employee.salaries emp_no

mysqlfabric sharding add_mapping 1 employee.dept_emp emp_no

Add the Shards

When we add shards we define the way the data is sharded based on the shard key. Each of the shards,

Specify the shard mapping ID to which it belongs.
It also specifies the group in which the shard data will be present.
When we add a shard we also need to specify if it will be ENABLED for operations on shards.
A RANGE shard definition also specifies a lower_bound which represents the least value of the shard key that will be present in a particular shard. In this case we assume that the lower_bound=1.

Commands:

mysqlfabric sharding add_shard 1 GROUPID1 ENABLED 1

Define the Schema on the Global Group

The schema for the tables is defined on the master of the global group so that it gets replicated to all the shards.

Queries:

CREATE DATABASE employee;

use employee;

CREATE TABLE employee(emp_no INT PRIMARY KEY AUTO_INCREMENT, birth_date DATE, first_name VARCHAR(14), last_name VARCHAR(16), gender ENUM('M','F'), hire_date DATE);

CREATE TABLE salaries(emp_no INT, salary INT, from_date DATE, to_date DATE, FOREIGN KEY(emp_no) REFERENCES employee(emp_no));

CREATE TABLE dept_emp(emp_no INT, dept_no CHAR(4), from_date DATE, to_date DATE, FOREIGN KEY(emp_no) REFERENCES employee(emp_no));

CREATE TABLE fuel_reimb(role VARCHAR(20), allowance INTEGER);

Write the Application

The below application is written in Python. The application is very simple and uses connector python to insert a row into the table. The goal of the application is to demonstrate the extensions in the python connector that make it FABRIC aware.

Application Code:

import mysql.connector.fabric as connector

if __name__ == "__main__":

fabric_params = {

"fabric" : {"host" : "localhost", "port" : 8080},

"user" : "root", "passwd" : ""

}

__cnx = connector.MySQLFabricConnection(**fabric_params)

__cnx.set_property(key=str(1),tables=["employee.employee"])

cur = __cnx.cursor()

cur.execute("use employee")

cur.execute("INSERT INTO employee(birth_date, first_name, last_name, gender, hire_date) VALUES(NULL, 'A', 'B', 'M', NULL)")

__cnx.commit()

Explanation:

fabric_params = {

"fabric" : {"host" : "localhost", "port" : 8080},

"user" : "root", "passwd" : ""

}

__cnx = connector.MySQLFabricConnection(**fabric_params)

The above code gets a connection to the running Fabric XML-RPC server. The parameters that are passed to the MySQLFabricConnection class are

“host” - The hostname at which the Fabric server runs.

“port” - The port number at which the Fabric server runs.

“user”, “passwd” - Login credentials for the Fabric state store.

__cnx.set_property(key=str(1),tables=["employee.employee"])

key – The shard key value

tables – The table name being sharded.

set_property automatically redirects internally to the correct shard on which the queries are executed.

If a query needs to be executed on all the shards, example an update on a global table or a schema change operation on all the shards, then add a parameter

scope = “GLOBAL”

to signify global operations. This is set of “LOCAL” by default.

Adding more Shards

As the size of the shard increases, we add more shards to the sharding topology.

Commands:

Add a group for the shard

mysqlfabric group create GROUPID2

mysqlfabric group add GROUPID2 localhost:13005 root ""

mysqlfabric group add GROUPID2 localhost:13006 root ""

mysqlfabric group lookup_servers GROUPID2

mysqlfabric group promote <UUID of MySQLServer running on localhost:13005>

Add the shard definition

The difference from the previous definition will be that the lower_bound will point to employee number from which we start inserting into this shard. In this case we choose a hypothetical value of 10001. This means that starting 10001 the employee information will be stored in this shard.

mysqlfabric sharding add_shard 1 GROUPID2 ENABLED 10001

The following will direct the application to this new shard.

__cnx.set_property(key=str(10001),tables=["employee.employee"])

In the above case any key value greater than 10001 will automatically redirect the insert to the new shard.

NOTE: The above step actually adds a new shard starting at a specified lower_bound. Other options for moving data around in a sharded setup are shard moves and splits. These shall be explored in later blogs.

Find out how to migrate from an unsharded to a sharded setup here http://vnwrites.blogspot.in/2013/09/mysqlfabric-sharding-migration.html .

VN

Saturday, September 21, 2013

MySQL Fabric - Sharding - Simple Example

Prototyping the Application on a Single MySQL Shard Setup

The Application

Starting MySQL Fabric

Creating the sharding topology

Creating the Global Group

Creating the Server Group (for the first shard)

Create a Sharding Definition

Map the tables to the sharding definition

Add the Shards

Define the Schema on the Global Group

Write the Application

Adding more Shards