MongoDB – Sharding | Some thoughts, ideas and fun!!!

MongoDB – Sharding

Why use Sharding

Replica sets offers you certain features as follows (from MongoDB Site):

Scaling out to thousands of nodes
Easy addition of new machines
Automatic balancing for changes in load and data distribution
Zero single points of failure
Automatic failover

All this and much more can be gleaned from the MongoDB web site here.

How do I create a Shard

Now to move into how you would go about creating a sharding setup. Basically you would want two servers to start with. Mongo allows you to add more as and when you need them. This section will simply give you the short commands that you should run from Command Prompt in windows. I will then discuss the various parameters on the command and why they’re important to know about. Once again, all this information can be found on the Mongo site, but if you’re new to Mongo like myself and you just want a quick overview and set up so you can start playing with this “new” technology, then this guide will help you get up and running quickly. Please follow the steps on my Introduction page to get MongoDB downloaded and running before attempting this step in the process.

To set this up do as follows:

Create a batch file named: RunServer10000.bat to the root of your MongoDB installation’s bin folder.
Add the following lines to the batch file:
mkdir “data/localhost10000”
mongod –rest –shardsvr –port 10000 –dbpath data/localhost10000 –logpath data/localhost10000/log.txt
PAUSE
Create another batch file named: RunServer10001.bat to the root of your MongoDB installation’s bin folder.
Add the following lines to the batch file:
mkdir “data/localhost10001”
mongod –rest –shardsvr –port 10001 –dbpath data/localhost10001 –logpath data/localhost10001/log.txt
PAUSE
Run RunServer10000.bat
Run RunServer10001.bat

That’s it now you should have two replica set instances running. In the next section I’ll discuss how to configure the collections in the shards. For now, let’s go through what we’ve just done.

If you open up the batch file called RunServer10000.bat in Notepad, then you’ll see four lines explained as follows:

mkdir “data/localhost10000”
If you’re not familiar with Command Line scripts then the simple answer to this one is that it creates the directory structure “data/localhost1000” for you under your default Mongo bin directory.
mongod –rest –shardsvr –port 10000 –dbpath data/localhost10000 –logpath data/localhost10000/log.txt
This has a couple of bits to take into consideration as follows:
1. mongod
  This runs the Mongo Database instance.
2. –rest
  This runs the Mongo database in a restful manner which will allow you to open the web admin UI that comes built in with MongoDB usually found at: http://localhost:11000/ if you run MongoDB on port 10000. You should be able to pick up the port number for the admin UI in the log file which I’ll explain in point below.
3. –shardsvr
  This enables sharding on this specific instance, which I’ll explain in more detail later.
4. –port 10000
  Specifies the port on which this specific instance of the mongod.exe should run. This is the port you’ll use to connect to this instance.
5. –dbpath data/localhost10000
  This specifies where to put our database. If you don’t add this parameter Mongo will try and create the database in the default location of “c:\data\db” and will not start if that directory structure does not exist. Personally I don’t like things on my C drive, which is why I’m specifying this parameter and ensuring that the directory structure exists by adding it to my batch file as specified above.
6. –logpath data/localhost10000/log.txt
  This specifies where Mongo should write it’s log files for this specific instance. Personally I like having my log file and my database files in the same directory if I’m running in development/test mode, because it keeps things nice and together. In Production I would split the logging off to its own drive for speed and backup purposes.
7. 3. PAUSE
  This is only in the batch file because I want to see if something went wrong and it’s not written to the Mongo log file. You can remove it…

If you would like to get more information you can go here for further reading. For this example though I’m not going to go into more detail seeing that I’m trying to explain things in as simple a manner as possible to get you up and running as fast as possible.

How do I set up a Configuration Server

A configuration database instance is where you’ll be saving the configuration for your shards. To set this up do as follows:

Create a batch file named: RunConfigServer10002.bat to the root of your MongoDB installation’s bin folder.
Add the following lines to the batch file:
mkdir “data/localhost10002”
mongod –rest –port 10002 –dbpath data/localhost10002 –logpath data/localhost10002/log.txt
PAUSE
Run RunServer10002.bat

The only setting that changes when we’re running a Configuration database instance is the “–configsvr” parameter that get added and the “–enablesharding” parameter that falls away. The “–configsvr” parameter let’s the mongod process know that this instance will be used for configuration.

How do I set up a Routing Service

A routing service is the service which will enable our applications to connect to the shard. On startup it reads the configuration in the configuration database to understand how the underlying shards are configured. To set this up do as follow:

Create a batch file named: RunRoutingService.bat to the root of your MongoDB installation’s bin folder.
Add the following lines to the batch file:
mongos –configdb localhost:10002 > tmp/ RunRoutingService.txt
PAUSE
Run RunRoutingService.bat

There are a few new parameters to note here as follows:

mongos –port 10003 –configdb localhost:10002 > tmp/ RunRoutingService.txt
This has a couple of bits to take into consideration as follows:
1. mongos
  This is the routing process that we’ll be connecting to from our application. This service will potentially be installed on every AppServer.
2. –port 10003
  This specifies the port on which the routing service will be running.
3. –configdb localhost:10002
  This parameter tells the routing process which Configuration Server to connect to for it’s configuration.
4. > tmp/ RunRoutingService.txt
  This simply tells the routing service where to write it’s logs to.

Now that you have your shards up and running, your configuration database has been set up and your routing service is connected to it you can start configuring collections and databases for sharding.

How do I configure collections for Sharding

Now that we have our two “servers” named “localhost:10000” and “localhost:10001” and our configuration server as well as our routing service up and running we should configure them so they can start talking to each other. To make this process running the MongoDB console against routing service a bit easier you need to do as follows:

Create a batch file named: RunMongoDBConsoleRoutingService.bat to the root of your MongoDB installation’s bin folder.
Add the following lines to the batch file:
mongo.exe localhost:10003
PAUSE

Now that you have that let me quickly explain what it means.

1. mongo.exe
This is the application which gives you the MongoDB Console where you can do all kinds of funky things which you can read about on Mongo’s site.
2. localhost:10003
This specifies which instance you want to connect to. In our case we want to connect to the routing service which is running on port 10003.

Once you have this up and running you should see something like:

MongoDB shell version: 1.6.2
connecting to: localhost:10000/test
>

Now that you have the console running you can type in the following:

use admin
This will switch you to the admin database.
db.runCommand( { addshard : “localhost:10000”, name : “shard10000” } );
This will add the server “localhost:10000” to our cluster. The name that we specify is an optional parameter which you can leave out. MongoDB will assign a name automatically, but I like naming my shards, because then I can name it something that will make sense to me…
db.runCommand( { addshard : “localhost:10001”, name : “shard10001” } );
This will add our second server to the shard configuration.

Now that we’ve added the two servers to our sharding configuration we have to specify what we want to shard. There are two options when it comes to our data, 1) is that we can shard whole databases; 2) we can shard only certain collections. I’m not going to go too much into detail and will be jumping straight into the how.

To shard a database you can run the following command:

db.runCommand( { enablesharding : “” } );

If you do sharding on a database level then the routing service will automatically place the collections of the database on different shards. This is all handled by the routing service and you will never know where your collections are stored. In this scenario each collection will only exist on one shard at a time and could be move around without you knowing about it.

To shard a collection you can run the following command:

db.runCommand( { enablesharding : “”, key: } );

In this scenario you will be splitting the collections based on the key pattern that you’ve specified. I’ll explain how this works in a location based scenario a bit later on. For now, you need to know that while MongoDB allows you to specify keys to split the underlying collection in the database, it does not allow you to specify which shard to write to. So, if you set this up MongoDB will decide where to place the data. There is a way to force this to happen here, but that doesn’t mean that MongoDB will keep that configuration.

How do I add a new shard server to the configuration:

Now that you have your cluster of two servers up and running you may need to add another server into the mix. This is extremely simple as follows:

Create a batch file named: RunServer10004.bat to the root of your MongoDB installation’s bin folder.
Add the following lines to the batch file:
mkdir “data/localhost10004”
mongod –rest –shardsvr –port 10004 –dbpath data/localhost10004 –logpath data/localhost10004/log.txt
PAUSE
Run RunServer10004.bat

That should run your new sharding server instance. Now to configure it all you have to do is run the batch file you created in the “How do I configure the set” section above called: RunMongoDBConsoleRoutingService.bat. This will start the MongoDB console by connecting to our routing service where you should then run the following simple command:

db.runCommand( { addshard : “localhost:10004”, name : “shard10004” } );

This will configure this new server for you in the cluster and the routing service will start adding collections to this new shard.

How to configure a shardkeypatternobject

In this scenario I will be using the following Object Model to demonstrate how you can use a shardkeypatternobject to do location based sharding on your cluster. This means that data for South Africa will be available on one shard and data for the UK will be available on a different shard. I’ll explain how you can query these two shards separately and how you can have a global view of data through the routing service. After everything is done, I’ll explain how you can move a collection from one shard to another one, but do remember that this can change based on what the routing service decides to do.

If our object model looks as follows:

Then we would want to do Sharding based on the “Country” property on the “Customer” class. In our scenario we would like to do location based sharding which means that we would want to create the key using the “Country” property. In this scenario my database’s name will be “Customers” and my collection will be “Customer” which makes my namespace “Customers.Customer”. So, if you run the MongoDB console against our routing service by running “RunMongoDBConsoleRoutingService.bat” file then we can do the following steps:

We’re going to set variables equal to our three databases as follows:
db = db.getSisterDB( “test” );
config = db.getSisterDB ( “config” );
admin = db.getSisterDB ( “admin” );
Once we have our databases in variables we insert a few documents as follows:
1. db.Customers.save({“MongoId”:”7ce9ce33-1b8b-4afa-bd54-7e0547ddb6d5″,”CustomerId”:1,”FirstName”:”John”,”Country”:”South Africa”});
2. db.Customers.save({“MongoId”:”7ce9ce33-1b8b-4afa-bd54-7e0547ddb6d6″,”CustomerId”:2,”FirstName”:”Mary”,”Country”:”England”});
3. db.Customers.save({“MongoId”:”7ce9ce33-1b8b-4afa-bd54-7e0547ddb6d7″,”CustomerId”:3,”FirstName”:”Mike”,”Country”:”South Africa”});
4. Now that we have some data in our “Customers” database, we’ll enable the database for sharding like so:
  db.runCommand( { enablesharding : “Customers” } );
5. MongoDB expects an Index on the property that you would like to use in your key. This is an optional step whereas if you don’t define an Index, then MongoDB will do it for you. You can create it as follows:
  db.Customers.ensureIndex( { “Country” : 1 } );
6. Now that we have some data and our database “Customers” is enabled for sharding, we create the key as follows:
  admin.runCommand( { shardcollection : “Customers.Customers” , key : { Country : 1 } } );

That’s it. Now you have a key on which Sharding will happen. You can have a look here on the MongoDB web site if you are looking for more information.

Conclusion

The only drawback that I can see with sharding in MongoDB is that you can’t specify where the data should be placed. You can force it, but this may be changed as the routing service sees fit and the collections could be moved. What is really nice is that the routing service knows where to go fetch the data and how to aggregate it all back to your application. What you have to take into consideration with MongoDB is that was built with fail-over and high availability in mind. Sharding in combination with Replica Sets gives you and extremely powerful toolset to play with. I’ll be covering these two toolsets in combination in an upcoming article.

Comments (1) Leave a comment

Manjesh

December 2, 2011 at 14:38

Reply

Great job !!!

The only hurdle I faced during the setting up shard on my localhost is… we should add allowLocal to the adShard command else Mongo ignores sharding on local …

db.runCommand( { addshard : “localhost:10000″, name : “shard10000″ ,allowLocal:true} );

No trackbacks yet.

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Some thoughts, ideas and fun!!!