Difference between Sharding And Replication on MongoDB

Replica-Set means that you have multiple instances of MongoDB which each mirror all the data of each other. A replica-set consists of one Master (also called “Primary”) and one or more Slaves (aka Secondary). Read-operations can be served by any slave, so you can increase read-performance by adding more slaves to the replica-set. But write-operations always take place on the master of the replica-set and are then propagated to the slaves, so writes won’t get faster when you add more slaves.

Replica-sets also offer fault-tolerance. When one of the members of the replica-set goes down, the others take over. When the master goes down, the slaves will elect a new master. For that reason it is suggested for productive deployment to always use MongoDB as a replica-set of at least three servers, two of them holding data (the third one is a data-less “arbiter” which is required for determining a new master when one of the slaves goes down).

Sharded Cluster means that each shard of the cluster (which can also be a replica-set) takes care of a part of the data. Each request, both reads and writes, is served by the cluster where the data resides, This means that both read- and write performance can be increased by adding more shards to a cluster. Which document resides on which shard is determined by the shard key of each collection. It should be chosen in a way that the data can be evenly distributed on all clusters and so that it is clear for the most common queries where the shard-key resides (example: when you frequently query by user_name, your shard-key should include the field user_name so each query can be delegated to only the one shard which has that document).

The drawback is that the fault-tolerance suffers. When one shard of the cluster goes down, any data on it is inaccessible. For that reason each member of the cluster should also be a replica-set. This is not required. When you don’t care about high-availability, a shard can also be a single mongod instance without replication. But for production-use you should always use replication.

So what does that mean for your example?
                            Sharded Cluster             
             /                    |                    \
      Shard A                  Shard B                  Shard C
        / \                      / \                      / \
+-------+ +---------+    +-------+ +---------+    +-------+ +---------+
|Primary| |Secondary|    |Primary| |Secondary|    |Primary| |Secondary|
|  25GB |=| 25GB    |    | 25 GB |=| 25 GB   |    | 25GB  |=| 25GB    |   
+-------+ +---------+    +-------+ +---------+    +-------+ +---------+

When you want to split your data of 75GB into 3 shards of 25GB each, you need at least 6 database servers organized in three replica-sets. Each replica-set consists of two servers who have the same 25GB of data.

You also need servers for the arbiters of the three replica-sets as well as the mongos router and the config server for the shard. The arbiters are very lightweight and are only needed when a replica-set member goes down, so they can usually share the same hardware with something else. But Mongos router and config-server should be redundant and on their own servers.

It’s copy from stackexchange.The source here: http://dba.stackexchange.com/questions/52632/difference-between-sharding-and-replication-on-mongodb

Advertisements