A Replica-Set means that you have multiple instances of MongoDB which each mirror all the data of each other. A replica-set consists of one Master (also called “Primary”) and one or more Slaves (aka Secondary). Read-operations can be served by any slave, so you can increase read-performance by adding more slaves to the replica-set. But write-operations always take place on the master of the replica-set and are then propagated to the slaves, so writes won’t get faster when you add more slaves.
Replica-sets also offer fault-tolerance. When one of the members of the replica-set goes down, the others take over. When the master goes down, the slaves will elect a new master. For that reason it is suggested for productive deployment to always use MongoDB as a replica-set of at least three servers, two of them holding data (the third one is a data-less “arbiter” which is required for determining a new master when one of the slaves goes down).
A Sharded Cluster means that each shard of the cluster (which can also be a replica-set) takes care of a part of the data. Each request, both reads and writes, is served by the cluster where the data resides, This means that both read- and write performance can be increased by adding more shards to a cluster. Which document resides on which shard is determined by the shard key of each collection. It should be chosen in a way that the data can be evenly distributed on all clusters and so that it is clear for the most common queries where the shard-key resides (example: when you frequently query by
user_name, your shard-key should include the field
user_name so each query can be delegated to only the one shard which has that document).
The drawback is that the fault-tolerance suffers. When one shard of the cluster goes down, any data on it is inaccessible. For that reason each member of the cluster should also be a replica-set. This is not required. When you don’t care about high-availability, a shard can also be a single mongod instance without replication. But for production-use you should always use replication.
So what does that mean for your example?
Sharded Cluster / | \ Shard A Shard B Shard C / \ / \ / \ +-------+ +---------+ +-------+ +---------+ +-------+ +---------+ |Primary| |Secondary| |Primary| |Secondary| |Primary| |Secondary| | 25GB |=| 25GB | | 25 GB |=| 25 GB | | 25GB |=| 25GB | +-------+ +---------+ +-------+ +---------+ +-------+ +---------+
When you want to split your data of 75GB into 3 shards of 25GB each, you need at least 6 database servers organized in three replica-sets. Each replica-set consists of two servers who have the same 25GB of data.
You also need servers for the arbiters of the three replica-sets as well as the mongos router and the config server for the shard. The arbiters are very lightweight and are only needed when a replica-set member goes down, so they can usually share the same hardware with something else. But Mongos router and config-server should be redundant and on their own servers.
It’s copy from stackexchange.The source here: http://dba.stackexchange.com/questions/52632/difference-between-sharding-and-replication-on-mongodb