What is a Replica Set?
A replica set is a group of database servers that maintain the same dataset to provide redundancy and high availability. Essentially, it’s a way to make your database fault-tolerant: if one server fails, another can take over with minimal downtime.
Key Points:
- Primary: Receives all write operations. There is only one primary at a time.
- Secondary: Copies data from the primary (replication). Can serve read operations depending on configuration.
- Arbiter: Optional member that participates in elections but doesn’t store data. Helps in deciding a new primary if the current one fails.
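The arbiter's role comes down to vote counting: in MongoDB, electing a primary requires a strict majority of the voting members. The sketch below works through that arithmetic; the function names are illustrative and not part of any database API.

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of voting members required to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be unavailable while an election can still succeed."""
    return voting_members - votes_needed(voting_members)


# Compare set sizes: (voters, votes needed, failures tolerated)
for n in (2, 3, 4, 5):
    print(n, votes_needed(n), fault_tolerance(n))
```

With two voters, losing either one leaves the survivor without a majority. Adding an arbiter as a third voter lets the set survive one failure without storing a third copy of the data.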
How Replica Sets Work
- Replication:
  - Secondary nodes continuously replicate data from the primary node.
  - Replication is usually asynchronous, so a secondary may lag slightly behind the primary. Some databases can be configured to replicate synchronously for stronger consistency.
- Automatic Failover:
  - If the primary goes down, the remaining members hold an election and choose a new primary automatically, provided a majority of voting members is still reachable.
  - Clients can reconnect to the new primary without manual intervention.
- Read & Write Operations:
  - Writes: Always go to the primary.
  - Reads: Can go to the primary or secondaries depending on the read preference.
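The replication and failover behavior described above can be sketched as a toy in-memory model. This is a simplified illustration, not MongoDB's actual election protocol: the `Node` and `ReplicaSet` classes are hypothetical, each node's log is a plain list, and the election simply promotes the live node with the longest log.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    oplog: list = field(default_factory=list)  # ordered log of applied writes


class ReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, op):
        """Writes are accepted only by the current primary."""
        if not self.primary.alive:
            raise RuntimeError("no primary available")
        self.primary.oplog.append(op)

    def replicate(self):
        """Live secondaries copy the entries they are missing from the primary.
        Kept separate from write() to model asynchronous replication."""
        if not self.primary.alive:
            return
        for node in self.nodes:
            if node is not self.primary and node.alive:
                node.oplog.extend(self.primary.oplog[len(node.oplog):])

    def elect(self):
        """Promote the most up-to-date live node if a majority is reachable."""
        live = [n for n in self.nodes if n.alive]
        if len(live) < len(self.nodes) // 2 + 1:
            raise RuntimeError("no majority, cannot elect a primary")
        self.primary = max(live, key=lambda n: len(n.oplog))
        return self.primary


rs = ReplicaSet(["db1", "db2", "db3"])
rs.write("insert a")
rs.replicate()               # db2 and db3 now hold "insert a"
rs.write("insert b")         # accepted by db1 but not yet replicated
rs.primary.alive = False     # db1 fails
new_primary = rs.elect()
print(new_primary.name, new_primary.oplog)

rs.write("insert c")         # writes continue on the new primary
rs.replicate()
```

In this run the write that had not yet been replicated when db1 failed is absent from the new primary's log, which is the trade-off of asynchronous replication.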
Advantages of Replica Sets
- High Availability: System remains online even if a server fails.
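- Data Redundancy: Multiple copies reduce the risk of data loss if a single server is lost.
- Scalability: Read operations can be distributed across secondaries, though reads from a lagging secondary may return slightly stale data.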
- Disaster Recovery: Can place replicas in different data centers.
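To illustrate how reads can be spread across members, here is a minimal sketch of read routing. The mode names match MongoDB's read preference modes (`nearest` is left out because it depends on latency measurements), but `select_member` is a hypothetical helper, and real drivers apply further rules such as latency windows and tag sets.

```python
import random


def select_member(mode, primary, secondaries, rng=random):
    """Return the host a read should go to.

    primary is a host name or None if no primary is available;
    secondaries is a list of available secondary host names.
    """
    if mode == "primary":
        candidates = [primary] if primary else []
    elif mode == "primaryPreferred":
        candidates = [primary] if primary else list(secondaries)
    elif mode == "secondary":
        candidates = list(secondaries)
    elif mode == "secondaryPreferred":
        candidates = list(secondaries) or ([primary] if primary else [])
    else:
        raise ValueError(f"unsupported mode: {mode}")
    if not candidates:
        raise LookupError(f"no member satisfies read preference {mode!r}")
    # Pick at random so reads spread across all eligible members
    return rng.choice(candidates)


print(select_member("primary", "db1", ["db2", "db3"]))
print(select_member("secondaryPreferred", "db1", []))
```

Both calls shown return `db1`: the first because only the primary is eligible, the second because no secondary is available and the mode falls back to the primary.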
Example in MongoDB
Suppose we have three nodes:
Primary: db1
Secondary: db2
Secondary: db3
- db1 receives all writes.
- db2 and db3 replicate data from db1.
- If db1 fails, db2 and db3 vote, and one of them becomes the new primary automatically.
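For a concrete picture of this three-node setup, the sketch below builds the kind of configuration document passed to `rs.initiate()` in the MongoDB shell and a matching connection string. The set name `rs0` is a hypothetical choice, and the port assumes MongoDB's default of 27017.

```python
HOSTS = ["db1", "db2", "db3"]
PORT = 27017          # MongoDB's default port (assumed for this example)
REPLICA_SET = "rs0"   # hypothetical replica set name

# Shape of the replica set configuration document
config = {
    "_id": REPLICA_SET,
    "members": [{"_id": i, "host": f"{h}:{PORT}"} for i, h in enumerate(HOSTS)],
}

# Connection string listing every member, so a client can find the current primary
seed_list = ",".join(f"{h}:{PORT}" for h in HOSTS)
uri = f"mongodb://{seed_list}/?replicaSet={REPLICA_SET}"

print(config)
print(uri)
```

Listing all three hosts means a client does not depend on db1 being up: a driver such as PyMongo can take this URI, discover the members, and follow the primary after a failover.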
Visual Representation
           +----------------+
           |    Primary     |
           |      db1       |
           +----------------+
              /          \
           /                \
+----------------+    +----------------+
|   Secondary    |    |   Secondary    |
|      db2       |    |      db3       |
+----------------+    +----------------+
Takeaway
A replica set ensures your database can handle failures gracefully and scale reads efficiently. It’s a cornerstone of modern distributed database design.