Discussion:
[redis-db] Redis Cluster Becomes Unavailable After n/2 Master Failures
Du M
2018-10-30 11:25:18 UTC
Permalink
Hi,

We have set up a 12-node Redis cluster with 6 masters and 6 replicas.

While testing failover, we deliberately stopped (using shutdown save) 1, 2,
and then 3 masters. The cluster recovers from the simultaneous failure of 1
or 2 masters, but when 3 masters are down simultaneously, the cluster becomes
unavailable (clients get a "CLUSTERDOWN The cluster is down" error).

Is this expected behaviour?

The Redis Cluster Spec <https://redis.io/topics/cluster-spec> says a majority
of masters must be available for any recovery.

Just to be sure, does this mean that at least ceil(n/2 + 1) masters must be
available for the cluster to stay available and to recover, where n is the
original number of masters in the cluster?
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+***@googlegroups.com.
To post to this group, send email to redis-***@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.
Salvatore Sanfilippo
2018-10-30 11:38:41 UTC
Permalink
Hello, if you have 6 masters the majority is 4, so with only 3 reachable it
is not possible to continue: otherwise two sides of a partitioned cluster
could both continue independently.
ceil(n/2+1) does give the majority for an even N such as 6; in general the
majority of N nodes is floor(N/2)+1. Note that with the replica migration
feature, the cluster can survive more failure events when the failures are
not simultaneous.
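
The majority arithmetic above can be checked with a small sketch. This is only an illustration of the counting rule, not Redis code; the function names majority, ceil_formula and failover_possible are made up for this example. It also shows that ceil(n/2 + 1), evaluated with real-valued division, matches the majority for even n but is one too high for odd n.

```python
import math


def majority(n_masters: int) -> int:
    """Smallest number of masters that is strictly more than half."""
    return n_masters // 2 + 1


def ceil_formula(n_masters: int) -> int:
    """The ceil(n/2 + 1) formula from the question, with real division."""
    return math.ceil(n_masters / 2 + 1)


def failover_possible(reachable_masters: int, n_masters: int) -> bool:
    """True if the reachable masters form a majority of all masters."""
    return reachable_masters >= majority(n_masters)


if __name__ == "__main__":
    n = 6
    for failed in (1, 2, 3):
        alive = n - failed
        print(f"{failed} failed, {alive} alive, "
              f"failover possible: {failover_possible(alive, n)}")
    for size in (5, 6, 7):
        print(f"n={size}: majority={majority(size)}, "
              f"ceil(n/2+1)={ceil_formula(size)}")
```

With 6 masters, 3 survivors are exactly half and not a majority, which matches the CLUSTERDOWN result in the test.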
--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.
Юрий Соколов
2018-10-30 18:00:01 UTC
Permalink
I'm interested in another question: why do only masters count? Why don't
replicas participate in the majority?
Sidd S
2018-11-08 18:01:23 UTC
Permalink
I am also curious about this. Why does Redis Cluster need consensus from a
majority of master nodes only? Are there any limitations to having the
replicas participate in the majority consensus?