Possible split-brain when using Sentinel

Discussion:

Andrei Lukovenko

2014-05-28 08:32:07 UTC

Hi,

Consider the following configuration:
* master A (slaveof no one)
* slave B (slaveof master-A-ip master-A-port)
* slave C (slaveof master-A-ip master-A-port)
* Sentinel (quorum=1), considers A as the master

If master A fails, Sentinel promotes one of the slaves and changes the
configs accordingly. So the configuration becomes:
* master A (down, slaveof no one)
* master B (slaveof no one)
* slave C (slaveof master-B-ip master-B-port)
* Sentinel (quorum=1), considers B as the master

Now for some reason Sentinel goes down and restarts. As sentinel.conf has
not been rewritten, Sentinel still considers host A to be the master. As
host A is down, the system becomes unresponsive, and our system
administrator recovers host A:
* master A (slaveof no one)
* master B (slaveof no one)
* slave C (slaveof master-B-ip master-B-port)
* Sentinel (quorum=1), considers A as the master
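
A state like the one listed above can be detected from the outside by comparing the role each instance reports. The following is only a minimal sketch: it assumes the INFO replication reply of every instance has already been fetched as text, and the function names, instance labels, and addresses are hypothetical.

```python
def parse_info(text):
    """Parse the 'key:value' lines of a Redis INFO reply into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def find_masters(info_by_instance):
    """Return the names of all instances whose INFO reports role:master."""
    return sorted(
        name
        for name, text in info_by_instance.items()
        if parse_info(text).get("role") == "master"
    )


# Hypothetical INFO replication replies matching the last state listed above.
replies = {
    "A": "# Replication\nrole:master\nconnected_slaves:0\n",
    "B": "# Replication\nrole:master\nconnected_slaves:1\n",
    "C": "# Replication\nrole:slave\nmaster_host:10.0.0.2\nmaster_port:6379\n",
}

masters = find_masters(replies)
if len(masters) > 1:
    print("more than one master:", masters)
```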

This is a very plausible case of split brain, and currently there is no
straightforward way to avoid it. I have considered the following workarounds:
a) Restoring redis.conf to the original state before restarting an
instance. Not good, as we lose all the benefits of rewriting it.
b) Manually resolving these conflicts.

In my opinion, it would've been better if either:
a) We could explicitly describe our network configuration, including slaves
in sentinel.conf. Then after restarting a sentinel it would turn B and C to
slaves of A.
b) Sentinel would rewrite sentinel.conf after changing configuration. In
this example, after promoting B to master and changing B's and C's config,
it would also rewrite its own config to consider B as the master.

What do you think?

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To post to this group, send email to redis-db-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Salvatore Sanfilippo

2014-05-28 12:19:51 UTC

Hello,

What you describe is AFAIK not possible, and moreover it is not
technically what is called a split-brain condition.
A split-brain condition happens when multiple processes should agree
on some value, but instead disagree and hold two distinct values.
In eventually consistent systems like Sentinel, split-brain conditions
are possible during partitions, but there is the guarantee that when
all the partitions heal, all the Sentinels agree about which instance
is the master.

What you describe instead is a loss of state during a crash-recovery
event. However, AFAIK this is not possible, because when a new
configuration is considered valid (we receive an acknowledgment, by
examining the INFO output, that the promoted slave actually switched to
the master role), the configuration is persisted and fsync()-ed to disk
before it is propagated to other nodes or advertised by Sentinel
to any client.
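
The ordering described here (persist and fsync first, advertise second) can be sketched as follows. This is not Sentinel's actual code: the temp-file-plus-rename step, the function names, and the `advertise` callback are assumptions made for illustration. Only the `sentinel monitor <name> <ip> <port> <quorum>` line follows the real sentinel.conf syntax.

```python
import os
import tempfile


def persist_config(path, text):
    """Write text to path durably: temp file, fsync, then atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())    # force the data to disk
        os.replace(tmp_path, path)  # swap in the new file atomically
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a stray temp file behind
        raise


def switch_master(path, name, host, port, quorum, advertise):
    """Persist the new master first, and only then tell anyone about it."""
    line = "sentinel monitor %s %s %d %d\n" % (name, host, port, quorum)
    persist_config(path, line)
    advertise(name, host, port)     # hypothetical callback, runs after the write
```

With this ordering, a crash before `advertise` runs leaves at worst a newer configuration on disk that nobody has been told about yet, never an advertised master that the restarted process has forgotten.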

However it is possible that the Sentinel sends a SLAVEOF NO ONE to the
promoted slave, and restarts before the slave is able to confirm the
role change.
But this case is exactly like a Sentinel observing a switch of a slave
from slave to master operated externally (for instance, manually).

In this case, because of the Sentinel liveness property of always trying
to enforce the current logical configuration on instances that diverge
from it, such a slave reporting the master role is, after a small
delay, converted back to the slave role.

Regards,
Salvatore

Andrei Lukovenko

2014-05-28 13:26:02 UTC

Hello,

First of all, thank you for the response.

Regarding the definition of split-brain, I am still not convinced. In
my example both instances A and B consider themselves masters. Both of them
are able to serve clients, including writes. If that is not a split-brain,
then what is?

The sequence described above is not imaginary. I've actually seen this
exact situation during my tests. It is very real, and what I really want is
to find a way to prevent it from happening in production.

So far it seems that Sentinel is able to change (and actually save to
disk) the configuration of an instance (master or slave), but does not change
its own configuration. Is that correct?

--
Best regards, Andrei

Salvatore Sanfilippo

2014-05-28 13:36:11 UTC

Post by Andrei Lukovenko
Hello,
First of all, thank you for response.
Regarding the definition of the split-brain I am still not convinced. In
my example both instances A and B consider themselves masters. Both of them
are able to serve clients, including writes. If it is not a split-brain,
then what is?..

Split-brain conditions must be evaluated from the point of view of
whatever is the source of authority in a distributed system.
In this case it is the set of Sentinel instances, so as long as there
is no split-brain condition among the Sentinels themselves, the
split-brain condition you see in the Redis instances is not a problem,
because Sentinel always (with a delay) enforces the logical
configuration on the instances.

Post by Andrei Lukovenko
The sequence described above is not imaginary. I've actually seen this
exact situation during my tests, it is very real, and what I really want is
to find a way to prevent repeating this in production.

Probably what you observed is what I described in the previous email?
That's definitely possible.

1) A failover starts.
2) The Sentinel sends SLAVEOF NO ONE to the slave.
3) The Sentinel gets killed before getting the acknowledgment.
4) The Sentinel restarts with the old config (which is correct since
the previous failover was not technically finished, and the Sentinel
never advertised the new master).

At this point you have two masters if you check the instances, but for
Sentinel the master is still the old one.
After some time (8 seconds, that is, four times the configuration
broadcasting period) it should detect that one of the slaves is
misconfigured and reconfigure it accordingly. If this does not happen,
there is a bug.
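
The expected correction can be sketched roughly as below. It is a simplified model, not Sentinel's implementation: the 2-second period is only inferred from "8 seconds, four times the broadcasting period", and the function name, data layout, and addresses are hypothetical.

```python
PUBLISH_PERIOD_SECONDS = 2                  # assumed: 8 seconds / 4
WAIT_SECONDS = 4 * PUBLISH_PERIOD_SECONDS   # delay before correcting a slave


def commands_to_send(logical_master, known_slaves, observed, now):
    """Return (instance, command) pairs for slaves that claim to be masters.

    logical_master: (host, port) the Sentinel believes is the master.
    known_slaves:   names the Sentinel has on record as slaves.
    observed:       name -> (reported role, time that role was first seen).
    """
    host, port = logical_master
    commands = []
    for name in known_slaves:
        role, since = observed[name]
        # Act only once the wrong role has persisted for the whole delay.
        if role == "master" and now - since >= WAIT_SECONDS:
            commands.append((name, "SLAVEOF %s %d" % (host, port)))
    return commands


# B received SLAVEOF NO ONE at t=100, but the failover was never completed.
observed = {"B": ("master", 100.0), "C": ("slave", 90.0)}
print(commands_to_send(("10.0.0.1", 6379), ["B", "C"], observed, now=105.0))
print(commands_to_send(("10.0.0.1", 6379), ["B", "C"], observed, now=108.0))
```

The first call returns nothing because the delay has not elapsed; the second returns a single SLAVEOF command that points B back at the old master.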

All this, of course, in Sentinel >= 2.8.
Sentinel shipped with 2.6 is broken and deprecated. Actually in the
latest 2.6 branch it is a dummy binary that warns you to use 2.8.

Post by Andrei Lukovenko
So far it seems that sentinel is able to change (and actually save on
disk) configuration of an instance (master or slave), but does not change
its own configuration. Is that correct?

Yes and no. It does not save the new configuration on purpose, because
it has not yet received the acknowledgment.
But what is interesting here is that it always saves the updated
configuration (with fsync) *before* advertising the new
configuration to clients and other Sentinels.

If it is not able to get the ack, it will reconfigure the new master
again back to slave.

If this does not happen, then there is a bug in the implementation.
The designed semantics are very clear; the problem is if you find a
case where, because of an implementation bug, things do not work as
expected.

I'm trying to reproduce this right now. Thanks for posting, it is vital
that we try to remove all the bugs in order to end up with a system that
acts like the specification claims.

Salvatore