Discussion:
Possible split-brain when using Sentinel
Andrei Lukovenko
2014-05-28 08:32:07 UTC
Hi,

Consider the following configuration:
* master A (slaveof no one)
* slave B (slaveof master-A-ip master-A-port)
* slave C (slaveof master-A-ip master-A-port)
* Sentinel (quorum=1), considers A as the master

If master A fails, Sentinel promotes one of the slaves and changes the
configs accordingly. So the configuration becomes:
* master A (down, slaveof no one)
* master B (slaveof no one)
* slave C (slaveof master-B-ip master-B-port)
* Sentinel (quorum=1), considers B as the master

Now for some reason Sentinel goes down and restarts. As sentinel.conf has
not been rewritten, Sentinel still thinks host A is the master. Since
host A is down, the system becomes unresponsive, and our system
administrator recovers host A:
* master A (slaveof no one)
* master B (slaveof no one)
* slave C (slaveof master-B-ip master-B-port)
* Sentinel (quorum=1), considers A as the master

This is a very plausible case of split brain, and currently there is no
straightforward way to avoid it. I have considered the following workarounds:
a) Restoring redis.conf to the original state before restarting an
instance. Not good, as we lose all the benefits of rewriting it.
b) Manually resolving these conflicts.

In my opinion, it would've been better if either:
a) We could explicitly describe our network configuration, including slaves,
in sentinel.conf. Then, after restarting, a sentinel would turn B and C into
slaves of A.
b) Sentinel would rewrite sentinel.conf after changing the configuration. In
this example, after promoting B to master and changing B's and C's configs,
it would also rewrite its own config to consider B as the master.

What do you think?
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To post to this group, send email to redis-db-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.
Salvatore Sanfilippo
2014-05-28 12:19:51 UTC
Hello,

what you describe is, AFAIK, not possible, and moreover it is not
technically what is called a split-brain condition.
A split-brain condition happens when multiple processes should agree
about some value but instead disagree and actually hold two
distinct values.
In eventually consistent systems like Sentinel, split-brain conditions
are possible during partitions, but there is a guarantee that when
all the partitions heal, all the Sentinels agree on which instance is the
master.
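A toy model of that convergence guarantee, offered only as a sketch: it assumes each Sentinel holds an `(epoch, master)` pair and that, once the partition heals and Sentinels exchange configurations, the one with the higher configuration epoch wins. Sentinel 2.8 does version failovers with configuration epochs, but `MasterConfig`, `merge` and `gossip_until_stable` are hypothetical names, not Sentinel internals, and the model assumes two different masters are never published under the same epoch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MasterConfig:
    epoch: int   # configuration epoch, bumped by each failover
    master: str  # instance this Sentinel believes is the master

def merge(local: MasterConfig, remote: MasterConfig) -> MasterConfig:
    """Keep whichever configuration carries the higher epoch."""
    return remote if remote.epoch > local.epoch else local

def gossip_until_stable(views: dict[str, MasterConfig]) -> dict[str, MasterConfig]:
    """After the partition heals every Sentinel hears every other one;
    keep merging until no view changes any more."""
    changed = True
    while changed:
        changed = False
        for name in views:
            for other in list(views.values()):
                merged = merge(views[name], other)
                if merged != views[name]:
                    views[name] = merged
                    changed = True
    return views

# s3 was on the minority side of a partition and missed the failover to B.
views = {
    "s1": MasterConfig(epoch=2, master="B"),
    "s2": MasterConfig(epoch=2, master="B"),
    "s3": MasterConfig(epoch=1, master="A"),
}
gossip_until_stable(views)
print({name: cfg.master for name, cfg in views.items()})
# {'s1': 'B', 's2': 'B', 's3': 'B'}
```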

What you describe instead is a loss of state during a crash-recovery
event. However, AFAIK this is not possible, because when a new
configuration is considered valid (that is, when examining the INFO
output confirms that the promoted slave has actually switched to the
master role), the configuration is persisted and fsync()-ed to disk
before it is propagated to other nodes or advertised by Sentinel
to any client.
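The ordering described here (make the new configuration durable first, then tell anyone about it) can be sketched in Python. This is a minimal illustration assuming a write-to-temp, fsync(), atomic-rename strategy; `persist_config` and `commit_failover` are hypothetical helpers, and the one-line `monitor <name>` file content is a placeholder, not real sentinel.conf syntax.

```python
import os
import tempfile

def persist_config(path: str, text: str) -> None:
    """Durably replace the file at `path` with `text`.

    Writes to a temporary file in the same directory, fsync()s it, then
    atomically renames it over the old file, so a crash leaves either the
    complete old config or the complete new one on disk.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()               # push Python's buffer to the OS
            os.fsync(f.fileno())    # force the OS to write it to disk
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise

def commit_failover(path: str, new_master: str, advertise) -> None:
    """Hypothetical commit step: persist first, advertise second."""
    persist_config(path, f"monitor {new_master}\n")  # placeholder format
    advertise(new_master)  # only reached once the new config is durable
```

A production version would also fsync() the containing directory so that the rename itself survives a crash.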

However, it is possible that the Sentinel sends SLAVEOF NO ONE to the
promoted slave and restarts before the slave is able to confirm the
role change.
But this case is exactly like a Sentinel observing a slave being switched
to the master role externally (for instance, manually).

In this case, because of the Sentinel liveness property (it always tries
to enforce the current logical configuration on any instance that
diverges from it), such a slave reporting the master role is converted
back to the slave role after a small delay.
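The liveness property can be pictured as a pure reconciliation function: compare what each instance reports against the logical configuration and emit a command for every instance that diverges. A sketch under stated assumptions: `reconcile` is a hypothetical helper, observed state is given as `(role, upstream)` pairs, commands are shown as `SLAVEOF <name>` strings with the real host/port pair abbreviated to an instance name, and treating a slave that replicates from the wrong master as divergent goes slightly beyond what is described above.

```python
from typing import Optional

def reconcile(logical_master: str,
              observed: dict[str, tuple[str, Optional[str]]]) -> list[tuple[str, str]]:
    """Return (instance, command) pairs that restore the logical configuration.

    `observed` maps instance -> (role, upstream), where role is what the
    instance reports ("master" or "slave") and upstream is the master it
    replicates from (None for masters).
    """
    commands = []
    for instance, (role, upstream) in sorted(observed.items()):
        if instance == logical_master:
            continue  # the logical master is left alone
        if role == "master" or upstream != logical_master:
            # Placeholder for SLAVEOF <host> <port> of the logical master.
            commands.append((instance, f"SLAVEOF {logical_master}"))
    return commands

# The state from the thread: B was promoted, C follows B, Sentinel still says A.
print(reconcile("A", {"A": ("master", None),
                      "B": ("master", None),
                      "C": ("slave", "B")}))
# [('B', 'SLAVEOF A'), ('C', 'SLAVEOF A')]
```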

Regards,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)
Andrei Lukovenko
2014-05-28 13:26:02 UTC
Hello,

First of all, thank you for your response.

Regarding the definition of split-brain, I am still not convinced. In
my example both instances A and B consider themselves masters. Both of them
are able to serve clients, including writes. If that is not a split-brain,
then what is?

The sequence described above is not imaginary. I've actually seen this
exact situation during my tests; it is very real, and what I really want is
to find a way to prevent it from happening again in production.

So far it seems that Sentinel is able to change (and actually save to
disk) the configuration of an instance (master or slave), but does not change
its own configuration. Is that correct?
--
Best regards, Andrei
Salvatore Sanfilippo
2014-05-28 13:36:11 UTC
Post by Andrei Lukovenko
> Hello,
> First of all, thank you for response.
> Regarding the definition of the split-brain I am still not convinced. In
> my example both instances A and B consider themselves masters. Both of them
> are able to serve clients, including writes. If it is not a split-brain,
> then what is?..
Split-brain conditions must be evaluated from the point of view of
whatever is the source of authority in a distributed system.
In this case that is the set of Sentinel instances, so as long as there
is no split-brain condition among the Sentinels themselves, the split-brain
condition you see in the Redis instances is not a problem,
because Sentinel always (after a delay) enforces the
logical configuration on the instances.
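The distinction between the source of authority and the instances it manages can be made concrete with a small classifier. A minimal sketch, assuming at least one Sentinel view is supplied; `classify` and its three labels are illustrative, not Sentinel terminology.

```python
def classify(sentinel_views: dict[str, str], roles: dict[str, str]) -> str:
    """Judge the cluster from the point of view of the source of authority.

    sentinel_views: Sentinel name -> instance it considers the master.
    roles: Redis instance name -> role it reports ("master" or "slave").
    Assumes at least one Sentinel view is given.
    """
    believed = set(sentinel_views.values())
    if len(believed) > 1:
        return "split-brain"  # the authority itself disagrees
    (logical_master,) = believed
    divergent = [name for name, role in roles.items()
                 if (role == "master") != (name == logical_master)]
    if divergent:
        return "converging"   # Sentinel reconfigures these after a delay
    return "consistent"

# The situation in the thread: one Sentinel view, two instances claiming master.
print(classify({"s1": "A"}, {"A": "master", "B": "master", "C": "slave"}))  # converging
```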
Post by Andrei Lukovenko
> The sequence described above is not imaginary. I've actually seen this
> exact situation during my tests, it is very real, and what I really want is
> to find a way to prevent repeating this in production.
Perhaps what you observed is what I described in the previous email?
That is definitely possible:

1) A failover starts.
2) The Sentinel sends SLAVEOF NO ONE to the slave.
3) The Sentinel gets killed before receiving the acknowledgement.
4) The Sentinel restarts with the old config (which is correct, since
the previous failover was not technically finished and the Sentinel
never advertised the new master).

At this point you have two masters if you check the instances, but for
Sentinel the master is still the old one.
After some time (8 seconds, which is four times the configuration
broadcast period) it should detect that one of the slaves is
misconfigured and reconfigure it accordingly; if this does not happen,
there is a bug.
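The four steps and the reconfiguration delay can be simulated in a few lines. This is a deliberately simplified model, not Sentinel's implementation: time advances in 1-second ticks, the 2-second broadcast period is simply the 8 seconds divided by four, and `Cluster`, `crash_during_failover` and `run` are hypothetical names.

```python
from dataclasses import dataclass, field

BROADCAST_PERIOD_S = 2                      # 8 s / 4, per the explanation above
RECONFIG_DELAY_S = 4 * BROADCAST_PERIOD_S  # 8 seconds

@dataclass
class Cluster:
    # Role each Redis instance currently reports.
    roles: dict = field(default_factory=lambda: {"A": "master", "B": "slave", "C": "slave"})
    # Master recorded in the Sentinel's on-disk config (the logical master).
    sentinel_master: str = "A"

def crash_during_failover(cluster: Cluster) -> None:
    """Steps 1-4: B gets SLAVEOF NO ONE, but the Sentinel dies before seeing
    role:master, so it restarts with the old (still valid) config naming A."""
    cluster.roles["B"] = "master"  # B obeyed SLAVEOF NO ONE
    # Nothing was persisted or advertised, so sentinel_master stays "A".

def run(cluster: Cluster, seconds: int) -> list:
    """Advance time in 1 s ticks; return (t, number_of_masters) samples."""
    divergent_since = {}
    samples = []
    for t in range(seconds + 1):
        for inst, role in list(cluster.roles.items()):
            wrong = (role == "master") != (inst == cluster.sentinel_master)
            if not wrong:
                divergent_since.pop(inst, None)
                continue
            start = divergent_since.setdefault(inst, t)
            if t - start >= RECONFIG_DELAY_S:
                # Enforce the logical configuration on the diverging instance.
                cluster.roles[inst] = "master" if inst == cluster.sentinel_master else "slave"
                del divergent_since[inst]
        samples.append((t, sum(r == "master" for r in cluster.roles.values())))
    return samples

cluster = Cluster()
crash_during_failover(cluster)
samples = run(cluster, 10)
print(samples[7], samples[8])  # (7, 2) (8, 1)
```

With B promoted but never acknowledged, the model shows two masters from t = 0 to t = 7 and a single master from t = 8 onward.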

All this applies, of course, to Sentinel >= 2.8.
The Sentinel shipped with 2.6 is broken and deprecated; in the
latest 2.6 branch it is actually a dummy binary that warns you to use 2.8.
Post by Andrei Lukovenko
> So far it seems that sentinel is able to change (and actually save on
> disk) configuration of an instance (master or slave), but does not change
> it's own configuration. Is that correct?
Yes and no. It deliberately does not save the new configuration, because
it has not yet received the acknowledgement.
What is interesting here, though, is that it always saves the updated
configuration (with fsync) *before* advertising the new
configuration to clients and other Sentinels.

If it does not get the ack, it will reconfigure the new master
back into a slave.
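Whether the promotion was acknowledged comes down to what the candidate reports in its INFO output. A sketch, assuming the standard `role:master` / `role:slave` lines of the replication section; `parse_info`, `promotion_acknowledged` and `next_action` are hypothetical helpers, and the `ack_deadline_passed` flag stands in for whatever timeout the real failover state machine uses.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse INFO-style output into a dict of field -> value.

    Skips blank lines and '# Section' headers; splits on the first ':'.
    """
    fields = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields

def promotion_acknowledged(info_text: str) -> bool:
    """The promotion only counts once the candidate itself reports role:master."""
    return parse_info(info_text).get("role") == "master"

def next_action(info_text: str, ack_deadline_passed: bool) -> str:
    """Hypothetical decision for the Sentinel driving the failover."""
    if promotion_acknowledged(info_text):
        return "persist-and-advertise"  # fsync the new config, then publish it
    if ack_deadline_passed:
        return "demote"                 # reconfigure the candidate as a slave again
    return "wait"                       # keep polling INFO

reply = "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.1\r\nmaster_port:6379\r\n"
print(next_action(reply, ack_deadline_passed=False))  # wait
```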

If this does not happen, then there is a bug in the implementation.
The designed semantics are very clear; the problem would be finding a
case where, because of an implementation bug, things do not work as
expected.

I'm trying to reproduce it right now. Thanks for posting: it is vital
that we remove all the bugs so that we end up with a system that
behaves as the specification claims.

Salvatore
Salvatore Sanfilippo
2014-05-28 14:10:33 UTC
Update: I tried to simulate the problem, and Sentinel always turns
the instance that claims to be a master, but is not the logical
master, back into a slave after a bit more than 8 seconds.
During the whole time, Sentinel never advertised the non-logical master
as a master to clients.

So apparently I'm not able to trigger the bug. If you are able to find
a sequence of operations in which instances known by Sentinel can be
simultaneously masters for more than a few seconds, please post the
exact sequence and I'll try to reproduce and track down the issue.
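To look for the sequence requested here, one approach is to poll every instance's role periodically and scan the samples for windows in which more than one instance claims to be master. A sketch, assuming the samples have already been collected (for instance from the `role` field of each instance's INFO replication section); `double_master_windows` is a hypothetical helper and the threshold is a parameter you would choose.

```python
def double_master_windows(samples, threshold_s):
    """Return (start, end) windows in which more than one instance was master.

    samples: time-ordered (timestamp_seconds, set_of_master_names) pairs.
    Only windows lasting longer than threshold_s are reported; a window still
    open at the last sample ends at that sample's timestamp.
    """
    windows = []
    start = None
    last_t = None
    for t, masters in samples:
        if len(masters) > 1:
            if start is None:
                start = t  # a double-master window opens
        elif start is not None:
            if t - start > threshold_s:
                windows.append((start, t))
            start = None   # back to a single master
        last_t = t
    if start is not None and last_t - start > threshold_s:
        windows.append((start, last_t))
    return windows

samples = [(0, {"A"}), (1, {"A", "B"}), (2, {"A", "B"}),
           (9, {"A"}), (10, {"A", "C"}), (11, {"A"})]
print(double_master_windows(samples, threshold_s=5))  # [(1, 9)]
```

A window of a little over 8 seconds matches the reconfiguration delay mentioned in this post; anything markedly longer would be the kind of sequence worth posting.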

Cheers,
Salvatore
Andrei Lukovenko
2014-05-28 14:29:14 UTC
Hi,

It seems that you are right. I haven't been able to reproduce this bug so far.
Let's consider it a rare, occasional fluke.

Thank you for your support!
Post by Salvatore Sanfilippo
Update: I tried to simulate the problem, and Sentinel always
reconfigures the instance that claims to be a master but is not the
logical master as a slave, after a bit more than 8 seconds.
During all the time, Sentinel never advertised the non-logical master
as a master to clients.
So apparently I'm not able to trigger the bug. If you are able to find
a sequence of operations where instances known by Sentinel can be
simultaneously masters for more than a few seconds, please post the
exact sequence, I'll try to reproduce and track the issue.
Cheers,
Salvatore
Post by Salvatore Sanfilippo
Post by Salvatore Sanfilippo
Hello,
First of all, thank you for response.
Regarding the definition of the split-brain I am still not convinced.
In
Post by Salvatore Sanfilippo
Post by Salvatore Sanfilippo
my example both instances A and B consider themselves masters. Both of
them
Post by Salvatore Sanfilippo
Post by Salvatore Sanfilippo
are able to serve clients, including writes. If it is not a split-brain,
then what is?..
Split brain conditions must be evaluated from the point of view of who
should be the source of authority in a distributed system.
In this case, it is the set of Sentinel instances, so as long as there
is no split-brain condition in the Sentinels themselves, the split
brain condition you see in the Redis instances is not a problem
because of the Sentinel property to always (with a delay) set the
logical configuration as the instances configuration.
Post by Salvatore Sanfilippo
The sequence described above is not imaginary. I've actually seen this
exact situation during my tests, it is very real, and what I really
want is
Post by Salvatore Sanfilippo
Post by Salvatore Sanfilippo
to find a way to prevent repeating this in production.
Probably what you observed is what I described in the previous email?
That's definitely possible.
1) A failover starts.
2) The Sentinel sends SLAVEOF NO ONE to the slave.
3) The Sentinel gets killed before getting the acknowledge.
4) The Sentinel restarts with the old config (which is correct since
the previous failover was not technically finished, and the Sentinel
never advertised the new master).
At this point you have two masters if you check the instances, but for
Sentinel the master is still the old one.
After some time (8 seconds, which is, four times the configuration
broadcasting period) it should detect that one of the slaves is
misconfigured, and reconfigure it accordingly, if this does not happen
there is a bug.
All this, of course, in Sentinel >= 2.8.
Sentinel shipped with 2.6 is broken and deprecated. Actually in the
latest 2.6 branch it is a dummy binary that warns you to use 2.8.
Post by Salvatore Sanfilippo
So far it seems that sentinel is able to change (and actually save on
disk) configuration of an instance (master or slave), but does not
change
Post by Salvatore Sanfilippo
Post by Salvatore Sanfilippo
it's own configuration. Is that correct?
Yes and no. It does not save the new configuration on purpose, because
it still did not received the acknowledge.
But here what is interesting is that, it saves the updated
configuration (with fsync) always *before* of advertising the new
configuration to clients and other Sentinels.
If it is not able to get the ack, it will reconfigure the new master
again back to slave.
If this does not happen, than there is a bug in the implementation,
but the designed semantics is very clear, the problem is if you find a
case where because of an implementation bug things does not work as
expected.
I'm trying to reproduce it right now. Thanks for posting; it is vital
that we remove all the bugs so that we end up with a system that
acts as the specification claims.
Salvatore
Post by Salvatore Sanfilippo
Hello,
what you describe is, AFAIK, not possible; moreover, it is not
technically what is called a split-brain condition.
A split-brain condition happens when multiple processes should agree
about some value but instead disagree, actually holding two
distinct values.
In eventually consistent systems like Sentinel, split-brain conditions
are possible during partitions, but there is a guarantee that once
all the partitions heal, all the Sentinels agree about which instance
is the master.
What you describe instead is a loss of state during a crash-recovery
event. However, AFAIK this is not possible: when a new configuration
is considered valid (that is, when examining the INFO output
acknowledges that the promoted slave actually switched to the master
role), the configuration is persisted and fsync()-ed to disk before
it is propagated to other nodes or advertised by Sentinel to any
client.
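The acknowledgment mentioned above comes from examining INFO output. Below is a minimal Python sketch of that check, assuming only INFO's text format: `# Section` header lines plus `key:value` lines, with `role:master` or `role:slave` in the replication section. The helper names are hypothetical, and the two sample strings are made up in that format for illustration, not captured from a server.

```python
from typing import Dict

def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO-style output: '# Section' headers and 'key:value' lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():       # handles the \r\n line endings
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields

def promotion_acknowledged(info_text: str) -> bool:
    """True once the promoted instance reports the master role."""
    return parse_info(info_text).get("role") == "master"

# Made-up samples in INFO's format: before and after SLAVEOF NO ONE takes effect.
before = "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.1\r\nmaster_port:6379\r\n"
after = "# Replication\r\nrole:master\r\nconnected_slaves:1\r\n"

print(promotion_acknowledged(before), promotion_acknowledged(after))  # False True
```

Only when a check like this succeeds would the new configuration be persisted and then advertised. A Sentinel killed before it observes `role:master` has nothing new on disk.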
However, it is possible that the Sentinel sends a SLAVEOF NO ONE to the
promoted slave and restarts before the slave is able to confirm the
role change.
But this case is exactly like a Sentinel observing an instance being
switched from slave to master externally (for instance, manually).
In this case, because of Sentinel's liveness property (it always tries
to enforce the current logical configuration on any instance diverging
from it), such a slave reporting the master role is, after a small
delay, converted back to the slave role.
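The liveness property can be pictured as a reconciliation pass that runs repeatedly. The hedged Python sketch below flags any instance that should be a slave of the logical master but has diverged from that for at least a grace period. The 2-second period and the 8-second grace are assumptions taken from the "8 seconds, four times the broadcasting period" figure earlier in this thread. `plan_reconfiguration` is a hypothetical helper, and the returned `SLAVEOF A` strings stand in for the real `SLAVEOF <host> <port>` command. Only the slave side is modelled.

```python
from typing import Dict, List, Optional, Tuple

PUBLISH_PERIOD = 2.0           # seconds; assumed from "8 s = 4 x broadcast period"
GRACE = 4 * PUBLISH_PERIOD     # 8.0 seconds before a diverging slave is fixed

Observed = Dict[str, Tuple[str, Optional[str]]]   # name -> (role, master_of)

def plan_reconfiguration(logical_master: str, observed: Observed,
                         diverging_since: Dict[str, float],
                         now: float) -> List[Tuple[str, str]]:
    """Return (instance, command) pairs for instances that should be slaves of
    `logical_master` but have diverged from that for at least GRACE seconds.
    `diverging_since` is updated in place to remember when divergence began."""
    plan: List[Tuple[str, str]] = []
    for name, (role, master_of) in sorted(observed.items()):
        if name == logical_master:
            continue                          # only the slave side is modelled here
        if role == "slave" and master_of == logical_master:
            diverging_since.pop(name, None)   # conforms again: reset its timer
            continue
        first_seen = diverging_since.setdefault(name, now)
        if now - first_seen >= GRACE:
            # Real command is SLAVEOF <host> <port>; names stand in for addresses.
            plan.append((name, f"SLAVEOF {logical_master}"))
    return plan

# State left by the interrupted failover: B reports master, Sentinel says A.
observed: Observed = {"A": ("master", None), "B": ("master", None), "C": ("slave", "A")}
since: Dict[str, float] = {}
at_0 = plan_reconfiguration("A", observed, since, now=0.0)
at_5 = plan_reconfiguration("A", observed, since, now=5.0)
at_8 = plan_reconfiguration("A", observed, since, now=8.0)
print(at_0, at_5, at_8)  # [] [] [('B', 'SLAVEOF A')]
```

The grace period keeps a Sentinel from fighting a failover that is still in progress. Once B reports `role:slave` again, its timer is cleared and no further command is planned.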
Regards,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org
To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)
--
Best regards, Andrei
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
To post to this group, send email to redis-db-/JYPxA39Uh5TLH3MbocFF+G/***@public.gmane.org
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.
Salvatore Sanfilippo
2014-05-28 14:34:02 UTC
Post by Andrei Lukovenko
It seems that you are right. I haven't been able to reproduce this bug so far.
Let's consider this a rare fluke.
That's the problem: the current implementation of Sentinel, given how it
is specified, should never have occasional flukes.
It is more likely that you used an older version that perhaps contained
bugs (there are known issues in past versions), or that there is a
not-yet-discovered issue that is hard to trigger.

One characteristic of eventually consistent systems is that they are
very resilient: eventually there is always a single piece of
information that wins over the others and gets applied, so it should
be technically impossible to put the system into a state where it does
not act in an obvious way. All this, modulo implementation bugs.
(For the same reason, some bugs in EC systems are hard to discover,
since the system eventually converges and masks the issues; but
Sentinel tends to log everything so that bugs can be traced more easily.)
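One way to picture "a single piece of information that wins" is a highest-epoch-wins merge. The Python sketch below assumes, as with Sentinel 2.8's configuration epochs, that every configuration carries an epoch and that a strictly higher epoch supersedes a lower one. It also assumes each epoch belongs to exactly one configuration. The `Config` and `merge` names and the all-to-all exchange are simplifications for illustration, not Sentinel's actual hello-message protocol.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Config:
    epoch: int     # assumed unique per successful failover
    master: str

def merge(local: Config, received: Iterable[Config]) -> Config:
    """Adopt any received configuration with a strictly higher epoch."""
    best = local
    for cfg in received:
        if cfg.epoch > best.epoch:
            best = cfg
    return best

# Three Sentinels after a partition heals: only the second saw the failover.
views: List[Config] = [Config(1, "A"), Config(2, "B"), Config(1, "A")]
healed = [merge(v, views) for v in views]
print(healed)
```

After one exchange every view names B at epoch 2, whatever each Sentinel believed before. That convergence is also why a transient bug can be masked: the stale view simply disappears.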

If you happen to discover a problem, please ping me, and thanks for
reporting the possible problem.

Salvatore