Healing a GlusterFS Split-Brain (Misc)

When GlusterFS nodes become isolated from one another, you can end up with a condition known as split-brain - essentially, one or more nodes are aware of changes that the others are not.

Depending on how bad the split-brain is, and what it affects, it has the ability to completely ruin your day.

One of the first things you want to do, once you become aware of the issue, is (if at all possible) cease all writes to the volume on all nodes - you don't want to dig the hole any deeper than it already is.
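
If you can't easily stop the writers themselves, one option is to make the volume read-only at the Gluster level - this is a sketch using the standard features.read-only volume option (check the impact on your workload before relying on it):

gluster volume set myglustervol features.read-only on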

This documentation shows you how to resolve a simple split-brain, and should hopefully be sufficient most of the time. We'll assume the gluster volume is called "myglustervol" - replace this with your actual volume name.

Details

  • Language: Misc

Snippet

# Check that all peers are communicating
gluster peer status

# If any show as disconnected, resolve this first
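# A common fix is to restart the management daemon on the
# disconnected node (assumes a systemd based distro)
systemctl restart glusterd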

# Get the volume status
gluster volume status
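# All bricks should show as Online (Y) - if any don't, bring
# them back up before attempting a heal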

# Do this on each node
# Confirm which files need healing
# Your lists should (hopefully) match - if not, check comms again
gluster volume heal myglustervol info split-brain
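
# For a wider view (all files pending heal, not just those
# in split-brain) you can also check
gluster volume heal myglustervol info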

# Run the heal (only needs running on one node)
gluster volume heal myglustervol
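
# Re-check afterwards - if all went well, the split-brain
# list should now be empty
gluster volume heal myglustervol info split-brain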

# If that didn't work, you may need to tell gluster how
# to resolve conflicts on a per-file basis
#
# Choose one of the following (whichever you think will be best)

# Use the bigger file
gluster volume heal myglustervol split-brain bigger-file <FILE>

# Use the file with the most recent mtime
gluster volume heal myglustervol split-brain latest-mtime <FILE>

# Force use of one of the bricks as the authoritative source
gluster volume heal myglustervol split-brain source-brick <Host:brick> <FILE>
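
# For example (hypothetical host, brick and file names - FILE is
# the path as seen from the root of the volume):
#
# gluster volume heal myglustervol split-brain source-brick gluster1:/bricks/brick1 /dir/afile.txt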

# Use one brick as a source for all the things (every file currently in split-brain)
gluster volume heal myglustervol split-brain source-brick <Host:brick>
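
# Once you're done, confirm nothing is still listed as being in split-brain
gluster volume heal myglustervol info split-brain

# If you set the volume read-only earlier, remember to re-enable writes
gluster volume reset myglustervol features.read-only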