Automatically resolving Gluster GFID Mismatch (BASH)

GFID mismatch is something that can occur on nodes running GlusterFS, particularly after a split-brain or similar incident.

The symptoms of a GFID mismatch are I/O errors for certain files within the Gluster filesystem, and/or question marks in the output of ls. The definitive sign, though, is that Gluster's logs contain the message Gfid mismatch detected.
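
As a quick check before going further, the number of mismatch messages in the log can be counted. This is a minimal sketch: count_gfid_mismatches is a hypothetical helper name, and the log path in the usage line is the one the script in this snippet reads, which may differ on your system.

```bash
#!/bin/bash
# Hypothetical helper: count log lines that report a GFID mismatch.
count_gfid_mismatches() {
    local logfile=$1
    # grep -c prints the number of matching lines. It exits non-zero
    # when that number is 0, so the exit status is deliberately ignored.
    grep -c "Gfid mismatch detected" "$logfile" || true
}
```

For example: count_gfid_mismatches "/var/log/glusterfs/${GLUSTERVOL:-shared}.log"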

The technically correct way to resolve GFID mismatch is to compare mtimes for each affected GFID in order to identify which node has the most recent copy.
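
A minimal sketch of that comparison, assuming GNU stat is available: brick_mtimes is a hypothetical helper, not part of Gluster, which prints the modification time of one brick-relative path on each local brick. Run on every node, the outputs can then be compared by eye; the largest number is the most recently modified copy.

```bash
#!/bin/bash
# Hypothetical helper: print "<mtime> <path>" for one brick-relative path
# on each brick given. mtime is in seconds since the epoch.
brick_mtimes() {
    local rel_path=$1
    shift
    local brick
    for brick in "$@"
    do
        if [ -e "$brick/$rel_path" ]
        then
            # GNU stat: %Y is the last modification time (epoch seconds)
            printf '%s %s\n' "$(stat -c %Y "$brick/$rel_path")" "$brick/$rel_path"
        else
            printf 'missing %s\n' "$brick/$rel_path"
        fi
    done
}
```

For example: brick_mtimes some/affected/dir /data1/gluster /data2/gluster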

However, where a lot of files are affected, this may not be practical because of the time and effort required, particularly if there's reasonable certainty that the files themselves don't differ between the nodes.

This snippet provides a script which will pull affected GFIDs from the Gluster logs, resolve those GFIDs back to a path and then move that path aside (renaming it with a .old suffix) on each brick. It should be run on all but one of your Gluster nodes; the remaining node keeps the "master" copy.
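
For context on the resolution step: Gluster keeps an entry for every GFID under the brick's .glusterfs directory, in subdirectories named after the first two and the next two hex characters of the GFID. For a directory that entry is a symlink, so it can be followed back to a real path. The sketch below only builds the entry's location; gfid_backend_path is a hypothetical name, and the resolve-gfid.sh helper that the script downloads is not reproduced here and may work differently.

```bash
#!/bin/bash
# Hypothetical helper: build the .glusterfs entry path for a GFID on a brick.
# Layout: <brick>/.glusterfs/<chars 1-2>/<chars 3-4>/<full gfid>
gfid_backend_path() {
    local brick=$1
    local gfid=$2
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}
```

For a directory GFID, following the symlink with readlink -f "$(gfid_backend_path "$brick" "$gfid")" should give the directory's location on the brick.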

Snippet

#!/bin/bash
#
# resolve-gfid-mismatch.sh
#
# Defaults:
# - gluster volume called "shared". Set environment variable GLUSTERVOL to override
# - brick paths /data1/gluster and /data2/gluster. Set env var BRICKS to a space separated list to override
#
# Should run on all but one of your gluster nodes. That remaining node will be the "master" copy
#
# Note - this script pays absolutely *no* attention to which node has the most 
# recent copy of the data
#
# If you care about that, then see 
# https://www.bentasker.co.uk/documentation/linux/683-resolving-gfid-mismatch-problems-in-gluster-rhgs-volumes
# 

GLUSTERVOL=${GLUSTERVOL:-"shared"}
BRICKS=${BRICKS:-"/data1/gluster /data2/gluster"}

# Pull the affected GFIDs out of the gluster logs
GFIDS=$(grep "Gfid mismatch detected" "/var/log/glusterfs/${GLUSTERVOL}.log" | \
        grep -o -P "<gfid:([0-9a-fA-F\-]+)" | sed 's/<gfid://g' | sort -u)

# Grab a copy of the GFID resolve script
# -f makes curl fail on HTTP errors rather than saving an error page as the script
curl -f https://projects.bentasker.co.uk/static/resolve-gfid.sh -o ./resolve-gfid.sh || exit 1
chmod +x resolve-gfid.sh

# Assumption: the affected GFID will always be a directory, 
# so will exist on all bricks
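brick1=$(echo "$BRICKS" | cut -d' ' -f1)

for gfid in $GFIDS
do
    gfid_dir_path=$(./resolve-gfid.sh "$brick1" "$gfid" -q)

    # Strip the leading brick path to get a path relative to the brick
    brickless_path=${gfid_dir_path#"$brick1"}

    # Skip GFIDs that did not resolve to a path below the brick root,
    # otherwise the mv below would target the brick itself
    if [ -z "$brickless_path" ] || [ "$brickless_path" = "/" ]
    then
        echo "Could not resolve GFID $gfid to a path, skipping" >&2
        continue
    fi

    echo "Path: $brickless_path will be moved to $brickless_path.old"

    for brick in $BRICKS
    do
        mv "$brick/$brickless_path" "$brick/$brickless_path.old"
    done
done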

echo "You'll need to run 'gluster volume heal $GLUSTERVOL' on one of your nodes once you've run this script on all but one"

Usage Example

GLUSTERVOL=gvol0
BRICKS="/mnt/brick1 /mnt/brick2"
export GLUSTERVOL
export BRICKS
./resolve-gfid-mismatch.sh 
gluster volume heal $GLUSTERVOL