Automatically resolving Gluster GFID Mismatch (BASH)
GFID mismatch is something that can occur on nodes running GlusterFS, particularly after a split-brain or similar incident.
The symptoms of a GFID mismatch are I/O errors for certain files within the Gluster filesystem, and/or question marks in the output of ls. The definitive sign, though, is that Gluster's logs will contain "Gfid mismatch detected".
The technically correct way to resolve GFID mismatch is to compare mtimes for each affected GFID in order to identify which node has the most recent copy.
However, where a lot of files are affected, this may not be practical due to the time and effort required, particularly if there's reasonable certainty that the files themselves don't differ between the nodes.
This snippet provides a script which will pull affected GFIDs from the Gluster logs, resolve those GFIDs back to a path and then move that path aside (renaming it with a .old suffix). It should be run on all but one of your Gluster nodes.
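Where only a handful of paths are affected, the mtime comparison can be sketched roughly as below. This is a minimal illustration rather than part of the snippet: the function name newest_copy is hypothetical, it relies on GNU stat, and it assumes each node's copy of the path is readable from the machine running it (in practice you may need to collect the mtimes over SSH instead).

```shell
#!/bin/bash
#
# Print whichever of the given paths has the most recent mtime.
# Hypothetical helper: assumes GNU coreutils stat and that every
# copy is readable from this machine.
newest_copy() {
    local newest="" newest_mtime=-1 path mtime
    for path in "$@"
    do
        # %Y is the modification time in seconds since the epoch
        mtime=$(stat -c %Y "$path" 2>/dev/null) || continue
        if [ "$mtime" -gt "$newest_mtime" ]
        then
            newest=$path
            newest_mtime=$mtime
        fi
    done
    # Print nothing (and return non-zero) if no path could be read
    [ -n "$newest" ] && echo "$newest"
}

# Example (paths are placeholders):
#   newest_copy /mnt/node1/data1/gluster/some/dir /mnt/node2/data1/gluster/some/dir
```

The node holding the path that gets printed would then be the one to leave untouched.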
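To make the first two steps more concrete, here is a rough sketch of them in isolation. The function names are hypothetical. extract_gfids mirrors the log filtering the script performs (using grep -E rather than -P), and gfid_backend_path shows the layout Gluster uses under a brick's .glusterfs directory, where the first two pairs of hex characters of the GFID form two directory levels. For a directory, that entry is normally a symlink, which is presumably what the resolve script follows; treat that as an assumption rather than a description of its internals.

```shell
#!/bin/bash
#
# Read log lines on stdin and print the unique GFIDs found on
# lines reporting a mismatch.
extract_gfids() {
    grep "Gfid mismatch detected" \
        | grep -oE '<gfid:[0-9a-fA-F-]+' \
        | sed 's/<gfid://' \
        | sort -u
}

# Print where a GFID's entry lives inside a brick:
#   <brick>/.glusterfs/<chars 1-2>/<chars 3-4>/<gfid>
gfid_backend_path() {
    local brick=$1 gfid=$2
    # ${brick%/} drops a single trailing slash, if there is one
    echo "${brick%/}/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
}

# Example:
#   extract_gfids < /var/log/glusterfs/shared.log
#   ls -l "$(gfid_backend_path /data1/gluster "$gfid")"
```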
Details
- Language: BASH
- License: BSD-3-Clause
Snippet
#!/bin/bash
#
# resolve-gfid-mismatch.sh
#
# Defaults:
# - gluster volume called "shared". Set environment variable GLUSTERVOL to override
# - brick paths /data1/gluster and /data2/gluster. Set env var BRICKS to a space separated list to override
#
# Should run on all but one of your gluster nodes. That remaining node will be the "master" copy
#
# Note - this script pays absolutely *no* attention to which node has the most
# recent copy of the data
#
# If you care about that, then see
# https://www.bentasker.co.uk/documentation/linux/683-resolving-gfid-mismatch-problems-in-gluster-rhgs-volumes
#
GLUSTERVOL=${GLUSTERVOL:-"shared"}
BRICKS=${BRICKS:-"/data1/gluster /data2/gluster"}
# Pull the affected GFIDs out of the gluster logs
GFIDS=$(grep "Gfid mismatch detected" /var/log/glusterfs/${GLUSTERVOL}.log | \
grep -o -P "<gfid:([0-9a-fA-F\-]+)" | sed 's/<gfid://g' | sort | uniq)
# Grab a copy of the GFID resolve script, bailing out if the download fails
curl -f https://projects.bentasker.co.uk/static/resolve-gfid.sh -o ./resolve-gfid.sh || exit 1
chmod +x resolve-gfid.sh
# Assumption: the affected GFID will always be a directory,
# so will exist on all bricks
brick1=$(echo "$BRICKS" | cut -d' ' -f1)
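for gfid in $GFIDS
do
    gfid_dir_path=$(./resolve-gfid.sh "$brick1" "$gfid" -q)
    brickless_path=$(echo "$gfid_dir_path" | sed "s~^$brick1~~g")
    # Skip GFIDs that could not be resolved, otherwise we'd move the brick root itself
    if [ -z "$brickless_path" ]
    then
        echo "Could not resolve $gfid, skipping" >&2
        continue
    fi
    echo "Path: $brickless_path will be moved to $brickless_path.old"
    for brick in $BRICKS
    do
        mv "$brick/$brickless_path" "$brick/$brickless_path.old"
    done
done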
echo "You'll need to run 'gluster volume heal $GLUSTERVOL' on one of your nodes once you've run this script on all but one"
Usage Example
GLUSTERVOL=gvol0
BRICKS="/mnt/brick1 /mnt/brick2"
export GLUSTERVOL
export BRICKS
./resolve-gfid-mismatch.sh
gluster volume heal $GLUSTERVOL
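Once the heal has been triggered, you may want to check that it has actually completed. The sketch below is not part of the original snippet: pending_heal_entries is a hypothetical helper which sums the "Number of entries:" lines that gluster volume heal <volume> info normally prints for each brick. The exact output format is an assumption and may vary between Gluster versions.

```shell
#!/bin/bash
#
# Sum the per-brick "Number of entries: N" lines read from stdin.
# Non-numeric values (such as "-" for an unreachable brick) count as 0.
pending_heal_entries() {
    awk '/^Number of entries:/ { total += $NF } END { print total + 0 }'
}

# Example:
#   gluster volume heal "$GLUSTERVOL" info | pending_heal_entries
```

A result of 0 suggests nothing is left waiting to be healed. Anything higher means the heal is either still in progress or needs further attention.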