Filtering and Selecting Substrings in JSON with jq

jq is an incredibly versatile tool for parsing JSON from the command line.

This snippet describes how to use it filter based on the presence (or absence) of a substring within a JSON object's attribute. It also describes how to use a regular expression to extract that substring with an example of printing to a CSV.

Details

  • Language: MISC

Snippet

# Filtering based on presence of substring in property
select(.property1 | contains("channel: "))

# Filtering based on absence of substring in property
select(.property1 | contains("channel: ") | not)

# Extracting substring value
.property1 | capture("channel: \\\"(?<channel>[a-z,0-9,-]+)\\\"","i").channel

Usage Example

JSON_STR='[{"ID": "1", "Name": "obj1", "prop1" : "foo\nbar\nchannel: \"sed\"\nzoo"},
{"ID": "2", "Name": "obj2", "prop1" : "foo\nbar\nchannel: \"\"\nzoo"},
{"ID": "3", "Name": "obj2", "prop1" : "foo\nbar\n\nzoo"},
{"ID": "4", "Name": "obj2", "prop1" : "foo\nbar\nchannel: \"zoo\"\nzoo"}]'


# Items containing channel:
echo "$JSON_STR" |  jq '.[] | select(.prop1 | contains("channel: "))'

# Items not containing channel
echo "$JSON_STR" |  jq '.[] | select(.prop1 | contains("channel: ") | not)'

# Items with a non-empty channel
echo "$JSON_STR" |  jq '.[] | select(.prop1 | contains("channel: ")) |  select(.prop1 | contains("channel: \"\"") | not)'

# Extracting channel names
echo "$JSON_STR" |  jq '.[] | select(.prop1 | contains("channel: ")) | .prop1 | capture("channel: \\\"(?<channel>[a-z,0-9,-]+)\\\"","i").channel'

# Generating a Tab seperated CSV of ID, Name and channel name
# Skip empty channels
echo "$JSON_STR" |  jq -r '.[] | select(.prop1 | contains("channel: ")) |  select(.prop1 | contains("channel: \"\"") | not) | [.ID, .Name, (.prop1 | capture("channel: \\\"(?<channel>[a-z,0-9,-]+)\\\"","i").channel)] | @tsv'