Dennis Zheleznyak

Dennis Zheleznyak

DevOps Engineer bigpanda

© 2021

Query data usage across the month

To query your elasticsearch cluster for data usage across the month, you can use the following script:

#!/bin/bash
# The following script calculates our elasticsearch usage across the current month

# Define date inputs
from_day=01
to_day=$(date +%d)
this_month=$(date +%m)
this_year=$(date +%Y)

# Define temp file
tmp_file="/tmp/numbers"

for day in $(eval echo "{${from_day}..${to_day}}"); do
	# Get the data usage only for the primary shard
        curl -s -X GET "localhost:9200/_cat/indices/*-${this_year}.${this_month}.${day}/?bytes=b&h=pri.store.size" > ${tmp_file}
        total=$(paste -sd+ ${tmp_file} | bc)
	# Check if the total variable is not empty, that way only days with data will be printed
        if [ ${total} ]; then
		# Output the full date of the day and the data usage in GB
                printf '%s\n' "${day}.${this_month}.${this_year} $((${total}/ 10**9))"
        fi;
done

Output:

01.05.2020 93
02.05.2020 110
03.05.2020 96
04.05.2020 94
05.05.2020 93
06.05.2020 94
07.05.2020 100
08.05.2020 98
09.05.2020 96
10.05.2020 93
11.05.2020 95
12.05.2020 107
13.05.2020 90

You can replace:

pri.store.size

With:

store.size

To get the data usage of primary and replica shards.