How to get the number of requests per unique host from a log file using a shell script?
I managed to get the required output with the script below, but I doubt it is the right way to do it. Can someone help with a proper solution?
filename=test.log
declare -a hosts
# Read the log line by line; the host is the first whitespace-separated field.
while read -r host _; do
    # The regex needs improvement: it only accepts names made of exactly three dot-separated labels.
    if [[ "$host" =~ ^[A-Za-z0-9-]+\.[A-Za-z0-9-]+\.[A-Za-z0-9-]+$ ]]
    then
        hosts+=("$host")
    fi
done < "$filename"
# Get the unique hostnames from the list and save the sorted results back into the array.
hosts=($(printf '%s\n' "${hosts[@]}" | sort -u))
# Get the number of occurrences of each host in the entire file.
for host in "${hosts[@]}"; do
    # Count only the lines whose first field is exactly this host.
    count=$(awk -v h="$host" '$1 == h { n++ } END { print n + 0 }' "$filename")
    echo "$host" "$count"
done
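Since the script above rereads the whole file once per host, a single pass may be worth considering. The following is only a minimal sketch, assuming the host is always the first whitespace-separated field of a line; the function name count_requests_per_host is made up for illustration and is not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "host count" for every distinct first field.
# Assumption: the host is the first whitespace-separated field of each line.
count_requests_per_host() {
    # NF skips blank lines; count[] is an awk associative array keyed by host.
    awk 'NF { count[$1]++ }
         END { for (host in count) print host, count[host] }' "$1" |
        sort    # awk does not guarantee iteration order, so sort by host name
}
```

It would be called as count_requests_per_host test.log. Unlike the script above, this sketch does not validate the hostname, so hosts logged as IP addresses are counted too.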
The log file contains the following lines:
unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0
burger.letters.com - - [01/Jul/1995:00:00:12 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 304 0
burger.letters.com - - [01/Jul/1995:00:00:12 -0400] "GET /shuttle/countdown/video/livevideo.gif HTTP/1.0" 200 0
d104.aa.net - - [01/Jul/1995:00:00:13 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /shuttle/countdown/count.gif HTTP/1.0" 200 40310
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/KSC-logosmall.gif HTTP/1.0" 200 1204
d104.aa.net - - [01/Jul/1995:00:00:15 -0400] "GET /shuttle/countdown/count.gif HTTP/1.0" 200 40310
d104.aa.net - - [01/Jul/1995:00:00:15 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786
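For lines in this format, the usual cut, sort and uniq -c pipeline may also be enough. This is a sketch under the assumption that the host is everything before the first space on each line; the function name hosts_by_request_count is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests per host, busiest host first.
# Assumption: the host is everything before the first space on each line.
hosts_by_request_count() {
    cut -d' ' -f1 "$1" |
        sort |
        uniq -c |
        awk '{ print $2, $1 }' |
        sort -k2,2nr -k1,1
    # Steps: keep the host field, group identical hosts, count each group,
    # reorder to "host count", then sort by count (descending) and by name.
}
```

Run against the ten sample lines above (for example hosts_by_request_count test.log), this prints unicomp6.unicomp.net 4, then burger.letters.com 3, then d104.aa.net 3, one host per line.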
I encountered this in a recent HackerRank test and couldn't find a proper example anywhere. I hope this will be useful to someone.