
Navigating through SafeSquid's Extended logs using awk

Posted: Thu Oct 06, 2022 12:22 pm
by Pratik
The extended.log (NCSA / Extended log format) records the most detail about each request handled by the proxy application.
Extended logs are useful for generating reports that analyze user activity.

However, the amount of information in the extended logs can be overwhelming and difficult to read.
Each log entry has 37 fields containing the details of a user connection, and the fields are separated by tabs.
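One quick way to confirm the field count on your own log is to let awk (the utility introduced further down) count the tab-separated fields on each line. This is only a sketch: the function name check_field_count is made up for illustration, and it assumes every line in the file is a log entry.

```shell
# Hypothetical helper: report lines on stdin whose tab-separated
# field count differs from the expected 37.
check_field_count() {
    awk -F'\t' -v expected=37 '
    NF != expected {
        printf "line %d: %d fields\n", NR, NF
        bad++
    }
    # Exit status is 1 if any line had an unexpected field count.
    END { exit (bad > 0) }'
}

# Usage:
# check_field_count < /var/log/safesquid/extended/extended.log
```

If it prints nothing, every line it read has 37 fields.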

Below is the FORMAT / LEGEND:

Code:

"record_id"     "client_id"     "request_id"    "date_time"     "elapsed_time"  "status"        "size"  "upload"  "download"      "bypassed"      "client_ip"     "username"      "method"        "url"   "http_referer"    "useragent"     "mime"  "filter_name"   "filtering_reason"      "interface"     "cachecode"     "peercode"        "peer"  "request_host"  "request_tld"   "referer_host"  "referer_tld"   "range" "time_profiles"   "user_groups"   "request_profiles"      "application_signatures"        "categories"    "response_profiles"       "upload_content_types"  "download_content_types"        "profiles" 
A sample extended log entry:

Code:

"1531492103912WfkgX"    "91"    "2"    "13/Jul/2018:19:58:26"    "3663"    "200"    "626"    "0"    "626"    "FALSE"    "192.168.0.24"    "anonymous@192.168.0.24"    "GET"    "https://ssl.gstatic.com:443/accounts/ui/avatar_2x.png"    "https://accounts.google.com/"    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"    "image/png"    "-"    "-"    "192.168.24.74:8080"    "TCP_MISS"    "DIRECT"    "ssl.gstatic.com"    "ssl.gstatic.com"    "gstatic.com"    "accounts.google.com"    "google.com"    "100-1K"    ""    "ADMINS"    ""    "Unidentified Web2.0,Firefox,Internet Browser"    "Search Engines & Portals"    "POTENTIAL MALWARE THREATS,SMALL DOWNLOADS"    "-"    "image/png"    "READ ONLY,ANTIVIRUS"
The awk utility can be used to extract only the fields you need for troubleshooting. In the general form below, $n is a placeholder: replace n with the number of the field you want to print.

Code:

tail -F /var/log/safesquid/extended/extended.log | awk -F"\t" '{print $2, $n}'

For example, to print client_id, status, client_ip, username, user_groups and profiles for each request:

Code:

tail -F /var/log/safesquid/extended/extended.log | awk -F"\t" '{print $2, $6, $11, $12, $30, $37}' 
[Attachment: awk.jpg]
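Every value in the log is wrapped in double quotes, so the quotes appear in the awk output and have to be accounted for when comparing values. The following is a minimal sketch of filtering on one field, assuming the fields are tab-separated and quoted as in the sample entry above. The function name requests_with_status and the choice of fields printed are illustrative only.

```shell
# Hypothetical helper: print date_time, client_ip, username and url
# for every entry whose status (field 6) equals the given code.
# Reads log lines on stdin.
requests_with_status() {
    awk -F'\t' -v code="$1" '
    {
        # Strip the surrounding double quotes from every field.
        for (i = 1; i <= NF; i++) gsub(/^"|"$/, "", $i)
        if ($6 == code) print $4, $11, $12, $14
    }'
}

# Usage (log path taken from the commands above):
# tail -F /var/log/safesquid/extended/extended.log | requests_with_status 403
```

Change the field numbers in the print statement to show other columns.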
Below is the list of all fields in the extended logs, with the field number to use when printing each value with awk:
1 = record_id
2 = client_id
3 = request_id
4 = date_time
5 = elapsed_time
6 = status
7 = size
8 = upload
9 = download
10 = bypassed
11 = client_ip
12 = username
13 = method
14 = url
15 = http_referer
16 = useragent
17 = mime
18 = filter_name
19 = filtering_reason
20 = interface
21 = cachecode
22 = peercode
23 = peer
24 = request_host
25 = request_tld
26 = referer_host
27 = referer_tld
28 = range
29 = time_profiles
30 = user_groups
31 = request_profiles
32 = application_signatures
33 = categories
34 = response_profiles
35 = upload_content_types
36 = download_content_types
37 = profiles
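If remembering the numbers is inconvenient, the list above can be built into a small wrapper so that fields are selected by name. This is only a sketch: the function name print_fields is hypothetical, the field names are copied from the legend, and the output is tab-separated with the original quotes left in place.

```shell
# Hypothetical helper: print the named fields from each log line on stdin.
# Usage: print_fields status client_ip username
print_fields() {
    awk -F'\t' -v wanted="$*" '
    BEGIN {
        # Field names in log order, copied from the legend.
        legend = "record_id client_id request_id date_time elapsed_time " \
                 "status size upload download bypassed client_ip username " \
                 "method url http_referer useragent mime filter_name " \
                 "filtering_reason interface cachecode peercode peer " \
                 "request_host request_tld referer_host referer_tld range " \
                 "time_profiles user_groups request_profiles " \
                 "application_signatures categories response_profiles " \
                 "upload_content_types download_content_types profiles"
        n = split(legend, names, " ")
        for (i = 1; i <= n; i++) index_of[names[i]] = i

        # Reject names that are not in the legend.
        m = split(wanted, want, " ")
        for (j = 1; j <= m; j++) {
            if (!(want[j] in index_of)) {
                print "unknown field: " want[j] > "/dev/stderr"
                exit 1
            }
        }
    }
    {
        out = ""
        for (j = 1; j <= m; j++)
            out = out (j > 1 ? "\t" : "") $(index_of[want[j]])
        print out
    }'
}

# Usage:
# tail -F /var/log/safesquid/extended/extended.log | print_fields status client_ip username
```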