Introduction

Background and Motivation

I developed and operate a number of websites. None of them is large in terms of user base. The most prominent of them has a few hundred daily users. The other sites, including an osprey camera and a weather station, normally have only a few users.

In late April 2020, I noticed an unusual pattern of access to the larger site. Until then I rarely looked at the server logs directly but used Awstats and, more recently, GoAccess for viewing aggregate data. I became aware of the unusual access pattern during server maintenance, perhaps from a tail on a server log, though I don't remember the first evidence for certain.

While the details are not relevant to this discussion, for the curious: the larger site came under attack by an unknown actor making a large number of requests for one specific .php file. At the peak there were over 17,000 requests in one day, a rate of one every several seconds, when the usual number is in the 700 to 800 range. The .php file prepares data for one item, and the unusual requests were for items in only a few data domains. The site is encrypted (HTTPS) and the .php file is requested via an AJAX POST (as distinct from GET), so that the request parameters are neither visible to nor alterable by a third party in transit. After a number of defensive measures, and finally blocking access to the data, the attack stopped. The actor was never identified by more than an IP number. I can only speculate that the actor was screen-scraping to obtain data with nefarious intent.

In trying to understand the nature of the attack, it became clear that the raw server logs are all but useless. While they are rich in data, it is impossible to see patterns and trends in them, the proverbial forest for the trees. There are several fine log viewers, Awstats, GoAccess, and Apache Log Viewer among them. They are definitely better than the raw logs but still lacked the kind of information I wanted. LoggerBar is the result.

Demonstration Site

The operational site, loggerbar.wrwetzel.com, is password protected and not publicly accessible. The demonstration site, loggerdemo.wrwetzel.com, is open to the public but contains anonymized data to protect the security of the reported sites.

All site names, file names, URLs, and high-order host addresses from the server logs, and all jail names and host addresses from the fail2ban logs, are anonymized. The host IP numbers shown may or may not map to actual host addresses, but they are probably not the hosts that used the site. Access times, status, and other information are real. The low-order octets of the host addresses are not anonymized, in order to preserve patterns of attack from sequential ranges of addresses.
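
The idea of hiding the high-order part of an address while keeping the low-order octets can be illustrated with a short Python sketch. This is not the code used by LoggerBar; the function name anonymize_ipv4, the keyed-hash approach, and the choice of keeping two low-order octets are all assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes, keep_octets: int = 2) -> str:
    """Replace the high-order octets of an IPv4 address with a keyed pseudonym.

    Hypothetical helper, not LoggerBar's implementation. The low-order
    `keep_octets` octets are copied unchanged so that runs of sequential
    addresses remain visible after anonymization.
    """
    if not 1 <= keep_octets <= 3:
        raise ValueError("keep_octets must be 1, 2, or 3")
    # Parsing validates the address and yields its 4 raw bytes.
    packed = ipaddress.IPv4Address(addr).packed
    n_high = 4 - keep_octets
    high, low = packed[:n_high], packed[n_high:]
    # Keyed hash of the high-order part: the same prefix always maps to the
    # same pseudonym, and the mapping is hard to reverse without the key.
    digest = hmac.new(key, high, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:n_high] + low))


if __name__ == "__main__":
    secret = b"example-key-change-me"  # assumed secret, kept out of the demo data
    for host in ("203.0.113.7", "203.0.113.8", "203.0.113.9"):
        print(host, "->", anonymize_ipv4(host, secret))
```

Because the pseudonym depends only on the high-order octets, the three sequential hosts in the usage example share one anonymized prefix and still end in 113.7, 113.8, and 113.9, so a sweep across an address range stays recognizable. Truncating the hash means two different real prefixes can occasionally map to the same pseudonym, which is acceptable for display purposes but worth knowing.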

Chart: Network Diagram of Time Ordered File Requests
Server log columns: Time, Delta T (sec), Host IP, Status, File, Url
fail2ban log columns: Time, Host IP, Jail, Action