I recently had a customer who saw unusually high network throughput on one of his machines. He noticed it after receiving multiple traffic warnings from the Hetzner monitoring system, which monitors incoming and outgoing network traffic and sends emails when a defined total limit is exceeded. If you are with hetzner.de, I highly recommend using this.

A great tool for getting an overview of current connections and how much bandwidth they use is iftop. When you open iftop, all displayed data rushes by with an update time of one second, which makes it a little hard to work with. Therefore, here is a little iftop crash course:

Start iftop as root. You can get a good overview of all relevant options by pressing h. Press h again to go back to the main view.

You will want to enable the port display by pressing p. You can enable port resolution (80 = http) by pressing N. Press those keys again to disable the respective feature. When you want to take a closer look at some of the connections, press P to pause the display. While paused, you can still toggle DNS host resolution by pressing n.

When your display looks something like this:

machine:80      133.7.207.91.unknown.SteepHost.Net:39697            246kb   307kb   142kb
machine:80      news.popstar-fanwatch.com:38341                     111kb   262kb  65,5kb
machine:80      30.7.207.91.unknown.SteepHost.Net:51785             204kb   243kb   147kb
machine:80      133.7.207.91.unknown.SteepHost.Net:60678            257kb   224kb  69,5kb
machine:80      210.4.207.91.unknown.SteepHost.Net:60388            93,3kb   196kb  49,1kb
machine:80      17.8.207.91.unknown.SteepHost.Net:45000             378kb   166kb  41,5kb
machine:80      30.7.207.91.unknown.SteepHost.Net:49526             208b    165kb   154kb
machine:80      205.5.207.91.unknown.SteepHost.Net:37422            0b    144kb   149kb
machine:80      news.popstar-fanwatch.com:55741                     134kb   139kb  34,8kb
machine:80      210.4.207.91.unknown.SteepHost.Net:33202            99,2kb   128kb  32,0kb

You know that someone using SteepHost is today's evil adversary. Note that 133.7.207.91.unknown.SteepHost.Net does NOT have the IP 133.7.207.91 but 91.207.7.133: the hostname lists the octets in reverse order. You can verify this by pressing n in iftop while the display is paused with P.
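If you have to translate many of these hostnames, a small helper saves some typing. This is only a sketch: the function name reverse_octets is made up for illustration, and it assumes the first four dot-separated labels of the hostname are the reversed octets, as in the iftop output above.

```shell
# reverse_octets HOSTNAME
# Hypothetical helper: prints the IP encoded in a reverse-DNS-style hostname.
# Assumption: the first four labels are the IP's octets in reverse order.
reverse_octets() {
  echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 }'
}

reverse_octets 133.7.207.91.unknown.SteepHost.Net   # prints 91.207.7.133
```

Treat the result as a hint only; the n key in iftop remains the authoritative check.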

Write down a few of those IPs. In our case, iftop shows that the web server was accessed from, for example, the following IPs owned by SteepHost:

91.207.4.210
91.207.4.58
91.207.5.205

It seems safe to assume that we can grep the log files for the pattern '^91\.207\.' (the ^ anchors the match at the start of the line, and the backslashes make the dots match literal dots only). The web server hosting the targeted site in this case was nginx.

root@machine:~# tail -f /var/log/nginx/access.log | grep '^91\.207\.'
91.207.6.142 - - [16/Sep/2014:00:05:52 +0200] "GET /index.php?title=Some_title:foo HTTP/1.1" 200 833240 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
91.207.6.142 - - [16/Sep/2014:00:05:53 +0200] "-" 400 0 "-" "-"
91.207.4.210 - - [16/Sep/2014:00:05:53 +0200] "GET /index.php?title=Foo:bar_fo_bar&action=edit&redlink=1 HTTP/1.1" 302 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
91.207.7.33 - - [16/Sep/2014:00:05:53 +0200] "GET /index.php?title=Foo:some_other_bar&action=edit&redlink=1 HTTP/1.1" 302 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
91.207.7.141 - - [16/Sep/2014:00:05:56 +0200] "GET /index.php?title=Foo:even_more_bar HTTP/1.1" 200 500200 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
91.207.7.141 - - [16/Sep/2014:00:05:58 +0200] "GET /index.php?title=Foo:baz&action=edit&redlink=1 HTTP/1.1" 302 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
91.207.5.205 - - [16/Sep/2014:00:05:58 +0200] "GET /index.php?title=Bar_Foo:bazbar_barn=edit&redlink=1 HTTP/1.1" 302 5 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
[...]
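A side note on the grep pattern: an unescaped dot matches any character, and without ^ the match may start anywhere in the line. The following sketch uses a few made-up first fields (not taken from the real log) to show the difference between a loose and a strict pattern.

```shell
# Made-up sample lines; only the first one is really in 91.207.0.0/16.
sample='91.207.4.210 - -
191.207.1.1 - -
91x207.1.1 - -
10.0.0.1 - -'

# Loose: "." matches any character and the match can start mid-line.
loose=$(printf '%s\n' "$sample" | grep -c '91.207')

# Strict: anchored at line start, dots escaped to match literally.
strict=$(printf '%s\n' "$sample" | grep -c '^91\.207\.')

echo "loose=$loose strict=$strict"   # prints loose=3 strict=1
```

On a real access log the loose pattern usually still works, but it can also hit timestamps, byte counts or URLs that happen to contain the digits.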

You can now do some further research on that log file. To list all IPs in that network that accessed your machine, run:

root@services:~# grep '^91\.207\.' /var/log/nginx/access.log | awk '{print $1}' | sort -u | nl
     1	91.207.158.113
     2	91.207.4.210
     3	91.207.4.58
     4	91.207.5.142
     5	91.207.5.186
     6	91.207.5.205
     7	91.207.60.70
     8	91.207.60.71
     9	91.207.60.73
    10	91.207.60.88
    11	91.207.6.138
    12	91.207.6.142
    13	91.207.61.70
    14	91.207.61.88
    15	91.207.6.46
    16	91.207.7.110
    17	91.207.7.133
    18	91.207.7.141
    19	91.207.7.169
    20	91.207.7.173
    21	91.207.7.186
    22	91.207.7.30
    23	91.207.7.33
    24	91.207.7.81
    25	91.207.8.17
    26	91.207.9.150
    27	91.207.9.162
    28	91.207.9.166
    29	91.207.9.170
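The list above tells you who was there, but not how busy each address was. A possible next step is to count requests per IP. The sketch below is an assumption-laden example: count_by_ip and sample_log are names invented here, and the sample lines are shortened stand-ins for real log entries.

```shell
# count_by_ip: reads access-log lines on stdin and prints "count ip"
# for addresses starting with 91.207., busiest first.
count_by_ip() {
  grep '^91\.207\.' | awk '{ print $1 }' | sort | uniq -c | sort -rn
}

# Shortened, made-up sample standing in for /var/log/nginx/access.log.
sample_log='91.207.7.141 - - [16/Sep/2014:00:05:56 +0200] "GET /a HTTP/1.1" 200 1 "-" "-"
91.207.4.210 - - [16/Sep/2014:00:05:56 +0200] "GET /b HTTP/1.1" 200 1 "-" "-"
91.207.7.141 - - [16/Sep/2014:00:05:57 +0200] "GET /c HTTP/1.1" 200 1 "-" "-"
10.0.0.1 - - [16/Sep/2014:00:05:57 +0200] "GET /d HTTP/1.1" 200 1 "-" "-"
91.207.5.205 - - [16/Sep/2014:00:05:58 +0200] "GET /e HTTP/1.1" 200 1 "-" "-"
91.207.7.141 - - [16/Sep/2014:00:05:58 +0200] "GET /f HTTP/1.1" 200 1 "-" "-"
91.207.4.210 - - [16/Sep/2014:00:05:59 +0200] "GET /g HTTP/1.1" 200 1 "-" "-"'

printf '%s\n' "$sample_log" | count_by_ip
```

On the real machine you would run count_by_ip < /var/log/nginx/access.log instead of feeding it the sample.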

You can find out which website is being accessed by also printing the referer field ($11 in the default log format):

root@machine:~# grep '^91\.207\.' /var/log/nginx/access.log | awk '{print $1,$11}' | sort -u | nl | tail -n 5
   967	91.207.9.170 "http://foo.bar.com/index.php?title=Foo_bar_baz&action=edit"
   968	91.207.9.170 "http://foo.bar.com/index.php?title=Foo-bar-bla_baz&action=edit"
   969	91.207.9.170 "http://foo.bar.com/index.php?title=some_text_bla&action=edit"
   970	91.207.9.170 "http://foo.bar.com/index.php?title=Even_more_text_bla&action=edit"
   971	91.207.9.170 "http://foo.bar.com/index.php?title=Running_out_of_ideas&action=edit"

A hint: in this case, the application being accessed was MediaWiki, the popular PHP wiki software.

Another hint on nginx log files: you will see two types of entries, one with and one without a full URL in the second-to-last quoted field. That field is the HTTP referer, i.e. the page the request originated from; entries without a referer show "-" there. Since the referer contains the host name, you can also grep for http:// to identify the actual website being targeted.
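Counting whitespace-separated fields gets fragile when the request itself is malformed (like the "-" 400 entry shown earlier, which has fewer fields). A sketch of an alternative is to split on double quotes instead. It assumes nginx's default combined log format; the names referers and referer_sample are invented for this example, and the sample lines are shortened stand-ins.

```shell
# referers: reads access-log lines on stdin and prints "ip referer",
# skipping entries whose referer is "-".
# Assumption: combined log format, so the 4th quote-delimited field
# is the referer and the text before the first quote starts with the IP.
referers() {
  awk -F'"' '{ split($1, head, " "); if ($4 != "-") print head[1], $4 }'
}

# Shortened, made-up sample lines.
referer_sample='91.207.9.170 - - [16/Sep/2014:00:05:58 +0200] "GET /index.php?title=X&action=edit HTTP/1.1" 200 5 "http://foo.bar.com/index.php?title=X" "Mozilla/4.0"
91.207.6.142 - - [16/Sep/2014:00:05:53 +0200] "-" 400 0 "-" "-"'

printf '%s\n' "$referer_sample" | referers
# prints: 91.207.9.170 http://foo.bar.com/index.php?title=X
```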

The bots crawling through the website are looking for holes, scraping, stealing, indexing and/or doing other lame things.

The best way is to get rid of all those IPs. You can do that with nginx deny directives, or one step earlier by using iptables. To me, it seems safe to ban the whole hosting company SteepHost, as their offers seem quite illegal after a short Google search.
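For the nginx route, the deny lines can be generated rather than typed. This is a sketch only: nginx_deny_rules is a made-up helper name, and the third octets 4 to 9 are the /24 networks that the iptables rules at the end of this post block.

```shell
# nginx_deny_rules OCTET...
# Hypothetical helper: prints one nginx "deny" line per 91.207.N.0/24.
nginx_deny_rules() {
  for octet in "$@"; do
    printf 'deny 91.207.%s.0/24;\n' "$octet"
  done
}

nginx_deny_rules 4 5 6 7 8 9
```

You could redirect the output into a file of your choosing and include it from the relevant server block; which file and where is up to your setup.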

To identify the networks owned by SteepHost, run the following bash script:

#!/bin/bash

# Example IP from SteepHost:
# 91.207.9.170

# Query whois for every address from 91.207.1.1 to 91.207.255.255.
# Positive matches are printed and appended to steep.log;
# negatives are only printed to the terminal.
for c in {1..255}; do
  for d in {1..255}; do
    if whois "91.207.$c.$d" | grep -q SteepHost; then
      echo "Positive match for 91.207.$c.$d" | tee -a steep.log
    else
      echo "Negative for 91.207.$c.$d"
    fi
  done
done

exit 0

# Output (printed to the terminal while the script runs):
Negative for 91.207.3.254
Negative for 91.207.3.255
Positive match for 91.207.4.1
Positive match for 91.207.4.2
[...]
Positive match for 91.207.4.254
Positive match for 91.207.4.255
Positive match for 91.207.5.1
Positive match for 91.207.5.2
[...]
Positive match for 91.207.9.254
Positive match for 91.207.9.255
Negative for 91.207.10.1
Negative for 91.207.10.2

# Logfile (contains only the positive matches):
user@machine ~$ cat steep.log
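The script above sends one whois query per address, which adds up to tens of thousands of lookups. Since the output suggests that ownership changes at /24 boundaries, a lighter variant could query a single address per /24. This is a sketch under that assumption; scan_networks and the WHOIS_CMD override are names introduced here for illustration.

```shell
# Command used for lookups; can be overridden for testing without network.
WHOIS_CMD=${WHOIS_CMD:-whois}

# scan_networks FIRST LAST
# Queries 91.207.N.1 for each N in FIRST..LAST and prints the /24
# networks whose whois record mentions SteepHost.
# Assumption: allocations are aligned to /24 boundaries.
scan_networks() {
  n=$1
  while [ "$n" -le "$2" ]; do
    if "$WHOIS_CMD" "91.207.$n.1" | grep -q SteepHost; then
      echo "91.207.$n.0/24"
    fi
    n=$((n + 1))
  done
}

# Usage (performs real whois queries, hence commented out):
# scan_networks 0 255
```

If a whois record covers a range smaller than a /24, this shortcut would miss it, so the full scan remains the more thorough check.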

Note that the IP 91.207.158.113 listed above is NOT from SteepHost. Based on this test, it seems safe to assume that we can ban the networks known to be owned by SteepHost using iptables as follows:

iptables -I INPUT -s 91.207.4.0/24 -j DROP
iptables -I INPUT -s 91.207.5.0/24 -j DROP
iptables -I INPUT -s 91.207.6.0/24 -j DROP
iptables -I INPUT -s 91.207.7.0/24 -j DROP
iptables -I INPUT -s 91.207.8.0/24 -j DROP
iptables -I INPUT -s 91.207.9.0/24 -j DROP
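Instead of writing these rules by hand, they could be derived from the positive matches in steep.log. The following is a sketch: rules_from_log is a made-up name, it only prints the commands rather than running them, and it assumes every matched address stands for its whole /24.

```shell
# rules_from_log: reads "Positive match for A.B.C.D" lines on stdin and
# prints one iptables command per distinct /24 (printed, not executed).
rules_from_log() {
  awk '/^Positive match for/ { print $4 }' \
    | cut -d. -f1-3 \
    | sort -u \
    | while read -r net; do
        echo "iptables -I INPUT -s $net.0/24 -j DROP"
      done
}

# Made-up excerpt in the format the whois script writes to steep.log.
printf '%s\n' \
  'Positive match for 91.207.4.1' \
  'Positive match for 91.207.4.2' \
  'Positive match for 91.207.5.1' \
  | rules_from_log
```

Review the printed commands first; once they look right, run rules_from_log < steep.log and execute the result as root.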

For future cases, consider employing fail2ban to monitor your nginx log files and automatically add iptables rules for clients that send requests as frequently as those from SteepHost.