Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple IPs in nginx module #4322

Closed
sepal opened this issue May 16, 2017 · 12 comments
Closed

Support multiple IPs in nginx module #4322

sepal opened this issue May 16, 2017 · 12 comments

Comments

@sepal
Copy link
Contributor

sepal commented May 16, 2017

Currently the nginx module only allows to fetch one IP, but if you use a proxy, you might want to output the X-Forwarded-For header into the logs, which would result in lines containing multiple IPs. Here is an example from our log file (with random client IPs).

68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /sites/default/files/styles/company_profile_cover_crop/public/1500x500_1_10.jpg?itok=RUgim2UQ&sc=297009042628d7de3f0eb50e807d29e4 HTTP/1.1" 200 92763 "https://www.startus.cc/company/finleap" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
221.247.242.171, 162.158.166.51, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /sites/default/files/styles/company_profile_logo/public/company_logos/aaeaaqaaaaaaaawvaaaajdk3n2vkzme0lte0zjctngy3ms1inmm4lta4ntnhzwqymzvmoq.png?itok=H2B05xX0 HTTP/1.1" 200 9296 "https://www.startus.cc/company/finleap" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
192.228.32.190, 108.162.246.21, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
137.56.184.63, 162.158.165.50, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /sites/default/files/styles/company_profile_cover/public/1500x500_1_10.jpg?itok=1cNqdGYK HTTP/1.1" 200 102268 "https://www.startus.cc/company/finleap" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
92.222.165.172, 162.158.167.202, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "POST /jstats.php HTTP/1.0" 200 13 "https://www.startus.cc/company/finleap" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

In our case the first IP is the actual client IP, the second is the one from cloudflare, and the localhost address comes from varnish, which runs on the same host as nginx.

Filebeat only takes logs the localhost address from varnish i.e. 127.0.0.1, which is kind of useless. It would be cool to have all IPs to store all IPs into elastic search, but I guess it raises some questions regarding how to geocode them.
As a quick hack I tried to create a custom pattern, which fetches the first IP instead of the last, but didn't succeed. I would really appreciate it if someone could point me in the right direction.
My Idea was to fetch the first IPORHOST by also matching by a comma, but I don't get how to exclude it with the "not captured group", at least on https://grokconstructor.appspot.com it doesn't seem to work:

FIRSTIPORHOST (%{IPORHOST}(?:,))

Here are my specs:
Filebeat 5.4 (Running in docker)
OS: Debian 8

Steps to Reproduce:

  • Enable cloudflare infront of a site with nginx, cloudflare will start sending the X-Forwarded-Forheader.
  • Change the nginx.conf log_format to log the $http_x_forwarded_for variable. Here is our config:
log_format  proxy '$http_x_forwarded_for - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent"';
@ruflin
Copy link
Contributor

ruflin commented May 16, 2017

As reference, here is the pattern nginx.access log currently uses: https://github.com/elastic/beats/blob/master/filebeat/module/nginx/access/ingest/default.json#L7

I agree it is kind of tricky to get multiple ip addresses as it raises the question of how they will be named. I assume the list of IP's could also be longer then 3?

@sepal
Copy link
Contributor Author

sepal commented May 16, 2017

Yes probably, but I haven't tested it. I should also mentioned that the X-Forwarded-For header is not standard, but is beeing used by a lot of tools & services. There is also an RFC mentioning it: https://tools.ietf.org/html/rfc7239

@ruflin
Copy link
Contributor

ruflin commented May 17, 2017

If you find a grok solution to only pick the first one (instead of the last one) feel free to open a PR with it. I was thinking of something like %{IPORHOST:nginx.access.remote_ip} (%{IPORHOST} )* could work but didn't test it.

@sepal
Copy link
Contributor Author

sepal commented May 18, 2017

So the correct pattern should actually be %{IPORHOST:nginx.access.remote_ip}(,\\s%{IPORHOST})*. On grok testing tools like https://grokdebug.herokuapp.com it seems to work.
I've tried to exchange the ingest file, by mounting my own file into the docker container to the path /usr/share/filebeat/module/nginx/access/ingest/default.json, but filebeat still outputs the wrong IP. Any Ideas what I've could done wrong?

@andrewkroh
Copy link
Member

You probably need to delete the existing pipeline definition in Elasticsearch for the change to take effect.

@sepal
Copy link
Contributor Author

sepal commented May 18, 2017

Thanks thats what was missing. The pattern works and elasticsearch now stores the correct IP address. I created a PR for the change: #4351

@ruflin
Copy link
Contributor

ruflin commented May 19, 2017

Thanks, really appreciate you opened a PR.

@stevedodson
Copy link

stevedodson commented May 24, 2017

I am currently analysing nginx logs that have been generated via the following configuration (xxxx are anonymisations):

'"$http_x_forwarded_for" $remote_addr - [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';

e.g.

"2a03:xxxx:10ff:f00f:xxxx:xxxx:0:8000, 10.225.192.17, 107.20.xxx.xxx" 10.2.2.121 - [30/Dec/2016:06:47:09 +0000] "GET /test.html HTTP/1.1" 404 8571 "-" "Mozilla/5.0 (compatible; Facebot 1.0; https://developers.facebook.com/docs/sharing/webmasters/crawler)"

"10.5.102.222, 199.96.xxx.xxx, 204.246.xxx.xxx" 10.2.1.185 - [22/Jan/2016:13:18:29 +0000] "GET /assets/xxxx?q=100 HTTP/1.1" 200 25507 "-" "Amazon CloudFront"

in these cases the logic behind which ip to use to look up as geoip is complex. In this case, the best logic may be to resolve ips that are not private from left to right for geoip lookup.

This syntax requires different handling than the current PR.

Also, the naming of fields may need to be reviewed. remote_ip is not remote_addr if http_x_forwarded_for addresses are used.

@ruflin
Copy link
Contributor

ruflin commented May 26, 2017

@stevedodson One tricky part with having a list of IP is that either we need to find a smart way to assign each a specific field name or will have an array in elasticsearch which then is hard to query in es. Any ideas?

@stevedodson
Copy link

stevedodson commented May 29, 2017

In summary, the issue seems to be: if http_x_forwarded_for or other multi-IP access log configuration is used, there may be a list of IP addresses in a message:

  • For reporting and searching, we would like to select a non-private IP that represents our best effort to identify the geographic remote IP.
  • We would like to be able to search the logs for any IP address

An option which involves minimal change would be to assign remote_ip to the first non-private address in the list. Also, if general searching of log messages for an IP is required, then the message field could be retained.

Longer term, additional configurations that could allow reporting such as number of private IPs that went through proxy x could also be defined.

A sample configuration that can identify the first non-private IP (according to rfc1918) - with a fallback to remote_addr if no non-private IPs are found - is shown below (not this is IPv4 only currently):

curl -XPOST "http://localhost:9200/_ingest/pipeline/_simulate" -H 'Content-Type: application/json' -d'
{
  "pipeline": {
    "description": "Pipeline for parsing Nginx access logs. Requires the geoip and user_agent plugins.",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "\"%{IP_LIST:nginx.access.http_x_forwarded_for}\" %{IP:nginx.access.remote_addr}"
          ],
          "pattern_definitions": {
            "IP_LIST": "%{IP}(,\\s+%{IP})*"
          },
          "ignore_missing": true
        }
      },
      {
        "split": {
          "field": "nginx.access.http_x_forwarded_for",
          "separator": ",\\s+"
        }
      },
      {
        "script": {
          "lang": "painless",
          "inline": "for (def item : ctx.nginx.access.http_x_forwarded_for) { if (item ==~/(?!(^127\\.)|(^10\\.)|(^172\\.1[6-9]\\.)|(^172\\.2[0-9]\\.)|(^172\\.3[0-1]\\.)|(^192\\.168\\.)).*/) {ctx.nginx.access.remote_ip=item; break;}}"
        }
      },
      {
        "set": {
          "field": "nginx.access.remote_ip",
          "value": "{{nginx.access.remote_addr}}",
          "override": false
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "field": "error.message",
          "value": "{{ _ingest.on_failure_message }}"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "message": "\"10.166.206.189, 10.166.205.189, 54.146.73.204\" 10.2.2.121"
      }
    },
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "message": "\"10.166.206.189, 54.146.73.204, 10.166.205.189\" 10.2.2.121"
      }
    },
    {
      "_index": "index",
      "_type": "type",
      "_id": "id",
      "_source": {
        "message": "\"10.166.206.189, 10.166.205.189\" 10.2.2.121"
      }
    }
  ]
}'

@tsg
Copy link
Contributor

tsg commented May 29, 2017

Thanks, I'm going to work on integrating a version of this in the nginx module.

@tsg
Copy link
Contributor

tsg commented May 29, 2017

I came up with a version of the Painless script that doesn't need regexp support to be enabled:

  boolean isPrivate(def ip) {
    try {
      StringTokenizer tok = new StringTokenizer(ip, ".");
      int firstByte = Integer.parseInt(tok.nextToken());
      int secondByte = Integer.parseInt(tok.nextToken());
      if (firstByte == 10) {
        return true;
      }
      if (firstByte == 192 && secondByte == 168) {
        return true;
      }
      if (firstByte == 172 && secondByte >= 16 && secondByte <= 31) {
        return true;
      }
      if (firstByte == 127) {
        return true;
      }
      return false;
    } catch (Exception e) {
      return false;
    }
  }
  for (def item : ctx.nginx.access.http_x_forwarded_for) {
    if (!isPrivate(item)) {
      ctx.nginx.access.remote_ip = item;
    }
  }

tsg pushed a commit to tsg/beats that referenced this issue May 31, 2017
A common customization to the nginx logs is to add the contents
of the X-Forwarded-For header in front of the remote IPs. This
typically results in a list of remote IPs.

This adds a new field `remote_ip_list` which is an array, and uses
a Painless script to automatically select the first non-private
IP for the `remote_ip` field, which is the field on which GeoIP is
applied.

Fixes elastic#4322.
exekias pushed a commit that referenced this issue Jun 2, 2017
A common customization to the nginx logs is to add the contents
of the X-Forwarded-For header in front of the remote IPs. This
typically results in a list of remote IPs.

This adds a new field `remote_ip_list` which is an array, and uses
a Painless script to automatically select the first non-private
IP for the `remote_ip` field, which is the field on which GeoIP is
applied.

Fixes #4322.
tsg added a commit to tsg/beats that referenced this issue Jul 19, 2017
…4417)

A common customization to the nginx logs is to add the contents
of the X-Forwarded-For header in front of the remote IPs. This
typically results in a list of remote IPs.

This adds a new field `remote_ip_list` which is an array, and uses
a Painless script to automatically select the first non-private
IP for the `remote_ip` field, which is the field on which GeoIP is
applied.

Fixes elastic#4322.
(cherry picked from commit a2c162f)
exekias pushed a commit that referenced this issue Jul 19, 2017
…ess as remote_ip (#4703)

* Nginx module: use first not private IP address as remote_ip (#4417)

A common customization to the nginx logs is to add the contents
of the X-Forwarded-For header in front of the remote IPs. This
typically results in a list of remote IPs.

This adds a new field `remote_ip_list` which is an array, and uses
a Painless script to automatically select the first non-private
IP for the `remote_ip` field, which is the field on which GeoIP is
applied.

Fixes #4322.
(cherry picked from commit a2c162f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants