
Feedback for ECS from our analysts. #182

Closed
ave19 opened this issue Nov 19, 2018 · 3 comments


ave19 commented Nov 19, 2018

This is kind of a grab bag of stuff we're finding out using ECS in our mission. This issue is more about providing feedback than asking for anything. And starting conversations! If you have better ideas, please tell me! 😄

  • We have lots of kinds of feeds:

    • source and destination top levels are working out so far. These are mostly network events, as we don't have a lot of application logs yet, and most of the application logs are going into a service container.
    • We're still doing source.geoip and source.geoip.asn in some places, but have switched to source.geo and source.asn (same for destination), which is working well.
    • We are leaving the *.ip field in both geo and asn so that code that wants to work on them can just take that part of the JSON and run with it. I don't want to make a developer stitch together information from different parts of the structure. To be clear, we have all of these:
      • [ client.ip, client.geo.ip, client.asn.ip ]
    • network has grown:
      • network.interface holds the eth0 type name.
      • network.status has values like OK
  • Making field names more self-documenting:

    • Some field names are synonyms to the uninitiated, or even to a well trained analyst at 3am!
    • Example: [ agent, device, source, host ]
    • We had a case where an analyst could not figure out which IP address in an event was the one where the event actually happened. There was nothing about device.ip that looked wrong. When you have hundreds of subscriber networks, the IP address itself isn't a great clue.
    • Obviously, some of this is down to training, but in the heat of battle (we have a cyber mission) I think we should try to make it obvious. (Don't push the red button, push the cherry button! It's right there next to the rose button!)
    • I'm thinking of moving device to event.received_by or something like that. We could make a list of objects that relayed the event, but I know Kibana doesn't like those very much.
    • No, seriously, close your eyes, spin around three times and read both of these and try to pick out which is agent and which is device:
      • The {field_name_here} fields contain the data about the {field_name_here}/client/shipper that created the event.
      • {field_name_here} fields are used to provide additional information about the {field_name_here} that is the source of the information. This could be a firewall, network device, etc.
  • Putting things in other things: I've got a paradigm developing where, if you have data from a foo service, I set service.type: foo and then create a service.foo: {...}, which is a container for whatever was reported in that foo structure (possibly the entire original event). That seems to do a good job of isolating key names from each other, especially when I have no control over what foo says or might say going forward.

  • Timestamps: This is working out well:

    • If the message has a timestamp-looking substring in it, put that in event.timestamp but do not reformat it. This will end up being a string like "19/Nov/2018:10:35:27 -0500". Leave it like that.
    • Use Logstash's date filter to turn event.timestamp into @timestamp, so that's a date-time type now.
    • Use an ingest pipeline in Elasticsearch to add an event.indextime as a date time type, like so:
      {
        "processors": [
          {
            "set": {
              "field": "event.indextime",
              "value": "{{_ingest.timestamp}}"
            }
          }
        ]
      }
    • We also use event.starttime and event.endtime along with event.duration. In cases like network flows, event.timestamp is some time after event.endtime. Take your pick as to which one you turn into @timestamp, depending on the mission.
    • We use Timelion to plot counts for @timestamp and event.indextime in the same chart as a feed health indicator. If the counts generally match, you're keeping up.
    • The difference between the two is how long it takes data to get into your system.
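
A minimal Python sketch of the timestamp flow described in this post, done outside Logstash and Elasticsearch purely for illustration: the raw string stays in event.timestamp, a parsed UTC value goes into @timestamp, event.duration is event.endtime minus event.starttime, and event.indextime minus @timestamp is the ingest lag the Timelion chart is meant to reveal. The format string assumes the Apache-style layout of the example string; the helper names, the flow boundaries, and the indextime value (a stand-in for what _ingest.timestamp would set) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Layout of the example string "19/Nov/2018:10:35:27 -0500" (Apache access-log style).
RAW_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_raw(raw: str) -> datetime:
    """Parse the untouched event.timestamp string into an aware UTC datetime."""
    return datetime.strptime(raw, RAW_FORMAT).astimezone(timezone.utc)


def duration(start: datetime, end: datetime) -> timedelta:
    """event.duration as event.endtime minus event.starttime (e.g. a flow)."""
    return end - start


def ingest_lag(timestamp: datetime, indextime: datetime) -> timedelta:
    """Time for data to reach the index: event.indextime minus @timestamp."""
    return indextime - timestamp


event = {"event": {"timestamp": "19/Nov/2018:10:35:27 -0500"}}

# event.timestamp stays a string; @timestamp holds the parsed value.
event["@timestamp"] = parse_raw(event["event"]["timestamp"])

# Hypothetical stand-in for what the ingest pipeline sets from _ingest.timestamp.
event["event"]["indextime"] = datetime(2018, 11, 19, 15, 36, 2, tzinfo=timezone.utc)

# Hypothetical flow boundaries (the flow ends before event.timestamp).
start = datetime(2018, 11, 19, 15, 30, 0, tzinfo=timezone.utc)
end = datetime(2018, 11, 19, 15, 34, 30, tzinfo=timezone.utc)

print(event["@timestamp"].isoformat())  # 2018-11-19T15:35:27+00:00
print(duration(start, end))             # 0:04:30
print(ingest_lag(event["@timestamp"], event["event"]["indextime"]))  # 0:00:35
```

In a real deployment the parsing would stay in the Logstash date filter; the sketch only makes the arithmetic explicit.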
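
A rough Python sketch of the nesting conventions from this post: copying client.ip into client.geo.ip and client.asn.ip so each sub-object stands on its own, and parking a vendor payload under service.<type> so its keys cannot collide with top-level fields. The function names, the geo and asn sub-fields, and the sample values (documentation IPs from RFC 5737, AS 64496 from RFC 5398) are illustrative assumptions, not ECS definitions.

```python
import copy


def copy_ip_down(event: dict, prefix: str) -> dict:
    """Copy <prefix>.ip into <prefix>.geo.ip and <prefix>.asn.ip when those exist.

    Mirrors keeping client.ip, client.geo.ip and client.asn.ip side by side,
    so code can take just the geo or asn object and still have the IP.
    """
    out = copy.deepcopy(event)
    top = out.get(prefix, {})
    if "ip" in top:
        for sub in ("geo", "asn"):
            if sub in top:
                top[sub]["ip"] = top["ip"]
    return out


def wrap_service(event: dict, service_type: str, original: dict) -> dict:
    """Set service.type and store the raw vendor payload under service.<type>."""
    out = copy.deepcopy(event)
    service = out.setdefault("service", {})
    service["type"] = service_type
    service[service_type] = copy.deepcopy(original)
    return out


event = {
    "client": {
        "ip": "198.51.100.7",
        "geo": {"country_iso_code": "US"},  # hypothetical geo sub-field
        "asn": {"number": 64496},           # hypothetical asn sub-field
    },
    "network": {"interface": "eth0", "status": "OK"},
}
event = copy_ip_down(event, "client")

# Whatever "foo" reports, including its own "ip" key, stays isolated.
event = wrap_service(event, "foo", {"ip": "192.0.2.10", "level": 3})

print(event["client"]["geo"]["ip"], event["client"]["asn"]["ip"])  # 198.51.100.7 198.51.100.7
print(event["service"]["foo"]["ip"])                               # 192.0.2.10
```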

webmat (Contributor) commented Nov 19, 2018

This is amazing, thanks for putting this together :-)

We'll be doing another big push to add some missing things and clarify others ;-) We'll be taking a good look at your feedback. This is really helpful.

sporkmonger commented:

Things like source.asn would be extremely important for us as well. We typically expand ASN, CIDR range block, and full ISP name.
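
A minimal sketch of the kind of ASN expansion described here (AS number, CIDR block, ISP name), assuming a small in-memory lookup table. The table uses documentation-reserved values (RFC 5398 AS numbers, RFC 5737 prefixes) and the output field names are hypothetical; a real deployment would use a proper ASN database. The lookup picks the longest matching prefix, as routing tables do.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table built from documentation-reserved values
# (AS numbers from RFC 5398, prefixes from RFC 5737), not real allocations.
ASN_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), 64496, "Example ISP, Inc."),
    (ipaddress.ip_network("198.51.100.128/25"), 64498, "Example Customer Block"),
    (ipaddress.ip_network("203.0.113.0/24"), 64497, "Another Example Carrier"),
]

# Most specific prefixes first, so the first hit is the longest-prefix match.
_BY_SPECIFICITY = sorted(ASN_TABLE, key=lambda row: row[0].prefixlen, reverse=True)


def expand_asn(ip: str) -> Optional[dict]:
    """Return the AS number, CIDR block and ISP name covering `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for network, number, name in _BY_SPECIFICITY:
        if addr in network:
            return {
                "ip": ip,
                "number": number,
                "cidr": str(network),
                "isp": name,
            }
    return None


print(expand_asn("198.51.100.7"))    # covered only by the /24
print(expand_asn("198.51.100.200"))  # the more specific /25 wins
print(expand_asn("192.0.2.1"))       # not in the table, so None
```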

webmat (Contributor) commented Apr 8, 2019

@sporkmonger Discussion on AS is also happening on this PR #341
