Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TLS schema to ECS #606

Merged
merged 12 commits into from
Nov 15, 2019
Merged

Add TLS schema to ECS #606

merged 12 commits into from
Nov 15, 2019

Conversation

dcode
Copy link
Contributor

@dcode dcode commented Nov 12, 2019

I compared the output of Zeek 3.x, Suricata 5.x, and Packetbeat 7.5 and selected these fields as a common set that would cover the TLS protocol. You can see my scratchpad where I compared field names and came up with an example data record (YAML formatted).

While selecting fields, I aimed to keep only the data that centers around the SSL/TLS negotiation and avoid digging too deep into the x.509 certificates themselves. The certificates contain much more data than is extracted here, but this is similar to how Zeek and Suricata handle SSL/TLS protocol metadata by default, where only the high-level data is extracted.

With that in mind, there should be sufficient information recorded that will allow an analyst to pivot from a TLS event to a first class file object document (not currently defined by ECS) that extracts and analyzes all the x.509 fields. Zeek provides this as a separate x.509 event, for example. This data could also be provided by recursive file scanners like Strelka or potentially endpoint agents.

Formatting Question: I broke all the "objects" into their own group. It might be better to just use a dotted field notation, understanding that the tooling takes care of the rest. I'm not sure on this one.

@dcode dcode added enhancement New feature or request in progress labels Nov 12, 2019
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge thanks for getting started on this, @dcode :-)

So contrary to the Beats parser for these YAML files, the generator in ECS doesn't support re-nesting with fields:, once you're adding stuff inside the field set. I've commented inside the file for more precise pointers.

General feedback, please add an example section for as many fields as possible. Typically when we do that, we try to keep the examples aligned across all examples in the field set. For an example of what I mean, you can check out the url examples, how url.original is a full URL, but then all the other fields are the actual same values extracted from the full URL.

It's also fine to kind of fake it, for example you could use the exact same examples for the client and server sections.

So that's my round of preliminary feedback.

Another thing you'll need to do is run make prior to committing follow-ups to the PR. This will generate all files and so on. Once you do that, we'll be able to preview the rendered docs, as we iterate on the PR :-)

schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing the output from the three different tools.

schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
- Removes `version.original` in favor of `message` field.
- Changes numeric value to keyword to allow for semantic versioning if that ever becomes a thing
- Renames `*.certificate_fingerprint` to `*.hash` to be ECS consistent
@webmat
Copy link
Contributor

webmat commented Nov 13, 2019

Thanks for providing the output from the three different tools.

Where is this? :-) I'm curious as a 🐱

@dcode dcode changed the title Add TLS schema to TLS Add TLS schema to ECS Nov 13, 2019
@dcode
Copy link
Contributor Author

dcode commented Nov 13, 2019

Thanks for providing the output from the three different tools.

Where is this? :-) I'm curious as a 🐱

@webmat I think he's referring to my gist. I can provide some actual output from the three tools. It'll take me a little bit, but it could probably make this process easier to digest.

@webmat
Copy link
Contributor

webmat commented Nov 13, 2019

@dcode Ah right. No the gist is fine. I do have Suricata, Zeek and Packetbeat all monitoring my own workstation. If you want to see me spying on myself, I'll happily give you access to the ECS cluster :-)

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super happy with how this is shaping up! Thanks for adding all of the examples :-)

I agree with making the field set "extended".

Here's another more detailed round of feedback. A few small things to adjust, a few questions to better understand the thinking, but nothing major is sticking out to me.

schemas/tls.yml Outdated
- name: version.number
level: extended
type: keyword
description: Numeric part of the version parsed from the original string (e.g. 1.2, 3).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two fields cannot be nested under tls.version, as this is a hard break vs Packetbeat's current definition for tls.version, where the field is type keyword. This would lead to a mapping conflict within the 7.x product line, which we can't do.

What would work well, I think is the following:

  • tls.version (keyword)
    • Field content would need to move from Packetbeat's content of e.g. "TLS 1.3" to "1.3", however
    • perfectly aligned with all other ECS *.version fields.
  • tls.version_protocol (keyword)
    • This is where we would put either "tls" or "ssl"
    • The underscore avoids the mapping conflict
    • The definition of tls.version_protocol should mention that the protocol must be lowercased, in order to be aligned with network fields such as network.protocol, network.application, network.type and network.transport.

What I'm laying out above however still leads to a minor break within the 7.x product line, where Packetbeat would have to start putting "pure" version values such as "1.2", "1.3" instead of its current "TLS 1.2", "TLS 1.3".

I'd like to get people's take on that kind of break for the tls.version field's content for Packetbeat. @andrewkroh @tsg @ruflin WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review had been in progress while you moved things around for this part of the file, but this comment is still applicable after your recent push.

We can't nest under tls.version.* as this is a hard breaking change vs Packetbeat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that really breaks the version string. The only thing that is broken is doing analysis across agent versions. Even then, analysis will generally still work, but be degraded, which seems like a reasonable progression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also my take. I hope we can take this step forward. But I also feel it's important we get alignment from folks on this :-)

schemas/tls.yml Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
scripts/go.mod Show resolved Hide resolved
@dcode
Copy link
Contributor Author

dcode commented Nov 13, 2019

I added the raw output for the same session to the gist from zeek, suricata, and packetbeat

Copy link
Contributor

@andrewstucki andrewstucki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to a top-level comment since I left it on something that got resolved right at the same time:

Does it make sense to add a field for SAN specification? As the SNI stuff kind of implies, there are going to be plenty of times where the CN (server.subject here I believe) doesn't match a SAN specified in the cert.

@webmat
Copy link
Contributor

webmat commented Nov 13, 2019

Also don't forget to add a changelog entry to CHANGELOG.next.md :-)

@dcode
Copy link
Contributor Author

dcode commented Nov 13, 2019

I'm not sure about the SAN thing. That's digging into the certificate itself. Packetbeat is the only application I know that provides that out of the box as a list of all SANs. I suspect that Packetbeat will continue to ship x509 data in TLS events, but we don't have to define that here. I think a natural progression of this PR is defining a x509 schema to capture this information for systems the provide file analysis.

Open to discussion though.

@webmat
Copy link
Contributor

webmat commented Nov 13, 2019

I'll let other folks chime in on @andrewstucki's point. I'm not familiar enough with that

- Both packetbeat & zeek indicate whether the negotiation was successful
- I went with shorter name that still made sense (to me)
@dcode
Copy link
Contributor Author

dcode commented Nov 13, 2019

Subject to any additional input or dissenting opinions I'm done hacking on this for now.

@dcode
Copy link
Contributor Author

dcode commented Nov 14, 2019

Thinking through the SAN list point from @andrewstucki, I see the analytic need for sure, but I want to minimize data duplication. It might make sense to put the things that packetbeat currently offers in tls.server_certificate.* into a structure located at tls.server.x509.*. The reason being that x509 can exist in other places. I previously mentioned file analysis (as found in stelka and zeek), but also in things like S/MIME digitally signed email, digital signatures in office/pdf documents, IPSEC authentication, and others.

My goal is merely to think through a uniform way of representing that if we're defining it at all. It goes beyond the scope of this PR, but I think x509 as a nested field would allow a wildcard field search such as *.x509.issuer: CN=Let's Encrypt and you would be able to find all the certificates regardless of use.

Once we get that hashed out, we could do a fun ML job to find anomalous alt names for a given server_name or even destination IP address. 🤷‍♂

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final nit about the link format in the description and I think we're good.

Thanks for the additional thoughts around SAN and x509. Since this is a set of fields that would make sense in other places, I agree with keeping this out of this PR, moving forward and looking at the full use case for this in a later PR.

I'll let other people chime in, but unless dissenting opinions or other points are raised, I think we're good to merge later today or tomorrow :-)

use-cases/tls.md Show resolved Hide resolved
schemas/tls.yml Outdated Show resolved Hide resolved
schemas/tls.yml Show resolved Hide resolved
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm good with having the x509 content in its own reusable object.

@andrewstucki
Copy link
Contributor

I too like the idea of having a reusable x509 namespace that can be either embedded under TLS or as a standalone top-level thing for the SAN data, reasoning makes sense to me--also, totally cool addressing it in a subsequent PR

@webmat webmat merged commit 4570245 into elastic:master Nov 15, 2019
@webmat webmat mentioned this pull request Mar 12, 2020
dcode added a commit to dcode/ecs that referenced this pull request Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants