-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TLS schema to ECS #606
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge thanks for getting started on this, @dcode :-)
So contrary to the Beats parser for these YAML files, the generator in ECS doesn't support re-nesting with fields:
, once you're adding stuff inside the field set. I've commented inside the file for more precise pointers.
General feedback, please add an example section for as many fields as possible. Typically when we do that, we try to keep the examples aligned across all examples in the field set. For an example of what I mean, you can check out the url examples, how url.original is a full URL, but then all the other fields are the actual same values extracted from the full URL.
It's also fine to kind of fake it, for example you could use the exact same examples for the client and server sections.
So that's my round of preliminary feedback.
Another thing you'll need to do is run make
prior to committing follow-ups to the PR. This will generate all files and so on. Once you do that, we'll be able to preview the rendered docs, as we iterate on the PR :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for providing the output from the three different tools.
- Removes `version.original` in favor of `message` field. - Changes numeric value to keyword to allow for semantic versioning if that ever becomes a thing - Renames `*.certificate_fingerprint` to `*.hash` to be ECS consistent
Where is this? :-) I'm curious as a 🐱 |
@dcode Ah right. No the gist is fine. I do have Suricata, Zeek and Packetbeat all monitoring my own workstation. If you want to see me spying on myself, I'll happily give you access to the ECS cluster :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super happy with how this is shaping up! Thanks for adding all of the examples :-)
I agree with making the field set "extended".
Here's another more detailed round of feedback. A few small things to adjust, a few questions to better understand the thinking, but nothing major is sticking out to me.
schemas/tls.yml
Outdated
- name: version.number | ||
level: extended | ||
type: keyword | ||
description: Numeric part of the version parsed from the original string (e.g. 1.2, 3). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two fields cannot be nested under tls.version
, as this is a hard break vs Packetbeat's current definition for tls.version
, where the field is type keyword
. This would lead to a mapping conflict within the 7.x product line, which we can't do.
What would work well, I think is the following:
tls.version
(keyword
)- Field content would need to move from Packetbeat's content of e.g. "TLS 1.3" to "1.3", however
- perfectly aligned with all other ECS
*.version
fields.
tls.version_protocol
(keyword
)- This is where we would put either "tls" or "ssl"
- The underscore avoids the mapping conflict
- The definition of
tls.version_protocol
should mention that the protocol must be lowercased, in order to be aligned with network fields such asnetwork.protocol
,network.application
,network.type
andnetwork.transport
.
What I'm laying out above however still leads to a minor break within the 7.x product line, where Packetbeat would have to start putting "pure" version values such as "1.2", "1.3" instead of its current "TLS 1.2", "TLS 1.3".
I'd like to get people's take on that kind of break for the tls.version
field's content for Packetbeat. @andrewkroh @tsg @ruflin WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My review had been in progress while you moved things around for this part of the file, but this comment is still applicable after your recent push.
We can't nest under tls.version.*
as this is a hard breaking change vs Packetbeat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that really breaks the version string. The only thing that is broken is doing analysis across agent versions. Even then, analysis will generally still work, but be degraded, which seems like a reasonable progression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's also my take. I hope we can take this step forward. But I also feel it's important we get alignment from folks on this :-)
I added the raw output for the same session to the gist from zeek, suricata, and packetbeat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to a top-level comment since I left it on something that got resolved right at the same time:
Does it make sense to add a field for SAN specification? As the SNI stuff kind of implies, there are going to be plenty of times where the CN (server.subject here I believe) doesn't match a SAN specified in the cert.
Also don't forget to add a changelog entry to |
I'm not sure about the SAN thing. That's digging into the certificate itself. Packetbeat is the only application I know that provides that out of the box as a list of all SANs. I suspect that Packetbeat will continue to ship x509 data in TLS events, but we don't have to define that here. I think a natural progression of this PR is defining a Open to discussion though. |
I'll let other folks chime in on @andrewstucki's point. I'm not familiar enough with that |
- Both packetbeat & zeek indicate whether the negotiation was successful - I went with shorter name that still made sense (to me)
Subject to any additional input or dissenting opinions I'm done hacking on this for now. |
Thinking through the My goal is merely to think through a uniform way of representing that if we're defining it at all. It goes beyond the scope of this PR, but I think Once we get that hashed out, we could do a fun ML job to find anomalous alt names for a given server_name or even destination IP address. 🤷♂ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final nit about the link format in the description and I think we're good.
Thanks for the additional thoughts around SAN and x509. Since this is a set of fields that would make sense in other places, I agree with keeping this out of this PR, moving forward and looking at the full use case for this in a later PR.
I'll let other people chime in, but unless dissenting opinions or other points are raised, I think we're good to merge later today or tomorrow :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'm good with having the x509 content in its own reusable object.
I too like the idea of having a reusable x509 namespace that can be either embedded under TLS or as a standalone top-level thing for the SAN data, reasoning makes sense to me--also, totally cool addressing it in a subsequent PR |
I compared the output of Zeek 3.x, Suricata 5.x, and Packetbeat 7.5 and selected these fields as a common set that would cover the TLS protocol. You can see my scratchpad where I compared field names and came up with an example data record (YAML formatted).
While selecting fields, I aimed to keep only the data that centers around the SSL/TLS negotiation and avoid digging too deep into the x.509 certificates themselves. The certificates contain much more data than is extracted here, but this is similar to how Zeek and Suricata handle SSL/TLS protocol metadata by default, where only the high-level data is extracted.
With that in mind, there should be sufficient information recorded that will allow an analyst to pivot from a TLS event to a first class file object document (not currently defined by ECS) that extracts and analyzes all the x.509 fields. Zeek provides this as a separate x.509 event, for example. This data could also be provided by recursive file scanners like Strelka or potentially endpoint agents.
Formatting Question: I broke all the "objects" into their own group. It might be better to just use a dotted field notation, understanding that the tooling takes care of the rest. I'm not sure on this one.