Add related.entity field #2360

Open
wants to merge 5 commits into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.next.md
@@ -22,6 +22,7 @@ Thanks, you're awesome :-) -->
* Advanced `process.io` and `process.tty` fields to GA. #2317
* Added `threat.indicator.id`. #2324
* Added `process.group` to generated schemas. #2335
* Added `related.entity` field. #2360

#### Improvements

19 changes: 19 additions & 0 deletions docs/fields/field-details.asciidoc
@@ -9124,6 +9124,25 @@ A concrete example is IP addresses, which can be under host, observer, source, d

// ===============================================================

|
[[field-related-entity]]
<<field-related-entity, related.entity>>

a| All the entity identifiers related to the document. If the document contains multiple entities, identifiers belonging to different entities will be present. Example identifiers include cloud resource IDs, ARNs, email addresses, or hostnames.

type: keyword


Note: this field should contain an array of values.





| extended

// ===============================================================

|
[[field-related-hash]]
<<field-related-hash, related.hash>>
9 changes: 9 additions & 0 deletions experimental/generated/beats/fields.ecs.yml
@@ -7942,6 +7942,15 @@
type: group
default_field: true
fields:
- name: entity
level: extended
type: keyword
ignore_above: 1024
description: All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities will
be present. Example identifiers include cloud resource IDs, ARNs, email addresses,
or hostnames.
default_field: false
- name: hash
level: extended
type: keyword
1 change: 1 addition & 0 deletions experimental/generated/csv/fields.csv
@@ -1026,6 +1026,7 @@ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
8.12.0-dev+exp,true,registry,registry.key,keyword,core,,SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\winword.exe,Hive-relative path of keys.
8.12.0-dev+exp,true,registry,registry.path,keyword,core,,HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\winword.exe\Debugger,"Full path, including hive, key and value"
8.12.0-dev+exp,true,registry,registry.value,keyword,core,,Debugger,Name of the value written.
8.12.0-dev+exp,true,related,related.entity,keyword,extended,array,,All the entity identifiers
8.12.0-dev+exp,true,related,related.hash,keyword,extended,array,,All the hashes seen on your event.
8.12.0-dev+exp,true,related,related.hosts,keyword,extended,array,,All the host identifiers seen on your event.
8.12.0-dev+exp,true,related,related.ip,ip,extended,array,,All of the IPs seen on your event.
14 changes: 14 additions & 0 deletions experimental/generated/ecs/ecs_flat.yml
@@ -12933,6 +12933,20 @@ registry.value:
normalize: []
short: Name of the value written.
type: keyword
related.entity:
dashed_name: related-entity
description: All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities will be
present. Example identifiers include cloud resource IDs, ARNs, email addresses,
or hostnames.
flat_name: related.entity
ignore_above: 1024
level: extended
name: entity
normalize:
- array
short: All the entity identifiers
type: keyword
related.hash:
dashed_name: related-hash
description: All the hashes seen on your event. Populating this field, then using
14 changes: 14 additions & 0 deletions experimental/generated/ecs/ecs_nested.yml
@@ -15400,6 +15400,20 @@ related:
`related.ip`, you can then search for a given IP trivially, no matter where it
appeared, by querying `related.ip:192.0.2.15`.'
fields:
related.entity:
dashed_name: related-entity
description: All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities will
be present. Example identifiers include cloud resource IDs, ARNs, email addresses,
or hostnames.
flat_name: related.entity
ignore_above: 1024
level: extended
name: entity
normalize:
- array
short: All the entity identifiers
type: keyword
related.hash:
dashed_name: related-hash
description: All the hashes seen on your event. Populating this field, then
@@ -8,6 +8,10 @@
"properties": {
"related": {
"properties": {
"entity": {
"ignore_above": 1024,
"type": "keyword"
},
"hash": {
"ignore_above": 1024,
"type": "keyword"
4 changes: 4 additions & 0 deletions experimental/generated/elasticsearch/legacy/template.json
@@ -4684,6 +4684,10 @@
},
"related": {
"properties": {
"entity": {
"ignore_above": 1024,
"type": "keyword"
},
"hash": {
"ignore_above": 1024,
"type": "keyword"
9 changes: 9 additions & 0 deletions generated/beats/fields.ecs.yml
@@ -7892,6 +7892,15 @@
type: group
default_field: true
fields:
- name: entity
level: extended
type: keyword
ignore_above: 1024
description: All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities will
be present. Example identifiers include cloud resource IDs, ARNs, email addresses,
or hostnames.
default_field: false
- name: hash
level: extended
type: keyword
1 change: 1 addition & 0 deletions generated/csv/fields.csv
@@ -1019,6 +1019,7 @@ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
8.12.0-dev,true,registry,registry.key,keyword,core,,SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\winword.exe,Hive-relative path of keys.
8.12.0-dev,true,registry,registry.path,keyword,core,,HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\winword.exe\Debugger,"Full path, including hive, key and value"
8.12.0-dev,true,registry,registry.value,keyword,core,,Debugger,Name of the value written.
8.12.0-dev,true,related,related.entity,keyword,extended,array,,All the entity identifiers
8.12.0-dev,true,related,related.hash,keyword,extended,array,,All the hashes seen on your event.
8.12.0-dev,true,related,related.hosts,keyword,extended,array,,All the host identifiers seen on your event.
8.12.0-dev,true,related,related.ip,ip,extended,array,,All of the IPs seen on your event.
14 changes: 14 additions & 0 deletions generated/ecs/ecs_flat.yml
@@ -12864,6 +12864,20 @@ registry.value:
normalize: []
short: Name of the value written.
type: keyword
related.entity:
dashed_name: related-entity
description: All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities will be
present. Example identifiers include cloud resource IDs, ARNs, email addresses,
or hostnames.
flat_name: related.entity
ignore_above: 1024
level: extended
name: entity
normalize:
- array
short: All the entity identifiers
type: keyword
related.hash:
dashed_name: related-hash
description: All the hashes seen on your event. Populating this field, then using
14 changes: 14 additions & 0 deletions generated/ecs/ecs_nested.yml
@@ -15320,6 +15320,20 @@ related:
`related.ip`, you can then search for a given IP trivially, no matter where it
appeared, by querying `related.ip:192.0.2.15`.'
fields:
related.entity:
dashed_name: related-entity
description: All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities will
be present. Example identifiers include cloud resource IDs, ARNs, email addresses,
or hostnames.
flat_name: related.entity
ignore_above: 1024
level: extended
name: entity
normalize:
- array
short: All the entity identifiers
type: keyword
related.hash:
dashed_name: related-hash
description: All the hashes seen on your event. Populating this field, then
4 changes: 4 additions & 0 deletions generated/elasticsearch/composable/component/related.json
@@ -8,6 +8,10 @@
"properties": {
"related": {
"properties": {
"entity": {
"ignore_above": 1024,
"type": "keyword"
},
"hash": {
"ignore_above": 1024,
"type": "keyword"
4 changes: 4 additions & 0 deletions generated/elasticsearch/legacy/template.json
@@ -4642,6 +4642,10 @@
},
"related": {
"properties": {
"entity": {
"ignore_above": 1024,
"type": "keyword"
},
"hash": {
"ignore_above": 1024,
"type": "keyword"
12 changes: 12 additions & 0 deletions schemas/related.yml
@@ -70,3 +70,15 @@
identifiers include FQDNs, domain names, workstation names, or aliases.
normalize:
- array

- name: entity
Contributor

entity is an extremely broad category. The danger with using this is it will mean different things to different people, and become a bucket that will hold almost anything.

This would reduce the effectiveness of having a common schema, as this field will be used by different users to hold different types of data, and cause problems with writing queries, doing data normalization, etc. Already in the description, there's resource IDs, email addresses, and hostnames, which are three different things.

I think you'll need to consider the use-cases for this, and refine the definition of what this is intended to hold. Maybe just cloud_resource_names? Or have multiple fields for the different types of data that could be related.

Member Author

Hey Michael, I see where you're coming from.

However, our need is indeed very broad. What we want is to be able to find any event related to an entity. What is an entity? It can be almost anything: a workstation, a bare-metal machine, a user, an EC2 instance, a database. Pretty much anything a SOC team is concerned about.

But then why not specify cloud_resource_name or cloud_entity? Ideally, from a user-experience perspective, users shouldn't need to know all the ECS fields to search for something, or have to think twice or look things up before typing their search. I do see the point about data organisation and having clearly separated buckets, but from a search perspective that degrades the experience. Beyond that, some concepts simply overlap. We have related.hosts and related.ip, which both hold information about a machine that can be seen as an entity. So where does the data about that specific host live? We believe it would be easier to have all the data in related.entity and search from there.

With that said, you mentioned that having it all in one field would reduce the effectiveness of data. Can you expand on that? Why would it cause problems writing queries and doing data normalisation?

Tagging @tinnytintin10 so he can give his two cents as product (if he wishes).
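
To make the search-experience argument above concrete, here is a minimal sketch of the two query shapes, written as Elasticsearch query DSL bodies built from plain Python dictionaries. The helper names and the list of per-type fields are illustrative assumptions, not part of this PR.

```python
import json

# Hypothetical selection of existing ECS fields a user would otherwise
# have to remember; chosen for illustration only.
PER_TYPE_FIELDS = ["related.hosts", "related.user", "cloud.instance.id"]


def single_field_query(identifier):
    """Exact-match lookup against the proposed related.entity field."""
    return {"query": {"term": {"related.entity": identifier}}}


def multi_field_query(identifier, fields):
    """Equivalent lookup when the identifier may live in any of several fields."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: identifier}} for field in fields],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(single_field_query("i-001"), indent=2))
    print(json.dumps(multi_field_query("i-001", PER_TYPE_FIELDS), indent=2))
```

The second query only finds the identifier if the caller listed the right fields up front, which is the burden the single field is meant to remove.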

Thanks for your thoughtful analysis of the proposal, @mjwolf!

You're right that "entity" is an extremely broad category, and that's intentional. Let me explain our reasoning and address your concerns:

  1. Regarding data consistency, as related.entity is of keyword type, consistency in data format isn't a concern for searchability. All values stored will be searchable as keywords, regardless of the identifier format.
  2. Regarding query performance, given that related.entity will contain identifiers (such as ARNs, emails, hostnames, etc.) and is mapped to the keyword type, we don't anticipate significant performance issues. Querying keyword fields is generally efficient in Elasticsearch, especially for exact matches, which are the primary use case here.
  3. Regarding data analysis, the introduction of this field should not complicate data analysis. In fact, it may simplify certain types of analysis by providing a unified field for correlation across different entity types. For more specific analyses, users can still rely on the more targeted related fields and other event details.
  4. This approach also lends itself to future extensibility. If certain entity types require more specific handling in the future (e.g., implicit entity type fields like the host and user ECS fields), we can introduce additional fields without breaking the functionality of related.entity.
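
As a rough illustration of the keyword points above, the sketch below simulates in plain Python how an exact-match lookup behaves on a keyword array with `ignore_above: 1024` (the value in this PR's generated mappings). It is an approximation under stated assumptions: lengths are measured with `len()`, the function names are hypothetical, and the identifiers are made up.

```python
IGNORE_ABOVE = 1024  # limit set for related.entity in this PR's generated mappings


def indexed_values(values, ignore_above=IGNORE_ABOVE):
    """Return the array elements a keyword field would index.

    Approximation: length is measured with len(), and each element of the
    array is checked on its own.
    """
    return [v for v in values if len(v) <= ignore_above]


def term_matches(values, search, ignore_above=IGNORE_ABOVE):
    """Simulate an exact-match lookup: whole-value, case-sensitive equality."""
    return search in indexed_values(values, ignore_above)


# Made-up identifiers of different formats stored side by side.
entities = [
    "arn:aws:iam::123456789012:user/romulo",
    "romulo@example.com",
    "x" * 2000,  # longer than ignore_above, so it would not be searchable
]

print(term_matches(entities, "romulo@example.com"))  # whole value matches
print(term_matches(entities, "romulo"))              # partial value does not
```

Mixed identifier formats do not interfere with each other, but only whole-value matches succeed, so producers and searchers need to agree on the exact identifier string.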

Regarding alternatives (like the one I mentioned in the last bullet above), creating implicit entity fieldsets for each possible entity type would be a significant undertaking (especially in the cloud). If we were to follow the pattern of existing fields like "host" and "user", we'd quickly run into an explosion of entity types. Consider this non-exhaustive list of potential generic entity types we'd need to account for/introduce:

expand me

a few of these might have some ecs fields available...

  • ACCESS_ROLE
  • API_GATEWAY
  • BACKUP_SERVICE
  • BUCKET
  • CICD_SERVICE
  • CLOUD_LOG_CONFIGURATION
  • CDN
  • CONFIG_MAP
  • CONTAINER_IMAGE
  • CONTAINER_REGISTRY
  • CONTAINER_REPOSITORY
  • DATA_WORKFLOW
  • DATA_WORKLOAD
  • DATABASE
  • DNS_RECORD
  • DNS_ZONE
  • DOMAIN
  • EMAIL_SERVICE
  • ENCRYPTION_KEY
  • FILE_SYSTEM_SERVICE
  • FIREWALL
  • GATEWAY
  • GOVERNANCE_POLICY
  • LOAD_BALANCER
  • MANAGED_CERTIFICATE
  • MANAGEMENT_SERVICE
  • MAP_REDUCE_CLUSTER
  • MESSAGING_SERVICE
  • MONITOR_ALERT
  • NETWORK_ADDRESS
  • NETWORK_INTERFACE
  • PEERING
  • PRIVATE_ENDPOINT
  • PRIVATE_LINK
  • RAW_ACCESS_POLICY
  • REGION
  • REGISTERED_DOMAIN
  • RESOURCE_GROUP
  • ROUTE_TABLE
  • SEARCH_INDEX
  • SECRET
  • SECRET_CONTAINER
  • SERVERLESS
  • SERVERLESS_PACKAGE
  • SERVICE_CONFIGURATION
  • SERVICE_USAGE_TECHNOLOGY
  • SNAPSHOT
  • STORAGE_ACCOUNT
  • SUBNET
  • SUBSCRIPTION
  • VIRTUAL_NETWORK
  • VOLUME
  • WEB_SERVICE

This list doesn't even include some of the entity types we already have ECS fields for, such as those related to hosts, users, and Kubernetes (which ECS calls orchestrator).

Creating and maintaining fields for each of these entity types would not only take considerable time to implement but would also result in a proliferation of field types. This approach would place a substantial cognitive burden on users, requiring them to remember a large number of specific fields for different entity types.

The related.entity field addresses this challenge by providing a single, unified field for correlation. Users don't need to know the implicit entity type for each resource to correlate events, greatly simplifying the process. For instance, they wouldn't need to know that a bucket is for blob storage or that an ARN identifies an AWS resource; they could simply use related.entity to find all events related to that entity. In other words, related.entity offers a user-friendly way to correlate events across diverse entity types without overwhelming users or complicating the schema unnecessarily.
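
The correlation idea can be sketched in a few lines of Python over in-memory documents. The events and the `index_by_entity` helper are made up for illustration; in practice the grouping would be done by the search backend rather than by client code.

```python
from collections import defaultdict

# Made-up events; only related.entity matters for this sketch.
events = [
    {"event": {"action": "RunInstances"}, "related": {"entity": ["romulo", "i-001"]}},
    {"event": {"action": "login"}, "related": {"entity": ["romulo"]}},
    {"event": {"action": "StopInstances"}, "related": {"entity": ["i-001"]}},
]


def index_by_entity(docs):
    """Map each identifier to the positions of the events that mention it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        for identifier in doc.get("related", {}).get("entity", []):
            index[identifier].append(position)
    return dict(index)


by_entity = index_by_entity(events)
print(by_entity)
```

Note that the lookup never needs to know whether an identifier names a user or an instance.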

As we move forward, we'll continue to evaluate and adapt based on the evolving needs of our users and the insights we gain from this implementation.

Contributor

Thanks @tinnytintin10 for the excellent explanation. I think this makes sense for achieving what you want to achieve.

Member Author

@mjwolf what do we need to wrap this PR up? If you approve, can we merge, or must it be discussed in other forums?

Member Author

Indeed, I've been looking into how we could implement this in OpenTelemetry. @trisch-me has been supporting me in the process. The summary is:

Concepts that overlap with what we want to do are under discussion. OTel has the concept of Resources, which "represents the entity producing telemetry as resource attributes". The concept of entity is under discussion here.

However, this discussion doesn't fully cover what we want, because it focuses mainly on the problem of which entity produced the telemetry observation. What we want to observe is different: we want to know which entities are related to the emitted event (as actor or target). An event can have multiple related entities, and the emitting entity might have nothing to do with the information we want. Example: the event "romulo created the ec2 instance i-001" was emitted by the trail monitor-elastic. In this case, only the entities romulo and i-001 are relevant to the security use case. The entity trail monitor-elastic is supporting information about how the event was reported, but it is of no security interest.
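
The example above could translate into a document along these lines. This is a minimal sketch: the `build_related_entity` helper and the document layout are assumptions for illustration, and only the identifiers come from the example.

```python
def build_related_entity(actors, targets):
    """Combine actor and target identifiers into one de-duplicated list,
    preserving first-seen order. The emitting source is deliberately not
    an input."""
    seen = []
    for identifier in [*actors, *targets]:
        if identifier not in seen:
            seen.append(identifier)
    return seen


# Identifiers taken from the example above; the document layout is an assumption.
document = {
    "message": "romulo created the ec2 instance i-001",
    "related": {"entity": build_related_entity(["romulo"], ["i-001"])},
}

print(document["related"]["entity"])
```

The trail monitor-elastic does not appear in the result because it only reported the event.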

I have discussed our use case in both the Semantics SIG and the Entity SIG. Both groups agreed that our use case is legitimate. But the Entity SIG (the correct group for this discussion) has other priorities right now. They suggested that we, as Elastic, open an OTEP and prepare our case to be discussed in the Entity SIG.

Because I believe this goes beyond just CDR, and other teams might be interested, I want to take this discussion further and find out how we, as Elastic, want to approach this topic. My next point of discussion will be at the Elastic OpenTelemetry Office Hours, where I'll raise what we want to do and see whether more people in that group are interested.

There is, however, no timeline for when this discussion will properly start or end. @trisch-me and I agreed to merge this PR independently of any OTel discussion or outcome, because the pace doesn't seem too promising.

Contributor

Yes, I believe such a broad topic will take months in OTel just to formalise correctly, let alone implement.
I also want to add that we are talking here only about entities inside related. But the whole concept of the related namespace will not be ported to OTel as is; there is no place for it in this format. So before we could have entities there, we would need to think about the parent namespace related first.

That said, I believe we should proceed with this topic in ECS first.

Contributor

@tinnytintin10 hey, thanks a lot for this explanation, but I still have doubts. Let's say I'm a developer who is using ECS for my case (for example, we at Elastic are using ECS in Endgame). How should I determine whether something from my event/log entry is an entity or not? Should I just throw everything I think could be an entity into the bucket?

I understand it's a broader topic, but I would like to have more clarity and examples there, also to give everyone reading it an understanding of the field and the data put into it.

Member Author

@trisch-me do you have examples of what you mean by "I think can be entity"? I'm not sure which use cases you have in mind.

I understand entity as:

An "entity" in our context refers to any discrete component within an IT environment that can be uniquely identified and monitored. This broad term encompasses both managed and unmanaged elements.

Machines, virtual or physical, are entities. Instances of tooling/services/components such as queueing topics/subscriptions, databases, networking components, object storage, authentication and authorization components (and others) are entities. So are users and their representations in different systems, like Okta, AWS, Azure, Active Directory and others.

Did you think of something I didn't mention here, @trisch-me?

Regardless, I agree that we as Elastic should have a formal definition in our docs of what entities are and what they are not. That will help us move forward with fewer doubts and less friction.
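
Until such a definition exists, one way a producer might apply the working description above is to start from ECS fields that already carry identifiers. The sketch below is only an assumption about how that could look: the field selection in `IDENTIFIER_FIELDS` and both helper names are hypothetical, and nothing in this PR prescribes them.

```python
# Hypothetical choice of ECS fields whose values identify an entity.
IDENTIFIER_FIELDS = ["host.name", "user.email", "cloud.instance.id"]


def get_path(doc, dotted):
    """Follow a dotted path through nested dicts; return None if absent."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def collect_entities(doc, fields=IDENTIFIER_FIELDS):
    """Gather identifier values into a list, accepting scalars or lists."""
    found = []
    for field in fields:
        value = get_path(doc, field)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item not in found:
                found.append(item)
    return found


# Made-up event for illustration.
event = {
    "host": {"name": "web-01"},
    "user": {"email": "romulo@example.com"},
    "cloud": {"instance": {"id": "i-001"}},
}

print(collect_entities(event))
```

The result is always a list, which matches the `normalize: array` setting on the field.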

Contributor

When I said "I think", I was referring to a developer who might not understand or know what entities are, especially if they work in another, non-security domain. So my request is to have a concrete definition of what an entity is, directly in the description or notes. Even the sentences you wrote above are a better explanation than what we currently have in the proposal. We might also have this definition somewhere else and just link to it.

level: extended
type: keyword
short: All the entity identifiers
description: >
All the entity identifiers related to the document. If the document
contains multiple entities, identifiers belonging to different entities
will be present. Example identifiers include cloud resource IDs, ARNs, email
addresses, or hostnames.
normalize:
- array

Check failure on line 84 in schemas/related.yml

GitHub Actions / Unit Tests

84:16 [new-line-at-end-of-file] no new line character at the end of file