Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query by Indexed Component Value #17608

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

bushrat011899
Copy link
Contributor

@bushrat011899 bushrat011899 commented Jan 30, 2025

Objective

Solution

Users can now call app.add_index::<C>() for an appropriate component C to have Bevy automatically track its value and provide a high-performance archetypical indexing solution. Users can then query by-value using the QueryByIndex system parameter all entities with a specific indexed value. The value is provided within the body of a system, meaning QueryByIndex can access any entity matching the provided compile-time query filters.

This is achieved by adding OnInsert and OnReplace observers and requiring that the index component C be immutable, ensuring those two hooks are able to capture all mutations. Since this is done with observers, users can index 3rd party components, required they are immutable and implement the traits required for the blanket implementation of IndexComponent.

Within the observers, a private index resource is managed which tracks the component value against a runtime marker component. This runtime marker component is used to group entities by-value at the archetype level for the index component.

This grouping allows for maximum performance in iteration, since querying is simply filtering for a hidden component by-archetype.

To-Do / Follow-up

  • It would be nice to index against a subset of a component or a whole bundle instead. This requires an immutable-bundle concept and for the user to provide a function to create their derived index from the supplied bundle.

Testing

  • New index example
  • CI

Performance

I've added some index benchmarks to the ecs bench which test iteration speed (index_iter) and mutation speed (index_update). Running these benchmarks locally, I see a performance improvement by a factor of 2 when iterating using the index (where 1 in 16 are the target value) vs a naive iter().filter(...). However there is a performance degradation by a factor of 4 in that same scenario.

This confirms expectation: indexing is only a performance win in scenarios where a value is rarely updated but frequently searched for.

Release Notes

Users can now query by-value using indexes. To setup an index, first create an immutable component with Clone, Eq, and Hash implementations:

#[derive(Component, PartialEq, Eq, Hash, Clone)]
#[component(immutable)]
struct TilePosition {
    x: i32,
    y: i32,
}

Next, request that an index be managed for this component:

app.add_index::<TilePosition>();

Finally, you can query by-value using the new QueryByIndex system parameter:

fn query_for_tiles(mut query: QueryByIndex<TilePosition, (Entity, &TilePosition)>) {
    let count = query.at(&TilePosition { x: 1, y: 1 }).iter().count();

    println!("Found {count} at (1,1)!");
}

QueryByIndex is just like Query except it has a third generic argument, C, which denotes the component you'd like to use as the index:

// Typical query
query: Query<&Foo, With<Bar>>,

// Indexed query
query: QueryByIndex<MyIndex, &Foo, With<Bar>>,

Internally, the index will move entities with differing values into their own archetypes, allowing for fast iteration at the cost of some added overhead when modifying the indexed value. As such, indexes are best used where you have an infrequently changing component that you frequently need to access a subset of.

@bushrat011899 bushrat011899 marked this pull request as draft January 30, 2025 01:15
@alice-i-cecile alice-i-cecile added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events X-Controversial There is active debate or serious implications around merging this PR M-Needs-Release-Note Work that should be called out in the blog due to impact S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Jan 30, 2025
}

/// Marker describing the requirements for a [`Component`] to be suitable for indexing.
pub trait IndexComponent: Component<Mutability = Immutable> + Eq + Hash + Clone {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Hash is too strong. BTreeMaps or QuadTrees will work just fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of those would require Ord, which might be difficult to justify for spatial indexes. Perhaps we can offer a more generic version of this API that would allow users to choose which trait bounds and storage option makes the most sense for them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's my hope. Ideally we move the bounds off of here, and onto the lookup methods.

@bushrat011899 bushrat011899 changed the title DRAFT: Query by Indexed Component Value Query by Indexed Component Value Jan 30, 2025
@bushrat011899 bushrat011899 marked this pull request as ready for review January 30, 2025 04:13
@bushrat011899 bushrat011899 added S-Needs-Review Needs reviewer attention (from anyone!) to move forward S-Needs-SME Decision or review from an SME is required and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Jan 30, 2025
@bushrat011899
Copy link
Contributor Author

Marking as ready-for-review as I have the finalised MVP API I would like for this feature. As mentioned, it would be nice to allow Ord instead of Hash as an alternative, but I'll keep it simple for now just to see if the underlying strategy of fragmenting on an immutable component's value using observers is valid.

/// }
/// }
/// ```
pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: this implies the ability to do between, which would be super sweet. Another argument for an Ord backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! Generalised storage backends could allow some really nice methods here for sure.

world.entity_mut(entity).remove_by_id(component_id);

// On removal, we check if this was the last usage of this marker.
// If so, we can recycle it for a different value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's pretty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some pretty wild Rust in this file!

bushrat011899 and others added 11 commits January 30, 2025 16:17
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>
/// }
/// }
/// ```
pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this where personally but maybe at has a more ecs-y flavor that I'm not familiar with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like at personally because it's like we're getting the query at a particular value. But I'd be happy to change it

Copy link
Contributor

@cBournhonesque cBournhonesque Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer at to where personally. I was otherwise thinking of for, but I think at might be better

@bushrat011899
Copy link
Contributor Author

Preliminary benchmarking shows the index speeds up iteration by a factor of two when 1-in-16 entities has the target value. However, it degrades mutation performance of that index component by a factor of ten in that same scenario. As expected, indexing is beneficial when you're searching for values at a much higher rate than updating them.

@bushrat011899
Copy link
Contributor Author

With a little refactoring I was able to close the performance gap in mutation to a factor of 4 in the benchmark. Still costly, but an improvement nonetheless. One last thing I want to change is app.add_index doesn't check for existing spawned entities. It should for correctness.

@bushrat011899 bushrat011899 removed the S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help label Jan 30, 2025
@cBournhonesque
Copy link
Contributor

cBournhonesque commented Jan 30, 2025

I've added some index benchmarks to the ecs bench which test iteration speed (index_iter) and mutation speed (index_update). Running these benchmarks locally, I see a performance improvement by a factor of 2 when iterating using the index (where 1 in 16 are the target value) vs a naive iter().filter(...). However there is a performance degradation by a factor of 4 in that same scenario.

We do want to benchmark against iter().filter(...) which just filters over all entities, but I think what we also really want is to benchmark against a more naive index implementation that just maintains a HashMap<value, HashSet<Entity>>, right?
I would love to see more explanations for why this is better than something like storing a hashmap in a resource, as outlined by https://hackmd.io/@bevy/BJPfCxduJe

(it's also the tool that I reach out for when I need to maintain an internal index in my own games)

Or is the main benefit compared to using a hashmap that this makes the query code straightforward to update (we just need to update the access instead of making doing more involved changes into the list of archetypes to iterate, etc.)? I do like the elegance of the design, I guess the alternative would be for the at to return a new QueryIndexedIter which contains the list of entities to iterate (which is computed via the index resource)

I just wonder if this elegant ZST based design won't come to bite us later on, that's why I'd like to understand better the benefits vs the QueryIndexedIter approach

Copy link
Contributor

@Carter0 Carter0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know my review doesn't mean much on this, but I think the code looks really clean and I think the pitch for this implementation is really good.

}
None => {
// Need to clone the index value for later lookups
let value = value.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be in this PR, but I see a world where the value that is used here is instead of the result of some IndexableComponent::hash_value() function, so that we can group components into 'buckets' of values.

That way we could avoid the issue of needing too many ZSTs components for components with float values, as we could for example map all the entities with values between 0.0-1.0 in the same bucket (and the query would then have to do an additional check via a .iter().filter() in the QueryIndex since there's no guarantee that the entities returned by the index have the exact value that we want).

I'm not super knowledgeable about indexes but basically I think we can re-use a bunch of tricks used by traditional DB indexes

//! Fragmenting requires unique [component IDs](ComponentId), and they are finite.
//! For components with a small number of values in use at any one time, this is acceptable.
//! But for components with a _large_ number of unique values (e.g., components containing floating point numbers),
//! the pool of available component IDs will be quickly exhausted.
Copy link
Contributor

@cBournhonesque cBournhonesque Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem dangerous, I wonder if we could return a warning after a certain number of ZSTs are used?

Also I guess the work to make the ECS work well with higher componentID values will become more important

///
/// // Indexing is opt-in through `World::add_index`
/// world.add_index::<Player>();
/// # world.spawn(Player(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the doc is a bit lengthy, maybe having 0 Player(0), 1 Player(1), 2 Player(2)s is enough to demonstrate how this works

// If there is a marker, restrict to it
Some(state) => builder.with_id(state.component_id),
// Otherwise, create a no-op query by including With<C> and Without<C>
None => builder.without::<C>(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever!

// Simply call `add_index` to start indexing a component.
// Make sure to call this _before_ any components are spawned, otherwise
// the index will miss those entities!
.add_index::<TilePosition>()
Copy link
Contributor

@cBournhonesque cBournhonesque Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be worthwhile to have some kind of configuration to let users define which entities they want indexed.
Maybe .add_index::<C>(IndexConfig) with IndexConfig.automatic_indexation: bool?

If it's set to false, then an entity would be indexed only if a Index<C> component is added to an entity.

This lets us:

  • ignore some entities that we don't want to get indexed (cheaper indexation). In your version they would still get indexed and the user would have to filter them out in their QueryByIndex
  • have an easy way to remove an entity from an index (by removing the Index<C> component)

For example I might only want to index the Transform of only a few entities (player entities), but not of sprites, walls, etc.

Maybe both version can co-exist, because your version has ergonomic benefits (users don't have to add Index on every entity that needs to be indexed).

(Of course this doesn't have to be a blocker for this PR)

pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {
self.state = {
// SAFETY: Mutable references do not alias and will be dropped after this block
let mut builder = unsafe { QueryBuilder::new(self.world.world_mut()) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this call to world_mut() is unsound. It will conflict with uses in other params in the same system, or in other systems running in parallel. It might be impossible to use QueryBuilder like this.

One approach that might work is to use the QueryState you're already storing in the parameter state, and try to re-filter the matched_storage_ids (and somehow put them back when you're done). That would also mean you only have to iterate archetypes/tables matching the overall filter, instead of having to check all of them as QueryBuilder does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned by this. I'm currently refactoring how this works to be more conservative with component IDs but once that is done I'd like to revisit how this works. Your suggestion might be the right way forward (for performance too probably!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, your plan is to preallocate 32 component ids to use as bits, and use those to encode a 32-bit index in the archetype, right? I think you could build something that stores a Vec<QueryState> and does all of the archetype filtering in new_archetype. That would give you no overhead at query execution time, and would let you get queries with &self instead of &mut self!

The idea would be to implement SystemParam::new_archetype() manually by first doing the same filtering as any Query<D, (F, With<C>)>, but then by calling archetype.contains 32 times to get the 32-bit index encoded in that archetype. Then you'd look up the QueryState at that index in the Vec, and tell only that one about the archetype. So each QueryState in the Vec would always have the full list of archetypes with the given index, and none with other indices!

You'd need a way to make new QueryStates when you first see higher indices, but I think that's straightforward. Everything inside it is already Clone except for D::State and F::State, and I think all actual WorldQuery impls have State: Clone, so you could just add a bound there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to try and do this now that I've got the combinatorial component ID indexing working. Seriously, thank you for explaining this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible M-Needs-Release-Note Work that should be called out in the blog due to impact S-Needs-Review Needs reviewer attention (from anyone!) to move forward S-Needs-SME Decision or review from an SME is required X-Controversial There is active debate or serious implications around merging this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Entity Markers Indexes: look up entities by the value of their components
6 participants