Query by Indexed Component Value #17608

bushrat011899 · 2025-01-30T01:14:51Z

Objective

Fixes Indexes: look up entities by the value of their components #4513
Possible fix for Entity groups #1592
Fixes Entity Markers #6556
Add a first-class option for querying by component value.

Solution

Users can now call app.add_index::<C>() for an appropriate component C to have Bevy automatically track its value and provide a high-performance archetypical indexing solution. Users can then query by-value using the QueryByIndex system parameter all entities with a specific indexed value. The value is provided within the body of a system, meaning QueryByIndex can access any entity matching the provided compile-time query filters.

This is achieved by adding OnInsert and OnReplace observers and requiring that the index component C be immutable, ensuring those two hooks are able to capture all mutations. Since this is done with observers, users can index 3rd party components, required they are immutable and implement the traits required for the blanket implementation of IndexComponent.

Within the observers, a private index resource is managed which tracks the component value against a runtime marker component. This runtime marker component is used to group entities by-value at the archetype level for the index component.

This grouping allows for maximum performance in iteration, since querying is simply filtering for a hidden component by-archetype.

To-Do / Follow-up

It would be nice to index against a subset of a component or a whole bundle instead. This requires an immutable-bundle concept and for the user to provide a function to create their derived index from the supplied bundle.

Testing

New index example
CI

Performance

I've added some index benchmarks to the ecs bench which test iteration speed (index_iter) and mutation speed (index_update). Running these benchmarks locally, I see a performance improvement by a factor of 2 when iterating using the index (where 1 in 16 are the target value) vs a naive iter().filter(...). However there is a performance degradation by a factor of 4 in that same scenario.

This confirms expectation: indexing is only a performance win in scenarios where a value is rarely updated but frequently searched for.

Release Notes

Users can now query by-value using indexes. To setup an index, first create an immutable component with Clone, Eq, and Hash implementations:

#[derive(Component, PartialEq, Eq, Hash, Clone)]
#[component(immutable)]
struct TilePosition {
    x: i32,
    y: i32,
}

Next, request that an index be managed for this component:

app.add_index::<TilePosition>();

Finally, you can query by-value using the new QueryByIndex system parameter:

fn query_for_tiles(mut query: QueryByIndex<TilePosition, (Entity, &TilePosition)>) {
    let count = query.at(&TilePosition { x: 1, y: 1 }).iter().count();

    println!("Found {count} at (1,1)!");
}

QueryByIndex is just like Query except it has a third generic argument, C, which denotes the component you'd like to use as the index:

// Typical query
query: Query<&Foo, With<Bar>>,

// Indexed query
query: QueryByIndex<MyIndex, &Foo, With<Bar>>,

Internally, the index will move entities with differing values into their own archetypes, allowing for fast iteration at the cost of some added overhead when modifying the indexed value. As such, indexes are best used where you have an infrequently changing component that you frequently need to access a subset of.

alice-i-cecile · 2025-01-30T01:26:41Z

crates/bevy_ecs/src/index/mod.rs

+}
+
+/// Marker describing the requirements for a [`Component`] to be suitable for indexing.
+pub trait IndexComponent: Component<Mutability = Immutable> + Eq + Hash + Clone {}


I think Hash is too strong. BTreeMaps or QuadTrees will work just fine.

Both of those would require Ord, which might be difficult to justify for spatial indexes. Perhaps we can offer a more generic version of this API that would allow users to choose which trait bounds and storage option makes the most sense for them?

Yeah, that's my hope. Ideally we move the bounds off of here, and onto the lookup methods.

bushrat011899 · 2025-01-30T04:15:24Z

Marking as ready-for-review as I have the finalised MVP API I would like for this feature. As mentioned, it would be nice to allow Ord instead of Hash as an alternative, but I'll keep it simple for now just to see if the underlying strategy of fragmenting on an immutable component's value using observers is valid.

crates/bevy_ecs/src/index/mod.rs

alice-i-cecile · 2025-01-30T04:26:55Z

crates/bevy_ecs/src/index/mod.rs

+    ///     }
+    /// }
+    /// ```
+    pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {


Follow-up: this implies the ability to do between, which would be super sweet. Another argument for an Ord backend.

Definitely! Generalised storage backends could allow some really nice methods here for sure.

crates/bevy_ecs/src/index/mod.rs

alice-i-cecile · 2025-01-30T04:30:45Z

crates/bevy_ecs/src/index/mod.rs

+                world.entity_mut(entity).remove_by_id(component_id);
+
+                // On removal, we check if this was the last usage of this marker.
+                // If so, we can recycle it for a different value


Oh that's pretty.

There's some pretty wild Rust in this file!

crates/bevy_ecs/src/index/mod.rs

Co-Authored-By: Alice Cecile <alice.i.cecile@gmail.com>

tychedelia · 2025-01-30T07:48:43Z

crates/bevy_ecs/src/index/mod.rs

+    ///     }
+    /// }
+    /// ```
+    pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {


I'd call this where personally but maybe at has a more ecs-y flavor that I'm not familiar with.

I like at personally because it's like we're getting the query at a particular value. But I'd be happy to change it

I prefer at to where personally. I was otherwise thinking of for, but I think at might be better

bushrat011899 · 2025-01-30T08:14:59Z

Preliminary benchmarking shows the index speeds up iteration by a factor of two when 1-in-16 entities has the target value. However, it degrades mutation performance of that index component by a factor of ten in that same scenario. As expected, indexing is beneficial when you're searching for values at a much higher rate than updating them.

Improves performance slightly

Improves mutation performance

bushrat011899 · 2025-01-30T12:24:11Z

With a little refactoring I was able to close the performance gap in mutation to a factor of 4 in the benchmark. Still costly, but an improvement nonetheless. One last thing I want to change is app.add_index doesn't check for existing spawned entities. It should for correctness.

cBournhonesque · 2025-01-30T16:30:08Z

I've added some index benchmarks to the ecs bench which test iteration speed (index_iter) and mutation speed (index_update). Running these benchmarks locally, I see a performance improvement by a factor of 2 when iterating using the index (where 1 in 16 are the target value) vs a naive iter().filter(...). However there is a performance degradation by a factor of 4 in that same scenario.

We do want to benchmark against iter().filter(...) which just filters over all entities, but I think what we also really want is to benchmark against a more naive index implementation that just maintains a HashMap<value, HashSet<Entity>>, right?
I would love to see more explanations for why this is better than something like storing a hashmap in a resource, as outlined by https://hackmd.io/@bevy/BJPfCxduJe

(it's also the tool that I reach out for when I need to maintain an internal index in my own games)

Or is the main benefit compared to using a hashmap that this makes the query code straightforward to update (we just need to update the access instead of making doing more involved changes into the list of archetypes to iterate, etc.)? I do like the elegance of the design, I guess the alternative would be for the at to return a new QueryIndexedIter which contains the list of entities to iterate (which is computed via the index resource)

I just wonder if this elegant ZST based design won't come to bite us later on, that's why I'd like to understand better the benefits vs the QueryIndexedIter approach

Carter0

I know my review doesn't mean much on this, but I think the code looks really clean and I think the pitch for this implementation is really good.

cBournhonesque · 2025-01-30T16:55:07Z

crates/bevy_ecs/src/index/mod.rs

+            }
+            None => {
+                // Need to clone the index value for later lookups
+                let value = value.clone();


It doesn't have to be in this PR, but I see a world where the value that is used here is instead of the result of some IndexableComponent::hash_value() function, so that we can group components into 'buckets' of values.

That way we could avoid the issue of needing too many ZSTs components for components with float values, as we could for example map all the entities with values between 0.0-1.0 in the same bucket (and the query would then have to do an additional check via a .iter().filter() in the QueryIndex since there's no guarantee that the entities returned by the index have the exact value that we want).

I'm not super knowledgeable about indexes but basically I think we can re-use a bunch of tricks used by traditional DB indexes

cBournhonesque · 2025-01-30T16:57:27Z

crates/bevy_ecs/src/index/mod.rs

+//! Fragmenting requires unique [component IDs](ComponentId), and they are finite.
+//! For components with a small number of values in use at any one time, this is acceptable.
+//! But for components with a _large_ number of unique values (e.g., components containing floating point numbers),
+//! the pool of available component IDs will be quickly exhausted.


That does seem dangerous, I wonder if we could return a warning after a certain number of ZSTs are used?

Also I guess the work to make the ECS work well with higher componentID values will become more important

cBournhonesque · 2025-01-30T16:58:56Z

crates/bevy_ecs/src/index/mod.rs

+///
+/// // Indexing is opt-in through `World::add_index`
+/// world.add_index::<Player>();
+/// # world.spawn(Player(0));


nit: the doc is a bit lengthy, maybe having 0 Player(0), 1 Player(1), 2 Player(2)s is enough to demonstrate how this works

cBournhonesque · 2025-01-30T17:04:37Z

crates/bevy_ecs/src/index/mod.rs

+                // If there is a marker, restrict to it
+                Some(state) => builder.with_id(state.component_id),
+                // Otherwise, create a no-op query by including With<C> and Without<C>
+                None => builder.without::<C>(),


cBournhonesque · 2025-01-30T17:25:12Z

examples/ecs/index.rs

+        // Simply call `add_index` to start indexing a component.
+        // Make sure to call this _before_ any components are spawned, otherwise
+        // the index will miss those entities!
+        .add_index::<TilePosition>()


Maybe it would be worthwhile to have some kind of configuration to let users define which entities they want indexed.
Maybe .add_index::<C>(IndexConfig) with IndexConfig.automatic_indexation: bool?

If it's set to false, then an entity would be indexed only if a Index<C> component is added to an entity.

This lets us:

ignore some entities that we don't want to get indexed (cheaper indexation). In your version they would still get indexed and the user would have to filter them out in their QueryByIndex

have an easy way to remove an entity from an index (by removing the Index<C> component)

For example I might only want to index the Transform of only a few entities (player entities), but not of sprites, walls, etc.

Maybe both version can co-exist, because your version has ergonomic benefits (users don't have to add Index on every entity that needs to be indexed).

(Of course this doesn't have to be a blocker for this PR)

chescock · 2025-01-30T21:27:06Z

crates/bevy_ecs/src/index/mod.rs

+    pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {
+        self.state = {
+            // SAFETY: Mutable references do not alias and will be dropped after this block
+            let mut builder = unsafe { QueryBuilder::new(self.world.world_mut()) };


I think this call to world_mut() is unsound. It will conflict with uses in other params in the same system, or in other systems running in parallel. It might be impossible to use QueryBuilder like this.

One approach that might work is to use the QueryState you're already storing in the parameter state, and try to re-filter the matched_storage_ids (and somehow put them back when you're done). That would also mean you only have to iterate archetypes/tables matching the overall filter, instead of having to check all of them as QueryBuilder does.

I was concerned by this. I'm currently refactoring how this works to be more conservative with component IDs but once that is done I'd like to revisit how this works. Your suggestion might be the right way forward (for performance too probably!)

Hmm, your plan is to preallocate 32 component ids to use as bits, and use those to encode a 32-bit index in the archetype, right? I think you could build something that stores a Vec<QueryState> and does all of the archetype filtering in new_archetype. That would give you no overhead at query execution time, and would let you get queries with &self instead of &mut self!

The idea would be to implement SystemParam::new_archetype() manually by first doing the same filtering as any Query<D, (F, With<C>)>, but then by calling archetype.contains 32 times to get the 32-bit index encoded in that archetype. Then you'd look up the QueryState at that index in the Vec, and tell only that one about the archetype. So each QueryState in the Vec would always have the full list of archetypes with the given index, and none with other indices!

You'd need a way to make new QueryStates when you first see higher indices, but I think that's straightforward. Everything inside it is already Clone except for D::State and F::State, and I think all actual WorldQuery impls have State: Clone, so you could just add a bound there.

I'm going to try and do this now that I've got the combinatorial component ID indexing working. Seriously, thank you for explaining this!

Initial Commit

001b227

bushrat011899 marked this pull request as draft January 30, 2025 01:15

Update Examples

44408ba

alice-i-cecile reviewed Jan 30, 2025

View reviewed changes

bushrat011899 added 4 commits January 30, 2025 12:32

Address CI

df42b47

Create system param and expose via App

8f6069a

CI

9c66cd2

Docs

5b10abb

bushrat011899 changed the title ~~DRAFT: Query by Indexed Component Value~~ Query by Indexed Component Value Jan 30, 2025

bushrat011899 marked this pull request as ready for review January 30, 2025 04:13

bushrat011899 added S-Needs-Review Needs reviewer attention (from anyone!) to move forward S-Needs-SME Decision or review from an SME is required and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Jan 30, 2025

alice-i-cecile reviewed Jan 30, 2025

View reviewed changes

crates/bevy_ecs/src/index/mod.rs Outdated Show resolved Hide resolved