Cluster light probes using conservative spherical bounds. (#13746)
This commit allows the Bevy renderer to use the clustering
infrastructure for light probes (reflection probes and irradiance
volumes) on platforms where at least 3 storage buffers are available. On
such platforms (the vast majority), we stop performing brute-force
searches of light probes for each fragment and instead only search the
light probes with bounding spheres that intersect the current cluster.
This should dramatically improve scalability of irradiance volumes and
reflection probes.

The primary platform that doesn't support 3 storage buffers is WebGL 2,
and we continue using a brute-force search of light probes on that
platform, as the UBO that stores per-cluster indices is too small to fit
the light probe counts. Note, however, that WebGL 2 also doesn't
support bindless textures (indeed, it would be very odd for a platform
to support bindless textures but not SSBOs), so we only support one of
each type of light probe per draw call there in the first place.
Consequently, this isn't a performance problem, as the search will only
have one light probe to consider. (In fact, clustering would probably
end up being a performance loss.)

Known potential improvements include:

1. We currently cull based on a conservative bounding sphere test and
not based on the oriented bounding box (OBB) of the light probe. This is
improvable, but in the interests of simplicity, I opted to keep the
bounding sphere test for now. The OBB improvement can be a follow-up.

2. This patch doesn't change the fact that each fragment only takes a
single light probe into account. Typical light probe implementations
detect the case in which multiple light probes cover the current
fragment and perform some sort of weighted blend between them. As the
light probe fetch function presently returns only a single light probe,
implementing that feature would require more code restructuring, so I
left it out for now. It can be added as a follow-up.

3. Light probe implementations typically have a falloff range. Bevy
wants this feature too, but it's out of scope for this commit, so it
isn't implemented here.

4. This commit doesn't raise the maximum number of light probes past its
current value of 8 for each type. This should be addressed later, but
would possibly require more bindings on platforms with storage buffers,
which would increase this patch's complexity. Even without raising the
limit, this patch should constitute a significant performance
improvement for scenes that get anywhere close to this limit. In the
interest of keeping this patch small, I opted to leave raising the limit
to a follow-up.
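
Item 1 above refers to a conservative bounding-sphere test. As a rough illustration only (the real logic lives in `assign.rs`, whose diff isn't rendered here), a sphere-versus-box overlap check might look like the sketch below. The `Sphere` and `Aabb` types, the `conservative_sphere` helper, and the idea of describing a probe by a center and half-extents are all assumptions made for this example, not Bevy's API.

```rust
/// Hypothetical bounding sphere; not a Bevy type.
#[derive(Clone, Copy, Debug)]
struct Sphere {
    center: [f32; 3],
    radius: f32,
}

/// Hypothetical axis-aligned box standing in for one cluster's bounds.
#[derive(Clone, Copy, Debug)]
struct Aabb {
    min: [f32; 3],
    max: [f32; 3],
}

/// Builds a sphere that encloses a box with the given center and half-extents.
/// The radius is the center-to-corner distance, so the box stays inside the
/// sphere under any rotation about its center.
fn conservative_sphere(center: [f32; 3], half_extents: [f32; 3]) -> Sphere {
    let radius = half_extents.iter().map(|h| h * h).sum::<f32>().sqrt();
    Sphere { center, radius }
}

/// Returns true if the sphere overlaps the box: clamp the sphere center to
/// the box to find the closest point, then compare squared distances.
fn sphere_intersects_aabb(sphere: &Sphere, aabb: &Aabb) -> bool {
    let mut dist_sq = 0.0_f32;
    for axis in 0..3 {
        let closest = sphere.center[axis].clamp(aabb.min[axis], aabb.max[axis]);
        let d = sphere.center[axis] - closest;
        dist_sq += d * d;
    }
    dist_sq <= sphere.radius * sphere.radius
}
```

The sphere can report an overlap where the probe's OBB has none, which costs a little extra work in the shader but never drops a probe that should apply. That is the trade-off the OBB follow-up would tighten.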

## Changelog

### Changed

* Light probes (reflection probes and irradiance volumes) are now
clustered on most platforms, improving performance when many light
probes are present.

---------

Co-authored-by: Benjamin Brienen <Benjamin.Brienen@outlook.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
3 people authored Dec 5, 2024
1 parent d2a07f9 commit b7bcd31
Showing 11 changed files with 551 additions and 172 deletions.
16 changes: 15 additions & 1 deletion assets/shaders/irradiance_volume_voxel_visualization.wgsl
@@ -1,6 +1,7 @@
#import bevy_pbr::forward_io::VertexOutput
#import bevy_pbr::irradiance_volume
#import bevy_pbr::mesh_view_bindings
#import bevy_pbr::clustered_forward

struct VoxelVisualizationIrradianceVolumeInfo {
world_from_voxel: mat4x4<f32>,
@@ -25,11 +26,24 @@ fn fragment(mesh: VertexOutput) -> @location(0) vec4<f32> {
let stp_rounded = round(stp - 0.5f) + 0.5f;
let rounded_world_pos = (irradiance_volume_info.world_from_voxel * vec4(stp_rounded, 1.0f)).xyz;

// Look up the irradiance volume range in the cluster list.
let view_z = dot(vec4<f32>(
mesh_view_bindings::view.view_from_world[0].z,
mesh_view_bindings::view.view_from_world[1].z,
mesh_view_bindings::view.view_from_world[2].z,
mesh_view_bindings::view.view_from_world[3].z
), mesh.world_position);
let cluster_index = clustered_forward::fragment_cluster_index(mesh.position.xy, view_z, false);
var clusterable_object_index_ranges =
clustered_forward::unpack_clusterable_object_index_ranges(cluster_index);

// `irradiance_volume_light()` multiplies by intensity, so cancel it out.
// If we take intensity into account, the cubes will be way too bright.
let rgb = irradiance_volume::irradiance_volume_light(
mesh.world_position.xyz,
mesh.world_normal) / irradiance_volume_info.intensity;
mesh.world_normal,
&clusterable_object_index_ranges,
) / irradiance_volume_info.intensity;

return vec4<f32>(rgb, 1.0f);
}
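
The view-space depth used for the cluster lookup in the shader above is the dot product of the z row of `view_from_world` with the world position. A minimal Rust sketch of that arithmetic follows, assuming a column-major 4x4 matrix stored as `[[f32; 4]; 4]`; the `Mat4` alias and the `view_z` helper are hypothetical and exist only to mirror the shader expression.

```rust
/// Column-major 4x4 matrix, indexed as `m[column][row]`, which matches how
/// WGSL indexes `mat4x4<f32>` (`m[0].z` is column 0, row 2).
type Mat4 = [[f32; 4]; 4];

/// Computes only the z component of `view_from_world * world_position`,
/// avoiding the full matrix-vector product.
fn view_z(view_from_world: &Mat4, world_position: [f32; 4]) -> f32 {
    (0..4)
        .map(|col| view_from_world[col][2] * world_position[col])
        .sum()
}
```

Taking a single row this way yields the depth needed to pick a cluster slice without transforming the whole position into view space.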
345 changes: 251 additions & 94 deletions crates/bevy_pbr/src/cluster/assign.rs

Large diffs are not rendered by default.

96 changes: 59 additions & 37 deletions crates/bevy_pbr/src/cluster/mod.rs
@@ -2,6 +2,7 @@
use core::num::NonZero;

use self::assign::ClusterableObjectType;
use bevy_core_pipeline::core_3d::Camera3d;
use bevy_ecs::{
component::Component,
@@ -11,7 +12,7 @@ use bevy_ecs::{
system::{Commands, Query, Res, Resource},
world::{FromWorld, World},
};
use bevy_math::{AspectRatio, UVec2, UVec3, UVec4, Vec3Swizzles as _, Vec4};
use bevy_math::{uvec4, AspectRatio, UVec2, UVec3, UVec4, Vec3Swizzles as _, Vec4};
use bevy_reflect::{std_traits::ReflectDefault, Reflect};
use bevy_render::{
camera::Camera,
@@ -28,7 +29,7 @@ use bevy_utils::{hashbrown::HashSet, tracing::warn};
pub(crate) use crate::cluster::assign::assign_objects_to_clusters;
use crate::MeshPipeline;

mod assign;
pub(crate) mod assign;

#[cfg(test)]
mod test;
@@ -132,8 +133,7 @@ pub struct Clusters {
#[derive(Clone, Component, Debug, Default)]
pub struct VisibleClusterableObjects {
pub(crate) entities: Vec<Entity>,
pub point_light_count: usize,
pub spot_light_count: usize,
counts: ClusterableObjectCounts,
}

#[derive(Resource, Default)]
@@ -189,8 +189,24 @@ pub struct ExtractedClusterConfig {
pub(crate) dimensions: UVec3,
}

/// Stores the number of each type of clusterable object in a single cluster.
///
/// Note that `reflection_probes` and `irradiance_volumes` won't be clustered if
/// fewer than 3 SSBOs are available, which usually means on WebGL 2.
#[derive(Clone, Copy, Default, Debug)]
struct ClusterableObjectCounts {
/// The number of point lights in the cluster.
point_lights: u32,
/// The number of spot lights in the cluster.
spot_lights: u32,
/// The number of reflection probes in the cluster.
reflection_probes: u32,
/// The number of irradiance volumes in the cluster.
irradiance_volumes: u32,
}

enum ExtractedClusterableObjectElement {
ClusterHeader(u32, u32),
ClusterHeader(ClusterableObjectCounts),
ClusterableObjectEntity(Entity),
}

@@ -212,8 +228,11 @@ struct GpuClusterableObjectIndexListsStorage {

#[derive(ShaderType, Default)]
struct GpuClusterOffsetsAndCountsStorage {
/// The starting offset, followed by the number of point lights, spot
/// lights, reflection probes, and irradiance volumes in each cluster, in
/// that order. The remaining fields are filled with zeroes.
#[size(runtime)]
data: Vec<UVec4>,
data: Vec<[UVec4; 2]>,
}

enum ViewClusterBuffers {
@@ -499,16 +518,14 @@ impl Default for GpuClusterableObjectsUniform {

pub(crate) struct ClusterableObjectOrderData<'a> {
pub(crate) entity: &'a Entity,
pub(crate) shadows_enabled: &'a bool,
pub(crate) is_volumetric_light: &'a bool,
pub(crate) is_spot_light: &'a bool,
pub(crate) object_type: &'a ClusterableObjectType,
}

#[allow(clippy::too_many_arguments)]
// Sort clusterable objects by:
//
// * point-light vs spot-light, so that we can iterate point lights and spot
// lights in contiguous blocks in the fragment shader,
// * object type, so that we can iterate point lights, spot lights, etc. in
// contiguous blocks in the fragment shader,
//
// * then those with shadows enabled first, so that the index can be used to
// render at most `point_light_shadow_maps_count` point light shadows and
@@ -521,10 +538,9 @@ pub(crate) fn clusterable_object_order(
a: ClusterableObjectOrderData,
b: ClusterableObjectOrderData,
) -> core::cmp::Ordering {
a.is_spot_light
.cmp(b.is_spot_light) // pointlights before spot lights
.then_with(|| b.shadows_enabled.cmp(a.shadows_enabled)) // shadow casters before non-casters
.then_with(|| b.is_volumetric_light.cmp(a.is_volumetric_light)) // volumetric lights before non-volumetric lights
a.object_type
.ordering()
.cmp(&b.object_type.ordering())
.then_with(|| a.entity.cmp(b.entity)) // stable
}

@@ -551,8 +567,7 @@ pub fn extract_clusters(
let mut data = Vec::with_capacity(clusters.clusterable_objects.len() + num_entities);
for cluster_objects in &clusters.clusterable_objects {
data.push(ExtractedClusterableObjectElement::ClusterHeader(
cluster_objects.point_light_count as u32,
cluster_objects.spot_light_count as u32,
cluster_objects.counts,
));
for clusterable_entity in &cluster_objects.entities {
if let Ok(entity) = mapper.get(*clusterable_entity) {
@@ -594,16 +609,9 @@ pub fn prepare_clusters(

for record in &extracted_clusters.data {
match record {
ExtractedClusterableObjectElement::ClusterHeader(
point_light_count,
spot_light_count,
) => {
ExtractedClusterableObjectElement::ClusterHeader(counts) => {
let offset = view_clusters_bindings.n_indices();
view_clusters_bindings.push_offset_and_counts(
offset,
*point_light_count as usize,
*spot_light_count as usize,
);
view_clusters_bindings.push_offset_and_counts(offset, counts);
}
ExtractedClusterableObjectElement::ClusterableObjectEntity(entity) => {
if let Some(clusterable_object_index) =
@@ -664,7 +672,7 @@ impl ViewClusterBindings {
}
}

pub fn push_offset_and_counts(&mut self, offset: usize, point_count: usize, spot_count: usize) {
fn push_offset_and_counts(&mut self, offset: usize, counts: &ClusterableObjectCounts) {
match &mut self.buffers {
ViewClusterBuffers::Uniform {
cluster_offsets_and_counts,
@@ -676,20 +684,24 @@ impl ViewClusterBindings {
return;
}
let component = self.n_offsets & ((1 << 2) - 1);
let packed = pack_offset_and_counts(offset, point_count, spot_count);
let packed =
pack_offset_and_counts(offset, counts.point_lights, counts.spot_lights);

cluster_offsets_and_counts.get_mut().data[array_index][component] = packed;
}
ViewClusterBuffers::Storage {
cluster_offsets_and_counts,
..
} => {
cluster_offsets_and_counts.get_mut().data.push(UVec4::new(
offset as u32,
point_count as u32,
spot_count as u32,
0,
));
cluster_offsets_and_counts.get_mut().data.push([
uvec4(
offset as u32,
counts.point_lights,
counts.spot_lights,
counts.reflection_probes,
),
uvec4(counts.irradiance_volumes, 0, 0, 0),
]);
}
}

@@ -815,6 +827,12 @@ impl ViewClusterBuffers {
}
}

// Compresses the offset and counts of point and spot lights so that they fit in
// a UBO.
//
// This function is only used if storage buffers are unavailable on this
// platform: typically, on WebGL 2.
//
// NOTE: With uniform buffer max binding size as 16384 bytes
// that means we can fit 204 clusterable objects in one uniform
// buffer, which means the count can be at most 204 so it
@@ -827,12 +845,16 @@ impl ViewClusterBuffers {
// the point light count into bits 9-17, and the spot light count into bits 0-8.
// [ 31 .. 18 | 17 .. 9 | 8 .. 0 ]
// [ offset | point light count | spot light count ]
//
// NOTE: This assumes CPU and GPU endianness are the same which is true
// for all common and tested x86/ARM CPUs and AMD/NVIDIA/Intel/Apple/etc GPUs
fn pack_offset_and_counts(offset: usize, point_count: usize, spot_count: usize) -> u32 {
//
// NOTE: On platforms that use this function, we don't cluster light probes, so
// the number of light probes is irrelevant.
fn pack_offset_and_counts(offset: usize, point_count: u32, spot_count: u32) -> u32 {
((offset as u32 & CLUSTER_OFFSET_MASK) << (CLUSTER_COUNT_SIZE * 2))
| (point_count as u32 & CLUSTER_COUNT_MASK) << CLUSTER_COUNT_SIZE
| (spot_count as u32 & CLUSTER_COUNT_MASK)
| (point_count & CLUSTER_COUNT_MASK) << CLUSTER_COUNT_SIZE
| (spot_count & CLUSTER_COUNT_MASK)
}

#[derive(ShaderType)]
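
The bit layout described in the comment on `pack_offset_and_counts` above (14 bits of offset, then two 9-bit counts) can be exercised in isolation. This sketch assumes the constant values implied by that comment, namely `CLUSTER_COUNT_SIZE = 9` with the offset taking the remaining 14 bits; the constants' real definitions are not part of this diff, and `unpack_offset_and_counts` is a hypothetical inverse added only to show the round trip.

```rust
// Assumed values, inferred from the "[ 31 .. 18 | 17 .. 9 | 8 .. 0 ]" comment.
const CLUSTER_COUNT_SIZE: u32 = 9;
const CLUSTER_OFFSET_MASK: u32 = (1 << (32 - CLUSTER_COUNT_SIZE * 2)) - 1;
const CLUSTER_COUNT_MASK: u32 = (1 << CLUSTER_COUNT_SIZE) - 1;

/// Packs the offset into bits 18-31, the point light count into bits 9-17,
/// and the spot light count into bits 0-8. Out-of-range values are masked.
fn pack_offset_and_counts(offset: usize, point_count: u32, spot_count: u32) -> u32 {
    ((offset as u32 & CLUSTER_OFFSET_MASK) << (CLUSTER_COUNT_SIZE * 2))
        | (point_count & CLUSTER_COUNT_MASK) << CLUSTER_COUNT_SIZE
        | (spot_count & CLUSTER_COUNT_MASK)
}

/// Hypothetical inverse: returns `(offset, point_count, spot_count)`.
fn unpack_offset_and_counts(packed: u32) -> (u32, u32, u32) {
    (
        (packed >> (CLUSTER_COUNT_SIZE * 2)) & CLUSTER_OFFSET_MASK,
        (packed >> CLUSTER_COUNT_SIZE) & CLUSTER_COUNT_MASK,
        packed & CLUSTER_COUNT_MASK,
    )
}
```

With only 32 bits per cluster there is no room left for reflection probe or irradiance volume counts, which is consistent with the commit's choice to skip light probe clustering on the UBO path.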
34 changes: 30 additions & 4 deletions crates/bevy_pbr/src/light_probe/environment_map.wgsl
@@ -7,6 +7,7 @@
#import bevy_pbr::lighting::{
F_Schlick_vec, LayerLightingInput, LightingInput, LAYER_BASE, LAYER_CLEARCOAT
}
#import bevy_pbr::clustered_forward::ClusterableObjectIndexRanges

struct EnvironmentMapLight {
diffuse: vec3<f32>,
@@ -26,6 +27,7 @@ struct EnvironmentMapRadiances {

fn compute_radiances(
input: ptr<function, LightingInput>,
clusterable_object_index_ranges: ptr<function, ClusterableObjectIndexRanges>,
layer: u32,
world_position: vec3<f32>,
found_diffuse_indirect: bool,
@@ -38,7 +40,11 @@ fn compute_radiances(
var radiances: EnvironmentMapRadiances;

// Search for a reflection probe that contains the fragment.
var query_result = query_light_probe(world_position, /*is_irradiance_volume=*/ false);
var query_result = query_light_probe(
world_position,
/*is_irradiance_volume=*/ false,
clusterable_object_index_ranges,
);

// If we didn't find a reflection probe, use the view environment map if applicable.
if (query_result.texture_index < 0) {
@@ -90,6 +96,7 @@ fn compute_radiances(

fn compute_radiances(
input: ptr<function, LightingInput>,
clusterable_object_index_ranges: ptr<function, ClusterableObjectIndexRanges>,
layer: u32,
world_position: vec3<f32>,
found_diffuse_indirect: bool,
@@ -152,6 +159,7 @@ fn compute_radiances(
fn environment_map_light_clearcoat(
out: ptr<function, EnvironmentMapLight>,
input: ptr<function, LightingInput>,
clusterable_object_index_ranges: ptr<function, ClusterableObjectIndexRanges>,
found_diffuse_indirect: bool,
) {
// Unpack.
@@ -166,7 +174,12 @@ fn environment_map_light_clearcoat(
let inv_Fc = 1.0 - Fc;

let clearcoat_radiances = compute_radiances(
input, LAYER_CLEARCOAT, world_position, found_diffuse_indirect);
input,
clusterable_object_index_ranges,
LAYER_CLEARCOAT,
world_position,
found_diffuse_indirect,
);

// Composite the clearcoat layer on top of the existing one.
// These formulas are from Filament:
@@ -179,6 +192,7 @@ fn environment_map_light_clearcoat(

fn environment_map_light(
input: ptr<function, LightingInput>,
clusterable_object_index_ranges: ptr<function, ClusterableObjectIndexRanges>,
found_diffuse_indirect: bool,
) -> EnvironmentMapLight {
// Unpack.
@@ -191,7 +205,14 @@ fn environment_map_light(

var out: EnvironmentMapLight;

let radiances = compute_radiances(input, LAYER_BASE, world_position, found_diffuse_indirect);
let radiances = compute_radiances(
input,
clusterable_object_index_ranges,
LAYER_BASE,
world_position,
found_diffuse_indirect,
);

if (all(radiances.irradiance == vec3(0.0)) && all(radiances.radiance == vec3(0.0))) {
out.diffuse = vec3(0.0);
out.specular = vec3(0.0);
@@ -225,7 +246,12 @@ fn environment_map_light(
out.specular = FssEss * radiances.radiance;

#ifdef STANDARD_MATERIAL_CLEARCOAT
environment_map_light_clearcoat(&out, input, found_diffuse_indirect);
environment_map_light_clearcoat(
&out,
input,
clusterable_object_index_ranges,
found_diffuse_indirect,
);
#endif // STANDARD_MATERIAL_CLEARCOAT

return out;
13 changes: 11 additions & 2 deletions crates/bevy_pbr/src/light_probe/irradiance_volume.wgsl
@@ -7,15 +7,24 @@
irradiance_volume_sampler,
light_probes,
};
#import bevy_pbr::clustered_forward::ClusterableObjectIndexRanges

#ifdef IRRADIANCE_VOLUMES_ARE_USABLE

// See:
// https://advances.realtimerendering.com/s2006/Mitchell-ShadingInValvesSourceEngine.pdf
// Slide 28, "Ambient Cube Basis"
fn irradiance_volume_light(world_position: vec3<f32>, N: vec3<f32>) -> vec3<f32> {
fn irradiance_volume_light(
world_position: vec3<f32>,
N: vec3<f32>,
clusterable_object_index_ranges: ptr<function, ClusterableObjectIndexRanges>,
) -> vec3<f32> {
// Search for an irradiance volume that contains the fragment.
let query_result = query_light_probe(world_position, /*is_irradiance_volume=*/ true);
let query_result = query_light_probe(
world_position,
/*is_irradiance_volume=*/ true,
clusterable_object_index_ranges,
);

// If there was no irradiance volume found, bail out.
if (query_result.texture_index < 0) {