Common Configurations

name type description
name str Name of the experiment. Default: "default"
description str Description of the experiment. Default: ""
tag str Tag of the experiment. Default: ""
seed int Global seed of the experiment. Used by seed_everything of PyTorch-Lightning. Default: 0
use_timestamp bool Whether to use the current timestamp as the suffix of the tag. Default: True
timestamp Optional[str] The timestamp as the suffix of the tag. DO NOT set this manually. Default: None
exp_root_dir str The root directory for outputs of all the experiments. Default: "outputs"
exp_dir str The directory for outputs of the current experiment. DO NOT set this manually. It will be automatically set to [exp_root_dir]/[name].
trial_name str Name of the trial. DO NOT set this manually. It will be automatically set to [tag]@[timestamp].
trial_dir str The directory for outputs for the current trial. DO NOT set this manually. It will be automatically set to [exp_root_dir]/[name]/[trial_name].
resume Optional[str] The path to the checkpoint file to resume from. Default: None
data_type str Type of the data module used. See here for supported data modules. Default: ""
data dict Configurations of the data module. Default: {}
system_type str Type of the system used. See here for supported systems. Default: ""
system dict Configurations of the system. Default: {}
trainer dict Configurations of PyTorch-Lightning Trainer. See the PyTorch-Lightning Trainer documentation for supported arguments. Exceptions: logger and callbacks are set by the framework itself. Default: {}
checkpoint dict Configurations of PyTorch-Lightning ModelCheckpoint callback, which defines when the checkpoint will be saved. See the PyTorch-Lightning ModelCheckpoint documentation for supported arguments. Default: {}
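
The derived fields above (exp_dir, trial_name, trial_dir) follow a fixed naming pattern. The following is a minimal sketch of that pattern, not the framework's actual code: the helper name derive_dirs, the timestamp format, and the behaviour when use_timestamp=False are all assumptions made for illustration.

```python
import posixpath
from datetime import datetime
from typing import Optional


def derive_dirs(
    name: str = "default",
    tag: str = "",
    exp_root_dir: str = "outputs",
    use_timestamp: bool = True,
    timestamp: Optional[str] = None,
) -> dict:
    """Compose exp_dir, trial_name and trial_dir from the user-set fields."""
    if use_timestamp and timestamp is None:
        # Assumed format; the real framework may format the timestamp differently.
        timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Assumption: without a timestamp, the trial name is just the tag.
    trial_name = f"{tag}@{timestamp}" if use_timestamp else tag
    exp_dir = posixpath.join(exp_root_dir, name)  # [exp_root_dir]/[name]
    return {
        "exp_dir": exp_dir,
        "trial_name": trial_name,
        "trial_dir": posixpath.join(exp_dir, trial_name),
    }


dirs = derive_dirs(name="demo", tag="run1", timestamp="20230101-000000")
print(dirs["trial_dir"])  # outputs/demo/run1@20230101-000000
```

The timestamp is passed explicitly here only to make the output reproducible; in normal use it is filled in automatically and should not be set by hand.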



name type description
height int Height of the rendered image in training. Default: 64
width int Width of the rendered image in training. Default: 64
eval_height int Height of the rendered image in validation/testing. Default: 512
eval_width int Width of the rendered image in validation/testing. Default: 512
batch_size int Number of images per batch in training. Default: 1
eval_batch_size int Number of images per batch in validation/testing. DO NOT change this. Default: 1
elevation_range Tuple[float,float] Camera elevation angle range to sample from in training, in degrees. Default: (-10,90)
azimuth_range Tuple[float,float] Camera azimuth angle range to sample from in training, in degrees. Default: (-180,180)
camera_distance_range Tuple[float,float] Camera distance range to sample from in training. Default: (1,1.5)
fovy_range Tuple[float,float] Camera field of view (FoV) range along the y direction (vertical direction) to sample from in training, in degrees. Default: (40,70)
camera_perturb float Random perturbation ratio for the sampled camera positions in training. The sampled camera positions will be perturbed by N(0,1) * camera_perturb. Default: 0.1
center_perturb float Random perturbation ratio for the look-at point of the cameras in training. The look-at point will be N(0,1) * center_perturb. Default: 0.2
up_perturb float Random perturbation ratio for the up direction of the cameras in training. The up direction will be [0,0,1] + N(0,1) * up_perturb. Default: 0.02
light_position_perturb float Used to get random light directions from camera positions, only used when light_sample_strategy="dreamfusion". The camera positions will be perturbed by N(0,1) * light_position_perturb, then the perturbed positions are used to determine the light directions. Default: 1.0
light_distance_range Tuple[float,float] Point light distance range to sample from in training. Default: (0.8,1.5)
eval_elevation_deg float Camera elevation angle in validation/testing, in degrees. Default: 15
eval_camera_distance float Camera distance in validation/testing. Default: 1.5
eval_fovy_deg float Camera field of view (FoV) along the y direction (vertical direction) in validation/testing, in degrees. Default: 70
light_sample_strategy str Strategy to sample point light positions in training, in ["dreamfusion", "magic3d"]. "dreamfusion" uses the strategy described in the DreamFusion paper; "magic3d" uses the strategy described in the Magic3D paper. Default: "dreamfusion"
batch_uniform_azimuth bool Whether to ensure the uniformity of sampled azimuth angles in training as described in the Fantasia3D paper. If True, the azimuth_range is equally divided into batch_size bins and one azimuth angle is sampled from each bin. Default: True
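
The binned azimuth sampling and the Gaussian perturbations described in this table can be sketched as follows. This is an illustrative sketch using only the standard library; the function names sample_azimuths and perturb are hypothetical and do not come from the framework.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_azimuths(
    batch_size: int,
    azimuth_range: Tuple[float, float] = (-180.0, 180.0),
    batch_uniform_azimuth: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample one azimuth angle (degrees) per image in the batch."""
    if rng is None:
        rng = random.Random()
    lo, hi = azimuth_range
    if not batch_uniform_azimuth:
        # Plain independent uniform sampling over the whole range.
        return [rng.uniform(lo, hi) for _ in range(batch_size)]
    # Split the range into batch_size equal bins and draw one angle per bin.
    bin_width = (hi - lo) / batch_size
    return [lo + (i + rng.random()) * bin_width for i in range(batch_size)]


def perturb(
    vec: Sequence[float], ratio: float, rng: Optional[random.Random] = None
) -> List[float]:
    """Add independent N(0, 1) * ratio noise to every component of vec."""
    if rng is None:
        rng = random.Random()
    return [v + rng.gauss(0.0, 1.0) * ratio for v in vec]


rng = random.Random(0)
azimuths = sample_azimuths(4, rng=rng)                  # one angle per 90-degree bin
up = perturb([0.0, 0.0, 1.0], ratio=0.02, rng=rng)      # up_perturb
center = perturb([0.0, 0.0, 0.0], ratio=0.2, rng=rng)   # center_perturb
```

With batch_size=4 and the default range, the four angles fall in [-180,-90), [-90,0), [0,90) and [90,180) respectively, so a single batch always covers all sides of the object.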


Systems contain implementation of training/validation/testing logic for different methods.

Common configurations for systems

name type description
loss dict Dict that contains loss-related configurations. Default: {}
optimizer dict Optimizer configurations. Default: {}
scheduler Optional[dict] Learning rate scheduler configurations. If None, does not use a scheduler. Default: None
weights Optional[str] Path to the weights to be loaded. This is different from resume in that this does not resume training state. Default: None
weights_ignore_modules Optional[List[str]] List of modules that should be ignored when loading weights. Default: None
cleanup_after_validation_step bool Whether to empty cache after each validation step. This will slow down validation. Default: False
cleanup_after_test_step bool Whether to empty cache after each test step. This will slow down testing. Default: False
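
To illustrate how weights differs from resume, here is a minimal sketch of loading only model weights while skipping some modules. It assumes a PyTorch-style state dict whose keys are prefixed with the module name; the helper filter_state_dict and the key names are hypothetical, and the framework's own matching rule may differ.

```python
from typing import Dict, List, Optional, TypeVar

T = TypeVar("T")


def filter_state_dict(
    state_dict: Dict[str, T], ignore_modules: Optional[List[str]] = None
) -> Dict[str, T]:
    """Drop entries whose key belongs to one of the ignored modules."""
    if not ignore_modules:
        return dict(state_dict)
    # Assumption: a module "m" owns every key that starts with "m.".
    prefixes = tuple(f"{m}." for m in ignore_modules)
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Hypothetical checkpoint contents; real checkpoints hold tensors, not ints.
checkpoint = {"geometry.encoding.w": 1, "geometry.mlp.w": 2, "background.mlp.w": 3}
kept = filter_state_dict(checkpoint, ["background"])
print(sorted(kept))  # ['geometry.encoding.w', 'geometry.mlp.w']
```

Only parameters are restored in this path; optimizer state, step counters and other training state are what resume adds on top.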


name type description
geometry_type str Type of the geometry used in the system. See here for supported geometry.
geometry dict Configurations of the geometry.
material_type str Type of the material used in the system. See here for supported material.
material dict Configurations of the material.
background_type str Type of the background used in the system. See here for supported background.
background dict Configurations of the background.
renderer_type str Type of the renderer used in the system. See here for supported renderer.
renderer dict Configurations of the renderer.
guidance_type str Type of the guidance used in the system. See here for supported guidance.
guidance dict Configurations of the guidance.
prompt_processor_type str Type of the prompt processor used in the system. See here for supported prompt processor.
prompt_processor dict Configurations of the prompt processor.


This system has all the configurations of dreamfusion-system, along with the following unique configurations:

name type description
refinement bool Whether to perform refinement (second stage in the Magic3D paper). Default: False
from_coarse bool Whether to initialize geometry from the coarse stage (first stage in the Magic3D paper) for refinement. If True, weights must be specified. Default: False
inherit_coarse_texture bool Whether to load the encoding and feature network from the coarse stage for refinement, used when from_coarse=True. Default: True
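
The dependency between these options can be expressed as a small validation step. This is only a sketch of the rule stated in the table: the function name check_magic3d_options and the returned keys are hypothetical.

```python
from typing import Optional


def check_magic3d_options(
    refinement: bool = False,
    from_coarse: bool = False,
    inherit_coarse_texture: bool = True,
    weights: Optional[str] = None,
) -> dict:
    """Validate the refinement options and report what would be loaded."""
    if from_coarse and weights is None:
        raise ValueError("from_coarse=True requires `weights` to be specified")
    return {
        "refinement": refinement,
        "load_coarse_geometry": from_coarse,
        # inherit_coarse_texture only has an effect when from_coarse=True.
        "load_coarse_texture": from_coarse and inherit_coarse_texture,
    }


# "coarse.ckpt" is a placeholder path used only for this example.
plan = check_magic3d_options(refinement=True, from_coarse=True, weights="coarse.ckpt")
```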


This system has all the configurations of dreamfusion-system, along with the following unique configurations:

name type description
subpixel_rendering bool Whether to perform subpixel rendering in validation/testing, which decodes a 128x128 latent feature map instead of 64x64. Default: True


This system has all the configurations of dreamfusion-system, along with the following unique configurations:

name type description
refinement bool Whether to perform RGB space refinement. Default: False
guide_shape Optional[str] Path to the .obj file as the shape guidance, used in Sketch-Shape. Default: None


This system has all the configurations of dreamfusion-system, along with the following unique configurations:

name type description
latent_steps int Number of steps for geometry optimization in latent space. In the first latent_steps steps, low resolution normal and mask are concatenated and fed to the latent diffusion model. After this, the high resolution normal is used to perform RGB space optimization. Details are described in the Fantasia3D paper. Default: 2500


Geometry models properties for locations in space, including density, SDF, feature and normal.

Common configurations for implicit geometry

name type description
radius float Half side length of the scene bounding box. Default: 1.0
isosurface bool Whether to enable surface extraction. Default: True
isosurface_method str Method for surface extraction, in ["mc", "mt"]. "mc" uses the marching cubes algorithm, not differentiable; "mt" uses the marching tetrahedra algorithm, differentiable. Default: "mt"
isosurface_resolution int Grid resolution for surface extraction. Default: 128
isosurface_threshold Union[float,str] The threshold value to determine the surface location of the implicit field, in [float, "auto"]. If "auto", use the mean value of the field as the threshold. Default: 0
isosurface_chunk int Chunk size when computing the field value on grid vertices, used to prevent OOM. If 0, does not use chunking. Default: 0
isosurface_coarse_to_fine bool Whether to extract the surface in a coarse-to-fine manner. If True, will first extract a coarse surface to get a tight bounding box, which is then used to extract a fine surface. Default: True
isosurface_deformable_grid bool Whether to optimize positions of grid vertices for surface extraction. Only supported when isosurface_method="mt". Default: False
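
The roles of isosurface_chunk and isosurface_threshold can be sketched with plain Python. The helpers evaluate_field and resolve_threshold and the toy density field are hypothetical and only illustrate the behaviour described in the table.

```python
from typing import Callable, List, Sequence, Tuple, Union

Point = Tuple[float, float, float]


def evaluate_field(
    field: Callable[[List[Point]], List[float]], points: List[Point], chunk: int = 0
) -> List[float]:
    """Evaluate field on all points, at most `chunk` points at a time."""
    if chunk <= 0:
        # chunk=0 disables chunking: evaluate everything in one call.
        return list(field(points))
    values: List[float] = []
    for start in range(0, len(points), chunk):
        values.extend(field(points[start:start + chunk]))
    return values


def resolve_threshold(values: Sequence[float], threshold: Union[float, str] = 0.0) -> float:
    """Return the level at which the surface is extracted."""
    if threshold == "auto":
        return sum(values) / len(values)  # mean value of the field
    return float(threshold)


def toy_density(batch: List[Point]) -> List[float]:
    # Hypothetical field: density falls off with squared distance to the origin.
    return [1.0 - sum(c * c for c in p) for p in batch]


grid = [(x / 2.0, 0.0, 0.0) for x in range(-2, 3)]  # 5 points on the x axis
values = evaluate_field(toy_density, grid, chunk=2)
print(resolve_threshold(values, "auto"))  # 0.5
```

Chunking changes only peak memory use, not the result: the same values come back whether the grid is evaluated in one call or in pieces.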


name type description
n_input_dims int Number of input dimensions. Default: 3 (xyz)
n_feature_dims int Number of dimensions for the output features. Note that this should be aligned with the material used. Default: 3 (albedo)
density_activation str Density activation function. See get_activation in utils/ for all supported activation functions. Default: "softplus"
density_bias Union[float,str] Offset value to be added to the pre-activated density, in [float, "blob_dreamfusion", "blob_magic3d"]. If "blob_dreamfusion", uses the blob density bias proposed in DreamFusion; if "blob_magic3d", uses the blob density bias proposed in Magic3D. Default: "blob_magic3d"
density_blob_scale float Controls the magnitude of the blob density if density_bias in ["blob_dreamfusion", "blob_magic3d"]. Default: 10
density_blob_std float Controls the divergence of the blob density if density_bias in ["blob_dreamfusion", "blob_magic3d"]. Default: 0.5
pos_encoding_config dict Configurations for the positional encoding. See for supported arguments. Default: {}
mlp_network_config dict Configurations for the MLP head for geometry attribute prediction (density, feature ...). See for supported arguments. Default: {}
normal_type str How the normal is computed, in ["analytic", "finite_difference", "pred"]. If "analytic", uses PyTorch auto-differentiation to compute the analytic normal; if "finite_difference", uses finite difference to compute the approximate normal; if "pred", uses an MLP network to predict the normal. Default: "finite_difference"
finite_difference_normal_eps float The small epsilon value in finite difference to estimate the normal, used when normal_type="finite_difference". Default: 0.01
isosurface_threshold Union[float,str] Inherited from the common configurations, but defaults to "auto". Default: "auto"
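
For normal_type="finite_difference", the normal is estimated from density differences at points offset by finite_difference_normal_eps. The sketch below uses central differences and takes the normal as the negated, normalized density gradient; both choices are assumptions made for illustration, and the function and field names are hypothetical.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def finite_difference_normal(
    density: Callable[[Vec3], float], p: Vec3, eps: float = 0.01
) -> Vec3:
    """Approximate the surface normal at p from a scalar density field."""
    grad = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        # Central difference along this axis.
        grad.append((density(tuple(hi)) - density(tuple(lo))) / (2.0 * eps))
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # guard against a zero gradient
    # Assumed sign convention: normals point from dense towards empty space.
    return tuple(-g / norm for g in grad)


def blob(p: Vec3) -> float:
    # Hypothetical density that is highest at the origin.
    return 1.0 - sum(c * c for c in p)


n = finite_difference_normal(blob, (1.0, 0.0, 0.0))  # approximately (1, 0, 0)
```

Compared with "analytic", this needs extra field evaluations per point but no second-order automatic differentiation, which is one plausible reason it is the default.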


name type description
n_input_dims int Number of input dimensions. Default: 3 (xyz)
n_feature_dims int Number of dimensions for the output features. Note that this should be aligned with the material used. Default: 3 (albedo)
pos_encoding_config dict Configurations for the positional encoding. See for supported arguments. Default: {}
mlp_network_config dict Configurations for the MLP head for geometry attribute prediction (sdf, feature ...). See for supported arguments. Default: {}
normal_type str How the normal is computed, in ["finite_difference", "pred"]. If "finite_difference", uses finite difference to compute the approximate normal; if "pred", uses an MLP network to predict the normal. Default: "finite_difference"
finite_difference_normal_eps float The small epsilon value in finite difference to estimate the normal, used when normal_type="finite_difference". Default: 0.01
shape_init Optional[str] The shape to initialize the SDF as, in [None, "sphere", "ellipsoid"]. If None, does not initialize; if "sphere", initialized as a sphere; if "ellipsoid", initialized as an ellipsoid. Default: None
shape_init_params Optional[Any] Parameters to specify the SDF initialization. If shape_init="sphere", a float is used for the sphere radius; if shape_init="ellipsoid", a tuple of three floats is used for the radius along x/y/z axis. Default: None
force_shape_init bool Whether to force initialization of the SDF even if weights are provided. Default: False
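
As a rough illustration of what shape_init and shape_init_params describe, the sketch below computes a target signed distance for each supported shape. The sphere formula is exact; the ellipsoid formula is a common approximation chosen for this example (an exact ellipsoid distance has no closed form) and is not necessarily what the framework uses. The function name target_sdf is hypothetical.

```python
import math
from typing import Any, Optional, Sequence


def target_sdf(
    p: Sequence[float], shape_init: Optional[str], shape_init_params: Any = None
) -> Optional[float]:
    """Signed distance the SDF could be initialized to at point p."""
    if shape_init is None:
        return None  # no initialization
    if shape_init == "sphere":
        radius = float(shape_init_params)  # a single float
        return math.sqrt(sum(c * c for c in p)) - radius
    if shape_init == "ellipsoid":
        radii = tuple(shape_init_params)  # radii along x/y/z
        # Approximation: distance in the unit-sphere space, rescaled by the
        # smallest radius. Zero on the surface, negative inside.
        scaled = math.sqrt(sum((c / r) ** 2 for c, r in zip(p, radii)))
        return (scaled - 1.0) * min(radii)
    raise ValueError(f"unsupported shape_init: {shape_init!r}")


print(target_sdf((0.0, 0.0, 0.0), "sphere", 0.5))  # -0.5
```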


An explicit geometry parameterized with a feature volume. The feature volume has a shape of (n_feature_dims + 1) x grid_size, one channel for density and the rest for material. The density is first scaled, then biased and finally activated.

name type description
grid_size tuple[int, int, int] The resolution of the feature volume. Default: (100, 100, 100)
n_feature_dims int The feature dimensions for its material. Default: 3
density_activation Optional[str] The activation to get the density value. Default: "softplus"
density_bias Union[float, str] The initialization of the density. A float value indicates uniform initialization and blob indicates a ball centered at the center. Default: "blob"
density_blob_scale float The parameter for blob initialization. Default: 5.0
density_blob_std float The parameter for blob initialization. Default: 0.5
normal_type Optional[str] The way to compute the normal from density. If set to "pred", the normal is produced with another volume in the shape of 3 x grid_size. Default: "finite_difference"

Common configurations for explicit geometry

name type description


name type description
isosurface_resolution int Tetrahedra grid resolution for surface extraction. Default: 128
isosurface_deformable_grid bool Whether to optimize positions of tetrahedra grid vertices for surface extraction. Default: True
pos_encoding_config dict Configurations for the positional encoding. See for supported arguments. Default: {}
mlp_network_config dict Configurations for the MLP head for feature prediction. See for supported arguments. Default: {}
shape_init Optional[str] The shape to initialize the SDF as, in [None, "sphere", "ellipsoid"]. If None, does not initialize; if "sphere", initialized as a sphere; if "ellipsoid", initialized as an ellipsoid. Default: None
shape_init_params Optional[Any] Parameters to specify the SDF initialization. If shape_init="sphere", a float is used for the sphere radius; if shape_init="ellipsoid", a tuple of three floats is used for the radius along x/y/z axis. Default: None
force_shape_init bool Whether to force initialization of the SDF even if weights are provided. Default: False
geometry_only bool Whether to only model the SDF. If True, the feature prediction is omitted. Default: False
fix_geometry bool Whether to fix the geometry during optimization. If True, the SDF (and the grid vertices if isosurface_deformable_grid=True) is kept fixed. Default: False


The material module outputs colors or color latents conditioned on the sampled positions, view directions, and sometimes light directions and normals.


A material with view-dependent effects, parameterized with a network (MLP), similar to that in NeRF.

name type description
input_feature_dims int The dimensions of the input feature. Default: 8
color_activation str The activation mapping the network output to the color. Default: "sigmoid"
dir_encoding_config dict The config of the positional encoding applied on the ray direction. Default: {"otype": "SphericalHarmonics", "degree": 3}
mlp_network_config dict The config of the MLP network. Default: { "otype": "VanillaMLP", "activation": "ReLU", "n_neurons": 16, "n_hidden_layers": 2}


A material without view-dependent effects; it simply maps features to colors.

name type description
n_output_dims int The dimensions of the material color, e.g. 3 for RGB and 4 for latent. Default: 3
color_activation str The activation mapping the network output or the feature to the color. Default: "sigmoid"
mlp_network_config Optional[dict] The config of the MLP network. Set to None to directly map the input feature to the color with color_activation, otherwise the feature first goes through an MLP. Default: None
input_feature_dims Optional[int] The dimensions of the input feature. Required when using an MLP. Default: None


name type description
ambient_light_color Tuple[float,float,float] The ambient light color for lambertian shading, used when soft_shading=False. Default: (0.1,0.1,0.1)
diffuse_light_color Tuple[float,float,float] The diffuse light color for lambertian shading, used when soft_shading=False. Default: (0.9,0.9,0.9)
ambient_only_steps int Number of steps that use albedo color as input to the guidance. Default: 1000
diffuse_prob float Use shaded color with a probability of diffuse_prob and albedo color with a probability of 1-diffuse_prob after ambient_only_steps. Default: 0.75
textureless_prob float Use textureless shaded color with a probability of textureless_prob and lambertian shaded color with a probability of 1-textureless_prob when using shaded color. Default: 0.5
albedo_activation str Activation function for the albedo color. Default: "sigmoid"
soft_shading bool If True, uses a soft version of lambertian shading in training, which randomly samples the ambient light color and diffuse light color. Proposed in the Magic3D paper. Default: False
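
The interaction of ambient_only_steps, diffuse_prob and textureless_prob can be sketched as a per-step choice of shading mode, followed by standard Lambertian shading. This is an illustrative sketch, not the framework's implementation: the function names are hypothetical, and the exact step comparison and the shading formula are assumptions.

```python
import random
from typing import Optional, Sequence, Tuple


def choose_shading(
    step: int,
    ambient_only_steps: int = 1000,
    diffuse_prob: float = 0.75,
    textureless_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> str:
    """Return 'albedo', 'textureless' or 'lambertian' for this training step."""
    if rng is None:
        rng = random.Random()
    if step < ambient_only_steps:
        return "albedo"  # early steps always use the albedo color
    if rng.random() >= diffuse_prob:
        return "albedo"  # probability 1 - diffuse_prob
    # Shaded color: textureless with probability textureless_prob.
    return "textureless" if rng.random() < textureless_prob else "lambertian"


def lambertian(
    albedo: Sequence[float],
    normal: Sequence[float],
    light_dir: Sequence[float],
    ambient: Sequence[float] = (0.1, 0.1, 0.1),
    diffuse: Sequence[float] = (0.9, 0.9, 0.9),
) -> Tuple[float, ...]:
    """Standard Lambertian shading: albedo * (ambient + diffuse * max(n.l, 0))."""
    n_dot_l = max(sum(n * l for n, l in zip(normal, light_dir)), 0.0)
    return tuple(a * (amb + dif * n_dot_l) for a, amb, dif in zip(albedo, ambient, diffuse))


mode = choose_shading(step=500)  # 'albedo', since 500 < ambient_only_steps
# Textureless shading can be read as Lambertian shading of a white albedo.
lit = lambertian((1.0, 1.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```

With the default light colors, a surface facing the light receives ambient plus diffuse (0.1 + 0.9), while a surface facing away receives only the ambient term.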


No specific configuration.


The background should output colors or color latents conditioned on the ray directions.

Common configurations for background

name type description
n_output_dims int The dimension of the background color, e.g. 3 for RGB and 4 for latent. Default: 3


A background with a solid color.

name type description
color tuple The initialized color of the background with each value in [0,1], should match n_output_dims. Default: (1.0, 1.0, 1.0)
learned bool Whether to optimize the background. Default: True


A background with colors parameterized with a texture map.

name type description
height int The height of the texture map. Default: 64
width int The width of the texture map. Default: 64
color_activation str The activation mapping the texture feature to the color. Default: "sigmoid"


A background parameterized with a neural network (MLP).

name type description
color_activation str The activation mapping the network output to the color. Default: "sigmoid"
dir_encoding_config dict The config of the positional encoding applied on the ray direction. Default: {"otype": "SphericalHarmonics", "degree": 3}
mlp_network_config dict The config of the MLP network. Default: { "otype": "VanillaMLP", "activation": "ReLU", "n_neurons": 16, "n_hidden_layers": 2}
random_aug bool Whether to use random color augmentation. May be able to improve the correctness of the model. Default: False
random_aug_prob float The probability to use random color augmentation. Default: 0.5


Renderers take geometry, material, and background to produce images given camera and light specifications.

Common configurations for renderers

name type description
radius float Half side length of the scene bounding box. This should be the same as radius of the geometry in most cases. Default: 1.0


name type description
num_samples_per_ray float Number of sample points along each ray. Default: 1.0
randomized bool Whether to randomly perturb the sample points in training. Default: True
eval_chunk_size int Number of sample points per chunk in validation/testing, to prevent OOM. Default: 160000
grid_prune bool Whether to maintain an occupancy grid and prune sample points in empty space using NeRFAcc. Default: True


name type description
context_type str Rasterization context type used by nvdiffrast, in ["gl", "cuda"]. See the nvdiffrast documentation for more details.


Given an image or its latent input, the guide should provide its gradient conditioned on a text input so that the image can be optimized with gradient descent to better match the text.

Common configurations for guidance

name type description
enable_memory_efficient_attention bool Whether to enable memory efficient attention in xformers. This will lead to lower GPU memory usage and a potential speed up at inference. Speed up at training time is not guaranteed. Default: False
enable_sequential_cpu_offload bool Whether to offload all models to CPU. This uses accelerate and significantly reduces memory usage, but is slower. Default: False
enable_attention_slicing bool Whether to use sliced attention computation. This will save some memory in exchange for a small speed decrease. Default: False
enable_channels_last_format bool Whether to use Channels Last format for the unet. Default: False (Stable Diffusion) / True (DeepFloyd)
pretrained_model_name_or_path str The pretrained model path in huggingface. Default: "runwayml/stable-diffusion-v1-5" (for stable-diffusion-guidance) / "DeepFloyd/IF-I-XL-v1.0" (for deep-floyd-guidance)
guidance_scale float The classifier free guidance scale. Default: 100.0 (for stable-diffusion-guidance) / 20.0 (for deep-floyd-guidance)
grad_clip Optional[Any] The gradient clip value. None or float or a list in the form of [start_step, start_value, end_value, end_step]. Default: None
half_precision_weights bool Whether to use float16 for the diffusion model. Default: True
min_step_percent float The percent range (min value) of the random timesteps to add noise and denoise. Default: 0.02
max_step_percent float The percent range (max value) of the random timesteps to add noise and denoise. Default: 0.98
weighting_strategy str The choice of w(t) of the sds loss, in ["sds", "uniform", "fantasia3d"]. Default: "sds"
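
Two of the options above take structured values: grad_clip may be a four-element schedule, and min_step_percent / max_step_percent are fractions of the diffusion schedule. The sketch below shows one plausible reading. Linear interpolation between the two clip values, the rounding, and the 1000-step schedule length (num_train_timesteps) are assumptions, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple, Union

GradClip = Union[None, float, Sequence[float]]


def grad_clip_at(step: int, grad_clip: GradClip) -> Optional[float]:
    """Resolve the gradient clip value for a given training step."""
    if grad_clip is None:
        return None  # no clipping
    if isinstance(grad_clip, (int, float)):
        return float(grad_clip)  # constant clip value
    # Schedule form: [start_step, start_value, end_value, end_step].
    start_step, start_value, end_value, end_step = grad_clip
    if step <= start_step:
        return float(start_value)
    if step >= end_step:
        return float(end_value)
    # Assumed: linear interpolation between the two values.
    frac = (step - start_step) / (end_step - start_step)
    return start_value + (end_value - start_value) * frac


def timestep_range(
    min_step_percent: float = 0.02,
    max_step_percent: float = 0.98,
    num_train_timesteps: int = 1000,  # assumed schedule length
) -> Tuple[int, int]:
    """Convert the percent range into integer diffusion timesteps."""
    return (
        round(num_train_timesteps * min_step_percent),
        round(num_train_timesteps * max_step_percent),
    )


print(grad_clip_at(2500, [0, 0.5, 2.0, 5000]))  # 1.25
print(timestep_range())                         # (20, 980)
```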

name type description
use_sjc bool Whether to use score jacobian chaining (SJC) instead of SDS. Default: False
var_red bool Whether to use Eq. 16 in SJC paper. Default: True
token_merging bool Whether to use token merging. This will speed up the unet forward and slightly affect the performance. Default: False
token_merging_params Optional[dict] The config for token merging. See here for supported arguments. Default: {}


No specific configuration.

Prompt Processors

Prompt processors take a user prompt and compute text embeddings for training. The type of the prompt processor should match that of the guidance.

Common configurations for prompt processors

name type description
prompt str The text prompt. Default: "a hamburger"
negative_prompt str The unconditional text input in Classifier Free Guidance. Default: ""
pretrained_model_name_or_path str The pretrained model path in huggingface. Default: "runwayml/stable-diffusion-v1-5" (for stable-diffusion-prompt-processor) / "DeepFloyd/IF-I-XL-v1.0" (for deep-floyd-prompt-processor)
view_dependent_prompting bool Whether to use view dependent prompt, i.e. add front/side/back/overhead view to the original prompt. Default: True
overhead_threshold float Consider the view as overhead when the elevation degree > overhead_threshold. Default: 60.0
front_threshold float Consider the view as front when the azimuth degree in [-front_threshold, front_threshold]. Default: 45.0
back_threshold float Consider the view as back when the azimuth degree > 180 - back_threshold or < -180 + back_threshold. Default: 45.0
view_dependent_prompt_front bool Whether to put the view-dependent prompt in front of the original prompt. If set to True, the final prompt will be a front/back/side/overhead view of [prompt], otherwise it will be [prompt], front/back/side/overhead view. Default: False
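
The threshold rules above can be sketched as a small classifier plus a prompt formatter. This is an illustrative sketch: the function names are hypothetical, and the order of the checks (overhead first) and the choice of article ("a" or "an") are assumptions.

```python
def view_direction(
    elevation_deg: float,
    azimuth_deg: float,
    overhead_threshold: float = 60.0,
    front_threshold: float = 45.0,
    back_threshold: float = 45.0,
) -> str:
    """Classify a camera pose as overhead, front, back or side."""
    if elevation_deg > overhead_threshold:
        return "overhead"
    if -front_threshold <= azimuth_deg <= front_threshold:
        return "front"
    if azimuth_deg > 180.0 - back_threshold or azimuth_deg < -180.0 + back_threshold:
        return "back"
    return "side"  # everything that is not overhead, front or back


def view_dependent_prompt(
    prompt: str, direction: str, view_dependent_prompt_front: bool = False
) -> str:
    """Attach the view description before or after the original prompt."""
    if view_dependent_prompt_front:
        article = "an" if direction[0] in "aeiou" else "a"
        return f"{article} {direction} view of {prompt}"
    return f"{prompt}, {direction} view"


direction = view_direction(elevation_deg=10.0, azimuth_deg=170.0)
print(view_dependent_prompt("a hamburger", direction))        # a hamburger, back view
print(view_dependent_prompt("a hamburger", direction, True))  # a back view of a hamburger
```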


No specific configuration.


No specific configuration.