I have a question about the following line of code:
critic.model[-1].apply(uniform_init_weights(0.0))
As I understand it, this line is supposed to implement the following idea in the DreamerV3 paper:
We further noticed that the randomly initialized reward predictor and critic networks at the start of training can result in large predicted rewards that can delay the onset of learning. We initialize the output weights of the reward predictor and critic to zeros, which effectively alleviates the problem and accelerates early learning.
I understand the idea. However, this raises the question of whether it is suboptimal: all weights initialized to 0 receive the same gradient update and thus always remain identical to one another, which can't really be the intention, can it?
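For context, here is roughly what I understand that line to do: a minimal sketch, assuming a PyTorch nn.Sequential critic. The layer sizes and the body of uniform_init_weights are my own guesses for illustration, not the repository's actual code:

import torch.nn as nn

def uniform_init_weights(scale: float):
    # Hypothetical reimplementation for illustration; the repo's helper may
    # differ. Samples Linear weights uniformly from [-scale, scale], so a
    # scale of 0.0 zeroes the weights (and the bias) of the layer.
    def init(module: nn.Module) -> None:
        if isinstance(module, nn.Linear):
            nn.init.uniform_(module.weight, -scale, scale)
            if module.bias is not None:
                nn.init.uniform_(module.bias, -scale, scale)
    return init

# Hypothetical critic head: only the final Linear layer is zeroed, so the
# critic predicts 0 for every input at the start of training, while the
# hidden layers keep their random initialization.
model = nn.Sequential(
    nn.Linear(512, 512), nn.SiLU(),
    nn.Linear(512, 512), nn.SiLU(),
    nn.Linear(512, 255),
)
model[-1].apply(uniform_init_weights(0.0))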