I have a question about the following line of code:
critic.model[-1].apply(uniform_init_weights(0.0))
As I understand it, this line is supposed to implement the following idea in the DreamerV3 paper:
We further noticed that the randomly initialized reward predictor and critic networks at the start of training can result in large predicted rewards that can delay the onset of learning. We initialize the output weights of the reward predictor and critic to zeros, which effectively alleviates the problem and accelerates early learning.
I understand the idea. However, this raises the question of whether it is suboptimal: all weights initialized to 0 receive the same gradient update and thus always remain identical to one another, which can't really be the intention, can it?
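For context, here is roughly what I understand that line to do: a minimal sketch, assuming a PyTorch nn.Sequential critic. The layer sizes and the body of uniform_init_weights are my own guesses for illustration, not the repository's actual code:

import torch.nn as nn

def uniform_init_weights(scale: float):
    # Hypothetical reimplementation for illustration; the repo's helper may
    # differ. Samples Linear weights uniformly from [-scale, scale], so a
    # scale of 0.0 zeroes the weights (and the bias) of the layer.
    def init(module: nn.Module) -> None:
        if isinstance(module, nn.Linear):
            nn.init.uniform_(module.weight, -scale, scale)
            if module.bias is not None:
                nn.init.uniform_(module.bias, -scale, scale)
    return init

# Hypothetical critic head: only the final Linear layer is zeroed, so the
# critic predicts 0 for every input at the start of training, while the
# hidden layers keep their random initialization.
model = nn.Sequential(
    nn.Linear(512, 512), nn.SiLU(),
    nn.Linear(512, 512), nn.SiLU(),
    nn.Linear(512, 255),
)
model[-1].apply(uniform_init_weights(0.0))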