On page 393, the objective for alpha is written as the product of alpha and the sum of the target-entropy heuristic and a log-likelihood term. However, the corresponding code contains the line

`alpha_loss = -(self.policy_model.logalpha * target_alpha).mean()`

which multiplies `logalpha`, not alpha, by that sum. The text and the code are therefore inconsistent.
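The sketch below uses plain Python with no autograd to show how the two forms differ. It assumes that `target_alpha` holds per-sample values of log π(a|s) plus the target entropy and is detached (treated as a constant), which is how it appears to be used in the quoted line. Both losses are differentiated with respect to `logalpha`, the parameter the code optimizes. `grads_wrt_logalpha` is a hypothetical helper written for illustration and is not part of the book's code.

```python
import math


def grads_wrt_logalpha(logalpha, target_alpha):
    """Return d(loss)/d(logalpha) for the text form and the code form.

    target_alpha: per-sample values of log pi(a|s) + target_entropy,
    assumed detached (constants with respect to logalpha).
    """
    mean_target = sum(target_alpha) / len(target_alpha)
    alpha = math.exp(logalpha)

    # Text form: loss = -(alpha * target_alpha).mean(), with alpha = exp(logalpha).
    # Chain rule: d alpha / d logalpha = alpha, so the gradient is -alpha * mean.
    grad_text_form = -alpha * mean_target

    # Code form: loss = -(logalpha * target_alpha).mean().
    # The gradient with respect to logalpha is simply -mean.
    grad_code_form = -mean_target

    return grad_text_form, grad_code_form


g_text, g_code = grads_wrt_logalpha(math.log(0.2), [-1.5, -0.5, 0.0])
```

Under these assumptions the two gradients always have the same sign and differ by the positive factor alpha. Both losses therefore push alpha in the same direction, but they are not the same objective, and the effective step size on `logalpha` differs between them.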