The formula and code of SAC are inconsistent #23

MarginalCentrality · 2022-03-23T06:18:40Z

On page 393, the objective for alpha is written as the product of alpha and the sum of the target entropy heuristic and a likelihood term. However, the corresponding code contains the line
"alpha_loss = -(self.policy_model.logalpha * target_alpha).mean()"
which multiplies by logalpha rather than alpha. The formula and the code are inconsistent.
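
A minimal sketch of the discrepancy follows. It assumes that `target_alpha` in the code is the sum of the action log-probability and the target entropy, treated as a constant; the issue does not confirm this. The function names, sample log-probabilities, and target entropy value are hypothetical and chosen only for illustration. It compares the gradient of each loss with respect to log alpha using finite differences in plain Python.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def alpha_loss_formula(log_alpha, log_probs, target_entropy):
    # Loss as described for the book formula: alpha times (log-prob + target entropy).
    alpha = math.exp(log_alpha)
    return -mean([alpha * (lp + target_entropy) for lp in log_probs])


def alpha_loss_code(log_alpha, log_probs, target_entropy):
    # Loss as in the quoted code line: log(alpha) times the same term.
    # Assumption: target_alpha == log-prob + target entropy, held constant.
    return -mean([log_alpha * (lp + target_entropy) for lp in log_probs])


def grad(f, x, eps=1e-6):
    # Central finite-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Hypothetical values, for illustration only.
log_probs = [-1.2, -0.7, -2.0]
target_entropy = -1.0
log_alpha = math.log(0.2)

g_formula = grad(lambda x: alpha_loss_formula(x, log_probs, target_entropy), log_alpha)
g_code = grad(lambda x: alpha_loss_code(x, log_probs, target_entropy), log_alpha)

print("gradient of formula loss w.r.t. log alpha:", g_formula)
print("gradient of code loss w.r.t. log alpha:", g_code)
print("ratio (formula / code):", g_formula / g_code)
```

Under these assumptions the two gradients have the same sign and differ by a factor of alpha, so both versions would push log alpha in the same direction but with different step sizes. Whether that difference is intended is the open question here.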
