
Feat: sebulba rec ippo #1142

Open

wants to merge 2 commits into base: develop

Conversation

SimonDuToit (Contributor) commented Nov 18, 2024

Sebulba implementation of recurrent IPPO.

OmaymaMahjoub (Contributor) left a comment

Overall the system looks correct and reasonable. Well done Simon! I just left a few minor requests :)

- arch: sebulba
- system: ppo/rec_ippo
- network: rnn # [mlp, continuous_mlp, cnn]
- env: lbf_gym # [rware_gym, lbf_gym]

If you can add smaclite_gym to the list as well.

observation: Observation,
dones,
hstates,
key: chex.PRNGKey,

Suggested change (add type annotations):
    dones: chex.Array,
    hstates: HiddenStates,
    key: chex.PRNGKey,
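Applied to the signature above, the fully annotated actor function might look like the sketch below. The function name is hypothetical, and Observation, HiddenStates, Array and PRNGKey are stand-ins for Mava's and chex's actual types so the snippet runs without those dependencies:

```python
import inspect
from typing import Any, NamedTuple, Tuple

# Stand-ins for chex.Array / chex.PRNGKey so this sketch has no extra deps.
Array = Any
PRNGKey = Any

class Observation(NamedTuple):
    agents_view: Any

class HiddenStates(NamedTuple):
    policy_hidden_state: Any

def get_action_and_log_prob(
    observation: Observation,
    dones: Array,
    hstates: HiddenStates,
    key: PRNGKey,
) -> Tuple[Any, Any]:
    """Placeholder body; the real actor samples from the policy here."""
    raise NotImplementedError

# Every parameter now carries an explicit annotation.
params = inspect.signature(get_action_and_log_prob).parameters
print(list(params))  # ['observation', 'dones', 'hstates', 'key']
```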

@@ -0,0 +1,910 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.

If you can update the typings in Pipeline (mava/utils/sebulba.py) to Union[PPOTransition, RNNPPOTransition].
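A minimal sketch of that typing change, assuming a queue-like Pipeline; the NamedTuple fields are placeholders, not Mava's real transition fields:

```python
from typing import List, NamedTuple, Union

class PPOTransition(NamedTuple):
    done: bool
    action: int

class RNNPPOTransition(NamedTuple):
    done: bool
    action: int
    hstate: object  # recurrent hidden state carried alongside the step

# The pipeline now accepts trajectories from both the feed-forward
# and the recurrent system.
Transition = Union[PPOTransition, RNNPPOTransition]

class Pipeline:
    def __init__(self) -> None:
        self._queue: List[Transition] = []

    def put(self, traj: Transition) -> None:
        self._queue.append(traj)

pipe = Pipeline()
pipe.put(PPOTransition(done=False, action=1))
pipe.put(RNNPPOTransition(done=True, action=0, hstate=None))
print(len(pipe._queue))  # 2
```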

log_prob = actor_policy.log_prob(action)
# It may be faster to calculate the values in the learner as
# then we won't need to pass critic params to actors.
# value = critic_apply_fn(params.critic_params, observation).squeeze()

if you can remove this comment
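For reference, the trade-off that comment describes: computing values on the learner side means actor processes never need a copy of the critic parameters. A toy sketch, where the linear critic is a stand-in for the real Flax network:

```python
import numpy as np

def critic_apply_fn(params: np.ndarray, obs: np.ndarray) -> np.ndarray:
    # Stand-in linear critic; in Mava this is the Flax critic network.
    return obs @ params

def learner_side_values(critic_params: np.ndarray, obs: np.ndarray) -> np.ndarray:
    # Values computed in the learner, so actors only ship observations
    # and critic params never cross the actor/learner boundary.
    return critic_apply_fn(critic_params, obs).squeeze(-1)

obs = np.ones((8, 4))      # (batch, obs_dim)
params = np.ones((4, 1))   # toy critic weights
print(learner_side_values(params, obs).shape)  # (8,)
```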

timestep = env.reset(seed=seeds)
dones = np.repeat(timestep.last(), num_agents).reshape(num_envs, -1)

# simon

Same here: once you are done cleaning, if you can remove this comment.
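For context on the dones bookkeeping in the snippet above: each per-environment terminal flag is broadcast to every agent, giving a (num_envs, num_agents) array. A runnable sketch with toy shapes:

```python
import numpy as np

num_envs, num_agents = 4, 3
# Stand-in for timestep.last(): one terminal flag per parallel environment.
env_dones = np.array([False, True, False, False])

# Broadcast each env's flag to all its agents:
# (num_envs,) -> (num_envs, num_agents).
dones = np.repeat(env_dones, num_agents).reshape(num_envs, -1)
print(dones.shape)  # (4, 3)
print(dones[1])     # [ True  True  True]
```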


I see this comment appears on multiple lines; if you can remove them all.

)

params, opt_states, traj_batch, advantages, targets, key = update_state
# learner_state = LearnerState(params, opt_states, key, None, learner_state.timestep)

If you can remove this comment

2 participants