Feat: sebulba rec ippo #1142
base: develop
Conversation
Overall the system looks correct and reasonable. Well done, Simon! I just left a few minor requests :)
- arch: sebulba
- system: ppo/rec_ippo
- network: rnn # [mlp, continuous_mlp, cnn]
- env: lbf_gym # [rware_gym, lbf_gym]
It would be great if you could add `smaclite_gym` to the list.
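With that request applied, the env comment in the defaults list would read as below. This is a sketch only: the surrounding `defaults:` key is assumed (Hydra-style config), and `smaclite_gym` is taken from the reviewer's comment, not verified against the registered env names.

```yaml
defaults:
  - arch: sebulba
  - system: ppo/rec_ippo
  - network: rnn  # [mlp, continuous_mlp, cnn]
  - env: lbf_gym  # [rware_gym, lbf_gym, smaclite_gym]
```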
observation: Observation,
dones,
hstates,
key: chex.PRNGKey,
key: chex.PRNGKey,
dones: chex.Array,
hstates: HiddenStates,
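The suggestion above annotates every argument of the action-selection function. A minimal, self-contained sketch of such a fully typed signature is shown below; note that `Array`, `PRNGKey`, `HiddenStates`, `Observation`, and `get_action` are stdlib stand-ins invented for illustration, not the real chex/Mava definitions.

```python
from typing import NamedTuple, Tuple

# Stand-ins so the sketch runs without JAX/chex installed; in Mava these
# would be chex.Array, chex.PRNGKey, and the real observation/state types.
Array = Tuple[bool, ...]
PRNGKey = Tuple[int, int]


class HiddenStates(NamedTuple):
    """Placeholder for the recurrent policy hidden states."""
    policy_hidden_state: Tuple[float, ...]


class Observation(NamedTuple):
    """Placeholder observation container."""
    agents_view: Tuple[float, ...]


def get_action(
    observation: Observation,
    dones: Array,
    hstates: HiddenStates,
    key: PRNGKey,
) -> Tuple[Tuple[float, ...], HiddenStates]:
    """Dummy body; a real actor would run the recurrent policy here."""
    del observation, dones, key  # unused in this sketch
    return (0.0,), hstates
```

The point of the suggested change is purely that every argument carries an explicit annotation, which lets static checkers catch mismatched call sites.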
@@ -0,0 +1,910 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
If you can, update the typings in the `Pipeline` in mava/utils/sebulba.py
to be `Union[PPOTransition, RNNPPOTransition]`.
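For context, the requested widening could look like the sketch below. `PPOTransition`, `RNNPPOTransition`, and the `Pipeline` body here are toy stand-ins written for illustration, not Mava's real dataclasses or queue implementation.

```python
from typing import List, NamedTuple, Tuple, Union


class PPOTransition(NamedTuple):
    """Placeholder for the feed-forward PPO transition."""
    done: bool
    reward: float


class RNNPPOTransition(NamedTuple):
    """Placeholder for the recurrent transition (carries hidden states)."""
    done: bool
    reward: float
    hstate: Tuple[float, ...]


# The review asks the Pipeline's typings to accept either variant.
Transition = Union[PPOTransition, RNNPPOTransition]


class Pipeline:
    """Toy stand-in for the actor-to-learner queue in mava/utils/sebulba.py."""

    def __init__(self) -> None:
        self._items: List[Transition] = []

    def put(self, traj: Transition) -> None:
        # Accepts both feed-forward and recurrent transitions.
        self._items.append(traj)

    def get(self) -> Transition:
        return self._items.pop(0)
```

With the `Union` in place, the same pipeline type-checks for both the feed-forward and the recurrent system without duplicating the class.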
log_prob = actor_policy.log_prob(action)
# It may be faster to calculate the values in the learner as
# then we won't need to pass critic params to actors.
# value = critic_apply_fn(params.critic_params, observation).squeeze()
If you can, remove this comment.
timestep = env.reset(seed=seeds)
dones = np.repeat(timestep.last(), num_agents).reshape(num_envs, -1)

# simon
Same here: once you are done cleaning up, please remove this comment.
I see this comment appears on multiple lines; please remove them all.
)

params, opt_states, traj_batch, advantages, targets, key = update_state
# learner_state = LearnerState(params, opt_states, key, None, learner_state.timestep)
If you can, remove this comment.
Sebulba implementation of recurrent IPPO.