Hi, contributors of pymarl!
I am currently working on a project where we are trying to adapt a single-agent RL framework to a multi-agent one, and we hope to compare different multi-agent algorithms on our specific problem. After reading both papers and the corresponding implementations (mainly QMIX and COMA), I have some trouble understanding the implementation of the modules, which comprise agents, critics and mixers.
My first concern is the RNN agents. In COMA and QMIX, agents play different roles in the algorithm. In QMIX, agents are just local Q-functions: they take the obs and actions as input and output the corresponding Q-values (state-action value function) to the mixer, where we take the argmax to obtain the greedy policy (this matches the behaviour of the implemented RNN agent, which outputs q). In COMA, however, agents are defined in an actor-critic way and only parameterize a policy, which means they should output an action distribution (perhaps as logits). How can QMIX and COMA both use the same RNN agent (both algorithms initialize agents in the controller to interact with the env)? Am I misunderstanding something?
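To make the first question concrete, here is a minimal sketch of the two readings I have in mind, assuming a single network that returns one score per action. Every name here (`agent_forward`, `select_as_q_values`, `select_as_logits`) is hypothetical and not taken from the pymarl code; the toy scores stand in for a real RNN output.

```python
import math
import random

def agent_forward(obs):
    """Hypothetical stand-in for a shared agent network: one score per action."""
    # Toy scores; a real agent would compute these from obs and its hidden state.
    return [0.1 * obs, 1.5, -0.3]

def select_as_q_values(scores, epsilon, rng):
    """QMIX-style reading: scores are per-action Q-values, act epsilon-greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=lambda a: scores[a])

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_as_logits(scores, rng):
    """COMA-style reading: scores are policy logits, sample from the softmax."""
    probs = softmax(scores)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

rng = random.Random(0)
scores = agent_forward(obs=2.0)
greedy_action = select_as_q_values(scores, epsilon=0.0, rng=rng)
policy = softmax(scores)
sampled_action = select_as_logits(scores, rng)
```

If this reading is right, the same output vector would serve both algorithms and only the action-selection step would differ. I am not sure that is what the implementation intends.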
My second confusion is about the non-shared COMA (coma_ns.py in the module dir). In COMA, the critic is defined as a centralized critic Q(U,s). How can this critic be defined in a decentralized way? From my perspective, only the agents should be non-shared modules, not something that is defined to be centralized. In COMA_learner.py, a single centralized critic would make more sense to me.
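For reference, this is how I understand the centralized critic from the COMA paper: it scores the global state together with the joint action, and each agent's advantage subtracts a counterfactual baseline that marginalizes only that agent's own action. The sketch below uses a toy critic; `centralised_q` and `counterfactual_advantage` are hypothetical names, not pymarl functions.

```python
def centralised_q(state, joint_action):
    """Hypothetical toy critic Q(s, u): uses the global state and ALL agents' actions."""
    return state + sum((i + 1) * a for i, a in enumerate(joint_action))

def counterfactual_advantage(state, joint_action, agent, policy):
    """A^a(s, u) = Q(s, u) - sum_{u'} pi^a(u') * Q(s, (u^{-a}, u'))."""
    q_taken = centralised_q(state, joint_action)
    baseline = 0.0
    for alt_action, prob in enumerate(policy):
        # Swap in an alternative action for this agent only; others stay fixed.
        alt_joint = list(joint_action)
        alt_joint[agent] = alt_action
        baseline += prob * centralised_q(state, tuple(alt_joint))
    return q_taken - baseline

# Two agents, two actions each, uniform policy for the agent being evaluated.
adv_agent0 = counterfactual_advantage(1.0, (1, 0), agent=0, policy=[0.5, 0.5])
adv_agent1 = counterfactual_advantage(1.0, (1, 0), agent=1, policy=[0.5, 0.5])
```

In this sketch a single critic serves every agent, which is why a per-agent (non-shared) critic confuses me.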