Sable is an algorithm that was developed by the research team at InstaDeep. It also casts MARL as a sequence modelling problem and leverages the advantage decompostion theorem through auto-regressive action selection for convergence guarantees and can scale to thousands of agents by leveraging the memory efficiency of Retentive Networks.
We provide two Anakin based implementations of Sable:
Here the ff
suffix implies that the algorithm retains no memory over time but treats only the agents as the sequence dimension while rec
implies that the algorithms maintains memory over both agents and time for long context memory in partially observable environments.
For an overview of how the algorithm works, please see the diagram below. For a more detailed overview please see our associated paper.
Sable architecture and execution. The encoder receives all agent observations