problems about action range #604

Open
brodermind opened this issue Jun 25, 2024 · 4 comments

Comments

@brodermind

Hi, everyone!
I set the env config as follows and trained a model:

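```python
import gymnasium as gym  # assumption: recent highway-env releases build on gymnasium
import highway_env  # noqa: F401  (importing the package registers "highway-v0")
import numpy as np

env = gym.make("highway-v0", render_mode='rgb_array')
config = {
    # ... (other settings elided)
    "action": {
        "type": "ContinuousAction",
        "acceleration_range": [-10.0, 8.0],         # [m/s²]
        "steering_range": [-np.pi / 4, np.pi / 4],  # [rad]
    },
    "controlled_vehicles": 1,
    "duration": 150,
    "vehicles_count": 50,
    "vehicles_density": 1,
    "absolute": False,
    "order": "sorted",
    "simulation_frequency": 15,  # [Hz]
    "policy_frequency": 15,      # [Hz]
    "normalize": False,
    "normalize_reward": False,
    "clip": False,
    "offroad_terminal": True
}
# gym.make returns a wrapper, so configure the underlying env explicitly
env.unwrapped.configure(config)
env.reset()
```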

However, when I reload the model and print the acceleration and steering values, none of them are in the configured ranges, and I'm confused. I added print("acc: {}, steering: {}".format(action[0], action[1])) to def _reward() in highway_env.py, and added prints to the following function:

```python
def act(self, action: np.ndarray) -> None:
    ...
    if self.longitudinal and self.lateral:
        print("original acc: {}, steering: {}".format(action[0], action[1]))
        # lmap rescales each component from [-1, 1] into the configured range
        # and passes the result to the vehicle; `action` itself is not modified
        self.controlled_vehicle.act({
            "acceleration": utils.lmap(action[0], [-1, 1], self.acceleration_range),
            "steering": utils.lmap(action[1], [-1, 1], self.steering_range),
        })
        # action[0] = utils.lmap(action[0], [-1, 1], self.acceleration_range)
        # action[1] = utils.lmap(action[1], [-1, 1], self.steering_range)
    elif self.longitudinal:
        self.controlled_vehicle.act({
            "acceleration": utils.lmap(action[0], [-1, 1], self.acceleration_range),
            "steering": 0,
        })
    elif self.lateral:
        self.controlled_vehicle.act({
            "acceleration": 0,
            "steering": utils.lmap(action[0], [-1, 1], self.steering_range)
        })
    print("actual acc: {}, steering: {}".format(action[0], action[1]))
    self.last_action = action
```

I get the following result:
```
original acc: 0.1795186996459961, steering: -0.02994769811630249
actual acc: 0.1795186996459961, steering: -0.02994769811630249
acc: 0.1795186996459961, steering: -0.02994769811630249
original acc: 0.20840740203857422, steering: -0.02228790521621704
actual acc: 0.20840740203857422, steering: -0.02228790521621704
acc: 0.20840740203857422, steering: -0.02228790521621704
original acc: 0.2234337329864502, steering: -0.014497756958007812
actual acc: 0.2234337329864502, steering: -0.014497756958007812
acc: 0.2234337329864502, steering: -0.014497756958007812
original acc: 0.2373422384262085, steering: -0.007743716239929199
actual acc: 0.2373422384262085, steering: -0.007743716239929199
acc: 0.2373422384262085, steering: -0.007743716239929199
original acc: 0.25095176696777344, steering: -0.0019524693489074707
actual acc: 0.25095176696777344, steering: -0.0019524693489074707
acc: 0.25095176696777344, steering: -0.0019524693489074707
original acc: 0.2658735513687134, steering: 0.0023392438888549805
actual acc: 0.2658735513687134, steering: 0.0023392438888549805
acc: 0.2658735513687134, steering: 0.0023392438888549805
original acc: 0.2798728942871094, steering: 0.004693746566772461
actual acc: 0.2798728942871094, steering: 0.004693746566772461
acc: 0.2798728942871094, steering: 0.004693746566772461
original acc: 0.290974497795105, steering: 0.005937099456787109
actual acc: 0.290974497795105, steering: 0.005937099456787109
acc: 0.290974497795105, steering: 0.005937099456787109
original acc: 0.3005625009536743, steering: 0.0067664384841918945
actual acc: 0.3005625009536743, steering: 0.0067664384841918945
acc: 0.3005625009536743, steering: 0.0067664384841918945
```

I am really confused...

@brodermind
Author

@eleurent

@eleurent
Collaborator

eleurent commented Jun 25, 2024

You are printing the same input action twice: the unscaled one, which lies in [-1, 1]. The scaled action is passed to the vehicle directly. After it has been executed, you can access it with print(self.controlled_vehicle.action)
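To make the scaling concrete, here is a minimal sketch of what act() does with the first logged raw action. The `lmap` below is a local stand-in for highway-env's utils.lmap, written from the plain linear-interpolation formula rather than imported, and the ranges are the ones from the config in the question.

```python
import math


def lmap(v: float, x: tuple, y: tuple) -> float:
    """Linearly map v from interval x to interval y (local stand-in for utils.lmap)."""
    return y[0] + (v - x[0]) * (y[1] - y[0]) / (x[1] - x[0])


ACCELERATION_RANGE = (-10.0, 8.0)             # from the config above [m/s^2]
STEERING_RANGE = (-math.pi / 4, math.pi / 4)  # from the config above [rad]

# First raw action from the log ("original acc: ..., steering: ...")
raw_acc, raw_steer = 0.1795186996459961, -0.02994769811630249

acceleration = lmap(raw_acc, (-1, 1), ACCELERATION_RANGE)
steering = lmap(raw_steer, (-1, 1), STEERING_RANGE)
print(f"acceleration: {acceleration:.4f} m/s^2, steering: {steering:.4f} rad")
```

This prints about 0.6157 m/s² and -0.0235 rad. Note that with the asymmetric [-10, 8] range, a raw value of 0 maps to -1 m/s² rather than 0, while for the symmetric steering range the map reduces to raw × π/4.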

@brodermind
Author

Is the action passed to def _reward(self, action: Action) -> float: also unscaled? I want to compute rewards from the actual action. I also added print(self.controlled_vehicle.action) to def _reward(self, action: Action) -> float:, but got
AttributeError: 'HighwayEnv' object has no attribute 'controlled_vehicle'. How can I get the actual action in def _reward(self, action: Action) -> float:?
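One way to get the physical command inside _reward() is to rescale the raw action with the same ranges the action type uses. The helper below, `to_physical_action`, is hypothetical and not part of highway-env; it assumes the action config contains both `acceleration_range` and `steering_range` (as in the config above), and the clipping to [-1, 1] is an assumption added here. Alternatively, the command that was actually executed is stored on the ego vehicle (self.vehicle.action). In highway-env's source, `controlled_vehicle` appears to be an attribute of the action type rather than of the env, which would explain the AttributeError; self.action_type.controlled_vehicle should reach the same vehicle.

```python
from typing import Mapping, Sequence

import numpy as np


def lmap(v: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Linearly map v from interval x to interval y."""
    return y[0] + (v - x[0]) * (y[1] - y[0]) / (x[1] - x[0])


def to_physical_action(raw_action: Sequence[float], action_config: Mapping) -> dict:
    """Hypothetical helper: rescale a raw [-1, 1] action into the configured ranges.

    Assumes a ContinuousAction with both longitudinal and lateral control whose
    config provides "acceleration_range" and "steering_range".
    """
    # Assumption: out-of-range policy outputs are clipped to [-1, 1] first.
    raw = np.clip(np.asarray(raw_action, dtype=float), -1.0, 1.0)
    return {
        "acceleration": float(lmap(raw[0], (-1, 1), action_config["acceleration_range"])),
        "steering": float(lmap(raw[1], (-1, 1), action_config["steering_range"])),
    }


# The action config from the question
action_config = {
    "type": "ContinuousAction",
    "acceleration_range": [-10.0, 8.0],
    "steering_range": [-np.pi / 4, np.pi / 4],
}

# Raw acceleration from the first _reward() log line later in this thread; the
# raw steering value was not logged, so 0.0 is only a placeholder here.
command = to_physical_action([-0.9047888517379761, 0.0], action_config)
print(command)
```

The acceleration comes out at about -9.1431 m/s², which agrees with the 'ego acc' value printed for that step. Inside a custom _reward(), the env's own action config could be passed instead, e.g. to_physical_action(action, self.config["action"]).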

@brodermind
Author

Besides, I added print("ego acc: {}, speed: {}, acc: {}".format(self.vehicle.action, self.vehicle.speed, action[0])) to def _reward(self, action: Action) -> float: and got:
```
ego acc: {'acceleration': -9.143099665641785, 'steering': -0.0170410996816317}, speed: 24.390460022290547, acc: -0.9047888517379761
ego acc: {'acceleration': -8.94046038389206, 'steering': -0.006646797551511874}, speed: 23.794429330031075, acc: -0.8822733759880066
ego acc: {'acceleration': -8.713669419288635, 'steering': 0.00408259474217243}, speed: 23.213518035411834, acc: -0.8570743799209595
ego acc: {'acceleration': -8.462130784988403, 'steering': 0.011589494497536434}, speed: 22.649375983079274, acc: -0.8291256427764893
ego acc: {'acceleration': -8.193395435810089, 'steering': 0.01602412584630364}, speed: 22.103149620691934, acc: -0.7992661595344543
ego acc: {'acceleration': -7.89635956287384, 'steering': 0.018041595207714756}, speed: 21.576725649833676, acc: -0.7662621736526489
ego acc: {'acceleration': -7.466260373592377, 'steering': 0.01743891977243528}, speed: 21.07897495826085, acc: -0.7184733748435974
ego acc: {'acceleration': -7.3168264627456665, 'steering': 0.017414108681810925}, speed: 20.59118652741114, acc: -0.7018696069717407
ego acc: {'acceleration': -7.18576192855835, 'steering': 0.020717354298106838}, speed: 20.112135732173915, acc: -0.6873068809509277
```

So 'action' is unscaled and 'self.vehicle.action' is the scaled action. However, self.vehicle.speed does not seem to use the scaled value: it looks as if it is computed as self.vehicle.speed + action[0] rather than self.vehicle.speed + self.vehicle.action["acceleration"]?
@eleurent
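The logged numbers can be used to test that hypothesis. The sketch below assumes each _reward() call happens after the step whose command it prints, that no speed limit is hit, and that the speed is integrated once per simulation step with dt = 1/simulation_frequency = 1/15 s (policy_frequency equals simulation_frequency in the config, so one simulation step per action). It compares the observed speed changes against "scaled acceleration × dt" and against "raw action[0] added directly", as hypothesised above.

```python
# (scaled acceleration, speed, raw action[0]) copied from the _reward() log above
LOG = [
    (-9.143099665641785, 24.390460022290547, -0.9047888517379761),
    (-8.94046038389206, 23.794429330031075, -0.8822733759880066),
    (-8.713669419288635, 23.213518035411834, -0.8570743799209595),
    (-8.462130784988403, 22.649375983079274, -0.8291256427764893),
    (-8.193395435810089, 22.103149620691934, -0.7992661595344543),
    (-7.89635956287384, 21.576725649833676, -0.7662621736526489),
    (-7.466260373592377, 21.07897495826085, -0.7184733748435974),
    (-7.3168264627456665, 20.59118652741114, -0.7018696069717407),
    (-7.18576192855835, 20.112135732173915, -0.6873068809509277),
]

# Assumption: one simulation step per action, since policy_frequency ==
# simulation_frequency == 15 Hz in the config, so dt = 1 / 15 s.
DT = 1 / 15


def max_residual(predict) -> float:
    """Largest |observed speed change - predicted speed change| over consecutive rows."""
    worst = 0.0
    for (_, v_prev, _), (acc, v, raw) in zip(LOG, LOG[1:]):
        worst = max(worst, abs((v - v_prev) - predict(acc, raw)))
    return worst


scaled_err = max_residual(lambda acc, raw: acc * DT)  # v += a_scaled * dt
raw_err = max_residual(lambda acc, raw: raw)          # v += action[0], as hypothesised
print(f"scaled acceleration * dt: {scaled_err:.1e}, raw action[0]: {raw_err:.3f}")
```

Under these assumptions, the scaled prediction agrees to within floating-point noise, while the raw hypothesis is off by roughly 0.2 to 0.3 m/s per step. This suggests the speed does follow self.vehicle.action["acceleration"], and that each change looks small only because it is multiplied by dt = 1/15 s.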


2 participants