problems about action range #604

brodermind · 2024-06-25T08:10:22Z

Hi, everyone!
I configured the environment as follows and trained a model:

env = gym.make("highway-v0", render_mode='rgb_array')
config = {
    ...
    "action":{
        "type":"ContinuousAction",
        "acceleration_range": [-10.0,8.0],
        "steering_range":[-np.pi/4, np.pi/4],
        
    },
    "controlled_vehicles": 1,
    "duration": 150,
    "vehicles_count": 50,
    "vehicles_density": 1,
    "absolute": False,
    "order": "sorted",
    "simulation_frequency": 15,  # [Hz]
    "policy_frequency": 15, 
    "normalize": False,
    "normalize_reward": False,
    "clip": False,
    "offroad_terminal": True
}
env.configure(config)
env.reset()

However, when I reload the model and print the acceleration and steering values, none of them fall in the configured ranges, and I am confused. I added print("acc: {}, steering: {}".format(action[0], action[1])) to def _reward() in highway_env.py, and added prints to the following function:

def act(self, action: np.ndarray) -> None:
        ...
        if self.longitudinal and self.lateral:
            print("original acc: {}, steering: {}".format(action[0], action[1]))
            self.controlled_vehicle.act({
                "acceleration": utils.lmap(action[0], [-1, 1], self.acceleration_range),
                "steering": utils.lmap(action[1], [-1, 1], self.steering_range),
            })
            # action[0] = utils.lmap(action[0], [-1, 1], self.acceleration_range)
            # action[1] = utils.lmap(action[1], [-1, 1], self.steering_range)
        elif self.longitudinal:
            self.controlled_vehicle.act({
                "acceleration": utils.lmap(action[0], [-1, 1], self.acceleration_range),
                "steering": 0,
            })
        elif self.lateral:
            self.controlled_vehicle.act({
                "acceleration": 0,
                "steering": utils.lmap(action[0], [-1, 1], self.steering_range)
            })
        print("actual acc: {}, steering: {}".format(action[0], action[1]))
        self.last_action = action

I get the following result:
original acc: 0.1795186996459961, steering: -0.02994769811630249
actual acc: 0.1795186996459961, steering: -0.02994769811630249
acc: 0.1795186996459961, steering: -0.02994769811630249
original acc: 0.20840740203857422, steering: -0.02228790521621704
actual acc: 0.20840740203857422, steering: -0.02228790521621704
acc: 0.20840740203857422, steering: -0.02228790521621704
original acc: 0.2234337329864502, steering: -0.014497756958007812
actual acc: 0.2234337329864502, steering: -0.014497756958007812
acc: 0.2234337329864502, steering: -0.014497756958007812
original acc: 0.2373422384262085, steering: -0.007743716239929199
actual acc: 0.2373422384262085, steering: -0.007743716239929199
acc: 0.2373422384262085, steering: -0.007743716239929199
original acc: 0.25095176696777344, steering: -0.0019524693489074707
actual acc: 0.25095176696777344, steering: -0.0019524693489074707
acc: 0.25095176696777344, steering: -0.0019524693489074707
original acc: 0.2658735513687134, steering: 0.0023392438888549805
actual acc: 0.2658735513687134, steering: 0.0023392438888549805
acc: 0.2658735513687134, steering: 0.0023392438888549805
original acc: 0.2798728942871094, steering: 0.004693746566772461
actual acc: 0.2798728942871094, steering: 0.004693746566772461
acc: 0.2798728942871094, steering: 0.004693746566772461
original acc: 0.290974497795105, steering: 0.005937099456787109
actual acc: 0.290974497795105, steering: 0.005937099456787109
acc: 0.290974497795105, steering: 0.005937099456787109
original acc: 0.3005625009536743, steering: 0.0067664384841918945
actual acc: 0.3005625009536743, steering: 0.0067664384841918945
acc: 0.3005625009536743, steering: 0.0067664384841918945

I AM SO CONFUSED .......

brodermind · 2024-06-25T11:52:20Z

@eleurent

eleurent · 2024-06-25T18:59:20Z

You are printing the same input action twice: the unscaled one, in [-1, 1]. The scaled action is passed directly to the vehicle and is never written back into action. After it has been executed, you can access it with print(self.controlled_vehicle.action)
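To make the scaling concrete, here is a minimal sketch of the linear map from [-1, 1] to the configured ranges. The lmap function below is re-implemented for illustration and is assumed to behave like utils.lmap; the ranges are the ones from the config in this issue.

```python
import math


def lmap(v, x, y):
    """Linearly map v from interval x to interval y (illustrative re-implementation)."""
    return y[0] + (v - x[0]) * (y[1] - y[0]) / (x[1] - x[0])


# Ranges taken from the config posted in this issue.
acceleration_range = [-10.0, 8.0]
steering_range = [-math.pi / 4, math.pi / 4]

# Raw policy outputs live in [-1, 1]; these two values appear in the logs of this thread.
raw_acc = -0.9047888517379761
raw_steering = -0.02994769811630249

scaled_acc = lmap(raw_acc, [-1, 1], acceleration_range)
scaled_steering = lmap(raw_steering, [-1, 1], steering_range)

# The raw action is left untouched; only the scaled copy is sent to the vehicle.
print("raw acc: {}, scaled acc: {}".format(raw_acc, scaled_acc))
print("raw steering: {}, scaled steering: {}".format(raw_steering, scaled_steering))
```

With the range [-10, 8], a raw acceleration of about -0.905 maps to about -9.143 m/s², so a raw value that looks out of range is expected.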

brodermind · 2024-06-25T23:29:36Z

Is the action passed to def _reward(self, action: Action) -> float: also unscaled? I want to compute rewards from the actual action. I also added print(self.controlled_vehicle.action) to def _reward(self, action: Action) -> float:, but got
AttributeError: 'HighwayEnv' object has no attribute 'controlled_vehicle'. How can I get the actual action inside def _reward(self, action: Action) -> float:?

brodermind · 2024-06-28T13:47:29Z

In addition, I added print("ego acc: {}, speed: {}, acc: {}".format(self.vehicle.action, self.vehicle.speed, action[0])) to def _reward(self, action: Action) -> float: and got:
ego acc: {'acceleration': -9.143099665641785, 'steering': -0.0170410996816317}, speed: 24.390460022290547, acc: -0.9047888517379761
ego acc: {'acceleration': -8.94046038389206, 'steering': -0.006646797551511874}, speed: 23.794429330031075, acc: -0.8822733759880066
ego acc: {'acceleration': -8.713669419288635, 'steering': 0.00408259474217243}, speed: 23.213518035411834, acc: -0.8570743799209595
ego acc: {'acceleration': -8.462130784988403, 'steering': 0.011589494497536434}, speed: 22.649375983079274, acc: -0.8291256427764893
ego acc: {'acceleration': -8.193395435810089, 'steering': 0.01602412584630364}, speed: 22.103149620691934, acc: -0.7992661595344543
ego acc: {'acceleration': -7.89635956287384, 'steering': 0.018041595207714756}, speed: 21.576725649833676, acc: -0.7662621736526489
ego acc: {'acceleration': -7.466260373592377, 'steering': 0.01743891977243528}, speed: 21.07897495826085, acc: -0.7184733748435974
ego acc: {'acceleration': -7.3168264627456665, 'steering': 0.017414108681810925}, speed: 20.59118652741114, acc: -0.7018696069717407
ego acc: {'acceleration': -7.18576192855835, 'steering': 0.020717354298106838}, speed: 20.112135732173915, acc: -0.6873068809509277

So 'action' is the unscaled action, and 'self.vehicle.action' is the scaled one. However, self.vehicle.speed does not seem to use the scaled value: it looks like it is computed as self.vehicle.speed + action[0], not self.vehicle.speed + self.vehicle.action["acceleration"]. Is that right?
@eleurent
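The printed values can be checked against both hypotheses. The sketch below assumes the vehicle integrates speed with a forward-Euler step of dt = 1 / simulation_frequency; that is an assumption about the kinematics, not something confirmed in this thread. The numbers are copied from the log above.

```python
# simulation_frequency comes from the config in this issue; the Euler step is an assumption.
simulation_frequency = 15
dt = 1.0 / simulation_frequency

# (scaled acceleration, speed after the step, raw action[0]) copied from the log above.
log = [
    (-9.143099665641785, 24.390460022290547, -0.9047888517379761),
    (-8.94046038389206, 23.794429330031075, -0.8822733759880066),
    (-8.713669419288635, 23.213518035411834, -0.8570743799209595),
    (-8.462130784988403, 22.649375983079274, -0.8291256427764893),
]

errors_scaled = []  # mismatch if speed += scaled_acceleration * dt
errors_raw = []     # mismatch if speed += action[0]

# Compare each speed change with what each hypothesis predicts.
for (_, prev_speed, _), (scaled_acc, speed, raw_acc) in zip(log, log[1:]):
    delta = speed - prev_speed
    errors_scaled.append(abs(delta - scaled_acc * dt))
    errors_raw.append(abs(delta - raw_acc))

print("max error, scaled acceleration * dt:", max(errors_scaled))
print("min error, raw action[0]:", min(errors_raw))
```

On these rows the speed change matches the scaled acceleration times dt to floating-point precision, while adding action[0] directly is off by more than 0.2 m/s per step. Under the stated assumption, that points to the scaled acceleration being the one applied.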
