Highway-Env: Autonomous Driving Simulator
- Highway-Env is a Python library offering simulated environments for autonomous driving research with diverse traffic scenarios and deep reinforcement learning integration.
- It features a modular, configurable architecture with road networks, vehicle models, and rule-based traffic agents to evaluate decision-making and control algorithms.
- Recent enhancements, such as the ComplexRoads extension and advanced reward-shaping frameworks, facilitate robust, efficient DRL experiments and performance benchmarking.
The Highway-Env package is a Python library compatible with OpenAI Gym and Gymnasium, providing simulated environments for autonomous driving research with a particular focus on deep reinforcement learning (DRL). Highway-Env encapsulates diverse traffic scenarios relevant to automated driving, offers configurable observation and action spaces, and serves as a benchmark for evaluating decision-making and control algorithms. Recent developments have introduced substantial extensions, notably through scenario composition (ComplexRoads) and integration with world model architectures, broadening the scope of experimental possibilities for safe, efficient, and robust driving policy research (Dong et al., 2023, Dat et al., 4 Jan 2026).
1. Architecture and Scenario Modeling
Highway-Env employs a modular class-based design where all environments subclass gym.Env or gymnasium.Env within the highway_env.envs namespace. Each scenario—such as multi-lane highways, merges, roundabouts, intersections, racetracks, and parking—corresponds to a distinct environment class, e.g., HighwayEnv, MergeEnv, RoundaboutEnv. The primary architectural components are:
- RoadNetwork: Represents the traffic infrastructure as a directed graph of lane segments and waypoints (sequences of position and heading states), configurable via JSON or Python dictionaries.
- Vehicle models: Employ a kinematic bicycle model with state vector $(x, y, v, \psi)$, where $(x, y)$ is position, $v$ is speed, and $\psi$ is heading. The dynamic update is governed by:

$$\dot{x} = v \cos(\psi + \beta), \quad \dot{y} = v \sin(\psi + \beta), \quad \dot{v} = a, \quad \dot{\psi} = \frac{v}{l} \sin \beta, \quad \beta = \arctan\!\left(\tfrac{1}{2} \tan \delta\right),$$

where $a$ is the acceleration command, $\delta$ is the steering angle, $l$ is the wheelbase, and $\beta$ is the slip angle at the center of gravity.
- Traffic Agents: Support rule-based controllers for background vehicles, implementing basic lane-following and collision avoidance.
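The kinematic bicycle update can be sketched as a single Euler integration step; the wheelbase and timestep values below are illustrative assumptions, not package defaults:

```python
import math

def bicycle_step(x, y, v, psi, a, delta, wheelbase=5.0, dt=0.1):
    """One Euler step of the kinematic bicycle model.

    a: acceleration command (m/s^2), delta: steering angle (rad).
    """
    beta = math.atan(0.5 * math.tan(delta))        # slip angle at center of gravity
    x += v * math.cos(psi + beta) * dt
    y += v * math.sin(psi + beta) * dt
    psi += (v / wheelbase) * math.sin(beta) * dt   # heading update
    v += a * dt                                    # longitudinal update
    return x, y, v, psi
```

With zero steering and zero acceleration, the vehicle simply advances along its heading, which is a quick sanity check for any reimplementation.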
Action spaces can be discrete (e.g., grid over steering and acceleration for DQN-style agents) or continuous (Box spaces covering steering and throttle/brake). Observations typically use a floating-point buffer encoding the ego vehicle and neighboring vehicles’ kinematic states, but image-based alternatives (such as Bird’s-Eye-View RGB frames) and LIDAR-like encodings are also supported.
Configuration of the environments is handled via dictionaries at instantiation, where all parameters—including road and traffic density, simulation frequency, observation modalities, and reward schemes—can be specified:
```python
env = gym.make("highway-v0", config={
    "lanes_count": 4,
    "vehicles_count": 20,
    "duration": 40,
    "observation": {"type": "Kinematics", "vehicles_count": 8},
    "action": {"type": "ContinuousAction"},
    # ...
})
```
2. Core Extensions: ComplexRoads and Additional Features
Recent work has introduced substantive extensions aimed at increasing task diversity and measurement fidelity. Most notably, the ComplexRoads environment (highway_env.envs.complex_roads.ComplexRoads) composes multiple sub-scenarios into a single compound network:
- Two 4-lane highway merge segments
- Two four-way intersections (traffic lights disabled)
- Two roundabouts
- Interconnecting straight lanes
ComplexRoads inherits from the base abstractions while extending the RoadNetwork to concatenate sub-networks with transition edges. Spawn location, submap counts, vehicle density, and randomized starting positions can be configured:
```python
gym.envs.registration.register(
    id="ComplexRoads-v0",
    entry_point="highway_env.envs.complex_roads:ComplexRoads",
    kwargs={"config": {
        "submap_counts": {"merge": 2, "intersection": 2, "roundabout": 2, "straight": 4},
        "vehicle_density": 10,
        "random_start": True,
        "use_signed_lane_obs": True,
        "duration": 1000,
    }},
    max_episode_steps=1000,
)
```
Extended feature set:
- Signed distance to the lane center, enabling direction-aware lane offset measurement.
- Signed Lane Heading Difference (LHD), representing heading misalignment in $[-\pi, \pi]$.
- OffRoadWrapper: penalizes off-road excursions and enforces lane reentry.
- PerformanceLogger: captures granular metrics (speed, jerk, distance, collisions, on-lane duration) and exports them in CSV format.
- use_signed_lane_obs: config flag to enable richer observation vectors. (Dong et al., 2023)
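The wrapper pattern can be illustrated with a minimal delegating sketch; the class name, the on_road info key, and the penalty value are assumptions for illustration, not the extension's exact API:

```python
class OffRoadPenalty:
    """Minimal sketch of an OffRoadWrapper-style reward shaper.

    Assumes the wrapped env reports an "on_road" flag in its step info.
    """
    def __init__(self, env, penalty=-1.0):
        self.env = env
        self.penalty = penalty

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Penalize timesteps spent off the drivable surface.
        if not info.get("on_road", True):
            reward += self.penalty
        return obs, reward, terminated, truncated, info

    def __getattr__(self, name):
        # Delegate everything else (reset, render, spaces) to the base env.
        return getattr(self.env, name)
```

In practice one would subclass gymnasium.Wrapper instead of hand-rolling delegation; the plain class keeps the sketch dependency-free.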
3. Custom Reward Function Design
Reward structures are fundamental to DRL in Highway-Env and have been systematically factorized into multiplicative components to drive on-lane, efficient, and safe behaviors.
Let $d$ be the signed lateral offset, $\Delta\psi$ the signed lane heading difference, and $v$ the ego speed. Each quantity is mapped to a bounded reward factor, typically taking values in $[0, 1]$: a lane-keeping term $r_{\text{lane}}(d)$ that peaks when the vehicle is centered, a heading term $r_{\text{heading}}(\Delta\psi)$ that peaks when aligned with the lane, and a speed term $r_{\text{speed}}(v)$ normalized over the target speed range.

Combined on-lane reward:

$$r = r_{\text{lane}}(d) \cdot r_{\text{heading}}(\Delta\psi) \cdot r_{\text{speed}}(v).$$

Crashes override the above with a fixed collision penalty.
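One plausible instantiation of such a multiplicative scheme is sketched below; the Gaussian shaping, parameter values, and the crash penalty of -1.0 are illustrative assumptions, not the papers' exact formulas:

```python
import math

def on_lane_reward(d, dpsi, v, v_min=20.0, v_max=30.0,
                   sigma_d=1.0, sigma_psi=0.5, crashed=False):
    """Multiplicative on-lane reward: each factor lies in [0, 1]."""
    if crashed:
        return -1.0                                   # fixed collision penalty
    r_lane = math.exp(-(d / sigma_d) ** 2)            # 1 when centered in lane
    r_heading = math.exp(-(dpsi / sigma_psi) ** 2)    # 1 when aligned with lane
    r_speed = min(max((v - v_min) / (v_max - v_min), 0.0), 1.0)
    return r_lane * r_heading * r_speed
```

Because the factors multiply, the agent earns high reward only when it is simultaneously centered, aligned, and fast, which is the point of the factorization.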
Reward shaping in other recent studies (e.g., (Dat et al., 4 Jan 2026)) further includes explicit penalties for collisions (with magnitudes that vary by scenario) and safe-distance violations, and supplements these with lane-changing, heading-alignment, and survival incentives.
These decompositions allow for precise balancing of efficiency, safety, comfort, and maneuver completion in complex, multi-modal traffic environments (Dong et al., 2023, Dat et al., 4 Jan 2026).
4. Integration with Deep Reinforcement Learning and World Models
Highway-Env is designed to interface seamlessly with DRL toolkits such as Stable-Baselines (v2/v3) and supports workflows for value-based (e.g., DQN) and policy-gradient (e.g., TRPO) algorithms. Key procedures as implemented in (Dong et al., 2023):
- Environment instantiation, action/observation space validation.
- Training loops with Stable-Baselines for DQN and TRPO, supporting both discrete and continuous control.
Example DQN setup:
```python
from stable_baselines3 import DQN

model = DQN(
    policy="MlpPolicy",
    env=env,
    learning_rate=1e-4,
    buffer_size=50000,
    # ...
)
model.learn(total_timesteps=100_000)
model.save("dqn_complex")
```
For world model research, e.g., (Dat et al., 4 Jan 2026), the following pipeline is used:
- Bird’s-Eye-View RGB frames (64×64) as primary observations.
- Sequence batching via FIFO episode queues.
- JEPA (Joint Embedding Predictive Architecture) with Vision Masked Autoencoding and a DreamerV3-style RSSM (Recurrent State-Space Model) for latent planning.
- High-level discrete action layers where required, unified at the interface with Highway-Env.
- Policy inference via latent actor–critic heads taking concatenated deterministic and stochastic states.
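The sequence-batching step above can be sketched as a FIFO episode queue that evicts the oldest episodes and samples fixed-length sub-sequences for the dynamics model; the class name, capacity, and sequence length are illustrative assumptions:

```python
from collections import deque
import random

class EpisodeReplayQueue:
    """FIFO episode storage with fixed-length sub-sequence sampling."""
    def __init__(self, capacity=100, seq_len=16):
        self.episodes = deque(maxlen=capacity)  # oldest episodes evicted first
        self.seq_len = seq_len

    def add_episode(self, transitions):
        # Only keep episodes long enough to yield a full training sequence.
        if len(transitions) >= self.seq_len:
            self.episodes.append(transitions)

    def sample_batch(self, batch_size):
        batch = []
        for _ in range(batch_size):
            ep = random.choice(self.episodes)
            start = random.randrange(len(ep) - self.seq_len + 1)
            batch.append(ep[start : start + self.seq_len])
        return batch
```

Fixed-length contiguous sub-sequences are what an RSSM-style model consumes, since its recurrent state is unrolled over time within each sampled window.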
This workflow facilitates the study of sample efficiency, model-based imagination rollouts, and evaluation of planning policies with explicit safety- and comfort-conditioning (Dong et al., 2023, Dat et al., 4 Jan 2026).
5. Evaluation Metrics and Experimental Protocols
Performance metrics captured by wrappers such as PerformanceLogger in ComplexRoads and by reward-logger modules in world model pipelines include:
- On-lane accuracy: proportion of time within legal lane boundaries.
- Efficiency: typically measured by mean ego-vehicle speed normalized to the configured target speed range.
- Safety: collision rate over episodes, also tracked per scenario (e.g., on highway-v0 for HanoiWorld relative to baselines in (Dat et al., 4 Jan 2026)).
- Comfort: quantified by jerk and sudden accelerations/braking rates.
- Success/failure rates: especially for maneuvers such as merges and lane changes.
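A comfort metric of the kind listed above can be computed from a logged speed trace by finite differences; this is a simple proxy, not the logger's exact formula:

```python
def mean_abs_jerk(speeds, dt):
    """Mean absolute jerk (m/s^3) from a speed trace sampled every dt seconds."""
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    jerks = [(a1 - a0) / dt for a0, a1 in zip(accels, accels[1:])]
    return sum(abs(j) for j in jerks) / len(jerks)
```

A perfectly smooth (constant-acceleration) trace yields zero jerk, while abrupt braking or throttle changes inflate the metric.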
Benchmarking protocols often involve:
- Fixed random seeds for reproducibility.
- Batch and sequence sizes set according to dynamics model requirements.
- Training for 100,000 steps (for DreamerV3, DQN, and TRPO baselines), with rapid convergence (as few as 5,000 steps for HanoiWorld when leveraging pretrained encoders).
- Episode randomization for robustness, including traffic density and spawn state variation. (Dong et al., 2023, Dat et al., 4 Jan 2026)
6. Best Practices, Limitations, and Recent Developments
Recommended procedures include:
- Setting use_signed_lane_obs for improved situational awareness.
- Adjusting vehicle_density and collision penalties to tune the exploration–safety tradeoff.
- Using fine discretization grids over steering and acceleration actions for DQN.
- Logging and exporting per-episode and per-timestep metrics for post-hoc analysis.
- Monitoring trust region thresholds (e.g., max_kl ≈ 0.01–0.02 for TRPO) to avoid policy collapse.
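A fine action discretization for a DQN head can be built as a Cartesian product over the two continuous controls; the grid sizes and control ranges below are illustrative assumptions:

```python
import itertools

def action_grid(n_steer=5, n_accel=5,
                steer_range=(-0.5, 0.5), accel_range=(-3.0, 3.0)):
    """Discrete (steering, acceleration) grid for a DQN-style agent."""
    def linspace(lo, hi, n):
        step = (hi - lo) / (n - 1)
        return [lo + i * step for i in range(n)]
    steer = linspace(*steer_range, n_steer)
    accel = linspace(*accel_range, n_accel)
    # Each grid cell becomes one discrete action index for the Q-network.
    return list(itertools.product(steer, accel))
```

Finer grids improve control resolution at the cost of a larger discrete action space, which slows Q-value learning; the tradeoff is tuned empirically.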
Recent research highlights increased sample efficiency via pretrained world model encoders (e.g., V-JEPA-2) and robust safety behaviors induced by strong regularization during training. However, edge cases (such as "reward hacks" during merging maneuvers) reveal that further improvements in reward shaping and scenario design are necessary (Dat et al., 4 Jan 2026).
A plausible implication is that the extensible design of Highway-Env—with comprehensive configuration, advanced wrapper support, and compatibility with both model-free and model-based RL architectures—facilitates research not only on technical driving competence but also on emergent, safety-critical, and social driving negotiation behaviors in multi-agent contexts (Dong et al., 2023, Dat et al., 4 Jan 2026).
References:
- Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers (Dong et al., 2023)
- HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller (Dat et al., 4 Jan 2026)