RL Policy & Tron1

Testing different RL policies using Tron1

Demo available here. Note: registration required.

Overview

Tron1 is a multi-modal biped robot that can be used for humanoid RL research. Much of its software is open source and can be found here.

In this project, we evaluate Tron1 using two reinforcement learning policies under two complementary conditions:

Movement tests, which assess task execution under explicit motion commands
Idle drift tests, which assess passive stability when no motion command is issued

Together, these experiments allow us to study both active locomotion performance and intrinsic stability characteristics of the policies.

Policies under test

We compare two reinforcement learning policies that share the same high-level objective (bipedal locomotion) but differ in training environment and design assumptions:

isaacgym
A policy trained using NVIDIA Isaac Gym, emphasizing fast, large-scale simulation and efficient optimization. These policies typically prioritize robust execution of commanded motions under simplified or tightly controlled dynamics.
isaaclab
A policy trained using Isaac Lab, emphasizing modularity, richer task abstractions, and closer alignment with downstream robotics workflows. This often introduces additional internal structure and constraints in the policy.

Implementation details and training setups are available at:

Movement test: policy comparison under commanded motion

We first run a movement test where the robot is asked to move forward and rotate relative to its starting pose. This evaluates how well each policy executes a simple but non-trivial locomotion task.

The test itself is straightforward:

Move forward 5 meters
Rotate 150 degrees
Complete the task within a fixed timeout
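
The three pass criteria above can be sketched as a small check on the start and end poses. This is an illustrative sketch, not the project's test code: the function names, the (x, y, yaw_deg) pose layout, the tolerances, and the choice to measure distance as the straight-line displacement from the start are all assumptions.

```python
import math


def wrap_angle_deg(angle_deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def check_move_face(start, end, forward_m, dyaw_deg, elapsed_s, timeout_s,
                    pos_tol_m=0.5, yaw_tol_deg=15.0):
    """Compare an achieved motion with the commanded one.

    start, end: (x_m, y_m, yaw_deg) poses (hypothetical layout).
    pos_tol_m, yaw_tol_deg: hypothetical tolerances, not from the project.
    """
    x0, y0, yaw0 = start
    x1, y1, yaw1 = end
    # Straight-line displacement between the start and end positions.
    travelled_m = math.hypot(x1 - x0, y1 - y0)
    # Signed heading error, wrapped so that e.g. 350 degrees reads as -10.
    yaw_error_deg = wrap_angle_deg((yaw1 - yaw0) - dyaw_deg)
    return {
        "distance_ok": abs(travelled_m - forward_m) <= pos_tol_m,
        "yaw_ok": abs(yaw_error_deg) <= yaw_tol_deg,
        "time_ok": elapsed_s <= timeout_s,
    }


if __name__ == "__main__":
    result = check_move_face(
        start=(0.0, 0.0, 0.0), end=(-4.33, 2.5, 150.0),
        forward_m=5.0, dyaw_deg=150, elapsed_s=30.0, timeout_s=35,
    )
    print(result)
```

Wrapping the heading error matters here because a 150 degree turn ends close to the ±180 degree boundary, where a naive subtraction can report an error of nearly a full turn.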

Parameterizing the test

Using the artefacts.yaml file, the movement test is configured as follows:

policy_test:
  type: test
  runtime:
    framework: ros2:jazzy
    simulator: gazebo:harmonic
  scenarios:
    defaults:
      pytest_file: test/art_test_move.py
      output_dirs: ["test_report/latest/", "output"]
    settings:
      - name: move_face_with_different_policies
        params:
          rl_type: ["isaacgym", "isaaclab"]
          move_face:
            - {name: "dyaw_150+fw_5m", forward_m: 5.0, dyaw_deg: 150, hold: false, timeout_s: 35}

Key points from above:

The test is run with pytest, so pytest_file points to our test file.
Artefacts looks in the two folders listed in output_dirs for files to upload to the dashboard.
We have two sets of parameters:
1. rl_type: two values, "isaacgym" and "isaaclab"
2. move_face: one parameter set specifying forward distance (5 m), rotation angle (150 degrees), and timeout (35 seconds)

The test runs twice, once with the isaacgym policy and once with the isaaclab policy. Both runs use the same move_face parameter set to determine how far the robot should move and how much it should rotate.
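
To make the run count concrete, the sketch below expands a parameter grid into one run per combination. It only illustrates the arithmetic described above (2 policies × 1 move_face set = 2 runs). It is not how Artefacts implements the expansion, and expand_scenarios is a hypothetical helper name.

```python
from itertools import product

# Parameter lists copied from the movement test configuration.
move_params = {
    "rl_type": ["isaacgym", "isaaclab"],
    "move_face": [
        {"name": "dyaw_150+fw_5m", "forward_m": 5.0, "dyaw_deg": 150,
         "hold": False, "timeout_s": 35},
    ],
}


def expand_scenarios(params):
    """Return one dict per combination of the listed parameter values."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_scenarios(move_params)

if __name__ == "__main__":
    for run in runs:
        print(run["rl_type"], run["move_face"]["name"])
```

The same expansion applied to a grid with two policies and two durations would give four runs, one per (policy, duration) pair.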

Results: isaacgym

When using the isaacgym policy, we can see (from a birdseye recording) the robot successfully rotating and then moving forward:

The dashboard notes the test as a success:

A CSV we created during the test, recording the ground truth movement, is automatically converted to an easy-to-read chart by the dashboard.

Results: isaaclab

With the isaaclab policy, we see there is still work to be done. The dashboard notes the test as a fail (and shows us the failing assertion), and the birdseye video shows the robot failing to reach its goal.

We also have a CSV (automatically converted to a chart) of the estimated trajectory (i.e. what the robot thinks it has done), which we can see differs widely from the ground truth:

Idle drift comparison (stationary robot)

The movement test evaluates task execution under command. To complement this, we perform an idle drift test, where the robot receives no motion command at all.

After initializing the robot in a neutral standing pose, no velocity, pose, or locomotion commands are issued. The policy continues running normally, and any observed motion is therefore uncommanded.

This test isolates passive stability behavior, independent of task execution.

Test setup and durations

Idle drift is evaluated at two time scales:

10 seconds, capturing immediate transients and short-term controller bias
60 seconds, capturing slow accumulation effects such as yaw creep or gradual planar drift

Using both durations allows us to distinguish between short-term stability and long-term equilibrium behavior.

Parameterization

policy_drift:
  type: test
  runtime:
    framework: ros2:jazzy
    simulator: gazebo:harmonic
  scenarios:
    defaults:
      output_dirs: ["test_report/latest/", "output"]
      metrics: "output/metrics.json"
      pytest_file: test/art_test_drift.py
    settings:
      - name: idle_drift_compare_policies
        params:
          rl_type: ["isaacgym", "isaaclab"]
          durations_s: [10, 60]

Key points from above:

The test is executed using pytest
Each run corresponds to one (policy, duration) pair
Metrics are written to metrics.json and automatically displayed in the Artefacts dashboard

Visual results

For each duration, the two policies are compared side-by-side using birdseye recordings.

10 second idle test

isaacgym	isaaclab

60 second idle test

isaacgym	isaaclab

Metrics and evaluation

For each idle drift run, the following ground truth metrics are reported:

Duration – actual elapsed runtime of the test
X_final, Y_final – final planar displacement relative to the start
XY_final – total planar drift magnitude
Yaw_final_deg – accumulated yaw drift
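
As a rough illustration of how these quantities relate to the ground truth pose stream, the sketch below derives them from a list of timestamped poses. The sample layout (t_s, x_m, y_m, yaw_rad) and the function name are assumptions. "Accumulated yaw" is interpreted here as the sum of wrapped heading steps, which may differ from the project's actual computation.

```python
import json
import math


def compute_drift_metrics(samples):
    """Summarize idle drift from time-ordered (t_s, x_m, y_m, yaw_rad) samples."""
    t0, x0, y0, yaw_prev = samples[0]
    yaw_total = 0.0
    for _, _, _, yaw in samples[1:]:
        # Shortest signed step between consecutive headings, so crossing the
        # +/-pi boundary does not register as a jump of a full turn.
        delta = yaw - yaw_prev
        yaw_total += math.atan2(math.sin(delta), math.cos(delta))
        yaw_prev = yaw
    t1, x1, y1, _ = samples[-1]
    dx, dy = x1 - x0, y1 - y0
    return {
        "Duration": t1 - t0,
        "X_final": dx,
        "Y_final": dy,
        "XY_final": math.hypot(dx, dy),
        "Yaw_final_deg": math.degrees(yaw_total),
    }


if __name__ == "__main__":
    # Made-up poses purely to exercise the function; not measured data.
    demo = [(0.0, 0.0, 0.0, 3.0), (5.0, 0.03, 0.0, 3.1), (10.0, 0.03, 0.04, -3.1)]
    print(json.dumps(compute_drift_metrics(demo), indent=2))
```

A flat dictionary like this serializes directly to JSON. The exact layout Artefacts expects in metrics.json should be checked against its documentation.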

An example metrics panel from the dashboard is shown below:

These metrics provide a compact quantitative summary that complements the visual observations.

Trajectory plots (60 second idle test)

In addition to scalar metrics, the dashboard provides interactive planar trajectory plots derived from ground truth pose data.

Global trajectory view

isaacgym	isaaclab

isaacgym exhibits a pronounced curved drift trajectory.
isaaclab shows a more linear overall drift direction.

Zoomed-in trajectory view

isaacgym	isaaclab

The zoomed-in view reveals fine-scale structure:

oscillatory behavior for isaacgym,
smaller but irregular deviations for isaaclab.

These patterns are difficult to see in videos alone but become clear when inspecting trajectory data directly.
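
One simple way to put a number on the difference between steady drift and oscillation is to compare the total path length with the net displacement. This metric is not part of the project's reported results, and path_stats is a hypothetical helper. It is included only as a sketch of how trajectory data can be inspected directly.

```python
import math


def path_stats(xy):
    """Compare distance travelled along a path with its net displacement.

    xy: time-ordered list of (x_m, y_m) ground truth positions.
    A ratio near 1 indicates steady drift in one direction; a larger ratio
    indicates back-and-forth motion that mostly cancels out.
    """
    path_length = sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))
    net = math.dist(xy[0], xy[-1])
    ratio = path_length / net if net > 0 else math.inf
    return {"path_length_m": path_length, "net_displacement_m": net, "ratio": ratio}


if __name__ == "__main__":
    # Made-up paths purely to exercise the function; not measured data.
    print(path_stats([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]))
    print(path_stats([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]))
```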

What we learn from these tests

The movement and idle drift experiments are complementary:

The movement test evaluates task execution under explicit command
The idle drift test evaluates passive stability in the absence of command

By combining both, we obtain a clearer picture of policy behavior under both active and idle conditions, helping distinguish between execution errors and intrinsic stability characteristics.

Data available after the tests

For both movement and idle drift experiments, Artefacts provides access to:

ROSbag recordings
Video of the active robot in Gazebo, both birdseye and first person
Stdout and stderr logs
Debug logs
CSV of the trajectory (estimated) automatically displayed as a graph in the dashboard
CSV of the trajectory (ground truth) automatically displayed as a graph in the dashboard
Metrics summaries for quantitative comparison

Artefacts Toolkit Helpers

For this project, we used the following helpers from the Artefacts Toolkit:

get_artefacts_params: used to select the RL policy and test parameters
extract_video: used to generate videos from recorded rosbags

Last modified February 20, 2026: Hugo 0.155.3 cd (#119) (18a870f)