TART: Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation

Hoseong Jung¹, Sungil Son¹,³, Daesol Cho², Jonghae Park¹, Changhyun Choi¹, H. Jin Kim¹*
¹Seoul National University, ²Georgia Institute of Technology, ³Life Assistant Robotics
* Corresponding author
ICRA 2026

TL;DR: TART learns temporally grounded hybrid-action representations via mutual-information contrastive learning and vector quantization, enabling resource-aware discrete decisions to condition diverse, context-appropriate continuous maneuvers.

TART outperforms hybrid-action baselines across maze navigation and air-to-air combat

TART achieves an average success rate of 87.2%, compared with 77.4% for the strongest baseline (HyAR). Its per-scenario margin over HyAR ranges from +2.7 to +13.2 percentage points across the six evaluation scenarios.

Qualitative results in maze navigation and air-to-air combat

Overall Framework of TART

TART maximizes a mutual information objective via trajectory-level contrastive learning, quantizes context embeddings into a tactical codebook, and conditions a hybrid policy to produce multi-modal maneuver distributions.

Overview of the TART framework

Abstract

Motivation

Autonomous robotic systems must coordinate discrete resource usage with continuous maneuvers under limited budgets, especially in fast-evolving tactical scenarios. Prior hybrid-action RL methods often neglect two critical properties: causal dependency between resource decisions and subsequent maneuvers, and the multi-modality of valid follow-up behaviors under the same discrete choice.

Approach

We propose TART (Temporal Action Representation learning for Tactical resource control and subsequent maneuver generation), a framework that learns temporally grounded representations for hybrid action policies by modeling the conditional distribution of continuous maneuvers given recent state history and the current discrete resource decision.
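One way to write this conditional structure down is sketched below. The notation is an assumption made for illustration and is not taken from the paper: h_t is the recent state history, k_t the discrete resource decision, x_t the continuous maneuver, and z one of K latent tactical modes.

```latex
\pi(k_t, x_t \mid h_t) \;=\; \pi_d(k_t \mid h_t)\,\pi_c(x_t \mid h_t, k_t),
\qquad
\pi_c(x_t \mid h_t, k_t) \;=\; \sum_{z=1}^{K} p(z \mid h_t, k_t)\,\pi_c(x_t \mid h_t, k_t, z)
```

Under this reading, conditioning the continuous factor on k_t expresses the dependency of the maneuver on the resource decision, and the sum over z allows several distinct maneuver modes for the same decision.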

Method

TART maximizes a mutual information objective using an InfoNCE contrastive loss that aligns matched context–future trajectory pairs. The resulting context embeddings are vector-quantized into a compact codebook of tactical modes, which condition a factorized hybrid policy: discrete actions select a tactical mode, and the continuous actor generates a multi-modal maneuver distribution under that mode.
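The two ingredients named above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the dot-product similarity, the temperature value, the nearest-neighbor codebook lookup, and all function and variable names are assumptions chosen for illustration, and the training details (encoders, codebook updates, policy networks) are omitted.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(contexts, futures, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    contexts[i] and futures[i] are the matched (positive) pair; every other
    future in the batch serves as a negative for context i.
    """
    n = len(contexts)
    total = 0.0
    for i in range(n):
        logits = [dot(contexts[i], futures[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]  # negative log-softmax of the positive
    return total / n


def quantize(embedding, codebook):
    """Return (index, vector) of the codebook entry nearest in squared L2 distance."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(embedding, code))
    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]


# Toy embeddings (hypothetical values, not data from the paper).
contexts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
futures_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
futures_shuffled = [futures_aligned[1], futures_aligned[2], futures_aligned[0]]
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

loss_aligned = info_nce(contexts, futures_aligned)
loss_shuffled = info_nce(contexts, futures_shuffled)
mode_index, mode_vector = quantize([0.8, 0.3], codebook)

print(loss_aligned < loss_shuffled)  # matched pairs give the lower loss
print(mode_index)                    # index of the selected tactical mode
```

In this toy setup the loss is small when each context is paired with its own future and large when the pairing is shuffled, which is the alignment pressure the contrastive term provides. The quantization step maps a continuous context embedding to one discrete code, which would then be passed to the policy as the tactical mode.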

Results

We evaluate TART in two resource-limited domains: (i) budgeted maze navigation with discrete Boost and Penetration options, and (ii) high-fidelity F-16 air-to-air combat with missile and defense system deployment. TART consistently outperforms PADDPG, PDQN, HPPO, and HyAR in success rate while maintaining or improving resource efficiency, measured by time-to-goal (TTG), time-to-elimination (TTE), and shots-per-elimination (SPE). These results demonstrate effective temporal coupling between resource control and subsequent maneuvers.

Experiments

Evaluation environments

TART is evaluated in two budgeted hybrid-action domains with three difficulty levels each: maze navigation (Boost, Penetration) and F-16 air-to-air combat (Missile, Gun, Defense).

Overview of maze navigation and air-to-air combat environments

Performance comparison

TART outperforms all parameterized-action MDP (PAMDP) baselines in every scenario. Success rates (%) are averaged over five random seeds.

| Task / Scenario | TART | PADDPG | PDQN | HPPO | HyAR |
| --- | --- | --- | --- | --- | --- |
| Maze Navigation / Easy | 97.2±1.3 | 88.2±6.9 | 85.8±6.1 | 87.2±4.6 | 94.5±3.2 |
| Maze Navigation / Medium | 90.8±4.2 | 74.2±6.1 | 68.8±6.2 | 74.6±9.0 | 80.6±8.5 |
| Maze Navigation / Hard | 72.8±9.9 | 38.8±10.5 | 38.4±13.5 | 46.2±6.7 | 60.4±9.1 |
| Air-to-Air Combat / Easy | 94.8±3.1 | 79.6±5.6 | 81.2±5.9 | 87.8±4.6 | 86.6±4.8 |
| Air-to-Air Combat / Medium | 90.8±4.2 | 68.6±5.9 | 74.2±4.3 | 76.2±4.5 | 77.6±6.3 |
| Air-to-Air Combat / Hard | 76.8±5.0 | 61.8±7.8 | 57.2±7.4 | 65.4±3.0 | 64.4±7.2 |
| Average Success Rate | 87.2 | 68.5 | 67.6 | 72.9 | 77.4 |
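The summary row and the headline margins follow directly from the per-scenario means. The short check below recomputes them from the table values. It uses `decimal` so that one-decimal rounding matches the reported figures, and rounding half up is an assumption about how the averages were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-scenario mean success rates (%), in table order:
# Maze Easy, Maze Medium, Maze Hard, Air Combat Easy, Air Combat Medium, Air Combat Hard.
SUCCESS = {
    "TART":   ["97.2", "90.8", "72.8", "94.8", "90.8", "76.8"],
    "PADDPG": ["88.2", "74.2", "38.8", "79.6", "68.6", "61.8"],
    "PDQN":   ["85.8", "68.8", "38.4", "81.2", "74.2", "57.2"],
    "HPPO":   ["87.2", "74.6", "46.2", "87.8", "76.2", "65.4"],
    "HyAR":   ["94.5", "80.6", "60.4", "86.6", "77.6", "64.4"],
}


def mean_one_decimal(values):
    """Average a list of decimal strings and round half up to one decimal place."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


averages = {method: mean_one_decimal(vals) for method, vals in SUCCESS.items()}
gains_over_hyar = [
    Decimal(t) - Decimal(h) for t, h in zip(SUCCESS["TART"], SUCCESS["HyAR"])
]

print({m: str(a) for m, a in averages.items()})
# {'TART': '87.2', 'PADDPG': '68.5', 'PDQN': '67.6', 'HPPO': '72.9', 'HyAR': '77.4'}
print(min(gains_over_hyar), max(gains_over_hyar))  # 2.7 13.2
```

The smallest margin over HyAR is in Maze Navigation / Easy and the largest in Air-to-Air Combat / Medium. In the two air-combat scenarios where HPPO edges out HyAR (Easy and Hard), TART still leads by 7.0 and 11.4 points respectively.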

We ablate the contrastive loss L_NCE (temporal alignment) and the vector-quantization loss L_VQ (tactical mode diversity). Removing either component reduces the success rate in every scenario; the full model consistently performs best.

| Task / Scenario | TART (Ours) | TART w/o L_NCE | TART w/o L_VQ |
| --- | --- | --- | --- |
| Maze Navigation / Easy | 97.2±1.3 | 92.8±3.3 | 95.4±1.9 |
| Maze Navigation / Medium | 90.8±4.2 | 89.0±3.5 | 88.2±5.3 |
| Maze Navigation / Hard | 72.8±9.9 | 69.2±5.7 | 67.0±3.2 |
| Air-to-Air Combat / Easy | 94.8±3.1 | 92.4±5.1 | 91.8±2.2 |
| Air-to-Air Combat / Medium | 90.8±4.2 | 87.0±6.2 | 89.8±2.6 |
| Air-to-Air Combat / Hard | 76.8±5.0 | 73.6±6.2 | 70.8±2.1 |
| Average Success Rate | 87.2 | 84.0 | 83.8 |

In Hard maze navigation, the ablated models show higher TTG and Occupancy Coverage, reflecting delayed Penetration usage. In air combat, the ablations increase TTE and SPE under identical budgets, indicating degraded launch timing and weaker follow-up maneuvers. Variance is larger for TART w/o L_NCE (its reported spread exceeds that of TART w/o L_VQ in five of the six scenarios), suggesting that multi-modality without contrastive alignment leads to unstable mode drift.
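The variance remark can be checked against the ± values in the ablation table. The sketch below assumes those values are comparable spread measures across columns (the page does not state whether they are standard deviations or standard errors), and the dictionary keys are hypothetical labels for the three variants.

```python
# Reported ± spreads, in table order:
# Maze Easy, Maze Medium, Maze Hard, Air Combat Easy, Air Combat Medium, Air Combat Hard.
SPREAD = {
    "full":           [1.3, 4.2, 9.9, 3.1, 4.2, 5.0],
    "no_contrastive": [3.3, 3.5, 5.7, 5.1, 6.2, 6.2],
    "no_vq":          [1.9, 5.3, 3.2, 2.2, 2.6, 2.1],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Number of scenarios where removing the contrastive loss gives a wider spread
# than removing the VQ loss.
wider_count = sum(
    a > b for a, b in zip(SPREAD["no_contrastive"], SPREAD["no_vq"])
)
mean_spread = {name: round(mean(vals), 2) for name, vals in SPREAD.items()}

print(wider_count)  # 5
print(mean_spread)
```

The comparison holds against the variant without the VQ loss, with Maze Navigation / Medium as the single exception. It is less uniform against the full model, whose spread in Maze Navigation / Hard (9.9) is the largest single value in the table.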

Success rate gains do not come at the cost of resource efficiency. In maze navigation, TART achieves lower or comparable TTG (Time-to-Goal). In air-to-air combat, TART maintains lower or comparable TTE (Time-to-Elimination) and SPE (Shots-per-Elimination).

Resource efficiency metrics across environments

Causal dependency and multi-modality

Discrete resource actions constrain feasible follow-up maneuvers (causal dependency) and give rise to multiple valid maneuver modes (multi-modality). TART is designed to capture both properties through its temporal representation and VQ codebook.

Causal dependency and multi-modality in air combat

BibTeX

@inproceedings{jung2026temporal,
  author    = {Jung, Hoseong and Son, Sungil and Cho, Daesol and Park, Jonghae and Choi, Changhyun and Kim, H. Jin},
  title     = {Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation},
  booktitle = {International Conference on Robotics and Automation (ICRA)},
  year      = {2026},
}