TART: Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation

TART outperforms hybrid-action baselines across maze navigation and air-to-air combat

TART achieves an average success rate of 87.2%, compared with 77.4% for the strongest baseline (HyAR). Per-scenario gains over HyAR range from +2.7 to +13.2 percentage points across all six evaluation scenarios.

Qualitative results in maze navigation and air-to-air combat

Overall Framework of TART

TART maximizes a mutual information objective via trajectory-level contrastive learning, quantizes context embeddings into a tactical codebook, and conditions a hybrid policy to produce multi-modal maneuver distributions.

Abstract

Motivation

Autonomous robotic systems must coordinate discrete resource usage with continuous maneuvers under limited budgets, especially in fast-evolving tactical scenarios. Prior hybrid-action RL methods often neglect two critical properties: causal dependency between resource decisions and subsequent maneuvers, and the multi-modality of valid follow-up behaviors under the same discrete choice.

Approach

We propose TART (Temporal Action Representation learning for Tactical resource control and subsequent maneuver generation), a framework that learns temporally grounded representations for hybrid action policies by modeling the conditional distribution of continuous maneuvers given recent state history and the current discrete resource decision.

Method

TART maximizes a mutual information objective using an InfoNCE contrastive loss that aligns matched context–future trajectory pairs. The resulting context embeddings are vector-quantized into a compact codebook of tactical modes, which condition a factorized hybrid policy: discrete actions select a tactical mode, and the continuous actor generates a multi-modal maneuver distribution under that mode.
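As an illustration of the contrastive step described above, the following is a minimal, dependency-free sketch of an InfoNCE loss over a batch of context and future embeddings. It is not the authors' implementation: the function name `info_nce_loss`, the cosine-similarity scoring, the temperature of 0.1, and the random toy embeddings are all assumptions made for this example. Row i of the contexts is treated as the matched pair of row i of the futures, and every other row in the batch serves as a negative. For a batch of N pairs, log N minus this loss is the standard InfoNCE lower bound on the mutual information between context and future.

```python
import math
import random


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(contexts, futures, temperature=0.1):
    """Mean InfoNCE loss; contexts[i] and futures[i] form the matched pair.

    Hypothetical helper for illustration only (assumed scoring and temperature).
    """
    c = [_normalize(v) for v in contexts]
    f = [_normalize(v) for v in futures]
    total = 0.0
    for i, c_i in enumerate(c):
        # Similarity of context i to every future in the batch.
        logits = [sum(a * b for a, b in zip(c_i, f_j)) / temperature for f_j in f]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matched future (index i) as the positive.
        total += log_z - logits[i]
    return total / len(c)


# Toy data: 8 pairs of 16-dimensional embeddings (arbitrary sizes).
random.seed(0)
ctx = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
matched = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in ctx]
unrelated = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

loss_matched = info_nce_loss(ctx, matched)      # futures aligned with contexts
loss_unrelated = info_nce_loss(ctx, unrelated)  # futures carry no information
print(loss_matched, loss_unrelated)
```

In this toy setting the aligned pairs yield a much lower loss than the unrelated pairs, which is the behavior the objective rewards when context embeddings are predictive of the subsequent trajectory.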

Results

We evaluate TART in two resource-limited domains: (i) budgeted maze navigation with discrete Boost and Penetration options, and (ii) high-fidelity F-16 air-to-air combat with missile and defense system deployment. TART consistently outperforms PADDPG, PDQN, HPPO, and HyAR in success rate while maintaining or improving resource efficiency, measured by Time-to-Goal (TTG), Time-to-Elimination (TTE), and Shots-per-Elimination (SPE). These results demonstrate effective temporal coupling between resource control and subsequent maneuvers.

Experiments

Evaluation environments

TART is evaluated in two budgeted hybrid-action domains with three difficulty levels each: maze navigation (Boost, Penetration) and F-16 air-to-air combat (Missile, Gun, Defense).

Overview of maze navigation and air-to-air combat environments

TART outperforms all parameterized-action MDP (PAMDP) baselines in every scenario. Success rates (%) are averaged over five random seeds; the best result in each row is in the TART column.

Task / Scenario	TART	PADDPG	PDQN	HPPO	HyAR
Maze Navigation / Easy	97.2±1.3	88.2±6.9	85.8±6.1	87.2±4.6	94.5±3.2
Maze Navigation / Medium	90.8±4.2	74.2±6.1	68.8±6.2	74.6±9.0	80.6±8.5
Maze Navigation / Hard	72.8±9.9	38.8±10.5	38.4±13.5	46.2±6.7	60.4±9.1
Air-to-Air Combat / Easy	94.8±3.1	79.6±5.6	81.2±5.9	87.8±4.6	86.6±4.8
Air-to-Air Combat / Medium	90.8±4.2	68.6±5.9	74.2±4.3	76.2±4.5	77.6±6.3
Air-to-Air Combat / Hard	76.8±5.0	61.8±7.8	57.2±7.4	65.4±3.0	64.4±7.2
Average Success Rate	87.2	68.5	67.6	72.9	77.4
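The averages and the per-scenario gains over HyAR quoted on this page can be re-derived from the rows above. The short script below is only a sanity check on the reported numbers; the variable names are chosen for this example, and the values are copied from the table (mean success rates, ignoring the ± terms).

```python
# Mean success rates (%) copied from the table, ordered as:
# Maze Easy, Maze Medium, Maze Hard, Combat Easy, Combat Medium, Combat Hard.
success = {
    "TART":   [97.2, 90.8, 72.8, 94.8, 90.8, 76.8],
    "PADDPG": [88.2, 74.2, 38.8, 79.6, 68.6, 61.8],
    "PDQN":   [85.8, 68.8, 38.4, 81.2, 74.2, 57.2],
    "HPPO":   [87.2, 74.6, 46.2, 87.8, 76.2, 65.4],
    "HyAR":   [94.5, 80.6, 60.4, 86.6, 77.6, 64.4],
}

# Unweighted average over the six scenarios for each method.
averages = {name: sum(rates) / len(rates) for name, rates in success.items()}

# Per-scenario gain of TART over HyAR, in percentage points.
gains_vs_hyar = [
    round(tart - hyar, 1)
    for tart, hyar in zip(success["TART"], success["HyAR"])
]

# TART should have the highest rate in every scenario.
tart_best_everywhere = all(
    success["TART"][i] > max(rates[i] for name, rates in success.items() if name != "TART")
    for i in range(6)
)

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
print("gains over HyAR:", gains_vs_hyar)
print("TART best in every scenario:", tart_best_everywhere)
```

Note that HyAR is the strongest baseline on average, but HPPO is slightly ahead of HyAR in the Easy and Hard air-to-air combat scenarios, so the gain over the best baseline in those two rows is smaller than the gain over HyAR.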

We ablate the contrastive loss L_NCE (temporal alignment) and the vector-quantization loss L_VQ (tactical mode diversity). Removing either component reduces the success rate in every scenario; the full model consistently performs best.

Task / Scenario	TART (Ours)	TART w/o L_NCE	TART w/o L_VQ
Maze Navigation / Easy	97.2±1.3	92.8±3.3	95.4±1.9
Maze Navigation / Medium	90.8±4.2	89.0±3.5	88.2±5.3
Maze Navigation / Hard	72.8±9.9	69.2±5.7	67.0±3.2
Air-to-Air Combat / Easy	94.8±3.1	92.4±5.1	91.8±2.2
Air-to-Air Combat / Medium	90.8±4.2	87.0±6.2	89.8±2.6
Air-to-Air Combat / Hard	76.8±5.0	73.6±6.2	70.8±2.1
Average Success Rate	87.2	84.0	83.8
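To make the size of each ablation's effect concrete, the drop in success rate relative to the full model can be computed per scenario from the table above. This is a small check on the reported means only; the names `full`, `ablations`, and `drops` are chosen for this example.

```python
# Mean success rates (%) copied from the ablation table, ordered as:
# Maze Easy, Maze Medium, Maze Hard, Combat Easy, Combat Medium, Combat Hard.
full = [97.2, 90.8, 72.8, 94.8, 90.8, 76.8]
ablations = {
    "w/o L_NCE": [92.8, 89.0, 69.2, 92.4, 87.0, 73.6],
    "w/o L_VQ":  [95.4, 88.2, 67.0, 91.8, 89.8, 70.8],
}

# Per-scenario drop (percentage points) when a loss term is removed.
drops = {
    name: [round(f - a, 1) for f, a in zip(full, rates)]
    for name, rates in ablations.items()
}

# Average drop over the six scenarios for each ablation.
mean_drop = {name: sum(d) / len(d) for name, d in drops.items()}

# The full model should beat both ablations in every scenario.
full_is_best = all(d > 0 for per_scenario in drops.values() for d in per_scenario)

for name in ablations:
    print(name, drops[name], f"mean drop = {mean_drop[name]:.2f}")
print("full model best everywhere:", full_is_best)
```

On these numbers, removing L_VQ costs the most in the two Hard scenarios, while removing L_NCE costs the most in Easy maze navigation.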

In Hard maze navigation, ablated models show higher TTG and Occupancy Coverage, reflecting delayed Penetration usage. In air combat, ablations increase TTE and SPE under identical budgets, indicating degraded launch timing and weaker follow-up maneuvers. The ± spread is larger for TART w/o L_NCE than for TART w/o L_VQ in five of the six scenarios, suggesting that multi-modality without contrastive alignment leads to unstable mode drift.

Success rate gains do not come at the cost of resource efficiency. In maze navigation, TART achieves lower or comparable TTG (Time-to-Goal). In air-to-air combat, TART maintains lower or comparable TTE (Time-to-Elimination) and SPE (Shots-per-Elimination).

Resource efficiency metrics across environments

Causal dependency and multi-modality

Discrete resource actions constrain feasible follow-up maneuvers (causal dependency) and give rise to multiple valid maneuver modes (multi-modality). TART is designed to capture both properties through its temporal representation and VQ codebook.
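The role of the VQ codebook can be sketched with a toy example: a context embedding is snapped to its nearest codebook entry (the tactical mode), and a continuous maneuver is then sampled conditioned on that mode. Everything below is hypothetical and chosen only for illustration, including the three 2-D codes, the function names `quantize` and `sample_maneuver`, and the Gaussian toy actor centered on the selected code. The actual TART policy is a learned network and is not reproduced here.

```python
import random


def quantize(embedding, codebook):
    """Return (index, code) of the codebook entry nearest to the embedding.

    Uses squared Euclidean distance, the usual choice in vector quantization.
    """
    distances = [
        sum((e - c) ** 2 for e, c in zip(embedding, code)) for code in codebook
    ]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, codebook[index]


def sample_maneuver(mode_code, rng, noise_std=0.05):
    """Toy continuous actor (assumption): Gaussian noise around the mode code."""
    return [rng.gauss(m, noise_std) for m in mode_code]


# Hypothetical codebook with three tactical modes in a 2-D embedding space.
codebook = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)

# Two different context embeddings fall into two different modes.
index_a, code_a = quantize([0.9, 0.1], codebook)
index_b, code_b = quantize([-0.2, 0.8], codebook)

# Each mode yields its own cluster of continuous maneuvers.
maneuver_a = sample_maneuver(code_a, rng)
maneuver_b = sample_maneuver(code_b, rng)
print(index_a, maneuver_a)
print(index_b, maneuver_b)
```

Because each code anchors its own cluster of continuous actions, the mixture over codes is multi-modal, while conditioning on a single code keeps the follow-up maneuver consistent with the preceding context.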

BibTeX

@inproceedings{jung2026temporal,
  author    = {Jung, Hoseong and Son, Sungil and Cho, Daesol and Park, Jonghae and Choi, Changhyun and Kim, H. Jin},
  title     = {Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation},
  booktitle = {International Conference on Robotics and Automation (ICRA)},
  year      = {2026},
}