Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment

Jinwoo Choi¹, Seung-Woo Seo¹

¹Seoul National University

in ICLR 2025

[Figure: DCSL overview]

Abstract

Dynamic Contrastive Skill Learning (DCSL) is a skill-learning framework for long-horizon reinforcement learning from offline datasets. Instead of defining skills as fixed-length action sequences, DCSL represents skills through the state transitions they produce.

Given an offline dataset, DCSL learns a skill representation that captures the semantic meaning of behaviors. It uses contrastive learning to recognize semantically similar behaviors as the same skill, even when their underlying action sequences differ. DCSL further adjusts the length of each skill dynamically, allowing the model to extract skills whose durations match the natural temporal extent of the behavior.

Challenges

Existing skill-learning methods often abstract long sequences of actions into reusable skills. While this helps long-horizon decision-making, prior methods face several structural limitations:

Action-sequence-based representation: Similar behaviors can require different actions depending on the state, causing them to be learned as different skills.
Fixed skill length: Real behaviors often have variable durations, but fixed-length skills may split meaningful behaviors or merge unrelated ones.
Noisy offline datasets: Offline data can contain irrelevant or random actions, making it difficult to extract clean and reusable skills.
Inefficient skill space: When semantically similar behaviors are not clustered together, the learned skill space can become redundant and less useful for downstream tasks.

These limitations motivate a skill-learning framework that focuses on behavioral meaning rather than raw action sequences.

Main Idea

[Figure: DCSL framework]

DCSL defines skills based on state transitions rather than fixed action sequences. This allows the model to capture what a behavior accomplishes, instead of only how it is executed.
DCSL learns a skill similarity function through contrastive learning. This function brings semantically similar behaviors closer together in the skill space and separates behaviors with different effects.
Using the learned similarity function, DCSL performs dynamic skill length adjustment. The model relabels skill durations by identifying how long a behavior remains semantically consistent.
The learned skills can then be used for downstream reinforcement learning with existing model-free or model-based skill-based RL methods.
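The two mechanisms above, contrastive similarity learning and similarity-based length relabeling, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the state difference stands in for a learned state-transition embedding, cosine similarity stands in for the learned skill similarity function, and the names `info_nce`, `relabel_skill_length`, `threshold`, `max_len`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed choice of objective).

    The loss is small when the anchor is more similar to the positive
    than to any negative, and large otherwise.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]


def relabel_skill_length(transition_embs, threshold=0.8, max_len=10):
    """Count how many consecutive transitions stay similar to the first one.

    `threshold` and `max_len` are hypothetical hyperparameters.
    """
    reference = transition_embs[0]
    length = 1
    for emb in transition_embs[1:max_len]:
        if cosine(reference, emb) < threshold:
            break  # the behavior changed, so the skill ends here
        length += 1
    return length


# Toy trajectory: move right three steps, then up two steps.
states = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
# Stand-in "embedding" of each transition: the state difference.
transitions = [
    (s1[0] - s0[0], s1[1] - s0[1]) for s0, s1 in zip(states, states[1:])
]

skill_len = relabel_skill_length(transitions)
loss_same = info_nce((1, 0), (2, 0), negatives=[(0, 1)])  # positive has the same effect
loss_diff = info_nce((1, 0), (0, 1), negatives=[(2, 0)])  # positive has a different effect
print(skill_len, loss_same < loss_diff)
```

In this toy setup the first skill is cut where the motion switches from rightward to upward, and the loss is lower when the positive pair shares the same state-transition direction. A learned encoder and similarity function would replace the hand-written stand-ins in practice.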

Results

Task Performance Across Environments

[Figure: DCSL environments]

[Figure: DCSL quantitative results]

We evaluate DCSL on challenging navigation and manipulation tasks, including AntMaze, Kitchen, and Pick-and-Place environments. Across these domains, DCSL learns reusable skills that improve task completion and efficiency.

In AntMaze tasks, DCSL performs strongly on long-horizon navigation, which suggests that state-transition-based skill representations help exploration in sparse-reward environments. In Kitchen tasks, it performs strongly on manipulation, indicating that the learned skills capture precise object-interaction behaviors. In Pick-and-Place tasks with increasingly noisy datasets, DCSL remains robust, which highlights the value of dynamic skill length adjustment when offline data contains irrelevant or suboptimal actions.

Skill Representation Analysis

[Figure: DCSL qualitative results]

DCSL also learns a more meaningful and diverse skill representation space. Compared to prior methods such as SPiRL and SkiMo, which often produce repeated behavior patterns across different skills, DCSL clusters similar behaviors more effectively and generates more diverse behaviors in the skill space. This suggests that DCSL can extract reusable skills that better reflect the semantic structure of the dataset.

Citation

@article{choi2025dynamic,
  title={Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment},
  author={Choi, Jinwoo and Seo, Seung-Woo},
  journal={arXiv preprint arXiv:2504.14805},
  year={2025}
}
