Expose trainer dataset type metadata #5511

## Feature request

TRL already documents the expected dataset type for each trainer in `docs/source/dataset_formats.md`, but that information is not exposed programmatically.

Today, downstream wrappers and validation layers have to hardcode trainer-class checks like `if trainer_class is GRPOTrainer`, or maintain their own registry, to find out whether a trainer expects `prompt-only`, `preference`, `prompt-completion`, etc.

A small metadata API on trainer classes would solve this cleanly:

```python
from trl import GRPOTrainer

# Proposed attribute, not yet available in TRL
assert GRPOTrainer.dataset_types == ("prompt-only",)
```

For trainers that support more than one type:

```python
from trl import SFTTrainer

# Proposed attribute, not yet available in TRL
assert SFTTrainer.dataset_types == ("language-modeling", "prompt-completion")
```

## Motivation

This would let wrappers, CLIs, and downstream libraries:
- validate datasets before trainer construction
- choose preprocessing paths without hardcoded class checks
- stay aligned with the dataset-type table already maintained in the docs
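
As a rough illustration of the first two points, a downstream wrapper could validate against the proposed attribute along these lines. This is only a sketch: `FakeGRPOTrainer`, `FakeSFTTrainer`, and `check_dataset_type` are hypothetical stand-ins so the example runs without TRL, and `dataset_types` is the attribute proposed here, not an existing API.

```python
# Stand-in classes so the sketch runs without TRL installed. The
# `dataset_types` attribute is the one proposed in this issue.
class FakeGRPOTrainer:
    dataset_types = ("prompt-only",)


class FakeSFTTrainer:
    dataset_types = ("language-modeling", "prompt-completion")


def check_dataset_type(trainer_class, dataset_type):
    """Raise early if `dataset_type` is not declared by `trainer_class`."""
    supported = getattr(trainer_class, "dataset_types", ())
    if not supported:
        # No metadata declared: nothing to validate against.
        return
    if dataset_type not in supported:
        raise ValueError(
            f"{trainer_class.__name__} expects one of {supported}, "
            f"got {dataset_type!r}"
        )


# Passes silently; a mismatch such as (FakeGRPOTrainer, "preference")
# would raise ValueError before any trainer is constructed.
check_dataset_type(FakeSFTTrainer, "prompt-completion")
```

The `getattr` fallback keeps the check a no-op for trainers that do not declare any metadata, which matches the "no behavior change" intent.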

## Why now

Users and maintainers appear to spend recurring effort on trainer-specific dataset semantics and chat-template behavior, for example:
- #3468
- #3915
- #3919
- #4147
- #4201
- #4358

Those issues are not asking for this exact API, but they do show that trainer dataset semantics matter in practice and are currently discovered indirectly through docs or implementation details.

## Proposed shape

Add a class attribute on trainers:

```python
from transformers import Trainer

class _BaseTrainer(Trainer):
    dataset_types: tuple[str, ...] = ()  # empty default: no declared dataset type
```

Then set it on the trainers already covered by the dataset-format table.

This would be metadata only, with no behavior change.
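
A minimal sketch of how the overrides might look, and how a caller could query them. The classes below are local stand-ins (including `Trainer`) so the snippet runs on its own; `CustomTrainer` and `trainers_for` are hypothetical names used only for illustration, and the type values are the ones given in the examples above.

```python
class Trainer:  # stand-in for transformers.Trainer
    pass


class _BaseTrainer(Trainer):
    # Empty default: subclasses that declare nothing expose no dataset types.
    dataset_types: tuple[str, ...] = ()


class GRPOTrainer(_BaseTrainer):  # stand-in, not the real TRL class
    dataset_types = ("prompt-only",)


class SFTTrainer(_BaseTrainer):  # stand-in, not the real TRL class
    dataset_types = ("language-modeling", "prompt-completion")


class CustomTrainer(_BaseTrainer):  # hypothetical subclass with no metadata
    pass


def trainers_for(dataset_type, trainer_classes):
    """Return the names of the trainers that declare `dataset_type`."""
    return [
        cls.__name__ for cls in trainer_classes if dataset_type in cls.dataset_types
    ]


candidates = [GRPOTrainer, SFTTrainer, CustomTrainer]
print(trainers_for("prompt-only", candidates))
```

Because the attribute lives on the class, it can be read without instantiating a trainer or loading a model.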

