Feature request
Hi there,
This is a small detail, really, but I noticed that your simple quickstart:
from trl import SFTTrainer
from datasets import load_dataset

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
trainer.train()
uses around 25GB of GPU memory on an A100 in Colab. Unfortunately, that still requires Colab Pro, which is not ideal for students. To my surprise, this is also not impossibly far from fitting in a T4's memory (nominally 16GB, in practice 14-15GB).
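For context, here is a rough back-of-envelope estimate (my own, not from the TRL docs) of where that memory goes: full-precision training with Adam keeps four fp32 copies per parameter (weights, gradients, and the two Adam moment estimates), and the rest is activations.

```python
def static_training_memory_gb(n_params, bytes_per_param=4, n_adam_states=2):
    """Rough fp32 footprint of weights + gradients + Adam moments, in GB."""
    copies = 2 + n_adam_states  # weights, gradients, Adam m and v
    return n_params * bytes_per_param * copies / 1e9

# Qwen3-0.6B has roughly 0.6e9 parameters
print(static_training_memory_gb(0.6e9))  # ~9.6 GB before activations
```

On that estimate, the bulk of the remaining memory is activations, which scale with batch size and sequence length, so those are the natural knobs to turn.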
I'm going to look into the quickest way to reduce the memory footprint while keeping this example minimal (probably max_length?). But since the free hardware usually available to students worldwide remains Colab's T4, would you consider keeping that GPU in mind for these introductory examples? That would hugely help with preparing educational material, and would make the library more quickly and widely adopted as well!
What do you think? It might be as easy as adding one note in the docs saying "to run on a T4, add this option"... Thanks in advance for reading!
Motivation
Make the quickstart examples more accessible (especially for students).
Your contribution
- Pointing out the memory issue in Colab (free version).
- Some small tests:
A. 11GB VRAM
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=SFTConfig(per_device_train_batch_size=1),  # instead of the default of 8
    train_dataset=dataset,
)
trainer.train()
B. 13.6GB VRAM
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=SFTConfig(max_length=256),  # instead of the default of 1024
    train_dataset=dataset,
)
trainer.train()
C. 12.4GB VRAM
# reduced from https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora
import torch

# Check if the GPU benefits from bfloat16 (compute capability >= 8, i.e. Ampere or newer)
if torch.cuda.get_device_capability()[0] >= 8:
    torch_dtype = torch.bfloat16
else:
    torch_dtype = torch.float16

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=SFTConfig(
        # default max_length of 1024
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        gradient_checkpointing=True,
        fp16=(torch_dtype == torch.float16),
        bf16=(torch_dtype == torch.bfloat16),
    ),
    train_dataset=dataset,
)
trainer.train()
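The rule of thumb behind these knobs (a simplification of mine, ignoring the attention-matrix term and any fixed overhead): activation memory scales roughly linearly with per_device_train_batch_size × max_length, so either knob alone already cuts it by a large factor.

```python
def activation_scale(batch_size, seq_len, ref_batch=8, ref_seq=1024):
    """Activation memory relative to the defaults (batch size 8, max_length 1024)."""
    return (batch_size * seq_len) / (ref_batch * ref_seq)

print(activation_scale(1, 1024))  # test A: 0.125 of the default activations
print(activation_scale(8, 256))   # test B: 0.25
```

Note that the measured totals (11GB and 13.6GB) don't drop proportionally, because the static weights/gradients/optimizer footprint stays the same regardless of batch size or sequence length.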