Unsloth's new Reinforcement Learning updates
We’ve got lots of Unsloth updates, especially for RL.
As Reinforcement Learning (RL) continues to gain momentum, our mission at Unsloth is to make it easier and more accessible for everyone. We’re introducing many new RL capabilities, including support for OpenAI’s gpt-oss, vision models, and more memory-efficient RL. Plus, our new gpt-oss RL inference achieves the fastest tokens/s of any implementation.
Beyond RL, Unsloth is also focused on accuracy and quants. Third-party Aider Polyglot benchmarks for Unsloth DeepSeek-V3.1 Dynamic GGUFs revealed that the 3-bit GGUF achieved 75.6% on Aider, only 0.5% lower than the full-precision, unquantized model. That 75.6% score surpasses SOTA models such as Claude-4-Opus (thinking) and GPT-4.1.
Aider is one of the most comprehensive benchmarks for testing how well LLMs can write code, follow instructions, and apply changes without human intervention.
To make training easier, Unsloth now has a Docker image: just run the container on Windows or Linux and start training with no dependency issues. Guide
Here’s the rest of the Unsloth news:
🦥 Unsloth Updates
Inference is crucial in RL training. To achieve the fastest gpt-oss inference speed without vLLM, we rewrote Transformers code, enabling at least 3× faster inference. You can train gpt-oss with GRPO/GSPO in our Colab notebook, where we showcase how you can automatically create faster GPU kernels.
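To give a feel for the workflow, here is a minimal sketch of GRPO training on gpt-oss with Unsloth and TRL. The checkpoint name, LoRA settings, and the toy length-based reward are illustrative assumptions, not the notebook's exact recipe:

```python
# Minimal GRPO sketch for gpt-oss with Unsloth + TRL.
# Checkpoint name, LoRA config, and the reward are illustrative assumptions.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

# Load gpt-oss through Unsloth's patched, faster Transformers path.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed checkpoint name
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Toy reward: prefer shorter completions (stand-in for a real reward).
def length_reward(completions, **kwargs):
    return [-len(c) / 100.0 for c in completions]

dataset = Dataset.from_dict({"prompt": ["Explain RL in one sentence."] * 64})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[length_reward],
    # Recent TRL also exposes GSPO via importance_sampling_level="sequence".
    args=GRPOConfig(max_steps=30, num_generations=4, max_completion_length=128),
    train_dataset=dataset,
)
trainer.train()
```

Swapping in a task-specific reward function (e.g. correctness checks or format rewards) is the main change needed for real use.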
Vision RL is here with Gemma 3, Qwen2.5-VL, and other vision models. Thanks to Unsloth’s unique weight sharing and custom kernels, VLM RL is 1.5–2× faster, uses 90% less VRAM, and enables 10× longer context lengths than FA2 setups, with no accuracy loss.
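Loading a vision model for RL looks much like the text path. Here is a minimal sketch, assuming the unsloth/Qwen2.5-VL-7B-Instruct checkpoint name and illustrative PEFT settings; see the notebooks for a complete Vision RL recipe:

```python
# Minimal sketch of preparing a vision model for RL with Unsloth.
# Checkpoint name and PEFT settings are illustrative assumptions.
from unsloth import FastVisionModel

model, processor = FastVisionModel.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct",  # assumed checkpoint name
    load_in_4bit=True,
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # also adapt the vision tower
    finetune_language_layers=True,
    r=16,
    lora_alpha=16,
)
# From here, the PEFT model plugs into an RL trainer (e.g. TRL's
# GRPOTrainer, as in the gpt-oss sketch above) with an image-aware
# reward function.
```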
IBM released Granite-4.0 today, and Unsloth has Day Zero support. You can run it with our Dynamic GGUFs or use our fine-tuning notebook, which includes a support-agent example. Guide
RL is now even faster & more memory-efficient! Our new kernels & algorithms allow faster RL with 50% less VRAM & 10× more context. Blog
Introducing Unsloth Flex Attention for gpt-oss: it enables >8× longer context, >50% less VRAM usage, and >1.5× faster training. Blog
We’re hosting a Developer event with Mistral AI & NVIDIA at Y Combinator’s Office in San Francisco on Oct 21. Come say hello!
We’ve got lots of new collabs and features coming over the next few weeks, including a new product launch, so stay tuned!