Unsloth Nov Update

Nov 25, 2025

We’re getting close to our final release of 2025! Thanks so much for sticking with us this year. We’ve got lots of new features like FP8 RL and collabs with OpenAI, NVIDIA and more! See below for our latest updates:

⚡FP8 Reinforcement Learning

DeepSeek-R1 demonstrated how powerful FP8 can be, and faster RL inference is essential because inference is the most compute-intensive part of an RL workload. We’re introducing FP8-precision training for RL, making FP8 GRPO possible on consumer GPUs (NVIDIA RTX 40 series, RTX 50 series, etc.).

Qwen3-1.7B FP8 GRPO now works on just 5GB of VRAM. We collabed with PyTorch for ~1.4× faster FP8 RL inference vs FP16 (even more at longer context). FP8 GRPO also benefits from Unsloth features such as weight sharing and Flex Attention, so it uses 60% less VRAM and supports 10× longer context than other implementations.
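To get a feel for why FP8 helps on small GPUs, here is a rough back-of-the-envelope sketch of weight storage alone. It assumes 1.7 billion parameters (taken from the model name), 1 byte per FP8 weight and 2 bytes per FP16 weight. The helper name `weight_memory_gb` is ours for illustration and is not part of Unsloth. It ignores activations, KV cache, optimizer state and everything else that makes up the 5GB figure above.

```python
# Rough weight-only memory estimate. This is an illustrative sketch, not an
# Unsloth API: it ignores activations, KV cache, gradients and optimizer state.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: nominal parameter count read off the model name "Qwen3-1.7B".
qwen3_params = 1.7e9

fp16_gb = weight_memory_gb(qwen3_params, "fp16")
fp8_gb = weight_memory_gb(qwen3_params, "fp8")

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"FP8 weights:  {fp8_gb:.1f} GB")
```

Halving the bytes per weight halves the weight footprint, which leaves more of a small card's VRAM for the context and rollouts that RL needs.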

FP8 RL Blog

We also dug into the issue of RL reward mismatch when using FP16 vs BF16 and found that Unsloth does not have this issue.

Even better Unsloth!

You may notice Unsloth now uses much less VRAM than before, enabling even longer context. We’re also implementing faster training very soon and we’ll share all the details in an upcoming blog.

🚀 OpenAI DevDay collab

We collabed with OpenAI on a gpt-oss RL notebook that autonomously beats the 2048 game using RL. Training was done locally with Unsloth on NVIDIA DGX Spark. Our notebook example was showcased at OpenAI DevDay 2025.

💚 NVIDIA support

NVIDIA DGX Spark: Unsloth enables local fine-tuning of LLMs with up to 200B parameters on the NVIDIA DGX™ Spark. With 128 GB of unified memory, you can train massive models such as gpt-oss-120b, and run or deploy inference directly on DGX Spark.

Blackwell / RTX 50 series: Unsloth now supports NVIDIA’s Blackwell architecture GPUs, including RTX 50-series GPUs (5060–5090), RTX PRO 6000, and GPUs such as B200, GB100 and more! You can read our collab on the official NVIDIA blogpost.

🐋 DeepSeek-OCR

DeepSeek-OCR is a 3B-parameter vision model for OCR and document understanding. It uses context optical compression to convert 2D layouts into vision tokens, enabling efficient long-context processing.

Fine-tune DeepSeek-OCR to enhance its vision or language performance. In our free Unsloth fine-tuning notebook, we demonstrated an 88.26% improvement in language understanding.

🎯 Quantization-Aware Training (QAT)

We worked with TorchAO to enable trainable quantization that recovers as much accuracy as possible. This results in significantly better model quality compared to naive 4-bit quantization.

QAT can recover up to 70% of the lost accuracy and achieve a 1-3% model performance improvement on benchmarks such as GPQA and MMLU Pro.
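For readers new to QAT, the core idea is usually "fake quantization": during training, weights are rounded to the low-precision grid and immediately converted back, so the model learns to tolerate the rounding error. The snippet below is a minimal pure-Python sketch of that round trip only. It assumes symmetric per-tensor int4 with a range of [-7, 7], and the function name `fake_quantize_int4` is hypothetical. It is not TorchAO's or Unsloth's implementation, and it leaves out the gradient handling a real training loop needs.

```python
# Illustrative fake-quantization round trip (quantize, then dequantize).
# Assumptions: symmetric per-tensor int4 with levels in [-7, 7]; real QAT
# libraries use their own schemes, group sizes and gradient estimators.
def fake_quantize_int4(weights):
    """Return weights snapped to a simulated int4 grid, still as floats."""
    qmax = 7
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int4 range.
    levels = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Dequantize so the rest of the network sees ordinary float values.
    return [level * scale for level in levels]


weights = [0.70, -0.33, 0.12, 0.0]
simulated = fake_quantize_int4(weights)

for original, snapped in zip(weights, simulated):
    print(f"{original:+.2f} -> {snapped:+.4f} (error {abs(original - snapped):.4f})")
```

Training against the snapped values rather than the originals is what lets the final quantized model lose less accuracy than quantizing only after training.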

🐳 Run LLMs locally via Docker × Unsloth

You can run any model, including Unsloth Dynamic GGUFs, on Mac, Windows or Linux with a single line of code or no code at all. We collabed with Docker to simplify model deployment, and Unsloth now powers most of Docker’s GGUF models.

🔮 Qwen3-VL

Qwen3-VL is Qwen’s new family of vision models, with instruct and thinking versions. The 2B, 4B, 8B and 32B models are dense, while the 30B and 235B models are MoE. Unsloth supports Qwen3-VL fine-tuning and RL, and Qwen3-VL (8B) can be trained for free with our notebooks.

Hope you have a lovely Thanksgiving if you celebrate it, and otherwise a lovely rest of the week! =)

Neural Foundry

Nov 25

FP8 GRPO running on consumer RTX cards is a game changer for anyone doing RL work at home. Getting that down to 5GB of VRAM for Qwen3 means you can actually experiment without needing datacenter hardware. The 60% VRAM savings make it feasible for way more people to test ideas locally.

Thibaut

Thank you for everything that you bring to the community! Do you have plans to support Nvidia Jetson Thor devices on top of DGX Spark? It would be greatly appreciated 🙏🏻


Unsloth AI
