Anyscale and NovaSky teams release SkyRL tx v0.1.0: bringing a Tinker-compatible reinforcement learning engine to local GPU clusters

How can an AI team run Tinker-style reinforcement learning on large language models using its own infrastructure and a single unified engine? Anyscale and the NovaSky (UC Berkeley) team have released SkyRL tx v0.1.0, which gives developers a way to run Tinker-compatible training and inference directly on their own hardware, while retaining the same minimal API that Tinker exposes as a managed service.

The research team describes SkyRL tx as a unified training and inference engine that implements the Tinker API and lets people run a Tinker-like service on their own infrastructure. The v0.1.0 release is the first in the series to support reinforcement learning end to end, and it also makes sampling significantly faster.

Introduction to the Tinker API

Thinking Machines’ Tinker is a training API built around four core functions:

  • forward_backward: performs forward and backward passes and accumulates gradients.
  • optim_step: updates the model weights based on those gradients.
  • sample: generates tokens for interaction, evaluation, or reinforcement learning actions.
  • save_state: writes checkpoints for resumable training.

Tinker exposes these low-level primitives rather than a complete task-specific fine-tuning abstraction, so users can implement their own supervised or reinforcement learning loops in regular Python code while the service handles GPU scheduling and distributed execution.
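To make that programming model concrete, here is a toy, self-contained sketch of a user-side loop over the four primitives. This is not Tinker's actual client; ToyTinkerClient and every signature below are invented for illustration, with a one-parameter "model" standing in for an LLM:

# Toy sketch of the four-primitive programming model. NOT the real Tinker
# client: the class and signatures are invented purely for illustration.

class ToyTinkerClient:
    """Stands in for a service that owns the model weights remotely."""

    def __init__(self):
        self.weight, self.grad = 0.0, 0.0   # a "model" with one parameter

    def forward_backward(self, batch):
        # Accumulate gradients of a trivial squared loss; the real service
        # would run distributed forward/backward passes on GPUs instead.
        for x, y in batch:
            self.grad += 2 * (self.weight * x - y) * x   # d/dw of (w*x - y)^2

    def optim_step(self, lr=0.01):
        # Apply the accumulated gradients, then reset them.
        self.weight -= lr * self.grad
        self.grad = 0.0

    def sample(self, prompt):
        # The real service generates tokens; here we return a dummy rollout.
        return f"{prompt} -> w={self.weight:.3f}"

    def save_state(self, path):
        # The real service writes a checkpoint for resumable training.
        print(f"checkpoint written to {path}: w={self.weight:.3f}")


client = ToyTinkerClient()
data = [(1.0, 2.0), (2.0, 4.0)]        # fit w so that w*x ≈ y, i.e. w -> 2
for step in range(100):
    client.forward_backward(data)      # 1. accumulate gradients
    client.optim_step(lr=0.01)         # 2. update weights
print(client.sample("probe"))          # 3. sample for eval / RL rollouts
client.save_state("ckpt-final")        # 4. checkpoint

The real service keeps the same division of labor: the loop logic lives in user Python, while GPU scheduling and distributed execution stay behind the four calls.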

SkyRL tx targets this API and implements an open backend that users can deploy locally. It preserves the Tinker programming model while removing the need to rely on the hosted environment.

Where SkyRL tx sits in the SkyRL stack

SkyRL is a full-stack reinforcement learning library for large language models. It includes skyrl-agent for long-horizon agents, skyrl-train for training, and skyrl-gym for tool-use environments covering math, coding, search, and SQL.

Within this stack, skyrl-tx is marked as an experimental cross-platform library that exposes a local Tinker-compatible REST API for model post-training. SkyRL tx is therefore the system layer that connects RL logic, environments, and training code to concrete GPU resources through the Tinker interface.

Architecture: an inference engine that can also train

The SkyRL tx architecture is described as an inference engine that also supports backward passes. It has four main components:

  1. REST API server, which handles incoming requests from different users.
  2. Database, which tracks metadata about models, checkpoints, requests, and futures, and also acts as a job queue. The current implementation uses SQLite behind an interface that also supports other SQL databases, such as Postgres (see the sketch after this list).
  3. Engine, which schedules and batches requests across users. Each engine instance serves one base model and can have many LoRA adapters attached to it.
  4. Worker, which performs forward and backward passes and holds the model definitions and optimizer states. Multiple workers will enable more advanced multi-node sharding in an upcoming release.
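The SQLite-versus-Postgres point is worth making concrete. Below is a minimal sketch, assuming SQLAlchemy, of how a job queue can sit behind a dialect-agnostic interface so that swapping SQLite for Postgres is a one-line change; the Job table is invented for illustration and is not SkyRL tx's actual schema:

# Minimal sketch of a SQL-backed job queue that runs on SQLite or Postgres.
# The Job table is invented for illustration; it is not SkyRL tx's schema.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = "jobs"
    id = Column(Integer, primary_key=True)      # doubles as FIFO ordering
    kind = Column(String)                       # e.g. "forward_backward"
    status = Column(String, default="queued")

# Swapping the connection URL is the only change needed to move to Postgres:
# create_engine("postgresql+psycopg://user:pass@host/db")
engine = create_engine("sqlite:///queue.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Job(kind="forward_backward"))   # enqueue a request
    session.commit()
    job = (session.query(Job)                   # a worker claims the
           .filter_by(status="queued")          # oldest queued job
           .order_by(Job.id)
           .first())
    if job is not None:
        job.status = "running"
        session.commit()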

What did v0.1.0 add?

The v0.1.0 release focuses on reinforcement learning support and performance improvements. The official release highlights several specific changes:

  • Sampling is now much faster because it is properly jittered, batched, and sharded inside the engine.
  • Per-request sampling parameters, per-request seeds, and stop tokens are now supported, which is useful when many experiments share a base model (a sketch of such requests follows this list).
  • After a number of fixes, the RL loop now runs correctly in the engine.
  • Gradient checkpointing and micro-batched sampling are now implemented.
  • Postgres is now supported as a database backend, in addition to SQLite.
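As a concrete illustration of the per-request controls, the sketch below shows two hypothetical sampling requests that share one base model but differ in temperature, seed, stop tokens, and LoRA adapter; all field names are invented here, not SkyRL tx's actual request schema:

# Two illustrative sampling requests sharing one engine and base model.
# Field names are invented for this sketch, not SkyRL tx's request schema.
request_a = {
    "prompt": "Solve: 17 * 24 =",
    "lora_adapter": "math-experiment",   # hypothetical adapter name
    "sampling_params": {"temperature": 0.7, "seed": 1234, "stop": ["\n\n"]},
}
request_b = {
    "prompt": "Write a SQL query that counts users.",
    "lora_adapter": "sql-experiment",
    "sampling_params": {"temperature": 0.0, "seed": 42, "stop": [";"]},
}
# The engine can batch both together even though their sampling parameters
# differ, which is what lets many experiments share one base model.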

Running RL end-to-end on 8 H100 GPUs

The official release includes a concrete code recipe for running reinforcement learning end to end on a cluster with 8 H100 GPUs.

First, the user clones the SkyRL repository and starts the engine from the skyrl-tx folder:

uv run --extra gpu --extra tinker -m tx.tinker.api \
  --base-model Qwen/Qwen3-4B \
  --max-lora-adapters 3 \
  --max-lora-rank 1 \
  --tensor-parallel-size 8 \
  --train-micro-batch-size 8 > out.log
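A few notes on the flags: --tensor-parallel-size 8 matches the 8 H100 GPUs in the cluster, --max-lora-adapters 3 presumably caps the number of concurrent LoRA adapters that can share the base model, and the engine's output is redirected to out.log.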

Then the user clones the Tinker Cookbook from the Thinking Machines team and, from the tinker_cookbook/recipes folder, runs:

export TINKER_API_KEY=dummy
export WANDB_API_KEY=
uv run --with wandb --with tinker rl_loop.py \
  base_url= \
  model_name="Qwen/Qwen3-4B" \
  lora_rank=1 \
  max_length=1024 \
  save_every=100
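Note that base_url (its value is elided above) should point at the engine started in the previous step, and TINKER_API_KEY can be a dummy value, presumably because the local backend does not require a real Thinking Machines key.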

This produces a reward curve, confirming that the RL loop is running correctly through the local SkyRL tx backend.

Key takeaways

  • SkyRL tx v0.1.0 implements a local Tinker-compatible engine, unifying training and inference for LLM post-training.
  • The system exposes the Tinker primitives (forward_backward, optim_step, sample, and save_state) via REST, while internally handling batching, LoRA adapters, and device placement.
  • The architecture is divided into an API server, SQL database, scheduling engine, and workers that perform forward and backward passes for a single base model with multiple LoRA adapters.
  • v0.1.0 adds end-to-end reinforcement learning support, faster jittered and sharded sampling, per-request sampling parameters, gradient checkpointing, micro-batching, and Postgres support.

SkyRL tx v0.1.0 is a practical step for teams that want to run Tinker-style reinforcement learning on their own clusters behind a consistent Tinker API. The design that treats the system as an inference engine that also runs backward passes is clean and reduces stack divergence. Support for LoRA, gradient checkpointing, micro-batching, and Postgres amounts to concrete systems progress. Overall, this release turns Tinker compatibility into a locally operable RL backend for LLMs.




Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex data sets into actionable insights.

