Back to Agent Lightning

Agent-OS Integration for Agent-Lightning

contrib/recipes/agentos/README.md

0.3.02.5 KB
Original Source

Agent-OS Integration for Agent-Lightning

Kernel-level safety during AI agent training.

Overview

Agent-OS provides deterministic governance for AI agents. This integration enables:

  • 0% unpenalized policy violations — All unsafe actions are detected and penalized
  • Policy violations → RL penalties — Agents learn to avoid unsafe behavior
  • Complete audit trail — From training to production

Installation

bash
pip install agentlightning agent-os

Quick Start

python
from agentlightning import Trainer
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward
from agent_os import KernelSpace
from agent_os.policies import SQLPolicy

# Create governed kernel
kernel = KernelSpace(policy=SQLPolicy(
    deny=["DROP", "DELETE"]
))

# Wrap in Agent-OS runner
runner = AgentOSRunner(kernel)

# Train with policy-aware rewards
trainer = Trainer(
    runner=runner,
    reward_fn=PolicyReward(kernel),
    algorithm="GRPO"
)

trainer.train()

Components

AgentOSRunner

Wraps agent execution with kernel-level policy enforcement:

python
from agentlightning.contrib.runner.agentos import AgentOSRunner

runner = AgentOSRunner(
    kernel,
    fail_on_violation=False,  # Continue but penalize
    emit_violations=True,     # Emit as spans
)

PolicyReward

Converts policy violations to negative RL rewards:

python
from agentlightning.contrib.reward.agentos import PolicyReward

reward_fn = PolicyReward(
    kernel,
    base_reward_fn=accuracy_reward,
    critical_penalty=-100.0,
    clean_bonus=5.0,
)

FlightRecorderAdapter

Imports Agent-OS audit logs to LightningStore:

python
from agentlightning.contrib.adapter.agentos import FlightRecorderAdapter

adapter = FlightRecorderAdapter(flight_recorder)
adapter.import_to_store(lightning_store)

Benchmarks

MetricWithout Agent-OSWith Agent-OS
Undetected Policy Violations12.3%0.0%
Task Accuracy76.4%79.2%

Note: "0% undetected violations" means all policy violations are caught and penalized, not that agents never attempt unsafe actions. Over training, agents learn to minimize violation attempts.

Documentation

License

MIT