Reinforcement Learning from Rendering Feedback
RLRF trains a vision-language model to generate SVG code using the rendered image as feedback. Unlike supervised fine-tuning, which optimizes only the code tokens, RLRF renders the generated SVG and compares it to the target image. The model therefore learns directly from its visual mistakes, improving generation quality even for complex shapes and gradients that are hard to get right when the model never sees the render.
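As a toy illustration of this render-and-compare loop (not the paper's renderer; a real pipeline would use a full SVG rasterizer such as CairoSVG), the sketch below rasterizes axis-aligned `<rect>` elements onto a binary grid and scores generated code by negative pixel mismatch against the target:

```python
import xml.etree.ElementTree as ET

def render_rects(svg_code, width, height):
    # Toy rasterizer: fill a binary grid from axis-aligned <rect> elements.
    # Stands in for a real SVG renderer in the render-and-compare loop.
    grid = [[0] * width for _ in range(height)]
    for el in ET.fromstring(svg_code).iter():
        if el.tag.split('}')[-1] != 'rect':  # tolerate xmlns prefixes
            continue
        x = int(float(el.get('x', 0)))
        y = int(float(el.get('y', 0)))
        w = int(float(el.get('width', 0)))
        h = int(float(el.get('height', 0)))
        for r in range(max(y, 0), min(y + h, height)):
            for c in range(max(x, 0), min(x + w, width)):
                grid[r][c] = 1
    return grid

def rendering_feedback(generated_svg, target_pixels, width, height):
    # Render the generated code, then count pixels that disagree with the
    # target image; the negative count is a dense visual error signal.
    rendered = render_rects(generated_svg, width, height)
    mismatch = sum(rendered[r][c] != target_pixels[r][c]
                   for r in range(height) for c in range(width))
    return -mismatch
```

A generated SVG that reproduces the target exactly scores 0; every wrong pixel lowers the reward, which is what lets the policy learn from visual mistakes rather than token-level ones.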
Three complementary signals guide the learning process:

- **Image reconstruction reward**: pixel-level fidelity, measured as the negative squared L2 distance between the input image I and the rendered SVG Î:
  Rrecon = -||I - Î||²
- **Semantic reward**: high-level perceptual alignment, using DreamSim or CLIP embeddings φ:
  Rsem = cos(φ(I), φ(Î))
- **Code efficiency reward**: encourages compact SVG code by penalizing deviation of the generated token length from the ground truth:
  Reff = -|len(ŷ) - len(y)|
Combined Reward:
R = λ₁·Rrecon + λ₂·Rsem + λ₃·Reff
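A minimal sketch of the combined reward, assuming flattened pixel arrays, precomputed embedding vectors standing in for DreamSim/CLIP features, and illustrative λ weights (the paper's actual coefficients are not reproduced here):

```python
import math

def r_recon(target, rendered):
    # Rrecon = -||I - Î||²: negative squared L2 distance over flattened pixels
    return -sum((a - b) ** 2 for a, b in zip(target, rendered))

def r_sem(phi_i, phi_i_hat):
    # Rsem = cos(φ(I), φ(Î)): cosine similarity between image embeddings
    dot = sum(a * b for a, b in zip(phi_i, phi_i_hat))
    norm = (math.sqrt(sum(a * a for a in phi_i))
            * math.sqrt(sum(b * b for b in phi_i_hat)))
    return dot / norm

def r_eff(gen_tokens, gt_tokens):
    # Reff = -|len(ŷ) - len(y)|: penalize length deviation from ground truth
    return -abs(len(gen_tokens) - len(gt_tokens))

def combined_reward(target, rendered, phi_i, phi_i_hat, gen_tokens, gt_tokens,
                    lambdas=(1.0, 1.0, 0.01)):  # illustrative weights only
    # R = λ₁·Rrecon + λ₂·Rsem + λ₃·Reff
    l1, l2, l3 = lambdas
    return (l1 * r_recon(target, rendered)
            + l2 * r_sem(phi_i, phi_i_hat)
            + l3 * r_eff(gen_tokens, gt_tokens))
```

Note the opposing pulls: the reconstruction and semantic terms reward visual fidelity, while the efficiency term keeps the generated code from bloating, so the λ weights trade quality against compactness.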
RLRF sets a new state-of-the-art with consistent gains across all metrics.
Full comparison across all models and metrics.
| Model | ↓ MSE | ↑ SSIM | ↑ DINO | ↓ LPIPS | Code Eff. | Time (s) |
|---|---|---|---|---|---|---|
| **RLRF Results on SVG Base Models** | | | | | | |
| StarVector-1B-Base | 4.60 | 87.00 | 96.00 | 9.22 | -800 | 64 |
| +RLRF (ours) | 3.46 | 88.00 | 98.00 | 7.51 | -127 | 23 |
| Δ Improvement | -1.14 | +1.0 | +2.0 | -1.71 | +673 | -41 |
| Qwen2.5VL-3B-Instruct | 23.31 | 62.28 | 69.26 | 35.30 | +1.5k | 24 |
| +SVG-SFT (ours) | 9.48 | 78.40 | 92.60 | 17.44 | -2.5k | 67 |
| +RLRF (ours) | 4.79 | 88.76 | 95.97 | 10.97 | +199 | 48 |
| Δ Improvement | -4.69 | +10.36 | +3.37 | -6.47 | +2.7k | -19 |
| Qwen2.5-VL-7B-Instruct | 23.10 | 61.40 | 78.00 | 33.80 | +765 | 37 |
| +SVG-SFT (ours) | 8.60 | 79.40 | 93.00 | 16.58 | -2.8k | 73 |
| +RLRF (ours) | 1.03 | 95.10 | 98.70 | 3.08 | -334 | 63 |
| Δ Improvement | -7.57 | +15.70 | +5.70 | -13.50 | +2.5k | -10 |
| **VLMs (Open-Source)** | | | | | | |
| Qwen2.5VL-32B-Instruct | 23.62 | 55.46 | 82.38 | 35.83 | +1.3k | 58 |
| Qwen2.5VL-72B-Instruct | 23.20 | 55.72 | 81.68 | 34.14 | +1.4k | 62 |
| Llama4-Scout (109B) | 20.98 | 58.58 | 83.72 | 33.37 | +1.4k | 57 |
| Llama4-Maverick (400B) | 20.67 | 59.26 | 85.61 | 31.75 | +1.3k | 61 |
| **VLMs (Closed-Source)** | | | | | | |
| Gemini-Flash-1.5 | 20.38 | 59.65 | 84.70 | 33.27 | +1.2k | 59 |
| Gemini-Flash-2.0 | 19.31 | 60.21 | 86.53 | 32.10 | +1.1k | 63 |
| Gemini-1.5-Pro | 20.19 | 60.75 | 84.17 | 33.02 | +1.2k | 58 |
| Claude 3.7 Sonnet | 17.73 | 69.33 | 79.80 | 28.42 | +1.4k | 62 |
| GPT-4o-1120 | 16.92 | 66.91 | 89.00 | 27.55 | +1.3k | 60 |
| **Image Processing Methods** | | | | | | |
| Im2VEC | 18.10 | 76.50 | 69.20 | 29.10 | -4.3k | <1 |
| Potrace | 8.15 | 77.28 | 89.23 | 19.10 | -7.3k | 12 |
| DiffVG | 6.64 | 81.23 | 86.12 | 20.5 | -19.7k | 31 |
| PyAutoTrace | 4.71 | 87.44 | 95.68 | 10.71 | -99.7k | <1 |
| VTracer | 4.25 | 87.94 | 95.75 | 11.66 | -12.9k | <1 |
| SuperSVG | 3.05 | 83.30 | 82.70 | 13.50 | -65.6k | <1 |
| LIVE | 2.22 | 88.11 | 93.45 | 7.23 | -18.3k | 1,243 |
High-fidelity generation with complex details
@inproceedings{rodriguez2025rendering,
title={Rendering-Aware Reinforcement Learning for Vector Graphics Generation},
author={Rodriguez, Juan A and Zhang, Haotian and Puri, Abhay and Feizi, Aarash and Pramanik, Rishav and Wichmann, Pascal and Mondal, Arnab and Samsami, Mohammad Reza and Awal, Rabiul and Taslakian, Perouz and others},
booktitle={Advances in Neural Information Processing Systems (NeurIPS 2025)},
year={2025}
}