Rendering-Aware Reinforcement Learning for
Vector Graphics Generation

Juan A. Rodriguez1,2,3, Haotian Zhang5, Abhay Puri1, Aarash Feizi1,2,11, Rishav Pramanik7, Pascal Wichmann6, Arnab Mondal2,8, Mohammad Reza Samsami2,9,
Rabiul Awal1,2, Perouz Taslakian1, Spandana Gella1, Sai Rajeswar1,2, David Vazquez1, Christopher Pal1,2,4,10, Marco Pedersoli1,2,3
1ServiceNow Research 2Mila 3ÉTS Montréal 4Polytechnique Montréal 5Columbia University
6Independent Scholar 7Stony Brook University 8Apple 9Google Research 10Canada CIFAR AI Chair 11McGill University
NeurIPS 2025 · San Diego, California

How RLRF Works

Reinforcement Learning from Rendering Feedback

RLRF Method Overview

RLRF trains a vision-language model to generate SVG code using the rendered image as feedback. Unlike traditional supervised fine-tuning, which supervises only the code tokens, RLRF rasterizes the generated SVG and compares it with the target image. The model can therefore learn from its visual mistakes and improve generation quality, even for complex shapes and gradients that are hard to capture from code tokens alone.
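The render-and-compare loop can be illustrated with a toy sketch. The rasterizer below handles only axis-aligned `<rect>` elements and stands in for a real SVG renderer (e.g. CairoSVG); the pixel-L2 scoring is the rendering feedback, while the actual policy update that would consume these rewards is omitted.

```python
import re
import numpy as np

def render_svg_rects(svg: str, size: int = 32) -> np.ndarray:
    """Toy rasterizer for grayscale <rect> elements only -- a stand-in for a
    real SVG renderer so the feedback loop stays self-contained."""
    canvas = np.zeros((size, size))
    for m in re.finditer(
            r'<rect x="(\d+)" y="(\d+)" width="(\d+)" height="(\d+)"', svg):
        x, y, w, h = map(int, m.groups())
        canvas[y:y + h, x:x + w] = 1.0  # paint the filled rect as white
    return canvas

def rendering_feedback(candidates, target):
    """Score each candidate SVG by rendering it and comparing pixels against
    the target image; an RL step would then reinforce higher-scoring samples."""
    rewards = []
    for svg in candidates:
        rendered = render_svg_rects(svg, size=target.shape[0])
        rewards.append(-float(np.mean((target - rendered) ** 2)))  # pixel L2
    return rewards

# Target: a 16x16 white square at the origin of a 32x32 canvas.
target = render_svg_rects('<svg><rect x="0" y="0" width="16" height="16"/></svg>')
good = '<svg><rect x="0" y="0" width="16" height="16"/></svg>'
bad = '<svg><rect x="8" y="8" width="16" height="16"/></svg>'
rewards = rendering_feedback([good, bad], target)
# The exact match earns the higher (zero) reward; the offset square is penalized.
```

This is the key difference from token-level supervision: the misplaced square above would still look "close" in code space, but the rendered comparison penalizes it directly.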

Composite Reward Function

Three complementary signals guide the learning process

Reconstruction

Pixel-level fidelity measured via L2 distance between the rendered SVG and the input image.

Rrecon = -||I - Î||₂²

Semantic Similarity

High-level perceptual alignment using DreamSim or CLIP embeddings.

Rsem = cos(φ(I), φ(Î))

Code Efficiency

Encourages compact SVG code by penalizing token length deviation from ground truth.

Reff = -|len(ŷ) - len(y)|

Combined Reward: R = λ₁·Rrecon + λ₂·Rsem + λ₃·Reff
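The three terms combine in a few lines of NumPy. Everything below is a sketch: the λ weights are placeholders rather than the paper's tuned values, and the embedding vectors stand in for DreamSim/CLIP features.

```python
import numpy as np

def recon_reward(target, rendered):
    # R_recon = -||I - I_hat||_2^2 (mean-normalized here for scale stability)
    return -float(np.mean((target - rendered) ** 2))

def sem_reward(emb_target, emb_rendered):
    # R_sem = cos(phi(I), phi(I_hat)); phi would be a DreamSim or CLIP encoder
    num = float(np.dot(emb_target, emb_rendered))
    den = float(np.linalg.norm(emb_target) * np.linalg.norm(emb_rendered)) + 1e-8
    return num / den

def eff_reward(gen_tokens, ref_tokens):
    # R_eff = -|len(y_hat) - len(y)|, penalizing deviation from the reference length
    return -abs(gen_tokens - ref_tokens)

def composite_reward(target, rendered, emb_t, emb_r, gen_tokens, ref_tokens,
                     lam=(1.0, 1.0, 0.001)):
    # R = lam1*R_recon + lam2*R_sem + lam3*R_eff; lam values are illustrative
    l1, l2, l3 = lam
    return (l1 * recon_reward(target, rendered)
            + l2 * sem_reward(emb_t, emb_r)
            + l3 * eff_reward(gen_tokens, ref_tokens))
```

A perfect reconstruction with matching embeddings and matching code length scores λ₂ · 1.0, since the reconstruction and efficiency penalties both vanish.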

Performance Comparison

RLRF sets a new state-of-the-art with consistent gains across all metrics.

Figure: Bar charts comparing baselines against RLRF (ours) on MSE ↓ (lower is better), SSIM ↑, DINO ↑ (higher is better), LPIPS ↓, code efficiency (tokens), and time to sample (seconds).

Detailed Results

Full comparison across all models and metrics.

| Model | MSE ↓ | SSIM ↑ | DINO ↑ | LPIPS ↓ | Code Eff. (tokens) | Time (s) |
|---|---|---|---|---|---|---|
| **RLRF results on SVG base models** | | | | | | |
| StarVector-1B-Base | 4.60 | 87.00 | 96.00 | 9.22 | -800 | 64 |
| +RLRF (ours) | 3.46 | 88.00 | 98.00 | 7.51 | -127 | 23 |
| Δ Improvement | -1.14 | +1.0 | +2.0 | -1.71 | +763 | -41 |
| Qwen2.5-VL-3B-Instruct | 23.31 | 62.28 | 69.26 | 35.30 | +1.5k | 24 |
| +SVG-SFT (ours) | 9.48 | 78.40 | 92.60 | 17.44 | -2.5k | 67 |
| +RLRF (ours) | 4.79 | 88.76 | 95.97 | 10.97 | +199 | 48 |
| Δ Improvement | -4.69 | +10.36 | +3.37 | -6.47 | +2.7k | -19 |
| Qwen2.5-VL-7B-Instruct | 23.10 | 61.40 | 78.00 | 33.80 | +765 | 37 |
| +SVG-SFT (ours) | 8.60 | 79.40 | 93.00 | 16.58 | -2.8k | 73 |
| +RLRF (ours) | 1.03 | 95.10 | 98.70 | 3.08 | -334 | 63 |
| Δ Improvement | -7.57 | +15.70 | +5.70 | -13.50 | +2.5k | -10 |
| **VLMs (open-source)** | | | | | | |
| Qwen2.5-VL-32B-Instruct | 23.62 | 55.46 | 82.38 | 35.83 | +1.3k | 58 |
| Qwen2.5-VL-72B-Instruct | 23.20 | 55.72 | 81.68 | 34.14 | +1.4k | 62 |
| Llama4-Scout (109B) | 20.98 | 58.58 | 83.72 | 33.37 | +1.4k | 57 |
| Llama4-Maverick (400B) | 20.67 | 59.26 | 85.61 | 31.75 | +1.3k | 61 |
| **VLMs (closed-source)** | | | | | | |
| Gemini-Flash-1.5 | 20.38 | 59.65 | 84.70 | 33.27 | +1.2k | 59 |
| Gemini-Flash-2.0 | 19.31 | 60.21 | 86.53 | 32.10 | +1.1k | 63 |
| Gemini-1.5-Pro | 20.19 | 60.75 | 84.17 | 33.02 | +1.2k | 58 |
| Claude 3.7 Sonnet | 17.73 | 69.33 | 79.80 | 28.42 | +1.4k | 62 |
| GPT-4o-1120 | 16.92 | 66.91 | 89.00 | 27.55 | +1.3k | 60 |
| **Image-processing methods** | | | | | | |
| Im2VEC | 18.10 | 76.50 | 69.20 | 29.10 | -4.3k | <1 |
| Potrace | 8.15 | 77.28 | 89.23 | 19.10 | -7.3k | 12 |
| DiffVG | 6.64 | 81.23 | 86.12 | 20.50 | -19.7k | 31 |
| PyAutoTrace | 4.71 | 87.44 | 95.68 | 10.71 | -99.7k | <1 |
| VTracer | 4.25 | 87.94 | 95.75 | 11.66 | -12.9k | <1 |
| SuperSVG | 3.05 | 83.30 | 82.70 | 13.50 | -65.6k | <1 |
| LIVE | 2.22 | 88.11 | 93.45 | 7.23 | -18.3k | 1,243 |

Qualitative Results

High-fidelity generation with complex details

Figure: Example of SVG generation. RLRF produces cleaner paths and more accurate gradients compared to baseline methods.
Figure: Image-to-SVG generation results.
Figure: Image-to-SVG emoji generation results.
Figure: Text-to-SVG thinking process results.
Figure: Text-to-SVG generation results.

Citation

@inproceedings{rodriguez2025rendering,
  title={Rendering-Aware Reinforcement Learning for Vector Graphics Generation},
  author={Rodriguez, Juan A and Zhang, Haotian and Puri, Abhay and Feizi, Aarash and Pramanik, Rishav and Wichmann, Pascal and Mondal, Arnab and Samsami, Mohammad Reza and Awal, Rabiul and Taslakian, Perouz and others},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS 2025)},
  year={2025}
}