StarVector: Generating Scalable Vector Graphics Code from Images and Text

CVPR 2025

1 ServiceNow Research 2 Mila - Quebec AI Institute 3 Canada CIFAR AI Chair 4 ETS, Montreal, Canada 5 UBC, Vancouver, Canada 6 Apple
Figure 1: StarVector is a foundation model for SVG generation. It uses a Vision-Language Modeling architecture to understand images and text instructions. StarVector excels at vectorizing a wide range of visual inputs, from general icons and logotypes to more intricate vectors such as technical diagrams.

StarVector represents a breakthrough in Scalable Vector Graphics (SVG) generation, integrating visual and textual inputs into a unified foundation model for SVGs. By reframing vectorization as a code-generation task rather than a traditional image-processing problem, StarVector overcomes the limitations of earlier approaches. This shift lets the model exploit the full richness of SVG syntax (circles, polygons, text elements, and complex paths) without simplification, and makes it possible to train on internet-scale data covering a diverse spectrum of vector representations. At its core, the model employs a vision-language model (VLM) architecture to generate complex SVG elements. Complemented by SVG-Stack, our large-scale training dataset, and SVG-Bench, our comprehensive evaluation benchmark, StarVector sets a new standard for high-quality vector graphics generation.

Key Capabilities

1. Advanced Multimodal Architecture

StarVector's multimodal architecture processes both visual and textual information, enabling precise image vectorization and text-guided SVG creation that captures fine details and structural relationships. The image encoder and language decoder work together to understand the semantics of an image in pixel space, recognizing primitive shapes, hierarchies, and layers to produce compact, semantically meaningful SVG primitives.

2. Unparalleled Complexity Handling

Where traditional algorithms falter, StarVector excels—effortlessly recognizing and generating intricate SVG elements including text, complex paths, and various primitives directly from images. The model intelligently identifies geometric shapes, connectivity patterns, and structural elements to produce professional-quality diagrams and icons.

3. Robust Data Foundation

Built upon SVG-Stack—our meticulously curated dataset of over 2 million SVG samples—and evaluated through SVG-Bench, StarVector benefits from diverse, high-quality training examples that ensure consistent performance across various graphic styles and complexities.

4. Leading-Edge Performance

StarVector significantly outperforms existing methods in both text-to-SVG and image-to-SVG generation tasks, demonstrating a substantial leap forward in vectorization quality while remaining fully accessible to the research community as an open-source resource.

Model Architecture

StarVector employs a vision-language architecture to generate high-quality SVG code

Figure 2: a) StarVector architecture: StarVector projects images into embeddings via an image encoder, then maps these embeddings to the LLM hidden space with an LLM Adapter, producing visual tokens. Text conditioning is achieved with the LLM's tokenizer and embedder. The model learns to map token sequences (visual or textual) to SVG code. The symbol ⊕ denotes mutually exclusive operations (image-to-SVG or text-to-SVG), while ‖ indicates sequence concatenation. b) Vision model and adapter: the image encoder employs a Vision Transformer (ViT) to process image patches sequentially, and the LLM Adapter non-linearly projects the embeddings into visual tokens for LLM integration.

The architecture shown above enables StarVector to process both images and text prompts through a unified framework. This approach allows the model to leverage the strengths of both modalities, resulting in more accurate and contextually appropriate SVG generation. The LLM Adapter is a critical component that bridges the gap between visual and textual representations, ensuring that the model can effectively translate visual information into structured SVG code.
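To make the data flow concrete, here is a minimal, self-contained PyTorch sketch of the pipeline described above. All class names, dimensions, and the tiny Transformer stand-in are illustrative assumptions, not the actual StarVector implementation, which pairs a ViT image encoder with a pretrained code LLM.

import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Stand-in for the ViT image encoder: image -> sequence of patch embeddings."""
    def __init__(self, vision_dim=768, patch=16):
        super().__init__()
        self.patch = patch
        self.proj = nn.Linear(3 * patch * patch, vision_dim)

    def forward(self, pixel_values):
        b, c, h, w = pixel_values.shape
        p = self.patch
        # Naive patchification of a (B, 3, 224, 224) image into 16x16 patches.
        patches = pixel_values.unfold(2, p, p).unfold(3, p, p)            # (B, 3, 14, 14, 16, 16)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        return self.proj(patches)                                         # (B, 196, vision_dim)

class LLMAdapter(nn.Module):
    """Non-linear projection from vision embeddings into the LLM hidden space."""
    def __init__(self, vision_dim=768, llm_dim=1024):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(vision_dim, llm_dim), nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, x):
        return self.proj(x)                                               # "visual tokens"

# Toy language-model pieces standing in for the pretrained code LLM.
vocab_size, llm_dim = 32000, 1024
token_embedder = nn.Embedding(vocab_size, llm_dim)
decoder_layer = nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True)
decoder = nn.TransformerEncoder(decoder_layer, num_layers=2)              # causal masking omitted for brevity
lm_head = nn.Linear(llm_dim, vocab_size)

encoder, adapter = VisionEncoder(), LLMAdapter()

pixel_values = torch.randn(1, 3, 224, 224)                 # input image
svg_token_ids = torch.randint(0, vocab_size, (1, 32))      # partial SVG code (teacher forcing)

visual_tokens = adapter(encoder(pixel_values))             # image -> visual tokens
svg_embeddings = token_embedder(svg_token_ids)             # SVG code -> token embeddings
sequence = torch.cat([visual_tokens, svg_embeddings], 1)   # "‖": concatenate along the sequence axis
logits = lm_head(decoder(sequence))                        # next-token prediction over the SVG vocabulary
print(logits.shape)                                        # (B, 196 visual + 32 text positions, vocab_size)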

Quick Start - Image2SVG Generation

Get started with StarVector in just a few lines of code

from PIL import Image
from transformers import AutoModelForCausalLM
from starvector.data.util import process_and_rasterize_svg
import torch

# Load the model
model_name = "starvector/starvector-8b-im2svg"

starvector = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, trust_remote_code=True)
processor = starvector.model.processor
tokenizer = starvector.model.svg_transformer.tokenizer

# Move model to GPU and set to evaluation mode
starvector.cuda()
starvector.eval()

# Load and process the input image
image_pil = Image.open('assets/examples/sample-18.png')

image = processor(image_pil, return_tensors="pt")['pixel_values'].cuda()
if image.shape[0] != 1:
    image = image.squeeze(0)
batch = {"image": image}

# Generate SVG from the image
raw_svg = starvector.generate_im2svg(batch, max_length=4000)[0]
svg, raster_image = process_and_rasterize_svg(raw_svg)

The code above demonstrates how to load a pre-trained StarVector model using the Transformers library, process an input image, and generate SVG code. The model handles all the complexity of understanding the visual elements and translating them into structured vector graphics code.

Note: To use image rasterization features, you need to install the starvector library. Visit the StarVector repository for installation instructions and to ensure all dependencies are properly installed.
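Once generation finishes, the outputs can be written straight to disk. The short snippet below assumes, consistent with the quick-start code above, that svg holds the SVG markup as a string and raster_image is a PIL image returned by process_and_rasterize_svg:

# Continuing from the snippet above: persist the generated SVG and its raster preview.
# Assumes `svg` is the SVG source as a string and `raster_image` is a PIL.Image.
with open("output.svg", "w", encoding="utf-8") as f:
    f.write(svg)

raster_image.save("output_preview.png")
print(f"Wrote output.svg ({len(svg)} characters) and output_preview.png")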

Models

StarVector models achieve state-of-the-art performance on SVG generation tasks

We provide Hugging Face 🤗 model checkpoints for image2SVG vectorization, for 💫 StarVector-8B and 💫 StarVector-1B. The table below reports results on SVG-Bench using the DinoScore metric (higher is better).

| Method | SVG-Stack | SVG-Fonts | SVG-Icons | SVG-Emoji | SVG-Diagrams |
|---|---|---|---|---|---|
| AutoTrace | 0.942 | 0.954 | 0.946 | 0.975 | 0.874 |
| Potrace | 0.898 | 0.967 | 0.972 | 0.882 | 0.875 |
| VTracer | 0.954 | 0.964 | 0.940 | 0.981 | 0.882 |
| Im2Vec | 0.692 | 0.733 | 0.754 | 0.732 | - |
| LIVE | 0.934 | 0.956 | 0.959 | 0.969 | 0.870 |
| DiffVG | 0.810 | 0.821 | 0.952 | 0.814 | 0.822 |
| GPT-4-V | 0.852 | 0.842 | 0.848 | 0.850 | - |
| 💫 StarVector-1B | 0.926 | 0.978 | 0.975 | 0.929 | 0.943 |
| 💫 StarVector-8B | 0.966 | 0.982 | 0.984 | 0.981 | 0.959 |

Note: StarVector models will not work well on natural images or illustrations, as they have not been trained on such data. They excel at vectorizing icons, logotypes, technical diagrams, graphs, and charts.

As shown in the table above, StarVector-8B matches or exceeds every baseline on all benchmark datasets, demonstrating its effectiveness in generating high-quality SVG code from images. The model's ability to understand and reproduce complex vector graphics makes it particularly valuable for applications requiring precise vectorization of icons, logos, and technical diagrams.
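DinoScore compares deep features of the rendered SVG against the target raster rather than raw pixels. The sketch below illustrates the general idea with a cosine similarity between DINOv2 embeddings; it is an illustrative approximation, not the exact SVG-Bench evaluation code, and the checkpoint name and the use of the CLS embedding are assumptions.

# Illustrative sketch of a DINO-feature similarity between a generated SVG render
# and the target raster image. Approximates the idea behind DinoScore; the exact
# evaluation code lives in SVG-Bench.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
dino = AutoModel.from_pretrained("facebook/dinov2-base").eval()

@torch.no_grad()
def dino_similarity(image_a: Image.Image, image_b: Image.Image) -> float:
    """Cosine similarity between DINOv2 CLS embeddings of two images."""
    inputs = processor(images=[image_a, image_b], return_tensors="pt")
    cls = dino(**inputs).last_hidden_state[:, 0]       # (2, hidden_dim) CLS tokens
    return F.cosine_similarity(cls[0], cls[1], dim=0).item()

# e.g., compare the rasterized SVG from the quick-start snippet with the input image:
# score = dino_similarity(raster_image, image_pil)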

Datasets - SVG-Bench

A comprehensive benchmark for evaluating SVG generation models

SVG-Bench is a benchmark for evaluating SVG generation models. It contains 10 datasets and covers 3 tasks: Image-to-SVG, Text-to-SVG, and Diagram-to-SVG. The benchmark provides a standardized way to assess the performance of different approaches to SVG generation, enabling fair comparisons and driving progress in the field.

See our Hugging Face 🤗 Dataset Collection

| Dataset | Train | Val | Test | Token Length | SVG Primitives | Annotation |
|---|---|---|---|---|---|---|
| SVG-Stack 🤗 | 2.1M | 108k | 5.7k | 1,822 ± 1,808 | All | Captions |
| SVG-Stack_sim 🤗 | 601k | 30.1k | 1.5k | 2k ± 918 | Vector path | - |
| SVG-Diagrams 🤗 | - | - | 472 | 3,486 ± 1,918 | All | - |
| SVG-Fonts 🤗 | 1.8M | 91.5k | 4.8k | 2,121 ± 1,868 | Vector path | Font letter |
| SVG-Fonts_sim 🤗 | 1.4M | 71.7k | 3.7k | 1,722 ± 723 | Vector path | Font letter |
| SVG-Emoji 🤗 | 8.7k | 667 | 668 | 2,551 ± 1,805 | All | - |
| SVG-Emoji_sim 🤗 | 580 | 57 | 96 | 2,448 ± 1,026 | Vector path | - |
| SVG-Icons 🤗 | 80.4k | 6.2k | 2.4k | 2,449 ± 1,543 | Vector path | - |
| SVG-Icons_sim 🤗 | 80,435 | 2,836 | 1,277 | 2,005 ± 824 | Vector path | - |
| SVG-FIGR 🤗 | 270k | 27k | 3k | 5,342 ± 2,345 | Vector path | Class, Caption |

The table above summarizes statistics for the datasets used in our training and evaluation experiments; all of them are included in SVG-Bench. The subscript sim denotes the simplified version of a dataset, as required by some baselines.
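To experiment with the data directly, the datasets can be pulled from the Hugging Face Hub with the datasets library. The repository id used below (starvector/svg-stack) and the field names are assumptions; consult the dataset cards in the collection for the exact schema.

# Minimal sketch of pulling an SVG-Bench dataset from the Hugging Face Hub.
# The repository id and field names are assumptions; check the dataset cards.
from datasets import load_dataset

svg_stack_test = load_dataset("starvector/svg-stack", split="test")
print(svg_stack_test)                 # number of rows and column names

example = svg_stack_test[0]
print(list(example.keys()))           # inspect available fields (e.g., the raw SVG code)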

Dataset Examples

Examples from our diverse SVG-Bench datasets. The benchmark includes a wide range of vector graphics styles, from simple icons to complex colored illustrations, enabling comprehensive evaluation of SVG generation models.

The diversity and scale of these datasets enable StarVector to learn a wide range of SVG generation capabilities, from simple icons to complex diagrams. By training on this comprehensive collection, the model develops a robust understanding of vector graphics principles and can generalize to new, unseen examples.

Qualitative Results

Visual comparison of StarVector against baseline methods

The following examples demonstrate StarVector's superior performance in generating high-quality SVG code from various input images. These comparisons highlight the model's ability to capture fine details and structural elements that other methods often miss.

Image-to-SVG Comparison
Figure 3: Comparison of StarVector with baseline methods on various image-to-SVG tasks. Note how StarVector preserves fine details and structural elements while producing clean vector graphics. Traditional methods often struggle with complex shapes and details.
MSE Comparison
Figure 4: Limitations of pixel-based metrics like MSE for evaluating SVG quality. Two visually different outputs can have similar MSE scores, highlighting the need for perceptual metrics that better align with human judgment.
SVG Diagrams Comparison
Figure 5: Comparison on technical diagrams. StarVector excels at vectorizing complex diagrams with multiple elements, preserving both structure and details. Note how our model correctly handles text elements, connections, and geometric shapes that are crucial for diagram comprehension.

Key observations: StarVector consistently produces cleaner, more accurate SVG representations compared to traditional vectorization methods. The model's ability to understand semantic content enables it to make intelligent decisions about which details to preserve and how to structure the resulting SVG code.

These qualitative results demonstrate that StarVector not only achieves higher numerical scores on benchmark metrics but also produces visually superior results that better capture the intent and structure of the original images. This is particularly evident in complex cases like technical diagrams and detailed icons, where traditional methods often struggle to maintain coherence and accuracy.

Conclusion

StarVector represents a significant advancement in the field of vector graphics generation. By combining the power of vision-language models with a comprehensive training dataset, we've created a system that can accurately translate images into high-quality SVG code. The model's performance on SVG-Bench demonstrates its effectiveness across a wide range of vector graphics tasks.

We believe that StarVector will enable new applications in design, illustration, and technical documentation, making vector graphics more accessible and easier to create. We invite the research community to build upon our work and explore new directions in this exciting field.

For more details, please refer to our paper and explore our code repository.

If you find this work useful, please cite:

@misc{rodriguez2024starvector,
  title={StarVector: Generating Scalable Vector Graphics Code from Images and Text}, 
  author={Juan A. Rodriguez and Abhay Puri and Shubham Agarwal and Issam H. Laradji and Pau Rodriguez and Sai Rajeswar and David Vazquez and Christopher Pal and Marco Pedersoli},
  year={2024},
  eprint={2312.11556},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2312.11556}, 
}