Figure 1: StarVector is a foundation model for SVG generation. It uses a Vision-Language Modeling architecture
to understand images and text instructions. StarVector excels at vectorizing a wide range of visual
inputs, from general icons and logotypes to more intricate vectors such as technical diagrams.
StarVector represents a breakthrough in Scalable Vector Graphics (SVG) generation, seamlessly integrating visual and textual inputs into a unified SVG foundation model. By reframing vectorization as a code generation task rather than a traditional image processing problem, StarVector transcends previous limitations. This paradigm shift enables the model to leverage the full richness of SVG syntax—including circles, polygons, text elements, and complex paths—without simplification. Our approach allows training on internet-scale data to capture the diverse spectrum of vector representations. At its core, the model employs a vision-language model (VLM) architecture, enabling unprecedented capabilities in generating complex SVG elements. Complemented by SVG-Stack—our extensive dataset—and SVG-Bench—our comprehensive evaluation framework—StarVector establishes a new paradigm for high-quality vector graphics generation.
Key Capabilities
01
Advanced Multimodal Architecture
StarVector's multimodal architecture processes both visual and textual information with remarkable precision, enabling sophisticated image vectorization and text-guided SVG creation that captures fine details and structural relationships. The image encoder and language decoder work together to understand the semantics of an image in pixel space, recognizing primitive shapes, hierarchies, and layers to produce compact and semantically meaningful SVG primitive outputs.
02
Unparalleled Complexity Handling
Where traditional algorithms falter, StarVector excels—effortlessly recognizing and generating intricate SVG elements including text, complex paths, and various primitives directly from images. The model intelligently identifies geometric shapes, connectivity patterns, and structural elements to produce professional-quality diagrams and icons.
03
Robust Data Foundation
Built upon SVG-Stack—our meticulously curated dataset of over 2 million SVG samples—and evaluated through SVG-Bench, StarVector benefits from diverse, high-quality training examples that ensure consistent performance across various graphic styles and complexities.
04
Leading-Edge Performance
StarVector significantly outperforms existing methods in both text-to-SVG and image-to-SVG generation tasks, demonstrating a substantial leap forward in vectorization quality while remaining fully accessible to the research community as an open-source resource.
Model Architecture
StarVector employs a vision-language architecture to generate high-quality SVG code
Figure 2: a) StarVector Architecture: StarVector projects images into embeddings via an image encoder,
then maps these embeddings to the LLM hidden space using an LLM Adapter, generating Visual Tokens.
Text conditioning is achieved with the LLM's tokenizer and embedder. The model learns to map token
sequences (visual or textual) to SVG code. The symbol ⊕ denotes mutually exclusive operations (image-to-
SVG or text-to-SVG), while ‖ indicates sequence concatenation. Figure 2: b) Vision Model and Adapter: The
image encoder employs a Vision Transformer (ViT) to process image patches sequentially. The LLM Adapter
non-linearly projects embeddings into visual tokens for LLM integration.
The architecture shown above enables StarVector to process both images and text prompts through a unified framework. This approach allows the model to leverage the strengths of both modalities, resulting in more accurate and contextually appropriate SVG generation. The LLM Adapter is a critical component that bridges the gap between visual and textual representations, ensuring that the model can effectively translate visual information into structured SVG code.
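As an illustrative sketch of how such an adapter might operate (the exact StarVector implementation may differ; the dimensions, layer count, and GELU activation here are assumptions), a non-linear projection from image-encoder patch embeddings to LLM-sized visual tokens could look like:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU non-linearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class LLMAdapter:
    """Illustrative non-linear projection from image-encoder embeddings to
    LLM-sized visual tokens. Dimensions and layer choices are assumptions
    for the sketch, not the exact StarVector implementation."""

    def __init__(self, d_vision=1024, d_llm=4096, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((d_vision, d_llm)) * 0.02
        self.b1 = np.zeros(d_llm)
        self.w2 = rng.standard_normal((d_llm, d_llm)) * 0.02
        self.b2 = np.zeros(d_llm)

    def __call__(self, patch_embeddings):
        # patch_embeddings: (num_patches, d_vision) from the ViT encoder
        h = gelu(patch_embeddings @ self.w1 + self.b1)
        # Output: (num_patches, d_llm) visual tokens ready for the LLM sequence
        return h @ self.w2 + self.b2

adapter = LLMAdapter()
visual_tokens = adapter(np.zeros((196, 1024)))  # e.g. a 14x14 grid of ViT patches
print(visual_tokens.shape)
```

Each patch embedding becomes one visual token, so the LLM can attend over the image as a sequence, concatenated (‖ in Figure 2) with the SVG code tokens during training.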
Quick Start - Image2SVG Generation
Get started with StarVector in just a few lines of code
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
from starvector.data.util import process_and_rasterize_svg
import torch

# Load the model
model_name = "starvector/starvector-8b-im2svg"
starvector = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, trust_remote_code=True)
processor = starvector.model.processor
tokenizer = starvector.model.svg_transformer.tokenizer
# Move model to GPU and set to evaluation mode
starvector.cuda()
starvector.eval()
# Load and process the input image
image_pil = Image.open('assets/examples/sample-18.png')
image = processor(image_pil, return_tensors="pt")['pixel_values'].cuda()
if not image.shape[0] == 1:
    image = image.squeeze(0)
batch = {"image": image}
# Generate SVG from the image
raw_svg = starvector.generate_im2svg(batch, max_length=4000)[0]
svg, raster_image = process_and_rasterize_svg(raw_svg)
The code above demonstrates how to load a pre-trained StarVector model using the Transformers library, process an input image, and generate SVG code. The model handles all the complexity of understanding the visual elements and translating them into structured vector graphics code.
Note: To use image rasterization features, you need to install the starvector library. Visit the StarVector repository for installation instructions and to ensure all dependencies are properly installed.
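If you only need the processing half of that step, a minimal stand-in is easy to sketch. The following is an illustrative approximation, not the `process_and_rasterize_svg` implementation from the starvector library: it truncates the generated text at the first closing `</svg>` tag (autoregressive generation can emit trailing tokens) and checks well-formedness with the standard-library XML parser.

```python
import xml.etree.ElementTree as ET

def extract_svg(raw_output: str) -> str:
    """Keep everything up to and including the first </svg> tag and verify
    the result parses as XML. Illustrative sketch only; the starvector
    library's process_and_rasterize_svg also rasterizes the result."""
    end = raw_output.find("</svg>")
    if end == -1:
        raise ValueError("no closing </svg> tag in model output")
    svg = raw_output[: end + len("</svg>")]
    ET.fromstring(svg)  # raises ParseError if the SVG is not well-formed XML
    return svg

raw = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="5" cy="5" r="4"/></svg> extra tokens'
svg = extract_svg(raw)
print(svg)
```

Rasterization itself requires an SVG renderer, which is why the starvector package (or a renderer of your choice) must be installed separately.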
Models
StarVector models achieve state-of-the-art performance on SVG generation tasks
We provide Hugging Face 🤗 model checkpoints for image2SVG vectorization: 💫 StarVector-8B and 💫 StarVector-1B. These are the results on SVG-Bench, using the DinoScore metric.
Note: StarVector models will not work for natural images or illustrations, as they have not been trained on those images. They excel in vectorizing icons, logotypes, technical diagrams, graphs, and charts.
As shown in the table above, StarVector-8B achieves the highest performance across all benchmark datasets, demonstrating its effectiveness in generating high-quality SVG code from images. The model's ability to understand and reproduce complex vector graphics makes it particularly valuable for applications requiring precise vectorization of icons, logos, and technical diagrams.
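DinoScore compares deep features of the rendered SVG against the target image rather than raw pixels. As a hedged sketch of the core similarity computation (the actual metric extracts features with a DINO ViT encoder, which is omitted here; the vectors below are placeholders for those features):

```python
import numpy as np

def dino_style_score(feat_a, feat_b):
    """Cosine similarity between two image feature vectors.
    In DinoScore these features come from a DINO ViT encoder applied to
    the rendered SVG and the target image; here they are plain arrays,
    so this is only an illustration of the comparison step."""
    a = np.asarray(feat_a, dtype=float)
    b = np.asarray(feat_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

feat = np.array([0.2, 0.5, 0.1])
score = dino_style_score(feat, feat)  # ≈ 1.0 for identical features
print(score)
```

Feature-space similarity of this kind is less brittle than pixel-wise MSE, which (as Figure 4 illustrates) can assign similar scores to visually different outputs.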
Datasets - SVG-Bench
A comprehensive benchmark for evaluating SVG generation models
SVG-Bench is a benchmark for evaluating SVG generation models. It contains 10 datasets and 3 tasks: Image-to-SVG, Text-to-SVG, and Diagram-to-SVG. The benchmark provides a standardized way to assess the performance of different approaches to SVG generation, enabling fair comparisons and driving progress in the field.
We offer a summary of statistics about the datasets used in our training and evaluation experiments. These datasets are included in SVG-Bench. The subscript _sim_ stands for the simplified version of the dataset, as required by some baselines.
Datasets Examples
Figure 5: Examples from our diverse SVG-Bench datasets. The benchmark includes a wide range of vector graphics styles, from simple icons to complex colored illustrations, enabling comprehensive evaluation of SVG generation models.
The diversity and scale of these datasets enable StarVector to learn a wide range of SVG generation capabilities, from simple icons to complex diagrams. By training on this comprehensive collection, the model develops a robust understanding of vector graphics principles and can generalize to new, unseen examples.
Qualitative Results
Visual comparison of StarVector against baseline methods
The following examples demonstrate StarVector's superior performance in generating high-quality SVG code from various input images. These comparisons highlight the model's ability to capture fine details and structural elements that other methods often miss.
Figure 3: Comparison of StarVector with baseline methods on various image-to-SVG tasks. Note how StarVector preserves fine details and structural elements while producing clean vector graphics. Traditional methods often struggle with complex shapes and details.

Figure 4: Limitations of pixel-based metrics like MSE for evaluating SVG quality. Two visually different outputs can have similar MSE scores, highlighting the need for perceptual metrics that better align with human judgment.

Figure 5: Comparison on technical diagrams. StarVector excels at vectorizing complex diagrams with multiple elements, preserving both structure and details. Note how our model correctly handles text elements, connections, and geometric shapes that are crucial for diagram comprehension.
Key observations: StarVector consistently produces cleaner, more accurate SVG representations compared to traditional vectorization methods. The model's ability to understand semantic content enables it to make intelligent decisions about which details to preserve and how to structure the resulting SVG code.
These qualitative results demonstrate that StarVector not only achieves higher numerical scores on benchmark metrics but also produces visually superior results that better capture the intent and structure of the original images. This is particularly evident in complex cases like technical diagrams and detailed icons, where traditional methods often struggle to maintain coherence and accuracy.
Conclusion
StarVector represents a significant advancement in the field of vector graphics generation. By combining the power of vision-language models with a comprehensive training dataset, we've created a system that can accurately translate images into high-quality SVG code. The model's performance on SVG-Bench demonstrates its effectiveness across a wide range of vector graphics tasks.
We believe that StarVector will enable new applications in design, illustration, and technical documentation, making vector graphics more accessible and easier to create. We invite the research community to build upon our work and explore new directions in this exciting field.
@misc{rodriguez2024starvector,
title={StarVector: Generating Scalable Vector Graphics Code from Images and Text},
author={Juan A. Rodriguez and Abhay Puri and Shubham Agarwal and Issam H. Laradji and Pau Rodriguez and Sai Rajeswar and David Vazquez and Christopher Pal and Marco Pedersoli},
year={2024},
eprint={2312.11556},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2312.11556},
}