Voxel51’s new auto-labeling technology is expected to cut annotation costs by 100,000x

New research from computer vision startup Voxel51 suggests that the traditional data annotation model is about to be upended. In a study released today, the company reported that its new auto-labeling system achieves up to 95% of human-level accuracy while being 5,000 times faster and up to 100,000 times cheaper than manual labeling.
The study benchmarks foundation models such as YOLO-World and DINO on well-known datasets, including COCO, LVIS, BDD100K and VOC. Notably, in many real-world scenarios, models trained on AI-generated labels performed on par with those trained on human labels. For companies building computer vision systems, the implications are huge: millions of dollars in annotation costs could be saved, and model development cycles could shrink from weeks to hours.
A new era of annotation: From manual labor to model-driven pipelines
Data annotation has been a painful bottleneck in AI development for decades. From ImageNet to autonomous driving datasets, teams have relied on armies of human workers to draw bounding boxes and segment objects, an effort that is both expensive and slow.
The prevailing logic has been simple: more human-labeled data = better AI. But Voxel51’s research turns that assumption on its head.
Their approach uses pre-trained foundation models (some with zero-shot capabilities) and integrates them into a pipeline that automates routine labeling while using active learning to flag uncertain or complex cases for human review. This approach greatly reduces both time and cost.
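The routing idea described above can be illustrated with a minimal sketch. It is not Voxel51's implementation: the `Prediction` class, the `route_predictions` function and the 0.8 threshold are hypothetical names and values chosen for illustration, and real active learning strategies can use richer uncertainty measures than a single confidence score.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    image_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route_predictions(predictions, accept_threshold=0.8):
    """Split predictions into auto-accepted labels and a human review queue.

    accept_threshold is a hypothetical value, not one taken from the study.
    """
    auto_labeled, review_queue = [], []
    for pred in predictions:
        if pred.confidence >= accept_threshold:
            auto_labeled.append(pred)
        else:
            review_queue.append(pred)
    # Put the least confident predictions first so reviewers see them first.
    review_queue.sort(key=lambda p: p.confidence)
    return auto_labeled, review_queue


# Made-up example predictions from a zero-shot detector.
preds = [
    Prediction("img_001", "car", 0.97),
    Prediction("img_002", "bicycle", 0.42),
    Prediction("img_003", "person", 0.88),
    Prediction("img_004", "traffic light", 0.15),
]

auto_labeled, review_queue = route_predictions(preds)
print([p.image_id for p in auto_labeled])  # ['img_001', 'img_003']
print([p.image_id for p in review_queue])  # ['img_004', 'img_002']
```

In this sketch, raising the threshold sends more samples to human review, trading cost for label quality.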
In one test, labeling 3.4 million objects on an NVIDIA L40S GPU took just over an hour and cost $1.18. Doing the same work manually with AWS SageMaker would have taken nearly 7,000 hours and cost more than $124,000. In particularly challenging cases, such as identifying rare categories in the COCO or LVIS datasets, models trained on auto-generated labels occasionally outperformed their human-labeled counterparts. This surprising result may stem from the consistent labeling patterns of the foundation models and their training on large-scale internet data.
Inside Voxel51: The team reshaping visual AI workflows
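As a rough check on the figures quoted above, the ratios can be worked out directly. The inputs are the article's rounded numbers ("just over an hour", "nearly 7,000 hours", "more than $124,000"), so the one-hour GPU time in particular is an assumption and the results are approximations only.

```python
# Figures quoted in the article; treat them as rounded approximations.
objects = 3_400_000
auto_cost_usd = 1.18
auto_hours = 1.0            # assumption: the article says "just over an hour"
manual_cost_usd = 124_000   # the article says "more than $124,000"
manual_hours = 7_000        # the article says "nearly 7,000 hours"

cost_ratio = manual_cost_usd / auto_cost_usd
time_ratio = manual_hours / auto_hours
auto_cost_per_object = auto_cost_usd / objects
manual_cost_per_object = manual_cost_usd / objects

print(f"Cost ratio (manual / auto): about {cost_ratio:,.0f}x")
print(f"Time ratio (manual / auto): about {time_ratio:,.0f}x")
print(f"Auto cost per object:   ${auto_cost_per_object:.8f}")
print(f"Manual cost per object: ${manual_cost_per_object:.4f}")
```

For this single test, the cost ratio comes out at roughly 105,000x, in line with the headline figure of 100,000x. The time ratio here is an upper bound because the GPU run took somewhat longer than one hour.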
Founded in 2016 by Professor Jason Corso and Brian Moore of the University of Michigan, Voxel51 was originally a consulting firm for video analytics. Corso is a veteran of computer vision and robotics, and has published more than 150 academic papers and contributed extensive open source code to the AI community. Moore, a former PhD student, serves as CEO.
The turning point came when the team realized that most AI bottlenecks lie not in model design but in the data. That insight inspired them to create FiftyOne, a platform designed to help engineers explore, curate and optimize visual datasets more effectively.
Over the years, the company has raised more than $45 million, including a Series A and a $30 million Series B led by Bessemer Venture Partners. Enterprise adoption followed, with major customers such as LG Electronics, Bosch, Berkshire Grey, Precision Planting and RIOS integrating Voxel51’s tools into their production AI workflows.
From Tool to Platform: FiftyOne’s Expanding Role
FiftyOne has evolved from a simple dataset visualization tool into a data-centric AI platform. It supports a variety of formats and labeling schemas – COCO, Pascal VOC, LVIS, BDD100K, Open Images – and integrates seamlessly with frameworks like TensorFlow and PyTorch.
FiftyOne is more than a visualization tool. It also enables advanced operations: finding duplicate images, identifying mislabeled samples, surfacing outliers, and measuring model failure modes. Its plugin ecosystem supports custom modules for optical character recognition, video Q&A and embedding-based analytics.
The enterprise version, FiftyOne Teams, adds collaboration features such as version control, access permissions and integration with cloud storage (e.g., S3), as well as with annotation tools such as Labelbox and CVAT. Notably, Voxel51 also partners with V7 Labs to streamline the flow between dataset curation and manual annotation.
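To make one of these operations concrete, here is a minimal sketch of embedding-based near-duplicate detection. It deliberately does not use FiftyOne's API: the function names, the 0.99 similarity threshold and the toy three-dimensional embeddings are all hypothetical, and a real pipeline would use embeddings from a vision model and an approximate nearest-neighbor index rather than a pairwise loop.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.99):
    """Return pairs of image IDs whose embeddings are nearly identical.

    threshold is a hypothetical cutoff; a suitable value depends on the
    embedding model and the dataset.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy embeddings, made up for illustration only.
embeddings = {
    "img_001": [0.90, 0.10, 0.00],
    "img_002": [0.89, 0.11, 0.01],  # almost the same direction as img_001
    "img_003": [0.00, 0.20, 0.95],
}

duplicates = find_near_duplicates(embeddings)
print(duplicates)  # [('img_001', 'img_002')]
```

The pairwise comparison is quadratic in the number of images, which is fine for a sketch but not for datasets with millions of samples.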
Rethinking the annotation industry
Voxel51’s auto-labeling research challenges the assumptions underpinning a nearly $1B annotation industry. In traditional workflows, every image has to be touched by a human, an expensive and often redundant process. Voxel51 argues that most of this work can now be eliminated.
With its system, most images are labeled by AI, and only edge cases are escalated to humans. This hybrid strategy not only reduces costs but also ensures higher overall data quality, because human effort is reserved for the most difficult or valuable annotations.
This shift parallels a broader trend in the AI field toward data-centric AI, a methodology that focuses on optimizing training data rather than endlessly tuning model architectures.
Competitive landscape and industry reception
Investors like Bessemer see Voxel51 as the “data orchestration layer” for AI, akin to the way DevOps tools transformed software development. Its open source tools have been downloaded millions of times, and its community includes thousands of developers and ML teams around the world.
While other startups such as Snorkel AI, Roboflow and Activeloop also focus on data workflows, Voxel51 stands out for its breadth, its open source ethos and its enterprise-grade infrastructure. Rather than competing with annotation providers, Voxel51’s platform makes existing services more efficient through selective curation.
What this means for the future
The long-term implications are profound. If widely adopted, Voxel51’s approach could significantly lower the barrier to entry for computer vision, opening the field to startups and researchers who lack large labeling budgets.
Beyond cost savings, this approach also lays the groundwork for continuous learning systems, in which models in production automatically flag failures that are then reviewed, relabeled and folded back into the training data, all within the same orchestrated pipeline.
The company’s broader vision fits with how AI is evolving: not just smarter models, but smarter workflows. In this vision, annotation is not dead, but it is no longer the domain of brute-force labor. It is strategic, selective and driven by automation.