Artificial intelligence data annotation startup Encord, officially known as Cord Technologies Inc., wants to break down barriers to training multimodal AI models.
To do that, it has just released what it says is the world’s largest open-source multimodal dataset to help developers at organizations of all sizes build more sophisticated AI systems.
Along with the dataset, Encord has created a new methodology for training multimodal AI models. It’s called EBind, and the company claims it can be used to train advanced models capable of processing multiple kinds of data on a single graphics processing unit within a matter of hours, rather than days or weeks.
The startup says the new dataset and methodology can help to democratize access to multimodal AI and increase the ability of smaller startups to compete with the likes of OpenAI, Google LLC, Meta Platforms Inc. and Anthropic PBC.
Encord does know a thing or two about AI training, so it’s qualified to make such a claim. The company is the creator of an automated data annotation platform that’s used to label and annotate different types of data, including text files, images, videos and audio, so it can be used to train machine learning and computer vision models.
Though automated data annotation systems are not new, traditional ones have relied heavily on human supervision. Encord doesn’t do this, instead automating the entire process by using AI itself to supervise the AI that’s doing the annotating, which helps companies to get large datasets ready for AI training much faster than was possible before.
Why multimodal AI?
Encord co-founder and Chief Executive Eric Landau said the company wants to democratize access to multimodal AI because of its huge potential. Multimodal AI models can process multiple kinds of data, unlike standard chatbots, which are trained only on text, or computer vision models, which learn exclusively from images. By ingesting multiple kinds of data, they can be used to solve more complex problems and generate more nuanced outputs.
“Multimodal AI is the next major leap for our industry, with the power to teach robots, self-driving cars, drones and other systems to recognize and make inferences from their physical environments using the same combination of senses that humans use,” Landau explained.
The problem with multimodal AI is that, until now, it has been largely inaccessible to smaller teams. For one thing, there is little multimodal data in the public domain that can be used to train these models. And existing training methodologies require vast computing resources to run efficiently, which makes them prohibitively expensive for many smaller companies.
Landau said Encord’s new dataset and EBind methodology are meant to disrupt that status quo: “[They will] vastly reduce the time and compute power needed to develop, train and deploy multimodal AI systems – and will help to unleash the next wave of innovation in this space,” he promised.
Focus on data quality
The EBind methodology was designed to be used with Encord’s voluminous and high-quality open multimodal dataset. It relies on a “single encoder per data modality,” wherein the training process is driven more by data quality than by raw compute power. So, the better the data is, the faster the models can be trained, even if only limited compute resources are available, Landau said.
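Encord has not published the details of EBind, so the following is only a toy illustration of the general idea behind a "single encoder per data modality" design, not the company's implementation. It assumes each modality is routed to exactly one encoder and that all encoders emit vectors in a shared space where they can be compared. Every name here (`ENCODERS`, `embed`, `encode_text`, `encode_audio`, `DIM`) is hypothetical, and the encoders are hand-written placeholders rather than trained models.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
DIM = 4  # size of the shared embedding space (arbitrary for this toy example)


def _normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def encode_text(text: str) -> Vector:
    """Placeholder text encoder: sums character codes into DIM slots."""
    vec = [0.0] * DIM
    for i, ch in enumerate(text):
        vec[i % DIM] += ord(ch)
    return _normalize(vec)


def encode_audio(samples: List[float]) -> Vector:
    """Placeholder audio encoder: sums absolute amplitudes into DIM slots."""
    vec = [0.0] * DIM
    for i, sample in enumerate(samples):
        vec[i % DIM] += abs(sample)
    return _normalize(vec)


# Assumption for illustration: exactly one encoder is registered per modality.
ENCODERS: Dict[str, Callable[..., Vector]] = {
    "text": encode_text,
    "audio": encode_audio,
}


def embed(modality: str, payload) -> Vector:
    """Route a payload to the single encoder registered for its modality."""
    if modality not in ENCODERS:
        raise ValueError(f"no encoder registered for modality: {modality}")
    return ENCODERS[modality](payload)


def similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two unit-length embeddings."""
    return sum(x * y for x, y in zip(a, b))
```

In a real system the placeholder functions would be replaced by learned encoders, and the quality of the paired training data would determine how well embeddings from different modalities line up in the shared space.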
According to Encord’s internal research, it was able to train a simple, 1.8 billion-parameter multimodal model that outperformed rival models with up to 17 times more parameters, and it did this in just a few hours using a single GPU. The company has not yet published this research, so its claims cannot be verified, but Charlotte Bax, CEO of the British vision AI startup Captur Ltd., has had early access to the dataset and methodology and was mightily impressed.
“The dataset opens new possibilities for improving performance on image quality measures for our shared models across various verticals,” Bax said. “We’re always looking at ways to augment datasets for our on-device models to achieve better handling of edge cases, and Encord’s new dataset offers a powerful pathway to accomplish that goal.”
Encord President Ulrik Stig Hansen said the success of the new methodology shows that data quality, rather than computing resources, will have the biggest impact on AI innovation in future. “The winning organizations… [will be those] that adopt new approaches to data curation and dataset construction, not just those that throw escalating levels of compute power at the problem,” he predicted.
