Inference computing is critical in this new era of artificial intelligence, but energy and cost issues can plague companies trying to implement AI.
D-Matrix Corp., a company building a computing platform purpose-built for AI inference workloads, is determined to give customers more inference in less time, with less energy.
“We’re super excited … to announce the world’s most efficient AI computing accelerator for inference,” said Sid Sheth (pictured), founder and chief executive officer of d-Matrix. “We built this product with inference and inference only in mind. When we started the company back in 2019, we essentially looked at the landscape of AI compute out there, and made a bet that inference computing would be the largest computing opportunity of our lifetime.”
Sheth spoke with theCUBE Research’s John Furrier at SC24, during an exclusive broadcast on theCUBE, News Media’s livestreaming studio. They discussed the evolution of inference computing. (* Disclosure below.)
Solving the pain points of inference computing
D-Matrix has built a Peripheral Component Interconnect card to complement Nvidia Corp.’s graphics processing units. The biggest obstacle to inference workloads is a lack of memory bandwidth and capacity, according to Sheth.
“We’ve built the acceleration cards to package the silicon together and we build a software stack that goes along with it to essentially map AI workloads onto the silicon,” he said. “We sell the whole unit along with the software and then we work with partners in the server ecosystem … but the big difference in the way we go to market is the fact that we take a very collaborative approach with the ecosystem.”
D-Matrix works with customers to determine which server vendor best fits their needs, and the company has already built integrations with Liquid AI, GigaIO and Super Micro Computer Inc. Flexibility is key to d-Matrix’s approach.
“We do not need any special server configuration for that [PCI] card to plug into,” Sheth said. “It is something that is already available as a server [configuration] … from pretty much a lot of the system. Options are already there. It comes down to what is the end user application that the customer is trying to solve for and how we can make them better at that.”
D-Matrix’s new accelerator card handles low-latency batched inference for use cases such as creating and interacting with generative AI video in real time. The main pain points for customers are user experience, cost and energy efficiency, according to Sheth, who claims d-Matrix has found solutions for all of them.
“We can do models that are a hundred billion in a rack, and we can do them better than anyone else on the three things … which is the user experience, which is interactivity, cost and power efficiency and energy efficiency,” he said. “Every enterprise customer that we have spoken to, most of them at least feel that their AI journeys are starting with models in that size bucket. They feel a hundred billion is plenty.”
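The interactivity-versus-efficiency tradeoff Sheth describes is, at serving time, largely a batching decision. The sketch below is a generic, illustrative micro-batching loop in Python, not d-Matrix’s software stack or API: requests are grouped for at most a few milliseconds before being dispatched together, trading a little throughput for low latency. The model call, batch size and wait times are placeholder assumptions.

```python
# Generic sketch of dynamic micro-batching for low-latency inference.
# model_fn, batch sizes and timings are illustrative only; this is not
# d-Matrix's software stack or API.
import queue
import threading
import time


def model_fn(batch):
    """Placeholder 'model': echoes each prompt. Stand-in for an accelerator call."""
    time.sleep(0.005)  # pretend the hardware takes ~5 ms per batch
    return [f"response to: {prompt}" for prompt in batch]


class MicroBatcher:
    """Collects requests for at most max_wait_s, then runs them as one batch.

    Small batches and short waits favor interactivity (low latency);
    larger batches and longer waits favor throughput and energy per token.
    """

    def __init__(self, max_batch_size=8, max_wait_s=0.010):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, prompt):
        # Enqueue the request and block until the batching loop fills in a result.
        done = threading.Event()
        box = {}
        self.requests.put((prompt, done, box))
        done.wait()
        return box["result"]

    def _loop(self):
        while True:
            # Block for the first request, then gather more until the batch is
            # full or the wait deadline passes.
            prompt, done, box = self.requests.get()
            batch = [(prompt, done, box)]
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            results = model_fn([p for p, _, _ in batch])
            for (_, d, b), result in zip(batch, results):
                b["result"] = result
                d.set()


if __name__ == "__main__":
    batcher = MicroBatcher()
    print(batcher.submit("hello"))
```

Tuning the hypothetical max_batch_size and max_wait_s parameters is one simple way to trade interactivity against throughput and energy per token, which is the balance Sheth says d-Matrix optimizes for in hardware.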
Here’s the complete video interview, part of News’s and theCUBE Research’s coverage of SC24:
(* Disclosure: d-Matrix Corp. sponsored this segment of theCUBE. Neither d-Matrix nor other sponsors have editorial control over content on theCUBE or News.)
Photo: News