Copyright © All Rights Reserved. World of Software.
Interview: Pure Storage on the AI data challenge beyond hardware | Computer Weekly

News Room
Published 24 June 2025, last updated 2025/06/24 at 9:47 PM

Successfully tackling artificial intelligence (AI) workloads is not just about throwing compute and storage resources at them. Sure, you need enough processing power, and storage that can supply it with data at the right rate, but before any such operation can succeed, it’s critical to ensure the quality of the data used in AI training.

That’s the core message from Par Botes, vice-president of AI infrastructure at Pure Storage, whom we caught up with last week at the company’s Accelerate event in Las Vegas.

Botes emphasised the need for enterprises tackling AI to capture, organise, prepare and align their data, because that data is often incomplete or unsuited to the questions the AI is trying to answer.

We talked to Botes about data engineering, data management, the use of data lakehouses and making sure datasets fit the need being addressed by AI. 

What does Pure Storage view as the key upcoming or emerging storage challenges in AI? 

I think it’s hard to create systems that solve problems using AI without having a really good way of capturing and organising data, then preparing it and aligning it to the processing elements, the GPUs [graphics processing units], so that they can access it fast enough.

What in particular makes those challenges difficult? 

I’ll start with the most obvious one: how do I get GPUs to consume the data? The GPUs are incredibly powerful, and they drive a tremendous amount of bandwidth.

It’s hard to feed GPUs data at the pace they consume it. That is increasingly being solved, particularly at the high end. But for a regular enterprise, these are new types of systems and new types of skills they have to develop.
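Botes gives no figures, but a back-of-envelope calculation shows why keeping GPUs fed is an operational challenge. The sketch below is illustrative only. The function name and the cluster numbers (64 GPUs, 2,000 samples per second each, roughly 150 KB per sample) are hypothetical assumptions, not Pure Storage data.

```python
def required_read_bandwidth_gb_s(num_gpus: int,
                                 samples_per_sec_per_gpu: float,
                                 bytes_per_sample: float) -> float:
    """Aggregate read throughput, in GB/s (decimal), that storage must sustain
    so that no GPU sits idle waiting for data."""
    total_bytes_per_sec = num_gpus * samples_per_sec_per_gpu * bytes_per_sample
    return total_bytes_per_sec / 1e9


# Hypothetical cluster, not figures from the interview:
# 64 GPUs, each consuming 2,000 samples/s of roughly 150 KB per sample.
needed = required_read_bandwidth_gb_s(64, 2_000, 150_000)
print(f"Storage must deliver about {needed:.1f} GB/s")  # prints: Storage must deliver about 19.2 GB/s
```

The requirement scales linearly, so doubling the GPU count or the sample size doubles the throughput the storage tier has to deliver.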

It’s not a hard problem on the science side; it’s a hard problem in operations, because these are not muscles that have existed in enterprises for long.

The next part of that problem is: How do I prepare my data? How do I gather it? How do I know where I have the correct data? How do I assess it? How do I track it? How do I apply lineage to it to see that this model is trained with this set of data? How do I know that it has a complete dataset? That’s a very hard problem. 
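One minimal way to record that "this model is trained with this set of data" is to fingerprint the exact dataset and store that fingerprint alongside the model. The sketch below assumes the dataset fits in memory as a list of JSON-serialisable records. The function names, the `model-v1` ID and the `crm_db.customers` source are all hypothetical. Production systems would more typically rely on a lakehouse table version or a dedicated lineage tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def dataset_fingerprint(records: list) -> str:
    """SHA-256 over the records in order: the same data in the same order
    always yields the same ID, and any change yields a different one."""
    h = hashlib.sha256()
    for rec in records:
        # sort_keys makes the hash independent of dict key order
        h.update(json.dumps(rec, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


@dataclass
class LineageRecord:
    model_id: str
    dataset_id: str
    sources: list  # where the training data originated
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def record_training_run(model_id: str, records: list, sources: list,
                        log: list) -> LineageRecord:
    """Append a lineage entry linking a model to the exact data it saw."""
    entry = LineageRecord(model_id, dataset_fingerprint(records), sources)
    log.append(asdict(entry))
    return entry


# Hypothetical usage
lineage_log = []
data = [{"text": "hello", "label": 1}, {"text": "world", "label": 0}]
run = record_training_run("model-v1", data, ["crm_db.customers"], lineage_log)
```

Because the fingerprint depends only on content, retraining on an unchanged dataset reproduces the same `dataset_id`, while any added, removed or edited record shows up as a new one.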

Is that a problem that varies between customer and workload? Because I can imagine one might know, just by the expertise that resides within an organisation, that you have all the data you need. Or, in another situation, it might be unclear whether you do or not.

It’s pretty hard to know whether you have all the data you need without reasoning about it. I’ll give you an example.

I spent many years building a self-driving car – perception networks, driving systems – but frequently, we found the car didn’t perform as well in some conditions.

One case was a road that turned left and slightly uphill, with other cars around. We then realised we didn’t have enough training data for it. So, having a principled way of reasoning about the data, about its completeness and its range, making sure you have all the data for those cases, and analysing it mathematically, is not a discipline that’s super common outside of high-end training companies.
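Botes doesn’t describe the method his team used. One simple way to reason about completeness is to enumerate every combination of conditions you expect to encounter and count how many training samples cover each one. The sketch below takes that approach. The dimension names, their values, the sample counts and the 50-sample threshold are all hypothetical.

```python
from collections import Counter
from itertools import product


def find_coverage_gaps(samples, dimensions, min_count):
    """Return (condition combination, sample count) for every combination of
    expected condition values that has fewer than min_count samples,
    including combinations that never appear at all."""
    names = list(dimensions)
    counts = Counter(tuple(s[n] for n in names) for s in samples)
    gaps = []
    for combo in product(*(dimensions[n] for n in names)):
        if counts[combo] < min_count:
            gaps.append((dict(zip(names, combo)), counts[combo]))
    return gaps


# Hypothetical driving conditions and sample counts
dimensions = {
    "road": ["straight", "left", "right"],
    "grade": ["flat", "uphill", "downhill"],
    "traffic": ["none", "dense"],
}
samples = (
    [{"road": "straight", "grade": "flat", "traffic": "dense"}] * 500
    + [{"road": "left", "grade": "flat", "traffic": "none"}] * 200
    + [{"road": "left", "grade": "uphill", "traffic": "dense"}] * 3
)
gaps = find_coverage_gaps(samples, dimensions, min_count=50)
```

With these made-up counts, 16 of the 18 combinations fall below the threshold. They include the left-turn, uphill, dense-traffic case, which has only three samples. Real perception data would need far richer dimensions, but the principle of measuring coverage explicitly is the same.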

Given the issues and difficulties that tend to arise with AI workloads, how can customers begin to mitigate them?

The general approach I recommend is to think about your data engineering processes. So, we partner with data engineering companies that do things like lakehouses.

Think about: How do I apply a lakehouse to my incoming data? How do I use my lakehouse to clean and prepare it? In some cases, how do I even transform it and make it ready for the training system? I would start by thinking about the data engineering discipline in my company and how to get it ready for AI.
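In practice, cleaning would run inside the lakehouse’s own query or DataFrame engine. The plain-Python sketch below only illustrates the kind of rules involved and rests on several hypothetical assumptions. Records are dictionaries, `id` and `text` are the required fields, and exact duplicates after whitespace normalisation should be dropped.

```python
import json


def clean_records(records, required_fields):
    """Normalise whitespace in string fields, reject records whose required
    fields are missing or empty, and drop exact duplicates (keeping order).
    Returns (cleaned_records, number_rejected)."""
    seen = set()
    cleaned, rejected = [], 0
    for rec in records:
        # Collapse runs of whitespace and trim string values
        norm = {k: " ".join(v.split()) if isinstance(v, str) else v
                for k, v in rec.items()}
        # Incomplete records are rejected rather than guessed at
        if any(norm.get(f) in (None, "") for f in required_fields):
            rejected += 1
            continue
        # Canonical JSON gives a hashable key even for nested values
        key = json.dumps(norm, sort_keys=True, default=str)
        if key in seen:
            rejected += 1
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned, rejected


# Hypothetical incoming records
raw = [
    {"id": 1, "text": "  Order   shipped "},
    {"id": 2, "text": ""},
    {"id": 1, "text": "Order shipped"},
    {"id": 3, "text": None},
]
cleaned, rejected = clean_records(raw, ["id", "text"])
```

Counting rejections, rather than silently discarding them, gives a first signal of how much incoming data is unusable.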

What does data engineering consist of if you drill down into it? 

Data engineering generally consists of questions such as: How do I get access to other datasets that exist in corporate databases, in structured systems or in other systems we have? How do I ingest them into an intermediate form, such as a lakehouse? And how do I then transform that data and select from sets that may sit across different repositories, to create a dataset that represents what I want to train against?

That’s the discipline we typically call data engineering. And it’s becoming a very distinct skill and a very distinct discipline. 
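The select-and-join step Botes describes can be sketched as an inner join between two sources, followed by filtering and projection. The tables (`tickets`, `customers`), their fields and the 20-character minimum are hypothetical. In a lakehouse this would normally be a SQL join rather than Python loops.

```python
def build_training_set(tickets, customers, min_text_len=20):
    """Join support tickets (one repository) with customer records (another)
    on customer_id, drop rows that cannot be joined or have too little text,
    and keep only the fields the model will train on."""
    by_id = {c["customer_id"]: c for c in customers}
    dataset = []
    for t in tickets:
        cust = by_id.get(t["customer_id"])
        if cust is None or len(t["text"]) < min_text_len:
            continue
        dataset.append({"text": t["text"],
                        "segment": cust["segment"],
                        "label": t["resolved"]})
    return dataset


# Hypothetical records from two separate systems
tickets = [
    {"customer_id": "c1", "text": "Invoice total does not match the order", "resolved": True},
    {"customer_id": "c2", "text": "Too short", "resolved": False},
    {"customer_id": "c9", "text": "Cannot log in after password reset today", "resolved": False},
]
customers = [
    {"customer_id": "c1", "segment": "enterprise"},
    {"customer_id": "c2", "segment": "smb"},
]
dataset = build_training_set(tickets, customers)
```

Only the first ticket survives. The second is too short and the third has no matching customer record, which is exactly the kind of silent loss worth tracking when judging whether a dataset is complete.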

When it comes to storage, how do customers support data lakehouses with storage? In what forms?

Today, what’s common is that the cloud companies provide the data lakehouses, and for on-prem, there are the system houses.

We work with several of them. We provide complete solutions that include data lakehouse vendors, with whom we partner.

And then, of course, there is the underlying storage that makes it all perform fast and work well. So the key components, I’d say, are the popular data lakehouse databases and the infrastructure beneath them, connected to other storage systems for the training side.

Looking at data engineering, is it really a one-time, one-off challenge, or is it something that’s ongoing as organisations tackle AI? 

Data engineering is kind of hard to disentangle from storage. They’re not exactly the same thing, but they’re closely related. 

Once you start using AI, you want to record all new data. You want to transform it and make it part of your AI system, whether you’re using it with RAG [retrieval augmented generation] or fine-tuning or, if you are advanced, building your own model.

You’re constantly going to increase it and make it better. As your data improves, as your insights change, your data has to change with it. Thus, your model has to evolve with it.

This becomes a continuous process. 

You have to think about a few things, such as lineage. What’s the history of this data? What originated where? What’s consumed where? You also want to think about what happens when people use your model, or when you use it internally. What’s the question being asked? What comes back with it?

And you want to store and use that for quality assurance, also for further training in the future. This becomes what we call an AI flywheel of data. The data is constantly ingested, consumed, computed, ingested, consumed, computed.

And that circle doesn’t stop. 
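One hypothetical way to close that loop is to route each logged question and answer according to some quality signal. Well-rated exchanges become candidate training examples, and poorly rated ones go to a quality-assurance queue. The sketch assumes a simple user rating (1 for good, 0 for bad, `None` if unrated). That rating scheme and the `prompt`/`completion` output format are illustrative choices, not something Botes specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    question: str
    answer: str
    rating: Optional[int] = None  # hypothetical feedback: 1 = good, 0 = bad, None = unrated


def flywheel_split(interactions):
    """Route logged interactions: well-rated ones become candidate training
    examples, badly rated ones go to a QA review queue, unrated ones wait."""
    training, review, pending = [], [], []
    for it in interactions:
        if it.rating == 1:
            training.append({"prompt": it.question, "completion": it.answer})
        elif it.rating == 0:
            review.append(it)
        else:
            pending.append(it)
    return training, review, pending


# Hypothetical interaction log
logs = [
    Interaction("q1", "a1", rating=1),
    Interaction("q2", "a2", rating=0),
    Interaction("q3", "a3"),
]
training, review, pending = flywheel_split(logs)
```

Each pass adds to the training pool and surfaces failures for review, so the next model version is built on more, and better-understood, data.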

Is there anything else you think customers ought to be looking at? 

You should also think about what this data really is and what it represents. If the data represents something you observe or something you do, and there are gaps in it, the AI will fill in those gaps. When it fills them in wrongly, we call it hallucination.

The trick is to know your data well enough that you know where there are gaps. And if you have gaps, can you find ways to fill out those gaps? When you get to that level of sophistication, you’re starting to have a really impressive system to use. 
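For a single numeric variable, one basic way to locate gaps is to split the expected range into equal-width bins and flag any bin with too few observations. The sketch below is a hypothetical illustration of that idea. The readings, the 0 to 100 range and the bin count are made up.

```python
def find_range_gaps(values, lo, hi, num_bins, min_count=1):
    """Split [lo, hi) into equal-width bins and return the (start, end) of
    every bin holding fewer than min_count observations."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against float rounding pushing v past the last bin
            idx = min(int((v - lo) // width), num_bins - 1)
            counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width)
            for i, c in enumerate(counts) if c < min_count]


# Hypothetical sensor readings expected to span 0 to 100
readings = [1, 5, 12, 18, 25, 33, 38, 71, 77, 95]
gaps = find_range_gaps(readings, lo=0, hi=100, num_bins=10)
```

Here the empty bins from 40 to 70 and from 80 to 90 mark ranges where a model would have to extrapolate. Those are the places to collect more data, or at least to treat the model’s answers with caution.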

Even if you start with the very basics of using a cloud service, start by recording what you send and what you get back, because that forms the basis for your data management discipline. In between data engineering and storage sits this discipline called data management.

This is the organisation of data, which you want to start as early as you can. Because by the time you get ready to do something beyond just using the service, you now have the first body of data for your data engineers and for your storage.

That’s a tremendous insight that I wish everyone would consider doing really quickly. 
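Botes’s advice to record what you send to a cloud service and what you get back can be as simple as wrapping each call. In the sketch below, `call_model` is a hypothetical stand-in for whatever function calls your provider’s API. The JSON Lines log format and the field names are illustrative assumptions.

```python
import json
import time
import uuid
from typing import Callable


def logged_call(call_model: Callable[[str], str], prompt: str,
                log_path: str) -> str:
    """Call a model service and append the request/response pair to a JSON
    Lines file, so every exchange becomes part of your own dataset."""
    started = time.time()
    response = call_model(prompt)
    entry = {
        "id": str(uuid.uuid4()),
        "timestamp": started,
        "latency_s": time.time() - started,
        "prompt": prompt,
        "response": response,
    }
    # Append mode: one self-contained JSON object per line
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return response
```

A call such as `logged_call(my_client, "Summarise this ticket", "interactions.jsonl")` returns the model’s answer as usual while leaving a record behind. Because each line is an independent JSON object, the file is easy to ingest into a lakehouse later.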
