Over two years have passed since OpenAI released ChatGPT, opening a technological ‘arms race’ among creators of generative Artificial Intelligence (AI) solutions. Built largely on advances in foundation machine learning (ML) models that can perform a variety of creative tasks, from semantic understanding and content generation to producing high-quality images, these technologies have propelled their providers into the ranks of the most valuable private companies.
Despite commercial success and wide adoption, the question of how best to train these models, with all the legal, ethical, and technical issues surrounding data, remains an elephant in the room. Some AI developers painfully tiptoe around data privacy and ownership challenges, while others (especially the big, powerful firms) simply ignore these issues, prioritizing “innovation.”
Recently, many AI experts have started to talk about federated learning, edge AI, and local AI as feasible ways to address sensitive-data issues. However, these approaches carry their own risks, tied to the very problems they are meant to solve: technological complexity, data quality, privacy, and security.
Leaving the data where it belongs
Federated learning is a distributed (decentralized) ML technique that enables training models by moving the training process to where the data is, instead of collecting and moving the data to the place of training (the central server). The drill is straightforward: a developer initializes the model parameters on the central server and sends them to connected client nodes (other servers, edge devices, smart consumer devices, etc.), where the global model is trained using local data.
After training is completed, client nodes send the updated parameters back to the central server, where they are combined, typically by averaging, in a process called “aggregation”. Any traces of identifiable personal information should, in principle, be lost along the way. Communication between nodes is protected with encryption, which adds another layer of security.
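To make the mechanics concrete, here is a minimal sketch of one federated averaging (FedAvg) round in Python. The gradient step is a toy stand-in for real local training on a neural network, and names like `local_update` and `fed_avg` are illustrative, not taken from any particular framework:

```python
import numpy as np

def local_update(global_params, local_data, lr=0.1, epochs=5):
    """Illustrative client step: toy gradient descent pulling the
    parameters toward the local data mean. Real clients would run
    SGD on a model using their private dataset."""
    params = global_params.copy()
    for _ in range(epochs):
        params -= lr * (params - local_data.mean(axis=0))  # toy 'gradient'
    return params, len(local_data)

def fed_avg(updates):
    """Server-side aggregation: average the returned parameters,
    weighted by how many samples each client trained on."""
    total = sum(n for _, n in updates)
    return sum(params * (n / total) for params, n in updates)

# One round: the server broadcasts global parameters, each client trains
# on data that never leaves it, and the server aggregates the results.
rng = np.random.default_rng(0)
global_params = np.zeros(4)
client_data = [rng.normal(loc=i, size=(50, 4)) for i in range(3)]

updates = [local_update(global_params, data) for data in client_data]
global_params = fed_avg(updates)
```

Note that only parameters cross the network; the weighting by sample count is the standard FedAvg choice, so clients with more data pull the global model harder.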
Sensitive data (be it private data or commercial secrets) is protected by various regulations across countries, sometimes making it impossible to move that data from one place to another if the company wants to train its ML models in a centralized, traditional way. As such, federated ML promises to solve the most pressing data issues — the difficulties in using private data securely and moving any sensitive information from one legal regime to another.
Another benefit comes from cost savings. In traditional centralized ML training, data volume and the resulting storage costs can be prohibitive for smaller AI developers; just think of the mass of data collected by edge devices like cameras and sensors, most of which wouldn’t even be useful. Federated learning thus brings certain development costs down and allows the use of more diverse data, which can translate into better model accuracy.
The use cases for federated learning are easy to imagine. Decentralized ML can help train an AI system used for medical diagnosis, which needs to combine sensitive healthcare records from different institutions or countries. Or, it can benefit an international bank training a fraud detection model on data aggregated by its branch offices. However, the most immediate and beneficial effect of using federated learning might lie in empowering further advancements in the field of local (on-device) AI.
Conceptual mumbo-jumbo
Federated learning, local AI, and edge AI are related concepts with nuanced differences, so to start, it is worth clarifying which means what. Federated learning is a decentralized approach to ML that doesn’t involve sharing raw data: the model is trained collaboratively on distributed datasets, and the training nodes may be different devices or different servers.
Edge AI runs directly on edge servers or devices, such as IoT hardware, industrial machines, and autonomous vehicles (AVs), without requiring a connection to a larger group of cloud servers, thus shrinking direct computational costs. The essence of edge AI is making real-time decisions inferred on the device itself, without collaborative learning or data sharing. In some cases, the model may be pre-trained using federated learning techniques, but that is not required either. Transport, logistics, defense (think autonomous drones), and maintenance are the primary industries benefiting from edge AI applications.
Finally, local (on-device) AI is a middle ground between the two. Local AI covers any system that doesn’t depend on external servers; it may run on an edge device, a personal computer, or a private server. It doesn’t necessarily make real-time decisions, and it can work both offline and online. The use cases are diverse, ranging from the Apple Neural Engine, built into Apple’s advanced chips to perform tasks like Face ID, image enhancement, and Siri suggestions, to Meta’s LLaMA architecture, which has versions optimized for running on local computers.
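As a rough illustration of the on-device idea, here is a hedged sketch using the open-source llama-cpp-python bindings to run a quantized LLaMA-family model locally. The model file path is a placeholder for a model already downloaded to the machine:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path to a quantized .gguf model stored on this machine.
# Inference runs entirely locally; nothing is sent to an external server.
llm = Llama(model_path="./llama-model.gguf", n_ctx=2048)

out = llm("Explain on-device AI in one sentence:", max_tokens=64)
print(out["choices"][0]["text"])
```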
Running algorithms directly on devices helps maintain real-time inference without lag and preserves data privacy. Moreover, local AI can be developed using federated learning techniques, and the combination of the two might bring immense benefits, both in terms of efficiency and data privacy.
Bringing AI to our devices: the pros
Both local and edge AI are primarily products of ubiquitous computing, a paradigm in which everyday objects perform computationally advanced tasks thanks to high-quality sensors and powerful microprocessors. The growth of on-device CPU/GPU capability opened up the possibility of running AI algorithms locally. Leading tech companies quickly grasped that this is the next pot of gold, an opportunity to bring AI closer to everyday life and commercialize AI products more easily. Moreover, it became clear that these small devices hold an enormous amount of valuable data.
However, if trained or fine-tuned only on local data, on-device AI would eventually hit a ceiling, for a few reasons. First, local models remain less powerful due to hardware limits, which often affects accuracy and usability. Second, since the data local AI uses is never shared, it is very limited in scope, constraining what the model can learn and how well it generalizes. These major shortcomings can be alleviated by combining the benefits of local AI with federated learning approaches.
Imagine mobile apps designed to help people alleviate anxiety, make personal investment decisions, get health advice, or simply learn to play chess. Any of these functions could be made way more advanced and useful by constantly combining data from thousands or even millions of other local devices. The raw data itself wouldn’t be shared, but the AI model could get timely updates based on the experience of multiple users.
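A hedged sketch of that update loop, with illustrative names: each device fine-tunes the shared model on its private data and returns only a parameter delta, which the server averages into the global model that everyone then receives:

```python
import numpy as np

def device_delta(global_params, private_data, lr=0.05, steps=5):
    """Fine-tune the shared model on private, on-device data and
    return only the parameter delta; the raw data never leaves."""
    params = global_params.copy()
    for _ in range(steps):
        params -= lr * (params - private_data.mean(axis=0))  # toy local step
    return params - global_params

rng = np.random.default_rng(1)
global_params = np.zeros(3)
# Thousands of users' devices, each holding its own private data.
devices = [rng.normal(loc=0.5, size=(20, 3)) for _ in range(1000)]

deltas = [device_delta(global_params, d) for d in devices]
global_params += np.mean(deltas, axis=0)  # shared model improves for all users
```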
To sum up, training and fine-tuning AI models locally, while using federated learning to combine insights from different sources (other users’ devices or data collected from the web) without compromising their sensitive nature or moving them from one legal regime to another, could be a major leap past the data issues that haunt AI developers. However, these approaches bring their own data safety and security concerns that must be kept in mind.
Bringing AI to our devices: the cons
The leakage of model parameters, combined with attacks on client nodes by malicious actors (sometimes using malicious nodes to corrupt the entire model), is probably the most serious risk federated learning entails, which is why it still requires robust encryption. There are also highly technical, still unsolved questions about the best aggregation (“averaging”) techniques that would efficiently conceal all sensitive information coming from local devices without compromising model accuracy.
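One widely studied mitigation is secure aggregation, in which clients add pairwise random masks that cancel out when the server sums the updates, so no individual update is ever visible in the clear. The toy sketch below shows only the cancellation arithmetic; real protocols derive the masks via cryptographic key agreement and must also handle client dropouts:

```python
import numpy as np

# Each pair of clients (i, j) shares a random mask: client i adds it,
# client j subtracts it. Individually masked updates look like noise
# to the server, but the masks cancel exactly in the sum.
rng = np.random.default_rng(42)
n_clients, dim = 4, 5
updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Pairwise masks (real protocols derive these cryptographically).
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked(i):
    m = updates[i].copy()
    for (a, b), mask in masks.items():
        if a == i:
            m += mask   # lower-indexed client adds the shared mask
        elif b == i:
            m -= mask   # higher-indexed client subtracts it
    return m

server_sum = sum(masked(i) for i in range(n_clients))
assert np.allclose(server_sum, sum(updates))  # masks cancel in aggregate
```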
Furthermore, while it reduces computational costs, federated learning has a substantial node coordination and communication problem. Varying data distributions and quality across nodes can affect the global model’s performance and reliability, raising the complicated question of how to filter “good” updates and mitigate the influence of poor inputs. This is especially important for local AI, where decision-making often happens in real time.
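One common family of answers is robust aggregation. The sketch below uses a coordinate-wise trimmed mean, which drops the most extreme client values in each dimension before averaging; it is an illustration of the idea, not a production defense:

```python
import numpy as np

def trimmed_mean(updates, trim=1):
    """Drop the `trim` largest and smallest values per coordinate,
    then average what remains, blunting noisy or malicious updates."""
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[trim:len(updates) - trim].mean(axis=0)

rng = np.random.default_rng(7)
honest = [rng.normal(loc=1.0, scale=0.1, size=3) for _ in range(8)]
poisoned = [np.full(3, 100.0)]  # one malicious node pushing huge values

print(np.mean(honest + poisoned, axis=0))       # plain mean: badly skewed
print(trimmed_mean(honest + poisoned, trim=1))  # trimmed mean: stays near 1.0
```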
Last but not least, federated learning techniques do not mitigate the risk of malicious outputs or data bias. In fact, many separate learning processes across different nodes can exacerbate them, since those localized processes are virtually a black box. As of today, there is no single viable solution to this issue, and the question of how to distribute responsibility along the AI value chain remains a legal grey zone.
Despite the considerable benefits, employing federated learning for AI development in general, and for local AI in particular, poses several risks and challenges, ranging from model complexity and data heterogeneity to the same old data privacy problems, which can arise either accidentally (through leaked model parameters) or through deliberate attacks on client nodes that poison the global model.
However, these challenges should by no means stop the AI industry from using federated learning techniques. As of today, there is no better way to comply with data regulations, except for one — not using sensitive data at all. This is utopian (or even dystopian) since it means that vital areas of AI development, such as healthcare and financial services, will have no way forward for years to come.
Currently, many providers try to circumvent the problem by using synthetic data. However, where critical aspects — health and security — are at stake, data quality must be the number one priority when developing artificial brains.