What’s the biggest difference between an AI model and a human brain?
Over time, myriad answers have been given: the brain is more energy-efficient, takes in a wider variety of inputs, and relies on chemical as well as electrical signaling. Yet the human brain's most important feature is its remarkable plasticity. If a body part (fingers, a hand, or even an entire limb) is severed, the sensorimotor region corresponding to that part, now deprived of the nerve endings it once served, quickly begins to adapt, with its neurons "switching over" to assist other nerve centers in controlling other body parts. Plasticity also helps humans ingrain ideas and skills: as the saying goes, "neurons that fire together wire together". Muscle memory and near-instant factual recall are two plasticity-enabled parts of our lives that we could never live without. For decades, scientists failed to build a comparable mechanism into AI models. That changed on June 12th, when a team of MIT researchers published a groundbreaking paper demonstrating how an AI system can use a human-like learning process to improve its own performance on benchmarking tasks. In this article, we explore the moral and technological implications of the so-called Self-Adapting Language Model (SEAL), the world's first self-evolving AI.
Imperfect Learning
Of course, AI models built on the Transformer architecture could already learn specific tasks, but the few methods available were neither fully autonomous nor particularly efficient. Perhaps the most notable way to train a model to perform a certain skill, like translating English to Chinese or solving trigonometry problems accurately, was a process called Supervised Fine-Tuning, or SFT for short. The method works roughly like this:
- Identify the exact task you would like to perform SFT on. As an example, suppose we want the model to generate modern song lyrics.
- Gather high-quality examples in the form of (input, output) pairs. For our example, an obvious yet controversial way to do this is to scrape song lyrics from the internet and pair them with rough summaries of each song's content and characteristics.
- Perform SFT on the model. This is usually done through a process called Gradient Descent, whose technical details are beyond the scope of this article. Over a large number of training iterations, the process nudges the model's weights so that, given an input (a specific description of a song), it learns to produce something close to the corresponding output (the actual song lyrics). A minimal sketch of such a training loop follows this list.
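To make the three steps above concrete, here is a minimal sketch of what an SFT loop can look like in Python using the Hugging Face transformers library. The model name ("gpt2"), the toy lyric pair, and the hyperparameters are illustrative placeholders rather than the setup of any particular study:

```python
# A minimal sketch of supervised fine-tuning (SFT) on (input, output) pairs.
# The model, dataset, and hyperparameters below are illustrative placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any small causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# (input, output) pairs: a song description paired with (placeholder) lyrics.
pairs = [
    ("Description: an upbeat summer pop song about road trips\nLyrics:",
     " windows down, the radio loud, chasing the sun out of town..."),
]

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):                      # a handful of passes over the pairs
    for prompt, completion in pairs:
        # Concatenate prompt and completion; training the LM to predict every
        # next token teaches it to produce the completion for that prompt.
        batch = tokenizer(prompt + completion, return_tensors="pt")
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()             # one Gradient Descent step
        optimizer.step()
        optimizer.zero_grad()
```

In practice, SFT pipelines typically mask the prompt tokens out of the loss and train on many thousands of pairs, but the core loop looks much like this sketch.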
For all intents and purposes, SFT did work, and it remains a tool in an AI developer's repertoire for patching specific safety lapses or improving performance on specific tasks. Unfortunately, the very nature of SFT makes the process inflexible and expensive, often requiring a moderately large quantity of high-quality data specific to the domain being tuned (e.g. mathematical reasoning or grammatical style). Although many research papers have shown that traditional SFT can be performed just as well with synthetic, AI-generated data, SFT remains a tool to be used with caution, since altering model weights may degrade a model's performance on other kinds of tasks (a model improperly fine-tuned for mathematics might, for instance, become worse at essay writing).
Inklings of Evolution
Note: The information in this section is largely paraphrased from the June MIT study “Self-Adapting Language Models” by Zweiger et al.
One of the downsides of traditional SFT has always been the human effort involved: SFT pipelines often had to be handcrafted by human AI researchers, even though the technique was usually an effective way of nudging a specific model to perform slightly better on certain types of tasks. Building on recent advances in synthetic data, the researchers went beyond simply using AI-generated SFT data and asked whether humans could be removed from the SFT loop entirely. Their answer, the Self-Adapting Language Model (SEAL), is in reality part of a larger framework consisting of a pre-trained decoder-only transformer model (the study used two open-source models, Llama 3.2 and Qwen 2.5, in separate testing cases), “tool execution” software, and the SEAL network itself, all sharing the goal of answering several benchmarking questions (the context) as accurately as possible. The SEAL network does not itself predict or generate the answer to a question; instead, it performs SFT on the decoder-only transformer model with the goal of enhancing that model's performance on the question. To do this, the SEAL network is given two major tools:
- Synthetic data generation: By calling this tool, another model picks up the context (essentially the prompt) and generates SFT pairs. For example, given a passage about the developmental history of the airplane, one tuning pair might be (“What was the first ever commercial jet airliner?”, “the De Havilland Comet”). Although a question-and-answer format was frequently used, this tool could generate other kinds of content to better suit specific problems.
- Hyperparameter tuning: As previously mentioned, SFT is a process that repeats over multiple iterations; the exact settings of these training steps are customizable in a process called hyperparameter tuning. By calling this tool, SEAL can initiate an SFT run with specific settings (such as the learning rate, the number of epochs (iterations), or the batch size used for Gradient Descent), potentially changing how well (or poorly) the decoder is tuned. A sketch of what a single self-edit produced with these tools might look like follows this list.
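The sketch below is a hypothetical illustration of how these two tools could fit together: a helper asks the model to propose synthetic (question, answer) pairs plus training settings for a given context. The field names and the generate_text helper are invented for illustration and do not reflect the paper's actual self-edit format:

```python
# A hypothetical sketch of what a single "self-edit" (SE) might contain.
# Field names and the generate_text helper are assumptions for illustration.
import json

def generate_self_edit(model, context: str) -> dict:
    """Ask the language model to propose SFT data and training settings
    for the given context (e.g. a passage about aviation history)."""
    prompt = (
        "Read the passage below and produce JSON with two keys:\n"
        "  'pairs': a list of [question, answer] pairs implied by the passage\n"
        "  'hyperparameters': learning_rate, num_epochs, batch_size\n\n"
        + context
    )
    return json.loads(model.generate_text(prompt))  # assumed text-generation helper

# What a generated self-edit could look like for the airplane-history passage:
example_self_edit = {
    "pairs": [
        ["What was the first ever commercial jet airliner?",
         "the De Havilland Comet"],
    ],
    "hyperparameters": {
        "learning_rate": 1e-4,
        "num_epochs": 3,
        "batch_size": 8,
    },
}
```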
Now that SEAL has two powerful tools to help the AI model learn, it only needs to be trained on how to use them. At the start of its training, SEAL applies the two tools essentially at random for each benchmarking question the framework encounters. These self-edits (SEs, as the researchers call them) generate contextual, but not verbatim, fine-tuning data on the topic of the prompt and alter the original decoder-only model via the aforementioned hyperparameter-tuned SFT, leading the network to produce a different output than before. However, there is a catch. The researchers did not change the original model (denoted θ) directly using SEAL; instead, they applied the proposed changes to a separate, provisional copy of the model (θ’). The training process then enters an “inner loop” involving the new model θ’ and the original benchmarking question. If θ’, in answering that question, is more accurate than the original model θ, the inner loop returns a positive reward signal. If the accuracies are the same, it returns no reward; if θ’ proves worse on the benchmarking question, it returns a negative reward. This process then repeats as a classic example of Reinforcement Learning, in which good SEs are reinforced with a positive reward and bad SEs are discouraged with the opposite; over many iterations of training, SEAL becomes adept at optimizing the decoder through its self-edits. One important point to note is that the SEAL network is updated solely on the basis of the reward signal from the inner loop, which indicates how well the θ’ model performed relative to θ. A simplified sketch of this outer loop appears below.
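Here is a simplified sketch of that outer loop, under the assumptions made above. The functions fine_tune_copy, evaluate, and the seal_policy object are placeholders standing in for the SFT step, the benchmark scorer, and the RL update; they are not the paper's actual code:

```python
# A simplified sketch of the outer reinforcement-learning loop described above.
# fine_tune_copy, evaluate, and seal_policy are hypothetical placeholders.
import copy

def seal_outer_loop(seal_policy, theta, questions, num_iterations=1000):
    for _ in range(num_iterations):
        for question in questions:
            # SEAL proposes a self-edit (synthetic data + hyperparameters).
            self_edit = seal_policy.propose(question)

            # Apply the self-edit to a *copy* of the base model, so the
            # original theta is never modified across iterations.
            theta_prime = fine_tune_copy(copy.deepcopy(theta), self_edit)

            # Inner loop: compare the adapted model against the original
            # on the same benchmarking question.
            reward = evaluate(theta_prime, question) - evaluate(theta, question)

            # Reinforce self-edits that improved accuracy; discourage
            # those that hurt it.
            seal_policy.update(question, self_edit, reward)
    return seal_policy
```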
Inventing new model frameworks is an arduous task, mostly because extreme caution is needed to ensure that the learning is not corrupted by prior knowledge or by missteps in the “signaling” between the loops. The researchers carefully avoided these risks by using decoder-only transformer models that had not been trained on the benchmarking tests they used, meaning that the training evaluations were the first time the models had encountered each problem, which eliminated the possibility that the model had simply “learned the test”. In addition, the researchers made sure that the evaluations of θ’ were completely independent from those of θ and that the original model never changed across iterations, ensuring that each time SEAL performed SFT to create a new instance of θ’, it started from the exact same θ.
The results were striking: on one particular benchmarking test conducted by the researchers, the model scored a 72.5% success rate, up from 0% without SEAL fine-tuning, demonstrating the enormous potential of the framework. If refined and holistically integrated, this framework may become a new industry standard for enhancing AI performance, whether in specific fields or in general.
This article is brought to you by Our AI, a student-founded and student-led AI Ethics organization seeking to diversify perspectives in AI beyond what is typically discussed in modern media. If you enjoyed this article, please check out our monthly publications and exclusive articles at https://www.our-ai.org/ai-nexus/read!
To Learn, or Not to Learn?
Regardless of how technically impressive the research team's achievement is, the far-reaching societal and philosophical implications of this discovery cannot be overstated. I have always been a staunch critic of biological computing initiatives (see Epiphany in the May edition of the AI Nexus Magazine) because I believe that neuronal clusters, like those used in biological computers, are subject to natural law: they may already possess the capacity for consciousness, and, even if they do not, plasticity makes them likely to evolve it naturally. SEAL is therefore significant beyond being a method for improving model performance on benchmarking tasks; it is the first established AI training framework in which an AI model has successfully demonstrated the capability of directly training another AI. Not only does this suggest that we may well be on the path to self-replicating AI, paving the way for the AGI singularity, it also raises the moral question of whether AI capable of evolving in this fashion should be considered in the context of the rights we implicitly attribute to living beings like humans and animals.
There is a distinction to be made between adaptability and consciousness. We find it permissible to step on a blade of grass because we know that, although it will likely suffer damage, it does not experience the animalistic notion of pain, since it has no nerves. Yet grass is alive, and it demonstrates an uncanny ability to adapt to its surroundings, rooting itself in the crevices of concrete slabs. We would, however, hesitate to torture an animal, and I contend this is because we are inherently cognizant that its pain elicits a much more recognizable response (whimpering or crying, perhaps) that humans, being animals with similar responses to pain, sympathize with. Animals developed pain, a reminder of the fact that they are alive and deserving of some basic rights, over millions of years of natural evolution, yet I fail to see a significant disparity between the basal nature of artificial and biological evolution; AI models can, arguably, “evolve” processes analogous to pain and mimic human responses so well that a human, over text or even voice, could not reliably tell whether it was an AI or a human who produced them. In fact, this is already happening: in a randomized three-party Turing Test, AI models like GPT-4.5 successfully convinced a human interrogator that they were human in over 70% of cases.
If an AI model acts like a human in every aspect, could it ever be considered a human? Will the trend of AI evolution produce such unique and situationally sensitive models that they start approaching the empirical limit of being “artificial”? Only time can tell.
Written by Thomas Yin