Robots are already demonstrating impressive capabilities in advertising videos and at trade fairs, from skilled household helpers to industrial robots with a delicate touch. But there is still a long way to go before machines can truly understand their environment and move safely in real, everyday situations. New software and infrastructure approaches are meant to pave that way: they pool large amounts of decentrally collected data to train so-called vision-language-action models (VLAs).
We spoke with robotics researcher Wolfram Burgard of the University of Technology Nuremberg about what today's systems can already do, how VLA models work, why mechanical gripping in particular remains an enormous challenge, and why international collaboration is crucial for progress in robotics.
c’t: Professor Burgard, vision-language-action models are meant to make robots intelligent. How do they compare to the language models behind ChatGPT?
That was an excerpt from our heise Plus article “Robotics researcher Burgard: Why humanoid robots are still in their early stages”. With a heise Plus subscription, you can read the entire article.
