They say you can’t teach an old dog new tricks. Spot, Boston Dynamics’ dog-inspired quadruped robot, is proving otherwise by learning a new one: how to speak. What’s more, it’s doing so with the help of ChatGPT and other AI language models.
See what makes Spot tick
Already a smart dog bot in its own right, Spot, which the Waltham, Massachusetts-based robotics company introduced in 2016, can run, jump, climb stairs, and even dance.
With the addition of perception sensors, including stereo and depth cameras on both its gripper arm and the front of its body, the pooch can also navigate varied terrain, dodge obstacles, patrol areas, and right itself after a fall. Compared to Tesla’s Optimus humanoid robot, Spot is less unnerving, comes across more as a loyal companion, and is well suited to tasks that demand a great deal of mobility and adaptability.
Now, the team at Boston Dynamics has gone a step further in developing Spot’s communication capabilities by using AI in the form of large language models (LLMs), including ChatGPT and a number of open-source models.
How Boston Dynamics used LLMs to enhance Spot’s speaking skills
At their core, LLMs are AI language models designed to understand and generate human-like text. They are trained to do so by processing vast amounts of text already written by humans. Think of how you use ChatGPT: you type a statement, question, or similar prompt into its input field, and the chatbot analyzes it and quickly formulates a mostly coherent, informative response.
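To make the idea concrete, here is a minimal sketch of that prompt-and-response exchange using the OpenAI Python client; the model name and prompt are illustrative assumptions, not anything Boston Dynamics has published.

```python
# Minimal prompt-and-response exchange with a chat LLM via the OpenAI
# Python client. The model name and question are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model would work here
    messages=[{"role": "user", "content": "In one sentence, what does Boston Dynamics build?"}],
)
print(response.choices[0].message.content)  # the model's generated reply
```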
To get Spot communicating out loud, Boston Dynamics integrated ChatGPT’s API, along with other LLMs, into the robotic pooch’s system. The team then fed English-language documentation to the model, granted it access to the Spot SDK, fitted the system with text-to-speech capabilities, and provided a map of the facility annotated with scripted descriptions of each room.
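Wiring that together for a tour guide might look roughly like the sketch below, which builds a system prompt from scripted room descriptions and speaks the model’s reply with the pyttsx3 text-to-speech library. The room names, persona, model, and library choices are assumptions for illustration, not Boston Dynamics’ actual code.

```python
# Hedged sketch of a tour-guide prompt plus text-to-speech output.
# Room descriptions, persona, model name, and the use of pyttsx3 are
# illustrative assumptions, not Boston Dynamics' implementation.
from openai import OpenAI
import pyttsx3

client = OpenAI()      # reads OPENAI_API_KEY from the environment
tts = pyttsx3.init()   # local text-to-speech engine

# Scripted descriptions of each room, as the article describes.
SITE_MAP = {
    "lobby": "Front entrance where visitors check in.",
    "robot lab": "Workshop where Spot units are assembled and tested.",
}

SYSTEM_PROMPT = (
    "You are Spot, a quadruped robot giving a guided tour of this facility. "
    "Stay in character as a polite British butler. Known rooms: "
    + "; ".join(f"{name}: {desc}" for name, desc in SITE_MAP.items())
)

def answer_and_speak(question: str) -> str:
    """Ask the LLM for a tour-guide reply and speak it aloud."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    reply = response.choices[0].message.content
    tts.say(reply)
    tts.runAndWait()
    return reply

if __name__ == "__main__":
    print(answer_and_speak("How do you feel about your job as a tour guide?"))
```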
On the vision side, Spot’s cameras were configured to run through BLIP-2, a visual question answering model that generates text describing the images the dog “sees” before the result is vocalized. The robot was also equipped with a microphone so it could hear prompts from one or more people.
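For a sense of what that step looks like in practice, here is a minimal sketch of visual question answering with BLIP-2 via the Hugging Face transformers library; the checkpoint, image file, and question are assumptions, and the real system presumably streams frames straight from Spot’s cameras rather than reading a file.

```python
# Minimal visual question answering sketch with BLIP-2 via Hugging Face
# transformers. The checkpoint, image path, and question are illustrative
# assumptions; a real deployment would feed camera frames in directly.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("camera_frame.jpg")  # a frame the robot "sees" (assumed file)
question = "Question: what machine is in front of the camera? Answer:"

inputs = processor(images=image, text=question, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(answer)  # text that can then be handed to the LLM or spoken aloud
```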
For a human-like touch, the team updated Spot to detect humans who speak directly to it and turn towards them, opening and closing its gripper like a mouth as it talks. They also accessorized the gripper with googly eyes and other costume elements.
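A hypothetical sketch of that “talking mouth” behavior might look like the following; the SpotClient stub below stands in for whatever control wrapper the team uses and is not the real Spot SDK API.

```python
# Hypothetical sketch of the "talking mouth" behavior: face the person who
# spoke, then flap the gripper while the reply plays. SpotClient is a
# stand-in stub, not the real Spot SDK API.
import math
import time

class SpotClient:
    """Stand-in for a robot control wrapper (assumed, not the real SDK)."""
    def turn_in_place(self, heading_rad: float) -> None:
        print(f"turning to heading {heading_rad:.2f} rad")
    def set_gripper_fraction(self, fraction: float) -> None:
        pass  # 0.0 = closed "mouth", 1.0 = fully open

def speak_with_mouth(spot: SpotClient, reply: str, speaker_bearing_rad: float) -> None:
    spot.turn_in_place(speaker_bearing_rad)   # face the detected speaker
    duration = max(1.0, len(reply) / 15.0)    # rough estimate of speech length
    start = time.time()
    while time.time() - start < duration:     # flap the gripper at ~3 Hz
        t = time.time() - start
        spot.set_gripper_fraction(0.5 + 0.5 * math.sin(2 * math.pi * 3 * t))
        time.sleep(0.05)
    spot.set_gripper_fraction(0.0)            # close the "mouth" when done

speak_with_mouth(SpotClient(), "Right this way to the robot lab.", math.pi / 2)
```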
Spot’s tour guide debut: How did it do?
Recently, Boston Dynamics conducted a demonstration of Spot’s new communication abilities. Footage from the demo shows the four-legged bot giving team members a guided tour of their facility, all while showcasing uniquely programmed personalities ranging from a British butler to a high-energy teenager.
In one instance, when asked how it felt about its “job” as a tour guide, Spot stopped, turned towards the speaker, and replied that its employment brought “great satisfaction.” In another, it composed well-formed haikus on the spot.
Of course, the LLM is far from perfect. During the tour, Spot erroneously claimed that Stretch, Boston Dynamics’ logistics robot, was designed for yoga. It also failed to recognize the name of Marc Raibert, the company’s chairperson, and went off script by directing the asker to the facility’s help desk.
The use of ChatGPT shows Spot’s potential for richer human-robot interaction, even if there are still limits to how AI processes and interprets information. Still, it marks a significant step toward making robots more functional, relatable, and interactive in human environments.