To teach a robot to navigate a house, you need to give it either a lot of real time in many real houses or a lot of virtual time in many virtual houses. The latter is definitely the better option, and Facebook and Matterport are working together to bring thousands of virtual, interactive digital twins of real spaces to researchers and their voracious young AIs.
On Facebook's side, the big advance comes in two parts: the new Habitat 2.0 training environment and the dataset the team created for it. You may remember Habitat from a few years back; in pursuit of so-called "embodied AI", i.e. AI models that interact with the real world, Facebook put together a series of more-or-less photorealistic virtual environments for them to navigate.
Many robots and AIs have learned things like movement and object recognition in idealized, unrealistic spaces that resemble games more than reality. A real living room is a very different thing from a reconstructed one. By learning to move around in something that looks like reality, an AI's knowledge transfers more readily to real-world applications like home robotics.
But ultimately those environments were only surface-deep, with minimal interaction and no real physical simulation: if a robot bumped into a table, the table wouldn't tip over and spill objects everywhere. The robot could go into the kitchen, but it couldn't open the refrigerator or pull anything out of the sink. Habitat 2.0 and the new ReplicaCAD dataset change that, with increased interactivity and 3D objects that are physically modeled rather than merely rendered as 3D surfaces.
Simulated robots in these new apartment-scale environments can roll around as before, but when they reach an object they can actually do something with it. For example, if a robot's job is to pick up a fork from the dining room table and put it in the sink, a few years ago the picking up and putting down would simply have been assumed to succeed, because it couldn't really be simulated effectively. In the new Habitat system, the fork is physically simulated, as are the table it rests on, the sink it goes into, and so on. That makes the simulation more computationally intensive, but also a lot more useful.
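To make the difference concrete, here is a minimal sketch of what turning that physics on looks like in the open-source habitat-sim Python API; the scene path is a placeholder and exact field names can vary between releases, so treat it as illustrative rather than the actual Habitat 2.0 training setup.

```python
# Minimal sketch: load an interactive scene in habitat-sim with rigid-body
# physics enabled. The scene path below is a placeholder, not a real file.
import habitat_sim

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "path/to/replica_cad/apt_0.scene_instance.json"  # placeholder
sim_cfg.enable_physics = True  # the fork, table and sink become dynamic bodies

agent_cfg = habitat_sim.agent.AgentConfiguration()
sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))

# Step the physics at 60 Hz: objects the robot disturbs now fall, slide and
# settle instead of being treated as static scenery.
for _ in range(60):
    sim.step_physics(1.0 / 60.0)

sim.close()
```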
Habitat is nowhere near the first to reach this stage, but the whole field is moving at a breakneck pace, and every new system leapfrogs the others in some respect, exposing the next major bottleneck or opportunity. In this case, Habitat 2.0's closest competition is probably AI2's ManipulaTHOR, which combines room-scale environments with the simulation of physical objects.
Where Habitat pulls ahead is speed: according to the paper describing it, the simulator runs roughly 50-100 times faster, which means a robot can get far more practice per second of computation. (The comparisons aren't exact, and the systems differ in other ways.)
The dataset used for this is called ReplicaCAD, and it essentially consists of the original room-level scans recreated with custom 3D models. This is a painstaking manual process, Facebook acknowledged, and the company is looking at ways to scale it, but it yields a very useful end product.
The originally scanned space (above) and the ReplicaCAD 3D replica (below).
More detail and more types of physical simulation are on the roadmap: basic objects, movement, and robot presence are supported, but accuracy had to give way to speed at this stage.
Matterport, for its part, is also taking some big steps in partnership with Facebook. After a huge expansion of its platform in recent years, the company has assembled an enormous collection of 3D-scanned buildings. Although it has already worked with researchers, the company decided it was time to make a bigger portion of its holdings available to the community.
"We have Matterported every type of physical structure in existence, or close to it. Houses, skyscrapers, hospitals, offices, cruise ships, jets, Taco Bells, McDonald's … and all the information contained in a digital twin is very important for research," CEO RJ Pittman told me. "We were confident this would touch everything from computer vision to robotics to identifying household objects. Facebook didn't need any persuading … for Habitat and embodied AI it's right down the middle of the fairway."
To that end, it created HM3D, a dataset of a thousand meticulously captured 3D interiors, ranging from the home scans that real estate browsers may recognize to businesses and public spaces. It is the largest collection of its kind to be made publicly available.
Photo credits: Matterport
The environments, scanned and interpreted by AI trained to produce precise digital twins, are dimensionally accurate enough that exact figures can be calculated for, say, the window area or the total cabinet volume of a space. It's a helpful, realistic playground for AI models, and although the resulting dataset isn't interactive (yet), it reflects the real world in all its variety very well. (It's distinct from Facebook's interactive dataset, but could form the basis for an extension of it.)
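Those dimensional figures fall straight out of the mesh geometry. As a rough illustration (a generic technique, not Matterport's actual pipeline), the enclosed volume of any watertight portion of a scan, such as a cabinet interior, can be computed from its triangle mesh with the divergence theorem:

```python
# Generic illustration: volume of a watertight triangle mesh via the
# divergence theorem (sum of signed tetrahedra against the origin).
import numpy as np

def mesh_volume(vertices: np.ndarray, faces: np.ndarray) -> float:
    """vertices: (V, 3) float array; faces: (F, 3) int array, consistently oriented."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    signed = np.einsum("ij,ij->i", v0, np.cross(v1, v2)) / 6.0
    return float(abs(signed.sum()))

# Toy check: a unit right tetrahedron encloses a volume of 1/6.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
print(mesh_volume(verts, faces))  # ~0.1667
```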
"Notably, it's a diverse dataset," said Pittman. "We wanted to make sure we had a rich grouping of different real-world environments; you need that variety of data if you want to get the maximum benefit from training an AI or a robot."
All of the data was provided voluntarily by the owners of the spaces, so there's no need to worry that it was unethically hoovered up under some fine print. Ultimately, Pittman explained, the company wants to create a larger, more parameterized dataset that can be accessed through an API; essentially, realistic virtual spaces as a service.
"Maybe you're building a hospitality robot for bed-and-breakfasts of a certain style in the U.S.; wouldn't it be great to get a thousand of those?" he mused. "We want to see how far we can push this first set of data, get those learnings, then continue working with the research community and our own developers, and go from there. It's an important starting point for us."
Both datasets will be open and available to researchers everywhere.