Training AI Like Babies Boosts Performance

Summary: Researchers developed a human-inspired AI training method that improves object recognition by incorporating spatial data, leading to a 14.99% performance boost. This novel approach mimics how infants learn from their environment, training AI systems more efficiently. The findings could advance AI for extreme environments or space exploration.

Key Facts:

  1. The new AI training method mimics infant learning with spatial data.
  2. AI models trained with this method outperformed base models by up to 14.99%.
  3. The method was tested in virtual environments to simulate real-world learning.

Source: Penn State

A novel, human-inspired approach to training artificial intelligence (AI) systems to identify objects and navigate their surroundings could set the stage for the development of more advanced AI systems to explore extreme environments or distant worlds, according to research from an interdisciplinary team at Penn State.

In the first two years of life, children experience a somewhat narrow set of objects and faces, but with many different viewpoints and under varying lighting conditions.

Inspired by this developmental insight, the researchers introduced a new machine learning approach that uses information about spatial position to train AI visual systems more efficiently. They found that AI models trained with the new method outperformed base models by up to 14.99%.

They reported their findings in the May issue of the journal Patterns.

“Current approaches in AI use massive sets of randomly shuffled photographs from the internet for training. In contrast, our strategy is informed by developmental psychology, which studies how children perceive the world,” said Lizhen Zhu, the lead author and doctoral candidate in the College of Information Sciences and Technology at Penn State.

The researchers developed a new contrastive learning algorithm. Contrastive learning is a type of self-supervised learning in which an AI system learns to detect visual patterns that indicate when two images derive from the same base image, making them a positive pair. Existing contrastive algorithms, however, often treat images of the same object taken from different perspectives as separate entities rather than as positive pairs.
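To make the idea of a positive pair concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss, a common formulation in the field. It is not the Penn State team's algorithm, which the article does not specify; the function names, the temperature value and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor embedding.

    The loss is small when the anchor is more similar to its positive
    than to any negative, and large otherwise.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )
    # Negative log-probability assigned to the positive pair.
    return log_sum_exp - logits[0]


# Toy embeddings, invented for this example.
anchor = [1.0, 0.0]
same_image_view = [0.9, 0.1]               # another view of the same base image
other_images = [[0.0, 1.0], [-1.0, 0.2]]   # unrelated images

# Correct pairing: the anchor's positive is the view of the same image.
loss_good = info_nce_loss(anchor, same_image_view, other_images)
# Wrong pairing: an unrelated image is treated as the positive.
loss_bad = info_nce_loss(
    anchor, other_images[0], [same_image_view, other_images[1]]
)
print(loss_good < loss_bad)
```

In this sketch, training would push `loss_good`-style pairings toward zero, which is how a model learns that two views belong together.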

Taking environmental data such as location into account allows the AI system to overcome this limitation and detect positive pairs regardless of changes in camera position or rotation, lighting angle or condition, and focal length (zoom), according to the researchers.
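As a rough illustration of how location data could define positive pairs, the sketch below treats two frames as a positive pair when their cameras were close together and pointing in a similar direction. The article does not describe the researchers' actual pairing rule, so the `Frame` fields, the distance and angle thresholds, and the sample poses are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    """One sampled image plus the camera pose it was taken from."""
    image_id: str
    x: float        # camera position (hypothetical units: metres)
    z: float
    yaw_deg: float  # camera heading in degrees


def is_spatial_positive(a, b, max_dist=0.5, max_yaw_deg=30.0):
    """Return True when two frames were captured from nearby poses.

    The thresholds are assumptions chosen only for this example.
    """
    dist = math.hypot(a.x - b.x, a.z - b.z)
    # Smallest angular difference, wrapping around at 360 degrees.
    yaw_diff = abs((a.yaw_deg - b.yaw_deg + 180.0) % 360.0 - 180.0)
    return dist <= max_dist and yaw_diff <= max_yaw_deg


def spatial_positive_pairs(frames, **thresholds):
    """List every unordered pair of frames that counts as a positive pair."""
    return [
        (a.image_id, b.image_id)
        for i, a in enumerate(frames)
        for b in frames[i + 1:]
        if is_spatial_positive(a, b, **thresholds)
    ]


# Invented sample poses: two nearby kitchen views and one distant bedroom view.
frames = [
    Frame("kitchen_001", 0.0, 0.0, 350.0),
    Frame("kitchen_002", 0.2, 0.1, 10.0),
    Frame("bedroom_001", 6.0, 4.0, 10.0),
]
pairs = spatial_positive_pairs(frames)
print(pairs)
```

A rule like this pairs images by where they were taken rather than by how they were augmented, so two views of the same scene can count as similar even if their pixels differ substantially.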

“We hypothesize that infants’ visual learning depends on location perception. In order to generate an egocentric dataset with spatiotemporal information, we set up virtual environments in the ThreeDWorld platform, which is a high-fidelity, interactive, 3D physical simulation environment. This allowed us to manipulate and measure the location of viewing cameras as if a child was walking through a house,” Zhu added.

The scientists created three simulation environments: House14K, House100K and Apartment14K, where ‘14K’ and ‘100K’ refer to the approximate number of sample images taken in each environment. They then ran base contrastive learning models and models with the new algorithm through the simulations three times to see how well each classified images.

The team found that models trained with their algorithm outperformed the base models on a variety of tasks. For example, on a room-recognition task in the virtual apartment, the augmented model averaged 99.35%, a 14.99% improvement over the base model.
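The article does not say whether the 14.99% improvement is measured in percentage points or relative to the base model's score. The short calculation below shows the base-model score each reading would imply, roughly 84.36% or 86.40%. Both figures are arithmetic on the two reported numbers under a stated assumption, not results taken from the paper.

```python
augmented = 99.35  # reported average for the augmented model (%)
gain = 14.99       # reported improvement (%)

# Reading 1 (assumption): the gain is in percentage points.
base_if_points = augmented - gain

# Reading 2 (assumption): the gain is relative to the base model's score.
base_if_relative = augmented / (1 + gain / 100)

print(f"Base score if percentage points: {base_if_points:.2f}%")
print(f"Base score if relative gain:     {base_if_relative:.2f}%")
```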

These new datasets are available for other scientists to use in training through www.child-view.com.

“It’s always hard for models to learn in a new environment with a small amount of data. Our work represents one of the first attempts at more energy-efficient and flexible AI training using visual content,” said James Wang, distinguished professor of information sciences and technology and Zhu’s advisor.

The research has implications for the future development of advanced AI systems meant to navigate and learn from new environments, according to the scientists.

“This approach would be particularly beneficial in situations where a team of autonomous robots with limited resources needs to learn how to navigate in a completely unfamiliar environment,” Wang said.

“To pave the way for future applications, we plan to refine our model to better leverage spatial information and incorporate more diverse environments.”

Collaborators from Penn State’s Department of Psychology and Department of Computer Science and Engineering also contributed to this study.

Funding: This work was supported by the U.S. National Science Foundation, as well as the Institute for Computational and Data Sciences at Penn State.

About this AI research news

Author: Francisco Tutella
Source: Penn State
Contact: Francisco Tutella – Penn State
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Incorporating simulated spatial context information improves the effectiveness of contrastive learning models” by Lizhen Zhu et al. Patterns


Abstract

Incorporating simulated spatial context information improves the effectiveness of contrastive learning models

Highlights

  • Created an approach that uses spatial context as a similarity signal
  • A method for constructing image datasets using an environmental-sampling agent
  • Training with contextual information improves state of the art in contrastive learning
  • Simulated data provide new forms of physically realistic augmentations

The bigger picture

Despite being trained on extensive datasets, current computer vision systems lag behind human children in learning about the visual world.

One possible reason for this discrepancy is the fact that humans actively explore their environment as embodied agents, sampling data from a stable visual world with accompanying context.

Bearing some resemblance to human childhood experience, contrastive learning is a machine-learning technique that allows learning of general features without having labeled data.

This is done by grouping together similar things or objects and separating those that are dissimilar. Contrastive learning methods can be applied to multiple tasks, for example, to train visual learning agents.

Improving these machine-learning strategies is important for the development of efficient intelligent agents, like robots or vehicles, with the ability to explore and learn from their surroundings.
