A study on using synthetic data for effective association knowledge learning

Multi-object tracking (MOT) is a complex system consisting of several functional components, such as detection, representation, and association. Association sits at the final stage of the MOT pipeline and is often seen as the core problem: it aims to connect newly detected bounding boxes with existing tracklets. The association module makes its inferences from appearance features, motion features, or both. Many association solutions in the community share one trait: they are trained on real-world video data. This practice has some potential problems. First, annotating trajectories in video frames requires expensive labor, which is likely to limit the size of MOT training data. Second, privacy and ethical issues restrict the use of real-world data in human-centered tasks, such as multi-pedestrian tracking.
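To make the association step concrete, here is a minimal sketch of a hand-written baseline that matches detections to tracklets by bounding-box overlap (IoU). It is not the learned association networks examined in the study; the function names, the box format, and the 0.3 threshold are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracklets: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # hypothetical threshold
) -> Dict[int, int]:
    """Greedily map detection index -> tracklet id, best overlaps first."""
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracklets.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_tracklets = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be linked
        if tid in used_tracklets or j in matches:
            continue  # each tracklet and each detection is used at most once
        matches[j] = tid
        used_tracklets.add(tid)
    return matches


# Last known box of each tracklet, and the boxes detected in the new frame.
tracklets = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches = associate(tracklets, detections)
```

In this example the third detection overlaps no tracklet, so it stays unmatched and would typically start a new tracklet. A learned association module replaces the fixed IoU score with a score predicted from appearance and motion features.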

To avoid these concerns, researchers from the Australian National University and Tsinghua University investigated the use of synthetic data in MOT. They built a 3D simulation engine, MOTX, to generate videos with multiple targets, rich annotations, and controllable visual factors. Such an engine provides an inexpensive way to obtain large-scale data with accurate labels. With MOTX, they set out to answer two questions.

The first question is whether the association knowledge learned from synthetic data works on real-world videos. A common weakness of synthetic data is that its distribution differs from that of real-world data, especially in image style. In appearance-centered tasks, such as re-identification and segmentation, models trained on synthetic data need additional training techniques, such as fine-tuning or domain adaptation on real data, to avoid failure in a real-world test environment. However, association knowledge learning differs from appearance learning in its data requirements. According to existing work, motion cues play an essential role in association. While it is difficult for an engine to simulate realistic appearance, simulating motion cues, such as occlusions, can be less difficult.

This study shows that, on some state-of-the-art association networks, the association knowledge learned from synthetic data can be adapted well to real-world scenarios without sacrificing performance. Specifically, the researchers synthesized datasets with MOTX by manually setting key parameters (e.g. camera view) close to those of real-world training sets. When the association networks are then trained from scratch on such synthetic videos, they achieve similar or sometimes even better tracking accuracy than when trained on real data. The researchers' ablation studies of appearance and motion features offered two insights. The first is that the difference in appearance between synthetic and real-world data is unlikely to harm association knowledge learning. The second is that 3D engines can simulate motion cues well in association scenarios. These findings may explain the competitiveness of synthetic data and imply that MOT benefits more from synthetic data than appearance-centered tasks do. This is a very early study of the role of synthetic data in MOT.

The second question is how motion factors affect association knowledge learning. The datasets currently available are mostly from the real world, such as MOT15. While these data are beneficial for model training, their fixed nature gives limited opportunity to understand how the system responds to changing visual factors. For example, how does the density of pedestrians in the training set affect the accuracy of the model? Can a model trained with a static camera be deployed well in moving-camera systems?

The researchers leveraged MOTX's customization capabilities to help answer this question. They performed empirical studies on how pedestrian- and camera-related factors affect association knowledge learning. Specifically, they investigated two groups of factors. The first group is pedestrian-related, such as pedestrian density and speed; the second is camera-related, including camera view and camera movement. With the proposed MOTX engine, these motion factors are abstracted as system parameters, so different situations can be simulated simply by changing the parameters, such as setting the object velocity to 1 m/s. Their results shed light on the relationship between factors in the training and testing data and the performance of the MOT system.
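As a rough illustration of what abstracting scene factors as parameters can look like, the sketch below defines a hypothetical configuration object and varies one factor while holding the others fixed. None of these names or default values come from MOTX; they are assumptions made only to show the idea of a controlled parameter sweep.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SceneConfig:
    """Hypothetical scene parameters; not the actual MOTX interface."""
    pedestrian_count: int = 50         # assumed stand-in for pedestrian density
    pedestrian_speed_mps: float = 1.0  # walking speed in metres per second
    camera_height_m: float = 3.0       # assumed proxy for camera view
    camera_moving: bool = False        # static vs. moving camera


def speed_sweep(base: SceneConfig, speeds: List[float]) -> List[SceneConfig]:
    """Return one config per speed, changing only the pedestrian speed."""
    return [replace(base, pedestrian_speed_mps=s) for s in speeds]


base = SceneConfig()
configs = speed_sweep(base, [0.5, 1.0, 2.0])
```

Because only one field changes between configurations, any difference in tracking accuracy across the resulting training sets can be attributed to that factor, which is the kind of controlled comparison that fixed real-world datasets do not allow.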

View article:

A study on using synthetic data for effective association knowledge learning

http://doi.org/10.1007/s11633-022-1380-x

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.
