SG-RAPL: Scene Graph-Driven Reasoning for Action Planning of Humanoid Robot

D.A. Yudin1,2*, A.A. Lazarev1, E.A. Bakaeva1, A.A. Kochetkova1, A.K. Kovalev1,2, A.I. Panov1,2
1Moscow Institute of Physics and Technology, 2AIRI, Moscow
*Corresponding author: yudin.da@mipt.ru

Abstract

Recent advances in mapping and depth prediction have enabled autonomous robots to interpret tasks in natural language. This paper presents a high-level planning algorithm for dynamic environments, allowing adaptive and optimal control of anthropomorphic robots. Using a scene graph to detail the world map and indicate abnormal situations, large language models transform natural language tasks into linear temporal logic automata for effective re-planning. A perceptual segmentation and tracking module generates a real-time 3D scene graph, providing instance segmentation, obstacle detection, and 6DoF-pose detection. The scheduling module decomposes high-level tasks into subtasks like navigation and object manipulation. Experiments show the approach enables efficient scheduling of complex tasks in virtual environments. This work advances autonomous, adaptable robots for applications in healthcare, logistics, and manufacturing.

Method

The project contains two main nodes: the Planner and the Perception Module. The purple block represents all data coming from outside the pipeline for further processing. The arrows show the interaction of the different modules and the addition of negative feedback from the environment and the robot. The Scheduler generates control instructions for the robot based on the JSON file generated by the Perception Module. The visual-information analysis runs in service mode and, on request, transmits a message to the Planner Module when one of five checkpoints is reached.
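The schema of the JSON file exchanged between the two nodes is not specified here, so the sketch below is purely illustrative: the field names (objects, obstacles, pose_6dof, relations, abnormal) and the plan_from_scene helper are assumptions, showing how a planner could read such a scene-graph message and emit navigation and manipulation subtasks.

```python
import json

# Hypothetical scene-graph message from the Perception Module;
# all field names are assumptions, not the project's actual schema.
SCENE_GRAPH_JSON = """
{
  "timestamp": 17.42,
  "objects": [
    {"id": 3, "class": "cup",
     "pose_6dof": [0.4, 0.1, 0.9, 0.0, 0.0, 1.57],
     "relations": [{"type": "on", "target_id": 7}]}
  ],
  "obstacles": [
    {"id": 7, "class": "table",
     "bbox": [0.2, -0.3, 0.0, 1.2, 0.8, 0.75]}
  ],
  "abnormal": []
}
"""

def plan_from_scene(scene_json: str) -> list[str]:
    """Toy planner step: read the scene graph and emit subtask names."""
    scene = json.loads(scene_json)
    subtasks = []
    for obj in scene["objects"]:
        # Navigate to each tracked object, then manipulate it.
        subtasks.append(f"navigate_to(object_{obj['id']})")
        subtasks.append(f"grasp(object_{obj['id']})")
    return subtasks

print(plan_from_scene(SCENE_GRAph_JSON if False else SCENE_GRAPH_JSON))
# ['navigate_to(object_3)', 'grasp(object_3)']
```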

SG-RAPL Scheme

Demo Pipeline

The Perception Module first processes the RGB frame, then the Actual State service updates the environment information, which is used by the Planner Module to decompose the user's text command.
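As a rough illustration of this flow, the sketch below wires the three stages together. The class names PerceptionModule, ActualStateService, and PlannerModule, and all method signatures, are hypothetical stand-ins for the project's actual components, not its real API.

```python
# Illustrative demo-pipeline loop under the assumptions stated above.
class PerceptionModule:
    def process_frame(self, rgb_frame):
        # Would run instance segmentation / tracking on the RGB frame
        # and return a scene-graph dict; here it returns an empty one.
        return {"objects": [], "obstacles": []}

class ActualStateService:
    def __init__(self):
        self.state = {}

    def update(self, scene_graph):
        # Merge the latest scene graph into the maintained environment state.
        self.state.update(scene_graph)
        return self.state

class PlannerModule:
    def decompose(self, command: str, state: dict) -> list[str]:
        # In SG-RAPL an LLM would translate the command into an LTL
        # automaton for re-planning; this placeholder just echoes it.
        return [f"subtask for: {command}"]

perception = PerceptionModule()
actual_state = ActualStateService()
planner = PlannerModule()

scene = perception.process_frame(rgb_frame=None)  # stand-in for a camera frame
state = actual_state.update(scene)
print(planner.decompose("bring me the cup", state))
```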