Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g. brushing a soft pillow) to more dangerous (e.g. toppling a glass vase), making it difficult to characterize which may be acceptable.
In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach generates an anisotropic cost map that encodes directional push safety. We pair this map with a contact-aware A* planner to find stable, contact-rich paths. We run experiments in 20 simulation and 10 real-world scenes and evaluate performance using task success rate, object displacement, and feedback from human evaluators. Our results over 3200 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations.
Overview of IMPACT. A toy bear, a coffee cup, and a glue bottle (the target) are on the table. The VLM receives an annotated image \( I' \) and a language template prompt \( \ell \) containing object information from SAM2, and outputs costs for the three objects. We use a cost of \( -1 \) for the target object. We construct a 3D voxel grid \( V \) from these costs and then flatten it to produce an anisotropic, contact-aware cost map \( M' \). To generate a trajectory, the contact-aware A* planner searches this map over three motion primitives: Move, Push, and Rotate. The planner's state space includes the robot's end-effector pose and the displaced positions of low-cost objects. Together, these guide the robot to avoid the coffee cup but make contact with the toy bear in the appropriate direction to reach the glue bottle.
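The planner description above can be illustrated with a small grid search. The following is a minimal sketch under stated assumptions, not the authors' implementation: it works on a 2D grid rather than the flattened cost map \( M' \), it models only the Move and Push primitives (Rotate is omitted), and the function name `plan`, the `push_threshold` parameter, and the step costs are all hypothetical. As in the caption, the search state combines the robot position with the displaced positions of the objects.

```python
import heapq
import itertools

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def plan(size, start, goal, objects, push_threshold=3):
    """A* over states (robot cell, object positions).

    objects maps a grid cell to a contact cost in [0, 10]. Objects whose cost
    exceeds push_threshold (an assumed cutoff) are treated as impassable.
    Returns a list of (primitive, cell) steps, or None if no path exists.
    """
    width, height = size
    costs = tuple(objects.values())        # per-object cost, fixed order
    init = (start, tuple(objects.keys()))  # per-object position, same order
    tie = itertools.count()                # tie-breaker for the heap

    def heuristic(cell):
        # Manhattan distance is admissible because every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    def in_bounds(cell):
        return 0 <= cell[0] < width and 0 <= cell[1] < height

    frontier = [(heuristic(start), next(tie), 0, init, [])]
    best = {init: 0}
    while frontier:
        _, _, g, (robot, positions), path = heapq.heappop(frontier)
        if robot == goal:
            return path
        if g > best.get((robot, positions), float("inf")):
            continue  # stale queue entry
        for dx, dy in MOVES:
            nxt = (robot[0] + dx, robot[1] + dy)
            if not in_bounds(nxt):
                continue
            if nxt in positions:
                # Push: displace the object one cell along the motion direction.
                i = positions.index(nxt)
                pushed = (nxt[0] + dx, nxt[1] + dy)
                blocked = (costs[i] > push_threshold or not in_bounds(pushed)
                           or pushed in positions or pushed == goal)
                if blocked:
                    continue
                new_positions = positions[:i] + (pushed,) + positions[i + 1:]
                step, step_cost = ("Push", nxt), 1 + costs[i]
            else:
                new_positions = positions
                step, step_cost = ("Move", nxt), 1
            state, new_g = (nxt, new_positions), g + step_cost
            if new_g < best.get(state, float("inf")):
                best[state] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), next(tie),
                                          new_g, state, path + [step]))
    return None

# Two high-cost objects block the direct route; one low-cost object can be pushed.
objects = {(2, 0): 9, (2, 1): 9, (2, 2): 1}
path = plan((5, 3), start=(0, 1), goal=(4, 1), objects=objects)
print(path)
```

In this toy scene the only way through is to push the low-cost object twice, so the search trades a slightly longer route for contact with the object that tolerates it best. Forbidding pushes into the goal cell is one simple way to keep obstacles away from the target.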
Interactive 3D Map
Anisotropic Cost Map
The target object is the glue bottle.
IMPACT has strong zero-shot generalization capabilities and can be applied to diverse scenarios without fine-tuning or sim-to-real transfer.
The target object is the pickleball. The cup containing two markers is unseen by LAPP.
Results in simulation show that collision-free paths can still lead to unexpected collisions. By allowing acceptable contact, IMPACT generates safer and more reliable trajectories.
The target object is the pudding box.
Without directional push analysis, obstacles may get pushed into the target. The anisotropic, contact-aware cost map guides pushes to be safer and more controlled.
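One way to picture directional push analysis is to score each push direction by what the pushed object would run into. The sketch below is an illustrative assumption rather than the construction used in IMPACT: the grid layout, the `reach` look-ahead distance, the `target_penalty` value, and the function name `directional_push_costs` are all hypothetical. Only the convention that the target carries a cost of \( -1 \) comes from the overview caption.

```python
DIRECTIONS = {"+x": (1, 0), "-x": (-1, 0), "+y": (0, 1), "-y": (0, -1)}
TARGET = -1  # cost value assigned to the target object

def directional_push_costs(grid, cell, reach=2, target_penalty=10):
    """Return one cost per push direction for the object at `cell`.

    grid[x][y] holds 0 for free space, a positive contact cost for an object,
    and TARGET for the target. The cost of a push is the worst thing the
    object could reach within `reach` cells along that direction.
    """
    x, y = cell
    costs = {}
    for name, (dx, dy) in DIRECTIONS.items():
        worst = grid[x][y]  # pushing always costs at least the object's own score
        for step in range(1, reach + 1):
            nx, ny = x + dx * step, y + dy * step
            if not (0 <= nx < len(grid) and 0 <= ny < len(grid[0])):
                break  # ray left the map
            hit = grid[nx][ny]
            worst = max(worst, target_penalty if hit == TARGET else hit)
        costs[name] = worst
    return costs

# A low-cost object at (2, 2), a fragile object at (2, 4), the target at (4, 2).
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1
grid[2][4] = 9
grid[4][2] = TARGET
print(directional_push_costs(grid, (2, 2)))
```

Here the same object is cheap to push toward free space but expensive to push toward the fragile object or the target, which is the kind of direction-dependent cost an anisotropic map is meant to capture.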
The target object is the cup.
You are an advanced AI system designed to assist in robotic navigation within cluttered environments. Your task is to evaluate the safety of various objects in a scene, considering not only the individual characteristics of each object but also their interactions and the physical principles that govern their stability and potential movement. For each listed object, you will assign a safety score ranging from 0 to 10, where 0 indicates minimal risk of damage upon collision and 10 signifies a high risk of damage. Your assessment should consider the object's material, size, context, and its relationship with surrounding objects, including any potential domino effects or other interaction-induced risks.

Please provide a safety score for each object, taking into account the following:
- Object Material and Structural Integrity: How fragile or sturdy the object is (e.g., glass vs. plastic vs. metal).
- Stability and Likelihood of Movement: Whether the object is firmly placed or precariously balanced. For instance, a stable stack of books alone might be low risk, but if there is a fragile object on top, the risk increases significantly.
- Potential Domino Effects or Chain Reactions: If collision with one object could cause it to roll, topple, or otherwise move into other objects, increasing the overall risk. For example, a ball might be low risk in isolation, but if it can roll and knock over a wine glass, the effective risk is higher.
- Proximity and Arrangement: How close the object is to other fragile or easily toppled items. Even if an object (like a sugar box) is normally sturdy, being positioned next to a fragile wine glass can raise its overall risk score if it could collide or push the glass.
- Any Other Relevant Physical Interactions: Any additional factors that might increase the risk of damage, such as height above the ground, shape of the surface, or presence of liquids.

Each item is an object labelled in white with its respective ID number.

Adhere to the specified format for your response, listing each object followed by its corresponding safety score. Do not include any additional text or output.

Format Requirements:
- The JSON object must be a single string.
- Each key must be the object's ID number in parentheses (e.g., "1"), and each value must be the safety score (an integer between 0 and 10).
- Do not include any text other than the JSON object in that final line. Do not add something like "```json" or "```".
- Do not include newlines, extra punctuation, or object names in the JSON.
- Every key/value should be strictly "ID": score.
- No explanations or reasoning should appear in the final JSON, only the scores.

Input objects: {object_list}

Your analysis should be comprehensive, considering the dynamic interactions between objects and the physical principles that may affect the outcome of a collision.
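Because the prompt asks for a single-line JSON object of `"ID": score` pairs on the final line, a reply can be checked mechanically before its scores are used. The sketch below shows one possible validation step; it is an assumption about how such a reply could be consumed, not code from the project. The names `parse_safety_scores`, `expected_ids`, and `target_id` are hypothetical. Overriding the target's score with \( -1 \) follows the convention stated in the overview caption.

```python
import json

def parse_safety_scores(reply, expected_ids, target_id=None):
    """Read the last non-empty line of a VLM reply as {"ID": score} JSON.

    Raises ValueError if the line is not a JSON object, an expected ID is
    missing, or a score is not an integer in [0, 10]. If target_id is given,
    its score is replaced by -1 to mark the target object.
    """
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty reply")
    try:
        raw = json.loads(lines[-1])
    except json.JSONDecodeError as err:
        raise ValueError(f"final line is not valid JSON: {err}") from err
    if not isinstance(raw, dict):
        raise ValueError("final line is not a JSON object")

    scores = {}
    for obj_id in expected_ids:
        if obj_id not in raw:
            raise ValueError(f"missing score for object {obj_id}")
        value = raw[obj_id]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"score for object {obj_id} is not an integer")
        if not 0 <= value <= 10:
            raise ValueError(f"score for object {obj_id} is out of range")
        scores[obj_id] = value

    if target_id is not None:
        scores[target_id] = -1
    return scores

reply = 'The cup is fragile and close to the edge.\n{"1": 2, "2": 9, "3": 5}'
print(parse_safety_scores(reply, ["1", "2", "3"], target_id="3"))
```

Reading only the final line tolerates replies that include reasoning before the JSON, while the range and type checks catch malformed scores before they reach the cost map.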
@misc{ling2025impactintelligentmotionplanning,
title={IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models},
author={Yiyang Ling and Karan Owalekar and Oluwatobiloba Adesanya and Erdem Bıyık and Daniel Seita},
year={2025},
eprint={2503.10110},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2503.10110},
}