Abstract
Robots still lag behind humans in their ability to generalize from limited experience, particularly when transferring learned behaviors to long-horizon tasks in unseen environments. We present the first method that enables robots to autonomously invent symbolic, relational concepts directly from a small number of raw, unsegmented, and unannotated demonstrations. From these, the robot learns logic-based world models that support zero-shot generalization to tasks of far greater complexity than those seen in training. Our framework achieves performance on par with hand-engineered symbolic models, while scaling to execution horizons far beyond training and handling up to 18 times more objects than seen during learning. These results demonstrate that transferable symbolic abstractions can be acquired autonomously from raw robot experience, a step toward interpretable, scalable, and generalizable robot planning systems.

Example of the transfer achieved in this paper. (a) shows the complete set of training tasks for one of our experiment domains; (b) shows the symbolic concepts and world models automatically invented by our approach; (c) shows some of the test tasks solved zero-shot using the learned model, despite their significantly greater object counts and complexity. Interpretable relation names are provided for clarity; they are not learned by the method.

Overview of LAMP. From unlabeled, unsegmented demonstrations (a), LAMP learns relational critical region predictors (b), each of which defines relations between pairs of objects (c). These relations form abstract states, with learned actions as transitions between them (d); together they constitute a symbolic world model learned from scratch. For new tasks with more objects or obstructions, LAMP predicts new relations to build the abstract state space and goal, then uses the model to synthesize behaviors that achieve it (f).
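
To make this pipeline concrete, below is a minimal, hypothetical Python sketch of the planning side of such a learned symbolic world model: abstract states are sets of ground relations over object pairs, learned actions are lifted operators with preconditions and effects, and behaviors are synthesized by search in the abstract state space. All identifiers here (Relation, Operator, bfs_plan, the invented relation names rel_0 and rel_1) are illustrative assumptions, not LAMP's actual implementation.

# A minimal, hypothetical sketch of planning with a learned symbolic world
# model. Names and structure are illustrative, not identifiers from LAMP.
from collections import deque
from dataclasses import dataclass
from itertools import permutations

# A ground relation between a pair of objects, e.g. rel_0(gripper, block_a).
Relation = tuple[str, str, str]  # (relation_name, obj_a, obj_b)
State = frozenset[Relation]      # an abstract state is a set of relations

@dataclass(frozen=True)
class Operator:
    """A learned lifted action: preconditions and add/delete effects
    over two object variables ?x and ?y."""
    name: str
    pre: frozenset[Relation]
    add: frozenset[Relation]
    delete: frozenset[Relation]

    def ground(self, x: str, y: str) -> tuple[State, State, State]:
        """Substitute concrete objects for the variables ?x and ?y."""
        sub = {"?x": x, "?y": y}
        def g(rels: frozenset[Relation]) -> State:
            return frozenset((r, sub.get(a, a), sub.get(b, b)) for r, a, b in rels)
        return g(self.pre), g(self.add), g(self.delete)

def successors(state: State, ops: list[Operator], objects: list[str]):
    """Yield (action_label, next_state) for every applicable grounding."""
    for op in ops:
        for x, y in permutations(objects, 2):
            pre, add, delete = op.ground(x, y)
            if pre <= state:  # preconditions hold in the current state
                yield f"{op.name}({x},{y})", (state - delete) | add

def bfs_plan(start: State, goal: State, ops: list[Operator], objects: list[str]):
    """Breadth-first search in the abstract state space for a plan
    reaching any state that satisfies the goal relations."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for action, nxt in successors(state, ops, objects):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

if __name__ == "__main__":
    # Toy domain: invented relations rel_0 (roughly "held") and rel_1 ("on").
    pick = Operator("pick",
                    pre=frozenset({("rel_1", "?x", "?y")}),
                    add=frozenset({("rel_0", "gripper", "?x")}),
                    delete=frozenset({("rel_1", "?x", "?y")}))
    place = Operator("place",
                     pre=frozenset({("rel_0", "gripper", "?x")}),
                     add=frozenset({("rel_1", "?x", "?y")}),
                     delete=frozenset({("rel_0", "gripper", "?x")}))
    start = frozenset({("rel_1", "a", "table"), ("rel_1", "b", "table")})
    goal = frozenset({("rel_1", "a", "b")})
    print(bfs_plan(start, goal, [pick, place], ["a", "b", "table", "gripper"]))

Because the operators are relational, the same model applies unchanged when a test task introduces more objects than were seen in training; only the grounding loop over object pairs grows, which is one way such models can support the kind of generalization reported above.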

Empirical evaluation of LAMP. (a) Generalization achieved by our approach, with the x-axis denoting the domain and the y-axis the generalization factor. The red dotted line marks the 1x generalization level typical of imitation learning or behavior cloning. (b) Robustness comparison with Code-as-Policies and STAMP, where the x-axis shows the number of training demonstrations and the y-axis the proportion of solved test tasks; shaded regions indicate standard deviation over 10 runs. (c) Training tasks used to learn symbolic world models. (d) Test tasks used to evaluate generalization.
BibTeX
@inproceedings{shah2025reals2logic,
  title={From the Real World to Logic and Back: Learning Symbolic World Models for Long-Horizon Planning},
  author={Shah, Naman and Nagpal, Jayesh and Srivastava, Siddharth},
  booktitle={Proceedings of the Conference on Robot Learning (CoRL)},
  year={2025},
  url={https://aair-lab.github.io/r2l-lamp}
}