Every moment of our waking existence we are aware of our visual surroundings, know where to look to examine objects, are capable of reaching for and handling objects, and are able to chain such simple actions into sequences through which we achieve goals such as cleaning up a desk and stacking the papers gathered there. Considerable experimental work is currently devoted to the human psychophysics of scene understanding and sequence generation. A synthesis of that knowledge into a comprehensive account of these competences is much harder to come by. In this project we develop a neurally based theoretical account of scene understanding and sequence generation. Mathematical models, formalized in the language of neural dynamics, a space-time continuous version of recurrent neural networks, are implemented in autonomous cognitive robots that have their own sensory and motor systems. Robotic implementation provides a critical reality check on the theory and is a potent source of heuristics, uncovering hitherto overlooked aspects of the problem as well as providing a proof of concept. At the same time, the project aims to advance cognitive robots toward increased autonomy. Our most ambitious theoretical goal is to incorporate learning into every aspect of sequence planning and scene perception.
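To make the notion of neural dynamics concrete: models of this kind are typically dynamic neural fields of the Amari type, in which activation u(x, t) over a continuous feature dimension evolves under a resting level, localized input, and lateral interaction. The sketch below is a minimal illustration of that class of equation, not the project's actual model; all parameter values (time constant, resting level, kernel shape) are illustrative assumptions.

```python
import numpy as np

def simulate_field(steps=200, n=101, tau=10.0, h=-5.0, dt=1.0):
    """Euler integration of a 1-D neural field of the Amari type:
        tau * du/dt = -u + h + S(x) + integral w(x - x') f(u(x', t)) dx'
    with sigmoidal output f and a local-excitation / global-inhibition
    kernel w. All parameters are illustrative, not from any real model."""
    x = np.linspace(-50.0, 50.0, n)
    dx = x[1] - x[0]
    u = np.full(n, h)                                 # field at resting level
    S = 6.0 * np.exp(-(x ** 2) / (2 * 8.0 ** 2))      # localized sensory input

    # interaction kernel: excitatory Gaussian minus constant global inhibition
    d = x[:, None] - x[None, :]
    w = 8.0 * np.exp(-(d ** 2) / (2 * 5.0 ** 2)) - 1.0

    f = lambda u: 1.0 / (1.0 + np.exp(-u))            # sigmoidal firing rate
    for _ in range(steps):
        lateral = (w @ f(u)) * dx                     # interaction integral
        u += (dt / tau) * (-u + h + S + lateral)
    return x, u

x, u = simulate_field()
# a self-stabilized activation peak forms where the input is centered,
# while the rest of the field is suppressed below the resting level
```

Such self-stabilized peaks are the basic units of representation in this formalism: they persist after the input is removed, which is what lets a field-based architecture hold a selected object or an intended action stable across the steps of a sequence.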