3rd Workshop on
Semantic Perception, Mapping and Exploration (SPME)
Karlsruhe, Germany, May 5th, 2013.
The 3rd Workshop on Semantic Perception, Mapping and Exploration (SPME) will take place on Sunday, May 5th, 2013, at the Kongresszentrum Karlsruhe (one day before ICRA 2013, at the same venue) as an independent event held in conjunction with ICRA.
The workshop takes place in the "Thoma-Saal" (basement of Stadthalle/Kongresszentrum).
Motivation and goals
As robots and autonomous systems move away from laboratory setups towards complex real-world scenarios, both the perception capabilities of these systems and their abilities to acquire and model semantic information must become more powerful. The autonomous acquisition of information, the extraction of semantic models, and exploration strategies for deciding where and how to acquire the most relevant information pertinent to a specific semantic model are the research foci of an annual series of workshops, called Semantic Perception, Mapping and Exploration (SPME).
Semantic perception has long been seen as one of the most challenging aspects in developing complex intelligent systems that need to interact with the world. Within this area, the capability of recognizing objects in an unknown environment, i.e., semantic object perception, has recently become the center of attention of numerous research efforts, motivated on the one hand by the frequency with which this problem occurs in common applications, and on the other by the desire to provide current robotic solutions with reliable perception systems. Its current challenges encompass not only improving the robustness of semantic object perception technology, but also developing effective solutions capable of running in real time on mobile, power-limited architectures.
The goal of the third edition of this workshop is to provide a snapshot of the state of the art in semantic object perception algorithms and technologies. It will cover both the academic and the industrial side, offering interested participants the opportunity to engage in dialog and interaction.
Important dates
- March 29, 2013 (EXTENDED) - Paper Submission Due (DEADLINE PASSED)
- April 22, 2013 - Notification of Acceptance
- April 30, 2013 - Final Papers Due
- May 5, 2013 - Workshop at ICRA
Program
9:00-9:10: Welcome and introduction of this year's workshop topics by the organizers
Morning session #1: “Perception And Manipulation” (chair: Aitor Aldoma)
9:10-9:45: "A Geometry-Based Approach for Learning from Demonstrations for Manipulation", Pieter Abbeel (invited talk)
9:45-10:20: "Joint 3D Reconstruction and Class Segmentation", Marc Pollefeys (invited talk)
Morning session #2: “Segmentation and Understanding” (chair: Dirk Holz)
Afternoon session #1: “Object Recognition and Modelling” (chair: Federico Tombari)
Afternoon session #2: “Semantic Perception” (chair: Andrzej Pronobis)
Attendance is open to everyone: no registration for ICRA or for the SPME workshop is required ("bring your own badge"). The ICRA registration desk will already be open.
Call for papers
We solicit paper submissions, optionally accompanied by a video, both of which will be reviewed (single-blind) by the program committee. The review criteria are technical quality, significance of the system demonstration, and timeliness. We aim to accept 9 to 12 papers for oral presentation at the meeting. Papers should be up to 6 pages in length and formatted according to the IEEE ICRA style (see www.icra2013.org/?page_id=99). Videos will be shown during an afternoon video session open to the public. Accepted papers and videos will be assembled into proceedings published online. In addition, we will pursue a special journal issue to include the best papers.
This edition of the annual workshop series focuses on (3D) semantic object perception. Topics of interest include, but are not necessarily limited to:
- 3D/RGB-D object recognition in the presence of clutter and occlusions
- 3D/RGB-D object categorization
- Object perception for robotic manipulation
- 3D semantic segmentation and semantic scene interpretation
- 3D/RGB-D registration and alignment
- 3D/RGB-D object modeling
- 3D/RGB-D object tracking
The workshop will feature invited talks from key researchers in the field:
- Pieter Abbeel, UC Berkeley, USA
- Darius Burschka, TU Muenchen, Germany
- Dieter Fox, University of Washington, USA
- Daniel Munoz, Carnegie Mellon University, USA
- Marc Pollefeys, ETH Zurich, Switzerland
Pieter Abbeel received a BS/MS in Electrical Engineering from KU
Leuven (Belgium) and received his Ph.D. degree in Computer Science
from Stanford University in 2008. He joined the faculty at UC Berkeley
in Fall 2008, with an appointment in the Department of Electrical
Engineering and Computer Sciences. He has won various awards,
including best paper awards at ICML and ICRA, the Sloan Fellowship,
the Air Force Office of Scientific Research Young Investigator Program
(AFOSR-YIP) award, the Okawa Foundation award, the 2011 TR35, the
IEEE Robotics and Automation Society (RAS) Early Career Award, and the
Dick Volz Best U.S. Ph.D. Thesis in Robotics and Automation Award. He
has developed apprenticeship learning algorithms which have enabled
advanced helicopter aerobatics, including maneuvers such as tic-tocs,
chaos and auto-rotation, which only exceptional human pilots can
perform. His group has also demonstrated the first end-to-end system
that reliably picks up a crumpled laundry article and folds it. His
work has been featured in many popular press outlets, including BBC,
New York Times, MIT Technology Review, Discovery Channel, SmartPlanet
and Wired. His current research focuses on robotics and machine
learning with a particular focus on challenges in personal robotics,
surgical robotics and connectomics.
A Geometry-Based Approach for Learning from Demonstrations for Manipulation
I will present a new approach for robots to learn to perform
challenging manipulation tasks from demonstrations. Our method is
able to adapt a demonstrated trajectory to a new situation in which
the manipulated objects and their surrounding environment have
different shapes, sizes, and poses.
At the core of our approach is a non-rigid registration between the
demonstration scene and the new scene. While registration is only
concerned with the objects and their environment, perhaps
surprisingly, we show that it is possible to meaningfully extrapolate
to the entire space for our particular choice of non-rigid
registration. This in turn enables using the extrapolated
registration to transform (i.e., generalize) the robot tools' pose
trajectories from the demonstration scene to the new scene.
Our experiments show that our approach enables rapidly teaching a
robot a wide variety of manipulation tasks, which in our experience
can otherwise easily take days (if not weeks) to program. In a
second set of experiments we show that our approach enables autonomous
knot-tying for a wide range of knot types and starting configurations,
well beyond the prior state of the art.
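To make the trajectory-transfer step concrete, the following is a minimal numpy sketch under simplifying assumptions: point correspondences between the demonstration scene and the new scene are taken as given (the full method estimates them), the non-rigid registration is a 3-D thin plate spline, and only gripper positions, not orientations, are warped. All names and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_tps(src, dst, lam=1e-3):
    """Fit a 3-D thin-plate-spline warp f with f(src[i]) ~ dst[i]."""
    n = src.shape[0]
    # 3-D TPS kernel U(r) = -r, evaluated between all pairs of scene points.
    K = -np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)
    P = np.hstack([np.ones((n, 1)), src])      # affine part [1, x, y, z]
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = K + lam * np.eye(n)            # lam regularises the warp
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.vstack([dst, np.zeros((4, 3))])
    sol = np.linalg.solve(A, b)
    return src, sol[:n], sol[n:]               # centres, kernel weights, affine

def warp(points, centres, w, a):
    """Apply the fitted warp to arbitrary points, e.g. a gripper trajectory."""
    U = -np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return U @ w + np.hstack([np.ones((len(points), 1)), points]) @ a

rng = np.random.default_rng(0)
demo_cloud = rng.normal(size=(30, 3))                     # demonstration scene
new_cloud = 1.2 * demo_cloud + np.array([0.1, 0.0, 0.0])  # same scene, deformed
centres, w, a = fit_tps(demo_cloud, new_cloud)

demo_traj = rng.normal(size=(50, 3))           # demonstrated gripper positions
new_traj = warp(demo_traj, centres, w, a)      # trajectory for the new scene
```

The regularization weight lam is what makes extrapolation away from the scene points meaningful: it trades registration accuracy against smoothness of the warp, so trajectory points that lie off the object surfaces are still mapped sensibly.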
Darius Burschka received his PhD degree in Electrical and Computer Engineering in 1998 from the Technische Universität München in the field of vision-based navigation and map generation with binocular stereo systems. In 1999, he was a Postdoctoral Associate at Yale University, New Haven, Connecticut,
where he worked on laser-based map generation and landmark selection
from video images for vision-based navigation systems. From 1999 to
2003, he was an Associate Research Scientist at the Johns Hopkins
University, Baltimore, Maryland. From 2003 to 2005, he was an
Assistant Research Professor in Computer Science at the Johns Hopkins
University. Currently, he is an Associate Professor in Computer
Science at the Technische Universität München, Germany, where he
heads the computer vision and perception group. He was an area
coordinator in the DFG Cluster of Excellence "Cognition in Technical
Systems" and, since 2005, has also headed a virtual institute for
"Telerobotics and Sensor Data Fusion" between the German Aerospace
Center (DLR) and the Technische Universität München. His areas of
research are sensor systems for mobile and medical robots
and human-computer interfaces. The focus of his research is on
vision-based navigation and three-dimensional reconstruction from
sensor data. Dr. Burschka has been a member of IEEE since 1999.
Semantic Perception for Semi-Autonomous Teleoperation Tasks
Three-dimensional perception is essential for successful interaction
with the environment. While plain 3D data describes the structure of
the surrounding environment and helps to prevent collisions, it is not
sufficient for reasoning about actions in that environment. Semantic
perception allows us to enrich the 3D structure with knowledge about an
object's complete shape, its function in the local environment, and
ways to interact with it.
We present our current work on semantic perception in the context of
semi-autonomous tele-manipulation, where the knowledge about the objects
in the world helps to compensate for the significant time delays in the
transmission of the control signals and helps the system to relate the
gestures of the tele-operator to interaction intentions in the local
environment. These are used to complete the time-critical parts of the
manipulation task. We will present our current work in the area of
tele-manipulation, which enabled the successful coupling of a
tele-manipulation console at the Johns Hopkins University in Baltimore,
USA, with a manipulation platform at the German Aerospace Center (DLR)
in Oberpfaffenhofen, Germany, over a very low-bandwidth network link.
Dieter Fox is an Associate Professor in the Department of Computer Science & Engineering at the University of Washington, where he heads the UW Robotics and State Estimation Lab. From 2009 to 2011, he was also Director of the Intel Research Labs Seattle. He currently serves as the academic PI of the Intel Science and Technology Center for Pervasive Computing hosted at UW. He obtained his Ph.D. from the University of Bonn, Germany. His research is in robotics and artificial intelligence, with a focus on state estimation, perception, and activity recognition. He has published over 150 technical papers and is co-author of the textbook "Probabilistic Robotics". He is a Fellow of the AAAI and has received several best paper awards at major robotics and AI conferences. He is also an editor of the IEEE Transactions on Robotics, serves on the advisory board of the Journal of Artificial Intelligence Research (JAIR), and was program co-chair of the 2008 AAAI Conference on Artificial Intelligence. He currently serves as the program chair of the 2013 Robotics: Science and Systems conference.
Hierarchical Sparse Coding for Object Recognition and Reconstruction
Good features are crucial for successful object recognition. The combination of color and depth information provided by RGB-D cameras motivates the development of new features that can take full advantage of the rich information provided by such cameras. In this talk, I will discuss our recent work on learning features for object recognition. Our hierarchical matching pursuit (HMP) framework uses sparse coding to learn features from raw, unlabeled RGB-D data. In HMP, sparse codes over small patches are accumulated via a pooling operation, followed by a second layer of sparse coding over the pooled feature vectors. The resulting image level features achieve excellent object recognition results on several image and RGB-D classification tasks. I will also present recent work on sparse coding for compression of 3D maps.
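As a rough illustration of the two-layer structure described above, the sketch below learns a dictionary over raw patches of a synthetic single-channel image, sparse-codes them with OMP, max-pools the codes spatially, and then repeats sparse coding on the pooled vectors; patch sizes, dictionary sizes and sparsity levels are placeholder assumptions, not the settings used in HMP.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.RandomState(0)
image = rng.rand(40, 40)        # stand-in for one channel of an RGB-D frame

# Layer 1: sparse codes over raw 5x5 patches (no hand-crafted features).
patches = extract_patches_2d(image, (5, 5)).reshape(-1, 25)   # 36x36 grid
patches -= patches.mean(axis=1, keepdims=True)
dico1 = MiniBatchDictionaryLearning(n_components=64,
                                    transform_algorithm='omp',
                                    transform_n_nonzero_coefs=4,
                                    random_state=0)
codes1 = np.abs(dico1.fit_transform(patches)).reshape(36, 36, 64)

# Spatial max pooling: collapse each 6x6 cell of codes into one vector.
pooled = codes1.reshape(6, 6, 6, 6, 64).max(axis=(1, 3))      # (6, 6, 64)

# Layer 2: sparse coding over the pooled layer-1 feature vectors.
dico2 = MiniBatchDictionaryLearning(n_components=32,
                                    transform_algorithm='omp',
                                    transform_n_nonzero_coefs=4,
                                    random_state=0)
codes2 = np.abs(dico2.fit_transform(pooled.reshape(-1, 64)))  # (36, 32)

# A final max pool yields an image-level feature for a linear classifier.
feature = codes2.max(axis=0)
print(feature.shape)            # (32,)
```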
Daniel Munoz is a Ph.D. candidate in the Robotics Institute at Carnegie Mellon University. His research interests span the areas of computer vision, machine learning, and robotics for enabling autonomous systems to perceive, reason, and operate within novel environments. Daniel received his M.S. in Robotics (2009) and B.S. in Electrical and Computer Engineering (2007) from Carnegie Mellon University. He was a finalist for the ICRA Best Vision Paper Award (2011) and is the recipient of a QinetiQ North America Robotics Fellowship (2009) and a Siebel Scholarship (2008).
Inference Machines: Parsing Scenes via Iterated Predictions
Extracting a rich representation of the environment is critical for many autonomous tasks such as path planning, mapping, and object tracking. Achieving this representation not only requires recognizing individual objects but also understanding the contextual relations among the objects, leading to a full understanding of the scene. Within computer vision, there is a common belief that sophisticated representations and energy functions are necessary to achieve high performance predictions. Unfortunately, performing exact inference over these expressive models is often intractable, and the combination of approximate inference and learning is not well understood. Instead, we consider approximate inference as a procedure: we can view an iterative inference algorithm, such as belief propagation on a random field, as a network of computational modules taking in observations and other local computations on the graph (messages). We can then iteratively train each of these modules to output ideal intermediate messages, culminating in a holistic interpretation of the scene. We demonstrate that this iterative decoding approach not only achieves state-of-the-art classification performance on a variety of image and 3-D point cloud datasets, but it is also extremely efficient in practice. Finally, I will discuss our recent developments and applications under this framework.
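The following toy sketch captures this procedural view: rather than learning a random field and running approximate inference on it, a short sequence of ordinary classifiers is trained, each consuming a node's own features together with the averaged predictions of its graph neighbours from the previous round. Graph construction, features, labels and the number of rounds are synthetic placeholders; a careful implementation would train each stage on held-out predictions (stacking) to avoid overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3                     # nodes, feature dim, classes (toy)
X = rng.normal(size=(n, d))             # per-node observations
y = rng.integers(0, k, size=n)          # ground-truth labels (toy)

# Build a k-NN graph over the nodes; idx[:, 1:] drops the self-neighbour.
_, idx = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
neigh = idx[:, 1:]

msgs = np.full((n, k), 1.0 / k)         # uniform initial "messages"
stages = []
for _ in range(3):                      # a few rounds of iterated prediction
    ctx = msgs[neigh].mean(axis=1)      # average the neighbours' beliefs
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.hstack([X, ctx]), y)     # each stage sees features + context
    msgs = clf.predict_proba(np.hstack([X, ctx]))
    stages.append(clf)                  # keep the stage for test-time replay

labels = msgs.argmax(axis=1)            # final holistic labelling
```

At test time the stored stages are simply replayed in order on the new graph, which is what makes the approach efficient: inference is a fixed number of classifier evaluations rather than an iterative optimization of unknown cost.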
Marc Pollefeys has been a full professor in the Dept. of Computer Science of ETH Zurich since 2007, where he is the head of the Institute for Visual Computing and leads the Computer Vision and Geometry lab. He also remains associated with the Dept. of Computer Science of the University of North Carolina at Chapel Hill, where he started as an assistant professor in 2002 and became an associate professor in 2005. Before this he was a postdoctoral researcher at the Katholieke Universiteit Leuven in Belgium, where he also received his M.S. and Ph.D. degrees in 1994 and 1999, respectively. His main area of research is computer vision, but he is also active in robotics, machine learning and computer graphics. One of his main research goals is to develop flexible approaches to capture visual representations of real-world objects, scenes and events. Dr. Pollefeys has received several prizes for his research, including a Marr Prize, an NSF CAREER award, a Packard Fellowship and a European Research Council Starting Grant. He is the author or co-author of more than 200 peer-reviewed publications. He will be a general chair of ECCV 2014, was a program co-chair of CVPR 2009, was general chair of 3DIMPVT 2012 and general/program co-chair of the 3rd Symposium on 3D Data Processing, Visualization and Transmission, has organized workshops and courses at major vision and graphics conferences, and has served on the program committees of many conferences. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, the International Journal of Computer Vision, and Foundations and Trends in Computer Graphics and Computer Vision. Several of his advisees are now professors at universities in Europe, Asia and the United States. Prof. Pollefeys is a Fellow of the IEEE.
Joint 3D Reconstruction and Class Segmentation
Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being "too noisy". Unfortunately, these priors generally yield overly smooth reconstructions or segmentations in certain regions whereas they fail in other areas to constrain the solution sufficiently. In this work we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other's task. As a consequence, we propose a rigorous mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. Image segmentations provide geometric cues about which surface orientations are more likely to appear at a certain location in space whereas a dense 3D reconstruction yields a suitable regularization for the segmentation problem by lifting the labeling from 2D images to 3D space. We show how appearance-based cues and 3D surface orientation priors can be learned from training data and subsequently used for class-specific regularization. Experimental results on several real data sets highlight the advantages of our joint formulation. Our final result is a 3D surface reconstruction of a scene segmented in semantically meaningful regions.
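As a toy analogue of class-specific, orientation-dependent regularization, the sketch below labels a 2-D grid by iterated conditional modes with pairwise costs that depend on both the class pair and the direction of the edge, so that, for example, "building above ground" is cheap on vertical edges while the reverse is expensive. The classes, costs and random unaries are invented for illustration; the work above solves a continuous convex formulation in 3-D rather than this discrete greedy one.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K = 40, 40, 3                 # grid size; classes: 0 free, 1 ground, 2 building
unary = rng.normal(size=(H, W, K))  # placeholder per-cell data costs

# Direction-dependent pairwise costs: pair_v[a, b] is the cost of label a
# sitting directly above label b; horizontal edges use a plain Potts cost.
pair_h = 1.0 - np.eye(K)
pair_v = pair_h.copy()
pair_v[2, 1] = 0.1                  # building above ground: plausible, cheap
pair_v[1, 2] = 5.0                  # ground above building: heavily penalised

labels = unary.argmin(axis=2)       # independent initialisation
for _ in range(10):                 # ICM sweeps, greedily lowering the energy
    for i in range(H):
        for j in range(W):
            cost = unary[i, j].copy()
            if i > 0:
                cost += pair_v[labels[i - 1, j], :]   # neighbour above
            if i < H - 1:
                cost += pair_v[:, labels[i + 1, j]]   # neighbour below
            if j > 0:
                cost += pair_h[labels[i, j - 1], :]
            if j < W - 1:
                cost += pair_h[:, labels[i, j + 1]]
            labels[i, j] = cost.argmin()
```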
Program committee
- Francesco Amigoni, Politecnico di Milano, Italy
- Sven Behnke, University of Bonn, Germany
- Wolfram Burgard, University of Freiburg, Germany
- Luigi Di Stefano, University of Bologna, Italy
- Tom Duckett, University of Lincoln, UK
- Jared Glover, Massachusetts Institute of Technology, USA
- Joachim Hertzberg, University of Osnabruck, Germany
- Patric Jensfelt, KTH Royal Institute of Technology, Sweden
- Jim Little, University of British Columbia, Canada
- Daniel Munoz, Carnegie Mellon University, USA
- Dejan Pangercic, Robert Bosch LLC, USA
- Alessandro Saffiotti, Orebro University, Sweden
- Alexander Stoytchev, Iowa State University, USA
- Michael Zillich, TU Vienna, Austria
Organizers
- Dirk Holz, University of Bonn, Germany
- Federico Tombari, University of Bologna, Italy
- Aitor Aldoma, Vienna University of Technology, Austria
- Andreas Nuechter, Jacobs University Bremen, Germany
- Andrzej Pronobis, University of Washington, USA
- Radu Bogdan Rusu, Open Perception, USA