Autonomous perching and grasping for micro aerial vehicles
The ability to maneuver micro aerial vehicles (MAVs) precisely relative to specific targets and to interact with the environment (i.e., aerial manipulation) could benefit society by assisting with dangerous jobs, providing useful information, and improving the efficiency of many tasks. For example, precise relative positioning would allow for close inspections of bridges, cell towers, rooftops, or water towers. Aerial manipulation could improve or enable precision farming, construction, structural repair, object transport, automated recharging or battery replacement, and environmental sampling, as well as perching, which lets a vehicle turn off its motors and reduce power consumption.
The prevalence of commercially available MAVs has risen rapidly, but platforms are currently limited to sensing and data collection tasks. Indeed, many manufacturers are producing aerial robots equipped with cameras. However, none are able to physically interact with objects. Thus, there is a need for solutions empowering aerial robots to closely track, grasp, perch on, and manipulate specific objects of interest. Here, we present an overview of current approaches and challenges for vision-based perching and aerial manipulation. A more extensive discussion is available elsewhere.1
Many existing perching and grasping methods assume that the states of the robot and target are known,2–9 which is a poor assumption and motivates the search for solutions using onboard sensors. Visual-inertial approaches are appealing because the sensors are lightweight, complement each other well, and are sufficient for navigation in unknown environments.10, 11 However, in these cases, the vehicle is controlled with respect to a fixed reference frame, not specific objects. A more appropriate approach for manipulation is visual servoing, which uses visual feedback to control a robot relative to a target object.
There is a foundational body of literature covering monocular visual servoing that discusses the differences between position-based visual servoing (PBVS) and image-based visual servoing (IBVS).12–14 With PBVS, the relative pose of the robot is estimated, and the control law is expressed in the 3D Cartesian space. With IBVS, in contrast, the control law is computed directly from features observed in the image.12 Each has its benefits. For example, PBVS systems can use common odometry filters from the MAV literature, while IBVS is more robust to calibration errors, making it appealing for low-cost, lightweight systems.
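To make the IBVS idea concrete, the following minimal sketch implements the textbook point-feature control law v = -gain * pinv(L) * e for a single image point. It is not the cylinder-based controller discussed in this article: it assumes normalized image coordinates, a known feature depth, and a camera that only translates. The function names, the gain, and the numerical values are illustrative assumptions.

```python
def point_interaction_matrix(x, y, depth):
    """Translational part of the classic interaction matrix for one
    normalized image point (x, y) at the given depth (assumed known)."""
    return [
        [-1.0 / depth, 0.0, x / depth],
        [0.0, -1.0 / depth, y / depth],
    ]


def ibvs_velocity(feature, target, depth, gain=1.0):
    """Camera translational velocity from v = -gain * pinv(L) * e.

    feature, target: (x, y) in normalized image coordinates.
    Returns [vx, vy, vz] expressed in the camera frame.
    """
    x, y = feature
    L = point_interaction_matrix(x, y, depth)
    e = [x - target[0], y - target[1]]  # image-space error

    # L is 2x3 with full row rank, so pinv(L) = L^T (L L^T)^-1.
    a = sum(L[0][k] * L[0][k] for k in range(3))
    b = sum(L[0][k] * L[1][k] for k in range(3))
    d = sum(L[1][k] * L[1][k] for k in range(3))
    det = a * d - b * b  # equals (1 + x^2 + y^2) / depth^4, never zero

    # w = (L L^T)^-1 e, using the closed-form 2x2 inverse.
    w0 = (d * e[0] - b * e[1]) / det
    w1 = (-b * e[0] + a * e[1]) / det

    # v = -gain * L^T w
    return [-gain * (L[0][k] * w0 + L[1][k] * w1) for k in range(3)]


# Hypothetical values: feature slightly off-center, target at the image center.
velocity = ibvs_velocity(feature=(0.2, -0.1), target=(0.0, 0.0), depth=2.0, gain=0.5)
```

Under these idealized assumptions, the commanded velocity yields a feature velocity of -gain * e, so the image error decays exponentially. Note that the law is written entirely in terms of image measurements; no relative pose is estimated, which is the distinction from PBVS drawn above.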
In our work,15–17 we explore the coupling between the pose of a robot and the image of a cylinder. The relationship is diffeomorphic, allowing us to relate velocities of the robot in the world frame to velocities of the image features. We can then express the dynamics of the quadrotor in terms of the image features, which lets us develop an IBVS control law and prove its stability. In addition, we show that the image features are flat outputs of the system, enabling the application of trajectory planning methods for differentially flat systems.18–21 The image sequence in Figure 1 shows sample results.
One of the main challenges for visual servoing with aerial robotics stems from underactuation. In our work,15–17 we simplify the system by assuming a virtual frame of fixed orientation, which is achieved by rotating the observed features using the onboard attitude estimate. This requires an accurate attitude estimate, synchronized images, and the ability to estimate the yaw using image features. Our approach decouples the attitude dynamics from the translational dynamics in the virtual image, and it allows for planning trajectories in terms of the flat outputs in the image space. A related challenge is either to guarantee that the target will not leave the field of view or to ensure that the robot will still reach the desired relative pose even if the target is temporarily occluded or leaves the field of view.
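The feature rotation described above can be sketched as follows. This is a minimal illustration, not our implementation: it assumes normalized image coordinates and a rotation matrix, built from the attitude estimate, that maps camera-frame vectors into the fixed-orientation frame. All function and variable names are hypothetical.

```python
import math


def rot_x(angle):
    """Rotation matrix for a rotation of `angle` radians about the x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def mat_vec(R, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(R):
    """Transpose of a 3x3 matrix (the inverse, for a rotation)."""
    return [[R[j][i] for j in range(3)] for i in range(3)]


def to_virtual_frame(feature, R_virtual_from_camera):
    """Re-project one normalized image feature into the fixed-orientation frame.

    feature: (x, y) in normalized coordinates of the real camera.
    R_virtual_from_camera: rotation taking camera-frame vectors to the
        fixed-orientation frame (assumed to come from the attitude estimate).
    """
    # The feature defines a viewing ray; rotate the ray, then re-normalize.
    ray = mat_vec(R_virtual_from_camera, [feature[0], feature[1], 1.0])
    if ray[2] <= 0.0:
        raise ValueError("feature falls behind the virtual image plane")
    return (ray[0] / ray[2], ray[1] / ray[2])


# Hypothetical example: the camera is rolled by 0.3 rad relative to the
# fixed-orientation frame while observing one feature.
R = rot_x(0.3)
virtual_feature = to_virtual_frame((0.1, 0.2), R)
```

Because only the viewing ray is rotated, no depth information is needed. The result is only as good as the attitude estimate and the time alignment between the image and the attitude sample, which is why both are listed as requirements above.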
Difficulty also arises from the fact that quadrotors are high-order systems. As a result, some control approaches assume knowledge of the velocity in the inertial frame,22, 23 which could be a crippling assumption for a lightweight system. To the best of our knowledge, grasping of moving targets has received little attention. Landing on moving targets has been demonstrated,24, 25 but only under limiting assumptions. One of the key difficulties with moving targets is handling the increased complexity of the relative dynamics. Finally, we need to consider a wider variety of object geometries. Our previous work is restricted to cylindrical objects, but could potentially be generalized to any surface of revolution.26
Despite challenges for visual servoing with aerial robotics such as underactuation, high-order dynamics, and computational limitations of onboard computers, we have been able to demonstrate successful results.17 Our next steps will include modeling coupled dynamics with moving targets, consideration of occlusions and limited fields of view, and interaction with arbitrary geometries.
We gratefully acknowledge support from Army Research Laboratory grant W911NF-08-2-0004, Office of Naval Research grants N00014-07-1-0829, N00014-14-1-0510, N00014-09-1-1051, and N00014-09-1-103, and National Science Foundation grants IIP-1113830, IIS-1426840, and IIS-1138847.
Justin Thomas joined the University of Pennsylvania in 2011 as a PhD candidate in the Department of Mechanical Engineering and Applied Mechanics and as a member of the GRASP Lab under Vijay Kumar. His research interests include dynamic grasping, aerial manipulation, perching, and vision-based control using MAVs.