
Leveraging Multi-modal Sensing for Robotic Insertion Tasks in R&D Laboratories

Published 2 Jul 2023 in cs.RO (arXiv:2307.00671v1)

Abstract: Performing a large volume of experiments in chemistry labs creates repetitive actions that cost researchers time; automating these routines is highly desirable. Previous work in robotic chemistry has carried out large numbers of experiments autonomously; however, these processes rely on automated machines at every stage, from solid or liquid addition to analysis of the final product. In such systems, every transition between machines requires the robotic chemist to pick and place glass vials, but this is currently performed using open-loop methods that require all equipment used by the robot to be in well-defined, known locations. We seek to begin closing the loop in this vial-handling process in a way that also fosters human-robot collaboration in the chemistry lab environment. To do this, the robot must be able to detect valid placement positions for the vials it is collecting and reliably insert them into the detected locations. We create a single-modality visual method for estimating placement locations to provide a baseline, before introducing two additional methods of feedback (force and tactile). Our visual method uses a combination of classic computer vision methods and a CNN discriminator to detect possible insertion points; a vial is then grasped and positioned above an insertion point, and the multi-modal methods guide the final insertion movements using an efficient search pattern. Through our experiments we show that the baseline insertion rate of 48.78% improves to 89.55% with the addition of our "force and vision" multi-modal feedback method.


Summary

  • The paper presents multi-modal sensing methods that boost vial insertion success from 48.78% with vision only to 89.55% with combined force and vision feedback.
  • It employs a dual-stage approach combining the circular Hough Transform and CNN filtering to improve goal position detection in lab automation.
  • The study compares visual-only, force-vision, and tactile-vision approaches, highlighting improved error recovery and enhanced safety in robotic systems.

Multi-modal Sensing for Robotic Insertion Tasks

This paper explores the use of multi-modal sensory feedback to improve the reliability of robotic vial insertion, a common task in R&D laboratories. The authors compare a single-modality vision-based approach with two multi-modal approaches: vision and force feedback, and vision and tactile feedback. The experimental results demonstrate that the multi-modal approaches significantly improve the success rate of vial insertion compared to the visual baseline. The most successful method, combining visual and force feedback, achieves an 89.55% success rate, compared to 48.78% for the vision-only method. The paper highlights the potential of multi-modal sensing to enhance the robustness and safety of robotic systems in laboratory automation.

Goal Position Detection and Filtering

The vial insertion task is divided into two sub-problems: goal position detection and vial insertion. For goal position detection, the authors employ a combination of the circular Hough Transform (CHT) and a CNN classifier. The CHT generates a large number of possible vial placement locations from a top-down image of the workspace. To cope with oblique rack views, the CHT detection parameters are deliberately set to be overly sensitive, which produces many false positives. A CNN classifier then filters these candidates, identifying vacant locations that belong to the target rack. Input data for the CNN is created by scaling each candidate's radius by a margin factor to 110% of the original and cropping that region from the image, centered on the candidate location. The network predicts whether the cropped image belongs to the rack and, if so, whether the rack slot is occupied. The target is selected from the filtered candidates as the one with the highest CNN classification score for "In Rack" and "Unoccupied."
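The paper does not include the detection code, so the following is a minimal sketch of the described pipeline using OpenCV. It assumes a top-down BGR workspace image and a hypothetical trained classifier callable, `classifier(patch)`, returning "In Rack" and "Unoccupied" probabilities; the Hough parameters and crop size are placeholders, not the authors' values.

```python
import cv2
import numpy as np

def detect_candidate_slots(image_bgr, min_r=15, max_r=40):
    """Over-sensitive circular Hough Transform on a top-down workspace image;
    returns candidate (x, y, r) slot locations, including many false positives."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=min_r,
        param1=100, param2=20,          # low accumulator threshold -> deliberately sensitive
        minRadius=min_r, maxRadius=max_r)
    return [] if circles is None else np.round(circles[0]).astype(int)

def crop_candidate(image_bgr, x, y, r, margin=1.1, size=64):
    """Crop a square patch centred on the candidate, scaled to 110% of its radius."""
    half = int(r * margin)
    h, w = image_bgr.shape[:2]
    patch = image_bgr[max(y - half, 0):min(y + half, h),
                      max(x - half, 0):min(x + half, w)]
    return cv2.resize(patch, (size, size))

def select_target(image_bgr, classifier):
    """Keep the candidate with the best joint 'In Rack' / 'Unoccupied' score.
    `classifier(patch)` is a hypothetical CNN returning those two probabilities."""
    best, best_score = None, -1.0
    for x, y, r in detect_candidate_slots(image_bgr):
        p_in_rack, p_unoccupied = classifier(crop_candidate(image_bgr, x, y, r))
        score = p_in_rack * p_unoccupied
        if score > best_score:
            best, best_score = (x, y), score
    return best
```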

Multi-Modal Vial Insertion Methods

Figure 1: Sequential actions of the robot to perform vial insertion, showing the robot detecting the rack, collecting the vial, positioning it above the target, and inserting it.

The second sub-task involves grasping a vial and inserting it into the detected location using one of three modalities:

  • Visual Baseline: This method involves capturing a second image after moving the camera closer to the rack. The image processing from the previous stage is repeated to refine the insertion point estimate. The vial is then aligned with the revised insertion point and moved down the z-axis until it is below the rack height, at which point the gripper release command is sent.
  • Force and Visual Feedback: This method utilizes the robotic arm's internal force sensors to detect contact between the vial and the rack. A FIFO buffer records the static force experienced by the sensor. During insertion, any deviation of more than 20% from the recorded static value causes the robot to stop and assess the vial's state. The position of the vial at the tip of the gripper is then used to determine whether the vial has impacted the top surface of the rack. If contact is detected, a search algorithm is initiated (a sketch of this contact check appears after this list).
  • Tactile and Visual Feedback: This method uses a pair of DIGIT visual tactile sensors to provide feedback during insertion. Reference tactile images are captured with the gripper open and not in contact with any object. During the insertion attempt, the absolute difference between the current tactile image and each reference image is calculated. Contact regions are extracted from the resulting difference image, and their center points are used to track the vial's position. Deviation from a neutral state triggers the robot to stop and assess the vial's state (a sketch of this difference-image check also follows this list).
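The paper gives only a prose description of the force-monitoring logic. The snippet below is a small sketch of how such a check might look, assuming a scalar z-axis force reading sampled at a fixed rate; the buffer length and the hypothetical `ForceContactMonitor` class are illustrative, and only the 20% deviation threshold comes from the text.

```python
from collections import deque
import numpy as np

class ForceContactMonitor:
    """Hypothetical helper: keep recent static force readings in a FIFO buffer and
    flag contact when a live reading deviates >20% from the buffered static value."""

    def __init__(self, buffer_len=50, deviation=0.20):
        self.buffer = deque(maxlen=buffer_len)   # FIFO of static-force samples
        self.deviation = deviation               # 20% threshold described in the paper

    def record_static(self, force_z):
        """Call while the arm is stationary, before the insertion move."""
        self.buffer.append(force_z)

    def contact_detected(self, force_z):
        """Call during the downward move; True means stop and assess the vial."""
        if not self.buffer:
            return False
        static = float(np.mean(self.buffer))
        return abs(force_z - static) > self.deviation * abs(static)
```

If contact is flagged while the vial tip is still above the rack's top surface, the vial is assumed to have struck the rack and the search algorithm described below is triggered.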
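Similarly, here is a hedged sketch of the tactile difference-image check, assuming BGR images from a DIGIT sensor and OpenCV 4.x; the threshold, minimum blob area, and pixel tolerance are assumed values, not taken from the paper.

```python
import cv2
import numpy as np

def contact_centres(tactile_img, reference_img, thresh=25, min_area=30):
    """Centre points of contact regions on one tactile sensor, found from the
    absolute difference between the current and reference images (OpenCV 4.x)."""
    diff = cv2.absdiff(tactile_img, reference_img)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centres = []
    for c in contours:
        if cv2.contourArea(c) < min_area:   # ignore small noise blobs
            continue
        m = cv2.moments(c)
        centres.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centres

def vial_moved(current_centres, neutral_centres, tol_px=8.0):
    """True if the tracked contact points deviate from the neutral grasp state,
    which triggers the robot to stop and assess the vial."""
    if len(current_centres) != len(neutral_centres):
        return True
    return any(np.hypot(cx - nx, cy - ny) > tol_px
               for (cx, cy), (nx, ny) in zip(sorted(current_centres),
                                             sorted(neutral_centres)))
```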

Search Algorithm

Figure 2: A workflow diagram illustrating the three modalities used: visual, force and vision, and tactile and vision, emphasizing the different methods for assessing vial placement.

A key component of the multi-modal approaches is a search algorithm that allows the robot to recover from failed placement attempts. This algorithm leverages knowledge of the initial target location and attempts to detect neighboring regions, disregarding occupancy. A bounding box is fitted around the selected target, and the search is conducted within this box. The search creates an envelope around the initial placement position at a distance given by the search spacing. Trial locations are generated within this envelope, and the robot attempts to insert the vial at these locations. If the vial cannot be inserted anywhere in the current envelope, the envelope is expanded by the search spacing. This process is repeated until the vial is successfully inserted or the search region lies entirely beyond the bounds of the rack's bounding box.
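As a rough illustration of this expanding-envelope search (the paper does not publish its implementation), the generator below produces successive square rings of trial positions around the initial target, clipped to the rack's bounding box. The spacing value and ring limit in the usage comment are assumptions.

```python
def search_envelopes(centre, spacing, bbox, max_rings=10):
    """Yield successive square envelopes of trial (x, y) positions around the
    initial target, expanding by `spacing` each time; positions outside the rack
    bounding box (x_min, y_min, x_max, y_max) are discarded."""
    cx, cy = centre
    x_min, y_min, x_max, y_max = bbox
    for ring in range(1, max_rings + 1):
        trials = []
        for i in range(-ring, ring + 1):
            for j in range(-ring, ring + 1):
                if max(abs(i), abs(j)) != ring:
                    continue                      # keep only the perimeter of this envelope
                x, y = cx + i * spacing, cy + j * spacing
                if x_min <= x <= x_max and y_min <= y <= y_max:
                    trials.append((x, y))
        if not trials:
            return                                # entire envelope is beyond the bounds
        yield trials

# Usage sketch: attempt the initial target first, then work outward ring by ring.
# for ring in search_envelopes(target_xy, spacing=0.004, bbox=rack_bbox):
#     if any(attempt_insert(p) for p in ring):
#         break
```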

Experimental Results and Analysis

The experimental results, summarized in Table 1, demonstrate the effectiveness of the multi-modal approaches. The visual baseline achieved a success rate of 48.78%, while the force and visual feedback method achieved 89.55%, and the tactile and visual feedback method achieved 69.81%. The force and visual feedback method also exhibited a lower average runtime compared to the visual baseline, despite requiring multiple placement attempts.

The decrease in first-time placement accuracy for the force and visual feedback method, compared to the visual baseline, highlights the effectiveness of the second imaging step in the visual-only modality. However, the multi-modal methods demonstrate a greater capacity for error recovery, as shown in Figure 3, enabling them to achieve higher overall success rates.

Figure 3: A bar chart illustrating the distribution of successful placement attempts by the number of attempts required for the multi-modal methods.

The authors note that the tactile feedback method, while improving upon the visual baseline, exhibits a lower success rate than the force feedback method. This is attributed to the DIGIT sensor's lower gripping force and the smoother contact surface of its silicone membrane, which can lead to the vial moving within the gripper during failed placement attempts. Despite this limitation, the tactile feedback method offers improved safety due to the lower forces exerted on the vial and surrounding surfaces, along with greater insight into the vial's orientation.

Figure 4: A graph illustrating the cumulative probability of successful vial insertion across multiple attempts, showing that the multi-modal methods eventually surpass the visual baseline.

Conclusion and Future Directions

The authors conclude that multi-modal sensing offers a promising approach for improving the reliability and safety of robotic vial handling in laboratory automation. The combination of visual and force feedback demonstrates the highest placement reliability, while the inclusion of tactile sensors provides additional insights into the grasped object's orientation, enhancing safety during movement planning. The authors propose several avenues for future research, including exploring the combination of all three modalities and employing machine learning techniques to further improve placement success rates. They also suggest improving the search algorithm by dynamically adjusting the search spacing and accounting for previous search results to compensate for calibration errors.
