SOLIS: Autonomous Solubility Screening using Deep Neural Networks

Published 18 Mar 2022 in cs.CV and eess.IV | (2203.10970v1)

Abstract: Accelerating material discovery has tremendous societal and industrial impact, particularly for pharmaceuticals and clean energy production. Many experimental instruments have some degree of automation, facilitating continuous running and higher throughput. However, sample preparation is still commonly carried out manually. This can result in researchers spending a significant amount of their time on repetitive tasks, which introduces errors and can prohibit production of statistically relevant data. Crystallisation experiments are common in many chemical fields, both for purification and in polymorph screening experiments. The initial step often involves a solubility screen of the molecule; that is, understanding whether molecular compounds have dissolved in a particular solvent. This is usually time-consuming and labour-intensive. Moreover, accurate knowledge of the precise solubility limit of the molecule is often not required, and simply measuring a threshold of solubility in each solvent would be sufficient. To address this, we propose a novel cascaded deep model that is inspired by how a human chemist would visually assess a sample to determine whether the solid has completely dissolved in the solution. In this paper, we design, develop, and evaluate the first fully autonomous solubility screening framework, which leverages state-of-the-art methods for image segmentation and convolutional neural networks for image classification. To realise this, we first create a dataset comprising different molecules and solvents, which is collected in a real-world chemistry laboratory. We then evaluate our method on the data recorded through an eye-in-hand camera mounted on a seven degree-of-freedom robotic manipulator, and show that our model can achieve 99.13% test accuracy across various setups.

Summary

The paper presents an automated system that combines robotics and deep neural networks to replicate human solubility assessments.
It details a method using Mask R-CNN for vial detection and a CNN classifier, achieving 99.04% accuracy with a fine-tuned ResNet18 model.
The study demonstrates real-world experiments with caffeine and benzimidazole, highlighting potential improvements in material discovery and pharmaceutical development.

Autonomous Solubility Screening with Deep Learning

The paper "SOLIS: Autonomous Solubility Screening using Deep Neural Networks" (2203.10970) introduces an automated system for solubility screening, a crucial step in material discovery and pharmaceutical development. The system combines a robotic platform with a cascaded deep learning model to determine whether a solute has dissolved in a solvent, mimicking a human chemist's visual assessment. This approach aims to reduce manual labor, improve throughput, and enhance the reliability of solubility measurements in laboratory settings.

SOLIS Architecture

The SOLIS architecture (Figure 1) comprises three main stages: image acquisition, image segmentation, and solubility classification.

Figure 1: Overview of the proposed approach for determining whether a molecule dissolves in a given solvent.

A Franka Emika Panda robot with an eye-in-hand Intel RealSense D435i camera captures images of the solute-solvent mixture in a vial. A Mask R-CNN with a ResNet backbone, pre-trained on the TransProteus dataset, locates the glass vial in the image; its predicted bounding box defines the region of interest (RoI) passed to the next stage. A CNN-based image classifier then analyzes the RoI to determine whether the solute has fully dissolved.
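The cascade can be sketched as a detector and a classifier chained through a crop. This is a minimal illustration, not the authors' code: the detector and classifier are hypothetical callables standing in for the Mask R-CNN and the CNN, the image is a nested list of grayscale pixel rows, and the `(x1, y1, x2, y2)` box format with exclusive upper bounds is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # rows of pixel values (grayscale for simplicity)
Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2), x2/y2 exclusive


def crop_roi(image: Image, box: Box) -> Image:
    """Crop the region of interest, clamping the box to the image bounds."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x1 >= x2 or y1 >= y2:
        raise ValueError(f"empty region of interest for box {box}")
    return [row[x1:x2] for row in image[y1:y2]]


def screen_vial(
    image: Image,
    detector: Callable[[Image], Sequence[Tuple[Box, float]]],
    classifier: Callable[[Image], str],
) -> str:
    """Stage 2 locates the vial; stage 3 classifies only the cropped RoI."""
    detections = detector(image)
    if not detections:
        return "no vial detected"
    # Keep the highest-scoring vial detection (hypothetical (box, score) pairs).
    box, _score = max(detections, key=lambda d: d[1])
    return classifier(crop_roi(image, box))


if __name__ == "__main__":
    frame = [[(x + y) % 256 for x in range(8)] for y in range(6)]

    def fake_detector(img: Image) -> List[Tuple[Box, float]]:
        return [((2, 1, 6, 5), 0.97), ((0, 0, 3, 3), 0.40)]

    def fake_classifier(roi: Image) -> str:
        # Toy stand-in for the CNN: thresholds the mean intensity of the RoI.
        mean = sum(map(sum, roi)) / (len(roi) * len(roi[0]))
        return "dissolved" if mean > 5 else "not dissolved"

    print(screen_vial(frame, fake_detector, fake_classifier))
```

Cropping before classification means the classifier only sees the vial, so background clutter in the laboratory cannot drive its decision.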

Dataset and Experimental Setup

The authors created a novel dataset consisting of images of caffeine and benzimidazole in water, ethanol, and acetone (Figure 2).

Figure 2: An overview of the recorded dataset for benzimidazole with acetone.

The data was collected in a real-world chemistry laboratory, capturing the challenges associated with uncontrolled environments. During each experiment, the sample was continuously stirred with a magnetic stirrer to keep the powder from settling at the bottom of the vial and to prevent clumping. The experiments were supervised by a human chemist, who was also responsible for annotating the dataset.

The models were evaluated on their ability to predict whether the solute had fully dissolved, using cross-entropy loss and accuracy as metrics. The experiments were conducted in PyTorch on a machine equipped with an AMD Ryzen Threadripper 3970X CPU and an NVIDIA GeForce RTX 3090 GPU.
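Both metrics can be computed directly from the classifier's raw outputs (logits). The sketch below is a plain-Python illustration, assuming integer class labels (for example, a hypothetical encoding of 0 = "not dissolved" and 1 = "dissolved"); in practice PyTorch provides the same loss as `torch.nn.CrossEntropyLoss`.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-probability assigned to the true class."""
    losses = [-log_softmax(z)[y] for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def accuracy(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = sum(
        max(range(len(z)), key=z.__getitem__) == y
        for z, y in zip(batch_logits, labels)
    )
    return correct / len(labels)
```

Because the loss is a mean of -log p(true class), exp(-loss) is the geometric mean of the probability assigned to the correct class. If the reported test loss of 0.0264 is such a per-sample mean, it corresponds to a geometric-mean true-class probability of roughly 0.974.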

Experimental Results and Analysis

The authors evaluated several CNN architectures for the solubility classifier, including VGG, ResNet18, InceptionV3, and DenseNet. Both fine-tuning and feature extraction strategies were employed. The results indicated that fine-tuning models pre-trained on ImageNet generally outperformed feature extraction. ResNet18 achieved the best performance, with a test accuracy of 99.04% and a cross-entropy loss of 0.0264.
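The two strategies differ only in which pretrained parameters are updated. In both, the 1000-class ImageNet output layer is typically replaced by a new layer sized for the solubility classes; feature extraction then freezes the backbone and trains only that head, while fine-tuning updates every weight. The sketch below is a framework-agnostic illustration that decides trainability from parameter names, assuming the head's parameters start with `fc.` (the name torchvision uses for ResNet18's final layer). The function `trainable_mask` is hypothetical.

```python
from typing import Dict, Iterable


def trainable_mask(
    param_names: Iterable[str], strategy: str, head_prefix: str = "fc."
) -> Dict[str, bool]:
    """Map each parameter name to whether it should receive gradient updates.

    feature_extraction: backbone frozen, only the new classification head trains.
    fine_tuning: all parameters train, starting from the pretrained weights.
    """
    if strategy == "fine_tuning":
        return {name: True for name in param_names}
    if strategy == "feature_extraction":
        return {name: name.startswith(head_prefix) for name in param_names}
    raise ValueError(f"unknown strategy: {strategy!r}")
```

In PyTorch, such a mask would be applied by iterating over `model.named_parameters()` and calling `param.requires_grad_(flag)` for each entry. Fine-tuning lets the early convolutional filters adapt to vial-specific cues such as glass reflections, which may explain why it outperformed feature extraction here.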

Further analysis of misclassifications revealed that reflections on the vial walls and the small proportion of solution in the early stages of the experiment were the primary sources of error (Figure 3).

Figure 3: An in-depth analysis of the misclassifications of the fine-tuned ResNet18 model across the five folds. For each fold, we report the five worst predictions.
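An analysis like the one in Figure 3 needs a fold split and a way to rank "worst" predictions. The sketch below assumes contiguous, unshuffled folds and ranks samples by per-sample cross-entropy, -log p(true class); the summary specifies neither the authors' split nor their ranking criterion, so both choices are assumptions, and `k_fold_indices` and `worst_predictions` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split sample indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def worst_predictions(
    p_true: Sequence[float], indices: Sequence[int], top: int = 5
) -> List[Tuple[int, float]]:
    """Return the `top` (index, loss) pairs with the highest -log p(true class)."""
    eps = 1e-12  # guard against log(0) for a completely wrong prediction
    losses = [(i, -math.log(max(p_true[i], eps))) for i in indices]
    return sorted(losses, key=lambda t: t[1], reverse=True)[:top]
```

Ranking by loss rather than by a hard right/wrong flag also surfaces correct but low-confidence predictions, which are often the frames affected by reflections.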

The study suggests that basing each decision on a buffer of consecutive images, rather than on a single frame, could improve the system's robustness.
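One simple way to use such a buffer is a sliding-window majority vote over per-frame predictions, so that a single frame corrupted by a reflection cannot flip the outcome. This is a sketch of that idea under our own assumptions (buffer size, strict-majority rule, string labels), not the authors' design; `PredictionBuffer` is a hypothetical class.

```python
from collections import Counter, deque
from typing import Deque, Optional


class PredictionBuffer:
    """Smooth per-frame labels by majority vote over the last `size` frames."""

    def __init__(self, size: int = 5) -> None:
        if size < 1:
            raise ValueError("size must be positive")
        self.frames: Deque[str] = deque(maxlen=size)

    def update(self, label: str) -> Optional[str]:
        """Add one per-frame prediction; return a decision once the buffer is full."""
        self.frames.append(label)  # the oldest frame drops out automatically
        if len(self.frames) < self.frames.maxlen:
            return None  # not enough evidence yet
        top_label, count = Counter(self.frames).most_common(1)[0]
        # Require a strict majority; on a tie, withhold a decision.
        return top_label if count > len(self.frames) // 2 else None
```

With two classes and an odd buffer size, a strict majority always exists, so a decision is returned on every frame once the buffer has filled. The trade-off is latency: the reported state lags the true state by up to half the buffer length.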

Implications and Future Directions

The SOLIS system offers a practical solution for automating solubility screening in materials discovery and pharmaceutical chemistry. Its ability to operate in real-world laboratory conditions without human intervention makes it a valuable tool for increasing efficiency and throughput. The modularity of the method suggests that it could be adapted to other visual assessments of samples in pharmaceutical and clean energy applications.

Future research directions include deploying the model on a robot in a closed-loop material discovery workflow and extending it to a wider range of organic solvents. Additional refinements could enable a more accurate estimation of solubility.