Abstract
Similarity searching is an important facility in modern chemical information management systems for accessing the rich information held in today's enormous chemical repositories. Given a molecular representation, a similarity measure and a matching algorithm, the technique returns the dataset molecules ranked in decreasing order of similarity to a query or reference molecule specified by the user. Consequently, researchers have focused on the performance of molecular representations and similarity measures. However, these studies have concentrated predominantly on binary representations and their corresponding resemblance measures, and little work has considered other types of numerical descriptors. Machine learning techniques have also been applied to descriptor selection, though not in a way consistent with the neighbourhood principle. These precedents, together with the need for new methods suited to each chemical context, motivate this work. It comprises the computational implementation, in Java, of two novel similarity measures and their comparison with other proximity models established in the literature. The measures are compared on how effectively they retrieve eight pharmacological datasets from medicinal chemistry, represented by real-valued descriptors selected through machine learning, using an efficient matching algorithm.
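To illustrate the search procedure described above (representation, similarity measure, matching algorithm, ranked output), the following is a minimal Java sketch under stated assumptions. It is not the implementation or either of the novel measures from this work. Molecules are assumed to be represented as vectors of real-valued descriptors. Cosine similarity serves only as a familiar stand-in measure, and matching is a simple exhaustive scan of the dataset. The class and method names (`SimilaritySearch`, `Molecule`, `Hit`, `cosine`, `search`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SimilaritySearch {

    // Hypothetical dataset molecule: an identifier plus a vector of real-valued descriptors.
    record Molecule(String id, double[] descriptors) {}

    // One entry of the ranked output: molecule identifier and its similarity to the query.
    record Hit(String id, double score) {}

    // Stand-in similarity measure (cosine), assuming vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Cosine is undefined for a zero vector; treat it as no similarity.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Exhaustive matching: score every dataset molecule against the query,
    // then sort the hits in decreasing order of similarity.
    static List<Hit> search(double[] query, List<Molecule> dataset) {
        List<Hit> hits = new ArrayList<>();
        for (Molecule m : dataset) {
            hits.add(new Hit(m.id(), cosine(query, m.descriptors())));
        }
        hits.sort((x, y) -> Double.compare(y.score(), x.score()));
        return hits;
    }

    public static void main(String[] args) {
        // Toy descriptor vectors, invented purely for illustration.
        List<Molecule> dataset = List.of(
                new Molecule("A", new double[] {1.0, 0.0, 2.0}),
                new Molecule("B", new double[] {0.0, 3.0, 0.0}),
                new Molecule("C", new double[] {1.0, 0.1, 1.9}));
        double[] query = {1.0, 0.0, 2.0};
        for (Hit h : search(query, dataset)) {
            System.out.printf("%s %.3f%n", h.id(), h.score());
        }
    }
}
```

With the toy data, molecule A matches the query exactly and ranks first, followed by the nearly identical C and then the orthogonal B. Swapping in a different proximity model only requires replacing `cosine`. For a distance rather than a similarity, the sort order would be reversed.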