INTRODUCTION
Chemoinformatics (also known as cheminformatics) is concerned with the storage and manipulation of chemical data. Many of the techniques gathered under this title pre-date the term itself, which was perhaps coined to signal explicitly that a discipline analogous to bioinformatics exists within the realm of theoretical chemistry. The term is also closely associated with drug discovery, given that those with the largest commercial interest in the systematic computational analysis of molecules are drug companies. Drug discovery is addressed explicitly elsewhere in this book (see Chapter 24), so here we restrict ourselves to the computational representation of molecules, the derivation of useful molecular properties, the application of these techniques to biological systems, and the storage and comparison of the resulting data.
Chemoinformatics can be applied to chemicals of any elemental composition; however, it is best developed for small organic molecules, i.e. compounds made mostly from combinations of carbon, hydrogen, nitrogen, oxygen and sulfur, with the addition of small amounts of other elements, most notably the halogens (fluorine, chlorine, bromine, iodine) and, particularly in biological chemistry, phosphorus. While small positively charged metal ions are commonly encountered in compounds (often as counterions to organic acids), larger metal ions are comparatively rare.
Chemoinformatics calculations can address features of single molecules; however, they become invaluable when dealing with a set of molecules among which certain features need to be compared. As with most computational processes, many of the calculations described here could be performed by hand, but they are tedious and repetitive in nature. Once automated, chemoinformatics allows a molecule to be compared against, or molecules to be compared within, large datasets, and many calculations specifically involve the comparison of one molecule with another. The concept of similarity is central to chemoinformatics studies, as molecules classed as similar can often be expected to exhibit similar physiological behaviours, and in particular to bind to macromolecules in a similar way. Similarity can often be judged by direct comparison of the molecular structures (for instance, when they differ only by a single atom or bond); however, the ideal is to find molecules that possess similar properties but come from distinct chemical families.
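As a rough illustration of comparing one molecule against a dataset, the sketch below scores similarity as the fraction of shared features between two molecules (the Tanimoto, or Jaccard, coefficient). This measure is one common choice and is an assumption of the example rather than something prescribed by the text. The feature labels, the small dataset and the function names are hypothetical and invented purely for illustration; a real study would derive its features from the molecular structures themselves.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Number of shared features divided by the number of distinct features."""
    union = a | b
    if not union:
        # Two empty feature sets: treat as having nothing in common.
        return 0.0
    return len(a & b) / len(union)


def rank_by_similarity(
    query: FrozenSet[str], dataset: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, float]]:
    """Score every molecule in the dataset against the query, best match first."""
    scores = [(name, tanimoto(query, feats)) for name, feats in dataset.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical hand-written feature sets (not derived from real structures).
dataset = {
    "ethanol": frozenset({"C-C", "C-O", "O-H"}),
    "propanol": frozenset({"C-C", "C-O", "O-H", "C-C-C"}),
    "acetic acid": frozenset({"C-C", "C-O", "O-H", "C=O"}),
    "benzene": frozenset({"c:c", "aromatic ring"}),
}

ranking = rank_by_similarity(dataset["ethanol"], dataset)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With these made-up features, the query matches itself perfectly, the two molecules that share all three of its features score 0.75, and the molecule with no features in common scores zero. A score of this kind captures only structural overlap; it would not by itself identify molecules from distinct chemical families that happen to share similar properties.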
Finally, it should always be borne in mind that chemoinformatics is a branch of theoretical chemistry. A great many of the techniques used in this context rely on approximations to the exact physics occurring in the molecule.