Converting an image to a 3D molecule using CACTUS OSRA


OSRA (Optical Structure Recognition Application) is a free and open-source optical structure recognition program. The stand-alone version of OSRA is a command-line utility. OSRA converts graphical representations of chemical structures, as they appear in journal articles, patent documents, textbooks, trade magazines, etc., to SMILES (Simplified Molecular Input Line Entry System) format. In addition, the online CACTUS OSRA tool converts the resulting SMILES string to a 3D molecule in SDF (Structure-Data File) format. OSRA can read documents in any of the more than 90 graphical formats parseable by ImageMagick, including BMP, GIF, ICO, JPEG, PNG, TIFF, WMF, PDF, and PS. The video tutorial below demonstrates the conversion of a GIF image to a 3D structure.

Note that any software designed for optical recognition is unlikely to be perfect, and the output produced might, and probably will, contain errors, so curation by a human knowledgeable in chemical structures is highly recommended.
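Before handing a recognized SMILES string to a human curator, a quick syntactic pre-screen can catch the most obvious recognition failures. The following is a minimal Python sketch, not part of OSRA; the function name `smiles_sanity_issues` is hypothetical, and the checks cover only balanced brackets, balanced parentheses, and paired ring-closure labels. A string that passes may still describe the wrong molecule, so this does not replace expert review.

```python
def smiles_sanity_issues(smiles):
    """Return a list of purely syntactic problems found in a SMILES string.

    Hypothetical helper: checks delimiters and ring-closure pairing only.
    """
    issues = []
    depth = 0            # current nesting depth of branch parentheses
    in_bracket = False   # True while inside a [...] atom specification
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges.
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            issues.append("unmatched ']'")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                issues.append("unmatched ')'")
            else:
                depth -= 1
        elif ch == "%" and len(smiles[i + 1:i + 3]) == 2 and smiles[i + 1:i + 3].isdigit():
            # Two-digit ring-closure label written as %nn.
            open_rings ^= {int(smiles[i + 1:i + 3])}
            i += 2
        elif ch in "0123456789":
            # Single-digit ring-closure label; toggles open/closed.
            open_rings ^= {int(ch)}
        i += 1
    if in_bracket:
        issues.append("unclosed '['")
    if depth > 0:
        issues.append("unclosed '('")
    if open_rings:
        issues.append("unpaired ring-closure labels: %s" % sorted(open_rings))
    return issues
```

An empty list means only that the string is not obviously malformed; the structure itself still needs to be compared against the source image.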

Try this sample image from the US Patent Office website first: patent.gif.
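For readers who prefer the stand-alone command-line version to the online tool, the sample image can also be processed locally. The sketch below assumes that an `osra` executable is installed and on the PATH, and that invoking it with just an image filename prints the recognized structures as SMILES to standard output; check the usage notes of your installed version before relying on this. The helper names `build_osra_command` and `image_to_smiles` are hypothetical. The stand-alone call yields SMILES only; the conversion to a 3D SDF described above is a feature of the online tool.

```python
import shutil
import subprocess


def build_osra_command(image_path, executable="osra"):
    """Build the argument list for a basic OSRA run (assumed usage: osra <filename>)."""
    return [executable, image_path]


def image_to_smiles(image_path):
    """Run OSRA on an image and return the non-empty lines it prints.

    Assumes each output line holds one recognized structure as SMILES.
    """
    if shutil.which("osra") is None:
        raise FileNotFoundError("osra executable not found on PATH")
    result = subprocess.run(
        build_osra_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if OSRA exits with an error
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

With OSRA installed and the sample saved locally, `image_to_smiles("patent.gif")` would return the recognized structures as a list of strings, each of which should still be checked by a person familiar with chemical structures.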


