Christoph Helma, in silico toxicology & University of Freiburg
Dr. Christoph Helma is a computational toxicologist. He has more than 10 years of experience in the development and application of advanced data mining techniques for toxicological applications, predominantly for the identification of structure-activity relationships. He is the developer of the lazar system for the prediction of toxic activities (www.predictive-toxicology.org/lazar/), has been an invited keynote speaker at major scientific conferences and has published more than 30 peer-reviewed articles. He was the main organizer of the Predictive Toxicology Challenge 2000-2001, editor of a special section in Bioinformatics, and editor of a book on predictive toxicology. He is the founder and head of "in silico toxicology", a spin-off company of the University of Freiburg.
Lazy-Structure-Activity-Relationships (lazar) for the in-silico Prediction of Chemical Carcinogenicity
Christoph Helma, in silico toxicology, Talstrasse 20, D-79102 Freiburg, Germany
In recent years, data mining techniques have become increasingly popular in bioinformatics and chemoinformatics. This presentation gives an overview of the lazar (Lazy Structure-Activity Relationships) system for the prediction of biological activities and of its application to the prediction of carcinogenicity, an endpoint that is very hard to predict with existing techniques.
lazar relies on relatively few model assumptions and presents the rationale for each prediction in an understandable and traceable form. By assigning a confidence index to each prediction, the system can reliably distinguish trustworthy from untrustworthy predictions (e.g. for structures that fall outside the scope of the training set).
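To make the idea of a per-prediction confidence index concrete, here is a minimal Python sketch. It assumes that the neighbors of a query compound and their similarities to it are already known, and it uses a hypothetical confidence measure (the magnitude of the unnormalized similarity-weighted vote). This is an illustrative choice, not necessarily the definition used in lazar.

```python
from typing import List, Tuple


def predict_with_confidence(neighbors: List[Tuple[float, int]]) -> Tuple[int, float]:
    """Similarity-weighted vote over the neighbors of a query compound.

    neighbors: (similarity in [0, 1], activity: +1 = active, -1 = inactive)
    Returns (prediction, confidence); prediction 0 means "undecided".
    """
    # Each neighbor pulls the vote toward its own class, weighted by similarity.
    vote = sum(sim * act for sim, act in neighbors)
    if vote > 0:
        prediction = 1
    elif vote < 0:
        prediction = -1
    else:
        prediction = 0
    # Hypothetical confidence index: the magnitude of the unnormalized vote.
    # It is low when there are few or only weakly similar neighbors
    # (query outside the training domain) or when the neighbors disagree.
    confidence = abs(vote)
    return prediction, confidence
```

For example, three fairly similar neighbors of which two are active yield a positive prediction with a confidence of about 1.0, whereas two weakly similar neighbors with opposite activities cancel out and yield no prediction with confidence 0. A query without any neighbors also receives confidence 0, which mirrors the case of structures beyond the scope of the training set.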
lazar predictions are based on a modified k-nearest-neighbor algorithm that is capable of detecting activity-specific chemical similarities. For this purpose, lazar automatically determines relevant features from a predefined language of features (e.g. linear fragments, rex fragments) and uses only this set of relevant features to calculate activity-specific chemical similarities.
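The following Python sketch illustrates what "activity-specific similarity" can mean. It assumes that each compound is already represented as a set of fragment keys (the fragment names below are hypothetical placeholders, and no fragment mining is performed), that relevance is judged with a 2x2 chi-square test against the activity labels, and that similarity is a Tanimoto index weighted by the relevance scores. These are illustrative choices, not a description of lazar's actual implementation.

```python
from typing import Dict, List, Set


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 contingency table.

    a: actives with feature, b: inactives with feature,
    c: actives without feature, d: inactives without feature.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the feature carries no information
    return n * (a * d - b * c) ** 2 / denom


def relevant_features(train: List[Set[str]], labels: List[bool],
                      threshold: float = 3.84) -> Dict[str, float]:
    """Map each relevant feature to its chi-square score.

    3.84 is the chi-square critical value for p = 0.05 with 1 degree of freedom.
    """
    n_act = sum(labels)
    n_inact = len(labels) - n_act
    scores: Dict[str, float] = {}
    for f in set().union(*train):
        a = sum(1 for fs, y in zip(train, labels) if y and f in fs)
        b = sum(1 for fs, y in zip(train, labels) if not y and f in fs)
        score = chi_square_2x2(a, b, n_act - a, n_inact - b)
        if score >= threshold:
            scores[f] = score
    return scores


def weighted_tanimoto(x: Set[str], y: Set[str], weights: Dict[str, float]) -> float:
    """Tanimoto similarity restricted to relevant (weighted) features."""
    common = sum(weights[f] for f in x & y if f in weights)
    union = sum(weights[f] for f in x | y if f in weights)
    return common / union if union > 0 else 0.0
```

In a toy training set where the hypothetical fragment "N=N" occurs only in actives while "C-C" occurs in every compound, only "N=N" is selected as relevant. Two compounds that share "N=N" are then maximally similar, even though they differ in "C-C", because the irrelevant fragment no longer contributes to the similarity.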
Cross-validation experiments with various carcinogenicity endpoints from the Carcinogenic Potency Database (1376 compounds) indicate an overall accuracy of more than 70%, and most of the misclassified instances have very low confidence indices. This indicates that the training set contains insufficient and/or contradictory information to derive reliable predictions for these structures. Raising the confidence threshold for acceptable predictions increases predictive accuracy, at the cost of a larger number of compounds that cannot be predicted. With reasonable confidence thresholds, predictive accuracies of 80-90% can be achieved, which indicates that reliable carcinogenicity predictions are in fact possible with the lazar system.
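The trade-off between accuracy and the fraction of predictable compounds can be computed directly from cross-validation output. The minimal Python sketch below assumes that each prediction is stored as a (predicted, actual, confidence) tuple; the sample values in the usage note are invented for illustration and are not results from the Carcinogenic Potency Database.

```python
import math
from typing import List, Tuple


def accuracy_at_threshold(results: List[Tuple[int, int, float]],
                          threshold: float) -> Tuple[float, float]:
    """Return (accuracy, coverage) for predictions with confidence >= threshold.

    results: (predicted class, actual class, confidence) per compound.
    coverage: fraction of compounds whose prediction is accepted.
    """
    accepted = [(p, a) for p, a, c in results if c >= threshold]
    coverage = len(accepted) / len(results) if results else 0.0
    if not accepted:
        return math.nan, coverage  # nothing predictable at this threshold
    correct = sum(1 for p, a in accepted if p == a)
    return correct / len(accepted), coverage
```

With the invented results `[(1, 1, 0.9), (-1, -1, 0.8), (1, -1, 0.1), (-1, 1, 0.2), (1, 1, 0.6)]`, accepting every prediction gives an accuracy of 0.6 at full coverage, while a threshold of 0.5 discards the two low-confidence errors and gives an accuracy of 1.0 at a coverage of 0.6. Sweeping the threshold over a range of values traces the accuracy/coverage curve described above.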
A public interface for the lazar system is accessible from the website http://www.predictive-toxicology.org/lazar/.