
Toxicity Prediction using Support Vector Machines and Random Forest Models based on Maximum Common Substructure Based Algorithm

Rama Kaalia, Surajit Ray, Om Prakash, Hitesh Patel, and Indira Ghosh, SCIS, Jawaharlal Nehru University, New Delhi, India

A major need in the drug discovery industry today is for in silico toxicity prediction methods that can speed up the drug discovery process and reduce the burden of in vivo chemical testing on animals. Statistically significant QSAR models serve this purpose. We have therefore developed QSAR models based on 2-d substructure profiles using different machine learning algorithms and compared their performance.

Our algorithm MaxTox [1], which is based on finding the 2-d maximum common substructures (MCSS) responsible for toxicity, is used to predict biologically toxic compounds when combined with modern machine learning techniques. MaxTox involves (i) selecting a group of molecules based on a common toxicity end-point, (ii) pair-wise comparison to generate a list of MCSS, (iii) forming a toxicity end-point specific MCSS dictionary by removing redundant MCSS, (iv) generating fingerprints of the training set and test set molecules, and (v) using the fingerprints as descriptors, with the toxicity end-point as the dependent variable, to create predictive models.
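Steps (ii) to (iv) above can be illustrated with a minimal sketch. It is not the MaxTox implementation: molecules are represented as plain strings, and the hypothetical helpers `toy_mcss` (longest common substring) and `toy_contains` (substring test) stand in for real 2-d graph-based substructure matching, which would normally come from a cheminformatics toolkit.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def toy_mcss(a: str, b: str) -> str:
    """Toy stand-in for an MCSS search: longest common substring of two strings.

    ASSUMPTION: real MCSS detection works on molecular graphs, not on strings.
    """
    best = ""
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            piece = a[i:j]
            if len(piece) > len(best) and piece in b:
                best = piece
    return best


def toy_contains(molecule: str, substructure: str) -> bool:
    """Toy stand-in for a substructure match."""
    return substructure in molecule


def build_dictionary(molecules: Sequence[str],
                     mcss_fn: Callable[[str, str], str]) -> List[str]:
    """Steps (ii)-(iii): pairwise comparison, then removal of redundant entries."""
    dictionary: List[str] = []
    for a, b in combinations(molecules, 2):
        sub = mcss_fn(a, b)
        if sub and sub not in dictionary:  # skip empty and duplicate entries
            dictionary.append(sub)
    return dictionary


def fingerprint(molecule: str, dictionary: Sequence[str],
                contains_fn: Callable[[str, str], bool]) -> List[int]:
    """Step (iv): one bit per dictionary entry, set when the entry is present."""
    return [1 if contains_fn(molecule, sub) else 0 for sub in dictionary]


# Hypothetical toy "molecules" sharing one toxicity end-point (step (i)).
training = ["NCCO", "NCCC", "CCOC"]
dictionary = build_dictionary(training, toy_mcss)
fingerprints = [fingerprint(m, dictionary, toy_contains) for m in training]
```

The resulting bit vectors play the role of the descriptors in step (v); any classifier that accepts binary feature vectors could be trained on them.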

Our work focuses on comparing QSAR models built using MCSS fingerprints as descriptors with models built using physicochemical descriptors such as logP, energies, hydrophobicity and steric properties. We also compared classification models built using two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF): the prediction results from the SVM models were compared with those from RF models created from the same descriptor set (generated using MaxTox). To check the performance of our algorithm, we compared the results with previously published classification models that were generated using discriminant analysis on physicochemical descriptors [2]. We took the two datasets used in that study [2], for the toxicity endpoints mutagenicity in Salmonella typhimurium and carcinogenicity in rodents, and built models with our descriptors and methods so that the accuracies could be compared. Models were built using (i) the same training and test set partition as the published study (RomualdoDatasetPartition) and (ii) new, randomly shuffled data sets (NewDataPartition) to check model performance thoroughly.
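A random re-partitioning of the kind described in (ii) might be sketched as follows. The function name `shuffled_partition`, the test fraction and the seed are assumptions made for illustration; the abstract does not state the split ratio or the shuffling procedure that was actually used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffled_partition(records: Sequence[T],
                       test_fraction: float = 0.2,
                       seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the records reproducibly and split them into (training, test).

    ASSUMPTION: test_fraction and seed are illustrative defaults only.
    """
    rng = random.Random(seed)      # private generator, so the split is repeatable
    shuffled = list(records)       # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Ten hypothetical compound identifiers, 30% held out as the test set.
train_ids, test_ids = shuffled_partition(list(range(10)), test_fraction=0.3, seed=1)
```

Fixing the seed keeps the partition reproducible, so that SVM and RF models can be trained and evaluated on exactly the same split.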

Of the three approaches, RF classification models built using the MCSS dictionary showed accuracies of 0.88 (training) and 0.71 (test) for the mutagenicity dataset, compared with 0.87 (training) and 0.81 (test) for the classification models in the published study, and 0.93 (training) and 0.69 (test) for the carcinogenicity dataset, compared with 0.94 (training) and 0.70 (test) in the published study. SVM classification models built using the same MCSS dictionary showed accuracies of 0.98 (training) and 0.75 (test) for the mutagenicity dataset and 0.90 (training) and 0.73 (test) for the carcinogenicity dataset. Although the test accuracy of the SVM models is slightly higher than that of the RF models, the sensitivity and specificity of the two sets of models are fairly similar.
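For reference, accuracy, sensitivity and specificity for a binary toxic/non-toxic classifier can be computed from predicted and observed labels as in the sketch below. The function name `classification_metrics` and the example labels are hypothetical and are not data from this study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels.

    `positive` marks the toxic class; every other label counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: fraction of truly toxic compounds predicted toxic.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of truly non-toxic compounds predicted non-toxic.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels (1 = toxic, 0 = non-toxic), for illustration only.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(observed, predicted)
```

Two models can share a similar sensitivity and specificity while differing in accuracy when the class balance of the evaluated sets differs, which is why all three figures are worth reporting together.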

To conclude, MCSS-based classification models built using SVM perform somewhat better than the RF models, and their performance is comparable to that of the published QSAR models built using physicochemical descriptors. Moreover, we have made our models compliant with OECD principles, and MaxTox is currently part of OpenTox [3], a collaborative framework for toxicity prediction algorithms, models and datasets. MaxTox also runs as a standalone application [4] that provides services for generating fingerprints using the MaxTox algorithm, predicting toxicity using already built models, and building models for an unknown set of compounds using MaxTox and SVM.

References:
1. Ghosh (2010), 18th European Symposium on Quantitative Structure-Activity Relationships, Rhodes.
2. Romualdo (2008), J. Chem. Inf. Model., 48(5), 971-980.

(presenting author: Indira Ghosh)
