Applicability domain estimation for classification QSARs on example of Ames test and CYP450 inhibition
Iurii Sushko, Sergii Novotarskyi, Robert Koerner, Ahmed Abdelaziz, Wolfram Teetz and Igor Tetko, eADMET GmbH, Germany
In QSAR research, it is crucial to identify the compounds for which a
model gives reliable predictions, that is, to determine the
applicability domain of the model. One approach that has been shown to
provide good results for regression QSARs is based on so-called
distances to models (DMs), special metrics that estimate the prediction
accuracy of QSAR models. This work generalizes the approach to
classification QSARs and shows its successful application to the
prediction of the mutagenicity and CYP450 inhibition potential of
chemical compounds.
For both predicted properties, our approaches could
identify highly accurate predictions, with an accuracy close to
that of experimental measurements (90-95%). Precisely these predictions
are the most useful and should be used in place of experimental
measurements, thereby saving significant effort and cost. Conversely,
it was also possible to identify very inaccurate predictions
with an accuracy close to random (50%). Such predictions are of
no practical use and should be avoided. Thus, we show that DMs
can be successfully used to estimate the prediction accuracy and,
therefore, the applicability domain not only for regression but
also for classification problems.
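
The idea of ranking predictions by a DM and checking the accuracy in each group can be illustrated with a minimal sketch. It is not the method used in this work: the abstract does not specify which DMs were applied, so the sketch assumes a hypothetical DM, the spread of class probabilities across the members of an ensemble model. The function names and the four toy records are invented for illustration only.

```python
from statistics import pstdev


def distance_to_model(member_probs):
    """Hypothetical DM (an assumption, not taken from the abstract):
    spread of the ensemble members' class probabilities.
    A larger value is treated as a less reliable prediction."""
    return pstdev(member_probs)


def accuracy_by_dm_bin(records, n_bins):
    """Sort predictions by DM and report accuracy per bin.

    records: list of (member_probs, true_label) with labels 0 or 1.
    Returns a list of (mean_dm, accuracy), ordered from the lowest-DM
    bin to the highest-DM bin.
    """
    if n_bins < 1 or n_bins > len(records):
        raise ValueError("n_bins must be between 1 and len(records)")

    scored = []
    for member_probs, true_label in records:
        mean_prob = sum(member_probs) / len(member_probs)
        predicted = 1 if mean_prob >= 0.5 else 0
        scored.append((distance_to_model(member_probs), predicted == true_label))

    # Lowest DM first, i.e. the predictions assumed to be most reliable.
    scored.sort(key=lambda item: item[0])

    size = len(scored) // n_bins
    bins = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any remainder.
        end = (i + 1) * size if i < n_bins - 1 else len(scored)
        chunk = scored[start:end]
        mean_dm = sum(dm for dm, _ in chunk) / len(chunk)
        accuracy = sum(1 for _, correct in chunk if correct) / len(chunk)
        bins.append((mean_dm, accuracy))
    return bins


# Toy data, invented for illustration: (ensemble probabilities, true class).
records = [
    ([0.90, 0.95, 0.92], 1),  # members agree, prediction correct
    ([0.10, 0.05, 0.08], 0),  # members agree, prediction correct
    ([0.20, 0.80, 0.60], 0),  # members disagree, prediction wrong
    ([0.70, 0.30, 0.40], 0),  # members disagree, prediction correct
]

bins = accuracy_by_dm_bin(records, n_bins=2)
for mean_dm, accuracy in bins:
    print(f"mean DM {mean_dm:.3f}: accuracy {accuracy:.2f}")
```

On real data, a useful DM would show accuracy falling from the low-DM bins to the high-DM bins, and a DM threshold could then be chosen to delimit the applicability domain.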
(presenting author: Iurii Sushko)