Stepwise D-Optimal design based on latent variables

Stefan Brandmaier, Helmholtz Center Munich, Germany
Ullrika Sahlin and Tomas Öberg, Linnaeus University, Sweden
Igor Tetko, Helmholtz Center Munich, Germany

Under REACH, every chemical compound produced in or imported into the EU in quantities of more than 1 ton has to be registered with data on a number of environmental endpoints, including bioaccumulation and toxicity. Determining these properties experimentally requires a large number of animal tests. Apart from the ethical concerns, animal experiments are expensive and time consuming, so the number of tests should be kept as small as possible. This can be achieved by testing only a small, representative subset of compounds, using the results to build QSAR models, and predicting the properties of the remaining compounds.

There are several standard approaches for selecting diverse sets of compounds for modelling purposes, such as factorial [1] or D-Optimal [2] design. The latter method is frequently considered to be the better choice [2]. The D-Optimal design selects compounds using principal component analysis (PCA) of molecular descriptors. The analysis is done in one step and does not take the target property into account. The selected compounds may therefore not be optimal for modelling the given property. Moreover, most labs, e.g. because of restricted capacities, do not test compounds in parallel but in a stepwise procedure. The question is whether a different strategy could improve the selection of compounds by taking into consideration the target property and the data already available.
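As an illustration of the traditional one-step selection described above, the following is a minimal sketch, not the implementation used in this work. It assumes a linear design model with an intercept in the PCA score space, a greedy forward search in place of the exchange algorithms usually used for D-Optimal designs, and a small ridge term to keep the information matrix invertible. The function names and parameters are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred descriptors onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def greedy_d_optimal(T, n_select, ridge=1e-6):
    """Greedily pick rows of T that maximise log det(M'M + ridge * I).

    M is the design matrix [1, T] restricted to the selected rows.
    Assumption: greedy forward selection, a simplification of the
    exchange algorithms normally used for D-Optimal designs.
    """
    M = np.hstack([np.ones((T.shape[0], 1)), T])  # intercept + linear terms
    info = ridge * np.eye(M.shape[1])             # regularised information matrix
    selected = []
    for _ in range(n_select):
        best_idx, best_logdet = None, -np.inf
        for i in range(M.shape[0]):
            if i in selected:
                continue
            _, logdet = np.linalg.slogdet(info + np.outer(M[i], M[i]))
            if logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        selected.append(best_idx)
        info += np.outer(M[best_idx], M[best_idx])
    return selected


# Hypothetical descriptor matrix: 30 compounds, 8 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
chosen = greedy_d_optimal(pca_scores(X, n_components=2), n_select=5)
```

Note that the target property never enters this selection, which is the limitation discussed above.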

We introduce a stepwise Partial Least Squares D-Optimal approach (PLS-Optimal design) that iteratively refines the chemical space used for compound selection. The new approach utilizes the D-Optimal design, but it selects compounds based on PLS latent variables instead of PCA components. We show that models developed with compounds selected using the PLS-Optimal design perform significantly better than models developed with compounds selected using the traditional approach.
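The stepwise idea can be sketched as a loop: fit PLS on the compounds measured so far, project all candidates onto the resulting latent variables, and add the next batch D-optimally in that space. This abstract does not specify the details, so everything below is an assumption made for illustration: the NIPALS PLS1 fit, the greedy augmentation, the linear design model, the batch size, the number of latent variables, and the hypothetical `measure` callback standing in for the experiment.

```python
import numpy as np


def pls1_projection(Xc, yc, n_components):
    """NIPALS PLS1 on centred data; returns R such that scores T = Xc @ R."""
    Xk, yk = Xc.copy(), yc.copy()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # latent-variable scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        q = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - q * t                 # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.inv(P.T @ W)


def augment_d_optimal(T, selected, n_new, ridge=1e-6):
    """Greedily add n_new rows that maximise log det of the information matrix."""
    M = np.hstack([np.ones((T.shape[0], 1)), T])
    info = ridge * np.eye(M.shape[1]) + M[selected].T @ M[selected]
    chosen = list(selected)
    for _ in range(n_new):
        candidates = [i for i in range(M.shape[0]) if i not in chosen]
        logdets = [np.linalg.slogdet(info + np.outer(M[i], M[i]))[1]
                   for i in candidates]
        best = candidates[int(np.argmax(logdets))]
        chosen.append(best)
        info += np.outer(M[best], M[best])
    return chosen


def stepwise_pls_optimal(X, measure, initial, n_steps, batch_size, n_components=2):
    """Alternate between measuring, refitting PLS, and selecting the next batch."""
    selected = list(initial)
    for _ in range(n_steps):
        y = np.array([measure(i) for i in selected], dtype=float)
        mean_x = X[selected].mean(axis=0)
        R = pls1_projection(X[selected] - mean_x, y - y.mean(), n_components)
        T_all = (X - mean_x) @ R        # all candidates in PLS score space
        selected = augment_d_optimal(T_all, selected, batch_size)
    return selected


# Hypothetical data: 40 compounds, 6 descriptors, simulated property values.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y_true = X @ rng.normal(size=6) + 0.1 * rng.normal(size=40)
initial = [0, 1, 2, 3, 4, 5]
result = stepwise_pls_optimal(X, lambda i: y_true[i], initial,
                              n_steps=3, batch_size=4)
```

In this sketch the latent space is recomputed at every step from the measured compounds only, so the property values obtained so far steer which compounds are tested next.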

[1] Torbjörn Lundstedt et al. 1998, Experimental design and optimization, Chemometrics and Intelligent Laboratory Systems 42:3–40
[2] Massimo Baroni et al. 1993, D-Optimal Designs in QSAR, Quantitative Structure-Activity Relationships 12:225–231

(presenting author: Stefan Brandmaier)
