
Dependence between Models and Tests in Predictive Toxicology

Tom Aldenberg, RIVM, Netherlands

The registration of chemicals in the EU aims to reduce animal (in vivo) testing and will increasingly rely on substitute information, i.e. in vitro tests, in silico models, and chemical information. A chemical's toxicological properties ("endpoints") have to be estimated from Integrated Testing Strategies (ITSs), which combine multiple sets of predictor information from various sources. To build these ITSs, we consider contingency tables built from rows of chemical data and columns of predictor covariates, together with the response endpoint to be predicted. We focus on training data with binary predictors and a binary response, whether the predictors are tests, model- or rule-based results, or chemical properties. The predictors cannot be assumed to be independent, as would be required to build a naive Bayes classifier. The modern statistical approach is to apply so-called loglinear models to parameterize the interactions between different predictors. These models range from fitting a constant to the table up to the full 'saturated' model, whose interaction terms fit the table exactly. However, several sampling (data generation) models are possible for these contingency tables. We will consider Poisson, Multinomial, and Product-Multinomial data models, which allow Generalized Linear Model (GLM) regression techniques to be employed. A variety of model selection techniques assist in deciding the interaction order of the model(s), so as to balance adequate description against over-fitting. Recently, new kinds of regression models have surfaced in the analysis of genetic variation: Logic Regression and Logic Forest. We investigate how logical expressions involving Boolean operators can add to, or replace, the more classical multiplicative interaction terms and improve model identification and the interpretation of the fits.
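The range of loglinear models described above, from mutual independence up to the saturated model, can be illustrated with a small sketch. It is not the author's implementation: the 2x2x2 table of counts (two binary predictors and one binary endpoint) is invented for illustration, and the fit uses iterative proportional fitting, which for hierarchical loglinear models gives the same fitted cell counts as maximum likelihood under Poisson or multinomial sampling. The names `counts`, `margin`, `ipf`, and `deviance` are hypothetical.

```python
import math

# Hypothetical counts n[(x1, x2, y)]: two binary predictors, one binary endpoint.
counts = {
    (0, 0, 0): 40, (0, 0, 1): 5,
    (0, 1, 0): 12, (0, 1, 1): 9,
    (1, 0, 0): 10, (1, 0, 1): 8,
    (1, 1, 0): 6,  (1, 1, 1): 30,
}

def margin(table, axes):
    """Sum a table over all axes not listed in `axes`."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out

def ipf(observed, margins, iterations=200):
    """Fit a hierarchical loglinear model by iterative proportional fitting.

    `margins` lists the marginal tables the fitted values must reproduce,
    e.g. [(0, 1), (2,)] for the model with an x1*x2 interaction and y independent.
    """
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(iterations):
        for axes in margins:
            target = margin(observed, axes)
            current = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                fitted[cell] *= target[key] / current[key]
    return fitted

def deviance(observed, fitted):
    """Likelihood-ratio statistic G^2 of a fit relative to the saturated model."""
    return 2.0 * sum(n * math.log(n / fitted[c])
                     for c, n in observed.items() if n > 0)

# Three nested models of increasing interaction order.
models = {
    "independence": [(0,), (1,), (2,)],
    "all two-way": [(0, 1), (0, 2), (1, 2)],
    "saturated": [(0, 1, 2)],
}
results = {name: deviance(counts, ipf(counts, m)) for name, m in models.items()}

for name, g2 in results.items():
    print(f"{name:>13}: G^2 = {g2:.3f}")
```

The deviance shrinks as interaction terms are added and reaches zero for the saturated model, which is why a model selection criterion that penalizes the number of parameters is needed to guard against over-fitting.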
Having modeled the individual cell probabilities of the chemicals-versus-predictors matrix, we can insert these probabilities into expressions for entropy and mutual information to characterize the dependence structure of the ITS model.
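As a minimal sketch of this last step, the snippet below computes entropy and mutual information (in bits) from a table of joint cell probabilities. The probabilities are assumptions: they are raw relative frequencies from an invented 2x2x2 table, standing in for the fitted cell probabilities of whichever model is selected. The function names are hypothetical.

```python
import math

# Hypothetical counts n[(x1, x2, y)], used here only to obtain cell probabilities.
counts = {
    (0, 0, 0): 40, (0, 0, 1): 5,
    (0, 1, 0): 12, (0, 1, 1): 9,
    (1, 0, 0): 10, (1, 0, 1): 8,
    (1, 1, 0): 6,  (1, 1, 1): 30,
}
total = sum(counts.values())
probs = {cell: n / total for cell, n in counts.items()}

def marginal(p, axes):
    """Marginal distribution over the listed axes."""
    out = {}
    for cell, value in p.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out

def entropy(p):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def mutual_information(p, axes_a, axes_b):
    """I(A; B) = H(A) + H(B) - H(A, B) for two disjoint groups of axes."""
    return (entropy(marginal(p, axes_a))
            + entropy(marginal(p, axes_b))
            - entropy(marginal(p, axes_a + axes_b)))

h_y = entropy(marginal(probs, (2,)))                 # uncertainty in the endpoint
mi_x1_y = mutual_information(probs, (0,), (2,))      # predictor 1 vs endpoint
mi_x2_y = mutual_information(probs, (1,), (2,))      # predictor 2 vs endpoint
mi_x1_x2 = mutual_information(probs, (0,), (1,))     # dependence between predictors
mi_both_y = mutual_information(probs, (0, 1), (2,))  # both predictors vs endpoint

print(f"H(Y) = {h_y:.3f} bits")
print(f"I(X1;Y) = {mi_x1_y:.3f}, I(X2;Y) = {mi_x2_y:.3f}")
print(f"I(X1;X2) = {mi_x1_x2:.3f}, I(X1,X2;Y) = {mi_both_y:.3f}")
```

A nonzero `mi_x1_x2` signals the dependence between predictors that rules out a naive Bayes treatment, while comparing `mi_both_y` with the single-predictor values shows how much the second predictor adds once the first is known.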
