Machine Learning Methods for Peptide Toxicity Prediction

Andrzej Stanisławczyk, AGH University of Science and Technology, Poland
Marcin Król, Łukasz Proszek and Mariusz Milik, Selvita SA, Poland

Sets of about 2000 toxic and 70000 non-toxic peptides, 8 to 70 amino acids long, were selected from the UniProt database and used for training and performance tests of several machine learning methods. The peptide sequence representation was prepared using the MEME and SeqCode programs, assuming reversibility of sequence patterns; for example, the pattern “ABCD” was treated as equivalent to the pattern “DCBA”. These motif-based properties were binary-coded, with “1” meaning that the motif is present in the given peptide sequence and “0” meaning that it is absent.
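The binary, reversal-invariant coding described above can be illustrated with a minimal sketch. It assumes that motifs are plain substrings, which is a simplification; the actual output of MEME and SeqCode is not specified here, and the function names `canonical` and `encode` are hypothetical.

```python
def canonical(motif: str) -> str:
    """Pick one representative for a motif and its reverse (assumed convention)."""
    return min(motif, motif[::-1])


def encode(sequence: str, motifs: list[str]) -> list[int]:
    """Return 1 for each motif that occurs in the sequence in either direction, else 0."""
    return [int(m in sequence or m[::-1] in sequence) for m in motifs]


# "ABCD" and "DCBA" collapse into a single feature; "KCC" is stored as "CCK".
motifs = sorted({canonical(m) for m in ["ABCD", "DCBA", "KCC"]})

# The peptide contains "DCBA", so the "ABCD" feature is set to 1.
features = encode("GGDCBAKK", motifs)
print(motifs, features)
```

Collapsing each motif with its reverse before encoding keeps the feature vector free of duplicated columns, which matters for the interpretability of the fitted coefficients.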

We tested two variants of logistic regression models, a multilayer perceptron (MLP) neural network, and two variants of classification tree models. In these tests, the best performance was obtained with the neural network model; however, we consider this approach less attractive for practical application, because the rules learned by neural networks are often difficult to interpret in biochemical terms. As the best compromise between classification performance and interpretability, we propose the logistic regression model with interactions.
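To show why a logistic regression model with interactions stays interpretable, here is a minimal sketch. It assumes that the interactions are pairwise products of the binary motif features, which the abstract does not state; the weights and bias are invented for illustration and are not fitted values from this work.

```python
import math
from itertools import combinations


def with_interactions(x):
    """Append pairwise products x_i * x_j (i < j) to the main-effect features."""
    return list(x) + [a * b for a, b in combinations(x, 2)]


def predict_proba(x, weights, bias):
    """Logistic model: sigmoid of the bias plus the weighted sum of expanded features."""
    z = bias + sum(w * f for w, f in zip(weights, with_interactions(x)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for 3 motifs: 3 main effects, then 3 pairwise terms.
weights = [0.8, -0.4, 0.1, 1.5, 0.0, 0.0]
bias = -2.0

# Motifs 1 and 2 are present, motif 3 is absent.
p = predict_proba([1, 1, 0], weights, bias)
print(round(p, 3))
```

For binary features, each product term is nonzero only when both motifs occur in the same peptide, so its coefficient can be read directly as the extra contribution of that motif pair.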

This work was partially supported by a grant from the PEPLASER collaborative project (FP7-HEALTH-2007-B).

(presenting authors: Marcin Król and Łukasz Proszek)
