In silico pKa prediction
Robert Körner, Iurii Sushko, Sergii Novotarskyi and Igor Tetko, eADMET GmbH, Germany
The biopharmaceutical profile of a compound depends directly on the dissociation constants of its acidic and basic groups. The acid dissociation constant Ka (also called the protonation or ionization constant) is the equilibrium constant of the dissociation reaction, that is, the ratio of the concentrations of the dissociated species (the deprotonated form and the proton) to the concentration of the protonated form. It is usually stated as its negative decadic logarithm, pKa = −log10 Ka. The pKa value of a compound strongly influences its pharmacokinetic and biochemical properties. Its accurate estimation is therefore of great interest in areas such as biochemistry, medicinal chemistry, pharmaceutical chemistry, and drug development. Beyond the pharmaceutical industry, it is also relevant to environmental ecotoxicology and to the agrochemical and specialty chemical industries.
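To make the definition concrete, the following minimal Python sketch converts a dissociation constant to its pKa and estimates how much of a single ionizable group is deprotonated at a given pH. The second function uses the Henderson-Hasselbalch relation, which is not part of this abstract and assumes one monoprotic group in dilute solution. The function names and the example value Ka = 1.0e-5 are illustrative assumptions, not measured data.

```python
import math


def pka_from_ka(ka: float) -> float:
    """Return pKa = -log10(Ka) for a positive dissociation constant Ka."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of a single monoprotic group present in its deprotonated form.

    Assumption: Henderson-Hasselbalch relation, pH = pKa + log10([A-]/[HA]),
    which gives [A-] / ([A-] + [HA]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Illustrative value only, not a measured constant.
    example_pka = pka_from_ka(1.0e-5)
    print("pKa:", example_pka)
    # When pH equals pKa, half of the group is deprotonated.
    print("fraction at pH = pKa:", fraction_deprotonated(example_pka, example_pka))
    print("fraction at pH 7.4:", fraction_deprotonated(example_pka, 7.4))
```

The sketch shows why small errors in a predicted pKa matter: the ionized fraction changes by a factor of ten for each unit of difference between pKa and pH.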
In the literature, a vast number of different approaches for pKa prediction can be found (Rupp et al., Comb. Chem. High Throughput Screening: submitted 2010). These approaches can be divided into two classes. On the one hand, there are direct calculations, so-called ab initio methods, which try to determine the pKa value by quantum chemical or mechanical computation. On the other hand, there are statistical models trained on chemical or structural descriptors. These descriptors can be, for example, of quantum chemical, semi-empirical, graph topological, or simple statistical nature. This type of modeling is called QSPR (Quantitative Structure Property Relationship).
In our recent work, we developed such a QSPR model, using localized molecular descriptors to train multiple linear regression and artificial neural network models that estimate dissociation constants (pKa). The performance of our approach is similar to that of a semi-empirical model based on frontier electron theory (Tehan et al., QSAR & Comb. Sci. 21(5): 457–472, 473–485).
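As a rough illustration of the multiple linear regression step, the sketch below fits a linear model to a small descriptor table by ordinary least squares. It is not the authors' model: the two descriptors, the target values, and the names fit_mlr and predict are hypothetical, and the data are synthetic, generated from a known linear rule so that the fit can be checked. The code is kept dependency-free; a numerical library would normally be used instead.

```python
def fit_mlr(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan
    elimination, where A is X with a leading column of ones (intercept).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for c in range(p):
        # Partial pivoting for numerical stability
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("descriptors are collinear; cannot fit")
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for k in range(p):
            if k != c:
                f = M[k][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] for i in range(p)]  # [intercept, b1, ..., bp]


def predict(coeffs, x):
    """Apply the fitted linear model to one descriptor vector."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Synthetic toy data (NOT experimental pKa values): two hypothetical
# descriptors per compound, targets generated from a known linear rule.
TRUE_COEFFS = [10.0, 20.0, -0.5]
X = [[-0.30, 1], [-0.25, 2], [-0.20, 1], [-0.15, 3], [-0.10, 2], [-0.05, 3]]
y = [predict(TRUE_COEFFS, x) for x in X]

if __name__ == "__main__":
    coeffs = fit_mlr(X, y)
    print("fitted coefficients:", coeffs)
    print("prediction for a new descriptor vector:", predict(coeffs, [-0.20, 2]))
```

Because the toy targets are noise-free, the fit recovers the generating coefficients up to rounding. With real measurements, the quality of the fit would have to be assessed on held-out compounds.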
How such a prediction model can be built is shown with an example carried out in OCHEM (http://ochem.eu/), a publicly accessible online database of chemical compound data with an integrated modeling environment. OCHEM is built on a wiki-style structure in which users can collect and organize chemical compounds together with data on their physico-chemical and biological properties. In addition, users can develop, apply, and distribute predictive models. It is unique in its combination of compound data and predictive models.
This study is partially supported by the BMBF GO-Bio project 0313883.