Existing in silico toxicology software for the OpenTox project

C. Helma <helma@in-silico.de>

Introduction

During the last year we have transferred most of our developments into a framework (tentatively called OpenTox) that can be used as the basis for the OpenTox project.

Features:

Technical details:

Programming language: Ruby
Framework: Ruby on Rails
Database: database independent (we use sqlite3)
Version control: Git

The list below shows our (already available) contributions to the deliverables. It could also serve as a template for the contributions from other partners.

Work package 1: Framework design (WP Leader: IST)

Objectives:

To define the requirements and specifications of the OpenTox framework, to evaluate the implemented prototypes and to contribute to standards that are relevant for (Q)SAR model development.

Deliverables:

Month 6: Initial requirements, standards and APIs (Responsible: IST)

  1. Evaluation of common use-cases for toxicological end users, data providers, (Q)SAR model developers and algorithm developers
  2. Evaluation of current standards that are relevant for the OpenTox framework
  3. Initial specification of requirements and standards for the OpenTox framework
  4. Definition of APIs for the database interface
  5. Definition of APIs for the algorithm interface
  6. Definition of APIs for the validation interface

Month 12: Initial GUI design (Responsible: DG)

  1. Design of a user interface for toxicological risk assessors
  2. Design of a user interface for model developers

Month 24: Prototype evaluation, improved API and interface designs (Responsible: IST)

  1. Evaluation of the prototype implementation
  2. Improved design of APIs and interfaces

Month 36: Evaluation of the final implementation (Responsible: DG)

  1. Evaluation of the final implementation

Work package 2: Framework implementation (WP Leader: IDEA)

Objectives:

To provide the basic infrastructure for the project and common functionality for the other parts of the project. This will include an easy-to-use graphical user interface (GUI) for toxicological experts that accesses (Q)SAR models provided by the consortium, a GUI for (Q)SAR model developers with facilities for data import, facilities to retrieve rationales and supporting information for (Q)SAR predictions, and a plug-in system for the integration of third party programs and external model developments.

Deliverables:

Month 6: Project repository and website (Responsible: IDEA)

  1. Establishment of a common project repository with version control and project management tools (e.g. mailing lists for users and developers, bug and feature request trackers)

Month 18: Prototype framework (Responsible: ALU-FR)

  1. Implementation of a prototype framework with APIs for work packages 3 (databases), 4 (algorithms) and 5 (validation)
  2. Implementation of a prototype GUI for toxicological risk assessors
  3. Implementation of a prototype GUI for model developers

Month 24: Prototype server (Responsible: TUM)

  1. Implementation of a prototype public access server

Month 33: Final framework implementation (Responsible: IDEA)

  1. Final implementation of the framework according to WP1 specifications after prototype evaluation
  2. Final implementation of GUIs according to WP1 specifications after prototype evaluation, with installers for the major operating systems

Work package 3: Toxicity databases (WP Leader: ISS)

Objectives:

To provide a database with data for the training and validation of toxicity (Q)SAR models. The initial database will be built upon the AMBIT database (provided by IDEA). Within this project we will populate it with data that is provided by consortium members (e.g. ISS ISSCAN, ITEM REPROTOX, ITEM REPDOSE, IDEA AMBIT, IBMC TERA, EPA DSSTOX, FDA GENREPCAR) and enrich them systematically with data from other sources. We will additionally seek to collaborate with other toxicity-related projects for unifying data storage and maintenance. The final version will have facilities to import confidential and commercial data, quality assurance procedures and algorithms for data aggregation. All public data incorporated into the OpenTox database will be available to the public.

Deliverables:

Month 6: Initial vocabularies and ontologies for toxicological data (Responsible: ISS)

  1. Definition of vocabularies and ontologies for toxicological and chemical data

Month 12: Prototype database with initial data (Responsible: IDEA)

  1. Definition of requirements for data inclusion
  2. Identification of suitable data sources
  3. Implementation of a prototype database according to the requirements from WP1
  4. Import of initial data into the prototype database

Month 21: Tools for the integration of confidential data (Responsible: IST)

  1. Implementation of tools for the integration of confidential data

Month 33: Redesigned database with additional content (Responsible: ISS)

  1. Redesign and implementation of the database according to WP1 specifications after prototype evaluation
  2. Modification of the database content according to WP1 specifications after prototype evaluation

Work package 4: (Q)SAR algorithms (WP Leader: TUM)

Objectives:

This work package will implement a framework for the integration of state-of-the-art statistical, data mining and chemoinformatics algorithms into the OpenTox project. New algorithms will be developed and implemented according to the requests of WP1 (Framework design), WP3 (Toxicity databases) and after a weak-point analysis of currently available techniques.

Deliverables:

Month 6: Selection of algorithms for the prototype (Responsible: TUM)

Month 18: Initial prototype of (Q)SAR algorithms (Responsible: NTUA)

  1. Integration and implementation of algorithms for the generation of structural features (e.g. paths, trees, subgraphs, multiple neighborhood of atoms, pharmacophore descriptors)
  2. Integration and implementation of algorithms for the calculation of chemical properties (e.g. logP, surface parameters, reactivity indices)
  3. Implementation of algorithms for the retrieval of bioassay data from sources like PubChem
  4. Integration and implementation of algorithms for feature selection (e.g. statistical filters, closed sets, principal component analysis)
  5. Integration and implementation of algorithms for classification and regression (e.g. k-nearest neighbors, linear regression, neural nets, support vector machines, decision and regression trees)
  6. Integration and implementation of algorithms for the aggregation of predictions (rule based and data driven)
  7. Integration and implementation of supporting algorithms (e.g. applicability domain estimation, various measures of chemical similarity, structure and property based searches)
  8. Implementation of a plugin system for external (commercial) programs

These tasks will run in parallel and the progress will be monitored by WP1 (Framework design). Priorities for the implementation and development of new algorithms will be set in collaboration with WP1.
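
To make the classification and chemical similarity items in the list above more concrete, the following is a minimal Ruby sketch of a k-nearest-neighbor classifier that uses Tanimoto similarity over sets of structural features. It is an illustration only, not the project's implementation: the method names (tanimoto, knn_predict), the feature strings and the example data are hypothetical.

```ruby
require 'set'

# Tanimoto (Jaccard) similarity between two sets of structural features:
# |intersection| / |union|, defined here as 0.0 for two empty sets.
def tanimoto(a, b)
  union = (a | b).size
  return 0.0 if union.zero?
  (a & b).size.to_f / union
end

# Predict a class label by a similarity-weighted vote of the k most
# similar training compounds.
# training: array of hashes { features: Set, activity: label } (hypothetical layout)
def knn_predict(training, query_features, k = 3)
  neighbors = training
    .map { |c| [tanimoto(c[:features], query_features), c[:activity]] }
    .sort_by { |sim, _| -sim }
    .first(k)
  votes = Hash.new(0.0)
  neighbors.each { |sim, activity| votes[activity] += sim }
  votes.max_by { |_, weight| weight }&.first
end

# Invented toy data: feature strings stand in for fragments such as paths or subgraphs.
TRAINING = [
  { features: Set['c:c', 'c-N', 'N=O'],  activity: :active },
  { features: Set['c:c', 'c-N', 'c-Cl'], activity: :active },
  { features: Set['C-C', 'C-O'],         activity: :inactive },
  { features: Set['C-C', 'C-O', 'C=O'],  activity: :inactive }
]

puts knn_predict(TRAINING, Set['c:c', 'c-N'], 3)
```

Weighting each vote by similarity means that neighbors sharing no features with the query contribute nothing, which is one simple way to keep distant compounds from influencing the prediction.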

Month 33: Final version of (Q)SAR algorithms (Responsible: TUM)

Work package 5: (Q)SAR model validation (WP Leader: ALU-FR)

Objectives:

To provide tools for the unbiased evaluation of (Q)SAR models, regardless of the underlying algorithms. The automated creation of validation reports that are compliant with international standards (e.g. OECD guidelines, ECB QSAR model reporting format) and facilities for the toxicological interpretation of validation results will be provided for an independent external review of (Q)SAR models. Facilities for validation against confidential data will be provided for the same purpose. We will start with existing validation routines from project members (e.g. IST lazar, IDEA AMBIT, NTUA Y-scrambling) and sequentially add features that are requested by WP1 (Framework design).

Deliverables:

Month 18: Prototype validation routines (Responsible: ALU-FR)

  1. Implementation of validation methods based on artificial test sets (e.g. cross-validation, leave-one-out, simple training/test set splits)
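
As an illustration of the artificial test sets named above, the following Ruby sketch splits a dataset into k folds for cross-validation; leave-one-out is the special case where k equals the dataset size. The method name cross_validation_splits and the seed parameter are hypothetical and chosen only for this example.

```ruby
# Split a dataset into k folds and return an array of [training, test] pairs.
# Every item appears in exactly one test set. A fixed seed keeps the
# shuffle reproducible between runs.
def cross_validation_splits(data, k, seed: 42)
  raise ArgumentError, 'k must be between 2 and data.size' unless k.between?(2, data.size)

  shuffled = data.shuffle(random: Random.new(seed))
  folds = Array.new(k) { [] }
  # Deal items round-robin so fold sizes differ by at most one.
  shuffled.each_with_index { |item, i| folds[i % k] << item }

  folds.each_index.map do |i|
    test = folds[i]
    training = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    [training, test]
  end
end

compounds = (1..10).to_a                                     # stand-ins for compound records
five_fold = cross_validation_splits(compounds, 5)            # 5-fold cross-validation
leave_one_out = cross_validation_splits(compounds, compounds.size)
puts five_fold.size
puts leave_one_out.size
```

Because the splitting is independent of the model, the same routine can be applied to any (Q)SAR algorithm, in line with the objective of evaluating models regardless of the underlying algorithms.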

Month 24: Report generation facilities (Responsible: IBMC)

  1. Implementation of standards-compliant validation report generation facilities

Month 30: Validation facilities for confidential data (Responsible: NTUA)

  1. Implementation of validation techniques for confidential data

Month 33: Final implementation of validation routines (Responsible: ALU-FR)

  1. Revision of the implemented techniques according to WP1 evaluation