Documentation/Examples: Detailed documentation and usage examples can be found at the new location of our web site, which is still under development.

Training of Regression Models

Use Case: Training of Regression Models; Build a model that predicts a numeric value for a certain feature

Assumes: Dataset Services, Dataset with at least two numeric features, one of which is declared as the feature to be predicted.

Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, People interested in machine learning/statistics, Pharmaceutical Industry R&D and other related fields.

Input Information Needed:

URI of an existing dataset provided by a related service including at least two numeric features.
The target feature needs to be declared.
Other model specific parameters are used to calibrate the algorithm that produces the model (for instance, the parameter γ in the SVM case.)

Exception Events:

Provided dataset URI does not exist
Unacceptable tuning parameters (for example γ < 0 or tolerance < 0)
Provided prediction feature is not a valid feature of the dataset
Dataset contains fewer than two numeric attributes
The provided prediction feature is not numeric in this dataset
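
The exception events above amount to a set of checks that a training service would run before fitting a model. The following is a minimal sketch of such checks, not the services' actual implementation. The `Dataset` structure, the function name and the messages are hypothetical; the dataset is assumed to be already retrieved, so a URI that does not exist is represented by `None`; and the rule that every tuning parameter must be non-negative is an assumption that generalizes the γ and tolerance examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory view of a dataset: feature name -> declared type."""
    feature_types: Dict[str, str] = field(default_factory=dict)  # "numeric", "nominal", "string"


def regression_training_errors(
    dataset: Optional[Dataset],
    prediction_feature: str,
    params: Dict[str, float],
) -> List[str]:
    """Return one message per exception event that applies (empty list means OK)."""
    if dataset is None:
        return ["Provided dataset URI does not exist"]

    errors: List[str] = []
    # Assumption: tuning parameters such as gamma or tolerance must not be negative.
    for name, value in params.items():
        if value < 0:
            errors.append(f"Unacceptable tuning parameter: {name} = {value}")

    numeric = [f for f, t in dataset.feature_types.items() if t == "numeric"]
    if len(numeric) < 2:
        errors.append("Dataset contains fewer than two numeric features")

    if prediction_feature not in dataset.feature_types:
        errors.append("Prediction feature is not a feature of the dataset")
    elif dataset.feature_types[prediction_feature] != "numeric":
        errors.append("Prediction feature is not numeric in this dataset")
    return errors
```

For example, a dataset with features `logP` (numeric), `LC50` (numeric) and `name` (string) passes all checks with `LC50` as the prediction feature, and fails with `name`.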

Expected Result: URI of trained model

Subsequent Events: Once the model is generated the following use cases can include it:

Use the model for prediction
Validate the model using the training data or other external data

List of Model Training Services:

http://opentox.ntua.gr:3000/algorithm [Provides a list of all algorithms on the server]
http://opentox.ntua.gr:3000/algorithm/svm
http://opentox.ntua.gr:3000/algorithm/mlr
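
A call to one of these training services could look like the sketch below. This page lists only the service URLs, so the form field names `dataset_uri` and `prediction_feature`, the `Accept` header and the example dataset and feature URIs are all assumptions or placeholders. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SVM_SERVICE = "http://opentox.ntua.gr:3000/algorithm/svm"  # from the list above


def build_training_request(algorithm_uri, dataset_uri, prediction_feature, **tuning):
    """Build (but do not send) a POST request asking a service to train a model.

    The form field names are assumptions; check the service documentation.
    """
    form = {"dataset_uri": dataset_uri, "prediction_feature": prediction_feature}
    # Extra keyword arguments are passed through as tuning parameters.
    form.update({name: str(value) for name, value in tuning.items()})
    return Request(
        algorithm_uri,
        data=urlencode(form).encode("utf-8"),
        headers={"Accept": "text/uri-list"},  # assumed response format
        method="POST",
    )


request = build_training_request(
    SVM_SERVICE,
    dataset_uri="http://example.org/dataset/1",          # placeholder
    prediction_feature="http://example.org/feature/42",  # placeholder
    gamma=0.5,
)
```

Sending the request (for instance with `urllib.request.urlopen(request)`) would, per the use case, be expected to yield the URI of the trained model.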

Training of Classification Models

Use Case: Training of Classification Models; Build a model that predicts a nominal value (a category) for a certain feature. A feature is called nominal if it takes its values from a finite set, whose members are not necessarily numeric.

Assumes: Dataset Services, Dataset with at least one nominal feature and another numeric or nominal feature

Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, People interested in machine learning/statistics, Pharmaceutical Industry R&D and other related fields.

Input Information Needed:

URI of an existing dataset provided by the user, as described in the 'Assumes' section
The target feature needs to be declared.
Other model specific parameters are used to calibrate the algorithm that produces the model (for instance, the parameter γ in the case of support vector classifiers.)

Exception Events:

Provided dataset URI does not exist
Unacceptable tuning parameters (for example γ < 0 or tolerance < 0)
Provided prediction feature is not a valid feature of (or it is not contained in) the dataset
Dataset is not valid for classification.
The provided prediction feature is not nominal in this dataset
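
Since a nominal feature takes its values from a finite set, the checks on the prediction feature listed above can be sketched as follows. The function and its data layout are hypothetical and are not the "NominalFeature" characterization of the dataset services; the final check (observed values must belong to the declared set) is an additional assumption, not one of the listed exception events.

```python
from typing import Dict, FrozenSet, Iterable, Optional


def classification_target_error(
    allowed_values: Dict[str, Optional[FrozenSet[str]]],
    target: str,
    observed: Iterable[str],
) -> Optional[str]:
    """Return a message if `target` cannot serve as a classification target.

    `allowed_values` maps each feature name to its finite value set,
    or to None when the feature is not nominal (e.g. numeric).
    """
    if target not in allowed_values:
        return "Prediction feature is not contained in the dataset"
    categories = allowed_values[target]
    if categories is None:
        return "Prediction feature is not nominal in this dataset"
    # Assumption: values outside the declared finite set are treated as an error.
    unexpected = sorted(set(observed) - categories)
    if unexpected:
        return f"Values outside the declared set: {unexpected}"
    return None
```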

Expected Result: URI of trained model

Subsequent Events: Once the model is generated the following use cases can exploit it:

Use the model for prediction
Validate the model using the training data or other external data

List of Model Training Services:

http://opentox.ntua.gr:3000/algorithm/svm This service is not implemented yet because it depends on other non-implemented dataset services related to the "NominalFeature" characterization.

Domain of Applicability

Use Case: Domain of Applicability Calculation Services; Build a resource (proposal: a model-type resource; needs to be agreed) that is able to decide whether a compound, or a set of compounds, can be used in combination with a certain model or, more formally, whether a certain compound is in the domain of applicability (DoA) of a certain model.

Assumes: Dataset and Model services

Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, People interested in machine learning/statistics, Pharmaceutical Industry R&D and other related fields.

Input Information Needed:

URI of an existing dataset provided by the user, as described in the 'Assumes' section OR
URI of existing compound and a set of services able to calculate the features for this compound which are independent features in the model under consideration
URI of a trained model

There are applicability domain calculation services that do not require tuning parameters

Exception Events:

Compound URI or Dataset URI not found
Features could not be calculated for a given compound
The provided dataset does not contain the independent features of the model whose DoA is to be calculated
Unacceptable tuning parameters (if present)

Expected Result: URI of DoA model

Subsequent Events: Once the DoA model is generated the following use cases can exploit it:

Use the DoA model to tell whether a prediction model is appropriate for the prediction concerning a certain compound
Use the DoA model to find one or more appropriate models for a certain compound or dataset

List of DoA Services:

http://opentox.ntua.gr:3000/algorithm/doa This service is not implemented yet. We are designing a DoA service based on the method of leverages. This method uses only the training data to reach a decision; other methods take the algorithm into account as well. A first implementation of the service will be available no later than 2010/02/04.
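
To illustrate the method of leverages mentioned above: the leverage of a compound with feature vector x, relative to a training matrix X, is h = xᵀ(XᵀX)⁻¹x, and the compound is considered inside the domain when h does not exceed a threshold. The sketch below is not the planned service. It uses the feature vectors as given (no centering or intercept column), and the threshold 3p/n (p features, n training compounds) is a commonly used convention assumed here, not one stated on this page.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; leverages are undefined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def leverage(training: Sequence[Vector], x: Vector) -> float:
    """Leverage h = x^T (X^T X)^-1 x of a compound's feature vector x."""
    p = len(x)
    xtx = [[sum(row[i] * row[j] for row in training) for j in range(p)] for i in range(p)]
    z = _solve(xtx, list(x))
    return sum(xi * zi for xi, zi in zip(x, z))


def in_domain(training: Sequence[Vector], x: Vector) -> bool:
    """Assumed decision rule: inside the domain when h <= 3 * p / n."""
    n, p = len(training), len(x)
    return leverage(training, x) <= 3.0 * p / n
```

Only the training data enters the calculation, which matches the description of the method above; the model's algorithm plays no role.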

Data CleanUp Services

Use Case: Data preprocessing services used to clean a dataset of unwanted features (e.g. String) and/or missing values. We mainly recognize two types of cleanup services: those that remove all features of a certain type from a dataset, thus creating a new one, and those that compensate for missing values by substituting them with the mean or median of the other values of the same feature.

Assumes: Dataset services, Available datasets

Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, People interested in machine learning/statistics, Pharmaceutical Industry R&D and other related fields.

Input Information Needed:

URI of an existing dataset provided by the user
Service-specific parameters such as the type of cleanup to be applied to the dataset.

Exception Events:

Dataset URI not found
Dataset representation is generated but uploading of the cleaned-up dataset to a remote server failed
Unacceptable tuning parameters (if present)

Expected Result: URI of cleaned dataset

Subsequent Events: Once the cleaned up dataset is generated the following use cases can exploit it:

Use the dataset to build a regression or classification model

List of Data CleanUp Services:

These services are not implemented yet but will be ready no later than 2010/02/04.
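
The two kinds of cleanup described in this use case (removing all features of a given type, and filling missing values with the mean or median of the same feature) can be sketched as below. The function names and the column-oriented data layout are hypothetical; missing values are represented by `None`.

```python
from statistics import mean, median
from typing import Dict, List, Optional

Column = List[Optional[float]]


def drop_features_of_type(
    columns: Dict[str, list], types: Dict[str, str], unwanted: str
) -> Dict[str, list]:
    """Return a new dataset without the features whose declared type is `unwanted`."""
    return {name: values for name, values in columns.items() if types[name] != unwanted}


def fill_missing(values: Column, strategy: str = "mean") -> List[float]:
    """Replace missing entries with the mean or median of the known values."""
    if strategy not in ("mean", "median"):
        raise ValueError(f"unknown cleanup strategy: {strategy}")
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("feature has no known values to impute from")
    fill = mean(present) if strategy == "mean" else median(present)
    return [fill if v is None else v for v in values]
```

Both functions return new objects and leave their input untouched, mirroring the use case, in which the service produces a new, cleaned dataset with its own URI.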

Weka machine learning algorithms

These algorithms automatically recognize numeric and nominal attributes, even if they are not declared explicitly in RDF, and ignore other attributes (e.g. string attributes) when only numeric ones are required.

Clustering (no target feature required): http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/SimpleKMeans
Decision tree (target feature required): http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/J48
Linear regression (target feature required): http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/LR
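
How these services recognize attribute types is not described on this page. Purely as an illustration of the idea, the sketch below shows one plausible heuristic: a column is numeric when every known value parses as a number, nominal when it has only a few distinct values, and a string attribute otherwise. The function names and the `max_categories` cut-off are hypothetical and do not reflect the actual Weka or service logic.

```python
from typing import Dict, Optional, Sequence


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_attribute_type(values: Sequence[Optional[str]], max_categories: int = 10) -> str:
    """Guess "numeric", "nominal" or "string" from raw values (None or "" = missing)."""
    present = [v for v in values if v is not None and v != ""]
    if present and all(_is_number(v) for v in present):
        return "numeric"
    # Assumption: a small number of distinct values indicates a nominal attribute.
    if 0 < len(set(present)) <= max_categories:
        return "nominal"
    return "string"


def select_numeric(columns: Dict[str, Sequence[Optional[str]]]) -> Dict[str, Sequence[Optional[str]]]:
    """Keep only the columns inferred as numeric, ignoring all others."""
    return {name: vals for name, vals in columns.items() if infer_attribute_type(vals) == "numeric"}
```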

pKa estimation
No dataset or target feature required.

pKa http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/pka

Toxtree

No dataset or target feature required.

Model services

http://ambit.uni-plovdiv.bg:8080/ambit2/model

pKa and Toxtree models

http://ambit.uni-plovdiv.bg:8080/ambit2/model?search=ToxTree

http://ambit.uni-plovdiv.bg:8080/ambit2/model?search=pKa

Weka machine learning algorithms

Models, derived by Weka algorithms, as above.

http://ambit.uni-plovdiv.bg:8080/ambit2/model?search=weka
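
The model listings above all follow the same pattern: the model service URL with a `search` query parameter. A small helper for building such URLs might look as follows; the helper name is hypothetical, and the response format of the service is not described here, so the sketch stops at constructing the URL.

```python
from urllib.parse import urlencode

MODEL_SERVICE = "http://ambit.uni-plovdiv.bg:8080/ambit2/model"  # from the list above


def model_search_url(keyword: str) -> str:
    """Build the URL that lists models matching `keyword`."""
    return f"{MODEL_SERVICE}?{urlencode({'search': keyword})}"
```

For instance, `model_search_url("weka")` reproduces the Weka model listing URL shown above.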

Sections

ToxCreate

ToxCreate Application

Try the ToxCreate demo application at www.toxcreate.net/test

