ToxCreate
ToxCreate Application
Try the ToxCreate demo application at www.toxcreate.net/test
Issue tracker: http://github.com/helma/opentox-toxmodel/issues
Contains:
- Bug reports, feature requests and comments
- Development priorities (next steps), may be changed by user votes
- Required contributions from other participants
Please use the issue tracker for bug reports, feature requests, comments and votes for development priorities.
TUM services
Planned TUM contributed services (/algorithm and /model) with current status:
Regression algorithms for model learning:
- kNNregression [ready]
- PLSregression [ready]
- M5P [ready]
- GaussP [ready]
Descriptor calculation algorithms:
- JOELIB2 [under dev.; planned for approx. 20.01.]
- CDKPhysChem [under dev.; planned for approx. 20.01.]
Descriptor selection (Feature selection) algorithms:
- InfoGainAttributeEval [under development; planned for approx. ?]
Model service for predictions:
- /model/{id} [ready]
TUM issue tracker: http://lxkramer13.informatik.tu-muenchen.de/trac/TUMOpenTox-dev/report
TUM complete service overview: http://opentox.informatik.tu-muenchen.de:8080/OpenTox-dev/algorithm
NTUA Services
Issue Tracker: http://github.com/sopasakis/yaqp/issues (hosted by GitHub).
Documentation/Examples: Detailed documentation and examples of use can be found at the new location of our web site which is still under development.
Training of Regression Models
Use Case: Training of Regression Models; Build a model that predicts a numeric value for a certain feature
Assumes: Dataset Services, Dataset with at least two numeric features, one of which should be declared to be the predicted one.
Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, people interested in machine learning/statistics, pharmaceutical industry R&D and other related fields.
Input Information Needed:
- URI of an existing dataset provided by a related service including at least two numeric features.
- The target feature needs to be declared.
- Other model specific parameters are used to calibrate the algorithm that produces the model (for instance, the parameter γ in the SVM case.)
Exception Events:
- Provided dataset URI does not exist
- Unacceptable tuning parameters (for example γ < 0 or tolerance < 0)
- Provided prediction feature is not a valid feature of the dataset
- Dataset contains fewer than two numeric attributes
- The provided prediction feature is not numeric in this dataset
Expected Result: URI of trained model
Subsequent Events: Once the model is generated the following use cases can include it:
- Use the model for prediction
- Validate the model using the training data or other external data
List of Model Training Services:
- http://opentox.ntua.gr:3000/algorithm [Provides a list of all algorithms on the server]
- http://opentox.ntua.gr:3000/algorithm/svm
- http://opentox.ntua.gr:3000/algorithm/mlr
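The inputs and exception events above can be sketched as a small request builder. This is only an illustration under stated assumptions: the form parameter names dataset_uri, prediction_feature and gamma, the non-negativity check applied to every tuning parameter, and the example.org URIs are hypothetical and are not confirmed by the service documentation. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_training_request(algorithm_uri, dataset_uri, prediction_feature, **tuning):
    """Build (but do not send) a POST request asking an algorithm service to train a model."""
    # Mirrors the "Unacceptable tuning parameters" exception event (e.g. gamma < 0).
    for name, value in tuning.items():
        if value < 0:
            raise ValueError(f"tuning parameter {name!r} must not be negative, got {value}")
    # Assumed parameter names; the real service may expect different ones.
    form = {"dataset_uri": dataset_uri, "prediction_feature": prediction_feature}
    form.update(tuning)
    body = urlencode(form).encode("ascii")
    return Request(
        algorithm_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical dataset and feature URIs, used only for illustration.
request = build_training_request(
    "http://opentox.ntua.gr:3000/algorithm/svm",
    dataset_uri="http://example.org/dataset/1",
    prediction_feature="http://example.org/feature/42",
    gamma=0.5,
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would submit it; according to the use case, the expected result is the URI of the trained model.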
Training of Classification Models
Use Case: Training of Classification Models; Build a model that predicts a nominal value (a category) for a certain feature. A feature is called nominal if it takes values from a finite set, which need not be numeric.
Assumes: Dataset Services, Dataset with at least one nominal feature and another numeric or nominal feature
Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, people interested in machine learning/statistics, pharmaceutical industry R&D and other related fields.
Input Information Needed:
- URI of an existing dataset provided by the user, as described in the 'Assumes' section
- The target feature needs to be declared.
- Other model specific parameters are used to calibrate the algorithm that produces the model (for instance, the parameter γ in the case of support vector classifiers.)
Exception Events:
- Provided dataset URI does not exist
- Unacceptable tuning parameters (for example γ < 0 or tolerance < 0)
- Provided prediction feature is not a valid feature of (or it is not contained in) the dataset
- Dataset is not valid for classification.
- The provided prediction feature is not nominal in this dataset
Expected Result: URI of trained model
Subsequent Events: Once the model is generated the following use cases can exploit it:
- Use the model for prediction
- Validate the model using the training data or other external data
List of Model Training Services:
http://opentox.ntua.gr:3000/algorithm/svm This service is not implemented yet because it depends on other non-implemented dataset services related to the "NominalFeature" characterization.
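The dataset-related exception events of this use case can be sketched as a validation step that runs before training. The dataset description used here (a dictionary mapping feature names to the labels "nominal", "numeric" or "string") is a hypothetical stand-in, since the "NominalFeature" characterization is not yet available in the dataset services.

```python
def check_classification_dataset(feature_types, prediction_feature):
    """Return the usable input features, or raise ValueError for an invalid dataset.

    feature_types maps a feature name to "nominal", "numeric" or "string"
    (hypothetical labels chosen for this sketch).
    """
    if prediction_feature not in feature_types:
        raise ValueError("prediction feature is not contained in the dataset")
    if feature_types[prediction_feature] != "nominal":
        raise ValueError("prediction feature is not nominal in this dataset")
    # The 'Assumes' section asks for at least one other numeric or nominal feature.
    usable_inputs = [
        name
        for name, kind in feature_types.items()
        if name != prediction_feature and kind in ("nominal", "numeric")
    ]
    if not usable_inputs:
        raise ValueError("dataset is not valid for classification")
    return usable_inputs


features = {"logP": "numeric", "name": "string", "mutagenic": "nominal"}
print(check_classification_dataset(features, "mutagenic"))
```

Here the string feature is ignored as an input, and the numeric feature logP remains as the only independent feature.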
Domain of Applicability
Use Case: Domain of Applicability Calculation Services; Build a resource (proposal: a model-type resource; needs to be agreed) that can decide whether a compound, or a set of compounds, can be used with a certain model. More formally, it decides whether a compound is in the domain of applicability (DoA) of that model.
Assumes: Dataset and Model services
Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, people interested in machine learning/statistics, pharmaceutical industry R&D and other related fields.
Input Information Needed:
- URI of an existing dataset provided by the user, as described in the 'Assumes' section OR
- URI of existing compound and a set of services able to calculate the features for this compound which are independent features in the model under consideration
- URI of a trained model
Exception Events:
- Compound URI or Dataset URI not found
- Features could not be calculated for a given compound
- The provided dataset does not contain the independent features of the model whose DoA is to be calculated
- Unacceptable tuning parameters (if present)
Expected Result: URI of DoA model
Subsequent Events: Once the DoA model is generated the following use cases can exploit it:
- Use the DoA model to tell whether a prediction model is appropriate for the prediction concerning a certain compound
- Use the DoA model to find one or more appropriate models for a certain compound or dataset
List of DoA Services:
http://opentox.ntua.gr:3000/algorithm/doa This service is not implemented yet. We are designing a DoA service based on the method of leverages, which uses only the training data to reach a decision. Other methods also take the algorithm into account. A first implementation of the service will be available no later than 2010/02/04.
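As an illustration of the method of leverages, the following sketch computes the leverage h = x^T (X^T X)^-1 x of a compound with descriptor vector x against the training descriptor matrix X. It is not the planned service implementation. The cut-off 3p/n (p descriptors, n training compounds) is a commonly used warning threshold and is an assumption here, as are all function names.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of a
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; leverage is undefined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def leverage(training, x):
    """Leverage of descriptor vector x with respect to the training rows."""
    p = len(x)
    xtx = [[sum(row[i] * row[j] for row in training) for j in range(p)] for i in range(p)]
    z = solve(xtx, x)  # z = (X^T X)^-1 x without forming the inverse explicitly
    return sum(xi * zi for xi, zi in zip(x, z))


def in_domain(training, x):
    """Assumed decision rule: inside the domain if the leverage is at most 3p/n."""
    n, p = len(training), len(x)
    return leverage(training, x) <= 3.0 * p / n


training = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(leverage(training, [1.0, 0.0]), in_domain(training, [1.0, 0.0]))
print(leverage(training, [3.0, 0.0]), in_domain(training, [3.0, 0.0]))
```

With this toy training set the threshold is 3*2/4 = 1.5, so the first compound (leverage 0.5) falls inside the domain and the second (leverage 4.5) falls outside it. Only the training descriptors are needed, which matches the description of the method above.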
Data CleanUp Services
Use Case: Data preprocessing services that clean a dataset of unwanted features (e.g. String) and/or missing values. We recognize two main types of cleanup service: services that remove all features of a certain type from a dataset, thus creating a new one, and services that compensate for missing values by substituting the mean or median of the other values of the same feature.
Assumes: Dataset services, Available datasets
Intended Audience: Scientists related to life sciences and toxicology, QSAR experts, people interested in machine learning/statistics, pharmaceutical industry R&D and other related fields.
Input Information Needed:
- URI of an existing dataset provided by the user
- Service-specific parameters such as the type of cleanup to be applied to the dataset.
Exception Events:
- Dataset URI not found
- Dataset representation is generated but uploading of the cleaned-up dataset to a remote server failed
- Unacceptable tuning parameters (if present)
Expected Result: URI of cleaned dataset
Subsequent Events: Once the cleaned up dataset is generated the following use cases can exploit it:
- Use the dataset to build a regression or classification model
List of Data CleanUp Services:
These services are not implemented yet but will be ready no later than 2010/02/04.
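Until the services exist, the two cleanup types described in this use case can be illustrated locally. The sketch below assumes a hypothetical in-memory dataset (a dictionary mapping each feature name to a list of values, with None marking a missing value); it does not reflect the actual dataset representation or service interface.

```python
from statistics import mean, median


def drop_string_features(dataset):
    """Return a new dataset without the features that hold any string value."""
    return {
        name: values[:]
        for name, values in dataset.items()
        if not any(isinstance(v, str) for v in values)
    }


def fill_missing(dataset, strategy="mean"):
    """Return a new dataset where None is replaced by the feature's mean or median."""
    if strategy not in ("mean", "median"):
        raise ValueError(f"unknown cleanup strategy: {strategy!r}")
    aggregate = mean if strategy == "mean" else median
    cleaned = {}
    for name, values in dataset.items():
        known = [v for v in values if v is not None]
        if not known:
            raise ValueError(f"feature {name!r} has no known values")
        fill = aggregate(known)  # computed from the non-missing values only
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


raw = {
    "name": ["benzene", "toluene", "phenol", "aniline"],
    "logP": [2.0, None, 1.5, 2.5],
    "mw": [78.0, 92.0, None, 94.0],
}
numeric_only = drop_string_features(raw)
print(fill_missing(numeric_only, "mean"))
print(fill_missing(numeric_only, "median"))
```

Both functions return a new dataset and leave the input untouched, in line with the use case, where cleanup creates a new dataset with its own URI. String features are dropped first because a mean or median is not defined for them.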
IDEA services
Issue tracker: https://sourceforge.net/tracker/?group_id=191756
Ontology service
http://ambit.uni-plovdiv.bg:8080/ontology
Dataset services
http://ambit.uni-plovdiv.bg:8080/ambit2/dataset
Algorithm services
http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm
Weka machine learning algorithms
These services automatically recognize numeric and nominal attributes, even if they are not declared explicitly in RDF, and ignore other attributes (e.g. string attributes) if only numeric ones are required.
- Clustering (no target feature required): http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/SimpleKMeans
- Decision tree (target feature required): http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/J48
- Linear regression (target feature required): http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/LR
pKa estimation
No dataset or target feature required.
Toxtree
No dataset or target feature required.
- Cramer rules http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreecramer
- Cramer rules (extended) http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreecramer2
- Eye irritation http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreeeye
- Skin irritation http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreeskinirritation
- http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreemic
- Michael acceptors http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreemichaelacceptors
- Carcinogenicity and mutagenicity http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreecarc
- Kroes TTC decision tree http://ambit.uni-plovdiv.bg:8080/ambit2/algorithm/toxtreekroes
Model services
http://ambit.uni-plovdiv.bg:8080/ambit2/model
pKa and Toxtree models
http://ambit.uni-plovdiv.bg:8080/ambit2/model?search=ToxTree
http://ambit.uni-plovdiv.bg:8080/ambit2/model?search=pKa
Weka machine learning algorithms
Models, derived by Weka algorithms, as above.