Validation and Reporting Overview and Data Flow

(Training-Test) Validation Workflow

The following chart illustrates a possible workflow for validating an algorithm with a user-defined training and test dataset:

(Internal) Curl commands for validating an algorithm

1. Init validation (the algorithm_params parameter is optional):
  curl -X POST -d algorithm_uri="<algorithm_service>/algorithm/<algorithm_id>" \
               -d training_dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
               -d test_dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
               -d prediction_feature="<prediction_feature>" \
               -d algorithm_params="<alg_param_key1>=<alg_param_val1>;<alg_param_key2>=<alg_param_val2>" \
               <validation_service>/training_test_validation
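
The form body of the init call can also be assembled programmatically. The following is a minimal Python sketch, assuming the parameter names from the curl call above; the function name `build_init_validation_body` and the example.org URIs are hypothetical placeholders, not part of any service API.

```python
from urllib.parse import urlencode, parse_qs


def build_init_validation_body(algorithm_uri, training_dataset_uri,
                               test_dataset_uri, prediction_feature,
                               algorithm_params=None):
    """Return the url-encoded form body for the training_test_validation POST."""
    fields = {
        "algorithm_uri": algorithm_uri,
        "training_dataset_uri": training_dataset_uri,
        "test_dataset_uri": test_dataset_uri,
        "prediction_feature": prediction_feature,
    }
    if algorithm_params:
        # Optional parameter, serialized as "key1=val1;key2=val2"
        # exactly as in the curl call.
        fields["algorithm_params"] = ";".join(
            f"{key}={value}" for key, value in algorithm_params.items())
    return urlencode(fields)


# Hypothetical placeholder URIs, for illustration only.
body = build_init_validation_body(
    "http://example.org/algorithm/1",
    "http://example.org/dataset/10",
    "http://example.org/dataset/11",
    "http://example.org/feature/5",
    {"k": "3", "weight": "0.5"},
)
decoded = parse_qs(body)
```

Leaving out `algorithm_params` simply omits the field from the body, which mirrors the optional `-d algorithm_params=...` line of the curl call.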

Subsequent (internal) curl calls:
2. Build model (the algorithm parameters could be a feature generation service or dataset; this still has to be discussed):
  curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
               -d prediction_feature="<prediction_feature>" \
               -d <alg_param_key1>="<alg_param_val1>" \
               -d <alg_param_key2>="<alg_param_val2>" \
               <algorithm_service>/algorithm/<algorithm_id>
  -> <model_service>/model/<model_id>

5. Make predictions:
  curl -X POST -d dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
               <model_service>/model/<model_id>
  -> <dataset_service>/dataset/<prediction_dataset_id>

Get test-dataset (arrow is missing in chart):
  curl -X GET <dataset_service>/dataset/<test_dataset_id>

8. Get predictions:
  curl -X GET <dataset_service>/dataset/<prediction_dataset_id>

At this stage one has to determine whether this is a classification or a regression task.
Either use the meta information of the model:

  curl -X GET <model_service>/model/<model_id>

Alternatively, use the feature type, which should be included in the dataset (Numerical -> Regression, otherwise Classification).
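
The feature-type rule can be written as a small function. This is a sketch only: the set of labels treated as numerical (`NUMERIC_LABELS`) is an assumption, since the exact type strings used by the dataset service are not specified here.

```python
# Assumed spellings of the numerical feature type; adjust to whatever
# the dataset service actually reports.
NUMERIC_LABELS = {"numerical", "numeric"}


def task_type(feature_type):
    """Numerical feature -> regression, anything else -> classification."""
    if feature_type.strip().lower() in NUMERIC_LABELS:
        return "regression"
    return "classification"


kind_for_numerical = task_type("Numerical")
kind_for_nominal = task_type("Nominal")
```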

Return validation uri:
  -> <validation_service>/<validation_id>
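
The sequence of internal calls above can be sketched in Python as follows. This assumes two injected callables, `post(uri, fields)` and `get(uri)`, which are hypothetical stand-ins for real HTTP requests; computing the statistics and storing the validation object are left out. The sketch also shows how the `algorithm_params` string from step 1 is split into the separate form fields used in step 2.

```python
def parse_algorithm_params(raw):
    """Split "k1=v1;k2=v2" into {"k1": "v1", "k2": "v2"}."""
    params = {}
    for pair in filter(None, (raw or "").split(";")):
        key, _, value = pair.partition("=")
        params[key] = value
    return params


def run_training_test_validation(post, get, algorithm_uri,
                                 training_dataset_uri, test_dataset_uri,
                                 prediction_feature, algorithm_params=""):
    # Step 2: build the model; each algorithm parameter is its own field.
    model_fields = {"dataset_uri": training_dataset_uri,
                    "prediction_feature": prediction_feature}
    model_fields.update(parse_algorithm_params(algorithm_params))
    model_uri = post(algorithm_uri, model_fields)
    # Step 5: predict the test dataset with the new model.
    prediction_dataset_uri = post(model_uri, {"dataset_uri": test_dataset_uri})
    # Fetch test data, predictions (step 8) and model meta information.
    return {
        "model_uri": model_uri,
        "prediction_dataset_uri": prediction_dataset_uri,
        "test_data": get(test_dataset_uri),
        "predictions": get(prediction_dataset_uri),
        "model_meta": get(model_uri),
    }


# Fake transport functions that only record the calls (no network access).
calls = []


def fake_post(uri, fields):
    calls.append(("POST", uri, fields))
    if "/algorithm/" in uri:
        return "http://example.org/model/7"
    return "http://example.org/dataset/99"


def fake_get(uri):
    calls.append(("GET", uri, None))
    return {"uri": uri}


result = run_training_test_validation(
    fake_post, fake_get,
    "http://example.org/algorithm/1",
    "http://example.org/dataset/10",
    "http://example.org/dataset/11",
    "http://example.org/feature/5",
    "k=3;weight=0.5",
)
```

Passing the transport functions in as arguments keeps the sketch independent of any particular HTTP library and makes the call order easy to check with fakes.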

General Remarks:

The endpoint / prediction feature has to be specified by the user as an input parameter (see curl calls)
If the algorithm needs further parameters (e.g. which features to use for the prediction, or the URI of a feature generation service), these parameters have to be given by the user as well (see curl calls)
Not storing the predictions (i.e. deleting the predictions after the validation) limits the report functions

This is how a crossvalidation could work:

Reporting Workflow

The reports described here are reports for model validation (for the model developer) and prediction reports.

Proposal for the report creation workflow:

Remarks:

Validation and Reporting Webservice (WS) are realized as one webservice
One or several validation URIs, or one or several crossvalidation URIs, are needed as input parameters (depending on the report type)
The report is stored as an XML file on the server and remains available even after, for example, the validation has been deleted

The different report types are listed in the Validation API
The basic XML report is in DocBook style and contains plain results (without much description): data tables, charts, ROC curves, lists of (wrong) predictions, significance tests, and so on.

Some of the results will only be available if the prediction dataset still exists (e.g. the ROC curve)
Some more webservices (besides the Dataset WS for the prediction dataset) will probably be accessed to get additional information (e.g. model meta information via the Model WS)

The Fasttox Report

Even though the fasttox use case does not belong to validation, I would suggest including the fasttox report (and other prediction reports) in the validation reporting facility

The prediction of each model could be stored in a validation object (the classification/regression statistics would be empty)
The fasttox report creation could reuse functionality of the validation reporting

Formatting the reports

Chart for formatting reports:

Remarks:

Each report is stored in DocBook XML format and can be transformed into various supported formats (e.g. HTML, PDF, RTF)
The standard XSL transformation produces a simple HTML/PDF document that includes all data tables, plots, etc.
The user can create and specify their own XSL to produce nicely formatted reports with logos, description text, etc., and possibly skip some information
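
To illustrate the kind of mapping such a transformation performs, here is a minimal Python sketch. It is not XSL: the Python standard library has no XSLT processor, so a hand-written ElementTree mapping stands in for the stylesheet. The report content is a made-up placeholder, and the un-namespaced DocBook elements (`article`, `section`, `title`, `para`) are an assumption about the report layout.

```python
import xml.etree.ElementTree as ET

# Placeholder DocBook-style report (assumed layout, no namespace).
DOCBOOK = """<article>
  <title>Validation report</title>
  <section>
    <title>Results</title>
    <para>Placeholder paragraph.</para>
  </section>
</article>"""


def docbook_to_html(source):
    """Map article/section/title/para onto html/h1/h2/p."""
    root = ET.fromstring(source)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = root.findtext("title", default="")
    for section in root.findall("section"):
        ET.SubElement(body, "h2").text = section.findtext("title", default="")
        for para in section.findall("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


html_out = docbook_to_html(DOCBOOK)
```

A user-supplied stylesheet would play the role of `docbook_to_html` here, deciding which elements are rendered, how, and which are skipped.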

Special Report Formats

QMRF

about QMRF

The parameter could either be a model URI (-> use all available validations with this model) or a list of validation URIs / crossvalidation URIs
The only output format could be the XML format as defined by AMBIT's QMRF editor http://ambit.sourceforge.net/qmrf/jws/qmrfeditor.jnlp
Missing information could be filled in by the user via the QMRF editor

QPRF

about QPRF; the assembling of the data was discussed in a knowledge café discussion in Rome

Proposal:

Create a report similar to the example report that can be found here: http://ecb.jrc.ec.europa.eu/qsar/qsar-tools/index.php?c=QRF
Leave empty fields to be filled in by the user
The user could edit the report in RTF format with a text editor like Microsoft Word to fill in the missing information
Another possibility could be to insert form fields into the PDF report
