
Validation and Reporting Overview and Data Flow

(Training-Test) Validation Workflow

 

The following chart illustrates a possible workflow for validating an algorithm with a user-defined training and test dataset:

(Figure: validation workflow)

 

(Internal) Curl commands for validating an algorithm

1. Init validation
  curl -X POST -d algorithm_uri="<algorithm_service>/algorithm/<algorithm_id>" \
               -d training_dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
               -d test_dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
               -d prediction_feature="<prediction_feature>" \
               -d algorithm_params="<alg_param_key1>=<alg_param_val1>;<alg_param_key2>=<alg_param_val2>" \
               <validation_service>/training_test_validation
  (the algorithm_params parameter is optional)
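
For illustration, a filled-in init call might look like the following (all hosts and ids are hypothetical placeholders, not actual services):

  curl -X POST -d algorithm_uri="http://alg.example.org/algorithm/lazar" \
               -d training_dataset_uri="http://data.example.org/dataset/101" \
               -d test_dataset_uri="http://data.example.org/dataset/102" \
               -d prediction_feature="http://data.example.org/feature/hamster_carcinogenicity" \
               http://val.example.org/training_test_validation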

Subsequent (internal) curl calls (step numbers refer to the chart above):
2. Build model (the algorithm params could be a feature generation service/dataset; this has to be discussed):
  curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
               -d prediction_feature="<prediction_feature>" \
               -d <alg_param_key1>="<alg_param_val1>" \
               -d <alg_param_key2>="<alg_param_val2>" \
                <algorithm_service>/algorithm/<algorithm_id>
  -> <model_service>/model/<model_id>

5. Make predictions:
  curl -X POST -d dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
               <model_service>/model/<model_id>
  -> <dataset_service>/dataset/<prediction_dataset_id>

Get the test dataset (the corresponding arrow is missing in the chart):
  curl -X GET <dataset_service>/dataset/<test_dataset_id>

8. Get predictions:
  curl -X GET <dataset_service>/dataset/<prediction_dataset_id>

At this stage one has to determine whether this is a classification or regression task.
Either use the meta information of the model:

  curl -X GET <model_service>/model/<model_id>

Alternatively, use the feature type, which should be included in the dataset (numeric -> regression, otherwise classification).
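
A minimal shell sketch of this decision, assuming (hypothetically) that the dataset service can return the type of a single feature as plain text:

  # hypothetical endpoint: ask the dataset service for the type of the prediction feature
  feature_type=$(curl -s -X GET "<dataset_service>/dataset/<test_dataset_id>/feature/<prediction_feature>")

  # numeric feature -> regression, anything else -> classification
  if [ "$feature_type" = "Numeric" ]; then
      task="regression"
  else
      task="classification"
  fi
  echo "validation task type: $task"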

Return validation uri:
  -> <validation_service>/<validation_id>
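
Once the validation has finished, its results could be retrieved from the returned uri (the representation of the result is not specified here):

  curl -X GET <validation_service>/<validation_id>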

 

General Remarks:

  • The endpoint / prediction feature has to be specified by the user as an input parameter (see the curl calls above)
  • In case the algorithm needs further parameters (which features to use for the prediction, the uri of a feature generation service), these params have to be given by the user as well (see the curl calls above)
  • Not storing the predictions (i.e. deleting them after the validation) limits the reporting functionality

 

This is how a crossvalidation could work:

(Figure: crossvalidation workflow)
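
A corresponding init call might look like this (a sketch only; the resource name and the fold parameter are assumptions, not a fixed API):

  curl -X POST -d algorithm_uri="<algorithm_service>/algorithm/<algorithm_id>" \
               -d dataset_uri="<dataset_service>/dataset/<dataset_id>" \
               -d prediction_feature="<prediction_feature>" \
               -d num_folds=10 \
               <validation_service>/crossvalidation
  -> <validation_service>/crossvalidation/<crossvalidation_id>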

 

 

Reporting Workflow

 

The reports described here are model validation reports (for the model developer) and prediction reports.

Proposition for the report creation workflow:

 

(Figure: report creation workflow)

 

 

Remarks:

  • The Validation and Reporting webservices (WS) are realized as one webservice
  • One or several validation uris, or one or several crossvalidation uris, are needed as input parameters (depending on the report type); see the sketch after this list
  • The report is stored as an XML file on the server and is still available after e.g. deleting the validation
  • The different report types are listed in the Validation API
  • The basic XML report is in DocBook style and contains pure results (without much description): data tables, charts, ROC curves, lists of (wrong) predictions, significance tests, and so on
  • Some of the results will only be available if the prediction dataset still exists (e.g. the ROC curve)
  • Some more webservices (besides the Dataset WS for the prediction dataset) will probably be accessed to get additional information (e.g. model meta information via the Model WS)
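
A report creation call could then look like this (a sketch; the report resource path and the validation_uris parameter name are assumptions):

  curl -X POST -d validation_uris="<validation_service>/<validation_id>" \
               <validation_service>/report/validation
  -> <validation_service>/report/validation/<report_id>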


The Fasttox Report

Even though the fasttox use case does not belong to validation, I would suggest including the fasttox report (and other prediction reports) in the validation reporting facility

  • The prediction of each model could be stored in a validation object (classification/regression stats would be empty)
  • The fasttox report creation could reuse functionality of the validation reporting

 

 

Formatting the reports

 

Chart for formatting reports:

 

(Figure: report format workflow)

 

Remarks:

  • Each report is stored in DocBook XML format and can be transformed into various supported formats (e.g. HTML, PDF, RTF)
  • The standard XSL transformation produces a simple HTML/PDF document that includes all data tables, plots, etc.
  • The user can create and specify their own XSL to produce nicely formatted reports with logos, descriptive text, etc., and possibly skip some information; see the sketch after this list
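
For a user-supplied stylesheet, the transformation itself could be run locally with a standard XSLT processor such as xsltproc (file names and the report path are hypothetical):

  # fetch the DocBook XML report (path is an assumption, see the remarks above)
  curl -X GET <validation_service>/report/validation/<report_id> > report.xml

  # transform it with a custom stylesheet into HTML
  xsltproc my_company_style.xsl report.xml > report.html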

 

 

Special Report Formats

 

QMRF

 

about QMRF

 

  • The parameter could either be a model uri (-> use all available validations with this model) or a list of validation uri(s)/crossvalidation uri(s); see the sketch after this list
  • The only output format could be the XML format as defined by AMBIT's QMRF editor: http://ambit.sourceforge.net/qmrf/jws/qmrfeditor.jnlp
  • Missing information could be filled in by the user via the QMRF editor
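
A QMRF report request could then look like one of the following (a sketch; the resource path and parameter names are assumptions):

  curl -X POST -d model_uri="<model_service>/model/<model_id>" \
               <validation_service>/report/qmrf

  curl -X POST -d validation_uris="<validation_service>/<validation_id_1>,<validation_service>/<validation_id_2>" \
               <validation_service>/report/qmrf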

 

QPRF

 

About QPRF: the assembling of the data was discussed in a knowledge café discussion in Rome.

Proposition:

  • Create a report similar to the example report that can be found here: http://ecb.jrc.ec.europa.eu/qsar/qsar-tools/index.php?c=QRF
  • Leave empty fields to be filled out by the user
  • The user could edit the report in RTF format with a word processor like Microsoft Word to fill in the missing information
  • Another possibility could be to insert form fields into the PDF report