API 1.0 (archived version)

Dataset

Component description

A set of chemical compounds and their assigned features.

REST operations

Dataset

Description | Method | URI | Parameters | Result | Status codes
get list of available datasets | GET | /dataset | Query (optional, to be defined) | List of URIs (datasets.xsd) | 200, 404, 503
create a new dataset | POST | /dataset/ | None, or a representation in a supported MIME format | New URI /dataset/{id} | 200, 400, 503
get dataset | GET | /dataset/{id} | Preferred MIME type | Representation in one of the supported MIME formats | 200, 404, 503
update dataset | PUT | /dataset/{id} | Representation in a supported MIME format | - | 200, 400, 404, 503
remove dataset | DELETE | /dataset/{id} | - | - | 200, 404, 503
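As a client-side illustration of the Dataset operations, the following Python sketch builds (without sending) one request per row using only the standard library. The base URL, the dataset id `42`, and the MIME types `text/uri-list` and `chemical/x-mdl-sdfile` are assumptions made for the example; the table does not fix which formats a service must support.

```python
import urllib.request
from typing import Optional

# Hypothetical service location; the specification does not define one.
BASE = "http://example.org"


def dataset_request(method: str, path: str,
                    body: Optional[bytes] = None,
                    content_type: Optional[str] = None,
                    accept: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, an HTTP request for a Dataset operation."""
    req = urllib.request.Request(BASE + path, data=body, method=method)
    if content_type is not None:
        # Format of the representation sent with POST/PUT.
        req.add_header("Content-Type", content_type)
    if accept is not None:
        # The table's "Preferred MIME type" parameter, sent as an Accept header.
        req.add_header("Accept", accept)
    return req


# One request per row (id 42 and the MIME types are assumptions).
sdf = b"..."  # placeholder for an SD file body
list_req = dataset_request("GET", "/dataset", accept="text/uri-list")
create_req = dataset_request("POST", "/dataset/", body=sdf,
                             content_type="chemical/x-mdl-sdfile")
get_req = dataset_request("GET", "/dataset/42", accept="chemical/x-mdl-sdfile")
update_req = dataset_request("PUT", "/dataset/42", body=sdf,
                             content_type="chemical/x-mdl-sdfile")
delete_req = dataset_request("DELETE", "/dataset/42")
```

Passing any of these to `urllib.request.urlopen` sends it; responses with status 400, 404 or 503 are raised as `urllib.error.HTTPError`, whose `code` attribute holds the status.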

Chemical compounds in a dataset

Description | Method | URI | Parameters | Result | Status codes
get compounds | HEAD | /dataset/{id}/compound | - | List of URIs to structures as in dataset.xsd | 200, 404, 503
get compounds | GET | /dataset/{id}/compound | Preferred MIME type | Representation in one of the supported MIME formats | 200, 404, 503
get compound | GET | /dataset/{id}/compound/{id2} | Preferred MIME type | Representation in one of the supported MIME formats | 200, 404, 503
add compound | POST | /dataset/{id}/compound/ | Representation in a supported MIME format | New URI /dataset/{id}/compound/{id2} | 200, 400, 404, 503
update compound | PUT | /dataset/{id}/compound/{id2} | Representation in a supported MIME format | - | 200, 400, 404, 503
remove a compound from a dataset | DELETE | /dataset/{id}/compound/{id2} | - | - | 200, 404, 503
remove all compounds from a dataset | DELETE | /dataset/{id}/compound | - | - | 200, 404, 503
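The "List of URIs" results for compounds are not tied to a concrete wire format. Assuming a service returns them as `text/uri-list` (RFC 2483: one URI per line, lines starting with `#` are comments), a client could parse the list and recover each compound's `{id2}` as sketched here; both helper names are hypothetical.

```python
from typing import List


def parse_uri_list(text: str) -> List[str]:
    """Parse a text/uri-list body: one URI per line, '#' lines are comments."""
    uris = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def compound_id(uri: str) -> str:
    """Return the {id2} segment of a .../dataset/{id}/compound/{id2} URI."""
    parts = uri.rstrip("/").split("/")
    if len(parts) < 2 or parts[-2] != "compound":
        raise ValueError(f"not a compound URI: {uri}")
    return parts[-1]


body = ("# compounds of dataset 42\r\n"
        "http://example.org/dataset/42/compound/1\r\n"
        "http://example.org/dataset/42/compound/7\r\n")
uris = parse_uri_list(body)
ids = [compound_id(u) for u in uris]  # {id2} values for GET/PUT/DELETE
```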

Conformers in a dataset (optional)

Description | Method | URI | Parameters | Result | Status codes
get conformers | HEAD | /dataset/{id}/compound/{id2}/conformer | - | List of URIs to conformers as in dataset.xsd | 200, 404, 503
get conformers | GET | /dataset/{id}/compound/{id2}/conformer | Preferred MIME type | Representation in one of the supported MIME formats | 200, 404, 503
get conformer | GET | /dataset/{id}/compound/{id2}/conformer/{id3} | Preferred MIME type | Representation in one of the supported MIME formats | 200, 404, 503
add conformer | POST | /dataset/{id}/compound/{id2}/conformer/ | Representation in a supported MIME format | New URI /dataset/{id}/compound/{id2}/conformer/{id3} | 200, 400, 404, 503
update conformer | PUT | /dataset/{id}/compound/{id2}/conformer/{id3} | Representation in a supported MIME format | - | 200, 400, 404, 503
remove a conformer | DELETE | /dataset/{id}/compound/{id2}/conformer/{id3} | - | - | 200, 404, 503
remove all conformers | DELETE | /dataset/{id}/compound/{id2}/conformer | - | - | 200, 404, 503
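The conformer URIs nest three identifiers. The sketch below shows one way a service might route them, assuming the naming {id} for the dataset, {id2} for the compound and {id3} for the conformer; the route names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# {id}, {id2}, {id3} become named regex groups; a trailing slash is allowed
# on the collection URI because "add conformer" POSTs to .../conformer/.
PATTERNS = {
    "conformer_collection": re.compile(
        r"^/dataset/(?P<id>[^/]+)/compound/(?P<id2>[^/]+)/conformer/?$"),
    "conformer": re.compile(
        r"^/dataset/(?P<id>[^/]+)/compound/(?P<id2>[^/]+)/conformer/(?P<id3>[^/]+)$"),
}


def match_conformer_path(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (route name, identifiers) for a conformer path, or None."""
    for name, pattern in PATTERNS.items():
        m = pattern.match(path)
        if m:
            return name, m.groupdict()
    return None
```

A router would then dispatch on the HTTP method: for example, DELETE on `conformer_collection` removes all conformers of compound {id2}.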

Features in a dataset

Description | Method | URI | Parameters | Result | Status codes
get feature definitions | HEAD | /dataset/{id}/feature_definition | - | List of URIs of features as in datasets.xsd | 200, 404, 503
get feature definitions | GET | /dataset/{id}/feature_definition | - | XML per the Feature Definition schema | 200, 404, 503
get feature definition | GET | /dataset/{id}/feature_definition/{id2} | - | XML per the Feature Definition schema | 200, 404, 503
add feature definition | POST | /dataset/{id}/feature_definition/ | XML per the Feature Definition schema | New URI /dataset/{id}/feature_definition/{id2} | 200, 400, 404, 503
update feature definition | PUT | /dataset/{id}/feature_definition/{id2} | XML per the Feature Definition schema | - | 200, 400, 404, 503
remove feature definitions | DELETE | /dataset/{id}/feature_definition | - | - | 200, 404, 503

Actions on datasets (split, merge, subset)

Description | Method | URI | Parameters | Result | Status codes
split | PUT (?) | /split/dataset/{id}/ | Split parameters (e.g. cross-validation folds) | List of new dataset URIs as in datasets.xsd | 200, 404, 503
merge | PUT (?) | /merge/dataset | List of dataset URIs to be merged, as in datasets.xsd | Merged dataset URI /dataset/{id} | 200, 404, 503

Alternative: split and merge can be considered special cases of "create dataset", with specific input parameters.

Description | Method | URI | Parameters | Result | Status codes
create a new empty dataset | PUT | /dataset/ | None | New URI /dataset/{id} | 200, 400, 503
split an existing dataset | PUT | /dataset/ | URI of the dataset to split, and split parameters | New URI /dataset/{id} | 200, 400, 503
merge datasets (union) | PUT | /dataset/ | List of dataset URIs to be merged, as in datasets.xsd | New URI /dataset/{id} | 200, 400, 503
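Whichever URI scheme is chosen, a service needs concrete semantics for union and split. A minimal sketch over lists of compound URIs, assuming that merging keeps first-seen order and drops duplicates, and that splitting is a seeded shuffle followed by round-robin assignment to k folds (one common way to build cross-validation folds, not something the tables prescribe). The function names are hypothetical.

```python
import random
from typing import List, Sequence


def merge_union(*datasets: Sequence[str]) -> List[str]:
    """Union of compound URI lists, keeping first-seen order, no duplicates."""
    seen = set()
    merged = []
    for uris in datasets:
        for uri in uris:
            if uri not in seen:
                seen.add(uri)
                merged.append(uri)
    return merged


def split_folds(uris: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Partition compound URIs into k disjoint folds."""
    if k < 1:
        raise ValueError("k must be >= 1")
    shuffled = list(uris)
    # A fixed seed makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment keeps fold sizes within one of each other.
    return [shuffled[i::k] for i in range(k)]
```

Each fold would then become a new dataset URI returned by the split request.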

Queries

Description | Method | URI | Parameters | Result | Status codes
given a compound, retrieve congeneric chemicals | GET | TODO | - | New URI /dataset/{id} | 200, 404, 503
given a compound, retrieve similar chemicals | GET | TODO | - | New URI /dataset/{id} | 200, 404, 503
retrieve chemicals that have data for a given endpoint | GET | TODO | - | New URI /dataset/{id} | 200, 404, 503
search within a dataset | GET | /dataset/{datasetid}/query | Parameters TODO | New URI /dataset/{id} | 200, 404, 503
more | - | TODO



HTTP status codes

Interpretation | Code | Name
Success | 200 | OK
Dataset not found | 404 | Not Found
Incorrect MIME type | 400 | Bad Request
Service not available | 503 | Service Unavailable
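A client can map these codes to the interpretations in the status code table. The sketch takes the standard reason phrase from Python's `http.HTTPStatus`; the function name is hypothetical, and codes outside the table are reported as undefined by this API.

```python
from http import HTTPStatus

# Interpretations taken from the HTTP status code table of this page.
INTERPRETATION = {
    200: "Success",
    400: "Incorrect MIME type",
    404: "Dataset not found",
    503: "Service not available",
}


def describe_status(code: int) -> str:
    """Combine the standard reason phrase with this API's interpretation."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    meaning = INTERPRETATION.get(code, "not defined by this API")
    return f"{code} {phrase}: {meaning}"
```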

Dataset representation

XML schema for Dataset object


Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Sep 11, 2009 01:21 PM
I propose to:
* remove the split, merge and subset dataset options
* add the following commands:

desc: copy a dataset, excluding the given compounds of the original dataset
method: POST
uri: /dataset/{id}/copy
params: exclude_compounds (comma-separated list of compound IDs)
return: URI of the new dataset

desc: copy a dataset, including only the given compounds of the original dataset
method: POST
uri: /dataset/{id}/copy
params: include_compounds (comma-separated list of compound IDs)
return: URI of the new dataset
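The two proposed copy commands differ only in their parameter. A sketch of the request body and of the filtering a service would apply, assuming the parameters are sent form-encoded in the POST body (the proposal does not say how); `copy_params` and `apply_copy` are hypothetical names.

```python
from typing import Iterable, List, Optional, Sequence
from urllib.parse import urlencode


def copy_params(include: Optional[Iterable[str]] = None,
                exclude: Optional[Iterable[str]] = None) -> str:
    """Form-encode exactly one proposed parameter as a comma-separated id list."""
    if (include is None) == (exclude is None):
        raise ValueError("pass exactly one of include or exclude")
    if include is not None:
        return urlencode({"include_compounds": ",".join(include)})
    return urlencode({"exclude_compounds": ",".join(exclude)})


def apply_copy(compound_ids: Sequence[str],
               include: Optional[Iterable[str]] = None,
               exclude: Optional[Iterable[str]] = None) -> List[str]:
    """Compound ids the copied dataset would contain, in the original order."""
    if (include is None) == (exclude is None):
        raise ValueError("pass exactly one of include or exclude")
    if include is not None:
        keep = set(include)
        return [c for c in compound_ids if c in keep]
    drop = set(exclude)
    return [c for c in compound_ids if c not in drop]
```

A training/test pair for one validation fold then takes two requests: one with `include_compounds` set to the test compounds (the test dataset) and one with `exclude_compounds` set to the same list (the training dataset).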

The old split and merge functions have the disadvantage that every dataset service has to provide these functions with exactly the same behaviour.
The new copy functions allow efficient creation of training and test datasets (you do not have to add or remove each compound individually), and ensure that dataset splitting has to be implemented only once (by the validation component) and is easy to reproduce.

Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Sep 14, 2009 11:29 AM
As discussed with Nina on Friday, the changes I suggested have two shortcomings:
* it should be possible to create a new dataset by copying sets of compounds from several (more than one) other datasets
* instead of compound IDs, a list of compound URIs should be used

Therefore my new suggestion is simply to extend the existing add and remove POST commands to accept a list of compound URIs instead of just one. It should also be possible to 'add' a complete dataset (the parameter is a dataset URI; all compounds of that dataset are added).

This would still allow, for example, the creation of a test dataset with only a few HTTP requests, independent of the number of compounds.
The disadvantage is that a lot of redundant information (the URI prefix) is transferred if all compounds have the same location.
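To make the trade-off concrete: assuming the list of compound URIs is posted as a `text/uri-list` body (the suggestion does not fix a format), the sketch below encodes such a body and counts the bytes spent repeating the longest shared leading string of the URIs. The helper names and example URIs are hypothetical; `os.path.commonprefix` compares character by character, so the prefix may end mid-segment.

```python
import os
from typing import List, Tuple


def uri_list_body(uris: List[str]) -> bytes:
    """Encode compound URIs as a text/uri-list body (CRLF line endings)."""
    return ("\r\n".join(uris) + "\r\n").encode("ascii")


def prefix_overhead(uris: List[str]) -> Tuple[str, int]:
    """Shared prefix of all URIs and the bytes spent repeating it after the first."""
    prefix = os.path.commonprefix(uris)
    return prefix, len(prefix) * max(len(uris) - 1, 0)


uris = [f"http://example.org/dataset/1/compound/{i}" for i in (3, 17, 42)]
body = uri_list_body(uris)
prefix, overhead = prefix_overhead(uris)
```

The overhead grows linearly with the number of compounds, while the number of HTTP requests stays at one.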

Any comments?