
Dataset

Component description

A set of chemical compounds and assigned features

REST operations

Dataset

Description	Method	URI	Parameters	Result	Status codes
get list of datasets available	GET	/dataset	Query (optional - to be defined)	List of URI (datasets.xsd)	200,404,503
create a new dataset	POST	/dataset/	None or Representation in a supported MIME format	New URI /dataset/{id}	200,400,503
get dataset	GET	/dataset/{id}	Preferred MIME type	Representation in one of supported MIME formats	200,404,503
update dataset	PUT	/dataset/{id}	Representation in a supported MIME format	-	200,400,404,503
remove dataset	DELETE	/dataset/{id}	-	-	200,404,503
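As a rough illustration of the table above, the following sketch builds (but does not send) the five dataset requests with Python's standard library. The service root `http://example.org`, the dataset id `42`, and the MIME type `chemical/x-mdl-sdfile` are assumptions made for the example; this page does not say which MIME formats a service supports.

```python
from urllib.request import Request

BASE = "http://example.org"      # hypothetical service root
SDF = "chemical/x-mdl-sdfile"    # assumed MIME type, not specified by this page


def dataset_request(method, dataset_id=None, body=None, mime=SDF):
    """Build (without sending) a request for one row of the dataset table."""
    if dataset_id is None:
        # The table lists creation as POST /dataset/ and listing as GET /dataset
        path = "/dataset/" if method == "POST" else "/dataset"
    else:
        path = f"/dataset/{dataset_id}"
    headers = {}
    if method == "GET":
        headers["Accept"] = mime          # the "Preferred MIME type" parameter
    elif body is not None:
        headers["Content-Type"] = mime    # format of the representation sent
    return Request(BASE + path, data=body, headers=headers, method=method)


payload = b"placeholder representation"   # stands in for real dataset content
examples = [
    dataset_request("GET"),                    # list datasets
    dataset_request("POST", body=payload),     # create a dataset
    dataset_request("GET", 42),                # get dataset 42
    dataset_request("PUT", 42, body=payload),  # update dataset 42
    dataset_request("DELETE", 42),             # remove dataset 42
]
for req in examples:
    print(req.get_method(), req.full_url)
```

Passing a request to `urllib.request.urlopen` would actually send it; that step is left out because no real service is assumed here.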

Chemical compounds in a dataset

Description	Method	URI	Parameters	Result	Status codes
get compounds	HEAD	/dataset/{id}/compound	-	List of URI to structures as in dataset.xsd	200,404,503
get compounds	GET	/dataset/{id}/compound	Preferred MIME type	Representation in one of supported MIME formats	200,404,503
get compound	GET	/dataset/{id}/compound/{id2}	Preferred MIME type	Representation in one of supported MIME formats	200,404,503
add compound	POST	/dataset/{id}/compound/	Representation in a supported MIME format	New URI /dataset/{id}/compound/{id2}	200,400,404,503
update compound	PUT	/dataset/{id}/compound/{id2}	Representation in a supported MIME format	-	200,400,404,503
remove a compound from a dataset	DELETE	/dataset/{id}/compound/{id2}	-	-	200,404,503
remove all compounds from a dataset	DELETE	/dataset/{id}/compound	-	-	200,404,503
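The intended behaviour of the compound operations can be sketched with a toy in-memory model. This is only an illustration of the table, not a service implementation: the class name `CompoundCollection`, the sequential id scheme, and the use of SMILES strings as representations are all assumptions.

```python
import itertools


class CompoundCollection:
    """Toy in-memory stand-in for the compounds of one dataset."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id
        self._compounds = {}
        self._ids = itertools.count(1)  # hypothetical id scheme: 1, 2, 3, ...

    def uri(self, compound_id):
        return f"/dataset/{self.dataset_id}/compound/{compound_id}"

    def add(self, representation):
        """POST /dataset/{id}/compound/ -> 200 and the new compound URI."""
        compound_id = next(self._ids)
        self._compounds[compound_id] = representation
        return 200, self.uri(compound_id)

    def get(self, compound_id):
        """GET one compound -> 200 and its representation, or 404."""
        if compound_id not in self._compounds:
            return 404, None
        return 200, self._compounds[compound_id]

    def update(self, compound_id, representation):
        """PUT one compound -> 200, or 404 if the id is unknown."""
        if compound_id not in self._compounds:
            return 404, None
        self._compounds[compound_id] = representation
        return 200, None

    def remove(self, compound_id=None):
        """DELETE one compound, or all of them when no id is given."""
        if compound_id is None:
            self._compounds.clear()
            return 200, None
        if compound_id not in self._compounds:
            return 404, None
        del self._compounds[compound_id]
        return 200, None


compounds = CompoundCollection(dataset_id=7)
status, uri = compounds.add("CCO")   # ethanol as a SMILES string (assumed format)
print(status, uri)
print(compounds.remove(99))          # unknown compound id
```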

Conformers in a dataset (optional)

Description	Method	URI	Parameters	Result	Status codes
get conformers	HEAD	/dataset/{id}/compound/{id2}/conformers	-	List of URI to conformers as in dataset.xsd	200,404,503
get conformers	GET	/dataset/{id}/compound/{id2}/conformers	Preferred MIME type	Representation in one of supported MIME formats	200,404,503
get conformer	GET	/dataset/{id}/compound/{id2}/conformer/{id3}	Preferred MIME type	Representation in one of supported MIME formats	200,404,503
add conformer	POST	/dataset/{id}/compound/{id2}	Representation in a supported MIME format	New URI /dataset/{id}/compound/{id2}/conformer/{id3}	200,400,404,503
update conformer	PUT	/dataset/{id}/compound/{id2}/conformer/{id3}	Representation in a supported MIME format	-	200,400,404,503
remove conformers	DELETE	/dataset/{id}/compound/{id2}/conformer/{ids}	-	-	200,404,503
remove all conformers	DELETE	/dataset/{id}/compound/{id2}/conformer	-	-	200,404,503

Features in a dataset

Description	Method	URI	Parameters	Result	Status codes
get feature definitions	HEAD	/dataset/{id}/feature_definition	-	List of URI of features as in datasets.xsd	200,404,503
get feature definitions	GET	/dataset/{id}/feature_definition	-	XML schema for Feature Definition object	200,404,503
get feature definition	GET	/dataset/{id}/feature_definition/{id2}	-	XML schema for Feature Definition object	200,404,503
add feature definition	PUT	/dataset/{id}/feature_definition/	XML schema for Feature Definition object	New URI /dataset/{id}/feature_definition/{id2}	200,400,404,503
update feature definition	PUT	/dataset/{id}/feature_definition/{id2}	XML schema for Feature Definition object	-	200,400,404,503
remove feature definition	DELETE	/dataset/{id}/feature_definition	-	-	200,404,503

Actions on datasets (split, merge, subset)

Description	Method	URI	Parameters	Result	Status codes
split	PUT	? /split/dataset/{id}/	split parameters (e.g. crossvalidation folds)	List of new dataset URI as in datasets.xsd	200,404,503
merge	PUT	? /merge/dataset	List of dataset URI to be merged as in datasets.xsd	Merged dataset URI /dataset/{id}	200,404,503

Alternative: split and merge can be considered special cases of "create dataset", with specific input parameters

Description	Method	URI	Parameters	Result	Status codes
create a new empty dataset	PUT	/dataset/	None	New URI /dataset/{id}	200,400,503
split an existing dataset	PUT	/dataset/	URI of dataset to split & parameters	New URI /dataset/{id}	200,400,503
merge datasets (union)	PUT	/dataset/	List of dataset URI to be merged as in datasets.xsd	New URI /dataset/{id}	200,400,503

Queries

Description	Method	URI	Parameters	Result	Status codes
given a compound, retrieve congeneric chemicals	GET	TODO	-	new URI /dataset/{id}	200,404,503
given a compound, retrieve similar chemicals	GET	TODO	-	new URI /dataset/{id}	200,404,503
retrieve chemicals that have data for a given endpoint	GET	TODO	-	new URI /dataset/{id}	200,404,503
search within a dataset	GET	/dataset/{datasetid}/query	Parameters TODO	new URI /dataset/{id}	200,404,503
more - TODO

HTTP status codes

Interpretation	Code	Name
Success	200	OK
Dataset not found	404	Not Found
Incorrect MIME type	400	Bad Request
Service not available	503	Service Unavailable
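A client could turn these codes into readable messages roughly as follows. This is a minimal sketch: the `interpret` helper and the fallback wording are assumptions, while the standard reason phrases come from Python's `http.HTTPStatus`.

```python
from http import HTTPStatus

# Interpretations copied from the table above
INTERPRETATION = {
    HTTPStatus.OK: "Success",
    HTTPStatus.NOT_FOUND: "Dataset not found",
    HTTPStatus.BAD_REQUEST: "Incorrect MIME type",
    HTTPStatus.SERVICE_UNAVAILABLE: "Service not available",
}


def interpret(code):
    """Describe a status code using the dataset API's interpretation."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code}: not a status code known to Python"
    meaning = INTERPRETATION.get(status, "not listed in the table above")
    return f"{status.value} {status.phrase}: {meaning}"


for code in (200, 400, 404, 503):
    print(interpret(code))
```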

Dataset representation

XML schema for Dataset object


Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Sep 11, 2009 01:21 PM

I propose to
* remove the split, merge and subset dataset-options
* add the following commands:

desc: copy a dataset while excluding compounds of the original dataset
method: POST
uri: /dataset/{id}/copy
params: exclude_compounds (comma-separated list of compound ids)
return: URI of new dataset

desc: copy a dataset while including only selected compounds of the original dataset
method: POST
uri: /dataset/{id}/copy
params: include_compounds (comma-separated list of compound ids)
return: URI of new dataset

The old split and merge functions have the disadvantage that every dataset service has to provide them, each with exactly the same behaviour.
The new copy functions allow test and training datasets to be created efficiently (you do not have to add or remove each compound individually). They also ensure that dataset splitting has to be implemented only once (by the validation component) and is easy to reproduce.
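To make the idea concrete, here is a sketch of how a validation component might derive the copy parameters for a k-fold cross-validation. Only the parameter names `exclude_compounds` and `include_compounds` come from the proposal; the function name `fold_copy_params` and the fold assignment (every n-th compound) are assumptions.

```python
from urllib.parse import urlencode


def fold_copy_params(compound_ids, n_folds):
    """Yield (training_params, test_params) for each cross-validation fold.

    Each dict is meant as the parameters of one POST /dataset/{id}/copy call.
    """
    for k in range(n_folds):
        # Deterministic assignment: every n-th compound, starting at offset k
        test_ids = compound_ids[k::n_folds]
        id_list = ",".join(str(c) for c in test_ids)
        training = {"exclude_compounds": id_list}   # everything but the fold
        test = {"include_compounds": id_list}       # only the fold
        yield training, test


compound_ids = [1, 2, 3, 4, 5, 6]
for training, test in fold_copy_params(compound_ids, n_folds=3):
    print(urlencode(training), "|", urlencode(test))
```

Each fold needs two requests regardless of how many compounds the dataset holds, and because the assignment is deterministic, the same split can be reproduced later.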

Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Sep 14, 2009 11:29 AM

As discussed with Nina on Friday, the changes I suggested have two shortcomings:
* it should be possible to create a new dataset by copying sets of compounds from various (more than one) other datasets
* instead of using compound ids, one should use a list of compound URIs

Therefore my new suggestion would be to simply extend the already existing add and remove POST-commands to accept a list of compound URIs instead of just one. It should further be possible to 'add' a complete dataset (param is dataset-uri, all compounds of this dataset are added).

This would still allow, for example, the creation of a test dataset with only a few HTTP requests, independent of the number of compounds.
The disadvantage is that a lot of redundant information (the URI prefix) is transferred if all compounds have the same location.
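The size of that overhead can be estimated with a short sketch. The example URIs, the `prefix_overhead` helper, and the newline-separated request body are assumptions for illustration; the suggestion above does not fix a body format.

```python
import os.path


def prefix_overhead(uris):
    """Return the shared URI prefix and the bytes spent repeating it."""
    prefix = os.path.commonprefix(uris)
    # commonprefix works per character, so cut back to the last path separator
    prefix = prefix[: prefix.rfind("/") + 1]
    repeats = max(len(uris) - 1, 0)   # the first occurrence is not redundant
    return prefix, len(prefix.encode("utf-8")) * repeats


# Hypothetical compound URIs that all live in the same dataset service
uris = [f"http://example.org/dataset/1/compound/{i}" for i in (1, 12, 13)]
body = "\n".join(uris)   # assumed body format: one URI per line

prefix, redundant = prefix_overhead(uris)
print(prefix)
print(f"{redundant} of {len(body.encode('utf-8'))} bytes repeat the prefix")
```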

Any comments?
