
Dataset


Provides access to chemical compounds and their features (e.g. structural, physico-chemical, biological and toxicological properties).

REST operations

Dataset

Get a list of available datasets
  • GET /dataset
  • Parameters: query parameters (optional, to be defined by service providers)
  • Result: list of dataset URIs, or RDF for the metadata only
  • Status codes: 200, 404, 503

Get a dataset
  • GET /dataset/{id}
  • Parameters: none
  • Result: representation of the dataset in a supported MIME type
  • Status codes: 200, 404, 503

Query a dataset
  • GET /dataset/{id}
  • Parameters: compound_uris[] and/or feature_uris[] to select compounds and features; further query parameters may be defined by service providers
  • Result: representation of the query result in a supported MIME type
  • Status codes: 200, 404, 503

Get metadata for a dataset
  • GET /dataset/{id}/metadata
  • Parameters: none
  • Result: representation of the dataset metadata in a supported MIME type
  • Status codes: 200, 404, 503

Get a list of all compounds in a dataset
  • GET /dataset/{id}/compounds
  • Parameters: none
  • Result: list of compound URIs
  • Status codes: 200, 404, 503

Get a list of all features in a dataset
  • GET /dataset/{id}/features
  • Parameters: none
  • Result: RDF, or a list of feature URIs (pointing to feature definitions/ontologies)
  • Status codes: 200, 404, 503

Create a new dataset
  • POST /dataset
  • Parameters: dataset representation in a supported MIME type, to be specified via the Content-type header:
      • Content-type: application/x-www-form-urlencoded - the dataset_uri, feature_uris[] and compound_uris[] parameters specify a subset of a dataset, as in the GET operation
      • File upload via Content-type: multipart/form-data - file parameter
      • File upload metadata: parameters as in opentox.owl
  • Result: new URI /dataset/{id}, or a redirect to a task URI (for large uploads)
  • Status codes: 200, 202, 400, 503

Update a dataset
  • PUT /dataset/{id}
  • Parameters: dataset representation in a supported MIME type; entries for existing compound/feature pairs are overwritten, entries for new compounds/features are added:
      • Content-type: application/x-www-form-urlencoded - the dataset_uri, feature_uris[] and compound_uris[] parameters specify a subset of a dataset, as in the GET operation
      • File upload via Content-type: multipart/form-data - file parameter
      • File upload metadata: Dublin Core annotation parameters, as in opentox.owl#Dataset
  • Result: dataset URI or task URI
  • Status codes: 200, 202, 400, 404, 503

Remove a dataset
  • DELETE /dataset/{id}
  • Parameters: none
  • Result: none
  • Status codes: 200, 404, 503

Remove a part of the dataset
  • DELETE /dataset/{id}
  • Parameters: compound_uris[] and/or feature_uris[]; further query parameters may be defined to select the data to be deleted
  • Result: none
  • Status codes: 200, 404, 503
  • NOTE: HTTP DELETE does not allow a request body (at least in Restlet), so this functionality can only be implemented with compound_uris[] and feature_uris[] as query parameters, which may result in a long URL. How should partial delete be redesigned?
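The PUT semantics above (overwrite existing compound/feature pairs, add new ones, leave the rest untouched) can be sketched with an in-memory stand-in for a dataset; this is only an illustration of the merge behaviour, not part of the API itself:

```python
# A dataset modeled as {(compound_uri, feature_uri): value} - an illustrative
# in-memory stand-in for a dataset service, not an OpenTox data structure.
dataset = {
    ("c1", "f1"): 0.5,
    ("c1", "f2"): "active",
}

update = {
    ("c1", "f1"): 0.7,   # existing compound/feature pair: value is overwritten
    ("c2", "f1"): 0.1,   # new compound: entry is added
}

# PUT semantics per the table above: overwrite existing pairs, add new
# entries; untouched entries are preserved.
dataset.update(update)
print(dataset)
```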

Dataset representation

RDF specification

Metadata

  • RDF dataset representation (Dublin Core properties only)
  • File upload metadata via Content-type: multipart/form-data: parameters as in opentox.owl (to verify: whether fully qualified dc: properties can be used as parameter names)

Features

RDF dataset representation


Supported MIME types:

Mandatory:

  • application/rdf+xml (default)
  • application/x-www-form-urlencoded
  • multipart/form-data

Optional:

  • other RDF serialization formats
  • application/xml
  • text/xml
  • text/x-yaml
  • text/x-json
  • application/json
  • text/csv
  • text/arff
  • text/html
  • chemical/x-mdl-sdfile
  • ...
  • multipart/form-data for upload (open issue: the name of the file upload field still needs to be fixed)

HTTP status codes

  • 200 OK - success
  • 202 Accepted - asynchronous task started
  • 400 Bad Request - incorrect MIME type
  • 404 Not Found - dataset not found
  • 503 Service Unavailable - service not available

Queries

Subsets of a dataset (e.g. all data for a certain feature, or all data for a set of compounds) are accessed through query parameters. This allows full URIs to be passed as parameters and circumvents the problem of non-unique IDs (e.g. for /dataset/{id}/compound/{compound_id} URIs). The query parameters compound_uris[] and feature_uris[] are mandatory; more advanced queries (e.g. similarity searches) may be implemented by individual services.
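A query URL for such a subset can be assembled as follows; this is a minimal sketch, and the service and compound/feature URIs are placeholders, not real endpoints:

```python
from urllib.parse import urlencode

# Placeholder URIs - my_dataset_service etc. are illustrative, not real hosts.
dataset = "http://my_dataset_service/dataset/1"
params = [
    ("compound_uris[]", "http://my_compound_service/compound/7"),
    ("feature_uris[]", "http://my_feature_service/feature/42"),
]

# urlencode percent-encodes the full URIs so they can safely be passed as
# query-parameter values (':' and '/' are reserved characters in queries).
query_url = f"{dataset}?{urlencode(params)}"
print(query_url)
```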

Examples:

Get all features of two compounds
curl -X GET "http://my_dataset_service/dataset_id?compound_uris[]=compound1_uri&compound_uris[]=compound2_uri"
Get a single feature of a single compound
curl -X GET "http://my_dataset_service/dataset_id?compound_uris[]=compound_uri&feature_uris[]=feature_uri"
Remove a compound from a dataset
curl -X DELETE "http://my_dataset_service/dataset_id?compound_uris[]=compound_uri"
Upload an SDF file to the AMBIT server
 curl -X POST -H 'Content-Type:chemical/x-mdl-sdfile' --data-binary @filename.sdf http://ambit.uni-plovdiv.bg:8080/ambit2/dataset
Get compound URIs of a dataset
 curl -X GET -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/dataset_id
Combined with a little RDF processing, queries can also be used for set operations (e.g. subset, split, merge, intersection).

Note: take care to URL-encode parameters that are sent via GET.
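The set operations mentioned above can be performed directly on the compound URI lists that the service returns as text/uri-list (one URI per line, '#' starting a comment line, per RFC 2483). A small sketch with made-up payloads:

```python
# Parse a text/uri-list body: one URI per line, '#' lines are comments
# (RFC 2483). The sample payloads below are made up for illustration.
def parse_uri_list(body: str) -> set:
    return {
        line.strip()
        for line in body.splitlines()
        if line.strip() and not line.startswith("#")
    }

dataset_a = parse_uri_list("""# compounds of dataset A
http://example.org/compound/1
http://example.org/compound/2
http://example.org/compound/3
""")
dataset_b = parse_uri_list("""http://example.org/compound/2
http://example.org/compound/4
""")

# Set operations on the compound URI lists: intersection and merge.
common = dataset_a & dataset_b
merged = dataset_a | dataset_b
print(len(common), len(merged))
```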

 

Proposal

Introduce a copy/clone operation on datasets.

Feature ontologies

The feature URI points to a Feature object, which can be retrieved as RDF and provides information about the name, units, source and type of the feature. The feature type is denoted by a mandatory link to an ontology, either via owl:sameAs or by directly subclassing a class from that ontology.

 

This allows a Feature URI to point either directly to an existing (fixed) ontology, or to a web service providing access to dynamically created Feature objects.

 

Conformers

Conformer URIs (see the Compound API) can be used instead of compound URIs. Resolving the parent structure should be done via the compound web service.

Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Sep 28, 2009 02:06 PM
I propose to
* remove the split, merge and subset dataset-options
* add the following commands:

desc: copy a dataset while excluding compounds of the orig dataset
method: POST
uri: /dataset/{i}/copy
params: exclude_compounds (comma-separated list of compound-ids)
return: uri of new dataset

desc: copy a dataset while including compounds of the orig dataset
method: POST
uri: /dataset/{i}/copy
params: include_compounds (comma-separated list of compound-ids)
return: uri of new dataset

The old split and merge functions have the disadvantage that each dataset service has to provide these functions with exactly the same functionality.
The new copy functions allow efficient creation of test and training datasets (you do not have to add/remove each compound on its own), and ensure that the dataset splits have to be implemented only once (by the validation component) and are easy to reproduce.

Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Sep 28, 2009 02:06 PM
As discussed with Nina on Friday, the changes I suggested have two shortcomings:
* it should be possible to create a new dataset by copying sets of compounds from various (more than one) other datasets
* instead of using compound ids, one should use a list of compound URIs

Therefore my new suggestion would be to simply extend the already existing add and remove POST commands to accept a list of compound URIs instead of just one. It should further be possible to 'add' a complete dataset (the parameter is a dataset URI; all compounds of this dataset are added).

This would still allow, for example, the creation of a test dataset with only a few HTTP requests, independent of the number of compounds.
The disadvantage is that a lot of redundant information (the URI prefix) is transferred if all compounds have the same location.

Any comments?
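As an illustration only (this endpoint and payload are Martin's proposal, not part of the published API), the suggested list-of-URIs request body could be composed as a text/uri-list, which also makes the redundant-prefix overhead he mentions easy to quantify:

```python
# Sketch of a text/uri-list body for the proposed extended 'add' command,
# one compound URI per line. The URIs below are placeholders.
compound_uris = [
    "http://example.org/compound/1",
    "http://example.org/compound/2",
    "http://example.org/compound/3",
]
body = "\n".join(compound_uris) + "\n"

# The redundancy Martin mentions: the shared prefix dominates the payload
# when all compounds live on the same service.
shared_prefix = "http://example.org/compound/"
redundant_fraction = len(shared_prefix) * len(compound_uris) / len(body)
print(round(redundant_fraction, 2))
```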

Efficient creation of datasets for validation purposes

Posted by Helma Christoph at Sep 30, 2009 03:12 PM
Do you think that would give you a significant performance benefit over getting the complete dataset and re-inserting the splits? You will have the same number of database reads/inserts (even more, in fact, because you have to find the features for the re-inserted compounds), but you may save a little on the transfer of features (which you might need for y-scrambling anyway).

In the long term we might need set operations not only for validation, but also e.g. for the aggregation of datasets of different origins, the definition of composite endpoints, ... I am not sure whether they should belong to the dataset API or to the algorithm API.

Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Oct 05, 2009 04:54 PM
> Do you think, that would give you a significant performance benefit [...]

I would guess so, but that may depend on the connection between the client and the database service. I think we can test it later; I don't think it's too much effort to implement this.

Efficient creation of datasets for validation purposes

Posted by Jeliazkova Nina at Oct 01, 2009 05:29 PM
Martin,

In your suggestion to extend the already existing add and remove POST-commands to accept a list of compound URIs instead of just one, what do you think should be the preferred format for the list - uri-list, xml (as in dataset schema), or something else?

Efficient creation of datasets for validation purposes

Posted by Martin Gütlein at Oct 05, 2009 04:56 PM
I think we should use the same default output format as when requesting a set of compounds from the service via GET. I would prefer simple URI lists to XML.

Dataset Features ?

Posted by Surajit Ray at Sep 19, 2010 08:34 PM
In our workflow we seem to require the ability to create a feature for a dataset as a whole. Can this be accommodated?

For example, when we generate an MCSS dictionary for a particular set of compounds, it cannot be called a compound feature. The feature is valid only for that particular dataset, not for the individual compounds.