Dataset
Provides access to chemical compounds and their features (e.g. structural, physical-chemical, biological, toxicological properties)
REST operations
Dataset
Description | Method | URI | Parameters | Result | Status codes |
---|---|---|---|---|---|
Get a list of available datasets | GET | /dataset | Query parameters (optional, to be defined by service providers) | List of URIs or RDF for the metadata only | 200, 404, 503 |
Get a dataset | GET | /dataset/{id} | - | Representation of the dataset in a supported MIME type | 200, 404, 503 |
Query a dataset | GET | /dataset/{id} | compound_uris[] and/or feature_uris[] to select compounds and features; further query parameters may be defined by service providers | Representation of the query result in a supported MIME type | 200, 404, 503 |
Get metadata for a dataset | GET | /dataset/{id}/metadata | - | Representation of the dataset metadata in a supported MIME type | 200, 404, 503 |
Get a list of all compounds in a dataset | GET | /dataset/{id}/compounds | - | List of compound URIs | 200, 404, 503 |
Get a list of all features in a dataset | GET | /dataset/{id}/features | - | RDF or list of feature URIs (pointing to feature definitions/ontologies) | 200, 404, 503 |
Create a new dataset | POST | /dataset | Dataset representation in a supported MIME type. The MIME type is specified via the Content-type header. | New URI /dataset/{id} or redirect to task URI (for large uploads) | 200, 202, 400, 503 |
Update a dataset | PUT | | | Dataset URI or task URI | 200, 202, 400, 404, 503 |
Remove a dataset | DELETE | /dataset/{id} | - | - | 200, 404, 503 |
Remove a part of the dataset | DELETE | /dataset/{id} | compound_uris[] and/or feature_uris[]; further query parameters may be defined to select the data to be deleted. NOTE: HTTP DELETE does not allow a body to be passed (at least in Restlet), so this functionality can only be implemented with compound_uris[] and feature_uris[] as query parameters, which may result in a long URL. Open question: how to redesign partial delete? | - | 200, 404, 503 |
Dataset representation
RDF specification
Metadata
- RDF dataset representation (Dublin core properties only)
- Content-type: multipart/form-data. File upload metadata: parameters as in opentox.owl (to be verified: can fully qualified dc: properties be used as parameter names?)
Features
Supported MIME types:
Mandatory:
Optional:
- other RDF serialization formats
- application/xml
- text/xml
- text/x-yaml
- text/x-json
- application/json
- text/csv
- text/arff
- text/html
- chemical/x-mdl-sdfile
- ...
- multipart/form-data for upload - we need to fix the name of the file upload field
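A client chooses among these representations through HTTP headers: Accept when retrieving a dataset and Content-type when uploading one. The following is a minimal sketch using only the Python standard library; the service URI, the function names and the placeholder payload are assumptions for illustration, and the requests are only prepared, not sent.

```python
from urllib.request import Request


def get_dataset_request(dataset_uri, mime_type):
    """Prepare a GET request that asks for one representation via the Accept header."""
    return Request(dataset_uri, headers={"Accept": mime_type}, method="GET")


def upload_dataset_request(service_uri, payload, mime_type):
    """Prepare a POST request whose Content-type header declares the upload format."""
    return Request(service_uri, data=payload,
                   headers={"Content-type": mime_type}, method="POST")


# Hypothetical service URI and placeholder payload, for illustration only.
get_req = get_dataset_request("http://my_dataset_service/dataset/1", "text/csv")
post_req = upload_dataset_request("http://my_dataset_service/dataset",
                                  b"placeholder SDF content",
                                  "chemical/x-mdl-sdfile")

print(get_req.get_method(), get_req.get_header("Accept"))
print(post_req.get_method(), post_req.get_header("Content-type"))
```

Passing either object to urllib.request.urlopen would send the request; whether a given service honours a particular MIME type depends on which of the optional formats it implements.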
HTTP status codes
Interpretation | Nr | Name |
---|---|---|
Success | 200 | OK |
Asynchronous task started | 202 | Accepted |
Dataset not found | 404 | Not Found |
Incorrect MIME type | 400 | Bad request |
Service not available | 503 | Service unavailable |
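A client might translate these status codes into follow-up actions along the following lines. This is only a sketch: the action labels and the function name are assumptions, not part of the API.

```python
def interpret_status(code):
    """Map a dataset service status code to a suggested client action."""
    actions = {
        200: "done",         # OK: the response body holds the result
        202: "poll task",    # Accepted: asynchronous task started, follow the task URI
        400: "fix request",  # Bad request: e.g. incorrect MIME type
        404: "not found",    # Not Found: the dataset does not exist
        503: "retry later",  # Service unavailable
    }
    # Any code outside the table above is not covered by this specification.
    return actions.get(code, "unexpected")


for code in (200, 202, 400, 404, 503):
    print(code, interpret_status(code))
```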
Queries
Subsets of a dataset (e.g. all data for a certain feature, or all data for a set of compounds) are accessed through query parameters. This allows full URIs to be passed as parameters and circumvents the problem of non-unique IDs (e.g. for /dataset/{id}/compound/{compound_id} URIs). Support for the query parameters compound_uris[] and feature_uris[] is mandatory; more advanced queries (e.g. similarity searches) may be implemented by individual services. Examples:
- Get all features of two compounds
- curl -g -X GET "http://my_dataset_service/dataset_id?compound_uris[]=compound1_uri&compound_uris[]=compound2_uri"
- Get a single feature of a single compound
- curl -g -X GET "http://my_dataset_service/dataset_id?compound_uris[]=compound_uri&feature_uris[]=feature_uri"
- Remove a compound from a dataset
- curl -g -X DELETE "http://my_dataset_service/dataset_id?compound_uris[]=compound_uri"
- Upload an SDF file to the ambit server
- curl -X POST -H 'Content-Type:chemical/x-mdl-sdfile' --data-binary @filename.sdf http://ambit.uni-plovdiv.bg:8080/ambit2/dataset
- Get compound URIs of a dataset
- curl -X GET -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/dataset_id
PS: Take care to URI-encode parameters that are sent via GET.
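The URI encoding mentioned above can be left to a library. The sketch below builds a query URL with Python's standard urllib.parse; the host names and the function name are hypothetical, and the parameter names compound_uris[] and feature_uris[] are the ones listed in the REST operations table.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def dataset_query_url(dataset_uri, compound_uris=(), feature_uris=()):
    """Build a query URL that selects compounds and/or features of a dataset."""
    params = [("compound_uris[]", uri) for uri in compound_uris]
    params += [("feature_uris[]", uri) for uri in feature_uris]
    if not params:
        return dataset_uri
    # urlencode percent-encodes names and values, so full URIs containing
    # ':', '/', '?' or '&' survive as parameter values.
    return dataset_uri + "?" + urlencode(params)


# Hypothetical URIs, for illustration only.
url = dataset_query_url(
    "http://my_dataset_service/dataset/1",
    compound_uris=["http://my_compound_service/compound/1"],
    feature_uris=["http://my_feature_service/feature/2?x=1&y=2"],
)
print(url)

# Decoding the query string recovers the original URIs unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Because the brackets in the parameter names are percent-encoded as well, the resulting URL can be passed to command-line clients without further quoting of those characters.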
Proposal
Introduce copy/clone operation on dataset
Feature ontologies
A feature URI points to a Feature object, which can be retrieved as RDF and provides the name, units, source and type of the feature. The feature type is denoted by a mandatory link to an ontology, either via owl:sameAs or by directly subclassing a class from an ontology.
This allows a feature URI to point either directly to an existing (fixed) ontology or to a web service that provides access to dynamically created Feature objects.
Efficient creation of datasets for validation purposes
* remove the split, merge and subset dataset options
* add the following commands:
desc: copy a dataset while excluding compounds of the original dataset
method: POST
uri: /dataset/{id}/copy
params: exclude_compounds (comma-separated list of compound IDs)
return: URI of new dataset
desc: copy a dataset while including compounds of the original dataset
method: POST
uri: /dataset/{id}/copy
params: include_compounds (comma-separated list of compound IDs)
return: URI of new dataset
The old split and merge functions have the disadvantage that each dataset service has to provide these functions with exactly the same functionality.
The new copy functions allow efficient creation of test and training datasets (compounds do not have to be added or removed one at a time), and ensure that dataset splitting has to be implemented only once (by the validation component) and is easy to reproduce.
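As an illustration of how a validation component could use the proposed copy commands, the sketch below derives the two POST requests for one training/test split. It assumes that include_compounds keeps only the listed compounds; the function names, the fold assignment and the dataset URI are hypothetical, and no request is actually sent.

```python
def k_fold(compound_ids, k):
    """Assign compound IDs to k folds in round-robin order (deterministic)."""
    return [compound_ids[i::k] for i in range(k)]


def copy_requests(dataset_uri, test_compound_ids):
    """Return (uri, params) pairs for the two POSTs that create one split."""
    copy_uri = dataset_uri.rstrip("/") + "/copy"
    ids = ",".join(test_compound_ids)  # comma-separated list, as proposed
    training = (copy_uri, {"exclude_compounds": ids})
    test = (copy_uri, {"include_compounds": ids})
    return training, test


# Hypothetical dataset URI and compound IDs, for illustration only.
folds = k_fold(["c1", "c2", "c3", "c4", "c5"], 2)
training, test = copy_requests("http://my_dataset_service/dataset/1", folds[1])
print(training)
print(test)
```

Because the fold assignment lives in the client, the same split can be reproduced against any dataset service that implements the copy command.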