API proposal for applicability domain estimation
An API proposal that attempts to unify different approaches to applicability domain (AD) estimation.
Applicability domain in the OpenTox framework:
- An applicability domain procedure is an OpenTox Algorithm.
- An applicability domain "model" is created by POSTing a dataset URI to an applicability domain algorithm URI. This creates an ot:Model with type ota:ApplicabilityDomain and returns an "AD-model" URI.
- Alternatively, for an AD embedded in a predictive model, just declare ota:ApplicabilityDomain as an additional rdf:type of the model.
- An applicability domain estimation is done by POSTing a dataset to the "AD-model" URI. This generates another dataset with an extra feature telling whether the corresponding compound belongs to the applicability domain (or, in fuzzy terms, to what degree it belongs to that set).
- For models with an embedded AD, POSTing a dataset to the model generates both prediction results and AD estimates.
- All models provide the estimation results as specified below.
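The two POST steps above can be sketched as follows. This is only an illustration under stated assumptions: the form parameter name `dataset_uri`, the `text/uri-list` response type, and all `example.org` URIs are assumptions not given in this proposal. The requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(target_uri: str, dataset_uri: str) -> Request:
    """Build (but do not send) a form-encoded POST carrying a dataset URI."""
    # Assumption: the service reads the dataset from a "dataset_uri" form field.
    body = urlencode({"dataset_uri": dataset_uri}).encode("ascii")
    return Request(
        target_uri,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "text/uri-list",  # assumed: the reply is the URI of the result
        },
    )


# Hypothetical service URIs, for illustration only.
AD_ALGORITHM = "http://example.org/algorithm/leverage"
TRAINING_SET = "http://example.org/dataset/1"
QUERY_SET = "http://example.org/dataset/2"

# Step 1: POST the training dataset to the AD algorithm to create an AD model.
create_model = build_post(AD_ALGORITHM, TRAINING_SET)

# Step 2: POST a query dataset to the AD model to obtain AD estimates.
# The URI below stands in for whatever step 1 would actually return.
ad_model_uri = "http://example.org/model/leverage-ad-model"
estimate = build_post(ad_model_uri, QUERY_SET)
```

Passing either request to `urllib.request.urlopen` would perform the call. For a model with an embedded AD, the second step would target the predictive model URI itself.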
Applicability domain RDF representation:
A predictive model can be assigned an external or an embedded applicability domain.
- In case of an AD external to the model:
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ot:  <http://www.opentox.org/api/1.1#> .
    @prefix ota: <http://www.opentox.org/algorithmTypes.owl#> .

    </model/mlr-model> ot:hasDomain </model/leverage-ad-model> .
    </model/mlr-model> rdf:type ot:Model .
    </model/mlr-model> ot:algorithm </algorithm/mlr> .
    </algorithm/mlr> rdf:type ot:Algorithm .
    </algorithm/mlr> rdf:type ota:Regression .

    </model/leverage-ad-model> rdf:type ot:Model .
    </model/leverage-ad-model> ot:algorithm </algorithm/leverage> .
    </algorithm/leverage> rdf:type ot:Algorithm .
    </algorithm/leverage> rdf:type ota:ApplicabilityDomain .
- In case of an AD embedded in the model:
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ot:  <http://www.opentox.org/api/1.1#> .
    @prefix ota: <http://www.opentox.org/algorithmTypes.owl#> .

    <lazar-model> ot:hasDomain <lazar-model> .
    <lazar-model> rdf:type ot:Model .
    <lazar-model> ot:algorithm </algorithm/lazar> .
    </algorithm/lazar> rdf:type ot:Algorithm .
    </algorithm/lazar> rdf:type ota:ApplicabilityDomain .
    </algorithm/lazar> rdf:type ota:LazyLearning .
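A client could tell the two cases apart by following ot:hasDomain: if it points back to the model itself, the AD is embedded, otherwise it is external. The sketch below assumes the RDF has already been parsed into plain (subject, predicate, object) tuples. The function name `domain_kind` and the tuple representation are illustrative choices, not part of the proposal.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

# A subset of the triples from the two examples above, as plain strings.
TRIPLES = {
    ("/model/mlr-model", "ot:hasDomain", "/model/leverage-ad-model"),
    ("/model/mlr-model", "rdf:type", "ot:Model"),
    ("/model/leverage-ad-model", "rdf:type", "ot:Model"),
    ("lazar-model", "ot:hasDomain", "lazar-model"),
    ("lazar-model", "rdf:type", "ot:Model"),
}


def domain_kind(model: str, triples: Iterable[Triple]) -> Tuple[str, Optional[str]]:
    """Return ("embedded" | "external" | "none", AD model URI or None)."""
    domains = sorted(
        obj for subj, pred, obj in triples
        if subj == model and pred == "ot:hasDomain"
    )
    if not domains:
        return ("none", None)
    domain = domains[0]  # this sketch only considers one AD per model
    kind = "embedded" if domain == model else "external"
    return (kind, domain)


external = domain_kind("/model/mlr-model", TRIPLES)
embedded = domain_kind("lazar-model", TRIPLES)
```

In the external case, the returned URI is the "AD-model" to which a dataset would be POSTed separately.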
Results from applicability domain estimation
- By analogy with ot:predictedVariables, which specifies the features where prediction results are stored, one can specify which features hold the results of AD estimation (suggestions for better property names than ot:adMembership and ot:adMetric are welcome!):
    @prefix ot: <http://www.opentox.org/api/1.1#> .

    # the estimated value, e.g. leverage
    ot:Model ot:adMetric ot:Feature .

    # the decision for AD membership, based on the estimated value,
    # e.g. "in-domain" if leverage < threshold
    # (the value type still has to be agreed on: boolean, numeric, string, nominal?)
    ot:Model ot:adMembership ot:Feature .
and subsequently use the same ot:dataEntry and ot:FeatureValue RDF constructions, used elsewhere to specify property values, to specify AD results as well:
    @prefix ot:   <http://www.opentox.org/api/1.1#> .
    @prefix dc:   <http://purl.org/dc/elements/1.1/> .
    @prefix :     <http://ambit.uni-plovdiv.bg:8080/ambit2/> .
    @prefix ota:  <http://www.opentox.org/algorithmTypes.owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix ac:   <http://ambit.uni-plovdiv.bg:8080/ambit2/compound/> .
    @prefix ad:   <http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/> .
    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix af:   <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/> .

    ad:1 a ot:Dataset ;
        ot:dataEntry [
            a ot:DataEntry ;
            ot:compound ac:1 ;
            ot:values [
                a ot:FeatureValue ;
                ot:feature af:1 ;
                ot:value "3.14"^^xsd:double
            ] ;
            ot:values [
                a ot:FeatureValue ;
                ot:feature af:9999 ;
                ot:value "0.0"^^xsd:double
            ]
        ] .

    af:1 a ot:Feature , ot:NumericFeature ;
        dc:title "MLR-prediction" ;
        ot:hasSource <http://opentox.ntua.gr/model/mlr> ;
        ot:units "" .

    af:9999 a ot:Feature , ot:NumericFeature ;
        dc:title "AD-leverage" ;
        ot:hasSource <http://opentox.ntua.gr/model/leverage-ad> ;
        ot:units "" .

    ac:1 a ot:Compound .

    ot:NumericFeature a owl:Class ;
        rdfs:subClassOf ot:Feature .
    ot:DataEntry a owl:Class .
    ot:hasSource a owl:ObjectProperty .
    ot:units a owl:DatatypeProperty .
    ot:values a owl:ObjectProperty .
    ot:compound a owl:ObjectProperty .
    dc:title a owl:AnnotationProperty .
    ot:feature a owl:ObjectProperty .
    ot:Dataset a owl:Class .
    dc:description a owl:AnnotationProperty .
    ot:dataEntry a owl:ObjectProperty .
    ot:Compound a owl:Class .
    dc:identifier a owl:AnnotationProperty .
    ot:FeatureValue a owl:Class .
    ot:Feature a owl:Class .
    dc:type a owl:AnnotationProperty .
    ot:value a owl:DatatypeProperty .
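On the client side, such a dataset lets each prediction be read next to its AD metric, and a membership decision be derived from that metric. The following is a minimal sketch, assuming the data entries have already been parsed into plain dictionaries. The helper `pair_results`, the `Row` type, and the threshold of 0.5 are hypothetical. The comparison direction is a parameter because it depends on the metric in use.

```python
from typing import Dict, List, NamedTuple, Optional


class Row(NamedTuple):
    compound: str
    prediction: Optional[float]
    ad_metric: Optional[float]
    in_domain: Optional[bool]


def pair_results(
    entries: List[Dict],
    prediction_feature: str,
    ad_feature: str,
    threshold: float,
    in_domain_if_below: bool = True,
) -> List[Row]:
    """Pair each compound's prediction with its AD metric and a membership flag."""
    rows = []
    for entry in entries:
        values = entry["values"]
        metric = values.get(ad_feature)
        if metric is None:
            member = None  # no AD estimate available for this compound
        elif in_domain_if_below:
            member = metric <= threshold
        else:
            member = metric >= threshold
        rows.append(
            Row(entry["compound"], values.get(prediction_feature), metric, member)
        )
    return rows


# The single data entry from the example above, as plain Python values.
ENTRIES = [{"compound": "ac:1", "values": {"af:1": 3.14, "af:9999": 0.0}}]

# The threshold is an arbitrary illustrative value, not part of the proposal.
rows = pair_results(ENTRIES, "af:1", "af:9999", threshold=0.5)
```

The boolean flag here is only one option for the membership value. A numeric or nominal encoding would fit the same structure once the value type is agreed on.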
There is no difference in the representation of AD results if the AD is embedded in the model itself, except that ot:hasSource for the features representing the predicted values and the AD estimation points to the same ot:Model object:
    ad:1 a ot:Dataset ;
        ot:dataEntry [
            a ot:DataEntry ;
            ot:compound ac:1 ;
            ot:values [
                a ot:FeatureValue ;
                ot:feature af:lazar_prediction ;
                ot:value "1.0"^^xsd:double
            ] ;
            ot:values [
                a ot:FeatureValue ;
                ot:feature af:10000 ;
                ot:value "0.666"^^xsd:double
            ]
        ] .

    af:10000 a ot:Feature , ot:NumericFeature ;
        dc:title "AD-lazar" ;
        ot:hasSource <http://in-silico.ch/model/lazar> ;
        ot:units "" .

    af:lazar_prediction a ot:Feature , ot:NumericFeature ;
        dc:title "prediction-lazar" ;
        ot:hasSource <http://in-silico.ch/model/lazar> ;
        ot:units "" .

    ac:1 a ot:Compound .
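Because the only difference between the two representations is the ot:hasSource of the features, a client can detect an embedded AD by comparing sources. The sketch below assumes the feature metadata has been reduced to a feature-to-source mapping. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# ot:hasSource of the features used in the two examples above.
FEATURE_SOURCES = {
    "af:1": "http://opentox.ntua.gr/model/mlr",
    "af:9999": "http://opentox.ntua.gr/model/leverage-ad",
    "af:lazar_prediction": "http://in-silico.ch/model/lazar",
    "af:10000": "http://in-silico.ch/model/lazar",
}


def features_by_source(feature_sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Group feature identifiers by the model that produced them."""
    grouped = defaultdict(list)
    for feature, source in feature_sources.items():
        grouped[source].append(feature)
    return {source: sorted(features) for source, features in grouped.items()}


def is_embedded(prediction_feature: str, ad_feature: str,
                feature_sources: Dict[str, str]) -> bool:
    """True if the prediction and the AD estimate come from the same model."""
    return feature_sources[prediction_feature] == feature_sources[ad_feature]


grouped = features_by_source(FEATURE_SOURCES)
lazar_embedded = is_embedded("af:lazar_prediction", "af:10000", FEATURE_SOURCES)
mlr_embedded = is_embedded("af:1", "af:9999", FEATURE_SOURCES)
```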