
1. Creation of an OpenTox dataset from a local file

Download the TCAMS subset from the link mentioned in the introduction to this tutorial and save it to your computer. Open the file and take a look at it. The OpenTox dataset service implemented in the Ambit2 application accepts many different file formats for uploading and creating OpenTox datasets (SDF, MOL, SMI, CSV, TXT, ToxML (.xml)). In this tutorial, we will use a CSV file. Note that we won't actually upload the data file containing the 87 TCAMS compounds, so that the same dataset is not uploaded again by every user of this tutorial. We will only take you through the steps of the upload and stop short of the final click on the "Upload" button.
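If you prefer to inspect the CSV file programmatically rather than in a spreadsheet or text editor, a minimal Python sketch such as the following lists the column names and counts the data rows. The column names and sample rows below are hypothetical stand-ins, not the actual contents of the TCAMS file; the sketch also assumes a comma-separated file with a single header row.

```python
import csv
import io


def summarize_csv(handle):
    """Return (column_names, number_of_non_empty_data_rows) for an open CSV file."""
    reader = csv.reader(handle)
    header = next(reader, [])                    # first row is assumed to be the header
    row_count = sum(1 for row in reader if row)  # skip completely empty lines
    return header, row_count


# Stand-in data with hypothetical columns. For the real file, pass
# open("path/to/your/downloaded/file.csv", newline="") instead.
sample = io.StringIO("SMILES,Name\nCCO,ethanol\nc1ccccc1,benzene\n")
header, row_count = summarize_csv(sample)
print(header, row_count)
```

For the downloaded TCAMS subset, the row count should match the 87 compounds mentioned above, assuming the file holds one compound per row.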

To upload the data, open the Ambit2 application in your web browser and navigate to "Datasets" using the links at the top of the page ("Datasets" is the third from the left).


The box at the top left is for adding a new dataset. Browse for the file on your computer, and give a name to the dataset, for example "TCAMS WorkshopMunich Dataset". Select "Match by SMILES" from the dropdown list under "Match". This way, on import the application will check if any of the compounds are already present in the database, based on the SMILES used. IMPORTANT: Don't click "Submit" at the end. The dataset we are working with in this tutorial already exists, and there's no need to add another version of it.
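The idea behind "Match by SMILES" can be illustrated with a short Python sketch. This is not Ambit2's implementation: the function name, the dictionary standing in for the database, and the exact string comparison are all assumptions made for illustration, and a real system may normalise structures before comparing them.

```python
def match_by_smiles(existing, incoming):
    """Match incoming SMILES strings against already known compounds.

    existing: dict mapping SMILES -> compound id (hypothetical stand-in for the database)
    incoming: list of SMILES strings from the file being imported
    Returns a list of (smiles, compound_id, is_new) tuples.
    """
    next_id = max(existing.values(), default=0) + 1
    results = []
    for smiles in incoming:
        if smiles in existing:
            # Compound already present: reuse its identifier.
            results.append((smiles, existing[smiles], False))
        else:
            # Unknown compound: register it under a new identifier.
            existing[smiles] = next_id
            results.append((smiles, next_id, True))
            next_id += 1
    return results


database = {"CCO": 1}
print(match_by_smiles(database, ["CCO", "c1ccccc1"]))
```

In this toy example the first compound is recognised as already present, while the second is registered as new.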

Assuming you have uploaded the data file and named it "TCAMS WorkshopMunich Dataset", let's find the dataset in Ambit2. Stay on, or go back to, the datasets page of Ambit2. The available datasets are listed by the first letter of their names, so navigate to "T" to find the TCAMS WorkshopMunich Dataset.

TCAMS Subset in Ambit2

Click on the dataset title and investigate the dataset. You should find that the CSV table is reproduced in Ambit2. Note the URL of the page that shows the dataset. The part before the question mark is the unique identifier (URI) of the TCAMS subset dataset in OpenTox. Make a note of this URI (e.g. copy and paste it into a text file); we will use it later. Alternatively, keep the current browser tab on the dataset page and continue the tutorial in a new tab.
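Taking the part of the page URL before the question mark can also be done programmatically. The following Python sketch uses the standard library to drop the query string; the host, path, and query parameter in the example URL are hypothetical placeholders, not the actual address of the TCAMS dataset.

```python
from urllib.parse import urlsplit, urlunsplit


def dataset_uri(page_url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(page_url)
    # Keep scheme, host, and path; drop everything from the question mark onwards.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical example URL; use the one shown in your browser instead.
example = "https://ambit.example.org/ambit2/dataset/123?max=100"
print(dataset_uri(example))
```

A URL that contains no question mark is returned unchanged.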


Next step: 2. Selection of models available in OpenTox through ToxPredict

