ToxCreate GUI – outline functionality

1.1 Section: “Step 1”, entering the chemical sample files and data.

  • 1. User navigates to the “ToxCreate” web page or opens the ToxCreate application (if installed locally). It may be appropriate to access the web page via a login so that preferences and results can be saved.
  • 2. User then enters structure as one or more SD files or other supported file formats:

i) Browse: the user double-clicks an empty sample cell in the spreadsheet to open a file browser dialog box and select one or more SDF files (or SD, Mol, SMILES, Cartesian & XYZ files) on the local computer or network. Clicking the “Open” button loads the selected files into the spreadsheet, one sample per sample cell. All associated data is placed in the same row, under the appropriate header, which is created automatically.

 ii) Drag and drop: the user drops an SDF file (or an SD, Mol, SMILES, Cartesian or XYZ file) anywhere on the page. The file is then loaded into the spreadsheet as above.

(This may be too complex for the first implementation, so it may be deferred.)

 iii) If the structure files do not include numerical data for the QSAR, the user can enter this manually after the structure files have been loaded. The user can click in an empty header cell to position the cursor to type a header name. Data values can be entered into the cells below, by placing the cursor in each cell in order to type the value. The “Return” key completes the data entry in that cell and moves the cursor into the next cell below.

1.1.1 File types

ToxCreate will automatically recognize the following file types: SDF, SD, Mol, PDB, SMILES. Files containing a batch of structures must be in SDF format.
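As a rough illustration of the recognition rule above, the sketch below maps a file name to one of the listed types and checks whether a file may hold a batch of structures. Recognition by file extension, the specific extensions, and the names `FORMAT_BY_EXTENSION`, `detect_format` and `accepts_batch` are assumptions made for this example; the outline does not say how ToxCreate identifies a file type.

```python
from pathlib import Path

# Hypothetical extension table; the extensions themselves are assumptions.
FORMAT_BY_EXTENSION = {
    ".sdf": "SDF",
    ".sd": "SD",
    ".mol": "Mol",
    ".pdb": "PDB",
    ".smi": "SMILES",
    ".smiles": "SMILES",
}


def detect_format(filename):
    """Return the format label for a file name, or None if it is not recognized."""
    return FORMAT_BY_EXTENSION.get(Path(filename).suffix.lower())


def accepts_batch(filename):
    """Only SDF files may contain a batch of structures."""
    return detect_format(filename) == "SDF"
```

An unrecognized file returns `None`, which a caller could turn into the “File could not be read due to unrecognized format” alert.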

1.1.2 Clicking in the spreadsheet cells

Clicking in a chemical sample cell (under column A) highlights that cell.

Clicking in any other cell (not under column A) places a text cursor, allowing the user to type data or copy and paste into that cell. Pressing and dragging selects a range of cells.

1.1.3 Browsing to chemical sample files

Double-clicking on an empty chemical sample cell opens the “Browse to…” dialog window so the user can navigate to and select a file to load into that cell. If more than one file is selected, the additional files are added in alphabetical order to the sample cells below the first selected one.

Alternatively, the “Browse to…” dialog window can be opened by selecting the empty sample cell, then clicking the “Browse…” button.

1.1.4 Editing chemical samples

Double-clicking on a sample cell that contains a file name opens that structure in a 2D editor, which can be toggled between 2D and 3D views, alongside a simultaneous 1D (text) editor. Both windows are interactive (edits in one window automatically update the other immediately) and editing is allowed in both (see FasTox). When the 3D view is selected, an option button appears allowing the user to automatically “Convert to 3D”.

Alternatively, the editor can be opened by selecting the sample name, then clicking “Edit selected sample”.

If an empty chemical sample cell is selected, clicking “Edit selected sample” opens an empty editor window allowing the user to draw, name, and save a new structure into that cell. If an existing file is edited, the original file is not overwritten; the edited structure is saved with an incremented suffix number.
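The save rule for edited files can be sketched as follows. The `_N` suffix style and the function name `next_version_name` are assumptions for illustration only; the outline specifies that the suffix is incremented but not how it is written.

```python
import re


def next_version_name(filename, existing_names):
    """Return a name with the first unused numeric suffix; the original is never reused."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    # Continue from an existing _N suffix so that editing "a_1.mol" gives "a_2.mol".
    match = re.fullmatch(r"(.*)_(\d+)", stem)
    base = match.group(1) if match else stem
    n = int(match.group(2)) + 1 if match else 1
    while True:
        candidate = f"{base}_{n}" + (f".{ext}" if dot else "")
        if candidate not in existing_names:
            return candidate
        n += 1
```

For example, editing `aspirin.mol` when `aspirin_1.mol` already exists would save the result as `aspirin_2.mol`.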

1.1.5 Deleting chemical samples

Chemical samples can be deleted by clicking the name to select the cell, then clicking the “Clear selected cells” button. If there are associated fields of data, a dialog box asks the user if the additional data should be deleted or retained. “Clear all cells” clears and resets the whole spreadsheet. The “Back” arrow on the browser allows the user to undo the “Clear all cells” action.

1.1.6 Checking the input file for errors

While a file is loading it is checked for errors, and any resulting alerts appear in a dialog box:

  • “Multiple molecules in sample #, non-relevant molecules should be deleted…”
  • “Incorrect valence on atoms: …”
  • “Unrecognized atom type: …”
  • “Structure has a non-zero net charge of ? (ion)”
  • “Structure for CAS number ??? not found”
  • “File could not be read due to unrecognized format”
  • etc.

Critical errors invoke an error dialog box describing the error and sample number (if appropriate). Non-critical errors do not prevent the sample from being loaded, but are listed as footnotes at the bottom of the spreadsheet and an associated superscript added to the sample name.
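The split between critical and non-critical errors might be modeled as in the sketch below. The `LoadResult` container, the `record_sample` function, and the idea that each error arrives already flagged as critical or not are assumptions for illustration; the outline does not define which errors are critical.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    loaded: dict = field(default_factory=dict)     # sample name -> footnote numbers
    footnotes: list = field(default_factory=list)  # non-critical messages, in order
    rejected: list = field(default_factory=list)   # (sample name, critical message)


def record_sample(result, name, errors):
    """Record one checked sample; errors is a list of (message, is_critical) pairs."""
    critical = [msg for msg, is_critical in errors if is_critical]
    if critical:
        # A critical error blocks loading; the GUI would show an error dialog here.
        result.rejected.append((name, critical[0]))
        return False
    marks = []
    for msg, _ in errors:
        if msg not in result.footnotes:
            result.footnotes.append(msg)
        # The footnote number doubles as the superscript on the sample name.
        marks.append(result.footnotes.index(msg) + 1)
    result.loaded[name] = marks
    return True
```

Identical messages share one footnote, so several samples with the same problem point at the same superscript number.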

Examples of non-critical 3D* errors:

  • Bond distance for atoms X and Y is outside normal range
  • Atoms X and Y are too close
  • Incorrect valence on atoms: C26, C28, etc.
  • Total charge is non-zero
  • more than one molecule in file
  • unrecognized element present
  • incorrect valence on one or more atoms
  • etc.

The error report dialog box has “Edit…” and “Cancel” buttons, and for some errors an “Ignore and continue” button is also active. The “Edit…” button opens a 2D/3D editor with a simultaneous 1D (text) editor. Both windows are interactive (edits in one window automatically update the other immediately) and editing is allowed in both (see Fastox).

1.2 “Step 2”, Start QSAR model creation

This section explains the various fields in “Step 2” of Figure 1.

1.2.1 Rename chemical samples by:

This field shows a list of all fields in the SDF file. The selected field will be used as the sample name in the final results spreadsheet. The default (selected) setting is “Use MDL MOL name”.

1.2.2 Choose data field for QSAR:

A list of the data fields that contain numeric data is shown in this window. The selected field will be used to correlate with the descriptors generated.

1.2.3 Hide Settings / Show settings

On the left side a list of check-box options can be displayed or hidden by clicking to toggle the “Hide settings” or “Show settings” button. By default all check boxes are hidden. “Reset defaults” sets all check boxes to the default settings (as shown).

*3D structures

Assuming that the 3D conversion is fast and that OpenTox would eventually include some 3D QSARs based on QM methods or 3D-substructure searching, then this option should be checked by default. The extra conversion time would be minor compared to the descriptor generation and regression analysis time.

1.2.4 Restore Defaults

Sets all check boxes back to their default values.

1.2.5 Wizard

The Wizard takes the user through a series of steps to select options such as descriptors and regression criteria before running the QSAR model builder (not yet designed).

1.2.6 Time Limit

Time limit sets the maximum time that the system will continue to look for the best correlation. If the time limit is reached before the calculations are finished, then calculations will be stopped and the result will be presented showing the best correlation(s) found in the time allowed.
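A minimal sketch of this behavior is shown below, assuming the search can be expressed as a sequence of candidate models that are scored one at a time. The function name `search_with_time_limit`, its parameters, and the injectable `clock` are hypothetical and chosen only to make the idea testable.

```python
import time


def search_with_time_limit(candidates, score, time_limit_s, clock=time.monotonic):
    """Score candidates in order; stop at the time limit and return the best found so far."""
    deadline = clock() + time_limit_s
    best, best_score = None, float("-inf")
    finished = True
    for candidate in candidates:
        if clock() >= deadline:
            finished = False  # limit reached before every candidate was tried
            break
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score, finished
```

The third return value tells the caller whether the result is the true best or only the best found in the time allowed, which the report could state explicitly.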

1.2.7 Fast QSAR

To save time, only ‘linear’ topological descriptors are computed and used in the regression.

1.2.8 Best QSAR

All descriptors, including their non-linear functions, are computed and used for the regression analysis.

“Step 3”, automatically computing the new QSAR model

1.2.9 Data analysis:

On starting the QSAR analysis, the training data values are analyzed for evenness of spread. If the “skew” value does not fall below a preset threshold, various functions of the data are tested (e.g. reciprocal, square, square root, logarithm) and the new skew values are computed. The user is then presented with the warning “The data to be predicted is not evenly spread and could lead to a misleadingly high r^2”, together with a list of the functions that improve the skew value (best first), and can either select one or continue with the original linear data.
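The skew check described above could look roughly like the following sketch. The skewness definition (third central moment divided by the 1.5 power of the second), the comparison on absolute values, the default threshold of 1.0, and the names `skewness`, `TRANSFORMS` and `suggest_transforms` are all assumptions; the outline only says that a preset threshold is used.

```python
import math


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (one common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


TRANSFORMS = {
    "reciprocal": lambda v: 1.0 / v,
    "square": lambda v: v * v,
    "square root": math.sqrt,
    "logarithm": math.log,
}


def suggest_transforms(values, threshold=1.0):
    """Return [(name, |skew|)] for transforms that reduce |skew|, best first.

    An empty list means the raw data is already within the threshold.
    Assumes all values are positive so that every transform is defined.
    """
    raw = abs(skewness(values))
    if raw < threshold:
        return []
    improved = []
    for name, fn in TRANSFORMS.items():
        s = abs(skewness([fn(v) for v in values]))
        if s < raw:
            improved.append((name, s))
    return sorted(improved, key=lambda item: item[1])
```

For data spanning several orders of magnitude, such as `[1, 10, 100, 1000, 10000]`, the logarithm is ranked first because the transformed values are evenly spaced.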

1.2.10 Descriptors:

For “Best QSAR”, all descriptors are calculated by default including the following:

  1. Topological descriptors including atom and group counts and Kier and Hall indices
  2. Electrotopological descriptors
  3. Quantum descriptors (MOPAC PM6 recommended for speed)
  4. “Density descriptors” of most meaningful descriptors (above), e.g. Number of hydroxyl groups divided by molecular weight (or /volume or /surface area).
  5. Non-linear functions of all above descriptors, e.g. reciprocal, square, square root, logarithm

 “Quick QSAR” uses only linear topological and electrotopological descriptors and “Density descriptors”.

 Descriptors with variance below a preset threshold are removed from the set.
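Two of the steps above, building a “density descriptor” and removing low-variance descriptors, are sketched below. The table layout (a dictionary of descriptor name to column of values), the column names used in the usage note, and the function names are assumptions for illustration.

```python
from statistics import pvariance


def add_density_descriptor(table, numerator, denominator):
    """Add a column such as 'n_OH/MW' holding numerator divided by denominator."""
    name = f"{numerator}/{denominator}"
    table[name] = [a / b for a, b in zip(table[numerator], table[denominator])]
    return name


def drop_low_variance(table, threshold):
    """Keep only descriptor columns whose population variance is at least the threshold."""
    return {name: col for name, col in table.items() if pvariance(col) >= threshold}
```

With a table containing a hydroxyl count `n_OH` and molecular weight `MW`, `add_density_descriptor(table, "n_OH", "MW")` adds the column `n_OH/MW`. Because variance depends on the scale of each descriptor, a single threshold is only meaningful if the columns are on comparable scales or the threshold is chosen per descriptor.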

1.2.11 Regression analysis, systematic

Regression analysis uses all computed descriptors and starts by systematically looking for and saving the best ten single-descriptor models (based on highest r^2). Then it looks for the best two-descriptor models by comparing all combinations of two descriptors, and saves the best ten models. It continues by looking for the best three-descriptor models by comparing all combinations of three descriptors. Finally it looks for the best four-descriptor model.

However, to avoid over-fitting, the number of descriptors in the model is not allowed to exceed the number of training samples divided by 5.
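The systematic search and the over-fitting cap can be sketched as below, using ordinary least squares with an intercept solved through the normal equations. This is an illustration under stated assumptions, not the ToxCreate implementation: the function names, the dictionary layout of `descriptors`, and keeping the best `keep` models at every size are hypothetical. Descriptor subsets are enumerated as combinations, since the order of descriptors does not change a least-squares fit.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """r^2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def systematic_search(descriptors, y, max_size=4, keep=10):
    """Return {size: [(r2, names), ...]} holding the best `keep` models per size."""
    limit = min(max_size, len(y) // 5)  # at most one descriptor per 5 training samples
    best = {}
    for size in range(1, limit + 1):
        scored = []
        for names in combinations(sorted(descriptors), size):
            try:
                scored.append((r_squared([descriptors[n] for n in names], y), names))
            except ValueError:
                continue  # skip perfectly collinear descriptor sets
        best[size] = sorted(scored, reverse=True)[:keep]
    return best
```

With 10 training samples the cap is 10 / 5 = 2, so only one- and two-descriptor models are searched even though `max_size` is 4. The number of subsets grows combinatorially with the descriptor count, which is one reason a time limit on the search is useful.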

1.2.12 Other analyses:

After the systematic regression analysis, other methods such as genetic algorithms and neural nets are tried until the preset time limit is reached.

1.2.13 Stopping the calculation

The best QSAR found so far (highest r^2 adjusted for the number of degrees of freedom) is always displayed on the screen during the computation, along with a progress bar and an estimated time to completion. At any time, the user can stop further calculation (using the “Cancel” or “Stop and Save” button) and accept the model shown. Models that include two descriptors that correlate more than 95% are discarded.
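The two criteria in this paragraph can be sketched as follows. The adjusted r^2 formula is the standard one for a model with an intercept. Reading “correlate more than 95%” as an absolute Pearson correlation above 0.95 is an assumption, as are the function names.

```python
from itertools import combinations
from math import sqrt


def adjusted_r2(r2, n_samples, n_descriptors):
    """Standard adjusted r^2 for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_descriptors - 1)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def has_collinear_pair(columns, limit=0.95):
    """True if any two descriptor columns correlate above the limit in absolute value."""
    return any(abs(pearson(a, b)) > limit for a, b in combinations(columns, 2))
```

A candidate model would be discarded when `has_collinear_pair` is true for its descriptor columns, and the remaining models ranked by `adjusted_r2` so that adding a descriptor is only rewarded when it improves the fit by more than chance would.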

Final QSAR Report

1.4.14 Spreadsheet format report

The data and results in ToxCreate should be presented in an interactive spreadsheet format. Shading the data cells in proportion to the data value is desirable (note green shading):

1.4.15 PDF Format Report

At the end of the calculation a report is created automatically on the “Report” page and as a downloadable PDF file. The following is an example of the style and content of the QSAR report in PDF format:

 

 

 
