FreeTreeMiner

Categories: Descriptor calculation

Exposed methods:

ftm
Input:	2D chemical structure information
Output:	Frequent substructures
Input format:	SD file (MDL Mol)
Output format:	Program specific text files and/or Weka's ARFF format
User-specified parameters:	Minimum support
Reporting information:	Frequent free trees (SMARTS) with occurrence maps, border elements

Description:

The FreeTreeMiner (FTM) software [RUE04] computes all acyclic substructures (in mathematical terms: free or
unrooted trees) that occur with at least a given minimum frequency in a set of molecules. The substructures are
enumerated by a depth-first search. In addition to the minimum frequency (support) constraint, a maximum
frequency constraint can be set. This constraint can refer either to the same database/set or to a second one;
in the latter case FTM returns all substructures that are frequent in the first set and infrequent in the
second. The frequent substructures are returned as SMARTS strings together with their occurrences in the given
set of structures. FTM is implemented in C++ and was developed for the Linux and Mac OS X operating systems.
It depends on the open source chemistry toolbox OpenBabel (http://www.openbabel.org). FTM provides no
graphical user interface (GUI) and is run from the command line. The input format accepted by FTM is the
widely used MDL SD file (SDF), which stores one or more MDL Molfile records (specification URL:
http://www.mdl.com/downloads/public/ctfile/ctfile.jsp). FTM's output formats are program specific plain text
files and/or Weka's [WIT99] ARFF format. For further information, see the original publication [RUE04] and
the website
http://wwwkramer.in.tum.de/research/data_mining/pattern_mining/graph_mining
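
The interplay of the minimum and maximum frequency constraints described above can be illustrated with a small Python sketch. It is not FTM code and does not perform the tree mining itself; it only filters an already computed occurrence map. The function names, the use of relative frequencies (fractions of a set) and the inclusive comparisons are assumptions made for illustration; FTM's actual parameter semantics may differ.

```python
from typing import Dict, List, Optional, Set

# Occurrence map: SMARTS pattern -> indices of the molecules that contain it.
OccurrenceMap = Dict[str, Set[int]]


def frequency(pattern: str, occurrences: OccurrenceMap, n_molecules: int) -> float:
    """Fraction of the molecules in a set that contain the pattern."""
    return len(occurrences.get(pattern, set())) / n_molecules


def select_patterns(
    occ_first: OccurrenceMap,
    n_first: int,
    min_freq: float,
    max_freq: Optional[float] = None,
    occ_second: Optional[OccurrenceMap] = None,
    n_second: Optional[int] = None,
) -> List[str]:
    """Keep patterns that are frequent in the first set and, if max_freq is
    given, infrequent in the second set. Without a second set, the maximum
    constraint is checked against the first set (assumed behaviour)."""
    if occ_second is None:
        occ_second, n_second = occ_first, n_first
    selected = []
    for pattern in sorted(occ_first):
        if frequency(pattern, occ_first, n_first) < min_freq:
            continue  # violates the minimum frequency constraint
        if max_freq is not None and frequency(pattern, occ_second, n_second) > max_freq:
            continue  # violates the maximum frequency constraint
        selected.append(pattern)
    return selected


# Hypothetical toy data: two sets of four molecules each.
actives = {"CC": {0, 1, 2, 3}, "C-N": {0, 1, 2}, "C-O": {3}}
inactives = {"CC": {0, 1, 2, 3}, "C-N": {0}}

result = select_patterns(actives, 4, min_freq=0.5, max_freq=0.25,
                         occ_second=inactives, n_second=4)
print(result)
```

In this toy example only the pattern that is common in the first set and rare in the second survives both constraints, which is the kind of discriminative substructure the two-database setting is meant to find.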

Background (publication date, popularity/level of familiarity, rationale of approach, further comments)
Published in 2004. FTM is a further development of the MolFea approach for acyclic substructures. Acyclic substructures were chosen because they still allow advanced computations such as the calculation of borders. On typical structure databases, the number of frequent acyclic substructures is only slightly smaller than the number of frequent unconstrained (i.e., also including cyclic) substructures.

Type of Descriptor:

Substructural descriptors (acyclic substructures). Wildcards and other more advanced
features of the SMARTS language are currently not used. The results can be used in all
fingerprint-based similarity and distance measures.
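
As an illustration of how such descriptors feed into a fingerprint-based measure, the following Python sketch turns an occurrence map into per-molecule pattern sets and compares them with the Tanimoto coefficient. The data layout and the function names are hypothetical and chosen for this example; they do not reflect FTM's own output files.

```python
from typing import Dict, List, Set


def fingerprints(occurrences: Dict[str, Set[int]], n_molecules: int) -> List[Set[str]]:
    """Invert an occurrence map: for each molecule, the set of patterns it contains."""
    result: List[Set[str]] = [set() for _ in range(n_molecules)]
    for pattern, molecule_ids in occurrences.items():
        for mol_id in molecule_ids:
            result[mol_id].add(pattern)
    return result


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient: shared patterns over patterns in either set."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical occurrence map for three molecules (indices 0, 1, 2).
occurrences = {"CC": {0, 1, 2}, "C-N": {0, 1}, "C-O": {2}}
fps = fingerprints(occurrences, 3)

print(tanimoto(fps[0], fps[1]))  # molecules 0 and 1 contain the same patterns
print(tanimoto(fps[0], fps[2]))  # molecules 0 and 2 share one of three patterns
```

Representing each fingerprint as a set of pattern strings keeps the sketch short; a fixed-length bit vector indexed by pattern would be the more usual choice for large data sets.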

Interfaces: Standalone application

Priority: High

Development status:

Homepage: http://wwwkramer.in.tum.de/research/data_mining/pattern_mining/graph_mining

Dependencies:
External components: OpenBabel

Technical details

Data: No

Software: Yes

Programming language(s): C++

Operating system(s): Linux, Windows

Input format: SDF

Output format: txt, ARFF

License: GPL

References

[RUE04] Rückert, U. and Kramer, S., Frequent Free Tree Discovery in Graph Data, in: SAC '04: Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 564-570 (New York, NY, USA: ACM Press, 2004).
[WIT99] Witten, I.H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (Morgan Kaufmann, 1999).
