DASS: efficient discovery and p-value calculation of substructures in unordered data.

Jens Hollunder; Maik Friedel; Andreas Beyer; Christopher T Workman; Thomas Wilhelm

Standard

DASS: efficient discovery and p-value calculation of substructures in unordered data. / Hollunder, Jens; Friedel, Maik; Beyer, Andreas; Workman, Christopher T; Wilhelm, Thomas.

In: BIOINFORMATICS, Vol. 23, No. 1, 1, 2007, p. 77-83.

Research output: SCORING: Contribution to journal › SCORING: Journal article › Research › peer-review

Harvard

Hollunder, J, Friedel, M, Beyer, A, Workman, CT & Wilhelm, T 2007, 'DASS: efficient discovery and p-value calculation of substructures in unordered data.', BIOINFORMATICS, vol. 23, no. 1, 1, pp. 77-83. <http://www.ncbi.nlm.nih.gov/pubmed/17032678?dopt=Citation>

APA

Hollunder, J., Friedel, M., Beyer, A., Workman, C. T., & Wilhelm, T. (2007). DASS: efficient discovery and p-value calculation of substructures in unordered data. BIOINFORMATICS, 23(1), 77-83. [1]. http://www.ncbi.nlm.nih.gov/pubmed/17032678?dopt=Citation

Vancouver

Hollunder J, Friedel M, Beyer A, Workman CT, Wilhelm T. DASS: efficient discovery and p-value calculation of substructures in unordered data. BIOINFORMATICS. 2007;23(1):77-83. 1.

Bibtex

@article{c3bfca0fa320445e88188615dfbf1c81,

title = "DASS: efficient discovery and p-value calculation of substructures in unordered data.",

abstract = "MOTIVATION: Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms mainly developed outside the realm of bioinformatics have been adapted for that purpose, but typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data. RESULTS: We present the algorithm DASS (Discovery of All Significant Substructures) that first identifies all substructures in unordered data (DASS(Sub)) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, for sets with at most one element of each type (DASS(P(set))), or for sets with multiple occurrence of elements (DASS(P(mset))). The power and versatility of DASS is demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions and evolutionarily conserved protein interaction subnetworks. AVAILABILITY: The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS",

author = "Jens Hollunder and Maik Friedel and Andreas Beyer and Workman, {Christopher T} and Thomas Wilhelm",

year = "2007",

language = "English",

volume = "23",

pages = "77--83",

journal = "BIOINFORMATICS",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "1",

}

RIS

TY - JOUR

T1 - DASS: efficient discovery and p-value calculation of substructures in unordered data.

AU - Hollunder, Jens

AU - Friedel, Maik

AU - Beyer, Andreas

AU - Workman, Christopher T

AU - Wilhelm, Thomas

PY - 2007

Y1 - 2007

AB - MOTIVATION: Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms mainly developed outside the realm of bioinformatics have been adapted for that purpose, but typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data. RESULTS: We present the algorithm DASS (Discovery of All Significant Substructures) that first identifies all substructures in unordered data (DASS(Sub)) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, for sets with at most one element of each type (DASS(P(set))), or for sets with multiple occurrence of elements (DASS(P(mset))). The power and versatility of DASS is demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions and evolutionarily conserved protein interaction subnetworks. AVAILABILITY: The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS

M3 - SCORING: Journal article

VL - 23

SP - 77

EP - 83

JO - BIOINFORMATICS

JF - BIOINFORMATICS

SN - 1367-4803

IS - 1

M1 - 1

ER -