IDA – Intelligent Data Analysis Research Group

Department of Computer Science, Karlovo náměstí 13

Who are we?

See our member list.

Our research

In the Intelligent Data Analysis research lab we work on the following topics:

  • We make computers discover knowledge hidden in data. We devise algorithms able to detect patterns and associations in data, construct predictive models and help identify the processes that generated the data. We contribute mainly to the fields of statistical-relational machine learning, data mining and inductive logic programming.
  • We develop non-conventional optimization techniques, such as new kinds of evolutionary and randomized algorithms, which yield reasonable solutions in acceptable runtimes even in tasks where traditional optimization fails.

What it's all good for

We apply our methods primarily in bioinformatics. For example, we have built XGENE.ORG, a tool that analyzes gene expression data using machine learning methods, and the Prodigy system for statistical analysis of protein structure.

Examples of our current projects

  • We employ machine learning to construct predictors of protein-DNA interactions. The predictors are learned from data on known interactions, describing the 3D structure of the interacting proteins and the sequential structure of the DNA target locus. In future work, we would like to use machine learning to predict DNA integration sites of retroviruses such as HIV.
  • We develop a theoretical framework and algorithms for agents that learn from a mixture of knowledge (theories) and data, with the aim of mimicking the non-trivial real-world scenarios of human learning.

Our projects have been funded by

  • Czech Science Foundation
  • Czech Ministry of Education
  • European Commission

Our main partners (joint projects and publications)

  • University of Minnesota, U.S., Division of Hematology-Oncology and Blood and Marrow Transplantation
  • Jozef Stefan Institute, Slovenia, Department of Knowledge Technologies
  • Université de Caen, France, GREYC lab
  • Karlova univerzita (Charles University), Czech Republic, Faculty of Mathematics and Physics (MFF)
  • Technische Universität Wien, Austria, Institut für Softwaretechnik und Interaktive Systeme

Selected publications

  • Kuzelka O., Zelezny F.: Block-Wise Construction of Tree-like Relational Features with Monotone Reducibility and Redundancy. Machine Learning 83(2):163-192, 2011
  • Zahalka J., Zelezny F.: An Experimental Test of Occam's Razor in Classification (Technical note). Machine Learning 82(3):475-481, 2011
  • Zakova M., Kremen P., Zelezny F., Lavrac N.: Automatic Knowledge Discovery Workflow Composition through Ontology-Based Planning. IEEE Trans. Automation Science and Engineering 8(2):253-264, 2011
  • Zelezny F., Lavrac N.: Guest editors' introduction: Special issue on Inductive Logic Programming (ILP-2008). Machine Learning 76(1):1-2, 2009
  • Kuzelka O., Zelezny F.: A Restarted Strategy for Efficient Subsumption Testing. Fundamenta Informaticae 89(1):95-109, 2008
  • Igor Trajkovski, Filip Železný, Nada Lavrač and Jakub Tolar. Learning Relational Descriptions of Differentially Expressed Gene Groups. IEEE Trans. Sys Man Cyb C, 38(1), 16-25, 2008.
  • Jiří Kléma, Lenka Nováková, Filip Karel, Olga Štěpánková and Filip Železný. Sequential Data Mining: A Comparative Case Study in Development of Atherosclerosis Risk Factors. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38:3-15, 2008.
  • Jiří Kléma, Sylvain Blachon, Arnaud Soulet, Bruno Crémilleux and Olivier Gandrillon. Constraint-Based Knowledge Discovery from SAGE Data. In Silico Biology, 8:14, 2008.
  • Jiří Kubalík, Richard Mordinyi and Stefan Biffl. Multiobjective Prototype Optimization with Evolved Improvement Steps. Evolutionary Computation in Combinatorial Optimization (EvoCOP 2008), 2008.
  • Petr Pošík. Preventing Premature Convergence in a Simple EDA via Global Step Size Setting. Parallel Problem Solving from Nature - PPSN X, 2008.
  • Petr Pošík and Vojtěch Franc. Estimation of Fitness Landscape Contours in EAs. GECCO 2007 - Proceedings of the 9th annual conference on Genetic and Evolutionary Computation, 2007.
  • Filip Železný and Nada Lavrač. Propositionalization-Based Relational Subgroup Discovery with RSD. Machine Learning, special issue on statistical relational learning, 62:33-63, 2006.
  • Filip Železný, Ashwin Srinivasan and C. David Page. Randomised Restarted Search in ILP. Machine Learning, 64:183-208, 2006.
  • Dragan Gamberger, Nada Lavrač, Filip Železný and Jakub Tolar. Induction of comprehensible models for gene expression datasets by subgroup discovery methodology. Journal of Biomedical Informatics, 37:269-284, 2004.

Responsible person: RNDr. Patrik Mottl, Ph.D.