WiSe 2024/25

Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches (14038)

Learning Outcomes:

Students will obtain an overview of how to solve large-scale computational problems in data science and machine learning using a) parallel approaches from multi-threaded computation on individual machines to implicit parallelism frameworks on compute clusters and b) algorithms and data structures supporting efficient exact or approximate computation with massive data sets in and out of core. In particular, they will learn how to analyze relevant probabilistic data structures and algorithms and to select and implement appropriate computational approaches for large-scale problems.

Contents:

The focus will be on the following areas:

  • A review of memory-compute co-location and its impact on big data computations.
  • Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine.
  • Introduction of implicit parallelism programming models as implemented for example in MapReduce, Spark and Ray and their application in ML.
  • Probabilistic algorithms such as sketching algorithms (incl. CountMinSketch, HyperLogLog) or Bloom filters.
  • Implementing ML methods using index data structures such as suffix or kd-trees.

Recommended Prerequisites:

Introduction to machine learning at Master’s level. Advanced knowledge of programming in Python and the Linux command line.

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

Introduction to Bioinformatics

Learning Outcomes:

After successfully completing the module, students will have acquired an overview of the fundamentals of bioinformatics. This includes an introduction to relevant molecular processes, scientific instruments to investigate these processes, and the data generated by them. For central computational problems, students will be able to discuss advantages and disadvantages of statistical and basic algorithmic approaches and to adapt them to specific biological questions. Students will be able to analyze specific biological data using appropriate software libraries for Python.

Contents:

The focus will be on the basics of the following areas:

  • An introduction to molecular biology including relevant scientific instruments and the Omics-data generated by them.
  • Pair-wise and multiple sequence alignments, seed-and-extend approaches, and genome indexes
  • Evolutionary models and phylogenetic trees
  • Signals in sequences: identification of motifs
  • Assembly of genomes and transcriptomes
  • Gene expression analysis

Recommended Prerequisites:

Good knowledge of discrete probability, algorithms and data structures at the undergraduate level. Advanced knowledge of programming in Python and the Linux command line.

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

Bioinformatics (100051)

After successfully completing the module, students will be familiar with state-of-the-art problems and methodological approaches used in medical bioinformatics. They will have the ability to familiarize themselves with current research in medical bioinformatics from original research literature, to participate in a technical discussion within the context of international science and to present scientific content in written and oral form.

Students will learn about specific state-of-the-art problems and methodological approaches used in medical bioinformatics. The applications will range from diagnostics and monitoring patients with sensor, clinical and omics data, to detect clinically relevant states or understand cellular processes relevant to diagnosis and disease as well as mechanisms for treating diseases. Methods will include both algorithmic and machine learning approaches.

Students will present a topic based on—typically—one original research article chosen from a list of suggestions made available here and prepare a report on the same topic. Attending all talks by other students and participating in discussions is expected for a passing grade.

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

Research Module in Artificial Intelligence: Specialization Medical Bioinformatics (100050)

The projects in this version of the module should focus on research relevant to the department for Medical Bioinformatics with a focus on patient-data acquisition. This includes topics such as analysis of (multi-)omics data, in particular from genomics and transcriptomics studies, analysis of clinical and medical sensor time-series data and AI methods for computational drug design. A main methodological focus is on modality-specific processing of information (e.g., predicting safety or efficacy of drugs, identifying changes of a cell’s state from gene expression level) and knowledge acquisition (such as in dynamic event classification in patient data or data fusion of omics data to infer disease mechanisms). Due to the large amounts of data available from experimental collaborators and public sources, machine learning approaches often have non-trivial algorithmic aspects and/or use parallel computation (for example federated learning including privacy guarantees).

Contact: Alexander Schliep (alexander@schlieplab.org).

Bioinformatics advanced seminar: Master's and doctoral students

Contact: Alexander Schliep (alexander@schlieplab.org).

 

SoSe 2024

Bioinformatics: Artificial Intelligence and Algorithmic Approaches

The course will provide an introduction to modern bioinformatics and to selected applications from biology and medicine addressed with computational approaches based on classical algorithms and statistical machine learning, as well as modern deep learning approaches. The focus will be on four fundamental problem areas:

Comparing sequences: sequence alignment algorithms, genome-scale approaches using index data structures, alignment-free methods

Analyzing gene expression: alignment-based and alignment-free methods for analyzing RNA-Seq, single-cell analysis

Signals in sequences: identification of motifs, accessibility, and modification of DNA

Sequence variations and relation to phenotypes: structural variants in disease, pan-genome approaches

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=12912

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

Artificial Intelligence for Drug Design

AI is revolutionizing drug design both for small molecule drugs - the prevalent drug modality - and novel modalities such as oligonucleotide therapeutics. Some of the progress has been achieved by transferring methods from established AI areas such as NLP. For other areas novel methodological developments were instrumental, with very exciting developments at the intersection between molecular dynamics and AI. The focus of the seminar will be on state-of-the-art methods and applications of AI in drug design for small molecule drugs and oligonucleotide therapeutics.

After successfully completing the module, students have insight into this exciting field of application for Artificial Intelligence (AI). They are able to acquire research literature and to present the topic orally as well as in a written report.

Detailed information available for participants at https://www.b-tu.de/elearning/btu/course/view.php?id=12913

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

Research Module in Artificial Intelligence: Specialization Medical Bioinformatics

After successfully completing the module, students have an overview of how to solve large-scale computational problems in data science and machine learning. They know parallel approaches from multi-threaded computation on individual machines to implicit parallelism frameworks on compute clusters. They are familiar with algorithms and data structures supporting efficient exact or approximate (e.g. sketching) computation with massive data sets in and out of core. They are able to implement the algorithms. They can assess which methods can be used in a given situation.

The focus will be on the following areas: A review of memory-compute co-location and its impact on big data computations; Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine; Introduction of implicit parallelism programming models as implemented for example in MapReduce, Spark and Ray and their application in ML; Sketching algorithms (e.g. CountMinSketch, HyperLogLog) or Bloom filters; Implementing ML methods using index data structures such as suffix or kd-trees.

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=12914

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

 

WiSe 2023/24

Research Module in Artificial Intelligence: Specialization Medical Bioinformatics

The projects in this version of the module should focus on research relevant to the department for Medical Bioinformatics with a focus on patient-data acquisition. This includes topics such as analysis of (multi-)omics data, in particular from genomics and transcriptomics studies, analysis of clinical and medical sensor time-series data and AI methods for computational drug design. A main methodological focus is on modality-specific processing of information (e.g., predicting safety or efficacy of drugs, identifying changes of a cell’s state from gene expression level) and knowledge acquisition (such as in dynamic event classification in patient data or data fusion of omics data to infer disease mechanisms). Due to the large amounts of data available from experimental collaborators and public sources, machine learning approaches often have non-trivial algorithmic aspects and/or use parallel computation (for example federated learning including privacy guarantees).

Detailed information available for participants at https://www.b-tu.de/elearning/btu/course/view.php?id=12437

Contact: Alexander Schliep (alexander@schlieplab.org).

Seminar Bioinformatics

After successfully completing the module, students will be familiar with state-of-the-art problems and methodological approaches used in medical bioinformatics. They will have the ability to familiarize themselves with current research in medical bioinformatics from original research literature, to participate in a technical discussion within the context of international science and to present scientific content in written and oral form.

Students will learn about specific state-of-the-art problems and methodological approaches used in medical bioinformatics. The applications will range from diagnostics and monitoring patients with sensor, clinical and omics data, to detect clinically relevant states or understand cellular processes relevant to diagnosis and disease as well as mechanisms for treating diseases. Methods will include both algorithmic and machine learning approaches.

Students will present a topic based on—typically—one original research article chosen from a list of suggestions made available here and prepare a report on the same topic. Attending all talks by other students and participating in discussions is expected for a passing grade.

Detailed information for participants is available at https://www.b-tu.de/elearning/btu/course/view.php?id=12550

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

Computing at Scale in Machine Learning: Distributed computing and algorithmic approaches

After successfully completing the module, students have an overview of how to solve large-scale computational problems in data science and machine learning. They know parallel approaches from multi-threaded computation on individual machines to implicit parallelism frameworks on compute clusters. They are familiar with algorithms and data structures supporting efficient exact or approximate (e.g. sketching) computation with massive data sets in and out of core. They are able to implement the algorithms. They can assess which methods can be used in a given situation.

The focus will be on the following areas: A review of memory-compute co-location and its impact on big data computations; Solving Machine Learning (ML) workloads using explicit parallelism, specifically multi-threaded computation on an individual machine; Introduction of implicit parallelism programming models as implemented for example in MapReduce, Spark and Ray and their application in ML; Sketching algorithms (e.g. CountMinSketch, HyperLogLog) or Bloom filters; Implementing ML methods using index data structures such as suffix or kd-trees.

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht, Aleksandra Khatova.

 

SoSe 2023

Artificial Intelligence for Drug Design (Modul 13979)

AI is revolutionizing drug design both for small molecule drugs - the prevalent drug modality - and novel modalities such as oligonucleotide therapeutics. Some of the progress has been achieved by transferring methods from established AI areas such as NLP. For other areas novel methodological developments were instrumental, with very exciting developments at the intersection between molecular dynamics and AI. The focus of the seminar will be on state-of-the-art methods and applications of AI in drug design for small molecule drugs and oligonucleotide therapeutics.

After successfully completing the module, students have insight into this exciting field of application for Artificial Intelligence (AI). They are able to acquire research literature and to present the topic orally as well as in a written report.

Detailed information available for participants at https://www.b-tu.de/elearning/btu/course/view.php?id=11573

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht.

Bioinformatics: Artificial Intelligence and Algorithmic Approaches (Modul 13978)

The course will provide an introduction to modern bioinformatics and to selected applications from biology and medicine addressed with computational approaches based on classical algorithms and statistical machine learning, as well as modern deep learning approaches. The focus will be on four fundamental problem areas:

Comparing sequences: sequence alignment algorithms, genome-scale approaches using index data structures, alignment-free methods

Analyzing gene expression: alignment-based and alignment-free methods for analyzing RNA-Seq, single-cell analysis

Signals in sequences: identification of motifs, accessibility, and modification of DNA

Sequence variations and relation to phenotypes: structural variants in disease, pan-genome approaches

Detailed information available for participants at https://www.b-tu.de/elearning/btu/course/view.php?id=11572

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Nathalie Gocht.

 

P4 2022/23

Computational Techniques for Large-Scale Data (DIT 065 / DAT470)

Techniques for Large-Scale Data is a required class in the Master’s Program in Applied Data Science at Gothenburg University and open to students in other programs.

The aim of this course is to deepen the students’ knowledge and skills and familiarize them with the technical and technological side of data science, including software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

Detailed information available for participants at https://chalmers.instructure.com/courses/18200.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

P1 2020/21

Introduction to Data Science (DIT 852)

Introduction to Data Science is a required class in the Master’s Program in Applied Data Science at Gothenburg University.

The course gives an introduction to applied data science using case studies from different application domains and a primer in using Python for data science. In particular the following topics will be covered: revision of mathematical and statistical concepts useful in data science, such as basic set theory, logic, and probability theory; case studies of data science applications displaying a range of application areas and fundamental types of analysis problems; a brief introduction to working in a Unix/Linux environment and programming in Python to implement basic transformations, visualizations and analyses; use of a machine learning library from Python to analyze data; demonstration of inherent limitations of computational analyses with examples; implications for privacy and ethical considerations.

These three large streams (data science fundamentals and methods, data science in Python, and case studies of the uses and societal impact of data science) will be followed (mostly) concurrently throughout the course.

Detailed information for participants at https://canvas.gu.se/courses/37381

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep.

 

P4 2019/20

Techniques for Large-Scale Data (DIT873/DAT346)

Techniques for Large-Scale Data is a required class in the Master’s Program in Applied Data Science at Gothenburg University and open to students in other programs.

The aim of this course is to deepen the students’ knowledge and skills and familiarize them with the technical and technological side of data science, including relevant data models as well as software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

With Graham Kemp and Marwa Naili. Detailed information for participants at https://chalmers.instructure.com/courses/9360

Contact: Alexander Schliep (alexander@schlieplab.org).

 

P1 2019/20

Introduction to Data Science (DIT 852)

Introduction to Data Science is a required class in the Master’s Program in Applied Data Science at Gothenburg University.

The course gives an introduction to applied data science using case studies from different application domains and a primer in using Python for data science. In particular the following topics will be covered: revision of mathematical and statistical concepts useful in data science, such as basic set theory, logic, and probability theory; case studies of data science applications displaying a range of application areas and fundamental types of analysis problems; a brief introduction to working in a Unix/Linux environment and programming in Python to implement basic transformations, visualizations and analyses; use of a machine learning library from Python to analyze data; demonstration of inherent limitations of computational analyses with examples; implications for privacy and ethical considerations.

These three large streams (data science fundamentals and methods, data science in Python, and case studies of the uses and societal impact of data science) will be followed (mostly) concurrently throughout the course.

Detailed information for participants at https://canvas.gu.se/courses/25009

Contact: Alexander Schliep (alexander@schlieplab.org).

 

P4 2018/19

Techniques for Large-Scale Data (DIT872/DAT345)

Techniques for Large-Scale Data is a required class in the Master’s Program in Applied Data Science at Gothenburg University and open to students in other programs.

The aim of this course is to deepen the students’ knowledge and skills and familiarize them with the technical and technological side of data science, including relevant data models as well as software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

With Graham Kemp.

Detailed information for participants at https://chalmers.instructure.com/courses/4088

Contact: Alexander Schliep (alexander@schlieplab.org).

 

P1 2018/19

Introduction to Data Science (DIT 852)

Introduction to Data Science is a required class in the Master’s Program in Applied Data Science at Gothenburg University.

The course gives an introduction to applied data science using case studies from different application domains and a primer in using Python for data science. In particular the following topics will be covered: revision of mathematical and statistical concepts useful in data science, such as basic set theory, logic, and probability theory; case studies of data science applications displaying a range of application areas and fundamental types of analysis problems; a brief introduction to working in a Unix/Linux environment and programming in Python to implement basic transformations, visualizations and analyses; use of a machine learning library from Python to analyze data; demonstration of inherent limitations of computational analyses with examples; implications for privacy and ethical considerations.

These three large streams (data science fundamentals and methods, data science in Python, and case studies of the uses and societal impact of data science) will be followed (mostly) concurrently throughout the course.

Detailed information for participants at https://canvas.gu.se/courses/18446

Contact: Alexander Schliep (alexander@schlieplab.org).

 

P4 2017/18

Techniques for Large-Scale Data (DIT871/DAT345)

Techniques for Large-Scale Data is a required class in the Master’s Program in Applied Data Science at Gothenburg University and open to students in other programs.

The aim of this course is to deepen the students’ knowledge and skills and familiarize them with the technical and technological side of data science, including relevant data models as well as software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

With Graham Kemp.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

P2 2017/18

Computational methods in bioinformatics

We contribute one lecture to Graham Kemp's Computational methods in bioinformatics TDA507/DIT741 course. See http://www.cse.chalmers.se/~kemp/teaching/TDA507/2017-2018/ for further information.

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep.

Statistical Methods for Data Science (DIT 861)

The course gives an introduction to the theory of probability and statistics, data analysis using descriptive statistics and data visualization, and applications of probabilistic modeling in data science. In the course, the following broad areas will be covered: data analysis including descriptive statistics and data visualization, probability theory including basic probability calculations, random variables, distributions, statistical methods including point and interval estimates, hypothesis testing, bootstrapping, probabilistic models in data science applications, such as Naive Bayes classifiers and topic models for text or Hidden Markov Models for sequences.

Co-taught with Richard Johansson (richard.johansson@cse.gu.se; course responsible).

Participants can see details at https://nordunet.instructure.com/courses/507.

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep.

 

P1 2017/18

Introduction to Data Science (DIT 851)

Introduction to Data Science is a required class in the Master’s Program in Applied Data Science at Gothenburg University.

The course gives an introduction to applied data science using case studies from different application domains and a primer in using Python for data science. In particular the following topics will be covered: revision of mathematical and statistical concepts useful in data science, such as basic set theory, logic, and probability theory; case studies of data science applications displaying a range of application areas and fundamental types of analysis problems; a brief introduction to working in a Unix/Linux environment and programming in Python to implement basic transformations, visualizations and analyses; use of a machine learning library from Python to analyze data; demonstration of inherent limitations of computational analyses with examples; implications for privacy and ethical considerations.

These three large streams (data science fundamentals and methods, data science in Python, and case studies of the uses and societal impact of data science) will be followed (mostly) concurrently throughout the course.

Detailed information for participants at https://nordunet.instructure.com/courses/455

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep.

 

P2 2016/17

Advanced Algorithms

TDA251/DIT280 is part of the master's programme Computer Science - Algorithms, Languages and Logic (MPALG) but is also of interest for various other programmes. The goal is to learn selected advanced techniques in the design and analysis of algorithms. It will continue in the spirit of the first Algorithms course and maintain a rigorous analytical style. It is assumed that you are taking this course because you like the subject and you want to gain a deeper understanding of more specialized topics in algorithms, not for a guide on how to implement them. At some points in the course we may even touch on frontiers of current research. We may also have possibilities to reflect upon the proper use of algorithms in reality, besides their mathematical aspects.

Co-taught with Peter Damaschke, Devdatt Dubhashi, Alexander Schliep and Muhammad Azam Sheikh.

Detailed information at http://www.cse.chalmers.se/edu/year/2016/course/TDA251/.

Contact: Alexander Schliep (alexander@schlieplab.org).

Computational methods in bioinformatics

We contribute two lectures to Graham Kemp's Computational methods in bioinformatics TDA507/DIT741 course. See http://www.cse.chalmers.se/~kemp/teaching/TDA507/2016-2017/ for further information.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Fall 2015

Introduction to Discrete Structures I

Organisation: CS course number: 01:198:205, Sections 01, 02, 03

Prerequisites: 01:198:111 and 01:640:152. Credit not given for this course and 14:332:312. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in logic, mathematical proof techniques and combinatorics required in Discrete Structures II and in the design and analysis of algorithms. Basic Set Notation, Propositional Logic, Truth Tables, Boolean Circuits. First-Order Logic, Predicates, Quantifiers. Mathematical Induction: Program Correctness, Trees, Grammars. Relations: Closures of relations. Orders, Equivalence Relations, Functions. Finite-State Machines.

Expected work: Weekly assignments, 2 midterms, quizzes and active participation using iClicker (version one with multiple-choice answers A-E suffices), final exam.

Textbook: Adapted version of Rosen: Discrete Math and its Applications (McGraw Hill, most recent edition).

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F15DiscreteStructuresI/. There is also a Sakai site for participants.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Spring 2015

Introduction to Bioinformatics

Organisation: CS course number: 16:198:671:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 260.

Class starts 1/22

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 674-number it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense continuing change of biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing only 36 years ago, the completion of the human genome not quite a decade ago and our personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins have a common ancestor and how we rapidly identify such proteins in large databases to assembly of genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S15IntroToBioinformatics

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep.

 

Fall 2014

Introduction to Discrete Structures II

Organisation: CS course number: 01:198:206, Sections 01, 02, HN

Prerequisites: 01:198:205 or 14:332:202; 01:640:152. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in combinatorics and probability theory required in design and analysis of algorithms, in system analysis, and in other areas of computer science. Counting: Binomial Coefficients, Permutations, Combinations, Partitions. Recurrence Relations and Generating Functions. Discrete Probability: Events and Random Variables; Conditional Probability, Independence; Expectation, Variance, Standard Deviation; Binomial, Poisson and Geometric Distributions; law of large numbers. Some Topics from Graph Theory: Paths, Components, Connectivity, Euler Paths, Hamiltonian Paths, Planar Graphs, Trees.

Expected work: Weekly assignments, 2 midterms, quizzes and active participation using iClicker (version one with multiple-choice answers A-E suffices), final exam.

Textbook: Sheldon Ross: A first course in Probability (Prentice Hall, most recent edition). Note that earlier editions should work too.

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F14DiscreteStructuresII/. There is also a Sakai site for participants.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Spring 2014

Introduction to Bioinformatics

Organisation: CS course number: 16:198:671:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 260.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 674-number it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science (algorithms and machine learning) and from computational statistics. Its development reflects the immense continuing change of biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing only 36 years ago, the completion of the human genome not quite a decade ago and our personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins have a common ancestor and how we rapidly identify such proteins in large databases to assembly of genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S14IntroToBioinformatics

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Fall 2013

Introduction to Discrete Structures II

Organisation: CS course number: 01:198:206, Sections 01, 02, HN

Prerequisites: 01:198:205 or 14:332:202; 01:640:152. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in combinatorics and probability theory required in design and analysis of algorithms, in system analysis, and in other areas of computer science. Counting: Binomial Coefficients, Permutations, Combinations, Partitions. Recurrence Relations and Generating Functions. Discrete Probability: Events and Random Variables; Conditional Probability, Independence; Expectation, Variance, Standard Deviation; Binomial, Poisson and Geometric Distributions; law of large numbers. Some Topics from Graph Theory: Paths, Components, Connectivity, Euler Paths, Hamiltonian Paths, Planar Graphs, Trees.

Expected work: Weekly assignments, 2 midterms, quizzes and active participation using iClicker (version one with multiple-choice answers A-E suffices), final exam.

Textbook: Sheldon Ross: A First Course in Probability (Prentice Hall, 9th edition). Note that earlier editions should work too.

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F13DiscreteStructuresII/. There is also a Sakai site for participants.

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep. Teaching assistants: John Wiedenhoeft.

 

Spring 2013

Introduction to Bioinformatics

Organisation: CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:20pm. Room: Hill 264.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 674-number it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science---algorithms and machine learning---and from computational statistics. Its development reflects the immense continuing change of biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing only 36 years ago, the completion of the Human genome not quite a decade ago and our personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins have a common ancestor and how we rapidly identify such proteins in large databases to assembly of genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S13IntroToBioinformatics

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Spring 2012

Introduction to Bioinformatics

Organisation: CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Time: Thursdays 3:20-6:30pm. Room: Hill 264.

NOTE: This course is designed at the 500 level for first-year graduate students and advanced undergraduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms (e.g. Hidden Markov Models). No biology background is required. Even though the class has a 674-number it may be used to satisfy the Category B requirement.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science---algorithms and machine learning---and from computational statistics. Its development reflects the immense continuing change of biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing only 36 years ago, the completion of the Human genome not quite a decade ago and our personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins have a common ancestor and how we rapidly identify such proteins in large databases to assembly of genomes from sequencing data.

Course Website: http://bioinformatics.rutgers.edu/Teaching/S12IntroToBioinformatics

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Fall 2011

Introduction to Discrete Structures II

Organisation: CS course number: 01:198:206, Sections 01, 02, HN

Prerequisites: 01:198:205 or 14:332:202; 01:640:152. Please note that courses for which a student has received a grade of D cannot be used to satisfy prerequisite requirements.

Description: Provides the background in combinatorics and probability theory required in design and analysis of algorithms, in system analysis, and in other areas of computer science. Counting: Binomial Coefficients, Permutations, Combinations, Partitions. Recurrence Relations and Generating Functions. Discrete Probability: Events and Random Variables; Conditional Probability, Independence; Expectation, Variance, Standard Deviation; Binomial, Poisson and Geometric Distributions; law of large numbers. Some Topics from Graph Theory: Paths, Components, Connectivity, Euler Paths, Hamiltonian Paths, Planar Graphs, Trees.

Expected work: Weekly assignments, 1 or 2 tests, Final Exam

Textbook: Sheldon Ross: A First Course in Probability (Prentice Hall, 8th edition). Note that earlier editions should work too.

Course Website: More information at http://bioinformatics.rutgers.edu/Teaching/F11DiscreteStructuresII/. There is also a Sakai site for participants.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Fall 2010

Introduction to Bioinformatics

Organisation: CS course number: 16:198:674:01, CBMB course number: 16:118:617:03. Time: Tuesdays and Thursdays 1:40-3:00pm. Room: Hill 262.

NOTE: This course is designed at the 500 level for first-year graduate students interested in modern biology applications or, more generally, interested in machine learning or statistical algorithms. No biology background is required.

Description: The field of Bioinformatics is primarily concerned with the analysis of data from molecular biology using methods from computer science---algorithms and machine learning---and from computational statistics. Its development reflects the immense continuing change of biology and the rapid advances in experimental techniques, exemplified by the invention of DNA sequencing only 36 years ago, the completion of the Human genome not quite a decade ago and our personal genome sequences in the very near future. The biological questions we will answer range from deciding whether two proteins have a common ancestor and how we rapidly identify such proteins in large databases to reconstructing the sequence of genome modifications leading to cancerous growth of cells.

Course Website: See http://bioinformatics.rutgers.edu/Teaching/F10IntroToBioinformatics

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Spring 2010

Light seminar: Bioinformatics for next-generation sequencing

Organisation: Course number: 16:198:500:07, Tuesdays 12-1 pm in Room Hill 260

Description: Computational tools have become central to the modern development of molecular biology by using CS, mathematics, and statistics to help solve fundamental problems.

This seminar is intended as an introduction to the field, with an emphasis on recent developments.

Grading: Pass/fail based on attendance and presentation.

Note: Jointly with Kevin Chen (Genetics/BioMaPS). See Sakai Website for schedule and further information.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

Fall 2009

Hidden Markov Models in Biology

Organisation: Course number: 16:198:672. Time: Thursdays 3:20-6:20. Room: Hill 260 (BioMaPS seminar room).

Description: Hidden Markov Models (HMMs) are an important class of stochastic models which were first applied and popularized in the context of automatic speech recognition by Rutgers' distinguished faculty member Lawrence Rabiner and his colleagues at Bell Labs.

In recent years they have found widespread use in the analysis of data from molecular biology. They constitute the state of the art in searching for remotely homologous protein sequences and, with mild extensions, in identifying genes (or other signals) in DNA sequences. In addition to these stochastic models of sequences of discrete symbols, continuous-valued sequences can be modeled. These arise, for example, from time-course experiments measuring gene expression during the cell cycle, in response to stimuli, or during the development of organisms. Similarly, the analysis of genomic tiling array data for chromosomal aberrations, location of transcription factor binding sites, or identification of expressed regions is a typical segmentation problem for which HMMs are very well suited. The popularity of HMMs rests on the ease of building complex models with them, their simple stochastic structure, and the polynomial-time algorithms available for all relevant operations. Nevertheless, there have been exciting recent developments to arrive at reasonable running time and memory requirements even for genome-sized data sets. Similarly, theoretical aspects of HMMs and their estimation or training are still an active area of research. In this course we will introduce the necessary theory, the relevant algorithmic developments, and, through hands-on projects using the GHMM library (http://ghmm.org), some of the engineering aspects of solving computational biology problems with HMMs. An emphasis will be put on recent developments in the field.

For CS students: note that HMMs are applied to a wide range of non-biological problems, from fault detection in computer systems and handwriting recognition to predicting crises in the Middle East based on newsfeed data.

Prerequisites: As I expect an interdisciplinary audience I will not impose strict course requirements. You will need some elementary algorithms, linear algebra, discrete math and probability theory background and some programming experience. Note that all probability theory required will be reviewed. For CS students: While the examples in the class are all from biology, I will make sure that they are accessible to students without any biology background.

Grading: Attendance and participation are expected and will count toward the grade. Other components are problem sets which will have to be handed in, class project(s), and preparation of a term presentation based on an original research paper.

Contents: Review of elementary probability and information theory, Markov chains, positional weight matrices and sequence logos, Hidden Markov Models, forward/backward algorithm, Viterbi and posterior decoding, Baum-Welch training (i.e., the Expectation Maximization algorithm), pairwise sequence alignments, a probabilistic interpretation of alignment scores, analyzing DNA sequences: gene finding with labelled HMMs and k-best decoding, finding transcription factor binding sites, detecting remotely homologous protein sequences with profile HMMs, Dirichlet priors, HMMs for continuous-valued observations, analyzing gene expression time-courses with HMMs, mixture models, analyzing tiling DNA microarray data. Depending on interests and class composition, further topics include: distance functions between HMMs, identifiability, learning HMM topology, numerically stable implementations of the basic algorithms, efficient MCMC approaches for full Bayesian analysis with HMMs, memory-efficient versions of the Viterbi algorithm, algorithmic improvements for repetitive sequences.
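
To give a flavor of the Viterbi decoding named in the contents, here is a minimal sketch in plain Python. It is not taken from the course material or from GHMM. The two-state "AT-rich"/"GC-rich" model, its probabilities, and all function and variable names are hypothetical and chosen only for illustration. The sketch works in log space, which is one common way to avoid numerical underflow, and assumes a non-empty observation sequence and strictly positive probabilities.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (log-probability, state path) of the most probable path."""
    # delta[s]: best log-probability of any state path that ends in s
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            # best predecessor state for ending in s at this position
            prev = max(states, key=lambda r: delta[r] + log_trans[r][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        delta = new_delta
        backpointers.append(pointers)
    # trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


def logs(probs):
    """Convert a dict of (positive) probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in probs.items()}


# Hypothetical toy model: two hidden states emitting DNA letters.
STATES = ("AT-rich", "GC-rich")
LOG_START = logs({"AT-rich": 0.5, "GC-rich": 0.5})
LOG_TRANS = {
    "AT-rich": logs({"AT-rich": 0.9, "GC-rich": 0.1}),
    "GC-rich": logs({"AT-rich": 0.1, "GC-rich": 0.9}),
}
LOG_EMIT = {
    "AT-rich": logs({"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}),
    "GC-rich": logs({"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}),
}

log_prob, path = viterbi("AATTGCGCGC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)
```

For this toy sequence the decoded path labels the first four letters as "AT-rich" and the remaining six as "GC-rich", which is the kind of segmentation the description above refers to.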

Class website: See the Sakai Wiki for the schedule, class notes, individual chapters of (a) textbook and original papers

Contact: Alexander Schliep (alexander@schlieplab.org).

 

WS 2006/07

Applied Data Mining

Format: Block course, 5-16 March 2007, full-day, in the PC pool, 3rd floor, MPI. 5 credits, focus areas C and D.

Preliminary meeting: 8 February 2007, 15:00-16:00, Room 331, Tower 2, 3rd floor, MPI für Molekulare Genetik.

Contents: This hands-on compact course offers participating students the opportunity to deepen and extend the knowledge acquired in seminars or lectures on statistical pattern recognition or data mining through practical analysis of exemplary molecular biology data sets. The thematic focus is on Hidden Markov Models. Elementary programming skills (C, C++, Python) are required, but the emphasis is on learning data mining methodology. For implementing the methods covered, the GHMM library (http://ghmm.org) provides the required algorithms and data structures. The remaining programming tasks concern problem-specific adaptations, data preparation, and visualization of results. During the full-day compact course, participants will also become acquainted with working in a team. Course credit is obtained by presenting the project results.

Prerequisites: Successful participation in the lecture course "Algorithmische Bioinformatik"; knowledge of statistics.

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Benjamin Georgi. Teaching assistants: Ivan G Costa, Janne Grunau.

 

WS 2005/06

Algebraic Statistics

Format: Book seminar. Thursdays 14:00-16:00, seminar room, 3rd floor, MPI. 2 SWS, no. 19714 in the FU-KVV. Creditable toward focus areas C and D.

Contents: In this seminar we will work through the book "Algebraic Statistics for Computational Biology" by Lior Pachter and Bernd Sturmfels (eds.). The format will be a book seminar in which all participants have to read the material every time and take turns giving presentations; several talks per participant are to be expected. This seminar is targeted at students at the final Master's resp. graduate level and will not review elementary material. Prerequisites are statistics and algebra and applications of statistical models in bioinformatics.

Contact: Alexander Schliep (alexander@schlieplab.org).

Applied Data Mining

Format: Block course, 6-17 March 2006, full-day, in the PC pool, 3rd floor, MPI. 5 credits (focus areas C and D).

Lectures:

  • 6 March 2006: 10:00-12:00, seminar room, ground floor, MPI für Molekulare Genetik.
  • 7 March 2006: 10:30-12:30, seminar room, ground floor, MPI für Molekulare Genetik.
  • 8 March 2006: 10:00-12:00, seminar room, ground floor, MPI für Molekulare Genetik.
  • 9 March 2006: 10:00-12:00, seminar room, 3rd floor, MPI für Molekulare Genetik.
  • 13 March 2006: 10:00-12:00, seminar room, ground floor, MPI für Molekulare Genetik.
  • 14 March 2006: 10:00-12:00, PC pool, 3rd floor, MPI für Molekulare Genetik.
  • 15 March 2006: 10:00-12:00, seminar room, ground floor, MPI für Molekulare Genetik.

Preliminary meeting: 18 January 2006, 15:00-16:00, Room 331, Tower 2, 3rd floor, MPI für Molekulare Genetik.

Contents: This hands-on compact course offers participating students the opportunity to deepen and extend the knowledge acquired in seminars or lectures on statistical pattern recognition or data mining through practical analysis of exemplary molecular biology data sets. The thematic focus is on Support Vector Machines and Hidden Markov Models. Elementary programming skills (C, C++, Python, R) are required, but the emphasis is on learning data mining methodology.

Prerequisites: Successful participation in one of the seminars "Elements of Statistical Learning" or "Clusteranalyse heterogener Daten", as well as successful participation in the lecture course "Algorithmische Bioinformatik".

Details: Microarrays, protein structures, text mining: the list goes on. Recognizing complex structures in high-dimensional spaces is a recurring problem in bioinformatics. Support Vector Machines (SVMs) are a young but very successful classification method with which many problems can be tackled successfully.

The same holds for Hidden Markov Models (HMMs). They form the basis of, for example, the standard methods for gene prediction and for finding homologous protein sequences. Besides numerous applications in sequence analysis, they are also used for the analysis of microarrays.

The first week of the software lab deals with SVMs for regression and classification, the second week with HMMs and the analysis (annotation and classification) of biological sequences.

The two blocks are structured analogously. An introduction is given on the first day, and the last day provides time for writing a report. From Tuesday to Thursday, practical aspects are explained in the morning. Afterwards, small practical steps lead up to a problem that participants work on independently that day.

The assignments are to be completed individually. The results are summarized in a report. Group work is permitted, but individual contributions must be clearly marked. At the end of each block, the implementation of one of the three assignments is reviewed for each participant.

The grade is determined by the quality of the review and of the written report.

Prerequisites: Successful participation in the lecture course "Algorithmische Bioinformatik"; knowledge of statistics.

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Alexander Schliep, Benjamin Georgi. Teaching assistants: Janne Grunau.

Algorithmic Bioinformatics (Algorithmische Bioinformatik)

Contents: Lectures on Hidden Markov Models (introduction, profile HMMs) and gene finding as part of the lecture course Algorithmische Bioinformatik (Reinert et al.).

Contact: Alexander Schliep (alexander@schlieplab.org).

 

SS 2005

Information theoretic methods in bioinformatics

Format: Book seminar. Thursdays 14:00-16:00, seminar room, 3rd floor, MPI. 2 SWS (creditable toward focus areas C and D).

Description: In this seminar we will review information theoretic approaches based on the book "Information Theory, Inference and Learning Algorithms" by David J.C. MacKay. The format will be a book seminar in which all participants have to read the material every time and take turns giving presentations. Depending on the number of participants, several talks per participant are to be expected. Towards the end we will focus on information theoretic approaches for HMMs using original literature, keeping the book-seminar format. This seminar is targeted at students at the final Master's resp. graduate level and will not review elementary material. Prerequisites are successful attendance of "Algorithmische Bioinformatik" and "Hidden Markov Models" or further advanced course work. A good working knowledge of HMMs and Bayesian statistics in general is required.

Contact: Alexander Schliep (alexander@schlieplab.org).

Implementing information theoretic methods in bioinformatics

Format: Programming project after the end of the seminar. Grading based on resulting software. For Master's students (creditable toward focus areas C and D).

Contents: In this practical course we will implement selected methods covered in the book seminar "Information theoretic methods in bioinformatics" and perform numerical experiments to further investigate theoretical observations. A particular emphasis will be put on information theoretic methods as applied to clustering problems and, more generally, in the context of HMMs. This practical course is targeted at students at the final Master's resp. graduate level. Prerequisites are successful attendance of "Algorithmische Bioinformatik" and "Hidden Markov Models" or further advanced course work. A good working knowledge of HMMs and Bayesian statistics in general is required. Solid programming experience in Python and C is required.

Contact: Alexander Schliep (alexander@schlieplab.org).

 

WS 2004/05

Statistical Pattern Recognition in Bioinformatics with HMMs (Statistische Mustererkennung in der Bioinformatik mit HMMs)

Format: Fridays 12:00-14:00, Informatik 1.27. 2 SWS

Contents: Lecture course at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (alexander@schlieplab.org).

Analysis of DNA Microarrays (Analyse von DNA-Microarrays)

Format: Tuesdays 10:00-12:00, Informatik 1.26. 2 SWS

Contents: Lecture course at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (alexander@schlieplab.org).

Analysis of DNA Microarrays (Analyse von DNA-Microarrays)

Format: Tuesdays 14:00-15:00, Informatik 1.03. 1 SWS

Contents: Exercise session for the lecture course Analyse von DNA-Microarrays at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (alexander@schlieplab.org).

Statistical Pattern Recognition in Bioinformatics with HMMs (Statistische Mustererkennung in der Bioinformatik mit HMMs)

Format: Tuesdays 15:00-16:00, Informatik 1.03. 1 SWS

Contents: Exercise session for the lecture course Statistische Mustererkennung in der Bioinformatik mit HMMs at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (alexander@schlieplab.org).

Problems in Bioinformatics (Problemstellungen der Bioinformatik)

Format: Fridays 10:00-12:00, Informatik 1.03. 2 SWS

Contents: Seminar at Martin-Luther-Universität Halle-Wittenberg.

Contact: Alexander Schliep (alexander@schlieplab.org).

Algorithmic Bioinformatics (Algorithmische Bioinformatik)

Format: Mondays 12:00-14:00, HS 001 Arnimallee; Wednesdays 12:00-14:00, SR 005 Takustrasse 9. 4 SWS

Contents: Lectures on Hidden Markov Models and gene finding as part of the lecture course Algorithmische Bioinformatik (Vingron et al.).

Contact: Alexander Schliep (alexander@schlieplab.org).

 

SS 2004

Statistical Group Testing (Statistische Gruppentests)

Format: Thursdays 14:00-16:00, seminar room, 3rd floor, MPI. 2 SWS

Contents: Group testing is an approach for saving 'expensive' experiments on individual samples by testing groups of samples simultaneously. It is used, for example, in the quality control of stored blood, e.g. tests for HIV or hepatitis by PCR. Group tests are much more generally applicable, however, for instance in determining haplotypes and genotypes, constructing physical maps, and designing DNA chips. The underlying theory is appealing because it closely links discrete mathematics (combinatorics, coding theory) and statistics.
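
The savings from testing groups can be illustrated with a small calculation. The sketch below is not from the seminar material. It assumes the classical two-stage pooling scheme (one test per pool, then individual retests of every sample in a positive pool), independent samples with the same prevalence p, and an error-free test. The function names are hypothetical.

```python
def expected_tests_per_sample(p, k):
    """Expected number of tests per sample for two-stage pooling.

    p: probability that a single sample is positive (assumed independent)
    k: pool size; k == 1 means plain individual testing
    """
    if k == 1:
        return 1.0
    prob_pool_positive = 1.0 - (1.0 - p) ** k
    # one pooled test, plus k individual retests if the pool is positive
    return (1.0 + k * prob_pool_positive) / k


def best_pool_size(p, max_k=100):
    """Pool size in 1..max_k minimizing the expected tests per sample."""
    return min(range(1, max_k + 1),
               key=lambda k: expected_tests_per_sample(p, k))


p = 0.01  # hypothetical prevalence
k = best_pool_size(p)
print(k, round(expected_tests_per_sample(p, k), 4))
```

Under these assumptions and a prevalence of 1%, pools of 11 samples need roughly 0.2 tests per sample, i.e. about a fifth of the tests required for individual testing.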

Contact: Alexander Schliep (alexander@schlieplab.org).

Statistical Pattern Recognition in Bioinformatics (Statistische Mustererkennung in der Bioinformatik)

Format: Thursdays 10:00-12:00, SR 119 Arnimallee 3. 2 SWS

Contents: Hidden Markov Models (HMMs) are a flexible class of statistical models, in particular for biological sequences and time series. Starting from the classical definition of an HMM, we will use applications in molecular biology to present model extensions (e.g. higher-order states, multivariate emissions) as well as classification and clustering methods based on HMMs. This is complemented by placing HMMs within the hierarchy of statistical models. An additional focus is on effective techniques for an efficient and numerically stable implementation.

Contact: Alexander Schliep (alexander@schlieplab.org).

Software Lab: Statistical Pattern Recognition (Softwarepraktikum Statistische Mustererkennung)

Format: Full-day, PC pool, 3rd floor, MPI. 12 cr

Contents: Building on the GHMM library (http://ghmm.org), this lab course offers the opportunity to work on current bioinformatics problems in the analysis of DNA sequences. To use several sources of information simultaneously in the analysis (e.g. primary and secondary structure for proteins), it is necessary to support multivariate, or vector-valued, emissions. On this basis we will design and implement a program for finding genes in DNA sequences of eukaryotic genomes. This involves modeling the biological problem, designing a solution approach, and making extensions or adaptations to the library. Suitable biological data sets have to be compiled for training and evaluation. During the lab course, participants are to become acquainted with working independently as a team (two teams of four participants each) and to gain experience with the methodology (Extreme Programming) and software tools (e.g. for version control, testing, and documentation) of software engineering.

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Wasinee Rungsarityotin.

 

WS 2003/04

Cluster Analysis of Heterogeneous Data (Clusteranalyse heterogener Daten)

Format: Tuesdays 10:00-12:00, seminar room, 3rd floor, MPI. 2 SWS

Contents: The seminar covers methods that make it possible to exploit different data types (heterogeneous data) in order to improve the robustness and informativeness of the analysis.

Contact: Alexander Schliep (alexander@schlieplab.org).

Statistical Classification: Support Vector Machines and Generalized Linear Models

Format: Thursdays 14:00-16:00, Room 111, Arnimallee 2-6. 2 SWS

Contents: Joint seminar with Ehrhard Behrends and Peter Martus.

Contact: Alexander Schliep (alexander@schlieplab.org).

Applied Data Mining

Format: Full-day, 1-12 March 2004, PC pool, 3rd floor, MPI. 2 SWS, focus areas C and D.

Contents: Joint course with Knut Reinert as well as Dennis Kostka and Florian Markowetz.

This hands-on compact course offers participating students the opportunity to deepen and extend the knowledge acquired in seminars or lectures on statistical pattern recognition or data mining through practical analysis of exemplary molecular biology data sets. The thematic focus is on Support Vector Machines and Hidden Markov Models. Elementary programming skills (C, C++, Python, R) are required, but the emphasis is on learning data mining methodology.

Prerequisites: Successful participation in one of the seminars "Elements of Statistical Learning" or "Clusteranalyse heterogener Daten", as well as successful participation in the lecture course "Algorithmische Bioinformatik".

Details: Microarrays, protein structures, text mining: the list goes on. Recognizing complex structures in high-dimensional spaces is a recurring problem in bioinformatics. Support Vector Machines (SVMs) are a young but very successful classification method with which many problems can be tackled successfully.

The same holds for Hidden Markov Models (HMMs). They form the basis of, for example, the standard methods for gene prediction and for finding homologous protein sequences. Besides numerous applications in sequence analysis, they are also used for the analysis of microarrays.

The first week of the software lab deals with SVMs for regression and classification, the second week with HMMs and the analysis (annotation and classification) of biological sequences.

The two blocks are structured analogously. An introduction is given on the first day, and the last day provides time for writing a report. From Tuesday to Thursday, practical aspects are explained in the morning. Afterwards, small practical steps lead up to a problem that participants work on independently that day.

The assignments are to be completed individually. The results are summarized in a report. Group work is permitted, but individual contributions must be clearly marked. At the end of each block, the implementation of one of the three assignments is reviewed for each participant.

The grade is determined by the quality of the review and of the written report.

Contact: Alexander Schliep (alexander@schlieplab.org). Additional lecturers: Wasinee Rungsarityotin. Teaching assistants: Benjamin Georgi.

Algorithmic Bioinformatics (Algorithmische Bioinformatik)

Format: 4 SWS, Mondays 10:00-12:00, SR 005 Takustrasse 9; Wednesdays 10:00-12:00, SR 005 Takustrasse 9

Contents: Lectures on Hidden Markov Models (introduction, profile HMMs) and gene finding as part of the lecture course Algorithmische Bioinformatik (Reinert et al.).

Contact: Alexander Schliep (alexander@schlieplab.org).

 

SS 2003

Statistical Pattern Recognition in Bioinformatics (Statistische Mustererkennung in der Bioinformatik)

Format: Thursdays 10:00-12:00, SR 119 Arnimallee 3. 2 SWS

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Wasinee Rungsarityotin.

Software Lab: Statistical Pattern Recognition (Softwarepraktikum Statistische Mustererkennung)

Format: 2 SWS, full-day, PC pool, 3rd floor, MPI

Contact: Alexander Schliep (alexander@schlieplab.org). Teaching assistants: Wasinee Rungsarityotin.

 

WS 2002/03

Markov Chains (Markov Ketten)

Format: 2 SWS, Fridays 16:00-18:00, SR 059 Takustrasse 9

Contents: Joint seminar with Huisinga, Schütte and Vingron.

Contact: Alexander Schliep (alexander@schlieplab.org).

Algorithmic Bioinformatics (Algorithmische Bioinformatik)

Format: 4 SWS, Mondays 10:00-12:00, SR 005 Takustrasse 9; Wednesdays 10:00-12:00, SR 031 Arnimallee 2-6

Contents: Lectures on Hidden Markov Models (introduction, profile HMMs) and gene finding as part of the lecture course Algorithmische Bioinformatik (Vingron et al.).

Contact: Alexander Schliep (alexander@schlieplab.org).