Institute of Computer Science, University of Tartu

Data Mining Research Seminar (MTAT.03.277), spring 2013/14

Fast detection of spurious DNA reads

Supervisor: Sven Laur
Student: Anti Alman
Short description: Present DNA measurement technology is not perfect. Reading errors can occur with frequencies of up to 1 error per 5 base pairs. In some settings, we have millions of measurements, each corresponding to a short sequence (30-40 letters) over the four-letter alphabet A, C, G, T. As a result, we need to quickly find pairs of sequences that differ in at most 3-5 places, as these are probably misreads of the same DNA sequence. The aim of this project is to develop methods with quasi-linear complexity for finding such pairs. In particular, we study various projection methods and methods based on coding theory.
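
One simple projection idea can be sketched as follows. It is only an illustration, not the method developed in the project: if two equal-length reads differ in at most k places and the positions are split into k + 1 blocks, at least one block must match exactly, so only reads sharing a block need to be compared. The function names `hamming` and `close_pairs` are hypothetical, and the running time is close to linear only when the buckets stay small.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs(reads, k):
    """Return index pairs (i, j), i < j, whose reads differ in at most k places.

    Assumption: all reads have the same length.
    """
    if not reads:
        return set()
    n = len(reads[0])
    # Split the positions into k + 1 contiguous blocks (pigeonhole principle).
    bounds = [n * b // (k + 1) for b in range(k + 2)]
    found = set()
    for start, end in zip(bounds, bounds[1:]):
        # Project every read onto the current block and bucket by its content.
        buckets = defaultdict(list)
        for idx, read in enumerate(reads):
            buckets[read[start:end]].append(idx)
        # Only reads that agree on this block are compared in full.
        for members in buckets.values():
            for i, j in combinations(members, 2):
                if (i, j) not in found and hamming(reads[i], reads[j]) <= k:
                    found.add((i, j))
    return found


reads = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "ACGAACGA"]
pairs = close_pairs(reads, k=1)
```

Here reads 0 and 1 share the first block and reads 1 and 3 share the second, so both pairs are found without comparing every read against every other.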

Discriminative sequence mining

Supervisor: Fabrizio Maria Maggi
Student: Erik Mägi
Short description: Discriminative sequence mining tries to find temporal business rules based on positive and negative examples (traces of correct and incorrect behaviors). The aim of the project is to analyze the BPI challenge 2011 dataset using a toolset presented at SIGKDD09 (available on request: write an e-mail to f.m.maggi@ut.ee). Detailed tasks are:

  • Get familiar with the toolset;
  • Find a way to classify the traces in the dataset as positive and negative examples (e.g., using the LTL checker, a plug-in of the process mining tool ProM);
  • Use the toolset for discriminative sequence mining using the positive and negative examples derived in the previous stages.

Provide a report including:

  • an overview of the basics of discriminative sequence mining;
  • an overview of the toolset;
  • a study on the applicability of the discriminative sequence mining approach by using the given toolset.
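
To give a rough idea of what a discriminative pattern is, the sketch below scores "a is eventually followed by b" patterns by the difference in their support between positive and negative traces. It does not reproduce the SIGKDD09 toolset. The function names, the pattern type, and the `min_gap` threshold are all assumptions made for illustration, and both trace lists are assumed to be non-empty.

```python
from itertools import combinations


def followed_by_pairs(trace):
    """Set of ordered pairs (a, b) such that a occurs before b in the trace."""
    return set(combinations(trace, 2))


def support(pair_sets, pattern):
    """Fraction of traces (given as pair sets) that contain the pattern."""
    return sum(pattern in pairs for pairs in pair_sets) / len(pair_sets)


def discriminative_patterns(positive, negative, min_gap=0.5):
    """Patterns whose support differs by at least min_gap between the classes."""
    pos_sets = [followed_by_pairs(t) for t in positive]
    neg_sets = [followed_by_pairs(t) for t in negative]
    candidates = set().union(*pos_sets, *neg_sets)
    scored = []
    for pattern in candidates:
        gap = support(pos_sets, pattern) - support(neg_sets, pattern)
        if abs(gap) >= min_gap:
            scored.append((pattern, gap))
    # Largest absolute gap first; ties broken by pattern name for stable output.
    return sorted(scored, key=lambda item: (-abs(item[1]), item[0]))


positive = [["register", "pay", "ship"], ["register", "pay", "ship"]]
negative = [["register", "ship", "pay"], ["ship", "register"]]
rules = discriminative_patterns(positive, negative, min_gap=0.75)
```

In this toy log, "pay before ship" holds in every positive trace and in no negative trace, so it is the only pattern that passes the 0.75 threshold.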

Interactive hierarchical clustering tool for text segments

Supervisor: Timo Petmanson
Student: Karl-Oskar Masing
Short description: Create an interactive web tool for agglomerative clustering of similar sentences in Estonian. Need: we have 20,000 short opinions about study courses and we need to annotate them semantically. Most of them are very similar, so we need a tool that makes annotating sentences easy. The tool should display the dendrogram of sentences or representative patterns and make it easy to merge nearby table rows. Technically, you have to implement:

  • similarity measure between sentences
  • interactive merging of cells and recreating of the tree
  • automatic generalisation of regular expression patterns
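
The first two items can be sketched in a few lines, assuming a word-overlap (Jaccard) similarity and average-linkage merging. Both are illustrative choices, not requirements of the project. The naive loop below is cubic in the number of sentences, so it shows the idea only and would not scale to 20,000 opinions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences, in the range 0..1."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def agglomerate(sentences):
    """Average-linkage clustering; returns the merge history (the dendrogram).

    Each entry is (cluster_a, cluster_b, similarity), where clusters are
    tuples of sentence indices.
    """
    clusters = [(i,) for i in range(len(sentences))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sims = [jaccard(sentences[i], sentences[j])
                        for i in clusters[x] for j in clusters[y]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        score, x, y = best
        merges.append((clusters[x], clusters[y], score))
        merged = clusters[x] + clusters[y]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


opinions = ["kursus oli hea", "kursus oli huvitav ja hea", "liiga palju materjali"]
history = agglomerate(opinions)
```

An interactive tool would let the annotator accept or undo individual entries of `history` and then rebuild the tree from the remaining clusters.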

Multiple whole-genome alignment

Supervisor: Balaji Rajashekar
Student: Fanny-Dhelia Pajuste
Short description: The aim of this project is to perform whole-genome alignment, clustering of samples and automatic identification of regions showing mutations or substitutions in the genomes. You must align genomes from different individuals, identify the regions in which the sequences differ and show a line or bar graph at the top with the levels of mutation. The samples can also be clustered. Annotations from public databases can be used to identify genomic regions. The resulting software should be graphical, and either standalone or web-based.
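
The "levels of mutations" graph needs one number per genomic region. A minimal sketch, assuming the two sequences have already been aligned to equal length with `-` marking gaps: count the differing columns in fixed-size windows. The function name and the window size are hypothetical, and the alignment step itself is not shown.

```python
def mutation_levels(reference, sample, window):
    """Number of differing alignment columns in each consecutive window.

    Assumption: both strings come from an existing alignment and therefore
    have the same length; a gap ('-') against a base counts as a difference.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    levels = []
    for start in range(0, len(reference), window):
        ref_part = reference[start:start + window]
        smp_part = sample[start:start + window]
        levels.append(sum(r != s for r, s in zip(ref_part, smp_part)))
    return levels


levels = mutation_levels("ACGTACGTAC", "ACGAACGT-C", window=5)
```

The resulting list can be passed directly to any plotting library as the bar heights.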

Automatic product categorisation

Supervisor: Sven Laur
Student: Andres Viikmaa
Short description: Many product catalogs are imported into a product catalog aggregator. Each product catalog has its own taxonomy. Merging each catalog into the master taxonomy tree requires manual work, either to map products by their properties into the correct category or to map the taxonomy trees onto each other. This process can be made at least semi-automatic with machine learning tools, or even fully automatic if a good enough model can be learned from the training set. This project aims to create a semi-automatic classification program that guides the user in picking the correct category from a short list of candidates.
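
The "short list of candidates" could be produced by any text classifier that ranks categories. The sketch below uses a small multinomial naive Bayes model with add-one smoothing over product titles. The model choice, the class name `CategorySuggester`, and the toy catalog are illustrative assumptions only.

```python
import math
from collections import Counter, defaultdict


class CategorySuggester:
    """Ranks master-taxonomy categories for a product title (naive Bayes)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()               # category -> number of products
        self.vocabulary = set()

    def fit(self, examples):
        """examples: iterable of (product_title, category) pairs."""
        for text, category in examples:
            tokens = text.lower().split()
            self.token_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)
        return self

    def suggest(self, text, top_n=3):
        """Return the top_n most probable categories, best first."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for category, counts in self.token_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[tok] + 1) / (total + vocab_size))
            scores[category] = score
        return sorted(scores, key=scores.get, reverse=True)[:top_n]


training = [
    ("usb cable 2m", "Cables"),
    ("hdmi cable gold", "Cables"),
    ("wireless mouse black", "Mice"),
    ("gaming mouse rgb", "Mice"),
    ("laptop bag 15 inch", "Bags"),
]
candidates = CategorySuggester().fit(training).suggest("black usb cable", top_n=2)
```

The user would then confirm one of the suggested categories, and each confirmed choice can be fed back in as a new training example.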

How does transcription factor AIRE influence gene expression

Supervisor: Tauno Metsalu
Student: Tõnis Tasa
Short description: The first aim is to count reads in windows of a fixed size in unannotated regions and to compare samples with varying AIRE and etoposide application with one another. Secondly, it is necessary to quantify alternative transcription start sites (TSS) 100 or 50 base pairs ahead of the established start sites. The third question is whether AIRE activates genes by directing RNA polymerase to the elongation phase. This is done by comparing reads in regions 20-50 bp from the TSS between AIRE-negative and AIRE-positive samples.
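
The first aim can be sketched as follows, assuming each sample is given as a list of 0-based read start positions on a single chromosome. The function names, the pseudocount, and the use of a log2 ratio are illustrative assumptions. A real analysis would also normalise for sequencing depth.

```python
import math
from collections import Counter


def window_counts(read_starts, window_size):
    """Number of reads starting in each fixed-size window (keyed by window index)."""
    return Counter(pos // window_size for pos in read_starts)


def log2_fold_changes(sample_a, sample_b, window_size, pseudocount=1.0):
    """Per-window log2 ratio of read counts, sample_a over sample_b.

    The pseudocount avoids division by zero in windows that are empty
    in one of the samples.
    """
    a = window_counts(sample_a, window_size)
    b = window_counts(sample_b, window_size)
    return {
        w: math.log2((a[w] + pseudocount) / (b[w] + pseudocount))
        for w in sorted(set(a) | set(b))
    }


aire_positive = [5, 12, 130, 140, 150]
aire_negative = [7, 135]
changes = log2_fold_changes(aire_positive, aire_negative, window_size=100)
```

Windows with a large positive value have more reads in the first sample. The same counting, restricted to the 20-50 bp region after each TSS, would serve the third question.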

Automatic analysis of tissue images

Supervisor: Balaji Rajashekar and Tauno Metsalu
Student: Maarja Lepamets
Short description: Various bioinformatic studies provide images of cell cultures and tissues. The aim of this project is to learn some basic image processing techniques and apply them to find qualitative measures for different properties of the tissue. More precisely, the aim is to quantify the amount of colour-tagged proteins in tissue by estimating the size and intensity of colour spots in the image. Example: http://fimm.webmicroscope.net/MainCollection/cyclin
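
A minimal sketch of the spot measurement: threshold a single colour channel, then label 4-connected regions with a flood fill. It assumes the image is a plain list of rows of intensities and that the threshold is fixed by hand. Both are simplifications, and a real pipeline would use an image processing library.

```python
def colour_spots(image, threshold):
    """Return (size_in_pixels, mean_intensity) for each 4-connected spot.

    Assumption: `image` is a non-empty list of equal-length rows holding the
    intensity of one colour channel; pixels >= threshold belong to a spot.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood fill from this pixel to collect the whole spot.
            stack = [(r, c)]
            seen[r][c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append(image[y][x])
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            spots.append((len(pixels), sum(pixels) / len(pixels)))
    return spots


image = [
    [0, 0, 0, 0, 0],
    [0, 200, 220, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 180, 0],
]
spots = colour_spots(image, threshold=100)
```

Summing size times mean intensity over all spots gives one crude estimate of the total amount of tagged protein in the image.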

Noise reduction of the Emotiv EPOC's raw signal

Supervisor: Ilja Kuzovkin
Student: Raimond Tunnel
Short description: The raw signal from the Emotiv EPOC EEG reading device contains a lot of background noise and readings that do not contribute to mental state classification. The goal is to run a classification algorithm on the raw data and on the filtered data (after applying a noise reduction algorithm), and to compare the resulting F1 scores.
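
The comparison itself can be sketched as below. The moving average stands in for whatever noise reduction algorithm is actually chosen. The F1 score is computed for a binary labelling, and the example predictions are made up purely to show the calculation.

```python
def moving_average(signal, width):
    """Trailing moving average, a very simple noise reduction filter."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - width + 1):i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels and classifier outputs, for illustration only.
y_true = [1, 1, 0, 0, 1]
pred_on_raw = [1, 0, 0, 1, 0]
pred_on_filtered = [1, 1, 0, 0, 0]
f1_raw = f1_score(y_true, pred_on_raw)
f1_filtered = f1_score(y_true, pred_on_filtered)
```

In the project, the two prediction lists would come from the same classifier, trained once on the raw signal and once on the filtered signal.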

Community detection with node and edge attributes

Supervisor: Marlon Dumas
Student: Riivo Kikas
Short description: The majority of community detection research has focused on how to detect communities in networks based on structure only. Besides structure, information about nodes and links, such as node and edge attributes, can be used for extracting communities. In this work, we review the current state of community detection with node and edge attributes and develop an algorithm for discovering communities with different types of attributes (categorical, numeric, action log data). We will also consider detecting evolving communities in temporally changing networks. As a case study, we aim to:

  • Extract communities in a social network of users while minimising the variance of user age within each community.
  • Extract communities of users in a social network who like the same news articles, by maximising the similarity between users based on the articles they have read.
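
The two attribute-based objectives can be written down as simple scoring functions. This is only a sketch of what would be minimised or maximised. The function names, the size-weighted variance, and the Jaccard similarity are assumptions. How these scores are combined with network structure is the subject of the project and is not shown.

```python
from statistics import pvariance


def within_community_age_variance(communities, ages):
    """Size-weighted mean of the age variance inside each community (lower is better)."""
    total = sum(len(c) for c in communities)
    return sum(
        len(c) * pvariance([ages[user] for user in c]) for c in communities
    ) / total


def article_similarity(read_a, read_b):
    """Jaccard similarity between the sets of articles two users have read."""
    if not read_a and not read_b:
        return 0.0
    return len(read_a & read_b) / len(read_a | read_b)


ages = {"a": 20, "b": 22, "c": 40, "d": 42}
by_age = within_community_age_variance([{"a", "b"}, {"c", "d"}], ages)
mixed = within_community_age_variance([{"a", "c"}, {"b", "d"}], ages)
overlap = article_similarity({1, 2, 3}, {2, 3, 4})
```

Grouping users of similar age gives a much lower score than the mixed partition, which is the behaviour the first case study asks for.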

DNA methylation and linkage disequilibrium

Supervisor: Sven Laur and Hedi Peterson
Student: Hans-Peter Tulmin
Short description: TBA
