Arvutiteaduse instituut
  1. Kursused
  2. 2022/23 kevad
  3. Andmete kokkupakkimine (MTAT.05.127)
EN
Logi sisse

Andmete kokkupakkimine 2022/23 kevad

  • Pealeht
  • Loengud
  • Viited

Info

Lecturer: Boris Kudryashov
Language: English
Lecture: Monday, 10:15 -- 12:00, Delta, 2048
Practice session: Wednesday, 10:15 -- 12:00, Delta, 2047
Office hours for homework discussions: Delta, 3089, Friday, 11:15 - 13:00, weeks 24-39

Books for the course:

  • M. Nelson and J.-L. Gailly, "The data compression book 2nd edition," M & T Books, New York, NY, (1995).
  • I. Bocharova, "Compression for multimedia," Cambridge Univ. Press (is sold via INTERNET)
  • D. Salomon, and G. Motta, "Handbook of data compression," Springer Science & Business Media, 2010.

Contact:

 Boris Kudryashov (boriskud@ut.ee)
Vitaly Skachek

General information

This course is primarily intended for Bachelor's level students and is recommended for Master's students interested in data compression techniques.

First, we will answer the questions

  • why for a given data the required storage space can be reduced?
  • how much space is necessary for the source with the known probabilistic model?

Information Theory gives precise answers to these questions. Optimum codes for sources with known statistical models are known from the dawn of information theory. Surprisingly, the near-optimum coding efficiency can be achieved also in the case if there is no a priori information about source statistics. We will study elegant coding algorithms combining high efficiency and low complexity.

Nowadays most of the data transmitted over communication networks are analog data represented in digital formats. Quality loss is an inevitable consequence of analog-to-digital conversion.

The required data rate vs quality trade-off is established by theorems of the rate-distortion theory. However, unlike the lossless coding theory, there is no universal lossy coding which is equally efficient for all types of multimedia data. Contemporary lossy coding techniques rely on the physical models of data sources and human perceptual sensitivity to different distortions. Standard coding techniques are based on complicated mathematical models and computational algorithms. Nevertheless, we will reveal most of the secrets of the high efficiency of ACELP coding for speech, AAC, and MP3 coding for music, JPEG, and MPEG coding for still images and video.

Contents

Lectures and practice sessions?

NLecturePractice session
1Measuring the quantity of information. Basic notions of information theoryComputing achievable data rates for examples of discrete sources
2Uniform and nonuniform source codesApplications of Huffman codes to different scenarios
3Arithmetic codingPractical implementation
4Off-line universal codingExamples. Implementation issues
5On-line universal coding for memoryless sourcesExamples. Implementation issues
6Ideas behind low-complexity universal coding techniques. Interval coding and recency rank coding. LZ-77 algorithmMonotonic codes. LZ-77 example
7LZW algorithm, PPM algorithmExamples. Implementation issues
8Burrows-Wheeler transform and its application. Overview of the best archiversA BWT-based data compression algorithm
9Fundamentals of rate-distortion theory. Measuring analog information. Sampling. Transforms. QuantizationScalar quantization. Rate-distortion functions for Gaussian variables and processes
10Optimum scalar quantization. Vector quantizationSimulations of quantizers followed by lossless encoders
11Linear prediction-based speech coding. DemosMatlab implementation of main stages of speech coder
12Psycho-acoustic models and filterbanks. Standards. DemosImplementation of subband decomposition via overlapped harmonic transforms
13Formats of image data. JPEGImplementation issues
14Discrete wavelet transform. JPEG-2000Implementation example
15Formats of video data. Motion compensation. MPEG standardsSimulations. Demos.
16Horizons. Open problems. Theoretical research topicsSummary of projects

Projects?

TopicContentStart (week)Deadline (week)
Measuring informationFor a real discrete data source estimate theoretically achievable compression ratio. Compare with the efficiency of existing archivers16
Universal data compressionImplement programs for estimating the efficiency of a given set of universal data compression algorithms. Choose the best for a given data source. Compare with efficiency of the standard archivers513
Analog data compressionChoose a multimedia data fragment. Evaluate rate-distortion trade-off using straightforward scalar quantization followed by lossless encoding. Repeat computations using a proper transform1017

Schedule:

 Lecture  Monday 10.15-12.00  weeks (24-39),  Delta, 2047 
Practical session Wed. 10.15 - 12.00 week (24-39) Delta, 2045


Demos

Speech compression standards

The original speech file (PCM 8kHz, 16 bits, size 73684 bytes
ADPCM, 8kHz, 4 bits, size 18688 bytes
GSM Standard, CELP-like coder, size 7540 bytes
CELP-like coder, size 4896 bytes

Image compression standards

Original image file, size 3145782 bytes
JPEG Standard, size 31016 bytes
JPEG2000, size 31016 bytes
Wavelet coder, size 29401 bytes

  • Arvutiteaduse instituut
  • Loodus- ja täppisteaduste valdkond
  • Tartu Ülikool
Tehniliste probleemide või küsimuste korral kirjuta:

Kursuse sisu ja korralduslike küsimustega pöörduge kursuse korraldajate poole.
Õppematerjalide varalised autoriõigused kuuluvad Tartu Ülikoolile. Õppematerjalide kasutamine on lubatud autoriõiguse seaduses ettenähtud teose vaba kasutamise eesmärkidel ja tingimustel. Õppematerjalide kasutamisel on kasutaja kohustatud viitama õppematerjalide autorile.
Õppematerjalide kasutamine muudel eesmärkidel on lubatud ainult Tartu Ülikooli eelneval kirjalikul nõusolekul.
Tartu Ülikooli arvutiteaduse instituudi kursuste läbiviimist toetavad järgmised programmid:
euroopa sotsiaalfondi logo