Info

Lecturer: Boris Kudryashov
Language: English
Lecture: Monday, 14:15 -- 16:00, Paabel, 17-219
Practice session: Wednesday, 10:15 -- 12:00, Paabel, 17-218

Books for the course:

M. Nelson and J.-L. Gailly, "The data compression book 2nd edition," M & T Books, New York, NY, (1995).
I. Bocharova, "Compression for multimedia," Cambridge Univ. Press (is sold via INTERNET)
D. Salomon, and G. Motta, "Handbook of data compression," Springer Science & Business Media, 2010.

Contact:

 Boris Kudryashov (boriskud@ut.ee)
 Vitaly Skachek

General information

This course is primarily intended for Bachelor's level students and is recommended for Master's students with an interest in data compression techniques.

First, we will answer the questions

why for a given data the required storage space can be reduced?
how much space is necessary for the source with known probabilistic model?

Information Theory gives precise answers to these questions. Optimum codes for sources with known statistical models are known from the dawn of information theory. Surprisingly, the near-optimum coding efficiency can be achieved also in the case if there is no a priori information about source statistics. We will study elegant coding algorithms combining high efficiency and low complexity.

Nowadays most of the data transmitted over communication networks is analog data represented in digital formats. The quality loss is inevitable consequence of the analog-to-digital conversion.

The required data rate vs quality trade-off is established by theorems of the rate-distortion theory. However, unlike the lossless coding theory, there is no universal lossy coding which is equally efficient for all types of multimedia data. Contemporary lossy coding techniques rely on the physical models of data sources and human perceptual sensitivity to different distortions. Standard coding techniques are based on complicated mathematical models and computational algorithms. Nevertheless, we will reveal most of the secrets of high efficiency of ACELP coding for speech, AAC and MP3 coding for music, JPEG and MPEG coding for still images and video.

Lectures and practice sessions ?

N	Lecture	Practice session
1	Measuring the quantity of information. Basic notions of information theory	Computing achievable data rates for examples of discrete sources
2	Uniform and nonuniform source codes	Applications of Huffman codes to different scenarios
3	Arithmetic coding	Practical implementation
4	Off-line universal coding	Examples. Implementation issues
5	On-line universal coding for memoryless sources	Examples. Implementation issues
6	Ideas behind low-complexity universal coding techniques. Interval coding and recency rank coding. LZ-77 algorithm	Monotonic codes. LZ-77 example
7	LZW algorithm, PPM algorithm	Examples. Implementation issues
8	Burrows-Wheeler transform and its application. Overview of the best archivers	A BWT-based data compression algorithm
9	Fundamentals of rate-distortion theory. Measuring of analog information. Sampling. Transforms. Quantization	Scalar quantization. Rate-distortion functions for Gaussian variables and processes
10	Optimum scalar quantization. Vector quantization	Simulations of quantizers followed by lossless encoders
11	Linear prediction based speech coding. Demos	Matlab implementation of main stages of speech coder
12	Psycho-acoustic models and filterbanks. Standards. Demos	Implementation of subband decomposition via overlapped harmonic transforms
13	Formats of image data. JPEG	Implementation issues
14	Discrete wavelet trandsform. JPEG-2000	Implementation example
15	Formats of video data. Motion compensation. MPEG standards	Simulations. Demos.
16	Horizons. Open problems. Theoretical research topics	Summary of projects

Projects ?

Topic	Content	Start (week)	Deadline (week)
Measuring information	For a real discrete data source estimate theoretically achievable compression ratio. Compare with the efficiency of existing archivers	1	6
Universal data compression	Implement programs for estimating the efficiency of a given set of universal data compression algorithms. Choose the best for a given data source. Compare with efficiency of the standard archivers	5	13
Analog data compression	Choose a multimedia data fragment. Evaluate rate-distortion trade-off using straightforward scalar quantization followed by lossless encoding. Repeat computations using a proper transform	10	17

Schedule:

 Lecture  Monday 14.15-16.00  weeks (24-39),  J. Liivi 2 - 511 
 Practical session  Wed. 10.15 - 12.00 week (24-39)  J. Liivi 2 - 511

Demos

Speech compression standards

The original speech file (PCM 8kHz, 16 bits, size 73684 bytes
ADPCM, 8kHz, 4 bits, size 18688 bytes
GSM Standard, CELP-like coder, size 7540 bytes
CELP-like coder, size 4896 bytes

Image compression standards

Original image file, size 3145782 bytes
JPEG Standard, size 31016 bytes
JPEG2000, size 31016 bytes
Wavelet coder, size 29401 bytes

Practice in data compression 2018/19 kevad