Institute of Computer Science

Foundation Models: From Symbolic Reasoning to Deep Learning (LTAT.02.034), 2025/26 fall

Homework assignment: Measuring information and data compression

Goal of the homework

The goal is to understand what the theoretical characterization of the amount of information contained in a specific message means. It is instructive to compute the theoretical limits on achievable compression efficiency and compare them with the results obtained from practical data compression techniques. Another goal is to discover the analogy between the way compression algorithms approximate the source model and the heuristic approaches to “tokenization” used with LLMs.

Test data examples

The example data files are listed in the table in the Appendix. Use your birthdate as your variant number.

Steps

Step 1. Estimate the one-dimensional and two-dimensional probabilities for the data source, that is, the empirical probabilities of single symbols and of pairs of adjacent symbols.

Remark 1: When estimating empirical probabilities, use “sliding” (overlapping) blocks. In this case, a file of length N contains N − n + 1 blocks of length n.

Step 2. Estimate the source entropies H(X), H(X^2) = H(X_i X_{i+1}), H_2(X) = H(X^2)/2, and H(X_i | X_{i−1}). Comment on the achievable compression efficiency.
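Steps 1 and 2 can be sketched roughly as follows. This is only an illustration, assuming the data file is treated as a sequence of bytes and entropies are measured in bits; the helper names `block_probabilities` and `entropy` and the toy input `sample` are hypothetical and not part of the assignment.

```python
from collections import Counter
from math import log2


def block_probabilities(data: bytes, n: int) -> dict:
    """Empirical probabilities of length-n blocks, counted with a sliding window."""
    total = len(data) - n + 1  # a file of length N contains N - n + 1 blocks
    if total <= 0:
        raise ValueError("data is shorter than the block length")
    counts = Counter(data[i:i + n] for i in range(total))
    return {block: c / total for block, c in counts.items()}


def entropy(probs: dict) -> float:
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)


sample = b"abababab"  # toy data standing in for a real variant file
h1 = entropy(block_probabilities(sample, 1))      # H(X)
h_pair = entropy(block_probabilities(sample, 2))  # H(X^2) = H(X_i X_{i+1})
h2 = h_pair / 2                                   # H_2(X) = H(X^2) / 2
```

For a real variant, `sample` would be replaced by the file contents, for example read with `open(path, "rb").read()`, where `path` is whatever file the variant number selects.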

Hint: For estimating conditional entropies, the following formula can be helpful: H(Y|X) = H(XY) − H(X), where H(XY) is the joint entropy of the pair (X, Y).
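The formula in the hint can be applied to adjacent symbols, with X as the previous symbol and Y as the current one. The sketch below is one possible way to do this, not a prescribed method. It assumes byte symbols and entropies in bits, and it makes one design choice that is not stated in the assignment: the marginal H(X) is estimated from the first symbol of each sliding pair (all symbols except the last), so that both terms are based on the same N − 1 pairs and the difference cannot become negative. The names `entropy_of_counts` and `conditional_entropy` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_of_counts(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(data: bytes) -> float:
    """Estimate H(X_i | X_{i-1}) as H(X_{i-1} X_i) - H(X_{i-1})."""
    if len(data) < 2:
        raise ValueError("need at least two symbols")
    # Sliding pairs: a file of length N contains N - 1 blocks of length 2.
    pairs = Counter(data[i:i + 2] for i in range(len(data) - 1))
    # Assumption: marginal taken over the first symbol of each pair.
    firsts = Counter(data[:-1])
    return entropy_of_counts(pairs) - entropy_of_counts(firsts)


h_cond = conditional_entropy(b"aabbaabb")  # toy input, not a real variant file
```

If H(X) is instead estimated from all N symbols, as in Step 1, the two terms rest on slightly different counts, and for very short files the difference can come out marginally below zero. Under the assumption of 8-bit symbols, comparing `h_cond` with 8 bits per symbol gives a rough indication of the compression achievable for a source model with one symbol of memory.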

The proprietary copyrights of educational materials belong to the University of Tartu. The use of educational materials is permitted for the purposes and under the conditions provided for in the copyright law for the free use of a work. When using educational materials, the user is obligated to give credit to the author of the educational materials.
The use of educational materials for other purposes is allowed only with the prior written consent of the University of Tartu.