Programmeerimise alused - Kursused - Arvutiteaduse instituut

6.4 STRING COMPARISON (EXTRA READING)

This chapter is optional in this course, so feel free to skip ahead to the next subchapter.

String comparison allows us to evaluate the relationship between different strings. These operations are based on the lexicographical ordering of the strings.

UNICODE

Unicode provides a universal set of character codes, which gives lexicographical string comparison a well-defined basis: strings are ordered by the numerical codes of their characters.

Every character in Unicode is assigned a unique code point, a numerical identifier representing the character in the Unicode standard. For example, the character 'A' is U+0041, and 'a' is U+0061.
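As a small illustration, Python's built-in ord() and chr() functions convert between a character and its code point. The variable names below are chosen only for this example.

```python
# ord() returns the code point of a single character as an integer.
code_upper = ord("A")
code_lower = ord("a")
print(code_upper)  # 65
print(code_lower)  # 97

# hex() shows the same number in hexadecimal, matching the U+0041 notation.
print(hex(code_upper))  # 0x41

# chr() goes the other way: from a code point back to the character.
letter = chr(97)
print(letter)  # a
```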

When strings are compared, each character’s Unicode code point is considered. The comparison is done character by character, starting from the first character of each string.

The numerical value of these code points determines the order. Characters with lower code points come before those with higher ones in lexicographical order. For instance, since 'A' (U+0041) has a lower code point than 'a' (U+0061), 'A' is lexicographically smaller than 'a'.
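The character-by-character procedure can be sketched by hand. The function is_less below is a hypothetical helper written only for illustration; in practice the built-in < operator does this work. The sketch also assumes the usual rule that, when one string is a prefix of the other, the shorter string counts as the smaller one.

```python
def is_less(first, second):
    """Hypothetical helper: mimic first < second by comparing code points."""
    # Walk through both strings in parallel, one character pair at a time.
    for char_a, char_b in zip(first, second):
        if ord(char_a) != ord(char_b):
            # The first differing pair decides the result.
            return ord(char_a) < ord(char_b)
    # All shared positions were equal, so the shorter string is the smaller one.
    return len(first) < len(second)

print(is_less("apple", "banana"))  # True
print(is_less("Apple", "apple"))   # True
print(is_less("app", "apple"))     # True
print(is_less("apple", "apple"))   # False
```

Each result matches what the built-in < operator returns for the same pair of strings.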

Because each uppercase and lowercase letter has a different code point in Unicode, comparisons are inherently case-sensitive. This means that "apple" and "Apple" are considered different.

CHECKING EQUALITY AND INEQUALITY

== checks whether two strings are exactly the same. != checks whether two strings are not exactly the same.

print("hello" == "hello")  # True
print("hello" == "Hello")  # False, because the comparison is case-sensitive
print("apple" != "banana") # True

GREATER AND LESS THAN

A string is considered greater if it comes after another string in lexicographical order, which matches alphabetical order only when both strings use the same letter case. The comparison is case-sensitive: uppercase letters come before lowercase letters. It is done character by character until a difference is found; if one string runs out first, the shorter string is the smaller one.

print("apple" < "banana")   # True
print("apple" > "Apple")    # True, because lowercase > uppercase in Unicode
print("apple" >= "apple")   # True
print("banana" <= "banana") # True
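One place where this ordering becomes visible is sorting. The sketch below uses Python's built-in sorted() function, which orders strings with the same comparison rules; the list of words is made up for the example.

```python
words = ["banana", "Apple", "apple", "app"]

# sorted() returns a new list ordered from smallest to largest string.
ordered = sorted(words)
print(ordered)  # ['Apple', 'app', 'apple', 'banana']
```

"Apple" comes first because 'A' has a lower code point than 'a', and "app" comes before "apple" because it is the shorter string of two that otherwise match.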

CASE-INSENSITIVE COMPARISON

Sometimes you might want to compare strings in a case-insensitive manner. To do so, convert both strings to the same case with either the .lower() or the .upper() method before comparing them.

string1 = "Python"
string2 = "python"
print(string1 == string2) # False as the first string starts with a capital P
print(string1.lower() == string2.lower())  # True
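If the same check is needed in several places, it could be wrapped in a small function. The name equals_ignore_case is a hypothetical helper chosen for this sketch, not a built-in.

```python
def equals_ignore_case(first, second):
    """Hypothetical helper: compare two strings while ignoring letter case."""
    return first.lower() == second.lower()

name = "PyThOn"
print(equals_ignore_case(name, "python"))  # True
print(equals_ignore_case(name, "java"))    # False

# .lower() and .upper() return new strings; the original is left unchanged.
print(name)                      # PyThOn
print(name.upper() == "PYTHON")  # True
```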

KEY THINGS TO REMEMBER

Unicode assigns each character a unique code point which standardizes string comparison. Strings are compared based on the Unicode code points of their characters, starting from the first character.
Unicode treats uppercase and lowercase letters as different characters, which affects their order in comparisons.
== returns True if two strings are exactly the same. != returns True if two strings differ in at least one character.
Strings can be compared using <, >, <=, and >= based on their characters' Unicode code points. 'apple' > 'Apple' because lowercase letters have higher code points than uppercase letters.
We can use the str.lower() or str.upper() method on both strings for case-insensitive comparison.

SELF-CONTROL EXERCISES


Programmeerimise alused 2024/25 sügis