< previous | Table of Contents | next > |
6.4 STRING COMPARISON (EXTRA READING)
This subchapter is optional for this course, so feel free to skip ahead to the next subchapter.
String comparison lets us check whether two strings are equal and, if they are not, which one comes first. These operations are based on the lexicographical ordering of the strings, which works much like the ordering of words in a dictionary.
UNICODE
By providing a universal set of character codes, Unicode standardizes how strings are compared lexicographically.
Every character in Unicode is assigned a unique code point, a numerical identifier representing the character in the Unicode standard. For example, the character 'A' is U+0041, and 'a' is U+0061.
When strings are compared, each character’s Unicode code point is considered. The comparison is done character by character, starting from the first character of each string.
The numerical value of these code points determines the order. Characters with lower code points come before those with higher ones in lexicographical order. For instance, since 'A' (U+0041) has a lower code point than 'a' (U+0061), 'A' is lexicographically smaller than 'a'.
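The link between characters and code points can be inspected directly in Python. The short sketch below uses the built-in ord() function, which returns the code point of a single character as an integer; the variable names are only illustrative.

```python
# ord() returns the Unicode code point of a single character as an integer.
code_upper = ord("A")  # 65, which is 0x41 in hexadecimal (U+0041)
code_lower = ord("a")  # 97, which is 0x61 in hexadecimal (U+0061)

print(code_upper, code_lower)   # 65 97
print(f"U+{code_upper:04X}")    # U+0041
print(code_upper < code_lower)  # True
print("A" < "a")                # True, the same result as comparing the code points
```

Comparing the two one-character strings gives the same answer as comparing their code points as numbers.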
Because each uppercase and lowercase letter has a different code point in Unicode, comparisons are inherently case-sensitive. This means that "apple" and "Apple" are considered different.
CHECKING EQUALITY AND INEQUALITY
== checks if two strings are exactly the same.
!= checks if two strings are not exactly the same.
print("hello" == "hello")   # True
print("hello" == "Hello")   # False, because the comparison is case-sensitive
print("apple" != "banana")  # True
GREATER AND LESS THAN
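A string is considered greater than another string if it comes after it in lexicographical order. This comparison is also case-sensitive: every uppercase letter of the English alphabet has a lower code point than every lowercase one, so uppercase letters come before lowercase letters. The comparison is done character by character until a difference is found. If one string runs out before any difference is found, the shorter string is considered smaller.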
print("apple" < "banana")    # True
print("apple" > "Apple")     # True, because lowercase > uppercase in Unicode
print("apple" >= "apple")    # True
print("banana" <= "banana")  # True
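To make the character-by-character rule concrete, here is a minimal sketch that imitates the < operator by hand. The function name is_less is hypothetical and exists only for illustration; in real code you would simply use < directly.

```python
def is_less(left: str, right: str) -> bool:
    """Hypothetical helper that mirrors what `left < right` does for strings."""
    # Walk through both strings in parallel, one character pair at a time.
    for a, b in zip(left, right):
        if a != b:
            # The first differing pair of characters decides the result.
            return ord(a) < ord(b)
    # Every shared position matched, so the shorter string comes first.
    return len(left) < len(right)


print(is_less("apple", "banana"))  # True: 'a' (97) < 'b' (98)
print(is_less("Zebra", "apple"))   # True: 'Z' (90) < 'a' (97)
print(is_less("app", "apple"))     # True: "app" is a prefix of "apple"
print(is_less("apple", "apple"))   # False: the strings are equal
```

Note that "Zebra" sorts before "apple" even though z comes after a in the alphabet, because only the code points of the first differing characters matter.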
CASE-INSENSITIVE COMPARISON
Sometimes, you might want to compare strings in a case-insensitive manner. To do so, we can convert both strings to the same case using either the .lower() or the .upper() method before comparing them.
string1 = "Python"
string2 = "python"
print(string1 == string2)                  # False, as the first string starts with a capital P
print(string1.lower() == string2.lower())  # True
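If the same check is needed in several places, it can be wrapped in a small function. The sketch below assumes a hypothetical helper named equals_ignore_case; it is not a built-in, just one possible way to package the .lower() approach shown above.

```python
def equals_ignore_case(first: str, second: str) -> bool:
    """Hypothetical helper: compare two strings while ignoring letter case."""
    # .lower() returns a new string, so the originals are left unchanged.
    return first.lower() == second.lower()


answer = "YES"
print(equals_ignore_case(answer, "yes"))       # True
print(equals_ignore_case("Python", "PYTHON"))  # True
print(equals_ignore_case("Python", "Java"))    # False
print(answer)                                  # YES
```

Because strings are immutable, converting the case for the comparison does not modify the values stored in the variables.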
KEY THINGS TO REMEMBER
- Unicode assigns each character a unique code point, which standardizes string comparison. Strings are compared based on the Unicode code points of their characters, starting from the first character.
- Unicode treats uppercase and lowercase letters as different, which affects their order in comparisons.
- == returns True if two strings are exactly the same.
- != returns True if two strings differ in at least one character.
- Strings can be compared using <, >, <=, and >= based on their characters' Unicode values. 'apple' > 'Apple' because lowercase letters have higher code points than uppercase letters.
- We can use the str.lower() and str.upper() methods for case-insensitive comparison.
SELF-CONTROL EXERCISES