Character

Characters can be classified in a variety of ways. One could partition the ASCII characters into digits, symbols, uppercase letters, lowercase letters, and non-printing characters. Unicode classifies every character as well, across a far larger repertoire. We’ll simplify our analysis for the purpose of identifiers in programming languages, which draw on a much smaller set of characters than all of Unicode. Something to loop back to.

Moving forward, we will define a character as one of the printable ASCII characters (0x20-0x7E). We can partition these characters into the following classes:

  • The space character (0x20)
  • ASCII digits 0-9 (0x30-0x39)
  • Uppercase Latin alphabet (0x41-0x5A)
  • Lowercase Latin alphabet (0x61-0x7A)
  • Symbols: every other printable character (0x21-0x2F, 0x3A-0x40, 0x5B-0x60, 0x7B-0x7E)
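
The partition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CharClass` and `classify` are hypothetical and not part of any library, and the function rejects anything outside the printable ASCII range 0x20-0x7E.

```python
from enum import Enum


class CharClass(Enum):
    """The five classes of printable ASCII characters (hypothetical names)."""
    SPACE = "space"
    DIGIT = "digit"
    UPPER = "upper"
    LOWER = "lower"
    SYMBOL = "symbol"


def classify(ch: str) -> CharClass:
    """Classify a single printable ASCII character (0x20-0x7E)."""
    if len(ch) != 1 or not 0x20 <= ord(ch) <= 0x7E:
        raise ValueError(f"not a printable ASCII character: {ch!r}")
    code = ord(ch)
    if code == 0x20:
        return CharClass.SPACE
    if 0x30 <= code <= 0x39:
        return CharClass.DIGIT
    if 0x41 <= code <= 0x5A:
        return CharClass.UPPER
    if 0x61 <= code <= 0x7A:
        return CharClass.LOWER
    # Everything else in the printable range is a symbol.
    return CharClass.SYMBOL
```

Because the classes are checked by code point range, every printable character lands in exactly one class, which is what makes this a partition.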

In Unicode terminology, we are going to consider only characters from the Basic Latin block. 1

Letter Case

The Unicode Standard, version 16.0.0, describes case as follows. 2

Case is a normative property of characters in certain alphabets whereby characters are considered to be variants of a single letter.

In the context of the Basic Latin block, we only need to consider two variants of this case property.

These variants, which may differ markedly in shape and size, are called the uppercase letter (also known as capital or majuscule) and the lowercase letter (also known as small or minuscule).

In the context of what we will define later as a string case, we call this property letter case. Every character is uppercase, lowercase, or neither. Characters that are neither uppercase nor lowercase we call caseless.

letter case   examples
lowercase     a, b, c, …, z
uppercase     A, B, C, …, Z
caseless      0, …, 9, _, -, space, etc.
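
As a rough sketch of the letter case property, the function below maps a printable ASCII character to one of the three values in the table. The name `letter_case` and the string labels it returns are assumptions made for illustration. It uses explicit code point ranges rather than Python's `str.isupper` and `str.islower`, so the result stays within the Basic Latin definitions used here.

```python
def letter_case(ch: str) -> str:
    """Return 'uppercase', 'lowercase', or 'caseless' for one printable ASCII character."""
    if len(ch) != 1 or not 0x20 <= ord(ch) <= 0x7E:
        raise ValueError(f"not a printable ASCII character: {ch!r}")
    code = ord(ch)
    if 0x41 <= code <= 0x5A:  # A-Z
        return "uppercase"
    if 0x61 <= code <= 0x7A:  # a-z
        return "lowercase"
    # Digits, symbols, and the space character have no case.
    return "caseless"
```

For example, `letter_case("a")` gives `"lowercase"`, `letter_case("Z")` gives `"uppercase"`, and `letter_case("_")` gives `"caseless"`.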

todo: should letter case be two words?