10 20 100 How many ways are there to roll 100 on ten twenty-sided dice? Look at the first die. That die has a number between 1 and 20. If it has a 1, then I need to know how many ways to roll a 99 on 9 20-sided dice. If it had a 2, then I need to know how many ways to roll a 98 on 9 20-sided dice. If it had a 3, then I need to know how many way to roll a 97 on 9 dice? . . . If it had a 20, then need to know how many ways to roll an 80 on 9 20-sided dice? Add all these up: I know how many ways to roll 100 on 10 dice? Strings are sequences of characters. Characters are elements of a finite set used to represent the symbols we use in written communication. ASCII: 7-bit 0-127. ASCII is still not complete obsolete. Your laptop keyboard is based on ASCII characters. In England, ASCII doesn't have a hashtag. #. Instead, it's the British pound symbol. In my life experience, I remember before ASCII had ^. When I started programming, ^ was an up-arrow with a stem. You sometimes see the caret used in describing control characters. ^C : means someone hit CTRL-C. File systems use 8-bit characters are bytes. And so, there have several competing extensions to ASCII where the first 128 are ASCII characters and the last 128 are other stuff that varied from platform, from machine to machine, and sometimes with different pieces of software on the same machine. IBM used EBCDIC, a bizarre 8-bit character set. The PC Revolution, Apples, IBM PC's, Commodores, TRS-80's, they all had their own characters sets that may or may not have resembled ASCII in some way. Now, in an attempt to standardize the world, we have Unicode. Unicode has over a million different characters...and counting. Unicode is the character set of Java, at least internally. Some modern languages use Unicode characters as part of the language itself. The exact character set is unimportant for string algorithms. Just understand that there is a character set with a finite number of characters. For the purposes of this class, the character set I'll use is the 26-character set of capital letters. The first algorithm we'll look at is String Sorting. The fastest possible sort algorithm runs in n lg n time. Radix sort runs even faster. Let's say I have a set of strings, all of which are three characters long. CAT DOG BAT ARE EMU EAR RUN FUN SUN SON TON Here's the idea: You stable sort by the last character in the string, then by the second-to-the-last character, and so forth, and finally by the first character in the string. Stable sort: preserve the relative order of the strings in the event of a tie. Radix Sort: Start with the last character and tally how many of each you see. E(1) G(1) N(5) R(1) T(2) U(1) 1 1 5 1 2 1 Now make a running sum of this tally. 1 2 2 8 10 11 This means that there is one that comes at or before E, two that come at or before G, 7 that come at or before N and so forth. 0. ARE 1. DOG 2. RUN 3. FUN 4. SUN 5. SON 6. TON 7. EAR 8. CAT 9. BAT 10. EMU 0 1 2 7 8 10 E G N R T U 3 1 3 1 3 A M O R U 0 3 4 7 8 0. EAR 1. CAT 2. BAT 3. EMU 4. DOG 5. SON 6. TON 7. ARE 8. RUN 9. FUN 10. SUN 1 1 1 1 2 1 1 2 1 A B C D E F R S T 0 1 2 3 4 6 7 8 10 0. ARE 1. BAT 2. CAT 3. DOG 4. EAR 5. EMU 6. FUN 7. RUN 8. SON 9. SUN 10. TON Radix Sort is potentially faster than the QuickSort. No comparison-based sort can run faster than O(n lg n). BUT THE RADIX SORT IS NOT A COMPARISON-BASED SORT. Radix Sort runs in O(nk) time, n is the number of strings. k is the max length of the strings. (Perhaps you might look at it as O(n (k+c)) where c is the number of valid characters in the alphabet.