Lesson 9 - Strings in Python - Split
In the previous exercise, Solved tasks for Python lesson 8, we practiced what we had learned in the earlier lessons. We have also made clear that Python strings are essentially sequences of characters. In today's lesson, we're going to explain the split() string method, which I intentionally kept from you until we knew that strings are similar to lists.
split()
From the previous tutorial, we know that parsing strings character by character can be rather complicated, even though our example was fairly simple. Of course, we'll encounter strings all the time. They're present in user input (from the console or from input fields in form applications) and in TXT and XML files. Very often, we're given one long string, a line in a file or in the console, which contains multiple values separated by separators, e.g. commas. In that case, we're referring to the CSV format (Comma-Separated Values). To make sure that we're all on the same page, let's look at some sample strings:
Jessie,Brown,Wall Street 10,New York,130 00
.. ... .-.. .- -. -.. ... --- ..-. -
(1,2,3;4,5,6;7,8,9)
- The first string represents a user. We could, for example, store users into a CSV file (one per line).
- The second string is Morse code characters and uses a space character as a separator.
- The third string is a matrix of 3 columns and 3 rows. The column separator is a comma, whereas the row separator is a semicolon.
We can call the split() method on a string. It takes a separator as a parameter, splits the original string at each occurrence of the separator, and returns the resulting substrings as a list. This greatly simplifies extracting values from strings for our current purposes.
We're also already familiar with the join() method, which does the opposite: it's called directly on the separator string and joins a sequence of substrings into a single string, placing the separator between them. The parameter is the sequence; the output of the method is the resulting string.
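To make the two methods more concrete, here is a minimal sketch that applies them to the sample strings above. The variable names (user, morse, userFields, morseParts, rejoined) are only illustrative assumptions, not part of the lesson's program.

```python
# sample strings taken from the text above
user = "Jessie,Brown,Wall Street 10,New York,130 00"
morse = ".. ... .-.. .- -. -.. ... --- ..-. -"

# split() cuts the string at every occurrence of the separator
userFields = user.split(",")
print(userFields)  # ['Jessie', 'Brown', 'Wall Street 10', 'New York', '130 00']

morseParts = morse.split(" ")
print(len(morseParts))  # 10

# join() is called on the separator and glues the substrings back together
rejoined = ",".join(userFields)
print(rejoined == user)  # True
```

Splitting and then joining with the same separator gives back the original string, which is a handy way to check that we picked the right separator.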
Right then, let's see what we've got up until now. We still don't know how to declare objects, users, or even work with multidimensional arrays, i.e. matrices. Nevertheless, we want to make something cool, so we'll settle for making a Morse code message decoder.
Morse code decoder
We'll start out by preparing the structure of the program, as always. We need two strings for the messages: one for the message in Morse code, and another one, empty for now, where we'll store the results of our efforts. Next, we need letter definitions (as we had with vowels), together with the definitions of their Morse code counterparts. Letters can be stored in a single string since each of them consists of only one character. Morse code characters consist of multiple characters, so we have to specify them using a list.
The structure of our program should now look something like this:
# the string which we want to decode
s = ".. -.-. - ... --- -.-. .. .- .-.."
print("The original message: %s" %(s))
# a string with a decoded message
message = ""

# array definitions
alphabetChars = "abcdefghijklmnopqrstuvwxyz"
morseChars = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
    "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-",
    "...-", ".--", "-..-", "-.--", "--.."]
We could also add other Morse characters, such as numbers and punctuation marks, but we won't worry about them (for now). We'll split the string s with the split() method into a sequence of substrings containing the Morse characters. We'll set the space character as the separator. Then, we'll iterate over the sequence using a for loop:
# splitting a string into Morse characters
characters = s.split(" ")

# iteration over Morse characters
for morseChar in characters:
Ideally, we should somehow deal with cases when the user enters things like multiple spaces between characters (users often do things of the sort). In this case, split(" ") produces an empty substring in the sequence for each extra space. We would then detect it in the loop and ignore it, but we won't deal with that in this lesson's program.
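For the curious, the behavior described above can be observed with a short sketch. The input string and the names withEmpty and cleaned are made up for illustration; the lesson's decoder does not include this step.

```python
# hypothetical input with two spaces between the first two Morse characters
s = "..  -.-. -"

# an explicit " " separator yields an empty substring for the doubled space
withEmpty = s.split(" ")
print(withEmpty)  # ['..', '', '-.-.', '-']

# variant 1: detect the empty substrings in a loop and ignore them
cleaned = []
for part in withEmpty:
    if part != "":
        cleaned.append(part)
print(cleaned)  # ['..', '-.-.', '-']

# variant 2: split() without a parameter splits on runs of whitespace
print(s.split())  # ['..', '-.-.', '-']
```

Either variant would be enough for the decoder; the second one is shorter, the first one matches the approach mentioned in the text.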
In the loop, we'll attempt to find the current Morse character in the morseChars list. We'll be interested in its index because when we look at that same index in the alphabetChars string, there will be the corresponding letter. This works because both the list and the string contain the same characters in the same alphabetical order. Let's place the following code into the loop body:
    alphabetChar = "?"
    if morseChar in morseChars: # character was found
        index = morseChars.index(morseChar)
        alphabetChar = alphabetChars[index]
    message += alphabetChar
First, the alphabetical character is set to "?" since it may very well be that we don't have the Morse character defined in our list. Then we check whether the Morse character is in the list. If it is, we determine its index and assign the letter at that index in alphabetChars to alphabetChar. Finally, we add the character to the message. The += operator works the same as message = message + alphabetChar.
Now, we'll print the message:
{PYTHON}
# the string which we want to decode
s = ".. -.-. - ... --- -.-. .. .- .-.."
print("The original message: %s" %(s))
# a string with a decoded message
message = ""
# array definitions
alphabetChars = "abcdefghijklmnopqrstuvwxyz"
morseChars = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
"..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-",
"...-", ".--", "-..-", "-.--", "--.."]
# splitting a string into Morse characters
characters = s.split(" ")
# iteration over Morse characters
for morseChar in characters:
    alphabetChar = "?"
    if morseChar in morseChars: # character was found
        index = morseChars.index(morseChar)
        alphabetChar = alphabetChars[index]
    message += alphabetChar
print("The decoded message: %s" % (message))
Console application
The original message: .. -.-. - ... --- -.-. .. .- .-..
The decoded message: ictsocial
Done! If you want to practice some more, you may create a program which encodes a string into Morse code. The code would be very similar. We'll use the split() and join() methods several more times throughout our courses.
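For comparison after you've tried it yourself, one possible encoder could look roughly like the sketch below. It reuses the alphabetChars and morseChars definitions from the decoder; the sample text and the names text and encoded are assumptions made for this example, and characters outside the alphabet are simply replaced with "?".

```python
# the text which we want to encode (hypothetical sample input)
text = "ictsocial"

# the same definitions as in the decoder
alphabetChars = "abcdefghijklmnopqrstuvwxyz"
morseChars = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
    "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-",
    "...-", ".--", "-..-", "-.--", "--.."]

# a list of Morse characters, one per letter of the text
encoded = []
for char in text:
    morseChar = "?"
    if char in alphabetChars: # letter was found
        index = alphabetChars.index(char)
        morseChar = morseChars[index]
    encoded.append(morseChar)

# join the Morse characters using a space as the separator
message = " ".join(encoded)
print("The encoded message: %s" % (message))
```

Here join() does the reverse of what split() did in the decoder: it puts a single space between the individual Morse characters.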
Special characters and escaping
Strings can contain special characters, which are prefixed with a backslash "\". The most common ones are \n, which causes a line break anywhere in the text, and \t, which is the tab character. Let's test them out:
{PYTHON}
print("First line\nSecond line")
The "\" character indicates a special character sequence in a string. It can also be used, e.g., to write Unicode characters as "\uxxxx", where xxxx is the four-digit hexadecimal code of the character.
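A short sketch of the tab character and the Unicode notation follows. The strings and the variable name letter are chosen just for illustration; 0041 is the hexadecimal code of the capital letter A.

```python
# \t inserts a tab character between the two words
print("Name\tCity")

# \u followed by four hexadecimal digits inserts the character with that code
letter = "\u0041"
print(letter)  # A
```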
A problem arises when we want to write "\" itself. In that case, we have to escape it by writing one more "\":
{PYTHON}
print("This is a backslash: \\")
We can escape a quotation mark in the same way, so that Python doesn't misinterpret it as the end of the string:
{PYTHON}
print("This is a quotation mark: \"")
We can also take advantage of the fact that Python supports both single and double quotes. If we need to write double quotes in a string, we don't have to escape them; we can put the string into single quotes instead:
{PYTHON}
print('Yes, that was my "lucky" day, I lost my car keys!')
When we want a string to contain line breaks, it can be useful to declare it using triple quotes:
{PYTHON}
s = """The first line
the second line"""
print(s)
The output:
Console application
The first line
the second line
Input from the console and from input fields in form applications is, of course, taken literally: if the user types \n or \t, the program receives a backslash followed by a letter, not a line break or a tab. Escape sequences are interpreted only in strings that programmers write in the code, so that's where we have to keep escaping in mind.
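The difference can be illustrated with a small sketch. Since we can't type into the console here, the user's input is simulated by a string with an escaped backslash; the names typedByUser and writtenInCode are assumptions made for this example.

```python
# what the program receives when the user types the two keys \ and n
# (simulated here with an escaped backslash)
typedByUser = "\\n"

# an escape sequence written by the programmer in the source code
writtenInCode = "\n"

print(len(typedByUser))    # 2: a backslash and the letter n
print(len(writtenInCode))  # 1: a single line-break character
print(typedByUser == writtenInCode)  # False
```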
In the next lesson, we'll learn all about multidimensional arrays.