Lesson 1.3: Strings

During this lesson, you will learn the following:

  • String Basics (indexing, slicing, membership, iterating)

  • String Formatting (for print statements)

Working with String (text) data

String Operators

  • For strings, + is concatenation.

  • For strings, * is repetition.

'hello' + 'world'
'helloworld'
'ma-na ' * 4
'ma-na ma-na ma-na ma-na '
'hello' + 3 # fails due to incorrect data type
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-21ca289a519f> in <module>
----> 1 'hello' + 3 # fails due to incorrect data type

TypeError: can only concatenate str (not "int") to str
'hello' + str(3) # succeeds due to matching data types
'hello3'

Strings as Ordered Sequences

A string is a sequence of characters:

  • Each character in the string has a position, called the index.

  • We can access each character in the string by its index position, using square brackets [].

  • In Python, index positions start their counting with 0 (zero), not 1.

Strings are immutable (they cannot be changed)
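To make immutability concrete, here is a minimal sketch. The variable names (word, plural, error_message) are illustrative and not part of the lesson. Assigning to a position in a string raises a TypeError, so the usual approach is to build a new string instead.

```python
# Illustrative example string; any str behaves the same way.
word = 'banana'

try:
    word[0] = 'B'              # strings do not support item assignment
except TypeError as err:
    error_message = str(err)
    print(error_message)

# To "change" a string, build a new one; the original is left untouched.
plural = word + 's'
print(plural)                  # bananas
print(word)                    # banana
```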

fruit = 'banana' # create a string
print(fruit)
banana
type(fruit)
str

Forward indexing starts at position 0

Python supports reverse indexing (starting at position -1)

fruit[0] # use the string index - it starts at 0
'b'
fruit[5] # use the string index
'a'
fruit[6] # what happens here?
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-14-ecdcf1c80663> in <module>
----> 1 fruit[6] # what happens here?

IndexError: string index out of range

You can "count down" from the end of a string using the index as follows:

fruit[-1] # get the last character in the string
'a'
fruit[-2] # get the second to last character in the string
'n'
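One way to read negative indexes: for a string of length n, index -k refers to the same character as index n - k. The short sketch below checks this for the string used above. The names n, last, also_last, and first are illustrative.

```python
fruit = 'banana'
n = len(fruit)             # number of characters: 6

last = fruit[-1]           # counting from the end
also_last = fruit[n - 1]   # the same character, counting from the start
first = fruit[-n]          # -n is the most negative valid index

print(last, also_last, first)   # a a b
```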

Slicing Strings

Python supports a special operation called a "slice" (using the operator ":") to select a subset of a sequence, in this case a subset of the characters in a string. The general format of a slice is as follows:

some_sequence[n:m]

This yields the sub-sequence:

  • starting with the character at index n

  • up to, but not including, the character at index m

fruit[2:5] # starting at index 2, going up to (but not including) index 5 - note it grabs index 2 (the third letter)
'nan'
fruit[2:] # starting at index 2, going to the end
'nana'
fruit[:5] # starting at the beginning, going up to index 5
'banan'
fruit[:] # what will this do?
'banana'
# you can use negative index values (from the end) in your slices
fruit[:-1] # everything up to, but not including, the last character
'banan'
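Two properties of slices are worth seeing in code. The sketch below uses illustrative variable names that do not appear in the lesson. Because the end index is excluded, splitting a string at any position and rejoining the halves gives back the original. Also, unlike indexing, a slice that runs past the end of the string does not raise an IndexError.

```python
fruit = 'banana'

# Split at index 2 and rejoin: no character is lost or duplicated.
head = fruit[:2]           # 'ba'
tail = fruit[2:]           # 'nana'
rejoined = head + tail     # 'banana'

# Out-of-range slice bounds are clipped rather than raising an error.
clipped = fruit[4:100]     # 'na'
empty = fruit[10:]         # ''

print(rejoined, clipped, repr(empty))
```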

Iterating Over Strings with a While Loop

  • To iterate over a string with a while loop, you leverage the string index

  • The len() function is useful for building while loops over strings:

    • It returns the length of a sequence (not just strings!)

    • It can be used to control loops over the string

    • Some care is needed when using the length because indexing is zero-based (remember OBOE, the off-by-one error!)

len(fruit)
6
i = 0
while i < len(fruit):
    print(fruit[i])
    i += 1
b
a
n
a
n
a
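The off-by-one error mentioned above can be sketched as follows. The names visited and failed_at are illustrative. Valid indexes run from 0 to len(fruit) - 1, so a loop condition of <= instead of < lets the index reach len(fruit) and fail.

```python
fruit = 'banana'

# Correct loop: stops before i reaches len(fruit).
visited = ''
i = 0
while i < len(fruit):
    visited += fruit[i]
    i += 1

# Off by one: <= allows i == len(fruit), which is not a valid index.
failed_at = None
i = 0
try:
    while i <= len(fruit):
        char = fruit[i]
        i += 1
except IndexError:
    failed_at = i

print(visited, failed_at)   # banana 6
```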

Iterating Over Strings with a For Loop

  • A for loop iterates over each element in the sequence from start to end (the use of the index is implicit).

  • You use a variable (in this case "char") to represent each element in the sequence during loop execution:

# here is a for loop over the string "fruit" - note the use of the descriptive variable "char" to represent each character (not its index)
for char in fruit:
    print(char)
b
a
n
a
n
a
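As a small illustration of doing work inside a for loop over a string, the sketch below counts how many times the letter 'a' appears. The counter name a_count is illustrative. No index variable is needed because the loop hands over each character directly.

```python
fruit = 'banana'

a_count = 0
for char in fruit:
    if char == 'a':        # char is a one-character string, not a position
        a_count += 1

print(a_count)             # 3
```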

The in Operator for Strings

  • The word in is a boolean operator that takes two strings and returns True if the first appears as a substring in the second

  • This provides a very convenient way to check for membership in a sequence.

'n' in 'banana'
True
if 'ana' in fruit:
    print("ana")
ana
'seed' in 'banana'
False
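A few edge cases of the in operator, sketched with illustrative variable names: the match must be a contiguous substring, the comparison is case-sensitive, and not in gives the negated test.

```python
fruit = 'banana'

has_nan = 'nan' in fruit           # True: 'nan' appears contiguously
has_bnn = 'bnn' in fruit           # False: letters are present but not adjacent
has_upper_b = 'B' in fruit         # False: 'B' and 'b' are different characters
no_seed = 'seed' not in fruit      # True: negated membership test

print(has_nan, has_bnn, has_upper_b, no_seed)   # True False False True
```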

String Comparisons

For strings, we already know the + operator concatenates, and == tests for equality.

It turns out that the relational operators < and > have been implemented to reflect lexicographic ordering (i.e., they indicate whether one string comes earlier or later in dictionary order, with characters compared by their Unicode code points).

'apple' < 'banana'
True
'apple' > 'banana'
False
'banana' == 'apple'
False
'apple' < 'banana' < 'orange'
True
'app' < 'apple'
True
'apples' < 'apple'
False
'apple' < 'Apple'
False
'alpha' < 'Bravo'
False
"Apple".upper() == "APPLE"
True
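The results for 'apple' < 'Apple' and 'alpha' < 'Bravo' may look surprising. A minimal sketch of the reason, using the built-in ord() function (not otherwise used in this lesson): uppercase ASCII letters have smaller code points than lowercase ones, so they sort first. Converting both sides to the same case is one common way to get a case-insensitive comparison.

```python
# Code points of a few letters: uppercase comes before lowercase.
code_A = ord('A')      # 65
code_B = ord('B')      # 66
code_a = ord('a')      # 97

case_sensitive = 'alpha' < 'Bravo'                    # False: 'a' (97) > 'B' (66)
case_insensitive = 'alpha'.lower() < 'Bravo'.lower()  # True: 'alpha' < 'bravo'

print(code_A, code_B, code_a, case_sensitive, case_insensitive)
```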

String Formatting in Python

This notebook covers the "old style" of string formatting (using printf-like syntax). This style of formatting is documented here. Essentially we use the % operator:

  • when applied to integers, % is the modulus operator
  • when the first operand is a string, % is the format operator
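The two meanings of % can be seen side by side in this short sketch (variable names are illustrative). With a single value the right-hand operand can be given directly. With several values it must be a tuple.

```python
remainder = 17 % 5                                   # modulus on integers: 2

single = "%d items" % 3                              # one value: '3 items'
several = "%d remainder %d" % (17 // 5, 17 % 5)      # tuple of values: '3 remainder 2'

print(remainder)
print(single)
print(several)
```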

Details for the "new style" are covered in the Python 3 Documentation here. Although more powerful, this style is also more object-oriented (not covered here).

a = "apple"
b = 'banana'
c = " hello my name is gary'"
d = 'double in "single '
print(d)
double in "single
nBananas = 27
"We have %d bananas and %d oranges." % (nBananas, 6)  # one value per %d placeholder
"we have {} bananas and {} oranges".format(nBananas, 6)
'we have 27 bananas and 6 oranges'
noSuch = "kiwis"
'We are out of %s today.' % noSuch
"%f" % 0.0
"{:00.4} \n \t {}".format(0.123456297823789, "gary")
'0.1235 \n \t gary'
a = 1
b = 2
print(b)
print(a)
2
1
"%f" % 1.5
"%.10f" % (1/7)
caseCount = 42
caseContents = "peaches"
print("We have %d cases of %s today." % (caseCount, caseContents))
print("we have {} cases of {} today".format(caseCount, caseContents))
We have 42 cases of peaches today.
we have 42 cases of peaches today
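Width and precision work similarly in both styles. The sketch below uses illustrative values and names that are not from the lesson. It shows the default six decimal places of %f, an explicit precision, a fixed field width, and left alignment.

```python
default_precision = "%f" % 1.5            # '1.500000' (six decimals by default)
ten_decimals = "%.10f" % (1 / 7)          # '0.1428571429'

price = 3.14159                           # illustrative value
old_style = "%8.2f" % price               # width 8, 2 decimals: '    3.14'
new_style = "{:8.2f}".format(price)       # same result with str.format

left_old = "%-10s|" % "kiwi"              # left-aligned in 10 columns
left_new = "{:<10}|".format("kiwi")       # same result with str.format

print(default_precision)
print(ten_decimals)
print(old_style)
print(left_old)
```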

For additional details, see the online documentation.

Next lesson: 1.4 tuples, lists, dictionaries, and sets!