Introduction to Python Programming#

This tutorial provides a basic overview of Python (>= 3.6) specifically for data analytics. In particular, we do not cover object-oriented programming (OOP) aspects of Python, how to manage errors and exceptions, etc.

Python is a mature general-purpose programming language and has lots of bells and whistles accordingly. However, for introductory data analytics, you only need to be proficient in a certain subset of all the features that Python has to offer and our goal here is to focus on these features only.

This tutorial does not assume any prior programming experience of any kind, though some background in functional programming would certainly be beneficial. The reader is referred to this book for a solid introduction to the Python programming language.

If you would like a cheat sheet for Python basics, this one here is quite useful. In addition, the link here has an excellent range of interactive learning materials on Python programming.

Let’s first suppress warnings as they can get annoying sometimes.

import warnings
warnings.filterwarnings('ignore')

In Python, comments are indicated with a “#” (pound) sign. Anything that comes after the # sign is ignored by the Python interpreter.

If you want to execute a line in a Jupyter notebook but suppress its output, you can end the line with a semicolon.

# we would like to run the line below, 
# but we would like to hide its output:
2+2;

Variables#

In Python, you can assign a name to a value and this name is called a “variable”. To create a variable, you use the = sign.

x = 'Hello'  # here, x is of "string" base type, which we discuss further below

Once you define the variable x, you can use that variable instead of its actual value. Let’s verify this using the print function (sometimes we will refer to a function as a “command” or “method” - practically they all mean the same thing).

print(x)
Hello

You can assign the same value to more than one variable.

y = z = 23
print(y)
print(z)
23
23

Python allows you to assign multiple values to multiple variables simultaneously. This is called multiple assignment.

a, b, c = 3.14, 'Hello', 1887
print(a, b, c)
3.14 Hello 1887

In Python, you can easily swap values between multiple variables as below.

a, b, c = c, b, a
print(a, b, c)
1887 Hello 3.14

If you no longer need a variable, you can delete it using the del statement. This removes the name, and Python frees the memory once nothing else refers to the value.

del x

Base Data Types in Python#

Python has the following base data types (there is also a bytes base type, but we do not cover it).

| Type | Description |
|---|---|
| int | Integer |
| float | Floating-point number |
| str | String |
| bool | Boolean (True or False) |

To see the type of an object, you can use the type() function.

type(10)
int
type(10.5)
float
type('Python')
str
type(True)
bool

Integers#

In Python, you can construct integer values with no limits on the number of digits. Below, ** is the power operator.

long_integer = 12**34
print(long_integer)
4922235242952026704037113243122008064

The function int() lets you construct an integer number from a (compatible) string.

x = int('10')
print(x)
10
type(x)
int

While converting from string to integer, you will get an error if the string you are trying to convert does not represent an integer.

# this will not work: x = int('Python')

Floats#

Real numbers are represented by floating-point numbers in Python.

f = 18.87
print(f)
18.87

If you use int() with a floating-point number, you will only get the integer part of that number.

f = int(18.87)
print(f)
18
type(f)
int
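
As a small illustration of what "the integer part" means, the sketch below compares int() with math.floor() and round() on a negative number. int() truncates toward zero, which is not the same as rounding down. The variable names are chosen only for this illustration.

```python
import math

value = -18.87

truncated = int(value)        # drops the fractional part (moves toward zero)
floored = math.floor(value)   # rounds down (toward negative infinity)
rounded = round(value)        # rounds to the nearest integer

print(truncated, floored, rounded)
```

For positive numbers, int() and math.floor() agree; they differ only for negative numbers with a fractional part.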

We can use the float() function to define a floating-point number from a (compatible) string.

f = float('18.87')
print(f)
18.87
type(f)
float

Strings#

A string is a sequence of characters. In Python, strings are enclosed in either single or double quotes - it doesn’t matter which one you use. However, we recommend using single quotes over double quotes since you can create them with one less key press!

s1 = "I love chocolate."
s2 = 'I love chocolate.'
print(s1)
print(s2)
I love chocolate.
I love chocolate.

The exception is a string that itself contains a quote character: if it contains a single quote, enclose the entire string in double quotes, and vice versa.

s3 = "Let's learn some Python."
s4 = 'The most popular languages for data analysis are apparently "R" and "Python".'
print(s3)
print(s4)
Let's learn some Python.
The most popular languages for data analysis are apparently "R" and "Python".
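
Another option, shown here as a minimal sketch, is to keep the same kind of quote and escape the inner quote with a backslash. The strings below are made up for illustration.

```python
# A backslash before a quote tells Python to treat it as an ordinary character.
s_single = 'Let\'s learn some Python.'
s_double = "She said \"hello\"."

print(s_single)
print(s_double)
```

Switching the outer quotes is usually easier to read, so escaping is mostly useful when a string contains both kinds of quotes.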

The str() function constructs a string from other compatible data types.

s5 = str('Python')
print(s5)
Python
type(s5)
str
s6 = str(18.87)
print(s6)
18.87
type(s6)
str

Starting with Python 3.6, you can use f-strings to embed the values of variables inside strings. This is extremely handy.

name = 'pi'
value = 3.14
print(f'The value of {name} is {value}.')
The value of pi is 3.14.
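
f-strings also accept expressions and format specifications inside the braces. The sketch below assumes a longer value for pi than the one used above and formats it to a fixed number of decimals; the variable names are illustrative.

```python
name = 'pi'
value = 3.14159265359

# ':.2f' formats the number with two digits after the decimal point
two_decimals = f'The value of {name} is about {value:.2f}.'

# any expression is allowed inside the braces
doubled = f'Twice {name} is about {2 * value:.1f}.'

print(two_decimals)
print(doubled)
```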

The function find() returns the starting index of a given sequence of characters in the string. If not found, it returns -1.

print(s1.find('I'))
print(s1.find('we'))
0
-1

startswith() checks if a string starts with a particular sequence of characters.

print(s1.startswith('I love'))
True

endswith() checks if a string ends with a particular sequence of characters.

print(s1.endswith('.'))
True

count() counts the number of non-overlapping occurrences of a sequence of characters in the given string.

print(s1.count('e'))
print(s1.count('ee'))
2
0

lower() converts any upper case to lower and upper() does vice versa.

print(s1)
print(s1.lower())
print(s1.upper())
I love chocolate.
i love chocolate.
I LOVE CHOCOLATE.

The function replace() replaces one substring with another. Note that Python strings are immutable, that is, you cannot change a string once it is defined. However, replace() returns a new string, which you can assign to a new variable (or back to the original variable), as shown below.

s1.replace('I','We all')

# Python strings are immutable!
# You cannot change them in place
print(s1)

# But the following will change the string s1
s1= s1.replace('I','We all')
print(s1)
I love chocolate.
We all love chocolate.

For other methods that are available for a string (or any other data structure), you can use the tab completion feature of Jupyter Notebook. Just define a string, put a dot, and then hit the tab button.

st = 'aBc'
st.swapcase()
'AbC'

A string is actually a container type (a sequence of characters), so you can combine two strings using the “+” sign.

first_string = 'I love '
second_string = 'chocolate.'
print(first_string + second_string)
I love chocolate.

Booleans#

A Boolean represents a logical value: True or False. The function bool() returns either True or False based on its input parameter. In particular, bool() will always return True unless its input parameter is one of the below:

  1. Empty (such as [] (empty list), () (empty tuple), {} (empty dictionary), '' (empty string))

  2. False

  3. None (the Python keyword that represents the absence of a value)

  4. 0 (zero of any numeric type, such as 0 or 0.0)

NOTE: Python is a case sensitive programming language. Therefore, valid boolean values are True and False, not ~~true~~ or ~~false~~.

bool(100)
True
bool(True)
True
bool([])
False
bool(False)
False
bool(None)
False
bool(0)
False

Comparison and Logical Operators#

In Python, comparison and logical operators allow you to evaluate a condition to a single boolean value, True or False.

Comparison Operators#

| Operator | Meaning |
|---|---|
| == | True, if equal |
| != | True, if not equal to |
| < | less than |
| > | greater than |
| <= | less than or equal to |
| >= | greater than or equal to |

x = 1000
y = 2000
print('x == y:', x == y)
print('x != y:', x != y)
print('x < y:', x < y)
print('x > y:', x > y)
print('x <= y:', x <= y)
print('x >= y:', x >= y)
x == y: False
x != y: True
x < y: True
x > y: False
x <= y: True
x >= y: False

You can also compare strings with comparison operators.

string_1 = 'Python'
string_2 = 'PYTHON'
print(string_1 + ' == ' + string_2 + ':', string_1 == string_2)
Python == PYTHON: False

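The is and is not operators look similar to == and !=, but they test identity (whether two names refer to the same object), not equality. For base types they can appear to behave like == and !=, but this depends on interpreter details such as the caching of small integers. Use == and != to compare values, and reserve is for checks such as x is None.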

print('x is y:', x is y)
print('x is not y:', x is not y)
x is y: False
x is not y: True
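
As a hedged sketch of where is is typically used: it checks whether two names refer to the same object, which makes it the usual way to test for None, while == compares values. The variable names below are hypothetical.

```python
result = None          # hypothetical variable that has no value yet
count = 1000

print('result is None:', result is None)   # identity check against None
print('count == 1000:', count == 1000)     # value comparison

# Two equal values built separately: == is True, but whether
# "is" would also be True depends on the interpreter, so do not rely on it.
a = int('1000')
b = int('1000')
values_equal = (a == b)
print('a == b:', values_equal)
```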

Logical Operators#

You can use logical operators to combine boolean values.

| Operator | Meaning |
|---|---|
| and | True, if both statements are true |
| or | True, if at least one of the statements is true |
| not | Reverses the result: False if the statement is true, True if it is false |

(18 == 18) and (18 != -1)
True
(10 < 15) or (19 > 20)
True
not(1900 >= 2000)
True

Basic Mathematical Operations#

| Operator | Task Performed |
|---|---|
| + | Addition |
| - | Subtraction |
| / | Division |
| // | Floor division |
| * | Multiplication |
| % | Modulo |
| ** | Exponent |

2 + 3
5
20 - 13
7
4/2
2.0

The / operator always results in a float, even if the result is actually an integer.

print(type(4/2))
<class 'float'>
4/3
1.3333333333333333

The operator // performs floor division: the quotient is rounded down to the nearest whole number. The result is of integer type when both operands are integers.

4//3
1
print(type(4//3))
<class 'int'>
23*10
230
23%10
3
2**10
1024
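
To make the behaviour of the // operator shown above more concrete, here is a small sketch with a negative operand and a floating-point operand. Because floor division rounds down, the result for negative numbers is not the same as simply dropping the fractional part.

```python
print(7 // 2)      # 3
print(-7 // 2)     # -4, because the quotient -3.5 is rounded down
print(7.0 // 2)    # 3.0, a float because one operand is a float

# divmod() returns the floor-division quotient and the remainder together
quotient, remainder = divmod(23, 10)
print(quotient, remainder)   # 2 3
```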

You can also use the pow() function to compute “x to the power y”.

pow(2, 10)
1024

round() simply rounds a number based on the specified number of decimals.

round(3.14159265359, 0)
3.0
round(3.14159265359, 1)
3.1
round(3.14159265359, 2)
3.14

Expect to see some surprising behavior with round(): a value exactly halfway between two candidates is rounded to the nearest even digit, which is why both 3.5 and 4.5 below round to 4.0. Rounding is trickier than you might expect, and rounding errors have even caused fatalities. For a detailed explanation of rounding in Python, please see this.

round(3.5, 0)
4.0
round(4.5, 0)
4.0
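
If you need the "school" rule where halves are always rounded away from zero, one option is the decimal module from the standard library. The sketch below is only an illustration; round_half_up is a hypothetical helper name, and the number is passed as a string so that it is represented exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(text):
    """Round a decimal number given as a string to the nearest integer,
    with ties going away from zero."""
    return Decimal(text).quantize(Decimal('1'), rounding=ROUND_HALF_UP)

print(round(4.5), round_half_up('4.5'))
print(round(3.5), round_half_up('3.5'))
```

The built-in round() gives 4 in both cases, while the helper gives 5 for 4.5 and 4 for 3.5.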

abs() returns the absolute value of its input parameter.

abs(-3.4)
3.4

Containers#

Python defines two types of containers:

  • Ordered sequences (lists, tuples, and strings)

  • Key containers (dictionaries and sets).

Lists#

A Python list is an ordered sequence of elements enclosed in square brackets and separated by commas. You can access any of these elements by referring to its index.

You can put any combination of data types into a list. Lists are declared by [] or list().

# create an empty list
list0 = []
list1 = list()
print(type(list0), type(list1))
<class 'list'> <class 'list'>
lst = [1, 1.23, True, 'hello', None]
print(lst)
[1, 1.23, True, 'hello', None]
lst = list((1, 1.23, True, 'hello', None)) # notice the double round brackets
print(lst)
[1, 1.23, True, 'hello', None]
cars = ['Toyota', 'Mercedes', 'Ford']
print(cars)
['Toyota', 'Mercedes', 'Ford']

NOTE: In Python, indexing starts from 0. Thus, for instance, the list cars has Toyota at index 0, Mercedes at index 1, and Ford at index 2.

cars[0]
'Toyota'

Indexing in reverse order is also possible. For instance, if you want to access Ford, the last element in cars, the index would be -1. And index -2 will be Mercedes, and finally index -3 will be Toyota.

cars[-1]
'Ford'

Indexing is limited to accessing a single element. Slicing, on the other hand, is accessing a sequence of elements inside the list.

# pay attention to the range() function
# and how we use list() to convert the output to a proper list
num = list(range(0,10))
print(num)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(num[0:4])
print(num[4:])
[0, 1, 2, 3]
[4, 5, 6, 7, 8, 9]
print(num[-3:])  # get the 3 elements from the end
[7, 8, 9]

It is also possible to slice a list with a step length, using the form [start:stop:step].

num[0:9:3]
[0, 3, 6]
num[0:9:5]
[0, 5]
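
The start and stop values can be omitted, and the step can be negative. The following sketch shows two common uses of this; the variable names are illustrative.

```python
num = list(range(10))

every_second = num[::2]     # start and stop omitted: whole list, step 2
reversed_copy = num[::-1]   # a negative step walks the list backwards

print(every_second)
print(reversed_copy)
```

Both slices create new lists, so num itself is left unchanged.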

As with lists, you can also access any character in a string through indexing and slicing.

name = 'Python'
print(name[0])
print(name[1:])
P
ython

However, you cannot modify a string as strings are immutable.

# This will not work: name[3] = 'y'

Equal vs. Identical Lists#

Pay attention to equal vs. identical for two lists (or variables of other container types in general). The equality operator “==” checks whether two variables have the same contents. The “is” operator checks whether two variables are identical, that is, whether they point to the same address in memory. As the example below shows, two variables might have the same content but point to different addresses in memory. (The same distinction exists for base types such as integers and strings, but whether two equal values are also identical depends on interpreter details, so always compare values with “==”.)

x = y = [1, 2, 3]  # both x and y point to the same address in the memory, so they are identical
z = [1, 2, 3]  # z has the same content as x and y, but it points to a different address

print('x == y:', x == y)
print('z == y:', z == y)
print('x is y:', x is y)

# this is False as x and z point to different addresses 
# in the memory even though they have the same contents
print('z is y:', z is y)  
x == y: True
z == y: True
x is y: True
z is y: False
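
The distinction matters as soon as a list is modified. In the sketch below, which reuses the names from the example above, changing the list through y also changes x because both names refer to the same list, while a copy made with copy() is unaffected.

```python
x = y = [1, 2, 3]   # two names, one list
z = x.copy()        # a new list with the same contents

y.append(4)         # modifies the single list that x and y share

print('x:', x)
print('y:', y)
print('z:', z)
```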

Operations on Lists#

append() adds an element to the end of the list.

lst = [1, 2, 3, 4, 5]
print(lst)

lst.append(6)
print(lst)
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]

extend() adds the elements of another list at the end.

lst.extend([7, 8])
print(lst)
[1, 2, 3, 4, 5, 6, 7, 8]

Alternatively, you can use + to combine multiple lists (or multiple strings).

lst = lst + [9, 10]
print(lst)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

If you want to insert an item at a position you specify, use insert(i, x), where i is the index and x is the item. Remember, append() can only add an element at the end.

lst.insert(5, 'Python')
print(lst)
[1, 2, 3, 4, 5, 'Python', 6, 7, 8, 9, 10]

You can use remove() to remove the first occurrence of an element by passing the element itself to the method.

lst.remove('Python')
print(lst)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The sort() method arranges the elements in ascending order in place. That is, the original list is updated with the new order. You can sort a list whose elements are all numbers or all strings, but not a mix of the two.

lst_num = [3, 5, 1.23]
lst_num.sort()
print(lst_num)
[1.23, 3, 5]
lst_str = ['hello', 'world']
print(lst_str)
lst_str.sort(reverse=True)
print(lst_str)
['hello', 'world']
['world', 'hello']
lst_mix = [3, 5, 1.23, 'hello']
# this will not work: lst_mix.sort()

To reverse a list in place, use reverse().

lst_mix.reverse()
print(lst_mix)
['hello', 1.23, 5, 3]

If you do not want to modify the original list, use sorted() and reversed() and assign their results to new variables.

lst = [3, 5, 1]
lst_new = sorted(lst)
print('original:', lst)
print('sorted:', lst_new)
lst_reversed = reversed(lst)  # this returns an iterator, not a list!
print('just reversed:', lst_reversed)
lst_reversed_list = list(lst_reversed)
print('reversed and re-listed:', lst_reversed_list)
original: [3, 5, 1]
sorted: [1, 3, 5]
just reversed: <list_reverseiterator object at 0x1033f7070>
reversed and re-listed: [1, 5, 3]

count() counts how many times a particular element appears in the list. If it does not appear at all, it returns 0.

lst.count(1)
1

index() finds the index of a particular element. Note that if there are multiple elements of the same value, it returns the index of the first one. If there is none, it raises an error (ValueError).

lst
[3, 5, 1]
lst.index(1)
2

For other methods that are available for a list (or any other object), you can use the tab completion feature of Jupyter Notebook. Just define a list, put a dot, and then hit the tab button.

lst.clear()

If you want your list to be immutable, that is, unchangeable, use the tuple container. You can define a tuple by () or tuple().

tpl = (1, 2, 3)
# You cannot change a tuple. For instance, try tpl[0] = 3.14

If you want a set in a mathematical sense, use the set container. You can define a set by set(). Python has a rich collection of methods for sets such as union, intersection, set difference, etc.

st = set([1, 1, 1, 2, 2, 2, 2])
print(st)
{1, 2}
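
The set methods mentioned above can be sketched as follows. The two sets are made-up examples, written with curly braces, which is another way to define a non-empty set.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a.union(b))          # elements in a or b
print(a.intersection(b))   # elements in both a and b
print(a.difference(b))     # elements in a but not in b
print(3 in a)              # membership test
```

Note that an empty pair of curly braces creates an empty dictionary, not an empty set, so an empty set has to be created with set().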

Dictionaries#

Dictionaries are like a lookup table. A dictionary consists of “key: value” pairs. To define a dictionary, you can use either {} or dict().

# create an empty dictionary
dict0 = {}
dict1 = dict()
print(type(dict0), type(dict1))
<class 'dict'> <class 'dict'>
dict0 = {}
dict0['One'] = 1
dict0['Two'] = 2 
dict0['Three'] = 3
print(dict0)
{'One': 1, 'Two': 2, 'Three': 3}

An alternative way to define a dictionary is below.

dict1 = {'One': 1, 'Two': 2, 'Three': 3}
print(dict1)
{'One': 1, 'Two': 2, 'Three': 3}

You can access the value ‘3’ via the key ‘Three’.

print(dict0['Three'])
3

Operations on Dictionaries#

values() returns the values in a dictionary (as a view object that you can loop over or convert to a list).

dict0.values()
dict_values([1, 2, 3])

keys() returns all the keys in a dictionary.

dict0.keys()
dict_keys(['One', 'Two', 'Three'])

items() returns all the (key, value) pairs in a dictionary.

dict0.items()
dict_items([('One', 1), ('Two', 2), ('Three', 3)])
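
The outputs above print as dict_values(...), dict_keys(...), and dict_items(...) rather than as plain lists. If you need an ordinary list, for example to index into it, you can wrap the result in list(). A minimal sketch, using a hypothetical dictionary named numbers:

```python
numbers = {'One': 1, 'Two': 2, 'Three': 3}   # hypothetical example dictionary

key_list = list(numbers.keys())
value_list = list(numbers.values())
pair_list = list(numbers.items())

print(key_list)
print(value_list)
print(pair_list[0])   # each item is a (key, value) tuple
```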

update() inserts the items of another dictionary into a dictionary.

dict1 = {'Four': 4}
dict0.update(dict1)
dict0
{'One': 1, 'Two': 2, 'Three': 3, 'Four': 4}
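
One detail worth knowing is what update() does when a key already exists: the old value is overwritten. The sketch below uses made-up data to show both cases.

```python
prices = {'apple': 1.0, 'banana': 0.5}        # hypothetical data
new_prices = {'banana': 0.6, 'cherry': 3.0}

prices.update(new_prices)   # 'banana' is overwritten, 'cherry' is added
print(prices)
```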

clear() clears the entire dictionary.

dict0.clear()
print(dict0)
{}

Common Operations on Containers#

num = list(range(10))
print(num)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

To find the length of a list, that is, the number of elements in it, use the len() function. For dictionaries, this function returns the number of key-value pairs.

len(num)
10
len(dict0)
0

If a list consists of all numeric or all string elements, then min() and max() give the minimum and maximum value in the list.

min(num)
0
max(num)
9
num2 = num + ['hello']
# this won't work because not all elements are numeric: min(num2)
# min() and max() also work with strings:
st = ['one','two', 'three']
max(st)
'two'

How to check if a particular element is in a predefined list or dictionary:

names = ['Earth','Air','Fire']
'Tree' in names
False
'Air' in names
True

For a dictionary, in checks the keys, not values.

dict0 = {'One': 1, 'Two': 2, 'Three': 3}
'One' in dict0 
True
'Four' in dict0 
False
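
If you want to check the values instead, one option is to apply in to the result of values(). A short sketch using the same dictionary; the result variable names are illustrative.

```python
dict0 = {'One': 1, 'Two': 2, 'Three': 3}

key_found = 'One' in dict0             # looks at the keys
value_found = 1 in dict0.values()      # looks at the values
value_as_key = 1 in dict0              # False: 1 is a value, not a key

print(key_found, value_found, value_as_key)
```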

Conditional Statements and Loops#

If#

A statement block is executed only if a condition is true.

if logical_condition:
    statement(s)

Make sure you put a colon : at the end and indent the next line (preferably by 4 spaces, not a tab).

a = 2000
b = 1999

if a > b:
    print('a is greater than b')
a is greater than b

If-else#

if logical_condition: 
    statement(s)
else:
    statement(s)
a = 2000
b = 1999

if a < b:
    print('b is greater than a')
else:
    print('a is greater than b')
a is greater than b

If-elif#

if logical_condition: 
    statement(s)
elif another_logical_condition:
    statement(s)
else:
    statement(s)
a = 2000
b = 2000

if b > a:
    print('b is greater than a')
elif a == b:
    print('a and b are equal')
else:
    print('a is greater than b')
a and b are equal

Nested if#

You can also write if statements inside an if statement.

a = 1999
b = 2000

if a > b:
    print('a > b')
elif a < b:
    print('a < b')
    if a == 1999:
        print('a = 1999')
    else:
        print('a is not equal to 1999')
else:
    print('a = b')
a < b
a = 1999

While#

while logical_condition:
    statement(s)
i = 1
while i < 10:
    print(i*2)
    i = i + 1
print('Mission accomplished!')
2
4
6
8
10
12
14
16
18
Mission accomplished!

For#

For each item of a sequence, statements are executed.

for variable in sequence:
    statement(s)

NOTE: Keep in mind that the colon : at the end and indentation of the next line are mandatory.

for i in range(10):
    print(i)
0
1
2
3
4
5
6
7
8
9
for i in [1, 5, 10, 15]:
    print(i*5)
5
25
50
75
dict0 = {'One': 1, 'Two': 2, 'Three': 3}
for key, value in dict0.items():
    print(f'key is {key}, value is {value}.')
key is One, value is 1.
key is Two, value is 2.
key is Three, value is 3.

Break#

break terminates the loop when a condition becomes true.

for i in range(10):
    print(i)
    if i >= 4:
        break
0
1
2
3
4

Continue#

Unlike break, when a condition becomes true, continue lets you skip the rest of the code inside the loop for the current iteration only and move on to the next iteration.

for i in range(10):
    if i == 2:
        print('Ignoring 2')
        continue
    else:
        print(i)
0
1
Ignoring 2
3
4
5
6
7
8
9

List Comprehension#

You can create a list with a for-loop as below. This is called list comprehension and it is a very commonly used Python feature.

[do_something_with_x for x in sequence]

For example, say we would like to create a list of numbers ranging from 0 to 9.

[z for z in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

How to convert the above numbers to strings and then combine them with a dash in between:

st_lst = [str(z) for z in range(10)]
print('-'.join(st_lst))
0-1-2-3-4-5-6-7-8-9

List comprehension is very flexible as you can add a condition at the end.

[do_something_with_x for x in sequence if some_condition_on_x]

For example, how to return a list of even numbers ranging from 0 to 10:

[z for z in range(11) if z % 2 == 0]
[0, 2, 4, 6, 8, 10]
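
A list comprehension with a condition is a compact way of writing a for-loop that appends to a list. The sketch below builds the same list of even numbers both ways so that the two forms can be compared; the variable names are illustrative.

```python
# for-loop version
evens_loop = []
for z in range(11):
    if z % 2 == 0:
        evens_loop.append(z)

# list-comprehension version
evens_comp = [z for z in range(11) if z % 2 == 0]

print(evens_loop == evens_comp)
```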

Functions#

You can define your own functions that perform a particular task. You can also pass input parameters to your functions. You define a function using the keyword def followed by the function name and any input parameters you might have. If you like, you can define default values for your parameters. Functions can return a value using the return statement. If there is no return statement, the function implicitly returns None. A typical syntax for a function is as follows.

def function_name(parameter):
    """ documentation """
    statement(s)
    return value
def test_function():   
    print('test')
test_function()
test
def iLove(food):
    print('I love ' + food)
iLove('chocolate')
I love chocolate

Here is another version of the above function that uses a default value for its input parameter.

def iLove(food='junk food'):
    print('I love ' + food)
iLove()
I love junk food

Here is another example: a function that performs a mathematical operation on its input value.

def five_times(x):
    return 5 * x
five_times(3)
15
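
Two points from the description above can be illustrated with a short sketch: a function without a return statement gives back None, and a parameter with a default value can be overridden by name. The functions greet and power are hypothetical examples.

```python
def greet(name):
    # prints a message but has no return statement
    print('Hello ' + name)

def power(base, exponent=2):
    # returns base raised to exponent; exponent defaults to 2
    return base ** exponent

result = greet('Python')   # prints the message; result is None
print(result)

print(power(3))              # uses the default exponent
print(power(3, exponent=3))  # overrides the default by name
```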

It is always a good practice to document your functions. You should write the documentation right after declaring the function. For example, suppose you would like to create a square function to return the squared value of its input parameter.

def square(x):
    """
    Returns the square of the input.
    """
    return x ** 2
square(4)
16

To access the documentation of a function, you use its __doc__ attribute.

square.__doc__
'\n    Returns the square of the input.\n    '
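
The raw __doc__ string keeps the line breaks and indentation of the source code. If you prefer a cleaned-up version, one option is inspect.getdoc() from the standard library, sketched below with the same square function. Outside Jupyter, the built-in help(square) shows similar information.

```python
import inspect

def square(x):
    """
    Returns the square of the input.
    """
    return x ** 2

# docstring with the common indentation and surrounding blank lines removed
clean_doc = inspect.getdoc(square)
print(clean_doc)
```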

Object Introspection#

For help with variables and functions in Jupyter (IPython), add ? at the beginning.

# the output will appear in a box at the bottom of your browser.
?print
Signature: print(*args, sep=' ', end='\n', file=None, flush=False)
Docstring:
Prints the values to a stream, or to sys.stdout by default.

sep
  string inserted between values, default a space.
end
  string appended after the last value, default a newline.
file
  a file-like object (stream); defaults to the current sys.stdout.
flush
  whether to forcibly flush the stream.
Type:      builtin_function_or_method
?square
Signature: square(x)
Docstring: Returns the square of the input.
File:      /var/folders/2p/_97_wp4j3vq9k61zw0xjljs40000gn/T/ipykernel_20504/3098292944.py
Type:      function

With functions, add ?? at the beginning to see the source code, if available.

??square
Signature: square(x)
Source:   
def square(x):
    """
    Returns the square of the input.
    """
    return x ** 2
File:      /var/folders/2p/_97_wp4j3vq9k61zw0xjljs40000gn/T/ipykernel_20504/3098292944.py
Type:      function

Modules#

A module is just a code library. It is a file containing a collection of functions (also known as methods or commands) and variables (also known as attributes) that you can include in your code. If someone has already written a module that you need, you can just use it instead of reinventing the wheel! The most commonly used modules in Python for data analytics are NumPy, Pandas, StatsModels, and Scikit-Learn.

You can see a list of available variables and functions in a module using the dir() function.

import numpy as np
dir(np)[100:110]
['bitwise_not',
 'bitwise_or',
 'bitwise_right_shift',
 'bitwise_xor',
 'blackman',
 'block',
 'bmat',
 'bool',
 'bool_',
 'broadcast']

As a simple illustration, let’s create a matrix using NumPy.

data = np.arange(12)
data.reshape(3,4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
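
There are a few common ways to import a module or parts of it. The sketch below uses the math module from the standard library, chosen here only because it needs no installation; the alias m is an arbitrary choice.

```python
import math                # import the whole module
from math import sqrt      # import a single function
import math as m           # import under a shorter alias

print(math.pi)
print(sqrt(16))
print(m.floor(2.7))
```

The alias form is the one used for NumPy above (import numpy as np), which keeps the code short while still making it clear where each function comes from.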

Exercises#

  1. Return a list of numbers from 0 to 50 (inclusive) that are divisible by both 3 and 5.

  2. Suppose we have the following dictionary. Add a new course named ‘Introduction to Analytics’ with course code MATH2350 to this dictionary.

course_names = {'MATH2319': 'Machine learning', 'MATH1298': 'Categorical Data Analysis'}

  3. Given a value x, write a function that checks if x is a number and returns its squared value if so. If not, the function should return None.

Solutions#

  1. List comprehension

[z for z in range(51) if z%3 == 0 and z%5 == 0]

  2. Dictionaries

course_names['MATH2350'] = 'Introduction to Analytics'

  3. Conditional statements and functions

def square(x):
    """
    Return the square of x if number, None otherwise.
    """
    if isinstance(x, (int, float)):
        return x ** 2
    else:
        return None