Python Basics#

The goal of this section is to give experienced developers a quick introduction to Python. We focus primarily on the differences between Python and other languages that matter most for this course. Libraries for data wrangling, analysis, and machine learning will be introduced later in the course.

✏️ This section is inspired by [Tom]; some parts are shortened or extended compared to the original. We strongly recommend the tutorials on classes and modules, which will help you structure and abstract your code.

Data Types#

Let’s start with an overview of common built-in Python data types. See the Python 3 documentation for a complete summary of the standard built-in types.

English name	Type name	Type Category	Description	Example
integer	`int`	Numeric Type	positive/negative whole numbers	`42`
floating point number	`float`	Numeric Type	real number in decimal form	`3.14159`
boolean	`bool`	Boolean Values	true or false	`True`
string	`str`	Sequence Type	text	`"I Can Has Cheezburger?"`
list	`list`	Sequence Type	a collection of objects - mutable & ordered	`["Ali", "Xinyi", "Miriam"]`
tuple	`tuple`	Sequence Type	a collection of objects - immutable & ordered	`("Thursday", 6, 9, 2018)`
dictionary	`dict`	Mapping Type	mapping of key-value pairs	`{"name": "DSCI", "code": 511, "credits": 2}`
none	`NoneType`	Null Object	represents no value	`None`

Working With Sequences#

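Lists, tuples, sets, and dictionaries allow us to store multiple things (“elements”) in a single object. Lists and tuples are ordered sequences, sets are unordered, and dictionaries keep their keys in insertion order (guaranteed since Python 3.7).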

Lists#

my_list = [1, 2, "THREE", 4, 0.5]
print("The sequence is of", type(my_list), ", it has", len(my_list), "items with the following content:", my_list)

The sequence is of <class 'list'> , it has 5 items with the following content: [1, 2, 'THREE', 4, 0.5]

A list can hold objects of any data type, even other lists:

another_list = [1, "two", [3, 4, "five"], True, None, {"key": "value"}]
another_list

[1, 'two', [3, 4, 'five'], True, None, {'key': 'value'}]

Lists are mutable, so you can change them in place, for example by appending an element:

my_list.append(10)
my_list

[1, 2, 'THREE', 4, 0.5, 10]

Check out the documentation for more list methods.
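A minimal sketch of a few of those methods in action, on a fresh illustrative list named letters:

```python
letters = ["c", "a"]

letters.extend(["d", "b"])  # append every element of another iterable
letters.insert(0, "z")      # insert "z" at index 0
last = letters.pop()        # remove and return the last element ("b")
letters.remove("z")         # remove the first occurrence of "z"
letters.sort()              # sort in place; the method itself returns None

print(letters, last)  # ['a', 'c', 'd'] b
```

Because methods such as sort() modify the list in place and return None, write letters.sort() rather than letters = letters.sort().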

Tuples#

Tuples look similar to lists but have one crucial difference: they are immutable.

my_tuple = (1, 2, "THREE", 4, 0.5)
print("The sequence is of", type(my_tuple), ", it has", len(my_tuple), "items with the following content:", my_tuple)

The sequence is of <class 'tuple'> , it has 5 items with the following content: (1, 2, 'THREE', 4, 0.5)
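To see what immutability means in practice, here is a small sketch (the names point and moved are illustrative): assigning to an element of a tuple raises a TypeError, so the usual approach is to build a new tuple instead.

```python
point = (3, 4)
assignment_failed = False

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# To "change" a tuple, create a new one from pieces of the old one
moved = (10,) + point[1:]
print(moved)  # (10, 4)
```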

We can access values inside a list or tuple using square brackets. Python uses zero-based indexing, so the first element of a sequence is at index 0, not 1. Negative indices count from the end, so index -1 refers to the last element:

print("The content of the list:", my_list)
print("The first element is", my_list[0], ", the third element is", my_list[2], ", the last element is", my_list[-1])

The content of the list: [1, 2, 'THREE', 4, 0.5, 10]
The first element is 1 , the third element is THREE , the last element is 10

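We can use a colon (:) to access a sub-sequence; this is called “slicing”. The element at the start index is included and the element at the stop index is excluded, so my_list[1:3] returns the elements at indices 1 and 2: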

my_list[1:3]

[2, 'THREE']
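Slices accept more forms than start:stop. Either bound can be omitted, negative indices count from the end, and an optional third value sets the step. A short sketch on an illustrative list nums:

```python
nums = [0, 1, 2, 3, 4, 5]

print(nums[:2])    # from the start up to (not including) index 2 -> [0, 1]
print(nums[3:])    # from index 3 to the end -> [3, 4, 5]
print(nums[-2:])   # the last two elements -> [4, 5]
print(nums[::2])   # every second element -> [0, 2, 4]
print(nums[::-1])  # a reversed copy -> [5, 4, 3, 2, 1, 0]
```

Slicing always returns a new list, so nums itself is unchanged.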

Sets#

Another built-in Python data type is the set, which stores an unordered collection of unique items. Because sets are unordered, they do not record element position or insertion order, and so they do not support indexing.

s = {2, 3, 5, 11}

Sets are equal when they contain the same elements, regardless of order:

{1, 2, 3} == {3, 2, 1}

True

This does not hold for lists, which are compared element by element, so order matters:

[1, 2, 3] == [3, 2, 1]

False

Converting a list to a set is a handy way to find its unique items:

set([2, 3, 5, 7, 5])

{2, 3, 5, 7}

Dictionaries#

A dictionary maps keys to values. It is defined with curly brackets containing comma-separated key: value pairs:

house = {
    "bedrooms": 3,
    "bathrooms": 2,
    "city": "Vancouver",
    "price": 2499999,
    "date_sold": (1, 3, 2015),
}

We can access a specific field of a dictionary with square brackets:

house["price"]

2499999
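Beyond square-bracket access, a few dictionary operations come up constantly. This sketch uses a trimmed-down copy of the house dictionary so that it runs on its own:

```python
house = {"bedrooms": 3, "city": "Vancouver", "price": 2499999}

house["price"] = 2300000         # update the value of an existing key
house["garage"] = True           # add a new key-value pair
print("city" in house)           # "in" checks keys -> True
print(house.get("pool"))         # .get() returns None for a missing key
print(house.get("pool", False))  # ... or a default value you supply

try:
    house["pool"]                # square brackets raise KeyError instead
except KeyError as err:
    print("KeyError:", err)
```

Use square brackets when a missing key is a bug you want to hear about, and .get() when a missing key is expected.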

Loops and Comprehensions#

Loops#

A for loop executes a block of code once for each element of an iterable:

for n in [2, 7, -1, 5]:
    print("The number is {} and its square is {}".format(n, n ** 2))

The number is 2 and its square is 4
The number is 7 and its square is 49
The number is -1 and its square is 1
The number is 5 and its square is 25

The main points to notice:

The keyword for begins the loop, and a colon (:) ends the first line of the loop.
The indented block of code is executed once for each value in the list (hence the name “for” loop).
The loop ends after the variable n has taken every value in the list.
We can iterate over any kind of “iterable”: list, tuple, range, set, string, and more.
An iterable is any object that produces a sequence of values that can be looped over.
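Since any iterable works, the same loop syntax handles ranges and strings. A short sketch (the names squares and chars are illustrative):

```python
# range(start, stop) yields start, start + 1, ..., stop - 1
squares = []
for n in range(1, 4):
    squares.append(n ** 2)
print(squares)  # [1, 4, 9]

# Iterating over a string yields one character at a time
chars = []
for ch in "DS!":
    chars.append(ch)
print(chars)  # ['D', 'S', '!']
```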

We can write a loop inside another loop to iterate over multiple dimensions of data:

for x in [1, 2, 3]:
    for y in ["a", "b", "c"]:
        print((x, y))

(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
(3, 'a')
(3, 'b')
(3, 'c')

enumerate() adds a counter to an iterable, which we can use within the loop:

for i, value in enumerate(["a", "b", "c"]):
    print("index {}, value {}".format(i, value))

index 0, value a
index 1, value b
index 2, value c
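enumerate() counts from 0 by default; its optional start argument changes the first value, which is handy for human-readable numbering. A sketch with an illustrative list of steps:

```python
steps = ["load", "clean", "plot"]

numbered = []
for i, step in enumerate(steps, start=1):  # counter begins at 1 instead of 0
    numbered.append("{}. {}".format(i, step))

print(numbered)  # ['1. load', '2. clean', '3. plot']
```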

We can loop through key-value pairs of a dictionary using items(). The general syntax is for key, value in dictionary.items():

courses = {
    10: "ten",
    25: "twentyfive",
    30: "thirty!"
}

for num, as_string in courses.items():
    print("Number {} is {}".format(num, as_string))

Number 10 is ten
Number 25 is twentyfive
Number 30 is thirty!

Comprehensions#

Comprehensions allow us to build lists, sets, and dictionaries in one convenient, compact line of code.

Below is a standard for loop you might use to iterate over an iterable and create a list:

words = ["Tom", "ingests", "many", "eggs", "to", "outrun", "large", "eagles", "after", "running", "near", "!"]
first_letters = []
for word in words:
    first_letters.append(word[0])
first_letters

['T', 'i', 'm', 'e', 't', 'o', 'l', 'e', 'a', 'r', 'n', '!']

A list comprehension achieves the same result in a single, more readable line:

[word[0] for word in words]

['T', 'i', 'm', 'e', 't', 'o', 'l', 'e', 'a', 'r', 'n', '!']

You can add a condition to keep only some of the elements:

[i for i in range(11) if i % 2 == 0]

[0, 2, 4, 6, 8, 10]

and build a dictionary with a dictionary comprehension:

{word: len(word) for word in words}

{'Tom': 3,
 'ingests': 7,
 'many': 4,
 'eggs': 4,
 'to': 2,
 'outrun': 6,
 'large': 5,
 'eagles': 6,
 'after': 5,
 'running': 7,
 'near': 4,
 '!': 1}
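Python has no tuple comprehension: a comprehension in parentheses creates a generator expression, which you can pass to tuple() (or to any function that accepts an iterable). Set comprehensions use curly braces without a colon. A sketch on an illustrative short_words list:

```python
short_words = ["Tom", "ingests", "many", "eggs"]

lengths = {len(word) for word in short_words}   # set comprehension: duplicate 4s collapse
first = tuple(word[0] for word in short_words)  # generator expression passed to tuple()
gen = (len(word) for word in short_words)       # a lazy generator object, not a tuple

print(lengths)     # the distinct lengths 3, 4 and 7
print(first)       # ('T', 'i', 'm', 'e')
print(type(gen))   # <class 'generator'>
```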

Functions#

A function is a reusable piece of code that can accept input parameters, also known as “arguments”.

For example, let’s define a function called square which takes one input parameter n and returns the square n**2:

def square(n):
    n_squared = n ** 2
    return n_squared


square(12345)

152399025

Functions begin with the def keyword, the function name, arguments in parentheses, and then a colon (:). The code executed by the function is defined by indentation. The function’s output or “return” value is specified using the return keyword.

Sometimes it is convenient to have default values for some arguments in a function. Because they have default values, these arguments are optional and are hence called “optional arguments”. For example:

def repeat_string(s, n=2):
    return s * n


repeat_string("ds", 5)

'dsdsdsdsds'

You can also pass the argument by name, as a keyword argument:

repeat_string("ds", n=5)

'dsdsdsdsds'

Keyword arguments are especially useful when a function has several parameters. If you omit the second argument, the default n=2 is used:

repeat_string("ds")

'dsds'
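Keyword arguments pay off when a function has several optional parameters: callers set only the ones they need, in any order. The function describe_house below is a hypothetical illustration, not part of the original material:

```python
def describe_house(city, bedrooms=1, bathrooms=1, for_sale=True):
    """Return a one-line summary of a house (hypothetical helper)."""
    status = "for sale" if for_sale else "sold"
    return "{}-bed, {}-bath house in {} ({})".format(bedrooms, bathrooms, city, status)


print(describe_house("Vancouver"))
# 1-bed, 1-bath house in Vancouver (for sale)
print(describe_house("Vancouver", for_sale=False, bedrooms=3))
# 3-bed, 1-bath house in Vancouver (sold)
```

Positional arguments must come before keyword arguments in a call, so describe_house(bedrooms=3, "Vancouver") is a syntax error.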

In many programming languages, a function can return only one object. That is technically true in Python too, but a common “workaround” is to return a tuple that bundles several values:

def sum_and_product(x, y):
    return x + y, x * y  # note that parentheses are omitted for simplicity


sum_and_product(5, 6)

(11, 30)
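The returned tuple is usually unpacked straight into separate variables. This sketch repeats sum_and_product so that it runs on its own:

```python
def sum_and_product(x, y):
    return x + y, x * y


total, product = sum_and_product(5, 6)  # unpack the returned tuple into two names
print(total, product)  # 11 30
```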

Miscellaneous#

try / except#

We don’t want our code to crash if something goes wrong - we want it to fail gracefully. In Python, this can be accomplished using try/except:

try:
    5 / 0  # raises ZeroDivisionError
except ZeroDivisionError as ex:  # catch only the error we expect here
    print("You did something bad!")
    print(ex)

You did something bad!
division by zero
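A try statement can have several except clauses, plus an optional else clause (runs only if no exception was raised) and a finally clause (always runs). The function safe_divide below is a hypothetical sketch of how these fit together:

```python
def safe_divide(a, b):
    """Divide a by b, returning None on bad input (hypothetical helper)."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both arguments must be numbers")
        return None
    else:
        return result  # runs only if the try block raised nothing
    finally:
        print("done")  # runs in every case, even after a return


print(safe_divide(10, 4))  # prints "done", then 2.5
print(safe_divide(1, 0))   # prints the error message, "done", then None
```

Catching specific exception types, rather than Exception, keeps unexpected bugs visible instead of silently swallowing them.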

assert Statements#

assert statements are a simple way to test your functions: if the tested condition is False, the program stops with an AssertionError that carries the optional message. The syntax is assert condition, message:

assert "mike" in ["mike", "tom", "tiffany"], "Instructor not present!"

Now try changing “mike” to “timothy” in the list: the condition becomes False, and Python raises AssertionError: Instructor not present!
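In practice, asserts are often grouped into a small test function that checks known inputs against expected outputs. A sketch that tests repeat_string from the Functions section (redefined here so the snippet is self-contained; the name test_repeat_string is illustrative):

```python
def repeat_string(s, n=2):
    return s * n


def test_repeat_string():
    assert repeat_string("ds") == "dsds", "default n should be 2"
    assert repeat_string("ds", n=3) == "dsdsds"
    assert repeat_string("ds", n=0) == "", "n=0 should give an empty string"


test_repeat_string()  # silent when every assert passes
print("All tests passed")
```

Python skips assert statements entirely when run with the -O (optimize) flag, so use them for tests and sanity checks rather than for validating user input.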

Common first mistakes with references#

Let’s review some classic issues with references:

a = 1
b = a
a = 1254
print(b)

1

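Here b is still 1. Assignment never copies an object: b = a makes b refer to the same integer object as a, and a = 1254 then rebinds a to a new object, leaving b untouched. Lists and dictionaries, however, are mutable, so changing the object through one name is visible through every other name that refers to it: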

a = [1, 2, 3]
b = a
a[1] = 'strv'
print(b)

[1, 'strv', 3]
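To get an independent list, make an explicit copy. A shallow copy (a.copy(), list(a), or a[:]) duplicates only the outer list, so nested lists are still shared; copy.deepcopy() from the standard library copies the nested objects too. A minimal sketch:

```python
import copy

a = [1, [2, 3]]
shallow = a.copy()       # new outer list, same inner list
deep = copy.deepcopy(a)  # new outer list and new inner list

a[0] = 100               # replacing an element: neither copy sees it
a[1].append(4)           # mutating the shared inner list

print(shallow)  # [1, [2, 3, 4]]
print(deep)     # [1, [2, 3]]
print(a is shallow, a[1] is shallow[1])  # False True
```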

Exercises#

Given this nested list, use indexing to grab the word “DS”:#

my_list = [10, [3, 4], [5, [100, 200, ["DS"]], 23, 11], 1, 7]

# TODO: your answer here

Given this nested dictionary, grab the word “DS”:#

my_dict = {
    "outer": [
        1,
        2,
        3,
        {"inner": ["this", "is", "inception", {"inner_inner": [1, 2, 3, "DS"]}]},
    ]
}

# TODO: your answer here

Use a list comprehension to square every odd number in the following list:#

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# TODO: your answer here

Write a for loop that prints every word capitalized:#

my_list = ["Steve Irwin", "koala", "kangaroo", "Australia", "Sydney", "desert"]

# TODO: your answer here

Create a function divisible(a, b) that accepts two integers (a and b) and returns True if a is divisible by b without a remainder.#

For example, divisible(10, 3) should return False, while divisible(6, 3) should return True.

def divisible(a, b):
    pass  # TODO: remove this and add your answer here

Resources#

Tom: Tomas Beuzen. TomasBeuzen/python-programming-for-data-science: Content from the University of British Columbia's Master of Data Science course DSCI 511. URL: https://github.com/TomasBeuzen/python-programming-for-data-science.

Data Science Academy
