Python vs. R

This tutorial summarizes some of the main differences between R and Python. It is meant to help you avoid common pitfalls if you are coming from an R programming background.

In Python, indexing starts at 0, so the first element of a list is selected by the 0-th index.

lst = ["A", "B", 3.45]
lst[0]

'A'

R:

lst <- list("A","B", 3)

lst[[1]]  # double brackets extract the element; lst[1] returns a one-element list

Output: [1] "A"

Unlike in R, the ending index of a slice is excluded in Python, so lst[0:1] returns only the first element.

lst[0:1]

['A']

lst[0:2]

['A', 'B']
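A handy way to remember the half-open rule is that lst[start:stop] contains stop - start elements when both indices are in range. The short sketch below reuses the list from above; the names first_two and last are only illustrative.

```python
lst = ["A", "B", 3.45]

first_two = lst[0:2]   # indices 0 and 1; index 2 is excluded
last = lst[-1]         # negative indices count from the end

print(first_two)                 # ['A', 'B']
print(len(lst[0:2]))             # 2, i.e. stop - start
print(last)                      # 3.45
print(lst[:1] + lst[1:] == lst)  # True: adjacent slices never overlap
```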

In R, you use curly braces {} to delimit a block of code. Python has no curly braces for this purpose; indentation alone defines where a block starts and ends.

R:

printString <- function(x,y) {

print("Hello!")  # indenting this line is not necessary

}

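In Python, the line that introduces an indented block (def, if, elif, else, for, while, and so on) must end with a colon. We suggest that you use 4 spaces for indentation. Tabs also work, but do not mix tabs and spaces within the same block, because Python will raise an error.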

def printInput(name):
    if type(name) is str:
        print("String: Hello " + name + '!')
    elif type(name) is int or type(name) is float:
        print("Numeric: Hello " + str(name) + '!')
    else:
        print("We don't greet strangers!")

printInput("world")
printInput(123)
printInput(1.45)
printInput(None)

String: Hello world!
Numeric: Hello 123!
Numeric: Hello 1.45!
We don't greet strangers!
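Because indentation is the only block delimiter, moving a line one level left or right changes which block it belongs to. Here is a small sketch of nested blocks; the function count_even is a made-up example and is not part of the tutorial.

```python
def count_even(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += 1   # runs only when the if condition holds
    return total         # dedented to function level: runs after the loop

print(count_even([1, 2, 3, 4]))  # 2
```

If the return line were indented one more level, it would sit inside the for loop and the function would return after the first element.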

In Python, variables are passed as object references to functions. In R, they are passed as values.

Python uses static (lexical) scoping: you can work out the scope of a variable just by reading the code. Variables created inside a Python function are local to that function and cannot be accessed from outside it.

Passing a variable to a function is where things get tricky. When you pass a variable to a function, Python creates a new local name that refers to the same object. For this reason, if you modify a mutable object (like a list) inside a Python function, the change is also visible to the caller. This is called aliasing, and when it is unintended it can cause hard-to-find bugs. Here is an example.

x = [1, 2, 3]

def lst(x):
    x.append(4)
    return x

lst(x)

[1, 2, 3, 4]

# Variable x globally changed too!
x

[1, 2, 3, 4]

If this is not what you want, you need to explicitly tell Python to create a copy of x and call it y. This way, both variables will be independent of each other.

x = [1, 2, 3]
def lst2(x):
    y = list(x)  # create a copy of x and not a reference
    y.append(4)  # change the copy
    return y

lst2(x)

[1, 2, 3, 4]

# Variable x is still [1, 2, 3]
x

[1, 2, 3]
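One caveat that goes slightly beyond the example above: list(x) makes a shallow copy, so if the list itself contains mutable objects (for example other lists), those inner objects are still shared. A minimal sketch using copy.deepcopy from the standard library; the variable names are illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = list(nested)         # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

shallow[0].append(99)          # also visible through `nested`
deep[1].append(-1)             # does not touch `nested`

print(nested)   # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4, -1]]
```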

On the other hand, if you assign a different value to the reference pointing to an outside variable, that change will NOT be reflected outside, because remember, this reference is local to the function. Here is a simple example.

x = [1, 2, 3]
def lst3(x):
    x = [4, 5, 6]
    x.append(7)
    return x

lst3(x)

[4, 5, 6, 7]

# Variable x is still [1, 2, 3]
x

[1, 2, 3]
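The difference between modifying an object and reassigning a local name can be made visible with the is operator. This is only a sketch, and the function names mutate and rebind are made up for illustration.

```python
def mutate(items):
    items.append(4)       # changes the object the caller also sees
    return items

def rebind(items):
    items = [4, 5, 6]     # the local name now refers to a new list
    return items

x = [1, 2, 3]
print(mutate(x) is x)     # True: the same object came back
print(rebind(x) is x)     # False: a different object came back
print(x)                  # [1, 2, 3, 4]
```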

Assignment does not always do what you might expect: assigning a list to a new name does not copy it.

a1 = [1,1]
a2 = [1,1]

# This does not copy anything: both a1 and b refer to the
# same object in the computer's memory
b = a1

b[0] = 'boo!'

print(a1)

['boo!', 1]

If you want a real copy, do either one of the below:

c = list(a2)
# OR
c = a2[:]
c

[1, 1]

How to check if two variables point to the same address in the memory:

b is a1

True

c is a2

False

Tricky! How to check if two variables have the same value:

c == a2

True
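To put the two checks side by side: is compares identity (same object), while == compares value. The sketch below uses fresh variable names so it can be run on its own; the is None check at the end is a common idiom and is included only as an illustration.

```python
a = [1, 1]
b = a          # alias: same object
c = list(a)    # copy: equal contents, different object

print(b is a, b == a)   # True True
print(c is a, c == a)   # False True

value = None
print(value is None)    # True
```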

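Python shares objects between names this way for memory efficiency. Immutable base types such as numbers and strings cannot be modified in place, so assigning a new value to b simply points b at a new object and leaves a untouched: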

a = 1
b = a
b = 'boo!'
print(a)

1

In R, to perform exponentiation, you can use either the caret symbol or double asterisk. In Python, you can only use double asterisk because ^ is bitwise XOR in Python.

So, here is \(2^3\) (notice how you can embed LaTeX code inside a notebook):

R:

Input: 2**3

Output: 8

Input: 2^3

Output: 8

2**3

8

2^3

1
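The reason 2^3 does not give 8 in Python is easiest to see in binary. The following sketch also shows the built-in pow function, which is equivalent to ** for two arguments.

```python
print(2 ** 3)      # 8, exponentiation
print(2 ^ 3)       # 1, bitwise XOR: 0b10 ^ 0b11 == 0b01
print(pow(2, 3))   # 8, built-in equivalent of **
print(2 ** 0.5)    # fractional exponents work too (square root of 2)
```

Because 2 ^ 3 is valid Python, a caret carried over from R does not raise an error; it silently produces a different number.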

In R, you can usually use a dot when naming variables and functions. In Python, the dot is an operator that accesses methods and attributes of classes and objects, so you cannot use it inside a name. Use underscores instead.

R:

my.integer.variable <- 5

a = [1,2,3]
print(a)

a.append(4)
print(a)

[1, 2, 3]
[1, 2, 3, 4]
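To see what happens when an R-style dotted name is carried over, the sketch below runs such an assignment through exec so that the error can be caught and printed. The name my_integer_variable is simply the R example rewritten with underscores.

```python
my_integer_variable = 5   # underscores instead of my.integer.variable

message = None
try:
    # Python reads this as "set attribute `variable` on `my.integer`",
    # and `my` has never been defined.
    exec("my.integer.variable = 5")
except NameError as err:
    message = str(err)

print(message)
```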

In R, by default, matrices are filled column-wise when data is reshaped. In Python, NumPy's default is to fill row-wise. This difference can cause subtle bugs that are hard to catch.

R:

matrix(0:9, nrow=2, ncol=5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    2    4    6    8
[2,]    1    3    5    7    9

import numpy as np
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

However, you can force NumPy to do column-wise reshaping by passing order='F' (Fortran order) to the reshape method.

import numpy as np
np.arange(10).reshape(2, 5, order='F')

array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])
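As a sanity check on what column-wise filling means, the sketch below compares the order='F' result with a row-wise reshape followed by a transpose, and then flattens it back. The variable names are illustrative.

```python
import numpy as np

values = np.arange(10)

row_wise = values.reshape(2, 5)              # NumPy default, order='C'
col_wise = values.reshape(2, 5, order="F")   # same layout as R's matrix()

# Filling a 2 x 5 array column-wise is the same as filling a
# 5 x 2 array row-wise and transposing it.
print(np.array_equal(col_wise, values.reshape(5, 2).T))   # True

# Flattening with the same order recovers the original sequence.
print(col_wise.ravel(order="F"))   # [0 1 2 3 4 5 6 7 8 9]
```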