Writing Efficient Python Code Part 2

Gaining Efficiencies using Combining, Counting, and Iterating

Combining Objects

  • Suppose we have two lists we want to combine as key-value pairs:

1. Combining with enumerate()

image

2. Combining with zip()

image
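As a minimal sketch of the two approaches, assuming two hypothetical lists `names` and `hps` (sample data, not from the original notes):

```python
# Hypothetical sample data: names and their matching hit points.
names = ["Bulbasaur", "Charmander", "Squirtle"]
hps = [45, 39, 44]

# Option 1: enumerate() yields (index, item); the index is used to
# look up the matching value in the second list.
combined_enumerate = []
for i, name in enumerate(names):
    combined_enumerate.append((name, hps[i]))

# Option 2: zip() pairs items position by position, no manual indexing.
combined_zip = list(zip(names, hps))

# dict() turns the pairs into a key-value mapping.
name_to_hp = dict(zip(names, hps))
```

Both options produce the same pairs; `zip()` is shorter and avoids the index lookup.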

Python Collections Module

  • Python comes with a number of efficient built-in modules
  • The collections module contains specialized datatypes that can be used as alternatives to standard dictionaries, lists, sets, and tuples
    • image
  • To count items and store the counts as key-value pairs, use collections.Counter:
    • image
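A minimal sketch of counting with `collections.Counter`, assuming a hypothetical list `poke_types` as sample data:

```python
from collections import Counter

# Hypothetical sample data.
poke_types = ["Grass", "Fire", "Water", "Fire", "Grass", "Fire"]

# Counter maps each distinct item (key) to its number of occurrences (value).
type_counts = Counter(poke_types)

# most_common(n) returns the n most frequent (item, count) pairs.
top_type = type_counts.most_common(1)
```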

Python Itertools Module

  • Another built-in module, itertools, contains functional tools for working with iterators
    • image
  • A more efficient way to create combinations than writing nested loops by hand is itertools.combinations:
    • image
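A minimal sketch comparing a hand-written nested loop with `itertools.combinations`, assuming a hypothetical list `poke_types`:

```python
from itertools import combinations

# Hypothetical sample data.
poke_types = ["Bug", "Fire", "Ghost", "Grass"]

# Nested-loop version: pair each item with every later item.
pairs_loop = []
for i, first in enumerate(poke_types):
    for second in poke_types[i + 1:]:
        pairs_loop.append((first, second))

# itertools version: all unordered pairs (length-2 combinations).
pairs_itertools = list(combinations(poke_types, 2))
```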

Set Theory

  • Often, we'd like to compare two objects to observe similarities and differences between their contents
  • When doing this type of comparison, it's best to leverage a branch of mathematics called set theory
  • A set is defined as a collection of distinct elements
  • Thus, we can use a set to collect unique items from an existing object e.g. unique_set = set(list_a)
  • Python comes with a built-in set data type. Sets come with some handy methods we can use for comparing
    • image
    • Note that set_a.union(set_b) collects the unique items that appear in either set (or both)
    • Symmetric difference collects items that exist in exactly one of the sets (but not both)
  • Example: Get items which appear in both lists without using for loops:
    • image
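A minimal sketch of the set methods described above, assuming two hypothetical lists `list_a` and `list_b`:

```python
# Hypothetical sample data.
list_a = ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"]
list_b = ["Pikachu", "Squirtle", "Eevee"]

set_a = set(list_a)
set_b = set(list_b)

in_both = set_a.intersection(set_b)                 # items in both sets
only_in_a = set_a.difference(set_b)                 # items in set_a but not set_b
in_exactly_one = set_a.symmetric_difference(set_b)  # items in one set, not both
in_either = set_a.union(set_b)                      # all unique items
```

No explicit `for` loop is needed to find the shared items; `intersection()` does the comparison.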

Eliminating Loops

  • Although using loops when writing Python code isn't necessarily a bad design pattern, using extraneous loops can be inefficient and costly
  • Benefits of eliminating loops:
    • Fewer lines of code
    • Better code readability (Zen of Python: "Flat is better than nested")
    • Efficiency gains
  • Example of finding sum of rows in a list of lists: (eliminating loops using built-ins)
    • image
  • Example of eliminating loops using NumPy on an np array:
    • image
  • Exercise to eliminate loops: Pokemon game:
    • image
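A minimal sketch of the row-sum example, assuming a hypothetical list of lists `stats`. It shows the explicit loop first, then the built-in and NumPy alternatives:

```python
import numpy as np

# Hypothetical sample data: one inner list of stats per row.
stats = [[90, 92, 75], [25, 20, 15], [65, 130, 60]]

# Explicit loop.
totals_loop = []
for row in stats:
    totals_loop.append(sum(row))

# Built-ins: a list comprehension or map() removes the explicit loop body.
totals_comp = [sum(row) for row in stats]
totals_map = list(map(sum, stats))

# NumPy: sum along axis 1 (across each row) in a single vectorized call.
totals_np = np.array(stats).sum(axis=1)
```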

Writing Better Loops

  • The best way to make a loop more efficient is to analyze what's being done within the loop. We want to make sure that we aren't doing unnecessary work in each iteration
  • If a calculation is performed for each iteration of a loop, but its value doesn't change with each iteration, it's best to move this calculation outside (or above) the loop
  • If a loop converts data types on each iteration, the conversion can often be done once outside (below) the loop using the map function; this is called a holistic conversion
  • Anything that can be done once should be moved outside of a loop
  • Example of holistic conversions:
    • image
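A minimal sketch of both ideas, assuming hypothetical lists `names` and `attacks`: a loop-invariant calculation moved above the loop, and a type conversion moved below it.

```python
# Hypothetical sample data.
names = ["Absol", "Aron", "Jynx"]
attacks = [130, 70, 50]

# The mean does not change between iterations, so compute it once, above the loop.
mean_attack = sum(attacks) / len(attacks)

above_mean = []
for name, attack in zip(names, attacks):
    if attack > mean_attack:
        above_mean.append(name)

# Holistic conversion: collect tuples inside the loop, then convert
# them all to lists once, below the loop, with map().
pairs = []
for pair in zip(names, attacks):
    pairs.append(pair)
pairs_as_lists = list(map(list, pairs))
```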

Pandas Optimizations

  • Pandas is a library used for data analysis
  • The main data structure of pandas is the DataFrame, a tabular data structure with labeled rows and columns (built on top of the NumPy array structure)

Iterating Pandas DF with .iterrows()

  • .iterrows() returns each DataFrame row as an (index, pandas Series) tuple
  • This means each object returned from .iterrows() contains the index of each row as the first element and the data in each row as a pandas Series as the second element
  • Example: calculate a new column for storing a team's win percentage = Wins ('W') / Games ('G'):
    • image
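A minimal sketch of the win-percentage calculation with `.iterrows()`, assuming a hypothetical DataFrame `df` with columns `'W'` and `'G'`; the team names, numbers, and the `'WP'` column name are illustrative only:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"],
    "W": [81, 94, 93],
    "G": [162, 162, 162],
})

win_percs = []
# Each iteration unpacks the (index, Series) tuple returned by .iterrows().
for index, row in df.iterrows():
    win_percs.append(row["W"] / row["G"])

# Store the results in a new column.
df["WP"] = win_percs
```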

Iterating with .itertuples()

  • Alternatively, we can loop over DataFrame rows with .itertuples(), which returns each row as a namedtuple and is faster than .iterrows()
    • image
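A minimal sketch of the same win-percentage calculation with `.itertuples()`, assuming a hypothetical DataFrame `df` with columns `'W'` and `'G'`:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"],
    "W": [81, 94, 93],
    "G": [162, 162, 162],
})

win_percs = []
# Each row is a namedtuple, so columns are read as attributes (row.W),
# and the row label is available as row.Index.
for row in df.itertuples():
    win_percs.append(row.W / row.G)

df["WP"] = win_percs
```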

Pandas alternative to looping - .apply()

  • You can loop over DataFrames row-by-row with ease using .iterrows() and .itertuples()
  • However, in order to write efficient code, we want to avoid looping when possible
  • Instead we can use .apply(), which acts like the map function
  • It takes a function as input and applies it across an entire DataFrame
  • Since we are working with tabular data, we specify the axis the function acts on (0 for columns, the default; 1 for rows)
  • Just like the map function, pandas' .apply() method can be used with anonymous functions or lambdas
  • Example 1:
    • image
  • Example 2:
    • image
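A minimal sketch of `.apply()` on each axis, assuming a hypothetical DataFrame `df` with columns `'W'` and `'G'`; the lambdas are illustrative, not the functions from the original examples:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"W": [81, 94, 93], "G": [162, 162, 162]})

# axis=0: the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series.
df["WP"] = df.apply(lambda row: row["W"] / row["G"], axis=1)
```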

Optimal pandas iterating - using Numpy arrays

  • Since pandas is built on top of NumPy, we can grab any DataFrame column's values as a NumPy array using the .values property:
    • col_arr = df['col'].values
  • The beauty of knowing that pandas is built on NumPy can be seen when taking advantage of a NumPy array's broadcasting abilities
  • Remember, this means we can vectorize our calculations, and perform them on entire arrays all at once
  • Example of faster and more readable code using np arrays:
    • image
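A minimal sketch of the vectorized approach, assuming a hypothetical DataFrame `df` with columns `'W'` and `'G'`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"W": [81, 94, 93], "G": [162, 162, 162]})

# .values returns the underlying NumPy arrays
# (.to_numpy() is an equivalent accessor).
wins = df["W"].values
games = df["G"].values

# Element-wise division over the whole arrays at once, with no explicit loop.
df["WP"] = wins / games
```

The whole calculation is a single expression, which is why this tends to be both the fastest and the most readable of the approaches in these notes.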