- Suppose we have two lists that we want to combine element by element into key-value pairs:
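A minimal sketch of one way to do this, assuming the built-in zip function is the intended tool. The list contents below are hypothetical sample data.

```python
# Hypothetical sample data (not from the original notes)
names = ['Bulbasaur', 'Charmander', 'Squirtle']
hit_points = [45, 39, 44]

# zip pairs the lists element by element; it returns a lazy iterator,
# so wrap it in list() or dict() to materialize the pairs
pairs = list(zip(names, hit_points))
lookup = dict(zip(names, hit_points))

print(pairs)
print(lookup)
```

Note that zip stops at the end of the shorter input, so lists of unequal length are silently truncated.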
- Python comes with a number of efficient built-in modules
- The collections module contains specialized datatypes that can be used as alternatives to standard dictionaries, lists, sets, and tuples
- To count items and store the counts as key-value pairs, use Counter from the collections module:
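A short sketch using collections.Counter; the list of types is hypothetical sample data.

```python
from collections import Counter

# Hypothetical sample data
poke_types = ['Grass', 'Fire', 'Water', 'Fire', 'Grass', 'Fire']

# Counter maps each distinct item (key) to how often it occurs (value)
type_counts = Counter(poke_types)

# most_common(n) returns the n highest counts as (item, count) tuples
most_common_type = type_counts.most_common(1)

print(type_counts)
print(most_common_type)
```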
- Another built-in module, itertools, contains functional tools for working with iterators
- A more efficient solution to create combinations is:
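A possible sketch, assuming the solution meant here is itertools.combinations, which replaces a nested loop that builds pairs by hand. The input list is hypothetical.

```python
from itertools import combinations

# Hypothetical sample data
poke_types = ['Bug', 'Fire', 'Ghost']

# combinations(iterable, r) yields every r-length combination,
# ignoring order and without repeating an element
type_pairs = list(combinations(poke_types, 2))

print(type_pairs)
```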
- Often, we'd like to compare two objects to observe similarities and differences between their contents
- When doing this type of comparison, it's best to leverage a branch of mathematics called set theory
- A set is defined as a collection of distinct elements
- Thus, we can use a set to collect unique items from an existing object e.g.
unique_set = set(list_a)
- Python comes with a built-in set data type. Sets come with some handy methods we can use for comparing
- Example: Get items which appear in both lists without using for loops:
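A minimal sketch with hypothetical list contents. Besides intersection, it shows the other common set comparison methods.

```python
# Hypothetical sample data
list_a = ['Bulbasaur', 'Charmander', 'Squirtle']
list_b = ['Caterpie', 'Pidgey', 'Squirtle']

set_a = set(list_a)
set_b = set(list_b)

in_both = set_a.intersection(set_b)                 # items in both sets
only_in_a = set_a.difference(set_b)                 # items in a but not in b
in_exactly_one = set_a.symmetric_difference(set_b)  # items in one set only
in_either = set_a.union(set_b)                      # all distinct items

print(in_both)
```

Sets are unordered, so the printed order of elements may differ between runs.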
- Although using loops when writing Python code isn't necessarily a bad design pattern, using extraneous loops can be inefficient and costly
- Benefits of eliminating loops:
- Fewer lines of code
- Better code readability (from the Zen of Python: "Flat is better than nested")
- Efficiency gains
- Example of finding sum of rows in a list of lists: (eliminating loops using built-ins)
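A sketch comparing an explicit loop with two built-in alternatives; the numbers are hypothetical.

```python
# Hypothetical sample data: each inner list is one row
poke_stats = [
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75],
]

# Explicit loop
totals_loop = []
for row in poke_stats:
    totals_loop.append(sum(row))

# List comprehension
totals_comp = [sum(row) for row in poke_stats]

# Built-in map: applies sum to every row
totals_map = list(map(sum, poke_stats))

print(totals_map)
```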
- Example of eliminating loops with NumPy on a NumPy array:
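A minimal sketch with hypothetical data: the axis argument lets NumPy compute per-row results without any Python-level loop.

```python
import numpy as np

# Hypothetical sample data: one row per item, one column per stat
poke_stats = np.array([
    [90, 92, 75, 60],
    [25, 20, 15, 90],
    [65, 130, 60, 75],
])

# axis=1 collapses the columns, giving one result per row
row_totals = poke_stats.sum(axis=1)
row_means = poke_stats.mean(axis=1)

print(row_totals)
print(row_means)
```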
- Exercise to eliminate loops: Pokemon game:
- The best way to make a loop more efficient is to analyze what's being done within the loop. We want to make sure that we aren't doing unnecessary work in each iteration
- If a calculation is performed for each iteration of a loop, but its value doesn't change with each iteration, it's best to move this calculation outside (or above) the loop
- If a loop converts data types on each iteration, the conversion can often be done once outside (or below) the loop using the map function (a holistic conversion)
- Anything that can be done once should be moved outside of a loop
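A sketch of moving a loop-invariant calculation above the loop. The data and variable names are hypothetical.

```python
# Hypothetical sample data
names = ['Pikachu', 'Squirtle', 'Articuno']
attacks = [55, 48, 85]

# Inefficient: the average never changes, yet it is recomputed on every pass
above_avg_slow = []
for name, attack in zip(names, attacks):
    total_attack_avg = sum(attacks) / len(attacks)
    if attack > total_attack_avg:
        above_avg_slow.append(name)

# Better: compute the average once, above the loop
total_attack_avg = sum(attacks) / len(attacks)
above_avg = []
for name, attack in zip(names, attacks):
    if attack > total_attack_avg:
        above_avg.append(name)

print(above_avg)
```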
- Example of holistic conversions:
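A minimal sketch, assuming the conversion in question is tuple to list. The data is hypothetical.

```python
# Hypothetical sample data
names = ['Pikachu', 'Squirtle', 'Articuno']
legend_status = [False, False, True]
generations = [1, 1, 1]

# Inefficient: each tuple is converted to a list inside the loop
poke_data_slow = []
for poke_tuple in zip(names, legend_status, generations):
    poke_data_slow.append(list(poke_tuple))

# Holistic: collect the tuples first, then convert them all at once
# below the loop with map
poke_data_tuples = []
for poke_tuple in zip(names, legend_status, generations):
    poke_data_tuples.append(poke_tuple)
poke_data = list(map(list, poke_data_tuples))

print(poke_data)
```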
- Pandas is a library used for data analysis
- The main data structure of pandas is the DataFrame, a tabular data structure with labeled rows and columns (built on top of the NumPy array structure)
- .iterrows() yields each DataFrame row as an (index, pandas Series) tuple
- This means each object returned from .iterrows() contains the index of each row as the first element and the data in each row as a pandas Series as the second element
- Example: calculate a new column storing each team's win percentage = Wins ('W') / Games ('G'):
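A sketch of this calculation with .iterrows(). The DataFrame contents, the name baseball_df, and the new column name 'WP' are hypothetical; only the 'W' and 'G' columns come from the notes.

```python
import pandas as pd

# Hypothetical sample data
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'W': [81, 90, 70],
    'G': [162, 162, 162],
})

win_percs = []
# Each iteration unpacks the (index, Series) tuple for one row
for idx, row in baseball_df.iterrows():
    win_percs.append(row['W'] / row['G'])

# Store the results in a new column
baseball_df['WP'] = win_percs

print(baseball_df)
```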
- You can loop over DataFrames row-by-row with ease using .iterrows() and .itertuples()
- However, in order to write efficient code, we want to avoid looping when possible
- Instead, we can use the .apply() method, which acts like the map function
- It takes a function as an input and applies this function to an entire DataFrame
- Since we are working with tabular data, we must specify the axis the function acts on (0 for columns; 1 for rows)
- Just like the map function, pandas' .apply() method can be used with anonymous functions or lambdas
- Example 1:
- Example 2:
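A possible sketch of .apply() with a named function on each axis. The DataFrame and the helper value_range are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
scores_df = pd.DataFrame({
    'Math': [90, 80, 70],
    'Art': [60, 70, 80],
})

def value_range(values):
    """Return the spread between the largest and smallest value."""
    return values.max() - values.min()

# axis=0: the function receives each column as a Series
col_ranges = scores_df.apply(value_range, axis=0)

# axis=1: the function receives each row as a Series
row_ranges = scores_df.apply(value_range, axis=1)

print(col_ranges)
print(row_ranges)
```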
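A possible sketch of .apply() with a lambda acting on rows, reusing the win percentage calculation. The DataFrame contents and the column name 'WP' are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'W': [81, 90, 70],
    'G': [162, 162, 162],
})

# axis=1 passes each row to the lambda as a Series
baseball_df['WP'] = baseball_df.apply(
    lambda row: row['W'] / row['G'],
    axis=1,
)

print(baseball_df)
```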
- Since pandas is built on top of NumPy, we can grab any DataFrame column's values as a NumPy array using the .values property:
col_arr = df['col'].values
- The beauty of knowing that pandas is built on NumPy can be seen when taking advantage of a NumPy array's broadcasting abilities
- Remember, this means we can vectorize our calculations, and perform them on entire arrays all at once
- Example of faster, more readable code using NumPy arrays:
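A sketch of the vectorized approach, again with a hypothetical DataFrame and a hypothetical 'WP' column. The division is broadcast across the whole arrays, with no explicit loop and no .apply() call.

```python
import pandas as pd

# Hypothetical sample data
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'W': [81, 90, 70],
    'G': [162, 162, 162],
})

# Pull the underlying NumPy arrays out of the columns
wins = baseball_df['W'].values
games = baseball_df['G'].values

# Element-wise division over the entire arrays at once
baseball_df['WP'] = wins / games

print(baseball_df)
```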