diff --git a/README.md b/README.md
index 92bbfe4..63d6514 100644
--- a/README.md
+++ b/README.md
@@ -27,30 +27,30 @@ Some additional time will be reserved for discussion of real programming challen
 | Day | Topic | Time |
 | :-: | :----------------------------------------------------------------------------- | :-----------: |
-| 1 | Introductions | 9:00 - 9:15 |
-| | Setting the Stage | 9:15 - 9:30 |
-| | Git & version control | 9:30 - 10:15 |
-| | Break | 10:15 - 10:30 |
-| | EDA & Our First scikit-learn Model | 10:30 - 12:00 |
-| | Q&A | 12:00 - 12:30 |
-| 2 | Q&A | 8:45 - 9:00 |
-| | Modular Code | 9:00 - 10:00 |
-| | Feature Engineering | 10:00 - 11:00 |
-| | Break | 11:00 - 11:15 |
-| | Case Study, pt. 1 | 11:15 - 12:00 |
-| | Q&A | 12:00 - 12:30 |
-| 3 | Q&A | 8:45 - 9:00 |
-| | Model Evaluation & Selection | 9:00 - 10:15 |
-| | Break | 10:15 - 10:30 |
-| | More on Modular Code | 10:30 - 11:15 |
-| | Unit Tests | 11:15 - 12:00 |
-| | Q&A | 12:00 - 12:30 |
-| 4 | Q&A | 8:45 - 9:00 |
-| | More on Unit Tests | 9:00 - 9:30 |
-| | ML lifecycle management | 9:30 - 10:30 |
-| | Break | 10:30 - 10:45 |
-| | Case Study, pt. 2 | 10:45 - 11:45 |
-| | Case Study Review, pt. 2 and Q&A | 11:45 - 12:30 |
+| 1 | Introductions | 12:45 - 1:00 |
+| | Setting the Stage | 1:00 - 1:15 |
+| | Git & version control | 1:15 - 2:00 |
+| | Break | 2:00 - 2:15 |
+| | EDA & Our First scikit-learn Model | 2:15 - 3:45 |
+| | Q&A | 3:45 - 4:15 |
+| 2 | Q&A | 12:45 - 1:00 |
+| | Modular Code | 1:00 - 2:00 |
+| | Feature Engineering | 2:00 - 3:00 |
+| | Break | 3:00 - 3:15 |
+| | Case Study, pt. 1 | 3:15 - 4:00 |
+| | Q&A | 4:00 - 4:15 |
+| 3 | Q&A | 12:45 - 1:00 |
+| | Model Evaluation & Selection | 1:00 - 2:15 |
+| | Break | 2:15 - 2:30 |
+| | More on Modular Code | 2:30 - 3:15 |
+| | Unit Tests | 3:15 - 4:00 |
+| | Q&A | 4:00 - 4:15 |
+| 4 | Q&A | 12:45 - 1:00 |
+| | More on Unit Tests | 1:00 - 1:30 |
+| | ML lifecycle management | 1:30 - 2:30 |
+| | Break | 2:30 - 2:45 |
+| | Case Study, pt. 2 | 2:45 - 3:45 |
+| | Case Study Review, pt. 2 and Q&A | 3:45 - 4:15 |
 ### Course Preparation
@@ -88,3 +88,5 @@ If you have any specific questions prior to the class you can reach out to us di
 * Ethan Swan: [GitHub](https://www.github.com/eswan18) & [Email](mailto:ethanpswan@gmail.com)
 * Bradley Boehmke: [GitHub](https://www.github.com/bradleyboehmke) & [Email](mailto:bradleyboehmke@gmail.com)
+ * Gus Powers: [GitHub](https://www.github.com/augustopher) & [Email](mailto:guspowers0@gmail.com)
+ * Jay Cunningham: [GitHub](https://github.com/cunningjames) & [Email](mailto:james@notbadafterall.com)
diff --git a/notebooks/00-Introduction.ipynb b/notebooks/00-Introduction.ipynb
index efddab3..f213f99 100644
--- a/notebooks/00-Introduction.ipynb
+++ b/notebooks/00-Introduction.ipynb
@@ -2,41 +2,111 @@
 "cells": [
 {
 "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
 "source": [
 "# Advanced Python for Data Science\n",
 "\n",
- "Brad Boehmke & Ethan Swan\n",
+ "Gus Powers and Jay Cunningham\n",
 "\n",
- "Jan 11, 13, 18, & 20"
- ],
+ "January 2023"
+ ]
+ },
+ {
+ "cell_type": "markdown",
 "metadata": {
 "slideshow": {
 "slide_type": "slide"
 }
- }
+ },
+ "source": [
+ "## Introductions"
+ ]
 },
 {
 "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
 "source": [
- "## Introductions"
- ],
+ "## Gus Powers\n",
+ "\n",
+ " \n",
+ " \n",
+ "
\n", + "

Lead Data Scientist at 84.51°

\n", + "
    \n", + "
  • Creating and maintaining data science tools for internal use
  • \n", + "
  • Python, Bash (shell), & R
  • \n", + "
\n", + "

Academic

\n", + "
    \n", + "
  • BS, Chemistry, Thomas More College
  • \n", + "
  • MS, Chemistry, University of Cincinnati
  • \n", + "
  • MS, Business Analytics, University of Cincinnati
  • \n", + "
\n", + "

Contact

\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Jay Cunningham\n", + "\n", + " \n", + " \n", + "
\n", + "

Lead Data Scientist at 84.51°

\n", + "
    \n", + "
  • Researching and developing forecasting models
  • \n", + "
  • Machine learning, Python
  • \n", + "
\n", + "

Academic

\n", + "
    \n", + "
  • BA, Mathematics, University of Kentucky
  • \n", + "
  • MA, Economics, University of North Carolina (Greensboro)
  • \n", + "
\n", + "

Contact

\n", + " \n", + "
" + ] }, { "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, "source": [ - "## About Brad\n", + "## Brad Boehmke\n", "\n", " \n", " \n", "
\n", - "

Professional

\n", + "

Director, Data Science at 84.51°

\n", "
    \n", - "
  • Director, Advanced Programming and Technology Solutions Team
  • \n", - "
  • 84.51°
  • \n", + "
  • Productionizing models and science solutions
  • \n", + "
  • R&D and prototyping new solutions
  • \n", + "
  • Python, R, & MLOps toolchain
  • \n", "
\n", "

Academic

\n", "
    \n", @@ -46,52 +116,59 @@ "
\n", "

Contact

\n", " \n", "
" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": { + "cell_style": "split", + "slideshow": { + "slide_type": "skip" + } + }, "source": [ - "## About Ethan\n", + "## Ethan Swan\n", "\n", " \n", " \n", "
\n", - "

Professional

\n", - "
    \n", - "
  • Lead Data Scientist, Advanced Programming and Technology Solutions Team
  • \n", - "
  • 84.51°
  • \n", - "
\n", - "

Academic

\n", - "
    \n", - "
  • BS, Computer Science, University of Notre Dame
  • \n", - "
  • MBA, Business Analytics, University of Notre Dame
  • \n", - "
\n", - "

Contact

\n", - " \n", + "

Senior Backend Engineer at ReviewTrackers

\n", + "
    \n", + "
  • REST API development
  • \n", + "
  • Putting ML models in production
  • \n", + "
  • Python, Go, Ruby, & ReactJS (JavaScript)
  • \n", + "
\n", + "

Academic

\n", + "
    \n", + "
  • BS, Computer Science, University of Notre Dame
  • \n", + "
  • MBA, Business Analytics, University of Notre Dame
  • \n", + "
\n", + "

Contact

\n", + " \n", "
" - ], + ] + }, + { + "cell_type": "markdown", "metadata": { - "cell_style": "split", "slideshow": { "slide_type": "slide" } - } - }, - { - "cell_type": "markdown", + }, "source": [ "## Your Turn\n", "\n", @@ -99,365 +176,368 @@ "- Your name\n", "- Your job or field\n", "- How you use Python now or would like to in the future" - ], + ] + }, + { + "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Course" + ] }, { "cell_type": "markdown", - "source": [ - "## Course" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } - }, - { - "cell_type": "markdown", + }, "source": [ "## Course Objectives\n", "\n", "The following are the primary learning objectives of this course:" - ], - "metadata": { - "slideshow": { - "slide_type": "slide" - } - } + ] }, { "cell_type": "markdown", - "source": [ - "- Develop an intuition for the machine learning workflow and Python tooling." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "- Develop an intuition for the machine learning workflow and Python tooling." + ] }, { "cell_type": "markdown", - "source": [ - "- Build familiarity with common software engineering tooling and methodologies for implementing a machine learning project." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "- Build familiarity with common software engineering tooling and methodologies for implementing a machine learning project." + ] }, { "cell_type": "markdown", - "source": [ - "- Gain hands-on experience with the tools and processes discussed with applied case study work." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "- Gain hands-on experience with the tools and processes discussed with applied case study work." 
+ ] }, { "cell_type": "markdown", - "source": [ - "## Course Agenda" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Course Agenda" + ] }, { "cell_type": "markdown", - "source": [ - "| Day | Topic | Time |\n", - "| :-: | :----------------------------------------------------------------------------- | :-----------: |\n", - "| 1 | Introductions | 9:00 - 9:15 |\n", - "| | Setting the Stage | 9:15 - 9:30 |\n", - "| | Git & version control | 9:30 - 10:15 |\n", - "| | Break | 10:15 - 10:30 |\n", - "| | EDA & Our First scikit-learn Model | 10:30 - 12:00 |\n", - "| | Q&A | 12:00 - 12:30 |\n", - "| 2 | Q&A | 8:45 - 9:00 |\n", - "| | Modular Code | 9:00 - 10:00 |\n", - "| | Feature Engineering | 10:00 - 11:00 |\n", - "| | Break | 11:00 - 11:15 |\n", - "| | Case Study, pt. 1 | 11:15 - 12:00 |\n", - "| | Q&A | 12:00 - 12:30 |\n", - "| 3 | Q&A | 8:45 - 9:00 |\n", - "| | Model Evaluation & Selection | 9:00 - 10:15 |\n", - "| | Break | 10:15 - 10:30 |\n", - "| | More on Modular Code | 10:30 - 11:15 |\n", - "| | Unit Tests | 11:15 - 12:00 |\n", - "| | Q&A | 12:00 - 12:30 |\n", - "| 4 | Q&A | 8:45 - 9:00 |\n", - "| | More on Unit Tests | 9:00 - 9:30 |\n", - "| | ML lifecycle management | 9:30 - 10:30 |\n", - "| | Break | 10:30 - 10:45 |\n", - "| | Case Study, pt. 2 | 10:45 - 11:45 |\n", - "| | Case Study Review, pt. 
2 and Q&A | 11:45 - 12:30 |" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "| Day | Topic | Time |\n", + "| :-: | :----------------------------------------------------------------------------- | :-----------: |\n", + "| 1 | Introductions | 12:45 - 1:00 |\n", + "| | Setting the Stage | 1:00 - 1:15 |\n", + "| | Git & version control | 1:15 - 2:00 |\n", + "| | Break | 2:00 - 2:15 |\n", + "| | EDA & Our First scikit-learn Model | 2:15 - 3:45 |\n", + "| | Q&A | 3:45 - 4:15 |\n", + "| 2 | Q&A | 12:45 - 1:00 |\n", + "| | Modular Code | 1:00 - 2:00 |\n", + "| | Feature Engineering | 2:00 - 3:00 |\n", + "| | Break | 3:00 - 3:15 |\n", + "| | Case Study, pt. 1 | 3:15 - 4:00 |\n", + "| | Q&A | 4:00 - 4:15 |\n", + "| 3 | Q&A | 12:45 - 1:00 |\n", + "| | Model Evaluation & Selection | 1:00 - 2:15 |\n", + "| | Break | 2:15 - 2:30 |\n", + "| | More on Modular Code | 2:30 - 3:15 |\n", + "| | Unit Tests | 3:15 - 4:00 |\n", + "| | Q&A | 4:00 - 4:15 |\n", + "| 4 | Q&A | 12:45 - 1:00 |\n", + "| | More on Unit Tests | 1:00 - 1:30 |\n", + "| | ML lifecycle management | 1:30 - 2:30 |\n", + "| | Break | 2:30 - 2:45 |\n", + "| | Case Study, pt. 2 | 2:45 - 3:45 |\n", + "| | Case Study Review, pt. 
2 and Q&A | 3:45 - 4:15 |" + ] }, { "cell_type": "markdown", - "source": [ - "## Course Philosophy" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Course Philosophy" + ] }, { "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, "source": [ "Beginners typically need the instructor to make connections and solve problems for them.\n", "\n", "*Why is this code not running?\n", "What types of real world problems could I use this package for?*" - ], - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - } + ] }, { "cell_type": "markdown", - "source": [], - "metadata": {} + "metadata": {}, + "source": [] }, { "cell_type": "markdown", - "source": [ - "But as intermediate to advanced users, we believe you'll be more capable of seeing those connections yourselves.\n", - "Instead of diving into details and working through small code examples, this advanced workshop takes a slightly different approach..." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "But as intermediate to advanced users, we believe you'll be more capable of seeing those connections yourselves.\n", + "Instead of diving into details and working through small code examples, this advanced workshop takes a slightly different approach..." + ] }, { "cell_type": "markdown", - "source": [ - "- **Give you an overview of the tools you might need to solve a problem**. We can't teach you machine learning in just two days, but we *can* give you a foundation. And as experienced coders, you'll be able to fill in the details yourselves when the time comes to use these tools." - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "- **Give you an overview of the tools you might need to solve a problem**. We can't teach you machine learning in just two days, but we *can* give you a foundation. 
And as experienced coders, you'll be able to fill in the details yourselves when the time comes to use these tools." + ] }, { "cell_type": "markdown", - "source": [ - "- **Explain more of the intuition behind tools and techniques**. Beginners can't yet see the forest for the trees -- they are caught up in small problems and not yet ready to understand the big picture. But in this class we will talk more about general design patterns of Python and its libraries, in a way that should help you *learn them* instead of simply memorize functions." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "- **Explain more of the intuition behind tools and techniques**. Beginners can't yet see the forest for the trees -- they are caught up in small problems and not yet ready to understand the big picture. But in this class we will talk more about general design patterns of Python and its libraries, in a way that should help you *learn them* instead of simply memorize functions." + ] }, { "cell_type": "markdown", - "source": [ - "- **Expect you to help yourself**. We'll still be here to answer questions and help with hard problems, but the mark of an experienced programmer is that he/she consults references often (Google, documentation, etc) and can find answers there. You'll need to do that during this course and afterward when you apply the techniques we discuss." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "- **Expect you to help yourself**. We'll still be here to answer questions and help with hard problems, but the mark of an experienced programmer is that he/she consults references often (Google, documentation, etc) and can find answers there. You'll need to do that during this course and afterward when you apply the techniques we discuss." 
+ ] }, { "cell_type": "markdown", - "source": [ - "## Prerequisites" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Prerequisites" + ] }, { "cell_type": "markdown", - "source": [ - "### Knowledge" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "### Python\n", + "\n", + "- If you're attending this class, it's assumed you're comfortable with the material covered in the [Introduction to Python for Data Science](https://github.com/uc-python/intro-python-datasci) and [Intermediate Python for Data Science](https://github.com/uc-python/intermediate-python-datasci) classes.\n", + "- At a very high level, those courses covered:\n", + " - Importing data into and exporting data out of Python, via Pandas\n", + " - Wrangling data in Python with Pandas\n", + " - Basics of visualization with Seaborn\n", + " - Control flow\n", + " - Writing functions\n", + " - Conda environments\n", + " - Running Python outside of Jupyter notebooks\n", + " - Basics of modeling with scikit-learn" + ] }, { "cell_type": "markdown", - "source": [ - "#### Python\r\n", - "\r\n", - "- If you're attending this class, it's assumed you're comfortable with the material covered in the [Introduction to Python for Data Science](https://github.com/uc-python/intro-python-datasci) and [Intermediate Python for Data Science](https://github.com/uc-python/intermediate-python-datasci) classes.\r\n", - "- At a very high level, those courses covered:\r\n", - " - Importing data into and exporting data out of Python, via Pandas\r\n", - " - Wrangling data in Python with Pandas\r\n", - " - Basics of visualization with Seaborn\r\n", - " - Control flow\r\n", - " - Writing functions\r\n", - " - Conda environments\r\n", - " - Running Python outside of Jupyter notebooks\r\n", - " - Basics of modeling with scikit-learn" - ], "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "slide" } - } - }, - { - "cell_type": "markdown", + }, 
"source": [ - "#### Technology\n", + "### Jupyter\n", "\n", "* If you're attending this class, it's assumed you're comfortable with launching and using Python via Jupyter Notebooks -- and ideally outside of Jupyter as well.\n", "* Course materials (slides, case studies, etc.) will be in Jupyter Notebooks, but you're free to use your IDE of choice when completing exercises and case studies." - ], + ] + }, + { + "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Technology Setup" + ] }, { "cell_type": "markdown", - "source": [ - "### Technology Installation" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } - }, - { - "cell_type": "markdown", + }, "source": [ - "- Unlike my other courses, Advanced Python is not designed with Binder in mind.\n", + "- Unlike our other courses, Advanced Python is not designed with Binder in mind.\n", "- This means that you'll need to use your personal laptop to run today's code.\n", "- Why? We're going to be working with bigger data and more computationally-intensive algorithms, for which Binder is not well-equipped.\n", " - In an industry setting, using these techniques would best be done on a *server*, not a personal computer." - ], - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - } + ] }, { "cell_type": "markdown", - "source": [ - "#### Anaconda\r\n", - "\r\n", - "* Anaconda is the easiest way to install Python 3 and Jupyter.\r\n", - "* If you have not yet installed Anaconda, please follow the [directions in the course README](https://github.com/uc-python/intermediate-python-datasci).\r\n", - "* Be sure that all Python packages listed in the [environment.yaml](https://github.com/uc-python/advanced-python-datasci/blob/master/environment.yaml) are installed. 
See [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) for instructions on creating a Conda environment from an environment.yaml file.\r\n", - "* This Anaconda installation will not be able to natively display the course content as slides, but I recommend using it for completing exercises and the case studies." - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "### Anaconda\n", + "\n", + "* Anaconda is the easiest way to install Python 3 and Jupyter.\n", + "* If you have not yet installed Anaconda, please follow the [directions in the course README](https://github.com/uc-python/intermediate-python-datasci).\n", + "* Be sure that all Python packages listed in the [environment.yaml](https://github.com/uc-python/advanced-python-datasci/blob/master/environment.yaml) are installed. See [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) for instructions on creating a Conda environment from an environment.yaml file.\n", + "* This Anaconda installation will not be able to natively display the course content as slides, but I recommend using it for completing exercises and the case studies." + ] }, { "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ - "#### JupyterLab\n", + "### JupyterLab\n", "- If you took the introductory and/or intermediate courses, you may have used Jupyter Notebooks to write Python.\n", "- Jupyter Notebooks are slowly being deprecated in favor of a new, more featureful product called JupyterLab.\n", "- JupyterLab is extremely similar but supports more features, and Notebooks is no longer being updated.\n", "- I recommend using JupyterLab today even if you haven't used it before -- it comes packaged with Anaconda and should feel very familiar!" 
- ], - "metadata": { - "slideshow": { - "slide_type": "slide" - } - } + ] }, { "cell_type": "markdown", - "source": [ - "## Course Materials" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Course Materials" + ] }, { "cell_type": "markdown", - "source": [ - "* All of the material for this course can be reached from the [GitHub repository](https://github.com/uc-python/advanced-python-datasci).\n", - "* This repository has access to the slides and notebooks.\n", - "* You should download the material -- available via [this link](https://github.com/uc-python/advanced-python-datasci/archive/master.zip) -- and open it via Anaconda Navigator and Jupyter Notebooks/Lab." - ], "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "* All of the material for this course can be reached from the [GitHub repository](https://github.com/uc-python/advanced-python-datasci).\n", + "* This repository has access to the slides and notebooks.\n", + "* You should download the material -- available via [this link](https://github.com/uc-python/advanced-python-datasci/archive/master.zip) -- and open it via Anaconda Navigator and Jupyter Notebooks/Lab." + ] }, { "cell_type": "markdown", - "source": [ - "### Slides *and* Notebooks" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "### Slides *are* Notebooks" + ] }, { "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, "source": [ "- We'll be showing the material in slide format most of the time.\n", "- These slides contain the same content as your notebooks, so you can follow along and run cells as we go." 
- ], + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Source Code" + ] + }, + { + "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } - } + }, + "source": [ + "* Source code for the training can be found on [GitHub](https://github.com/uc-python/advanced-python-datasci)\n", + "* This repository is public so you can clone (download) and/or refer to the materials at any point in the future" + ] }, { "cell_type": "markdown", - "source": [ - "## Questions\n", - "\n", - "Are there any questions before moving on?" - ], "metadata": { "slideshow": { "slide_type": "slide" } - } + }, + "source": [ + "## Questions\n", + "\n", + "Are there any questions before moving on?" + ] } ], "metadata": { @@ -486,4 +566,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/notebooks/images/ethan.jpg b/notebooks/images/ethan.jpg index 209e51c..8d1ac83 100644 Binary files a/notebooks/images/ethan.jpg and b/notebooks/images/ethan.jpg differ diff --git a/notebooks/images/gus.jpg b/notebooks/images/gus.jpg new file mode 100644 index 0000000..13ea8e3 Binary files /dev/null and b/notebooks/images/gus.jpg differ diff --git a/notebooks/images/jay.jpg b/notebooks/images/jay.jpg new file mode 100644 index 0000000..651d053 Binary files /dev/null and b/notebooks/images/jay.jpg differ