Commit
Merge pull request #59 from uc-python/update_environment
updated environment.yaml
augustopher authored Jan 6, 2024
2 parents 85815a5 + 5313e78 commit c16127b
Showing 3 changed files with 109 additions and 38 deletions.
26 changes: 13 additions & 13 deletions environment.yaml
@@ -3,17 +3,17 @@ channels:
- defaults
- conda-forge
dependencies:
- category_encoders>=2.2
- ipykernel>=6.4
- matplotlib>=3.5
- python=3.11
- category_encoders>=2.6
- ipykernel>=6.28
- matplotlib>=3.8
- missingno>=0.4
- mlflow=1.22
- nbconvert>=6.1
- numpy>=1.21
- pandas>=1.3
- pip>=21.2
- plotnine>=0.8
- pytest>=6.2
- python=3.9
- scikit-learn>=1.0
- seaborn>=0.11
- mlflow=2.9
- nbconvert>=7.14
- numpy>=1.26
- pandas>=2.1
- pip>=23.3
- plotnine>=0.12
- pytest>=7.4
- scikit-learn>=1.3
- seaborn>=0.13
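The updated `environment.yaml` raises most minimum versions (for example `numpy>=1.21` to `numpy>=1.26`) and moves `python=3.9` to `python=3.11`. As a rough sketch of what those two pin styles mean, the hypothetical helpers below (`parse_pin`, `satisfies`) check an installed version string against a pin. They assume only the `>=` and single `=` forms used above; conda treats a single `=` as a prefix match, so `python=3.11` accepts any 3.11.x release. In practice conda's solver enforces these pins, and this sketch ignores pre-release and local version suffixes.

```python
import re


def parse_pin(spec):
    """Split a pin like 'numpy>=1.26' or 'python=3.11' into (name, operator, version)."""
    # Hypothetical helper: only handles the '>=' and '=' pins used in environment.yaml.
    match = re.fullmatch(r"\s*([A-Za-z0-9_.-]+)\s*(>=|=)\s*([0-9][0-9.]*)\s*", spec)
    if match is None:
        raise ValueError(f"unsupported pin: {spec!r}")
    return match.group(1), match.group(2), match.group(3)


def version_tuple(version):
    """Turn '1.26.2' into (1, 26, 2); non-numeric suffixes are not supported."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, spec):
    """Return True if an installed version string satisfies the pin."""
    _, op, required = parse_pin(spec)
    have, want = version_tuple(installed), version_tuple(required)
    if op == ">=":
        # Pad the shorter tuple with zeros so '2.1' compares equal to '2.1.0'.
        width = max(len(have), len(want))
        have += (0,) * (width - len(have))
        want += (0,) * (width - len(want))
        return have >= want
    # conda's single '=' is a prefix match: python=3.11 accepts 3.11.7, not 3.1 or 3.12.
    return have[: len(want)] == want


print(satisfies("1.26.2", "numpy>=1.26"))  # True
print(satisfies("1.21.6", "numpy>=1.26"))  # False
print(satisfies("3.11.7", "python=3.11"))  # True
```

Under these rules, an environment built from the old file (for example Python 3.9.2, as recorded in the notebook metadata) no longer satisfies `python=3.11`.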
6 changes: 3 additions & 3 deletions notebooks/09-ml_lifecycle_mgt.ipynb
@@ -231,7 +231,7 @@
"source": [
"import mlflow\n",
"\n",
"mlflow.set_experiment(\"Predicting income\")"
"experiment = mlflow.set_experiment(\"Predicting income\")"
]
},
{
@@ -1210,7 +1210,7 @@
}
],
"source": [
"df = mlflow.search_runs(experiment_ids='1')\n",
"df = mlflow.search_runs(experiment_ids=experiment.experiment_id)\n",
"df"
]
},
@@ -1294,7 +1294,7 @@
}
],
"source": [
"model_path = f'mlruns/1/{run_id}/artifacts/best_estimator'\n",
"model_path = f'mlruns/{experiment.experiment_id}/{run_id}/artifacts/best_estimator'\n",
"model = mlflow.sklearn.load_model(model_path)\n",
"model"
]
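The notebook change replaces the hard-coded experiment ID `'1'` with `experiment.experiment_id`, taken from the `Experiment` object that `mlflow.set_experiment` returns. Hard-coding `'1'` assumes the experiment was the first one created in the store after `Default` (ID `'0'`), and newer MLflow releases may not hand out small sequential IDs at all. A minimal sketch of building the artifact location from the returned ID is below; `model_artifact_path` and `model_runs_uri` are hypothetical helper names, and the path layout assumes MLflow's default local `./mlruns` file store (other tracking backends store artifacts elsewhere).

```python
from pathlib import PurePosixPath


def model_artifact_path(experiment_id, run_id, artifact="best_estimator", root="mlruns"):
    """Build the local file-store path to a logged model artifact.

    Hypothetical helper; assumes the default local layout
    <root>/<experiment_id>/<run_id>/artifacts/<artifact>.
    """
    # str() in case an integer ID is passed; MLflow itself uses string IDs.
    return str(PurePosixPath(root, str(experiment_id), run_id, "artifacts", artifact))


def model_runs_uri(run_id, artifact="best_estimator"):
    """Build a 'runs:/' URI, which MLflow resolves without knowing the storage layout."""
    return f"runs:/{run_id}/{artifact}"


# In the notebook these would be used roughly as:
#   experiment = mlflow.set_experiment("Predicting income")
#   model = mlflow.sklearn.load_model(model_artifact_path(experiment.experiment_id, run_id))
# or, independent of where artifacts are stored:
#   model = mlflow.sklearn.load_model(model_runs_uri(run_id))
```

The `runs:/` form is a design alternative the commit does not use; it avoids tying the notebook to the local directory layout altogether.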
115 changes: 93 additions & 22 deletions notebooks/Case Study.ipynb
@@ -373,57 +373,128 @@
},
{
"cell_type": "markdown",
"id": "aa69e649",
"id": "ed954c73-b660-4edc-93f4-b6869c0dc9d3",
"metadata": {},
"source": [
"### Unit Tests\n",
"### Modular code & unit tests\n",
"\n",
"1. TBD\n",
"1. TBD\n",
"1. TBD"
"1. Move the `loguniform_int` class we defined above into a new module, `loguniform_int.py`. We haven't put classes into modules before, but it's no different than a function; just paste it along with any imports it needs."
]
},
{
"cell_type": "markdown",
"id": "98334504",
"id": "4bd495e6-eb4b-4ddb-83b5-dde8e586ef36",
"metadata": {},
"source": [
"### ML lifecycle management"
"Your new module should contain something like:\n",
"\n",
"```python\n",
"from scipy.stats import loguniform\n",
"\n",
"class loguniform_int:\n",
" \"\"\"Integer valued version of the log-uniform distribution\"\"\"\n",
" def __init__(self, a, b):\n",
" self._distribution = loguniform(a, b)\n",
"\n",
" def rvs(self, *args, **kwargs):\n",
" \"\"\"Random variable sample\"\"\"\n",
" return self._distribution.rvs(*args, **kwargs).astype(int)\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "42c8a06f",
"id": "7375923a-f673-4c83-abd3-cf4f099a5b9c",
"metadata": {},
"source": [
"1. Create and set an MLflow experiment titled \"UC Advanced Python Case Study\"\n",
"2. Re-perform the random hyperparameter search executed above while logging the hyperparameter search experiment with MLflow's autologging. Title this run \"rf_hyperparameter_tuning\"."
"2. Import your module and make sure you can use it in code by (re)running the below:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "31f07d64-f468-4b4a-a60e-e338f2f00cb2",
"metadata": {
"tags": [
"ci-skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 5 folds for each of 10 candidates, totalling 50 fits\n"
]
}
],
"source": [
"from loguniform_int import loguniform_int\n",
"\n",
"param_distributions = {\n",
" 'rf__n_estimators': loguniform_int(50, 1000),\n",
" 'rf__max_features': loguniform(.1, .8),\n",
" 'rf__max_depth': loguniform_int(2, 30),\n",
" 'rf__min_samples_leaf': loguniform_int(1, 100),\n",
" 'rf__max_samples': loguniform(.5, 1),\n",
"}\n",
"\n",
"random_search = RandomizedSearchCV(\n",
" pipeline, \n",
" param_distributions=param_distributions, \n",
" n_iter=10, # lower this to 10 so it's faster\n",
" cv=5, \n",
" scoring='neg_root_mean_squared_error',\n",
" verbose=1,\n",
" n_jobs=-1,\n",
")\n",
"\n",
"results2 = random_search.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"id": "60677940",
"id": "ca9dc10a-42dd-4cc4-b957-0451046cc5f9",
"metadata": {},
"source": [
"### Reproducibility with dependency tracking\n",
"3. Create a `tests.py` file in which you add the tests we already created for `get_features_and_target` (you can just copy them), along with a new test that asserts that `loguniform_int` objects have a `._distribution.args` attribute that holds the original numbers passed into them -- confirming that we did indeed create the kind of distribution we expected. Run the tests when finished.\n",
"\n",
"1. TBD\n",
"1. TBD\n",
"1. TBD"
"```python\n",
">>> lu = loguniform_int(2, 30)\n",
">>> lu._distribution.args\n",
"(2, 30)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8271687",
"cell_type": "markdown",
"id": "c7d3dd7d-11c9-471f-8391-c5a23219acd6",
"metadata": {},
"outputs": [],
"source": []
"source": [
"4. Parametrize this test. Create one `loguniform_int` with `(2, 30)` as the arguments and another with `(1, 100)` as the arguments. Confirm that in both cases, the resulting `._distribution.args` attribute holds a tuple with the same numbers that were supplied initially. Rerun your tests."
]
},
{
"cell_type": "markdown",
"id": "98334504",
"metadata": {},
"source": [
"### ML lifecycle management"
]
},
{
"cell_type": "markdown",
"id": "42c8a06f",
"metadata": {},
"source": [
"1. Create and set an MLflow experiment titled \"UC Advanced Python Case Study\"\n",
"2. Re-perform the random hyperparameter search executed above while logging the hyperparameter search experiment with MLflow's autologging. Title this run \"rf_hyperparameter_tuning\"."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -437,7 +508,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.9.2"
}
},
"nbformat": 4,
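The new Case Study exercises move `loguniform_int` into its own module and then test it. The class only needs an `rvs` method because scikit-learn's `RandomizedSearchCV` samples any parameter value that has an `rvs` attribute by calling it with a `random_state` keyword. A sketch of what `tests.py` might contain for steps 3 and 4 is below, assuming pytest (already listed in `environment.yaml`) and SciPy are installed. The test names are illustrative, the bounds test is a hypothetical extra not asked for by the exercise, and the class is inlined so the snippet runs on its own.

```python
import numpy as np
import pytest
from scipy.stats import loguniform


# Inlined so the sketch runs on its own; in tests.py you would write
# `from loguniform_int import loguniform_int` instead.
class loguniform_int:
    """Integer valued version of the log-uniform distribution"""
    def __init__(self, a, b):
        self._distribution = loguniform(a, b)

    def rvs(self, *args, **kwargs):
        """Random variable sample"""
        return self._distribution.rvs(*args, **kwargs).astype(int)


@pytest.mark.parametrize("a, b", [(2, 30), (1, 100)])
def test_loguniform_int_keeps_args(a, b):
    # The wrapped SciPy frozen distribution records the arguments it was built with.
    lu = loguniform_int(a, b)
    assert lu._distribution.args == (a, b)


@pytest.mark.parametrize("a, b", [(2, 30), (1, 100)])
def test_loguniform_int_samples_are_bounded_ints(a, b):
    # Hypothetical extra test: draws are integers truncated from values in [a, b].
    samples = loguniform_int(a, b).rvs(size=1_000, random_state=0)
    assert np.issubdtype(samples.dtype, np.integer)
    assert samples.min() >= a and samples.max() <= b
```

`pytest.mark.parametrize` runs each test once per argument tuple, so running `pytest tests.py` reports `(2, 30)` and `(1, 100)` as separate cases alongside the copied `get_features_and_target` tests.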
