dhsc_data_tools package

Stable branch: main

The goal of DHSCdatatools is to provide a suite of tools for using data hosted on the DHSC analytical cloud (DAC) platform. For detailed developer documentation click here.

Pre-requisites

  1. Local installation of Simba Spark ODBC Driver 32-bit and Simba Spark ODBC Driver 64-bit.

  2. A new conda environment is recommended for the package. In Git Bash:

conda create -n <your_environment_name> python==3.12 pip

Some of this package's dependencies are not yet compatible with Python 3.13. Use any Python version from 3.8 up to, but not including, 3.13. The example above specifies python==3.12.
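To confirm that the activated environment falls inside the supported range, a minimal sketch along these lines can be run in that environment. The helper name `is_supported` is illustrative and not part of dhsc_data_tools.

```python
import sys

# Supported range stated in this README: 3.8 inclusive, up to but not including 3.13.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE


# Report on the interpreter that is running this script.
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {is_supported()}")
```

If this reports False, recreate the conda environment with a Python version inside the range before installing the package.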

Working with .env files

  1. Though not strictly a package dependency, we recommend you install python-dotenv to work with .env files.

In Git Bash, with the relevant environment activated:

pip install python-dotenv

To install the dhsc_data_tools package

In Git Bash, with the relevant environment activated:

pip install git+https://github.com/DataS-DHSC/dhsc-data-tools.git

.env and config files

A .env file containing tenant name and key vault name is required for dhsc_data_tools.dac_odbc.connect() and dhsc_data_tools.keyvault.KVConnection(). Please find the .env file in the Data Science Teams space DAC channel.

Place this file in your working directory.
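Before connecting, it can help to check that the .env file is where the package expects it and defines the expected variables. The sketch below uses only the standard library. The variable names `TENANT_NAME` and `KEY_VAULT_NAME` are hypothetical placeholders, so substitute the names used in the .env file from the DAC channel.

```python
from pathlib import Path

# Hypothetical variable names, for illustration only. Use the names found in
# the .env file supplied via the DAC channel.
EXAMPLE_KEYS = ("TENANT_NAME", "KEY_VAULT_NAME")


def read_env_keys(text):
    """Return the set of variable names defined in .env-style text."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional "export " prefix that some .env files use.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_env_keys(path=".env", required=EXAMPLE_KEYS):
    """Return the required names absent from the file (all of them if no file)."""
    env_path = Path(path)
    if not env_path.is_file():
        return list(required)
    found = read_env_keys(env_path.read_text(encoding="utf-8"))
    return [key for key in required if key not in found]
```

Calling `missing_env_keys()` from the working directory returns an empty list when every required name is present. This parser is deliberately simple and is not a replacement for python-dotenv.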

IMPORTANT

In each project, ensure your .gitignore file excludes config, .env, and relevant YAML files. If you accidentally commit any of these files (or any other sensitive data), please contact the Data Science Hub to discuss how best to mitigate the breach.
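One rough way to spot a missing exclusion is to compare the contents of .gitignore against a list of expected patterns. The sketch below is a literal, line-by-line comparison and does not evaluate Git's pattern matching. The default patterns are illustrative assumptions, so adjust them to the files your project actually uses.

```python
def missing_ignore_patterns(gitignore_text, required=(".env", "*.yaml", "*.yml")):
    """Return the required patterns that are not listed verbatim in the text."""
    listed = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        # Ignore blank lines and comments.
        if line and not line.startswith("#"):
            listed.add(line)
    return [pattern for pattern in required if pattern not in listed]


# Example with an in-memory .gitignore that lacks a "*.yml" entry.
example = ".env\n# local settings\n*.yaml\n"
print(missing_ignore_patterns(example))
```

For an authoritative answer on a specific file, `git check-ignore -v <path>` reports which rule, if any, causes Git to ignore it.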

Example use scripts

Connecting to the DAC data using an SQL endpoint

from dhsc_data_tools import dac_odbc
from dotenv import load_dotenv

# Load the tenant name and key vault name from the .env file in the working directory.
load_dotenv(".env")

# Create the connection.
conn = dac_odbc.connect()

# Run a SQL query by using the preceding connection.
cursor = conn.cursor()
cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT 10")

# Print the rows retrieved from the query.
for row in cursor.fetchall():
    print(row)

# For help on any function or module in the package, pass it to help(), for example:
help(dac_odbc.connect)
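Rows printed as plain tuples can be hard to read when a query returns many columns. The sketch below pairs each row with its column names. It assumes the cursor follows the Python DB-API 2.0 convention of exposing `cursor.description`, which this README does not confirm for the object returned by `dac_odbc.connect()`. An in-memory SQLite database stands in for the DAC so the example runs anywhere, and `rows_as_dicts` is an illustrative helper, not part of dhsc_data_tools.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in demonstration using an in-memory SQLite database (not the DAC).
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE trips (trip_id INTEGER, fare REAL)")
demo_cursor.executemany("INSERT INTO trips VALUES (?, ?)", [(1, 7.5), (2, 12.0)])
demo_cursor.execute("SELECT trip_id, fare FROM trips ORDER BY trip_id")
records = rows_as_dicts(demo_cursor)
demo_conn.close()

print(records)
```

If the assumption holds, the same helper could be called on the cursor from the example above immediately after `cursor.execute(...)`, in place of the `fetchall()` loop.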

Code of Conduct

Please note that the DHSCdatatools project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Licence

Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.

All other content is © Crown copyright and available under the terms of the Open Government Licence v3.0, except where otherwise stated.