This repository contains tools to work with the DAC from a local Python installation.

DataS-DHSC/dhsc-data-tools

dhsc_data_tools package

Stable branch: main

The goal of DHSCdatatools is to provide a suite of tools for using data hosted on the DHSC analytical cloud (DAC) platform. For detailed developer documentation click here.

Pre-requisites

  1. Local installation of Simba Spark ODBC Driver 32-bit and Simba Spark ODBC Driver 64-bit.

  2. A new conda environment is recommended for the package. In Git Bash:

conda create -n <your_environment_name> python==3.12 pip

Some of the dependencies of this package are not currently compatible with Python 3.13. Use any Python version from 3.8 up to, but not including, 3.13. The command above specifies python==3.12 as an example.
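As a quick sanity check, the version range described above can be tested from inside the new environment. This is a minimal sketch: the function name `is_supported_python` is made up for illustration and is not part of dhsc_data_tools.

```python
import sys

# Bounds taken from the note above: 3.8 inclusive, 3.13 exclusive.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Running the script with the environment activated reports whether the interpreter in use falls inside the range.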

Working with .env files

  1. Though not strictly a package dependency, we recommend you install python-dotenv to work with .env files.

In Git Bash, with the relevant environment activated:

pip install python-dotenv

To install the dhsc_data_tools package

In Git Bash, with the relevant environment activated:

pip install git+https://github.com/DataS-DHSC/dhsc-data-tools.git

.env and config files

A .env file containing tenant name and key vault name is required for dhsc_data_tools.dac_odbc.connect() and dhsc_data_tools.keyvault.KVConnection(). Please find the .env file in the Data Science Teams space DAC channel.

Place this file in your working directory.
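Before connecting, it can help to confirm that the settings from the .env file were actually loaded. The sketch below assumes the file defines two variables; the names `TENANT_NAME` and `KEY_VAULT_NAME` are hypothetical placeholders, so substitute whatever names appear in the .env file you were given. The helper `missing_env_vars` is likewise illustrative and not part of the package.

```python
import os

# Hypothetical variable names, used for illustration only; use the names
# that actually appear in the .env file you were given.
REQUIRED_VARS = ("TENANT_NAME", "KEY_VAULT_NAME")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    # Call load_dotenv(".env") from python-dotenv before this check.
    missing = missing_env_vars()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All expected settings are present.")
```

Passing the environment as an argument keeps the check easy to test without touching the real process environment.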

IMPORTANT

Ensure that the .gitignore file in each project excludes config, .env, and relevant YAML files. If you accidentally commit any of these files (or any other sensitive data), please get in touch with the Data Science Hub to discuss how best to mitigate the breach.
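One way to catch a missing exclusion early is a small script that looks for the expected entries in .gitignore. This is a rough sketch under stated assumptions: the pattern list is illustrative rather than prescribed by this project, and the check is a literal line comparison that does not evaluate gitignore glob rules.

```python
from pathlib import Path

# Illustrative patterns only; adjust them to the files your project actually uses.
SENSITIVE_PATTERNS = (".env", "config/", "*.yaml", "*.yml")


def missing_gitignore_patterns(gitignore_text, required=SENSITIVE_PATTERNS):
    """Return required patterns that do not appear as a line in the .gitignore text.

    This is a literal line comparison; it does not evaluate gitignore glob rules.
    """
    entries = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if line and not line.startswith("#"):
            entries.add(line)
    return [pattern for pattern in required if pattern not in entries]


if __name__ == "__main__":
    path = Path(".gitignore")
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    print("Patterns not found in .gitignore:", missing_gitignore_patterns(text))
```

An empty list means every listed pattern is present verbatim. It does not prove that a given file is ignored, so `git check-ignore <path>` remains the authoritative test.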

Example use scripts

Connecting to the DAC data using an SQL endpoint

from dhsc_data_tools import dac_odbc
from dotenv import load_dotenv

# Load the tenant and key vault settings from the .env file in the working directory.
load_dotenv(".env")

# Open a connection to the DAC SQL endpoint.
conn = dac_odbc.connect()

# Run a SQL query using the connection.
cursor = conn.cursor()
cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT 10")

# Print the rows retrieved from the query.
for row in cursor.fetchall():
    print(row)

# For help, you can run
help(dac_odbc.connect)  # or pass any other function or module
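Rows printed as plain tuples can be hard to read, so a small helper that pairs each value with its column name may be useful. The sketch below assumes the cursor follows the Python DB-API 2.0 conventions (`description`, `fetchall`), which the `cursor()`, `execute()`, and `fetchall()` calls in the example suggest but which this README does not confirm. The helper `rows_to_dicts` is illustrative, and an in-memory SQLite table with made-up values stands in for the DAC so the code runs without a connection.

```python
import sqlite3


def rows_to_dicts(cursor):
    """Convert all remaining rows on a DB-API cursor into a list of dicts."""
    # cursor.description holds one 7-item sequence per column; item 0 is the name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo():
    """Run the helper against an in-memory SQLite table (stand-in data)."""
    conn = sqlite3.connect(":memory:")
    try:
        cursor = conn.cursor()
        cursor.execute("CREATE TABLE trips (id INTEGER, distance REAL)")
        cursor.execute("INSERT INTO trips VALUES (1, 2.5), (2, 0.8)")
        cursor.execute("SELECT id, distance FROM trips ORDER BY id")
        return rows_to_dicts(cursor)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

If the assumption holds, the same helper could be called on the cursor from the example above, in place of the `for` loop, after `cursor.execute(...)`.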

Code of Conduct

Please note that the DHSCdatatools project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Licence

Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.

All other content is © Crown copyright and available under the terms of the Open Government Licence v3.0, except where otherwise stated.
