diff --git a/preview/pr-36/2022/10/17/Use-Linting-Tools-to-Save-Time.html b/preview/pr-36/2022/10/17/Use-Linting-Tools-to-Save-Time.html
new file mode 100644
index 0000000000..bc9f661420
--- /dev/null
+++ b/preview/pr-36/2022/10/17/Use-Linting-Tools-to-Save-Time.html
@@ -0,0 +1,691 @@
+Tip of the Week: Use Linting Tools to Save Time | Software Engineering Team

Tip of the Week: Use Linting Tools to Save Time


Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ + + +

Have you ever found yourself spending hours formatting your code so it looks just right? Have you ever caught a duplicative import statement in your code? We recommend using open source linting tools to help avoid common issues like these and save time.

+ + + +

Software linting is the practice of detecting, and sometimes automatically fixing, stylistic, syntactical, or other programmatic issues. Linting usually involves installing standardized or opinionated libraries which allow you to quickly make code corrections. Using linting tools can also help you learn nuanced or unwritten intricacies of programming languages while you solve problems in your work.

+ +

TLDR (too long, didn’t read); Linting is a type of static analysis which can be used to instantly address many common code issues. isort provides automatic Python import statement linting. pre-commit provides an easy way to test and apply isort (in addition to other linting tools) through source control workflows.

+ +

Example: Python Code Style Linting with isort

+ +

Isort is a Python utility for linting package import statements (sorting, deduplication, etc.). Isort may be used to automatically fix your import statements or to test them for consistency. See the isort installation documentation for more information on getting started.

+ +

Before isort

+ +

The following Python code shows a series of import statements. There are duplicate imports and the imports are a mixture of custom (possibly local), external, and built-in packages. Isort can check this code using the command: isort <file or path> --check.

+ +
from custompkg import b, a
+import numpy as np
+import pandas as pd
+import sys
+import os
+import pandas as pd
+import os
+
+ +

After isort

+ +

Isort can fix the code automatically using the command: isort <file or path>. After applying the fixes, notice that all packages are alphabetized and grouped by built-in, external, and custom packages.

+ +
import os
+import sys
+
+import numpy as np
+import pandas as pd
+from custompkg import a, b
+
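Isort also provides a Python API, which can be helpful for programmatic checks. A minimal sketch (the code string and filename below are purely illustrative):

import isort

# sort the import statements within a string of Python code
sorted_code = isort.code("import b\nimport a\n")

# check a file for import ordering issues without modifying it,
# printing a diff of any proposed changes
isort.check_file("example.py", show_diff=True)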
+ +

Using isort with pre-commit

+ +

Pre-commit is a framework which can be used to apply linting checks and fixes as git hooks or from the command line. Pre-commit includes existing hooks for many libraries, including isort. See the pre-commit installation documentation to get started.

+ +

Example .pre-commit-config.yaml Configuration

+ +

The following YAML content can be used to reference isort from pre-commit. This configuration can be expanded to include many other pre-commit hooks.

+ +
# example .pre-commit-config.yaml file leveraging isort
+# See https://pre-commit.com/hooks.html for more hooks
+---
+repos:
+  - repo: https://github.com/PyCQA/isort
+    rev: 5.10.1
+    hooks:
+      - id: isort
+
+ +

Example Using pre-commit Manually

+ +

Imagine we have a file, example.py, which includes the content from Before isort. Running pre-commit manually on the directory's files will first automatically apply isort formatting. The second time pre-commit is run there will be no issues (pre-commit resolved them automatically).

+ +

First detecting and fixing the file:

+ +
% pre-commit run --all-files
+isort...................................Failed
+- hook id: isort
+- files were modified by this hook
+
+Fixing example.py
+
+ +

Then checking that the file was fixed:

+ +
% pre-commit run --all-files
+isort...................................Passed
+
+
diff --git a/preview/pr-36/2022/11/27/Diagrams-as-Code.html b/preview/pr-36/2022/11/27/Diagrams-as-Code.html
new file mode 100644
index 0000000000..3a2e80c0ed
--- /dev/null
+++ b/preview/pr-36/2022/11/27/Diagrams-as-Code.html
@@ -0,0 +1,763 @@
+Tip of the Week: Diagrams as Code | Software Engineering Team

Tip of the Week: Diagrams as Code



+ +
+
+ + + +

Diagrams can be a useful way to illuminate and communicate ideas. Free-form drawing or drag-and-drop tools are one common way to create diagrams. With this tip of the week we introduce another option: diagrams as code (DaC), or creating diagrams by using code.

+ + + +

TLDR (too long, didn’t read);
Diagrams as code (DaC) tools provide an advantage for illustrating concepts by enabling quick visual positioning, source-controllable input, portability (both for input and output formats), and open collaboration through reproducibility. Consider using Mermaid (among many other DaC tools) to assist your diagramming efforts; it can be used directly, within your markdown files, or in Github commentary using code blocks (for example, ` ```mermaid `).

+ +

Example Mermaid Diagram as Code

+ +
+ + +
+
flowchart LR
+    a --> b
+    b --> c
+    c --> d1
+    c --> d2
+
+ +

Mermaid code

+
Mermaid rendered (diagram image of the code above)

+ +
+ +
+ +

The above shows example mermaid flowchart code and its rendered output. The syntax is specific to mermaid and acts as a simple coding language to help you depict ideas. Mermaid also includes options for sequence, class, Gantt, and other diagram types. Mermaid provides a live editor which can be used to quickly draft and share content.

+ +

Mermaid Github Integration

+ +
Github comment (screenshot)

Github comment preview (screenshot)

Mermaid diagrams may be rendered directly from markdown (.md) and text communication content (like pull request or issue comments) within Github. See Github’s blog post on mermaid for more details covering this topic.

+ +

Mermaid Jupyter Notebook Integration

+ +
Mermaid content rendered in a Jupyter notebook (screenshot)

Mermaid diagrams can be rendered directly within Jupyter notebooks with a small amount of additional code and a rendering service. One way to render mermaid and other diagrams within notebooks is to use Kroki.io. See this example for an interactive demonstration.
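As a rough sketch of how such a rendering service may be called from Python (the diagram content is illustrative, and the requests package is assumed to be installed):

import base64
import zlib

import requests
from IPython.display import SVG, display

mermaid_code = """
flowchart LR
    a --> b
"""

# kroki.io accepts diagram source which has been deflate-compressed
# and encoded as URL-safe base64
encoded = base64.urlsafe_b64encode(
    zlib.compress(mermaid_code.encode("utf-8"), 9)
).decode("ascii")

# request an SVG rendering of the mermaid diagram
response = requests.get(f"https://kroki.io/mermaid/svg/{encoded}")

# display the rendered SVG within a Jupyter notebook cell
display(SVG(response.text))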

+ +

Version Controlling Your Diagrams

+ +
+graph LR
+    subgraph Compose
+      write[Write Diagram Code]
+      render[Render Diagram]
+    end
+    subgraph Store[Save and Share]
+      save[Upload Diagram]
+    end
+    write --> | create | render
+    render --> | revise | write
+    render --> | code and exports | save
+
+

Mermaid version control workflow example

+ +

Creating your diagrams with code means you can enable reproducible and collaborative work on version control systems (like git). Using git in this way allows you to reference and remix your diagrams as part of development. It also allows others to collaborate on diagrams together, making modifications as needed.

+ +

Additional Resources

+ +

Please see the following additional resources related to diagrams as code.

+ + +
diff --git a/preview/pr-36/2022/12/05/Data-Engineering-with-SQL-Arrow-and-DuckDB.html b/preview/pr-36/2022/12/05/Data-Engineering-with-SQL-Arrow-and-DuckDB.html
new file mode 100644
index 0000000000..1065698ad1
--- /dev/null
+++ b/preview/pr-36/2022/12/05/Data-Engineering-with-SQL-Arrow-and-DuckDB.html
@@ -0,0 +1,745 @@
+Tip of the Week: Data Engineering with SQL, Arrow and DuckDB | Software Engineering Team

Tip of the Week: Data Engineering with SQL, Arrow and DuckDB



+ +
+
+ + + +

Apache Arrow is a language-independent, high-performance data format useful in many scenarios. DuckDB is an in-process, SQL-based data management system which is Arrow-compatible. In addition to providing a SQLite-like database format, DuckDB also provides a standardized and high-performance way to work with Arrow data where otherwise one may be forced to use language-specific data structures or transforms.

+ + + +

TLDR (too long, didn’t read); +DuckDB may be used to access and transform Arrow-based data from multiple data formats through SQL. Using Arrow and DuckDB provides a cross-language way to access and manage data. Data development with these tools may also enable improvements in performance, understandability, or long term maintainability of your code.

+ +

Reduce Wasted Conversion Effort with Arrow

+ +
+flowchart TB
+    Python:::outlined <--> Arrow
+    R:::outlined <--> Arrow
+    C++:::outlined <--> Arrow
+    Java:::outlined <--> Arrow
+    others...:::outlined <--> Arrow
+
+    classDef outlined fill:#fff,stroke:#333
+
+ + +

Arrow provides a multi-language data format which prevents you from needing to convert to other formats when dealing with multiple in-memory or serialized data formats. For example, this means that a Python and an R package may use the same in-memory or file-based data without conversion (where normally a Python Pandas dataframe and R data frame may require a conversion step in between).

+ +
+flowchart TB
+    subgraph Python
+      Pandas:::outlined
+      Polars:::outlined
+      dict[Python dict]:::outlined
+      list[Python list]:::outlined
+    end
+
+    Pandas <--> Arrow
+    Polars <--> Arrow
+    dict <--> Arrow
+    list <--> Arrow
+
+  classDef outlined fill:#fff,stroke:#333
+
+ +

The same holds for various libraries within one language: Arrow enables interchange between various language library formats (for example, a Python Pandas dataframe and a Python dictionary are two distinct in-memory formats which may require conversions). Conversions to or from these formats can involve data type or other inferences which are costly to productivity. You can save time and effort by avoiding conversions using Arrow.

+ +

Using SQL to Join or Transform Arrow Data via DuckDB

+ +
+flowchart LR
+    subgraph duckdb["DuckDB Processing"]
+        direction BT
+        SQL[SQL] --> DuckDB[DuckDB Client]
+    end
+    parquet1[example.parquet] --> duckdb
+    sqlite[example.sqlite] --> duckdb
+    csv[example.csv] --> duckdb
+    arrow["in-memory Arrow"] --> duckdb
+    pandas["in-memory Pandas"] --> duckdb
+    duckdb --> Arrow
+    Arrow --> Other[Other work...]
+
+ +

DuckDB provides a management client and relational database format (similar to SQLite databases) which may be handled with Arrow. SQL may be used with the DuckDB client to filter, join, or change various data types. Due to Arrow’s cross-language properties, there is no additional cost to using SQL through DuckDB to return data for implementation within other purpose-built data formats. DuckDB provides client APIs in many languages (for example, Python, R, and C++), making it possible to write DuckDB client code with SQL to manage data without having to use manually written sub-procedures.

+ +
+flowchart TB
+  subgraph duckdb["DuckDB Processing"]
+        direction BT
+        SQL[SQL] --> DuckDB[DuckDB Client]
+    end
+    Python:::outlined <--> duckdb
+    R:::outlined <--> duckdb
+    C++:::outlined <--> duckdb
+    Java:::outlined <--> duckdb
+    others...:::outlined <--> duckdb
+    duckdb <--> Arrow
+
+    classDef outlined fill:#fff,stroke:#333
+
+ +

Using SQL to perform these operations with Arrow provides an opportunity for your data code to be used (or understood) within other languages without additional rewrites. SQL also provides you access to roughly 48 years’ worth of data management improvements without being constrained by imperative language data models or schema (reference: SQL Wikipedia: First appeared: 1974).

+ +

Example with SQL to Join Arrow Data with DuckDB in Python

+ +
Jupyter notebook example screenshot with DuckDB and Arrow data handling

The following example notebook shows how to use SQL to join data from multiple sources using the DuckDB client API within Python. The example includes DuckDB querying a remote CSV, local Parquet file, and Arrow in-memory tables.
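As a hedged sketch of what such a join can look like (the table names and columns below are hypothetical, and the duckdb and pyarrow packages are assumed to be installed):

import duckdb
import pyarrow as pa

# two hypothetical in-memory Arrow tables
people = pa.table({"id": [1, 2, 3], "name": ["ann", "bo", "cy"]})
scores = pa.table({"id": [1, 2, 3], "score": [0.1, 0.2, 0.3]})

# DuckDB can reference in-memory Arrow tables
# by their Python variable names within SQL
result = duckdb.query(
    """
    SELECT people.id, people.name, scores.score
    FROM people
    JOIN scores ON people.id = scores.id
    """
).arrow()  # return the joined result as an Arrow table

print(result)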

+ +

Linked Example

+ +

Additional Resources

+ +

Please see the following additional resources.

+ + +
diff --git a/preview/pr-36/2022/12/12/Remove-Unused-Code-to-Avoid-Decay.html b/preview/pr-36/2022/12/12/Remove-Unused-Code-to-Avoid-Decay.html
new file mode 100644
index 0000000000..8cd610f3eb
--- /dev/null
+++ b/preview/pr-36/2022/12/12/Remove-Unused-Code-to-Avoid-Decay.html
@@ -0,0 +1,750 @@
+Tip of the Week: Remove Unused Code to Avoid Software Decay | Software Engineering Team

Tip of the Week: Remove Unused Code to Avoid Software Decay



+ +
+
+ + + +

The act of creating software often involves many iterations of writing, collaboration, and testing. During this process it’s common to lose awareness of code which is no longer used, and which thus may not be tested or linted. Unused code may contribute to “software decay”, the gradual diminishment of code quality or functionality. This post will cover software decay and strategies for addressing unused code to help keep your code quality high.

+ + + +

TLDR (too long, didn’t read); +Unused code is easy to amass and may cause your code quality or code functionality to diminish (“decay”) over time. Effort must be taken to maintain any code or artifacts you add to your repositories, including those which are unused. Consider using Vulture, Pylint, or Coverage to help illuminate sections of your code which may need to be removed.

+ +

Code Lifecycle and Maintenance

+ +
+stateDiagram
+    direction LR
+    removal : removed or archived
+    changes : changes needed
+    [*] --> added
+    added --> maintenance
+    state maintenance {
+      direction LR
+      updated --> changes
+      changes --> updated
+    }
+    maintenance --> removal
+    removal --> [*]
+
+ + +

Diagram showing code lifecycle activities.

+ +

Adding code to a project involves a loose agreement to maintain it for however long the code is available. Maintenance can involve active effort, like making changes, as well as passive impacts like longer test durations or decreased readability (simply from more code).

+ +

When considering multiple parts of code in many files, this maintenance can become untenable, leading to the gradual decay of your code quality or functionality. For example, let’s assume one line of code costs 30 seconds to maintain (feel free to substitute time with monetary or personnel aspects as an example measure here too). 1000 lines of code would cost 500 minutes (or about 8 hours) to maintain. This becomes more complex when considering multiple files, collaborators, or languages.

+ +

+ +

Think about your project as if it were on a hiking trail: “Carry as little as possible, but choose that little with care.” (Earl Shaffer). Be careful what code you choose to carry; it may impact your ability to address needs over time and lead to otherwise unintended software decay.

+ +

Detecting Unused Code with Vulture

+ +

Understanding the cost of added content, it’s important to routinely examine which parts of your code are still necessary. You can prepare your code for a long journey by detecting (and removing) unused code with various automated tools. These tools are generally designed for static analysis and linting, meaning they may also be incorporated into automated and routine testing.

+ +
$ vulture unused_code_example.py
+unused_code_example.py:3: unused import 'os' (90% confidence)
+unused_code_example.py:4: unused import 'pd' (90% confidence)
+unused_code_example.py:7: unused function 'unused_function' (60% confidence)
+unused_code_example.py:14: unused variable 'unused_var' (60% confidence)
+
+ +

Example of Vulture command line usage to discover unused code.

+ +

Vulture is one tool dedicated to finding unused Python code. Vulture provides both a command-line interface and a Python API for discovering unused code. It also provides a rough confidence level to show how certain it was about whether a block of code was unused. See the following interactive example for a demonstration of using Vulture.
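For reference, a hypothetical unused_code_example.py consistent with the output shown above might look like the following (the names are purely illustrative):

"""Example module containing intentionally unused code."""

import os
import pandas as pd


def unused_function():
    return "this function is never called"


def used_function():
    """Return a greeting, leaving one variable unused."""
    # vulture flags names which are assigned but never read
    unused_var = "never read"
    return "hello"


print(used_function())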

+ +

Interactive Example on Unused Code Detection

+ +

Further Code Usefulness Detection with Pylint and Coverage.py

+ +

In addition to Vulture, Pylint and Coverage.py can be used in a similar way to help show where code may not have been used within your project.

+ +

Pylint focuses on code style and other static analysis in addition to unused variables. See Pylint’s Checkers page for more details, using “unused-*” as a reference to the checks it performs which focus on unused code.

+ +

Coverage.py helps show you which parts of your code have been executed or not. A common use case for Coverage involves measuring “test coverage”, or which parts of your code are executed in relation to tests written for that code. This provides another perspective on code utility: if there’s not a test for the code, is it worth keeping?
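As a hedged sketch, Coverage.py may also be driven from its Python API (the measured module below is hypothetical):

import coverage

cov = coverage.Coverage()
cov.start()

# run the code being measured; importing a module executes it
import example  # hypothetical module under measurement

cov.stop()
cov.save()

# print per-file statement counts and percent covered
cov.report()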

+ +

Additional Resources

+ + +
diff --git a/preview/pr-36/2023/01/03/Linting-Documentation-as-Code.html b/preview/pr-36/2023/01/03/Linting-Documentation-as-Code.html
new file mode 100644
index 0000000000..36c796e25a
--- /dev/null
+++ b/preview/pr-36/2023/01/03/Linting-Documentation-as-Code.html
@@ -0,0 +1,779 @@
+Tip of the Week: Linting Documentation as Code | Software Engineering Team

Tip of the Week: Linting Documentation as Code



+ +
+
+ + + +

Software documentation is sometimes treated as a less important or secondary aspect of software development. Treating documentation as code allows developers to version control the shared understanding and knowledge surrounding a project. Leveraging this paradigm also enables the use of tools and patterns which have been used to strengthen code maintenance. This article covers one such pattern: linting, or static analysis, for documentation treated like code.

+ + + +

TLDR (too long, didn’t read); +There are many linting tools available which enable quick revision of your documentation. Try using codespell for spelling corrections, mdformat for markdown file formatting corrections, and vale for more complex editorial style or natural language assessment within your documentation.

+ +

Spelling Checks

+ +
+ + +
+
<!--- readme.md --->
+## Example Readme
+
+Thsi project is a wokr in progess.
+Code will be updated by the team very often.
+
+(CU Anschutz)[https://www.cuanschutz.edu/]
+
+ +

Example readme.md with incorrectly spelled words

+
+ + +
+
% codespell readme.md
+readme.md:4: Thsi ==> This
+readme.md:4: wokr ==> work
+readme.md:4: progess ==> progress
+
+
+
+
+ +

Example showing codespell detection of misspelled words

+
+ +
+ +

Spelling checks may be used to automatically detect incorrect spellings of words within your documentation (and code!). Codespell is one library which can lint your word spelling. Codespell may be used through the command-line and also through a pre-commit hook.
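As a hedged example, codespell’s pre-commit hook may be referenced with configuration like the following (the rev shown is illustrative; pin it to a release you have verified):

# example .pre-commit-config.yaml entry for codespell
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.2
    hooks:
      - id: codespell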

+ +

Markdown Format Linting

+ +
+ + +
+
<!--- readme.md --->
+## Example Readme
+
+This project is a work in progress.
+Code will be updated by the team very often.
+
+(CU Anschutz)[https://www.cuanschutz.edu/]
+
+ +

Example readme.md with markdown issues

+
+ + +
+
% markdownlint readme.md
+readme.md:2 MD041/first-line-heading/first-line-h1
+First line in a file should be a top-level heading
+[Context: "## Example Readme"]
+readme.md:6:5 MD011/no-reversed-links Reversed link
+syntax [(link)[https://www.cuanschutz.edu/]]
+
+
+ +

Example showing markdownlint detection of issues

+
+ +
+ +

The format of your documentation files may also be linted for common issues. This may catch things which are otherwise hard to see when editing content. It may also improve the overall web accessibility of your content, for example, through proper HTML header order and image alternate text. Markdownlint is one library which can be used to find issues within markdown files.

+ +

Additional and similar resources to explore in this area:

+ + + +

Editorial Style and Grammar

+ +
+ + +
+
<!--- readme.md --->
+# Example Readme
+
+This project is a work in progress.
+Code will be updated by the team very often.
+
+[CU Anschutz](https://www.cuanschutz.edu/)
+
+ +

Example readme.md with questionable editorial style

+
+ + +
+
% vale readme-example.md
+readme-example.md
+2:12  error    Did you really mean 'Readme'?   Vale.Spelling
+5:11  warning  'be updated' may be passive     write-good.Passive
+               voice. Use active voice if you
+               can.
+5:34  warning  'very' is a weasel word!        write-good.Weasel
+
+ +

Example showing vale warnings and errors

+
+ +
+ +

Maintaining consistent editorial style and grammar may also be a focus within your documentation. These issues are sometimes more difficult to detect and more opinionated in nature. In some cases, organizations publish guides on this topic (see Microsoft Writing Style Guide, or Google Developer Documentation Style Guide). Some of the complexity of writing style may be linted through tools like Vale. Using common configurations through Vale can unify how language is used within your documentation by linting for common style and grammar.
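As a hedged sketch consistent with the output above, a minimal .vale.ini configuration might look like the following (the styles shown assume the write-good style package has been downloaded into the StylesPath):

# example .vale.ini
StylesPath = styles
MinAlertLevel = suggestion

[*.md]
BasedOnStyles = Vale, write-good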

+ +

Additional and similar resources to explore in this area:

+ + + +

Resources

+ +

Please see the following resources on this topic.

+ + +
diff --git a/preview/pr-36/2023/01/17/Timebox-Your-Software-Work.html b/preview/pr-36/2023/01/17/Timebox-Your-Software-Work.html
new file mode 100644
index 0000000000..1865b07668
--- /dev/null
+++ b/preview/pr-36/2023/01/17/Timebox-Your-Software-Work.html
@@ -0,0 +1,742 @@
+Tip of the Week: Timebox Your Software Work | Software Engineering Team

Tip of the Week: Timebox Your Software Work



+ +
+
+ + + +

Programming often involves long periods of problem solving which can sometimes lead to unproductive or exhausting outcomes. This article covers one way to avoid unproductive time expense and protect yourself from overexhaustion: a technique called “timeboxing” (also sometimes referenced as “timeblocking”).

+ + + +

TLDR (too long, didn’t read); +Use timeboxing techniques such as Pomodoro® or 52/17 to help modularize your software work to ensure you don’t fall victim to Parkinson’s Law. Timeboxing may also map well to Github Issues, which allows your software tasks to be further aligned, documented, and chunked in collaboration with others.

+ +

Controlling Work Time Expansion

+ +
Image depicting work as a creature with a timebox around it.
+ +

Have you ever spent more time than you thought you would on a task? An adage which helps explain this phenomenon is Parkinson’s Law:

+ +
+

“… work expands so as to fill the time available for its completion.”

+
+ +

The practice of writing software is not protected from this “law”. It may be affected in sometimes worse ways during long periods of uninterrupted programming, where we may have an inclination to forget productive goals.

+ +

One way to address this is through the use of timeboxing techniques. Timeboxing sets a fixed limit to the amount of time one may spend on a specific activity. One can use timeboxing to systematically address many tasks, for example as with the Pomodoro® Technique (developed by Francesco Cirillo) or the 52/17 rule. While there are many ways to apply timeboxing, make sure to balance activity with short breaks and focus switches to help ensure you don’t become overwhelmed.

+ +

Timeboxing Means Modularization

+ +

Timeboxing has an auxiliary benefit of framing your work as objective and oftentimes smaller chunks (we have to know what we’re timeboxing in order to use this technique). Creating distinct chunks of work applies for both our daily time schedule as well as code itself. This concept is more broadly called “modularization” and helps to distinguish large portions of work (whether in real life or in code) as smaller and more maintainable chunks.

+ +
+ + +
+
# Goals
+- Finish writing paper
+
+
+
+
+
+ +

Vague and possibly large task

+ +
+ + +
+
# Goals
+- Finish writing paper
+  - Create paper outline
+  - Finish writing introduction
+  - Check for dead hyperlinks
+  - Request internal review
+
+ +

Modular and more understandable tasks

+
+ +
+ +

Breaking down large amounts of work as smaller chunks within our code helps to ensure long-term maintainability and understandability. Similarly, keeping our tasks small can help ensure our goals are achievable and understandable (to ourselves or others). Without this modularity, tasks can be impossible to achieve (subjective in nature) or very difficult to understand. Stated differently, taking many small steps can lead to a big change in an organized, oftentimes less exhausting way (related graphic).

+ +

Version Control and Timeboxing

+ +
# Repo Issues
+- "Prevent foo warning" - 20 minutes
+- "Remove bar feature" - 20 minutes
+- "Address baz error" - 20 minutes
+
+
+ +

List of example version control repository issues with associated time duration.

+ +

The parallels between the time we give a task and the related code can work to your benefit. For example, Github Issues can be created to outline a timeboxed task which relates to a distinct chunk of code to be created, updated, or fixed. Once development tasks have been outlined as issues, a developer can use timeboxing to help organize how much time to allocate to each issue.

+ +

Using Github Issues in this way provides a way to observe task progress associated with one or many repositories. It also increases collaborative opportunities for task sizing and description. For example, if a task looks too large to complete in a reasonable amount of time, developers may work together to break the task down into smaller modules of work.

+ +

Be Kind to Yourself: Take Breaks

+ +

While timeboxing is often a conversation about how to be more productive, it’s also worth remembering: take breaks to be kind to yourself and more effective. Some studies and thought leadership have shown that taking breaks may be necessary to avoid performance decreases and impacts to your health. There’s also some indication that taking breaks may lead to better work. See below for just a few examples:

+ + + +

Additional Resources

+ + +
diff --git a/preview/pr-36/2023/01/30/Software-Linting-with-R.html b/preview/pr-36/2023/01/30/Software-Linting-with-R.html
new file mode 100644
index 0000000000..4380c9576f
--- /dev/null
+++ b/preview/pr-36/2023/01/30/Software-Linting-with-R.html
@@ -0,0 +1,731 @@
+Tip of the Week: Software Linting with R | Software Engineering Team

Tip of the Week: Software Linting with R



+ +
+
+ + + +

This article covers using the software technique of linting on R code in order to improve code quality, development velocity, and collaboration.

+ + + +

TLDR (too long, didn’t read);
Use software linting (static analysis) practices on your R code with existing packages lintr and styler (among others). These linters may be applied using pre-commit in your local development environment or as continuous tests using, for example, Github Actions.

+ +

Treating R as Software

+ +
+

“Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented.”

+
+ +

(R-Project: What is R?)

+ +

The R programming language is sometimes treated as only a statistics system instead of software. This treatment can sometimes lead to common issues in development which are experienced in other languages. Addressing R as software enables developers to enhance their work by taking benefit from existing concepts applied to many other languages.

+ +

Linting with R

+ +
+flowchart LR
+  write[Write R code] --> |check| check[Check code with linters]
+  check --> |revise| write
+
+ + +

Workflow loop depicting writing R code and revising with linters.

+ +

Software linting, or static analysis, is one way to ensure a minimum level of code quality without writing new tests. Linting checks how your code is structured without running it to make sure it abides by common language paradigms and logical structures. Using linting tools allows a developer to gain quick insights about their code before it is viewed or used by others.

+ +

One way to lint your R code is by using the lintr package. The lintr package is also complementary to the styler package, which formats the syntax of R code in a consistent way. Both of these can be used independently or as part of continuous quality checks for R code repositories.

+ +

Automated Linting Checks with R

+ +
+flowchart LR
+  subgraph development
+    write
+    check
+  end
+  subgraph linters
+    direction LR
+    lintr
+    styler
+  end
+  check <-.- linters
+  write[Write R code] --> |check| check[Check code with pre-commit]
+  check --> |revise| write
+
+ +

Workflow showing development with pre-commit using multiple linters.

+ +

lintr and styler can be incorporated into automated checks to help make sure linting (or other steps) are always used with new code. One tool which can help with this is pre-commit, which acts as a local development tool in addition to providing observability within source control (more on this later).

+ +

Using pre-commit locally enables quick feedback loops using one or many checkers (such as lintr, styler, or others). Pre-commit may be used through git hooks or manually using pre-commit run ... from the command line. See this example of pre-commit checks with R for an example of multiple pre-commit checks for R code.
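As a hedged sketch, a .pre-commit-config.yaml for R code might reference hooks from the precommit R package (the repository and rev below are illustrative; verify them against that project’s documentation):

# example .pre-commit-config.yaml entries for R linting
repos:
  - repo: https://github.com/lorenzwalthert/precommit
    rev: v0.3.2
    hooks:
      - id: style-files # applies styler formatting
      - id: lintr # runs lintr checks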

+ +

Continuous and Observable Testing for R

+ +
+flowchart LR
+  subgraph development [local development]
+    direction LR
+    write
+    check
+    commit
+  end
+  subgraph remote[Github repository]
+    direction LR
+    action["Check code (remotely)"]
+  end
+  write[Write R code] --> |check| check[Check code with pre-commit]
+  check --> |revise| write
+  check --> commit[commit + push]
+  commit --> |optional trigger| action
+  check -.-> |perform same checks| action
+
+ +

Workflow showing pre-commit used as continuous testing tool with Github.

+ +

Pre-commit linting checks can also be incorporated into continuous testing performed on your repository. One way to do this is using Github Actions. Github Actions provides a programmatic way to specify automatic steps taken as changes occur to a repository.

+ +

Pre-commit provides an example Github Action which will automatically check and alert repository maintainers when code issues are detected. Using pre-commit in this way allows R developers to ensure lintr checks are performed on any new work checked into a repository. This can have benefits towards decreasing pull request (PR) review time and standardizing how code collaboration takes place for R developers.

+ +

Resources

+ +

Please see the following resources on this topic.

+ + +
diff --git a/preview/pr-36/2023/02/13/Branch-Review-and-Learn.html b/preview/pr-36/2023/02/13/Branch-Review-and-Learn.html
new file mode 100644
index 0000000000..76908ff78f
--- /dev/null
+++ b/preview/pr-36/2023/02/13/Branch-Review-and-Learn.html
@@ -0,0 +1,780 @@
+Tip of the Week: Branch, Review, and Learn | Software Engineering Team

Tip of the Week: Branch, Review, and Learn



+ +
+
+ + + +

Git provides a feature called branching which facilitates parallel and segmented programming work through commits with version control. Using branching enables both work concurrency (multiple people working on the same repository at the same time) as well as a chance to isolate and review specific programming tasks. This article covers some conceptual best practices with branching, reviewing, and merging code using Github.

+ + + +

Please note: the content below represents one opinion in a larger space of Git workflow concepts (it’s not perfect!). Developer cultures may vary on these topics; be sure to acknowledge people and culture over exclusive or absolute dedication to what is found below.

+ +

TLDR (too long, didn’t read); +Use git branching techniques to segment the completion of programming tasks, gradually and consistently committing small changes (practicing festina lente or “make haste, slowly”). When a group of small changes are ready from branches, request pull request reviews and take advantage of comments to continuously improve the work. Prepare for a branch merge after review by deciding which merge strategy is appropriate and automating merge requirements with branch protection rules.

+ +

Concept: Coursework Branching

+ +
+flowchart LR
+ subgraph Course
+    direction LR
+    open["open\nassignment"]
+    turn_in["review\nassignment"]
+  end
+  subgraph Student ["     Student"]
+    direction LR
+    work["completed\nassignment"]
+  end
+  open -.-> turn_in
+  open --> |works towards| work
+  work --> |seeks review| turn_in
+
+ + +

An example course and student assignment workflow.

+ +

Git branching practices may be understood in context with similar workflows from real life. Consider a student taking a course, where an assignment is given to them to complete. In addition to the steps shown in the diagram above, it’s important to think about why this pattern is beneficial:

+ + + +

Branching to Complete an “Assignment”

+ +
+%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'themeVariables': {
+      'git0': '#4F46E5',
+      'git1': '#10B981',
+      'gitBranchLabel1': '#ffffff'
+} } }%%
+    gitGraph
+       commit id: "..."
+       commit id: "opened"
+       branch assignment
+       checkout assignment
+       commit id: "completed"
+       checkout main
+
+ +

An example git diagram showing assignment branch based off main.

+ +

Following the course assignment workflow, the diagram above shows an in-progress assignment branch based off of the main branch. When the assignment branch is created, we bring into it everything we know from main (the course) so far in the form of commits, or groups of changes to various files. Branching allows us to make consistent and well described changes based on what’s already happened without impacting others work in the meantime.

+ +
+

Branching best practices:

+ +
  • Keep the name and work of branches dedicated to a specific and focused purpose. For example: a branch named fix-links-in-docs might entail work related to fixing HTTP links within documentation.
  • Consider the use of Github Forks (along with branches within the fork) to help further isolate and enrich work potential. Forks also allow remixing existing work into new possibilities.
  • festina lente or “make haste, slowly”: Commits on any branch represent small chunks of a cohesive idea which will eventually be brought to main. It is often beneficial to be consistent with small, gradual commits to avoid a rushed or incomplete submission. The same applies more generally for software; taking time upfront to do things well can mean time saved later.
+ +

Reviewing the Branched Work

+ +
+%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'themeVariables': {
+      'git0': '#6366F1',
+      'git1': '#10B981',
+      'gitBranchLabel1': '#ffffff'
+} } }%%
+    gitGraph
+       commit id: "..."
+       commit id: "opened"
+       branch assignment
+       checkout assignment
+       commit id: "completed"
+       checkout main
+       merge assignment id: "reviewed"
+
+ +

An example git diagram showing assignment branch being merged with main after a review.

+ +

The diagram above depicts a merge from the assignment branch to pull the changes into the main branch, simulating an assignment being returned for review within a course. While merges may be forced without review, it’s a best practice to create a Pull Request (PR) Review (also known as a Merge Request (MR) on some systems) and then ask other members of your team to review it. Doing this provides a chance to make revisions before code changes are “finalized” within the main branch.

+ +
+

Github provides special tools for reviews which can assist both the author and reviewer:

+ +
  • Keep code changes intended for review small, enabling reviewers to reason through the work to more quickly provide feedback and practicing incremental continuous improvement (it may be difficult to address everything at once!). This also may denote the git history for a repository in a clearer way.
  • Github comments: Overall review comments (encompassing all work from the branch) and inline comments (inquiring about individual lines of code) may be provided. Inline comments may also include code suggestions, which allow for code-based revision suggestions that may be committed directly to the branch using markdown codeblocks (` ```suggestion `).
  • Github issues: Creating issues from comments allows the creation of new repository issues to address topics outside of the current PR.
+ +

Merging the Branch after Review

+ +
+%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'themeVariables': {
+      'git0': '#6366F1'
+} } }%%
+    gitGraph
+       commit id: "..."
+       commit id: "opened"
+       commit type: HIGHLIGHT id: "reviewed"
+       commit id: "...."
+
+ +

An example git diagram showing the main branch after the assignment branch has been merged (and removed).

+ +

Changes may be made within the assignment branch until the work is in a state where the authors and reviewers are satisfied. At this point, the branch changes may be merged into main. Approvals are sometimes provided informally (for ex., with a comment: “LGTM (looks good to me)!”) or explicitly (for ex., approvals within Github) to indicate or enable branch merge readiness. After the merge, changes may continue to be made in a similar way (perhaps accounting for concurrently branched work elsewhere). Generally, a merged branch may be removed afterwards to help maintain an organized working environment (see Github PR branch removal).

+ +
+

Github provides special tools for merging:

+ +
  • Decide which merge strategy is appropriate (there are many!): There are many merge strategies within Github (merge commits, squash merges, and rebase merging). Take time to understand them and choose which one works best.
  • Consider using branch protection to automate merge requirements: The main or other branches may be “protected” against merges using branch protection rules. These rules can require reviewer approvals or automatic status checks to pass before changes may be merged.
  • Use merge queuing to manage multiple PRs: When there are many unmerged PRs, it can sometimes be difficult to document and ensure each is merged in a desired sequence. Consider using merge queues to help with this process.
+ +

Additional Resources

+ +

The links below may provide additional guidance on using these git features, including in-depth coverage of various features and related configuration.

+ + +
diff --git a/preview/pr-36/2023/03/15/Automate-Software-Workflows-with-Github-Actions.html b/preview/pr-36/2023/03/15/Automate-Software-Workflows-with-Github-Actions.html
new file mode 100644
index 0000000000..bff62a3853
--- /dev/null
+++ b/preview/pr-36/2023/03/15/Automate-Software-Workflows-with-Github-Actions.html
@@ -0,0 +1,784 @@
+Tip of the Week: Automate Software Workflows with GitHub Actions | Software Engineering Team

Tip of the Week: Automate Software Workflows with GitHub Actions



+ +
+
+ + + +

There are many routine tasks which can be automated to help save time and increase reproducibility in software development. GitHub Actions provides one way to accomplish these tasks using code-based workflows and related workflow implementations. This type of automation is commonly used to perform tests, builds (preparing for the delivery of the code), or delivery itself (sending the code or related artifacts where they will be used).

+ + + +

TLDR (too long, didn’t read); +Use GitHub Actions to perform continuous integration work automatically by leveraging Github’s workflow specification and the existing marketplace of already-created Actions. You can test these workflows with Act, which can enhance development with this feature of Github. Consider making use of “write once, run anywhere” (WORA) and Dagger in conjunction with GitHub Actions to enable reproducible workflows for your software projects.

+ +

Workflows in Software

+ +
+flowchart LR
+  start((start)) --> action
+  action["action(s)"] --> en((end))
+  style start fill:#6EE7B7
+  style en fill:#FCA5A5
+
+ + +

An example workflow.

+ +

Workflows consist of sequenced activities used by various systems. Software development workflows help accomplish work the same way each time by using what are commonly called “workflow engines”. Generally, workflow engines are provided code which indicate beginnings (what triggers a workflow to begin), actions (work being performed in sequence), and an ending (where the workflow stops). There are many workflow engines, including some which help accomplish work alongside version control.

+ +

GitHub Actions

+ +
+flowchart LR
+  subgraph workflow [GitHub Actions Workflow Run]
+    direction LR
+    action["action(s)"] --> en((end))
+    start((event\ntrigger))
+  end
+  start --> action
+  style start fill:#6EE7B7
+  style en fill:#FCA5A5
+
+ +

A diagram showing GitHub Actions as a workflow.

+ +

GitHub Actions is a feature of GitHub which allows you to run workflows in relation to your code as a continuous integration (including automated testing, builds, and deployments) and general automation tool. For example, one can use GitHub Actions to make sure code related to a GitHub Pull Request passes certain tests before it is allowed to be merged. GitHub Actions may be specified using YAML files within your repository’s .github/workflows directory by using syntax specific to Github’s workflow specification. Each YAML file under the .github/workflows directory can specify workflows to accomplish tasks related to your software work. GitHub Actions workflows may be customized to your own needs, or use an existing marketplace of already-created Actions.
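As a hedged sketch, a minimal workflow file might look like the following (the file, workflow, and job names are illustrative, and action version tags should be verified):

# .github/workflows/example.yml
name: example-workflow
on: [push, pull_request]
jobs:
  run-checks:
    runs-on: ubuntu-latest
    steps:
      # fetch the repository contents
      - uses: actions/checkout@v3
      # run pre-commit checks against all files
      - uses: pre-commit/action@v3.0.0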

+ +
Image showing GitHub Actions tab on GitHub website.

GitHub provides an “Actions” tab for each repository which helps visualize and control GitHub Actions workflow runs. This tab shows a history of all workflow runs in the repository. For each run, it shows whether it ran successfully or not, the associated logs, and controls to cancel or re-run it.

+ +
+

GitHub Actions Examples +GitHub Actions is sometimes better understood with examples. See the following references for a few basic examples of using GitHub Actions in a simulated project repository.

+ + +
+ +

Testing with Act

+ +
+flowchart LR
+  subgraph container ["local simulation container(s)"]
+    direction LR
+    subgraph workflow [GitHub Actions Workflow Run]
+      direction LR
+      start((event\ntrigger))
+      action --> en((end))
+    end
+  end
+  start --> action
+  act\[Run Act] -.-> |Simulate\ntrigger| start
+  style start fill:#6EE7B7
+  style en fill:#FCA5A5
+
+ +

A diagram showing how GitHub Actions workflows may be triggered from Act

+ +

One challenge with GitHub Actions is a lack of standardized local testing tools. For example, how will you know that a new GitHub Actions workflow will function as expected (or at all) without pushing to the GitHub repository? One third-party tool which can help with this is Act. Act uses Docker images (which require Docker Desktop) to simulate running a GitHub Actions workflow within your local environment. Using Act can sometimes avoid guessing what will occur when a GitHub Actions workflow is added to your repository. See Act’s installation documentation for more information on getting started with this tool.

+ +
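As a rough sketch of how Act might be used from a shell (assuming Act and Docker are already installed, and that your repository contains workflow files under .github/workflows):

```bash
# list the jobs Act detects within .github/workflows
act -l

# simulate a push event, running any workflows it triggers locally
act push

# simulate a pull_request event for workflows triggered by pull requests
act pull_request
```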

Nested Workflows with GitHub Actions

+ +
+flowchart LR
+
+  subgraph action ["Nested Workflow (Dagger, etc)"]
+    direction LR
+    actions
+    start2((start)) --> actions
+    actions --> en2((end))
+    en2((end))
+  end
+  subgraph workflow2 [Local Environment Run]
+    direction LR
+    run2[run workflow]
+    en3((end))
+    start3((event\ntrigger))
+  end
+  subgraph workflow [GitHub Actions Workflow Run]
+    direction LR
+    start((event\ntrigger))
+    run[run workflow]
+    en((end))
+  end
+  
+  start --> run
+  start3 --> run2
+  action -.-> run
+  run --> en
+  run2 --> en3
+  action -.-> run2
+  style start fill:#6EE7B7
+  style start2 fill:#D1FAE5
+  style start3 fill:#6EE7B7
+  style en fill:#FCA5A5
+  style en2 fill:#FFE4E6
+  style en3 fill:#FCA5A5
+
+ +

A diagram showing how GitHub Actions may leverage nested workflows with tools like Dagger.

+ +

There are times when GitHub Actions may be too constricting or Act may not accurately simulate workflows. We also might seek to “write once, run anywhere” (WORA) to enable flexible development on many environments. One workaround to this challenge is to use nested workflows which are compatible with local environments and GitHub Actions environments. Dagger is one tool which enables programmatically specifying and using workflows this way. Using Dagger allows you to trigger workflows on your local machine or GitHub Actions with the same underlying engine, meaning there is less inconsistency and guesswork for developers (see here for an explanation of how Dagger works).

+ +
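As a minimal sketch of this idea (based on the Dagger Python SDK, the dagger-io package, as it existed at the time of writing; exact API names may differ between versions, and the container image used here is an arbitrary choice):

```python
import sys

import anyio
import dagger


async def main():
    # connect to the Dagger engine (the same engine runs locally or in CI)
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        # define a containerized step: run Python inside an image
        python = (
            client.container()
            .from_("python:3.11-slim")
            .with_exec(["python", "--version"])
        )
        # execute the pipeline and gather its output
        print(await python.stdout())


anyio.run(main)
```

Because the same script drives the pipeline everywhere, running it on your local machine and running it inside a GitHub Actions job exercise the same underlying steps.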

There are also other alternatives to Dagger you may want to consider based on your use case, preference, or interest. Earthly is similar to Dagger and uses “earthfiles” as a specification. Both Dagger and Earthly (in addition to GitHub Actions) use container-based approaches, which in and of themselves present additional alternatives outside the scope of this article.

+ +
+

GitHub Actions with Nested Workflow Example +Reference this example for a brief demonstration of how GitHub Actions and Dagger may be used together.

+ + +
+ +

Closing Remarks

+ +

Using GitHub Actions through the above methods can help automate your technical work and increase the quality of your code with sometimes very little additional effort. Saving time through this form of automation can provide additional flexibility to accomplish more complex work which requires your attention (perhaps using timeboxing techniques). Even small amounts of time saved can turn into large opportunities for other work. On this note, be sure to explore how GitHub Actions can improve things for your software endeavors.

+
+ + + + + +
+ + + +
+ + + Previous post
+ + Tip of the Week: Branch, Review, and Learn + + +
+ + + Next post
+ + Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2023/07/07/Using-Python-and-Anaconda-with-the-Alpine-HPC-Cluster.html b/preview/pr-36/2023/07/07/Using-Python-and-Anaconda-with-the-Alpine-HPC-Cluster.html new file mode 100644 index 0000000000..c14c94ef5f --- /dev/null +++ b/preview/pr-36/2023/07/07/Using-Python-and-Anaconda-with-the-Alpine-HPC-Cluster.html @@ -0,0 +1,940 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster

+ + + + + + + + +
+ + + + + +
+ + + +

Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster

+ +
+ + +
+ +

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ + + +

This post is intended to help demonstrate the use of Python on Alpine, a High Performance Compute (HPC) cluster hosted by the University of Colorado Boulder’s Research Computing. We use Python here by way of Anaconda environment management to run code on Alpine. This post will cover background on the technologies and how to use the contents of an example project repository as though it were a project you were working on and want to run on Alpine.

+ + + +

+ +

Diagram showing a repository’s work as being processed on Alpine.

+ +

Table of Contents

+ +
  1. Background: here we cover the background of Alpine and related technologies.
  2. Implementation: in this section we use the contents of an example project repository on Alpine.
+ +

Background

+ +

Why would I use Alpine?

+ +

+ +

Diagram showing common benefits of Alpine and HPC clusters.

+ +

Alpine is a High Performance Compute (HPC) cluster. +HPC environments provide shared computer hardware resources like memory, CPU, GPU or others to run performance-intensive work. +Reasons for using Alpine might include:

+ + + +

How does Alpine work?

+ +

+ +

Diagram showing high-level user workflow and Alpine components.

+ +

Alpine’s compute resources are used through compute nodes in a system called Slurm. Slurm is a system that allows a large number of users to run jobs on a cluster of computers; the system figures out how to use all the computers in the cluster to execute all the users’ jobs fairly (i.e., giving each user approximately equal time and resources on the cluster). A job is a request to run something, e.g. a bash script or a program, along with specifications about how much RAM and CPU it needs, how long it can run, and how it should be executed.

+ +

Slurm’s role in general is to take in a job (submitted via the sbatch command) and put it into a queue (also called a “partition” in Slurm). For each job in the queue, Slurm constantly tries to find a computer in the cluster with enough resources to run that job, then, when an available computer is found, runs the program the job specifies on that computer. As the program runs, Slurm records its output to files and finally reports the program’s exit status (either completed or failed) back to the job manager.

+ +

Importantly, jobs can either be marked as interactive or batch. When you submit an interactive job, sbatch will pause while waiting for the job to start and then connect you to the program, so you can see its output and enter commands in real time. On the other hand, a batch job will return immediately; you can see the progress of your job using squeue, and you can typically see the output of the job in the folder from which you ran sbatch unless you specify otherwise. +Data for or from Slurm work may be stored temporarily on local storage or on user-specific external (remote) storage.

+ +
+ + +
+ +

Wait, what are “nodes”?

+ +

A simplified way to understand the architecture of Slurm on Alpine is through login and compute “nodes” (computers). +Login nodes act as a place to prepare and submit jobs which will be completed on compute nodes. Login nodes are never used to execute Slurm jobs, whereas compute nodes are exclusively accessed via a job. +Login nodes have limited resource access and are not recommended for running procedures.

+ +
+
+ +

One can interact with Slurm on Alpine by use of Slurm interfaces and directives. +A quick way of accessing Alpine resources is through the use of the acompile command, which starts an interactive job on a compute node with some typical default parameters for the job. Since acompile requests very modest resources (1 hour and 1 CPU core at the time of writing), you’ll typically quickly be connected to a compute node. For more intensive or long-lived interactive jobs, consider using sinteractive, which allows for more customization: Interactive Jobs. +One can also access Slurm directly through various commands on Alpine.

+ +
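For instance, from a login node (a sketch; the flag values below are illustrative, so check the linked documentation for options suited to your work):

```bash
# start a short interactive job on a compute node with default limits
acompile

# or request a more customized interactive job
sinteractive --time=01:00:00 --ntasks=2
```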

Many common software packages are available through the Modules package on Alpine (UCB RC documentation: The Modules System).

+ +
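For example, the Modules system can be used like so (available module names vary; anaconda is one commonly documented example on Alpine):

```bash
# list available software modules
module avail

# load the anaconda module to make conda available in the session
module load anaconda
```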

How does Slurm work?

+ +

+ +

Diagram showing how Slurm generally works.

+ +

Using Alpine effectively involves knowing how to leverage Slurm. +A simplified way to understand how Slurm works is through the following sequence. +Please note that some steps and additional complexity are omitted for the purposes of providing a basis of understanding.

+ +
  1. Create a job script: build a script which will configure and run procedures related to the work you seek to accomplish on the HPC cluster (see the sketch just after this list).
  2. Submit job to Slurm: ask Slurm to run a set of commands or procedures.
  3. Job queue: Slurm will queue the submitted job alongside others (recall that the HPC cluster is a shared resource), providing information about progress as time goes on.
  4. Job processing: Slurm will run the procedures in the job script as scheduled.
  5. Job completion or cancellation: submitted jobs eventually may reach completion or cancellation states with saved information inside Slurm regarding what happened.
+ +
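As a minimal sketch of such a job script (the job name, output filename, resource values, and partition below are illustrative assumptions; consult Alpine’s documentation for values appropriate to your work):

```bash
#!/bin/bash
# example_job.sh - a minimal Slurm job script
#SBATCH --job-name=example-job
#SBATCH --partition=amilan            # an Alpine partition (confirm for your work)
#SBATCH --time=00:05:00               # requested wall time
#SBATCH --ntasks=1                    # number of CPU cores
#SBATCH --output=example-job.%j.out   # %j is replaced with the job ID

# procedures to run as part of the job
echo "Hello from $(hostname)"
```

A script like this could then be submitted with sbatch example_job.sh and monitored with squeue --user=$USER.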

How do I store data on Alpine?

+ +

+ +

Data used or produced by your jobs on Alpine may reside in a number of different data storage locations. Be sure to follow the Acceptable data storage and use policies of Alpine, avoiding the use of certain sensitive information and other items. These locations may be distinguished in two ways:

+ +
  1. Alpine local storage (sometimes temporary): Alpine provides a number of temporary data storage locations for accomplishing your work. ⚠️ Note: some of these locations may be periodically purged and are not a suitable location for long-term data hosting (see here for more information)! Storage locations available (see this link for full descriptions):
     • Home filesystem: 2 GB of backed up space under /home/$USER (where $USER is your RMACC or Alpine username).
     • Projects filesystem: 250 GB of backed up space under /projects/$USER (where $USER is your RMACC or Alpine username).
     • Scratch filesystem: 10 TB (10,240 GB) of space which is not backed up under /scratch/alpine/$USER (where $USER is your RMACC or Alpine username).
  2. External / remote storage: Users are encouraged to explore external data storage options for long-term hosting.
+ +

How do I send or receive data on Alpine?

+ +

+ +

Diagram showing external data storage being used to send or receive data on Alpine local storage.

+ +

Data may be sent to or gathered from Alpine using a number of different methods. These may vary contingent on the external data storage being referenced, the code involved, or your group’s available resources. Please reference the following documentation from the University of Colorado Boulder’s Research Computing regarding data transfers: The Compute Environment - Data Transfer. Please note: due to the authentication configuration of Alpine, many local or SSH-key based methods are not available for CU Anschutz users. As a result, Globus represents one of the best options available (see 3. 📂 Transfer data results below). While the Globus tutorial in this document describes how you can download data from Alpine to your computer, note that you can also use Globus to transfer data to Alpine from your computer.

+ +

Implementation

+ +

+ +

Diagram showing how an example project repository may be used within Alpine through primary steps and processing workflow.

+ +

Use the following steps to understand how Alpine may be used with an example project repository to run example Python code.

+ +

0. 🔑 Gain Alpine access

+ +

First you will need to gain access to Alpine. +This access is provided to members of the University of Colorado Anschutz through RMACC and is separate from other credentials which may be provided by default in your role. +Please see the following guide from the University of Colorado Boulder’s Research Computing covering requesting access and generally how this works for members of the University of Colorado Anschutz.

+ + + +

1. 🛠️ Prepare code on Alpine

+ +
[username@xsede.org@login-ciX ~]$ cd /projects/$USER
+[username@xsede.org@login-ciX username@xsede.org]$ git clone https://github.com/CU-DBMI/example-hpc-alpine-python
+Cloning into 'example-hpc-alpine-python'...
+... git output ...
+[username@xsede.org@login-ciX username@xsede.org]$ ls -l example-hpc-alpine-python
+... ls output ...
+
+ +

An example of what this preparation section might look like in your Alpine terminal session.

+ +

Next we will prepare our code within Alpine. +We do this to balance the fact that we may develop and source control code outside of Alpine. +In the case of this example work, we assume git as an interface for GitHub as the source control host.

+ +

Below you’ll find the general steps associated with this process.

+ +
  1. Login to the Alpine command line (reference this guide).
  2. Change directory into the Projects filesystem (generally we’ll assume processed data produced by this code are large enough to warrant the need for additional space): cd /projects/$USER
  3. Use git (built into Alpine by default) commands to clone this repo: git clone https://github.com/CU-DBMI/example-hpc-alpine-python
  4. Verify the contents were received as desired (this should show the contents of an example project repository): ls -l example-hpc-alpine-python
+ + + +

+ +
+ + +
+ +

What if I need to authenticate with GitHub?

+ +

There are times when you may need to authenticate with GitHub in order to accomplish your work. From a GitHub perspective, you will want to use either GitHub Personal Access Tokens (PAT) (recommended by GitHub) or SSH keys associated with the git client on Alpine. Note: if you are prompted for a username and password from git when accessing a GitHub resource, the password now expects a token such as a PAT instead of your account password (reference). See the following guide from GitHub for more information on how authentication through git to GitHub works:

+ + + +
+
+ +

2. ⚙️ Implement code on Alpine

+ +
[username@xsede.org@login-ciX ~]$ sbatch --export=CSV_FILEPATH="/projects/$USER/example_data.csv" example-hpc-alpine-python/run_script.sh
+[username@xsede.org@login-ciX username@xsede.org]$ tail -f example-hpc-alpine-python.out
+... tail output (ctrl/cmd + c to cancel) ...
+[username@xsede.org@login-ciX username@xsede.org]$ head -n 2 example_data.csv
+... data output ...
+
+ +

An example of what this implementation section might look like in your Alpine terminal session.

+ +

After our code is available on Alpine we’re ready to run it using Slurm and related resources. +We use Anaconda to build a Python environment with specified packages for reproducibility. +The main goal of the Python code related to this work is to create a CSV file with random data at a specified location. +We’ll use Slurm’s sbatch command, which submits batch scripts to Slurm using various options.

+ +
  1. Use the sbatch command with exported variable CSV_FILEPATH: sbatch --export=CSV_FILEPATH="/projects/$USER/example_data.csv" example-hpc-alpine-python/run_script.sh
  2. After a short moment, use the tail command to observe the log file created by Slurm for this sbatch submission. This file can help you understand where things are at and if anything went wrong: tail -f example-hpc-alpine-python.out
  3. Once you see that the work has completed from the log file, take a look at the top 2 lines of the data file using the head command to verify the data arrived as expected (column names with random values): head -n 2 example_data.csv
+ +

3. 📂 Transfer data results

+ +

+ +

Diagram showing how example_data.csv may be transferred from Alpine to a local machine using Globus solutions.

+ +

Now that the example data output from the Slurm work is available, we need to transfer that data to a local system for further use. In this example we’ll use Globus as a data transfer method from Alpine to our local machine. Please note: always be sure to check data privacy and policies which may change the methods or storage locations you may use for your data!

+ +
  1. Globus local machine configuration
     1. Install Globus Connect Personal on your local machine.
     2. During installation, you will be prompted to login to Globus. Use your ACCESS credentials to login.
     3. During installation login, note the label you provide to Globus. This will be used later, referenced as the “Globus Connect Personal label”.
     4. Ensure you add and (importantly:) provide write access to a local directory via Globus Connect Personal - Preferences - Access where you’d like the data to be received from Alpine to your local machine.
  2. Globus web interface
     1. Use your ACCESS credentials to login to the Globus web interface.
     2. Configure File Manager left side (source selection):
        1. Within the Globus web interface on the File Manager tab, use the Collection input box to search or select “CU Boulder Research Computing ACCESS”.
        2. Within the Globus web interface on the File Manager tab, use the Path input box to enter: /projects/your_username_here/ (replacing “your_username_here” with your username from Alpine, including the “@” symbol if it applies).
     3. Configure File Manager right side (destination selection):
        1. Within the Globus web interface on the File Manager tab, use the Collection input box to search or select the Globus Connect Personal label you provided in earlier steps.
        2. Within the Globus web interface on the File Manager tab, use the Path input box to enter the local path which you made accessible in earlier steps.
     4. Begin Globus transfer:
        1. Within the Globus web interface on the File Manager tab on the left side (source selection), check the box next to the file example_data.csv.
        2. Within the Globus web interface on the File Manager tab on the left side (source selection), click the “Start ▶️” button to begin the transfer from Alpine to your local directory.
        3. After clicking the “Start ▶️” button, you may see a message in the top right with the message “Transfer request submitted successfully”. You can click the link to view the details associated with the transfer.
        4. After a short period, the file will be transferred and you should be able to verify the contents on your local machine.
+ +

Further References

+ + +
+ + + + + +
+ + + +
+ + + Previous post
+ + Tip of the Week: Automate Software Workflows with GitHub Actions + + +
+ + + Next post
+ + Tip of the Week: Python Packaging as Publishing + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2023/09/05/Python-Packaging-as-Publishing.html b/preview/pr-36/2023/09/05/Python-Packaging-as-Publishing.html new file mode 100644 index 0000000000..70836a5044 --- /dev/null +++ b/preview/pr-36/2023/09/05/Python-Packaging-as-Publishing.html @@ -0,0 +1,1113 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Tip of the Week: Python Packaging as Publishing | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Tip of the Week: Python Packaging as Publishing

+ + + + + + + + +
+ + + + + +
+ + + +

Tip of the Week: Python Packaging as Publishing

+ +
+ + +
+ +

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ + + +

Python packaging is the craft of preparing for and reaching distribution of your Python work to wider audiences. Following conventions for packaging helps your software work become more understandable, trustworthy, and connected (to others and their work). Taking advantage of common packaging practices also strengthens our collective superpower: collaboration. This post will cover preparation aspects of packaging, readying software work for wider distribution.

+ + + +

TLDR (too long, didn’t read);

+ +

Use Pythonic packaging tools and techniques to help avoid code decay, prevent unwanted code smells, and increase your development velocity. Increase understanding with unsurprising directory structures like those exhibited in pypa/sampleproject or scientific-python/cookie. Enhance trust by being authentic on source control systems like GitHub (by customizing your profile), staying up to date with the latest supported versions of Python, and using security linting tools like PyCQA/bandit through visible + automated GitHub Actions ✅ checks. Connect your projects to others using CITATION.cff files, CONTRIBUTING.md files, and environment + packaging tools like poetry to help others reproduce the same results from your code.

+ +

Why practice packaging?

+ +
+ + How are a page with some text and a book different? + + +
+ How are a page with some text and a book different? + +
+ +
+ +

The practice of Python packaging is similar to that of publishing a book. Consider how a page with some text is different from a book. How and why are these things different?

+ + + +
+ + Code undergoing packaging to achieve understanding, trust, and connection for an audience. + + +
+ Code undergoing packaging to achieve understanding, trust, and connection for an audience. + +
+ +
+ +

These can be thought of as metaphors when it comes to packaging in Python. Books have a smell which sometimes comes from how they were stored, treated, or maintained. While there are pleasant book smells, they might also smell soggy from being left in the rain or stored without maintenance for too long. Just like books, software can sometimes have negative code smells indicating a lack of care or a less sustainable condition. Following good packaging practices helps to avoid unwanted code smells while increasing development velocity, maintainability of software through understandability, trustworthiness of the content, and connection to other projects.

+ +
+ + +
+ +

Note: these techniques can also work just as well for inner source collaboration (private or proprietary development within organizations)! Don’t hesitate to use these on projects which may not be public facing in order to make development and maintenance easier (if only for you).

+ +
+
+ +
+ + +
+ +

“Wait, what are Python packages?”

+ +
my_package/
+│   __init__.py
+│   module_a.py
+│   module_b.py
+
+ +

A Python package is a collection of modules (.py files) that usually include an “initialization file” __init__.py. This post will cover the craft of packaging which can include one or many packages.

+ +
+
+ +

Understanding: common directory structures

+ +
project_directory
+├── README.md
+├── LICENSE.txt
+├── pyproject.toml
+├── docs
+│   └── source
+│       └── index.md
+├── src
+│   └── package_name
+│       └── __init__.py
+│       └── module_a.py
+└── tests
+    └── __init__.py
+    └── test_module_a.py
+
+ +

Python packaging today generally assumes a specific directory design. Following this convention improves the understanding of your code. We’ll cover each of these below.

+ +

Project root files

+ +
project_directory
+├── README.md
+├── LICENSE.txt
+├── pyproject.toml
+│ ...
+
+ + + +

Project sub-directories

+ +
project_directory
+│ ...
+├── docs
+│   └── source
+│       └── index.md
+├── src
+│   └── package_name
+│       └── __init__.py
+│       └── module_a.py
+└── tests
+    └── __init__.py
+    └── test_module_a.py
+
+ + + +

Common directory structure examples

+ +

The Python directory structure described above can be witnessed in the wild from the following resources. These can serve as a great resource for starting or adjusting your own work.

+ + + +

Trust: building audience confidence

+ +
+ + How much does your audience trust your work?. + + +
+ How much does your audience trust your work?. + +
+ +
+ +

Building an understandable body of content helps tremendously with audience trust. What else can we do to enhance project trust? The following elements can help improve an audience’s trust in packaged Python work.

+ +

Source control authenticity

+ +
+ + Comparing the difference between a generic or anonymous user and one with greater authenticity. + + +
+ Comparing the difference between a generic or anonymous user and one with greater authenticity. + +
+ +
+ +

Be authentic! Fill out your profile to help your audience know the author and why you do what you do. See here for GitHub’s documentation on filling out your profile. Doing this may seem irrelevant but can go a long way to making technical work more relatable.

+ + + +

Staying up to date with supported Python releases

+ +
+ + Major Python releases and their support status. + + +
+ Major Python releases and their support status. + +
+ +
+ +

Use Python versions which are supported (this changes over time). Python versions which are end-of-life may be difficult to support and are a sign of code decay for projects. Specify the version of Python which is compatible with your project by using environment specifications such as pyproject.toml files and related packaging tools (more on this below).

+ + + +

Security linting and visible checks with GitHub Actions

+ +
+ + Make an effort to inspect your package for known security issues. + + +
+ Make an effort to inspect your package for known security issues. + +
+ +
+ +

Use security vulnerability linters to help prevent undesirable or risky processing for your audience. Doing this is both practical (helping to avoid issues) and conveys that you care about those using your package!

+ + + +
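For example, bandit can be run from a shell (a minimal sketch which assumes your package code lives under a src directory):

```bash
# install the bandit security linter
pip install bandit

# recursively scan the package source for known security issues
bandit -r src/
```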
+ + The green checkmark from successful GitHub Actions runs can offer a sense of reassurance to your audience. + + +
+ The green checkmark from successful GitHub Actions runs can offer a sense of reassurance to your audience. + +
+ +
+ +

Combining GitHub Actions with security linters and tests from your software validation suite can add an observable ✅ for your project. This provides the audience with a sense that you’re transparently testing and sharing results of those tests.

+ + + +
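One way this might look as a workflow file (a sketch; the filename, trigger, and the choice of bandit and pytest as checks are illustrative assumptions):

```yaml
# .github/workflows/checks.yml (illustrative filename)
name: checks
on:
  push:
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      # run a security linter over the package source
      - run: pip install bandit
      - run: bandit -r src/
      # run the software validation suite
      - run: pip install pytest
      - run: pytest
```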

Connection: personal and inter-package relationships

+ +
+ + How does your package connect with other work and people? + + +
+ How does your package connect with other work and people? + +
+ +
+ +

Understandability and trust set the stage for your project’s connection to other people and projects. What can we do to facilitate connection with our project? Use the following techniques to help enhance your project’s connection to others and their work.

+ +

Acknowledging authors and referenced work with CITATION.cff

+ +
+ + figure image + + +
+ +

Add a CITATION.cff file to your project root in order to describe project relationships and acknowledgements in a standardized way. The CFF format is also GitHub compatible, making it easier to cite your project.

+ + + +
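A minimal CITATION.cff might look like the following sketch (all names, identifiers, and URLs below are placeholders):

```yaml
# CITATION.cff
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "package-name"
type: software
version: 0.1.0
authors:
  - family-names: "Lastname"
    given-names: "Firstname"
    orcid: "https://orcid.org/0000-0000-0000-0000"
repository-code: "https://github.com/username/package-name"
```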

Reaching collaborators using CONTRIBUTING.md

+ +
+ + CONTRIBUTING.md documents can help you collaborate with others. + + +
+ CONTRIBUTING.md documents can help you collaborate with others. + +
+ +
+ +

Provide a CONTRIBUTING.md file in your project root to make clear support details, development guidance, the code of conduct, and overall documentation surrounding how the project is governed.

+ + + +

Environment management reproducibility as connected project reality

+ +
+ + Environment and packaging managers can help you connect with your audience. + + +
+ Environment and packaging managers can help you connect with your audience. + +
+ +
+ +

Code without an environment specification is difficult to run in a consistent way. This can lead to “works on my machine” scenarios where different things happen for different people, reducing the chance that people can connect with a shared reality for how your code should be used.

+ +
+

“But why do we have to switch the way we do things?” +We’ve always been switching approaches (software approaches evolve over time)! A brief history of Python environment and packaging tooling:

+ +
  1. distutils, easy_install + setup.py (primarily used during the 1990’s - early 2000’s)
  2. pip, setup.py + requirements.txt (primarily used during the late 2000’s - early 2010’s)
  3. poetry + pyproject.toml (began use around the late 2010’s - ongoing)
+
+ +

Using Python poetry for environment and packaging management

+ +
+ + figure image + + +
+ +

Poetry is one Pythonic environment and packaging manager which can help increase reproducibility using pyproject.toml files. It’s one of many alternatives, such as hatch and pipenv.

+ +
+poetry directory structure template use
+ +
user@machine % poetry new --name=package_name --src .
+Created package package_name in .
+
+user@machine % tree .
+.
+├── README.md
+├── pyproject.toml
+├── src
+│   └── package_name
+│       └── __init__.py
+└── tests
+    └── __init__.py
+
+ +

After installation, Poetry gives us the ability to initialize a directory structure similar to what we presented earlier by using the poetry new ... command. If you’d like a more interactive version of the same, use the poetry init command to fill out various sections of your project with detailed information.

+ +
+poetry format for project pyproject.toml +
+ +
# pyproject.toml
+[tool.poetry]
+name = "package-name"
+version = "0.1.0"
+description = ""
+authors = ["username <email@address>"]
+readme = "README.md"
+packages = [{include = "package_name", from = "src"}]
+
+[tool.poetry.dependencies]
+python = "^3.9"
+
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
+
+ +

Using the poetry new ... command also initializes the content of our pyproject.toml file with opinionated details (following the recommendation from earlier in the article regarding declared Python version specification).

+ +
+poetry dependency management
+ +
user@machine % poetry add pandas
+
+Creating virtualenv package-name-1STl06GY-py3.9 in /pypoetry/virtualenvs
+Using version ^2.1.0 for pandas
+
+...
+
+Writing lock file
+
+ +

We can add dependencies directly using the poetry add ... command. This command also provides the possibility of using a group flag (for example poetry add pytest --group testing) to help organize and distinguish multiple sets of dependencies.

+ + + +
Running Python from the context of poetry environments
+ +
% poetry run python -c "import pandas; print(pandas.__version__)"
+
+2.1.0
+
+ +

We can invoke the virtual environment directly using the poetry run ... command.

+ + + +
Building source code with poetry +
+ +
% pip install git+https://github.com/project/package_name
+
+ +

Even if we don’t reach wider distribution on PyPI or elsewhere, source code managed by pyproject.toml and poetry can be used for “manual” distribution (with reproducible results) from GitHub repositories. When we’re ready to distribute pre-built packages on other networks we can also use the following:

+ +
% poetry build
+
+Building package-name (0.1.0)
+  - Building sdist
+  - Built package_name-0.1.0.tar.gz
+  - Building wheel
+  - Built package_name-0.1.0-py3-none-any.whl
+
+ +

Poetry readies source code (sdist) and pre-built (wheel) versions of our code for distribution platforms like PyPI by using the poetry build ... command. We’ll cover more on these files and distribution steps in a later post!

+
+ + + + + +
+ + + +
+ + + Previous post
+ + Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster + + +
+ + + Next post
+ + Tip of the Week: Data Quality Validation through Software Testing Techniques + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2023/10/04/Data-Quality-Validation.html b/preview/pr-36/2023/10/04/Data-Quality-Validation.html new file mode 100644 index 0000000000..63bed626a9 --- /dev/null +++ b/preview/pr-36/2023/10/04/Data-Quality-Validation.html @@ -0,0 +1,900 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Tip of the Week: Data Quality Validation through Software Testing Techniques | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Tip of the Week: Data Quality Validation through Software Testing Techniques

+ + + + + + + + +
+ + + + + +
+ + + +

Tip of the Week: Data Quality Validation through Software Testing Techniques

+ +
+ + +
+ +

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ +

TLDR (too long, didn’t read);

+ +

Implement data quality validation through software testing approaches which leverage ideas surrounding Hoare triples and Design by contract (DbC). Balance reusability through component-based data testing with tools like Great Expectations or Assertr. For greater specificity in your data testing, use database schema-like verification through Pandera or a JSON Schema validator. When possible, practice shift-left testing on data sources through the concept of “database(s) as code” via tools like Data Version Control (DVC) and Flyway.

+ +

Introduction

+ +

+ +

Diagram showing input, in-process data, and output data as a workflow.

+ + +

Data oriented software development can benefit from a specialized focus on varying aspects of data quality validation. We can use software testing techniques to validate certain qualities of the data in order to meet a declarative standard (where one doesn’t need to guess or rediscover known issues). These come in a number of forms and generally follow existing software testing concepts which we’ll expand upon below. This article will cover a few tools which leverage these techniques for addressing data quality validation testing.

+

Data Quality Testing Concepts

+ +

Hoare Triple

+ +

+ +

One concept we’ll use to present these ideas is Hoare logic, which is a system for reasoning on software correctness. Hoare logic includes the idea of a Hoare triple ($\{P\}\;C\;\{Q\}$) where $\{P\}$ is an assertion of precondition, $C$ is a command, and $\{Q\}$ is a postcondition assertion. Software development using data often entails (sometimes assumed) assertions of precondition from data sources, a transformation or command which changes the data, and a (sometimes assumed) assertion of postcondition in a data output or result.

+ +

Design by Contract

+ +

+ +

Data testing through design by contract over Hoare triple.

+ +

Hoare logic and software correctness help describe design by contract (DbC), a software approach involving the formal specification of “contracts” which help ensure we meet our intended goals. DbC helps describe how to create assertions when proceeding through Hoare triple states for data. These concepts provide a framework for thinking about the tools mentioned below.

+ +

Data Component Testing

+ +

+ +

Diagram showing data contracts as generalized and reusable “component” testing being checked through contracts and raising an error if they aren’t met or continuing operations if they are met.

+ +

We often need to verify a certain component’s surrounding data in order to ensure it meets minimum standards. +The word “component” is used here from the context of component-based software design to group together reusable, modular qualities of the data where sometimes we don’t know (or want) to specify granular aspects (such as schema, type, column name, etc). +These components often are implied by software which will eventually use the data, which can emit warnings or errors when they find the data does not meet these standards. +Oftentimes these components are contracts checking postconditions of earlier commands or procedures, ensuring the data we receive is accurate to our intention. +We can avoid these challenges by creating contracts for our data to verify the components of the result before it reaches later stages.

+ +

Examples of these data components might include:

+ + + +

Data Component Testing - Great Expectations

+ +
"""
+Example of using Great Expectations
+Referenced with modifications from: 
+https://docs.greatexpectations.io/docs/tutorials/quickstart/
+"""
+import great_expectations as gx
+
+# get gx DataContext
+# see: https://docs.greatexpectations.io/docs/terms/data_context
+context = gx.get_context()
+
+# set a context data source 
+# see: https://docs.greatexpectations.io/docs/terms/datasource
+validator = context.sources.pandas_default.read_csv(
+    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
+)
+
+# add and save expectations 
+# see: https://docs.greatexpectations.io/docs/terms/expectation
+validator.expect_column_values_to_not_be_null("pickup_datetime")
+validator.expect_column_values_to_be_between("passenger_count", auto=True)
+validator.save_expectation_suite()
+
+# checkpoint the context with the validator
+# see: https://docs.greatexpectations.io/docs/terms/checkpoint
+checkpoint = context.add_or_update_checkpoint(
+    name="my_quickstart_checkpoint",
+    validator=validator,
+)
+
+# gather checkpoint expectation results
+checkpoint_result = checkpoint.run()
+
+# show the checkpoint expectation results
+context.view_validation_result(checkpoint_result)
+
+ +

Example code leveraging Python package Great Expectations to perform various data component contract validation.

+ +

Great Expectations is a Python project which provides data contract testing features through the use of components called “expectations” about the data involved. These expectations act as a standardized way to define and validate components of the data in the same way across different datasets or projects. In addition to providing a mechanism for validating data contracts, Great Expectations also provides a way to view validation results, share expectations, and build data documentation. See the above example for a quick code reference of how these work.

+ +

Data Component Testing - Assertr

+ +
# Example using the Assertr package
+# referenced with modifications from:
+# https://docs.ropensci.org/assertr/articles/assertr.html
+library(dplyr)
+library(assertr)
+
+# set our.data to reference the mtcars dataset
+our.data <- mtcars
+
+# simulate an issue in the data for contract specification
+our.data$mpg[5] <- our.data$mpg[5] * -1
+
+# use verify to validate that column mpg >= 0
+our.data %>%
+  verify(mpg >= 0)
+
+# use assert to validate that column mpg is within the bounds of 0 to infinity
+our.data %>%
+  assert(within_bounds(0,Inf), mpg)
+
+ +

Example code leveraging R package Assertr to perform various data component contract validation.

+ +

Assertr is an R project which provides similar data component assertions in the form of verify, assert, and insist methods (see here for more documentation). +Using Assertr enables a similar but more lightweight functionality to that of Great Expectations. +See the above for an example of how to use it in your projects.

+ +

Data Schema Testing

+ +

+ +

Diagram showing data contracts as more granular specifications via “schema” testing being checked through contracts and raising an error if they aren’t met or continuing operations if they are met.

+ +

Sometimes we need greater specificity than what a data component can offer. +We can use data schema testing contracts in these cases. +The word “schema” here is used from the context of database schema, but oftentimes these specifications are suitable well beyond solely databases (including database-like formats like dataframes). +While reuse and modularity are more limited with these cases, they can be helpful for efforts where precision is valued or necessary to accomplish your goals. +It’s worth mentioning that data schema and component testing tools often have many overlaps (meaning you can interchangeably use them to accomplish both tasks).

+ +

Data Schema Testing - Pandera

+ +
"""
+Example of using the Pandera package
+referenced with modifications from:
+https://pandera.readthedocs.io/en/stable/try_pandera.html
+"""
+import pandas as pd
+import pandera as pa
+from pandera.typing import DataFrame, Series
+
+
+# define a schema
+class Schema(pa.DataFrameModel):
+    item: Series[str] = pa.Field(isin=["apple", "orange"], coerce=True)
+    price: Series[float] = pa.Field(gt=0, coerce=True)
+
+
+# simulate invalid dataframe
+invalid_data = pd.DataFrame.from_records(
+    [{"item": "applee", "price": 0.5}, 
+     {"item": "orange", "price": -1000}]
+)
+
+
+# set a decorator on a function which will
+# check the schema as a precondition
+@pa.check_types(lazy=True)
+def precondition_transform_data(data: DataFrame[Schema]):
+    print("here")
+    return data
+
+
+# precondition schema testing
+try:
+    precondition_transform_data(invalid_data)
+except pa.errors.SchemaErrors as schema_excs:
+    print(schema_excs)
+
+# inline or implied postcondition schema testing
+try:
+    Schema.validate(invalid_data)
+except pa.errors.SchemaError as schema_exc:
+    print(schema_exc)
+
+ +

Example code leveraging Python package Pandera to perform various data schema contract validation.

+ +

DataFrame-like libraries like Pandas can be verified using schema specification contracts through Pandera (see here for full DataFrame library support). Pandera helps define specific columns, column types, and also has some component-like features. It leverages a Pythonic class specification, similar to data classes and pydantic models, making it potentially easier to use if you already understand Python and DataFrame-like libraries. See the above example for a look into how Pandera may be used.

+ +

Data Schema Testing - JSON Schema

+ +
# Example of using the jsonvalidate R package.
+# Referenced with modifications from:
+# https://docs.ropensci.org/jsonvalidate/articles/jsonvalidate.html
+
+schema <- '{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "title": "Hello World JSON Schema",
+  "description": "An example",
+  "type": "object",
+  "properties": {
+    "hello": {
+      "description": "Provide a description of the property here",
+      "type": "string"
+    }
+  },
+  "required": [
+    "hello"
+  ]
+}'
+
+# create a schema contract for data
+validate <- jsonvalidate::json_validator(schema, engine = "ajv")
+
+# validate JSON using schema specification contract and invalid data
+validate("{}")
+
+# validate JSON using schema specification contract and valid data
+validate("{'hello':'world'}")
+
+ +

JSON Schema provides a vocabulary which can be used to validate schema contracts for JSON documents. There are several implementations of the vocabulary, including the Python package jsonschema and the R package jsonvalidate. Using these libraries allows you to define pre- or postcondition data schema contracts for your software work. See above for an R based example of using this vocabulary to perform data schema testing.

+ +
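For comparison, a similar check might look like the following in Python using the jsonschema package (a sketch mirroring the R example above):

```python
import jsonschema

# a schema specification contract for the data
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {"hello": {"type": "string"}},
    "required": ["hello"],
}

# invalid data raises a ValidationError
try:
    jsonschema.validate(instance={}, schema=schema)
except jsonschema.exceptions.ValidationError as exc:
    print(exc.message)

# valid data passes silently
jsonschema.validate(instance={"hello": "world"}, schema=schema)
```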

Shift-left Data Testing

+ +

+ +

Earlier portions of this article have covered primarily data validation of command side-effects and postconditions. +This is commonplace in development where data sources usually are provided without the ability to validate their precondition or definition. +Shift-left testing is a movement which focuses on validating earlier in the lifecycle if and when possible to avoid downstream issues which might occur.

+ +

Shift-left Data Testing - Data Version Control (DVC)

+ +

+ +

Data sources undergoing frequent changes become difficult to use because we oftentimes don’t know when the data is from or what version it might be. +This information is sometimes added in the form of filename additions or an update datetime column in a table. +Data Version Control (DVC) is one tool which is specially purposed to address this challenge through source control techniques. +Data managed by DVC allows software to be built in such a way that version preconditions are validated before reaching data transformations (commands) or postconditions.

+ +
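As a rough sketch of how DVC versioning might look from a shell (the file path below is illustrative):

```bash
# initialize DVC within an existing git repository
dvc init

# track a data file with DVC (creates data/example.csv.dvc metadata)
dvc add data/example.csv

# version the small DVC metadata files with git
git add data/example.csv.dvc data/.gitignore
git commit -m "Track example data with DVC"
```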

Shift-left Data Testing - Flyway

+ +

+ +

Database sources can leverage an idea nicknamed “database as code” (which builds on a similar idea about infrastructure as code) to help declare the schema and other elements of a database in the same way one would code. +These ideas apply to both databases and also more broadly through DVC mentioned above (among other tools) via the concept “data as code”. +Implementing this idea has several advantages from source versioning, visibility, and replicability. +One tool which implements these ideas is Flyway which can manage and implement SQL-based files as part of software data precondition validation. +A lightweight alternative to using Flyway is sometimes to include a SQL file which creates related database objects and becomes data documentation.

+
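Flyway conventionally discovers versioned migrations by filename. A minimal migration might look like the following sketch (the filename and table definition are illustrative assumptions):

```sql
-- V1__create_example_table.sql
-- a Flyway-style versioned migration declaring a database object as code
CREATE TABLE example_table (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```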
+ + + + + +
+ + + +
+ + + Previous post
+ + Tip of the Week: Python Packaging as Publishing + + +
+ + + Next post
+ + Tip of the Week: Codesgiving - Open-source Contribution Walkthrough + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2023/11/15/Codesgiving-Open-source-Contribution-Walkthrough.html b/preview/pr-36/2023/11/15/Codesgiving-Open-source-Contribution-Walkthrough.html new file mode 100644 index 0000000000..809a7f6a07 --- /dev/null +++ b/preview/pr-36/2023/11/15/Codesgiving-Open-source-Contribution-Walkthrough.html @@ -0,0 +1,1049 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Tip of the Week: Codesgiving - Open-source Contribution Walkthrough | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Tip of the Week: Codesgiving - Open-source Contribution Walkthrough

+ + + + + + + + +
+ + + + + +
+ + + +

Tip of the Week: Codesgiving - Open-source Contribution Walkthrough

+ +
+ + +
+ +

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ +

Introduction

+ +
+ + What good harvests from open-source have you experienced this year? + + +
+ What good harvests from open-source have you experienced this year? + +
+ +
+ + +

Thanksgiving is a holiday practiced in many countries which focuses on gratitude for good harvests of the preceding year. +In the United States, we celebrate Thanksgiving on the fourth Thursday of November each year often by eating meals we create together with others. +This post channels the spirit of Thanksgiving by giving our thanks through code as a “Codesgiving”, acknowledging and creating better software together. +

+ +

Giving Thanks to Open-source Harvests

+ +

+ +

Part of building software involves the use of code which others have built, maintained, and distributed for a wider audience. +Using other people’s work often comes in the form of open-source “harvesting” as we find solutions to software challenges we face. +Examples might include installing and depending upon Python packages from PyPI or R packages from CRAN within your software projects.

+ +
+

“Real generosity toward the future lies in giving all to the present.” +- Albert Camus

+
+ +

These open-source projects have internal costs which are sometimes invisible to those who consume them. Every software project entails an implied amount of software gardening time needed to impede decay, practice continuous improvement, and evolve the work. One way to actively share our thanks for the projects we depend on is by applying our time towards code contributions on them.

+ +

Many projects are in need of additional people’s thinking and development time. +Have you ever noticed something that needs to be fixed or desirable functionality in a project you use? +Consider adding your contributions to open-source!

+ +

All Contributions Matter

+ +

+ +

Contributing to open-source can come in many forms and contributions don’t need to be gigantic to make an impact. +Software often involves simplifying complexity. +Simplification requires many actions beyond solely writing code. +For example, a short walk outside, a conversation with someone, or a nap can sometimes help us with breakthroughs when it comes to development. +By the same token, open-source benefits greatly from communications on discussion boards, bug or feature descriptions, or other work that might not be strictly considered “engineering”.

+ +

An Open-source Contribution Approach

+ +

+ +

The troubleshooting process as a workflow involving looped checks for verifying an issue and validating the solution fixes an issue.

+ +

It can feel overwhelming to find a way to contribute to open-source. +Similar to other software methodology, modularizing your approach can help you progress without being overwhelmed. +Using a troubleshooting approach like the above can help you break down big challenges into bite-sized chunks. +Consider each step as a “module” or “section” which needs to be addressed sequentially.

+ +

Embrace a Learning Mindset

+ +
+

“Before you speak ask yourself if what you are going to say is true, is kind, is necessary, is helpful. If the answer is no, maybe what you are about to say should be left unsaid.” +- Bernard Meltzer

+
+ +

Open-source contributions almost always entail learning of some kind. +Many contributions happen solely in the form of code and text communications which are easily misinterpreted. +Assume positive intent and accept input from others while upholding your own ideas to share successful contributions together. +Prepare yourself by intentionally opening your mind to input from others, even if you’re sure you’re absolutely “right”.

+ +
+ + +
+ +

Before communicating, be sure to use Bernard Meltzer’s self-checks mentioned above.

+ +
  1. Is what I’m about to say true?
     • Have I taken time to verify the claims in a way others can replicate or understand?
  2. Is what I’m about to say kind?
     • Does my intention and communication channel kindness (and not cruelty)?
  3. Is what I’m about to say necessary?
     • Do my words and actions here enable or enhance progress towards a goal (would the outcome be achieved without them)?
  4. Is what I’m about to say helpful?
     • How does my communication increase the quality or sustainability of the project (or group)?
+
+ +

Setting Software Scheduling Expectations

+ + + + + + + +
+ + + +

Suggested ratio of time spent by type of work for an open-source contribution.

+ +
  1. 1/3 planning (~33%)
  2. 1/6 coding (~16%)
  3. 1/4 component and system testing (25%)
  4. 1/4 code review, revisions, and post-actions (25%)
+ +

This modified rule of thumb from The Mythical Man Month can assist with how you structure your time for an open-source contribution. +Notice the emphasis on planning and testing and keep these in mind as you progress (the actual programming time can be small if adequate time has been spent on planning). +Notably, the original time fractions are modified here with the final quarter of the time spent suggested as code review, revisions, and post-actions. +Planning for the time expense of the added code review and related elements assists with keeping a learning mindset throughout the process (instead of feeling like the review is a “tack-on” or “optional / supplementary”). +A good motto to keep in mind throughout this process is Festina lente, or “Make haste, slowly.” (take care to move thoughtfully and as slowly as necessary to do things correctly the first time).

+ +

Planning an Open-source Contribution

+ +

Has the Need Already Been Reported?

+ +

+ +

Be sure to check whether the bug or feature has already been reported somewhere! +In a way, this is a practice of “Don’t repeat yourself” (DRY) where we attempt to avoid repeating the same block of code (in this case, the “code” can be understood as natural language). +For example, you can look on GitHub Issues or GitHub Discussions with a search query matching the rough idea of what you’re thinking about. +You can also use the GitHub search bar to automatically search multiple areas (including Issues, Discussions, Pull Requests, etc.) when you enter a query from the repository homepage. +If it has been reported already, take a look to see if someone has made a code contribution related to the work already.

+ +

An open discussion or report of the need doesn’t guarantee someone’s already working on a solution. +If there aren’t yet any code contributions and it doesn’t look like anyone is working on one, consider volunteering to take a further look into the solution and be sure to acknowledge any existing discussions. +If you’re unsure, it’s always kind to mention your interest in the report and ask for more information.

+ +

Is the Need a Bug or Feature?

+ + + + +
+ + + +
+ +

One way to help solidify your thinking and the approach is to consider whether what you’re proposing is a bug or a feature. +A software bug is considered something which is broken or malfunctioning. +A software feature is generally considered new functionality or a different way of doing things than what exists today. +There’s often overlap between these, and sometimes they can inspire branching needs, but individually they usually are more of one than the other. +If you can’t decide whether your need is a bug or a feature, consider breaking it down into smaller sub-components so they can be more of one or the other. +Following this strategy will help you communicate the potential for contribution and also clarify the development process (for example, a critical bug might be prioritized differently than a nice-to-have new feature).

+ +

Reporting the Need for Change

+ +
# Using `function_x` with `library_y` causes `exception_z`
+
+## Summary
+
+As a `library_y` research software developer I want to use `function_x` 
+for my data so that I can share data for research outcomes.
+
+## Reproducing the error
+
+This error may be seen using Python v3.x on all major OS's using
+the following code snippet:
+...
+
+
+ +

An example of a user story issue report with imagined code example.

+ +

Open-source needs are often best reported through written stories captured within a bug or feature tracking system (such as GitHub Issues), ideally accompanied by example code or logs. +One common template for reporting issues is the “user story”. +A user story typically comes in the form: As a < type of user >, I want < some goal > so that < some reason >. (Mountain Goat Software: User Stories). +Alongside the story, it can help to add a snippet of code which exemplifies the problem, new functionality, or a potential adjacent / similar solution. +As a general principle, be as specific as you can without going overboard. +Include things like programming language version, operating system, and other system dependencies that might be related.

+ +

Once you have a good written description of the need, be sure to submit it where it can be seen by the relevant development community. +For GitHub-based work, this is usually a GitHub Issue, but can also entail discussion board posts to gather buy-in or consensus before proceeding. +In addition to the specifics outlined above, also recall the learning mindset and Bernard Meltzer’s self-checks, taking time to acknowledge especially the potential challenges and already attempted solutions associated with the description (conveying kindness throughout).

+ +

What Happens After You Submit a Bug or Feature Report?

+ +

+ +

When making open-source contributions, sometimes it can also help to mention that you’re interested in resolving the issue through a related pull request and review. +Oftentimes open-source projects welcome new contributors but may have specific requirements. +These requirements are usually spelled out within a CONTRIBUTING.md document found somewhere in the repository or the organization level documentation. +It’s also completely okay to let other contributors build solutions for the issue (like we mentioned before, all contributions matter, including the reporting of bugs or features themselves)!

+ +

Developing and Testing an Open-source Contribution

+ +

Creating a Development Workspace

+ +

+ +

Once you’re ready to develop a solution for the reported need, you’ll need a place in the open-source project to version your updates. +This work generally takes place through version control on focused branches which are named in a way that relates to the focus. +When working on GitHub, this work also commonly takes place on forked repository copies. +Using these methods helps isolate your changes from other work that takes place within the project. +It can also help you track your progress alongside related changes that might take place before you’re able to seek review or code merges.

+ +

Bug or Feature Verification with Test-driven Development

+ +
+ + +
+ +

One can use a test-driven development approach as numbered steps (Wikipedia).

+ +
+
1. Add or modify a test which checks for a bug fix or feature addition
2. Run all tests (expecting the newly added test content to fail)
3. Write a simple version of code which allows the tests to succeed
4. Verify that all tests now pass
5. Return to step 3, refactoring the code as needed
+
+ + +
+
+ +

If you decide to develop a solution for what you reported, one software strategy which can help you remain focused and objective is test-driven development. +Using this pattern sets a “cognitive milestone” for you as you develop a solution to what was reported. +Open-source projects can have many interesting components which could take time and be challenging to understand. +The addition of the test and related development will help keep you goal-oriented without getting lost in the “software forest” of a project.
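As a sketch of steps 1 and 2 above, a first failing test might look like the following (here `library_y` and `function_x` echo the hypothetical names from the earlier issue report, and the expected behavior is an illustrative assumption):

```python
# test_function_x.py (run with, for example, pytest)
# `library_y` and `function_x` are hypothetical placeholders
from library_y import function_x


def test_function_x_handles_empty_input():
    # written before the fix exists, so this test is expected to fail
    # the first time the full test suite is run (TDD step 2)
    assert function_x([]) == []
```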

+ +

Prefer Simple Over Complex Changes

+ +
+

… +Simple is better than complex. +Complex is better than complicated. +… +- PEP 20: The Zen of Python

+
+ +

Further channeling step 3 from the test-driven development steps above, prefer simple changes over more complex ones (recognizing that even the simplest version can take iteration and thought). +Some of the best solutions are often the most easily understood ones (where the code addition or changes seem obvious afterwards). +A “simplest version” of the code can often be refactored and completed more quickly than devising a “perfect” solution the first time. +Remember, you’ll very likely have the help of a code review before the code is merged (expect to learn more and add changes during review!).

+ +

It might be tempting to address more than one bug or feature at the same time. +Avoid feature creep as you build solutions - stay focused on the task at hand! +Take note of things you notice on your journey to address the reported needs. +These can become additional reported bugs or features which could be addressed later. +Staying focused with your development will save you time, keep your tests constrained, and (theoretically) help reduce the time and complexity of code review.

+ +

Developing a Solution

+ +

+ +

Once you have a test in place for the bug fix or feature addition it’s time to work towards developing a solution. +If you’ve taken time to accomplish the prior steps before this point you may already have a good idea about how to go about a solution. +If not, spend some time investigating the technical aspects of a solution, optionally adding this information to the report or discussion content for further review before development. +Use timeboxing techniques to help make sure the time you spend in development is no more than necessary.

+ +

Code Review, Revisions, and Post-actions

+ +

Pull Requests and Code Review

+ +

When your code and new test(s) are in a good spot it’s time to ask for a code review. +It might feel tempting to perfect the code. +Instead, consider whether the code is “good enough” and would benefit from someone else providing feedback. +Code review takes advantage of a strength of our species: collaborative & multi-perspectival thinking. +Leverage this in your open-source experience by seeking feedback when things feel “good enough”.

+ +
+ + + +

Demonstrating Pareto Principle “vital few” through a small number of changes to achieve 80% of the value associated with the needs.

+ +

One way to understand “good enough” is to assess whether you have reached what the Pareto Principle terms the “vital few” causes. +The Pareto Principle states that roughly 80% of consequences come from 20% of causes (the “vital few”). +What are the 20% of changes (for example, as commits) which are required to achieve 80% of the desired intent for your open-source contribution? +When you reach that 20% of changes, consider opening a pull request to gather more insight about whether those changes will suffice and how the remaining effort might be spent.

+ +

As you go through the process of opening a pull request, be sure to follow the project’s CONTRIBUTING.md documentation; each one can vary. +When working on GitHub-based projects, you’ll need to open a pull request on the correct branch (usually upstream main). +If you used a GitHub issue to report the need, mention the issue number in the pull request description (for example #123, where the issue link would look like: https://github.com/orgname/reponame/issues/123) to help link the work to the reported need. +This will cause the pull request to show up within the issue and automatically create a link to the issue from the pull request.

+ +

Code Revisions

+ +
+

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” +- Antoine de Saint-Exupery

+
+ +

You may be asked to update your code based on automated code quality checks or reviewer requests. +Treat these with care; embrace learning and remember that this step can take 25% of the total time for the contribution. +When working on GitHub forks or branches, you can make additional commits directly on the development branch which was used for the pull request. +If your reviewers requested changes, re-request their review once changes have been made to help let them know the code is ready for another look.

+ +

Post-actions and Tidying Up Afterwards

+ +

+ +

Once the code has been accepted by the reviewers and has passed any automated testing suite(s), the content is ready to be merged. +Oftentimes this work is completed by core maintainers of the project. +After the code is merged, it’s usually a good idea to clean up your workspace by deleting your development branch and syncing with the upstream repository. +While it’s up to core maintainers to decide on report closure, typically the reported need content can be closed and might benefit from a comment describing the fix. +Many of these steps are considered common courtesy but also, importantly, assist in setting you up for your next contributions!

+ +

Concluding Thoughts

+ +

Hopefully the above helps you understand the open-source contribution process better. +As stated earlier, every little part helps! +Best wishes on your open-source journey and happy Codesgiving!

+ +

References

+ + +
+ + + + + +
+ + + +
+ + + Previous post
+ + Tip of the Week: Data Quality Validation through Software Testing Techniques + + +
+ + + Next post
+ + Python Memory Management and Troubleshooting + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2024/01/22/Python-Memory-Management-and-Troubleshooting.html b/preview/pr-36/2024/01/22/Python-Memory-Management-and-Troubleshooting.html new file mode 100644 index 0000000000..7cdc376a4a --- /dev/null +++ b/preview/pr-36/2024/01/22/Python-Memory-Management-and-Troubleshooting.html @@ -0,0 +1,1026 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Python Memory Management and Troubleshooting | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Python Memory Management and Troubleshooting

+ + + + + + + + +
+ + + + + +
+ + + +

Python Memory Management and Troubleshooting

+ +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ + +

Have you ever run Python code only to find it taking forever to complete or sometimes abruptly ending with an error like: 123456 Killed or killed (program exited with code: 137)? +You may have experienced memory resource or management challenges associated with these scenarios. +This post will cover some computer memory definitions, how Python makes use of computer memory, and share some tools which may help with these types of challenges.

+ +

What is Software?

+ + + + +

+ +

Computer software includes programs, documentation, and other data maintained on computer data storage.

+ +

Computer software is the collection of programs and data which are used to accomplish specific tasks on a computer. +“A computer program is a sequence or set of instructions in a programming language for a computer to execute. It is one component of software, which also includes documentation and other intangible components.” (Wikipedia: Computer program). +Computer programs in their human-readable form are stored as source code. +Source code is often maintained on computer data storage.

+ +

What is Memory?

+ +

Computer Memory

+ +

+ +

Computer memory is a type of computer resource available for use by processes on a computer.

+ +

Computer memory, also sometimes known as “RAM”, “random-access memory”, or “dynamic memory”, is a type of resource used by computer software on a computer. +“Computer memory stores information, such as data and programs for immediate use in the computer. … Main memory operates at a high speed compared to non-memory storage which is slower but less expensive and oftentimes higher in capacity.” (Wikipedia: Computer memory). +When we execute a computer program it becomes a process (or sometimes many processes). +Processes are loaded into computer memory to follow the instructions and other data provided from their related computer programs.

+ +
+ + +
+ +

The word “speed” in the above context is sometimes used to describe the delay before an operation on a computer completes (also known as latency). +See the following on [Computer] Latency Numbers Everyone Should Know to better understand relative computer operation speeds.
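A process can observe some of its own memory resource usage directly. The following is a minimal sketch using Python’s built-in resource module (an assumption here: it is available on Unix-like systems only, and the units of ru_maxrss vary by platform):

```python
import resource  # available on Unix-like systems

# query resource usage for the current ("self") process
usage = resource.getrusage(resource.RUSAGE_SELF)

# peak ("high-water mark") memory usage of this process
# (kilobytes on Linux, bytes on macOS)
print(f"Peak memory usage: {usage.ru_maxrss}")
```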

+ + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
| Process memory segment | Purpose |
| --- | --- |
| Stack | Contains information about sequences of program instructions as functions or subroutines. |
| Heap | Area where memory for variables may be dynamically used. |
| Initialized data | Includes global and static variables which are explicitly initialized. |
| Uninitialized data | Includes global and static variables which are not explicitly initialized. |
| Text | Comprises program instructions for the process. |
+ +

Process memory is divided into segments which have specific purposes (The Linux Programming Interface by Michael Kerrisk).

+ +

Memory for a process is further divided into parts which are typically called segments. +Each process memory segment has a specific purpose and way of organizing things. +For the purposes of this content we’ll focus on two of these segments: the stack and the heap. +The stack (sometimes also known as the “call stack”) includes information about sequences of program instructions packaged as units called “functions” or “subroutines”. +The stack also typically stores function local variables, arguments, and return values. +The heap is an area where variables for a program may be dynamically stored. +The stack can be thought of as a “roadmap” for what the program will accomplish (including the location of things it will need to do that work). +The heap can be imagined as a “warehouse” used to store (or remove) things referenced by the stack “roadmap”. +Please see The Linux Programming Interface by Michael Kerrisk, Chapter 6.3: Memory Layout of a Process for more information about processes.
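As a rough Python-flavored illustration of these two segments (a simplification for intuition rather than exact CPython internals):

```python
# calling a function pushes a frame onto the call stack;
# the objects its local variables refer to live on the heap
def build_greeting(name):
    # "message" is a local variable within this function's stack frame,
    # while the string object it references is allocated on the heap
    message = "Hello, " + name
    return message


print(build_greeting("world"))  # prints: Hello, world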

[Diagram: memory blocks. A.) All memory blocks available. B.) Some memory blocks in use. Practical analogy with boxes (📦): C.) You have limited boxes to hold things. D.) Two boxes are used, the other remains empty (ready for use).]

Memory blocks may be free or used at various times. They can be thought of like reusable buckets to hold things.

+ +

The heap is often further organized through the use of “blocks”. +Memory blocks are chunks of memory of a certain byte or bit size (usually all the same size) (Wikipedia: Block (data storage)). +Memory blocks may be in use or free at different times. +If the heap is a process memory “warehouse” then blocks are like “boxes” inside the warehouse.

+ +

+ +

Process memory heaps help organize memory blocks on a computer for specific procedures. Heaps may have one or many memory pools.

+ +

Blocks may be organized in hierarchical layers to manage memory efficiently or towards a specific purpose. +Blocks may sometimes be organized into pools within the process memory heap segment. +Pools are areas of the heap used to efficiently manage blocks together in specific ways. +Each heap may have one or many pools (each with sets of blocks). +If the heap is a process memory “warehouse”, and blocks are like “boxes” inside the warehouse, pools are like “shelves” for organizing and moving those boxes within the warehouse.

+ +

Memory Allocator

+ +

+ +

Memory allocators help software reserve and free computer memory resources.

+ +

Memory management is a concept which helps enable the shared use of computer memory to avoid challenges such as memory overuse (where all memory is in use and never shared to other software). +Computer memory management often occurs through the use of a memory allocator which controls how computer memory resources are used for software. +Computer software is written to interact with memory allocators to use computer memory. +Memory allocators may be used manually (with specific directions provided on when and how to use memory resources) or automatically (with an algorithmic approach of some kind). +The memory allocator usually performs the following actions with memory (in addition to others):

- Allocate: reserve a portion of memory (for example, blocks) for use by a process.
- Deallocate: free previously reserved memory which is no longer needed so it may be reused.

Garbage Collection

+ +

+ +

Garbage collectors help free computer memory which is no longer referenced by software.

+ +

“Garbage collection (GC)” is used to describe a type of automated memory management. +GC is typically used to help reduce human error, avoid unintentional system failures, and decrease development time (through less memory-specific code). +“The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage.” (Wikipedia: Garbage collection (computer science)). +A garbage collector often works in tandem with a memory allocator to help control computer memory resource usage in software development.

+ +

How Does Python Interact with Computer Memory?

+ +

Python Overview

+ +

+ +

A Python interpreter executes Python code and manages memory for Python procedures.

+ +

Python is an interpreted “high-level” programming language (Python: What is Python?). +Interpreted languages are those which include an “interpreter” which helps execute code written in a particular way (Wikipedia: Interpreter (computing)). +High-level languages such as Python often remove the requirement for software developers to manually perform memory management (Wikipedia: High-level programming language).

+ +

Python code is executed by a commonly pre-packaged and downloaded binary called the Python interpreter. +The Python interpreter reads Python code and performs memory management as the code is executed. +The CPython interpreter is the most commonly used interpreter for Python, and is what’s used as a reference for other content here. +There are also other interpreters such as PyPy, Jython, and IronPython which all handle memory differently than the CPython interpreter.

+ +

Python’s Memory Manager

+ +

+ +

The Python memory manager helps manage memory in the heap for Python processes executed by the Python interpreter.

+ +

Memory is managed for Python software processes automatically (when unspecified) or manually (when specified) through the Python interpreter. +The Python memory manager is an abstraction which manages memory for Python software processes through the Python interpreter (Python: Memory Management). +From a high-level perspective, we assume variables and other operations written in Python will automatically allocate and deallocate memory through the Python interpreter when executed. +Python’s memory manager performs work through various memory allocators and a garbage collector (or as configured with customizations) within a private Python memory heap.

+ +

Python’s Memory Allocators

+ +

+ +

The Python memory manager by default will use pymalloc internally or malloc from the system to allocate computer memory resources.

+ +

The Python memory manager allocates memory for use through memory allocators. +Python may use one or many memory allocators depending on specifications in Python code and how the Python interpreter is configured (for example, see Python: Memory Management - Default Memory Allocators). +One way to understand Python memory allocators is through the following distinctions.

- pymalloc: the Python memory manager’s default internal allocator, generally used for small and short-lived objects (and the source of the “arena” organization described below).
- C dynamic memory allocation functions (for example, malloc and free from the system’s C standard library), used for other allocations and when pymalloc is disabled.

+ +

pymalloc makes use of arenas to further organize pools within a Python process memory heap.

+ +

It’s important to note that pymalloc adds additional abstractions to how memory is organized through the use of “arenas”. +These arenas are specific to pymalloc’s purposes. +pymalloc may be disabled through the use of a special environment variable called PYTHONMALLOC (for example, to use only C standard library dynamic memory allocation functions, as seen below). +This same environment variable may also be used with debug settings to help troubleshoot in-depth memory questions.
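As a minimal sketch of the above (where your_script.py is a hypothetical placeholder):

```python
import os
import subprocess
import sys

# run a (hypothetical) script with pymalloc disabled so that only
# C standard library dynamic memory allocation functions are used
subprocess.run(
    [sys.executable, "your_script.py"],
    env={**os.environ, "PYTHONMALLOC": "malloc"},
)

# CPython also includes a non-public helper which prints pymalloc
# arena, pool, and block statistics to stderr
sys._debugmallocstats()
```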

+ +

Additional Python Memory Allocators

+ +

+ +

Python code may stipulate the use of additional memory allocators, such as mimalloc and jemalloc outside of the default Python memory manager’s operation.

+ +

Python provides the capability of customizing memory allocation through the use of custom code or non-default packages. +See below for some notable examples of additional memory allocation possibilities.

- mimalloc: a general-purpose allocator originally created by Microsoft with a focus on performance.
- jemalloc: a general-purpose allocator which emphasizes avoiding memory fragmentation, originally developed for FreeBSD.

Python Reference Counting

| Processed line of code | Reference count |
| --- | --- |
| `a_string = "cornucopia"` | `a_string`: 1 |
| `reference_a_string = a_string` | `a_string`: 2 (because `a_string` is now referenced twice) |
| `del reference_a_string` | `a_string`: 1 (because the additional reference has been deleted) |

_Python reference counting at a simple level works through the use of object reference increments and decrements._
{:.center}

As computer memory is allocated to Python processes, the Python memory manager keeps track of allocated objects through the use of a [reference counter](https://en.wikipedia.org/wiki/Reference_counting).
In Python, we could label this an "object reference counter" because all data in Python is represented by objects ([Python: Data model](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types)).
"... CPython counts how many different places there are that have a reference to an object. Such a place could be another object, or a global (or static) C variable, or a local variable in some C function." ([Python Developer's Guide: Garbage collector design](https://devguide.python.org/internals/garbage-collector/))

### Python's Garbage Collection

_The Python garbage collector works as part of the Python memory manager to free memory which is no longer needed (based on reference count)._
{:.center}

Python by default uses an optional garbage collector to automatically deallocate garbage memory through the Python interpreter in CPython.
"When an object’s reference count becomes zero, the object is deallocated." ([Python Developer's Guide: Garbage collector design](https://devguide.python.org/internals/garbage-collector/))
Python's garbage collector focuses on collecting garbage created by `pymalloc`, C memory functions, as well as other memory allocators like `mimalloc` and `jemalloc`.

## Python Tools for Observing Memory Behavior

### Python Built-in Tools

```python
import gc
import sys

# set gc in debug mode for detecting memory leaks
gc.set_debug(gc.DEBUG_LEAK)

# create an int object
an_object = 1

# show the number of uncollectable references via COLLECTED
COLLECTED = gc.collect()
print(f"Uncollectable garbage references: {COLLECTED}")

# show the reference count for an object
print(f"Reference count of `an_object`: {sys.getrefcount(an_object)}")
```

The [`gc` module](https://docs.python.org/3/library/gc.html) provides an interface to the Python garbage collector.
In addition, the [`sys` module](https://docs.python.org/3/library/sys.html) provides many functions which provide information about references and other details about Python objects as they are executed through the interpreter.
These functions and other packages can help software developers observe memory behaviors within Python procedures.

### Python Package: Scalene
+ + Scalene provides a web interface to analyze memory, CPU, and GPU resource consumption in one spot alongside suggested areas of concern. + + +
+ Scalene provides a web interface to analyze memory, CPU, and GPU resource consumption in one spot alongside suggested areas of concern. + +
+ +
[Scalene](https://github.com/plasma-umass/scalene) is a Python package for analyzing memory, CPU, and GPU resource consumption.
It provides [a web interface](https://github.com/plasma-umass/scalene?tab=readme-ov-file#web-based-gui) to help visualize and understand how resources are consumed.
Scalene provides suggestions on which portions of your code to troubleshoot through the web interface.
Scalene can also be configured to work with [OpenAI](https://en.wikipedia.org/wiki/OpenAI) [LLM's](https://en.wikipedia.org/wiki/Large_language_model) by way of an [OpenAI API key provided by the user](https://github.com/plasma-umass/scalene?tab=readme-ov-file#ai-powered-optimization-suggestions).

### Python Package: Memray
+ + Memray provides the ability to create and view flamegraphs which show how memory was consumed as a procedure executed. + + +
+ Memray provides the ability to create and view flamegraphs which show how memory was consumed as a procedure executed. + +
+ +
[Memray](https://github.com/bloomberg/memray) is a Python package to track memory allocation within Python and compiled extension modules.
Memray provides a high-level way to investigate memory performance and adds visualizations such as [flamegraphs](https://www.brendangregg.com/flamegraphs.html) (which contextualize [stack traces](https://en.wikipedia.org/wiki/Stack_trace) and memory allocations in one spot).
Memray seeks to provide a way to overcome challenges with tracking and understanding Python and other memory allocators (such as C, C++, or Rust libraries used in tandem with a Python process).

## Concluding Thoughts

It's worth mentioning that this article covers only a small fraction of what memory is and how Python might make use of it.
Hopefully it clarifies the process and provides a way to get started with investigating memory within the software you work with.
Wishing you the very best in your software journey with memory!
+ + + + + +
+ + + +
+ + + Previous post
+ + Tip of the Week: Codesgiving - Open-source Contribution Walkthrough + + +
+ + + Next post
+ + Navigating Dependency Chaos with Lockfiles + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2024/02/20/Navigating-Dependency-Chaos-with-Lockfiles.html b/preview/pr-36/2024/02/20/Navigating-Dependency-Chaos-with-Lockfiles.html new file mode 100644 index 0000000000..dbd8565f95 --- /dev/null +++ b/preview/pr-36/2024/02/20/Navigating-Dependency-Chaos-with-Lockfiles.html @@ -0,0 +1,881 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Navigating Dependency Chaos with Lockfiles | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Navigating Dependency Chaos with Lockfiles

+ + + + + + + + +
+ + + + + +
+ + + +

Navigating Dependency Chaos with Lockfiles

+ +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ + +

Writing software often entails using code from other people to solve common challenges and take advantage of existing work. +External software used by a specific project can be called a “dependency” (the software “depends” on that external work to accomplish tasks). +Collections of software are oftentimes made available as “packages” through various platforms. +Package management for dependencies, the task of managing collections of dependencies for a specific project, is a specialized area of software development that can involve the use of unique tools and files. +This article will cover package dependency management through special files generally referred to as “lockfiles”. +

+ +

Why use dependencies?

+ +
+ + 'Reinvent the Wheel' comic by Randall Munroe, XKCD. + + +
+ ‘Reinvent the Wheel’ comic by Randall Munroe, XKCD. + +
+ +
+ +

There are various advantages to using packaged dependencies in your projects. +Using existing work this way practices a collective “don’t repeat yourself [or ourselves]” (DRY) among the global community of software developers, avoiding reinventing the wheel. +Using dependencies also allows us to make explicit decisions about the specific focus, or context, which the project will prioritize. +While it’s oftentimes easy to include and use dependencies in a project, they come with risks that are important to consider.

+ +

See below for a rough list of reasons why one might opt to use specific dependencies in a project:

+ +
1. Solutions which entail a lot of edge cases (particularly error-prone).
2. Solutions which need constant maintenance, i.e. “frequently moving targets”.
3. Solutions which require special domain knowledge or training to correctly implement.
+ +

Common dependencies which demonstrate these aspects are those which assist with datetimes, timezones, and time deltas.

+ +

The dependency wilderness

+ + + + +

+ +

Dependencies are often on their own unpredictable schedule outside of your project’s control.

+ +

Using existing software package dependencies helps conserve resources but comes with unique challenges related to unpredictability (such as when those dependencies are updated). +This unpredictability can sometimes result in what’s colloquially called “dependency hell” or “dependency chaos”, where for example multiple external dependencies conflict with one another and are unable to be automatically resolved (among other issues). +These challenges can be especially frustrating due to when they occur (often outside of our personal schedule awareness) and how long they can take to debug (finding fixes sometimes entails costly trial-and-error). +It can feel like walking through a forest at night without a flashlight, constantly tripping over roots or running into stumps and branches!

+ +

Illuminating the dependency thicket

+ +

+ +

Software dependency choices may be understood through careful consideration between the cost of internal overwhelming invention vs external dependency chaos.

+ +

Dependency chaos can sometimes lead to “not invented here syndrome” where there’s less trust in external-facing work outside of an individual or group of people. +When or if this happens it can be important to understand dependencies as a scale of choices between overwhelming invention and infinite dependency chaos. +For example, to accomplish a small project it may not be wise to create a brand new programming language (towards the extreme of overwhelming invention). +On the other hand, if we depended upon all existing work within a certain context the solution may not be specialized, efficient, or resourceful enough to meet the goals within a reasonable amount of time.

+ + + +
+mindmap
+  root((Project))
+    Data storage
+      File 1
+      Database 2
+    Data x processing
+      Package X
+      Package Y
+    Integration
+      Solution A
+      Platform B
+
+ +

Dependency awareness and opportunity can be grouped into concerns and documented as part of a literature review (seen here as a mind map).

+ +

It can be helpful to reconsider existing knowledge on a topic area through a formal or informal literature review (understanding that code within software is a type of literature) when thinking about the scale of decisions mentioned above. +Outlining existing work through a literature review can help with second-order thinking, where we might benefit from reflecting on dependency decision-making again after an initial (first-order) creative process. +Each potential dependency discovered through this process can be organized using separation of concerns (SoC) under specific concern labels, or a general set of information which affects related code. +Include dependencies within your project which will helpfully limit the code produced (or SoC sections), thereby reducing the overall number of concerns the project must maintain.

+ +

+ +

Bounded contexts along with shared or distinct components can be used to help limit the complexity of a project in helpful ways.

+ +

The concept of bounded context from domain-driven design can sometimes be used to help distinguish what is in or out of scope for a particular project as a way of reducing complexity. +Bounded context can be used as a way to draw abstract lines around a certain span of control in order to align available resources (like time and people) with the focus of the project. +It also can help promote loose coupling of software components in order to enable flexible design over time. +Without these considerations and the use of dependencies we might face “endless” software feature creep by continually adding new bounded contexts that are outside of our span of control (or resources).

+ +

Version constraints as dependency specification control

+ + + + + + + + + + + + + + + + + + + + + + +
| Version constraint | Description of the version constraint |
| --- | --- |
| `==2.1.0` | Exactly and only version 2.1.0 |
| `>=2.0.0` | Greater than or equal to version 2.0.0 |
| `>=2.0.0, <3.0.0` | Greater than or equal to version 2.0.0 and less than 3.0.0 |
| `>=2.0.0, <3.0.0, !=2.5.1` | Greater than or equal to version 2.0.0, less than 3.0.0, and anything that’s not exactly version 2.5.1 |
+ +

Version constraint specifications provide code-based descriptions for dependency versions within your project (Pythonic version specification examples above).

+ +

Many aspects of dependency chaos arise from the fact that dependencies are updated at various times. +We often want to make certain we use the most up-to-date version of a dependency because those updates may come with performance, corrective, security, or other benefits. +To accomplish this we can use what are sometimes called dependency “version range constraints” or “compliant version specifications” to provide some flexibility in how packages are installed for our projects. +Version ranges are usually preferred to help keep software projects updated and also allow for flexible dependency resolutions (for example, when a single dependency is required by multiple other dependencies). +These are often specific to the package management system and programming language being used. +See the Python Packaging Authority’s Version Specifiers section for an example of how these version constraints work.
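As a sketch of how such constraints behave, the third-party packaging library (used across much of the Python packaging ecosystem) can evaluate version specifiers like those in the table above:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# a constraint like the last row of the table above
constraint = SpecifierSet(">=2.0.0,<3.0.0,!=2.5.1")

print(Version("2.2.1") in constraint)  # prints: True
print(Version("2.5.1") in constraint)  # prints: False (explicitly excluded)
print(Version("3.1.0") in constraint)  # prints: False (too high)
```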

+ +

Many version specification constraints build upon ideas from semantic versioning (SemVer). +Generally, SemVer uses a dotted three-number syntax which includes a major, minor, and patch version separated by periods. +For example, SemVer 1.2.3 represents major version 1, minor version 2, patch 3. +Developers may use this type of specification to help differentiate the various releases of their software and help build user confidence about expected operations. +See the Semantic Versioning specification at https://semver.org/ for more information about how SemVer works.
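A small sketch of reading SemVer-style components, again using the packaging library (which calls the patch component “micro”):

```python
from packaging.version import Version

version = Version("1.2.3")

# major.minor.patch components
print(version.major, version.minor, version.micro)  # prints: 1 2 3

# comparisons are numeric rather than lexical
print(version < Version("1.10.0"))  # prints: True
```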

+ +

Version constraints can still be chaotic

+ +

+ +

Unintentional failures can occur due to timeline variations between internal projects and external dependencies.

+ +

We sometimes require repeatable behavior to be productive with a project in addition to the flexibility of version range specifications. +For example, we may want for each developer and continuous integration step to have reproducible environments even if a dependency gets updated while internal development takes place. +Dependency version constraints oftentimes aren’t enough on their own to prevent reproducibility issues from occurring. +See the above diagram for a timeline depicting how Developer B and Developer D may have different experiences despite best efforts with version constraints (Dependency A may make a release that fits the version constraint but breaks Project C when Developer D tries to modify unrelated code).

+ +

Lockfiles for reproducible version constraint behavior

+ +

+ +

Version constraint lockfiles provide one way to ensure reproducible behaviors within your projects. +Lockfiles are usually recommended to be included in source control, so one always has a complete snapshot (short of the literal full source code of the dependencies) of the project’s last known working configuration.

+ +

Lockfiles usually have the following characteristics (this varies by programming language and dependency type):

- They are generated automatically by package management tooling rather than edited by hand.
- They record the exact resolved versions of all direct and transitive (dependencies of dependencies) dependencies.
- They often include integrity metadata, such as package hashes, to help verify that installed content matches expectations.

See the above modified timeline for Developer B and Developer D to better understand how their project will benefit from a shared lockfile and reproducible dependency installations.

+ +

Pythonic Example

+ + + + + + + + + + + + + + + + + + + + + + +
`poetry add pandas`
- Adds a caret-based version constraint specification based on the latest release (for example `^2.2.1`) within a `pyproject.toml` file. This version constraint can be understood as `>= 2.2.1, < 3.0.0`.
- Creates or updates the `poetry.lock` lockfile with known compatible versions of Pandas based on the version constraint mentioned above.
- Installs the version of Pandas which matches the `pyproject.toml` and `poetry.lock` specifications.

`poetry install`
- Installs the version of Pandas which matches the `pyproject.toml` and `poetry.lock` specifications (for example, within a new environment or for another developer).

`poetry update pandas`
- Checks for available Pandas releases which are compatible with the version constraint (for example, `^2.2.1`).
- If there are new versions available which match the constraint, updates the `poetry.lock` lockfile and installs the matching version.

`poetry lock`
- Updates all dependencies referenced in the `poetry.lock` lockfile with the latest compatible versions based on the version constraints specified within the `pyproject.toml`.
- Optionally, if the `--no-update` flag is also used, refreshes the dependency versions referenced within the `poetry.lock` lockfile based on the version constraints within the `pyproject.toml` without seeking updated dependency releases.
+ +

Use Poetry commands to implement dependency version constraints and lockfiles for reproducible Python project environments.

+ +

Poetry is a Python packaging and dependency management tool which implements version constraints and lockfiles to help developers maintain their software projects. +Using commands like poetry add ... and poetry lock automatically creates poetry.lock lockfiles based on specifications which are added either automatically or manually to pyproject.toml files. +Similar to other tools, Poetry can operate with or without poetry.lock lockfiles (see here for more information). +Another alternative to Poetry which makes use of lockfiles is PDM (pdm.lock files).

+ +

Avoiding over-constrained dependencies

+ +

+ +

Automated dependency checking tools like Dependabot or Renovate can be used to reduce project risk through timely dependency update changes assisted by human reviewers.

+ +

Using dependency version constraints and lockfiles is helpful for reproducibility but implies a risk of over-constraint. +Two important over-constraint considerations are:

- Missing important updates: tightly constrained dependencies may silently fall behind security patches, bug fixes, and performance improvements from newer releases.
- Compatibility conflicts: over-constrained version ranges can make your project difficult to install alongside other packages or environments which require different versions.

Make sure to address these risks by routinely considering whether your dependencies need to be updated (manually) or through the use of automated tools like GitHub’s Dependabot or Mend Renovate. +Tools like Dependabot or Renovate enable scheduled checks and updates to be applied to your project which can lead to a balanced way of ensuring risk reduction and productive future-focused development.

+ +

Concluding Thoughts

+ +

This article covered why dependencies are used, what complications they come with, and some tools to use in addressing those challenges. +Every project can vary quite a bit when it comes to dependency management decision-making and maintenance. +We hope you find success with dependency management through these practices and look forward to providing more information on this topic in the future.

+
+ + + + + +
+ + + +
+ + + Previous post
+ + Python Memory Management and Troubleshooting + + +
+ + + Next post
+ + Parquet: Crafting Data Bridges for Efficient Computation + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/2024/03/25/Parquet-Crafting-Data-Bridges-for-Efficient-Computation.html b/preview/pr-36/2024/03/25/Parquet-Crafting-Data-Bridges-for-Efficient-Computation.html new file mode 100644 index 0000000000..5574d0e3e8 --- /dev/null +++ b/preview/pr-36/2024/03/25/Parquet-Crafting-Data-Bridges-for-Efficient-Computation.html @@ -0,0 +1,1041 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Parquet: Crafting Data Bridges for Efficient Computation | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Parquet: Crafting Data Bridges for Efficient Computation

+ + + + + + + + +
+ + + + + +
+ + + +

Parquet: Crafting Data Bridges for Efficient Computation

+ +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ +
+ + figure image + + +
+ +

(Image: Vulphere, Wikimedia Commons)

+ + +

Apache Parquet is a columnar and strongly-typed tabular data storage format built for scalable processing which is widely compatible with many data models, programming languages, and software systems. +Parquet files (typically denoted with a .parquet filename extension) are usually compressed within the format itself and are often used in embedded or cloud-based high-performance scenarios. +It has grown in popularity since it was introduced in 2013 and is used as a core data storage technology in many organizations. +This article will introduce the Parquet format from a research data engineering perspective.

+ +

Understanding the Parquet file format

+ +
+ + figure image + + +
+ +

(Image: Robert Fischbacher163, Wikimedia Commons)

+ +

Parquet began around 2013 as work by Twitter and Cloudera collaborators to help solve large data challenges (for example, in Apache Hadoop systems). +It was partially inspired by a Google Research publication: “Dremel: Interactive Analysis of Web-Scale Datasets”. +Parquet joined the Apache Software Foundation in 2015 as a Top-Level Project (TLP). +The format is similar to, and shares goals with, the ORC, Avro, and Feather file formats.

+ +

One definition for the word “parquet” is: “A wooden floor made of parquetry.” (Wiktionary: Parquet). +Parquetry is often used to form decorative geometric patterns in flooring. +It seems fitting to name the format this way due to how columns and values are structured (see more below), akin to constructing a beautiful ‘floor’ for your data efforts.

+ +

We cover a few pragmatic aspects of the Parquet file format below.

+ +

+ Columnar data storage

+ + + +
+ + Parquet organizes column values together. CSV intermixes values from multiple columns + + +
+ Parquet organizes column values together. CSV intermixes values from multiple columns + +
+ +
+ +

Parquet files store data in a “columnar” way which is distinct from other formats. +We can understand this columnar format by using plaintext comma-separated value (CSV) format as a reference point. +CSV files store data in a row-orientated way by using new lines to represent rows of values. +Reading all values of a single column in CSV often involves seeking through multiple other portions of the data by default.

+ +

Parquet files are binary in nature, optimizing storage by arranging values from individual columns in close proximity to each other. +This enables the data to be stored and retrieved more efficiently than possible with CSV files. +For example, Parquet files allow you to query individual columns without needing to traverse non-necessary column value data.
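For example, a sketch of reading just one column with PyArrow (assuming a file named example.parquet exists, like those created in later examples in this post):

```python
from pyarrow import parquet

# read only column "A"; other column data does not need to be scanned
table = parquet.read_table("example.parquet", columns=["A"])
print(table.column_names)  # prints: ['A']
```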

+ +

+ Parquet format abstractions

+ +

Row groups, column chunks, and pages

+ + + +

+ +

Parquet organizes data using row groups, columns, and pages.

+ +

Parquet files organize column data inside of row groups. +Each row group includes chunks of columns in the form of pages. +Row groups and column pages are configurable and may change depending on the configuration of your Parquet client. +Note: you don’t need to be an expert on these details to leverage and benefit from Parquet, as these are often set to sensible general-purpose defaults.
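As a sketch of how row groups may be configured when writing (using PyArrow’s row_group_size parameter):

```python
import pyarrow as pa
from pyarrow import parquet

# create a small table of ten rows
table = pa.Table.from_pydict({"A": list(range(10))})

# write with at most 5 rows per row group
parquet.write_table(table=table, where="example.parquet", row_group_size=5)

# inspect how the file was organized
print(parquet.ParquetFile("example.parquet").metadata.num_row_groups)
# prints: 2
```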

+ +

Page encodings

+ +

Pages within column chunks may have a number of different encodings. +Parquet encodings are often selected based on the type of data included within columns and the operational or performance needs associated with a project. +By default, Plain (PLAIN) encoding is used which means all values are stored back to back. +Another encoding type, Run Length Encoding (RLE), is often used to efficiently store columns with many consecutively repeated values. +Column encodings are sometimes set for each individual column, usually in an automatic way based on the data involved.
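While encodings are often chosen automatically, they can be inspected through Parquet metadata. A minimal sketch with PyArrow (the exact encodings printed may vary by PyArrow version and data):

```python
import pyarrow as pa
from pyarrow import parquet

# a column with repeated values (a candidate for run-length encoding)
table = pa.Table.from_pydict({"A": [1, 1, 1, 2, 2]})
parquet.write_table(table=table, where="example.parquet")

# show the encodings chosen for the first column of the first row group
metadata = parquet.ParquetFile("example.parquet").metadata
print(metadata.row_group(0).column(0).encodings)
# prints a tuple of encoding names, for example: ('PLAIN', 'RLE')
```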

+ +

+ Compression

+ +
import os
+import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3, 4, 5],
+        "B": ["foo", "bar", "baz", "qux", "quux"],
+        "C": [0.1, 0.2, 0.3, 0.4, 0.5],
+    }
+)
+
+# Write Parquet file with Snappy compression
+parquet.write_table(table=table, where="example.snappy.parquet", compression="SNAPPY")
+
+# Write Parquet file with Zstd compression
+parquet.write_table(table=table, where="example.zstd.parquet", compression="ZSTD")
+
+ +

Parquet files can be compressed as they’re written using parameters.

+ +

Parquet files may leverage compression to help reduce file size and increase data read performance. +Compression is applied at the page level, combining benefits from various encodings. +Data stored through Parquet is usually compressed when it is written, denoting the compression type through the filename (for example: filename.snappy.parquet). +Snappy is often used as a common compression algorithm for Parquet data. +Brotli, Gzip, ZSTD, LZ4 are also sometimes used. +It’s worth exploring what compression works best for the data and systems you use (for example, ZSTD compression may hold benefits).

+ +

+ “Strongly-typed” data

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3],
+        "B": ["foo", "bar", 1],
+        "C": [0.1, 0.2, 0.3],
+    }
+)
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.parquet")
+
+# raises exception:
+# ArrowTypeError: Expected bytes, got a 'int' object (for column B)
+# Note: while this is an Arrow in-memory data exception, it also
+# prevents us from attempting to perform incompatible operations
+# within the Parquet file.
+
+ +

Data value must be all of the same type within a Parquet column.

+ +

Data within Parquet is “strongly-typed”; specific data types (such as integer, string, etc.) are associated with each column, and thus value. +Attempting to store a data value type which does not match the column data type will usually result in an error. +This can lead to performance and compression benefits due to how quickly Parquet readers can determine the data type. +Strongly-typed data also embeds a kind of validation directly inside your work (data errors “shift left” and are often discovered earlier). +See here for more on data quality validation topics we’ve written about.

+ +

+ Complex data handling

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table with complex data types
+table = pa.Table.from_pydict(
+    {
+        "A": [{"key1": "val1"}, {"key2": "val2"}],
+        "B": [[1, 2], [3, 4]],
+        "C": [
+            bytearray("😊".encode("utf-8")),
+            bytearray("🌻".encode("utf-8")),
+        ],
+    }
+)
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.parquet")
+
+# read the schema of the parquet file
+print(parquet.read_schema(where="example.parquet"))
+
+# prints:
+# A: struct<key1: string, key2: string>
+#   child 0, key1: string
+#   child 1, key2: string
+# B: list<element: int64>
+#   child 0, element: int64
+# C: binary
+
+ +

Parquet file columns may contain complex data types such as nested types (lists, dictionaries) and byte arrays.

+ +

Parquet files may store many data types that are complicated or impossible to store in other formats. +For example, images may be stored using the byte array storage type. +Nested data may be stored using LIST or MAP logical types. +Dates or times may be stored using various temporal data types. +Oftentimes, complex data conversion within Parquet files is already implemented (for example, in PyArrow).

+ +

+ Metadata

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3],
+        "B": ["foo", "bar", "baz"],
+        "C": [0.1, 0.2, 0.3],
+    }
+)
+
+# add custom metadata to table
+table = table.replace_schema_metadata(metadata={"data-producer": "CU DBMI SET Blog"})
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.snappy.parquet", compression="SNAPPY")
+
+# read the schema
+print(parquet.read_schema(where="example.snappy.parquet"))
+
+# prints
+# A: int64
+# B: string
+# C: double
+# -- schema metadata --
+# data-producer: 'CU DBMI SET Blog'
+
+ +

Metadata are treated as a distinct and customizable components of Parquet files.

+ +

The Parquet format treats data about the data (metadata) separately from that of column value data. +Parquet metadata includes column names, data types, compression, various statistics about the file, and custom fields (in key-value form). +This metadata may be read without reading column value data which can assist with data exploration tasks (especially if the data are large).

+ +

+ Multi-file “datasets”

+ +
import pathlib
+import pyarrow as pa
+from pyarrow import parquet
+
+pathlib.Path("./dataset").mkdir(exist_ok=True)
+
+# create pyarrow tables
+table_1 = pa.Table.from_pydict({"A": [1]})
+table_2 = pa.Table.from_pydict({"A": [2, 3]})
+
+# write the pyarrow table to parquet files
+parquet.write_table(table=table_1, where="./dataset/example_1.parquet")
+parquet.write_table(table=table_2, where="./dataset/example_2.parquet")
+
+# read the parquet dataset
+print(parquet.ParquetDataset("./dataset").read())
+
+# prints (note that, for ex., [1] is a row group of column A)
+# pyarrow.Table
+# A: int64
+# ----
+# A: [[1],[2,3]]
+
+ +

Parquet datasets may be composed of one or many individual Parquet files.

+ +

Parquet files may be used individually or treated as a “dataset” through file groups which include the same schema (column names and types). +This means you can store “chunks” of Parquet-based data in one or many files and provides opportunities for intermixing or extending data. +When reading Parquet data this way libraries usually use the directory as a way to parse all files as a single dataset. +Multi-file datasets mean you gain the ability to store arbitrarily large amounts of data by sidestepping, for example, inode limitations.

+ +

+ Apache Arrow memory format integration

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3],
+        "B": ["foo", "bar", "baz"],
+        "C": [0.1, 0.2, 0.3],
+    }
+)
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.parquet")
+
+# show schema of table and parquet file
+print(table.schema.types)
+print(parquet.read_schema("example.parquet").types)
+
+# prints
+# [DataType(int64), DataType(string), DataType(double)]
+# [DataType(int64), DataType(string), DataType(double)]
+
+ +

Parquet file and Arrow data types are well-aligned.

+ +

The Parquet format has robust support and integration with the Apache Arrow memory format. +This enables consistency across Parquet integration and how the data are read using various programming languages (the Arrow memory format is relatively uniform across these).

+ +

Performance with Parquet

+ +

Parquet files often outperform traditional formats due to how they are designed. +Other data file formats may vary in performance contingent on specific configurations and system integration. +We urge you to perform your own testing to find out what works best for your circumstances. +See below for a list of references which compare Parquet to other formats.

+ + + +

How can you use Parquet?

+ +

The Parquet format is common in many data management platforms and libraries. +Below is a list of just a few popular places where you can use Parquet.

+ + + +

Concluding Thoughts

+ +

This article covered the Parquet file format including notable features and usage. +Thank you for joining us on this exploration of Parquet. +We appreciate your support, hope the content here helps with your data decisions, and look forward to continuing the exploration of data formats in future posts.

+
+ + + + + +
+ + + +
+ + + Previous post
+ + Navigating Dependency Chaos with Lockfiles + + +
+ + + Next post
+ + Leveraging Kùzu and Cypher for Advanced Data Analysis + + +
+
+
+ + +
+ + + + + + + diff --git "a/preview/pr-36/2024/05/24/Leveraging-K\303\271zu-and-Cypher-for-Advanced-Data-Analysis.html" "b/preview/pr-36/2024/05/24/Leveraging-K\303\271zu-and-Cypher-for-Advanced-Data-Analysis.html" new file mode 100644 index 0000000000..c0b24db107 --- /dev/null +++ "b/preview/pr-36/2024/05/24/Leveraging-K\303\271zu-and-Cypher-for-Advanced-Data-Analysis.html" @@ -0,0 +1,890 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Leveraging Kùzu and Cypher for Advanced Data Analysis | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Leveraging Kùzu and Cypher for Advanced Data Analysis

+ + + + + + + + +
+ + + + + +
+ + + +

Leveraging Kùzu and Cypher for Advanced Data Analysis

+ +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ +
+ + (Image sourced from https://github.com/kuzudb/kuzu.) + + +
+ (Image sourced from https://github.com/kuzudb/kuzu.) + +
+ +
+ + + +

Graph databases can offer a more natural and intuitive way to model and explore relationships within data. +In this post, we’ll dive into the world of graph databases, focusing on Kùzu, an embedded graph database and query engine for a number of languages, and Cypher, a powerful query language designed for graph data. +We’ll explore how these tools can transform your data management and analysis workflows, provide insights into their capabilities, and discuss when it might be more appropriate to use server-based solutions. +Whether you’re a research software developer looking to integrate advanced graph processing into your applications or simply curious about the benefits of graph databases, this guide will equip you with the knowledge to harness the full potential of graph data.

+ + + +

Tabular Data

+ +
+ + Tabular data is made up of rows (or records) and columns. + + +
+ Tabular data is made up of rows (or records) and columns. +
+ +
+ +

Data are often stored in a table, or tabular format, where information is organized into rows and columns. +Each row represents a single record and each column represents attributes of that record. +Tables are particularly effective for storing and querying large volumes of data with a fixed set of columns and data types. +Despite its versatility, tabular data can become cumbersome when dealing with complex relationships and interconnected data, where a graph-based approach might be more suitable.
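
As a small illustration (a sketch using pandas; the column names are purely illustrative):

+# a minimal sketch of tabular data using pandas;
+# the column names here are illustrative
+import pandas as pd
+
+# each row is a record; each column is an attribute of that record
+people = pd.DataFrame(
+    {
+        "name": ["Alice", "Bob", "Carol"],
+        "age": [34, 29, 41],
+    }
+)
+print(people)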

+ +

Graph Data

+ +
+ + Graph data is made up of nodes and edges. + + +
+ Graph data is made up of nodes and edges. + +
+ +
+ +

Graph data represents information in the form of nodes (also called vertices) and edges (connections between nodes). +This structure is useful for modeling complex relationships and interconnected data, such as social networks, biological networks, and transportation systems. +Unlike tabular data, which is often “flattened” (treating multidimensional data as singular columns) and often rigid (requiring all new data to conform to a specific schema), graph data allows for more flexible and dynamic representations.
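
One lightweight way to picture this (a sketch using plain Python dictionaries rather than any particular graph library; the names and relationship type are illustrative):

+# a sketch of graph data using plain Python structures
+# (no particular graph library is assumed)
+
+# nodes, each with a few properties
+nodes = {
+    "alice": {"label": "Person", "age": 34},
+    "bob": {"label": "Person", "age": 29},
+}
+
+# edges connect nodes and may carry properties of their own
+edges = [
+    {"from": "alice", "to": "bob", "type": "FRIEND", "since": 2020},
+]
+
+# traverse: who are alice's friends?
+friends = [
+    edge["to"]
+    for edge in edges
+    if edge["from"] == "alice" and edge["type"] == "FRIEND"
+]
+print(friends)  # ['bob']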

+ +
+ + Nodes and edges may have properties in a graph. + + +
+ Nodes and edges may have properties in a graph. + +
+ +
+ +

Nodes and edges act like different kinds of tabular records within the context of graphs. +Nodes and edges can also have properties (attributes) which further describe the graph. +Properties are akin to the columns which describe a particular record (or node) in tabular formats. +Graph data models are particularly useful for exploring connections, performing path analysis, and uncovering patterns that might require heavier transformation in tabular formats.

+ +

Graph Databases

+ +
+ + Graph databases store graph data. + + +
+ Graph databases store graph data. + +
+ +
+ +

Graph databases are specialized databases designed to store, query, and manage graph data efficiently. +They use graph structures for semantic queries, with nodes, edges, and properties being stored directly in the database. +Unlike traditional relational databases that use tables, graph databases leverage the natural relationships in the data, allowing for faster retrieval and sometimes more intuitive querying of interconnected information. +This makes them ideal for applications involving complex relationships, such as social networks, supply chain management, and knowledge graphs. +Graph databases support various query languages and algorithms optimized for traversing and analyzing graph structures.

+ +

Graph Database Querying

+ +
+ + Graph data are typically queried using specialized languages such as Cypher. + + +
+ Graph data are typically queried using specialized languages such as Cypher. + +
+ +
+ +

Graph database querying involves using specialized query languages to retrieve and manipulate graph data. +Unlike SQL, which is often used for tabular databases, graph databases use languages like Cypher, Gremlin, and SPARQL, which are designed to handle graph-specific operations. +These languages allow users to perform complex queries that traverse the graph, find paths between nodes, filter based on properties, and analyze relationships. +Querying in graph databases can be highly efficient because these systems leverage the inherent structure of the graph, enabling fast execution of complex queries that would be cumbersome and slow in a relational database.

+ +

Cypher Query Language

+ +
MATCH (p:Person {name: 'Alice'})-[:FRIEND]->(friend)
+RETURN friend.name, friend.age
+
+ +

This query finds nodes labeled “Person” with the name “Alice” and returns the names and ages of nodes connected to Alice by a “FRIEND” relationship.

+ +

Cypher is a powerful, declarative graph query language designed specifically for querying and updating graph databases. +Originally developed for Neo4j (one of the most popular graph databases), it is known for its expressive and intuitive syntax that makes it easy to work with graph data. +Cypher allows users to perform complex queries using simple and readable patterns that resemble ASCII art, making it accessible to both developers and data scientists. +It supports a wide range of operations, including pattern matching, filtering, aggregation, and graph traversal, enabling efficient exploration and manipulation of graph structures. +For example, a basic Cypher query to find all nodes connected by a “FRIEND” relationship might look like this: MATCH (a)-[:FRIEND]->(b) RETURN a, b, which finds and returns pairs of nodes a and b where a is connected to b by a “FRIEND” relationship.
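
To sketch how those capabilities combine, the query below pairs pattern matching with filtering and aggregation (the node labels and properties here are illustrative, not from any particular dataset):

+// a sketch combining pattern matching, filtering, and aggregation;
+// the labels and properties here are illustrative
+MATCH (p:Person)-[:FRIEND]->(friend:Person)
+WHERE friend.age > 30
+RETURN p.name, count(friend) AS friends_over_30
+ORDER BY friends_over_30 DESC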

+ +

Kùzu

+ +
+ + Kùzu provides a database format and query engine accessible through Python and other languages by using Cypher queries. + + +
+ Kùzu provides a database format and query engine accessible through Python and other languages by using Cypher queries. + +
+ +
+ +

Kùzu is an embedded graph database and query engine designed to integrate seamlessly with Python, Rust, Node, C/C++, or Java software. +Kùzu is optimized for high performance and can handle complex graph queries with ease. +Querying graphs in Kùzu is performed through Cypher, making queries transferable across the multiple programming languages Kùzu supports. +Kùzu also provides direct integration with export formats that allow for efficient data analysis or processing, such as Pandas and Arrow. +Kùzu is particularly suitable for software developers who need to integrate graph database capabilities into their projects without the overhead of managing a separate database server.

+ +

Tabular and Graph Data Interoperation

+ +
+ + Kùzu uses tabular data as both input and output for data operations. + + +
+ Kùzu uses tabular data as both input and output for data operations. + +
+ +
+ +

Tabular data and graph data can sometimes be used in tandem in order to achieve software goals (one isn't necessarily better than the other or meant to be used in isolation). +For example, Kùzu offers both data import from and export to tabular formats to help with conversion and storage outside of a graph database. +This is especially helpful when working with tabular data as an input, when iterating over large datasets in smaller chunks, or when building integration paths to other pieces of software which aren't Kùzu or graph-data compatible.

+ +

Kùzu Tabular Data Imports

+ +
# portions of this content referenced
+# with modifications from:
+# https://docs.kuzudb.com/import/parquet/
+import pandas as pd
+import kuzu
+
+# create parquet-based data for import into kuzu
+pd.DataFrame(
+    {"name": ["Adam", "Adam", "Karissa", "Zhang"],
+     "age": [30, 40, 50, 25]}
+).to_parquet("user.parquet")
+pd.DataFrame(
+    {
+        "from": ["Adam", "Adam", "Karissa", "Zhang"],
+        "to": ["Karissa", "Zhang", "Zhang", "Noura"],
+        "since": [2020, 2020, 2021, 2022],
+    }
+).to_parquet("follows.parquet")
+
+# form a kuzu database connection
+db = kuzu.Database("./test")
+conn = kuzu.Connection(db)
+
+# define the node and edge tables before copying data into them
+conn.execute(
+    "CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))"
+)
+conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
+
+# use wildcard-based copy in case of multiple files
+# copy node data
+conn.execute('COPY User FROM "user*.parquet";')
+# copy edge data
+conn.execute('COPY Follows FROM "follows*.parquet";')
+
+df = conn.execute(
+    """MATCH (a:User)-[f:Follows]->(b:User)
+    RETURN a.name, b.name, f.since;"""
+).get_as_df()
+
+ +

One way to create graphs within Kùzu is to import data from tabular datasets. +Kùzu provides functionality to convert tabular data from CSV, Parquet, or NumPy files into a graph. +This process enables seamless integration of tabular data sources into the graph database, providing the benefits of graph-based querying and analysis while leveraging the familiar structure and benefits of tabular data.
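
CSV imports follow the same COPY pattern. As a brief sketch (reusing the conn connection from the example above; "user.csv" is a hypothetical file with a header row matching the User table's columns):

+# a sketch of CSV import, reusing the connection from above;
+# "user.csv" is a hypothetical file with a header row matching
+# the User node table's columns
+conn.execute('COPY User FROM "user.csv" (HEADER=true);')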

+ +

Kùzu Data Results and Exports

+ +
# portions of this content referenced 
+# with modifications from:
+# https://kuzudb.com/api-docs/python/kuzu.html
+import kuzu
+
+# form a kuzu database connection
+db = kuzu.Database("./test")
+conn = kuzu.Connection(db)
+
+query = "MATCH (u:User) RETURN u.name, u.age;"
+
+# run query and return Pandas DataFrame
+pd_df = conn.execute(query).get_as_df()
+
+# run query and return Polars DataFrame
+pl_df = conn.execute(query).get_as_pl()
+
+# run query and return PyArrow Table
+arrow_tbl = conn.execute(query).get_as_arrow()
+
+# run query and return PyTorch Geometric Data
+pyg_d = conn.execute(query).get_as_torch_geometric()
+
+# run query within COPY to export directly to file
+conn.execute("COPY (MATCH (u:User) return u.*) TO 'user.parquet';")
+
+ +

Kùzu is also flexible when it comes to returning data from Cypher queries. +After performing a query you can use a number of methods to automatically convert results into various in-memory data formats, for example, Pandas DataFrames, Polars DataFrames, PyTorch Geometric (PyG) Data, or PyArrow Tables. +There are also options to export data directly to CSV or Parquet files for times when file-based data is preferred.
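
The same COPY ... TO pattern shown above also covers CSV output. As a brief sketch (reusing the conn connection from the example above):

+# a sketch of CSV export; HEADER=true writes column names
+# as the first row of the output file
+conn.execute("COPY (MATCH (u:User) RETURN u.*) TO 'user.csv' (HEADER=true);")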

+ +

Concluding Thoughts

+ +

Kùzu, with its seamless integration into Python environments and efficient handling of graph data, presents a compelling solution for developers seeking embedded graph database capabilities. +Its ability to transform and query tabular data into rich graph structures opens up new possibilities for data analysis and application development. +However, it’s important to consider the scale and specific needs of your project when choosing between Kùzu and more robust server-based solutions like Neo4j. +By leveraging the right tool for the right job, whether it’s Kùzu for lightweight embedded applications or a server-based database for enterprise-scale operations, developers can unlock the full potential of graph data. Embracing these technologies allows for deeper insights, more complex data relationships, and ultimately, more powerful and efficient applications.

+
+ + + + + +
+ + + +
+ + + Previous post
+ + Parquet: Crafting Data Bridges for Efficient Computation + + +
+ + + +
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/404.html b/preview/pr-36/404.html new file mode 100644 index 0000000000..d07b2652ae --- /dev/null +++ b/preview/pr-36/404.html @@ -0,0 +1,481 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +404 | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+

+ Page Not Found

+ +

Try searching the whole site for the content you want:

+ +
+ + +
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/_scripts/anchors.js b/preview/pr-36/_scripts/anchors.js new file mode 100644 index 0000000000..58daabcbea --- /dev/null +++ b/preview/pr-36/_scripts/anchors.js @@ -0,0 +1,47 @@ +/* + creates link next to each heading that links to that section. +*/ + +{ + const onLoad = () => { + // for each heading + const headings = document.querySelectorAll( + "h1[id], h2[id], h3[id], h4[id], h5[id], h6[id]" + ); + for (const heading of headings) { + // create anchor link + const link = document.createElement("a"); + link.classList.add("icon", "fa-solid", "fa-link", "anchor"); + link.href = "#" + heading.id; + link.setAttribute("aria-label", "link to this section"); + heading.append(link); + + // if first heading in the section, move id to parent section + if (heading.matches("section > :first-child")) { + heading.parentElement.id = heading.id; + heading.removeAttribute("id"); + } + } + }; + + // scroll to target of url hash + const scrollToTarget = () => { + const id = window.location.hash.replace("#", ""); + const target = document.getElementById(id); + + if (!target) return; + const offset = document.querySelector("header").clientHeight || 0; + window.scrollTo({ + top: target.getBoundingClientRect().top + window.scrollY - offset, + behavior: "smooth", + }); + }; + + // after page loads + window.addEventListener("load", onLoad); + window.addEventListener("load", scrollToTarget); + window.addEventListener("tagsfetched", scrollToTarget); + + // when hash nav happens + window.addEventListener("hashchange", scrollToTarget); +} diff --git a/preview/pr-36/_scripts/dark-mode.js b/preview/pr-36/_scripts/dark-mode.js new file mode 100644 index 0000000000..b75b25eb24 --- /dev/null +++ b/preview/pr-36/_scripts/dark-mode.js @@ -0,0 +1,25 @@ +/* + manages light/dark mode. +*/ + +{ + // immediately load saved (or default) mode before page renders + document.documentElement.dataset.dark = + window.localStorage.getItem("dark-mode") ?? "false"; + + const onLoad = () => { + // update toggle button to match loaded mode + document.querySelector(".dark-toggle").checked = + document.documentElement.dataset.dark === "true"; + }; + + // after page loads + window.addEventListener("load", onLoad); + + // when user toggles mode button + window.onDarkToggleChange = (event) => { + const value = event.target.checked; + document.documentElement.dataset.dark = value; + window.localStorage.setItem("dark-mode", value); + }; +} diff --git a/preview/pr-36/_scripts/fetch-tags.js b/preview/pr-36/_scripts/fetch-tags.js new file mode 100644 index 0000000000..c843b67fdc --- /dev/null +++ b/preview/pr-36/_scripts/fetch-tags.js @@ -0,0 +1,67 @@ +/* + fetches tags (aka "topics") from a given GitHub repo and adds them to row of + tag buttons. specify repo in data-repo attribute on row. 
+*/ + +{ + const onLoad = async () => { + // get tag rows with specified repos + const rows = document.querySelectorAll("[data-repo]"); + + // for each repo + for (const row of rows) { + // get props from tag row + const repo = row.dataset.repo.trim(); + const link = row.dataset.link.trim(); + + // get tags from github + if (!repo) continue; + let tags = await fetchTags(repo); + + // filter out tags already present in row + let existing = [...row.querySelectorAll(".tag")].map((tag) => + window.normalizeTag(tag.innerText) + ); + tags = tags.filter((tag) => !existing.includes(normalizeTag(tag))); + + // add tags to row + for (const tag of tags) { + const a = document.createElement("a"); + a.classList.add("tag"); + a.innerHTML = tag; + a.href = `${link}?search="tag: ${tag}"`; + a.dataset.tooltip = `Show items with the tag "${tag}"`; + row.append(a); + } + + // delete tags container if empty + if (!row.innerText.trim()) row.remove(); + } + + // emit "tags done" event for other scripts to listen for + window.dispatchEvent(new Event("tagsfetched")); + }; + + // after page loads + window.addEventListener("load", onLoad); + + // GitHub topics endpoint + const api = "https://api.github.com/repos/REPO/topics"; + const headers = new Headers(); + headers.set("Accept", "application/vnd.github+json"); + + // get tags from GitHub based on repo name + const fetchTags = async (repo) => { + const url = api.replace("REPO", repo); + try { + const response = await (await fetch(url)).json(); + if (response.names) return response.names; + else throw new Error(JSON.stringify(response)); + } catch (error) { + console.groupCollapsed("GitHub fetch tags error"); + console.log(error); + console.groupEnd(); + return []; + } + }; +} diff --git a/preview/pr-36/_scripts/search.js b/preview/pr-36/_scripts/search.js new file mode 100644 index 0000000000..fa23ca4c21 --- /dev/null +++ b/preview/pr-36/_scripts/search.js @@ -0,0 +1,215 @@ +/* + filters elements on page based on url or search box. 
+ syntax: term1 term2 "full phrase 1" "full phrase 2" "tag: tag 1" + match if: all terms AND at least one phrase AND at least one tag +*/ +{ + // elements to filter + const elementSelector = ".card, .citation, .post-excerpt"; + // search box element + const searchBoxSelector = ".search-box"; + // results info box element + const infoBoxSelector = ".search-info"; + // tags element + const tagSelector = ".tag"; + + // split search query into terms, phrases, and tags + const splitQuery = (query) => { + // split into parts, preserve quotes + const parts = query.match(/"[^"]*"|\S+/g) || []; + + // bins + const terms = []; + const phrases = []; + const tags = []; + + // put parts into bins + for (let part of parts) { + if (part.startsWith('"')) { + part = part.replaceAll('"', "").trim(); + if (part.startsWith("tag:")) + tags.push(normalizeTag(part.replace(/tag:\s*/, ""))); + else phrases.push(part.toLowerCase()); + } else terms.push(part.toLowerCase()); + } + + return { terms, phrases, tags }; + }; + + // normalize tag string for comparison + window.normalizeTag = (tag) => + tag.trim().toLowerCase().replaceAll(/-|\s+/g, " "); + + // get data attribute contents of element and children + const getAttr = (element, attr) => + [element, ...element.querySelectorAll(`[data-${attr}]`)] + .map((element) => element.dataset[attr]) + .join(" "); + + // determine if element should show up in results based on query + const elementMatches = (element, { terms, phrases, tags }) => { + // tag elements within element + const tagElements = [...element.querySelectorAll(".tag")]; + + // check if text content exists in element + const hasText = (string) => + ( + element.innerText + + getAttr(element, "tooltip") + + getAttr(element, "search") + ) + .toLowerCase() + .includes(string); + // check if text matches a tag in element + const hasTag = (string) => + tagElements.some((tag) => normalizeTag(tag.innerText) === string); + + // match logic + return ( + (terms.every(hasText) || !terms.length) && + (phrases.some(hasText) || !phrases.length) && + (tags.some(hasTag) || !tags.length) + ); + }; + + // loop through elements, hide/show based on query, and return results info + const filterElements = (parts) => { + let elements = document.querySelectorAll(elementSelector); + + // results info + let x = 0; + let n = elements.length; + let tags = parts.tags; + + // filter elements + for (const element of elements) { + if (elementMatches(element, parts)) { + element.style.display = ""; + x++; + } else element.style.display = "none"; + } + + return [x, n, tags]; + }; + + // highlight search terms + const highlightMatches = async ({ terms, phrases }) => { + // make sure Mark library available + if (typeof Mark === "undefined") return; + + // reset + new Mark(document.body).unmark(); + + // limit number of highlights to avoid slowdown + let counter = 0; + const filter = () => counter++ < 100; + + // highlight terms and phrases + new Mark(elementSelector) + .mark(terms, { separateWordSearch: true, filter }) + .mark(phrases, { separateWordSearch: false, filter }); + }; + + // update search box based on query + const updateSearchBox = (query = "") => { + const boxes = document.querySelectorAll(searchBoxSelector); + + for (const box of boxes) { + const input = box.querySelector("input"); + const button = box.querySelector("button"); + const icon = box.querySelector("button i"); + input.value = query; + icon.className = input.value.length + ? 
"icon fa-solid fa-xmark" + : "icon fa-solid fa-magnifying-glass"; + button.disabled = input.value.length ? false : true; + } + }; + + // update info box based on query and results + const updateInfoBox = (query, x, n) => { + const boxes = document.querySelectorAll(infoBoxSelector); + + if (query.trim()) { + // show all info boxes + boxes.forEach((info) => (info.style.display = "")); + + // info template + let info = ""; + info += `Showing ${x.toLocaleString()} of ${n.toLocaleString()} results
`; + info += "Clear search"; + + // set info HTML string + boxes.forEach((el) => (el.innerHTML = info)); + } + // if nothing searched + else { + // hide all info boxes + boxes.forEach((info) => (info.style.display = "none")); + } + }; + + // update tags based on query + const updateTags = (query) => { + const { tags } = splitQuery(query); + document.querySelectorAll(tagSelector).forEach((tag) => { + // set active if tag is in query + if (tags.includes(normalizeTag(tag.innerText))) + tag.setAttribute("data-active", ""); + else tag.removeAttribute("data-active"); + }); + }; + + // run search with query + const runSearch = (query = "") => { + const parts = splitQuery(query); + const [x, n] = filterElements(parts); + updateSearchBox(query); + updateInfoBox(query, x, n); + updateTags(query); + highlightMatches(parts); + }; + + // update url based on query + const updateUrl = (query = "") => { + const url = new URL(window.location); + let params = new URLSearchParams(url.search); + params.set("search", query); + url.search = params.toString(); + window.history.replaceState(null, null, url); + }; + + // search based on url param + const searchFromUrl = () => { + const query = + new URLSearchParams(window.location.search).get("search") || ""; + runSearch(query); + }; + + // return func that runs after delay + const debounce = (callback, delay = 250) => { + let timeout; + return (...args) => { + window.clearTimeout(timeout); + timeout = window.setTimeout(() => callback(...args), delay); + }; + }; + + // when user types into search box + const debouncedRunSearch = debounce(runSearch, 1000); + window.onSearchInput = (target) => { + debouncedRunSearch(target.value); + updateUrl(target.value); + }; + + // when user clears search box with button + window.onSearchClear = () => { + runSearch(); + updateUrl(); + }; + + // after page loads + window.addEventListener("load", searchFromUrl); + // after tags load + window.addEventListener("tagsfetched", searchFromUrl); +} diff --git a/preview/pr-36/_scripts/site-search.js b/preview/pr-36/_scripts/site-search.js new file mode 100644 index 0000000000..caff0a611f --- /dev/null +++ b/preview/pr-36/_scripts/site-search.js @@ -0,0 +1,14 @@ +/* + for site search component. searches site/domain via google. +*/ + +{ + // when user submits site search form/box + window.onSiteSearchSubmit = (event) => { + event.preventDefault(); + const google = "https://www.google.com/search?q=site:"; + const site = window.location.origin; + const query = event.target.elements.query.value; + window.location = google + site + " " + query; + }; +} diff --git a/preview/pr-36/_scripts/table-wrap.js b/preview/pr-36/_scripts/table-wrap.js new file mode 100644 index 0000000000..4c5bddd8c6 --- /dev/null +++ b/preview/pr-36/_scripts/table-wrap.js @@ -0,0 +1,25 @@ +/* + put a wrapper around each table to allow scrolling. 
+*/ + +{ + const onLoad = () => { + // for each top-level table + const tables = document.querySelectorAll("table:not(table table)"); + for (const table of tables) { + // create wrapper with scroll + const wrapper = document.createElement("div"); + wrapper.style.overflowX = "auto"; + + // undo css force-text-wrap + table.style.overflowWrap = "normal"; + + // add wrapper around table + table.parentNode.insertBefore(wrapper, table); + wrapper.appendChild(table); + } + }; + + // after page loads + window.addEventListener("load", onLoad); +} diff --git a/preview/pr-36/_scripts/tooltip.js b/preview/pr-36/_scripts/tooltip.js new file mode 100644 index 0000000000..49eccfc5b8 --- /dev/null +++ b/preview/pr-36/_scripts/tooltip.js @@ -0,0 +1,41 @@ +/* + shows a popup of text on hover/focus of any element with the data-tooltip + attribute. +*/ + +{ + const onLoad = () => { + // make sure Tippy library available + if (typeof tippy === "undefined") return; + + // get elements with non-empty tooltips + const elements = [...document.querySelectorAll("[data-tooltip]")].filter( + (element) => element.dataset.tooltip.trim() && !element._tippy + ); + + // add tooltip to elements + tippy(elements, { + content: (element) => element.dataset.tooltip.trim(), + delay: [200, 0], + offset: [0, 20], + allowHTML: true, + interactive: true, + appendTo: () => document.body, + aria: { + content: "describedby", + expanded: null, + }, + onShow: ({ reference, popper }) => { + const dark = reference.closest("[data-dark]")?.dataset.dark; + if (dark === "false") popper.dataset.dark = true; + if (dark === "true") popper.dataset.dark = false; + }, + // onHide: () => false, // debug + }); + }; + + // after page loads + window.addEventListener("load", onLoad); + // after tags load + window.addEventListener("tagsfetched", onLoad); +} diff --git a/preview/pr-36/_styles/-theme.css b/preview/pr-36/_styles/-theme.css new file mode 100644 index 0000000000..3a610a3d99 --- /dev/null +++ b/preview/pr-36/_styles/-theme.css @@ -0,0 +1,44 @@ +[data-dark=false] { + --primary: #0795d9; + --secondary: #7dd3fc; + --text: #000000; + --background: #ffffff; + --background-alt: #fafafa; + --light-gray: #e0e0e0; + --gray: #808080; + --dark-gray: #404040; + --overlay: #00000020; +} + +[data-dark=true] { + --primary: #0795d9; + --secondary: #075985; + --text: #ffffff; + --background: #181818; + --background-alt: #1c1c1c; + --light-gray: #404040; + --gray: #808080; + --dark-gray: #b0b0b0; + --overlay: #ffffff10; +} + +:root { + --title: "Barlow", sans-serif; + --heading: "Barlow", sans-serif; + --body: "Barlow", sans-serif; + --code: "Roboto Mono", monospace; + --large: 1.2rem; + --xl: 1.4rem; + --xxl: 1.6rem; + --thin: 200; + --regular: 400; + --semi-bold: 500; + --bold: 600; + --spacing: 2; + --compact: 1.5; + --rounded: 3px; + --shadow: 0 0 10px 0 var(--overlay); + --transition: 0.2s ease; +} + +/*# sourceMappingURL=-theme.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/-theme.css.map b/preview/pr-36/_styles/-theme.css.map new file mode 100644 index 0000000000..16cb8930dd --- /dev/null +++ b/preview/pr-36/_styles/-theme.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["-theme.scss"],"names":[],"mappings":"AACA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAEF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EAEE;EACA;EACA;EACA;EAGA;EACA;EACA;EAGA;EACA;EACA;EACA;EAGA;EACA;EAGA;EACA;EACA","sourcesContent":["// colors\n[data-dark=\"false\"] {\n --primary: #0795d9;\n --secondary: #7dd3fc;\n --text: #000000;\n 
--background: #ffffff;\n --background-alt: #fafafa;\n --light-gray: #e0e0e0;\n --gray: #808080;\n --dark-gray: #404040;\n --overlay: #00000020;\n}\n[data-dark=\"true\"] {\n --primary: #0795d9;\n --secondary: #075985;\n --text: #ffffff;\n --background: #181818;\n --background-alt: #1c1c1c;\n --light-gray: #404040;\n --gray: #808080;\n --dark-gray: #b0b0b0;\n --overlay: #ffffff10;\n}\n\n:root {\n // font families\n --title: \"Barlow\", sans-serif;\n --heading: \"Barlow\", sans-serif;\n --body: \"Barlow\", sans-serif;\n --code: \"Roboto Mono\", monospace;\n\n // font sizes\n --large: 1.2rem;\n --xl: 1.4rem;\n --xxl: 1.6rem;\n\n // font weights\n --thin: 200;\n --regular: 400;\n --semi-bold: 500;\n --bold: 600;\n\n // text line spacing\n --spacing: 2;\n --compact: 1.5;\n\n // effects\n --rounded: 3px;\n --shadow: 0 0 10px 0 var(--overlay);\n --transition: 0.2s ease;\n}\n"],"file":"-theme.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/alert.css b/preview/pr-36/_styles/alert.css new file mode 100644 index 0000000000..82bf01650a --- /dev/null +++ b/preview/pr-36/_styles/alert.css @@ -0,0 +1,36 @@ +.alert { + position: relative; + display: flex; + gap: 20px; + align-items: center; + margin: 20px 0; + padding: 20px; + border-radius: var(--rounded); + overflow: hidden; + text-align: left; + line-height: var(--spacing); +} + +.alert:before { + content: ""; + position: absolute; + inset: 0; + opacity: 0.1; + background: var(--color); + z-index: -1; +} + +.alert > .icon { + color: var(--color); + font-size: var(--large); +} + +.alert-content > :first-child { + margin-top: 0; +} + +.alert-content > :last-child { + margin-bottom: 0; +} + +/*# sourceMappingURL=alert.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/alert.css.map b/preview/pr-36/_styles/alert.css.map new file mode 100644 index 0000000000..4e461a42dd --- /dev/null +++ b/preview/pr-36/_styles/alert.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["alert.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":[".alert {\n position: relative;\n display: flex;\n gap: 20px;\n align-items: center;\n margin: 20px 0;\n padding: 20px;\n border-radius: var(--rounded);\n overflow: hidden;\n text-align: left;\n line-height: var(--spacing);\n}\n\n.alert:before {\n content: \"\";\n position: absolute;\n inset: 0;\n opacity: 0.1;\n background: var(--color);\n z-index: -1;\n}\n\n.alert > .icon {\n color: var(--color);\n font-size: var(--large);\n}\n\n.alert-content > :first-child {\n margin-top: 0;\n}\n\n.alert-content > :last-child {\n margin-bottom: 0;\n}\n"],"file":"alert.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/all.css b/preview/pr-36/_styles/all.css new file mode 100644 index 0000000000..65194abf99 --- /dev/null +++ b/preview/pr-36/_styles/all.css @@ -0,0 +1,10 @@ +*, +::before, +::after { + box-sizing: border-box; + -moz-text-size-adjust: none; + -webkit-text-size-adjust: none; + text-size-adjust: none; +} + +/*# sourceMappingURL=all.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/all.css.map b/preview/pr-36/_styles/all.css.map new file mode 100644 index 0000000000..079eb5f9e9 --- /dev/null +++ b/preview/pr-36/_styles/all.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["all.scss"],"names":[],"mappings":"AAAA;AAAA;AAAA;EAGE;EACA;EACA;EACA","sourcesContent":["*,\n::before,\n::after {\n box-sizing: 
border-box;\n -moz-text-size-adjust: none;\n -webkit-text-size-adjust: none;\n text-size-adjust: none;\n}\n"],"file":"all.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/anchor.css b/preview/pr-36/_styles/anchor.css new file mode 100644 index 0000000000..3724245d6a --- /dev/null +++ b/preview/pr-36/_styles/anchor.css @@ -0,0 +1,23 @@ +.anchor { + display: inline-block; + position: relative; + width: 0; + margin: 0; + left: 0.5em; + color: var(--primary) !important; + opacity: 0; + font-size: 0.75em; + text-decoration: none; + transition: opacity var(--transition), color var(--transition); +} + +:hover > .anchor, +.anchor:focus { + opacity: 1; +} + +.anchor:hover { + color: var(--text) !important; +} + +/*# sourceMappingURL=anchor.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/anchor.css.map b/preview/pr-36/_styles/anchor.css.map new file mode 100644 index 0000000000..2fc8d7b1a3 --- /dev/null +++ b/preview/pr-36/_styles/anchor.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["anchor.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;AAAA;EAEE;;;AAGF;EACE","sourcesContent":[".anchor {\n display: inline-block;\n position: relative;\n width: 0;\n margin: 0;\n left: 0.5em;\n color: var(--primary) !important;\n opacity: 0;\n font-size: 0.75em;\n text-decoration: none;\n transition: opacity var(--transition), color var(--transition);\n}\n\n:hover > .anchor,\n.anchor:focus {\n opacity: 1;\n}\n\n.anchor:hover {\n color: var(--text) !important;\n}\n"],"file":"anchor.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/background.css b/preview/pr-36/_styles/background.css new file mode 100644 index 0000000000..025e56adf9 --- /dev/null +++ b/preview/pr-36/_styles/background.css @@ -0,0 +1,20 @@ +.background { + position: relative; + background: var(--background); + color: var(--text); + z-index: 1; +} + +.background:before { + content: ""; + position: absolute; + inset: 0; + background-image: var(--image); + background-size: cover; + background-repeat: no-repeat; + background-position: 50% 50%; + opacity: 0.25; + z-index: -1; +} + +/*# sourceMappingURL=background.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/background.css.map b/preview/pr-36/_styles/background.css.map new file mode 100644 index 0000000000..b655d9e563 --- /dev/null +++ b/preview/pr-36/_styles/background.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["background.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA","sourcesContent":[".background {\n position: relative;\n background: var(--background);\n color: var(--text);\n z-index: 1;\n}\n\n.background:before {\n content: \"\";\n position: absolute;\n inset: 0;\n background-image: var(--image);\n background-size: cover;\n background-repeat: no-repeat;\n background-position: 50% 50%;\n opacity: 0.25;\n z-index: -1;\n}\n"],"file":"background.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/body.css b/preview/pr-36/_styles/body.css new file mode 100644 index 0000000000..35145d6e06 --- /dev/null +++ b/preview/pr-36/_styles/body.css @@ -0,0 +1,14 @@ +body { + display: flex; + flex-direction: column; + margin: 0; + padding: 0; + min-height: 100vh; + background: var(--background); + color: var(--text); + font-family: var(--body); + text-align: center; + line-height: var(--compact); +} + +/*# sourceMappingURL=body.css.map */ \ No newline at end of file diff 
--git a/preview/pr-36/_styles/body.css.map b/preview/pr-36/_styles/body.css.map new file mode 100644 index 0000000000..d03b64e777 --- /dev/null +++ b/preview/pr-36/_styles/body.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["body.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA","sourcesContent":["body {\n display: flex;\n flex-direction: column;\n margin: 0;\n padding: 0;\n min-height: 100vh;\n background: var(--background);\n color: var(--text);\n font-family: var(--body);\n text-align: center;\n line-height: var(--compact);\n}\n"],"file":"body.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/bold.css b/preview/pr-36/_styles/bold.css new file mode 100644 index 0000000000..94a711f107 --- /dev/null +++ b/preview/pr-36/_styles/bold.css @@ -0,0 +1,6 @@ +b, +strong { + font-weight: var(--bold); +} + +/*# sourceMappingURL=bold.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/bold.css.map b/preview/pr-36/_styles/bold.css.map new file mode 100644 index 0000000000..57012fd4b5 --- /dev/null +++ b/preview/pr-36/_styles/bold.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["bold.scss"],"names":[],"mappings":"AAAA;AAAA;EAEE","sourcesContent":["b,\nstrong {\n font-weight: var(--bold);\n}\n"],"file":"bold.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/button.css b/preview/pr-36/_styles/button.css new file mode 100644 index 0000000000..ab3f650ed0 --- /dev/null +++ b/preview/pr-36/_styles/button.css @@ -0,0 +1,49 @@ +button { + cursor: pointer; +} + +.button-wrapper { + display: contents; +} + +.button { + display: inline-flex; + justify-content: center; + align-items: center; + gap: 10px; + max-width: calc(100% - 5px - 5px); + margin: 5px; + padding: 10px 15px; + border: none; + border-radius: var(--rounded); + background: var(--primary); + color: var(--background); + text-align: center; + font: inherit; + font-family: var(--heading); + font-weight: var(--semi-bold); + text-decoration: none; + vertical-align: middle; + appearance: none; + transition: background var(--transition), color var(--transition); +} + +.button:hover { + background: var(--text); + color: var(--background); +} + +.button[data-style=bare] { + padding: 5px; + background: none; + color: var(--primary); +} +.button[data-style=bare]:hover { + color: var(--text); +} + +.button[data-flip] { + flex-direction: row-reverse; +} + +/*# sourceMappingURL=button.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/button.css.map b/preview/pr-36/_styles/button.css.map new file mode 100644 index 0000000000..5fee0dd351 --- /dev/null +++ b/preview/pr-36/_styles/button.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["button.scss"],"names":[],"mappings":"AAAA;EACE;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;EACA;;AAEA;EACE;;;AAIJ;EACE","sourcesContent":["button {\n cursor: pointer;\n}\n\n.button-wrapper {\n display: contents;\n}\n\n.button {\n display: inline-flex;\n justify-content: center;\n align-items: center;\n gap: 10px;\n max-width: calc(100% - 5px - 5px);\n margin: 5px;\n padding: 10px 15px;\n border: none;\n border-radius: var(--rounded);\n background: var(--primary);\n color: var(--background);\n text-align: center;\n font: inherit;\n font-family: var(--heading);\n font-weight: var(--semi-bold);\n text-decoration: none;\n vertical-align: middle;\n appearance: none;\n 
transition: background var(--transition), color var(--transition);\n}\n\n.button:hover {\n background: var(--text);\n color: var(--background);\n}\n\n.button[data-style=\"bare\"] {\n padding: 5px;\n background: none;\n color: var(--primary);\n\n &:hover {\n color: var(--text);\n }\n}\n\n.button[data-flip] {\n flex-direction: row-reverse;\n}\n"],"file":"button.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/card.css b/preview/pr-36/_styles/card.css new file mode 100644 index 0000000000..1b37a12fd8 --- /dev/null +++ b/preview/pr-36/_styles/card.css @@ -0,0 +1,50 @@ +.card { + display: inline-flex; + justify-content: stretch; + align-items: center; + flex-direction: column; + width: 350px; + max-width: calc(100% - 20px - 20px); + margin: 20px; + background: var(--background); + border-radius: var(--rounded); + overflow: hidden; + box-shadow: var(--shadow); + vertical-align: top; +} + +.card[data-style=small] { + width: 250px; +} + +.card-image img { + aspect-ratio: 3/2; + object-fit: cover; + width: 100%; +} + +.card-text { + display: inline-flex; + justify-content: flex-start; + align-items: center; + flex-direction: column; + gap: 20px; + max-width: 100%; + padding: 20px; +} + +.card-text > * { + margin: 0 !important; +} + +.card-title { + font-family: var(--heading); + font-weight: var(--semi-bold); +} + +.card-subtitle { + margin-top: -10px !important; + font-style: italic; +} + +/*# sourceMappingURL=card.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/card.css.map b/preview/pr-36/_styles/card.css.map new file mode 100644 index 0000000000..393b7a3aec --- /dev/null +++ b/preview/pr-36/_styles/card.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["card.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;;;AAIF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA","sourcesContent":[".card {\n display: inline-flex;\n justify-content: stretch;\n align-items: center;\n flex-direction: column;\n width: 350px;\n max-width: calc(100% - 20px - 20px);\n margin: 20px;\n background: var(--background);\n border-radius: var(--rounded);\n overflow: hidden;\n box-shadow: var(--shadow);\n vertical-align: top;\n}\n\n.card[data-style=\"small\"] {\n width: 250px;\n}\n\n.card-image img {\n aspect-ratio: 3 / 2;\n object-fit: cover;\n width: 100%;\n // box-shadow: var(--shadow);\n}\n\n.card-text {\n display: inline-flex;\n justify-content: flex-start;\n align-items: center;\n flex-direction: column;\n gap: 20px;\n max-width: 100%;\n padding: 20px;\n}\n\n.card-text > * {\n margin: 0 !important;\n}\n\n.card-title {\n font-family: var(--heading);\n font-weight: var(--semi-bold);\n}\n\n.card-subtitle {\n margin-top: -10px !important;\n font-style: italic;\n}\n"],"file":"card.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/checkbox.css b/preview/pr-36/_styles/checkbox.css new file mode 100644 index 0000000000..8c77dc53e1 --- /dev/null +++ b/preview/pr-36/_styles/checkbox.css @@ -0,0 +1,5 @@ +input[type=checkbox] { + cursor: pointer; +} + +/*# sourceMappingURL=checkbox.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/checkbox.css.map b/preview/pr-36/_styles/checkbox.css.map new file mode 100644 index 0000000000..90fb493297 --- /dev/null +++ b/preview/pr-36/_styles/checkbox.css.map @@ -0,0 +1 @@ 
+{"version":3,"sourceRoot":"","sources":["checkbox.scss"],"names":[],"mappings":"AAAA;EACE","sourcesContent":["input[type=\"checkbox\"] {\n cursor: pointer;\n}\n"],"file":"checkbox.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/citation.css b/preview/pr-36/_styles/citation.css new file mode 100644 index 0000000000..b535c2dde1 --- /dev/null +++ b/preview/pr-36/_styles/citation.css @@ -0,0 +1,97 @@ +.citation-container { + container-type: inline-size; +} + +.citation { + display: flex; + margin: 20px 0; + border-radius: var(--rounded); + background: var(--background); + overflow: hidden; + box-shadow: var(--shadow); +} + +.citation-image { + position: relative; + width: 180px; + flex-shrink: 0; +} + +.citation-image img { + position: absolute; + inset: 0; + width: 100%; + height: 100%; + object-fit: contain; +} + +.citation-text { + position: relative; + display: inline-flex; + flex-wrap: wrap; + gap: 10px; + max-width: 100%; + height: min-content; + padding: 20px; + padding-left: 30px; + text-align: left; + overflow-wrap: break-word; + z-index: 0; +} + +.citation-title, +.citation-authors, +.citation-details, +.citation-description { + width: 100%; +} + +.citation-title { + font-weight: var(--semi-bold); +} + +.citation-text > .icon { + position: absolute; + top: 20px; + right: 20px; + color: var(--light-gray); + opacity: 0.5; + font-size: 30px; + z-index: -1; +} + +.citation-publisher { + text-transform: capitalize; +} + +.citation-description { + color: var(--gray); +} + +.citation-buttons { + display: flex; + flex-wrap: wrap; + gap: 10px; +} + +.citation-buttons .button { + margin: 0; +} + +.citation-text > .tags { + display: inline-flex; + justify-content: flex-start; + margin: 0; +} + +@container (max-width: 800px) { + .citation { + flex-direction: column; + } + .citation-image { + width: unset; + height: 180px; + } +} + +/*# sourceMappingURL=citation.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/citation.css.map b/preview/pr-36/_styles/citation.css.map new file mode 100644 index 0000000000..8d3446b749 --- /dev/null +++ b/preview/pr-36/_styles/citation.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["citation.scss"],"names":[],"mappings":"AAGA;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA,OAlBW;EAmBX;;;AAIF;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;AAAA;AAAA;AAAA;EAIE;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;;;AAGF;EACE;IACE;;EAGF;IACE;IACA,QAjGS","sourcesContent":["$thumb-size: 180px;\n$wrap: 800px;\n\n.citation-container {\n container-type: inline-size;\n}\n\n.citation {\n display: flex;\n margin: 20px 0;\n border-radius: var(--rounded);\n background: var(--background);\n overflow: hidden;\n box-shadow: var(--shadow);\n}\n\n.citation-image {\n position: relative;\n width: $thumb-size;\n flex-shrink: 0;\n // box-shadow: var(--shadow);\n}\n\n.citation-image img {\n position: absolute;\n inset: 0;\n width: 100%;\n height: 100%;\n object-fit: contain;\n}\n\n.citation-text {\n position: relative;\n display: inline-flex;\n flex-wrap: wrap;\n gap: 10px;\n max-width: 100%;\n height: min-content;\n padding: 20px;\n padding-left: 30px;\n text-align: left;\n overflow-wrap: break-word;\n z-index: 0;\n}\n\n.citation-title,\n.citation-authors,\n.citation-details,\n.citation-description {\n width: 100%;\n}\n\n.citation-title {\n font-weight: var(--semi-bold);\n}\n\n.citation-text 
> .icon {\n position: absolute;\n top: 20px;\n right: 20px;\n color: var(--light-gray);\n opacity: 0.5;\n font-size: 30px;\n z-index: -1;\n}\n\n.citation-publisher {\n text-transform: capitalize;\n}\n\n.citation-description {\n color: var(--gray);\n}\n\n.citation-buttons {\n display: flex;\n flex-wrap: wrap;\n gap: 10px;\n}\n\n.citation-buttons .button {\n margin: 0;\n}\n\n.citation-text > .tags {\n display: inline-flex;\n justify-content: flex-start;\n margin: 0;\n}\n\n@container (max-width: #{$wrap}) {\n .citation {\n flex-direction: column;\n }\n\n .citation-image {\n width: unset;\n height: $thumb-size;\n }\n}\n"],"file":"citation.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/code.css b/preview/pr-36/_styles/code.css new file mode 100644 index 0000000000..8912ad31bf --- /dev/null +++ b/preview/pr-36/_styles/code.css @@ -0,0 +1,33 @@ +pre, +code, +pre *, +code * { + font-family: var(--code); +} + +code.highlighter-rouge { + padding: 2px 6px; + background: var(--light-gray); + border-radius: var(--rounded); +} + +div.highlighter-rouge { + width: 100%; + margin: 40px 0; + border-radius: var(--rounded); + overflow-x: auto; + overflow-y: auto; + text-align: left; +} +div.highlighter-rouge div.highlight { + display: contents; +} +div.highlighter-rouge div.highlight pre.highlight { + width: fit-content; + min-width: 100%; + margin: 0; + padding: 20px; + color: var(--white); +} + +/*# sourceMappingURL=code.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/code.css.map b/preview/pr-36/_styles/code.css.map new file mode 100644 index 0000000000..8f447176a3 --- /dev/null +++ b/preview/pr-36/_styles/code.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["code.scss"],"names":[],"mappings":"AAAA;AAAA;AAAA;AAAA;EAIE;;;AAIF;EACE;EACA;EACA;;;AAIF;EACE;EACA;EACA;EACA;EACA;EACA;;AAEA;EACE;;AAEA;EACE;EACA;EACA;EACA;EACA","sourcesContent":["pre,\ncode,\npre *,\ncode * {\n font-family: var(--code);\n}\n\n// inline code\ncode.highlighter-rouge {\n padding: 2px 6px;\n background: var(--light-gray);\n border-radius: var(--rounded);\n}\n\n// code block\ndiv.highlighter-rouge {\n width: 100%;\n margin: 40px 0;\n border-radius: var(--rounded);\n overflow-x: auto;\n overflow-y: auto;\n text-align: left;\n\n div.highlight {\n display: contents;\n\n pre.highlight {\n width: fit-content;\n min-width: 100%;\n margin: 0;\n padding: 20px;\n color: var(--white);\n }\n }\n}\n"],"file":"code.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/cols.css b/preview/pr-36/_styles/cols.css new file mode 100644 index 0000000000..b15b095ba2 --- /dev/null +++ b/preview/pr-36/_styles/cols.css @@ -0,0 +1,34 @@ +.cols { + display: grid; + --repeat: min(3, var(--cols)); + grid-template-columns: repeat(var(--repeat), 1fr); + align-items: flex-start; + gap: 40px; + margin: 40px 0; +} + +.cols > * { + min-width: 0; + min-height: 0; +} + +.cols > div > :first-child { + margin-top: 0 !important; +} + +.cols > div > :last-child { + margin-bottom: 0 !important; +} + +@media (max-width: 750px) { + .cols { + --repeat: min(2, var(--cols)); + } +} +@media (max-width: 500px) { + .cols { + --repeat: min(1, var(--cols)); + } +} + +/*# sourceMappingURL=cols.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/cols.css.map b/preview/pr-36/_styles/cols.css.map new file mode 100644 index 0000000000..2c44d54394 --- /dev/null +++ b/preview/pr-36/_styles/cols.css.map @@ -0,0 +1 @@ 
+{"version":3,"sourceRoot":"","sources":["cols.scss"],"names":[],"mappings":"AAGA;EACE;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;IACE;;;AAIJ;EACE;IACE","sourcesContent":["$two: 750px;\n$one: 500px;\n\n.cols {\n display: grid;\n --repeat: min(3, var(--cols));\n grid-template-columns: repeat(var(--repeat), 1fr);\n align-items: flex-start;\n gap: 40px;\n margin: 40px 0;\n}\n\n.cols > * {\n min-width: 0;\n min-height: 0;\n}\n\n.cols > div > :first-child {\n margin-top: 0 !important;\n}\n\n.cols > div > :last-child {\n margin-bottom: 0 !important;\n}\n\n@media (max-width: $two) {\n .cols {\n --repeat: min(2, var(--cols));\n }\n}\n\n@media (max-width: $one) {\n .cols {\n --repeat: min(1, var(--cols));\n }\n}\n"],"file":"cols.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/dark-toggle.css b/preview/pr-36/_styles/dark-toggle.css new file mode 100644 index 0000000000..87065b78ff --- /dev/null +++ b/preview/pr-36/_styles/dark-toggle.css @@ -0,0 +1,30 @@ +.dark-toggle { + position: relative; + width: 40px; + height: 25px; + margin: 0; + border-radius: 999px; + background: var(--primary); + appearance: none; + transition: background var(--transition); +} + +.dark-toggle:after { + content: "\f185"; + position: absolute; + left: 12px; + top: 50%; + color: var(--text); + font-size: 15px; + font-family: "Font Awesome 6 Free"; + font-weight: 900; + transform: translate(-50%, -50%); + transition: left var(--transition); +} + +.dark-toggle:checked:after { + content: "\f186"; + left: calc(100% - 12px); +} + +/*# sourceMappingURL=dark-toggle.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/dark-toggle.css.map b/preview/pr-36/_styles/dark-toggle.css.map new file mode 100644 index 0000000000..496aa7b586 --- /dev/null +++ b/preview/pr-36/_styles/dark-toggle.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["dark-toggle.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA","sourcesContent":[".dark-toggle {\n position: relative;\n width: 40px;\n height: 25px;\n margin: 0;\n border-radius: 999px;\n background: var(--primary);\n appearance: none;\n transition: background var(--transition);\n}\n\n.dark-toggle:after {\n content: \"\\f185\";\n position: absolute;\n left: 12px;\n top: 50%;\n color: var(--text);\n font-size: 15px;\n font-family: \"Font Awesome 6 Free\";\n font-weight: 900;\n transform: translate(-50%, -50%);\n transition: left var(--transition);\n}\n\n.dark-toggle:checked:after {\n content: \"\\f186\";\n left: calc(100% - 12px);\n}\n"],"file":"dark-toggle.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/feature.css b/preview/pr-36/_styles/feature.css new file mode 100644 index 0000000000..7f7e0120b6 --- /dev/null +++ b/preview/pr-36/_styles/feature.css @@ -0,0 +1,49 @@ +.feature { + display: flex; + justify-content: center; + align-items: center; + gap: 40px; + margin: 40px 0; +} + +.feature-image { + flex-shrink: 0; + width: 40%; + aspect-ratio: 3/2; + border-radius: var(--rounded); + overflow: hidden; + box-shadow: var(--shadow); +} + +.feature-image img { + width: 100%; + height: 100%; + object-fit: cover; +} + +.feature-text { + flex-grow: 1; +} + +.feature-title { + font-size: var(--large); + text-align: center; + font-family: var(--heading); + font-weight: var(--semi-bold); +} + +.feature[data-flip] { + flex-direction: row-reverse; +} + +@media (max-width: 800px) { + .feature { + 
flex-direction: column !important; + } + .feature-image { + width: 100%; + max-width: 400px; + } +} + +/*# sourceMappingURL=feature.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/feature.css.map b/preview/pr-36/_styles/feature.css.map new file mode 100644 index 0000000000..1a2cdac650 --- /dev/null +++ b/preview/pr-36/_styles/feature.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["feature.scss"],"names":[],"mappings":"AAEA;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;IACE;;EAGF;IACE;IACA","sourcesContent":["$wrap: 800px;\n\n.feature {\n display: flex;\n justify-content: center;\n align-items: center;\n gap: 40px;\n margin: 40px 0;\n}\n\n.feature-image {\n flex-shrink: 0;\n width: 40%;\n aspect-ratio: 3 / 2;\n border-radius: var(--rounded);\n overflow: hidden;\n box-shadow: var(--shadow);\n}\n\n.feature-image img {\n width: 100%;\n height: 100%;\n object-fit: cover;\n}\n\n.feature-text {\n flex-grow: 1;\n}\n\n.feature-title {\n font-size: var(--large);\n text-align: center;\n font-family: var(--heading);\n font-weight: var(--semi-bold);\n}\n\n.feature[data-flip] {\n flex-direction: row-reverse;\n}\n\n@media (max-width: $wrap) {\n .feature {\n flex-direction: column !important;\n }\n\n .feature-image {\n width: 100%;\n max-width: calc($wrap / 2);\n }\n}\n"],"file":"feature.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/figure.css b/preview/pr-36/_styles/figure.css new file mode 100644 index 0000000000..95589387ff --- /dev/null +++ b/preview/pr-36/_styles/figure.css @@ -0,0 +1,25 @@ +.figure { + display: flex; + justify-content: center; + align-items: center; + flex-direction: column; + gap: 10px; + margin: 40px 0; +} + +.figure-image { + display: contents; +} + +.figure-image img { + border-radius: var(--rounded); + overflow: hidden; + box-shadow: var(--shadow); +} + +.figure-caption { + font-style: italic; + text-align: center; +} + +/*# sourceMappingURL=figure.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/figure.css.map b/preview/pr-36/_styles/figure.css.map new file mode 100644 index 0000000000..4d62fcf185 --- /dev/null +++ b/preview/pr-36/_styles/figure.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["figure.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;;;AAGF;EACE;EACA","sourcesContent":[".figure {\n display: flex;\n justify-content: center;\n align-items: center;\n flex-direction: column;\n gap: 10px;\n margin: 40px 0;\n}\n\n.figure-image {\n display: contents;\n}\n\n.figure-image img {\n border-radius: var(--rounded);\n overflow: hidden;\n box-shadow: var(--shadow);\n}\n\n.figure-caption {\n font-style: italic;\n text-align: center;\n}\n"],"file":"figure.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/float.css b/preview/pr-36/_styles/float.css new file mode 100644 index 0000000000..c91b46eb8c --- /dev/null +++ b/preview/pr-36/_styles/float.css @@ -0,0 +1,35 @@ +.float { + margin-bottom: 20px; + max-width: 50%; +} + +.float > * { + margin: 0 !important; +} + +.float:not([data-flip]) { + float: left; + margin-right: 40px; +} + +.float[data-flip] { + float: right; + margin-left: 40px; +} + +.float[data-clear] { + float: unset; + clear: both; + margin: 0; +} + +@media (max-width: 600px) { + .float { + float: unset !important; + clear: both !important; + margin: auto !important; + max-width: unset; + } +} + 
+/*# sourceMappingURL=float.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/float.css.map b/preview/pr-36/_styles/float.css.map new file mode 100644 index 0000000000..42c53e0b05 --- /dev/null +++ b/preview/pr-36/_styles/float.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["float.scss"],"names":[],"mappings":"AAEA;EACE;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;EACA;;;AAGF;EACE;IACE;IACA;IACA;IACA","sourcesContent":["$wrap: 600px;\n\n.float {\n margin-bottom: 20px;\n max-width: 50%;\n}\n\n.float > * {\n margin: 0 !important;\n}\n\n.float:not([data-flip]) {\n float: left;\n margin-right: 40px;\n}\n\n.float[data-flip] {\n float: right;\n margin-left: 40px;\n}\n\n.float[data-clear] {\n float: unset;\n clear: both;\n margin: 0;\n}\n\n@media (max-width: $wrap) {\n .float {\n float: unset !important;\n clear: both !important;\n margin: auto !important;\n max-width: unset;\n }\n}\n"],"file":"float.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/font.css b/preview/pr-36/_styles/font.css new file mode 100644 index 0000000000..c40e155902 --- /dev/null +++ b/preview/pr-36/_styles/font.css @@ -0,0 +1,3 @@ +@font-face {} + +/*# sourceMappingURL=font.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/font.css.map b/preview/pr-36/_styles/font.css.map new file mode 100644 index 0000000000..e1d56c0444 --- /dev/null +++ b/preview/pr-36/_styles/font.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["font.scss"],"names":[],"mappings":"AAAA","sourcesContent":["@font-face {\n}\n"],"file":"font.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/footer.css b/preview/pr-36/_styles/footer.css new file mode 100644 index 0000000000..a85b907fee --- /dev/null +++ b/preview/pr-36/_styles/footer.css @@ -0,0 +1,24 @@ +footer { + display: flex; + justify-content: center; + align-items: center; + flex-direction: column; + gap: 20px; + padding: 40px; + line-height: var(--spacing); + box-shadow: var(--shadow); +} + +footer a { + color: var(--text) !important; +} + +footer a:hover { + color: var(--primary) !important; +} + +footer .icon { + font-size: var(--xl); +} + +/*# sourceMappingURL=footer.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/footer.css.map b/preview/pr-36/_styles/footer.css.map new file mode 100644 index 0000000000..61ae1179a5 --- /dev/null +++ b/preview/pr-36/_styles/footer.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["footer.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":["footer {\n display: flex;\n justify-content: center;\n align-items: center;\n flex-direction: column;\n gap: 20px;\n padding: 40px;\n line-height: var(--spacing);\n box-shadow: var(--shadow);\n}\n\nfooter a {\n color: var(--text) !important;\n}\n\nfooter a:hover {\n color: var(--primary) !important;\n}\n\nfooter .icon {\n font-size: var(--xl);\n}\n"],"file":"footer.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/form.css b/preview/pr-36/_styles/form.css new file mode 100644 index 0000000000..761145950c --- /dev/null +++ b/preview/pr-36/_styles/form.css @@ -0,0 +1,8 @@ +form { + display: flex; + justify-content: center; + align-items: center; + gap: 10px; +} + +/*# sourceMappingURL=form.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/form.css.map b/preview/pr-36/_styles/form.css.map new file mode 100644 index 0000000000..65939cb61c 
--- /dev/null +++ b/preview/pr-36/_styles/form.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["form.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA","sourcesContent":["form {\n display: flex;\n justify-content: center;\n align-items: center;\n gap: 10px;\n}\n"],"file":"form.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/grid.css b/preview/pr-36/_styles/grid.css new file mode 100644 index 0000000000..a595ce7b1c --- /dev/null +++ b/preview/pr-36/_styles/grid.css @@ -0,0 +1,45 @@ +.grid { + display: grid; + --repeat: 3; + grid-template-columns: repeat(var(--repeat), 1fr); + justify-content: center; + align-items: flex-start; + gap: 40px; + margin: 40px 0; +} + +.grid > * { + min-width: 0; + min-height: 0; + width: 100%; + margin: 0 !important; +} + +@media (max-width: 750px) { + .grid { + --repeat: 2; + } +} +@media (max-width: 500px) { + .grid { + --repeat: 1; + } +} +.grid[data-style=square] { + align-items: center; +} +.grid[data-style=square] > * { + aspect-ratio: 1/1; +} +.grid[data-style=square] img { + aspect-ratio: 1/1; + object-fit: cover; + max-width: unset; + max-height: unset; +} + +.grid > :where(h1, h2, h3, h4, h5, h6) { + display: none; +} + +/*# sourceMappingURL=grid.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/grid.css.map b/preview/pr-36/_styles/grid.css.map new file mode 100644 index 0000000000..8e00ee1c07 --- /dev/null +++ b/preview/pr-36/_styles/grid.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["grid.scss"],"names":[],"mappings":"AAGA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EAEA;;;AAGF;EACE;IACE;;;AAIJ;EACE;IACE;;;AAIJ;EACE;;AAEA;EACE;;AAGF;EACE;EACA;EACA;EACA;;;AAIJ;EACE","sourcesContent":["$two: 750px;\n$one: 500px;\n\n.grid {\n display: grid;\n --repeat: 3;\n grid-template-columns: repeat(var(--repeat), 1fr);\n justify-content: center;\n align-items: flex-start;\n gap: 40px;\n margin: 40px 0;\n}\n\n.grid > * {\n min-width: 0;\n min-height: 0;\n width: 100%;\n // max-height: 50vh;\n margin: 0 !important;\n}\n\n@media (max-width: $two) {\n .grid {\n --repeat: 2;\n }\n}\n\n@media (max-width: $one) {\n .grid {\n --repeat: 1;\n }\n}\n\n.grid[data-style=\"square\"] {\n align-items: center;\n\n & > * {\n aspect-ratio: 1 / 1;\n }\n\n & img {\n aspect-ratio: 1 / 1;\n object-fit: cover;\n max-width: unset;\n max-height: unset;\n }\n}\n\n.grid > :where(h1, h2, h3, h4, h5, h6) {\n display: none;\n}\n"],"file":"grid.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/header.css b/preview/pr-36/_styles/header.css new file mode 100644 index 0000000000..f4e49a401a --- /dev/null +++ b/preview/pr-36/_styles/header.css @@ -0,0 +1,145 @@ +header { + display: flex; + justify-content: space-between; + align-items: center; + flex-wrap: wrap; + gap: 20px; + padding: 20px; + box-shadow: var(--shadow); + position: sticky !important; + top: 0; + z-index: 10 !important; +} + +header a { + color: var(--text); + text-decoration: none; +} + +.home { + display: flex; + justify-content: flex-start; + align-items: center; + gap: 10px; + flex-basis: 0; + flex-grow: 1; + max-width: 100%; +} + +.logo { + height: 40px; +} + +.logo > * { + height: 100%; +} + +.title-text { + display: flex; + justify-content: flex-start; + align-items: baseline; + flex-wrap: wrap; + gap: 5px; + min-width: 0; + font-family: var(--title); + text-align: left; +} + +.title { + font-size: var(--large); +} + +.subtitle { + opacity: 0.65; + font-weight: var(--thin); +} + +.nav-toggle { + display: none; + 
position: relative; + width: 30px; + height: 30px; + margin: 0; + color: var(--text); + appearance: none; + transition: background var(--transition); +} + +.nav-toggle:after { + content: "\f0c9"; + position: absolute; + left: 50%; + top: 50%; + color: var(--text); + font-size: 15px; + font-family: "Font Awesome 6 Free"; + font-weight: 900; + transform: translate(-50%, -50%); +} + +.nav-toggle:checked:after { + content: "\f00d"; +} + +nav { + display: flex; + justify-content: center; + align-items: center; + flex-wrap: wrap; + gap: 10px; + font-family: var(--heading); + text-transform: uppercase; +} + +nav > a { + padding: 5px; +} + +nav > a:hover { + color: var(--primary); +} + +@media (max-width: 700px) { + header:not([data-big]) { + justify-content: flex-end; + } + header:not([data-big]) .nav-toggle { + display: flex; + } + header:not([data-big]) .nav-toggle:not(:checked) + nav { + display: none; + } + header:not([data-big]) nav { + align-items: flex-end; + flex-direction: column; + width: 100%; + } +} + +header[data-big] { + justify-content: center; + align-items: center; + flex-direction: column; + padding: 100px 20px; + top: unset; +} +header[data-big] .home { + flex-direction: column; + flex-grow: 0; +} +header[data-big] .logo { + height: 80px; +} +header[data-big] .title-text { + flex-direction: column; + align-items: center; + text-align: center; +} +header[data-big] .title { + font-size: var(--xxl); +} +header[data-big] .subtitle { + font-size: var(--large); +} + +/*# sourceMappingURL=header.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/header.css.map b/preview/pr-36/_styles/header.css.map new file mode 100644 index 0000000000..04153a29ec --- /dev/null +++ b/preview/pr-36/_styles/header.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["header.scss"],"names":[],"mappings":"AAMA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EAGE;EACA;EACA;;;AAIJ;EACE;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE,QArCK;;;AAwCP;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAIA;EADF;IAEI;;EAEA;IACE;;EAGF;IACE;;EAGF;IACE;IACA;IACA;;;;AAKN;EACE;EACA;EACA;EACA;EAGE;;AAGF;EACE;EACA;;AAGF;EACE,QAlJO;;AAqJT;EACE;EACA;EACA;;AAGF;EACE;;AAGF;EACE","sourcesContent":["$logo-big: 80px;\n$logo: 40px;\n$big-padding: 100px;\n$collapse: 700px;\n$sticky: true;\n\nheader {\n display: flex;\n justify-content: space-between;\n align-items: center;\n flex-wrap: wrap;\n gap: 20px;\n padding: 20px;\n box-shadow: var(--shadow);\n\n @if $sticky {\n position: sticky !important;\n top: 0;\n z-index: 10 !important;\n }\n}\n\nheader a {\n color: var(--text);\n text-decoration: none;\n}\n\n.home {\n display: flex;\n justify-content: flex-start;\n align-items: center;\n gap: 10px;\n flex-basis: 0;\n flex-grow: 1;\n max-width: 100%;\n}\n\n.logo {\n height: $logo;\n}\n\n.logo > * {\n height: 100%;\n}\n\n.title-text {\n display: flex;\n justify-content: flex-start;\n align-items: baseline;\n flex-wrap: wrap;\n gap: 5px;\n min-width: 0;\n font-family: var(--title);\n text-align: left;\n}\n\n.title {\n font-size: var(--large);\n}\n\n.subtitle {\n opacity: 0.65;\n font-weight: var(--thin);\n}\n\n.nav-toggle {\n display: none;\n position: relative;\n width: 30px;\n height: 30px;\n margin: 0;\n color: var(--text);\n appearance: none;\n transition: background 
var(--transition);\n}\n\n.nav-toggle:after {\n content: \"\\f0c9\";\n position: absolute;\n left: 50%;\n top: 50%;\n color: var(--text);\n font-size: 15px;\n font-family: \"Font Awesome 6 Free\";\n font-weight: 900;\n transform: translate(-50%, -50%);\n}\n\n.nav-toggle:checked:after {\n content: \"\\f00d\";\n}\n\nnav {\n display: flex;\n justify-content: center;\n align-items: center;\n flex-wrap: wrap;\n gap: 10px;\n font-family: var(--heading);\n text-transform: uppercase;\n}\n\nnav > a {\n padding: 5px;\n}\n\nnav > a:hover {\n color: var(--primary);\n}\n\nheader:not([data-big]) {\n @media (max-width: $collapse) {\n justify-content: flex-end;\n\n .nav-toggle {\n display: flex;\n }\n\n .nav-toggle:not(:checked) + nav {\n display: none;\n }\n\n nav {\n align-items: flex-end;\n flex-direction: column;\n width: 100%;\n }\n }\n}\n\nheader[data-big] {\n justify-content: center;\n align-items: center;\n flex-direction: column;\n padding: $big-padding 20px;\n\n @if $sticky {\n top: unset;\n }\n\n .home {\n flex-direction: column;\n flex-grow: 0;\n }\n\n .logo {\n height: $logo-big;\n }\n\n .title-text {\n flex-direction: column;\n align-items: center;\n text-align: center;\n }\n\n .title {\n font-size: var(--xxl);\n }\n\n .subtitle {\n font-size: var(--large);\n }\n}\n"],"file":"header.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/heading.css b/preview/pr-36/_styles/heading.css new file mode 100644 index 0000000000..05ebc5b88e --- /dev/null +++ b/preview/pr-36/_styles/heading.css @@ -0,0 +1,49 @@ +h1, +h2, +h3, +h4, +h5, +h6 { + margin: 40px 0 20px 0; + font-family: var(--heading); + font-weight: var(--semi-bold); + text-align: left; + letter-spacing: 1px; +} + +h1 { + font-size: 1.6rem; + font-weight: var(--regular); + text-transform: uppercase; + text-align: center; +} + +h2 { + font-size: 1.6rem; + padding-bottom: 5px; + border-bottom: solid 1px var(--light-gray); + font-weight: var(--regular); +} + +h3 { + font-size: 1.5rem; +} + +h4 { + font-size: 1.3rem; +} + +h5 { + font-size: 1.15rem; +} + +h6 { + font-size: 1rem; +} + +:where(h1, h2, h3, h4, h5, h6) > .icon { + margin-right: 1em; + color: var(--light-gray); +} + +/*# sourceMappingURL=heading.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/heading.css.map b/preview/pr-36/_styles/heading.css.map new file mode 100644 index 0000000000..b8fe6a42e0 --- /dev/null +++ b/preview/pr-36/_styles/heading.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["heading.scss"],"names":[],"mappings":"AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;EAME;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;EACA","sourcesContent":["h1,\nh2,\nh3,\nh4,\nh5,\nh6 {\n margin: 40px 0 20px 0;\n font-family: var(--heading);\n font-weight: var(--semi-bold);\n text-align: left;\n letter-spacing: 1px;\n}\n\nh1 {\n font-size: 1.6rem;\n font-weight: var(--regular);\n text-transform: uppercase;\n text-align: center;\n}\n\nh2 {\n font-size: 1.6rem;\n padding-bottom: 5px;\n border-bottom: solid 1px var(--light-gray);\n font-weight: var(--regular);\n}\n\nh3 {\n font-size: 1.5rem;\n}\n\nh4 {\n font-size: 1.3rem;\n}\n\nh5 {\n font-size: 1.15rem;\n}\n\nh6 {\n font-size: 1rem;\n}\n\n:where(h1, h2, h3, h4, h5, h6) > .icon {\n margin-right: 1em;\n color: var(--light-gray);\n}\n"],"file":"heading.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/highlight.css b/preview/pr-36/_styles/highlight.css new file mode 100644 index 
0000000000..a8cf7d3cee --- /dev/null +++ b/preview/pr-36/_styles/highlight.css @@ -0,0 +1,6 @@ +mark { + background: #fef08a; + color: #000000; +} + +/*# sourceMappingURL=highlight.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/highlight.css.map b/preview/pr-36/_styles/highlight.css.map new file mode 100644 index 0000000000..957ceb13db --- /dev/null +++ b/preview/pr-36/_styles/highlight.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["highlight.scss"],"names":[],"mappings":"AAAA;EACE;EACA","sourcesContent":["mark {\n background: #fef08a;\n color: #000000;\n}\n"],"file":"highlight.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/icon.css b/preview/pr-36/_styles/icon.css new file mode 100644 index 0000000000..ab61327d04 --- /dev/null +++ b/preview/pr-36/_styles/icon.css @@ -0,0 +1,15 @@ +.icon { + font-size: 1em; +} + +span.icon { + line-height: 1; +} + +span.icon > svg { + position: relative; + top: 0.1em; + height: 1em; +} + +/*# sourceMappingURL=icon.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/icon.css.map b/preview/pr-36/_styles/icon.css.map new file mode 100644 index 0000000000..22298685e4 --- /dev/null +++ b/preview/pr-36/_styles/icon.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["icon.scss"],"names":[],"mappings":"AAAA;EACE;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA","sourcesContent":[".icon {\n font-size: 1em;\n}\n\nspan.icon {\n line-height: 1;\n}\n\nspan.icon > svg {\n position: relative;\n top: 0.1em;\n height: 1em;\n}\n"],"file":"icon.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/image.css b/preview/pr-36/_styles/image.css new file mode 100644 index 0000000000..70340d334d --- /dev/null +++ b/preview/pr-36/_styles/image.css @@ -0,0 +1,6 @@ +img { + max-width: 100%; + max-height: 100%; +} + +/*# sourceMappingURL=image.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/image.css.map b/preview/pr-36/_styles/image.css.map new file mode 100644 index 0000000000..e88ec450d0 --- /dev/null +++ b/preview/pr-36/_styles/image.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["image.scss"],"names":[],"mappings":"AAAA;EACE;EACA","sourcesContent":["img {\n max-width: 100%;\n max-height: 100%;\n}\n"],"file":"image.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/link.css b/preview/pr-36/_styles/link.css new file mode 100644 index 0000000000..3235e80303 --- /dev/null +++ b/preview/pr-36/_styles/link.css @@ -0,0 +1,15 @@ +a { + color: var(--primary); + transition: color var(--transition); + overflow-wrap: break-word; +} + +a:hover { + color: var(--text); +} + +a:not([href]) { + color: var(--text); +} + +/*# sourceMappingURL=link.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/link.css.map b/preview/pr-36/_styles/link.css.map new file mode 100644 index 0000000000..964355085e --- /dev/null +++ b/preview/pr-36/_styles/link.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["link.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":["a {\n color: var(--primary);\n transition: color var(--transition);\n overflow-wrap: break-word;\n}\n\na:hover {\n color: var(--text);\n}\n\na:not([href]) {\n color: var(--text);\n}\n"],"file":"link.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/list.css b/preview/pr-36/_styles/list.css new file mode 100644 index 0000000000..181f8a2b6c --- /dev/null +++ b/preview/pr-36/_styles/list.css @@ -0,0 +1,22 
@@ +ul, +ol { + margin: 20px 0; + padding-left: 40px; +} + +ul { + list-style-type: square; +} + +li { + margin: 5px 0; + padding-left: 10px; + text-align: justify; + line-height: var(--spacing); +} +li ul, +li ol { + margin: 0; +} + +/*# sourceMappingURL=list.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/list.css.map b/preview/pr-36/_styles/list.css.map new file mode 100644 index 0000000000..a3ab1ed0a0 --- /dev/null +++ b/preview/pr-36/_styles/list.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["list.scss"],"names":[],"mappings":"AAAA;AAAA;EAEE;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;EACA;;AAEA;AAAA;EAEE","sourcesContent":["ul,\nol {\n margin: 20px 0;\n padding-left: 40px;\n}\n\nul {\n list-style-type: square;\n}\n\nli {\n margin: 5px 0;\n padding-left: 10px;\n text-align: justify;\n line-height: var(--spacing);\n\n ul,\n ol {\n margin: 0;\n }\n}\n"],"file":"list.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/main.css b/preview/pr-36/_styles/main.css new file mode 100644 index 0000000000..f72eb0d37e --- /dev/null +++ b/preview/pr-36/_styles/main.css @@ -0,0 +1,7 @@ +main { + display: flex; + flex-direction: column; + flex-grow: 1; +} + +/*# sourceMappingURL=main.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/main.css.map b/preview/pr-36/_styles/main.css.map new file mode 100644 index 0000000000..a2a0fa8dc5 --- /dev/null +++ b/preview/pr-36/_styles/main.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["main.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA","sourcesContent":["main {\n display: flex;\n flex-direction: column;\n flex-grow: 1;\n}\n"],"file":"main.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/paragraph.css b/preview/pr-36/_styles/paragraph.css new file mode 100644 index 0000000000..7e46c39156 --- /dev/null +++ b/preview/pr-36/_styles/paragraph.css @@ -0,0 +1,7 @@ +p { + margin: 20px 0; + text-align: justify; + line-height: var(--spacing); +} + +/*# sourceMappingURL=paragraph.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/paragraph.css.map b/preview/pr-36/_styles/paragraph.css.map new file mode 100644 index 0000000000..7eb50a684e --- /dev/null +++ b/preview/pr-36/_styles/paragraph.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["paragraph.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA","sourcesContent":["p {\n margin: 20px 0;\n text-align: justify;\n line-height: var(--spacing);\n}\n"],"file":"paragraph.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/portrait.css b/preview/pr-36/_styles/portrait.css new file mode 100644 index 0000000000..c9ef0df0db --- /dev/null +++ b/preview/pr-36/_styles/portrait.css @@ -0,0 +1,75 @@ +.portrait-wrapper { + display: contents; +} + +.portrait { + position: relative; + display: inline-flex; + justify-content: center; + align-items: center; + flex-direction: column; + gap: 20px; + margin: 20px; + width: 175px; + max-width: calc(100% - 20px - 20px); + text-decoration: none; +} + +.portrait[data-style=small] { + width: 100px; +} + +.portrait[data-style=tiny] { + flex-direction: row; + gap: 15px; + width: unset; + text-align: left; +} + +.portrait-image { + width: 100%; + aspect-ratio: 1/1; + border-radius: 999px; + object-fit: cover; + box-shadow: var(--shadow); +} + +.portrait[data-style=tiny] .portrait-image { + width: 50px; +} + +.portrait[data-style=tiny] .portrait-role { + display: none; +} + +.portrait-text { + display: flex; + flex-direction: column; 
+} + +.portrait-name { + font-family: var(--heading); + font-weight: var(--semi-bold); +} + +.portrait-role .icon { + position: absolute; + left: 0; + top: 0; + display: flex; + justify-content: center; + align-items: center; + width: calc(20px + 10%); + aspect-ratio: 1/1; + border-radius: 999px; + background: var(--background); + box-shadow: var(--shadow); + transform: translate(14%, 14%); +} + +.portrait[data-style=small] .portrait-role .icon { + left: -2px; + top: -2px; +} + +/*# sourceMappingURL=portrait.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/portrait.css.map b/preview/pr-36/_styles/portrait.css.map new file mode 100644 index 0000000000..37c9601f67 --- /dev/null +++ b/preview/pr-36/_styles/portrait.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["portrait.scss"],"names":[],"mappings":"AAAA;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA","sourcesContent":[".portrait-wrapper {\n display: contents;\n}\n\n.portrait {\n position: relative;\n display: inline-flex;\n justify-content: center;\n align-items: center;\n flex-direction: column;\n gap: 20px;\n margin: 20px;\n width: 175px;\n max-width: calc(100% - 20px - 20px);\n text-decoration: none;\n}\n\n.portrait[data-style=\"small\"] {\n width: 100px;\n}\n\n.portrait[data-style=\"tiny\"] {\n flex-direction: row;\n gap: 15px;\n width: unset;\n text-align: left;\n}\n\n.portrait-image {\n width: 100%;\n aspect-ratio: 1 / 1;\n border-radius: 999px;\n object-fit: cover;\n box-shadow: var(--shadow);\n}\n\n.portrait[data-style=\"tiny\"] .portrait-image {\n width: 50px;\n}\n\n.portrait[data-style=\"tiny\"] .portrait-role {\n display: none;\n}\n\n.portrait-text {\n display: flex;\n flex-direction: column;\n}\n\n.portrait-name {\n font-family: var(--heading);\n font-weight: var(--semi-bold);\n}\n\n.portrait-role .icon {\n position: absolute;\n left: 0;\n top: 0;\n display: flex;\n justify-content: center;\n align-items: center;\n width: calc(20px + 10%);\n aspect-ratio: 1 / 1;\n border-radius: 999px;\n background: var(--background);\n box-shadow: var(--shadow);\n transform: translate(14%, 14%);\n}\n\n.portrait[data-style=\"small\"] .portrait-role .icon {\n left: -2px;\n top: -2px;\n}\n"],"file":"portrait.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/post-excerpt.css b/preview/pr-36/_styles/post-excerpt.css new file mode 100644 index 0000000000..9935b248fa --- /dev/null +++ b/preview/pr-36/_styles/post-excerpt.css @@ -0,0 +1,63 @@ +.post-excerpt-container { + container-type: inline-size; +} + +.post-excerpt { + display: flex; + margin: 20px 0; + border-radius: var(--rounded); + background: var(--background); + overflow: hidden; + box-shadow: var(--shadow); +} + +.post-excerpt-image { + position: relative; + width: 200px; + flex-shrink: 0; +} + +.post-excerpt-image img { + position: absolute; + inset: 0; + width: 100%; + height: 100%; + object-fit: cover; +} + +.post-excerpt-text { + display: flex; + flex-wrap: wrap; + gap: 20px; + padding: 20px 30px; + text-align: left; +} + +.post-excerpt-text > * { + margin: 0 !important; +} + +.post-excerpt-text > a:first-child { + width: 100%; + font-weight: var(--semi-bold); +} + +.post-excerpt-text > div { + justify-content: flex-start; +} + +.post-excerpt-text > p { + width: 100%; +} + +@container (max-width: 800px) { 
+ .post-excerpt { + flex-direction: column; + } + .post-excerpt-image { + width: unset; + height: 200px; + } +} + +/*# sourceMappingURL=post-excerpt.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/post-excerpt.css.map b/preview/pr-36/_styles/post-excerpt.css.map new file mode 100644 index 0000000000..e82e3a225d --- /dev/null +++ b/preview/pr-36/_styles/post-excerpt.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["post-excerpt.scss"],"names":[],"mappings":"AAGA;EACE;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA,OAlBW;EAmBX;;;AAIF;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;IACE;;EAGF;IACE;IACA,QA/DS","sourcesContent":["$thumb-size: 200px;\n$wrap: 800px;\n\n.post-excerpt-container {\n container-type: inline-size;\n}\n\n.post-excerpt {\n display: flex;\n margin: 20px 0;\n border-radius: var(--rounded);\n background: var(--background);\n overflow: hidden;\n box-shadow: var(--shadow);\n}\n\n.post-excerpt-image {\n position: relative;\n width: $thumb-size;\n flex-shrink: 0;\n // box-shadow: var(--shadow);\n}\n\n.post-excerpt-image img {\n position: absolute;\n inset: 0;\n width: 100%;\n height: 100%;\n object-fit: cover;\n}\n\n.post-excerpt-text {\n display: flex;\n flex-wrap: wrap;\n gap: 20px;\n padding: 20px 30px;\n text-align: left;\n}\n\n.post-excerpt-text > * {\n margin: 0 !important;\n}\n\n.post-excerpt-text > a:first-child {\n width: 100%;\n font-weight: var(--semi-bold);\n}\n\n.post-excerpt-text > div {\n justify-content: flex-start;\n}\n\n.post-excerpt-text > p {\n width: 100%;\n}\n\n@container (max-width: #{$wrap}) {\n .post-excerpt {\n flex-direction: column;\n }\n\n .post-excerpt-image {\n width: unset;\n height: $thumb-size;\n }\n}\n"],"file":"post-excerpt.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/post-info.css b/preview/pr-36/_styles/post-info.css new file mode 100644 index 0000000000..df1827c557 --- /dev/null +++ b/preview/pr-36/_styles/post-info.css @@ -0,0 +1,32 @@ +.post-info { + display: flex; + justify-content: center; + align-items: center; + flex-wrap: wrap; + gap: 20px; + margin: 20px 0; + color: var(--dark-gray); +} + +.post-info .portrait { + margin: 0; +} + +.post-info .icon { + margin-right: 0.5em; +} + +.post-info a { + color: inherit; +} + +.post-info a:hover { + color: var(--primary); +} + +.post-info > span { + text-align: center; + white-space: nowrap; +} + +/*# sourceMappingURL=post-info.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/post-info.css.map b/preview/pr-36/_styles/post-info.css.map new file mode 100644 index 0000000000..9950b78726 --- /dev/null +++ b/preview/pr-36/_styles/post-info.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["post-info.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;EACA","sourcesContent":[".post-info {\n display: flex;\n justify-content: center;\n align-items: center;\n flex-wrap: wrap;\n gap: 20px;\n margin: 20px 0;\n color: var(--dark-gray);\n}\n\n.post-info .portrait {\n margin: 0;\n}\n\n.post-info .icon {\n margin-right: 0.5em;\n}\n\n.post-info a {\n color: inherit;\n}\n\n.post-info a:hover {\n color: var(--primary);\n}\n\n.post-info > span {\n text-align: center;\n white-space: nowrap;\n}\n"],"file":"post-info.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/post-nav.css b/preview/pr-36/_styles/post-nav.css new file mode 100644 
index 0000000000..f7ddfaaac6 --- /dev/null +++ b/preview/pr-36/_styles/post-nav.css @@ -0,0 +1,35 @@ +.post-nav { + display: flex; + justify-content: space-between; + align-items: flex-start; + gap: 10px; + color: var(--gray); +} + +.post-nav > :first-child { + text-align: left; +} + +.post-nav > :last-child { + text-align: right; +} + +.post-nav > :first-child .icon { + margin-right: 0.5em; +} + +.post-nav > :last-child .icon { + margin-left: 0.5em; +} + +@media (max-width: 600px) { + .post-nav { + align-items: center; + flex-direction: column; + } + .post-nav > * { + text-align: center !important; + } +} + +/*# sourceMappingURL=post-nav.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/post-nav.css.map b/preview/pr-36/_styles/post-nav.css.map new file mode 100644 index 0000000000..b0699acaec --- /dev/null +++ b/preview/pr-36/_styles/post-nav.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["post-nav.scss"],"names":[],"mappings":"AAEA;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;IACE;IACA;;EAGF;IACE","sourcesContent":["$wrap: 600px;\n\n.post-nav {\n display: flex;\n justify-content: space-between;\n align-items: flex-start;\n gap: 10px;\n color: var(--gray);\n}\n\n.post-nav > :first-child {\n text-align: left;\n}\n\n.post-nav > :last-child {\n text-align: right;\n}\n\n.post-nav > :first-child .icon {\n margin-right: 0.5em;\n}\n\n.post-nav > :last-child .icon {\n margin-left: 0.5em;\n}\n\n@media (max-width: $wrap) {\n .post-nav {\n align-items: center;\n flex-direction: column;\n }\n\n .post-nav > * {\n text-align: center !important;\n }\n}\n"],"file":"post-nav.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/quote.css b/preview/pr-36/_styles/quote.css new file mode 100644 index 0000000000..b754635dcd --- /dev/null +++ b/preview/pr-36/_styles/quote.css @@ -0,0 +1,15 @@ +blockquote { + margin: 20px 0; + padding: 10px 20px; + border-left: solid 4px var(--light-gray); +} + +blockquote > :first-child { + margin-top: 0; +} + +blockquote > :last-child { + margin-bottom: 0; +} + +/*# sourceMappingURL=quote.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/quote.css.map b/preview/pr-36/_styles/quote.css.map new file mode 100644 index 0000000000..16c9d4c8ea --- /dev/null +++ b/preview/pr-36/_styles/quote.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["quote.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":["blockquote {\n margin: 20px 0;\n padding: 10px 20px;\n border-left: solid 4px var(--light-gray);\n}\n\nblockquote > :first-child {\n margin-top: 0;\n}\n\nblockquote > :last-child {\n margin-bottom: 0;\n}\n"],"file":"quote.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/rule.css b/preview/pr-36/_styles/rule.css new file mode 100644 index 0000000000..28ca0809d9 --- /dev/null +++ b/preview/pr-36/_styles/rule.css @@ -0,0 +1,8 @@ +hr { + margin: 40px 0; + background: var(--light-gray); + border: none; + height: 1px; +} + +/*# sourceMappingURL=rule.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/rule.css.map b/preview/pr-36/_styles/rule.css.map new file mode 100644 index 0000000000..a955dd9fee --- /dev/null +++ b/preview/pr-36/_styles/rule.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["rule.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA","sourcesContent":["hr {\n margin: 40px 0;\n background: var(--light-gray);\n border: none;\n height: 
1px;\n}\n"],"file":"rule.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/search-box.css b/preview/pr-36/_styles/search-box.css new file mode 100644 index 0000000000..9766e9242f --- /dev/null +++ b/preview/pr-36/_styles/search-box.css @@ -0,0 +1,25 @@ +.search-box { + position: relative; + height: 40px; +} + +.search-box .search-input { + width: 100%; + height: 100%; + padding-right: 40px; +} + +.search-box button { + position: absolute; + inset: 0 0 0 auto; + display: flex; + justify-content: center; + align-items: center; + padding: 0; + aspect-ratio: 1/1; + background: none; + color: var(--black); + border: none; +} + +/*# sourceMappingURL=search-box.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/search-box.css.map b/preview/pr-36/_styles/search-box.css.map new file mode 100644 index 0000000000..7d45274378 --- /dev/null +++ b/preview/pr-36/_styles/search-box.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["search-box.scss"],"names":[],"mappings":"AAAA;EACE;EACA;;;AAGF;EACE;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA","sourcesContent":[".search-box {\n position: relative;\n height: 40px;\n}\n\n.search-box .search-input {\n width: 100%;\n height: 100%;\n padding-right: 40px;\n}\n\n.search-box button {\n position: absolute;\n inset: 0 0 0 auto;\n display: flex;\n justify-content: center;\n align-items: center;\n padding: 0;\n aspect-ratio: 1 / 1;\n background: none;\n color: var(--black);\n border: none;\n}\n"],"file":"search-box.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/search-info.css b/preview/pr-36/_styles/search-info.css new file mode 100644 index 0000000000..e5c9a3050e --- /dev/null +++ b/preview/pr-36/_styles/search-info.css @@ -0,0 +1,8 @@ +.search-info { + margin: 20px 0; + text-align: center; + font-style: italic; + line-height: var(--spacing); +} + +/*# sourceMappingURL=search-info.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/search-info.css.map b/preview/pr-36/_styles/search-info.css.map new file mode 100644 index 0000000000..d825cee0b8 --- /dev/null +++ b/preview/pr-36/_styles/search-info.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["search-info.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA","sourcesContent":[".search-info {\n margin: 20px 0;\n text-align: center;\n font-style: italic;\n line-height: var(--spacing);\n}\n"],"file":"search-info.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/section.css b/preview/pr-36/_styles/section.css new file mode 100644 index 0000000000..e1ceb7db5a --- /dev/null +++ b/preview/pr-36/_styles/section.css @@ -0,0 +1,35 @@ +section { + padding: 40px max(40px, (100% - 1000px) / 2); + transition: background var(--transition), color var(--transition); +} + +section[data-size=wide] { + padding: 40px; +} + +section[data-size=full] { + padding: 0; +} + +section[data-size=full] > * { + margin: 0; + border-radius: 0; +} + +section[data-size=full] img { + border-radius: 0; +} + +main > section:last-of-type { + flex-grow: 1; +} + +main > section:nth-of-type(odd) { + background: var(--background); +} + +main > section:nth-of-type(even) { + background: var(--background-alt); +} + +/*# sourceMappingURL=section.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/section.css.map b/preview/pr-36/_styles/section.css.map new file mode 100644 index 0000000000..73ffca09c9 --- /dev/null +++ b/preview/pr-36/_styles/section.css.map @@ -0,0 +1 @@ 
+{"version":3,"sourceRoot":"","sources":["section.scss"],"names":[],"mappings":"AAGA;EACE;EACA;;;AAGF;EACE,SARQ;;;AAWV;EACE;;;AAGF;EACE;EACA;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":["$page: 1000px;\n$padding: 40px;\n\nsection {\n padding: $padding max($padding, calc((100% - $page) / 2));\n transition: background var(--transition), color var(--transition);\n}\n\nsection[data-size=\"wide\"] {\n padding: $padding;\n}\n\nsection[data-size=\"full\"] {\n padding: 0;\n}\n\nsection[data-size=\"full\"] > * {\n margin: 0;\n border-radius: 0;\n}\n\nsection[data-size=\"full\"] img {\n border-radius: 0;\n}\n\nmain > section:last-of-type {\n flex-grow: 1;\n}\n\nmain > section:nth-of-type(odd) {\n background: var(--background);\n}\n\nmain > section:nth-of-type(even) {\n background: var(--background-alt);\n}\n"],"file":"section.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/table.css b/preview/pr-36/_styles/table.css new file mode 100644 index 0000000000..ef83c46278 --- /dev/null +++ b/preview/pr-36/_styles/table.css @@ -0,0 +1,17 @@ +table { + margin: 40px auto; + border-collapse: collapse; + overflow-wrap: anywhere; +} + +th { + font-weight: var(--semi-bold); +} + +th, +td { + padding: 10px 15px; + border: solid 1px var(--light-gray); +} + +/*# sourceMappingURL=table.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/table.css.map b/preview/pr-36/_styles/table.css.map new file mode 100644 index 0000000000..c3a3f0686c --- /dev/null +++ b/preview/pr-36/_styles/table.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["table.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;;;AAGF;EACE;;;AAGF;AAAA;EAEE;EACA","sourcesContent":["table {\n margin: 40px auto;\n border-collapse: collapse;\n overflow-wrap: anywhere;\n}\n\nth {\n font-weight: var(--semi-bold);\n}\n\nth,\ntd {\n padding: 10px 15px;\n border: solid 1px var(--light-gray);\n}\n"],"file":"table.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/tags.css b/preview/pr-36/_styles/tags.css new file mode 100644 index 0000000000..909815e88b --- /dev/null +++ b/preview/pr-36/_styles/tags.css @@ -0,0 +1,33 @@ +.tags { + display: flex; + justify-content: center; + align-items: center; + flex-wrap: wrap; + gap: 10px; + max-width: 100%; + margin: 20px 0; +} + +.tag { + max-width: 100%; + margin: 0; + padding: 5px 10px; + border-radius: 999px; + background: var(--secondary); + color: var(--text); + text-decoration: none; + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + transition: background var(--transition), color var(--transition); +} + +.tag:hover { + background: var(--light-gray); +} + +.tag[data-active] { + background: var(--light-gray); +} + +/*# sourceMappingURL=tags.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/tags.css.map b/preview/pr-36/_styles/tags.css.map new file mode 100644 index 0000000000..ae75420a61 --- /dev/null +++ b/preview/pr-36/_styles/tags.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["tags.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":[".tags {\n display: flex;\n justify-content: center;\n align-items: center;\n flex-wrap: wrap;\n gap: 10px;\n max-width: 100%;\n margin: 20px 0;\n}\n\n.tag {\n max-width: 100%;\n margin: 0;\n padding: 5px 10px;\n border-radius: 999px;\n background: var(--secondary);\n color: var(--text);\n text-decoration: none;\n 
overflow: hidden;\n text-overflow: ellipsis;\n white-space: nowrap;\n transition: background var(--transition), color var(--transition);\n}\n\n.tag:hover {\n background: var(--light-gray);\n}\n\n.tag[data-active] {\n background: var(--light-gray);\n}\n"],"file":"tags.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/textbox.css b/preview/pr-36/_styles/textbox.css new file mode 100644 index 0000000000..d8ec7910a6 --- /dev/null +++ b/preview/pr-36/_styles/textbox.css @@ -0,0 +1,16 @@ +input[type=text] { + width: 100%; + height: 40px; + margin: 0; + padding: 5px 10px; + border: solid 1px var(--light-gray); + border-radius: var(--rounded); + background: var(--background); + color: var(--text); + font-family: inherit; + font-size: inherit; + appearance: none; + box-shadow: var(--shadow); +} + +/*# sourceMappingURL=textbox.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/textbox.css.map b/preview/pr-36/_styles/textbox.css.map new file mode 100644 index 0000000000..de78ed438d --- /dev/null +++ b/preview/pr-36/_styles/textbox.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["textbox.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA;EACA","sourcesContent":["input[type=\"text\"] {\n width: 100%;\n height: 40px;\n margin: 0;\n padding: 5px 10px;\n border: solid 1px var(--light-gray);\n border-radius: var(--rounded);\n background: var(--background);\n color: var(--text);\n font-family: inherit;\n font-size: inherit;\n appearance: none;\n box-shadow: var(--shadow);\n}\n"],"file":"textbox.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/tooltip.css b/preview/pr-36/_styles/tooltip.css new file mode 100644 index 0000000000..28b590ebf9 --- /dev/null +++ b/preview/pr-36/_styles/tooltip.css @@ -0,0 +1,72 @@ +.tippy-box { + background: var(--background); + color: var(--text); + padding: 7.5px; + text-align: left; + box-shadow: var(--shadow); +} + +.tippy-arrow { + width: 30px; + height: 30px; +} + +.tippy-arrow:before { + width: 10px; + height: 10px; + background: var(--background); + box-shadow: var(--shadow); +} + +.tippy-arrow { + overflow: hidden; + pointer-events: none; +} + +.tippy-box[data-placement=top] .tippy-arrow { + inset: unset; + top: 100%; +} + +.tippy-box[data-placement=bottom] .tippy-arrow { + inset: unset; + bottom: 100%; +} + +.tippy-box[data-placement=left] .tippy-arrow { + inset: unset; + left: 100%; +} + +.tippy-box[data-placement=right] .tippy-arrow { + inset: unset; + right: 100%; +} + +.tippy-arrow:before { + border: unset !important; + transform-origin: center !important; + transform: translate(-50%, -50%) rotate(45deg) !important; +} + +.tippy-box[data-placement=top] .tippy-arrow:before { + left: 50% !important; + top: 0 !important; +} + +.tippy-box[data-placement=bottom] .tippy-arrow:before { + left: 50% !important; + top: 100% !important; +} + +.tippy-box[data-placement=left] .tippy-arrow:before { + left: 0 !important; + top: 50% !important; +} + +.tippy-box[data-placement=right] .tippy-arrow:before { + left: 100% !important; + top: 50% !important; +} + +/*# sourceMappingURL=tooltip.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/tooltip.css.map b/preview/pr-36/_styles/tooltip.css.map new file mode 100644 index 0000000000..6b52e915fb --- /dev/null +++ b/preview/pr-36/_styles/tooltip.css.map @@ -0,0 +1 @@ 
+{"version":3,"sourceRoot":"","sources":["tooltip.scss"],"names":[],"mappings":"AAAA;EACE;EACA;EACA;EACA;EACA;;;AAGF;EACE;EACA;;;AAGF;EACE;EACA;EACA;EACA;;;AAIF;EACE;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA;;;AAEF;EACE;EACA","sourcesContent":[".tippy-box {\n background: var(--background);\n color: var(--text);\n padding: 7.5px;\n text-align: left;\n box-shadow: var(--shadow);\n}\n\n.tippy-arrow {\n width: 30px;\n height: 30px;\n}\n\n.tippy-arrow:before {\n width: 10px;\n height: 10px;\n background: var(--background);\n box-shadow: var(--shadow);\n}\n\n// correct tippy arrow styles to support intuitive arrow styles above\n.tippy-arrow {\n overflow: hidden;\n pointer-events: none;\n}\n.tippy-box[data-placement=\"top\"] .tippy-arrow {\n inset: unset;\n top: 100%;\n}\n.tippy-box[data-placement=\"bottom\"] .tippy-arrow {\n inset: unset;\n bottom: 100%;\n}\n.tippy-box[data-placement=\"left\"] .tippy-arrow {\n inset: unset;\n left: 100%;\n}\n.tippy-box[data-placement=\"right\"] .tippy-arrow {\n inset: unset;\n right: 100%;\n}\n.tippy-arrow:before {\n border: unset !important;\n transform-origin: center !important;\n transform: translate(-50%, -50%) rotate(45deg) !important;\n}\n.tippy-box[data-placement=\"top\"] .tippy-arrow:before {\n left: 50% !important;\n top: 0 !important;\n}\n.tippy-box[data-placement=\"bottom\"] .tippy-arrow:before {\n left: 50% !important;\n top: 100% !important;\n}\n.tippy-box[data-placement=\"left\"] .tippy-arrow:before {\n left: 0 !important;\n top: 50% !important;\n}\n.tippy-box[data-placement=\"right\"] .tippy-arrow:before {\n left: 100% !important;\n top: 50% !important;\n}\n"],"file":"tooltip.css"} \ No newline at end of file diff --git a/preview/pr-36/_styles/util.css b/preview/pr-36/_styles/util.css new file mode 100644 index 0000000000..995ea77cdd --- /dev/null +++ b/preview/pr-36/_styles/util.css @@ -0,0 +1,13 @@ +.left { + text-align: left; +} + +.center { + text-align: center; +} + +.right { + text-align: right; +} + +/*# sourceMappingURL=util.css.map */ \ No newline at end of file diff --git a/preview/pr-36/_styles/util.css.map b/preview/pr-36/_styles/util.css.map new file mode 100644 index 0000000000..c21a68d3fa --- /dev/null +++ b/preview/pr-36/_styles/util.css.map @@ -0,0 +1 @@ +{"version":3,"sourceRoot":"","sources":["util.scss"],"names":[],"mappings":"AAAA;EACE;;;AAGF;EACE;;;AAGF;EACE","sourcesContent":[".left {\n text-align: left;\n}\n\n.center {\n text-align: center;\n}\n\n.right {\n text-align: right;\n}\n"],"file":"util.css"} \ No newline at end of file diff --git a/preview/pr-36/about/index.html b/preview/pr-36/about/index.html new file mode 100644 index 0000000000..e32879a42f --- /dev/null +++ b/preview/pr-36/about/index.html @@ -0,0 +1,741 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +About | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+

About

+ +

We are a small group of dedicated software developers within the Department of Biomedical Informatics at the University of Colorado Anschutz. +We support the labs, faculty, and staff within the Department, as well as external groups via collaboration.

+ +

What we do

+ +

Our primary focus is creating high-quality software and maintaining existing software. +We have a diverse team with a wide range of experience and expertise in software projects related to data science, biology, medicine, statistics, and machine learning.

+ +

We can take a lab’s ideas and scientific work and turn them into a fully realized software package for experts and laypersons alike, one that enables data exploration, knowledge dissemination, collaboration, advanced analyses, new insights, and much more.

+ +

Some of the things we do are:

+ + + +

But the best way to understand the things we do is by looking at the code and using the software yourself:

+ + +
+ + + + + +
+ + +

Teaching and communication

+ +

Whenever we can, we like to share our knowledge and skills with others. +We believe this benefits the community we operate in and allows us to create better software together.

+ +

On this website, we have a blog where we occasionally post tips, tricks, and other insights related to Git, workflows, code quality, and more.

+ +

We have given workshops and personalized lessons related to Docker, cloud services, and more. +We’re always happy to set up a session to discuss technical topics whenever someone has the need.

+ +

Scope of our work

+ +

Being central to the department, and not strictly associated with any particular lab or group within it, we need to ensure that we divide up our time and effort fairly. +While we can do things like build full-stack apps from scratch and maintain complex infrastructure, the projects we take on tend to be small to medium in size so that we leave ourselves available to others who need our help. +Certain projects that are very large and long-term in scope, such as ones that need to be HIPAA-compliant, will fall outside of our purview and might lead you to hire a dedicated developer to fill your needs. +That said, we can still provide partial support as a consulting body, a repository of information, a hiring advisor, and more.

+ +

Contact

+ +

Request Support

+ +

Start here to establish a project and work with us.

+ + + +

Book a Meeting

+ +

Schedule a meeting with us about an established project. +If you haven’t met with us yet on this particular project, please start by requesting support above.

+ + + +

In the notes field, please specify which team members are optional/required for this meeting. +Also list any additional emails, and we’ll forward the calendar invite to them.

+ +

Chat

+ +

For general questions or technical help, we also have weekly open-office hours, Thursdays at 2:00 PM Mountain Time in the following Zoom room. +Feel free to stop by!

+ + + + + +

You can also come to the Zoom room if you’re unsure about any part of the support request process mentioned above.

+ +

The Team

+ + + + + + + + +
+ + +
+ + + + + + + diff --git a/preview/pr-36/blog/index.html b/preview/pr-36/blog/index.html new file mode 100644 index 0000000000..9ee21f1095 --- /dev/null +++ b/preview/pr-36/blog/index.html @@ -0,0 +1,2801 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Blog | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+

+Blog

+ + + + + + + +
+ +

2024

+ +
+
+ + + + + + +
+ Leveraging Kùzu and Cypher for Advanced Data Analysis + + + + + + + + + + + + + + + +

+ + +Graph databases can offer a more natural and intuitive way to model and explore relationships within data. +In this post, we’ll dive into the world of graph databases, focusing on Kùzu, an embedded graph database and query engine for a number of languages, and Cypher, a powerful query language designed for graph data. +We’ll explore how these tools can transform your data management and analysis workflows, provide insights into their capabilities, and discuss when it might be more appropriate to use server-based solutions. +Whether you’re a research software developer looking to integrate advanced graph processing into your applications or simply curious about the benefits of graph databases, this guide will equip you with the knowledge to harness the full potential of graph data. + + +

+
+
+
+ +
+
+ + + + + + +
+ Parquet: Crafting Data Bridges for Efficient Computation + + + + + + + + + + + + + + + +

+ +Apache Parquet is a columnar, strongly typed tabular data storage format built for scalable processing and widely compatible with many data models, programming languages, and software systems. +Parquet files (denoted with a .parquet filename extension) are typically compressed within the format itself and are often used in embedded or cloud-based high-performance scenarios. +It has grown in popularity since its introduction in 2013 and serves as a core data storage technology in many organizations. +This article will introduce the Parquet format from a research data engineering perspective. + +

+
+
+
+ +
+
+ + + + + + +
+ Navigating Dependency Chaos with Lockfiles + + + + + + + + + + + + + + + +

+ +Writing software often entails using code from other people to solve common challenges and take advantage of existing work. +External software used by a specific project can be called a “dependency” (the software “depends” on that external work to accomplish tasks). +Collections of software are often made available as “packages” through various platforms. +Dependency management, the task of managing the collection of dependencies for a specific project, is a specialized area of software development that can involve the use of unique tools and files. +This article will cover package dependency management through special files generally referred to as “lockfiles”. + +

+
+
+
+ +
+
+ + + + + + +
+ Python Memory Management and Troubleshooting + + + + + + + + + + + + + + + +

+ +Have you ever run Python code only to find it taking forever to complete or sometimes abruptly ending with an error like: 123456 Killed or killed (program exited with code: 137)? +You may have experienced memory resource or management challenges associated with these scenarios. +This post will cover some computer memory definitions, explain how Python makes use of computer memory, and share some tools which may help with these types of challenges. + +

+
+
+
+ +

2023

+ +
+
+ + + + + + +
+ Tip of the Week: Codesgiving - Open-source Contribution Walkthrough + + + + + + + + + + + + + + + +

+ +Thanksgiving is a holiday practiced in many countries which focuses on gratitude for good harvests of the preceding year. +In the United States, we celebrate Thanksgiving on the fourth Thursday of November each year often by eating meals we create together with others. +This post channels the spirit of Thanksgiving by giving our thanks through code as a “Codesgiving”, acknowledging and creating better software together. + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Data Quality Validation through Software Testing Techniques + + + + + + + + + + + + + + + +

+ +Data-oriented software development can benefit from a specialized focus on various aspects of data quality validation. +We can use software testing techniques to validate certain qualities of the data in order to meet a declarative standard (where one doesn’t need to guess or rediscover known issues). +These come in a number of forms and generally follow existing software testing concepts which we’ll expand upon below. +This article will cover a few tools which leverage these techniques for addressing data quality validation testing. + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Python Packaging as Publishing + + + + + + + + + + + + + + + +

+ + +Python packaging is the craft of preparing for and reaching distribution of your Python work to wider audiences. Following conventions for packaging helps your software work become more understandable, trustworthy, and connected (to others and their work). Taking advantage of common packaging practices also strengthens our collective superpower: collaboration. This post will cover the preparation aspects of packaging, readying software work for wider distribution. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster + + + + + + + + + + + + + + + +

+ + +This post is intended to help demonstrate the use of Python on Alpine, a High Performance Compute (HPC) cluster hosted by the University of Colorado Boulder’s Research Computing. +We use Python here by way of Anaconda environment management to run code on Alpine. +This post will cover background on the technologies and how to use the contents of an example project repository as though it were a project you were working on and wanting to run on Alpine. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Automate Software Workflows with GitHub Actions + + + + + + + + + + + + + + + +

+ + +There are many routine tasks which can be automated to help save time and increase reproducibility in software development. GitHub Actions provides one way to accomplish these tasks using code-based workflows and related workflow implementations. This type of automation is commonly used to perform tests, builds (preparing for the delivery of the code), or delivery itself (sending the code or related artifacts where they will be used). + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Branch, Review, and Learn + + + + + + + + + + + + + + + +

+ + +Git provides a feature called branching which facilitates parallel and segmented programming work through commits with version control. Using branching enables both work concurrency (multiple people working on the same repository at the same time) and a chance to isolate and review specific programming tasks. This article covers some conceptual best practices with branching, reviewing, and merging code using GitHub. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Software Linting with R + + + + + + + + + + + + + + + +

+ + +This article covers using the software technique of linting on R code in order to improve code quality, development velocity, and collaboration. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Timebox Your Software Work + + + + + + + + + + + + + + + +

+ + +Programming often involves long periods of problem solving which can sometimes lead to unproductive or exhausting outcomes. This article covers one way to avoid unproductive time expense and protect yourself from exhaustion through a technique called “timeboxing” (also sometimes referred to as “timeblocking”). + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Linting Documentation as Code + + + + + + + + + + + + + + + +

+ + +Software documentation is sometimes treated as a less important or secondary aspect of software development. Treating documentation as code allows developers to version control the shared understanding and knowledge surrounding a project. Leveraging this paradigm also enables the use of tools and patterns which have been used to strengthen code maintenance. This article covers one such pattern: linting, or static analysis, for documentation treated like code. + + +

+
+
+
+ +

2022

+ +
+
+ + + + + + +
+ Tip of the Week: Remove Unused Code to Avoid Software Decay + + + + + + + + + + + + + + + +

+ + +The act of creating software often involves many iterations of writing, personal collaborations, and testing. During this process it’s common to lose awareness of code which is no longer used, and thus may not be tested or otherwise linted. Unused code may contribute to “software decay”, the gradual diminishment of code quality or functionality. This post will cover software decay and strategies for addressing unused code to help keep your code quality high. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Data Engineering with SQL, Arrow and DuckDB + + + + + + + + + + + + + + + +

+ + +Apache Arrow is a language-independent and high-performance data format useful in many scenarios. DuckDB is an in-process SQL-based data management system which is Arrow-compatible. In addition to providing a SQLite-like database format, DuckDB also provides a standardized and high-performance way to work with Arrow data where otherwise one may be forced to use language-specific data structures or transforms. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Diagrams as Code + + + + + + + + + + + + + + + +

+ + +Diagrams can be a useful way to illuminate and communicate ideas. Free-form drawing or drag and drop tools are one common way to create diagrams. With this tip of the week we introduce another option: diagrams as code (DaC), or creating diagrams by using code. + + +

+
+
+
+ +
+
+ + + + + + +
+ Tip of the Week: Use Linting Tools to Save Time + + + + + + + + + + + + + + + +

+ + +Have you ever found yourself spending hours formatting your code so it looks just right? Have you ever caught a duplicative import statement in your code? We recommend using open source linting tools to help avoid common issues like these and save time. + + +

+
+
+
+ + + +
+ + + tip-of-the-week + + + + linting + + + + static-analysis + + + + software + + + + diagrams + + + + mermaid + + + + markdown + + + + data + + + + sql + + + + dataframes + + + + arrow + + + + duckdb + + + + code-quality + + + + software-decay + + + + code-decay + + + + vulture + + + + pylint + + + + coverage.py + + + + documentation + + + + docsascode + + + + staticanalysis + + + + timeboxing + + + + timeblocking + + + + productivity + + + + modularization + + + + projectmanagement + + + + r + + + + continuous-testing + + + + git + + + + branching + + + + pull-requests + + + + merging + + + + github + + + + workflow + + + + github-actions + + + + continuous-integration + + + + python + + + + anaconda + + + + high-performance-compute + + + + slurm + + + + globus + + + + packaging + + + + publishing + + + + software-design + + + + understandability + + + + environment-management + + + + data-quality + + + + data-testing + + + + testing + + + + design-by-contract + + + + component-based-design + + + + hoare-logic + + + + data-as-code + + + + codesgiving + + + + open-source + + + + contributions + + + + development-strategy + + + + process + + + + tip-of-the-month + + + + memory + + + + memory-management + + + + memory-allocators + + + + software-education + + + + lockfiles + + + + dependency-management + + + + software-ecosystems + + + + dependency-chaos + + + + research-data-engineering + + + + file-formats + + + + large-data + + + + data-performance + + + + parquet + + + + graph-data + + + + databases + + + + cypher + + + + data-interoperability + + +
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/feed.xml b/preview/pr-36/feed.xml new file mode 100644 index 0000000000..ef3fb4a59f --- /dev/null +++ b/preview/pr-36/feed.xml @@ -0,0 +1,3219 @@ +Jekyll2024-05-29T14:15:00+00:00/set-website/preview/pr-36/feed.xmlSoftware Engineering TeamThe software engineering team of the Department of Biomedical Informatics at the University of Colorado AnschutzLeveraging Kùzu and Cypher for Advanced Data Analysis2024-05-24T00:00:00+00:002024-05-29T14:08:50+00:00/set-website/preview/pr-36/2024/05/24/Leveraging-K%C3%B9zu-and-Cypher-for-Advanced-Data-AnalysisLeveraging Kùzu and Cypher for Advanced Data Analysis + +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ +
+ + (Image sourced from https://github.com/kuzudb/kuzu.) + + +
+ (Image sourced from https://github.com/kuzudb/kuzu.) + +
+ +
+ + + +

Graph databases can offer a more natural and intuitive way to model and explore relationships within data. +In this post, we’ll dive into the world of graph databases, focusing on Kùzu, an embedded graph database and query engine for a number of languages, and Cypher, a powerful query language designed for graph data. +We’ll explore how these tools can transform your data management and analysis workflows, provide insights into their capabilities, and discuss when it might be more appropriate to use server-based solutions. +Whether you’re a research software developer looking to integrate advanced graph processing into your applications or simply curious about the benefits of graph databases, this guide will equip you with the knowledge to harness the full potential of graph data.

+ + + +

Tabular Data

+ +
+ + Tabular data is made up of rows (or records) and columns. + + +
+ Tabular data is made up of rows (or records) and columns. + +
+ +
+ +

Data are often stored in a table, or tabular format, where information is organized into rows and columns. +Each row represents a single record and each column represents attributes of that record. +Tables are particularly effective for storing and querying large volumes of data with a fixed set of columns and data types. +Despite its versatility, tabular data can become cumbersome when dealing with complex relationships and interconnected data, where a graph-based approach might be more suitable.

+ +

Graph Data

+ +
+ + Graph data is made up of nodes and edges. + + +
+ Graph data is made up of nodes and edges. + +
+ +
+ +

Graph data represents information in the form of nodes (also called vertices) and edges (connections between nodes). +This structure is useful for modeling complex relationships and interconnected data, such as social networks, biological networks, and transportation systems. +Unlike tabular data, which is often “flattened” (treating multidimensional data as singular columns) and often rigid (requiring all new data to conform to a specific schema), graph data allows for more flexible and dynamic representations.

+ +
+ + Nodes and edges may have properties in a graph. + + +
+ Nodes and edges may have properties in a graph. + +
+ +
+ +

Nodes and edges act like different kinds of tabular records within the context of graphs. +Nodes and edges can also have properties (attributes) which further provide description to a graph. +Properties are akin to the columns which describe a particular record (or node) in tabular formats. +Graph data models are particularly useful for exploring connections, performing path analysis, and uncovering patterns that may require more transformation in tabular formats.

+ +

Graph Databases

+ +
+ + Graph databases store graph data. + + +
+ Graph databases store graph data. + +
+ +
+ +

Graph databases are specialized databases designed to store, query, and manage graph data efficiently. +They use graph structures for semantic queries, with nodes, edges, and properties being stored directly in the database. +Unlike traditional relational databases that use tables, graph databases leverage the natural relationships in the data, allowing for faster retrieval and sometimes more intuitive querying of interconnected information. +This makes them ideal for applications involving complex relationships, such as social networks, supply chain management, and knowledge graphs. +Graph databases support various query languages and algorithms optimized for traversing and analyzing graph structures.

+ +

Graph Database Querying

+ +
+ + Graph data are typically queried using specialized languages such as Cypher. + + +
+ Graph data are typically queried using specialized languages such as Cypher. + +
+ +
+ +

Graph database querying involves using specialized query languages to retrieve and manipulate graph data. +Unlike SQL, which is often used for tabular databases, graph databases use languages like Cypher, Gremlin, and SPARQL, which are designed to handle graph-specific operations. +These languages allow users to perform complex queries that traverse the graph, find paths between nodes, filter based on properties, and analyze relationships. +Querying in graph databases can be highly efficient due to their ability to leverage the inherent structure of the graph, enabling fast execution of complex queries that would be cumbersome and slow in a relational database.

+ +

Cypher Query Language

+ +
MATCH (p:Person {name: 'Alice'})-[:FRIEND]->(friend)
+RETURN friend.name, friend.age
+
+ +

This query finds nodes labeled “Person” with the name “Alice” and returns the names and ages of nodes connected to Alice by a “FRIEND” relationship.

+ +

Cypher is a powerful, declarative graph query language designed specifically for querying and updating graph databases. +Originally developed for Neo4j (one of the most popular graph databases), it is known for its expressive and intuitive syntax that makes it easy to work with graph data. +Cypher allows users to perform complex queries using simple and readable patterns that resemble ASCII art, making it accessible to both developers and data scientists. +It supports a wide range of operations, including pattern matching, filtering, aggregation, and graph traversal, enabling efficient exploration and manipulation of graph structures. +For example, a basic Cypher query to find all nodes connected by a “FRIEND” relationship might look like this: MATCH (a)-[:FRIEND]->(b) RETURN a, b, which finds and returns pairs of nodes a and b where a is connected to b by a “FRIEND” relationship.

+ +

Kùzu

+ +
+ + Kùzu provides a database format and query engine accessible through Python and other languages by using Cypher queries. + + +
+ Kùzu provides a database format and query engine accessible through Python and other languages by using Cypher queries. + +
+ +
+ +

Kùzu is an embedded graph database and query engine designed to integrate seamlessly with Python, Rust, Node, C/C++, or Java software. +Kùzu is optimized for high performance and can handle complex graph queries with ease. +Querying graphs in Kùzu is performed through Cypher, providing transferability of queries across multiple programming languages. +Kùzu also provides direct integration with export formats that allow for efficient data analysis or processing, such as Pandas and Arrow. +Kùzu is particularly suitable for software developers who need to integrate graph database capabilities into their projects without the overhead of managing a separate database server.

+ +
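As a quick orientation, below is a minimal sketch of using Kùzu from Python; the `Person` table, its properties, and the data are illustrative assumptions rather than anything built into Kùzu:

```python
# a minimal, illustrative sketch of the kuzu Python API;
# the "Person" table and its properties are hypothetical examples
import kuzu

db = kuzu.Database("./demo_db")
conn = kuzu.Connection(db)

# define a node table with a primary key using Cypher DDL
conn.execute(
    "CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name));"
)

# create a node, then query it back
conn.execute("CREATE (:Person {name: 'Alice', age: 34});")
result = conn.execute("MATCH (p:Person) RETURN p.name, p.age;")
while result.has_next():
    print(result.get_next())  # ['Alice', 34]
```

Note how the Cypher statements themselves would remain the same from Rust, Node, or Java; only the surrounding API calls change.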

Tabular and Graph Data Interoperation

+ +
+ + Kùzu uses tabular data as both input and output for data operations. + + +
+ Kùzu uses tabular data as both input and output for data operations. + +
+ +
+ +

Tabular data and graph data can sometimes be used in tandem in order to achieve software goals (one isn’t necessarily better than the other or supposed to be used in isolation). +For example, Kùzu offers both data import and export to tabular formats to help with conversion and storage outside of a graph database. +This is especially helpful when working with tabular data as an input, when trying to iterate over large datasets in smaller chunks, or when building integration paths to other pieces of software which aren’t Kùzu or graph data compatible.

+ +

Kùzu Tabular Data Imports

+ +
# portions of this content referenced
+# with modifications from:
+# https://docs.kuzudb.com/import/parquet/
+import pandas as pd
+import kuzu
+
+# create parquet-based data for import into kuzu
+pd.DataFrame(
+    {"name": ["Adam", "Adam", "Karissa", "Zhang"],
+     "age": [30, 40, 50, 25]}
+).to_parquet("user.parquet")
+pd.DataFrame(
+    {
+        "from": ["Adam", "Adam", "Karissa", "Zhang"],
+        "to": ["Karissa", "Zhang", "Zhang", "Noura"],
+        "since": [2020, 2020, 2021, 2022],
+    }
+).to_parquet("follows.parquet")
+
+# form a kuzu database connection
+db = kuzu.Database("./test")
+conn = kuzu.Connection(db)
+
+# define node and relationship tables before copying data into them
+conn.execute(
+    "CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name));"
+)
+conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64);")
+
+# use wildcard-based copy in case of multiple files
+# copy node data
+conn.execute('COPY User FROM "user*.parquet";')
+# copy edge data
+conn.execute('COPY Follows FROM "follows*.parquet";')
+
+df = conn.execute(
+    """MATCH (a:User)-[f:Follows]->(b:User)
+    RETURN a.name, b.name, f.since;"""
+).get_as_df()
+
+ +

One way to create graphs within Kùzu is to import data from tabular datasets. +Kùzu provides functionality to convert tabular data from CSV, Parquet, or NumPy files into a graph. +This process enables seamless integration of tabular data sources into the graph database, providing the benefits of graph-based querying and analysis while leveraging the familiar structure and benefits of tabular data.

+ +

Kùzu Data Results and Exports

+ +
# portions of this content referenced 
+# with modifications from:
+# https://kuzudb.com/api-docs/python/kuzu.html
+import kuzu
+
+# form a kuzu database connection
+db = kuzu.Database("./test")
+conn = kuzu.Connection(db)
+
+query = "MATCH (u:User) RETURN u.name, u.age;"
+
+# run query and return Pandas DataFrame
+pd_df = conn.execute(query).get_as_df()
+
+# run query and return Polars DataFrame
+pl_df = conn.execute(query).get_as_pl()
+
+# run query and return PyArrow Table
+arrow_tbl = conn.execute(query).get_as_arrow()
+
+# run query and return PyTorch Geometric Data
+pyg_d = conn.execute(query).get_as_torch_geometric()
+
+# run query within COPY to export directly to file
+conn.execute("COPY (MATCH (u:User) return u.*) TO 'user.parquet';")
+
+ +

Kùzu is also flexible when it comes to retrieving data from Cypher queries. +After performing a query you have the option to use a number of methods to automatically convert the results into various in-memory data formats, for example, Pandas DataFrames, Polars DataFrames, PyTorch Geometric (PyG) Data, or PyArrow Tables. +There are also options to export data directly to CSV or Parquet files for times when file-based data is preferred.

+ +

Concluding Thoughts

+ +

Kùzu, with its seamless integration into Python environments and efficient handling of graph data, presents a compelling solution for developers seeking embedded graph database capabilities. +Its ability to transform and query tabular data into rich graph structures opens up new possibilities for data analysis and application development. +However, it’s important to consider the scale and specific needs of your project when choosing between Kùzu and more robust server-based solutions like Neo4j. +By leveraging the right tool for the right job, whether it’s Kùzu for lightweight embedded applications or a server-based database for enterprise-scale operations, developers can unlock the full potential of graph data. Embracing these technologies allows for deeper insights, more complex data relationships, and ultimately, more powerful and efficient applications.

]]>
dave-bunten
Parquet: Crafting Data Bridges for Efficient Computation2024-03-25T00:00:00+00:002024-05-29T14:08:50+00:00/set-website/preview/pr-36/2024/03/25/Parquet-Crafting-Data-Bridges-for-Efficient-ComputationParquet: Crafting Data Bridges for Efficient Computation + +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ +
+ + figure image + + +
+ +

(Image: Vulphere, Wikimedia Commons)

+ + +

Apache Parquet is a columnar and strongly-typed tabular data storage format built for scalable processing which is widely compatible with many data models, programming languages, and software systems. +Parquet files (typically denoted with a .parquet filename extension) are usually compressed within the format itself and are often used in embedded or cloud-based high-performance scenarios. +It has grown in popularity since it was introduced in 2013 and is used as a core data storage technology in many organizations. +This article will introduce the Parquet format from a research data engineering perspective. +

+ +

Understanding the Parquet file format

+ +
+ + figure image + + +
+ +

(Image: Robert Fischbacher163, Wikimedia Commons)

+ +

Parquet began around 2013 as work by Twitter and Cloudera collaborators to help solve large data challenges (for example, in Apache Hadoop systems). +It was partially inspired by a Google Research publication: “Dremel: Interactive Analysis of Web-Scale Datasets”. +Parquet joined the Apache Software Foundation in 2015 as a Top-Level Project (TLP). +The format is similar to, and shares goals with, the ORC, Avro, and Feather file formats.

+ +

One definition for the word “parquet” is: “A wooden floor made of parquetry.” (Wiktionary: Parquet). +Parquetry is often used to form decorative geometric patterns in flooring. +It seems fitting to name the format this way due to how columns and values are structured (see more below), akin to constructing a beautiful ‘floor’ for your data efforts.

+ +

We cover a few pragmatic aspects of the Parquet file format below.

+ +

Columnar data storage

+ + + +
+ + Parquet organizes column values together. CSV intermixes values from multiple columns. + + +
+ Parquet organizes column values together. CSV intermixes values from multiple columns. + +
+ +
+ +

Parquet files store data in a “columnar” way which is distinct from other formats. +We can understand this columnar format by using plaintext comma-separated value (CSV) format as a reference point. +CSV files store data in a row-oriented way by using new lines to represent rows of values. +Reading all values of a single column in CSV often involves seeking through multiple other portions of the data by default.

+ +

Parquet files are binary in nature, optimizing storage by arranging values from individual columns in close proximity to each other. +This enables the data to be stored and retrieved more efficiently than is possible with CSV files. +For example, Parquet files allow you to query individual columns without needing to traverse unnecessary column value data.

+ +
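To make the difference concrete, here is a small sketch using PyArrow (the file and column names below are our own examples) showing a column-selective read:

```python
import pyarrow as pa
from pyarrow import parquet

# write a small demonstration file (names here are illustrative)
parquet.write_table(
    pa.Table.from_pydict({"A": [1, 2, 3], "B": ["x", "y", "z"]}),
    "columns_demo.parquet",
)

# read only column "A"; the reader can skip the bytes
# belonging to column "B" entirely
print(parquet.read_table("columns_demo.parquet", columns=["A"]))
```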

Parquet format abstractions

+ +

Row groups, column chunks, and pages

+ + + +

+ +

Parquet organizes data using row groups, columns, and pages.

+ +

Parquet files organize column data inside of row groups. +Each row group includes chunks of columns in the form of pages. +Row groups and column pages are configurable and may change depending on the configuration of your Parquet client. +Note: you don’t need to be an expert on these details to leverage and benefit from Parquet as these are often configured for default general purposes.

+ +
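As a sketch of how row groups appear in practice with PyArrow (the tiny `row_group_size` below is purely for illustration, not a recommended setting):

```python
import pyarrow as pa
from pyarrow import parquet

table = pa.Table.from_pydict({"A": list(range(10))})

# use a deliberately tiny row group size for illustration only
parquet.write_table(table, "row_groups_demo.parquet", row_group_size=4)

# inspect how the file was divided into row groups
metadata = parquet.ParquetFile("row_groups_demo.parquet").metadata
print(metadata.num_row_groups)  # 3 (rows grouped as 4 + 4 + 2)
```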

Page encodings

+ +

Pages within column chunks may have a number of different encodings. +Parquet encodings are often selected based on the type of data included within columns and the operational or performance needs associated with a project. +By default, Plain (PLAIN) encoding is used which means all values are stored back to back. +Another encoding type, Run Length Encoding (RLE), is often used to efficiently store columns with many consecutively repeated values. +Column encodings are sometimes set for each individual column, usually in an automatic way based on the data involved.

+ +
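For the curious, below is a hedged sketch of explicitly requesting a page encoding through PyArrow; note that encoding support varies by data type and PyArrow version, and per-column encodings require disabling dictionary encoding:

```python
import pyarrow as pa
from pyarrow import parquet

table = pa.Table.from_pydict({"A": [1, 1, 1, 2, 2, 3]})

# explicitly choose an encoding for column "A"; per-column
# encodings require disabling dictionary encoding first
parquet.write_table(
    table,
    "encoding_demo.parquet",
    use_dictionary=False,
    column_encoding={"A": "DELTA_BINARY_PACKED"},
)
```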

Compression

+ +
import os
+import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3, 4, 5],
+        "B": ["foo", "bar", "baz", "qux", "quux"],
+        "C": [0.1, 0.2, 0.3, 0.4, 0.5],
+    }
+)
+
+# Write Parquet file with Snappy compression
+parquet.write_table(table=table, where="example.snappy.parquet", compression="SNAPPY")
+
+# Write Parquet file with Zstd compression
+parquet.write_table(table=table, where="example.zstd.parquet", compression="ZSTD")
+
+ +

Parquet files can be compressed as they’re written using parameters.

+ +

Parquet files may leverage compression to help reduce file size and increase data read performance. +Compression is applied at the page level, combining benefits from various encodings. +Data stored through Parquet is usually compressed when it is written, denoting the compression type through the filename (for example: filename.snappy.parquet). +Snappy is often used as a common compression algorithm for Parquet data. +Brotli, Gzip, ZSTD, LZ4 are also sometimes used. +It’s worth exploring what compression works best for the data and systems you use (for example, ZSTD compression may hold benefits).

+ +

“Strongly-typed” data

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3],
+        "B": ["foo", "bar", 1],
+        "C": [0.1, 0.2, 0.3],
+    }
+)
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.parquet")
+
+# raises exception:
+# ArrowTypeError: Expected bytes, got a 'int' object (for column B)
+# Note: while this is an Arrow in-memory data exception, it also
+# prevents us from attempting to perform incompatible operations
+# within the Parquet file.
+
+ +

Data values must all be of the same type within a Parquet column.

+ +

Data within Parquet is “strongly-typed”; specific data types (such as integer, string, etc.) are associated with each column, and thus each value. +Attempting to store a data value type which does not match the column data type will usually result in an error. +This can lead to performance and compression benefits due to how quickly Parquet readers can determine the data type. +Strongly-typed data also embeds a kind of validation directly inside your work (data errors “shift left” and are often discovered earlier). +See here for more on data quality validation topics we’ve written about.

+ +

Complex data handling

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table with complex data types
+table = pa.Table.from_pydict(
+    {
+        "A": [{"key1": "val1"}, {"key2": "val2"}],
+        "B": [[1, 2], [3, 4]],
+        "C": [
+            bytearray("😊".encode("utf-8")),
+            bytearray("🌻".encode("utf-8")),
+        ],
+    }
+)
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.parquet")
+
+# read the schema of the parquet file
+print(parquet.read_schema(where="example.parquet"))
+
+# prints:
+# A: struct<key1: string, key2: string>
+#   child 0, key1: string
+#   child 1, key2: string
+# B: list<element: int64>
+#   child 0, element: int64
+# C: binary
+
+ +

Parquet file columns may contain complex data types such as nested types (lists, dictionaries) and byte arrays.

+ +

Parquet files may store many data types that are complicated or impossible to store in other formats. +For example, images may be stored using the byte array storage type. +Nested data may be stored using LIST or MAP logical types. +Dates or times may be stored using various temporal data types. +Oftentimes, complex data conversion within Parquet files is already implemented (for example, in PyArrow).

+ +

Metadata

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3],
+        "B": ["foo", "bar", "baz"],
+        "C": [0.1, 0.2, 0.3],
+    }
+)
+
+# add custom metadata to table
+table = table.replace_schema_metadata(metadata={"data-producer": "CU DBMI SET Blog"})
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.snappy.parquet", compression="SNAPPY")
+
+# read the schema
+print(parquet.read_schema(where="example.snappy.parquet"))
+
+# prints
+# A: int64
+# B: string
+# C: double
+# -- schema metadata --
+# data-producer: 'CU DBMI SET Blog'
+
+ +

Metadata are treated as distinct and customizable components of Parquet files.

+ +

The Parquet format treats data about the data (metadata) separately from that of column value data. +Parquet metadata includes column names, data types, compression, various statistics about the file, and custom fields (in key-value form). +This metadata may be read without reading column value data which can assist with data exploration tasks (especially if the data are large).

+ +

Multi-file “datasets”

+ +
import pathlib
+import pyarrow as pa
+from pyarrow import parquet
+
+pathlib.Path("./dataset").mkdir(exist_ok=True)
+
+# create pyarrow tables
+table_1 = pa.Table.from_pydict({"A": [1]})
+table_2 = pa.Table.from_pydict({"A": [2, 3]})
+
+# write the pyarrow table to parquet files
+parquet.write_table(table=table_1, where="./dataset/example_1.parquet")
+parquet.write_table(table=table_2, where="./dataset/example_2.parquet")
+
+# read the parquet dataset
+print(parquet.ParquetDataset("./dataset").read())
+
+# prints (note that, for ex., [1] is a row group of column A)
+# pyarrow.Table
+# A: int64
+# ----
+# A: [[1],[2,3]]
+
+ +

Parquet datasets may be composed of one or many individual Parquet files.

+ +

Parquet files may be used individually or treated as a “dataset” through file groups which include the same schema (column names and types). +This means you can store “chunks” of Parquet-based data in one or many files and provides opportunities for intermixing or extending data. +When reading Parquet data this way, libraries usually use the directory as a way to parse all files as a single dataset. +Multi-file datasets mean you gain the ability to store arbitrarily large amounts of data by sidestepping, for example, inode limitations.

+ +

Apache Arrow memory format integration

+ +
import pyarrow as pa
+from pyarrow import parquet
+
+# create a pyarrow table
+table = pa.Table.from_pydict(
+    {
+        "A": [1, 2, 3],
+        "B": ["foo", "bar", "baz"],
+        "C": [0.1, 0.2, 0.3],
+    }
+)
+
+# write the pyarrow table to a parquet file
+parquet.write_table(table=table, where="example.parquet")
+
+# show schema of table and parquet file
+print(table.schema.types)
+print(parquet.read_schema("example.parquet").types)
+
+# prints
+# [DataType(int64), DataType(string), DataType(double)]
+# [DataType(int64), DataType(string), DataType(double)]
+
+ +

Parquet file and Arrow data types are well-aligned.

+ +

The Parquet format has robust support and integration with the Apache Arrow memory format. +This enables consistency across Parquet integration and how the data are read using various programming languages (the Arrow memory format is relatively uniform across these).

+ +

Performance with Parquet

+ +

Parquet files often outperform traditional formats due to how they are designed. +Other data file formats may vary in performance contingent on specific configurations and system integration. +We urge you to perform your own testing to find out what works best for your circumstances. +See below for a list of references which compare Parquet to other formats.

+ + + +

How can you use Parquet?

+ +

The Parquet format is common in many data management platforms and libraries. +Below is a list of just a few popular places where you can use Parquet.

+ + + +

Concluding Thoughts

+ +

This article covered the Parquet file format including notable features and usage. +Thank you for joining us on this exploration of Parquet. +We appreciate your support, hope the content here helps with your data decisions, and look forward to continuing the exploration of data formats in future posts.

]]>
dave-bunten
Navigating Dependency Chaos with Lockfiles2024-02-20T00:00:00+00:002024-05-29T14:08:50+00:00/set-website/preview/pr-36/2024/02/20/Navigating-Dependency-Chaos-with-LockfilesNavigating Dependency Chaos with Lockfiles + +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ + +

Writing software often entails using code from other people to solve common challenges and take advantage of existing work. +External software used by a specific project can be called a “dependency” (the software “depends” on that external work to accomplish tasks). +Collections of software are oftentimes made available as “packages” through various platforms. +Package management for dependencies, the task of managing collections of dependencies for a specific project, is a specialized area of software development that can involve the use of unique tools and files. +This article will cover package dependency management through special files generally referred to as “lockfiles”. +

+ +

Why use dependencies?

+ +
+ + 'Reinvent the Wheel' comic by Randall Munroe, XKCD. + + +
+ ‘Reinvent the Wheel’ comic by Randall Munroe, XKCD. + +
+ +
+ +

There are various advantages to using packaged dependencies in your projects. +Using existing work this way practices a collective “don’t repeat yourself [or ourselves]” (DRY) philosophy among the global community of software developers to avoid reinventing the wheel. +Using dependencies allows us to make explicit decisions about the specific focus, or context, which the project will prioritize. +While it’s oftentimes easy to include and use dependencies in a project, they come with risks that are important to consider.

+ +

See below for a rough list of reasons why one might opt to use specific dependencies in a project:

+ +
    +
  1. Solutions which entail a lot of edge cases (particularly error prone).
  2. Solutions which need constant maintenance, i.e. “frequently moving targets”.
  3. Solutions which require special domain knowledge or training to correctly implement.
+ +

A common dependency which demonstrates these aspects are those which assist with datetimes, timezones, and time deltas.

+ +

The dependency wilderness

+ + + + +

+ +

Dependencies are often on their own unpredictable schedule outside of your project’s control.

+ +

Using existing software package dependencies helps conserve resources but comes with unique challenges related to unpredictability (such as when those dependencies are updated). +This unpredictability can sometimes result in what’s colloquially called “dependency hell” or “dependency chaos”, where for example multiple external dependencies conflict with one another and are unable to be automatically resolved (among other issues). +These challenges can be especially frustrating due to when they occur (often outside of our personal schedule awareness) and how long they can take to debug (finding fixes sometimes entails costly trial-and-error). +It can feel like walking through a forest at night without a flashlight, constantly tripping over roots or running into stumps and branches!

+ +

Illuminating the dependency thicket

+ +

+ +

Software dependency choices may be understood as a careful trade-off between the costs of overwhelming internal invention and external dependency chaos.

+ +

Dependency chaos can sometimes lead to “not invented here syndrome” where there’s less trust in work created outside of an individual or group of people. +When or if this happens it can be important to understand dependencies as a scale of choices between overwhelming invention and infinite dependency chaos. +For example, to accomplish a small project it may not be wise to create a brand new programming language (towards the extreme of overwhelming invention). +On the other hand, if we depended upon all existing work within a certain context the solution may not be specialized, efficient, or resourceful enough to meet the goals within a reasonable amount of time.

+ + + +
+mindmap
+  root((Project))
+    Data storage
+      File 1
+      Database 2
+    Data processing
+      Package X
+      Package Y
+    Integration
+      Solution A
+      Platform B
+
+ +

Dependency awareness and opportunity can be grouped into concerns and documented as part of a literature review (seen here as a mind map).

+ +

It can be helpful to reconsider existing knowledge on a topic area through formal or informal literature review (understanding that code within software is a type of literature) when thinking about the scale of decisions mentioned above. +Outlining existing work through a literature review can also support second-order thinking, where we reflect on dependency decision-making again after an initial (first-order) creative process. +Each potential dependency discovered through this process can be organized using separation of concerns (SoC) under specific concern labels, or a general set of information which affects related code. +Include dependencies within your project which will helpfully limit the code produced (or SoC sections), thereby reducing the overall amount of concerns the project must maintain.

+ +

+ +

Bounded contexts along with shared or distinct components can be used to help limit the complexity of a project in helpful ways.

+ +

The concept of bounded context from domain-driven design can sometimes be used to help distinguish what is in or out of scope for a particular project as a way of reducing complexity. +Bounded context can be used as a way to draw abstract lines around a certain span of control in order to align available resources (like time and people) with the focus of the project. +It also can help promote loose coupling of software components in order to enable flexible design over time. +Without these considerations and the use of dependencies we might face “endless” software feature creep by continually adding new bounded contexts that are outside of our span of control (or resources).

+ +

Version constraints as dependency specification control

+ + + + + + + + + + + + + + + + + + + + + + +
Version constraint | Description of the version constraint
`==2.1.0` | Exactly and only version 2.1.0
`>=2.0.0` | Greater than or equal to version 2.0.0
`>=2.0.0, <3.0.0` | Greater than or equal to version 2.0.0 and less than 3.0.0
`>=2.0.0, <3.0.0, !=2.5.1` | Greater than or equal to version 2.0.0, less than 3.0.0, and anything that's not exactly version 2.5.1
+ +

Version constraint specifications provide code-based descriptions for dependency versions within your project (Pythonic version specification examples above).

+ +

Many aspects of dependency chaos arise from the fact that dependencies are updated at various times. +We often want to make certain we use the most up-to-date version of a dependency because those updates may come with performance, corrective, security, or other benefits. +To accomplish this we can use what are sometimes called dependency “version range constraints” or “compliant version specifications” to provide some flexibility in how packages are installed for our projects. +Version ranges are usually preferred to help keep software projects updated and also allow for flexible dependency resolutions (for example, when a single dependency is required by multiple other dependencies). +These are often specific to the package management system and programming language being used. +See the Python Packaging Authority’s Version Specifiers section for an example of how these version constraints work.

+ +
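As a small sketch, constraints like those in the table above can be evaluated with the `packaging` library (which underlies much of Python's packaging tooling):

```python
# evaluating version constraints with the `packaging` library
from packaging.specifiers import SpecifierSet
from packaging.version import Version

constraint = SpecifierSet(">=2.0.0,<3.0.0,!=2.5.1")

print(Version("2.1.0") in constraint)  # True
print(Version("2.5.1") in constraint)  # False (explicitly excluded)
print(Version("3.0.0") in constraint)  # False (outside the upper bound)
```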

Many version specification constraints build upon ideas from semantic versioning (SemVer). +Generally, SemVer uses a dotted three number syntax which includes a major, minor, and patch version separated by periods. +For example, a SemVer of 1.2.3 represents major version 1, minor version 2, patch 3. +Developers may use this type of specification to help differentiate the various releases of their software and help build user confidence about expected operations. +See the Semantic Versioning specification at https://semver.org/ for more information about how SemVer works.

+ +
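A brief sketch of SemVer components, again using the `packaging` library:

```python
# parsing a SemVer-style version string
from packaging.version import Version

version = Version("1.2.3")
print(version.major, version.minor, version.micro)  # 1 2 3
```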

Version constraints can still be chaotic

+ +

+ +

Unintentional failures can occur due to timeline variations between internal projects and external dependencies.

+ +

We sometimes require repeatable behavior to be productive with a project in addition to the flexibility of version range specifications. +For example, we may want each developer and continuous integration step to have reproducible environments even if a dependency gets updated while internal development takes place. +Dependency version constraints oftentimes aren’t enough on their own to prevent reproducibility issues from occurring. +See the above diagram for a timeline depicting how Developer B and Developer D may have different experiences despite best efforts with version constraints (Dependency A may make a release that fits the version constraint but breaks Project C when Developer D tries to modify unrelated code).

+ +

Lockfiles for reproducible version constraint behavior

+ +

+ +

Version constraint lockfiles provide one way to ensure reproducible behaviors within your projects. +Lockfiles are usually recommended to be included in source control, so one always has a complete snapshot (short of the literal full source code of the dependencies) of the project’s last known working configuration.

+ +

Lockfiles usually have the following characteristics (this varies by programming language and dependency type):

+ +
    +
  • Lockfiles capture data about existing available and installable dependencies which match a provided version constraint specification as a single file which can be added to source control.
  • +
  • Lockfiles are referenced when available to help create reproducible installations of dependencies.
  • +
  • Lockfiles are often automatically created or changed by a package or environment management tool of some kind.
  • +
  • Lockfiles focus on reproducibility of dependency installations and don’t enable dependency resolution on their own (this is instead a part of version range specification and package management tools).
  • +
  • Lockfiles are used by developers, automated procedures (as with CI/CD procedures), production deployment environments, and elsewhere to help ensure reproducibility.
  • +
+ +

See the above modified timeline for Developer B and Developer D to better understand how their project will benefit from a shared lockfile and reproducible dependency installations.

+ +

Pythonic Example

+ + + + + + + + + + + + + + + + + + + + + + +
Python Poetry command used | Description of what occurs
`poetry add pandas` | Adds a [caret-based version constraint specification](https://python-poetry.org/docs/dependency-specification/#caret-requirements) based on the latest release (for example `^2.2.1`) within a `pyproject.toml` file; this version constraint can be understood as `>= 2.2.1, < 2.3.0`. Creates or updates the `poetry.lock` lockfile with known compatible versions of Pandas based on the version constraint mentioned above. Installs the version of Pandas which matches the `pyproject.toml` and `poetry.lock` specifications.
`poetry install` | Installs the version of Pandas which matches the `pyproject.toml` and `poetry.lock` specifications (for example, within a new environment or for another developer).
`poetry update pandas` | Poetry checks for available Pandas releases which are compatible with the version constraint (for example `^2.2.1`). If there are new versions available which match the constraint, Poetry will update the `poetry.lock` lockfile and install the matching version.
`poetry lock` | Updates all dependencies referenced in the `poetry.lock` lockfile with the latest compatible versions based on the version constraints specified within the `pyproject.toml`. Optionally, if the `--no-update` flag is also used, refreshes the dependency versions referenced within the `poetry.lock` lockfile based on version constraints specified within the `pyproject.toml` without seeking updated dependency releases.
+ +

Use Poetry commands to implement dependency version constraints and lockfiles for reproducible Python project environments.

+ +

Poetry is a Python packaging and dependency management tool which implements version constraints and lockfiles to help developers maintain their software projects. +Using commands like poetry add ... and poetry lock automatically creates poetry.lock lockfiles based on specifications which are added either automatically or manually to pyproject.toml files. +Similar to other tools, Poetry can operate with or without poetry.lock lockfiles (see here for more information). +Another alternative to Poetry which makes use of lockfiles is PDM (pdm.lock files).

+ +

Avoiding over-constrained dependencies

+ +

+ +

Automated dependency checking tools like Dependabot or Renovate can be used to reduce project risk through timely dependency update changes assisted by human reviewers.

+ +

Using dependency version constraints and lockfiles are helpful for reproducibility but imply a risk of over-constraint. +Two important over-constraint considerations are:

+ +
    +
  • Bug fixes: we may perpetuate an incorrect or failing solution within our project as the result of not installing later releases of a dependency.
  • +
  • Security fixes: we may unknowingly create security risks for others through the inclusion of known security vulnerable dependency versions.
  • +
+ +

Make sure to address these risks by routinely considering whether your dependencies need to be updated (manually) or through the use of automated tools like GitHub’s Dependabot or Mend Renovate. +Tools like Dependabot or Renovate enable scheduled checks and updates to be applied to your project which can lead to a balanced way of ensuring risk reduction and productive future-focused development.

+ +

Concluding Thoughts

+ +

This article covered why dependencies are used, what complications they come with, and some tools to use addressing those challenges. +Every project can vary quite a bit when it comes to dependency management decision making and maintenance. +We hope you find success with dependency management through these and look forward to providing more information on this topic in the future.

]]>
dave-bunten
Python Memory Management and Troubleshooting2024-01-22T00:00:00+00:002024-05-29T14:08:50+00:00/set-website/preview/pr-36/2024/01/22/Python-Memory-Management-and-TroubleshootingPython Memory Management and Troubleshooting + +
+ + +
+ +

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any questions or suggestions for blog posts, please don’t hesitate to reach out!

+ +
+
+ +

Introduction

+ + +

Have you ever run Python code only to find it taking forever to complete or sometimes abruptly ending with an error like: 123456 Killed or killed (program exited with code: 137)? +You may have experienced memory resource or management challenges associated with these scenarios. +This post will cover some computer memory definitions, how Python makes use of computer memory, and share some tools which may help with these types of challenges. +

+ +

What is Software?

+ + + + +

+ +

Computer software includes programs, documentation, and other data maintained on computer data storage.

+ +

Computer software is the collection of programs and data which are used to accomplish specific tasks on a computer. +“A computer program is a sequence or set of instructions in a programming language for a computer to execute. It is one component of software, which also includes documentation and other intangible components.” (Wikipedia: Computer program). +Computer programs in their human-readable form are stored as source code. +Source code is often maintained on computer data storage.

+ +

What is Memory?

+ +

Computer Memory

+ +

+ +

Computer memory is a type of computer resource available for use by processes on a computer.

+ +

Computer memory, also sometimes known as “RAM”, “random-access memory”, or “dynamic memory”, is a type of resource used by computer software on a computer. +“Computer memory stores information, such as data and programs for immediate use in the computer. … Main memory operates at a high speed compared to non-memory storage which is slower but less expensive and oftentimes higher in capacity.” (Wikipedia: Computer memory). +When we execute a computer program it becomes a process (or sometimes many processes). +Processes are loaded into computer memory to follow the instructions and other data provided from their related computer programs.

+ +
+ + +
+ +

The word “speed” in the above context is sometimes used to describe the delay before an operation on a computer completes (also known as latency). +See the following on [Computer] Latency Numbers Everyone Should Know to better understand relative computer operation speeds.

+ + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Process Memory Segment | Purpose
Stack | Contains information about sequences of program instructions as functions or subroutines.
Heap | Area where memory for variables may be dynamically used.
Initialized data | Includes global and static variables which are explicitly initialized.
Uninitialized data | Includes global and static variables which are not explicitly initialized.
Text | Comprises program instructions for the process.
+ +

Process memory is divided into segments which have specific purposes (The Linux Programming Interface by Michael Kerrisk).

+ +

Memory for a process is further divided into parts which are typically called segments. +Each process memory segment has a specific purpose and way of organizing things. +For the purposes of this content we’ll focus on two of these segments: the stack and the heap. +The stack (sometimes also known as the “call stack”) includes information about sequences of program instructions packaged as units called “functions” or “subroutines”. +The stack also typically stores function local variables, arguments, and return values. +The heap is an area where variables for a program may be dynamically stored. +The stack can be thought of as a “roadmap” for what the program will accomplish (including the location of things it will need to do that work). +The heap can be imagined as a “warehouse” used to store (or remove) things as part of the stack “roadmap”. +Please see The Linux Programming Interface by Michael Kerrisk, Chapter 6.3: Memory Layout of a Process for more information about processes.

+ + + + + + + + + + + + + + +
Memory Blocks
+ +
+A.) All memory blocks available. + + + +
BlockBlockBlock
+
+ +
+ +
+B.) Some memory blocks in use. + + + + +
BlockBlockBlock
+
+ +
Practical analogy
+ +
+C.) You have limited boxes to hold things. + + + +
📦📦📦
+
+ +
+ +
+D.) Two boxes are used, the other remains empty (ready for use). + + + +
📦📦📦
+
+ +
+ +

Memory blocks may be free or used at various times. They can be thought of as reusable buckets to hold things.

+ +

The heap is often further organized through the use of “blocks”. +Memory blocks are chunks of memory of a certain byte or bit size (usually all the same size) (Wikipedia: Block (data storage)). +Memory blocks may be in use or free at different times. +If the heap is a process memory “warehouse” then blocks are like “boxes” inside the warehouse.

+ +

+ +

Process memory heaps help organize memory blocks on a computer for specific procedures. Heaps may have one or many memory pools.

+ +

Blocks may be organized in hierarchical layers to manage memory efficiently or towards a specific purpose. +Blocks may sometimes be organized into pools within the process memory heap segment. +Pools are areas of the heap used to efficiently manage blocks together in specific ways. +Each heap may have one or many pools (each with sets of blocks). +If the heap is a process memory “warehouse”, and blocks are like “boxes” inside the warehouse, pools are like “shelves” for organizing and moving those boxes within the warehouse.

+ +

Memory Allocator

+ +

+ +

Memory allocators help software reserve and free computer memory resources.

+ +

Memory management is a concept which helps enable the shared use of computer memory to avoid challenges such as memory overuse (where all memory is in use and never shared to other software). +Computer memory management often occurs through the use of a memory allocator which controls how computer memory resources are used for software. +Computer software is written to interact with memory allocators to use computer memory. +Memory allocators may be used manually (with specific directions provided on when and how to use memory resources) or automatically (with an algorithmic approach of some kind). +The memory allocator usually performs the following actions with memory (in addition to others):

+ +
    +
  • “Allocation”: computer memory resource reservation (taking memory). This is sometimes also known as “alloc”, or “allocate memory”.
  • +
  • “Deallocation”: computer memory resource freeing (giving back memory for other uses). This is sometimes also known as “free”, or “freeing memory from allocation”.
  • +
+ +

Garbage Collection

+ +

+ +

Garbage collectors help free computer memory which is no longer referenced by software.

+ +

“Garbage collection (GC)” is used to describe a type of automated memory management. +GC is typically used to help reduce human error, avoid unintentional system failures, and decrease development time (through less memory-specific code). +“The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage.” (Wikipedia: Garbage collection (computer science)). +A garbage collector often works in tandem with a memory allocator to help control computer memory resource usage in software development.

+ +

How Does Python Interact with Computer Memory?

+ +

Python Overview

+ +

+ +

A Python interpreter executes Python code and manages memory for Python procedures.

+ +

Python is an interpreted “high-level” programming language (Python: What is Python?). +Interpreted languages are those which include an “interpreter” which helps execute code written in a particular way (Wikipedia: Interpreter (computing)). +High-level languages such as Python often remove the requirement for software developers to manually perform memory management (Wikipedia: High-level programming language).

+ +

Python code is executed by a commonly pre-packaged and downloaded binary called the Python interpreter. +The Python interpreter reads Python code and performs memory management as the code is executed. +The CPython Python interpreter is the most commonly used interpreter for Python, and is what’s used as a reference for other content here. +There are also other interpreters such as PyPy, Jython, and IronPython which all handle memory differently than the CPython interpreter.

+ +

Python’s Memory Manager

+ +

+ +

The Python memory manager helps manage memory in the heap for Python processes executed by the Python interpreter.

+ +

Memory is managed for Python software processes automatically (when unspecified) or manually (when specified) through the Python interpreter. +The Python memory manager is an abstraction which manages memory for Python software processes through the Python interpreter (Python: Memory Management). +From a high-level perspective, we assume variables and other operations written in Python will automatically allocate and deallocate memory through the Python interpreter when executed. +Python’s memory manager performs work through various memory allocators and a garbage collector (or as configured with customizations) within a private Python memory heap.

+ +
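While not itself part of the memory manager, the standard library's `tracemalloc` module offers one way to observe the allocations the memory manager performs; a brief sketch:

```python
# observing Python memory allocations with the standard library
import tracemalloc

tracemalloc.start()

# allocate some memory through ordinary Python objects
data = [str(i) for i in range(100_000)]

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current} bytes, peak: {peak} bytes")
tracemalloc.stop()
```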

Python’s Memory Allocators

+ +

+ +

The Python memory manager by default will use pymalloc internally or malloc from the system to allocate computer memory resources.

+ +

The Python memory manager allocates memory for use through memory allocators. +Python may use one or many memory allocators depending on specifications in Python code and how the Python interpreter is configured (for example, see Python: Memory Management - Default Memory Allocators). +One way to understand Python memory allocators is through the following distinctions.

+ +
    +
  • “Python Memory Allocator” (pymalloc) +The Python interpreter is packaged with a specialized memory allocator called pymalloc. +“Python has a pymalloc allocator optimized for small objects (smaller or equal to 512 bytes) with a short lifetime.” (Python: Memory Management - The pymalloc allocator). +Ultimately, pymalloc uses C standard library dynamic memory allocation functions to implement memory work.
  • +
  • C dynamic memory allocation functions (malloc, realloc, etc.) +When pymalloc is disabled or memory requirements exceed pymalloc’s constraints, the Python interpreter will directly use C standard library dynamic memory allocation functions (such as malloc). +When C standard library dynamic memory allocation functions are used by the Python interpreter, it uses the system’s existing implementation of the C standard library.
  • +
+ +

+ +

pymalloc makes use of arenas to further organize pools within a Python process memory heap.

+ +

It’s important to note that pymalloc adds additional abstractions to how memory is organized through the use of “arenas”. +These arenas are specific to pymalloc purposes. +pymalloc may be disabled through the use of a special environment variable called PYTHONMALLOC (for example, to use only C standard library dynamic memory allocation functions as seen below). +This same environment variable may be used with debug settings in order to help troubleshoot in-depth questions.

+ +
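Below is a CPython-specific sketch related to the above; `PYTHONMALLOC` must be set before the interpreter starts, so it appears here only as a comment:

```python
import os
import sys

# CPython-specific: print low-level allocator statistics
# (including pymalloc arenas, pools, and blocks) to stderr
sys._debugmallocstats()

# PYTHONMALLOC must be set before the interpreter starts, for example:
#   PYTHONMALLOC=malloc python my_script.py
# we can inspect the current setting (None when unset):
print(os.environ.get("PYTHONMALLOC"))
```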

Additional Python Memory Allocators

+ +

+ +

Python code may stipulate the use of additional memory allocators, such as mimalloc and jemalloc outside of the default Python memory manager’s operation.

+ +

Python provides the capability of customizing memory allocation through the use of custom code or non-default packages. +See below for some notable examples of additional memory allocation possibilities.

+ +
    +
  • NumPy Memory Allocation +NumPy uses custom C-APIs which are backed by C dynamic memory allocation functions (malloc, free, realloc) to help address memory management. +These interfaces can be controlled directly through NumPy to help manage memory effectively when using the package.
  • +
  • PyArrow Memory Allocators +PyArrow provides the capability to use C standard library dynamic memory allocation functions, jemalloc, or mimalloc through the PyArrow Memory Pools group of functions. +A default memory allocator is selected for use by PyArrow based on the operating system and the availability of the memory allocator on the system. +The selection of a memory allocator for use with PyArrow can be influenced by how it performs on a particular system (see the sketch after this list).
  • +
+ +
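As referenced in the list above, here is a small sketch of inspecting PyArrow memory pools; which pools are available depends on how PyArrow was built for your platform:

```python
import pyarrow as pa

# the default pool is selected by PyArrow based on the system
pool = pa.default_memory_pool()
print(pool.backend_name)  # for example: "jemalloc" or "mimalloc"

# allocations made through PyArrow are tracked by its pools
table = pa.Table.from_pydict({"A": list(range(1000))})
print(pa.total_allocated_bytes())
```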

Python Reference Counting

+ + + + + + + + + + + + + + + + +</table> + +_Python reference counting at a simple level works through the use of object reference increments and decrements._ +{:.center} + +As computer memory is allocated to Python processes the Python memory manager keeps track of these through the use of a [reference counter](https://en.wikipedia.org/wiki/Reference_counting). +In Python, we could label this as an "Object reference counter" because all data in Python is represented by objects ([Python: Data model](https://docs.python.org/3/reference/datamodel.html#objects-values-and-types)). +"... CPython counts how many different places there are that have a reference to an object. Such a place could be another object, or a global (or static) C variable, or a local variable in some C function." ([Python Developer's Guide: Garbage collector design](https://devguide.python.org/internals/garbage-collector/)). + +### Python's Garbage Collection + + + +_The Python garbage collector works as part of the Python memory manager to free memory which is no longer needed (based on reference count)._ +{:.center} + +Python by default uses an optional garbage collector to automatically deallocate garbage memory through the Python interpreter in CPython. +"When an object’s reference count becomes zero, the object is deallocated." ([Python Developer's Guide: Garbage collector design](https://devguide.python.org/internals/garbage-collector/)) +Python's garbage collector focuses on collecting garbage created by `pymalloc`, C memory functions, as well as other memory allocators like `mimalloc` and `jemalloc`. + +## Python Tools for Observing Memory Behavior + +### Python Built-in Tools + +```python +import gc +import sys + +# set gc in debug mode for detecting memory leaks +gc.set_debug(gc.DEBUG_LEAK) + +# create an int object +an_object = 1 + +# show the number of uncollectable references via COLLECTED +COLLECTED = gc.collect() +print(f"Uncollectable garbage references: {COLLECTED}") + +# show the reference count for an object +print(f"Reference count of `an_object`: {sys.getrefcount(an_object)}") +``` + +The [`gc` module](https://docs.python.org/3/library/gc.html) provides an interface to the Python garbage collector. +In addition, the [`sys` module](https://docs.python.org/3/library/sys.html) provides many functions which provide information about references and other details about Python objects as they are executed through the interpreter. +These functions and other packages can help software developers observe memory behaviors within Python procedures. + +### Python Package: Scalene + +
+ + Scalene provides a web interface to analyze memory, CPU, and GPU resource consumption in one spot alongside suggested areas of concern. + + +
+ Scalene provides a web interface to analyze memory, CPU, and GPU resource consumption in one spot alongside suggested areas of concern. + +
+ +
+ + +[Scalene](https://github.com/plasma-umass/scalene) is a Python package for analyzing memory, CPU, and GPU resource consumption. +It provides [a web interface](https://github.com/plasma-umass/scalene?tab=readme-ov-file#web-based-gui) to help visualize and understand how resources are consumed. +Scalene provides suggestions on which portions of your code to troubleshoot through the web interface. +Scalene can also be configured to work with [OpenAI](https://en.wikipedia.org/wiki/OpenAI) [LLM's](https://en.wikipedia.org/wiki/Large_language_model) by way of a an [OpenAI API provided by the user](https://github.com/plasma-umass/scalene?tab=readme-ov-file#ai-powered-optimization-suggestions). + +### Python Package: Memray + +
+ + Memray provides the ability to create and view flamegraphs which show how memory was consumed as a procedure executed. + + +
+ Memray provides the ability to create and view flamegraphs which show how memory was consumed as a procedure executed. + +
+ +
+ + +[Memray](https://github.com/bloomberg/memray) is a Python package to track memory allocation within Python and compiled extension modules. +Memray provides a high-level way to investigate memory performance and adds visualizations such as [flamegraphs](https://www.brendangregg.com/flamegraphs.html) (which contextualization of [stack traces](https://en.wikipedia.org/wiki/Stack_trace) and memory allocations in one spot). +Memray seeks to provide a way to overcome challenges with tracking and understanding Python and other memory allocators (such as C, C++, or Rust libraries used in tandem with a Python process). + +## Concluding Thoughts + +It's worth mentioning that this article covers only a small fraction of how and what memory is as well as how Python might make use of it. +Hopefully it clarifies the process and provides a way to get started with investigating memory within the software you work with. +Wishing you the very best in your software journey with memory! +
## Concluding Thoughts

It's worth mentioning that this article covers only a small fraction of what memory is and how Python makes use of it.
Hopefully it clarifies the process and provides a way to get started with investigating memory within the software you work with.
Wishing you the very best in your software journey with memory!
# Tip of the Week: Codesgiving - Open-source Contribution Walkthrough

_2023-11-15_

## Introduction

_What good harvests from open-source have you experienced this year?_
{:.center}

Thanksgiving is a holiday practiced in many countries which focuses on gratitude for the good harvests of the preceding year.
In the United States, we celebrate Thanksgiving on the fourth Thursday of November each year, often by eating meals we create together with others.
This post channels the spirit of Thanksgiving by giving our thanks through code as a “Codesgiving”: acknowledging and creating better software together.

## Giving Thanks to Open-source Harvests

Part of building software involves using code which others have built, maintained, and distributed for a wider audience.
Using other people’s work often comes in the form of open-source “harvesting” as we find solutions to the software challenges we face.
Examples might include installing and depending upon Python packages from PyPI or R packages from CRAN within your software projects.

> “Real generosity toward the future lies in giving all to the present.”
> - Albert Camus

These open-source projects have internal costs which are sometimes invisible to those who consume them.
Every software project carries implied software gardening time costs: time spent impeding decay, practicing continuous improvement, and evolving the work.
One way to actively share our thanks for the projects we depend on is by applying our time towards code contributions on them.

Many projects are in need of additional people’s thinking and development time.
Have you ever noticed something that needs fixing, or functionality you wish existed, in a project you use?
Consider adding your contributions to open-source!

## All Contributions Matter

Contributing to open-source can come in many forms, and contributions don’t need to be gigantic to make an impact.
Software often involves simplifying complexity.
Simplification requires many actions beyond solely writing code.
For example, a short walk outside, a conversation with someone, or a nap can sometimes help us with breakthroughs when it comes to development.
By the same token, open-source benefits greatly from communications on discussion boards, bug or feature descriptions, or other work that might not be strictly considered “engineering”.

## An Open-source Contribution Approach

_The troubleshooting process as a workflow involving looped checks for verifying an issue and validating that a solution fixes it._
{:.center}

It can feel overwhelming to find a way to contribute to open-source.
Similar to other software methodologies, modularizing your approach can help you progress without being overwhelmed.
Using a troubleshooting approach like the above can help you break down big challenges into bite-sized chunks.
Consider each step as a “module” or “section” which needs to be addressed sequentially.

## Embrace a Learning Mindset

> “Before you speak ask yourself if what you are going to say is true, is kind, is necessary, is helpful. If the answer is no, maybe what you are about to say should be left unsaid.”
> - Bernard Meltzer

Open-source contributions almost always entail learning of some kind.
Many contributions happen solely in the form of code and text communications, which are easily misinterpreted.
Assume positive intent, and accept input from others while upholding your own ideas, so that you can shape successful contributions together.
Prepare yourself by intentionally opening your mind to input from others, even if you’re sure you’re absolutely “right”.

Before communicating, be sure to use Bernard Meltzer’s self-checks mentioned above.

1. Is what I’m about to say true?
   - Have I taken time to verify the claims in a way others can replicate or understand?
2. Is what I’m about to say kind?
   - Does my intention and communication channel kindness (and not cruelty)?
3. Is what I’m about to say necessary?
   - Do my words and actions here enable or enhance progress towards a goal (would the outcome be achieved without them)?
4. Is what I’m about to say helpful?
   - How does my communication increase the quality or sustainability of the project (or group)?
## Setting Software Scheduling Expectations

_Suggested ratio of time spent by type of work for an open-source contribution._
{:.center}
1. 1/3 planning (~33%)
2. 1/6 coding (~16%)
3. 1/4 component and system testing (25%)
4. 1/4 code review, revisions, and post-actions (25%)

This modified rule of thumb from The Mythical Man-Month can assist with how you structure your time for an open-source contribution.
Notice the emphasis on planning and testing, and keep these in mind as you progress (the actual programming time can be small if adequate time has been spent on planning).
Notably, the original time fractions are modified here, with the final quarter of the time suggested as code review, revisions, and post-actions.
Planning for the time expense of the added code review and related elements helps keep a learning mindset throughout the process (instead of the review feeling like a “tack-on” or “optional / supplementary” step).
A good motto to keep in mind throughout this process is Festina lente, or “Make haste, slowly”: take care to move thoughtfully and as slowly as necessary to do things correctly the first time.

## Planning an Open-source Contribution

### Has the Need Already Been Reported?

Be sure to check whether the bug or feature has already been reported somewhere!
In a way, this is a practice of “Don’t repeat yourself” (DRY), where we attempt to avoid repeating the same block of code (in this case, the “code” can be understood as natural language).
For example, you can look on GitHub Issues or GitHub Discussions with a search query matching the rough idea of what you’re thinking about.
You can also use the GitHub search bar to automatically search multiple areas (including Issues, Discussions, Pull Requests, etc.) when you enter a query from the repository homepage.
If it has been reported already, take a look to see if someone has made a code contribution related to the work already.
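For example, GitHub’s search qualifiers can help narrow a query to the relevant areas (the repository name and keywords below are hypothetical):

```
repo:orgname/reponame is:issue is:open "csv export"
```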

An open discussion or report of the need doesn’t guarantee that someone’s already working on a solution.
If there aren’t yet any code contributions and it doesn’t look like anyone is working on one, consider volunteering to take a further look into the solution, and be sure to acknowledge any existing discussions.
If you’re unsure, it’s always kind to mention your interest in the report and ask for more information.

### Is the Need a Bug or Feature?

One way to help solidify your thinking and the approach is to consider whether what you’re proposing is a bug or a feature.
A software bug is considered something which is broken or malfunctioning.
A software feature is generally considered new functionality or a different way of doing things than what exists today.
There’s often overlap between these, and sometimes they can inspire branching needs, but individually they are usually more of one than the other.
If you can’t decide whether your need is a bug or a feature, consider breaking it down into smaller sub-components so each can be more of one or the other.
Following this strategy will help you communicate the potential contribution and also clarify the development process (for example, a critical bug might be prioritized differently than a nice-to-have new feature).

### Reporting the Need for Change

```markdown
# Using `function_x` with `library_y` causes `exception_z`

## Summary

As a `library_y` research software developer I want to use `function_x`
for my data so that I can share data for research outcomes.

## Reproducing the error

This error may be seen using Python v3.x on all major OS's using
the following code snippet:
...
```

_An example of a user story issue report with an imagined code example._
{:.center}
Open-source needs are often best reported through written stories captured within a bug or feature tracking system (such as GitHub Issues), which also include example code or logs if possible.
One template for reporting issues is the “user story”.
A user story typically comes in the form: As a < type of user >, I want < some goal > so that < some reason >. (Mountain Goat Software: User Stories).
Alongside the story, it can help to add a snippet of code which exemplifies a problem, new functionality, or a potential adjacent / similar solution.
As a general principle, be as specific as you can without going overboard.
Include things like programming language version, operating system, and other system dependencies that might be related.

Once you have a good written description of the need, be sure to submit it where it can be seen by the relevant development community.
For GitHub-based work this is usually a GitHub Issue, but it can also entail discussion board posts to gather buy-in or consensus before proceeding.
In addition to the specifics outlined above, also recall the learning mindset and Bernard Meltzer’s self-checks, taking time to acknowledge especially the potential challenges and already-attempted solutions associated with the description (conveying kindness throughout).

### What Happens After You Submit a Bug or Feature Report?

When making open-source contributions, it can sometimes also help to mention that you’re interested in resolving the issue through a related pull request and review.
Oftentimes open-source projects welcome new contributors but may have specific requirements.
These requirements are usually spelled out within a CONTRIBUTING.md document found somewhere in the repository or in organization-level documentation.
It’s also completely okay to let other contributors build solutions for the issue (like we mentioned before, all contributions matter, including the reporting of bugs or features themselves)!

## Developing and Testing an Open-source Contribution

### Creating a Development Workspace

Once you’re ready to develop a solution for the reported need, you’ll need a place to version your updates.
This work generally takes place through version control on focused branches which are named in a way that relates to their focus.
When working on GitHub, this work also commonly takes place on forked repository copies.
Using these methods helps isolate your changes from other work that takes place within the project.
It can also help you track your progress alongside related changes that might take place before you’re able to seek review or code merges.
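As a rough sketch (assuming a GitHub fork-based workflow; the repository and branch names below are hypothetical), this preparation might look like:

```shell
# clone your fork and keep a reference to the original ("upstream") repository
git clone https://github.com/your-username/reponame
cd reponame
git remote add upstream https://github.com/orgname/reponame

# create a focused branch named after the work at hand
git checkout -b fix-csv-export-bug
```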

### Bug or Feature Verification with Test-driven Development

_One can use a test-driven development approach as numbered steps (Wikipedia)._
{:.center}
1. Add or modify a test which checks for a bug fix or feature addition
2. Run all tests (expecting the newly added test content to fail)
3. Write a simple version of the code which allows the tests to succeed
4. Verify that all tests now pass
5. Return to step 3, refactoring the code as needed

If you decide to develop a solution for what you reported, one software strategy which can help you remain focused and objective is test-driven development.
Using this pattern sets a “cognitive milestone” for you as you develop a solution to what was reported.
Open-source projects can have many interesting components which could take time and be challenging to understand.
The addition of the test and related development will help keep you goal-oriented without getting lost in the “software forest” of a project.
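As a small illustration of step 1 above, a new test for a hypothetical bug fix might look like the following (the `csv_exporter` module and its expected behavior are imagined for this sketch):

```python
# test_csv_exporter.py
# a pytest-style test capturing the reported need: fields containing
# commas should be quoted (expected to fail until the fix is written)
from csv_exporter import export_row


def test_export_row_quotes_fields_with_commas():
    assert export_row(["a", "b,c"]) == 'a,"b,c"'
```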

### Prefer Simple Over Complex Changes

> …
> Simple is better than complex.
> Complex is better than complicated.
> …
> - PEP 20: The Zen of Python

Further channeling step 3 from test-driven development above, prefer simple changes over more complex ones (recognizing that reaching the absolute simplest version can take iteration and thought).
Some of the best solutions are often the most easily understood ones (where the code addition or changes seem obvious afterwards).
A “simplest version” of the code can often be refactored and completed more quickly than devising a “perfect” solution the first time.
Remember, you’ll very likely have the help of a code review before the code is merged (expect to learn more and add changes during review!).

It might be tempting to address more than one bug or feature at the same time.
Avoid feature creep as you build solutions - stay focused on the task at hand!
Take note of things you notice on your journey to address the reported needs.
These can become additional reported bugs or features which could be addressed later.
Staying focused with your development will save you time, keep your tests constrained, and (theoretically) help reduce the time and complexity of code review.

### Developing a Solution

Once you have a test in place for the bug fix or feature addition, it’s time to work towards developing a solution.
If you’ve taken time to accomplish the prior steps before this point, you may already have a good idea about how to go about a solution.
If not, spend some time investigating the technical aspects of a solution, optionally adding this information to the report or discussion content for further review before development.
Use timeboxing techniques to help make sure the time you spend in development is no more than necessary.

## Code Review, Revisions, and Post-actions

### Pull Requests and Code Review

When your code and new test(s) are in a good spot, it’s time to ask for a code review.
It might feel tempting to perfect the code.
Instead, consider whether the code is “good enough” and would benefit from someone else providing feedback.
Code review takes advantage of a strength of our species: collaborative, multi-perspectival thinking.
Leverage this in your open-source experience by seeking feedback when things feel “good enough”.

_Demonstrating the Pareto Principle’s “vital few”: a small number of changes can achieve 80% of the value associated with the needs._
{:.center}

One way to understand “good enough” is to assess whether you have reached what the Pareto Principle terms the “vital few” causes.
The Pareto Principle states that roughly 80% of consequences come from 20% of causes (the “vital few”).
What are the 20% of changes (for example, as commits) which are required to achieve 80% of the desired intent for your open-source contribution?
When you reach those 20% of the changes, consider opening a pull request to gather more insight about whether those changes will suffice and how the remaining effort might be spent.

As you go through the process of opening a pull request, be sure to follow the project’s CONTRIBUTING.md documentation; each project’s requirements can vary.
When working on GitHub-based projects, you’ll need to open a pull request against the correct branch (usually the upstream main).
If you used a GitHub issue to help report the issue, mention it in the pull request description using a #issue number reference (for example #123, where the issue link would look like: https://github.com/orgname/reponame/issues/123) to help link the work to the reported need.
This will cause the pull request to show up within the issue and automatically create a link to the issue from the pull request.

### Code Revisions

> “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”
> - Antoine de Saint-Exupéry

You may be asked to update your code based on automated code quality checks or reviewer requests.
Treat these with care; embrace learning and remember that this step can take 25% of the total time for the contribution.
When working on GitHub forks or branches, you can make additional commits directly on the development branch which was used for the pull request.
If your reviewers requested changes, re-request their review once changes have been made to help let them know the code is ready for another look.

### Post-actions and Tidying Up Afterwards

Once the code has been accepted by the reviewers and has passed any automated testing suite(s), the content is ready to be merged.
Oftentimes this work is completed by core maintainers of the project.
After the code is merged, it’s usually a good idea to clean up your workspace by deleting your development branch and syncing with the upstream repository.
While it’s up to core maintainers to decide on report closure, typically the reported need can be closed and might benefit from a comment describing the fix.
Many of these steps are considered common courtesy but also, importantly, help set you up for your next contributions!
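A sketch of this tidying-up using git (the branch name is hypothetical and assumes the fork-based workflow from earlier):

```shell
# sync your local copy with the upstream repository
git checkout main
git pull upstream main

# delete the now-merged development branch locally and on your fork
git branch -d fix-csv-export-bug
git push origin --delete fix-csv-export-bug
```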

## Concluding Thoughts

Hopefully the above helps you understand the open-source contribution process better.
As stated earlier, every little part helps!
Best wishes on your open-source journey and happy Codesgiving!

## References

- Top Image: Französischer Obstgarten zur Erntezeit (Le verger) by Charles-François Daubigny (cropped). (Source: Wikimedia Commons)
# Tip of the Week: Data Quality Validation through Software Testing Techniques

_2023-10-04_

## TLDR (too long, didn’t read);

Implement data quality validation through software testing approaches which leverage ideas surrounding Hoare triples and Design by Contract (DbC). Balance reusability through component-based data testing with Great Expectations or Assertr. For greater specificity in your data testing, use database schema-like verification through Pandera or a JSON Schema validator. When possible, practice shift-left testing on data sources through the concept of “database(s) as code” via tools like Data Version Control (DVC) and Flyway.

## Introduction

_Diagram showing input, in-process data, and output data as a workflow._
{:.center}

Data-oriented software development can benefit from a specialized focus on varying aspects of data quality validation.
We can use software testing techniques to validate certain qualities of the data in order to meet a declarative standard (where one doesn’t need to guess or rediscover known issues).
These come in a number of forms and generally follow existing software testing concepts which we’ll expand upon below.
This article will cover a few tools which leverage these techniques for addressing data quality validation testing.

## Data Quality Testing Concepts

### Hoare Triple

One concept we’ll use to present these ideas is Hoare logic, which is a system for reasoning on software correctness.
Hoare logic includes the idea of a Hoare triple ($\{P\}\,C\,\{Q\}$) where $P$ is an assertion of precondition, $C$ is a command, and $Q$ is a postcondition assertion.
Software development using data often entails (sometimes assumed) assertions of precondition from data sources, a transformation or command which changes the data, and a (sometimes assumed) assertion of postcondition in a data output or result.

### Design by Contract

_Data testing through design by contract over a Hoare triple._
{:.center}

Hoare logic and software correctness help describe design by contract (DbC), a software approach involving the formal specification of “contracts” which help ensure we meet our intended goals.
DbC helps describe how to create assertions when proceeding through Hoare triple states for data.
These concepts provide a framework for thinking about the tools mentioned below.
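To make the idea concrete, below is a minimal sketch of a Hoare-triple-style contract around a data transformation using plain Python assertions (the data and checks are illustrative, not from the tools covered below):

```python
import pandas as pd


def double_numbers(data: pd.DataFrame) -> pd.DataFrame:
    # precondition assertion (P): input includes a non-null "numbers" column
    assert "numbers" in data.columns and data["numbers"].notna().all()

    # command (C): the data transformation itself
    result = data.assign(numbers=data["numbers"] * 2)

    # postcondition assertion (Q): the output preserves the row count
    assert len(result) == len(data)

    return result


print(double_numbers(pd.DataFrame({"numbers": [1, 2, 3]})))
```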

## Data Component Testing

_Diagram showing data contracts as generalized and reusable “component” tests which raise an error if the contract isn’t met, or continue operations if it is met._
{:.center}

We often need to verify certain components surrounding data in order to ensure the data meets minimum standards.
The word “component” is used here in the sense of component-based software design: grouping together reusable, modular qualities of the data where sometimes we don’t know (or want) to specify granular aspects (such as schema, type, or column name).
These components are often implied by the software which will eventually use the data, and that software can emit warnings or errors when it finds the data does not meet these standards.
Oftentimes these components are contracts checking postconditions of earlier commands or procedures, ensuring the data we receive is accurate to our intention.
We can avoid downstream surprises by creating contracts for our data which verify these components of a result before it reaches later stages.

Examples of these data components might include:

- The dataset has no null values.
- The dataset has no more than 3 columns.
- The dataset has a column called `numbers` which includes numbers in the range of 0-10.

### Data Component Testing - Great Expectations
"""
+Example of using Great Expectations
+Referenced with modifications from: 
+https://docs.greatexpectations.io/docs/tutorials/quickstart/
+"""
+import great_expectations as gx
+
+# get gx DataContext
+# see: https://docs.greatexpectations.io/docs/terms/data_context
+context = gx.get_context()
+
+# set a context data source 
+# see: https://docs.greatexpectations.io/docs/terms/datasource
+validator = context.sources.pandas_default.read_csv(
+    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
+)
+
+# add and save expectations 
+# see: https://docs.greatexpectations.io/docs/terms/expectation
+validator.expect_column_values_to_not_be_null("pickup_datetime")
+validator.expect_column_values_to_be_between("passenger_count", auto=True)
+validator.save_expectation_suite()
+
+# checkpoint the context with the validator
+# see: https://docs.greatexpectations.io/docs/terms/checkpoint
+checkpoint = context.add_or_update_checkpoint(
+    name="my_quickstart_checkpoint",
+    validator=validator,
+)
+
+# gather checkpoint expectation results
+checkpoint_result = checkpoint.run()
+
+# show the checkpoint expectation results
+context.view_validation_result(checkpoint_result)
+
+ +

_Example code leveraging the Python package Great Expectations to perform various data component contract validations._
{:.center}

Great Expectations is a Python project which provides data contract testing features through the use of components called “expectations” about the data involved.
These expectations act as a standardized way to define and validate the components of the data in the same way across different datasets or projects.
In addition to providing a mechanism for validating data contracts, Great Expectations also provides a way to view validation results, share expectations, and build data documentation.
See the above example for a quick code reference of how these work.

### Data Component Testing - Assertr
```r
# Example using the Assertr package
# referenced with modifications from:
# https://docs.ropensci.org/assertr/articles/assertr.html
library(dplyr)
library(assertr)

# set our.data to reference the mtcars dataset
our.data <- mtcars

# simulate an issue in the data for contract specification
our.data$mpg[5] <- our.data$mpg[5] * -1

# use verify to validate that column mpg >= 0
our.data %>%
  verify(mpg >= 0)

# use assert to validate that column mpg is within the bounds of 0 to infinity
our.data %>%
  assert(within_bounds(0,Inf), mpg)
```

_Example code leveraging the R package Assertr to perform various data component contract validations._
{:.center}

Assertr is an R project which provides similar data component assertions in the form of `verify`, `assert`, and `insist` methods (see here for more documentation).
Using Assertr enables similar but more lightweight functionality compared to Great Expectations.
See the above for an example of how to use it in your projects.

## Data Schema Testing

_Diagram showing data contracts as more granular specifications via “schema” tests which raise an error if the contract isn’t met, or continue operations if it is met._
{:.center}

Sometimes we need greater specificity than what a data component can offer.
We can use data schema testing contracts in these cases.
The word “schema” here is used in the sense of a database schema, but oftentimes these specifications are suitable well beyond databases alone (including database-like formats such as dataframes).
While reuse and modularity are more limited in these cases, schema contracts can be helpful for efforts where precision is valued or necessary to accomplish your goals.
It’s worth mentioning that data schema and component testing tools often overlap (meaning you can interchangeably use them to accomplish both tasks).

### Data Schema Testing - Pandera
"""
+Example of using the Pandera package
+referenced with modifications from:
+https://pandera.readthedocs.io/en/stable/try_pandera.html
+"""
+import pandas as pd
+import pandera as pa
+from pandera.typing import DataFrame, Series
+
+
+# define a schema
+class Schema(pa.DataFrameModel):
+    item: Series[str] = pa.Field(isin=["apple", "orange"], coerce=True)
+    price: Series[float] = pa.Field(gt=0, coerce=True)
+
+
+# simulate invalid dataframe
+invalid_data = pd.DataFrame.from_records(
+    [{"item": "applee", "price": 0.5}, 
+     {"item": "orange", "price": -1000}]
+)
+
+
+# set a decorator on a function which will
+# check the schema as a precondition
+@pa.check_types(lazy=True)
+def precondition_transform_data(data: DataFrame[Schema]):
+    print("here")
+    return data
+
+
+# precondition schema testing
+try:
+    precondition_transform_data(invalid_data)
+except pa.errors.SchemaErrors as schema_excs:
+    print(schema_excs)
+
+# inline or implied postcondition schema testing
+try:
+    Schema.validate(invalid_data)
+except pa.errors.SchemaError as schema_exc:
+    print(schema_exc)
+
+ +

_Example code leveraging the Python package Pandera to perform various data schema contract validations._
{:.center}

DataFrame-like libraries like Pandas can be verified using schema specification contracts through Pandera (see here for full DataFrame library support).
Pandera helps define specific columns and column types, and also has some component-like features.
It leverages a Pythonic class specification, similar to data classes and pydantic models, making it potentially easier to use if you already understand Python and DataFrame-like libraries.
See the above example for a look into how Pandera may be used.

### Data Schema Testing - JSON Schema
```r
# Example of using the jsonvalidate R package.
# Referenced with modifications from:
# https://docs.ropensci.org/jsonvalidate/articles/jsonvalidate.html

schema <- '{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Hello World JSON Schema",
  "description": "An example",
  "type": "object",
  "properties": {
    "hello": {
      "description": "Provide a description of the property here",
      "type": "string"
    }
  },
  "required": [
    "hello"
  ]
}'

# create a schema contract for data
validate <- jsonvalidate::json_validator(schema, engine = "ajv")

# validate JSON using schema specification contract and invalid data
validate("{}")

# validate JSON using schema specification contract and valid data
# (note: JSON requires double-quoted strings)
validate('{"hello": "world"}')
```

JSON Schema provides a vocabulary for validating schema contracts over JSON documents.
There are several implementations of the vocabulary, including the Python package jsonschema and the R package jsonvalidate.
Using these libraries allows you to define pre- or postcondition data schema contracts for your software work.
See above for an R-based example of using this vocabulary to perform data schema testing.
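For comparison, a minimal sketch of the same concept through the Python jsonschema package (the schema and data are illustrative):

```python
"""
Minimal sketch using the Python jsonschema package.
"""
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {"hello": {"type": "string"}},
    "required": ["hello"],
}

# invalid data raises a ValidationError (a failed data contract)
try:
    validate(instance={}, schema=schema)
except ValidationError as exc:
    print(exc.message)

# valid data passes silently (the data contract is met)
validate(instance={"hello": "world"}, schema=schema)
```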

## Shift-left Data Testing

Earlier portions of this article have covered primarily the data validation of command side-effects and postconditions.
This is commonplace in development, where data sources are usually provided without the ability to validate their precondition or definition.
Shift-left testing is a movement which focuses on validating earlier in the lifecycle when possible in order to avoid downstream issues.

### Shift-left Data Testing - Data Version Control (DVC)

Data sources undergoing frequent changes become difficult to use because we oftentimes don’t know when the data is from or what version it might be.
This information is sometimes added in the form of filename additions or an update datetime column in a table.
Data Version Control (DVC) is one tool which is specially purposed to address this challenge through source control techniques.
Data managed by DVC allows software to be built in such a way that version preconditions are validated before reaching data transformations (commands) or postconditions.
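A minimal sketch of how DVC tracks a data file alongside git (the filename is hypothetical):

```shell
# initialize DVC within an existing git repository
dvc init

# track a data file with DVC; this writes a small data.csv.dvc pointer file
dvc add data.csv

# version the pointer file with git
# (the data itself lives in DVC-managed storage)
git add data.csv.dvc .gitignore
git commit -m "Track data.csv with DVC"
```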

### Shift-left Data Testing - Flyway

Database sources can leverage an idea nicknamed “database as code” (which builds on a similar idea about infrastructure as code) to help declare the schema and other elements of a database in the same way one would write code.
These ideas apply to databases directly and also more broadly (through DVC mentioned above, among other tools) via the concept of “data as code”.
Implementing this idea has several advantages for source versioning, visibility, and replicability.
One tool which implements these ideas is Flyway, which can manage and implement SQL-based files as part of software data precondition validation.
A lightweight alternative to using Flyway is sometimes to include a SQL file which creates related database objects and doubles as data documentation.
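As a small sketch of the idea, Flyway applies plain SQL migration files whose names carry a version, for example a file named `V1__create_measurements_table.sql` (the table below is hypothetical):

```sql
-- V1__create_measurements_table.sql
-- a versioned Flyway migration which declares a database schema as code
CREATE TABLE measurements (
    id INTEGER PRIMARY KEY,
    numbers REAL NOT NULL
);
```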

# Tip of the Week: Python Packaging as Publishing

_2023-09-05_


Python packaging is the craft of preparing for and reaching distribution of your Python work to wider audiences. Following packaging conventions helps your software work become more understandable, trustworthy, and connected (to others and their work). Taking advantage of common packaging practices also strengthens our collective superpower: collaboration. This post will cover preparation aspects of packaging, readying software work for wider distribution.

## TLDR (too long, didn’t read);

Use Pythonic packaging tools and techniques to help avoid code decay and unwanted code smells and to increase your development velocity. Increase understanding with unsurprising directory structures like those exhibited in pypa/sampleproject or scientific-python/cookie. Enhance trust by being authentic on source control systems like GitHub (by customizing your profile), staying up to date with the latest supported versions of Python, and using security linting tools like PyCQA/bandit through visible + automated GitHub Actions ✅ checks. Connect your projects to others using CITATION.cff files, CONTRIBUTING.md files, and environment + packaging tools like poetry to help others reproduce the same results from your code.

## Why practice packaging?

_How are a page with some text and a book different?_
{:.center}

The practice of Python packaging is similar to that of publishing a book. Consider how a page with some text is different from a book. How and why are these things different?

- A book has commonly understood sequencing of content (i.e. copyright page, then title page, then body content pages…).
- A book often cites references and acknowledges other work explicitly.
- A book undergoes a manufacturing process which allows the text to be received in many places the same way.
_Code undergoing packaging to achieve understanding, trust, and connection for an audience._
{:.center}

These can be thought of as metaphors for packaging in Python. Books have a smell which sometimes comes from how they were stored, treated, or maintained. While there are pleasant book smells, books might also smell soggy from being left in the rain or stored without maintenance for too long. Just like books, software can sometimes have negative code smells indicating a lack of care or a less sustainable condition. Following good packaging practices helps to avoid unwanted code smells while increasing development velocity, maintainability of software through understandability, trustworthiness of the content, and connection to other projects.

Note: these techniques can also work just as well for inner source collaboration (private or proprietary development within organizations)! Don’t hesitate to use these on projects which may not be public facing in order to make development and maintenance easier (if only for you).

## “Wait, what are Python packages?”

```
my_package/
│   __init__.py
│   module_a.py
│   module_b.py
```

A Python package is a collection of modules (.py files) that usually includes an “initialization file” `__init__.py`. This post will cover the craft of packaging, which can include one or many packages.

## Understanding: common directory structures
```
project_directory
├── README.md
├── LICENSE.txt
├── pyproject.toml
├── docs
│   └── source
│       └── index.md
├── src
│   └── package_name
│       └── __init__.py
│       └── module_a.py
└── tests
    └── __init__.py
    └── test_module_a.py
```

Python packaging today generally assumes a specific directory design.
Following this convention generally improves the understanding of your code. We’ll cover each of these below.

### Project root files

```
project_directory
├── README.md
├── LICENSE.txt
├── pyproject.toml
│ ...
```
- The README.md file is a markdown file with documentation including project goals and other short notes about installation, development, or usage. The README.md file is akin to a book jacket blurb which quickly tells the audience what the book will be about.
- The LICENSE.txt file is a text file which indicates licensing details for the project. It often includes information about how the project may be used and protects the authors in disputes. The LICENSE.txt file can be thought of like a book’s copyright page. See https://choosealicense.com/ for more details on selecting an open source license.
- The pyproject.toml file is a Python-specific TOML file which helps organize how the project is used and built for wider distribution. The pyproject.toml file is similar to a book’s table of contents, index, and printing or production specification.

### Project sub-directories

```
project_directory
│ ...
├── docs
│   └── source
│       └── index.md
├── src
│   └── package_name
│       └── __init__.py
│       └── module_a.py
└── tests
    └── __init__.py
    └── test_module_a.py
```
- The docs directory is used for in-depth documentation and related documentation build code (for example, when building documentation websites, aka “docsites”). The docs directory includes information similar to a book’s “study guide”, providing content surrounding how to best make use of and understand the content found within.
- The src directory includes primary source code for use in the project. Python projects generally use a nested package directory with modules and sub-packages. The src directory is like a book’s body or general content (perhaps thinking of modules as chapters or sections of related ideas).
- The tests directory includes testing code for validating functionality of code found in the src directory. The above follows pytest conventions. The tests directory is for code which acts like a book’s early reviewers or editors, making sure that if you change things in src the impacts remain as expected.

### Common directory structure examples

The Python directory structure described above can be witnessed in the wild in resources such as pypa/sampleproject and scientific-python/cookie. These can serve as a great reference for starting or adjusting your own work.

## Trust: building audience confidence

_How much does your audience trust your work?_
{:.center}

Building an understandable body of content helps tremendously with audience trust. What else can we do to enhance project trust? The following elements can help improve an audience’s trust in packaged Python work.

### Source control authenticity

_Comparing the difference between a generic or anonymous user and one with greater authenticity._
{:.center}

Be authentic! Fill out your profile to help your audience know the author and why you do what you do. See here for GitHub’s documentation on filling out your profile. Doing this may seem irrelevant but can go a long way to making technical work more relatable.

- Add a profile picture of yourself or something fun.
- Set your profile description to information which is both professionally accurate and unique to you.
- Show or link to work which you feel may be relevant or exciting to those in your audience.

### Staying up to date with supported Python releases

_Major Python releases and their support status._
{:.center}

Use Python versions which are supported (this changes over time).
Python versions which are end-of-life may be difficult to support and are a sign of code decay for projects. Specify the version of Python which is compatible with your project by using environment specifications such as pyproject.toml files and related packaging tools (more on this below).

### Security linting and visible checks with GitHub Actions

_Make an effort to inspect your package for known security issues._
{:.center}

Use security vulnerability linters to help prevent undesirable or risky processing for your audience. Doing this is both practical for avoiding issues and conveys that you care about those using your package!

_The green checkmark from successful GitHub Actions runs can offer a sense of reassurance to your audience._
{:.center}

Combining GitHub Actions with security linters and tests from your software validation suite can add an observable ✅ for your project.
This provides the audience with a sense that you’re transparently testing and sharing the results of those tests.
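As a minimal sketch, a workflow file (for example at `.github/workflows/checks.yml`; the Python version and source path below are assumptions) combining security linting and tests might look like:

```yaml
# .github/workflows/checks.yml
# run security linting and tests on each push or pull request,
# surfacing a ✅ on the repository when successful
name: checks
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - run: pip install bandit pytest
      - run: bandit -r src
      - run: pytest
```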

## Connection: personal and inter-package relationships

_How does your package connect with other work and people?_
{:.center}

Understandability and trust set the stage for your project’s connection to other people and projects. What can we do to facilitate connection with our project? Use the following techniques to help enhance your project’s connection to others and their work.

### Acknowledging authors and referenced work with CITATION.cff

Add a CITATION.cff file to your project root in order to describe project relationships and acknowledgements in a standardized way. The CFF format is also GitHub compatible, making it easier to cite your project.
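A minimal sketch of what a CITATION.cff file can look like (the names, version, and date below are placeholders):

```yaml
# CITATION.cff
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "package-name"
version: 0.1.0
date-released: 2023-09-05
authors:
  - family-names: "Lastname"
    given-names: "Firstname"
```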

### Reaching collaborators using CONTRIBUTING.md

_CONTRIBUTING.md documents can help you collaborate with others._
{:.center}

Provide a CONTRIBUTING.md file in your project root to make clear the support details, development guidance, code of conduct, and overall documentation surrounding how the project is governed.

### Environment management reproducibility as connected project reality

_Environment and packaging managers can help you connect with your audience._
{:.center}

Code without an environment specification is difficult to run in a consistent way. This can lead to “works on my machine” scenarios where different things happen for different people, reducing the chance that people can connect with a shared reality for how your code should be used.

> “But why do we have to switch the way we do things?”

We’ve always been switching approaches (software approaches evolve over time)! A brief history of Python environment and packaging tooling:

1. distutils, easy_install + setup.py (primarily used during the 1990’s - early 2000’s)
2. pip, setup.py + requirements.txt (primarily used during the late 2000’s - early 2010’s)
3. poetry + pyproject.toml (began use around the late 2010’s - ongoing)

### Using Python poetry for environment and packaging management

Poetry is one Pythonic environment and packaging manager which can help increase reproducibility using pyproject.toml files. It’s one of many alternatives, such as hatch and pipenv.
#### poetry directory structure template use

```
user@machine % poetry new --name=package_name --src .
Created package package_name in .

user@machine % tree .
.
├── README.md
├── pyproject.toml
├── src
│   └── package_name
│       └── __init__.py
└── tests
    └── __init__.py
```

After installation, Poetry gives us the ability to initialize a directory structure similar to what we presented earlier by using the `poetry new ...` command. If you’d like a more interactive version of the same, use the `poetry init` command to fill out various sections of your project with detailed information.
#### poetry format for project pyproject.toml

```toml
# pyproject.toml
[tool.poetry]
name = "package-name"
version = "0.1.0"
description = ""
authors = ["username <email@address>"]
readme = "README.md"
packages = [{include = "package_name", from = "src"}]

[tool.poetry.dependencies]
python = "^3.9"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Using the `poetry new ...` command also initializes the content of our pyproject.toml file with opinionated details (following the recommendation from earlier in the article regarding declared Python version specification).
#### poetry dependency management

```
user@machine % poetry add pandas

Creating virtualenv package-name-1STl06GY-py3.9 in /pypoetry/virtualenvs
Using version ^2.1.0 for pandas

...

Writing lock file
```

We can add dependencies directly using the `poetry add ...` command. This command also provides the possibility of using a group flag (for example `poetry add pytest --group testing`) to help organize and distinguish multiple sets of dependencies.

- A local virtual environment is managed for us automatically.
- A poetry.lock file is written when the dependencies are installed to help ensure the version you installed today will be what’s used on other machines.
- The poetry.lock file helps ensure reproducibility when dealing with dependency version ranges (where otherwise we may end up using different versions which match the dependency ranges but observe different results).
#### Running Python from the context of poetry environments

```
% poetry run python -c "import pandas; print(pandas.__version__)"

2.1.0
```

We can invoke the virtual environment directly using the `poetry run ...` command.

- This allows us to quickly run code through the context of the project’s environment.
- Poetry can automatically switch between multiple environments based on the local directory structure.
- We can also use the environment as a “shell” (similar to virtualenv’s activate) with the `poetry shell` command, which enables us to leverage a dynamic session in the context of the poetry environment.
#### Building source code with poetry

```
% pip install git+https://github.com/project/package_name
```

Even if we don’t reach wider distribution on PyPI or elsewhere, source code managed by pyproject.toml and poetry can be used for “manual” distribution (with reproducible results) from GitHub repositories. When we’re ready to distribute pre-built packages on other networks we can also use the following:

```
% poetry build

Building package-name (0.1.0)
  - Building sdist
  - Built package_name-0.1.0.tar.gz
  - Building wheel
  - Built package_name-0.1.0-py3-none-any.whl
```

Poetry readies source-code and pre-compiled versions of our code for distribution platforms like PyPI by using the `poetry build` command. We’ll cover more on these files and distribution steps in a later post!

# Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster

_2023-07-07_


This post is intended to help demonstrate the use of Python on Alpine, a High Performance Compute (HPC) cluster hosted by the University of Colorado Boulder’s Research Computing.
We use Python here by way of Anaconda environment management to run code on Alpine.
This post will cover a background on the technologies and how to use the contents of an example project repository as though it were a project you were working on and wanting to run on Alpine.

_Diagram showing a repository’s work as being processed on Alpine._
{:.center}

## Table of Contents

1. Background: here we cover the background of Alpine and related technologies.
2. Implementation: in this section we use the contents of an example project repository on Alpine.

## Background

### Why would I use Alpine?

_Diagram showing common benefits of Alpine and HPC clusters._
{:.center}

Alpine is a High Performance Compute (HPC) cluster.
HPC environments provide shared computer hardware resources like memory, CPU, GPU or others to run performance-intensive work.
Reasons for using Alpine might include:

- Compute resources: Leveraging otherwise cost-prohibitive amounts of memory, CPU, GPU, etc. for processing data.
- Long-running jobs: Completing long-running processes which may take hours or days to complete.
- Collaborations: Sharing a single implementation environment for reproducibility within a group (avoiding “works on my machine” inconsistency issues).

### How does Alpine work?

_Diagram showing high-level user workflow and Alpine components._
{:.center}

Alpine’s compute resources are used through compute nodes in a system called Slurm.
Slurm is a system that allows a large number of users to run jobs on a cluster of computers; the system figures out how to use all the computers in the cluster to execute all the users’ jobs fairly (i.e., giving each user approximately equal time and resources on the cluster). A job is a request to run something, e.g. a bash script or a program, along with specifications about how much RAM and CPU it needs, how long it can run, and how it should be executed.

Slurm’s role in general is to take in a job (submitted via the sbatch command) and put it into a queue (also called a “partition” in Slurm). For each job in the queue, Slurm constantly tries to find a computer in the cluster with enough resources to run that job, then, when an available computer is found, runs the program the job specifies on that computer. As the program runs, Slurm records its output to files and finally reports the program’s exit status (either completed or failed) back to the job manager.

Importantly, jobs can either be marked as interactive or batch. When you submit an interactive job, sbatch will pause while waiting for the job to start and then connect you to the program, so you can see its output and enter commands in real time. On the other hand, a batch job will return immediately; you can see the progress of your job using squeue, and you can typically see the output of the job in the folder from which you ran sbatch unless you specify otherwise.
Data for or from Slurm work may be stored temporarily on local storage or on user-specific external (remote) storage.

#### Wait, what are “nodes”?

A simplified way to understand the architecture of Slurm on Alpine is through login and compute “nodes” (computers).
Login nodes act as a place to prepare and submit jobs which will be completed on compute nodes. Login nodes are never used to execute Slurm jobs, whereas compute nodes are exclusively accessed via a job.
Login nodes have limited resource access and are not recommended for running procedures.

One can interact with Slurm on Alpine by use of Slurm interfaces and directives.
A quick way of accessing Alpine resources is through the use of the acompile command, which starts an interactive job on a compute node with some typical default parameters for the job. Since acompile requests very modest resources (1 hour and 1 CPU core at the time of writing), you’ll typically quickly be connected to a compute node. For more intensive or long-lived interactive jobs, consider using sinteractive, which allows for more customization: Interactive Jobs.
One can also access Slurm directly through various commands on Alpine.

Many common software packages are available through the Modules package on Alpine (UCB RC documentation: The Modules System).

### How does Slurm work?

_Diagram showing how Slurm generally works._
{:.center}

Using Alpine effectively involves knowing how to leverage Slurm.
A simplified way to understand how Slurm works is through the following sequence.
Please note that some steps and additional complexity are omitted for the purposes of providing a basis of understanding.

1. Create a job script: build a script which will configure and run procedures related to the work you seek to accomplish on the HPC cluster.
2. Submit job to Slurm: ask Slurm to run a set of commands or procedures.
3. Job queue: Slurm will queue the submitted job alongside others (recall that the HPC cluster is a shared resource), providing information about progress as time goes on.
4. Job processing: Slurm will run the procedures in the job script as scheduled.
5. Job completion or cancellation: submitted jobs eventually may reach completion or cancellation states, with information about what happened saved inside Slurm.

### How do I store data on Alpine?

Data used or produced by your processed jobs on Alpine may use a number of different data storage locations.
Be sure to follow the Acceptable data storage and use policies of Alpine, avoiding the use of certain sensitive information and other items.
These may be distinguished in two ways:

1. Alpine local storage (sometimes temporary): Alpine provides a number of temporary data storage locations for accomplishing your work.
   ⚠️ Note: some of these locations may be periodically purged and are not a suitable location for long-term data hosting (see here for more information)!
   Storage locations available (see this link for full descriptions):
   - Home filesystem: 2 GB of backed up space under /home/$USER (where $USER is your RMACC or Alpine username).
   - Projects filesystem: 250 GB of backed up space under /projects/$USER (where $USER is your RMACC or Alpine username).
   - Scratch filesystem: 10 TB (10,240 GB) of space which is not backed up under /scratch/alpine/$USER (where $USER is your RMACC or Alpine username).
2. External / remote storage: Users are encouraged to explore external data storage options for long-term hosting.

### How do I send or receive data on Alpine?

_Diagram showing external data storage being used to send or receive data on Alpine local storage._
{:.center}

Data may be sent to or gathered from Alpine using a number of different methods.
These may vary contingent on the external data storage being referenced, the code involved, or your group’s available resources.
Please reference the following documentation from the University of Colorado Boulder’s Research Computing regarding data transfers: The Compute Environment - Data Transfer.
Please note: due to the authentication configuration of Alpine, many local or SSH-key based methods are not available for CU Anschutz users.
As a result, Globus represents one of the best options available (see 3. 📂 Transfer data results below). While the Globus tutorial in this document describes how you can download data from Alpine to your computer, note that you can also use Globus to transfer data to Alpine from your computer.

## Implementation

_Diagram showing how an example project repository may be used within Alpine through primary steps and processing workflow._
{:.center}

Use the following steps to understand how Alpine may be used with an example project repository to run example Python code.

### 0. 🔑 Gain Alpine access

First you will need to gain access to Alpine.
This access is provided to members of the University of Colorado Anschutz through RMACC and is separate from other credentials which may be provided by default in your role.
Please see the following guide from the University of Colorado Boulder’s Research Computing covering requesting access and generally how this works for members of the University of Colorado Anschutz.

+ + + +

1. 🛠️ Prepare code on Alpine

+ +
[username@xsede.org@login-ciX ~]$ cd /projects/$USER
+[username@xsede.org@login-ciX username@xsede.org]$ git clone https://github.com/CU-DBMI/example-hpc-alpine-python
+Cloning into 'example-hpc-alpine-python'...
+... git output ...
+[username@xsede.org@login-ciX username@xsede.org]$ ls -l example-hpc-alpine-python
+... ls output ...
+
+ +

An example of what this preparation section might look like in your Alpine terminal session.

+ +

Next we will prepare our code within Alpine. We do this because we often develop and version control code outside of Alpine. In the case of this example work, we assume git as an interface for GitHub as the source control host.

+ +

Below you’ll find the general steps associated with this process.

+ +
  1. Log in to the Alpine command line (reference this guide).
  2. Change directory into the Projects filesystem (generally we’ll assume processed data produced by this code are large enough to warrant the need for additional space):
    cd /projects/$USER
  3. Use git (built into Alpine by default) commands to clone this repo:
    git clone https://github.com/CU-DBMI/example-hpc-alpine-python
  4. Verify the contents were received as desired (this should show the contents of an example project repository):
    ls -l example-hpc-alpine-python
+ + + +

+ +
+ + +
+ +

What if I need to authenticate with GitHub?

+ +

There are times when you may need to authenticate with GitHub in order to accomplish your work. From a GitHub perspective, you will want to use either GitHub Personal Access Tokens (PATs) (recommended by GitHub) or SSH keys associated with the git client on Alpine. Note: if you are prompted for a username and password from git when accessing a GitHub resource, the password is now associated with tokens like PATs instead of your user’s password (reference). See the following guide from GitHub for more information on how authentication through git to GitHub works:

+ + + +
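As a rough sketch of what this might look like on Alpine (the email address and caching choice below are illustrative assumptions; follow the guides above for your situation):

# option 1: cache an HTTPS Personal Access Token after the first git prompt
# (note: "store" keeps the token unencrypted on disk)
git config --global credential.helper store
# option 2: generate an SSH key, then add the printed public key to GitHub
ssh-keygen -t ed25519 -C "you@example.com"
cat ~/.ssh/id_ed25519.pub
# point an existing clone at the SSH remote instead of HTTPS
git remote set-url origin git@github.com:CU-DBMI/example-hpc-alpine-python.git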
+
+ +

2. ⚙️ Implement code on Alpine

+ +
[username@xsede.org@login-ciX ~]$ sbatch --export=CSV_FILEPATH="/projects/$USER/example_data.csv" example-hpc-alpine-python/run_script.sh
+[username@xsede.org@login-ciX username@xsede.org]$ tail -f example-hpc-alpine-python.out
+... tail output (ctrl/cmd + c to cancel) ...
+[username@xsede.org@login-ciX username@xsede.org]$ head -n 2 example_data.csvexample-hpc-alpine-python
+... data output ...
+
+ +

An example of what this implementation section might look like in your Alpine terminal session.

+ +

After our code is available on Alpine, we’re ready to run it using Slurm and related resources. We use Anaconda to build a Python environment with specified packages for reproducibility. The main goal of the Python code related to this work is to create a CSV file with random data at a specified location. We’ll use Slurm’s sbatch command, which submits batch scripts to Slurm using various options.

+ +
  1. Use the sbatch command with the exported variable CSV_FILEPATH:
    sbatch --export=CSV_FILEPATH="/projects/$USER/example_data.csv" example-hpc-alpine-python/run_script.sh
  2. After a short moment, use the tail command to observe the log file created by Slurm for this sbatch submission. This file can help you understand job progress and whether anything went wrong:
    tail -f example-hpc-alpine-python.out
  3. Once you see that the work has completed from the log file, take a look at the top 2 lines of the data file using the head command to verify the data arrived as expected (column names with random values):
    head -n 2 example_data.csv
+ +
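For orientation, a Slurm batch script like run_script.sh generally pairs #SBATCH directives with the commands to run. The following is only a rough sketch under assumptions (the partition, time limit, environment, and script names are placeholders rather than the exact contents of the example repository):

#!/bin/bash
#SBATCH --partition=amilan              # Alpine compute partition (assumed)
#SBATCH --time=00:05:00                 # wall-clock time limit for the job
#SBATCH --output=example-hpc-alpine-python.out
# load Anaconda and activate a previously built environment (name assumed)
module load anaconda
conda activate example_env
# run the Python work, reading the exported CSV_FILEPATH variable
python example_script.py --csv_filepath "$CSV_FILEPATH"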

3. 📂 Transfer data results

+ +

+ +

Diagram showing how example_data.csv may be transferred from Alpine to a local machine using Globus solutions.

+ +

Now that the example data output from the Slurm work is available, we need to transfer that data to a local system for further use. In this example we’ll use Globus as a data transfer method from Alpine to our local machine. Please note: always be sure to check data privacy and policies, which may change the methods or storage locations you may use for your data!

+ +
  1. Globus local machine configuration
    1. Install Globus Connect Personal on your local machine.
    2. During installation, you will be prompted to log in to Globus. Use your ACCESS credentials to log in.
    3. During installation login, note the label you provide to Globus. This will be used later, referenced as “Globus Connect Personal label”.
    4. Ensure you add and (importantly:) provide write access to a local directory via Globus Connect Personal - Preferences - Access where you’d like the data to be received from Alpine to your local machine.
  2. Globus web interface
    1. Use your ACCESS credentials to log in to the Globus web interface.
    2. Configure File Manager left side (source selection)
      1. Within the Globus web interface on the File Manager tab, use the Collection input box to search or select “CU Boulder Research Computing ACCESS”.
      2. Within the Globus web interface on the File Manager tab, use the Path input box to enter: /projects/your_username_here/ (replacing “your_username_here” with your username from Alpine, including the “@” symbol if it applies).
    3. Configure File Manager right side (destination selection)
      1. Within the Globus web interface on the File Manager tab, use the Collection input box to search or select the Globus Connect Personal label you provided in earlier steps.
      2. Within the Globus web interface on the File Manager tab, use the Path input box to enter the local path which you made accessible in earlier steps.
    4. Begin Globus transfer
      1. Within the Globus web interface on the File Manager tab on the left side (source selection), check the box next to the file example_data.csv.
      2. Within the Globus web interface on the File Manager tab on the left side (source selection), click the “Start ▶️” button to begin the transfer from Alpine to your local directory.
      3. After clicking the “Start ▶️” button, you may see a message in the top right: “Transfer request submitted successfully”. You can click the link to view the details associated with the transfer.
      4. After a short period, the file will be transferred and you should be able to verify the contents on your local machine.
+ +
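For those who prefer scripting, Globus also offers a command-line interface that can perform an equivalent transfer. A rough sketch with the globus-cli package follows (the endpoint UUIDs and local path are placeholders you would look up for your own collections):

# install and authenticate the Globus CLI (one-time setup)
pip install globus-cli
globus login
# locate the UUIDs of the source and destination collections
globus endpoint search "CU Boulder Research Computing ACCESS"
# request the transfer from Alpine to the local Globus Connect Personal collection
globus transfer "SOURCE_UUID:/projects/your_username_here/example_data.csv" \
  "DESTINATION_UUID:/local/path/example_data.csv" --label "alpine-example"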

Further References

+ +]]>
dave-bunten
Tip of the Week: Automate Software Workflows with GitHub Actions2023-03-15T00:00:00+00:002024-05-29T14:08:50+00:00/set-website/preview/pr-36/2023/03/15/Automate-Software-Workflows-with-Github-ActionsTip of the Week: Automate Software Workflows with GitHub Actions + +
+ + +
+ +

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ + + +

There are many routine tasks which can be automated to help save time and increase reproducibility in software development. GitHub Actions provides one way to accomplish these tasks using code-based workflows and related workflow implementations. This type of automation is commonly used to perform tests, builds (preparing for the delivery of the code), or delivery itself (sending the code or related artifacts where they will be used).

+ + + +

TLDR (too long, didn’t read); +Use GitHub Actions to perform continuous integration work automatically by leveraging Github’s workflow specification and the existing marketplace of already-created Actions. You can test these workflows locally with Act, which can speed up development with GitHub Actions. Consider making use of “write once, run anywhere” (WORA) and Dagger in conjunction with GitHub Actions to enable reproducible workflows for your software projects.

+ +

Workflows in Software

+ +
+flowchart LR
+  start((start)) --> action
+  action["action(s)"] --> en((end))
+  style start fill:#6EE7B7
+  style en fill:#FCA5A5
+
+ + +

An example workflow.

+ +

Workflows consist of sequenced activities used by various systems. Software development workflows help accomplish work the same way each time by using what are commonly called “workflow engines”. Generally, workflow engines are provided code which indicates a beginning (what triggers a workflow to begin), actions (work being performed in sequence), and an ending (where the workflow stops). There are many workflow engines, including some which help accomplish work alongside version control.

+ +

GitHub Actions

+ +
+flowchart LR
+  subgraph workflow [GitHub Actions Workflow Run]
+    direction LR
+    action["action(s)"] --> en((end))
+    start((event\ntrigger))
+  end
+  start --> action
+  style start fill:#6EE7B7
+  style en fill:#FCA5A5
+
+ +

A diagram showing GitHub Actions as a workflow.

+ +

GitHub Actions is a feature of GitHub which allows you to run workflows in relation to your code as a continuous integration (including automated testing, builds, and deployments) and general automation tool. For example, one can use GitHub Actions to make sure code related to a GitHub Pull Request passes certain tests before it is allowed to be merged. GitHub Actions may be specified using YAML files within your repository’s .github/workflows directory by using syntax specific to Github’s workflow specification. Each YAML file under the .github/workflows directory can specify workflows to accomplish tasks related to your software work. GitHub Actions workflows may be customized to your own needs, or use an existing marketplace of already-created Actions.

+ +
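As a minimal sketch, a workflow file such as .github/workflows/tests.yml might look like the following (the Python version and pytest usage are assumptions for a typical Python project, not a prescribed setup):

name: tests
# run on pull requests and on pushes to the main branch
on:
  pull_request:
  push:
    branches: [main]
jobs:
  run-tests:
    runs-on: ubuntu-latest
    steps:
      # check out the repository contents for this run
      - uses: actions/checkout@v4
      # install a specific Python version
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # install test dependencies and run the suite
      - run: pip install pytest
      - run: pytest

Each push or pull request will then trigger a run that appears under the repository’s Actions tab, described below.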
+ + Image showing GitHub Actions tab on GitHub website. + + +
+ Image showing GitHub Actions tab on GitHub website. + +
+ +
+ +

GitHub provides an “Actions” tab for each repository which helps visualize and control GitHub Actions workflow runs. This tab shows a history of all workflow runs in the repository. For each run, it shows whether it ran successfully or not, the associated logs, and controls to cancel or re-run it.

+ +
+

GitHub Actions Examples +GitHub Actions is sometimes better understood with examples. See the following references for a few basic examples of using GitHub Actions in a simulated project repository.

+ + +
+ +

Testing with Act

+ +
+flowchart LR
+  subgraph container ["local simulation container(s)"]
+    direction LR
+    subgraph workflow [GitHub Actions Workflow Run]
+      direction LR
+      start((event\ntrigger))
+      action --> en((end))
+    end
+  end
+  start --> action
+  act[Run Act] -.-> |Simulate\ntrigger| start
+  style start fill:#6EE7B7
+  style en fill:#FCA5A5
+
+ +

A diagram showing how GitHub Actions workflows may be triggered from Act

+ +

One challenge with GitHub Actions is a lack of standardized local testing tools. For example, how will you know that a new GitHub Actions workflow will function as expected (or at all) without pushing to the GitHub repository? One third-party tool which can help with this is Act. Act uses Docker images (which require Docker Desktop) to simulate running a GitHub Actions workflow within your local environment. Using Act can help avoid guessing what will occur when a GitHub Actions workflow is added to your repository. See Act’s installation documentation for more information on getting started with this tool.

+ +
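Once installed, basic Act usage from the root of a repository might look like the following sketch (the job name is a placeholder matching whatever your workflow files define):

# list the workflow jobs Act detects under .github/workflows
act -l
# simulate the default push event locally
act
# simulate a pull_request event for one specific job
act pull_request -j run-tests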

Nested Workflows with GitHub Actions

+ +
+flowchart LR
+
+  subgraph action ["Nested Workflow (Dagger, etc)"]
+    direction LR
+    actions
+    start2((start)) --> actions
+    actions --> en2((end))
+    en2((end))
+  end
+  subgraph workflow2 [Local Environment Run]
+    direction LR
+    run2[run workflow]
+    en3((end))
+    start3((event\ntrigger))
+  end
+  subgraph workflow [GitHub Actions Workflow Run]
+    direction LR
+    start((event\ntrigger))
+    run[run workflow]
+    en((end))
+  end
+  
+  start --> run
+  start3 --> run2
+  action -.-> run
+  run --> en
+  run2 --> en3
+  action -.-> run2
+  style start fill:#6EE7B7
+  style start2 fill:#D1FAE5
+  style start3 fill:#6EE7B7
+  style en fill:#FCA5A5
+  style en2 fill:#FFE4E6
+  style en3 fill:#FCA5A5
+
+ +

A diagram showing how GitHub Actions may leverage nested workflows with tools like Dagger.

+ +

There are times when GitHub Actions may be too constricting or Act may not accurately simulate workflows. We also might seek to “write once, run anywhere” (WORA) to enable flexible development on many environments. One workaround to this challenge is to use nested workflows which are compatible with local environments and GitHub Actions environments. Dagger is one tool which enables programmatically specifying and using workflows this way. Using Dagger allows you to trigger workflows on your local machine or GitHub Actions with the same underlying engine, meaning there are fewer inconsistencies or guesswork for developers (see here for an explanation of how Dagger works).

+ +

There are also other alternatives to Dagger you may want to consider based on your use case, preference, or interest. Earthly is similar to Dagger and uses “earthfiles” as a specification. Both Dagger and Earthly (in addition to GitHub Actions) use container-based approaches, which in and of themselves present additional alternatives outside the scope of this article.

+ +
+

GitHub Actions with Nested Workflow Example +Reference this example for a brief demonstration of how GitHub Actions and Dagger may be used together.

+ + +
+ +

Closing Remarks

+ +

Using GitHub Actions through the above methods can help automate your technical work and increase the quality of your code with sometimes very little additional effort. Saving time through this form of automation can provide additional flexibility to accomplish more complex work which requires your attention (perhaps using timeboxing techniques). Even small amounts of time saved can turn into large opportunities for other work. On this note, be sure to explore how GitHub Actions can improve things for your software endeavors.

]]>
dave-bunten
Tip of the Week: Branch, Review, and Learn2023-02-13T00:00:00+00:002024-05-29T14:08:50+00:00/set-website/preview/pr-36/2023/02/13/Branch-Review-and-LearnTip of the Week: Branch, Review, and Learn + +
+ + +
+ +

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views +expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you +have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to +#software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

+ +
+
+ + + +

Git provides a feature called branching which facilitates parallel and segmented programming work through commits with version control. Using branching enables both work concurrency (multiple people working on the same repository at the same time) as well as a chance to isolate and review specific programming tasks. This article covers some conceptual best practices with branching, reviewing, and merging code using Github.

+ + + +

Please note: the content below represents one opinion in a larger space of Git workflow concepts (it’s not perfect!). Developer cultures may vary on these topics; be sure to acknowledge people and culture over exclusive or absolute dedication to what is found below.

+ +

TLDR (too long, didn’t read); +Use git branching techniques to segment the completion of programming tasks, gradually and consistently committing small changes (practicing festina lente or “make haste, slowly”). When a group of small changes are ready from branches, request pull request reviews and take advantage of comments to continuously improve the work. Prepare for a branch merge after review by deciding which merge strategy is appropriate and automating merge requirements with branch protection rules.

+ +

Concept: Coursework Branching

+ +
+flowchart LR
+ subgraph Course
+    direction LR
+    open["open\nassignment"]
+    turn_in["review\nassignment"]
+  end
+  subgraph Student ["     Student"]
+    direction LR
+    work["completed\nassignment"]
+  end
+  open -.-> turn_in
+  open --> |works towards| work
+  work --> |seeks review| turn_in
+
+ + +

An example course and student assignment workflow.

+ +

Git branching practices may be understood in context with similar workflows from real life. Consider a student taking a course, where an assignment is given to them to complete. In addition to the steps shown in the diagram above, it’s important to think about why this pattern is beneficial:

+ +
  • Completing an assignment allows us as social, inter-dependent beings to present new findings which enable learning and amalgamation of additional ideas from others.
  • The timebound nature of assignments enables us to practice some form of timeboxing so as to minimize tasks which may take too much time.
  • Segmenting applied learning in distinct, goal-oriented chunks helps make larger topics easier to understand.
+ +

Branching to Complete an “Assignment”

+ +
+%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'themeVariables': {
+      'git0': '#4F46E5',
+      'git1': '#10B981',
+      'gitBranchLabel1': '#ffffff'
+} } }%%
+    gitGraph
+       commit id: "..."
+       commit id: "opened"
+       branch assignment
+       checkout assignment
+       commit id: "completed"
+       checkout main
+
+ +

An example git diagram showing assignment branch based off main.

+ +

Following the course assignment workflow, the diagram above shows an in-progress assignment branch based off of the main branch. When the assignment branch is created, we bring into it everything we know from main (the course) so far in the form of commits, or groups of changes to various files. Branching allows us to make consistent and well-described changes based on what’s already happened without impacting others’ work in the meantime.

+ +
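In practice, creating and working on such an assignment branch might look like the following git sketch (branch, file, and message names are illustrative):

# create and switch to a new branch based on the current main branch
git checkout -b assignment
# commit small, gradual chunks of work as they are completed
git add answers.txt
git commit -m "add answers for question 1"
# publish the branch so the work can be seen (and later reviewed)
git push -u origin assignment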
+

Branching best practices:

+ +
  • Keep a branch’s name and work dedicated to a specific and focused purpose. For example: a branch named fix-links-in-docs might entail work related to fixing HTTP links within documentation.
  • Consider the use of Github Forks (along with branches within the fork) to help further isolate and enrich work potential. Forks also allow remixing existing work into new possibilities.
  • festina lente or “make haste, slowly”: Commits on any branch represent small chunks of a cohesive idea which will eventually be brought to main. It is often beneficial to be consistent with small, gradual commits to avoid a rushed or incomplete submission. The same applies more generally for software; taking time upfront to do things well can mean time saved later.
+
+ +

Reviewing the Branched Work

+ +
+%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'themeVariables': {
+      'git0': '#6366F1',
+      'git1': '#10B981',
+      'gitBranchLabel1': '#ffffff'
+} } }%%
+    gitGraph
+       commit id: "..."
+       commit id: "opened"
+       branch assignment
+       checkout assignment
+       commit id: "completed"
+       checkout main
+       merge assignment id: "reviewed"
+
+ +

An example git diagram showing assignment branch being merged with main after a review.

+ +

The diagram above depicts a merge from the assignment branch to pull the changes into the main branch, simulating an assignment being returned for review within a course. While merges may be forced without review, it’s a best practice to create a Pull Request (PR) Review (also known as a Merge Request (MR) on some systems) and then ask other members of your team to review it. Doing this provides a chance to make revisions before code changes are “finalized” within the main branch.

+ +
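Pull requests may be opened and reviewed through the GitHub web interface or, as sketched below, with the GitHub CLI (the title, body, and PR number are illustrative):

# open a pull request for the pushed assignment branch
gh pr create --title "Complete assignment" --body "Ready for review!"
# as a reviewer, leave an approving review on pull request number 1
gh pr review 1 --approve --body "LGTM (looks good to me)!"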
+

Github provides special tools for reviews which can assist both the author and reviewer:

+ +
  • Keep code changes intended for review small, enabling reviewers to reason through the work to more quickly provide feedback and practicing incremental continuous improvement (it may be difficult to address everything at once!). This may also make the git history for a repository clearer.
  • Github comments: Overall review comments (encompassing all work from the branch) and Inline comments (inquiring about individual lines of code) may be provided. Inline comments may also include code suggestions, which allow for code-based revision suggestions that may be committed directly to the branch using ```suggestion markdown codeblocks.
  • Github issues: Creating issues from comments allows the creation of new repository issues to address topics outside of the current PR.
+
+ +

Merging the Branch after Review

+ +
+%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'themeVariables': {
+      'git0': '#6366F1'
+} } }%%
+    gitGraph
+       commit id: "..."
+       commit id: "opened"
+       commit type: HIGHLIGHT id: "reviewed"
+       commit id: "...."
+
+ +

An example git diagram showing the main branch after the assignment branch has been merged (and removed).

+ +

Changes may be made within the assignment branch until the work is in a state where the authors and reviewers are satisfied. At this point, the branch changes may be merged into main. Approvals are sometimes provided informally (for ex., with a comment: “LGTM (looks good to me)!”) or explicitly (for ex., approvals within Github) to indicate or enable branch merge readiness. After the merge, changes may continue to be made in a similar way (perhaps accounting for concurrently branched work elsewhere). Generally, a merged branch may be removed afterwards to help maintain an organized working environment (see Github PR branch removal).

+ +
+

Github provides special tools for merging:

+ +
  • Decide which merge strategy is appropriate (there are many!): Github offers several merge strategies (merge commits, squash merges, and rebase merging). Take time to understand them and choose which one works best.
  • Consider using branch protection to automate merge requirements: The main or other branches may be “protected” against merges using branch protection rules. These rules can require reviewer approvals or automatic status checks to pass before changes may be merged.
  • Use merge queuing to manage multiple PRs: When there are many unmerged PRs, it can sometimes be difficult to document and ensure each is merged in a desired sequence. Consider using merge queues to help with this process.
+
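Where repository settings allow, a reviewed branch can also be merged from the GitHub CLI; for example (the PR number and strategy choice are illustrative):

# squash-merge pull request number 1 and remove its branch afterwards
gh pr merge 1 --squash --delete-branch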
+ +

Additional Resources

+ +

The links below may provide additional guidance on using these git features, including in-depth coverage of various features and related configuration.

+ +]]>
dave-bunten
\ No newline at end of file diff --git a/preview/pr-36/images/Apache_Parquet_logo.svg b/preview/pr-36/images/Apache_Parquet_logo.svg new file mode 100644 index 0000000000..cc7ee3b363 --- /dev/null +++ b/preview/pr-36/images/Apache_Parquet_logo.svg @@ -0,0 +1,17 @@ + + + + Apache Parquet logo + + + + image/svg+xml + + Apache Parquet logo + + + + + + + diff --git a/preview/pr-36/images/French_Orchard_at_Harvest_Time_(Le_verger)_(SM_1444)_cropped.png b/preview/pr-36/images/French_Orchard_at_Harvest_Time_(Le_verger)_(SM_1444)_cropped.png new file mode 100644 index 0000000000..694c0e94fe Binary files /dev/null and b/preview/pr-36/images/French_Orchard_at_Harvest_Time_(Le_verger)_(SM_1444)_cropped.png differ diff --git a/preview/pr-36/images/ahsb.jpg b/preview/pr-36/images/ahsb.jpg new file mode 100644 index 0000000000..97daaf57be Binary files /dev/null and b/preview/pr-36/images/ahsb.jpg differ diff --git a/preview/pr-36/images/anschutz.jpg b/preview/pr-36/images/anschutz.jpg new file mode 100644 index 0000000000..c060d70134 Binary files /dev/null and b/preview/pr-36/images/anschutz.jpg differ diff --git a/preview/pr-36/images/background.jpg b/preview/pr-36/images/background.jpg new file mode 100644 index 0000000000..734a98674b Binary files /dev/null and b/preview/pr-36/images/background.jpg differ diff --git a/preview/pr-36/images/citation-cff-icon.png b/preview/pr-36/images/citation-cff-icon.png new file mode 100644 index 0000000000..a6d6008b34 Binary files /dev/null and b/preview/pr-36/images/citation-cff-icon.png differ diff --git a/preview/pr-36/images/code.jpg b/preview/pr-36/images/code.jpg new file mode 100644 index 0000000000..63cd41c811 Binary files /dev/null and b/preview/pr-36/images/code.jpg differ diff --git a/preview/pr-36/images/contributing-file-with-handshake.png b/preview/pr-36/images/contributing-file-with-handshake.png new file mode 100644 index 0000000000..febcdc51d1 Binary files /dev/null and b/preview/pr-36/images/contributing-file-with-handshake.png differ diff --git a/preview/pr-36/images/csv_vs_parquet_data_on_file.png b/preview/pr-36/images/csv_vs_parquet_data_on_file.png new file mode 100644 index 0000000000..379370f877 Binary files /dev/null and b/preview/pr-36/images/csv_vs_parquet_data_on_file.png differ diff --git a/preview/pr-36/images/dave-bunten.jpg b/preview/pr-36/images/dave-bunten.jpg new file mode 100644 index 0000000000..b120f47f2c Binary files /dev/null and b/preview/pr-36/images/dave-bunten.jpg differ diff --git a/preview/pr-36/images/david-mayer.jpg b/preview/pr-36/images/david-mayer.jpg new file mode 100644 index 0000000000..e82401de51 Binary files /dev/null and b/preview/pr-36/images/david-mayer.jpg differ diff --git a/preview/pr-36/images/duckdb_arrow_query_example.png b/preview/pr-36/images/duckdb_arrow_query_example.png new file mode 100644 index 0000000000..925fb086dc Binary files /dev/null and b/preview/pr-36/images/duckdb_arrow_query_example.png differ diff --git a/preview/pr-36/images/environment-management-tooling.png b/preview/pr-36/images/environment-management-tooling.png new file mode 100644 index 0000000000..5f8dce879a Binary files /dev/null and b/preview/pr-36/images/environment-management-tooling.png differ diff --git a/preview/pr-36/images/faisal-alquaddoomi.jpg b/preview/pr-36/images/faisal-alquaddoomi.jpg new file mode 100644 index 0000000000..5f9dfeef23 Binary files /dev/null and b/preview/pr-36/images/faisal-alquaddoomi.jpg differ diff --git a/preview/pr-36/images/fallback.svg b/preview/pr-36/images/fallback.svg new file mode 
100644 index 0000000000..ac12be23a2 --- /dev/null +++ b/preview/pr-36/images/fallback.svg @@ -0,0 +1,10 @@ + + + + + + diff --git a/preview/pr-36/images/gh-actions-checkmark.png b/preview/pr-36/images/gh-actions-checkmark.png new file mode 100644 index 0000000000..5a939cefc4 Binary files /dev/null and b/preview/pr-36/images/gh-actions-checkmark.png differ diff --git a/preview/pr-36/images/github_actions_tab.png b/preview/pr-36/images/github_actions_tab.png new file mode 100644 index 0000000000..523fc0fa9b Binary files /dev/null and b/preview/pr-36/images/github_actions_tab.png differ diff --git a/preview/pr-36/images/github_mermaid_code.png b/preview/pr-36/images/github_mermaid_code.png new file mode 100644 index 0000000000..09f8b34a7c Binary files /dev/null and b/preview/pr-36/images/github_mermaid_code.png differ diff --git a/preview/pr-36/images/github_mermaid_preview.png b/preview/pr-36/images/github_mermaid_preview.png new file mode 100644 index 0000000000..9fd7967188 Binary files /dev/null and b/preview/pr-36/images/github_mermaid_preview.png differ diff --git a/preview/pr-36/images/graph_data_intro.png b/preview/pr-36/images/graph_data_intro.png new file mode 100644 index 0000000000..0cdb986d26 Binary files /dev/null and b/preview/pr-36/images/graph_data_intro.png differ diff --git a/preview/pr-36/images/graph_data_intro_properties.png b/preview/pr-36/images/graph_data_intro_properties.png new file mode 100644 index 0000000000..ae010ecc59 Binary files /dev/null and b/preview/pr-36/images/graph_data_intro_properties.png differ diff --git a/preview/pr-36/images/graph_database.png b/preview/pr-36/images/graph_database.png new file mode 100644 index 0000000000..a208284e83 Binary files /dev/null and b/preview/pr-36/images/graph_database.png differ diff --git a/preview/pr-36/images/graph_database_querying.png b/preview/pr-36/images/graph_database_querying.png new file mode 100644 index 0000000000..fdf3a91875 Binary files /dev/null and b/preview/pr-36/images/graph_database_querying.png differ diff --git a/preview/pr-36/images/graphdb-deployer.jpg b/preview/pr-36/images/graphdb-deployer.jpg new file mode 100644 index 0000000000..ef8fff9e47 Binary files /dev/null and b/preview/pr-36/images/graphdb-deployer.jpg differ diff --git a/preview/pr-36/images/greene-lab.jpg b/preview/pr-36/images/greene-lab.jpg new file mode 100644 index 0000000000..06a2a9f4a7 Binary files /dev/null and b/preview/pr-36/images/greene-lab.jpg differ diff --git a/preview/pr-36/images/icon.png b/preview/pr-36/images/icon.png new file mode 100644 index 0000000000..9e0e98cb7a Binary files /dev/null and b/preview/pr-36/images/icon.png differ diff --git a/preview/pr-36/images/jupyter_mermaid_example.png b/preview/pr-36/images/jupyter_mermaid_example.png new file mode 100644 index 0000000000..28b4f4bff9 Binary files /dev/null and b/preview/pr-36/images/jupyter_mermaid_example.png differ diff --git a/preview/pr-36/images/kuzu_intro.png b/preview/pr-36/images/kuzu_intro.png new file mode 100644 index 0000000000..7da7ccd664 Binary files /dev/null and b/preview/pr-36/images/kuzu_intro.png differ diff --git a/preview/pr-36/images/kuzu_logo.png b/preview/pr-36/images/kuzu_logo.png new file mode 100644 index 0000000000..d27cb4649c Binary files /dev/null and b/preview/pr-36/images/kuzu_logo.png differ diff --git a/preview/pr-36/images/kuzu_table_ingest.png b/preview/pr-36/images/kuzu_table_ingest.png new file mode 100644 index 0000000000..8baa72d47b Binary files /dev/null and b/preview/pr-36/images/kuzu_table_ingest.png differ diff 
--git a/preview/pr-36/images/logo.svg b/preview/pr-36/images/logo.svg new file mode 100644 index 0000000000..703e8b5094 --- /dev/null +++ b/preview/pr-36/images/logo.svg @@ -0,0 +1,50 @@ + + + + + + + + + + + + + + diff --git a/preview/pr-36/images/manubot.jpg b/preview/pr-36/images/manubot.jpg new file mode 100644 index 0000000000..30c49c5357 Binary files /dev/null and b/preview/pr-36/images/manubot.jpg differ diff --git a/preview/pr-36/images/memray-flamegraph.png b/preview/pr-36/images/memray-flamegraph.png new file mode 100644 index 0000000000..3039b9ed6b Binary files /dev/null and b/preview/pr-36/images/memray-flamegraph.png differ diff --git a/preview/pr-36/images/molevolvr.png b/preview/pr-36/images/molevolvr.png new file mode 100644 index 0000000000..63c9b241a8 Binary files /dev/null and b/preview/pr-36/images/molevolvr.png differ diff --git a/preview/pr-36/images/monarch-cluster.jpg b/preview/pr-36/images/monarch-cluster.jpg new file mode 100644 index 0000000000..063e7caceb Binary files /dev/null and b/preview/pr-36/images/monarch-cluster.jpg differ diff --git a/preview/pr-36/images/monarch-ui.jpg b/preview/pr-36/images/monarch-ui.jpg new file mode 100644 index 0000000000..21a201f258 Binary files /dev/null and b/preview/pr-36/images/monarch-ui.jpg differ diff --git a/preview/pr-36/images/package-audience-trust.png b/preview/pr-36/images/package-audience-trust.png new file mode 100644 index 0000000000..84d015a490 Binary files /dev/null and b/preview/pr-36/images/package-audience-trust.png differ diff --git a/preview/pr-36/images/package-connections.png b/preview/pr-36/images/package-connections.png new file mode 100644 index 0000000000..5904d42e9f Binary files /dev/null and b/preview/pr-36/images/package-connections.png differ diff --git a/preview/pr-36/images/package-magnifying-glass.png b/preview/pr-36/images/package-magnifying-glass.png new file mode 100644 index 0000000000..6e955e9589 Binary files /dev/null and b/preview/pr-36/images/package-magnifying-glass.png differ diff --git a/preview/pr-36/images/parquet_flooring.jpg b/preview/pr-36/images/parquet_flooring.jpg new file mode 100644 index 0000000000..a5256b66bb Binary files /dev/null and b/preview/pr-36/images/parquet_flooring.jpg differ diff --git a/preview/pr-36/images/poetry-icon.png b/preview/pr-36/images/poetry-icon.png new file mode 100644 index 0000000000..acb109846f Binary files /dev/null and b/preview/pr-36/images/poetry-icon.png differ diff --git a/preview/pr-36/images/python-packaging-to-audience.png b/preview/pr-36/images/python-packaging-to-audience.png new file mode 100644 index 0000000000..baef10ee3d Binary files /dev/null and b/preview/pr-36/images/python-packaging-to-audience.png differ diff --git a/preview/pr-36/images/python-version-status.png b/preview/pr-36/images/python-version-status.png new file mode 100644 index 0000000000..626c544f78 Binary files /dev/null and b/preview/pr-36/images/python-version-status.png differ diff --git a/preview/pr-36/images/scalene-web-interface.png b/preview/pr-36/images/scalene-web-interface.png new file mode 100644 index 0000000000..ad9c13d5d7 Binary files /dev/null and b/preview/pr-36/images/scalene-web-interface.png differ diff --git a/preview/pr-36/images/share.jpg b/preview/pr-36/images/share.jpg new file mode 100644 index 0000000000..065afb50fe Binary files /dev/null and b/preview/pr-36/images/share.jpg differ diff --git a/preview/pr-36/images/source-control-authenticity.png b/preview/pr-36/images/source-control-authenticity.png new file mode 100644 index 
0000000000..3c49250af9 Binary files /dev/null and b/preview/pr-36/images/source-control-authenticity.png differ diff --git a/preview/pr-36/images/tabular_data_image.png b/preview/pr-36/images/tabular_data_image.png new file mode 100644 index 0000000000..f910877da9 Binary files /dev/null and b/preview/pr-36/images/tabular_data_image.png differ diff --git a/preview/pr-36/images/text-vs-book.png b/preview/pr-36/images/text-vs-book.png new file mode 100644 index 0000000000..a61efc23e4 Binary files /dev/null and b/preview/pr-36/images/text-vs-book.png differ diff --git a/preview/pr-36/images/tis-lab.jpg b/preview/pr-36/images/tis-lab.jpg new file mode 100644 index 0000000000..0d21959827 Binary files /dev/null and b/preview/pr-36/images/tis-lab.jpg differ diff --git a/preview/pr-36/images/vincent-rubinetti.jpg b/preview/pr-36/images/vincent-rubinetti.jpg new file mode 100644 index 0000000000..fe128fd420 Binary files /dev/null and b/preview/pr-36/images/vincent-rubinetti.jpg differ diff --git a/preview/pr-36/images/way-lab.jpg b/preview/pr-36/images/way-lab.jpg new file mode 100644 index 0000000000..1136aee420 Binary files /dev/null and b/preview/pr-36/images/way-lab.jpg differ diff --git a/preview/pr-36/images/work_timebox.png b/preview/pr-36/images/work_timebox.png new file mode 100644 index 0000000000..4043712642 Binary files /dev/null and b/preview/pr-36/images/work_timebox.png differ diff --git a/preview/pr-36/index.html b/preview/pr-36/index.html new file mode 100644 index 0000000000..e999b8ea7b --- /dev/null +++ b/preview/pr-36/index.html @@ -0,0 +1,579 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+
+ + Who we are + +
+ +

Who we are

+ + +

We are a small group of dedicated software developers with the Department of Biomedical Informatics at the University of Colorado Anschutz.

+ + + + +
+
+
+ + + + + +
+ + +
+ + What we do + +
+ +

What we do

+ + +

We support the labs and individuals within the Department by developing high quality web applications, web servers, data visualizations, data pipelines, and much more.

+ + + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/members/dave-bunten.html b/preview/pr-36/members/dave-bunten.html new file mode 100644 index 0000000000..6ff62637d2 --- /dev/null +++ b/preview/pr-36/members/dave-bunten.html @@ -0,0 +1,567 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Dave Bunten (@d33bs) | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Dave Bunten is a multiskilled research data engineer with a passion for expanding human potential through software design, collaboration, and innovation. +He brings a diverse background in higher education, healthcare, and software development to help orchestrate scientific data pipelines. +Outside of work, Dave enjoys hiking, biking, painting, and spending time with family.

+ + + + + + +

+ + See Dave Bunten (@d33bs)'s posts on the Blog page + +

+
+ + +
+ + + + + + + diff --git a/preview/pr-36/members/david-mayer.html b/preview/pr-36/members/david-mayer.html new file mode 100644 index 0000000000..436613739e --- /dev/null +++ b/preview/pr-36/members/david-mayer.html @@ -0,0 +1,567 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +David Mayer (@the-mayer) | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

David is a Research Analytics Software Engineer in the CU Department of Biomedical Informatics. He has built and maintained a cloud computing platform for the Coursera Clinical Data Science Specialization, developed phenotypes to support biomedical research/precision medicine, created tools to facilitate data analytics, and is the primary developer for the ReviewR Shiny Application.

+ +

In his free time David enjoys live music, travel, and exploring new cities on bike.

+ + + + + + +

+ + See David Mayer (@the-mayer)'s posts on the Blog page + +

+
+ + +
+ + + + + + + diff --git a/preview/pr-36/members/faisal-alquaddoomi.html b/preview/pr-36/members/faisal-alquaddoomi.html new file mode 100644 index 0000000000..3fa5bd3eaa --- /dev/null +++ b/preview/pr-36/members/faisal-alquaddoomi.html @@ -0,0 +1,567 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Faisal Alquaddoomi (@falquaddoomi) | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+ + + +

Faisal has been working as a full-stack developer for the past fifteen years. He was the lead developer on svip.ch (the Swiss Variant Interpretation Platform), a variant database with a curation interface. He has also worked with the BRCA Challenge on BRCA Exchange as a mobile, web, and backend/pipeline developer.

+ +

Since starting at the University of Colorado Anschutz in July 2021, he has been primarily engaged in porting applications to Google Cloud, including profiling apps for their resource requirements, writing IaC descriptions of the application stacks, and adding instrumentation.

+ + + + + + +

+ + See Faisal Alquaddoomi (@falquaddoomi)'s posts on the Blog page + +

+
+ + +
+ + + + + + + diff --git a/preview/pr-36/members/vincent-rubinetti.html b/preview/pr-36/members/vincent-rubinetti.html new file mode 100644 index 0000000000..a6979aceba --- /dev/null +++ b/preview/pr-36/members/vincent-rubinetti.html @@ -0,0 +1,602 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Vincent Rubinetti (@vincerubinetti) | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+
+ + + +
+ + + + + + + + + +
+ + + + + + + + + +
+ + + + + + + + + +
+ + + + + + + + + +
+ + + + + + + + + +
+ +
+ + +
+ + +

Vince is a staff frontend developer in the Department. +His job is to take the studies, projects, and ideas of his colleagues and turn them into beautiful, dynamic, fully-realized web applications. +His work includes app development, website development, UI/UX design, logo design, and anything else visual or creative. +Outside of the lab, Vince is a freelance music composer for indie video games and the YouTube channel 3Blue1Brown.

+ + + + + + +

+ + See Vincent Rubinetti (@vincerubinetti)'s posts on the Blog page + +

+
+ + +
+ + + + + + + diff --git a/preview/pr-36/portfolio/index.html b/preview/pr-36/portfolio/index.html new file mode 100644 index 0000000000..98ff773c82 --- /dev/null +++ b/preview/pr-36/portfolio/index.html @@ -0,0 +1,1070 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Portfolio | Software Engineering Team + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + Software Engineering Team + + + CU Dept. of Biomedical Informatics + + + + + + + + +
+ +
+ + + + + + + + + + + +
+

Portfolio

+ +

We work with many groups both within and external to the University of Colorado:

+ + + +

Projects

+ +
+ +


+ +
+ + MolEvolvR + + +
+ + + MolEvolvR + + + + + for JRaviLab + + + +

+ A web app that enables researchers to run a general-purpose computational workflow for +characterizing the molecular evolution and phylogeny of their proteins of interest. + +

+ + + + + + +
+ + + server + + +
+ + + +
+
+ +
+ + Pycytominer + + +
+ + + Pycytominer + + + + + for the Way Lab + + + +

+ A suite of common functions used to process high dimensional readouts from high-throughput cell experiments. + +

+ + + + + + + + + + +
+
+ +
+ + CytoTable + + +
+ + + CytoTable + + + + + for the Way Lab + + + +

+ A Python package which enables large data processing to enhance single-cell morphology data analysis. + +

+ + + + + + + + + + +
+
+ +
+ + Simplex + + +
+ + + Simplex + + + + + for the Krishnan Lab + + + +

+ A web app and supporting backend for simplifying scientific and medical writing. + +

+ + + + + + + + + + +
+
+ +
+ + MyGeneset.info + + +
+ + + MyGeneset.info + + + + + for BioThings.io + + + +

+ A web app built from scratch to allow users to collect, save, and share sets of genes. A geneSET companion to MyGene.info. + +

+ + + + + + + + + + +
+
+ +
+ + Word Lapse + + +
+ + + Word Lapse + + + + + for the Greene Lab + + + +

+ A frontend web app and supporting backend server that allows users to explore how a word changes in meaning over time based on natural language processing machine learning. + +

+ + + + + + + + + + +
+
+ +
+ + Monarch Initiative UI + + +
+ + + Monarch Initiative UI + + + + + for TISLab + + + +

+ A redesign and rewrite of the Monarch Initiative application from the ground up, designed to be more modern, maintainable, robust, and accessible. + +

+ + + + + + + + + + +
+
+ +
+ + Monarch Initiative Cloud Migration + + +
+ + + Monarch Initiative Cloud Migration + + + + + for TISLab + + + +

+ A migration of all the Monarch Initiative backend and associated services from physical hardware to Google Cloud, including automated provisioning and deployment via Terraform, Ansible, and Docker Swarm. + +

+ + + + + + + + + + +
+
+ +
+ + GraphDB Deployer + + +
+ + + GraphDB Deployer + + + + + + +

+ Automates the parsing and transformation of a KGX archive into graphdb-ready formats. After the archive is converted, automates provisioning and deployment of Neo4j and Blazegraph instances. + +

+ + + + + + + + + + +
+
+ +
+ + Lab Website Template + + +
+ + + Lab Website Template + + + + + for the Greene Lab + + + +

+ An easy-to-use, flexible website template for labs. What this very site is built on! + +

+ + + + + + + + + + +
+
+
+ + +
+ + + + + + + diff --git a/preview/pr-36/redirects.json b/preview/pr-36/redirects.json new file mode 100644 index 0000000000..9e26dfeeb6 --- /dev/null +++ b/preview/pr-36/redirects.json @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/preview/pr-36/robots.txt b/preview/pr-36/robots.txt new file mode 100644 index 0000000000..daf59619e4 --- /dev/null +++ b/preview/pr-36/robots.txt @@ -0,0 +1 @@ +Sitemap: /set-website/preview/pr-36/sitemap.xml diff --git a/preview/pr-36/sitemap.xml b/preview/pr-36/sitemap.xml new file mode 100644 index 0000000000..a33f34e881 --- /dev/null +++ b/preview/pr-36/sitemap.xml @@ -0,0 +1,103 @@ + + + +/set-website/preview/pr-36/members/dave-bunten.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/members/david-mayer.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/members/faisal-alquaddoomi.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/members/vincent-rubinetti.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2022/10/17/Use-Linting-Tools-to-Save-Time.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2022/11/27/Diagrams-as-Code.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2022/12/05/Data-Engineering-with-SQL-Arrow-and-DuckDB.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2022/12/12/Remove-Unused-Code-to-Avoid-Decay.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/01/03/Linting-Documentation-as-Code.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/01/17/Timebox-Your-Software-Work.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/01/30/Software-Linting-with-R.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/02/13/Branch-Review-and-Learn.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/03/15/Automate-Software-Workflows-with-Github-Actions.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/07/07/Using-Python-and-Anaconda-with-the-Alpine-HPC-Cluster.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/09/05/Python-Packaging-as-Publishing.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/10/04/Data-Quality-Validation.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2023/11/15/Codesgiving-Open-source-Contribution-Walkthrough.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2024/01/22/Python-Memory-Management-and-Troubleshooting.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2024/02/20/Navigating-Dependency-Chaos-with-Lockfiles.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2024/03/25/Parquet-Crafting-Data-Bridges-for-Efficient-Computation.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/2024/05/24/Leveraging-K%C3%B9zu-and-Cypher-for-Advanced-Data-Analysis.html +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/blog/ +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/about/ +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/portfolio/ +2024-05-29T14:08:50+00:00 + + +/set-website/preview/pr-36/ +2024-05-29T14:08:50+00:00 + +