feat: R wrapper #11

Draft
wants to merge 1 commit into
base: master
23 changes: 23 additions & 0 deletions .github/workflows/check.yml
@@ -181,6 +181,29 @@ jobs:
source $(hatch env find)/bin/activate
maturin develop --release --manifest-path Cargo.toml --features test
hatch run test --verbose

  r_port:
    needs: [check-unix]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Install R
        uses: r-lib/actions/setup-r@v2

      - name: Install R packages
        run: Rscript -e 'install.packages(c("devtools", "testthat", "roxygen2", "covr", "digest"))'

      - name: Install Rust R package
        run: Rscript -e 'devtools::install_local("r/gtfsort")'

      - name: Run R tests
        run: Rscript -e 'devtools::test("r/gtfsort")'

benchmark:
needs: [check-unix, check-windows]
7 changes: 0 additions & 7 deletions gtfsort/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 1 addition & 2 deletions gtfsort/Cargo.toml
@@ -26,7 +26,6 @@ libc = "0.2.151"
hashbrown = {version="0.14", features=["rayon"]}
dashmap = "5.5"
time = "0.3.36"
-libR-sys = { version = "0.7.0", optional = true }
reqwest = { version = "0.12.5", features = ["blocking"], optional = true }
crc = { version = "3.2.1", optional = true}
flate2 = { version ="1.0.30" , optional = true}
@@ -53,7 +52,7 @@ opt-level = 3
[lib]
name = "gtfsort"
path = "src/lib.rs"
-crate-type = ["cdylib", "rlib"]
+crate-type = ["cdylib", "rlib", "staticlib"]

[[bin]]
name = "gtfsort"
2 changes: 2 additions & 0 deletions r/gtfsort/.Rbuildignore
@@ -0,0 +1,2 @@
^src/\.cargo$
^src/rust/gtfsort/target/
17 changes: 17 additions & 0 deletions r/gtfsort/DESCRIPTION
@@ -0,0 +1,17 @@
Package: gtfsort
Title: An Optimized Lexicographic Chr/Pos/Feature GTF/GFF Sorter.
Version: 0.0.0.9000
Authors@R:
    c(person("Alejandro", "Gonzales-Irribarren", , "[email protected]", role = c("aut", "cre"),
             comment = c(ORCID = "0000-0001-7010-8146")),
      person("Anne", "Fu", , "[email protected]", role = "aut",
             comment = c(ORCID = "0000-0002-9025-6071")))
Description: While current tools (most of them GFF3-focused) have been recommended for sorting GTF files, none are directed towards chr/pos/feature ordering. This approach ensures custom sorting directionality, which is useful for reducing computation times in tools that work with sorted GTF files. Furthermore, it provides a friendly and organized visualization of gene structures (gene -> transcript -> CDS/exon -> start/stop -> UTR/Sel), allowing users to search for features more efficiently.
License: use_mit_license()
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Config/rextendr/version: 0.3.1
Suggests:
    testthat (>= 3.0.0)
Config/testthat/edition: 3
5 changes: 5 additions & 0 deletions r/gtfsort/NAMESPACE
@@ -0,0 +1,5 @@
# Generated by roxygen2: do not edit by hand

export(sort_annotations)
export(sort_annotations_str)
useDynLib(gtfsort, .registration = TRUE)
41 changes: 41 additions & 0 deletions r/gtfsort/R/extendr-wrappers.R
@@ -0,0 +1,41 @@
# Generated by extendr: Do not edit by hand

# nolint start

#
# This file was created with the following call:
# .Call("wrap__make_gtfsort_wrappers", use_symbols = TRUE, package_name = "gtfsort")

#' @usage NULL
#' @useDynLib gtfsort, .registration = TRUE
NULL

#' Sort a GTF/GFF/GFF3 file.
#'
#' @param input The input file path.
#' @param output The output file path.
#' @param threads The number of threads to use.
#' @return a list with the input and output file paths, the number of threads used, whether the input and output were memory-mapped, the time taken to parse, index, and write the output, and the memory used before and after the operation.
#'
#' @examples
#' sort_annotations("tests/data/chr1.gtf", "tests/data/chr1.sorted.gtf", 1)
#'
#' @export
sort_annotations <- function(input, output, threads) .Call(wrap__sort_annotations, input, output, threads)

#' Sort a string with GTF/GFF/GFF3 annotations.
#'
#' @param mode The annotation format to parse: one of "gtf", "gff", or "gff3".
#' @param input The string with the GTF/GFF/GFF3 annotations.
#' @param output A function that will be called with each chunk of the sorted string. Return NULL to continue, or a string to stop.
#' @param threads The number of threads to use.
#' @return a list with the input and output strings, the number of threads used, whether the input and output were memory-mapped, the time taken to parse, index, and write the output, and the memory used before and after the operation.
#'
#' @examples
#' sort_annotations_str("gtf", "chr1\t.\texon\t11869\t12227\t.\t+\t.\tgene_id \"ENSG00000223972.5\"; transcript_id \"ENST00000456328.2\"; exon_number \"1\";\nchr1\t.\texon\t12613\t12721\t.\t+\t.\tgene_id \"ENSG00000223972.5\"; transcript_id \"ENST00000456328.2\"; exon_number \"2\";", function(str) { cat(str); return(NULL); }, 1)
#'
#' @export
sort_annotations_str <- function(mode, input, output, threads) .Call(wrap__sort_annotations_str, mode, input, output, threads)


# nolint end
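
The `output` parameter of `sort_annotations_str` is documented as a callback that receives each chunk of sorted text and returns NULL to continue or a string to stop. A minimal Rust sketch of that contract follows. It is not the code from this PR: `emit_lines` is a hypothetical helper, chunking by line is an assumption, and treating the returned string as an error value is also an assumption.

```rust
/// Hypothetical helper (not part of gtfsort): passes `sorted` to `sink` one
/// line at a time. `sink` returns `None` to continue or `Some(reason)` to
/// stop, mirroring the NULL-or-string contract described in the roxygen docs.
fn emit_lines<F>(sorted: &str, mut sink: F) -> Result<usize, String>
where
    F: FnMut(&str) -> Option<String>,
{
    let mut emitted = 0;
    // `split_inclusive` keeps the trailing newline on every chunk, so
    // concatenating the chunks reproduces the input exactly.
    for chunk in sorted.split_inclusive('\n') {
        if let Some(reason) = sink(chunk) {
            // Assumption: a returned string aborts the write and is
            // reported to the caller as the reason.
            return Err(reason);
        }
        emitted += 1;
    }
    Ok(emitted)
}

fn main() {
    let sorted = "chr1\t.\texon\t11869\t12227\nchr1\t.\texon\t12613\t12721\n";

    // A sink that collects every chunk and never asks to stop.
    let mut collected = String::new();
    let result = emit_lines(sorted, |chunk| {
        collected.push_str(chunk);
        None
    });
    println!("{:?}, {} bytes collected", result, collected.len());

    // A sink that stops at the first chunk.
    let stopped = emit_lines(sorted, |_| Some("caller requested stop".to_string()));
    println!("{:?}", stopped);
}
```

On the R side, the roxygen example above plays the same role as the first sink: `function(str) { cat(str); return(NULL); }` prints each chunk and never stops early.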
25 changes: 25 additions & 0 deletions r/gtfsort/man/sort_annotations.Rd


27 changes: 27 additions & 0 deletions r/gtfsort/man/sort_annotations_str.Rd


5 changes: 5 additions & 0 deletions r/gtfsort/src/.gitignore
@@ -0,0 +1,5 @@
*.o
*.so
*.dll
target
.cargo
30 changes: 30 additions & 0 deletions r/gtfsort/src/Makevars
@@ -0,0 +1,30 @@
TARGET_DIR = ./rust/target
LIBDIR = $(TARGET_DIR)/release
STATLIB = $(LIBDIR)/libgtfsort.a
PKG_LIBS = -L$(LIBDIR) -lgtfsort

all: C_clean

$(SHLIB): $(STATLIB)

CARGOTMP = $(CURDIR)/.cargo

$(STATLIB):
	# In some environments, ~/.cargo/bin might not be included in PATH, so we need
	# to set it here to ensure cargo can be invoked. It is appended to PATH and
	# therefore is only used if cargo is absent from the user's PATH.
	if [ "$(NOT_CRAN)" != "true" ]; then \
		export CARGO_HOME=$(CARGOTMP); \
	fi && \
		export PATH="$(PATH):$(HOME)/.cargo/bin" && \
		cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir $(TARGET_DIR)
	if [ "$(NOT_CRAN)" != "true" ]; then \
		rm -Rf $(CARGOTMP) && \
		rm -Rf $(LIBDIR)/build; \
	fi

C_clean:
	rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS)

clean:
	rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS) rust/target
5 changes: 5 additions & 0 deletions r/gtfsort/src/Makevars.ucrt
@@ -0,0 +1,5 @@
# Rtools42 doesn't have the linker in the location that cargo expects, so we
# need to overwrite it via configuration.
CARGO_LINKER = x86_64-w64-mingw32.static.posix-gcc.exe

include Makevars.win
40 changes: 40 additions & 0 deletions r/gtfsort/src/Makevars.win
@@ -0,0 +1,40 @@
TARGET = $(subst 64,x86_64,$(subst 32,i686,$(WIN)))-pc-windows-gnu

TARGET_DIR = ./rust/target
LIBDIR = $(TARGET_DIR)/$(TARGET)/release
STATLIB = $(LIBDIR)/libgtfsort.a
PKG_LIBS = -L$(LIBDIR) -lgtfsort -lws2_32 -ladvapi32 -luserenv -lbcrypt -lntdll

all: C_clean

$(SHLIB): $(STATLIB)

CARGOTMP = $(CURDIR)/.cargo

$(STATLIB):
	mkdir -p $(TARGET_DIR)/libgcc_mock
	# `rustc` adds `-lgcc_eh` flags to the compiler, but Rtools' GCC doesn't have
	# `libgcc_eh` due to the compilation settings. So, in order to please the
	# compiler, we need to add empty `libgcc_eh` to the library search paths.
	#
	# For more details, please refer to
	# https://github.com/r-windows/rtools-packages/blob/2407b23f1e0925bbb20a4162c963600105236318/mingw-w64-gcc/PKGBUILD#L313-L316
	touch $(TARGET_DIR)/libgcc_mock/libgcc_eh.a

	# CARGO_LINKER is provided in Makevars.ucrt for R >= 4.2
	if [ "$(NOT_CRAN)" != "true" ]; then \
		export CARGO_HOME=$(CARGOTMP); \
	fi && \
		export CARGO_TARGET_X86_64_PC_WINDOWS_GNU_LINKER="$(CARGO_LINKER)" && \
		export LIBRARY_PATH="$${LIBRARY_PATH};$(CURDIR)/$(TARGET_DIR)/libgcc_mock" && \
		cargo build --target=$(TARGET) --lib --release --manifest-path=./rust/Cargo.toml --target-dir $(TARGET_DIR)
	if [ "$(NOT_CRAN)" != "true" ]; then \
		rm -Rf $(CARGOTMP) && \
		rm -Rf $(LIBDIR)/build; \
	fi

C_clean:
	rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS)

clean:
	rm -Rf $(SHLIB) $(STATLIB) $(OBJECTS) $(TARGET_DIR)
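
The `TARGET` line at the top of Makevars.win builds the Rust target triple from `$(WIN)` with two nested `subst` calls. The sketch below reproduces that substitution in Rust to make the result explicit. It assumes `WIN` holds either `64` or `32`, and `windows_target` is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper mirroring the Makefile expression
/// `$(subst 64,x86_64,$(subst 32,i686,$(WIN)))-pc-windows-gnu`.
fn windows_target(win: &str) -> String {
    // Inner substitution first (32 -> i686), then the outer one (64 -> x86_64),
    // in the same order the nested `subst` calls are evaluated.
    let arch = win.replace("32", "i686").replace("64", "x86_64");
    format!("{}-pc-windows-gnu", arch)
}

fn main() {
    for win in ["64", "32"] {
        println!("WIN={} -> {}", win, windows_target(win));
    }
}
```

The order of the two replacements is safe because `i686` does not contain the substring `64`, so the outer substitution leaves the 32-bit result untouched.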
8 changes: 8 additions & 0 deletions r/gtfsort/src/entrypoint.c
@@ -0,0 +1,8 @@
// We need to forward routine registration from C to Rust
// to avoid the linker removing the static library.

void R_init_gtfsort_extendr(void *dll);

void R_init_gtfsort(void *dll) {
    R_init_gtfsort_extendr(dll);
}
2 changes: 2 additions & 0 deletions r/gtfsort/src/gtfsort-win.def
@@ -0,0 +1,2 @@
EXPORTS
R_init_gtfsort