This repository has been archived by the owner on Mar 21, 2019. It is now read-only.

Overview

cstubben edited this page Sep 5, 2014 · 5 revisions

The pmcOAI function loads PMC Open Access articles into an XMLInternalDocument. Other functions parse the XML document, including:

  • pmcText splits the XML into a list of subsections, where each subsection is a vector of paragraphs or sentences
  • pmcTable extracts tables into a list of data frames
  • pmcSupp lists supplementary files and optionally downloads them
  • pmcRef returns a data frame containing references
  • pmcMetadata lists metadata fields

The package was initially described in BMC Bioinformatics, and that paper focused on extracting locus tags mentioned in full text and tables. For example, the following code finds Burkholderia pseudomallei locus tags:

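# Read the B. pseudomallei GFF annotation
bpgff <- read.ncbi.ftp("Burkholderia_pseudomallei/GCF_000011545", "gff")
# Wildcard query covering the BPSL and BPSS locus tag ranges
tags <- "(BPSL0* OR BPSL1* OR BPSL2* OR BPSL3* OR BPSS0* OR BPSS1* OR BPSS2*)"
# Restrict the PMC search to open access Burkholderia articles
bp <- ncbiPMC(paste(tags, "AND (Burkholderia[TITLE] OR Burkholderia[ABSTRACT]) AND open access[FILTER]"))
# Loop through the articles, matching locus tags by prefix and suffix pattern
pmcLoop(bp, bpgff, prefix = "BPS[SL]", suffix = "[abc]", file = "bp.tab")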
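
To give a feel for what the prefix and suffix arguments describe, here is a minimal base R sketch of matching locus tags in plain text. It does not use pmcXML: the helper find_tags is hypothetical, and the assumed tag shape (prefix, digits, optional suffix letter) is a guess at how such patterns could be combined, not a description of how pmcLoop works internally.

```r
# Hypothetical helper (not part of pmcXML): extract locus-tag-like strings.
find_tags <- function(text, prefix = "BPS[SL]", suffix = "[abc]") {
  # Assumed tag shape: prefix, one or more digits, then an optional suffix letter
  pattern <- paste0("\\b", prefix, "[0-9]+", suffix, "?\\b")
  # gregexpr finds every match per string; regmatches pulls out the matched text
  unique(unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE))))
}

# Made-up example sentences, for illustration only
sentences <- c(
  "Deletion of BPSL1549 reduced virulence.",
  "Both BPSS1498 and BPSS0796a were upregulated; BPSL1549 was not."
)

print(find_tags(sentences))
```

Each distinct tag is returned once, in order of first appearance, so a tag mentioned in several sentences is not repeated.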

Check the links for more details