The goal of ensemblQueryR is to seamlessly integrate querying of Ensembl databases into your R workflow. It does this by formatting and submitting user queries to the Ensembl API. At present, the package contains functions for the three Ensembl Linkage Disequilibrium (LD) ‘endpoints’: (1) query LD in a window around one SNP, (2) query LD for a pair of query SNPs, and (3) query LD for SNPs at a specified genomic locus.
For further information, see our technical release.
You can install ensemblQueryR as below.
# load remotes package
library(remotes)
# to install the development version
remotes::install_github("ainefairbrother/ensemblQueryR")
To check that the Ensembl server is up and running, the server can be pinged.
library(ensemblQueryR)
ensemblQueryR::pingEnsembl()
All LD query functions in this package take the pop argument, which defines the population for which to retrieve LD metrics. To get a list of options for this argument, run the ensemblQueryGetPops() function.
ensemblQueryR::ensemblQueryGetPops()
Get all variants in LD with one query variant using ensemblQueryLDwithSNPwindow. This function constrains the query by taking a minimum r-squared cut-off (r2), D-prime (d.prime) and window size around the variant in kilobases (window.size).
ensemblQueryR::ensemblQueryLDwithSNPwindow(rsid="rs3851179",
r2=0.8,
d.prime=0.8,
window.size=500,
pop="1000GENOMES:phase_3:EUR")
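Since the package works by formatting queries for the Ensembl API, it can help to see roughly what such a request looks like. The sketch below builds a URL for the public Ensembl REST LD endpoint from the same arguments. The helper build_ld_window_url is hypothetical and is not part of ensemblQueryR; the package's internal request format may differ.

```r
# Hypothetical helper (NOT part of ensemblQueryR): formats the arguments used
# above into a URL for the Ensembl REST endpoint /ld/:species/:id/:population_name.
build_ld_window_url <- function(rsid, pop, r2, d.prime, window.size,
                                server = "https://rest.ensembl.org") {
  sprintf("%s/ld/human/%s/%s?r2=%s;d_prime=%s;window_size=%s",
          server, rsid, pop, r2, d.prime, window.size)
}

# Same values as the ensemblQueryLDwithSNPwindow example
url <- build_ld_window_url(rsid = "rs3851179",
                           pop = "1000GENOMES:phase_3:EUR",
                           r2 = 0.8,
                           d.prime = 0.8,
                           window.size = 500)
print(url)
```

No request is sent here; the string is only meant to show how the function arguments could map onto the endpoint's parameters.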
For more than one query variant, the ensemblQueryLDwithSNPwindowDataframe function takes a data.frame as input, and gets all variants in LD with all query variants in the rsid column. It is possible to parallelise this operation by setting the number of cores above 1.
# example input data
in.table <- data.frame(rsid=rep(c("rs7153434","rs1963154","rs12672022","rs3852802","rs12324408","rs56346870"), 500))
# run query on in.table
ensemblQueryR::ensemblQueryLDwithSNPwindowDataframe(
in.table=in.table,
r2=0.8,
d.prime=0.8,
window.size=500,
pop="1000GENOMES:phase_3:EUR",
cores=1
)
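If your own list of variants contains repeats or malformed entries, it may be worth cleaning it before querying so that no redundant requests are sent. The following is a minimal base R sketch; raw.table and is.rsid are illustrative names, and the check assumes that every query ID is an rsID of the form "rs" followed by digits.

```r
# Illustrative raw input with duplicated and malformed IDs
raw.table <- data.frame(
  rsid = c("rs7153434", "rs1963154", "rs7153434",
           "chr6:123", "rs12672022", "rs1963154")
)

# Keep only entries that look like rsIDs ("rs" followed by digits)
is.rsid <- grepl("^rs[0-9]+$", raw.table$rsid)

# Drop duplicates so each variant is queried once
in.table <- unique(raw.table[is.rsid, , drop = FALSE])
rownames(in.table) <- NULL

print(in.table)
```

The resulting in.table has a single rsid column and can be passed to ensemblQueryLDwithSNPwindowDataframe as in the example above.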
The ensemblQueryLDwithSNPpair function takes a single pair of query SNPs and returns a data.frame of LD metrics.
ensemblQueryR::ensemblQueryLDwithSNPpair(
rsid1="rs6792369",
rsid2="rs1042779",
pop="1000GENOMES:phase_3:EUR"
)
The ensemblQueryLDwithSNPpairDataframe function takes a data.frame with columns rsid1 and rsid2 and returns a data.frame of LD metrics for all variant pairs. It is possible to parallelise this operation by setting the number of cores above 1.
# example input data
in.table <- data.frame(rsid1=rep("rs6792369", 10), rsid2=rep("rs1042779", 10))
# run query on in.table
ensemblQueryR::ensemblQueryLDwithSNPpairDataframe(
in.table=in.table,
pop="1000GENOMES:phase_3:EUR",
cores=1
)
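When you start from a list of variants rather than a ready-made table of pairs, the rsid1 and rsid2 columns can be generated with base R. The sketch below enumerates every unordered pair from a vector. The rsIDs are reused from the examples above purely to show the shape of the table; LD is only meaningful for variants on the same chromosome, so in practice the vector would hold variants from one locus.

```r
# Illustrative vector of variant IDs
rsids <- c("rs6792369", "rs1042779", "rs3851179")

# Enumerate all unordered pairs; each row of `pairs` is one pair
pairs <- t(utils::combn(rsids, 2))

# Table with the column names expected by ensemblQueryLDwithSNPpairDataframe
in.table <- data.frame(rsid1 = pairs[, 1], rsid2 = pairs[, 2])

print(in.table)
```

For n variants this produces n * (n - 1) / 2 rows, so the number of queries grows quickly with the length of the list.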
The ensemblQueryLDwithSNPregion function takes genomic coordinates as input and returns all variant pairs and their LD metrics within the defined region.
ensemblQueryR::ensemblQueryLDwithSNPregion(
chr="6",
start="25837556",
end="25843455",
pop="1000GENOMES:phase_3:EUR"
)
The ensemblQueryLDwithSNPregionDataframe function takes a data.frame with columns chr, start and end and returns a data.frame of LD metrics for all variant pairs contained within each genomic region (each row of in.table). It is possible to parallelise this operation by setting the number of cores above 1.
# example input data
in.table <- data.frame(chr=rep("6", 10),
start=rep("25837556", 10),
end=rep("25843455", 10))
# run query on in.table
ensemblQueryR::ensemblQueryLDwithSNPregionDataframe(
in.table=in.table,
pop="1000GENOMES:phase_3:EUR",
cores=2
)
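The cores value does not have to be hard-coded. One possible approach, sketched below with the base parallel package, is to derive it from the machine and the size of the input. The rule of leaving one core free and never exceeding the number of rows is an assumption made for illustration, not a recommendation from the package.

```r
library(parallel)

# Example input, as above
in.table <- data.frame(chr = rep("6", 10),
                       start = rep("25837556", 10),
                       end = rep("25843455", 10))

# detectCores() can return NA on some platforms, so fall back to 1
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free, use at least 1, and never more cores than rows
cores <- max(1L, min(nrow(in.table), available - 1L))

print(cores)
```

The resulting value can then be passed as the cores argument of any of the Dataframe functions.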
We have provided a Docker image, enabling this tool to be run regardless of your local operating system or R version. This can be found here. With a working installation of Docker, the code below will pull this image, start a container and open an R session inside it. You will then be able to use ensemblQueryR as described above.
docker pull ainefairbrotherbrowne/ensemblqueryr:1.0; \
docker run -t -d --name ensemblqueryr ainefairbrotherbrowne/ensemblqueryr:1.0; \
docker exec -i -t ensemblqueryr R
Additionally, to mount a volume (enabling you to load a file containing your variant IDs, for example), the following command can be used, replacing /path/to/vol with the absolute path to the directory you wish to mount. Docker options such as --volume must come before the image name; here the directory is mounted at the same path inside the container.
docker pull ainefairbrotherbrowne/ensemblqueryr:1.0; \
docker run -t -d --name ensemblqueryr --volume /path/to/vol:/path/to/vol ainefairbrotherbrowne/ensemblqueryr:1.0; \
docker exec -i -t ensemblqueryr R
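Once a directory is mounted, a file of variant IDs can be read into the in.table format used above. The sketch below assumes a plain text file with one rsID per line; it writes a temporary file so that it runs anywhere, whereas inside the container id.file would point to a file in the mounted directory.

```r
# Stand-in for a file in the mounted directory (one rsID per line)
id.file <- tempfile(fileext = ".txt")
writeLines(c("rs7153434", "rs1963154", "", "rs12672022"), id.file)

# Read the IDs, trim whitespace and drop empty lines
rsids <- trimws(readLines(id.file))
rsids <- rsids[nzchar(rsids)]

# Table with the rsid column expected by ensemblQueryLDwithSNPwindowDataframe
in.table <- data.frame(rsid = rsids)

print(in.table)
```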
For HPC use-cases where Docker usage becomes problematic owing to user privilege limitations, we have provided a Singularity image. This can be found here. With a working installation of Singularity, the code below will pull this image and start an R session inside a container. You will then be able to use ensemblQueryR as described above.
singularity pull --arch amd64 library://ainefairbrother/ensemblqueryr/ensemblqueryr:sha256.e387ea11ae4eaea8f94d81c625c2c1d5a22dd351858ebcd03910a7736d76ca30; \
singularity exec ensemblqueryr_sha256.e387ea11ae4eaea8f94d81c625c2c1d5a22dd351858ebcd03910a7736d76ca30.sif R
ensemblQueryR
We value contributions from the community to improve ensemblQueryR. Here’s how you can do this:
Fork the ensemblQueryR repository to your GitHub account using the “Fork” button at the top.
Thank you for considering making a contribution to ensemblQueryR.
Please note that this code is still under development and may contain bugs or errors. It is not recommended for use in production environments. Use at your own risk. I am working on improving the code, addressing any issues, and expanding the package’s capabilities so please check back for updates.