Follows this guide: http://leekgroup.github.io/recount/recount-quickstart.html
## Load libraries
library('recount')
library('SummarizedExperiment')
library('SciServer')
Loading required package: SummarizedExperiment Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel Attaching package: 'BiocGenerics' The following objects are masked from 'package:parallel': clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB The following objects are masked from 'package:stats': IQR, mad, sd, var, xtabs The following objects are masked from 'package:base': Filter, Find, Map, Position, Reduce, anyDuplicated, append, as.data.frame, cbind, colMeans, colSums, colnames, do.call, duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min Loading required package: S4Vectors Attaching package: 'S4Vectors' The following object is masked from 'package:base': expand.grid Loading required package: IRanges Loading required package: GenomeInfoDb Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Loading required package: DelayedArray Loading required package: matrixStats Attaching package: 'matrixStats' The following objects are masked from 'package:Biobase': anyMissing, rowMedians Attaching package: 'DelayedArray' The following objects are masked from 'package:matrixStats': colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges The following object is masked from 'package:base': apply Setting options('download.file.method.GEOquery'='auto') Setting options('GEOquery.inmemory.gpl'=FALSE)
## Locally-mounted directory where recount resource resides
recount_dir <- '/home/idies/workspace/recount01'
## Find a project of interest
project_info <- abstract_search('GSE32465')
project_info
number_samples | species | abstract | project | |
---|---|---|---|---|
340 | 12 | human | Summary: K562-shX cells are made in an effort to validate TFBS data and ChIP-seq antibodies in Myers lab (GSE32465). K562 cells are transduced with lentiviral vector having Tet-inducible shRNA targeting a transcription factor gene. Cells with stable integration of shRNA constructs are selected using puromycin in growth media. Doxycycline is added to the growth media to induce the expression of shRNA and a red fluorescent protein marker. A successful shRNA cell line shows at least a 70% reduction in expression of the target transcription factor as measured by qPCR. For identification, we designated these cell lines as K562-shX, where X is the transcription factor targeted by shRNA and K562 denotes the parent cell line. For example, K562-shATF3 cells are K562 derived cells selected for stable integration of shRNA targeting the transcription factor ATF3 gene and showed at least a 70% reduction in the expression of ATF3 gene when measured by qPCR. Cells growing without doxycycline (uninduced) are used as a control to measure the change in expression of target transcription factor gene after induction of shRNA using doxycycline. For detailed growth and culturing protocols for these cells please refer to http://hudsonalpha.org/myers-lab/protocols . To identify the potential downstream targets of the candidate transcription factor, analyze the mRNA expression profile of the uninduced and induced K562-shX using RNA-seq. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Overall Design: Make K562-shX cells as described in the http://hudsonalpha.org/myers-lab/protocols . Measure the mRNA expression levels in uninduced K562-shX and induced K562-shX cells in two biological replicates using RNA-seq. Identify the potential downstream targets of the candidate transcription factor. | SRP009615 |
study <- project_info$project # SRP009615
study_dir <- file.path(recount_dir, study)
## Load the data
load(file.path(study_dir, 'rse_gene.Rdata'))
## Find the GEO accession ids
# (NCBI connection closed can happen error here sometimes)
geoids <- sapply(colData(rse_gene)$run, find_geo)
library(SciServer)
username = Authentication.getKeystoneUserWithToken(Authentication.getToken())$userName
## Get the sammple information from GEO
geoinfo <- lapply(geoids, function(x) {
geo_info(x, destdir=paste0('/home/idies/workspace/Temporary/', username, '/scratch')) })
Using locally cached version of GSM836270 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM836270.soft Using locally cached version of GSM836271 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM836271.soft Using locally cached version of GSM836272 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM836272.soft Using locally cached version of GSM836273 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM836273.soft Using locally cached version of GSM847561 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847561.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847562 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847562.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847563 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847563.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847564 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847564.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847565 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847565.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847566 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847566.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847567 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847567.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847568 found here: /home/idies/workspace/Temporary/langmead/scratch/GSM847568.soft Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))): "implicit list embedding of S4 objects is deprecated"
## Extract the sample characteristics
geochar <- lapply(geoinfo, geo_characteristics)
## Note that the information for this study is a little inconsistent, so we
## have to fix it.
geochar <- do.call(rbind, lapply(geochar, function(x) {
if('cells' %in% colnames(x)) {
colnames(x)[colnames(x) == 'cells'] <- 'cell.line'
return(x)
} else {
return(x)
}
}))
## We can now define some sample information to use
sample_info <- data.frame(
run = colData(rse_gene)$run,
group = sapply(geoinfo, function(x) { ifelse(grepl('uninduced', x$title),
'uninduced', 'induced') }),
gene_target = sapply(geoinfo, function(x) { strsplit(strsplit(x$title,
'targeting ')[[1]][2], ' gene')[[1]][1] })
)
## Scale counts by taking into account the total coverage per sample
rse <- scale_counts(rse_gene)
## Add sample information for DE analysis
colData(rse)$group <- sample_info$group
colData(rse)$gene_target <- sample_info$gene_target
## Perform differential gene expression analysis with DESeq2
library('DESeq2')
## Specify design and switch to DESeq2 format
dds <- DESeqDataSet(rse, ~ gene_target + group)
## Perform DE analysis
dds <- DESeq(dds, test = 'LRT', reduced = ~ gene_target, fitType = 'local', parallel=T)
res <- results(dds)
converting counts to integer mode estimating size factors estimating dispersions gene-wise dispersion estimates: 14 workers mean-dispersion relationship final dispersion estimates, fitting model and testing: 14 workers
Error in vector(mode(x), length(NArows)): vector: cannot make a vector of mode 'S4'. Traceback: 1. DESeq(dds, test = "LRT", reduced = ~gene_target, fitType = "local", . parallel = T) 2. DESeqParallel(object, test = test, fitType = fitType, betaPrior = betaPrior, . full = full, reduced = reduced, quiet = quiet, modelMatrix = modelMatrix, . modelMatrixType = modelMatrixType, BPPARAM = BPPARAM) 3. buildDataFrameWithNARows(mcols(objectNZ), mcols(object)$allZero) 4. DataFrame(lapply(resultsList, function(x) vector(mode(x), length(NArows)))) 5. lapply(resultsList, function(x) vector(mode(x), length(NArows))) 6. lapply(resultsList, function(x) vector(mode(x), length(NArows))) 7. lapply(as.list(X), FUN = FUN, ...) 8. lapply(as.list(X), FUN = FUN, ...) 9. FUN(X[[i]], ...)
sessionInfo()
R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Scientific Linux 7.5 (Nitrogen) Matrix products: default BLAS: /home/idies/miniconda3/lib/R/lib/libRblas.so LAPACK: /home/idies/miniconda3/lib/R/lib/libRlapack.so locale: [1] C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DESeq2_1.18.1 SciServer_2.0.0 [3] recount_1.4.6 SummarizedExperiment_1.8.1 [5] DelayedArray_0.4.1 matrixStats_0.53.1 [7] Biobase_2.38.0 GenomicRanges_1.30.3 [9] GenomeInfoDb_1.14.0 IRanges_2.12.0 [11] S4Vectors_0.16.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] colorspace_1.3-2 IRdisplay_0.4.4 qvalue_2.10.0 [4] htmlTable_1.9 XVector_0.18.0 base64enc_0.1-3 [7] bit64_0.9-5 AnnotationDbi_1.40.0 xml2_1.2.0 [10] codetools_0.2-15 splines_3.4.1 geneplotter_1.56.0 [13] knitr_1.20 IRkernel_0.8.12 Formula_1.2-1 [16] jsonlite_1.5 Rsamtools_1.30.0 annotate_1.56.0 [19] cluster_2.0.6 rentrez_1.1.0 readr_1.1.1 [22] compiler_3.4.1 httr_1.3.1 backports_1.1.2 [25] assertthat_0.2.0 Matrix_1.2-14 lazyeval_0.2.1 [28] limma_3.34.9 acepack_1.4.1 htmltools_0.3.6 [31] prettyunits_1.0.2 tools_3.4.1 bindrcpp_0.2.2 [34] gtable_0.2.0 glue_1.3.0 GenomeInfoDbData_1.0.0 [37] reshape2_1.4.3 dplyr_0.7.6 doRNG_1.6.6 [40] Rcpp_0.12.17 bumphunter_1.20.0 Biostrings_2.46.0 [43] rtracklayer_1.38.3 iterators_1.0.10 stringr_1.3.1 [46] rngtools_1.3.1 XML_3.98-1.11 zlibbioc_1.24.0 [49] scales_0.5.0 BSgenome_1.46.0 VariantAnnotation_1.24.1 [52] hms_0.3 GEOquery_2.46.15 derfinderHelper_1.12.0 [55] RColorBrewer_1.1-2 curl_3.2 memoise_1.1.0 [58] gridExtra_2.3 ggplot2_3.0.0 downloader_0.4 [61] pkgmaker_0.22 biomaRt_2.34.2 rpart_4.1-13 [64] latticeExtra_0.6-28 stringi_1.2.3 RSQLite_2.0 [67] genefilter_1.60.0 foreach_1.4.4 RMySQL_0.10.13 [70] checkmate_1.8.5 GenomicFeatures_1.28.5 BiocParallel_1.12.0 [73] repr_0.15.0 rlang_0.2.1 pkgconfig_2.0.1 [76] GenomicFiles_1.14.0 bitops_1.0-6 evaluate_0.10.1 [79] lattice_0.20-34 purrr_0.2.4 bindr_0.1.1 [82] GenomicAlignments_1.14.1 htmlwidgets_1.0 bit_1.1-12 [85] tidyselect_0.2.4 plyr_1.8.4 magrittr_1.5 [88] R6_2.2.2 Hmisc_4.0-3 pbdZMQ_0.3-2 [91] DBI_1.0.0 pillar_1.2.2 foreign_0.8-67 [94] survival_2.42-6 RCurl_1.95-4.11 nnet_7.3-12 [97] tibble_1.4.2 crayon_1.3.4 derfinder_1.12.0 [100] uuid_0.1-2 progress_1.2.0 locfit_1.5-9.1 [103] grid_3.4.1 data.table_1.11.4 blob_1.1.1 [106] digest_0.6.15 xtable_1.8-2 tidyr_0.8.1 [109] munsell_0.5.0 registry_0.5