recount quick start

Follows this guide: http://leekgroup.github.io/recount/recount-quickstart.html

In [1]:
## Load libraries
library('recount')
library('SummarizedExperiment')
library('SciServer')
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: 'matrixStats'

The following objects are masked from 'package:Biobase':

    anyMissing, rowMedians


Attaching package: 'DelayedArray'

The following objects are masked from 'package:matrixStats':

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following object is masked from 'package:base':

    apply

Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
In [2]:
## Locally-mounted directory where recount resource resides
recount_dir <- '/home/idies/workspace/recount01'
In [3]:
## Find a project of interest
project_info <- abstract_search('GSE32465')
project_info
number_samplesspeciesabstractproject
34012 human Summary: K562-shX cells are made in an effort to validate TFBS data and ChIP-seq antibodies in Myers lab (GSE32465). K562 cells are transduced with lentiviral vector having Tet-inducible shRNA targeting a transcription factor gene. Cells with stable integration of shRNA constructs are selected using puromycin in growth media. Doxycycline is added to the growth media to induce the expression of shRNA and a red fluorescent protein marker. A successful shRNA cell line shows at least a 70% reduction in expression of the target transcription factor as measured by qPCR. For identification, we designated these cell lines as K562-shX, where X is the transcription factor targeted by shRNA and K562 denotes the parent cell line. For example, K562-shATF3 cells are K562 derived cells selected for stable integration of shRNA targeting the transcription factor ATF3 gene and showed at least a 70% reduction in the expression of ATF3 gene when measured by qPCR. Cells growing without doxycycline (uninduced) are used as a control to measure the change in expression of target transcription factor gene after induction of shRNA using doxycycline. For detailed growth and culturing protocols for these cells please refer to http://hudsonalpha.org/myers-lab/protocols . To identify the potential downstream targets of the candidate transcription factor, analyze the mRNA expression profile of the uninduced and induced K562-shX using RNA-seq. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Overall Design: Make K562-shX cells as described in the http://hudsonalpha.org/myers-lab/protocols . Measure the mRNA expression levels in uninduced K562-shX and induced K562-shX cells in two biological replicates using RNA-seq. Identify the potential downstream targets of the candidate transcription factor.SRP009615
In [4]:
study <- project_info$project  # SRP009615
study_dir <- file.path(recount_dir, study)
In [5]:
## Load the data
In [6]:
load(file.path(study_dir, 'rse_gene.Rdata'))
In [7]:
## Find the GEO accession ids
# (NCBI connection closed can happen error here sometimes)
geoids <- sapply(colData(rse_gene)$run, find_geo)
In [9]:
library(SciServer)
In [13]:
username = Authentication.getKeystoneUserWithToken(Authentication.getToken())$userName
In [14]:
## Get the sammple information from GEO
geoinfo <- lapply(geoids, function(x) {
    geo_info(x, destdir=paste0('/home/idies/workspace/Temporary/', username, '/scratch')) })
Using locally cached version of GSM836270 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM836270.soft 
Using locally cached version of GSM836271 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM836271.soft 
Using locally cached version of GSM836272 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM836272.soft 
Using locally cached version of GSM836273 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM836273.soft 
Using locally cached version of GSM847561 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847561.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847562 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847562.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847563 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847563.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847564 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847564.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847565 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847565.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847566 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847566.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847567 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847567.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"Using locally cached version of GSM847568 found here:
/home/idies/workspace/Temporary/langmead/scratch/GSM847568.soft 
Warning message in `[<-`(`*tmp*`, i, value = IRanges::CharacterList(unlist(unname(result[i])))):
"implicit list embedding of S4 objects is deprecated"
In [15]:
## Extract the sample characteristics
geochar <- lapply(geoinfo, geo_characteristics)
In [16]:
## Note that the information for this study is a little inconsistent, so we
## have to fix it.
geochar <- do.call(rbind, lapply(geochar, function(x) {
    if('cells' %in% colnames(x)) {
        colnames(x)[colnames(x) == 'cells'] <- 'cell.line'
        return(x)
    } else {
        return(x)
    }
}))
In [17]:
## We can now define some sample information to use
sample_info <- data.frame(
    run = colData(rse_gene)$run,
    group = sapply(geoinfo, function(x) { ifelse(grepl('uninduced', x$title),
        'uninduced', 'induced') }),
    gene_target = sapply(geoinfo, function(x) { strsplit(strsplit(x$title,
        'targeting ')[[1]][2], ' gene')[[1]][1] })
)
In [18]:
## Scale counts by taking into account the total coverage per sample
rse <- scale_counts(rse_gene)
In [19]:
## Add sample information for DE analysis
colData(rse)$group <- sample_info$group
colData(rse)$gene_target <- sample_info$gene_target
In [20]:
## Perform differential gene expression analysis with DESeq2
library('DESeq2')

## Specify design and switch to DESeq2 format
dds <- DESeqDataSet(rse, ~ gene_target + group)

## Perform DE analysis
dds <- DESeq(dds, test = 'LRT', reduced = ~ gene_target, fitType = 'local', parallel=T)
res <- results(dds)
converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates: 14 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 14 workers
Error in vector(mode(x), length(NArows)): vector: cannot make a vector of mode 'S4'.
Traceback:

1. DESeq(dds, test = "LRT", reduced = ~gene_target, fitType = "local", 
 .     parallel = T)
2. DESeqParallel(object, test = test, fitType = fitType, betaPrior = betaPrior, 
 .     full = full, reduced = reduced, quiet = quiet, modelMatrix = modelMatrix, 
 .     modelMatrixType = modelMatrixType, BPPARAM = BPPARAM)
3. buildDataFrameWithNARows(mcols(objectNZ), mcols(object)$allZero)
4. DataFrame(lapply(resultsList, function(x) vector(mode(x), length(NArows))))
5. lapply(resultsList, function(x) vector(mode(x), length(NArows)))
6. lapply(resultsList, function(x) vector(mode(x), length(NArows)))
7. lapply(as.list(X), FUN = FUN, ...)
8. lapply(as.list(X), FUN = FUN, ...)
9. FUN(X[[i]], ...)
In [21]:
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)

Matrix products: default
BLAS: /home/idies/miniconda3/lib/R/lib/libRblas.so
LAPACK: /home/idies/miniconda3/lib/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DESeq2_1.18.1              SciServer_2.0.0           
 [3] recount_1.4.6              SummarizedExperiment_1.8.1
 [5] DelayedArray_0.4.1         matrixStats_0.53.1        
 [7] Biobase_2.38.0             GenomicRanges_1.30.3      
 [9] GenomeInfoDb_1.14.0        IRanges_2.12.0            
[11] S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2         IRdisplay_0.4.4          qvalue_2.10.0           
  [4] htmlTable_1.9            XVector_0.18.0           base64enc_0.1-3         
  [7] bit64_0.9-5              AnnotationDbi_1.40.0     xml2_1.2.0              
 [10] codetools_0.2-15         splines_3.4.1            geneplotter_1.56.0      
 [13] knitr_1.20               IRkernel_0.8.12          Formula_1.2-1           
 [16] jsonlite_1.5             Rsamtools_1.30.0         annotate_1.56.0         
 [19] cluster_2.0.6            rentrez_1.1.0            readr_1.1.1             
 [22] compiler_3.4.1           httr_1.3.1               backports_1.1.2         
 [25] assertthat_0.2.0         Matrix_1.2-14            lazyeval_0.2.1          
 [28] limma_3.34.9             acepack_1.4.1            htmltools_0.3.6         
 [31] prettyunits_1.0.2        tools_3.4.1              bindrcpp_0.2.2          
 [34] gtable_0.2.0             glue_1.3.0               GenomeInfoDbData_1.0.0  
 [37] reshape2_1.4.3           dplyr_0.7.6              doRNG_1.6.6             
 [40] Rcpp_0.12.17             bumphunter_1.20.0        Biostrings_2.46.0       
 [43] rtracklayer_1.38.3       iterators_1.0.10         stringr_1.3.1           
 [46] rngtools_1.3.1           XML_3.98-1.11            zlibbioc_1.24.0         
 [49] scales_0.5.0             BSgenome_1.46.0          VariantAnnotation_1.24.1
 [52] hms_0.3                  GEOquery_2.46.15         derfinderHelper_1.12.0  
 [55] RColorBrewer_1.1-2       curl_3.2                 memoise_1.1.0           
 [58] gridExtra_2.3            ggplot2_3.0.0            downloader_0.4          
 [61] pkgmaker_0.22            biomaRt_2.34.2           rpart_4.1-13            
 [64] latticeExtra_0.6-28      stringi_1.2.3            RSQLite_2.0             
 [67] genefilter_1.60.0        foreach_1.4.4            RMySQL_0.10.13          
 [70] checkmate_1.8.5          GenomicFeatures_1.28.5   BiocParallel_1.12.0     
 [73] repr_0.15.0              rlang_0.2.1              pkgconfig_2.0.1         
 [76] GenomicFiles_1.14.0      bitops_1.0-6             evaluate_0.10.1         
 [79] lattice_0.20-34          purrr_0.2.4              bindr_0.1.1             
 [82] GenomicAlignments_1.14.1 htmlwidgets_1.0          bit_1.1-12              
 [85] tidyselect_0.2.4         plyr_1.8.4               magrittr_1.5            
 [88] R6_2.2.2                 Hmisc_4.0-3              pbdZMQ_0.3-2            
 [91] DBI_1.0.0                pillar_1.2.2             foreign_0.8-67          
 [94] survival_2.42-6          RCurl_1.95-4.11          nnet_7.3-12             
 [97] tibble_1.4.2             crayon_1.3.4             derfinder_1.12.0        
[100] uuid_0.1-2               progress_1.2.0           locfit_1.5-9.1          
[103] grid_3.4.1               data.table_1.11.4        blob_1.1.1              
[106] digest_0.6.15            xtable_1.8-2             tidyr_0.8.1             
[109] munsell_0.5.0            registry_0.5