GenXPro Bioinformatics

omiRas Workflow

omiRas is a free web service for analyzing non-coding RNA (ncRNA) datasets from replicated Illumina/Solexa small RNA sequencing (RNA-Seq) experiments that compare two conditions (e.g. cancer and adjacent control tissue from several patients). Starting from raw sequencing data, omiRas offers an efficient way to quantify ncRNA expression in each library, to analyze differential expression of ncRNAs, to identify novel microRNAs (miRNAs) and to interactively assign functions to differentially expressed miRNAs. Combining miRNA-mRNA interaction databases with protein-protein interaction information allows the user to construct interaction networks of miRNAs and mRNAs of interest and to identify miRNAs with important implications in the development of differential gene signatures.

Overview

In steps (1-5), read (pre-)processing and ncRNA quantification are performed for each library independently. These independent results are then combined and used for differential expression analysis (6), interactive network analysis (7) and prediction of novel miRNAs (8+9).

(1) Raw Data pre-processing

The 3' sequencing adapter is removed from the reads in the FASTQ files by a local alignment of the adapter to each sequenced read. In addition, the low-quality region marked by Illumina is trimmed.
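
As a rough illustration of 3' adapter removal, the sketch below scans each read for the first position at which the adapter, or a prefix of it at the read's end, matches with few mismatches. It is a simplified, ungapped stand-in for the local alignment described above and not the omiRas implementation; the function name, the mismatch rate, the minimum overlap and the adapter sequence in the usage line are assumptions chosen for the example.

```python
def trim_adapter(read, adapter, max_mismatch_rate=0.1, min_overlap=5):
    """Return the read with the 3' adapter (or a trailing adapter prefix) removed.

    Assumption: an ungapped comparison is used here instead of a true
    local alignment, so insertions and deletions are not handled.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare the read suffix beginning at `start` with the adapter prefix.
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    # No acceptable adapter match: leave the read untouched.
    return read


# Hypothetical adapter sequence, used only for this example.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
trimmed = trim_adapter("GAGGTAGTAGGTTGTA" + ADAPTER, ADAPTER)
```

A read that ends in only the first few adapter bases is trimmed as well, provided at least min_overlap bases of the adapter are present.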

(2) TAG quantification

The reads in each library are summarized to tags in a quantified FASTA format, e.g.
>tag1_49862
GAGGTAGTAGGTTGTA
if the sequence GAGGTAGTAGGTTGTA occurs 49862 times within the FASTQ library.
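
The collapsing step can be sketched in a few lines of Python. This is a minimal illustration, assuming four-line FASTQ records and assuming that tags are numbered by decreasing abundance (the source only shows the tag<number>_<count> header pattern); the function names are hypothetical.

```python
from collections import Counter


def collapse_reads(fastq_lines):
    """Collapse FASTQ records into (sequence, count) pairs, most abundant first."""
    # In four-line FASTQ records the sequence is the second line of each record.
    sequences = (line.strip() for i, line in enumerate(fastq_lines) if i % 4 == 1)
    return Counter(sequences).most_common()


def to_quantified_fasta(tag_counts):
    """Format tags as FASTA records with headers of the form tag<rank>_<count>."""
    return "".join(
        f">tag{rank}_{count}\n{seq}\n"
        for rank, (seq, count) in enumerate(tag_counts, start=1)
    )


# Tiny made-up library: one sequence seen twice, another seen once.
example_fastq = [
    "@r1", "GAGGTAGTAGGTTGTA", "+", "IIIIIIIIIIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "GAGGTAGTAGGTTGTA", "+", "IIIIIIIIIIIIIIII",
]
fasta_text = to_quantified_fasta(collapse_reads(example_fastq))
```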

(3) Mapping

The reads in the FASTA file are mapped to the genome of the organism under consideration with Bowtie, allowing a maximum of two mismatches and reporting only mappings in the best alignment stratum (if a read has multiple mapping loci, only the loci with the fewest alignment mismatches are reported).
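
The best-stratum rule can be illustrated independently of the aligner. The sketch below is not how Bowtie is invoked by omiRas (Bowtie applies this filter itself during mapping); it only shows the selection logic on a hypothetical list of (read id, locus, mismatches) tuples.

```python
def best_stratum(alignments, max_mismatches=2):
    """Keep, per read, only the loci with the fewest mismatches.

    `alignments` is an iterable of (read_id, locus, mismatches) tuples;
    this input layout is an assumption made for the example.
    """
    best = {}
    for read_id, locus, mismatches in alignments:
        if mismatches > max_mismatches:
            continue  # exceeds the allowed number of mismatches
        current = best.get(read_id)
        if current is None or mismatches < current[0]:
            # A better stratum was found: discard loci from worse strata.
            best[read_id] = (mismatches, [locus])
        elif mismatches == current[0]:
            current[1].append(locus)
    return {read_id: loci for read_id, (_, loci) in best.items()}


# Made-up alignments: tag1 has two perfect hits and one 1-mismatch hit.
example_alignments = [
    ("tag1", "chr1:100", 1),
    ("tag1", "chr2:50", 0),
    ("tag1", "chr3:7", 0),
    ("tag2", "chr1:5", 3),
    ("tag3", "chr4:9", 2),
]
reported = best_stratum(example_alignments)
```

In this example tag1 keeps only its two perfect hits, tag2 is dropped because its single hit has three mismatches, and tag3 keeps its single two-mismatch hit.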

(4) Annotation and Normalization

For each mapping locus, annotations are derived from several ncRNA databases. To take multiple mappings into account, the number of reads for each tag is normalized by the number of its mapping loci.
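
A minimal sketch of this normalization follows, assuming that "normalized" means dividing a tag's read count evenly among its mapping loci; that reading, the function name and the data layout are assumptions rather than details confirmed by the source.

```python
def normalize_by_loci(tag_counts, tag_loci):
    """Distribute each tag's read count evenly across its mapping loci.

    tag_counts: {tag: read count}; tag_loci: {tag: [locus, ...]}.
    Returns {locus: summed normalized count}. Unmapped tags contribute nothing.
    """
    per_locus = {}
    for tag, count in tag_counts.items():
        loci = tag_loci.get(tag, [])
        for locus in loci:
            # Each locus receives an equal share of the tag's reads.
            per_locus[locus] = per_locus.get(locus, 0.0) + count / len(loci)
    return per_locus


# Made-up numbers: tag1 maps to two loci, tag2 to one of them.
normalized = normalize_by_loci(
    {"tag1": 100, "tag2": 30},
    {"tag1": ["locusA", "locusB"], "tag2": ["locusB"]},
)
```

Here the 100 reads of tag1 are split into 50 per locus, so locusB ends up with 50 + 30 = 80 normalized reads.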

(5) Quantification results

(5.1) The quantification results are summarized in a table together with statistics about the mapping and the sequence length distribution.
(5.2) A FASTA file of tags overlapping intronic regions of coding genes or intergenic regions is created for the prediction of potentially novel microRNAs.

(6) Combination of expression results and differential expression

The expression results for all libraries are combined, and a test for differential expression is performed independently for each type of ncRNA (snoRNA, tRNA, rRNA, miRNA, scRNA) using DESeq.

(7) Interactive network visualization of microRNAs

MicroRNAs and genes selected by the user can be visualized in an interaction network. Connections between microRNAs and genes are determined by the intersection of several miRNA-mRNA interaction databases. Connections between genes are based on information from the STRING database of protein-protein interactions.
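
The edge selection described above can be sketched with plain set operations. This is an illustration under assumptions: each database is represented as a collection of (source, target) pairs, and all identifiers below are placeholders, not real interaction data.

```python
def consensus_edges(*databases):
    """Return the miRNA-mRNA pairs that are present in every supplied database."""
    edge_sets = [set(db) for db in databases]
    return set.intersection(*edge_sets) if edge_sets else set()


def build_network(mirna_targets, gene_gene, selected):
    """Keep only edges whose two endpoints were both selected by the user."""
    edges = set(mirna_targets) | set(gene_gene)
    return {(a, b) for a, b in edges if a in selected and b in selected}


# Placeholder entries standing in for two miRNA-mRNA databases and for
# protein-protein interactions; none of these pairs are real data.
db_one = [("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE3")]
db_two = [("miR-A", "GENE1"), ("miR-B", "GENE3"), ("miR-B", "GENE1")]
ppi = [("GENE1", "GENE3"), ("GENE2", "GENE3")]

targets = consensus_edges(db_one, db_two)
network = build_network(targets, ppi, selected={"miR-A", "GENE1", "GENE3"})
```

Taking the intersection keeps only interactions on which all databases agree, which trades sensitivity for fewer false-positive edges.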

(8) Prediction of novel miRNAs

The FASTA file generated in (5.2) is used to predict novel miRNAs with miRDeep. Secondary structures are visualized with RNAfold.

(9) Differential expression of novel miRNAs

A test for differential expression of novel microRNAs is performed with DESeq.