Linux is the most commonly used operating system, alongside the R programming language and IDEs and editors such as RStudio, Jupyter Notebook, and Vim.
Galaxy is the primary user-friendly option for analyzing Next Generation Sequencing (NGS) data.
DNAnexus and AIR (artificial intelligence-based RNA-seq) are cloud-based platforms with graphical user interfaces and subscription-based access.
Cloud services like Amazon Web Services and Microsoft Azure can handle massive genetic datasets but are largely command-line driven.
10.1 Downstream analysis and visualization
Downstream analysis is performed after the significantly (differentially) expressed genes in a study have been identified.
This analysis provides biological context for the identified genes.
Two main types of downstream analysis are done on transcriptome datasets.
11 Gene ontology & pathway analysis
Gene Ontology (GO) analysis helps understand how genes function in a specific biological system.
GO classifies genes into three categories:
Cellular components
Molecular functions
Biological processes
Gene Enrichment Analysis (GEA) is a type of GO study that can be categorized into three groups: singular enrichment analysis (SEA), gene set enrichment analysis (GSEA), and modular enrichment analysis (MEA).
11.1 Singular enrichment analysis (SEA)
Singular Enrichment Analysis (SEA): This technique examines gene lists from high-throughput experiments (like NGS or microarrays) to identify functional categories that are over-represented within those lists.
Input Data: The input data for SEA is typically a set of genes, often defined by the user based on specific criteria.
Classification: The genes are categorized into the three major GO categories: cellular components, molecular functions, and biological processes.
Statistical Analysis: SEA uses statistical methods like Fisher’s exact test, EASE score, or Chi-square test to determine if there’s a significant association between the genes and their functional classifications.
Tools: Commonly used tools for SEA include DAVID, GOstat, and BiNGO.
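The statistical step in SEA can be illustrated with a minimal sketch of a one-sided Fisher's exact test, written here as a hypergeometric upper-tail probability. The function name, its parameters, and the example counts are hypothetical and chosen for illustration only; this is not the implementation used by DAVID, GOstat, or BiNGO, which also apply their own corrections for multiple testing.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_in_background, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns the probability of observing at least `hits_in_list` annotated
    genes when `list_size` genes are drawn at random, without replacement,
    from a background of `background_size` genes of which
    `hits_in_background` carry the annotation.
    """
    max_hits = min(list_size, hits_in_background)
    total = comb(background_size, list_size)
    not_annotated = background_size - hits_in_background
    # Sum the hypergeometric probabilities for k = observed hits .. maximum hits.
    tail = sum(
        comb(hits_in_background, k) * comb(not_annotated, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, 5 of them annotated with a
# GO term; the user's list has 5 genes, 3 of which carry the term.
p = enrichment_p_value(hits_in_list=3, list_size=5,
                       hits_in_background=5, background_size=20)
print(f"enrichment p-value: {p:.4f}")
```

In practice one such test is run per functional category, so the resulting p-values are normally adjusted for multiple comparisons before any category is called over-represented.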
11.2 Gene set enrichment analysis (GSEA)
GSEA utilizes all genes from a high-throughput study rather than a preselected subset. This avoids the bias that an arbitrary cutoff can introduce when a gene list is chosen.
GSEA can analyze genes with small differential expression. Because no cutoff is applied, these genes still contribute, which allows for a more comprehensive analysis.
GSEA calculates a Maximum Enrichment Score (MES) for each annotation class from the rank order of the genes belonging to that class. This score indicates the level of enrichment.
The p-value is determined by comparing the MES with the values expected by chance, for example the scores obtained from randomly shuffled data. This statistical test assesses the significance of the enrichment.
Tools like ErmineJ and FatiScan can be used for GSEA. These tools provide the necessary analysis and interpretation.
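The ranking-based scoring described above can be sketched as a simplified, unweighted running-sum statistic with a permutation p-value. This is an illustrative assumption about how such a score can be computed, not the exact procedure of ErmineJ or FatiScan; the function names, gene identifiers, and the default of 1000 permutations are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of an unweighted running sum.

    Walking down the ranked list, the sum rises by 1/n_hit for a gene in
    `gene_set` and falls by 1/n_miss for any other gene.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores from shuffled rankings."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point rounding in the sums.
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical ranking (most up-regulated first) and gene set.
ranking = [f"g{i}" for i in range(1, 11)]
score, p = permutation_p_value(ranking, {"g1", "g2", "g3"})
print(f"enrichment score: {score:.3f}, permutation p-value: {p:.4f}")
```

Because every gene in the ranking contributes to the running sum, no expression cutoff is needed, which is the property that distinguishes this family of methods from SEA.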
11.3 Modular enrichment analysis (MEA)
Modular enrichment analysis (MEA) combines SEA-type enrichment analysis with network discovery techniques that take term-to-term relationships into account.
MEA uses Kappa statistics to measure the agreement between annotation terms; genes that lack strong relationships to neighboring terms may be left out of the analysis.
Platforms like ADGO, DAVID, and GeneCodis can perform MEA.
MEA integrates information from various domains, such as KEGG for pathway assessment, Pfam for protein domains, and TRANSFAC for transcriptional regulation.
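The Kappa agreement measure mentioned above can be sketched as Cohen's kappa computed over the gene membership of two annotation terms. This is a minimal illustration under the assumption that each gene is simply annotated or not annotated with each term; the function name and the example gene and term sets are hypothetical, and the way ADGO, DAVID, or GeneCodis group terms in practice is not shown.

```python
def term_kappa(genes, term_a, term_b):
    """Cohen's kappa for the agreement between two annotation terms.

    `genes` is the full list of genes under study; `term_a` and `term_b`
    are the sets of genes annotated with each term.
    """
    n = len(genes)
    in_a = sum(1 for g in genes if g in term_a)
    in_b = sum(1 for g in genes if g in term_b)
    both = sum(1 for g in genes if g in term_a and g in term_b)
    neither = sum(1 for g in genes if g not in term_a and g not in term_b)
    # Observed agreement: genes on which the two terms give the same answer.
    observed = (both + neither) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = (in_a * in_b + (n - in_a) * (n - in_b)) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: two terms sharing three of their four genes.
genes = [f"g{i}" for i in range(1, 11)]
term_a = {"g1", "g2", "g3", "g4"}
term_b = {"g1", "g2", "g3", "g5"}
print(f"kappa: {term_kappa(genes, term_a, term_b):.3f}")
```

Terms with a high kappa share most of their genes and are candidates for being grouped into one module, while a kappa near zero indicates agreement no better than chance.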
11.4 Correlation networks
Correlation Networks: Analyzing gene lists for statistically significant associations is crucial for understanding gene interactions.
GeneMANIA: This program offers detailed information on gene interactions, including co-expression, co-localization, and physical interactions, providing a more comprehensive view than gene function alone.
BioGRID: A repository of genetic, chemical, and protein-protein interaction data, updated regularly based on published findings.
STRING: A database focusing on protein-protein interactions, aiding in understanding how the proteins encoded by genes work together.
WGCNA: An R package that uses microarray or RNA-sequencing data to construct correlation networks between genes in a specific study.
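The idea behind a weighted correlation network can be sketched in a few lines: compute the Pearson correlation between every pair of expression profiles and raise its absolute value to a power so that weak correlations are suppressed. This is a simplified illustration in Python, not the WGCNA package itself; the function names, the expression values, and the power of 6 are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(expression, beta=6):
    """Unsigned weighted adjacency |r| ** beta for every gene pair.

    `expression` maps a gene name to its expression values across samples.
    """
    genes = sorted(expression)
    adjacency = {}
    for i, gene_1 in enumerate(genes):
        for gene_2 in genes[i + 1:]:
            r = pearson(expression[gene_1], expression[gene_2])
            adjacency[(gene_1, gene_2)] = abs(r) ** beta
    return adjacency


# Hypothetical expression values for three genes across five samples.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 3, 4, 1, 2],
}
for pair, weight in soft_threshold_adjacency(expression).items():
    print(pair, round(weight, 4))
```

Taking the absolute value treats strong negative and strong positive correlations alike; a signed network would need a different transformation before the power is applied.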
12 Future prospects and conclusion
Statisticians are crucial to bioinformatics: They develop advanced models and analysis methods to extract meaningful biological insights from vast genomic data.
Integrative research is essential: Combining data from multiple systems is key to a deeper understanding of cellular biology, requiring innovative approaches that balance statistical rigor, scalability, and interpretability.
Statistical evaluation of omics studies is challenging: Agreement on the best methods is difficult, requiring further research to validate techniques and improve data integration.
Combining clinical and genetic information is a major challenge: Fully evaluating hypotheses and making results useful to the public remains a significant hurdle in bioinformatics.
Statistics has a unique opportunity: By providing researchers with tools to analyze large datasets, statistics can significantly contribute to scientific progress in bioscience and healthcare.