Database

1 Omics theory

  • The Central Dogma of Molecular Biology:
    • The traditional flow of genetic information was originally thought to be unidirectional: DNA to RNA to protein.
    • Francis Crick’s central dogma established the accepted flow as DNA → RNA → Protein.
    • This flow encompasses two key processes: transcription (DNA to RNA) and translation (RNA to protein).
  • Reverse Transcription:
    • A crucial modification to the central dogma is the process of reverse transcription, where RNA is transcribed back into DNA. This process is facilitated by an enzyme called reverse transcriptase.
  • Omics Theory and the Central Dogma:
    • The central dogma forms the foundation for the Omics theory, which focuses on the study of large-scale biological datasets (e.g., genomics, transcriptomics, proteomics).
  • Key Processes within the Central Dogma:
    • Replication: The process of DNA copying itself.
    • Transcription: The process of creating RNA from a DNA template.
    • Translation: The process of building a protein from an RNA template.
    • Reverse Transcription: The process of creating DNA from an RNA template.
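
The four processes above can be sketched as simple string operations. This is only an illustration: the function names are made up for this note, sequences are written 5'->3', and the codon table is deliberately partial (just enough of the standard genetic code for the example).

```python
# Illustrative sketch of the four information-transfer processes.
# All names here are hypothetical helpers, not part of any library.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Partial codon table (standard genetic code); "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def replicate(strand):
    """DNA -> DNA: return the complementary strand, written 5'->3'."""
    return strand.translate(DNA_COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.replace("T", "U")


def reverse_transcribe(rna):
    """RNA -> DNA: the DNA copy in the same sense as the RNA, T in place of U."""
    return rna.replace("U", "T")


def translate(mrna):
    """RNA -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGTTTGGCAAATAA"
mrna = transcribe(gene)
print(mrna, translate(mrna), reverse_transcribe(mrna) == gene)
```

A real translation routine would use the full 64-codon table and locate the start codon first; both are omitted here to keep the sketch short.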

2 High-throughput technologies

  • High-throughput technologies automate cell biology research, allowing for faster and more parallel studies of cell function, interaction, and disease.
  • High-throughput screening (HTS) involves testing a large number of samples with various compounds to identify those with a desired effect.
  • Examples of drugs discovered through HTS: Sorafenib (for cancer) and Maraviroc (approved for HIV and also investigated in cancer).
  • Omics research incorporates HTS to link large-scale biological data (genomics, transcriptomics, proteomics) with technology and research.
  • HTS in cell biology focuses on cells and employs methods like imaging and microarrays to study gene expression and genome-wide screening.
  • Advantages of HTS: automation for large-scale studies without compromising quality, generation of high-dimensional data sets.
  • Challenges of HTS: data collection and quality control, choosing the right method, integrating data from different platforms, managing and analyzing large datasets (the "Data Deluge"), data security and privacy concerns.
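
As a rough illustration of how a screen "identifies compounds with a desired effect", the sketch below flags wells whose readout falls far below the plate average. The z-score rule, the threshold of 2, and the compound names and readouts are all assumptions made up for this example; real screens typically use plate controls and more robust statistics.

```python
from statistics import mean, stdev


def call_hits(readouts, threshold=2.0):
    """Return compounds whose readout is at least `threshold` standard
    deviations below the plate mean (i.e. strong inhibitors).

    `readouts` maps a compound name to its assay signal.
    """
    mu = mean(readouts.values())
    sigma = stdev(readouts.values())
    return sorted(
        name for name, value in readouts.items()
        if (value - mu) / sigma <= -threshold
    )


# Hypothetical plate: nine inactive compounds and one strong inhibitor.
plate = {
    "cmpd_01": 100, "cmpd_02": 98, "cmpd_03": 102, "cmpd_04": 101,
    "cmpd_05": 99, "cmpd_06": 100, "cmpd_07": 97, "cmpd_08": 103,
    "cmpd_09": 100, "cmpd_10": 20,
}
print(call_hits(plate))
```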

3.1 What is DNA?

  • DNA Structure and Function: DNA, a double helix composed of four bases (Adenine, Thymine, Guanine, and Cytosine), carries the genetic instructions for an organism’s development and function.
  • Genomics vs. Genetics: Genomics focuses on the entire genome of an organism, while genetics studies specific genes and their functions, particularly in relation to heredity, disease, and drug response.
  • DNA Sequencing: DNA sequencing determines the order of bases in a DNA strand. Techniques like sequencing by synthesis utilize fluorescently tagged nucleotides to identify the sequence.
  • Human Genome Project: The Human Genome Project, completed in 2003, sequenced the entire human genome and made it publicly accessible. This data is crucial for studying genetic variations and disease susceptibility.
  • Functional Genome Analysis: Functional genome analysis investigates variations within the genome, including single nucleotide changes and chromosomal aberrations, to understand their impact on gene function and disease development.
  • Methods for Variant Detection: Various methods are used to detect genetic variants, including DNA microarrays and DNA sequencing techniques such as whole genome sequencing (WGS), whole exome sequencing (WES), and targeted genomic sequencing (TS).
  • Single-Cell DNA Sequencing: This method enables the sequencing of DNA from individual cells, providing insights into cellular heterogeneity and disease progression.

3.2 DNA microarray

  • A DNA microarray is a tool for identifying mutations in genes such as BRCA1 and BRCA2.
  • It works by comparing DNA from a patient with a control sample.
  • The process involves isolating DNA, denaturing it, cutting it into fragments, and tagging them with fluorescent dyes (green for patient, red for control).
  • These tagged fragments are then placed on a chip containing synthetic DNA sequences.
  • If the patient’s DNA has no mutations, both green and red tagged fragments will bind to the same sequences on the chip.
  • If the patient has a mutation, their DNA will not bind properly to the normal sequences but will bind to the sequences on the chip that correspond to the mutation.
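
The two-color readout described above can be summarized per spot as a log2 ratio of green (patient) to red (control) intensity. The sketch below assumes background-corrected intensities; the pseudocount, the cutoff of 1 (a two-fold difference), and the function names are illustrative choices, not values from the text.

```python
import math


def log2_ratio(green, red, pseudocount=1.0):
    """log2 of patient (green) over control (red) intensity for one spot.

    The pseudocount avoids division by zero for empty spots.
    """
    return math.log2((green + pseudocount) / (red + pseudocount))


def classify_spot(green, red, cutoff=1.0):
    """Label a spot by which sample bound it more strongly."""
    ratio = log2_ratio(green, red)
    if ratio >= cutoff:
        return "patient-only"   # e.g. a probe matching a mutant sequence
    if ratio <= -cutoff:
        return "control-only"   # normal probe the patient DNA failed to bind
    return "both"               # patient and control bind equally


print(classify_spot(1000, 1000))  # both samples hybridize
print(classify_spot(1000, 50))    # mostly patient signal
print(classify_spot(40, 900))     # mostly control signal
```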

3.2.1 Application of DNA microarray

Applications of DNA Microarrays:

  • Gene Expression Analysis:

    • RNA is extracted from cells and converted into labeled cDNA or cRNA.
    • Various labeling methods are used, including fluorescently labeled nucleotides and biotin-labeled nucleotides.
    • This allows for the measurement of gene expression levels across a large number of genes simultaneously.
  • Transcription Factor Binding Analysis (ChIP-chip):

    • Used to identify the DNA regions where transcription factors bind.
    • Transcription factors are cross-linked to DNA, fragmented, and isolated using antibodies.
    • The isolated DNA is then amplified, labeled, and hybridized to a microarray.
  • Genotyping:

    • Microarrays are used for SNP (Single Nucleotide Polymorphism) genotyping.
    • Methods include allele discrimination through hybridization and allele-specific primer extension.

Limitations of Microarrays:

  • Dependence on Existing Knowledge: Requires prior knowledge of the genome sequence.
  • High Background Noise: Cross-hybridization can contribute to noise.
  • Limited Dynamic Range: Background noise at the low end and signal saturation at the high end restrict the range of expression levels that can be detected.
  • Normalization Complexity: Comparing expression intensities across different experiments can be challenging.
  • Data Irregularity: Data may be less reliable for genes with low expression levels.
  • No Protein Information: Provides information about gene expression, but not protein expression or function.
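
To illustrate the normalization problem, here is a minimal sketch of quantile normalization, one widely used way to make intensity distributions comparable across arrays. It is not the Qspline method listed below, and this version assumes equal-length arrays and ignores tied values.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    `arrays` is a list of equal-length lists, one per microarray, with
    probes in the same order. Each value is replaced by the mean of the
    values holding the same rank across all arrays.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization both arrays contain the same set of values, so remaining differences between them reflect changes in probe rank rather than overall brightness.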

Bioinformatics Tools for Microarray Analysis:

  • Qspline: Used for array data normalization (Affymetrix and spotted arrays).
  • ClustArray: Used for clustering array data.
  • OligoWiz and ProbeWiz: Used for designing oligonucleotide probes and PCR primers for spotted arrays.
  • Promoter: Used to predict promoter sites in vertebrate genomes.
  • Other tools: ArrayExpress, ArrayTrack, BASE, dChip, EzArray, GeneX.
  • R Packages: affy, plier, limma, sigPathway, org.Mm.eg.db.

3.3 DNA sequencing

  • Whole Genome Sequencing (WGS) is a comprehensive method for analyzing an entire genome. It is crucial for understanding inherited disorders, identifying cancer-causing mutations, and tracking disease outbreaks.
  • WGS has become more accessible due to the rapid decrease in sequencing costs and the ability of modern sequencers to generate large datasets. This makes it a powerful tool for research in genomics, including human, plant, livestock, and microbial genomes.
  • NGS technology has revolutionized WGS, enabling high-throughput parallel sequencing of both DNA and RNA at a significantly lower cost than Sanger sequencing. This has made WGS more practical for various applications, such as metagenomics, public health surveillance, and outbreak investigations.
  • Traditional WGS methods involve fragmenting the genome into smaller pieces, sequencing each piece, and then assembling them using bioinformatics tools. The "clone-by-clone" process, used for the original human genome sequencing, involves copying fragments into bacteria to create identical clones, followed by further fragmentation and sequencing.

3.3.3 Assembly of sequencing reads

  • Assembly of sequencing reads: The process of putting together sequenced DNA fragments, known as "assembly," is crucial for reconstructing a complete genome.
  • Two main assembly methods:
    • De novo assembly: This method identifies overlapping sequences, aligns them, and reconstructs the genome without relying on a reference genome. It’s ideal for sequencing new or unknown organisms and offers less biased results.
    • Mapping to a reference genome: This method aligns new sequencing data to a pre-existing reference genome, simplifying the process but potentially introducing bias.
  • Applications of Whole Genome Sequencing (WGS):
    • Mutation frequency analysis: WGS helps determine mutation rates, revealing a significant number of new mutations per generation in the human genome.
    • Genome-wide association studies (GWAS): WGS is instrumental in GWAS, identifying relationships between genetic variants and diseases or phenotypic traits.
    • Diagnostics: WGS is used for diagnosing infectious outbreaks and neurological diseases like Alzheimer’s.
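
The de novo approach described above (find overlaps, then merge) can be sketched with a greedy overlap merge. This is a toy version under strong assumptions: error-free reads, all from the same strand, and no repeats. Production assemblers use overlap or de Bruijn graphs instead. The function names and the minimum overlap of 3 are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


reads = ["ATGGCGTG", "GCGTGCAA", "TGCAATCG"]
print(greedy_assemble(reads))
```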

3.4 Whole exome sequencing (WES)

  • Whole Exome Sequencing (WES) is a cost-effective NGS technique that sequences coding regions of the genome, which account for less than 2% of the human genome but contain approximately 85% of disease-related variants.
  • WES is used in various applications like genetic disease analysis, cancer analysis, and population genetics.
  • It has been increasingly used in clinical settings for disease diagnosis and has been instrumental in projects like the 1000 Genomes Project.
  • Targeted Gene Sequencing (TS) focuses on analyzing specific mutations in an individual’s sample by sequencing selected genes or regions known to be associated with a disease or phenotype.
  • TS offers scalability and cost-effectiveness compared to multiple separate tests.
  • Bioinformatics tools for analyzing WGS, WES, and TS data fall into the following categories:
    • Read Alignment
    • Annotation
    • Visualization
    • Data-warehousing
    • Analytics
    • AI-based Analytics

3.5 Single cell DNA-seq (sc-DNA-seq)

  • What is sc-DNA-seq? It is a single-cell DNA sequencing technique that helps study genetic heterogeneity in multicellular organisms.
  • Advantages over bulk tissue WGS: sc-DNA-seq overcomes limitations of WGS, providing higher sensitivity for detecting mutations present in low proportions.
  • Steps involved:
    1. Single cell isolation
    2. Whole genome amplification
    3. Library preparation
    4. Sequencing
    5. Data analysis
  • Applications:
    • Tumor cells: Identifying heterogeneity, mapping tumor cells, and understanding tumorigenesis and metastasis.
    • Nervous system: Studying variations in neurons, understanding brain circuits, and identifying different neuron types.
    • Reproductive and embryonic medicine: Sequencing germ and embryonic cells, aiding in diagnosis and treatment of reproductive and genetic diseases.
    • Immunology: Studying heterogeneity in immune cells for improved disease diagnosis and treatment.
    • Digestive and urinary systems: Mapping cells, understanding homeostasis, and countering pathogenic microorganisms.

4 Epigenomics

  • Epigenetics vs. Epigenomics:
    • Epigenetics refers to changes in gene activity without alterations to the DNA sequence, and these changes are passed down to daughter cells.
    • Epigenomics studies these epigenetic modifications across the entire genome.
  • Epigenomic Mechanisms:
    • DNA methylation and histone modifications are key components of epigenomics.
    • Noncoding RNAs also play a role in regulating these processes.
  • Importance of Epigenomics:
    • Epigenetic marks contribute to phenotypic variation, disease development, and responses to environmental stimuli.
  • Epigenomic Research Methods:
    • High-throughput techniques include:
      • Shotgun bisulfite sequencing and pyrosequencing (for DNA methylation analysis)
      • Genome-scale chromatin immunoprecipitation (for histone modification analysis)
    • Other methods include:
      • Differential methylation hybridization
      • Dab cluster methylation analysis of bisulfite-treated DNA
      • Base-specific cleavage combined with MALDI-TOF
  • Epigenomics in Cardiovascular Disease:
    • Research in this area is still in its early stages.
    • Studies have examined global epigenetic changes and focused on specific genes.
    • Genomic DNA in human atherosclerotic plaques shows hypomethylation.
  • Transgenerational Epigenetic Inheritance:
    • Environmental factors affecting pregnant individuals (F0) can lead to direct effects on both F1 and F2 generations through epigenetic inheritance.
    • This means that epigenetic changes in F0 can be passed down to subsequent generations.
  • Methods for Studying Epigenomics:
    • ChIP-seq
    • Whole-Genome Shotgun Bisulfite Sequencing

4.1 ChIP-seq

  • ChIP-seq is a powerful technique for mapping DNA-binding proteins throughout the genome. It involves cross-linking DNA-protein interactions, fragmenting chromatin, immunoprecipitating the target protein, and sequencing the associated DNA.
  • ChIP-seq is widely used in epigenomics projects to create reference maps of epigenetic modifications. This data is crucial for understanding how proteins regulate gene expression and contribute to biological processes and diseases.
  • ChIP-seq has applications in disease research, particularly for studying epigenetic alterations in cancer and non-cancerous diseases. It is also a key tool for developing precision medicine strategies.
  • ChIP-seq is replacing ChIP-chip technology, which used DNA microarrays for analysis. This shift is driven by the increased sensitivity and resolution of sequencing methods.
  • Specialized bioinformatics tools are used to analyze ChIP-seq data. These tools include short-read aligners (e.g., BWA, Bowtie, GSNAP) and peak callers (e.g., MACS, PeakSeq, ZINBA) that help identify regions of the genome where the target protein binds.
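
To give a feel for what a peak caller does, the sketch below counts aligned read start positions in fixed windows and reports windows where the ChIP sample is enriched over a control. This is a deliberately naive stand-in, not the algorithm used by MACS, PeakSeq, or ZINBA (which fit statistical models). It assumes both samples were sequenced to similar depth, and the window size, thresholds, and read positions are invented for the example.

```python
from collections import Counter


def window_counts(read_starts, window=100):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def call_enriched_windows(chip_starts, control_starts, window=100,
                          min_fold=4.0, min_reads=5, pseudocount=1.0):
    """Return (start, end, fold) for windows enriched in ChIP vs control."""
    chip = window_counts(chip_starts, window)
    control = window_counts(control_starts, window)
    peaks = []
    for w in sorted(chip):
        # Counter returns 0 for windows with no control reads.
        fold = (chip[w] + pseudocount) / (control[w] + pseudocount)
        if chip[w] >= min_reads and fold >= min_fold:
            peaks.append((w * window, (w + 1) * window, round(fold, 2)))
    return peaks


# Hypothetical read start positions on one chromosome.
chip_reads = [510, 515, 520, 525, 530, 540, 555, 560, 570, 590, 120, 150]
control_reads = [130, 545]
print(call_enriched_windows(chip_reads, control_reads))
```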

4.2 Whole-genome shotgun bisulfite sequencing (WGSBS)

  • Whole-genome shotgun bisulfite sequencing (WGSBS) is a method used to determine methylation levels in DNA.
  • The process involves:
    • Fragmenting the entire genome into short sequences.
    • Treating the DNA with sodium bisulfite, which converts unmethylated cytosines (C) to uracils (U) while leaving methylated cytosines unchanged.
    • Sequencing the fragments and aligning them to reconstruct the complete sequence.
  • Advantages of WGSBS:
    • Provides high resolution, allowing 5-methylcytosine (5mC) to be analyzed at the level of individual bases.
  • Disadvantages of WGSBS:
    • Expensive.
    • Requires a large amount of DNA.
    • Bisulfite treatment fragments DNA, making library preparation challenging.
  • Detailed workflow:
    • Extracting genomic DNA from tissue.
    • Fragmenting the DNA.
    • Ligation of adapters to both ends of the fragments.
    • Sodium bisulfite treatment.
    • Sequencing the fragments.
    • Aligning the fragments to reconstruct the complete sequence.
  • Analysis of methylation:
    • Methylation levels are determined at each cytosine position by counting the aligned reads that still show a C (methylated) versus those that show a T (unmethylated and converted).
    • This provides a single-nucleotide resolution map of methylation across the genome.
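
The conversion and counting steps can be sketched as follows. After bisulfite treatment and amplification, unmethylated cytosines are read as T, so the methylation level at a reference cytosine is C reads divided by (C + T) reads. The sketch assumes reads are already aligned to the forward strand with no mismatches or indels; the function names and sequences are made up for illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment: unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_levels(reference, aligned_reads):
    """Per-cytosine methylation level = C reads / (C + T reads).

    `aligned_reads` is a list of (start, sequence) pairs on the forward
    strand. Cytosines with no coverage are omitted from the result.
    """
    counts = {i: [0, 0] for i, base in enumerate(reference) if base == "C"}
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos in counts:
                if base == "C":
                    counts[pos][0] += 1   # protected, so methylated
                elif base == "T":
                    counts[pos][1] += 1   # converted, so unmethylated
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t > 0}


reference = "ACGTCCGA"  # cytosines at positions 1, 4 and 5
reads = [
    (0, bisulfite_convert(reference, {1})),  # molecule methylated at 1
    (0, bisulfite_convert(reference, {5})),  # molecule methylated at 5
]
print(methylation_levels(reference, reads))
```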

5 Transcriptomics

  • Transcriptomics studies genome-wide RNA expression.
  • Microarray technology is a common tool used in transcriptomics.
  • Transcriptomics is widely used to analyze disease mechanisms.
  • The transcriptome represents the total transcripts present at a specific stage of development.
  • Analyzing the transcriptome helps understand the functional genome and the molecular composition of cells and tissues.
  • Transcriptomics aims to identify all transcript species, including mRNAs, non-coding RNAs, and small RNAs.
  • Transcriptomics investigates the transcriptional structure of genes, including start and end sites, splicing patterns, and post-transcriptional modifications.
  • Transcriptomics assesses the varying expression levels of transcripts across different developmental stages and conditions.

5.1 RNA-seq

  • RNA-seq is a powerful tool for transcriptome analysis and gene expression quantification. It has become a significant methodology due to the advancements in Next-Generation Sequencing (NGS).
  • NGS offers high throughput, speed, and affordability, making RNA-seq highly efficient.
  • RNA-seq can be used to investigate the transcriptome of organisms, identify new genes, and discover functional genes. It also enables analysis of gene expression in various tissues and cells, and the detection of small RNAs.
  • Key advantages of RNA-seq include:
    • High Resolution: Precise identification of single bases.
    • High Throughput: Sequencing of a very large number of bases in a single run, covering the entire transcriptome.
    • High Sensitivity: Detection of rare transcripts.
    • Convenience: Applicable to various species without prior genome information or probe design.
  • RNA-seq workflow: Long RNA is converted to a library of cDNA fragments, sequenced using HTS techniques, and aligned to a reference genome or transcriptome. The reads are classified as exonic, junction, or poly(A) end reads, which are used to generate a base-resolution gene expression profile.
  • Numerous software tools are available for RNA-seq data visualization, advanced analysis, gene fusion detection, and pipeline management. Examples include expVIP, spongeScan, Cascade, omicplotR, Subread, SOAPfusion, RSEQREP, DRAP, and many more.
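
For the quantification step, one commonly used unit is transcripts per million (TPM), which corrects read counts for gene length and sequencing depth. TPM is not named in the notes above; it is shown here only as a representative normalization, with hypothetical gene names, counts, and lengths.

```python
def tpm(read_counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    1. Divide each count by the gene length in kilobases (reads per kb).
    2. Scale so that all values sum to one million.
    """
    reads_per_kb = {
        gene: read_counts[gene] / (lengths_bp[gene] / 1000)
        for gene in read_counts
    }
    scale = sum(reads_per_kb.values()) / 1_000_000
    return {gene: value / scale for gene, value in reads_per_kb.items()}


# geneB has three times the reads but is three times longer,
# so both genes end up with the same TPM.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 3000}
print(tpm(counts, lengths))
```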

6 Proteomics

  • Proteomics: The study of the proteome
    • The proteome is the complete set of proteins expressed by an organism or cell at a specific time and under defined conditions.
    • It encompasses all proteins in a cell type, an organism, or even specific sub-cellular structures like a mitochondrion or a virus.
    • It’s analogous to the genome but focuses on proteins instead of DNA.
  • The Importance of Proteomics:
    • Proteins are the "workhorses" of the cell, carrying out essential functions for cell maintenance and organismal survival.
    • Understanding the proteome provides insights into cellular processes, disease mechanisms, and potential drug targets.
  • Challenges of Proteomics:
    • Analyzing the full range of proteins expressed in a cell is complex, as there are often thousands of proteins present at once.
    • The technology for studying proteins lags behind that of DNA and RNA analysis.
    • 2D gel electrophoresis, a common technique, has limitations.
  • Ideal Proteomics Technologies:
    • High sensitivity and high throughput to analyze large numbers of proteins quickly.
    • The ability to distinguish between different protein modifications.
    • The capacity to examine all proteins in a sample.
  • Common Methods for Studying Proteomics:
    • Reverse Phase Protein Microarrays (RPPA): Used to quantify and compare protein levels in different samples.
    • Mass Spectrometry (LC-MS/MS): A powerful tool for identifying and quantifying proteins in complex mixtures.

6.1 Reverse phase protein microarrays (RPPA)

What is RPPA?

  • Quantitative Protein Analysis: RPPA is a technique for measuring protein expression levels in a high-throughput and quantitative manner.
  • Microarray Format: Tiny samples (cell lysates, bodily fluids) are placed on a microarray chip, and antibodies are used to detect specific proteins.
  • Many Samples at Once: A single array can hold hundreds to thousands of samples, all probed with the same antibody, so one protein is measured across every sample simultaneously.
  • Multiple Proteins: Replicate arrays printed with the same samples are probed with different antibodies to measure multiple proteins from the same sample set.

Applications of RPPA:

  • Disease Pathway Analysis: RPPA is used to identify deregulated signaling pathways in diseased tissues.
  • Cancer Research: It helps researchers understand the molecular mechanisms of cancer development and metastasis, potentially leading to new treatments.
  • Biomarker Discovery: RPPA is used to identify proteins that are associated with specific diseases, potentially serving as biomarkers for diagnosis and prognosis.
  • Pre-analytical Optimization: RPPA can be used to optimize tissue handling procedures to ensure accurate and reliable protein measurements.

RPPA Data Analysis:

  • Software Tools: Various bioinformatics tools are available for analyzing RPPA data, such as RPPAware, SuperCurve, NormaCurve, and more.

Key Points:

  • RPPA allows for the simultaneous analysis of numerous proteins in different biological conditions.
  • It plays a vital role in disease research, biomarker discovery, and pre-analytical optimization.
  • Specialized bioinformatics tools are crucial for analyzing RPPA data effectively.

7 Metabolomics

  • Metabolomics: The study of small molecules (metabolites) in cells, tissues, and biofluids.
  • Metabolome: The complete set of metabolites and their interactions within a living organism.
  • Importance of Metabolomics: Metabolomics provides direct insight into biochemical activities within cells and tissues, offering a unique understanding of molecular phenotypes.
  • Challenges of Metabolomics:
    • Diversity of Metabolites: The metabolome includes a wide range of molecules with varying properties, requiring specialized techniques for analysis.
    • Lack of Standardization: Different labs often use unique methods, making it difficult to compare results across studies.
  • Efforts for Standardization: Initiatives are underway to address these challenges, including:
    • Metabolomics Standards Initiative: Aims to create guidelines for data reporting.
    • Ring Tests: Used to evaluate the consistency of different metabolomics methods and labs.
    • Data Repositories: Provide centralized storage for metabolomics data and metadata, such as MetaboLights in Europe and Metabolomics Workbench in the US.

7.1 Different methods for studying metabolomics

  • Mass spectrometry (MS) is a powerful tool for studying both proteomics and metabolomics. It can be used to identify and quantify proteins and metabolites.
  • In proteomics, MS can be used to classify protein expression, characterize protein interactions, and identify sites of protein modification.
  • Two main approaches for analyzing proteins using MS are bottom-up and top-down. Bottom-up involves digesting proteins into peptides, while top-down involves analyzing intact proteins.
  • A variety of software tools are available for analyzing MS data, covering de novo sequencing, database searching, peptide identification, spectral library searching, and protein quantification.
  • In metabolomics, MS can be used to identify and quantify small molecules, such as metabolites.
  • A wide range of software tools is available for analyzing metabolomic MS data, including metaXCMS, XCMS, XCMS2, MeDDL, MetAlign, MAVEN, centWave, MZmine 2, MetabolomeExpress, Chromaligner, and many more.
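
The digestion step of the bottom-up approach mentioned above can be simulated in silico. The sketch assumes trypsin as the protease (the notes do not name one) and uses the usual simplified rule: cleave after lysine (K) or arginine (R) unless the next residue is proline (P). Missed cleavages are ignored, and the example sequence is invented.

```python
def tryptic_digest(protein):
    """Split a protein sequence into peptides using the simplified
    trypsin rule: cut after K or R, but not when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


print(tryptic_digest("AKRPGKLM"))
```

In a real workflow the resulting peptides would then be matched against measured spectra by a database search tool; that step is outside this sketch.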