
Systems & Networks

.1 Introduction

  • Biological Complexity and the Genome: Biological systems are incredibly complex, with multiple levels of organization. The genome encodes the instructions for building the cell's molecules and carrying out its essential processes.
  • Advancements from the Human Genome Project: The completion of the Human Genome Project propelled genomic research forward, leading to the development of high-throughput techniques for analyzing vast amounts of biological data.
  • Systems Biology Approach: This approach focuses on understanding the interactions and relationships between components within a biological system, using computational modeling and visualization techniques.
  • Discovery Science and High-Throughput Techniques: The Human Genome Project ushered in a new era of "discovery science," utilizing high-throughput techniques like microarrays and protein chips to analyze cellular components and their interactions.
  • Complexity of Cellular Interactions: Biological functions rarely arise from single molecules but are often the result of complex interactions between various cellular components like RNA, DNA, proteins, and other molecules. This intricate network of interactions presents a challenge for biologists.

.2 Network theory

  • Network theory studies relationships between objects using graphs.
  • It has applications in various fields, including physics, biology, computer science, and finance.
  • Examples of networks include social networks, the internet, and metabolic networks.
  • Cells contain various interacting networks, forming a "network of networks" that governs cell behavior.
  • A challenge is integrating theoretical and experimental data to understand and model these networks.
  • Complex network theory helps explain how social and technological networks form and evolve.
  • The architecture of cellular networks resembles that of other complex systems, such as society, the internet, and computer chips.
  • This similarity suggests that common organizing principles may govern many different complex networks.

.3 Graph theory

  • Graph theory was introduced by Leonhard Euler in 1736 to solve the Königsberg bridge problem.
  • The city of Königsberg had seven bridges connecting four landmasses (two islands and the two banks of the Pregel River), and the question was whether a walk existed that crossed each bridge exactly once.
  • Euler focused on the connections between landmasses, not the distances or shapes of the paths.
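  • Treating each landmass as a vertex and each bridge as an edge, he showed that such a walk is possible only if at most two landmasses are touched by an odd number of bridges. In Königsberg all four were, so the walk was impossible.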
  • Graph theory studies graphs, which are models of pairwise relationships between objects.
  • Graphs consist of vertices (nodes) connected by edges (links).
  • Networks can represent diverse data, including genes, proteins, and their relationships.
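Euler's argument can be checked mechanically. The sketch below is a minimal Python illustration using the commonly cited layout (the central island linked to each bank by two bridges and to the second island by one; the second island linked to each bank by one). The landmass labels A to D are arbitrary. It counts bridge ends per landmass and applies the odd-degree criterion, which presumes the graph is connected, as Königsberg's was.

```python
from collections import Counter

# Königsberg as a multigraph: 4 landmasses (vertices), 7 bridges (edges).
# A = central island, D = second island, B/C = the two river banks (labels are arbitrary).
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the landmasses touched by an odd number of bridges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """Euler's criterion for a connected graph: a walk that uses every
    edge exactly once exists if and only if 0 or 2 vertices have odd degree."""
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))    # False
```

Removing the last bridge in the list (C to D) leaves exactly two odd landmasses, A and B, so `has_euler_walk(BRIDGES[:-1])` returns True.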

.4.1 The various types of network edges

  • Types of Network Edges: There are three primary types of edges in network analysis:
    • Undirected Edges: Represent relationships without direction, indicating a connection between nodes. (e.g., protein-protein interactions)
    • Directed Edges: Represent relationships with direction, showing the flow of signals. (e.g., gene expression regulation)
    • Weighted Edges: Can be applied to both undirected and directed edges to represent values like sequence similarity or interaction reliability.
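The three edge types map naturally onto simple data structures. In this minimal sketch (node names and scores are hypothetical), an undirected edge is an unordered pair, a directed edge is an ordered pair, and a weighted edge is an edge mapped to a number.

```python
# Undirected edge: an unordered pair, so P1-P2 and P2-P1 are the same edge.
ppi = {frozenset({"P1", "P2"}), frozenset({"P2", "P1"}), frozenset({"P2", "P3"})}

# Directed edge: an ordered (source, target) pair, so TF1->G1 differs from G1->TF1.
regulation = {("TF1", "G1"), ("G1", "TF1"), ("TF1", "G2")}

# Weighted edge: an edge mapped to a value, here a hypothetical reliability score.
weighted_ppi = {
    frozenset({"P1", "P2"}): 0.9,
    frozenset({"P2", "P3"}): 0.4,
}


def undirected_neighbors(node, edges):
    """All partners that share an undirected edge with `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}


def successors(node, edges):
    """Targets that `node` points to along directed edges."""
    return {target for source, target in edges if source == node}


def reliable_edges(weighted, threshold):
    """Keep only edges whose weight reaches `threshold`."""
    return {edge for edge, weight in weighted.items() if weight >= threshold}


print(len(ppi))                                 # 2 (P1-P2 was stored once)
print(sorted(undirected_neighbors("P2", ppi)))  # ['P1', 'P3']
print(sorted(successors("TF1", regulation)))    # ['G1', 'G2']
```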

.4.2 Network measures

  • Degree (k): The number of connections (edges) a node has. In directed networks, a node has an incoming degree (k_in) and an outgoing degree (k_out).
  • Degree Distribution [P(k)]: The probability that a randomly chosen node has exactly k connections.
  • Scale-Free Networks: These networks have a degree distribution that follows a power law, P(k) ~ k^(-γ), meaning a few "hubs" have many connections while most nodes have few connections.
  • Path Length: The number of edges along a path between two nodes; usually the shortest such path is meant.
  • Mean Path Length (⟨l⟩): The average shortest path length over all pairs of nodes, reflecting how easily the network can be navigated.
  • Clustering Coefficient: For a node, the fraction of pairs of its neighbors that are themselves connected. High values indicate densely interconnected groups of nodes ("clusters").
  • Centrality: Indicates the importance of a node or edge for information flow or connectivity. Simple measures depend only on a node's degree; others, such as betweenness, count how many shortest paths pass through the node.
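The measures above can be computed on a toy network with a few lines of plain Python. This is a minimal sketch on a hypothetical five-node undirected graph (node names and edges are illustrative); libraries such as NetworkX provide tested implementations of the same measures for real networks.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical undirected network: a triangle A-B-C with a tail C-D-E.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def adjacency(edges):
    """Adjacency dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """P(k): fraction of nodes with exactly k neighbors."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def shortest_path_lengths(adj, source):
    """Breadth-first search: edges on the shortest path to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length(adj):
    """<l>: average shortest-path length over all connected pairs of nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


ADJ = adjacency(EDGES)
print(degree_distribution(ADJ))  # {1: 0.2, 2: 0.6, 3: 0.2}
print(clustering(ADJ, "A"))      # 1.0
print(mean_path_length(ADJ))     # 1.7
```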

.4.3 Network models

Three main types of network model are used to describe biological networks:

  • Random Networks:

    • Represented by the Erdős-Rényi (ER) model.
    • Each pair of nodes is connected with probability p, independently of all other pairs.
    • Node degrees follow a Poisson distribution, so most nodes have a degree close to the network's mean degree.
    • Mean path length increases logarithmically with network size.
  • Scale-free Network Models:

    • Defined by a power-law degree distribution, meaning a few nodes (hubs) have a high number of connections.
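    • The network's properties are dominated by these hubs.
    • The best-known example is the Barabási-Albert (BA) model, which grows a network by preferential attachment: new nodes link preferentially to nodes that are already well connected.
    • The degree exponent γ (in P(k) ~ k^(-γ)) typically lies between 2 and 3 for biological networks; the smaller γ is, the more important the hubs. Random networks have no such exponent, because their degree distribution is Poisson rather than a power law.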
  • Hierarchical Network Models:

    • Iterative approach that combines properties of scale-free networks with high node clustering.
    • Small, densely connected modules are repeatedly replicated and linked together, creating a hierarchical structure.
    • Dense clusters are connected by hubs.
    • Found in biological, social, and linguistic networks.
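The contrast between random and scale-free models shows up even in small simulations. The sketch below uses only Python's standard library to generate an ER graph and a preferential-attachment graph with roughly the same number of edges, then compares their largest degrees. The growth rule used here (start from a fully connected core of m + 1 nodes, then attach each new node to m existing nodes chosen in proportion to their degree) is one common variant of the BA model, not the only one; the sizes and the random seed are arbitrary choices.

```python
import random


def erdos_renyi(n, p, rng):
    """ER model: each of the n*(n-1)/2 possible edges is present with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def barabasi_albert(n, m, rng):
    """BA-style growth: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per edge end, so a uniform pick from this
    # list is a degree-proportional pick (preferential attachment).
    ends = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


def max_degree(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values())


rng = random.Random(42)
n = 1000
ba = barabasi_albert(n, 2, rng)
# Choose p so the ER graph has about as many edges as the BA graph.
p = 2 * len(ba) / (n * (n - 1))
er = erdos_renyi(n, p, rng)
print("BA edges:", len(ba), "max degree:", max_degree(ba))
print("ER edges:", len(er), "max degree:", max_degree(er))
```

With the same average degree, the preferential-attachment graph typically develops hubs several times more connected than anything in the ER graph, whose degrees stay close to the mean.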

.5 Types of biological networks

  • Biological networks are used to model different types of information within a cell.
  • The data used to build a network influences its characteristics, like structure and connectivity.
  • Nodes and edges in these networks can carry several pieces of information at once, such as molecule type, interaction type, direction, and confidence.
  • Major classifications of biological networks:
    • Cell signaling networks
    • Gene/transcription regulation networks
    • Genetic interaction networks
    • Metabolic networks
    • Protein-protein interaction networks

.5.1 Cell signaling networks

  • Cell signaling networks are complex systems that regulate various cellular activities like immunity, repair, and development.
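  • These networks carry out signal transduction: converting one signal into another within the cell, typically as a signal is relayed from a receptor at the cell surface to targets inside the cell.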
  • Abnormalities in cell signaling can lead to serious diseases like diabetes and cancer.
  • Signaling pathways are involved in these networks, with proteins acting as nodes connected by directed edges.
  • The MAPK pathway is crucial for regulating the cell cycle and gene transcription. It can be activated by the epidermal growth factor receptor (EGFR), and abnormalities in the pathway can contribute to cancer.
  • The Hedgehog signaling pathway plays a critical role in animal development, influencing body plan formation and metamorphosis.
  • The TGF-beta signaling pathway is involved in cell processes like proliferation, differentiation, and apoptosis.
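Because signaling edges are directed, a useful question is which proteins lie downstream of a given node, and what a single failure cuts off. The sketch below encodes a deliberately simplified, linear EGFR to MAPK/ERK cascade (real pathway maps contain many more nodes, branches and feedback loops) and uses breadth-first search to list downstream nodes before and after a hypothetical knockout of RAS.

```python
from collections import deque

# Simplified, illustrative EGFR -> MAPK/ERK cascade as directed edges
# (signal source -> signal target).
SIGNALING = {
    "EGF": ["EGFR"],
    "EGFR": ["GRB2"],
    "GRB2": ["SOS"],
    "SOS": ["RAS"],
    "RAS": ["RAF"],
    "RAF": ["MEK"],
    "MEK": ["ERK"],
    "ERK": ["ELK1"],
}


def downstream(graph, start):
    """Every node reachable from `start` by following edge directions."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only the nodes other than the start
    return seen


def remove_node(graph, node):
    """Simulate knocking out a protein: drop it and every edge touching it."""
    return {u: [v for v in vs if v != node] for u, vs in graph.items() if u != node}


print(sorted(downstream(SIGNALING, "EGFR")))
print(sorted(downstream(remove_node(SIGNALING, "RAS"), "EGFR")))  # ['GRB2', 'SOS']
```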

.5.2 Gene/transcription regulation networks

  • Gene/transcription regulation networks model how genes are expressed.
  • Gene expression is the process by which genetic instructions are converted into products such as RNA or protein; gene regulation controls when, where, and how much of each product is made.
  • Gene regulation is crucial for cell structure and function, including cell differentiation and the shaping of the organism (morphogenesis).
  • The directed edges of these networks represent both activation and repression of target genes.
  • Model organisms like D. melanogaster, E. coli, and S. cerevisiae are studied to understand gene regulation in other organisms.
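Activation and repression can be captured by giving each directed edge a sign. In this minimal sketch the regulators, targets and signs are hypothetical placeholders; +1 marks activation and -1 repression.

```python
# Hypothetical transcription regulatory network with signed, directed edges:
# (regulator, target, sign), where +1 = activation and -1 = repression.
REGULATION = [
    ("TF_A", "gene1", +1),
    ("TF_A", "gene2", -1),
    ("TF_B", "gene2", +1),
    ("TF_B", "gene3", -1),
    ("gene1", "gene3", +1),
]


def regulators_of(target, edges):
    """Split the regulators of a target into activators and repressors."""
    activators = sorted(s for s, t, sign in edges if t == target and sign > 0)
    repressors = sorted(s for s, t, sign in edges if t == target and sign < 0)
    return activators, repressors


def out_degree(edges):
    """Number of targets each regulator controls; regulators with many
    targets are the hubs of the regulatory network."""
    counts = {}
    for source, _, _ in edges:
        counts[source] = counts.get(source, 0) + 1
    return counts


print(regulators_of("gene2", REGULATION))  # (['TF_B'], ['TF_A'])
print(out_degree(REGULATION))              # {'TF_A': 2, 'TF_B': 2, 'gene1': 1}
```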

.5.3 Genetic interaction networks

  • Genetic interactions occur when mutations in multiple genes lead to an unexpected phenotype, different from the combined effects of individual mutations.
  • These interactions represent functional connections between genes, not physical connections.
  • Networks depict these interactions with genes as nodes and their relationships as edges.
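  • In some cases the direction of an edge can also be inferred, for example when epistasis analysis places one gene upstream of another in a pathway.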
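An "unexpected phenotype" needs an expectation to compare against. One widely used choice is the multiplicative model, in which the expected fitness of a double mutant is the product of the single-mutant fitnesses (all relative to wild type = 1.0) and the interaction score is ε = W_ab - W_a·W_b. The sketch below assumes that model, hypothetical fitness values, and an arbitrary noise tolerance; pairs that pass become edges in a genetic interaction network.

```python
def epistasis(w_a, w_b, w_ab):
    """Interaction score under the multiplicative model:
    observed double-mutant fitness minus the product of single-mutant fitnesses."""
    return w_ab - w_a * w_b


def classify(eps, tol=0.05):
    """Label a score; `tol` is an arbitrary threshold standing in for noise."""
    if eps < -tol:
        return "negative (aggravating)"
    if eps > tol:
        return "positive (alleviating)"
    return "no interaction"


# Hypothetical fitness measurements: (W_a, W_b, W_ab) for three gene pairs.
PAIRS = {
    ("geneX", "geneY"): (0.9, 0.8, 0.0),   # double mutant dies: synthetic lethal
    ("geneX", "geneZ"): (0.9, 0.5, 0.45),  # matches the expectation 0.9 * 0.5
    ("geneY", "geneZ"): (0.8, 0.5, 0.8),   # double mutant fitter than expected
}

edges = []  # (gene, gene, score) for pairs that interact
for (a, b), (wa, wb, wab) in PAIRS.items():
    eps = epistasis(wa, wb, wab)
    label = classify(eps)
    if label != "no interaction":
        edges.append((a, b, round(eps, 2)))
    print(a, b, round(eps, 2), label)
```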

.5.4 Metabolic networks

  • Metabolic networks represent the biological reactions occurring within a cell.
  • These networks are built using experimental data and genomic sequences.
  • They are available for various organisms, from bacteria to humans.
  • Metabolic networks can be used to simulate and analyze metabolic processes.
  • Metabolism is a collection of biological processes essential for maintaining structure, responding to the environment, development, and reproduction.
  • Metabolic pathways are sequences of chemical reactions catalyzed by enzymes within a cell.
  • A pathway's end products may be used directly by the cell or serve as the starting substrates of further metabolic processes.
  • Many pathways coexisting in a cell form a metabolic network.
  • Metabolic pathways are crucial for maintaining an organism’s homeostasis.
  • In a network representation, enzymes and substrates are nodes, and directed edges represent the reactions between them.
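A metabolic network can be stored as reactions that link substrates, an enzyme and products, and then queried, for example to ask which metabolites can be made from a starting set. The sketch below uses the first three reactions of glycolysis; the "fire every reaction whose substrates are all present until nothing changes" rule is a simple reachability heuristic, not a full flux or kinetic simulation.

```python
# Early glycolysis reactions as (enzyme, substrates, products).
REACTIONS = [
    ("hexokinase", {"glucose", "ATP"}, {"glucose-6-phosphate", "ADP"}),
    ("phosphoglucose isomerase", {"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    ("phosphofructokinase", {"fructose-6-phosphate", "ATP"},
     {"fructose-1,6-bisphosphate", "ADP"}),
]


def to_edges(reactions):
    """Directed edges substrate -> enzyme -> product."""
    edges = []
    for enzyme, subs, prods in reactions:
        edges += [(s, enzyme) for s in sorted(subs)]
        edges += [(enzyme, p) for p in sorted(prods)]
    return edges


def producible(reactions, seeds):
    """Repeatedly fire any reaction whose substrates are all available
    and add its products, until no new metabolite appears."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for _, subs, prods in reactions:
            if subs <= available and not prods <= available:
                available |= prods
                changed = True
    return available


print(len(to_edges(REACTIONS)))                         # 10
print(sorted(producible(REACTIONS, {"glucose", "ATP"})))
print(sorted(producible(REACTIONS, {"glucose"})))       # ['glucose'] (no ATP)
```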

.5.5 Protein-protein interaction networks

1. Importance of PPIs:

  • PPIs are essential for most biological functions.
  • Understanding PPIs is crucial for comprehending cell physiology changes during diseases.
  • PPIs are vital for drug development because many drugs are designed to target these interactions.

2. Types of PPIs:

  • Stable interactions: Form protein complexes (e.g., hemoglobin).
  • Transient interactions: Brief contacts that often modify a protein (e.g., a kinase binding and phosphorylating its substrate).
  • Dynamic component of the interactome: Transient interactions are crucial for the dynamic nature of cellular processes.

3. Applications of PPI Data:

  • Assigning protein functions.
  • Understanding signaling pathways in detail.
  • Identifying relationships between proteins within complexes (e.g., the proteasome).

4. The Interactome:

  • The interactome is the complete set of PPIs within a cell, organism, or environment.
  • Advances in PPI screening tools (like the yeast two-hybrid experiment and mass spectrometry) have led to a surge in PPI data and complex interactome mapping.

5. Properties of PPI Networks:

  • Small-world effect: Proteins are highly interconnected, leading to short distances between any two proteins in the network (the "six degrees of separation" principle). This facilitates rapid signal flow within the network.
  • Resilience: Despite high interconnectedness, networks are resilient to changes in a single gene or protein.
  • Scale-free networks: Most proteins have few connections, while a small number of "hub" proteins have numerous connections. This structure is thought to arise from "preferential attachment," in which new connections are more likely to form with nodes that are already highly connected.

6. Characteristics of Scale-Free PPI Networks:

  • Stability: Because most proteins have few connections, a random failure is unlikely to hit a hub, making the network resilient to random failures.
  • Scale invariance: The power-law degree distribution has the same form regardless of network size, so no single "typical" degree characterizes the network.
  • Vulnerability to targeted attacks: Failure of a small number of hub proteins can disrupt the network significantly.
  • Hubs are enriched in essential proteins: Deleting the gene for a highly connected protein is more likely to be lethal than deleting the gene for a poorly connected one, which highlights the importance of hubs in cellular processes.
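The contrast between random failures and targeted attacks can be reproduced on a synthetic network. This sketch grows a preferential-attachment graph (standing in for a real PPI network, which is an assumption), removes 5% of its nodes either at random or in order of decreasing degree, and reports the size of the largest connected component that remains. Network size, removal fraction and seed are arbitrary.

```python
import random
from collections import deque


def ba_graph(n, m, rng):
    """Preferential-attachment graph as an adjacency dict."""
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}  # complete core
    ends = [i for i in adj for _ in adj[i]]  # one entry per edge end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))  # degree-proportional choice
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends += [new, t]
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


rng = random.Random(1)
adj = ba_graph(2000, 2, rng)
k = 100  # remove 5% of the nodes
random_removed = set(rng.sample(sorted(adj), k))
hubs_removed = set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])
print("intact:", largest_component(adj, set()))
print("random failures:", largest_component(adj, random_removed))
print("hub attack:", largest_component(adj, hubs_removed))
```

Random removal typically leaves almost all remaining nodes connected, while removing the same number of hubs fragments the network much more severely.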

7. Transitivity and Modularity:

  • Transitivity: The tendency of a node's neighbors to be connected to each other, so that nodes cluster into triangles. High transitivity indicates a tightly connected community.
  • Modularity: PPI networks are organized into modules or functional units. These modules maintain their functions regardless of the context.
  • Protein complexes as modules: Stable interactions within a protein complex represent a well-defined functional unit.
  • Broader module concept: Modules can include non-binding interactions that still contribute to a defined function.
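Transitivity is commonly quantified as 3 × (number of triangles) / (number of connected triples), where a connected triple is a path of two edges. This minimal sketch computes it for two hypothetical toy graphs, one containing a triangle and one with none.

```python
from itertools import combinations


def transitivity(adj):
    """Global transitivity = 3 x triangles / connected triples.
    Each triple (path u-c-v) is counted at its center c; a triangle
    contributes three closed triples, one per corner."""
    closed = triples = 0
    for nbrs in adj.values():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical examples: a triangle with one extra "tail" node, and a star.
TRIANGLE_TAIL = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
STAR = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
print(transitivity(TRIANGLE_TAIL))  # 0.6
print(transitivity(STAR))           # 0.0
```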

8. Importance of Understanding Modules:

  • Understanding modules simplifies complex networks.
  • Studying module interactions is crucial for understanding cellular processes.
  • Topological network studies help in identifying and characterizing modules.
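One common way to score how well a proposed division into modules fits a network's topology is Newman-Girvan modularity, Q = Σ_c [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside module c, and d_c the total degree of its nodes. The sketch below scores two hypothetical partitions of two triangles ("complexes") joined by one edge; it evaluates given partitions and does not search for the best one.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q = sum over modules c of
    L_c / m - (d_c / (2m))**2."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two hypothetical complexes (triangles) joined by a single edge C-D.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
BY_COMPLEX = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
MIXED = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 1, "F": 2}
print(round(modularity(EDGES, BY_COMPLEX), 3))  # 0.357
print(round(modularity(EDGES, MIXED), 3))       # -0.214
```

A partition that follows the complexes scores well above zero, while one that cuts across them scores below zero, the value expected for a random split.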

.6 Sources of data for biological networks

  • Manual Literature Curation: Experts manually review and store published information in databases. This provides high-quality data but is time-consuming and expensive, limiting database size.
  • High-Throughput Datasets: Experiments like mass spectrometry generate large volumes of data, including protein-protein interaction (PPI) datasets. While offering large, organized data, these methods introduce bias and varying data quality.
  • Computational Techniques: These strategies predict associations based on existing experimental evidence. For example, protein interactions in mice can be predicted from human protein interactions. This expands experimental data coverage but produces noisier datasets.
  • Text Mining: Machine learning techniques extract relationships from literature. This increases data coverage but is time-consuming and often yields noisy results.
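The computational transfer of interactions between species can be sketched as "interolog" mapping: an interaction observed in one species is predicted in another when both partners have orthologs there. This minimal sketch assumes a simple one-to-one ortholog table; real orthology is often one-to-many, and transferred interactions are predictions that add noise, as noted above. XYZ1 is a hypothetical placeholder gene with no mouse ortholog.

```python
# Human interactions and a human -> mouse ortholog table (one-to-one assumed).
HUMAN_PPI = [("TP53", "MDM2"), ("TP53", "EP300"), ("MDM2", "XYZ1")]
HUMAN_TO_MOUSE = {"TP53": "Trp53", "MDM2": "Mdm2", "EP300": "Ep300"}


def transfer_interactions(ppi, orthologs):
    """Predict target-species interactions: keep a source interaction only
    if both partners have an ortholog, then rename them."""
    predicted = set()
    for a, b in ppi:
        if a in orthologs and b in orthologs:
            predicted.add(tuple(sorted((orthologs[a], orthologs[b]))))
    return predicted


print(sorted(transfer_interactions(HUMAN_PPI, HUMAN_TO_MOUSE)))
# [('Ep300', 'Trp53'), ('Mdm2', 'Trp53')]
```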

.7 Gene ontology for network analysis

  • Functional enrichment analysis helps understand the biological functions and processes associated with a list of genes.
  • Gene Ontology (GO) provides a vocabulary for describing genes and their products in terms of cellular components, biological processes, and molecular functions.
  • Pathway analysis offers more specific and relevant information compared to GO enrichment, as it focuses on biochemical systems and their roles in cellular processes.
  • Pathway analysis tools have limitations, including a lack of pathway annotation for many genes and a bias towards well-studied pathways.
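Functional enrichment for a single GO term or pathway is often tested with a one-sided hypergeometric test: given N background genes of which K carry the annotation, how likely is it to see at least k annotated genes in a list of n by chance? The sketch below implements that test with exact integer arithmetic; the numbers are hypothetical, and when many terms are tested the p-values would also need multiple-testing correction.

```python
from math import comb


def enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value: probability of drawing at least k
    annotated genes in a list of n genes, when K of the N background genes
    carry the annotation (e.g. one GO term)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20,000 background genes, 200 annotated with a term,
# a list of 100 genes of interest, 10 of which carry the term (about 1 expected).
print(enrichment_p(20000, 200, 100, 10))
```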

.8 Analysis of biological networks and interactomes

  • Network biology focuses on understanding how interacting molecules, not just isolated proteins, regulate biochemical processes.
  • Understanding interaction networks helps identify critical components for controlling processes and can shed light on disease complexities.
  • Disease phenotypes arise from changes in a gene’s network context, not just single gene mutations.
  • Network analysis, compared to traditional methods, is time-efficient, data-intensive, and less limited by functional annotations.
  • Interactome maps, representing entities and their interactions as nodes and edges, provide a visual and comprehensive view of biological systems.
  • Network analysis either maps genomics data onto existing networks or infers new networks directly from experimental data.

.9 Interaction network construction using a gene list

  • Data Source Importance: The type and source of interaction data are crucial for constructing accurate interaction networks.
  • IMEx Consortium: The IMEx consortium promotes manual curation of experimentally generated interaction data from literature.
  • Meta-databases: These aggregate data from primary sources and provide access through a portal.
  • Computational Interactions: Some databases incorporate computational interactions to augment experimental data.
  • Overlap and Duplication: Primary interaction databases coordinate their manual curation to avoid duplicating effort, so their contents overlap very little. Relying on a single database can therefore miss interactions.
  • PSICQUIC: Researchers can search multiple databases using web services like PSICQUIC.
  • Interaction Types: Interactions can be physical (PPIs), regulatory (miRNA-mRNA), or biochemical. Combining these types requires caution because their edges mean different things (for example, some are directed and others are not).
  • Direct vs. Indirect Interactions: Techniques like Mass Spectrometry and Affinity Purification can detect interactions but not differentiate between direct and indirect ones.
  • Confidence Scores: Confidence in specific interactions varies based on the experiment used to determine it. High-throughput methods can generate massive data but have high false positive/negative rates.
  • Context-Dependent Interactions: Many interactions are context-dependent (cell type, conditions, protein isoform). Databases lack high-throughput data for specific contexts.
  • Integration of Contextual Data: Researchers are integrating gene/protein expression data to identify likely edges and nodes in subnetworks.
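Combining evidence from several sources and filtering by confidence can be sketched as follows. The records, database names and scores are hypothetical, and keeping the highest score reported for each interaction is just one simple policy; real resources define their own scoring schemes.

```python
# Hypothetical records: (protein A, protein B, source database, confidence 0-1).
RECORDS = [
    ("P1", "P2", "db_one", 0.80),
    ("P2", "P1", "db_two", 0.60),  # same undirected interaction, other order
    ("P2", "P3", "db_one", 0.30),
    ("P3", "P4", "db_two", 0.95),
]


def merge_interactions(records):
    """Collapse duplicates of the same undirected pair, keeping every
    source and the highest score reported for it."""
    merged = {}
    for a, b, source, score in records:
        key = tuple(sorted((a, b)))
        entry = merged.setdefault(key, {"sources": set(), "score": 0.0})
        entry["sources"].add(source)
        entry["score"] = max(entry["score"], score)
    return merged


def filter_network(merged, min_score=0.5, min_sources=1):
    """Edges meeting both a confidence and a number-of-sources threshold."""
    return sorted(pair for pair, info in merged.items()
                  if info["score"] >= min_score and len(info["sources"]) >= min_sources)


merged = merge_interactions(RECORDS)
print(filter_network(merged))                 # [('P1', 'P2'), ('P3', 'P4')]
print(filter_network(merged, min_sources=2))  # [('P1', 'P2')]
```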

.10.3.4 Babelomics 5

  • Babelomics 5 is a tool that analyzes lists of genes or proteins.
  • It maps these lists to a reference interactome, which can be user-defined.
  • The tool calculates various topological parameters for the mapped proteins, as well as the minimal connected network that links them within the interactome.
  • Babelomics 5 compares different protein lists to identify significant changes in parameter distributions.

.11 Network visualization tools

  • Cytoscape is an open-source software for network analysis, visualization, and querying.
  • Key Features:
    • Data input and output capabilities.
    • Network and data visualization tools.
    • Filtering and querying functionalities.
    • VizMapper for visual mapping of attributes.
  • Integrations and Compatibility:
    • Compatible with other applications, tools, websites, and databases.
    • Used by commercial companies such as GeneGo, GeneSpring, and Agilent.
  • Extensibility:
    • Utilizes plugin structures for expanding core functionalities.
    • Over 74 open-access plugins have been developed since 2004, with 46 compatible with the latest versions.
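One simple way to move a network into Cytoscape is the plain-text SIF format, which Cytoscape can import: each line names a source node, an interaction type and a target node. This sketch serializes hypothetical edges as tab-separated SIF lines; "pp" (protein-protein) and "pd" (protein-DNA) are conventional interaction labels, but the type column can be any string. The resulting text could be saved to a file with a `.sif` extension and imported.

```python
def to_sif(edges):
    """Serialize (source, interaction type, target) triples as SIF text:
    one tab-separated line per edge."""
    return "".join(f"{source}\t{kind}\t{target}\n" for source, kind, target in edges)


# Hypothetical edges: a protein-protein and a protein-DNA interaction.
EDGES = [("TP53", "pp", "MDM2"), ("TP53", "pd", "CDKN1A")]
sif_text = to_sif(EDGES)
print(sif_text, end="")
```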

.11.6 Gephi

  • Gephi is a free, open-source graph visualization platform.
  • It can handle large networks with many nodes and edges.
  • Gephi requires significant computational power.
  • It offers advanced algorithms through plugins.
  • Gephi is multi-platform compatible.
  • Its limitation is the lack of specific biological data analysis capabilities.
  • Gephi can be used for visualization, statistical analysis, and enumeration.

.12 Important properties to be inferred from networks

  • Network construction: The first step in network analysis is constructing the network itself.
  • Feature investigation: Analyzing features within a network and comparing them to expectations can lead to a better understanding of the network.
  • Computational and mathematical methods: Various methods exist for analyzing large networks and identifying specific features.
  • Hub nodes: Hub nodes, which have a high degree of connections, are important for the structure and function of scale-free networks.
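There is no universal degree threshold for calling a node a hub. A common practical choice is to take the most highly connected fraction of nodes; this sketch uses a top-20% cut-off, which is an arbitrary assumption, on a hypothetical star-like network.

```python
def find_hubs(adj, top_fraction=0.2):
    """Nodes whose degree is at least that of the node ranked at the
    `top_fraction` cut-off (ties at the cut-off are kept)."""
    degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
    cutoff = degrees[max(1, int(len(degrees) * top_fraction)) - 1]
    return sorted(node for node, nbrs in adj.items() if len(nbrs) >= cutoff)


# Hypothetical star-like network: H binds A-F, and A also binds B.
EDGES = [("H", x) for x in "ABCDEF"] + [("A", "B")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

print(find_hubs(ADJ))       # ['H']
print(find_hubs(ADJ, 0.5))  # ['A', 'B', 'H']
```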

.12.4 Bioinformatics tools to detect modules, bottlenecks and hubs

  • Bioinformatics tools are used to detect bottlenecks, hubs, and modules in networks.
  • NetworkAnalyst and Cytoscape are examples of such tools.
  • NetworkAnalyst can analyze networks based on gene expression and find modules, betweenness centralities, and degrees.
  • Cytoscape’s cytoHubba application can detect bottlenecks and hubs.
  • NetworkAnalyst uses a random walk algorithm to detect modules of high frequency nodes.
  • Cytoscape’s jActiveModules application detects connected parts of the network with significant gene expression differences.
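  • The DIAMOnD algorithm identifies disease modules by iteratively adding the proteins whose connections to known disease proteins are most statistically significant.
  • These approaches are limited by the incompleteness of the interactome and by the small number of known disease-linked proteins.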
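The core idea of DIAMOnD can be sketched in a few lines: for each node linked to the current module, compute a hypergeometric p-value for having at least that many links into the module given its total degree, add the most significant node, and repeat. This is a simplified illustration on a hypothetical network; the published algorithm includes refinements (such as extra weighting of seed nodes) that are omitted here.

```python
from math import comb


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connectivity_p(k, ks, N, s):
    """Hypergeometric p-value that a node with k links has at least ks links
    to a module of s nodes, in a network of N nodes."""
    hits = sum(comb(s, i) * comb(N - s, k - i) for i in range(ks, min(k, s) + 1))
    return hits / comb(N, k)


def diamond(adj, seeds, iterations):
    """Simplified DIAMOnD-style loop: at each step add the candidate whose
    connections to the current module are least likely by chance."""
    N = len(adj)
    module = set(seeds)
    added = []
    for _ in range(iterations):
        # Candidates: nodes outside the module with at least one link into it.
        candidates = {v for u in module for v in adj[u]} - module
        best, best_p = None, 2.0
        for node in sorted(candidates):  # sorted for a deterministic tie-break
            k = len(adj[node])
            ks = len(adj[node] & module)
            p = connectivity_p(k, ks, N, len(module))
            if p < best_p:
                best, best_p = node, p
        if best is None:  # nothing left to add
            break
        module.add(best)
        added.append(best)
    return added


# Hypothetical network: seeds S1-S3; X links to all three seeds;
# hub Y links to one seed and five unrelated nodes; Z links to one seed.
EDGES = [("S1", "S2"), ("S2", "S3"),
         ("X", "S1"), ("X", "S2"), ("X", "S3"),
         ("Y", "S1"), ("Y", "A"), ("Y", "B"), ("Y", "C"), ("Y", "D"), ("Y", "E"),
         ("Z", "S2")]
ADJ = build_adjacency(EDGES)
print(diamond(ADJ, {"S1", "S2", "S3"}, 2))  # ['X', 'Z']
```

The hub Y ranks last despite its link to a seed: with six links, one connection into a small module is what chance alone would predict.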