What kind of task type does DeepProtein Consider¶

We have listed several tasks which are currently implemented in DeepProtein.

Protein Function Prediction¶

Protein function prediction involves determining the biological roles and activities of proteins based on their sequences or structures. This process is crucial for understanding cellular mechanisms and interactions, as a protein’s function is often linked to its sequence composition and the context of its cellular environment.

Protein Localization Prediction¶

Accurate localization predictions can enhance drug development by informing target identification and improving therapeutic efficacy, particularly in treating diseases linked to protein mislocalization. Additionally, insights gained from localization predictions facilitate the mapping of biological pathways, aiding in the identification of new therapeutic targets and potential disease mechanisms.

Protein-Protein Interaction (PPI)¶

Proteins are the essential functional units in human biology, but they seldom operate in isolation; rather, they typically interact with one another to perform various functions. Understanding protein-protein interactions (PPIs) is crucial for identifying potential therapeutic targets for disease treatment. Traditionally, determining PPI activity requires costly and time-consuming wet-lab experiments. PPI prediction seeks to forecast the activity of these interactions based on the amino acid sequences of paired proteins.

Epitope Prediction¶

An epitope, also known as an antigenic determinant, is the region of a pathogen that can be recognized by antibodies and cause an adaptive immune response. The epitope prediction task is to distinguish the active and non-active sites from the antigen protein sequences. Identifying the potential epitope is of primary importance in many clinical and biotechnologies, such as vaccine design and antibody development, and for our general understanding of the immune system [Du et al., 2023]. In epitope prediction, the machine learning model makes a binary prediction for each amino acid residue. This is also known as residue-level classification.

Paratope Prediction¶

Antibodies, or immunoglobulins, are large, Y-shaped proteins that can recognize and neutralize specific molecules on pathogens, known as antigens. They are crucial components of the immune system and serve as valuable tools in research and diagnostics. The paratope, also referred to as the antigen-binding site, is the region that specifically binds to the epitope. While we have a general understanding of the hypervariable regions responsible for this binding, accurately identifying the specific amino acids involved remains a challenge. This task focuses on predicting which amino acids occupy the active positions of the antibody that interact with the antigen. In paratope prediction, the machine learning model makes a binary prediction for each amino acid residue. This is also known as residue-level classification

Antibody Developability Prediction¶

Immunogenicity, instability, self-association, high viscosity, polyspecificity, and poor expression can hinder an antibody from being developed as a therapeutic agent, making early identification of these issues crucial. The goal of antibody developability prediction is to predict an antibody’s developability from its amino acid sequences. A fast and reliable developability predictor can streamline antibody development by minimizing the need for wet lab experiments, alerting chemists to potential efficacy and safety concerns, and guiding necessary modifications. While previous methods have used 3D structures to create accurate developability indices, acquiring 3D information is costly. Therefore, a machine learning approach that calculates developability based solely on sequence data is highly advantageous.

CRISPR Repair Outcome Prediction¶

CRISPR-Cas9 is a gene editing technology that allows for the precise deletion or modification of specific DNA regions within an organism. It operates by utilizing a custom-designed guide RNA that binds to a target site upstream, which results in a double-stranded DNA break facilitated by the Cas9 enzyme. The cell responds by activating DNA repair mechanisms, such as non-homologous end joining, leading to a range of gene insertion or deletion mutations (indels) of varying lengths and frequencies. This task aims to predict the outcomes of these repair processes based on the DNA sequence. Gene editing marks a significant advancement in the treatment of challenging diseases that conventional therapies struggle to address, as demonstrated by the FDA’s recent approval of gene-edited T-cells for the treatment of acute lymphoblastic leukemia. Since many human genetic variants linked to diseases arise from insertions and deletions, accurately predicting gene editing outcomes is essential for ensuring treatment effectiveness and reducing the risk of unintended pathogenic mutations.