How To Use The Prophane Webserver

Introduction


Discover a Comprehensive Tutorial on Using the Prophane Web Service

Welcome to this hands-on tutorial on the Prophane web service. It offers step-by-step instructions for beginners and additional detail for advanced users. For background on Prophane, see the 'About Prophane' page.
See also the publication "A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane" for an in-depth description.
The description of each step begins with a concise overview of the actions to take, followed by a detailed step-by-step guide on defining your analysis. A pink box then lists the specific steps used for the tutorial test analysis.

Hands-On Tutorial:

Download the test files, which contain the protein identification data from six gastrointestinal samples (three from patients before a dietary change and three after) together with the corresponding database.

login

You have the option to either log in and create a user account using 'Login with Google,' or you can proceed without logging in.
The benefits of setting up a user account via 'Login with Google' include unrestricted access to all analyses you've ever performed.

If you choose to continue without logging in, you can provide your email address to receive notifications with a direct link to the analysis results. Alternatively, if you prefer not to log in and don't wish to provide an email address, you can manually copy the result URL for downloading and access to status information.
login

Hands-On Tutorial:

Whether you choose to log in or not doesn't matter for this tutorial.
When initiating a Prophane analysis, you have the option to choose between a standard analysis and expert mode by toggling the 'expert' setting. Regardless of your choice, the analysis requires minimal input, namely the (meta-)protein identification file and the protein sequence FASTA file used for peptide spectrum matching.

Standard Analysis

The standard analysis encompasses a taxonomic analysis and a functional analysis, all configured with default settings.

Expert Mode

In expert mode, you gain the flexibility to finely customize each step of the workflow and add more annotation tasks. You can select from various annotation databases and search algorithms, and in the 'Custom Maps' step, you can incorporate user-defined annotation maps. Additionally, you have control over the grouping of samples, quantification algorithms, and the lowest common ancestor (LCA) algorithm.

Default Settings for Standard Analysis:

  • Taxonomic Analysis: Diamond algorithm with the NCBI non-redundant database.
  • Functional Analysis: Emapper algorithm with the Eggnog database.
  • Quantification method: NSAF (normalized to longest metaprotein).
  • Sample groups: None
  • Custom maps: None
  • Lowest common ancestor: LCA per group method with a threshold of 1.

expert toggle

Hands-On Tutorial:

Activate the expert toggle!

Step by Step Instruction: Setting Up Your Prophane Job

1. Job Label:
Begin by giving your new Prophane job a descriptive name. This label will help you easily identify and manage your analyses.
2. Source:
Select the appropriate input format for your protein identification file. Choose the format that matches the file you're uploading.
3. Report file:
Upload the protein identification file generated from your mass spectrometry analysis. This file should contain information about protein groups.
4. FASTA file:
Upload the corresponding FASTA file. This file should contain sequences for all the proteins identified in your report file. Ensure that the protein accessions or IDs in the FASTA file match those in the protein identification report.
5. Exclude:
Use the 'Exclude' option to omit specific proteins from your analysis based on string matching.

The minimum required input for a Prophane analysis includes a proteomic search result file and the matching FASTA file. To specify the format of the proteomic search result file, use the 'Source' option. The proteomic search result file should be uploaded under 'Report file' and the protein sequence database under 'FASTA File'.

Once all the necessary input files have been selected, and you intend to perform a standard analysis exclusively, simply proceed to the next step and initiate the analysis by clicking the 'Submit' button.

Report file

The report file is the protein identification file derived from mass spectrometry analysis. It must include information about protein groups, also referred to as metaproteins or protein families. Prophane is versatile and supports a variety of input formats, including:

  • generic input format
  • mzTab format
  • mzIdent1.2 format
It also accommodates data produced by search software such as:
  • MetaProteome Analyzer in single or sample comparison format
  • Proteome Discoverer protein group output
  • Scaffold output
Descriptions on how to generate report files can be found in "About Prophane", "Input Files".
If you intend to use a different input format regularly, please contact us via email at support@prophane.de.

FASTA File

The FASTA file can be plain or gzipped and should contain sequences for all identified proteins. Ensure that the protein accessions or IDs in the FASTA file match those used in the protein identification report. You can use the same FASTA file you used for peptide-spectrum matching. However, an excessively large FASTA file can significantly increase Prophane's runtime. Ideally, the FASTA file should contain all sequences present in the identification file and no more. Avoid sequence accession IDs containing pipes ('|') to ensure consistent data processing by Prophane. Sequence IDs with two or more pipes, such as UniProt protein accessions, are shortened to the accession (e.g. from sp|O70558|SPR2G_MOUSE to O70558).
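
Before uploading, it can help to check locally that every accession in the report has a matching FASTA entry. The following Python sketch is only an illustration: the function names are hypothetical, and the rule "keep the second pipe-separated field" is an assumption derived from the single example above, not a description of Prophane's internal code.

```python
def normalize_accession(header: str) -> str:
    """Return the accession expected after pipe handling.

    Assumption: for IDs with two or more pipes, the second field is kept
    (as in sp|O70558|SPR2G_MOUSE -> O70558); other IDs stay unchanged.
    """
    seq_id = header.lstrip(">").split()[0]  # drop '>' and the description
    fields = seq_id.split("|")
    if len(fields) >= 3:
        return fields[1]
    return seq_id


def missing_accessions(report_accessions, fasta_headers):
    """Return report accessions that have no matching FASTA entry."""
    fasta_ids = {normalize_accession(h) for h in fasta_headers}
    return sorted(set(report_accessions) - fasta_ids)


# Hypothetical example data
headers = [
    ">sp|O70558|SPR2G_MOUSE Small proline-rich protein 2G",
    ">contig_17_gene_3",
]
print(normalize_accession(headers[0]))  # O70558
print(missing_accessions(["O70558", "contig_99_gene_1"], headers))
```

An empty list from missing_accessions would indicate that all report accessions are covered by the FASTA file.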

Exclude

In the 'Exclude' option, you can omit specific proteins, such as contaminations, from the analysis based on their accessions and string matching.
For example, if you have contamination proteins with accessions starting with "CON_", select "Accessions starting with" in the exclude field and enter "CON_".
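
The effect of "Accessions starting with" can be previewed with a simple prefix filter. This is a minimal sketch with made-up accessions; it assumes case-sensitive matching, which the tutorial does not state explicitly.

```python
def exclude_by_prefix(accessions, prefix):
    """Keep only accessions that do NOT start with the given prefix.

    Assumption: matching is case-sensitive.
    """
    return [acc for acc in accessions if not acc.startswith(prefix)]


# Hypothetical accessions; the "CON_" entries stand for contaminants
accessions = ["CON_P02768", "contig_1_gene_2", "CON_P00761", "contig_8_gene_5"]
print(exclude_by_prefix(accessions, "CON_"))
# ['contig_1_gene_2', 'contig_8_gene_5']
```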

input

Hands-On Tutorial: Setting Up Your Prophane Job

1. Job Label:
Name your new Prophane job, e.g. "Gut Microbiome Analysis".
2. Source:
Choose "MPA (multiple experiment)" as input format.
3. Report file:
Upload the "GutMicrobiome_Workshop.csv" file as the protein identification file.
4. FASTA file:
Upload the "GutMicrobiome_Metagenome.fasta" file.
5. Exclude:
Select "accessions starting with" as the exclusion criterion, and enter "ALBU_" to exclude all albumin proteins from the Prophane analysis.

Step by Step Instruction: Creating Sample Groups in Prophane

1. Add Group:
Click the "Add Group" button to begin creating a new sample group.
2. Sample Group Naming:
Name your sample group descriptively (e.g., "Treated Group"). You have the flexibility to choose arbitrary names for your sample groups as desired.
3. Add Samples:
For each sample that belongs to this group, click the '+' button to add a new Sample.
4. Sample Names:
Enter the correct sample name exactly as it appears in the protein identification file. Ensure accuracy in spelling.
5. Control Naming:
Double-check to confirm that the sample name is spelled correctly.
In this section, you have the option to categorize your samples into different sample groups. These sample groups are crucial for obtaining average quantification values across various samples and enable you to compare different sets of samples, such as untreated samples versus treated samples.
The mean quantification values for each sample group will be included in the final output, which consists of the "summary.csv" table and the "lca_summary.mztab" file. These values are also visualized in the Krona-Plots. Additionally, the quantification values for each individual sample will be reported. However, if no sample groups are specified, quantification will be performed separately for each individual sample.
This flexibility in sample grouping and quantification allows you to tailor your analysis to your specific experimental design and research goals.
Please note that Prophane extracts sample names from various protein reports, so it's essential to use the exact same sample names as they appear in these reports to ensure accurate grouping and analysis. The "Sample Group" part of the Prophane documentation describes how Prophane extracts sample names from different protein reports.
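
The idea of group-wise mean quantification, and why exact sample names matter, can be sketched in a few lines of Python. Sample names, group names, and values below are invented for illustration; this is not Prophane's implementation.

```python
from statistics import mean


def group_means(quant, groups):
    """Average one protein group's quantification values per sample group.

    quant:  {sample_name: value}
    groups: {group_name: [sample_name, ...]}
    Raises KeyError if a sample name does not match the report exactly.
    """
    result = {}
    for group, samples in groups.items():
        unknown = [s for s in samples if s not in quant]
        if unknown:
            raise KeyError(f"sample names not found in report: {unknown}")
        result[group] = mean(quant[s] for s in samples)
    return result


# Hypothetical values for a single protein group
quant = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 2.0, "S5": 4.0, "S6": 6.0}
groups = {"pre-diet": ["S1", "S2", "S3"], "post-diet": ["S4", "S5", "S6"]}
print(group_means(quant, groups))
```

A misspelled sample name (for example "s1" instead of "S1") makes the lookup fail here, which mirrors the requirement that names match the report exactly.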

input

Hands-On Tutorial: Creating Sample Groups in Prophane

Fill in the following fields according to the example picture.
1. Add Group:
Add two sample groups by clicking "Add Group"
2. Sample Group Naming:
Name your sample group descriptively, e.g., "pre-diet" and "post-diet"
3. Add Samples:
Add three samples to both groups by clicking "+".
4. Sample Names:
Enter the correct sample names. This must be a 100% match to sample names used in the identification file (see picture above).

Step by Step Instruction: Quantification Method

Quantification Method:
Use the drop-down menu to select your preferred quantification method.
Default setting: NSAF (normalized to longest metaprotein sequence).
Normalized Spectral Abundance Factor (NSAF) is a label-free quantification method that normalizes the number of identified spectra to the size of the metaprotein. The normalization can be performed relative to the largest or smallest protein within the protein group, or to the mean of all protein group members. This method is suitable for all input protein reports that include spectral counts.
Proteome Discoverer Protein Group Reports: Special Consideration
It's important to note that the quantification in Proteome Discoverer protein group reports is not based on spectral counts, and the abundances are already normalized within these reports. Therefore, when working with Proteome Discoverer protein group reports, select the "raw" option to preserve the existing normalization.
For more detailed information about the different normalization methods and when to use them, you can refer to the "Quantification" part of the Prophane documentation describing these methods in greater detail.
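
To make the NSAF options more concrete, the sketch below follows the commonly used NSAF definition: the spectral count of each protein group is divided by a length, and the result is divided by the sum of these ratios over all groups in the sample. The data structure and function name are hypothetical, and Prophane's own implementation may differ in detail.

```python
from statistics import mean

# Which member length represents the metaprotein
LENGTH_RULES = {"longest": max, "shortest": min, "mean": mean}


def nsaf(groups, rule="longest"):
    """Compute NSAF values for the protein groups of one sample.

    groups: {group_id: (spectral_count, [member_sequence_lengths])}
    rule:   "longest", "shortest" or "mean"
    """
    pick = LENGTH_RULES[rule]
    # Spectral abundance factor: spectra per unit of sequence length
    saf = {gid: count / pick(lengths) for gid, (count, lengths) in groups.items()}
    total = sum(saf.values())
    # Normalize so that all values of the sample sum to 1
    return {gid: value / total for gid, value in saf.items()}


# Hypothetical sample with two protein groups
groups = {"g1": (10, [100, 200]), "g2": (30, [300])}
for rule in LENGTH_RULES:
    print(rule, nsaf(groups, rule))
```

The choice of rule changes the result whenever the members of a group differ in length, as for "g1" in this example.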

quantification

Hands-On Tutorial: Quantification Method

Quantification Method:
Choose NSAF (normalized to mean metaprotein)

Step by Step Instruction: Taxonomic Annotation

1. Task Label:
Name your task descriptively.
2. Reference Database:
Choose your preferred protein sequence database to be used for the sequence homology search.
3. Search Algorithm:
There's no choice here; the default search algorithm is always DIAMOND BLASTP.
4. E-Value:
Define the E-value. Smaller values make the search stricter. Alternatively, choose from the drop-down menu options: Relaxed, Mid-Range, or Strict.
5. Optional: Add Advanced Options
If needed, you can include advanced options by selecting an option and then clicking the "+" button.

Adding a New Taxonomic Task:
To create an additional taxonomic task, click the "Add Task" button and define the parameters as described above.
Removing a Taxonomic Task:
To remove a taxonomic task, click the '-' button located on the right side of the "Task Label" field.

Taxonomic Annotation in Prophane

Taxonomic annotation in Prophane involves a search within the selected reference database to find proteins that bear similarity to those present in your analysis. Subsequently, Prophane retrieves the associated taxon ID and outlines the entire taxonomic lineage of the protein that exhibits the closest resemblance to the query.
The search algorithm employed in Prophane is DIAMOND BLASTP. This powerful algorithm is responsible for identifying sequence homologies within the specified protein database.

Prophane supports the following protein sequence databases:

  • NCBI protein nr
  • UniprotKB (Swiss-Prot & TrEMBL)
  • Swiss-Prot
  • TrEMBL

For more detailed information about the DIAMOND BLASTP algorithm, please refer to the Prophane About page. You can explore different protein databases and their unique characteristics by consulting the "Annotation Databases & Algorithms & Parameters" part of the Prophane Documentation.

Parameters for Taxonomic Annotation in Prophane

E-Value:

The E-value parameter defines the threshold for similarity for reported proteins. Smaller E-values indicate more stringent similarity criteria, resulting in reporting only very closely matching proteins. Conversely, larger E-values, referred to as the "relaxed" setting, expand the reporting to include more matches with varying degrees of similarity.
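
The effect of the E-value threshold can be illustrated on BLAST-style tabular output, where the E-value is the eleventh column. The hits and the two cut-offs below are invented for illustration; the actual values behind Prophane's "Relaxed", "Mid-Range" and "Strict" presets are not given in this tutorial.

```python
import csv
import io


def filter_hits(tsv_text, max_evalue):
    """Keep rows of BLAST-style tabular output with E-value <= max_evalue."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Column index 10 holds the E-value in the 12-column tabular format
    return [row for row in reader if row and float(row[10]) <= max_evalue]


# Two hypothetical hits for the same query protein
example = (
    "prot1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\n"
    "prot1\tsubjB\t35.0\t80\t52\t2\t10\t90\t5\t84\t0.002\t42\n"
)
strict = filter_hits(example, 1e-10)   # hypothetical strict cut-off
relaxed = filter_hits(example, 0.01)   # hypothetical relaxed cut-off
print(len(strict), len(relaxed))  # 1 2
```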

Advanced Options:

For comprehensive information on additional parameters and their functionalities, please refer to the DIAMOND documentation. This resource provides in-depth insights into the various options available for optimizing your taxonomic annotation analysis in Prophane.

defining taxonomic task


Hands-On Tutorial: Taxonomic Annotation

1. Task Label:
Name the task, e.g. "Uniprot annotation".
2. Reference Database:
Choose reference database "UniprotKB".
3. Search Algorithm and Advanced Parameters:
Use Default Settings, nothing to be done.

4. Add new taxonomic task:
Click "Add Task" button.
5. Task Label:
Name the task, e.g. "Swissprot annotation"
6. Reference Database:
Choose reference database "Swiss-Prot".
7. Search Algorithm and Advanced Parameters:
Use Default Settings, nothing to be done

Step by Step Instruction: Functional Annotation

1. Task Label:
Name your task descriptively.
2. Reference Database:
Choose your preferred protein sequence database to be used for the sequence homology search.
3. Search Algorithm:
Emapper is used for the EggNog database; for other databases, choose between hmmscan and hmmsearch.
4. E-Value:
Define the E-value. Smaller values make the search stricter. Alternatively, choose from the drop-down menu options: Relaxed, Mid-Range, or Strict.
5. m-Value:
Only for the EggNog database: choose your preferred method used by the emapper algorithm.
6. Optional: Add Advanced Options
If needed, you can include advanced options by selecting an option and then clicking the "+" button.

Adding a New Functional Task:
To create an additional functional task, click the "Add Task" button and define the parameters as described above.
Removing a Functional Task:
To remove a functional task, click the '-' button located on the right side of the "Task Label" field.

Functional Annotation in Prophane

Functional annotation in Prophane involves a search within the selected functional database to find proteins that bear similarity to those present in your analysis. Subsequently, Prophane retrieves the annotated functions and outlines different functional annotation levels of the protein that exhibits the closest resemblance to the query.
Depending on the selected database, the search algorithm employed in Prophane is hmmscan, hmmsearch, or emapper.

Prophane supports the following protein functional databases:

  • EggNog
  • PFAMs
  • TIGRFAMs
  • FOAM
  • CAzY/dbCAN
  • ResFAMs (full)
  • ResFAMs (core)

For more detailed information about the different functional databases, please refer to the Prophane About page. You can explore different protein databases and their unique characteristics by consulting the "Annotation Databases & Algorithms & Parameters" part of the Prophane Documentation.

Parameters for Functional Annotation in Prophane

E-Value:

The E-value parameter defines the threshold for similarity for reported proteins. Smaller E-values indicate more stringent similarity criteria, resulting in reporting only very closely matching proteins. Conversely, larger E-values, referred to as the "relaxed" setting, expand the reporting to include more matches with varying degrees of similarity.

Advanced Options:

For comprehensive information on additional parameters and their functionalities, please refer to the EggNog documentation and the HMMER documentation. These resources provide in-depth insights into the various options available for optimizing your functional annotation analysis in Prophane.

functional task

Hands-On Tutorial: Functional Annotation

1. Task Label:
Name the task, e.g. "EggNog annotation".
2. Reference Database:
Choose reference database "EggNog".
3. Search Algorithm and Advanced Parameters:
Use Default Settings, nothing to be done.

4. Add new functional task:
Click "Add Task" button.
5. Task Label:
Name the task, e.g. "Pfams annotation".
6. Reference Database:
Choose reference database "PFAMs".
7. Search Algorithm and Advanced Parameters:
Use Default Settings, nothing to be done.

Step by Step Instruction: Custom Map Annotation

1. Add a new custom map task
2. Task Label:
Name your task descriptively.
3. Scope:
Define the scope of your analysis, whether it's functional or taxonomical. Please note that this choice does not affect the analysis itself; it is used for automated naming purposes.
4. Custom Map:
Select and upload your custom map file, which contains the custom annotation information.

Adding a New Custom Map Task:
To create an additional custom map task, click the "Add Task" button and define the parameters as described above.
Removing a Custom Map Task:
To remove a custom map task, click the '-' button located on the right side of the "Task Label" field.

Custom Map Annotation in Prophane

Prophane allows users to upload custom maps for taxonomic or functional annotations. These user-specified custom maps enable the mapping of identified protein accessions to specific taxonomic or functional information. The "acc2annot_mapper" within Prophane matches annotations based on protein accessions.
One common use case for custom maps is assigning taxa to protein groups in experiments with a known sample composition. Custom maps containing protein accessions and species lineages allow for tailored annotation, even excluding species with similar sequences.
Annotations are extracted from a user-provided tab-separated table (TSV) file. The names of the annotation levels are derived from the column names in the user-provided table, which serves as the basis for annotation. It's important to avoid using spaces, commas, tabulators, and semicolons in the column names.

For more detailed information about the format specifications for custom maps, please refer to the "Custom Maps" part of the Prophane Documentation.
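
Because the annotation level names are taken from the column names, it can be worth checking the header line of a custom map before uploading it. The following Python sketch only tests the naming rule mentioned above; the column names in the example are hypothetical, and the full format requirements are described in the "Custom Maps" part of the documentation.

```python
# Characters that should not appear inside a column name.
# Tabs cannot occur here because they separate the columns.
FORBIDDEN = set(" ,;")


def invalid_columns(header_line):
    """Return column names that are empty or contain forbidden characters."""
    columns = header_line.rstrip("\n").split("\t")
    return [c for c in columns if not c or any(ch in FORBIDDEN for ch in c)]


# Hypothetical header line of a tab-separated custom map
header = "accession\tsuperkingdom\tphylum\tspecies name\n"
print(invalid_columns(header))  # ['species name']
```

Renaming the offending column, for example to "species_name", would make the header pass this check.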

If you want to use your custom map, first add a new custom map task:

Custom Map Annotation Add Task

Name your tasks and choose custom map file for upload:

Custom Map Annotation

Hands-On Tutorial: Custom Map Annotation

  • Skip this part and go to next step.

Step by Step Instruction: Lowest Common Ancestor

1. LCA Method:
Choose between the LCA methods "LCA per group" and "democratic LCA". The default selection is "LCA per group."
2. Threshold:
If you opt for the "LCA per group" method, define the threshold value. This threshold specifies the minimum fraction of shared annotations required for LCA selection. The threshold value can range from 0 to 1, with the default set at 1.
3. Optional: Add Advanced Options:
Add advanced options by selecting an option and clicking the '+' button.
Remove an option:
Use the '-' button located on the right side of the parameter field.

LCA in Prophane

The LCA approach searches hierarchical data to find the lowest common node shared by all members of a group. For every functional or taxonomical annotation at each annotation level, Prophane determines an LCA to represent all protein group members. Often, proteins within a group or metaproteins have different annotations. To obtain a unified representative for all members, Prophane determines the lowest common ancestor among them. If this is not possible the assigned LCA-value is referred to as "various". Two different methods are available for LCA determination:
The "LCA per group" method determines the LCA based on annotations of proteins within a protein group. It allows users to set a threshold value for each group, ranging from 0 to 1. A threshold value of 1 returns an LCA only if all annotations of all protein group members match; otherwise, "various" is returned. For example, with a threshold value of 0.51, Prophane returns an LCA if more than half of the annotations are the same.
The "democratic LCA" method selects, from the annotations of each protein group, the one that occurs most frequently across all protein groups.
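
The threshold behaviour of the "LCA per group" method can be sketched as follows. This is a simplified illustration under stated assumptions, not Prophane's actual code: every lineage has the same number of levels, agreement is tested level by level with "greater than or equal to", and once a level is "various" all deeper levels are "various" as well. The example lineages are invented.

```python
from collections import Counter


def lca_per_group(lineages, threshold=1.0):
    """Return one annotation per level, or 'various' below the threshold.

    lineages:  list of equally long lineages, one per annotation
    threshold: minimum fraction of identical annotations (0 to 1)
    """
    result = []
    resolved = True
    for level_values in zip(*lineages):
        value, count = Counter(level_values).most_common(1)[0]
        if resolved and count / len(level_values) >= threshold:
            result.append(value)
        else:
            # Assumption: deeper levels cannot be resolved after a conflict
            resolved = False
            result.append("various")
    return result


# Hypothetical protein group with three annotated members
lineages = [
    ["Bacteria", "Firmicutes", "Lactobacillus"],
    ["Bacteria", "Firmicutes", "Lactobacillus"],
    ["Bacteria", "Bacteroidetes", "Bacteroides"],
]
print(lca_per_group(lineages, 1.0))   # ['Bacteria', 'various', 'various']
print(lca_per_group(lineages, 0.51))  # ['Bacteria', 'Firmicutes', 'Lactobacillus']
```

With a threshold of 1, the single deviating member is enough to yield "various" below the top level; with 0.51, the two agreeing members (two thirds of the annotations) decide the result.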

Advanced Options

  • ignore_unclassified: This advanced option considers only proteins for LCA determination if an annotation was found (excluding "unclassified" annotations).
  • minimum_number_of_annotations: This advanced option sets the minimum number of annotations an annotation lineage must contain to be taken into account for LCA determination.

ignore_unclassified

With the advanced option "ignore_unclassified", only proteins for which an annotation was found (no "unclassified" annotations) are taken into account for LCA determination.

minimum_number_of_annotations

The "minimum_number_of_annotations" advanced option in Prophane empowers users to specify a minimum count of annotations within the annotation lineage that will be taken into account by the LCA methods. This feature proves especially useful for effectively managing annotations that might be considered extraneous or less informative.
For instance, annotations such as vector annotations typically have a lineage with only one annotated level. This option allows users to filter out these less informative annotations, ensuring a more meaningful and accurate LCA determination process.
For more detailed information about LCA determination and further explanations, please refer to the provided "LCA" part of the Prophane documentation.
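
How the two advanced options narrow down the lineages that enter LCA determination can be sketched like this. The representation is an assumption made for illustration: missing levels are empty strings and unannotated levels carry the value "unclassified". Prophane's internal representation may differ.

```python
def filter_lineages(lineages, ignore_unclassified=False,
                    minimum_number_of_annotations=0):
    """Select the lineages that would be passed on to LCA determination."""
    kept = []
    for lineage in lineages:
        # Count levels that carry a real annotation
        annotated = [v for v in lineage if v and v != "unclassified"]
        if ignore_unclassified and not annotated:
            continue
        if len(annotated) < minimum_number_of_annotations:
            continue
        kept.append(lineage)
    return kept


# Hypothetical lineages of one protein group
lineages = [
    ["Bacteria", "Firmicutes", "Lactobacillus"],
    ["unclassified", "unclassified", "unclassified"],
    ["vector", "", ""],  # one-level annotation
]
print(len(filter_lineages(lineages, ignore_unclassified=True)))
print(filter_lineages(lineages, minimum_number_of_annotations=2))
```

In this sketch, a minimum of 2 removes the one-level "vector" lineage, which matches the setting used in the tutorial analysis.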

LCA

Hands-On Tutorial: Lowest Common Ancestor

1. LCA method:
Choose the "LCA per group" method.
2. Threshold:
Define a threshold value of 0.51.
3. Add Advanced Options:
Choose the option "minimum_number_of_annotations" and click the '+' button.
4. Minimum number of annotations:
Enter the value 2.

Starting Your Prophane Analysis

The summary allows you to ensure that your input files and task settings are accurately configured to meet your analysis requirements. Once you've verified your inputs and settings, proceed by clicking the 'Submit' button to initiate the analysis.
Your files are uploaded to the Prophane server, and the analysis process begins. Please note that a typical Prophane run may take several hours.
All submitted Prophane jobs are queued. The time it takes to start your Prophane job depends on your position in the queue. Please be patient.
Once your Prophane job is successfully completed, you will receive an email notification if you are logged in to your account or if you've provided an email address. If you are not logged in, you can utilize the provided link to check the Prophane job status.
To access and download the analysis results, you have two options: you can either log in to your user account and use the Job Control panel, or you can use the download link you stored when submitting the Prophane job (the same link is provided in the email notification).

summary

Hands-On Tutorial:

  • 1. Check if all inputs and task settings are correct. Compare with picture above.
  • 2. Click "Submit" button.

In the Job Control panel, accessible to registered users, you can view all your submitted jobs. This panel provides real-time updates on the status of your jobs, which may include being in a queue, running, completed, or marked as failed.

Once your Prophane job is ready, you'll find a download link under the "Results" section, allowing you to retrieve your analysis results. Additionally, a link to view the resulting Krona plots in your web browser is provided.

The Job Control panel offers an organized and efficient way to manage your Prophane jobs and access the outcomes of your analyses.

Job Control

Accessing Your Prophane Analysis Results

  • 1. Download the result.zip File: Start by downloading the result.zip file and then unzip it.
  • 2. Folder Structure: Upon unzipping the folder, you will encounter various subfolders. Most of these are specific to the Prophane analysis and may not be of direct interest to you; the key files and folders to focus on are described in the following steps.
  • 3. Summary of the Prophane Analysis: You can access a summary of the analysis in one of the following formats:
    • "summary.txt" file: This file provides a Prophane analysis summary in a tab-separated format. It can be easily imported into Excel for further analysis.
    • "lca_summary.mztab" and "protein_summary.mztab" files: These files offer a standardized summary format. The "lca_summary.mztab" file contains group results, including LCA and quantification results. On the other hand, the "protein_summary.mztab" file contains annotations of the proteins analyzed. For more detailed explanations of these different summary output formats, you can refer to [link] and [link].
  • 4. Summary Visualization: Krona plots are interactive HTML5 charts of hierarchical data and visualize the taxonomic or functional abundances in a metagenome.
    These plots can be found in the "plots" folder.
    The Krona plots provide an interactive visualization of the annotation results, more precisely of the LCA results of the metaproteins. They are specifically designed for the visualization of microbiome analyses. The size of each segment corresponds to the percentage of the summed quantification results of all groups sharing the same LCA at the corresponding level. The top-level annotation is shown in the middle; for taxonomic annotation, this corresponds to the superkingdom level. The further out you go, the more specific the annotation level becomes and the more often you see "various". You can zoom in and out of the Krona plot, and you can switch the displayed results between sample groups and samples.
For more detailed information about Prophane results and further explanations, please refer to the provided "Result" part of the Prophane documentation.
Krona Plot

Hands-On Tutorial:

  • 1. Download the result.zip File by clicking on download link.
  • 2. Check out your results.
Load the summary.txt file into Excel and examine the results:
  • Open Excel
  • Go to data panel.
  • To Fill.
  • Examine summary.txt: you will see two different row types: the group summary rows with the LCA and quantification results, and the protein rows containing annotations per protein (multiple annotations per protein are possible).
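
As an alternative to Excel, the tab-separated summary file can be inspected with a few lines of Python. The sketch below writes a tiny stand-in file with placeholder column names ("column_a", "column_b") so that it runs on its own; the real column names of summary.txt are not shown in this tutorial, so replace the demo path with the path to your own file.

```python
import csv
import os
import tempfile


def read_summary(path):
    """Read a tab-separated file and return (column names, list of row dicts)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
    return reader.fieldnames, rows


# Stand-in file with placeholder columns, only to make the sketch runnable
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "summary.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("column_a\tcolumn_b\nvalue1\tvalue2\n")
    columns, rows = read_summary(demo_path)

print(columns)
print(len(rows))
```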

Check the Krona Plot:
  • Open by clicking on one of the html files in the plots folder.
  • Examine Krona Plots: Comparing the taxonomic results of the Swiss-Prot and UniProt analyses shows many more unclassified protein groups with the small Swiss-Prot database. For 15% of the protein groups, no annotation at all was possible. Closely related species share similar proteins, so proteins from several species can be identified in one group. For the EggNog database, the top-level annotations (the main roles) are cellular processes such as metabolism or signalling.