Multiple Sequence Viewer Panel
The Multiple Sequence Viewer is an alignment, visualization, and manipulation toolkit for multiple sequences.
To open the Multiple Sequence Viewer panel, you can:
Choose Tools → Multiple Sequence Viewer in the main window.
The Multiple Sequence Viewer panel provides a range of tools for manipulating and aligning multiple sequences.
The Multiple Sequence Viewer (MSV) has its own projects, which contain all the sequences in the project along with associated data. The project is stored in a single file with a .msv extension, and by default is stored inside the Maestro project. You can save it externally if you wish.
The project can include sequences imported directly into the project, and sequences that are displayed in the Workspace. Directly imported sequences remain in the project unless explicitly deleted. Sequences in the Workspace are transient: when structures are included in or excluded from the Workspace, the sequence is added to or removed from the project.
Although the data is stored separately from the Maestro project data, there are interactions between the Workspace and the MSV, which depend on settings that you make:
Changes that are made in the Workspace are propagated to the MSV. These changes cover inclusion and exclusion of entries; deletion, mutation, or insertion of residues; and selection of residues.
Changes to the residue selection in the MSV can be propagated back to the Workspace.
Changes to the sequences in the MSV are not propagated back to the Maestro structure.
Color schemes can be transferred between the MSV and the Workspace.
The speed of the sequence viewer depends on the number and length of sequences loaded into the viewer. Typically, alignments of 20 sequences of 300 residues or fewer can be interactively viewed and edited. Below are a few speed optimization tips.
Remember that operations involving Maestro sequences may depend on the complexity of the Maestro workspace.
This menu provides tools for creating, opening, and saving Multiple Sequence Viewer projects; importing and exporting sequences; saving images; and closing the panel.
Create a new, empty query in a new tab.
Rename the current query. The name appears on the tab for the query.
Create a copy of the selected sequences in the current query in a new tab.
Delete the current query. All sequences, structures, and data associated with the query are removed (unless they are also in another query tab).
Open an existing Multiple Sequence Viewer (MSV) project. Opens a file selector, in which you can navigate to and select the project.
Save the current MSV project. If the project has not yet been saved, a file selector opens, in which you can navigate to a location and name the project. If the project has been saved previously, the project is simply saved.
Save the current MSV project with a new name. Opens a file selector, in which you can navigate to a location and name the project. After saving, the current project is the one with the new name.
Import sequences into the project. Sequences can be imported from a range of file formats: FASTA, SWISSPROT, GCG, PIR, EMBL, as well as PDB and Maestro. Opens a file selector, in which you can choose the file type, navigate to and select the sequence file.
The file selector has four options:
Structural data (ATOM records), B-factors, and secondary structure assignments are also imported if the data are in the PDB file. A nonstandard version of FASTA format is accepted, in which a residue can be preceded by its residue number; numbering is otherwise sequential, starting from 1.
Export sequences from the project to a file in FASTA format or as a plain text file. Opens a file selector, in which you can navigate to a location and name the file. The file selector has three options:
>1abc|ID:7.14|SIM:19.64|HOM:17.86
These options only apply to export in FASTA format.
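If you need to read such a header back in a script, the following minimal Python sketch splits it into a name and numeric fields. It assumes the fields are separated by | and written as KEY:VALUE, as in the example header above. The function name parse_export_header is hypothetical and is not part of the MSV.

```python
def parse_export_header(line):
    """Split a FASTA header such as '>1abc|ID:7.14|SIM:19.64|HOM:17.86'.

    Assumption: the first field is the sequence name and the remaining
    fields are KEY:VALUE pairs with numeric values.
    """
    fields = line.lstrip(">").strip().split("|")
    name, scores = fields[0], {}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        scores[key] = float(value)
    return name, scores


name, scores = parse_export_header(">1abc|ID:7.14|SIM:19.64|HOM:17.86")
print(name, scores)
```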
Save an image of the sequence viewer. Available formats are PNG, EPS, and PDF. Opens a file selector, in which you can navigate to a location and name the file. The selector has one option: Export image of the entire alignment, which is selected by default. If it is not selected, the image is of only the visible part of the alignment (what you can see in the sequence display area of the panel).
Close the current Multiple Sequence Viewer (MSV) project and close the panel.
From this menu you can choose to undo or redo actions, edit sequences, and delete sequences. The sequence editor panel accepts standard editor key strokes, such as Ctrl+C (⌘C), Ctrl+X (⌘X), and Ctrl+V (⌘V) for copying, cutting, and pasting text.
Undo the last editing operation. The operation is appended to the menu item text, for example Undo Load File.
Redo the last undone editing operation. The operation is appended to the menu item text, for example Redo Load File.
Create a new sequence by entering the letter codes for the residues. Opens the Sequence Editor dialog box, in which you can name the sequence and type in the letter codes for the sequence.
Edit an existing sequence as a string of letter codes or create a new sequence. Opens the Sequence Editor dialog box, in which you can change the name of the sequence, and add or delete residues or gaps by character code. Only the 20 standard amino acid codes, X (unknown residue), and - and ~ (gap symbols) are recognized. When you have finished editing, you can either replace the existing sequence (if one was selected), or add it as a new sequence to the end of the list.
Insert entire sequences into the sequence viewer in FASTA format. Opens the Sequence Editor dialog box, in which you can paste the sequences and edit them. The same editing rules apply, except that lines beginning with a > character are treated as a sequence header. These lines also separate one sequence from the next. Multiple sequences thus delineated are saved as separate sequences.
This feature is useful for copying an alignment from a web site and adding it directly to the MSV without having to save a file. You can also add sequences by importing a file.
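The rules for pasted FASTA text can be illustrated with a short Python sketch. This is not the MSV implementation: the function name split_fasta is hypothetical, and the conversion of lowercase codes to uppercase is an assumption. The accepted character set (20 standard amino acid codes, X, - and ~) is taken from the description of the Sequence Editor.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | set("X-~")


def split_fasta(text):
    """Split pasted FASTA text into (header, sequence) pairs.

    Lines starting with '>' begin a new sequence; all other lines are
    appended to the current sequence. Raises ValueError when a line
    contains a character outside the accepted set.
    """
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
            continue
        if not records:  # sequence text that appears before any header
            records.append(["", ""])
        codes = line.upper()  # assumption: lowercase input is accepted
        bad = set(codes) - ALLOWED
        if bad:
            raise ValueError(f"unrecognized codes: {sorted(bad)}")
        records[-1][1] += codes
    return [tuple(record) for record in records]


print(split_fasta(">a\nACD-\nEF\n>b\nxK~\n"))
```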
Duplicate the selected sequences, and place each duplicate immediately below its parent.
Find a pattern in the sequences. Displays the Find toolbar if it is not displayed, and puts focus in the text box so you can start typing the pattern.
Renumber the residues in one or more sequences. Opens the Renumber Residues panel, where you can either renumber the selected sequences so that residues at the same position in the sequence viewer have the same residue number; or import and align a template for a single sequence, and apply the numbering of the template to the selected sequence.
Delete the sequences that are selected in the viewer.
Remove sequences whose sequence identity is greater than a given threshold. Opens a dialog box in which you can set the threshold, then click Remove to perform the action. The default threshold is 100%. For each pair of sequences that are considered identical, the shorter of the two is deleted; if they are of the same length, the second (lower down in the sequence viewer) is deleted. In the latter case, you can change which sequence is discarded by reordering the sequences. This task operates on all sequences. If you click Cancel in the dialog box, the redundant sequences become the selection, so you can perform operations on them.
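The discard rule described above can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in this help topic: sequences are given as aligned rows of equal length, identity is computed over the columns in which both rows have a residue, sequence length means the number of residues excluding gaps, and the threshold comparison is inclusive so that the default of 100% catches identical sequences. All function names are hypothetical.

```python
GAPS = "-~"


def percent_identity(a, b):
    """Identity over columns in which both aligned rows have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def residue_count(row):
    """Number of residues in an aligned row, not counting gap symbols."""
    return sum(c not in GAPS for c in row)


def find_redundant(rows, threshold=100.0):
    """Return the indices of the rows that would be discarded."""
    discard = set()
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if i in discard or j in discard:
                continue
            if percent_identity(rows[i], rows[j]) >= threshold:
                # The shorter row is discarded; on a tie, the lower row (j).
                if residue_count(rows[i]) < residue_count(rows[j]):
                    discard.add(i)
                else:
                    discard.add(j)
    return sorted(discard)


print(find_redundant(["ACDEF", "ACD--", "ACDEF", "GHIKL"]))  # [1, 2]
```

Because ties are resolved in favor of the upper row, reordering the rows before calling find_redundant changes which of two equal-length sequences is kept, which mirrors the behavior described for the menu item.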
From this menu you can control which sequences are selected, which sequences are shown, and the order in which they are listed.
Hide the selected sequences.
Display all sequences.
Delete all sequences from the project.
Select all sequences.
Deselect all sequences.
Invert the selection of the sequences: select the unselected sequences, and deselect the selected sequences.
Expand all sequences so that the associated data, such as secondary structure assignment, is displayed. This command is mapped to Ctrl+DOWN ARROW (⌘DOWN ARROW).
Hide the associated data for all sequences. This command is mapped to Ctrl+UP ARROW (⌘UP ARROW).
Set the color used for the name of the selected sequences. Opens a color selector, in which you can select a color. This feature allows you to color-code the names of the sequences.
Order sequences by the phylogeny tree generated by ClustalW. The tree is displayed in the leftmost part of the display area, which is normally hidden. When this option is selected, the other sorting items are not available. The nodes of the tree have a shortcut menu, with items Swap Branches to swap the order of the branches originating at the node, Select Sequences for selecting sequences from the branches originating at the node, and Hide Branch, for hiding the sequences for the branch.
Sort the sequences by the value of a given property, in ascending order. The properties that can be used are Name, Chain ID, Length, Number of Gaps, Sequence Identity, Sequence Similarity, Sequence Homology, and Sequence Score. Homology is calculated as the percentage of residues with identical side-chain chemical properties (as defined for the Side-Chain Chemistry color scheme).
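As an illustration of the homology measure, the Python sketch below groups residues by the Side-Chain Chemistry classes listed in the Residue Propensities table and counts the aligned pairs that fall in the same class. The choice of denominator (pairs in which both positions hold a standard residue) is an assumption, and the name percent_homology is hypothetical.

```python
# Side-Chain Chemistry classes, as listed in the Residue Propensities table.
GROUPS = ["DE", "RKH", "GAVILM", "FYW", "STNQ", "C", "P"]
CLASS_OF = {res: idx for idx, group in enumerate(GROUPS) for res in group}


def percent_homology(query, other):
    """Percentage of aligned residue pairs with the same side-chain class.

    Assumption: columns with a gap or an unknown residue in either row
    are left out of both the numerator and the denominator.
    """
    pairs = [(q, o) for q, o in zip(query, other)
             if q in CLASS_OF and o in CLASS_OF]
    if not pairs:
        return 0.0
    same = sum(CLASS_OF[q] == CLASS_OF[o] for q, o in pairs)
    return 100.0 * same / len(pairs)


print(percent_homology("AC", "VP"))  # 50.0
```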
Sort the sequences by the value of a given property, in descending order. The properties that can be used are the same as for Sort Ascending.
Move the selected sequences up one position in the list.
Move the selected sequences down one position in the list.
Move the selected sequences to the top of the list.
Move the selected sequences to the bottom of the list.
Download structural information (secondary structures, B-factors, coordinates) for sequences that came from the PDB, such as in a Blast search. The information is obtained from a local copy of the PDB or from the RCSB web site. If sequences are selected, information is obtained for these sequences, otherwise it is obtained for all sequences. The PDB sequence replaces the corresponding sequence in the MSV.
This menu provides tools for automatic sequence alignment with ClustalW, residue selection and manual sequence alignment tasks. Some of the features are also available on the toolbar. Residues are not renumbered when the gaps are added or removed.
Align the selected sequences simultaneously using ClustalW. If there are columns (residues) selected, the alignment is performed only on the selected residues. You can run an alignment on several discontinuous selected regions at the same time.
Align the selected sequences pairwise using a Smith-Waterman algorithm, and the settings from the Alignment Settings dialog box.
Align new sequences with a query sequence without changing the existing alignment. Gaps are inserted into the existing alignment to preserve the residue matching.
Align sequences so that residues with identical residue numbers (and insertion codes) are aligned. This is useful for families of proteins that share common numbering schemes, such as antibodies.
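The idea of aligning by residue number can be sketched in a few lines of Python. Each sequence is represented here as a list of (number, insertion code, residue) tuples, which is a hypothetical data layout chosen for the example. The sketch also assumes that insertion codes sort alphabetically after the empty code; some antibody numbering schemes order insertions differently, so treat this as a simplification.

```python
def align_by_residue_number(sequences):
    """Place residues with the same number and insertion code in one column.

    sequences: list of lists of (number, insertion_code, residue) tuples.
    Returns the sorted column keys and one gapped string per sequence.
    """
    keys = sorted({(num, icode) for seq in sequences for num, icode, _ in seq})
    rows = []
    for seq in sequences:
        by_key = {(num, icode): res for num, icode, res in seq}
        rows.append("".join(by_key.get(key, "-") for key in keys))
    return keys, rows


a = [(1, "", "Q"), (2, "", "V"), (3, "", "L")]
b = [(1, "", "E"), (2, "", "V"), (2, "A", "G"), (3, "", "L")]
print(align_by_residue_number([a, b])[1])  # ['QV-L', 'EVGL']
```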
Apply constraints on pairwise alignments, so that the constrained residues are in the same position (same column) after the alignment.
When you select this option, a constraint row is displayed between the query sequence and the other sequences. To add a constraint, click on a residue in the query sequence and then on a position in one of the other sequences. The constraints are displayed as red lines connecting the constrained residue pair. To remove a constraint, click on the constrained residue pair again. To remove all constraints, choose Clear Constraints from the Alignment menu or from the sequence shortcut menu.
You can also allow or disallow gaps in secondary structure elements, in the Pairwise Alignment Settings dialog box.
Remove all constraints on the alignment.
Opens the Pairwise Alignment Settings dialog box, in which you can choose the similarity matrix type, set the gap opening penalty and the gap extension penalty, choose whether to allow gaps in secondary structure elements, and create a substitution matrix from an existing alignment. This substitution matrix can be selected as Custom from the matrix option menu for subsequent alignments.
Lock gaps in the alignment so that they are not filled when performing manual alignment. If you insert a gap after locking the gaps, the new gap is not automatically locked. If you have a residue selection, the gaps are only locked in the selected region.
Unlock previously locked gaps so that they can be filled.
Select residues that are identical in all sequences. Gaps are ignored.
Select blocks of residues for which there are no gaps in any of the sequences.
Select columns in which at least one of the residues has 3D structure (atom coordinates) associated with it. This is useful for multiple aligned structures with SEQRES regions that don't have crystal coordinates in parts of their sequences.
Expand the residue selection in the selected sequences to include the corresponding residues in all of the selected sequences. If no sequences are selected, the residue selection is expanded to cover all sequences.
Expand the residue selection in the query sequence to include the corresponding residues in the selected sequences, or in all sequences if no sequences are selected.
Hide the columns in which residues in all sequences are selected.
Hide the columns in which not all residues are selected (columns in which one or more residues are not selected).
Show all columns for all sequences.
Select all residues.
Invert the selection of the residues: select the unselected residues, and deselect the selected residues.
Deselect all residues.
Delete the selected residues. Also Shift+Backspace (⇧Delete).
Delete the unselected residues.
Remove columns that consist entirely of gaps.
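For scripts that post-process an exported alignment, removing gap-only columns might look like the following Python sketch. The function name is hypothetical, and padding shorter rows with - before comparing columns is an assumption.

```python
def remove_gap_only_columns(rows, gaps="-~"):
    """Drop alignment columns in which every row has a gap symbol."""
    width = max((len(row) for row in rows), default=0)
    padded = [row.ljust(width, "-") for row in rows]
    keep = [col for col in range(width)
            if any(row[col] not in gaps for row in padded)]
    return ["".join(row[col] for col in keep) for row in padded]


print(remove_gap_only_columns(["A--C", "A-GC"]))  # ['A-C', 'AGC']
```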
Replace the selected residues with gaps, in the selected sequences.
Remove gaps in the residue selection of the selected sequences by shifting residues to the left. If there is no selection, all gaps are removed, including gaps at the beginning of the sequences.
Track the regions of the sequence that have been edited, by showing an annotation that marks the edited regions. The latest edit is shown in black, and earlier edits in progressively lighter shades, up to the fifth last edit.
Clear the history of changes stored when changes are tracked.
Apply a color scheme to the sequences.
Color the residues by residue type. The colors are:
ACFILMPVW | blue | (hydrophobic) |
DE | red | (acidic) |
HKR | green-yellow | (basic) |
GNQSTY | orange | (other) |
Color residues by similarity. Identical residues are red, similar residues (positive BLOSUM62 pairwise score) are orange, other residues are white.
Color residues by Kyte-Doolittle hydrophobicity. Hydrophilic residues are blue, hydrophobic residues are red, residues with zero hydrophobicity are white.
Color residues by Hopp-Woods hydrophilicity. Hydrophilic residues are red, hydrophobic residues are blue, residues with zero hydrophobicity are white.
Color residues with the Aminochromography color scheme developed by William Taylor (Protein Engineering 1997, 10, 743–746). In this scheme, well conserved parts of the alignment exhibit bright, clear colors, while parts that are not well conserved have brownish, dull colors.
Color all residues with the color chosen from the submenu. Twelve colors are offered on the submenu, with a Custom item so that you can choose your own color in a color selector.
Color the sequence by the secondary structure assignment (SSA). If no SSA is available but one or more secondary structure predictions (SSPs) are available, the predictions are used to color the sequence. The colors for multiple predictions are averaged, so positions where all predictions agree have bright colors, and positions where they disagree are grayer. If neither an SSP nor an SSA is available, the color of the sequence is not changed.
Color the residues by their temperature factor (PDB B factor), on a green-white-red scale, with green for the lowest values and red for the highest.
Color residues by the residue propensities. The schemes that are available on the submenu are described in the table below.
Scheme | Residues | Color | Description |
---|---|---|---|
Helix Propensity | AMLEQK | red | helix-forming |
Helix Propensity | VIFW | magenta | weak helix-forming |
Helix Propensity | CSTNDHR | gray | ambivalent |
Helix Propensity | PGY | blue | helix-breaking |
Strand Propensity | VILMTFWY | blue | strand-forming |
Strand Propensity | ACSNQHR | gray | ambivalent |
Strand Propensity | DEKGP | red | strand-breaking |
Turn Propensity | GSDNP | cyan | turn-forming |
Turn Propensity | AVLIMHFWC | magenta | turn-breaking |
Turn Propensity | EQTKRY | gray | ambivalent |
Helix Terminators* | GTMRKHF | green | helix-starting |
Helix Terminators* | SNDELWP | red | helix-ending |
Helix Terminators* | CQAVIY | gray | ambivalent |
Exposure Tendency | RNDQEHK | blue | surface |
Exposure Tendency | ACGPSTWY | gray | ambiguous |
Exposure Tendency | ILMFV | orange | buried |
Steric Group | GACS | red | small, noninterfering |
Steric Group | TVNDILPM | magenta | ambiguous |
Steric Group | QEKR | cyan | sticky polar |
Steric Group | HFYW | blue | aromatic |
Side-Chain Chemistry (the default) | DE | red | acidic, hydrophilic |
Side-Chain Chemistry | RKH | blue | basic, hydrophilic |
Side-Chain Chemistry | GAVILM | green | neutral, hydrophobic, aliphatic |
Side-Chain Chemistry | FYW | orange | neutral, hydrophobic, aromatic |
Side-Chain Chemistry | STNQ | cyan | neutral, hydrophilic |
Side-Chain Chemistry | C | yellow | primary thiol |
Side-Chain Chemistry | P | dark gray | imino acid |
Mark the selected residues with the color chosen from this submenu. This color overrides any other color applied. To remove this color, select the residues and choose Unmark Residues from the submenu.
Set the limits of sequence identity used when weighting the color density by alignment quality. Opens a dialog box, in which you can set the lower and upper threshold for the sequence identity. Residues with identity below the lower threshold are colored white; residues with identity above the upper threshold have full color, and the color density for residues with identity between the two thresholds is set using a linear scale.
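The linear weighting between the two thresholds can be written out as a short Python sketch. The function names are hypothetical, and blending the scheme color toward white in RGB space is an assumption about how "color density" is applied.

```python
def color_weight(identity, lower, upper):
    """Return 0.0 at or below `lower`, 1.0 at or above `upper`, linear between."""
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    if identity <= lower:
        return 0.0
    if identity >= upper:
        return 1.0
    return (identity - lower) / (upper - lower)


def blend_with_white(rgb, weight):
    """Fade an (r, g, b) color toward white; weight 1.0 keeps the full color."""
    return tuple(round(255 + (c - 255) * weight) for c in rgb)


print(color_weight(50, 20, 80))  # 0.5
```

For example, with thresholds of 20% and 80%, a residue with 50% identity is drawn at half the color density of a fully conserved residue.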
Show or hide the sequence coloring.
Use white for the text on residues colored with dark colors, and black for the text on residues colored with light colors. If this option is not selected, the text is black for all residues.
Annotations are a means of representing additional information associated with the sequences. There are two classes of annotations: global annotations (consensus sequence, mean hydrophobicity) and local annotations. The global annotations are calculated for the entire set of sequences; the local annotations are computed for each sequence individually. Depending on the annotation type, they are presented as histogram plots (hydrophobicity, B-factor), color bars ("Color Blocks"), alphanumeric strings (consensus sequence, SSP, Pfam), or graphical representations (secondary structure assignments).
Display the consensus sequence at the top of the panel. The consensus sequence is the sequence that is composed of the most frequently occurring residue at each position in the sequence; if there are two residues that have the same frequency of occurrence, a + symbol is used, and the residues for this position are shown in its tooltip. The sequence is annotated with a histogram of the number of sequences that are represented by each residue in the consensus, with information on the percentage in the tooltip.
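A minimal Python sketch of the consensus rule follows. It assumes that gap symbols are ignored when counting and that a column containing only gaps is reported as -; neither detail is stated here. The function name consensus is hypothetical.

```python
from collections import Counter


def consensus(rows, gaps="-~"):
    """Most frequent residue per column; '+' when the top frequency is tied."""
    result = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c not in gaps)
        if not counts:
            result.append("-")  # assumption: gap-only column
            continue
        ranked = counts.most_common()
        top = ranked[0][1]
        tied = [res for res, n in ranked if n == top]
        result.append(tied[0] if len(tied) == 1 else "+")
    return "".join(result)


print(consensus(["ACD", "AGD", "ACE"]))  # ACD
print(consensus(["AC", "AG"]))           # A+
```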
Display a row at the top of the panel that contains symbols for the degree of consensus. The symbols follow the ClustalW conventions:
* | Single, fully conserved residue. |
: | One of the following "strong" groups is fully conserved: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW. |
. | One of the following "weaker" groups is fully conserved: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, FVLIM, HFY. |
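The symbol assignment can be reproduced with a small Python sketch that uses the strong and weaker groups listed above. Leaving the symbol blank for columns that contain a gap is an assumption, and the function names are hypothetical.

```python
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if residues & set("-~"):
        return " "  # assumption: columns with gaps get no symbol
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG):
        return ":"
    if any(residues <= set(group) for group in WEAK):
        return "."
    return " "


def conservation_row(rows):
    """Build the symbol row for a list of equal-length aligned sequences."""
    return "".join(conservation_symbol(col) for col in zip(*rows))


print(repr(conservation_row(["ASCA", "ATSG", "AACW"])))  # '*:. '
```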
Display logo annotation. In this annotation, residue symbols at each position whose frequency of occurrence at that position is greater than a threshold are drawn in a vertical stack in order of frequency, with the height of the residue symbols proportional to the frequency of occurrence. (See Schneider T.D.; Stephens R.M. Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Res. 1990, 18, 6097.)
Display histogram of Kyte-Doolittle hydrophobicity for each residue in the alignment, averaged over all sequences. Hydrophobic residues have positive values; hydrophilic residues have negative values.
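The per-column average can be sketched in Python using the published Kyte-Doolittle scale. Skipping gaps and unknown residues, and reporting 0.0 for a column with no standard residues, are assumptions; whether the MSV applies any smoothing window is not stated here. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(rows):
    """Average Kyte-Doolittle value per column over all aligned rows."""
    means = []
    for column in zip(*rows):
        values = [KD[c] for c in column if c in KD]  # skips gaps and X
        means.append(sum(values) / len(values) if values else 0.0)
    return means


print(mean_hydrophobicity(["IR", "I-"]))  # [4.5, -4.5]
```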
Add all global annotations (listed above) to the display.
Remove all global annotations from the display. Does not affect sequence-dependent annotations.
Display residue numbers above the sequence. The numbers are given every 5 residues, and are left-aligned to the left edge of the residue in the sequence. This is useful for tracking sequence changes after residue deletion, for example. The ruler only gives absolute alignments.
Display secondary structure assignment for the sequence.
Display histogram of temperature factors for each residue in the sequence.
Display disulfide bonds as lines connecting cysteine residues, colored from black (strongest prediction) to light gray (weakest prediction).
Display histogram of Kyte-Doolittle hydrophobicity for each residue in the sequence. Hydrophobic residues have positive values; hydrophilic residues have negative values.
Display histogram of isoelectric points in a 5-residue window. The isoelectric point of the isolated amino acid and the 5-residue window value are displayed in the tooltip for each histogram bar.
Display a row in which residue positions are colored by the shortest distance between any ligand heavy atom and any heavy atom in the residue at that position. Red is used if the distance is less than 4 Å, orange is used if the distance is less than 6 Å, and gray is used otherwise. This annotation takes a little time to generate, and requires a structure for the sequence.
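The coloring rule can be illustrated with the Python sketch below, which takes heavy-atom coordinates as (x, y, z) tuples in Å. The data layout and function names are hypothetical, and the brute-force distance search is chosen for clarity rather than speed.

```python
import math


def min_heavy_atom_distance(residue_atoms, ligand_atoms):
    """Shortest distance between any residue atom and any ligand atom."""
    return min(math.dist(r, l) for r in residue_atoms for l in ligand_atoms)


def proximity_color(distance):
    """Map a distance in Å to the annotation color described above."""
    if distance < 4.0:
        return "red"
    if distance < 6.0:
        return "orange"
    return "gray"


d = min_heavy_atom_distance([(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
print(d, proximity_color(d))  # 3.0 red
```

Comparing every residue atom with every ligand atom scales with the product of the atom counts, which is consistent with the note that this annotation takes a little time to generate.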
Display assignment of the three VL and VH regions. The residues are colored red, and a red line is displayed in the annotation for each region, labeled Ln or Hn for n = 1, 2, or 3.
If the sequence is an antibody, select the CDRs in the sequence.
Choose the numbering scheme to use for residues in an antibody, from Chothia, Enhanced Chothia, Kabat, IMGT, and AHo. This numbering scheme is only in effect when the Antibody CDRs annotation is turned on. When you export the antibody to a file, it is exported with the selected numbering scheme.
Display a single row of color blocks that are colored according to one of the residue properties described under Residue Propensities for the Color menu. The row is presented like a sequence, but without letter codes, only the colors for the property. This submenu includes all the items that are on the Residue Propensities submenu of the Color menu, with the addition of the Helix Terminator item, and also has commands to display annotations for all properties on the submenu or to remove all color block annotations.
Add or remove customizable annotations. There are three types of user annotation offered:
Remove all annotations from the selected sequences, leaving just the sequences. This action removes any secondary structure assignments and predictions, as well as the annotations added from this menu. If no sequences are selected, annotations are removed from all sequences.
This menu provides access to programs for finding homologs, finding families, and predicting secondary structure.
Run a BLAST search to find homologs for the first sequence. Opens the Blast Search Settings panel, in which you can make choices for the search and start the job. A progress dialog box is displayed, and the results are displayed in the BLAST Search Results dialog box.
Show the output of the latest BLAST search, in the BLAST Search Results dialog box. The dialog box allows you to select homologs, sort results by one of a range of properties, download PDB structures for the homologs, and incorporate selected homologs into the project.
Run a Pfam search to find families for the selected sequences, or for all sequences if no sequence is selected.
Run prediction programs. This submenu has the following items:
Run all the predictions listed below.
Run the secondary structure programs to obtain a prediction of the secondary structure of the selected sequences, or for all sequences if no sequence is selected. A dialog box opens that shows job progress.
Predict accessibility of each residue to solvent. If more than 25% of total residue surface area is predicted to be exposed to the solvent, the residue is marked "e" (exposed, colored blue), otherwise it is marked "b" (buried, colored yellow).
Predict the arrangement of domains. Residues marked gray are likely to form a domain. Residues marked red are likely to be in linker (inter-domain) regions.
Calculate a disorder score and classify residues by this score. The score is normalized to a 0 to 1 range. If a residue has a disorder score less than 0.5, it is marked light gray. If the score is between 0.5 and 0.9, the residue is marked orange. If the score is greater than 0.9, the residue is marked red.
Predict disulfide bridges between cysteines. Predicted bonds are drawn as lines connecting cysteine residues, colored from black (strongest prediction) to light gray (weakest prediction).
Predict the contacts between beta sheets.
Remove secondary structure predictions.
Build a structure for the query sequence, using the Prime Build Structure tools. The structure is built for the query sequence, and templates must be selected from among the sequences that have PDB structures. The Build Homology Model panel opens, in which you can select options for use of templates, for multimer models, and for building the structure; and then run the job. The structure is incorporated into the Maestro project when the job finishes, and the sequence is added to the MSV and aligned with the query (or queries, for a heteromultimer).
Calculate sequence identity in multiple alignment columns that are within a certain spatial distance from the ligand in the query. You must have a query sequence with a structure and a set of sequences that are aligned to the query. Choosing this menu item opens the Analyze Binding Site dialog box, in which you can select sequences by percentage identity to the query in the binding site, or analyze the percentage identity, similarity, and homology of the aligned sequences to the query within a range of distances from the binding site.
Compare all sequences or the selected sequences, by identity, similarity, or homology. The percentage is displayed in a table, like a heat map, with cells color-coded by the percentage value. Opens the Compare Sequences panel, in which you can choose the comparison measure to display, switch between all or selected sequences, and refresh the display after changing the alignment.
Opens the Multiple Sequence Viewer Job Settings dialog box, so that you can make settings for any job that is run from the MSV.
Show the log file for the most recent job in a dialog box.
This menu provides options and actions for the interaction with the sequences and their structures as stored in the Maestro project and displayed in the Workspace.
Incorporate the sequences and the corresponding structures from the entries that are in the Workspace into the MSV.
Incorporate the sequences and the corresponding structures from the entries that are selected in the Project Table into the MSV.
When structures are imported into the MSV and incorporated into the Maestro project, include the structures in the Workspace.
Associate Maestro entries with sequences in the MSV without adding new sequences, modifying the alignment, or importing structures into the MSV. Opens the Associate Maestro Entries with MSV Sequences dialog box, in which you can choose an entry chain from the Workspace (list on the left) and an MSV sequence (list on the right), and click Associate Selected Pair to associate the sequence with the entry. The sequence identity for the selected pair is shown below the lists, and colored green if the identity exceeds 95% or red if it does not. The residues that do not match the entry sequence are marked as structureless in the MSV by using a less intense color. This facility is useful if the sequences are imported into the MSV and the structures are independently imported into Maestro.
Superimpose structures according to their sequence alignment. Uses the Superposition panel, with sequence identities selected as the atoms for superposition.
Run the Prime Protein Structure Alignment program on the selected (or all) protein structures and return the alignments. The sequences you select must have structures associated with them.
Color sequences with the colors that they have in the Maestro Workspace. The color of the alpha carbon is used to color the residues.
Apply the colors from the MSV to the sequences and structures in Maestro.
Color the molecular surface in the Workspace using the colors from the corresponding sequence in the MSV. If sequences are selected, only the colors of the selected sequences are applied.
Update the sequences in the MSV that originated from Maestro with any changes made in Maestro.
Update the atom selection in the Maestro Workspace from the residue selection in the MSV.
When this option is selected, changes made to sequences in the Maestro project are automatically propagated to the MSV. If it is not selected, you can choose Synchronize with Maestro to apply changes made in Maestro to the MSV.
When in Edit mode, allow mutation operations on the sequence to change the structure in Maestro. If this option is not selected, you will not be allowed to mutate sequences that came from Maestro structures. This option has no effect on deletions in the sequence, which are not propagated to the structure.
This menu provides settings for control of what is displayed.
When selected, wrap the sequences so that the display consists of multiple rows of sequences, and can be scrolled vertically. When unselected, display the sequences in a single row that scrolls horizontally. Operations on unwrapped sequences are generally faster, especially when there are many sequences.
When selected, group the annotations of the same type and display the groups as separate rows, below the sequences. When unselected, display the annotations for each sequence directly below the sequence.
Change the font size for the text in the sequence viewer. This submenu offers a selection of point sizes for the font.
Display a "ruler" that marks the residue positions in the sequence viewer. These positions are not the same as the residue numbers in each sequence, which can be offset from the origin and have gaps.
Show information about the panel and its contents in tooltips (text displayed when the pointer pauses over the relevant part of the panel).
Display a heading row above the sequences with labels for each section.
Replace residue symbols with dots for all residues that are identical to those in the query sequence. This feature makes it easy to find the mutations from the query sequence in the other sequences.
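The effect of this setting on one row can be sketched in Python as follows. The sketch assumes that both rows come from the same alignment and have equal length, and that gaps are left as gap symbols rather than replaced with dots. The function name is hypothetical.

```python
def mask_identities(query, other, gaps="-~"):
    """Replace residues in `other` that match the query with dots."""
    return "".join(
        "." if q == o and o not in gaps else o
        for q, o in zip(query, other)
    )


print(mask_identities("ACDEF", "ACNEF"))  # ..N..
```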
Add gaps to the end of each sequence so that the sequences are the same length.
Display the residue number of the first and last visible residue in each row for each sequence. The numbers are displayed to the left and right of the sequence.
Display the percentage identity with the query sequence to the right of the sequence. By default, the query sequence is the consensus sequence. You can set the query sequence by right-clicking on a sequence and choosing Set as Query from the shortcut menu. The name of the query sequence is displayed in the status area.
Display the percentage similarity to the query sequence to the right of the sequence. Two residues are similar if they have a positive BLOSUM62 pairwise score. The percentage similarity is calculated from the number of similar residues divided by the number of aligned residues. By default, the query sequence is the consensus sequence. You can set the query sequence by right-clicking on a sequence and choosing Set as Query from the shortcut menu. The name of the query sequence is displayed in the status area.
Display the percentage homology to the query sequence to the right of the sequence. Homology is calculated as the percentage of residues with identical side-chain chemical properties (as defined for the Side-Chain Chemistry color scheme). By default, the query sequence is the consensus sequence. You can set the query sequence by right-clicking on a sequence and choosing Set as Query from the shortcut menu. The name of the query sequence is displayed in the status area.
Display the BLOSUM62 similarity score to the right of the sequence. The score is calculated relative to the query sequence.
When calculating sequence identity, count gaps as though they were residues, rather than ignoring them. For example, a column consisting of 2 different residues and 8 gaps would have a sequence identity of 20% if gaps are included, but 50% if gaps are ignored.
When calculating sequence identity, restrict the calculation to the columns (residues) that are selected. This allows you to calculate the identity of regions of a sequence rather than the whole sequence.
Update the internal profile that is used for sequence identity and consensus calculations, and sequence coloring. If Automatically Update Sequence Profile is off, use this command to manually update the profile before doing calculations or applying coloring.
Automatically update the internal profile that is used for sequence identity and consensus calculations and for sequence coloring whenever changes are made. Deselect this option to improve performance when you have a large number of sequences.
Display a dialog box requesting confirmation of the action before retrieving information from a web server.
Reset all settings on the Settings menu to the default values.
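The identity and similarity settings described above (similarity as a positive BLOSUM62 score, counting or ignoring gaps, and restricting the calculation to selected columns) can be illustrated with a short Python sketch. This is not the MSV implementation: the function name, the small excerpt of positive-scoring BLOSUM62 pairs, and the rule that a gapped column counts in the total but never as a match are all assumptions made for illustration.

```python
# Hypothetical excerpt of residue pairs with a positive BLOSUM62 score.
# A real calculation would need the full matrix.
POSITIVE_PAIRS = {
    frozenset(pair)
    for pair in ("IL", "IV", "LV", "DE", "KR", "FY", "ST", "WY", "FW")
}
GAPS = "-~"  # locked and unlocked gap symbols


def is_similar(a, b):
    """Identical residues always score positively; otherwise consult the excerpt."""
    return a == b or frozenset((a, b)) in POSITIVE_PAIRS


def percent_match(query, seq, similar=False, count_gaps=False, columns=None):
    """Percentage of compared columns in which seq matches the query.

    similar    -- count similar residues instead of identical ones
    count_gaps -- assumption: gapped columns are counted in the total
                  but never as a match; otherwise they are skipped
    columns    -- optional iterable of column indices (a selection)
    """
    if columns is None:
        columns = range(min(len(query), len(seq)))
    compared = matched = 0
    for i in columns:
        q, s = query[i], seq[i]
        has_gap = q in GAPS or s in GAPS
        if has_gap and not count_gaps:
            continue  # ignore gapped columns entirely
        compared += 1
        if not has_gap and (is_similar(q, s) if similar else q == s):
            matched += 1
    return 100.0 * matched / compared if compared else 0.0


query = "ACDEFG"
other = "AC-EYG"
print(percent_match(query, other))                    # identity, gaps ignored
print(percent_match(query, other, similar=True))      # similarity, gaps ignored
print(percent_match(query, other, count_gaps=True))   # identity, gaps counted
print(percent_match(query, other, columns=[0, 1, 2])) # selected columns only
```

Counting gaps enlarges the denominator, so the same alignment yields a lower percentage than when gaps are ignored.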
The toolbars are arranged in two rows. The first row contains toolbars that have only buttons; these toolbars are described together. The second row contains toolbars that have other tools besides buttons; these toolbars are described separately. You can show or hide any of the toolbars from the shortcut menu.
The buttons on the toolbars that only have buttons are described below.
Import Sequences | Import sequences into the Multiple Sequence Viewer project from a file. The file must be in FASTA or PDB format. Opens a file selector, in which you can navigate to and select the file. Same as File → Import Sequences. |
Export Sequences | Export sequences from the Multiple Sequence Viewer project to a FASTA file. Opens a file selector, in which you can navigate to a location and name the file. Same as File → Export Sequences. |
Undo | Undo the last action. Same as Edit → Undo. |
Redo | Redo the last action. Same as Edit → Redo. |
Lock gaps | Lock gaps in the sequences so that they are not filled when performing manual alignment. The locking is applied once. Gaps created after locking is done are not automatically locked. To lock them, click this button again. Locked gaps are indicated by a dash (-); unlocked gaps are indicated by a tilde (~). If you have a residue selection, gaps are only locked in the selected region. |
Unlock gaps | Unlock gaps in the sequences after they have been locked. This allows gaps to be filled when performing manual alignment. |
Pairwise Alignment | Align multiple sequences pairwise using ClustalW. Same as Alignment → Pairwise Alignment. |
Multiple Alignment | Align multiple sequences simultaneously using ClustalW. Same as Alignment → Multiple Alignment. |
Color Matching Residues Only | Apply the current color scheme only to residues that are identical in all sequences (gaps are ignored). |
Weight Colors by Alignment Quality | Set the color density according to the sequence identity. |
Average Colors in Columns | Average the colors in each column and color all residues in the column by the average color. |
Zoom in | Increase the width of each residue so that the horizontal scale is expanded. |
Zoom out | Decrease the width of each residue so that the horizontal scale is contracted. When the residues are narrower than the text, the text is no longer displayed. The residues can be identified by their tooltips. |
Wrap Sequences | Wrap the sequences so that the display consists of multiple rows of sequences, and can be scrolled vertically. When unselected, display the sequences in a single row that scrolls horizontally. Same as Settings → Wrap Sequences. |
Build Homology Model | Build a 3D model of the sequence using Prime. Same as Tools → Build 3D Model. |
The Mode option menu allows you to select one of four sequence editing modes so that you can edit the alignment, and in some cases, edit the sequence itself.
You can lock the sequence downstream (to the right) of the residue or block that you are moving, so that the downstream part of the sequence moves as a block, without creating or removing gaps. To lock or unlock the sequence downstream, click the Lock Sequence Downstream button to the right of the Mode option menu. Locking is on by default.
You can also use the Lock Gaps and Unlock Gaps toolbar buttons to prevent gaps from collapsing while editing the alignment.
The four sequence editing modes are described below.
Use this mode to select multiple residues and slide them.
To select residues, drag over the residues, or shift-click the first and last residues in the range. You can drag across multiple sequences to select residues, and you can drag in the ruler to select residues in all sequences.
To deselect selected residues, control-click the residues.
To slide the selected residues, drag them to their new location.
Use this mode to drag single residues to a new location. If there are residues adjacent to the residue you drag, they are also moved as you drag. Gaps are filled as you drag, unless they are locked.
Edit the sequence or the SSPs by typing. You can mutate residues by typing in the replacement residue code, and delete residues. If the sequence is a Maestro sequence, a mutation operation also mutates the residues in the structure, but you must have Allow Structural Changes selected on the Maestro menu to perform the mutation. Deletions do not change the structure, regardless of the Allow Structural Changes setting. You can also edit sequences as text by using the tools on the Edit menu. You can change an SSP by typing in the replacement code (E, H, or -).
The allowed key strokes are listed below.
Insert single gaps by clicking with the left mouse button. Delete single gaps by clicking with the right mouse button. If multiple sequences are selected and you click in one of the selected sequences, the gaps are inserted or deleted in all selected sequences. If you click in a single selected sequence or in a sequence that is not selected, the gaps are inserted or deleted only in the sequence you clicked on.
The Fetch tool allows you to fetch sequences from the Protein Data Bank or the Entrez Protein Database, from the appropriate web sites.
To fetch sequences from the PDB, type the four-character code into the Fetch text box and press ENTER. The sequence is retrieved from the RCSB web site and added to the project.
To fetch sequences from Entrez, type the access code into the Fetch text box and press ENTER.
The access code format is database|code, where database is the code for the database (gi for GenBank, pdb for the Protein Data Bank, emb for the EMBL Sequence Database, and so on), and code is the sequence code for the database. Examples: gi|12345, pdb|2aba, emb|CAA44029.1. The sequence is retrieved from the NCBI web site and added to the project.
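The two input forms accepted by the Fetch text box can be told apart with a few lines of Python. The sketch below is only an illustration of the database|code format described above: the function name, the returned tuple, and the rule used to recognize a bare PDB code (exactly four alphanumeric characters) are assumptions, and no network request is made.

```python
# Database prefixes named in the text above; other prefixes exist.
KNOWN_DATABASES = {
    "gi": "GenBank",
    "pdb": "Protein Data Bank",
    "emb": "EMBL Sequence Database",
}


def parse_fetch_input(text):
    """Classify Fetch input as a PDB code or an Entrez access code.

    Returns (source, database, code), where source is "pdb" or "entrez"
    and database is None for a bare PDB code.
    """
    text = text.strip()
    if "|" in text:
        database, code = text.split("|", 1)
        if not database or not code:
            raise ValueError("expected database|code, got %r" % text)
        return ("entrez", database.lower(), code)
    if len(text) == 4 and text.isalnum():
        return ("pdb", None, text.lower())
    raise ValueError("not a PDB code or database|code entry: %r" % text)


for entry in ("2aba", "gi|12345", "pdb|2aba", "emb|CAA44029.1"):
    source, database, code = parse_fetch_input(entry)
    label = KNOWN_DATABASES.get(database, "n/a")
    print(entry, "->", source, database, code, "(%s)" % label)
```

Splitting on the first | only keeps any later | characters inside the sequence code rather than discarding them.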
This toolbar provides a tool for finding a pattern in the sequences. The pattern used in the search is an extended PROSITE pattern, which makes use of secondary structure and property information. The pattern has the following syntax.
residue | Find occurrences of the specified residue. The residue must be given as an upper case letter. For example, A finds alanine. |
[list] | Find occurrences of any of the residues listed. For example, [AIL] finds occurrences of A, I, or L. |
{list} | Exclude all occurrences of any of the residues listed. For example, {ED} ensures that occurrences of E and D are not found. |
a | Find acidic residues (D and E). |
b | Find basic residues (K and R). |
e | Find residues in an extended (beta-strand) region. |
f | Find residues in a flexible region (B-factor greater than the chain average). |
h | Find residues in a helical region. |
o | Find hydrophobic residues (A, C, F, I, L, P, V, W, and Y). |
p | Find aromatic residues (F, W, and Y). |
s | Find solvent-exposed residues. |
x . ? | Find any residue. Any of these three characters can be used. |
@number | Find the residues with the specified PDB residue number (not ruler position). Insertion codes are not recognized, so all residues with a given insertion code are found. |
(m) (m,n) | Find the specified number of contiguous occurrences of a residue (or residue type). The second form specifies a variable number of occurrences, e.g. o(2-4) means find two to four consecutive hydrophobic residues. |
To run a simple search for a sequence of residues, just type in the sequence. For a more complex search, in which you apply conditions at each residue position, you can combine these elements to create a search pattern, such as G-[IL]-o{AC}. There is an implied AND between contiguous elements: so in the example given, o{AC} means a hydrophobic residue that is not Ala or Cys. Each such sequence of elements that applies to a single residue must be separated from the next sequence by a - character. The search takes place when you press Enter. The patterns that are found are highlighted, and all other residues are colored white.
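To make the pattern syntax concrete, the following Python sketch converts the sequence-only part of a pattern into a regular expression. It is not the MSV search code. The function name is hypothetical, the elements that need structural data (e, f, h, s, and @number) are deliberately rejected, and both the (m,n) and (m-n) spellings of a variable repeat are accepted because the table and its example differ on this point.

```python
import re

AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Property classes as listed in the pattern table.
CLASSES = {
    "a": frozenset("DE"),
    "b": frozenset("KR"),
    "o": frozenset("ACFILPVWY"),
    "p": frozenset("FWY"),
    "x": AMINO_ACIDS, ".": AMINO_ACIDS, "?": AMINO_ACIDS,
}
TOKEN = re.compile(
    r"\[([A-Z]+)\]"               # [list]: any of these residues
    r"|\{([A-Z]+)\}"              # {list}: none of these residues
    r"|([A-Zabopx.?])"            # single residue or property class
    r"|\((\d+)(?:[-,](\d+))?\)"   # (m), (m,n) or (m-n): repeat count
)


def pattern_to_regex(pattern):
    """Translate a sequence-only search pattern into a compiled regex."""
    pattern = "".join(pattern.split())  # drop any whitespace
    if pattern.isalpha() and pattern.isupper():
        return re.compile(pattern)      # simple search: plain residue string
    parts = []
    # Split on "-" between positions, but not on a "-" inside "(m-n)".
    for position in re.split(r"-(?![^()]*\))", pattern):
        if not position:
            raise ValueError("empty position in %r" % pattern)
        allowed, repeat, pos = set(AMINO_ACIDS), "", 0
        while pos < len(position):
            m = TOKEN.match(position, pos)
            if m is None:
                raise ValueError("unsupported element at %r" % position[pos:])
            include, exclude, single, low, high = m.groups()
            if include:
                allowed &= set(include)      # implied AND with earlier elements
            elif exclude:
                allowed -= set(exclude)
            elif single:
                allowed &= CLASSES.get(single, {single})
            else:
                repeat = "{%s}" % low if high is None else "{%s,%s}" % (low, high)
            pos = m.end()
        if not allowed:
            raise ValueError("position %r can never match" % position)
        parts.append("[" + "".join(sorted(allowed)) + "]" + repeat)
    return re.compile("".join(parts))


regex = pattern_to_regex("G-[IL]-o{AC}")
print(regex.pattern)
print([m.span() for m in regex.finditer("MAGIVKGLAGLF")])
```

Each position starts from the full set of twenty residues and is narrowed by every element applied to it, which is one direct way to express the implied AND between contiguous elements.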
As well as typing in patterns, you can store and retrieve your own patterns from the Select Pattern option menu. The patterns are listed at the top of this option menu, with four default items, Deamidation Site, Glycosylation Site, Proteolysis Site, and Oxidation Site. The last item is Edit Patterns, which opens the Edit Patterns dialog box, in which you can change patterns, including these default patterns, and add and delete patterns. You can edit the table cells to change the pattern name, the definition, and the "hotspot". The Hotspot column contains the residue index in the pattern that should be selected when the pattern is found. This feature is ignored by the MSV but is used in other applications that rely on the MSV (notably BioLuminate).
The MSV can handle multiple query sequences. Each is displayed in a separate tab in the sequence display area. You can create a new tab by clicking the "+" button to the top right of the display area, or by choosing Sequences → New Query. Using the "+" button opens a dialog box to import a sequence; the menu choice creates an empty query. The tabs are labeled Query 1, Query 2, and so on, by default. You can change the name by choosing Rename Query from the tab shortcut menu. Closing a tab with the X button removes the sequences and related data from the MSV.
The sequence display area in each tab is divided into three sections. The first of these is empty until you do an alignment with ClustalW. When the alignment is done, this area contains a phylogeny tree diagram. The second section displays the sequence name and the names of the various annotations. The annotations can be expanded or collapsed individually, by clicking the "tree node" (box) immediately to the left of the sequence name. They can also be expanded or collapsed globally with Ctrl+DOWN ARROW and Ctrl+UP ARROW (⌘UP ARROW). Selected sequences are colored slate blue. The third section displays the sequences and their annotations, global annotations, and a ruler.
Sequences are represented by the standard residue letter symbols. Residues for which atom coordinates are missing are colored in a paler shade than residues for which atom coordinates are available.
Secondary structure predictions are represented by the characters H (helix), E (extended), and - (everything else). Secondary structure assignments are indicated by tubes for helices and arrows for extended structures.
There are three shortcut menus. The sequence shortcut menu opens when you right-click in the sequence name section. The alignment shortcut menu opens when you right-click on a sequence. The query shortcut menu opens when you right-click on the tab name.
This shortcut menu contains items from several of the main menus, and one additional item. If you right-click on a sequence when there is no selection, the sequence you click on is selected. Otherwise, right-clicking does not change the selection. The menu items are listed below, with descriptions or links to their descriptions in the main menus.
Make the selected sequence the query sequence. This item is only present on the menu if there is a single sequence selected. To make the consensus sequence the query, first display it with Annotations → Consensus Sequence, then use this command to make it the query.
Select the residues that are in contact with the ligand. This is a useful way of selecting residues in a binding site. This item is present when you right-click on a Ligand Contacts annotation.
Rename the sequence. Opens a dialog box in which you can enter a new name.
Translate a DNA or RNA sequence into the sequence for the protein it codes for, using standard genetic code.
These items are the same as on the Tools menu.
These items are the same as on the Sequences menu.
This item is the same as on the Edit menu.
This submenu is a copy of the Annotations menu.
Clear (remove) the annotations for the selected sequences.
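The translation of a DNA or RNA sequence with the standard genetic code, mentioned among the menu items above, can be sketched in Python as follows. The codon table is the standard one, written in the compact T, C, A, G ordering. The function name, the choice of reading frame (starting at the first base), the X symbol for unrecognized codons, and the option to stop at the first stop codon are assumptions; the MSV may handle these cases differently.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides, stop_at_stop=True):
    """Translate DNA or RNA, read from the first base, into one-letter codes.

    Stop codons are written as "*" unless stop_at_stop is true, in which
    case translation ends at the first stop codon. Codons containing
    unrecognized symbols become "X". Trailing bases that do not fill a
    codon are ignored.
    """
    seq = nucleotides.upper().replace("U", "T")  # treat RNA like DNA
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        if residue == "*" and stop_at_stop:
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))
print(translate("AUGGCCAAAUAA"))
print(translate("ATGTAAGGC", stop_at_stop=False))
```

Replacing U with T before the lookup lets a single table serve both DNA and RNA input.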
This shortcut menu contains items from the Alignment menu, and some additional items.
These items are the same as on the Alignment menu.
Set anchors on the residues outside the selection so that they do not move at all during alignment. The residues outside the selection are grayed out. This action prevents new gaps from being created in a sequence, but you can slide residues into existing gaps. To remove the anchors, click in an area outside the sequences.
Remove the anchors that were set by Anchor Residues Outside Selection.
This shortcut menu is displayed when you right-click the tab that contains the query name. It has two items, which are also on the Sequences menu.
Rename the current query. The name appears on the tab for the query.
Create a copy of the selected sequences in the current query in a new tab.
The status area at the foot of the panel displays information on the current task, on the sequences in the MSV project, and on the query sequence.
The Multiple Sequence Viewer was developed in collaboration with Dr. Jano Jusuf and Dr. Stanley Krystek from Bristol-Myers Squibb.