The Sequence Alignment/Map (SAM) format and SAMtools

SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. SAM (Sequence Alignment/Map) is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that:

  • Is flexible enough to store all the alignment information generated by various alignment programs;
  • Is simple enough to be easily generated by alignment programs or converted from existing alignment formats;
  • Is compact in file size;
  • Allows most operations on the alignment to work on a stream without loading the whole alignment into memory;
  • Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus.
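
The streaming goal in the list above can be illustrated with a minimal Python sketch. It assumes plain-text SAM input and relies on two details taken from the SAM specification rather than from this article: header lines start with '@', and the reference name (RNAME) is the third TAB-separated column. The function name count_per_reference and the sample lines are hypothetical.

```python
import io
from collections import Counter

def count_per_reference(handle):
    """Count alignment lines per reference name, reading one line at a time."""
    counts = Counter()
    for line in handle:
        # Skip header lines and blank lines; only alignment lines are counted.
        if line.startswith("@") or not line.strip():
            continue
        # RNAME is the third column; split only as far as needed.
        rname = line.split("\t", 3)[2]
        counts[rname] += 1
    return counts

# Made-up input for illustration only.
stream = io.StringIO(
    "@SQ\tSN:ref1\tLN:100\n"
    "@SQ\tSN:ref2\tLN:200\n"
    "r001\t0\tref1\t7\t30\t8M\t*\t0\t0\tTTAGATAA\t*\n"
    "r002\t0\tref1\t9\t30\t6M\t*\t0\t0\tAGATAA\t*\n"
    "r003\t0\tref2\t5\t30\t4M\t*\t0\t0\tGATA\t*\n"
)
counts = count_per_reference(stream)
print(counts["ref1"], counts["ref2"])
```

Because the function only iterates over its input, the same code should work on any line iterator, such as an open file or standard input, with memory use that grows with the number of references rather than the number of reads. Unmapped reads, whose RNAME is '*', are counted under the key '*'.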

The SAM format stores read alignments against reference sequences and supports short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size and efficient in random access, and it is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.

The SAM format consists of one header section and one alignment section. The lines in the header section start with the character ‘@’, and lines in the alignment section do not. All lines are TAB-delimited. An example is shown in Figure 1b.
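
As a rough illustration of the two sections, the following Python sketch separates header lines from alignment lines using only the rules stated above: a leading '@' marks a header line, and every line is TAB-delimited. The function name split_sam and the sample records are made up for this example and are not taken from the figure.

```python
import io

def split_sam(handle):
    """Return (header, alignments), each a list of TAB-split lines."""
    header, alignments = [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if line.startswith("@"):
            header.append(fields)      # header section
        else:
            alignments.append(fields)  # alignment section
    return header, alignments

# Made-up SAM text for illustration only.
EXAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:ref\tLN:45\n"
    "r001\t0\tref\t7\t30\t8M\t*\t0\t0\tTTAGATAA\t*\n"
    "r002\t0\tref\t9\t30\t6M\t*\t0\t0\tAGATAA\t*\n"
)

header, alignments = split_sam(io.StringIO(EXAMPLE_SAM))
print(len(header), len(alignments))  # prints "2 2"
```
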
In SAM, each alignment line has 11 mandatory fields and a variable number of optional fields. The mandatory fields must always be present, but their value can be ‘*’ or zero (depending on the field) if the corresponding information is unavailable. The optional fields are presented as key-value pairs in the format TAG:TYPE:VALUE. They store extra information from the platform or aligner. For example, the ‘RG’ tag keeps the ‘read group’ information for each read. In combination with the ‘@RG’ header lines, this tag allows each read to be labeled with metadata about its origin, sequencing center and library. The SAM format specification gives a detailed description of each field and the predefined TAGs.
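
To make the field layout concrete, here is a minimal Python sketch that parses one alignment line into its 11 mandatory fields plus its TAG:TYPE:VALUE pairs, and then looks up the read's ‘@RG’ header line through its ‘RG’ tag. The mandatory field names and the type codes ‘i’ (integer) and ‘f’ (float) come from the SAM specification, not from the text above. The helper names parse_alignment and parse_read_groups, and all sample values, are hypothetical.

```python
# Names of the 11 mandatory fields, in column order (per the SAM specification).
MANDATORY = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INTEGER_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")

def parse_alignment(line):
    """Parse one alignment line into a dict; optional fields go under 'tags'."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(MANDATORY):
        raise ValueError("an alignment line needs 11 mandatory fields")
    record = dict(zip(MANDATORY, cols[:11]))
    for name in INTEGER_FIELDS:
        record[name] = int(record[name])
    tags = {}
    for item in cols[11:]:
        # VALUE may itself contain ':', so split at most twice.
        tag, typ, value = item.split(":", 2)
        if typ == "i":
            value = int(value)
        elif typ == "f":
            value = float(value)
        tags[tag] = (typ, value)  # other types are kept as strings
    record["tags"] = tags
    return record

def parse_read_groups(header_lines):
    """Map each read group ID to the attributes of its @RG header line."""
    groups = {}
    for line in header_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] != "@RG":
            continue
        attrs = dict(col.split(":", 1) for col in cols[1:])
        groups[attrs["ID"]] = attrs
    return groups

# Made-up header and alignment lines for illustration only.
groups = parse_read_groups(["@RG\tID:grp1\tSM:sample1\tLB:lib1\tCN:center1"])
rec = parse_alignment(
    "r001\t0\tref\t7\t30\t8M\t*\t0\t0\tTTAGATAA\t*\tRG:Z:grp1\tNM:i:0"
)
origin = groups[rec["tags"]["RG"][1]]
print(rec["QNAME"], rec["POS"], origin["LB"], origin["CN"])
```

Here the unavailable base qualities appear as ‘*’ in QUAL and the unavailable mate position as 0 in PNEXT, matching the placeholder rule described above. Real files are usually better handled with SAMtools or a dedicated library; this sketch only mirrors the structure of a line.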

SAMtools can be downloaded from the project website.

 

 
