ESTCodec Class Reference

A helper class to serve as an EST enCOder-DECoder.

#include <ESTCodec.h>

Classes

struct  NormalEncoder
 A functor to generate an encoded word (serves as a hash entry).
struct  RevCompEncoder
 A functor to generate an encoded word (serves as a hash entry).

Public Member Functions

int encode2rc (const int word) const
 Obtain the reverse-complement for a given word.
void setRevCompTable (const int wordSize)
 Set the reverse-complement translation table to be used the next time the encode2rc method is called.
 ~ESTCodec ()
 The destructor.

Static Public Member Functions

static ESTCodec & getCodec ()
 Obtain reference to process-wide unique instance of ESTCodec.
static char encode (const char bp)
 Obtain 2-bit code for a given base pair.
static char encode2rc (const char bp)
 Obtain 2-bit complement code for a given base pair.

Protected Member Functions

 ESTCodec ()
 The constructor.
int * addRevCompTable (const int wordSize)
 Creates and adds a new reverse-complement translation table for the given word size.

Private Attributes

HashMap< int, int * > revCompTables
 A hash map that holds tables to aid in translating a given word to its reverse complement.
const int * revCompTable
 The reverse-complement translation table to be used by the encode2rc method.

Static Private Attributes

static char charToInt []
 A simple array to map characters A, T, C, and G to 0, 3, 2, and 1 respectively.
static char charToIntComp []
 A simple array to map characters A, T, C, and G to complementary encodings 3, 0, 1, and 2 respectively.
static ESTCodec estCodec
 The process-wide unique codec instance.

Detailed Description

A helper class to serve as an EST enCOder-DECoder.

This class was introduced to centralize EST encoding/decoding code that was spread out amongst multiple independent classes. Most of the EST analysis algorithms encode the base pairs (A, T, C, and G) in ESTs into 2 bits each (00, 11, 10, and 01) to reduce memory footprint and to enable use of numerical operations. The code to perform encoding and decoding was spread across multiple classes on an "as needed" basis. However, more than three classes required essentially the same code, warranting the introduction of this class to minimize code redundancy.

There is one process-wide unique instance of this ESTCodec class. A single instance is used to create commonly used tables to enable rapid CODEC operations. These tables are created once, when the globally unique ESTCodec object is referenced. The globally unique instance can be referenced via the getCodec() method.

Definition at line 59 of file ESTCodec.h.


Constructor & Destructor Documentation

ESTCodec::~ESTCodec (  ) 

The destructor.

The destructor frees the memory allocated to hold translation tables. The destructor is called only once, when the process-wide unique instance is destroyed as the program terminates.

Definition at line 70 of file ESTCodec.cpp.

References revCompTables.

ESTCodec::ESTCodec (  )  [protected]

The constructor.

The constructor is invoked only once, when the process-wide unique static instance of ESTCodec is created as the process starts. The constructor initializes the charToInt array that is used to translate base pairs (A, T, C, and G) into 2-bit codes (00, 11, 10, and 01).

Definition at line 51 of file ESTCodec.cpp.

References charToInt, charToIntComp, and revCompTable.


Member Function Documentation

int * ESTCodec::addRevCompTable ( const int  wordSize  )  [protected]

Creates and adds a new reverse-complement translation table for the given word size.

This helper method is invoked from the setRevCompTable method whenever a new reverse-complement translation table is needed. This method creates a reverse-complement table with 4^wordSize entries.

Parameters:
[in] wordSize The number of base pairs in the word for which a reverse-complement translation table is to be created.
Returns:
This method returns the newly created reverse-complement translation table.
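
As a rough illustration of what such a table could hold, the sketch below fills all 4^wordSize entries by reversing the order of the 2-bit codes in each word and complementing every code. It is not the PEACE implementation: the name buildRevCompTable, the std::vector return type (the documented method returns int *), the upper bound on wordSize, and the assumption that the first base of a word occupies the most significant bits are all choices made for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch; not the PEACE implementation of addRevCompTable.
// Assumes the codes A=0, G=1, C=2, T=3, so complement(code) == 3 - code,
// and that the first base of a word sits in the most significant bits.
inline std::vector<int> buildRevCompTable(const int wordSize) {
    // Arbitrary bound chosen for this sketch to keep 4^wordSize manageable.
    assert(wordSize > 0 && wordSize <= 12);
    const int entries = 1 << (2 * wordSize);  // 4^wordSize
    std::vector<int> table(entries);
    for (int word = 0; word < entries; ++word) {
        int remaining = word;
        int rc = 0;
        for (int i = 0; i < wordSize; ++i) {
            const int code = remaining & 3;  // take the last remaining base
            rc = (rc << 2) | (3 - code);     // complement it and append (reverses order)
            remaining >>= 2;
        }
        table[word] = rc;
    }
    return table;
}
```

With wordSize set to 2 the table has 16 entries, and the entry for AC is the code for GT.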

Definition at line 93 of file ESTCodec.cpp.

References revCompTables.

Referenced by setRevCompTable().

static char ESTCodec::encode ( const char  bp  )  [inline, static]

Obtain 2-bit code for a given base pair.

This method can be used to obtain the 2-bit encoding for a given base pair (bp). This method essentially translates base pairs (A, T, C, and G, both upper and lower case) into 2-bit codes (00, 11, 10, and 01).

Note:
In favor of speed, this method does not perform any special checks on the actual character in bp. It is the responsibility of the caller to ensure that this method is invoked with an appropriate parameter value.
Parameters:
[in] bp The base pair character (both upper and lower cases are handled correctly) to be encoded.
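
A minimal sketch of a table-driven encoder along these lines is shown below. The names makeCharToInt, encodeBase, and encodeWord are hypothetical and do not come from ESTCodec.h. The 256-entry table size, the mapping of all other characters to 0, and the word-packing helper are assumptions made so that the example is self-contained.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical stand-in for ESTCodec::charToInt; not the PEACE source.
// Builds a lookup table mapping A/T/C/G (either case) to the 2-bit
// codes 0, 3, 2, 1. Every other character maps to 0 in this sketch.
inline std::array<char, 256> makeCharToInt() {
    std::array<char, 256> table{};  // value-initialized to 0
    table['A'] = table['a'] = 0;
    table['G'] = table['g'] = 1;
    table['C'] = table['c'] = 2;
    table['T'] = table['t'] = 3;
    return table;
}

// Unchecked lookup, mirroring the "no special checks" note above.
inline char encodeBase(const char bp) {
    static const std::array<char, 256> table = makeCharToInt();
    return table[static_cast<unsigned char>(bp)];
}

// Hypothetical helper: packs a short string of bases into one int,
// 2 bits per base, with the first base in the most significant bits.
inline int encodeWord(const std::string& bases) {
    assert(bases.size() <= 15);  // 30 bits; assumes int has at least 32 bits
    int word = 0;
    for (const char bp : bases) {
        word = (word << 2) | encodeBase(bp);
    }
    return word;
}
```

Under these assumptions the word ACGT packs to binary 00 10 01 11, which is 39.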

Definition at line 92 of file ESTCodec.h.

References charToInt.

Referenced by ESTCodec::NormalEncoder< Shift, Mask >::operator()().

int ESTCodec::encode2rc ( const int  word  )  const [inline]

Obtain the reverse-complement for a given word.

This method can be used to obtain the reverse-complement encoding for a given encoded word. This method essentially translates a given encoded word to its reverse-complement representation.

Note:
In favor of speed, this method does not perform any special checks on the word to be translated. It is the responsibility of the caller to ensure that this method is invoked with an appropriate parameter value, and only after the setRevCompTable method has been invoked.
Parameters:
[in] word The encoded word that must be translated to its corresponding reverse complement representation.
Returns:
The reverse-complement representation for a given word.

Definition at line 135 of file ESTCodec.h.

References revCompTable.

static char ESTCodec::encode2rc ( const char  bp  )  [inline, static]

Obtain 2-bit complement code for a given base pair.

This method can be used to obtain the complementary 2-bit encoding for a given base pair (bp). This method essentially translates base pairs (A, T, C, and G, both upper and lower case) into 2-bit codes (11, 00, 01, and 10).

Note:
In favor of speed, this method does not perform any special checks on the actual character in bp. It is the responsibility of the caller to ensure that this method is invoked with an appropriate parameter value.
Parameters:
[in] bp The base pair character (both upper and lower cases are handled correctly) whose complementary encoding is required.
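
With the codes listed above, each complementary code equals 3 minus the normal code (equivalently, the normal code XOR 3). The sketch below illustrates that relationship with hypothetical helpers baseCode and complementCode. It uses a switch rather than the charToIntComp lookup table, and unlike the documented method it asserts on invalid input in debug builds.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: 2-bit code for a base (A=0, G=1, C=2, T=3).
inline int baseCode(const char bp) {
    switch (std::toupper(static_cast<unsigned char>(bp))) {
        case 'A': return 0;
        case 'G': return 1;
        case 'C': return 2;
        case 'T': return 3;
        default:
            // Debug-only check; the documented method performs no checks.
            assert(false && "not a base pair character");
            return 0;
    }
}

// Complementary code: A<->T and C<->G swap, which is 3 - code
// (the same value as code ^ 3 for 2-bit codes).
inline int complementCode(const char bp) {
    return 3 - baseCode(bp);
}
```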

Definition at line 112 of file ESTCodec.h.

References charToIntComp.

Referenced by ESTCodec::RevCompEncoder< Shift, Mask >::operator()(), UVSampleHeuristic::setReferenceEST(), and NewUVHeuristic::setReferenceEST().

static ESTCodec& ESTCodec::getCodec (  )  [inline, static]

Obtain reference to process-wide unique instance of ESTCodec.

This method must be used to obtain the process-wide unique instance of the ESTCodec object. The returned reference can be used to invoke other methods in this class. Here is a typical usage:

        const ESTCodec& codec = ESTCodec::getCodec();
        int bitCodec = codec.encode('A');

Definition at line 75 of file ESTCodec.h.

References estCodec.

Referenced by UVSampleHeuristic::setReferenceEST(), and NewUVHeuristic::setReferenceEST().

void ESTCodec::setRevCompTable ( const int  wordSize  ) 

Set the reverse-complement translation table to be used the next time the encode2rc method is called.

This method must be invoked to set the correct translation table to be used by the encode2rc(int) method. If a translation table for the given word size does not exist in revCompTables, then a new reverse-complement table is created by the addRevCompTable method.

Parameters:
[in] wordSize The number of base pairs in the word for which a reverse-complement translation table is to be created.
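
The select-or-create behavior described here can be sketched as follows. This is an illustration under stated assumptions, not the PEACE code: the class name RevCompCache, the use of std::unordered_map and std::vector in place of HashMap< int, int * >, the tableCount helper, and the bound on wordSize are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a lazily populated reverse-complement cache.
class RevCompCache {
public:
    // Select the table for wordSize, building it on first use.
    void setRevCompTable(const int wordSize) {
        assert(wordSize > 0 && wordSize <= 12);  // bound chosen for this sketch
        auto it = tables.find(wordSize);
        if (it == tables.end()) {
            it = tables.emplace(wordSize, build(wordSize)).first;
        }
        // Pointers to unordered_map elements stay valid across rehashing.
        current = &it->second;
    }

    // Unchecked lookup in release builds; setRevCompTable must be called first.
    int encode2rc(const int word) const {
        assert(current != nullptr);
        return (*current)[static_cast<std::size_t>(word)];
    }

    std::size_t tableCount() const { return tables.size(); }

private:
    // Assumes codes A=0, G=1, C=2, T=3 (complement is 3 - code).
    static std::vector<int> build(const int wordSize) {
        const int entries = 1 << (2 * wordSize);  // 4^wordSize
        std::vector<int> table(entries);
        for (int word = 0; word < entries; ++word) {
            int remaining = word;
            int rc = 0;
            for (int i = 0; i < wordSize; ++i) {
                rc = (rc << 2) | (3 - (remaining & 3));
                remaining >>= 2;
            }
            table[word] = rc;
        }
        return table;
    }

    std::unordered_map<int, std::vector<int>> tables;
    const std::vector<int>* current = nullptr;
};
```

Keeping a pointer to the currently selected table means each encode2rc call is a single array lookup, with the hash-map search paid only when the word size changes.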

Definition at line 84 of file ESTCodec.cpp.

References addRevCompTable(), ASSERT, revCompTable, and revCompTables.

Referenced by UVSampleHeuristic::setReferenceEST(), and NewUVHeuristic::setReferenceEST().


Member Data Documentation

char ESTCodec::charToInt [static, private]

A simple array to map characters A, T, C, and G to 0, 3, 2, and 1 respectively.

This is a simple array of 255 entries that is used to convert the base pair encoding characters A, T, C, and G to 0, 3, 2, and 1 respectively. This encoding is typically used to compute the hash as defined by various EST analysis algorithms. This array is statically allocated. It is initialized in the constructor and is never changed during the lifetime of this class.

Note:
This array is statically allocated to enable ready access from the NormalEncoder and RevCompEncoder functors defined in this class. With static arrays, compilers are expected to more readily optimize and inline the method calls to encode and encode2rc.

Definition at line 333 of file ESTCodec.h.

Referenced by encode(), and ESTCodec().

char ESTCodec::charToIntComp [static, private]

A simple array to map characters A, T, C, and G to complementary encodings 3, 0, 1, and 2 respectively.

This is a simple array of 255 entries that is used to convert the base pair encoding characters A, T, C, and G to complementary codes 3, 0, 1, and 2 respectively. This encoding is typically used to compute the hash as defined by various EST analysis algorithms. This array is statically allocated. It is initialized in the constructor and is never changed during the lifetime of this class.

Note:
This array is statically allocated to enable ready access from the NormalEncoder and RevCompEncoder functors defined in this class. With static arrays, compilers are expected to more readily optimize and inline the method calls to encode and encode2rc.

Definition at line 354 of file ESTCodec.h.

Referenced by encode2rc(), and ESTCodec().

ESTCodec ESTCodec::estCodec [static, private]

The process-wide unique codec instance.

This static member is the process-wide unique codec. It is created when the process starts and is destroyed only when the process terminates.

Definition at line 396 of file ESTCodec.h.

Referenced by getCodec().

const int* ESTCodec::revCompTable [private]

The reverse-complement translation table to be used by the encode2rc method.

This array is set by the setRevCompTable method to refer to the reverse-complement translation table to translate words of given size to their corresponding reverse-complement encodings.

Definition at line 388 of file ESTCodec.h.

Referenced by encode2rc(), ESTCodec(), and setRevCompTable().

HashMap<int, int*> ESTCodec::revCompTables [private]

A hash map that holds tables to aid in translating a given word to its reverse complement.

Converting a given encoded word (some fixed number n of base pairs, with each base pair encoded into 2 bits) to its reverse complement (that is, given the encoded sequence for attcggct, it must be converted to the encoded sequence for agccgaat) is needed as a part of EST analysis algorithms and heuristics. In order to enable rapid translation, pre-populated tables are used.

However, the reverse-complement translation tables need to have entries corresponding to the size of words to be translated. Different algorithms use different word sizes (such as: 8 bps or 10 bps etc). Accordingly, this hash_map is used to hold pre-computed reverse-complement translation tables. The key in the hash map is the word size. The translation tables contained in this hash map are used via the setRevCompTable method. If a reverse-complement entry does not exist, then a new entry is added by the addRevCompTable method.
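
The attcggct example above can be checked with a small sketch that computes the reverse complement directly on the packed word, which is the value a pre-computed table entry would store. The helpers encodeWord and reverseComplement are hypothetical, and the sketch assumes the codes A=0, G=1, C=2, T=3 with the first base in the most significant bits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: packs bases into an int, 2 bits per base.
inline int encodeWord(const std::string& bases) {
    static const std::string order = "AGCT";  // index in this string is the 2-bit code
    assert(bases.size() <= 15);               // assumes int has at least 32 bits
    int word = 0;
    for (const char bp : bases) {
        const char upper =
            static_cast<char>(std::toupper(static_cast<unsigned char>(bp)));
        const std::size_t code = order.find(upper);
        assert(code != std::string::npos);    // only A, G, C, T are accepted here
        word = (word << 2) | static_cast<int>(code);
    }
    return word;
}

// Hypothetical helper: reverse complement of a packed word of wordSize bases.
inline int reverseComplement(const int word, const int wordSize) {
    int remaining = word;
    int rc = 0;
    for (int i = 0; i < wordSize; ++i) {
        rc = (rc << 2) | (3 - (remaining & 3));  // complement the last base, append it
        remaining >>= 2;
    }
    return rc;
}
```

Under these assumptions, reverseComplement(encodeWord("attcggct"), 8) yields the same value as encodeWord("agccgaat").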

Definition at line 378 of file ESTCodec.h.

Referenced by addRevCompTable(), setRevCompTable(), and ~ESTCodec().


The documentation for this class was generated from the following files:

Generated on 19 Mar 2010 for PEACE by  doxygen 1.6.1