A helper class to serve as an EST enCOder-DECoder.
#include <ESTCodec.h>
Classes | |
struct | NormalEncoder |
A functor to generate an encoded word (serves as a hash entry). | |
struct | RevCompEncoder |
A functor to generate an encoded word (serves as a hash entry). | |
Public Member Functions | |
int | encode2rc (const int word) const |
Obtain the reverse-complement for a given word. | |
void | setRevCompTable (const int wordSize) |
Set the reverse-complement translation table to be used the next time the encode2rc method is called. | |
~ESTCodec () | |
The destructor. | |
Static Public Member Functions | |
static ESTCodec & | getCodec () |
Obtain a reference to the process-wide unique instance of ESTCodec. | |
static char | encode (const char bp) |
Obtain 2-bit code for a given base pair. | |
static char | encode2rc (const char bp) |
Obtain 2-bit complement code for a given base pair. | |
Protected Member Functions | |
ESTCodec () | |
The constructor. | |
int * | addRevCompTable (const int wordSize) |
Creates and adds a new reverse-complement translation table for the given word size. | |
Private Attributes | |
HashMap< int, int * > | revCompTables |
A hash map that holds tables to aid in translating a given word to its reverse complement. | |
const int * | revCompTable |
The reverse-complement translation table to be used by the encode2rc method. | |
Static Private Attributes | |
static char | charToInt [] |
A simple array to map characters A, T, C, and G to 0, 3, 2, and 1 respectively. | |
static char | charToIntComp [] |
A simple array to map characters A, T, C, and G to complementary encodings 3, 0, 1, and 2 respectively. | |
static ESTCodec | estCodec |
The process-wide unique codec instance. |
A helper class to serve as an EST enCOder-DECoder.
This class was introduced to centralize EST encoding/decoding code that was spread out amongst multiple independent classes. Most of the EST analysis algorithms encode the base pairs (A, T, C, and G) in ESTs into 2 bits each (00, 11, 10, and 01) to reduce memory footprint and to enable the use of numerical operations. The code to perform encoding and decoding was spread across multiple classes on an "as needed" basis. However, more than three classes required essentially the same code, which warranted the introduction of this class to minimize code redundancy.
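As a rough illustration of the 2-bit scheme described above, the following sketch packs a short run of bases into a single integer. The helper names `encodeBase` and `packWord` are hypothetical and are not part of ESTCodec; mapping unrecognized characters to 0 is also an assumption made only to keep the sketch total.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ESTCodec): map one base to its 2-bit code
// using the mapping described above: A=00, G=01, C=10, T=11.
inline int encodeBase(char bp) {
    switch (bp) {
        case 'A': case 'a': return 0;
        case 'G': case 'g': return 1;
        case 'C': case 'c': return 2;
        case 'T': case 't': return 3;
        default:            return 0;  // Assumption: unknown characters map to 0.
    }
}

// Hypothetical helper: pack up to 16 bases into one 32-bit word, with the
// first base of the string ending up in the most significant position.
inline std::uint32_t packWord(const std::string& bases) {
    std::uint32_t word = 0;
    for (char bp : bases) {
        word = (word << 2) | static_cast<std::uint32_t>(encodeBase(bp));
    }
    return word;
}
```

With this layout, the four bases `ATCG` occupy a single byte (binary `00 11 10 01`), and two words can be compared or hashed with ordinary integer operations.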
There is one process-wide unique instance of this ESTCodec class. This single instance creates commonly used tables to enable rapid CODEC operations. These tables are created once, when the globally unique ESTCodec object is referenced. The globally unique instance can be accessed via the getCodec() method.
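The process-wide instance pattern described above can be sketched with a static data member and a non-public constructor. The class `Codec` below is a hypothetical stand-in, not the real ESTCodec, and it omits the translation tables entirely.

```cpp
// Hypothetical class illustrating the pattern; it is not the real ESTCodec.
class Codec {
public:
    // Return the single process-wide instance.
    static Codec& getCodec() { return instance; }

    // Copying is disabled so that no second instance can be made by accident.
    Codec(const Codec&) = delete;
    Codec& operator=(const Codec&) = delete;

protected:
    // Only the class itself (or a subclass) can construct an instance.
    Codec() {}

private:
    // The one instance; it is constructed during static initialization
    // and destroyed when the process terminates.
    static Codec instance;
};

// Definition of the static member (this would live in the .cpp file).
Codec Codec::instance;
```

Every call to `Codec::getCodec()` yields a reference to the same object, so any tables it owns are built only once.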
Definition at line 59 of file ESTCodec.h.
ESTCodec::~ESTCodec | ( | ) |
The destructor.
The destructor frees the memory allocated to hold the translation tables. It is called only once, when the process-wide unique instance is destroyed as the program terminates.
Definition at line 70 of file ESTCodec.cpp.
References revCompTables.
ESTCodec::ESTCodec | ( | ) | [protected] |
The constructor.
The constructor is invoked only once, when the process-wide unique static instance of ESTCodec is created as the process starts. The constructor initializes the charToInt array that is used to translate base pairs (A, T, C, and G) into 2-bit codes (00, 11, 10, and 01).
Definition at line 51 of file ESTCodec.cpp.
References charToInt, charToIntComp, and revCompTable.
int * ESTCodec::addRevCompTable | ( | const int | wordSize | ) | [protected] |
Creates and adds a new reverse-complement translation table for the given word size.
This method is a helper that is invoked from the setRevCompTable method whenever a new reverse-complement translation table is needed. It creates a reverse-complement table with 4^wordSize entries.
[in] | wordSize | The number of base pairs in the word for which a reverse-complement translation table is to be created. |
Definition at line 93 of file ESTCodec.cpp.
References revCompTables.
Referenced by setRevCompTable().
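To make the table size and contents concrete, here is a minimal sketch of how such a table could be filled. It is an assumption about the approach, not the code in ESTCodec.cpp: the function name `buildRevCompTable` is hypothetical, it returns a `std::vector<int>` rather than the `int*` of the real method, and it assumes the first base of a word sits in the most significant bits. It relies on the fact that, under the encoding A=0, G=1, C=2, T=3, the complement of a code `c` is `3 - c`.

```cpp
#include <vector>

// Hypothetical stand-in for the table-building step. Assumes wordSize is
// small enough that 4^wordSize fits comfortably in an int (wordSize <= 14).
std::vector<int> buildRevCompTable(int wordSize) {
    const int entries = 1 << (2 * wordSize);  // 4^wordSize entries
    std::vector<int> table(entries);
    for (int word = 0; word < entries; ++word) {
        int remaining = word;
        int rc = 0;
        // Peel bases off the low end of the word and append their
        // complements to rc, which reverses the order as a side effect.
        for (int i = 0; i < wordSize; ++i) {
            const int code = remaining & 3;  // last remaining base
            rc = (rc << 2) | (3 - code);     // append its complement
            remaining >>= 2;
        }
        table[word] = rc;
    }
    return table;
}
```

For a word size of 8 the table has 65536 entries, and the entry for `attcggct` (0x3E5B under the assumed layout) is 0x1A43, which encodes `agccgaat`.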
static char ESTCodec::encode | ( | const char | bp | ) | [inline, static] |
Obtain 2-bit code for a given base pair.
This method can be used to obtain the 2-bit encoding for a given base pair (bp). It translates base pairs (A, T, C, and G, in both upper and lower case) into 2-bit codes (00, 11, 10, and 01).
[in] | bp | The base pair character (both upper and lower cases are handled correctly) to be encoded. |
Definition at line 92 of file ESTCodec.h.
References charToInt.
Referenced by ESTCodec::NormalEncoder< Shift, Mask >::operator()().
int ESTCodec::encode2rc | ( | const int | word | ) | const [inline] |
Obtain the reverse-complement for a given word.
This method can be used to obtain the reverse-complement encoding for a given encoded word, using the translation table that is currently selected.
Note: The setRevCompTable method must be invoked before this method is called.
[in] | word | The encoded word that must be translated to its corresponding reverse complement representation. |
Definition at line 135 of file ESTCodec.h.
References revCompTable.
static char ESTCodec::encode2rc | ( | const char | bp | ) | [inline, static] |
Obtain 2-bit complement code for a given base pair.
This method can be used to obtain the complementary 2-bit encoding for a given base pair (bp). It translates base pairs (A, T, C, and G, in both upper and lower case) into 2-bit codes (11, 00, 01, and 10).
[in] | bp | The base pair character (both upper and lower cases are handled correctly) whose complementary encoding is required. |
Definition at line 112 of file ESTCodec.h.
References charToIntComp.
Referenced by ESTCodec::RevCompEncoder< Shift, Mask >::operator()(), UVSampleHeuristic::setReferenceEST(), and NewUVHeuristic::setReferenceEST().
static ESTCodec& ESTCodec::getCodec | ( | ) | [inline, static] |
Obtain a reference to the process-wide unique instance of ESTCodec.
This method must be used to obtain the process-wide unique instance of the ESTCodec object. The returned reference can be used to invoke other methods in this class. Here is a typical usage:
    const ESTCodec& codec = ESTCodec::getCodec();
    int bitCodec = codec.encode('A');
Definition at line 75 of file ESTCodec.h.
References estCodec.
Referenced by UVSampleHeuristic::setReferenceEST(), and NewUVHeuristic::setReferenceEST().
void ESTCodec::setRevCompTable | ( | const int | wordSize | ) |
Set the reverse-complement translation table to be used the next time the encode2rc method is called.
This method must be invoked to set the correct translation table to be used by the encode2rc(int) method. If a translation table for the given word size does not exist in revCompTables, then a new reverse-complement table is created by the addRevCompTable method.
[in] | wordSize | The number of base pairs in the word for which a reverse-complement translation table is to be created. |
Definition at line 84 of file ESTCodec.cpp.
References addRevCompTable(), ASSERT, revCompTable, and revCompTables.
Referenced by UVSampleHeuristic::setReferenceEST(), and NewUVHeuristic::setReferenceEST().
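The lookup-or-create flow described for setRevCompTable can be sketched as follows. This is a hedged illustration only: `RevCompCache` and `buildTable` are hypothetical names, `std::unordered_map` and `std::vector` stand in for the `HashMap<int, int*>` member of the real class, and the table-filling logic is an assumption rather than the code in ESTCodec.cpp.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of the lookup-or-create flow; not the real ESTCodec.
class RevCompCache {
public:
    // Select the table for wordSize, building it on the first request.
    void setRevCompTable(int wordSize) {
        auto it = tables.find(wordSize);
        if (it == tables.end()) {
            it = tables.emplace(wordSize, buildTable(wordSize)).first;
        }
        current = &it->second;
    }

    // Translate a word using the currently selected table.
    // Precondition: setRevCompTable has been called at least once.
    int encode2rc(int word) const { return (*current)[word]; }

    // Number of distinct word sizes for which a table exists.
    std::size_t tableCount() const { return tables.size(); }

private:
    // Assumed table layout: 4^wordSize entries, first base most significant.
    static std::vector<int> buildTable(int wordSize) {
        std::vector<int> table(std::size_t(1) << (2 * wordSize));
        for (std::size_t word = 0; word < table.size(); ++word) {
            std::size_t rest = word;
            int rc = 0;
            for (int i = 0; i < wordSize; ++i, rest >>= 2) {
                rc = (rc << 2) | (3 - static_cast<int>(rest & 3));
            }
            table[word] = rc;
        }
        return table;
    }

    std::unordered_map<int, std::vector<int>> tables;  // key: word size
    const std::vector<int>* current = nullptr;         // selected table
};
```

Because `std::unordered_map` keeps pointers to its elements valid when other elements are inserted, `current` stays usable even after tables for further word sizes are added.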
char ESTCodec::charToInt [static, private] |
A simple array to map characters A, T, C, and G to 0, 3, 2, and 1 respectively.
This is a simple array of 255 entries that is used to convert the base pair encoding characters A, T, C, and G to 0, 3, 2, and 1 respectively. This encoding is typically used to compute the hash as defined by various EST analysis algorithms. This array is statically allocated. It is initialized in the constructor and is never changed during the lifetime of this class.
Definition at line 333 of file ESTCodec.h.
Referenced by encode(), and ESTCodec().
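A character-indexed lookup table of this kind could be set up along the following lines. The names `makeCharToInt` and `encodeWithTable` are hypothetical, and the sketch uses 256 entries so that every `unsigned char` value is a valid index; that size is an assumption and differs from the 255 entries mentioned above.

```cpp
#include <array>

// Hypothetical builder for a character-to-code table (not the real
// ESTCodec constructor). All entries start at zero; only the eight
// characters of interest are assigned explicitly.
inline std::array<char, 256> makeCharToInt() {
    std::array<char, 256> table{};
    table['A'] = table['a'] = 0;
    table['G'] = table['g'] = 1;
    table['C'] = table['c'] = 2;
    table['T'] = table['t'] = 3;
    return table;
}

// Look up the 2-bit code for a base. The cast avoids a negative index
// on platforms where plain char is signed.
inline char encodeWithTable(const std::array<char, 256>& table, char bp) {
    return table[static_cast<unsigned char>(bp)];
}
```

A table lookup replaces a chain of comparisons with a single indexed read, which is why it suits code that encodes every base of every EST.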
char ESTCodec::charToIntComp [static, private] |
A simple array to map characters A, T, C, and G to complementary encodings 3, 0, 1, and 2 respectively.
This is a simple array of 255 entries that is used to convert the base pair encoding characters A, T, C, and G to complementary codes 3, 0, 1, and 2 respectively. This encoding is typically used to compute the hash as defined by various EST analysis algorithms. This array is statically allocated. It is initialized in the constructor and is never changed during the lifetime of this class.
Definition at line 354 of file ESTCodec.h.
Referenced by encode2rc(), and ESTCodec().
ESTCodec ESTCodec::estCodec [static, private] |
The process-wide unique codec instance.
This instance variable is a process-wide unique codec that is created when the process is started and is destroyed only when the process terminates.
Definition at line 396 of file ESTCodec.h.
Referenced by getCodec().
const int* ESTCodec::revCompTable [private] |
The reverse-complement translation table to be used by the encode2rc method.
This array is set by the setRevCompTable method to refer to the reverse-complement translation table used to translate words of a given size to their corresponding reverse-complement encodings.
Definition at line 388 of file ESTCodec.h.
Referenced by encode2rc(), ESTCodec(), and setRevCompTable().
HashMap<int, int*> ESTCodec::revCompTables [private] |
A hash map that holds tables to aid in translating a given word to its reverse complement.
Converting a given encoded word (some fixed number n of base pairs, with each base pair encoded into 2 bits) to its reverse complement is a recurring step in EST analysis algorithms and heuristics. For example, the encoded sequence for attcggct must be converted to the encoded sequence for agccgaat. To enable rapid translation, pre-populated tables are used.
However, the reverse-complement translation tables need to have entries corresponding to the size of the words to be translated. Different algorithms use different word sizes (such as 8 bp or 10 bp). Accordingly, this hash map is used to hold pre-computed reverse-complement translation tables. The key in the hash map is the word size. The translation tables contained in this hash map are selected via the setRevCompTable method. If a reverse-complement table does not exist for a word size, then a new entry is added by the addRevCompTable method.
Definition at line 378 of file ESTCodec.h.
Referenced by addRevCompTable(), setRevCompTable(), and ~ESTCodec().