NewUVHeuristic Class Reference

Heuristic based upon the "u/v sample heuristic" used in WCD, a type of common word heuristic.

#include <NewUVHeuristic.h>


Public Member Functions

virtual void showArguments (std::ostream &os)
 Display valid command line arguments for this heuristic.
virtual bool parseArguments (int &argc, char **argv)
 Process command line arguments.
virtual int initialize ()
 Method to begin heuristic analysis (if any).
virtual int setReferenceEST (const int estIdx)
 Set the reference EST id for analysis.
virtual ~NewUVHeuristic ()
 The destructor.

Protected Member Functions

 NewUVHeuristic (const std::string &name, const std::string &outputFileName)
 The constructor.
virtual bool runHeuristic (const int otherEST)
 Determine whether the analyzer should analyze, according to this heuristic.
void computeHash (const int estIdx)
 Method to compute the u/v hash values for a given EST.

Protected Attributes

int refESTLen
int otherESTLen
int u
int wordShift
int passes
char * s1WordMap
 Instance variable to track if a given word (of length v) appears in reference EST.
char * s1RCWordMap
 Instance variable to track if a given word (of length v) appears in the reverse complement of the reference EST.
bool bestMatchIsRC
 Flag to indicate if normal or reverse-complement version provided best match.

Static Protected Attributes

static int v = 8
 Static variable that holds the v parameter (the length of common words) for the u/v heuristic.
static int BitMask = 0
static int bitsToShift = 0
 Static variable to store the number of bits to be shifted to create hash values.

Private Attributes

UVHashTable uvCache
 A hash map to cache hash values (v base pairs in length) to speed up the u/v heuristic.
const std::string hintKey
 The hint key that is used to add hint for normal or reverse-complement D2 computation.

Static Private Attributes

static arg_parser::arg_record argsList []
 The set of arguments specific to the UV heuristic.

Friends

class HeuristicFactory

Detailed Description

Heuristic based upon the "u/v sample heuristic" used in WCD, a type of common word heuristic.

Considers all words of length v in the first sequence and every 16th word of length v in the second sequence. Returns true if it finds at least u common words.
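The check described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the names baseCode, encodeWord, and uvSampleCheck are hypothetical, the 2-bit base encoding is an assumption, and the reverse-complement pass is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Map a nucleotide to a 2-bit code; returns -1 for anything else (e.g. 'N').
// (Assumed encoding for this sketch.)
inline int baseCode(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Encode the word of length v starting at pos; returns -1 if the word
// contains a character other than A, C, G, or T.
inline long encodeWord(const std::string& seq, std::size_t pos, int v) {
    long hash = 0;
    for (int i = 0; i < v; i++) {
        const int code = baseCode(seq[pos + i]);
        if (code < 0) return -1;
        hash = (hash << 2) | code;
    }
    return hash;
}

// Hypothetical helper: true if at least u of the sampled words of s2
// (every wordShift-th word) also occur somewhere in s1.
bool uvSampleCheck(const std::string& s1, const std::string& s2,
                   int u, int v, int wordShift = 16) {
    assert(v > 0 && v <= 12 && wordShift > 0);
    const std::size_t len = static_cast<std::size_t>(v);
    if (s1.size() < len || s2.size() < len) return false;
    // One flag per possible word of length v (4^v entries).
    std::vector<char> wordMap(std::size_t(1) << (2 * v), 0);
    for (std::size_t i = 0; i + len <= s1.size(); i++) {
        const long h = encodeWord(s1, i, v);
        if (h >= 0) wordMap[static_cast<std::size_t>(h)] = 1;
    }
    int common = 0;
    for (std::size_t i = 0; i + len <= s2.size(); i += wordShift) {
        const long h = encodeWord(s2, i, v);
        if (h >= 0 && wordMap[static_cast<std::size_t>(h)] != 0) common++;
    }
    return common >= u;
}
```

A call such as uvSampleCheck(refSeq, otherSeq, u, 8) would then decide whether the pair deserves the more expensive analysis; refSeq and otherSeq stand for the two nucleotide strings.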

Definition at line 67 of file NewUVHeuristic.h.


Constructor & Destructor Documentation

NewUVHeuristic::~NewUVHeuristic (  )  [virtual]

The destructor.

The destructor frees memory allocated for holding any dynamic data in the base class.

Definition at line 73 of file NewUVHeuristic.cpp.

References s1RCWordMap, s1WordMap, and uvCache.

NewUVHeuristic::NewUVHeuristic ( const std::string &  name,
const std::string &  outputFileName 
) [protected]

The constructor.

The constructor has been made protected to ensure that this class is never directly instantiated. Instead one of the derived Heuristic classes must be instantiated via the HeuristicFactory API methods.

Parameters:
[in] name The human readable name for this heuristic. This name is used when generating errors, warnings, and other output messages for this heuristic.
[in] outputFileName The output file to which any analysis information is to be written. Currently this parameter is unused.
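The protected-constructor arrangement described above can be illustrated with a small sketch. SketchHeuristic and SketchFactory are hypothetical stand-ins, not the real Heuristic or HeuristicFactory classes, and the create() signature is an assumption made only for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>

class SketchFactory;  // hypothetical stand-in for HeuristicFactory

class SketchHeuristic {
    // The factory is a friend, so it alone may call the protected constructor.
    friend class SketchFactory;
public:
    virtual ~SketchHeuristic() {}
    const std::string& getName() const { return name; }
protected:
    // Protected: client code cannot instantiate this class directly.
    SketchHeuristic(const std::string& name, const std::string& outputFileName)
        : name(name) {
        (void) outputFileName;  // unused, mirroring the documented parameter
    }
private:
    std::string name;
};

class SketchFactory {
public:
    // Hypothetical creation method; the real factory API may differ.
    static std::unique_ptr<SketchHeuristic> create(const std::string& name) {
        assert(!name.empty());
        return std::unique_ptr<SketchHeuristic>(new SketchHeuristic(name, ""));
    }
};
```

With this layout, writing SketchHeuristic h("uv", "") outside the factory fails to compile, which is the effect the protected constructor is meant to have.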

Definition at line 58 of file NewUVHeuristic.cpp.

References otherESTLen, passes, refESTLen, s1RCWordMap, s1WordMap, u, and wordShift.


Member Function Documentation

void NewUVHeuristic::computeHash ( const int  estIdx  )  [protected]

Method to compute the u/v hash values for a given EST.

This utility method was introduced to streamline the process of computing and caching u/v hash values for a given EST. It uses the variables u, v, and wordShift (all of them user configurable) along with an ESTCodec::NormalEncoder object to compute hash values into a std::vector. The vector is added to the uvCache hash map for future reference.

Parameters:
[in] estIdx The zero-based index of the EST whose hash values are to be computed and cached.
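The compute-and-cache step can be sketched roughly as below. HashCache is a hypothetical stand-in for UVHashTable, twoBitCode and cachedHashes are illustrative names, and the sequence is passed in directly rather than looked up through EST::getEST(). The sketch assumes the sequence contains only A, C, G, and T.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for UVHashTable: EST index -> sampled word hashes.
typedef std::unordered_map<int, std::vector<int> > HashCache;

// Assumed 2-bit encoding; any character other than C, G, T is treated as A.
inline int twoBitCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Return the hashes of every wordShift-th word of length v in seq,
// computing them only the first time a given estIdx is requested.
const std::vector<int>& cachedHashes(HashCache& cache, int estIdx,
                                     const std::string& seq,
                                     int v, int wordShift) {
    assert(v > 0 && v <= 15 && wordShift > 0);
    HashCache::iterator entry = cache.find(estIdx);
    if (entry != cache.end()) {
        return entry->second;  // already cached, nothing to recompute
    }
    std::vector<int> hashes;
    const std::size_t len = static_cast<std::size_t>(v);
    for (std::size_t start = 0; start + len <= seq.size(); start += wordShift) {
        int hash = 0;
        for (std::size_t i = 0; i < len; i++) {
            hash = (hash << 2) | twoBitCode(seq[start + i]);
        }
        hashes.push_back(hash);
    }
    return cache.emplace(estIdx, std::move(hashes)).first->second;
}
```

Keying the cache by EST index means a sequence that takes part in many pair-wise comparisons is hashed once, at the cost of keeping its hash vector in memory.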

Definition at line 182 of file NewUVHeuristic.cpp.

References ASSERT, EST::getEST(), EST::getSequence(), otherESTLen, uvCache, v, and wordShift.

Referenced by runHeuristic().

int NewUVHeuristic::initialize (  )  [virtual]

Method to begin heuristic analysis (if any).

This method is invoked just before commencement of EST analysis. This method typically loads additional information that may be necessary for a given heuristic from data files. In addition, it may perform any pre-processing as the case may be.

Note:
Derived classes must override this method.
Returns:
If the initialization process was successful, then this method returns 0. Otherwise this method returns a non-zero error code.

Implements Heuristic.

Reimplemented in TVHeuristic.

Definition at line 118 of file NewUVHeuristic.cpp.

References bitsToShift, s1RCWordMap, s1WordMap, and v.

bool NewUVHeuristic::parseArguments ( int &  argc,
char **  argv 
) [virtual]

Process command line arguments.

This method is used to process command line arguments specific to this heuristic. This method is typically used from the main method just after the heuristic has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.

Note:
Derived heuristic classes must override this method to process any command line arguments that are custom to their operation. When this method is overridden, don't forget to call the corresponding base class implementation to process common options.
Parameters:
[in,out] argc The number of command line arguments to be processed.
[in,out] argv The array of command line arguments.
Returns:
This method returns true if the command line arguments were successfully processed. Otherwise this method returns false.
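The "consume valid arguments" contract (argc and argv are both in/out) can be illustrated with a table-driven sketch. IntOption and consumeIntOptions are hypothetical and written only for this example; they are not the arg_parser API, whose check_args() behaviour is not documented on this page.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical option record: a flag name and the integer it sets.
struct IntOption {
    const char* flag;
    int* target;
};

// Remove every recognised "flag value" pair from argv, store the value,
// and shrink argc to the number of arguments left for other parsers.
// Returns false if a recognised flag has a missing or malformed value.
bool consumeIntOptions(int& argc, char** argv, const IntOption* options) {
    assert(argv != NULL && options != NULL);
    int kept = 0;
    bool ok = true;
    for (int i = 0; i < argc; i++) {
        const IntOption* match = NULL;
        for (const IntOption* opt = options; opt->flag != NULL; opt++) {
            if (std::strcmp(argv[i], opt->flag) == 0) { match = opt; break; }
        }
        if (match == NULL) {
            argv[kept++] = argv[i];  // not ours: leave it for the caller
            continue;
        }
        if (i + 1 >= argc) { ok = false; continue; }  // flag without a value
        char* end = NULL;
        const long value = std::strtol(argv[i + 1], &end, 10);
        if (end == argv[i + 1] || *end != '\0') {
            ok = false;  // value is not a plain integer
        } else {
            *match->target = static_cast<int>(value);
        }
        i++;  // skip the value that was just consumed
    }
    argc = kept;
    return ok;
}
```

A table such as { {"--uv_v", &v}, {NULL, NULL} } would let this sketch handle the --uv_v option listed under argsList, leaving unrelated arguments in place for other heuristics.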

Implements Heuristic.

Reimplemented in TVHeuristic.

Definition at line 93 of file NewUVHeuristic.cpp.

References arg_parser::check_args(), Heuristic::heuristicName, passes, u, v, and wordShift.

Referenced by TVHeuristic::parseArguments().

bool NewUVHeuristic::runHeuristic ( const int  otherEST  )  [protected, virtual]

Determine whether the analyzer should analyze, according to this heuristic.

This method can be used to compare a given EST with the reference EST (set via a call to the setReferenceEST() method).

Parameters:
[in] otherEST The index (zero based) of the EST with which the reference EST is to be compared.
Returns:
This method returns true if the heuristic says the EST pair should be analyzed, and false if it should not.

Implements Heuristic.

Reimplemented in TVHeuristic.

Definition at line 219 of file NewUVHeuristic.cpp.

References bestMatchIsRC, computeHash(), EST::getESTCount(), HeuristicChain::getHeuristicChain(), hintKey, passes, Heuristic::refESTidx, s1RCWordMap, s1WordMap, HeuristicChain::setHint(), u, uvCache, and VALIDATE.

Referenced by TVHeuristic::runHeuristic().

int NewUVHeuristic::setReferenceEST ( const int  estIdx  )  [virtual]

Set the reference EST id for analysis.

This method is invoked just before a batch of ESTs is analyzed via a call to the analyze(EST *) method. Setting the reference EST gives heuristics an opportunity to optimize certain operations, if possible.

Note:
This method must be called only after the initialize() method is called.
Returns:
If the initialization process was successful, then this method returns 0. Otherwise this method returns an error code.

Implements Heuristic.

Reimplemented in TVHeuristic.

Definition at line 131 of file NewUVHeuristic.cpp.

References ASSERT, BitMask, ESTCodec::encode2rc(), ESTCodec::getCodec(), EST::getEST(), EST::getESTCount(), EST::getSequence(), Heuristic::refESTidx, refESTLen, s1RCWordMap, s1WordMap, ESTCodec::setRevCompTable(), and v.

void NewUVHeuristic::showArguments ( std::ostream &  os  )  [virtual]

Display valid command line arguments for this heuristic.

This method must be used to display all valid command line options that are supported by this heuristic. Note that derived classes may override this method to display additional command line options that are applicable to it. This method is typically used in the main() method when displaying usage information.

Note:
Derived heuristic classes must override this method to display help for their custom command line arguments. When this method is overridden don't forget to call the corresponding base class implementation to display common options.
Parameters:
[out] os The output stream to which the valid command line arguments must be written.

Implements Heuristic.

Reimplemented in TVHeuristic.

Definition at line 86 of file NewUVHeuristic.cpp.


Friends And Related Function Documentation

friend class HeuristicFactory [friend]

Reimplemented in TVHeuristic.

Definition at line 68 of file NewUVHeuristic.h.


Member Data Documentation

arg_parser::arg_record NewUVHeuristic::argsList[] [static, private]

Initial value:
 {
    {"--uv_v", "v (length of common words) (default=8)",
     &NewUVHeuristic::v, arg_parser::INTEGER},
    {NULL, NULL, NULL, arg_parser::BOOLEAN}
}

The set of arguments specific to the UV heuristic.

This static member contains the list of arguments that are specific only to this heuristic class. This argument list is statically defined and shared by all instances of this class.

Note:
Because the arguments and parameters are static, this UV sample heuristic class is not MT-safe.

Reimplemented in TVHeuristic.

Definition at line 279 of file NewUVHeuristic.h.

bool NewUVHeuristic::bestMatchIsRC [protected]

Flag to indicate if normal or reverse-complement version provided best match.

This flag is set at the end of the runHeuristic method in this class to indicate if the normal or the reverse-complement check yielded the best possible match. If this flag is false, then the normal check yielded the best match. If this value is true, then the reverse-complement check yielded the best match.

Definition at line 240 of file NewUVHeuristic.h.

Referenced by TVHeuristic::runHeuristic(), and runHeuristic().

int NewUVHeuristic::BitMask = 0 [static, protected]

Definition at line 248 of file NewUVHeuristic.h.

Referenced by setReferenceEST().

int NewUVHeuristic::bitsToShift = 0 [static, protected]

Static variable to store the number of bits to be shifted to create hash values.

This variable is set to the value of 2 * (v - 1) (in the initialize method) to reflect the number of bits that need to be shifted in order to build the hash values for common words (including the values stored in s1WordMap and s1RCWordMap).

This variable is passed on to the ESTCodec::NormalEncoder or ESTCodec::RevCompEncoder when computing hash values. Since this value is passed as a template parameter, it is defined to be static (to ensure that it has external linkage as per the ISO/ANSI standards requirement).
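The role of a 2 * (v - 1) shift can be shown with a rolling 2-bit hash. This is a sketch under stated assumptions, not the ESTCodec encoders themselves: it assumes the encoding A=0, C=1, G=2, T=3 (so the complement of a code is 3 minus the code), and the names code2bit, rollForward, rollRevComp, and hashLastWord are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed 2-bit encoding; the sketch expects only A, C, G, T.
inline int code2bit(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Forward word: push the new base into the low bits and mask off the
// base that fell out of the v-base window.
inline int rollForward(int hash, char base, int v) {
    const int mask = (1 << (2 * v)) - 1;
    return ((hash << 2) | code2bit(base)) & mask;
}

// Reverse-complement word: drop the low base and place the complement of
// the new base in the highest position, i.e. shifted by 2 * (v - 1) bits.
inline int rollRevComp(int hash, char base, int v) {
    const int bitsToShift = 2 * (v - 1);
    return (hash >> 2) | ((3 - code2bit(base)) << bitsToShift);
}

// Hash of the last v-base word of seq, either as-is or reverse-complemented.
int hashLastWord(const std::string& seq, int v, bool revComp) {
    assert(v > 0 && v <= 15 && seq.size() >= static_cast<std::size_t>(v));
    int hash = 0;
    for (std::size_t i = 0; i < seq.size(); i++) {
        hash = revComp ? rollRevComp(hash, seq[i], v)
                       : rollForward(hash, seq[i], v);
    }
    return hash;
}
```

For v = 3, the word ACG encodes to 6 in the forward direction, while its reverse complement CGT encodes to 27; the reverse-complement roll arrives at 27 directly, without reversing the string.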

Definition at line 266 of file NewUVHeuristic.h.

Referenced by initialize().

const std::string NewUVHeuristic::hintKey [private]

The hint key that is used to add hint for normal or reverse-complement D2 computation.

This hint key is used to set a hint in the hints hash map. This string is defined as a constant to save compute time in the core runHeuristic method.

Definition at line 312 of file NewUVHeuristic.h.

Referenced by runHeuristic().

int NewUVHeuristic::otherESTLen [protected]
int NewUVHeuristic::passes [protected]
int NewUVHeuristic::refESTLen [protected]
char* NewUVHeuristic::s1RCWordMap [protected]

Instance variable to track if a given word (of length v) appears in the reverse complement of the reference EST.

This instance variable is created in the initialize() method to point to an array of 4^v elements (one for each possible word of length v).

Definition at line 228 of file NewUVHeuristic.h.

Referenced by initialize(), NewUVHeuristic(), TVHeuristic::runHeuristic(), runHeuristic(), setReferenceEST(), and ~NewUVHeuristic().

char* NewUVHeuristic::s1WordMap [protected]

Instance variable to track if a given word (of length v) appears in reference EST.

This instance variable is created in the initialize() method to point to an array of 4^v elements (one for each possible word of length v).

Definition at line 220 of file NewUVHeuristic.h.

Referenced by initialize(), NewUVHeuristic(), TVHeuristic::runHeuristic(), runHeuristic(), setReferenceEST(), and ~NewUVHeuristic().

int NewUVHeuristic::u [protected]

UVHashTable NewUVHeuristic::uvCache [private]

A hash map to cache hash values (v base pairs in length) to speed up the u/v heuristic.

Originally, the u/v heuristic generated hash values (in the runHeuristic method) by iterating over the base pairs in a given EST sequence. However, this approach turned out to be rather slow. Furthermore, it was observed that the same set of hash values was recomputed for various pair-wise EST comparisons.

Therefore, the hash values are cached to improve the overall performance of the u/v heuristic. This does increase the net amount of memory consumed. Consequently, to aid in caching only the required subset of EST hash values, this hash map was introduced to store the necessary hash values and rapidly access them when needed.

The entries in this hash map are computed by the computeHash method, which is invoked from the runHeuristic method.

Definition at line 303 of file NewUVHeuristic.h.

Referenced by computeHash(), runHeuristic(), and ~NewUVHeuristic().

int NewUVHeuristic::v = 8 [static, protected]

Static variable that holds the v parameter (the length of common words) for the u/v heuristic.

Definition at line 246 of file NewUVHeuristic.h.

Referenced by computeHash(), TVHeuristic::countCommonWords(), TVHeuristic::initialize(), initialize(), parseArguments(), and setReferenceEST().

int NewUVHeuristic::wordShift [protected]

The documentation for this class was generated from the following files:

NewUVHeuristic.h
NewUVHeuristic.cpp

Generated on 19 Mar 2010 for PEACE by  doxygen 1.6.1