Heuristic based upon the "u/v sample heuristic" used in WCD, a type of common word heuristic.
#include <NewUVHeuristic.h>
Public Member Functions
virtual void | showArguments (std::ostream &os) | Display valid command line arguments for this heuristic.
virtual bool | parseArguments (int &argc, char **argv) | Process command line arguments.
virtual int | initialize () | Method to begin heuristic analysis (if any).
virtual int | setReferenceEST (const int estIdx) | Set the reference EST id for analysis.
virtual | ~NewUVHeuristic () | The destructor.
Protected Member Functions
NewUVHeuristic (const std::string &name, const std::string &outputFileName) | The constructor.
virtual bool | runHeuristic (const int otherEST) | Determine whether the analyzer should analyze, according to this heuristic.
void | computeHash (const int estIdx) | Method to compute the u/v hash values for a given EST.
Protected Attributes
int | refESTLen
int | otherESTLen
int | u
int | wordShift
int | passes
char * | s1WordMap | Instance variable to track whether a given word (of length v) appears in the reference EST.
char * | s1RCWordMap | Instance variable to track whether a given word (of length v) appears in the reverse complement of the reference EST.
bool | bestMatchIsRC | Flag to indicate whether the normal or reverse-complement version provided the best match.
Static Protected Attributes
static int | v = 8 | Static variable holding the v parameter (length of common words) for the u/v heuristic.
static int | BitMask = 0
static int | bitsToShift = 0 | Static variable storing the number of bits to be shifted to create hash values.
Private Attributes
UVHashTable | uvCache | A hash map that caches hash values (of words v base pairs long) to speed up the u/v heuristic.
const std::string | hintKey | The hint key used to add a hint for normal or reverse-complement D2 computation.
Static Private Attributes
static arg_parser::arg_record | argsList [] | The set of arguments specific to the UV heuristic.
Friends
class | HeuristicFactory
Heuristic based upon the "u/v sample heuristic" used in WCD, a type of common word heuristic.
Considers all words of length v in the first (reference) sequence and every 16th word of length v in the second (other) sequence. Returns true if it finds at least u common words.
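The test described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3) and assumes that words containing other characters are skipped. The sampling step is taken as a parameter (16 in the description above). The function names `encodeBase`, `wordHash`, and `uvSampleTest` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical 2-bit encoding of a nucleotide: A=0, C=1, G=2, T=3.
// Returns -1 for any other character (e.g. 'N').
inline int encodeBase(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Packs the v bases starting at pos into an integer in [0, 4^v).
// Returns -1 if the word contains a non-ACGT character.
inline int wordHash(const std::string& seq, std::size_t pos, int v) {
    int hash = 0;
    for (int i = 0; i < v; i++) {
        const int code = encodeBase(seq[pos + i]);
        if (code < 0) return -1;
        hash = (hash << 2) | code;
    }
    return hash;
}

// Sketch of the u/v sample test: mark every v-word of the reference
// sequence, then probe every wordShift-th v-word of the other sequence and
// report whether at least u of the probes hit a marked word.
bool uvSampleTest(const std::string& ref, const std::string& other,
                  int u, int v, int wordShift) {
    assert(u > 0 && v > 0 && v <= 15 && wordShift > 0);
    std::vector<char> wordMap(std::size_t(1) << (2 * v), 0);  // 4^v flags
    for (std::size_t i = 0; i + v <= ref.size(); i++) {
        const int h = wordHash(ref, i, v);
        if (h >= 0) wordMap[h] = 1;
    }
    int common = 0;
    for (std::size_t i = 0; i + v <= other.size(); i += wordShift) {
        const int h = wordHash(other, i, v);
        if (h >= 0 && wordMap[h] && ++common >= u) return true;
    }
    return false;
}
```

The test is asymmetric: only the other sequence is sampled. As a result, one full pass over the reference can serve many comparisons.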
Definition at line 67 of file NewUVHeuristic.h.
NewUVHeuristic::~NewUVHeuristic() [virtual]
The destructor.
The destructor frees the memory allocated for the word maps (s1WordMap and s1RCWordMap) and for the hash values cached in uvCache.
Definition at line 73 of file NewUVHeuristic.cpp.
References s1RCWordMap, s1WordMap, and uvCache.
NewUVHeuristic::NewUVHeuristic(const std::string &name, const std::string &outputFileName) [protected]
The constructor.
The constructor has been made protected to ensure that this class is never directly instantiated. Instead, this heuristic (like the other classes derived from Heuristic) must be instantiated via the HeuristicFactory API methods.
[in] | name | The human readable name for this heuristic. This name is used when generating errors, warnings, and other output messages for this heuristic. |
[in] | outputFileName | The output file to which any analysis information is to be written. Currently this parameter is unused. |
Definition at line 58 of file NewUVHeuristic.cpp.
References otherESTLen, passes, refESTLen, s1RCWordMap, s1WordMap, u, and wordShift.
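The following is a minimal sketch of the protected-constructor pattern described above. It uses hypothetical class names (`SketchHeuristic`, `SketchFactory`) and a hypothetical heuristic name `"uv"`. The real HeuristicFactory API is not shown in this document.

```cpp
#include <memory>
#include <string>

class SketchFactory;  // forward declaration so it can be befriended

// Hypothetical stand-in for a heuristic whose constructor is protected.
class SketchHeuristic {
    friend class SketchFactory;  // only the factory may construct instances
public:
    virtual ~SketchHeuristic() = default;
    const std::string& getName() const { return heuristicName; }
protected:
    // Client code cannot write SketchHeuristic h("uv", "") directly.
    SketchHeuristic(const std::string& name, const std::string& outputFileName)
        : heuristicName(name), outputFile(outputFileName) {}
private:
    std::string heuristicName;
    std::string outputFile;  // kept but unused, as documented above
};

// Hypothetical factory: the only code allowed to call the constructor.
class SketchFactory {
public:
    // Returns nullptr for names the factory does not recognize.
    static std::unique_ptr<SketchHeuristic> create(const std::string& name) {
        if (name != "uv") return nullptr;
        // std::make_unique cannot reach a protected constructor, so the
        // object is created with new inside the friend class instead.
        return std::unique_ptr<SketchHeuristic>(new SketchHeuristic(name, ""));
    }
};
```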
void NewUVHeuristic::computeHash(const int estIdx) [protected]
Method to compute the u/v hash values for a given EST.
This utility method was introduced to streamline computing and caching the u/v hash values for a given EST. It uses the variables v and wordShift (both user configurable) along with an ESTCodec::NormalEncoder object to compute hash values into a std::vector. The vector is added to the uvCache hash map for future reference.
[in] | estIdx | The zero-based index of the EST whose hash values are to be computed and cached. |
Definition at line 182 of file NewUVHeuristic.cpp.
References ASSERT, EST::getEST(), EST::getSequence(), otherESTLen, uvCache, v, and wordShift.
Referenced by runHeuristic().
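The compute-once, cache-by-index idea behind computeHash() and uvCache can be sketched as below. This is an illustration under stated assumptions, not the actual UVHashTable or ESTCodec code. The class name `SampledHashCache`, the use of `std::unordered_map`, and the encoding (A=0, C=1, G=2, T=3, with any other character treated as A) are all hypothetical simplifications.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache of the sampled v-word hash values of each EST.
class SampledHashCache {
public:
    SampledHashCache(int v, int wordShift) : v(v), wordShift(wordShift) {}

    // Returns the hash values of every wordShift-th v-word of seq,
    // computing them only on the first request for estIdx.
    const std::vector<int>& get(int estIdx, const std::string& seq) {
        auto it = cache.find(estIdx);
        if (it != cache.end()) return it->second;  // cache hit
        std::vector<int> hashes;
        for (std::size_t i = 0; i + v <= seq.size(); i += wordShift) {
            hashes.push_back(encodeWord(seq, i));
        }
        // References to unordered_map elements stay valid across rehashing.
        return cache.emplace(estIdx, std::move(hashes)).first->second;
    }

    std::size_t size() const { return cache.size(); }

private:
    // Packs v bases into an int; non-ACGT characters are treated as 'A'.
    int encodeWord(const std::string& seq, std::size_t pos) const {
        int hash = 0;
        for (int i = 0; i < v; i++) {
            int code = 0;
            switch (seq[pos + i]) {
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                case 'T': code = 3; break;
                default: break;
            }
            hash = (hash << 2) | code;
        }
        return hash;
    }

    const int v, wordShift;
    std::unordered_map<int, std::vector<int>> cache;
};
```

The trade-off is the one noted for uvCache: memory grows with the number of cached ESTs, in exchange for never re-encoding the same EST twice.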
int NewUVHeuristic::initialize() [virtual]
Method to begin heuristic analysis (if any).
This method is invoked just before commencement of EST analysis. This method typically loads additional information that may be necessary for a given heuristic from data files. In addition, it may perform any pre-processing as the case may be.
Implements Heuristic.
Reimplemented in TVHeuristic.
Definition at line 118 of file NewUVHeuristic.cpp.
References bitsToShift, s1RCWordMap, s1WordMap, and v.
bool NewUVHeuristic::parseArguments(int &argc, char **argv) [virtual]
Process command line arguments.
This method processes the command line arguments specific to this heuristic. It is typically called from the main method just after the heuristic has been instantiated, and it consumes all valid command line arguments. If the command line arguments were valid and successfully processed, this method returns true.
[in,out] | argc | The number of command line arguments to be processed. |
[in,out] | argv | The array of command line arguments. |
Returns true if the command line arguments were successfully processed; otherwise, returns false.
Implements Heuristic.
Reimplemented in TVHeuristic.
Definition at line 93 of file NewUVHeuristic.cpp.
References arg_parser::check_args(), Heuristic::heuristicName, passes, u, v, and wordShift.
Referenced by TVHeuristic::parseArguments().
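Consuming recognized arguments while leaving the rest for other components can be sketched as follows. This is a minimal sketch that does not use the real arg_parser API. The `ArgRecord` struct only loosely mirrors the {option, description, pointer-to-variable} entries of argsList. The function `consumeIntArgs` and any option other than `--uv_v` are hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical argument record; the table ends with a null-option sentinel.
struct ArgRecord {
    const char* option;
    const char* description;
    int* value;
};

// Consumes every "option value" pair listed in table from argv, shifting the
// remaining arguments down and updating argc. Returns false if a recognized
// option is missing its value or the value is not an integer.
bool consumeIntArgs(int& argc, char** argv, const ArgRecord* table) {
    int out = 1;  // argv[0] (program name) is always kept
    for (int in = 1; in < argc; in++) {
        const ArgRecord* rec = table;
        while (rec->option != nullptr && std::strcmp(rec->option, argv[in]) != 0) {
            rec++;
        }
        if (rec->option == nullptr) {  // not ours: keep it for other parsers
            argv[out++] = argv[in];
            continue;
        }
        if (in + 1 >= argc) return false;  // option without a value
        char* end = nullptr;
        const long parsed = std::strtol(argv[in + 1], &end, 10);
        if (end == argv[in + 1] || *end != '\0') return false;
        *rec->value = static_cast<int>(parsed);
        in++;  // skip the consumed value
    }
    argc = out;
    argv[argc] = nullptr;  // keep the conventional terminating null pointer
    return true;
}
```

Because argc and argv are updated in place, a later parser (for example, a derived heuristic's) sees only the arguments this one did not recognize.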
bool NewUVHeuristic::runHeuristic(const int otherEST) [protected, virtual]
Determine whether the analyzer should analyze, according to this heuristic.
This method compares a given EST with the reference EST (set via a call to the setReferenceEST() method).
[in] | otherEST | The index (zero based) of the EST with which the reference EST is to be compared. |
Implements Heuristic.
Reimplemented in TVHeuristic.
Definition at line 219 of file NewUVHeuristic.cpp.
References bestMatchIsRC, computeHash(), EST::getESTCount(), HeuristicChain::getHeuristicChain(), hintKey, passes, Heuristic::refESTidx, s1RCWordMap, s1WordMap, HeuristicChain::setHint(), u, uvCache, and VALIDATE.
Referenced by TVHeuristic::runHeuristic().
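One plausible shape of the final decision, including how a flag like bestMatchIsRC could be derived, is sketched below. This is a hypothetical illustration and does not reproduce the actual runHeuristic logic (for example, it ignores the passes parameter). The sampled hashes of the other EST are looked up in both reference word maps, and the orientation with more hits wins. The heuristic passes if that count reaches u. `UVResult` and `uvDecide` are hypothetical names.

```cpp
#include <vector>

// Hypothetical result of comparing one EST against the reference.
struct UVResult {
    bool pass;           // true if at least u common words were found
    bool bestMatchIsRC;  // true if the reverse-complement map matched better
    int bestCount;       // number of common words in the better orientation
};

UVResult uvDecide(const std::vector<int>& otherHashes,
                  const std::vector<char>& wordMap,
                  const std::vector<char>& rcWordMap, int u) {
    int normal = 0, revComp = 0;
    for (int h : otherHashes) {
        if (h < 0) continue;  // skip words that could not be encoded
        normal  += wordMap[h]   ? 1 : 0;
        revComp += rcWordMap[h] ? 1 : 0;
    }
    const bool rcBetter = revComp > normal;  // ties favour the normal check
    const int best = rcBetter ? revComp : normal;
    return UVResult{best >= u, rcBetter, best};
}
```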
int NewUVHeuristic::setReferenceEST(const int estIdx) [virtual]
Set the reference EST id for analysis.
This method is invoked just before a batch of ESTs is analyzed via calls to the analyze(EST *) method. Setting the reference EST gives heuristics an opportunity to optimize certain operations, if possible.
Implements Heuristic.
Reimplemented in TVHeuristic.
Definition at line 131 of file NewUVHeuristic.cpp.
References ASSERT, BitMask, ESTCodec::encode2rc(), ESTCodec::getCodec(), EST::getEST(), EST::getESTCount(), EST::getSequence(), Heuristic::refESTidx, refESTLen, s1RCWordMap, s1WordMap, ESTCodec::setRevCompTable(), and v.
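Building the two reference word maps (normal and reverse complement) in a single pass can be sketched with rolling hashes. This is a minimal sketch, not the ESTCodec-based implementation. It assumes the encoding A=0, C=1, G=2, T=3, so the complement of a base code c is 3 - c. It also assumes that words containing other characters are skipped. `ReferenceWordMaps`, `baseCode`, and `buildWordMaps` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ReferenceWordMaps {
    std::vector<char> wordMap;    // plays the role of s1WordMap
    std::vector<char> rcWordMap;  // plays the role of s1RCWordMap
};

inline int baseCode(char c) {
    switch (c) {
        case 'A': return 0; case 'C': return 1;
        case 'G': return 2; case 'T': return 3;
        default:  return -1;  // ambiguous base: words spanning it are skipped
    }
}

ReferenceWordMaps buildWordMaps(const std::string& ref, int v) {
    assert(v > 0 && v <= 12);  // keeps the 4^v-entry maps small
    const std::size_t mapSize = std::size_t(1) << (2 * v);
    const int mask = static_cast<int>(mapSize - 1);
    ReferenceWordMaps maps{std::vector<char>(mapSize, 0),
                           std::vector<char>(mapSize, 0)};
    int fwd = 0, rc = 0, valid = 0;  // rolling hashes and run of valid bases
    for (char ch : ref) {
        const int code = baseCode(ch);
        if (code < 0) { valid = 0; fwd = rc = 0; continue; }
        // Forward hash: shift left and append the new base on the right.
        fwd = ((fwd << 2) | code) & mask;
        // Reverse-complement hash: shift right and put the complement of the
        // new base in the top position, i.e. at bit 2 * (v - 1).
        rc = (rc >> 2) | ((3 - code) << (2 * (v - 1)));
        if (++valid >= v) {
            maps.wordMap[fwd] = 1;
            maps.rcWordMap[rc] = 1;
        }
    }
    return maps;
}
```

The shift of 2 * (v - 1) used for the reverse-complement hash is the same quantity the bitsToShift documentation describes.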
void NewUVHeuristic::showArguments(std::ostream &os) [virtual]
Display valid command line arguments for this heuristic.
This method must be used to display all valid command line options that are supported by this heuristic. Note that derived classes may override this method to display additional command line options that are applicable to it. This method is typically used in the main() method when displaying usage information.
[out] | os | The output stream to which the valid command line arguments must be written. |
Implements Heuristic.
Reimplemented in TVHeuristic.
Definition at line 86 of file NewUVHeuristic.cpp.
friend class HeuristicFactory [friend]
Reimplemented in TVHeuristic.
Definition at line 68 of file NewUVHeuristic.h.
arg_parser::arg_record NewUVHeuristic::argsList [static, private]
Initial value: { {"--uv_v", "v (length of common words) (default=8)", &NewUVHeuristic::v, arg_parser::INTEGER}, {NULL, NULL, NULL, arg_parser::BOOLEAN} }
The set of arguments specific to the UV heuristic.
This static array lists the arguments that are specific to this heuristic class. It is statically defined and shared by all instances of this class.
Reimplemented in TVHeuristic.
Definition at line 279 of file NewUVHeuristic.h.
bool NewUVHeuristic::bestMatchIsRC [protected]
Flag to indicate whether the normal or reverse-complement version provided the best match.
This flag is set at the end of the runHeuristic() method in this class to indicate whether the normal or the reverse-complement check yielded the best match. If this flag is false, the normal check yielded the best match; if it is true, the reverse-complement check yielded the best match.
Definition at line 240 of file NewUVHeuristic.h.
Referenced by TVHeuristic::runHeuristic(), and runHeuristic().
int NewUVHeuristic::BitMask = 0 [static, protected]
Definition at line 248 of file NewUVHeuristic.h.
Referenced by setReferenceEST().
int NewUVHeuristic::bitsToShift = 0 [static, protected]
Static variable storing the number of bits to be shifted to create hash values.
This variable is set to 2 * (v - 1) in the initialize() method. Since each base occupies 2 bits, this is the number of bits a new base must be shifted to reach the top position of a v-base word when building the hash values for common words (including the values stored in s1WordMap and s1RCWordMap).
This variable is passed on to the ESTCodec::NormalEncoder or ESTCodec::RevCompEncoder when computing hash values. Since this value is passed as a template argument, it is defined to be static (to ensure that it has external linkage, as required by the ISO/ANSI C++ standard).
Definition at line 266 of file NewUVHeuristic.h.
Referenced by initialize().
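The template-argument point above can be illustrated with a class template that takes a reference to an int as a non-type parameter. Under the older C++ standards, the referenced object must have external linkage, which a static data member provides. The template still reads the value at run time, so it can be set from v in initialize(). This is a sketch; `ShiftParams`, `RevCompEncoderSketch`, and `revCompHash` are hypothetical and only mirror the role of bitsToShift and ESTCodec::RevCompEncoder. The encoding A=0, C=1, G=2, T=3 is assumed.

```cpp
#include <string>

// Hypothetical holder of the shift amount; static members have linkage.
struct ShiftParams {
    static int bitsToShift;  // set at run time, e.g. to 2 * (v - 1)
};
int ShiftParams::bitsToShift = 0;

// Rolling reverse-complement encoder parameterized on a reference to the
// shift amount: each step drops the lowest base and puts the complement of
// the new base at bit position Shift.
template <const int& Shift>
struct RevCompEncoderSketch {
    int operator()(int hash, char base) const {
        int code = 0;  // 'A' (and, in this sketch, anything else) maps to 0
        switch (base) {
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: break;
        }
        return (hash >> 2) | ((3 - code) << Shift);
    }
};

// Hash of the reverse complement of word, assuming
// ShiftParams::bitsToShift == 2 * (word.size() - 1).
int revCompHash(const std::string& word) {
    RevCompEncoderSketch<ShiftParams::bitsToShift> encoder;
    int hash = 0;
    for (char c : word) hash = encoder(hash, c);
    return hash;
}
```

For example, with v = 2 the reverse complement of "AC" is "GT", whose 2-bit encoding is 2 * 4 + 3 = 11.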
const std::string NewUVHeuristic::hintKey [private]
The hint key used to add a hint for normal or reverse-complement D2 computation.
This key is used to set a hint in the hints hash map. The string is defined as a constant to save compute time in the core runHeuristic() method.
Definition at line 312 of file NewUVHeuristic.h.
Referenced by runHeuristic().
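The constant-key idea can be sketched as follows. The `std::string` key is built once, as a member, rather than constructed from a literal on every call. This is a hypothetical illustration: `HintMap`, `HintSetter`, and the key text `"UV_RC_HINT"` are not taken from the HeuristicChain API.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical hint table: key -> value (1 = use reverse complement).
using HintMap = std::unordered_map<std::string, int>;

class HintSetter {
public:
    HintSetter() : hintKey("UV_RC_HINT") {}  // key string built once
    void recordOrientation(HintMap& hints, bool bestMatchIsRC) const {
        hints[hintKey] = bestMatchIsRC ? 1 : 0;  // no temporary key string
    }
private:
    const std::string hintKey;
};
```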
int NewUVHeuristic::otherESTLen [protected]
Definition at line 205 of file NewUVHeuristic.h.
Referenced by computeHash(), TVHeuristic::countCommonWords(), NewUVHeuristic(), TVHeuristic::runHeuristic(), and TVHeuristic::updateParameters().
int NewUVHeuristic::passes [protected]
Definition at line 211 of file NewUVHeuristic.h.
Referenced by NewUVHeuristic(), parseArguments(), runHeuristic(), and TVHeuristic::updateParameters().
int NewUVHeuristic::refESTLen [protected]
Definition at line 203 of file NewUVHeuristic.h.
Referenced by NewUVHeuristic(), setReferenceEST(), and TVHeuristic::updateParameters().
char* NewUVHeuristic::s1RCWordMap [protected]
Instance variable to track whether a given word (of length v) appears in the reverse complement of the reference EST.
This instance variable is created in the initialize() method to point to an array of 4^v entries, one for each possible word of length v.
Definition at line 228 of file NewUVHeuristic.h.
Referenced by initialize(), NewUVHeuristic(), TVHeuristic::runHeuristic(), runHeuristic(), setReferenceEST(), and ~NewUVHeuristic().
char* NewUVHeuristic::s1WordMap [protected]
Instance variable to track whether a given word (of length v) appears in the reference EST.
This instance variable is created in the initialize() method to point to an array of 4^v entries, one for each possible word of length v.
Definition at line 220 of file NewUVHeuristic.h.
Referenced by initialize(), NewUVHeuristic(), TVHeuristic::runHeuristic(), runHeuristic(), setReferenceEST(), and ~NewUVHeuristic().
int NewUVHeuristic::u [protected]
Definition at line 207 of file NewUVHeuristic.h.
Referenced by NewUVHeuristic(), parseArguments(), runHeuristic(), and TVHeuristic::updateParameters().
UVHashTable NewUVHeuristic::uvCache [private]
A hash map that caches hash values (of words v base pairs long) to speed up the u/v heuristic.
The u/v heuristic originally generated hash values (in the runHeuristic() method) by iterating over the base pairs in a given EST sequence. However, this approach turned out to be rather slow. Furthermore, it was observed that the same set of hash values was recomputed for various pair-wise EST comparisons.
Therefore, to improve the overall performance of the u/v heuristic, the hash values are cached. This does increase the net amount of memory consumed. To cache only the required subset of EST hash values, this hash map stores the hash values of the necessary sequences and provides rapid access to them when needed.
The entries in this hash map are computed by the computeHash() method, which is invoked from the runHeuristic() method.
Definition at line 303 of file NewUVHeuristic.h.
Referenced by computeHash(), runHeuristic(), and ~NewUVHeuristic().
int NewUVHeuristic::v = 8 [static, protected]
Static variable holding the v parameter (length of common words) for the u/v heuristic.
Definition at line 246 of file NewUVHeuristic.h.
Referenced by computeHash(), TVHeuristic::countCommonWords(), TVHeuristic::initialize(), initialize(), parseArguments(), and setReferenceEST().
int NewUVHeuristic::wordShift [protected]
Definition at line 209 of file NewUVHeuristic.h.
Referenced by computeHash(), NewUVHeuristic(), parseArguments(), and TVHeuristic::updateParameters().