Heuristic based upon the "u/v sample heuristic" used in WCD, a type of common word heuristic.
#include <UVSampleHeuristic.h>
Public Member Functions
virtual void | showArguments (std::ostream &os) | Display valid command line arguments for this heuristic.
virtual bool | parseArguments (int &argc, char **argv) | Process command line arguments.
virtual int | initialize () | Method to begin heuristic analysis (if any).
virtual int | setReferenceEST (const int estIdx) | Set the reference EST id for analysis.
virtual | ~UVSampleHeuristic () | The destructor.
Protected Member Functions
UVSampleHeuristic (const std::string &name, const std::string &outputFileName) | The constructor.
virtual bool | runHeuristic (const int otherEST) | Determine whether the analyzer should analyze, according to this heuristic.
void | computeHash (const int estIdx) | Method to compute the u/v hash values for a given EST.
Protected Attributes
char * | s1WordMap | Instance variable that tracks whether a given word (of length v) appears in the reference EST.
char * | s1RCWordMap | Instance variable that tracks whether a given word (of length v) appears in the reverse complement of the reference EST.
bool | bestMatchIsRC | Flag to indicate whether the normal or the reverse-complement version provided the best match.
Static Protected Attributes
static int | v = 8 | Static variable holding the v parameter (word length) for the u/v heuristic.
static int | wordShift = 16 | Sampling interval for the words of length v in the second sequence.
static int | BitMask = 0
static int | bitsToShift = 0 | Static variable storing the number of bits to be shifted to create hash values.
Private Attributes
UVHashTable | uvCache | A hash map that caches hash values (v base pairs in length) to speed up the u/v heuristic.
const std::string | hintKey | The hint key that is used to add a hint for normal or reverse-complement D2 computation.
Static Private Attributes
static arg_parser::arg_record | argsList [] | The set of arguments specific to the UV heuristic.
static int | u = 4 | Minimum number of common words of length v required for a match.
Friends
class | HeuristicFactory
Detailed Description
Considers all words of length v in the first (reference) sequence and every wordShift-th word of length v (by default, every 16th) in the second sequence. Returns true if it finds at least u common words.
Definition at line 67 of file UVSampleHeuristic.h.
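The sampling idea described above can be illustrated with a minimal, standalone sketch. It is not the class's implementation: the function name uvSampleSketch, the 2-bit base encoding (A=0, C=1, G=2, T=3), sampling by word start position, and the assumption that sequences contain only A, C, G, and T are illustrative choices, not details taken from WCD or ESTCodec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 2-bit encoding; assumes the sequence contains only A, C, G, T.
inline std::uint32_t baseCode(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Sketch of the u/v sample heuristic: returns true if at least u of the
// sampled words of length v in s2 also occur somewhere in s1.
bool uvSampleSketch(const std::string& s1, const std::string& s2,
                    int u, int v, int wordShift) {
    assert(v > 0 && v <= 15 && wordShift > 0);
    if (u <= 0) return true;
    const std::uint32_t mask = (1u << (2 * v)) - 1;         // keeps the last v bases
    std::vector<char> wordMap(std::size_t(mask) + 1, 0);     // one flag per possible word (4^v)

    // Mark every word of length v in s1 using a rolling hash.
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < s1.size(); ++i) {
        hash = ((hash << 2) | baseCode(s1[i])) & mask;
        if (i + 1 >= std::size_t(v)) wordMap[hash] = 1;
    }

    // Check only the words of s2 whose start position is a multiple of wordShift.
    int matches = 0;
    hash = 0;
    for (std::size_t i = 0; i < s2.size(); ++i) {
        hash = ((hash << 2) | baseCode(s2[i])) & mask;
        if (i + 1 < std::size_t(v)) continue;
        const std::size_t start = i + 1 - std::size_t(v);
        if (start % std::size_t(wordShift) == 0 && wordMap[hash] && ++matches >= u) {
            return true;
        }
    }
    return false;
}
```

With the default v = 8, the word map in this sketch has 4^8 = 65,536 one-byte entries, which is why a direct-indexed array is practical here.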
UVSampleHeuristic::~UVSampleHeuristic() [virtual]
The destructor.
The destructor frees the memory allocated for the dynamic data held by this class (s1WordMap, s1RCWordMap, and uvCache).
Definition at line 71 of file UVSampleHeuristic.cpp.
References s1RCWordMap, s1WordMap, and uvCache.
UVSampleHeuristic::UVSampleHeuristic(const std::string &name, const std::string &outputFileName) [protected]
The constructor.
The constructor has been made protected to ensure that this class is never instantiated directly. Instead, instances must be created via the HeuristicFactory API methods (HeuristicFactory is declared a friend of this class).
Parameters:
[in] name: The human-readable name for this heuristic. This name is used when generating errors, warnings, and other output messages for this heuristic.
[in] outputFileName: The output file to which any analysis information is to be written. Currently this parameter is unused.
Definition at line 63 of file UVSampleHeuristic.cpp.
References s1RCWordMap, and s1WordMap.
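The protected-constructor arrangement described for this class is a common factory pattern in C++. The sketch below shows the pattern in isolation; HeuristicSketch, UVSampleHeuristicSketch, HeuristicFactorySketch, and the "uv" lookup key are hypothetical names, not the real Heuristic or HeuristicFactory API.

```cpp
#include <cassert>
#include <memory>
#include <string>

class HeuristicSketch {
public:
    virtual ~HeuristicSketch() = default;
    const std::string& name() const { return name_; }
protected:
    explicit HeuristicSketch(const std::string& name) : name_(name) {}
private:
    std::string name_;
};

class UVSampleHeuristicSketch : public HeuristicSketch {
    friend class HeuristicFactorySketch;  // only the factory may construct instances
protected:
    UVSampleHeuristicSketch(const std::string& name,
                            const std::string& /*outputFileName: unused*/)
        : HeuristicSketch(name) {}
};

class HeuristicFactorySketch {
public:
    // Returns a new heuristic for a known key, or nullptr otherwise.
    static std::unique_ptr<HeuristicSketch> create(const std::string& name) {
        if (name == "uv") {
            // 'new' is called here (a friend); std::make_unique could not reach
            // the protected constructor.
            return std::unique_ptr<HeuristicSketch>(new UVSampleHeuristicSketch(name, ""));
        }
        return nullptr;
    }
};
```

A side effect of this design is that std::make_unique cannot be used to create the object, since it is not a friend; the factory therefore calls new directly.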
void UVSampleHeuristic::computeHash(const int estIdx) [protected]
Method to compute the u/v hash values for a given EST.
This utility method was introduced to streamline computing and caching the u/v hash values for a given EST. It uses the user-configurable parameters v and wordShift, along with an ESTCodec::NormalEncoder object, to compute the hash values into a std::vector. The vector is then added to the uvCache hash map for future reference.
Parameters:
[in] estIdx: The zero-based index of the EST whose hash values are to be computed and cached.
Definition at line 181 of file UVSampleHeuristic.cpp.
References ASSERT, EST::getEST(), EST::getSequence(), uvCache, v, and wordShift.
Referenced by runHeuristic().
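The compute-once, reuse-later behavior of computeHash and uvCache can be sketched with standard containers. This is an illustration under assumptions: UVHashTableSketch (a std::unordered_map from EST index to a std::vector of hashes), the function sampledHashes, the 2-bit base encoding, and sampling every wordShift-th word start are hypothetical stand-ins; the real UVHashTable type and ESTCodec::NormalEncoder are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for UVHashTable: EST index -> sampled word hashes.
using UVHashTableSketch = std::unordered_map<int, std::vector<std::uint32_t>>;

// Hypothetical 2-bit encoding (A=0, C=1, G=2, T=3); assumes only A, C, G, T.
inline std::uint32_t code2(char b) {
    return b == 'A' ? 0u : b == 'C' ? 1u : b == 'G' ? 2u : 3u;
}

// Computes (once) and caches the hashes of every wordShift-th v-word of an EST.
const std::vector<std::uint32_t>& sampledHashes(UVHashTableSketch& cache,
                                                const std::vector<std::string>& ests,
                                                int estIdx, int v, int wordShift) {
    auto found = cache.find(estIdx);
    if (found != cache.end()) return found->second;  // cache hit: no recomputation

    assert(estIdx >= 0 && std::size_t(estIdx) < ests.size());
    assert(v > 0 && v <= 15 && wordShift > 0);
    const std::string& seq = ests[estIdx];
    const std::uint32_t mask = (1u << (2 * v)) - 1;
    std::vector<std::uint32_t> hashes;
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        hash = ((hash << 2) | code2(seq[i])) & mask;
        if (i + 1 >= std::size_t(v) && (i + 1 - std::size_t(v)) % std::size_t(wordShift) == 0) {
            hashes.push_back(hash);
        }
    }
    return cache.emplace(estIdx, std::move(hashes)).first->second;
}
```

Returning a reference into the cache is safe in this sketch because references to std::unordered_map elements remain valid across rehashing; only erasing an entry would invalidate it.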
int UVSampleHeuristic::initialize() [virtual]
Method to begin heuristic analysis (if any).
This method is invoked just before EST analysis commences. In general, it loads any additional information that a heuristic needs from data files and performs any required pre-processing. For this heuristic, it creates the s1WordMap and s1RCWordMap arrays and sets bitsToShift to 2 * (v - 1).
Implements Heuristic.
Definition at line 111 of file UVSampleHeuristic.cpp.
References bitsToShift, s1RCWordMap, s1WordMap, and v.
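For this heuristic, the initialization work amounts to allocating the two word maps (4^v flags each) and deriving bitsToShift from v, as described for s1WordMap, s1RCWordMap, and bitsToShift. The sketch below is an assumption-based illustration: UVWordMapsSketch is a hypothetical name, and it uses std::vector<char> instead of the class's raw char* arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for the per-reference word maps.
struct UVWordMapsSketch {
    std::vector<char> s1WordMap;    // one flag per possible word of length v
    std::vector<char> s1RCWordMap;  // same, for reverse-complement words
    int bitsToShift = 0;

    void initialize(int v) {
        assert(v > 0 && v <= 15);
        const std::size_t mapSize = std::size_t(1) << (2 * v);  // 4^v entries
        s1WordMap.assign(mapSize, 0);
        s1RCWordMap.assign(mapSize, 0);
        bitsToShift = 2 * (v - 1);  // position of the most significant base
    }
};
```

Using std::vector here releases the memory automatically, so no matching delete[] is needed in a destructor, unlike the raw arrays the class frees in ~UVSampleHeuristic().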
bool UVSampleHeuristic::parseArguments(int &argc, char **argv) [virtual]
Process command line arguments.
This method processes the command line arguments specific to this heuristic. It is typically called from the main method just after the heuristic has been instantiated, and it consumes all valid command line arguments.
Parameters:
[in,out] argc: The number of command line arguments to be processed.
[in,out] argv: The array of command line arguments.
Returns:
true if the command line arguments were valid and successfully processed; otherwise false.
Implements Heuristic.
Definition at line 91 of file UVSampleHeuristic.cpp.
References arg_parser::check_args(), Heuristic::heuristicName, u, v, and wordShift.
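The in/out contract for argc and argv (valid arguments are consumed, the rest are left for other components) can be sketched without the arg_parser library. This is not arg_parser::check_args: it assumes each option is followed by its integer value as a separate token, and UVParamsSketch and parseUVArgsSketch are hypothetical names. Only the option names are taken from argsList.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical parameter holder using the documented defaults.
struct UVParamsSketch {
    int u = 4;
    int v = 8;
    int wordShift = 16;
};

// Consumes --uv_u, --uv_v and --uv_wordShift (each followed by an integer)
// and compacts the remaining arguments in argv. On failure, argc/argv are
// left unchanged (params may already be partially updated).
bool parseUVArgsSketch(int& argc, char** argv, UVParamsSketch& params) {
    std::vector<char*> remaining;
    for (int i = 0; i < argc; ++i) {
        int* target = nullptr;
        if (std::strcmp(argv[i], "--uv_u") == 0) target = &params.u;
        else if (std::strcmp(argv[i], "--uv_v") == 0) target = &params.v;
        else if (std::strcmp(argv[i], "--uv_wordShift") == 0) target = &params.wordShift;

        if (target == nullptr) {             // not ours: keep it for other parsers
            remaining.push_back(argv[i]);
            continue;
        }
        if (i + 1 >= argc) return false;     // option given without a value
        char* end = nullptr;
        const long value = std::strtol(argv[i + 1], &end, 10);
        if (end == argv[i + 1] || *end != '\0') return false;  // not an integer
        *target = static_cast<int>(value);
        ++i;                                 // skip the consumed value token
    }
    for (std::size_t k = 0; k < remaining.size(); ++k) argv[k] = remaining[k];
    argc = static_cast<int>(remaining.size());
    return true;
}
```

Collecting the surviving arguments first and compacting argv only on success keeps the caller's argument list intact when an option is malformed.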
bool UVSampleHeuristic::runHeuristic(const int otherEST) [protected, virtual]
Determine whether the analyzer should analyze, according to this heuristic.
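This method compares a given EST with the reference EST (set via a call to the setReferenceEST() method). It returns true if at least u common words of length v are found, and it records in bestMatchIsRC whether the normal or the reverse-complement comparison yielded the best match.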
Parameters:
[in] otherEST: The zero-based index of the EST with which the reference EST is to be compared.
Implements Heuristic.
Definition at line 216 of file UVSampleHeuristic.cpp.
References bestMatchIsRC, computeHash(), EST::getESTCount(), HeuristicChain::getHeuristicChain(), hintKey, Heuristic::refESTidx, s1RCWordMap, s1WordMap, HeuristicChain::setHint(), u, uvCache, and VALIDATE.
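The two-orientation check behind bestMatchIsRC can be sketched as follows, assuming the sampled hashes of the other EST are already available (for example, from a cache) and both reference word maps have been filled. The names compareBothOrientationsSketch and OrientationResultSketch, counting all hits before deciding, and breaking ties in favor of the normal orientation are assumptions; the real method may, for instance, stop early.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct OrientationResultSketch {
    bool passed;         // at least u common words in the better orientation
    bool bestMatchIsRC;  // true if the reverse-complement map matched better
};

// Counts how many sampled hashes of the other EST occur in each reference map.
OrientationResultSketch compareBothOrientationsSketch(
        const std::vector<char>& s1WordMap, const std::vector<char>& s1RCWordMap,
        const std::vector<std::uint32_t>& otherHashes, int u) {
    int normalHits = 0;
    int rcHits = 0;
    for (std::uint32_t h : otherHashes) {
        assert(h < s1WordMap.size() && h < s1RCWordMap.size());
        normalHits += s1WordMap[h] ? 1 : 0;
        rcHits += s1RCWordMap[h] ? 1 : 0;
    }
    const bool rcBetter = rcHits > normalHits;  // ties go to the normal orientation
    const int best = rcBetter ? rcHits : normalHits;
    return {best >= u, rcBetter};
}
```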
int UVSampleHeuristic::setReferenceEST(const int estIdx) [virtual]
Set the reference EST id for analysis.
This method is invoked just before a batch of ESTs is analyzed via calls to the analyze(EST *) method. Setting the reference EST gives heuristics an opportunity to optimize certain operations, if possible.
Implements Heuristic.
Definition at line 124 of file UVSampleHeuristic.cpp.
References ASSERT, BitMask, ESTCodec::encode2rc(), ESTCodec::getCodec(), EST::getEST(), EST::getESTCount(), EST::getSequence(), Heuristic::refESTidx, s1RCWordMap, s1WordMap, ESTCodec::setRevCompTable(), and v.
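Filling a normal and a reverse-complement word map for the reference EST can be sketched with two rolling hashes. This is an illustration, not the ESTCodec::encode2rc or setRevCompTable code: markReferenceWordsSketch is a hypothetical name, the encoding A=0, C=1, G=2, T=3 (so the complement of a code c is 3 - c) is assumed, and because the role of BitMask is not documented, the sketch uses its own local mask. The reverse-complement update uses a shift of 2 * (v - 1), matching the documented value of bitsToShift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 2-bit encoding; assumes only A, C, G, T.
inline std::uint32_t refCode(char b) {
    return b == 'A' ? 0u : b == 'C' ? 1u : b == 'G' ? 2u : 3u;
}

// Marks every v-word of ref (wordMap) and of its reverse complement (rcWordMap).
void markReferenceWordsSketch(const std::string& ref, int v,
                              std::vector<char>& wordMap, std::vector<char>& rcWordMap) {
    assert(v > 0 && v <= 15);
    const std::uint32_t mask = (1u << (2 * v)) - 1;  // local mask, not BitMask
    const int bitsToShift = 2 * (v - 1);
    wordMap.assign(std::size_t(mask) + 1, 0);
    rcWordMap.assign(std::size_t(mask) + 1, 0);

    std::uint32_t fwd = 0;
    std::uint32_t rc = 0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const std::uint32_t code = refCode(ref[i]);
        fwd = ((fwd << 2) | code) & mask;               // new base enters at the low end
        rc = (rc >> 2) | ((3u - code) << bitsToShift);  // its complement enters at the high end
        if (i + 1 >= std::size_t(v)) {
            wordMap[fwd] = 1;
            rcWordMap[rc] = 1;
        }
    }
}
```

For the word AACC with v = 4, the forward hash is 5 and the reverse-complement hash is 175 (the encoding of GGTT).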
void UVSampleHeuristic::showArguments(std::ostream &os) [virtual]
Display valid command line arguments for this heuristic.
This method displays all valid command line options that are supported by this heuristic. Derived classes may override this method to display additional command line options that are applicable to them. This method is typically used in the main() method when displaying usage information.
Parameters:
[out] os: The output stream to which the valid command line arguments must be written.
Implements Heuristic.
Definition at line 84 of file UVSampleHeuristic.cpp.
friend class HeuristicFactory [friend]
Definition at line 68 of file UVSampleHeuristic.h.
arg_parser::arg_record UVSampleHeuristic::argsList[] [static, private]
Initial value:
{
    {"--uv_u", "u (number of v-word matches) (default=4)", &UVSampleHeuristic::u, arg_parser::INTEGER},
    {"--uv_v", "v (length of common words) (default=8)", &UVSampleHeuristic::v, arg_parser::INTEGER},
    {"--uv_wordShift", "Word Shift (default=16)", &UVSampleHeuristic::wordShift, arg_parser::INTEGER},
    {NULL, NULL, NULL, arg_parser::BOOLEAN}
}
The set of arguments specific to the UV heuristic.
This static list contains the arguments that are specific to this heuristic class. The list is statically defined and shared by all instances of the class.
Definition at line 271 of file UVSampleHeuristic.h.
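A NULL-terminated table like argsList, together with a showArguments-style loop over it, can be sketched as below. ArgRecordSketch is a simplified, hypothetical analogue of arg_parser::arg_record (it omits the type field), and the output format is invented for illustration. One plausible reason u, v, and wordShift are static is visible here: the address of a static member is an ordinary int*, so a static table can store it, whereas &Class::member for a non-static member would be a pointer-to-member.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified analogue of arg_parser::arg_record.
struct ArgRecordSketch {
    const char* name;
    const char* help;
    int* value;  // address of a static variable
};

struct UVArgsSketch {
    static int u, v, wordShift;
    static const ArgRecordSketch argsList[];

    // Prints each option until the sentinel entry (name == nullptr) is reached.
    static void showArguments(std::ostream& os) {
        for (const ArgRecordSketch* rec = argsList; rec->name != nullptr; ++rec) {
            os << "  " << rec->name << "\t" << rec->help
               << " (current=" << *rec->value << ")\n";
        }
    }
};

int UVArgsSketch::u = 4;
int UVArgsSketch::v = 8;
int UVArgsSketch::wordShift = 16;

const ArgRecordSketch UVArgsSketch::argsList[] = {
    {"--uv_u", "u (number of v-word matches) (default=4)", &UVArgsSketch::u},
    {"--uv_v", "v (length of common words) (default=8)", &UVArgsSketch::v},
    {"--uv_wordShift", "Word Shift (default=16)", &UVArgsSketch::wordShift},
    {nullptr, nullptr, nullptr}  // sentinel, like the NULL entry in argsList
};
```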
bool UVSampleHeuristic::bestMatchIsRC [protected]
Flag to indicate if normal or reverse-complement version provided best match.
This flag is set at the end of the runHeuristic method to indicate whether the normal or the reverse-complement check yielded the best possible match. If this flag is false, the normal check yielded the best match; if it is true, the reverse-complement check yielded the best match.
Definition at line 230 of file UVSampleHeuristic.h.
Referenced by runHeuristic().
int UVSampleHeuristic::BitMask = 0 [static, protected]
Definition at line 240 of file UVSampleHeuristic.h.
Referenced by setReferenceEST().
int UVSampleHeuristic::bitsToShift = 0 [static, protected]
Static variable that stores the number of bits to be shifted to create hash values.
This variable is set to 2 * (v - 1) in the initialize method (14 for the default v = 8) to reflect the number of bits that need to be shifted in order to build the hash values for common words, including the values stored in s1WordMap and s1RCWordMap.
This variable is passed on to ESTCodec::NormalEncoder or ESTCodec::RevCompEncoder when computing hash values. Since this value is passed as a template parameter, it is defined to be static to ensure that it has external linkage, as required by the ISO/ANSI C++ standard.
Definition at line 258 of file UVSampleHeuristic.h.
Referenced by initialize().
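Why a static variable can serve as a template parameter while still being set at run time can be shown with a reference non-type template parameter. This sketch assumes the encoders take the shift as an int& template parameter; RevCompStepSketch, UVShiftHolderSketch, and initializeShiftSketch are hypothetical names, and the A=0, C=1, G=2, T=3 encoding is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder step: the shift amount is a reference template
// parameter, so the template argument must name an object with linkage.
template <int& Shift>
struct RevCompStepSketch {
    std::uint32_t operator()(std::uint32_t rc, std::uint32_t code) const {
        return (rc >> 2) | ((3u - code) << Shift);  // complement enters at the high end
    }
};

struct UVShiftHolderSketch {
    static int bitsToShift;  // static data member: has external linkage
};
int UVShiftHolderSketch::bitsToShift = 0;

// Mirrors the documented assignment bitsToShift = 2 * (v - 1).
inline void initializeShiftSketch(int v) {
    assert(v > 0);
    UVShiftHolderSketch::bitsToShift = 2 * (v - 1);
}
```

Because the template binds to the variable rather than to its value, the encoder reads whatever value initialize stored, so v can still be chosen on the command line.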
const std::string UVSampleHeuristic::hintKey [private]
The hint key that is used to add a hint for normal or reverse-complement D2 computation.
This hint key is used to set a hint in the hints hash map. The string is defined as a constant to save compute time in the core runHeuristic method.
Definition at line 306 of file UVSampleHeuristic.h.
Referenced by runHeuristic().
char* UVSampleHeuristic::s1RCWordMap [protected]
Instance variable that tracks whether a given word (of length v) appears in the reverse complement of the reference EST.
This instance variable is created in the initialize() method to point to an array of 4^v characters, one for each possible word of length v.
Definition at line 218 of file UVSampleHeuristic.h.
Referenced by initialize(), runHeuristic(), setReferenceEST(), UVSampleHeuristic(), and ~UVSampleHeuristic().
char* UVSampleHeuristic::s1WordMap [protected]
Instance variable that tracks whether a given word (of length v) appears in the reference EST.
This instance variable is created in the initialize() method to point to an array of 4^v characters, one for each possible word of length v.
Definition at line 210 of file UVSampleHeuristic.h.
Referenced by initialize(), runHeuristic(), setReferenceEST(), UVSampleHeuristic(), and ~UVSampleHeuristic().
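int UVSampleHeuristic::u = 4 [static, private]
The minimum number of common words of length v that must be found for the heuristic to return true (set via --uv_u).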
Definition at line 273 of file UVSampleHeuristic.h.
Referenced by parseArguments(), and runHeuristic().
UVHashTable UVSampleHeuristic::uvCache [private]
A hash map that caches hash values (v base pairs in length) to speed up the u/v heuristic.
The u/v heuristic originally generated hash values (in the runHeuristic method) by iterating over the base pairs in a given EST sequence. However, this approach turned out to be rather slow. Furthermore, it was observed that the same set of hash values was recomputed for various pair-wise EST comparisons.
Therefore, to improve the overall performance of the u/v heuristic, the hash values are cached. This does increase the net amount of memory consumed. To cache only the required subset of EST hash values, this hash map stores the hash values of the necessary sequences and provides rapid access to them when needed.
The entries in this hash map are computed by the computeHash method, which is invoked from the runHeuristic method.
Definition at line 297 of file UVSampleHeuristic.h.
Referenced by computeHash(), runHeuristic(), and ~UVSampleHeuristic().
int UVSampleHeuristic::v = 8 [static, protected]
Static variable holding the v parameter (the length of the common words) for the u/v heuristic (set via --uv_v).
Definition at line 236 of file UVSampleHeuristic.h.
Referenced by computeHash(), initialize(), parseArguments(), and setReferenceEST().
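int UVSampleHeuristic::wordShift = 16 [static, protected]
The sampling interval: only every wordShift-th word of length v in the second sequence is checked (set via --uv_wordShift).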
Definition at line 238 of file UVSampleHeuristic.h.
Referenced by computeHash(), and parseArguments().