Heuristic based upon the T/V heuristic used in WCD, a type of common word heuristic.
#include <TVHeuristic.h>
Public Member Functions | |
virtual void | showArguments (std::ostream &os) |
Display valid command line arguments for this heuristic. | |
virtual bool | parseArguments (int &argc, char **argv) |
Process command line arguments. | |
virtual int | initialize () |
Method to begin heuristic analysis (if any). | |
virtual int | setReferenceEST (const int estIdx) |
Set the reference EST id for analysis. | |
virtual | ~TVHeuristic () |
The destructor. | |
int | getWindowLen () |
Obtain the window length used for t/v heuristic. | |
Protected Member Functions | |
TVHeuristic (const std::string &outputFileName) | |
The default constructor. | |
virtual bool | runHeuristic (const int otherEST) |
Determine whether the analyzer should analyze, according to this heuristic. | |
bool | updateParameters () |
Method to obtain and update the parameters for the heuristic based on the parameter set manager. | |
template<typename Encoder > | |
int | countCommonWords (const int otherEST, Encoder encoder, const char *refWordMap) |
Templatized method for counting common words between two ESTs. | |
virtual void | printStats (std::ostream &os) const |
Method to display statistics regarding operation of this heuristic. | |
Private Attributes | |
int | t |
The minimum number of common words. | |
int | windowLen |
The window length to be used for t/v heuristic. | |
char * | matchTable |
A large table to track matches. | |
int | uvSuccessCount |
Instance variable to track the number of times UV sample heuristic passed. | |
Static Private Attributes | |
static arg_parser::arg_record | argsList [] |
The set of arguments specific to the TV heuristic. | |
Friends | |
class | HeuristicFactory |
Heuristic based upon the T/V heuristic used in WCD, a type of common word heuristic.
The idea of the t/v-heuristic is to require the common words in a pair of ESTs to be found reasonably close to each other but not too close. The rule of this heuristic is as follows:
Assume we are given two ESTs to analyze, say ei and ej and a threshold t.
Consider all v words appearing in ei.
At least t of these v words must appear in ej so that they do not overlap (their starting positions must differ by at least v base pairs) and are within 100 base pairs of each other.
If there are at least t such v words, the heuristic passes. If not, the pair need not be considered further.
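The rule can be sketched in simplified form as follows. This is only an illustration under stated assumptions, not the WCD or PEACE implementation: it compares words as plain substrings instead of encoded words, it reads "close to each other" as "lying entirely inside one window of windowLen bases", and the names maxCommonWordsInWindow and passesTV are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Largest number of non-overlapping v-words of ej that also occur in ei and
// that lie entirely inside a single window of windowLen bases.
// Simplified sketch: words are compared as plain substrings.
int maxCommonWordsInWindow(const std::string& ei, const std::string& ej,
                           std::size_t v, std::size_t windowLen) {
    assert(v > 0 && windowLen >= v);
    if (ei.size() < v || ej.size() < v) {
        return 0;
    }
    // Collect every v-word of the reference sequence ei.
    std::unordered_set<std::string> refWords;
    for (std::size_t i = 0; i + v <= ei.size(); i++) {
        refWords.insert(ei.substr(i, v));
    }
    // Start positions in ej whose v-word also occurs in ei.
    std::vector<std::size_t> hits;
    for (std::size_t j = 0; j + v <= ej.size(); j++) {
        if (refWords.count(ej.substr(j, v)) > 0) {
            hits.push_back(j);
        }
    }
    int best = 0;
    for (std::size_t a = 0; a < hits.size(); a++) {
        // Try a window that begins at the a-th hit.
        const std::size_t windowEnd = hits[a] + windowLen;
        std::size_t nextFree = hits[a];  // earliest start that does not overlap
        int count = 0;
        for (std::size_t b = a;
             b < hits.size() && hits[b] + v <= windowEnd; b++) {
            if (hits[b] >= nextFree) {
                count++;
                nextFree = hits[b] + v;
            }
        }
        best = std::max(best, count);
    }
    return best;
}

// The pair passes when at least t such words are found.
bool passesTV(const std::string& ei, const std::string& ej,
              std::size_t v, std::size_t windowLen, int t) {
    return maxCommonWordsInWindow(ei, ej, v, windowLen) >= t;
}
```

For example, with ei and ej both set to "ACGTACGTAC", v = 2 and windowLen = 10, five non-overlapping common words fit in the window, so the pair passes for any t up to 5.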
Definition at line 65 of file TVHeuristic.h.
TVHeuristic::~TVHeuristic | ( | ) | [virtual] |
The destructor.
The destructor frees memory allocated for holding any dynamic data in the base class.
Definition at line 57 of file TVHeuristic.cpp.
References matchTable.
TVHeuristic::TVHeuristic | ( | const std::string & | outputFileName | ) | [protected] |
The default constructor.
The constructor has been made protected to ensure that this class is never directly instantiated. Instead it should be created via a suitable call to the HeuristicFactory API method(s).
[in] | outputFileName | The output file to which any heuristic data is to be written. Currently, this value is ignored. |
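The protected-constructor arrangement described above can be illustrated with a small, self-contained sketch. SketchHeuristic and SketchFactory are hypothetical stand-ins; they are not the TVHeuristic or HeuristicFactory classes, and the "tv" name passed to create() is an assumption made for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>

class SketchFactory;  // hypothetical stand-in for a factory class

class SketchHeuristic {
    friend class SketchFactory;  // only the factory may construct instances
public:
    virtual ~SketchHeuristic() {}
    const std::string& getOutputFileName() const { return outputFileName; }

protected:
    // Protected: code outside the class hierarchy cannot call this directly.
    explicit SketchHeuristic(const std::string& fileName)
        : outputFileName(fileName) {}

private:
    std::string outputFileName;
};

class SketchFactory {
public:
    // Returns a new heuristic for a known name, or a null pointer otherwise.
    static std::unique_ptr<SketchHeuristic>
    create(const std::string& name, const std::string& outputFileName) {
        assert(!name.empty());
        if (name == "tv") {
            // Plain new is used because std::make_unique is not a friend
            // and therefore cannot reach the protected constructor.
            return std::unique_ptr<SketchHeuristic>(
                new SketchHeuristic(outputFileName));
        }
        return nullptr;
    }
};
```

With this layout, writing SketchHeuristic h("out.txt"); in client code fails to compile, while SketchFactory::create("tv", "out.txt") succeeds.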
Definition at line 49 of file TVHeuristic.cpp.
References matchTable, t, uvSuccessCount, and windowLen.
int TVHeuristic::countCommonWords | ( | const int | otherEST, | |
Encoder | encoder, | |||
const char * | refWordMap | |||
) | [inline, protected] |
Templatized method for counting common words between two ESTs.
This method is a helper method that is invoked from the runHeuristic method to count the number of common words between the reference EST (set via call to setReferenceEST) and otherEST (parameter). This method operates as follows:
First the matchTable (instance variable) is cleared to all zeros.
Next, the initial word of length NewUVHeuristic::v is constructed while ignoring bases marked as 'n' (this may require processing more than the first NewUVHeuristic::v bases if one of them is an 'n').
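The word construction step can be sketched as below. This is a minimal illustration and not the encoder used by this class: it assumes a 2-bit code per base and assumes that an 'n' restarts the word, so that no emitted word contains an 'n'. The names encodeBase and encodeWords are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Map a base to a 2-bit code, or -1 for anything else (for example 'n').
int encodeBase(char base) {
    switch (base) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

// Return the packed 2-bit code of every v-word in seq that contains no 'n'.
std::vector<unsigned int> encodeWords(const std::string& seq, int v) {
    assert(v > 0 && v <= 15);  // 2 * v bits must fit in 32 bits
    const unsigned int mask = (1u << (2 * v)) - 1u;
    std::vector<unsigned int> words;
    unsigned int word = 0;
    int validBases = 0;  // consecutive usable bases seen so far
    for (char base : seq) {
        const int code = encodeBase(base);
        if (code < 0) {
            validBases = 0;  // an 'n' forces the word to be rebuilt
            continue;
        }
        // Shift in the new base and drop the oldest one.
        word = ((word << 2) | static_cast<unsigned int>(code)) & mask;
        if (++validBases >= v) {
            words.push_back(word);
        }
    }
    return words;
}
```

With v = 2, the sequence "ACnGT" yields only the codes for "AC" and "GT"; the words that would span the 'n' are skipped, which is why more than the first v bases may need to be read before a word is available.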
Definition at line 232 of file TVHeuristic.h.
References EST::getEST(), EST::getSequence(), matchTable, NewUVHeuristic::otherESTLen, NewUVHeuristic::v, and windowLen.
Referenced by runHeuristic().
int TVHeuristic::getWindowLen | ( | ) | [inline] |
Obtain the window length used for t/v heuristic.
The window length defines the length of the window within which common words are tracked and reported by this heuristic. Typically, this window length must match the window length used for D2 analysis for the heuristic to be meaningful. The default value is 100. This value can be overridden by the user via suitable command line arguments.
Definition at line 158 of file TVHeuristic.h.
References windowLen.
int TVHeuristic::initialize | ( | ) | [virtual] |
Method to begin heuristic analysis (if any).
This method is invoked just before commencement of EST analysis. This method essentially passes control to the base class that merely creates the arrays for building hash maps.
Reimplemented from NewUVHeuristic.
Definition at line 94 of file TVHeuristic.cpp.
References EST::getMaxESTLen(), ParameterSetManager::getMaxFrameSize(), ParameterSetManager::getParameterSetManager(), matchTable, and NewUVHeuristic::v.
bool TVHeuristic::parseArguments | ( | int & | argc, | |
char ** | argv | |||
) | [virtual] |
Process command line arguments.
This method is used to process command line arguments specific to this heuristic. This method is typically used from the main method just after the heuristic has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.
[in,out] | argc | The number of command line arguments to be processed. When arguments are processed/consumed this parameter is changed to reflect the number of arguments processed. |
[in,out] | argv | The array of command line arguments to be processed. When arguments are processed/consumed, the consumed arguments are removed from this array. |
This method returns true if the command line arguments were successfully processed. Otherwise this method returns false.
Reimplemented from NewUVHeuristic.
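The idea of consuming arguments from argc and argv can be sketched as follows. This is an illustration only: it does not use arg_parser, the helper name consumeIntOption and the "--t" option spelling are hypothetical, and it assumes that "consumed" means the matched entries are removed and argc is reduced to the number of entries that remain.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Look for "<option> <integer>" in argv. When found, store the integer in
// value, remove both entries from argv and reduce argc by two.
// Returns false only when the option is present without a value.
bool consumeIntOption(int& argc, char** argv, const char* option, int& value) {
    assert(argv != nullptr && option != nullptr);
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], option) != 0) {
            continue;
        }
        if (i + 1 >= argc) {
            return false;  // option given without a value
        }
        value = std::atoi(argv[i + 1]);
        // Shift the remaining entries left over the two consumed ones.
        for (int j = i + 2; j < argc; j++) {
            argv[j - 2] = argv[j];
        }
        argc -= 2;
        return true;
    }
    return true;  // option absent: the caller's default is kept
}
```

A derived class would typically handle its own options this way and then pass the reduced argc and argv on to the base class.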
Definition at line 73 of file TVHeuristic.cpp.
References arg_parser::check_args(), Heuristic::heuristicName, NewUVHeuristic::parseArguments(), and t.
void TVHeuristic::printStats | ( | std::ostream & | os | ) | const [protected, virtual] |
Method to display statistics regarding operation of this heuristic.
This method can be used to obtain a dump of the statistics gathered regarding the operation of this heuristic. This method calls the base class method first which prints some common statistics. It then displays the number of times the u/v sample heuristic (the base class) returned success causing the t/v heuristic to be run.
[out] | os | The output stream to which the statistics regarding the heuristic is to be dumped. |
Reimplemented from Heuristic.
Definition at line 169 of file TVHeuristic.cpp.
References uvSuccessCount.
bool TVHeuristic::runHeuristic | ( | const int | otherEST | ) | [protected, virtual] |
Determine whether the analyzer should analyze, according to this heuristic.
This method can be used to compare a given EST with the reference EST (set via a call to the setReferenceEST() method). This method operates as follows:
It invokes the corresponding method in the base class to first run the UV-sample heuristic on the pair of ESTs. If the pair fails the UV-sample heuristic, this method returns immediately with false (indicating further analysis is not needed).
If the pair passes the UV-sample heuristic, this method invokes the countCommonWords method with a suitable encoder (normal or reverse-complement encoder, depending on the value of the NewUVHeuristic::bestMatchIsRC flag) to analyze the pair of ESTs using the TV heuristic.
[in] | otherEST | The index (zero based) of the EST with which the reference EST is to be compared. |
This method returns true if the heuristic says the EST pair should be analyzed, and false if it should not.
Reimplemented from NewUVHeuristic.
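The two-stage flow described for this method (a first filter, then a second check only when the first one passes, with a counter of first-stage passes) can be sketched generically. TwoStageFilter is a hypothetical class written for this example; it does not reproduce the NewUVHeuristic or TVHeuristic code.

```cpp
#include <cassert>
#include <functional>

class TwoStageFilter {
public:
    typedef std::function<bool(int)> Stage;

    TwoStageFilter(Stage firstStage, Stage secondStage)
        : first(firstStage), second(secondStage), firstPassCount(0) {
        assert(first && second);  // both stages must be callable
    }

    // Returns true only when both stages accept the given index.
    bool run(int otherIdx) {
        if (!first(otherIdx)) {
            return false;  // rejected early: the second stage is never run
        }
        firstPassCount++;  // plays the role of a success counter
        return second(otherIdx);
    }

    // Number of times the first stage passed and the second stage was run.
    int getFirstPassCount() const { return firstPassCount; }

private:
    Stage first;
    Stage second;
    int firstPassCount;
};
```

The counter corresponds in spirit to the statistic reported by printStats(): it records how often the second stage was actually executed.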
Definition at line 119 of file TVHeuristic.cpp.
References NewUVHeuristic::bestMatchIsRC, countCommonWords(), EST::getEST(), EST::getSequence(), NewUVHeuristic::otherESTLen, NewUVHeuristic::runHeuristic(), NewUVHeuristic::s1RCWordMap, NewUVHeuristic::s1WordMap, t, updateParameters(), and uvSuccessCount.
int TVHeuristic::setReferenceEST | ( | const int | estIdx | ) | [virtual] |
Set the reference EST id for analysis.
This method is invoked just before a batch of ESTs is analyzed via a call to the analyze(EST *) method. Setting the reference EST provides this heuristic an opportunity to pre-compute the normal and reverse-complement hash tables for words of varying sizes. The hash tables enable rapid searching for words in the runHeuristic method.
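Pre-computing normal and reverse-complement word tables for a reference sequence might look like the sketch below. It is a simplified illustration and not the NewUVHeuristic implementation: it assumes upper-case bases, a 2-bit code per base, and a flat table with one entry per possible word. The names reverseComplement and buildWordMap are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reverse the sequence and complement each base (upper-case input assumed).
std::string reverseComplement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& base : rc) {
        switch (base) {
        case 'A': base = 'T'; break;
        case 'C': base = 'G'; break;
        case 'G': base = 'C'; break;
        case 'T': base = 'A'; break;
        default: break;  // leave 'N' and other symbols unchanged
        }
    }
    return rc;
}

// Build a table with 4^v entries; entry w is 1 when the v-word whose
// packed 2-bit code is w occurs in seq. Words spanning other symbols
// are skipped.
std::vector<char> buildWordMap(const std::string& seq, int v) {
    assert(v > 0 && v <= 12);  // keeps the table at most 4^12 entries
    const std::size_t tableSize = std::size_t(1) << (2 * v);
    std::vector<char> wordMap(tableSize, 0);
    std::size_t word = 0;
    int valid = 0;  // consecutive usable bases seen so far
    for (char base : seq) {
        std::size_t code = 0;
        switch (base) {
        case 'A': code = 0; break;
        case 'C': code = 1; break;
        case 'G': code = 2; break;
        case 'T': code = 3; break;
        default: valid = 0; continue;  // restart the word, next base
        }
        word = ((word << 2) | code) & (tableSize - 1);
        if (++valid >= v) {
            wordMap[word] = 1;
        }
    }
    return wordMap;
}
```

Building one table from the reference sequence and one from its reverse complement allows a later lookup of any word from the other sequence in constant time, whichever strand turns out to match best.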
Reimplemented from NewUVHeuristic.
Definition at line 109 of file TVHeuristic.cpp.
References Heuristic::refESTidx.
void TVHeuristic::showArguments | ( | std::ostream & | os | ) | [virtual] |
Display valid command line arguments for this heuristic.
This method is used to display all valid command line options that are supported by this heuristic. Note that this method invokes the corresponding method in the base class to display any options supported by the base class. This method is typically used in the main() method when displaying usage information.
[out] | os | The output stream to which the valid command line arguments must be written. |
Reimplemented from NewUVHeuristic.
Definition at line 64 of file TVHeuristic.cpp.
bool TVHeuristic::updateParameters | ( | ) | [protected] |
Method to obtain and update the parameters for the heuristic based on the parameter set manager.
Definition at line 155 of file TVHeuristic.cpp.
References ParameterSet::frameSize, ParameterSetManager::getParameterSet(), ParameterSetManager::getParameterSetManager(), NewUVHeuristic::otherESTLen, NewUVHeuristic::passes, NewUVHeuristic::refESTLen, ParameterSet::t, t, ParameterSet::u, NewUVHeuristic::u, windowLen, ParameterSet::wordShift, and NewUVHeuristic::wordShift.
Referenced by runHeuristic().
friend class HeuristicFactory [friend] |
Reimplemented from NewUVHeuristic.
Definition at line 66 of file TVHeuristic.h.
arg_parser::arg_record TVHeuristic::argsList [static, private] |
{ {NULL, NULL, NULL, arg_parser::BOOLEAN} }
The set of arguments specific to the TV heuristic.
This variable contains a static list of arguments that are specific only to this heuristic class. This argument list is statically defined and shared by all instances of this class.
Reimplemented from NewUVHeuristic.
Definition at line 294 of file TVHeuristic.h.
char* TVHeuristic::matchTable [private] |
A large table to track matches.
This instance variable contains a large table that tracks matches encountered as this heuristic tracks matching words.
Definition at line 321 of file TVHeuristic.h.
Referenced by countCommonWords(), initialize(), TVHeuristic(), and ~TVHeuristic().
int TVHeuristic::t [private] |
The minimum number of common words.
This instance variable contains the minimum number of words (that are close but not too close) that have matching values in pairs of ESTs. The default is 65. However, this value can be overridden by a command line argument.
Definition at line 303 of file TVHeuristic.h.
Referenced by parseArguments(), runHeuristic(), TVHeuristic(), and updateParameters().
int TVHeuristic::uvSuccessCount [private] |
Instance variable to track the number of times UV sample heuristic passed.
This instance variable is used to track the number of times the UV sample heuristic passed. This value indicates the number of times the TV heuristic was actually run. This value is incremented in the runHeuristic method and is displayed by the printStats() method.
Definition at line 332 of file TVHeuristic.h.
Referenced by printStats(), runHeuristic(), and TVHeuristic().
int TVHeuristic::windowLen [private] |
The window length to be used for t/v heuristic.
The window length defines the length of the window within which common words are tracked and reported by this heuristic. Typically, this window length must match the window length used for D2 analysis for the heuristic to be meaningful. The default value is 100. This value can be overridden by the user via suitable command line arguments.
Definition at line 314 of file TVHeuristic.h.
Referenced by countCommonWords(), getWindowLen(), TVHeuristic(), and updateParameters().