TVHeuristic Class Reference

Heuristic based upon the T/V heuristic used in WCD, a type of common word heuristic.

#include <TVHeuristic.h>

Public Member Functions

virtual void showArguments (std::ostream &os)
 Display valid command line arguments for this heuristic.
virtual bool parseArguments (int &argc, char **argv)
 Process command line arguments.
virtual int initialize ()
 Method to begin heuristic analysis (if any).
virtual int setReferenceEST (const int estIdx)
 Set the reference EST id for analysis.
virtual ~TVHeuristic ()
 The destructor.
int getWindowLen ()
 Obtain the window length used for t/v heuristic.

Protected Member Functions

 TVHeuristic (const std::string &outputFileName)
 The default constructor.
virtual bool runHeuristic (const int otherEST)
 Determine whether the analyzer should analyze, according to this heuristic.
bool updateParameters ()
 Method to obtain and update the parameters for the heuristic based on the parameter set manager.
template<typename Encoder >
int countCommonWords (const int otherEST, Encoder encoder, const char *refWordMap)
 Templatized method for counting common words between two ESTs.
virtual void printStats (std::ostream &os) const
 Method to display statistics regarding operation of this heuristic.

Private Attributes

int t
 The minimum number of common words.
int windowLen
 The window length to be used for t/v heuristic.
char * matchTable
 A large table to track matches.
int uvSuccessCount
 Instance variable to track the number of times UV sample heuristic passed.

Static Private Attributes

static arg_parser::arg_record argsList []
 The set of arguments specific to the TV heuristic.

Friends

class HeuristicFactory

Detailed Description

Heuristic based upon the T/V heuristic used in WCD, a type of common word heuristic.

The idea of the t/v-heuristic is to require the common words in a pair of ESTs to be found reasonably close to each other but not too close. The rule of this heuristic is as follows:

  1. Assume we are given two ESTs to analyze, say ei and ej and a threshold t.

  2. Consider all v words appearing in ei.

  3. At least t of these v words must appear in ej such that they do not overlap (their starting positions must differ by at least v base pairs) and lie within 100 base pairs of each other.

  4. If at least t such v words are found, the heuristic passes. If not, the pair need not be considered further.
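
The rule above can be sketched as follows. This is a minimal illustration and not the PEACE implementation: the function name passesTV, the use of std::unordered_set in place of a precomputed word map, and the greedy selection of non-overlapping matches inside a span of windowLen bases are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of PEACE): returns true if at least t words
// of length v from ei occur in ej at non-overlapping positions that all fall
// inside some span of windowLen bases.
bool passesTV(const std::string& ei, const std::string& ej,
              std::size_t v, std::size_t t, std::size_t windowLen) {
    assert(v > 0 && t > 0 && windowLen >= v);
    if (ei.size() < v || ej.size() < v) {
        return false;
    }
    // Step 2: collect every v word that appears in ei.
    std::unordered_set<std::string> words;
    for (std::size_t i = 0; i + v <= ei.size(); ++i) {
        words.insert(ei.substr(i, v));
    }
    // Step 3: record the start positions in ej where a word of ei occurs.
    std::vector<std::size_t> hits;
    for (std::size_t j = 0; j + v <= ej.size(); ++j) {
        if (words.count(ej.substr(j, v)) > 0) {
            hits.push_back(j);
        }
    }
    // For each candidate first hit, greedily take later hits that start at
    // least v bases after the previous one and end inside the window.
    for (std::size_t first = 0; first < hits.size(); ++first) {
        std::size_t count = 1;
        std::size_t last = hits[first];
        for (std::size_t k = first + 1; k < hits.size(); ++k) {
            if (hits[k] + v > hits[first] + windowLen) {
                break;  // this word would extend past the window
            }
            if (hits[k] >= last + v) {
                ++count;
                last = hits[k];
            }
        }
        // Step 4: the heuristic passes once t such words are found.
        if (count >= t) {
            return true;
        }
    }
    return false;
}
```

Because all words have the same length, taking the earliest non-overlapping match at each step yields the largest possible count for a given window start.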

Definition at line 65 of file TVHeuristic.h.


Constructor & Destructor Documentation

TVHeuristic::~TVHeuristic (  )  [virtual]

The destructor.

The destructor frees the memory allocated for holding dynamic data, such as matchTable.

Definition at line 57 of file TVHeuristic.cpp.

References matchTable.

TVHeuristic::TVHeuristic ( const std::string &  outputFileName  )  [protected]

The default constructor.

The constructor has been made protected to ensure that this class is never directly instantiated. Instead it should be created via a suitable call to the HeuristicFactory API method(s).

Parameters:
[in] outputFileName The output file to which any heuristic data is to be written. Currently, this value is ignored.
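
The protected-constructor arrangement described above can be illustrated with a small sketch. The class names SketchHeuristic and SketchFactory, the create() signature, and the "tv" name string are hypothetical stand-ins; the actual HeuristicFactory API is not shown on this page.

```cpp
#include <cassert>
#include <memory>
#include <string>

class SketchFactory;  // hypothetical stand-in for HeuristicFactory

class SketchHeuristic {
    friend class SketchFactory;  // only the factory may call the constructor
public:
    virtual ~SketchHeuristic() = default;
    const std::string& getName() const { return name; }
protected:
    // Protected: client code cannot instantiate this class directly.
    explicit SketchHeuristic(const std::string& outputFileName)
        : name("tv") {
        (void) outputFileName;  // ignored, mirroring the note above
    }
private:
    std::string name;
};

class SketchFactory {
public:
    // Returns a new heuristic for a known name, or a null pointer otherwise.
    static std::unique_ptr<SketchHeuristic>
    create(const std::string& heuristicName,
           const std::string& outputFileName) {
        assert(!outputFileName.empty());
        if (heuristicName == "tv") {
            // 'new' is used here because std::make_unique is not a friend
            // and therefore cannot reach the protected constructor.
            return std::unique_ptr<SketchHeuristic>(
                new SketchHeuristic(outputFileName));
        }
        return nullptr;
    }
};
```

With this layout, a statement such as `SketchHeuristic h("out.txt");` in client code fails to compile, which is the effect the protected constructor is meant to achieve.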

Definition at line 49 of file TVHeuristic.cpp.

References matchTable, t, uvSuccessCount, and windowLen.


Member Function Documentation

template<typename Encoder >
int TVHeuristic::countCommonWords ( const int  otherEST,
Encoder  encoder,
const char *  refWordMap 
) [inline, protected]

Templatized method for counting common words between two ESTs.

This method is a helper method that is invoked from the runHeuristic method to count the number of common words between the reference EST (set via call to setReferenceEST) and otherEST (parameter). This method operates as follows:

  1. First the matchTable (instance variable) is cleared to all zeros.

  2. Next, the initial word of length NewUVHeuristic::v is constructed while ignoring bases marked as 'n' (this may require processing more than the first NewUVHeuristic::v bases if one of them is an 'n').
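
Step 2 can be sketched as below. The 2-bit encoding (A=0, C=1, G=2, T=3), the function names encodeBase and firstWord, and the choice to restart the word whenever an 'n' is seen are assumptions for illustration; the encoder actually passed to countCommonWords may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical 2-bit base encoding; returns -1 for 'n' or any other
// character that is not A, C, G, or T.
int encodeBase(char base) {
    switch (base) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Builds the first word of v consecutive valid bases in seq. On success the
// encoded word is stored in hash, the index just past the word is stored in
// next, and true is returned. Returns false if no such word exists.
bool firstWord(const std::string& seq, int v,
               unsigned int& hash, std::size_t& next) {
    assert(v > 0 && v <= 16);
    const unsigned int mask =
        (v == 16) ? 0xFFFFFFFFu : ((1u << (2 * v)) - 1u);
    unsigned int word = 0;
    int valid = 0;  // number of consecutive valid bases seen so far
    for (std::size_t i = 0; i < seq.size(); ++i) {
        const int code = encodeBase(seq[i]);
        if (code < 0) {
            // An 'n' invalidates the partial word; start over after it.
            valid = 0;
            word = 0;
            continue;
        }
        word = ((word << 2) | static_cast<unsigned int>(code)) & mask;
        if (++valid == v) {
            hash = word;
            next = i + 1;
            return true;
        }
    }
    return false;
}
```

For the sequence "ACnACGT" with v = 4, the scan discards the partial word "AC" at the 'n' and consumes seven bases before the first complete word is available.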

Definition at line 232 of file TVHeuristic.h.

References EST::getEST(), EST::getSequence(), matchTable, NewUVHeuristic::otherESTLen, NewUVHeuristic::v, and windowLen.

Referenced by runHeuristic().

int TVHeuristic::getWindowLen (  )  [inline]

Obtain the window length used for t/v heuristic.

The window length defines the length of the window within which common words are tracked and reported by this heuristic. Typically, this window length must match the window length used for D2 analysis for the heuristic to be meaningful. The default value is 100. This value can be overridden by the user via suitable command line arguments.

Returns:
The current window (or frame) size set for t/v heuristic.

Definition at line 158 of file TVHeuristic.h.

References windowLen.

int TVHeuristic::initialize (  )  [virtual]

Method to begin heuristic analysis (if any).

This method is invoked just before commencement of EST analysis. This method essentially passes control to the base class that merely creates the arrays for building hash maps.

Returns:
If the initialization process was successful, then this method returns 0. Otherwise this method returns a non-zero error code.

Reimplemented from NewUVHeuristic.

Definition at line 94 of file TVHeuristic.cpp.

References EST::getMaxESTLen(), ParameterSetManager::getMaxFrameSize(), ParameterSetManager::getParameterSetManager(), matchTable, and NewUVHeuristic::v.

bool TVHeuristic::parseArguments ( int &  argc,
char **  argv 
) [virtual]

Process command line arguments.

This method is used to process command line arguments specific to this heuristic. This method is typically used from the main method just after the heuristic has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.

Note:
This method calls the corresponding base class implementation to process common options.
Parameters:
[in,out] argc The number of command line arguments to be processed. When arguments are processed/consumed this parameter is changed to reflect the number of arguments processed.
[in,out] argv The array of command line arguments to be processed. When arguments are processed/consumed, the consumed arguments are removed from this array.
Returns:
This method returns true if the command line arguments were successfully processed. Otherwise this method returns false.
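
The consume-and-remove behaviour described for argc and argv can be sketched as follows. This does not use the arg_parser class referenced on this page; the helper consumeIntOption and the option name used in the usage note are hypothetical, and here argc is left holding the number of remaining arguments, which is an assumption about the intended semantics.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: looks for "<name> <integer>" in argv. If found, the
// integer is stored in value, both entries are removed from argv, argc is
// reduced by two, and true is returned. Returns false if name is absent.
bool consumeIntOption(int& argc, char** argv, const char* name, int& value) {
    assert(argv != nullptr && name != nullptr);
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], name) == 0) {
            value = std::atoi(argv[i + 1]);
            // Shift the remaining arguments left over the consumed pair.
            for (int j = i + 2; j < argc; ++j) {
                argv[j - 2] = argv[j];
            }
            argc -= 2;
            return true;
        }
    }
    return false;
}
```

Given the arguments `prog --opt 42 file`, a call with name "--opt" stores 42 in value and leaves `prog file` with argc equal to 2, so later parsers only see what has not been consumed.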

Reimplemented from NewUVHeuristic.

Definition at line 73 of file TVHeuristic.cpp.

References arg_parser::check_args(), Heuristic::heuristicName, NewUVHeuristic::parseArguments(), and t.

void TVHeuristic::printStats ( std::ostream &  os  )  const [protected, virtual]

Method to display statistics regarding operation of this heuristic.

This method can be used to obtain a dump of the statistics gathered regarding the operation of this heuristic. This method calls the base class method first which prints some common statistics. It then displays the number of times the u/v sample heuristic (the base class) returned success causing the t/v heuristic to be run.

Parameters:
[out] os The output stream to which the statistics regarding the heuristic are to be dumped.
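
The base-class-first reporting order can be sketched as below. The class names BaseStatsSketch and TVStatsSketch, the record() method, and the output wording are hypothetical; only the pattern of calling the base class and then printing the u/v success count follows the description above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class that tracks and prints a common statistic.
class BaseStatsSketch {
public:
    virtual ~BaseStatsSketch() = default;
    virtual void printStats(std::ostream& os) const {
        os << "calls: " << calls << "\n";
    }
protected:
    int calls = 0;
};

// Hypothetical derived class that adds its own counter to the report.
class TVStatsSketch : public BaseStatsSketch {
public:
    // Records one heuristic invocation and whether the u/v stage passed.
    void record(bool uvPassed) {
        ++calls;
        if (uvPassed) {
            ++uvSuccessCount;
        }
    }
    void printStats(std::ostream& os) const override {
        assert(uvSuccessCount <= calls);
        BaseStatsSketch::printStats(os);  // common statistics first
        os << "uv successes: " << uvSuccessCount << "\n";
    }
private:
    int uvSuccessCount = 0;
};

// Convenience wrapper that captures the report as a string.
std::string statsAsString(const BaseStatsSketch& h) {
    std::ostringstream os;
    h.printStats(os);
    return os.str();
}
```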

Reimplemented from Heuristic.

Definition at line 169 of file TVHeuristic.cpp.

References uvSuccessCount.

bool TVHeuristic::runHeuristic ( const int  otherEST  )  [protected, virtual]

Determine whether the analyzer should analyze, according to this heuristic.

This method can be used to compare a given EST with the reference EST (set via a call to the setReferenceEST() method). This method operates as follows:

  1. It invokes the corresponding method in the base class to first run the UV-sample heuristic on the pair of ESTs. If the pair fails UV-sample heuristic this method returns immediately with false (indicating further analysis is not needed).

  2. If the pair passes the UV-sample heuristic, then this method invokes the countCommonWords method with a suitable encoder (the normal or reverse-complement encoder, depending on the value of the NewUVHeuristic::bestMatchIsRC flag) to analyze the pair of ESTs using the TV heuristic.

Parameters:
[in] otherEST The index (zero based) of the EST with which the reference EST is to be compared.
Returns:
This method returns true if the heuristic says the EST pair should be analyzed, and false if it should not.
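
The two-stage flow described above can be sketched as follows. Everything here is a simplified stand-in: the UVResult struct, the encoder tag types, and the precomputed commonFwd and commonRC counts replace the real base-class call and word counting, which are not reproduced on this page.

```cpp
#include <cassert>

// Hypothetical result of the cheap first stage (the u/v sample heuristic).
struct UVResult {
    bool passed;         // did the u/v sample stage pass?
    bool bestMatchIsRC;  // was the best match on the reverse complement?
};

// Hypothetical encoder tags used to select the counting direction.
struct NormalEncoder  { static constexpr bool reverse = false; };
struct RevCompEncoder { static constexpr bool reverse = true; };

class TwoStageSketch {
public:
    explicit TwoStageSketch(int minCommon) : t(minCommon), uvSuccessCount(0) {
        assert(minCommon > 0);
    }
    // commonFwd and commonRC stand in for the word counts that the real
    // heuristic would compute from the two sequences.
    bool run(const UVResult& uv, int commonFwd, int commonRC) {
        if (!uv.passed) {
            return false;  // stage 1 failed: no further analysis needed
        }
        ++uvSuccessCount;  // stage 2 is about to run
        const int common = uv.bestMatchIsRC
            ? count(RevCompEncoder(), commonFwd, commonRC)
            : count(NormalEncoder(), commonFwd, commonRC);
        return common >= t;
    }
    int getUVSuccessCount() const { return uvSuccessCount; }
private:
    // Templatized on the encoder, mirroring the shape of countCommonWords.
    template <typename Encoder>
    int count(Encoder, int commonFwd, int commonRC) const {
        return Encoder::reverse ? commonRC : commonFwd;
    }
    int t;
    int uvSuccessCount;
};
```

Running the cheap stage first means the more expensive word counting is skipped for every pair that the u/v sample stage rejects.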

Reimplemented from NewUVHeuristic.

Definition at line 119 of file TVHeuristic.cpp.

References NewUVHeuristic::bestMatchIsRC, countCommonWords(), EST::getEST(), EST::getSequence(), NewUVHeuristic::otherESTLen, NewUVHeuristic::runHeuristic(), NewUVHeuristic::s1RCWordMap, NewUVHeuristic::s1WordMap, t, updateParameters(), and uvSuccessCount.

int TVHeuristic::setReferenceEST ( const int  estIdx  )  [virtual]

Set the reference EST id for analysis.

This method is invoked just before a batch of ESTs is analyzed via a call to the analyze(EST *) method. Setting the reference EST gives this heuristic an opportunity to pre-compute the normal and reverse-complement hash tables for words of varying sizes. The hash tables enable rapid searching for words in the runHeuristic method.

Note:
This method must be called only after the initialize() method is called.
Returns:
If the hash table creation process was successful, then this method returns 0. Otherwise this method returns an error code.
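
The pre-computation described above can be sketched as below. The function names, the presence table of size 4^v, the 2-bit encoding (A=0, C=1, G=2, T=3), the restriction to upper-case bases, and the single word length v are all assumptions for illustration; the tables built by the base class may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the reverse complement of an upper-case A/C/G/T sequence; any
// other character (such as 'N') is left unchanged.
std::string reverseComplement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Hypothetical word map: entry w is 1 if the v word with 2-bit encoding w
// occurs in seq, and 0 otherwise. Words containing other characters are
// skipped.
std::vector<char> buildWordMap(const std::string& seq, int v) {
    assert(v > 0 && v <= 12);
    const std::size_t tableSize = std::size_t(1) << (2 * v);
    std::vector<char> wordMap(tableSize, 0);
    std::size_t word = 0;
    int valid = 0;  // consecutive valid bases in the current word
    for (char base : seq) {
        int code = -1;
        switch (base) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: break;
        }
        if (code < 0) {
            valid = 0;
            word = 0;
            continue;
        }
        word = ((word << 2) | static_cast<std::size_t>(code)) & (tableSize - 1);
        if (++valid >= v) {
            wordMap[word] = 1;
        }
    }
    return wordMap;
}
```

Building one map from the reference sequence and a second from `reverseComplement(sequence)` gives constant-time lookups for both orientations when each candidate EST is later scanned.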

Reimplemented from NewUVHeuristic.

Definition at line 109 of file TVHeuristic.cpp.

References Heuristic::refESTidx.

void TVHeuristic::showArguments ( std::ostream &  os  )  [virtual]

Display valid command line arguments for this heuristic.

This method is used to display all valid command line options that are supported by this heuristic. Note that this method invokes the corresponding method in the base class to display any options supported by the base class. This method is typically used in the main() method when displaying usage information.

Parameters:
[out] os The output stream to which the valid command line arguments must be written.

Reimplemented from NewUVHeuristic.

Definition at line 64 of file TVHeuristic.cpp.

bool TVHeuristic::updateParameters (  )  [protected]

Method to obtain and update the parameters for the heuristic based on the parameter set manager.

Returns:
true if we should analyze these ESTs, false otherwise

Definition at line 155 of file TVHeuristic.cpp.

References ParameterSet::frameSize, ParameterSetManager::getParameterSet(), ParameterSetManager::getParameterSetManager(), NewUVHeuristic::otherESTLen, NewUVHeuristic::passes, NewUVHeuristic::refESTLen, ParameterSet::t, t, ParameterSet::u, NewUVHeuristic::u, windowLen, ParameterSet::wordShift, and NewUVHeuristic::wordShift.

Referenced by runHeuristic().


Friends And Related Function Documentation

friend class HeuristicFactory [friend]

Reimplemented from NewUVHeuristic.

Definition at line 66 of file TVHeuristic.h.


Member Data Documentation

arg_parser::arg_record TVHeuristic::argsList [static, private]

Initial value:
 {
    {NULL, NULL, NULL, arg_parser::BOOLEAN}
 }

The set of arguments specific to the TV heuristic.

This class variable contains a static list of arguments that are specific to this heuristic class. The argument list is statically defined and shared by all instances of this class.

Note:
Because it uses static arguments and parameters, this heuristic class is not MT-safe.

Reimplemented from NewUVHeuristic.

Definition at line 294 of file TVHeuristic.h.

char* TVHeuristic::matchTable [private]

A large table to track matches.

This instance variable contains a large table that tracks matches encountered as this heuristic tracks matching words.

Definition at line 321 of file TVHeuristic.h.

Referenced by countCommonWords(), initialize(), TVHeuristic(), and ~TVHeuristic().

int TVHeuristic::t [private]

The minimum number of common words.

This instance variable contains the minimum number of words (that are close but not too close) that have matching values in pairs of ESTs. The default is 65. However, this value can be overridden by a command line argument.

Definition at line 303 of file TVHeuristic.h.

Referenced by parseArguments(), runHeuristic(), TVHeuristic(), and updateParameters().

int TVHeuristic::uvSuccessCount [private]

Instance variable to track the number of times UV sample heuristic passed.

This instance variable is used to track the number of times the UV sample heuristic passed. This value indicates the number of times the TV heuristic was actually run. This value is incremented in the runHeuristic method and is displayed by the printStats() method.

Definition at line 332 of file TVHeuristic.h.

Referenced by printStats(), runHeuristic(), and TVHeuristic().

int TVHeuristic::windowLen [private]

The window length to be used for t/v heuristic.

The window length defines the length of the window within which common words are tracked and reported by this heuristic. Typically, this window length must match the window length used for D2 analysis for the heuristic to be meaningful. The default value is 100. This value can be overridden by the user via suitable command line arguments.

Definition at line 314 of file TVHeuristic.h.

Referenced by countCommonWords(), getWindowLen(), TVHeuristic(), and updateParameters().


The documentation for this class was generated from the following files:

TVHeuristic.h
TVHeuristic.cpp

Generated on 19 Mar 2010 for PEACE by doxygen 1.6.1