The base class of all EST analyzers.
#include <ESTAnalyzer.h>
Public Member Functions | |
virtual void | showArguments (std::ostream &os) |
Display valid command line arguments for this analyzer. | |
virtual bool | parseArguments (int &argc, char **argv) |
Process command line arguments. | |
virtual int | initialize ()=0 |
Method to begin EST analysis. | |
virtual std::string | getName () const =0 |
Method to obtain human-readable name for this EST analyzer. | |
virtual int | setReferenceEST (const int estIdx)=0 |
Set the reference EST id for analysis. | |
float | analyze (const int otherEST, const bool useHeuristics=true, const bool useHeavyWeight=true) |
Analyze and obtain a similarity metric using the attached heuristic chain (if one exists) followed by the appropriate heavy weight distance/similarity measure associated with this ESTAnalyzer. | |
virtual int | analyze ()=0 |
Method to perform EST analysis. | |
virtual bool | getAlignmentData (int &alignmentData) |
Get alignment data for the previous call to analyze method. | |
bool | loadFASTAFile (const char *fileName, const bool unpopulate=false) |
Method to load EST information from a FASTA file. | |
const char * | getInputFileName () const |
Obtain the input file name. | |
virtual bool | isDistanceMetric () const |
Determine if this EST analyzer provides distance metrics or similarity metrics. | |
virtual float | getInvalidMetric () const |
Obtain an invalid (or the worst) metric generated by this analyzer. | |
virtual float | getValidMetric () const |
Obtain a valid (or the best) metric generated by this analyzer. | |
virtual int | getPreferredDummyESTLength () const |
Determine preferred dummy EST lengths to be used with this analyzer. | |
virtual bool | compareMetrics (const float metric1, const float metric2) const |
Method to compare two metrics generated by this class. | |
virtual int | setHeuristicChain (HeuristicChain *chain) |
Method to attach a heuristic chain to this EST analyzer. | |
virtual HeuristicChain * | getHeuristicChain () const |
Method to obtain the heuristic chain set for this EST analyzer. | |
virtual void | displayStats (std::ostream &os) |
Method to display performance statistics. | |
virtual | ~ESTAnalyzer () |
The destructor. | |
Protected Member Functions | |
ESTAnalyzer (const std::string &analyzerName, const int refESTidx, const std::string &outputFileName) | |
The default constructor. | |
virtual float | getMetric (const int otherEST)=0 |
Analyze and compute a similarity or distance metric between a given EST and the reference EST using the heavy weight metric associated with this ESTAnalyzer. | |
Protected Attributes | |
int | refESTidx |
The index of the reference EST in a given file. | |
HeuristicChain * | chain |
The heuristic chain associated with this EST analyzer. | |
const std::string | outputFileName |
The file to which results must be written. | |
const std::string | analyzerName |
The name of this analyzer. | |
Static Protected Attributes | |
static bool | readAhead = false |
Flag to indicate if a read ahead thread must be used. | |
static bool | noMaskBases = false |
Flag to indicate if lower-case characters must be masked out of reads. | |
static char * | estFileName = NULL |
The FASTA file from where EST data is to be read. | |
static bool | htmlLog = false |
Flag to indicate if output results must be in HTML format. | |
Private Member Functions | |
ESTAnalyzer & | operator= (const ESTAnalyzer &src) |
A dummy operator=. | |
Static Private Attributes | |
static arg_parser::arg_record | commonArgsList [] |
The set of common arguments for all EST analyzers. |
The base class of all EST analyzers.
This class must be the base class of all EST analyzers in the system. This class provides some default functionality that can be readily used by the EST analyzers.
Definition at line 47 of file ESTAnalyzer.h.
ESTAnalyzer::~ESTAnalyzer | ( | ) | [virtual] |
The destructor.
The destructor frees memory allocated for holding any EST data in the base class.
Definition at line 70 of file ESTAnalyzer.cpp.
ESTAnalyzer::ESTAnalyzer | ( | const std::string & | analyzerName, | |
const int | refESTidx, | |||
const std::string & | outputFileName | |||
) | [protected] |
The default constructor.
The constructor has been made protected to ensure that this class is never directly instantiated. Instead, one of the derived ESTAnalyzer classes must be instantiated via the ESTAnalyzerFactory API methods.
[in] | analyzerName | The human readable name for this EST analyzer. This name is used when generating errors, warnings, and other output messages for this analyzer. |
[in] | refESTidx | The reference EST's index in a given multi-FASTA file. Index values start with 0 (zero). The refESTidx is supplied as a global argument that is processed in the main() method. This value is simply copied to the refESTidx member in this class. |
[in] | outputFileName | The file name to which output must be written. If a valid output file is not specified, then results are written to standard output. The outputFileName is simply copied to the outputFileName member object. |
Definition at line 63 of file ESTAnalyzer.cpp.
virtual int ESTAnalyzer::analyze | ( | ) | [pure virtual] |
Method to perform EST analysis.
This method must be used to perform EST analysis. This method is a pure-virtual method. Therefore all EST analyzers must override this method to perform all the necessary operations. Typically, this method performs the following operations:
Calls the initialize() method.
Sets the reference EST via a call to the setReferenceEST() method.
Repeatedly uses the analyze(const int) method to compare ESTs.
Generates analysis reports at the end of analysis.
Implemented in D2, D2Zim, FWAnalyzer, MatrixFileAnalyzer, and TwoPassD2.
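The typical sequence of operations listed for analyze() can be sketched as follows. This is only an illustration under stated assumptions: SketchAnalyzer is a hypothetical stand-in (not a PEACE class), the placeholder metric is invented, and treating a return value of 0 as success is an assumption that the reference text does not confirm.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a derived analyzer; only the calls named in the
// description of analyze() are modeled here.
class SketchAnalyzer {
public:
    explicit SketchAnalyzer(int estCount) : estCount(estCount), refIdx(-1) {
        assert(estCount > 0);
    }

    // Step 1: prepare for analysis (a real analyzer would load ESTs here).
    // Returning 0 for success is an assumption of this sketch.
    int initialize() { return 0; }

    // Step 2: remember the reference EST used by subsequent comparisons.
    int setReferenceEST(int estIdx) { refIdx = estIdx; return 0; }

    // Placeholder pairwise metric: absolute difference of the two indexes.
    float analyze(int otherEST) const {
        const int gap = (otherEST > refIdx) ? otherEST - refIdx : refIdx - otherEST;
        return static_cast<float>(gap);
    }

    // Batch analysis: initialize, set the reference, compare every EST, and
    // collect the metrics in 'report' (standing in for the final report).
    int analyze(int refEST, std::vector<float>& report) {
        if (initialize() != 0) return 1;
        if (setReferenceEST(refEST) != 0) return 2;
        report.clear();
        for (int idx = 0; idx < estCount; idx++) {
            report.push_back(analyze(idx));
        }
        return 0;
    }

private:
    int estCount;  // Number of ESTs assumed to be loaded.
    int refIdx;    // Index of the current reference EST.
};
```

A caller would construct the analyzer, invoke the batch analyze() once, and then read the collected metrics.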
float ESTAnalyzer::analyze | ( | const int | otherEST, | |
const bool | useHeuristics = true , |
|||
const bool | useHeavyWeight = true | |||
) |
Analyze and obtain a similarity metric using the attached heuristic chain (if one exists) followed by the appropriate heavy weight distance/similarity measure associated with this ESTAnalyzer.
This method can be used to compare a given EST with the reference EST (set via a call to the setReferenceEST() method).
[in] | otherEST | The index (zero based) of the EST with which the reference EST is to be compared. |
[in] | useHeuristics | A directive instructing the ESTAnalyzer on whether or not to use its heuristic chain. Defaults to true. |
[in] | useHeavyWeight | A directive instructing the ESTAnalyzer on whether or not to use the heavy weight metric. Defaults to true. |
Definition at line 81 of file ESTAnalyzer.cpp.
References chain, getInvalidMetric(), getMetric(), getValidMetric(), and HeuristicChain::shouldAnalyze().
Referenced by TransMSTClusterMaker::analyze(), PMSTClusterMaker::analyze(), MSTClusterMaker::analyze(), InteractiveConsole::analyze(), main(), and LCFilter::runFilter().
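The interplay between the heuristic chain and the heavy weight metric might look like the sketch below. It is a guess at the control flow based only on the members this method is documented to reference (chain, shouldAnalyze(), getInvalidMetric(), getMetric(), getValidMetric()); the actual implementation in ESTAnalyzer.cpp may differ. SketchChain, SketchAnalyzer, the two-argument shouldAnalyze() signature, and the placeholder metric are all hypothetical.

```cpp
#include <cassert>

// Hypothetical heuristic chain: rejects pairs whose indexes are far apart.
struct SketchChain {
    int maxGap;
    bool shouldAnalyze(int refEST, int otherEST) const {
        const int gap = (refEST > otherEST) ? refEST - otherEST : otherEST - refEST;
        return gap <= maxGap;
    }
};

// Hypothetical analyzer that uses the documented defaults for a
// similarity-based analyzer: -1 is invalid (worst) and 0 is valid.
class SketchAnalyzer {
public:
    SketchAnalyzer(int refEST, const SketchChain* chain)
        : refEST(refEST), chain(chain) {}

    float getInvalidMetric() const { return -1.0f; }
    float getValidMetric() const { return 0.0f; }

    float analyze(int otherEST, bool useHeuristics = true,
                  bool useHeavyWeight = true) const {
        assert(otherEST >= 0);
        // Cheap filter first: if the chain rejects the pair, skip the
        // expensive comparison and report the invalid metric (assumed).
        if (useHeuristics && chain != nullptr &&
            !chain->shouldAnalyze(refEST, otherEST)) {
            return getInvalidMetric();
        }
        // Otherwise run the heavy weight metric, unless the caller disabled it.
        return useHeavyWeight ? getMetric(otherEST) : getValidMetric();
    }

protected:
    // Placeholder for the heavy weight metric (e.g. D2 in a real analyzer).
    float getMetric(int otherEST) const { return 10.0f + otherEST; }

private:
    int refEST;
    const SketchChain* chain;
};
```

Passing a null chain, or useHeuristics = false, sends every pair straight to the heavy weight metric in this sketch.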
virtual bool ESTAnalyzer::compareMetrics | ( | const float | metric1, | |
const float | metric2 | |||
) | const [inline, virtual] |
Method to compare two metrics generated by this class.
This method provides the interface for comparing metrics generated by this ESTAnalyzer when comparing two different ESTs. This method returns true if metric1 is comparatively better than or equal to metric2.
[in] | metric1 | The first metric to be compared against. |
[in] | metric2 | The second metric to be compared against. |
Returns true if metric1 is comparatively better than or equal to metric2.
Reimplemented in D2, D2Zim, MatrixFileAnalyzer, and TwoPassD2.
Definition at line 346 of file ESTAnalyzer.h.
Referenced by PMSTClusterMaker::addMoreChildESTs(), MSTClusterMaker::addMoreChildESTs(), TransMSTClusterMaker::analyze(), PMSTClusterMaker::computeNextESTidx(), MSTClusterMaker::computeNextESTidx(), MSTMultiListCache::getBestEntry(), MSTCluster::makeClusters(), MSTMultiListCache::mergeList(), GreaterCachedESTInfo::operator()(), LessCachedESTInfo::operator()(), PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), TransMSTClusterMaker::pruneMetricEntries(), and LCFilter::runFilter().
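What "comparatively better" means depends on whether an analyzer produces distance or similarity metrics. The sketch below illustrates one plausible reading of that contract; the free functions compareMetrics() and pickBest() and the isDistanceMetric parameter are hypothetical helpers written for this example, not the member functions of ESTAnalyzer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if metric1 is better than or equal to metric2.
// For distance metrics smaller is better; for similarity metrics bigger is better.
inline bool compareMetrics(float metric1, float metric2, bool isDistanceMetric) {
    return isDistanceMetric ? (metric1 <= metric2) : (metric1 >= metric2);
}

// Picks the best metric from a non-empty list using the comparison above.
inline float pickBest(const std::vector<float>& metrics, bool isDistanceMetric) {
    assert(!metrics.empty());
    float best = metrics[0];
    for (std::size_t i = 1; i < metrics.size(); i++) {
        if (!compareMetrics(best, metrics[i], isDistanceMetric)) {
            best = metrics[i];
        }
    }
    return best;
}
```

Routing every comparison through a single function like this lets clustering code stay agnostic about the kind of metric in use.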
virtual void ESTAnalyzer::displayStats | ( | std::ostream & | os | ) | [inline, virtual] |
Method to display performance statistics.
This method can be used to display any statistics collated by this class (and its descendants) regarding their operation and performance. This method was primarily introduced to give derived classes a mechanism to override statistics display and print additional information.
[out] | os | The output stream to which the statistics must be written. |
Definition at line 388 of file ESTAnalyzer.h.
virtual bool ESTAnalyzer::getAlignmentData | ( | int & | alignmentData | ) | [inline, virtual] |
Get alignment data for the previous call to analyze method.
This method can be used to obtain alignment data (if any) that was obtained, typically as a byproduct of the previous call to the analyze() method.
[out] | alignmentData | This parameter is updated with the alignment information generated as a part of the immediately preceding analyze(const int) method call. |
Note: if a previous analyze() method call was not made, then the value returned in the alignmentData parameter is not defined.
Returns true if the alignment data is actually computed by this ESTAnalyzer. The default implementation of this method always returns false.
Reimplemented in D2, D2Zim, and TwoPassD2.
Definition at line 218 of file ESTAnalyzer.h.
Referenced by InteractiveConsole::analyze(), PMSTClusterMaker::manager(), MSTClusterMaker::manager(), PMSTClusterMaker::mergeManager(), PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().
virtual HeuristicChain* ESTAnalyzer::getHeuristicChain | ( | ) | const [inline, virtual] |
Method to obtain the heuristic chain set for this EST analyzer.
This method may be used to obtain a pointer to the heuristic chain set for use by this analyzer. If a heuristic chain has not been set, then this method returns NULL.
Note: do not modify or delete the returned heuristic chain pointer.
Definition at line 372 of file ESTAnalyzer.h.
References chain.
Referenced by TwoPassD2::getMetric(), TransMSTClusterMaker::initialize(), and D2::runD2().
const char* ESTAnalyzer::getInputFileName | ( | ) | const [inline] |
Obtain the input file name.
This method returns the input file from where the EST data was read.
Definition at line 248 of file ESTAnalyzer.h.
References estFileName.
Referenced by MSTClusterMaker::buildAndShowClusters(), PMSTClusterMaker::makeClusters(), and MSTClusterMaker::makeClusters().
virtual float ESTAnalyzer::getInvalidMetric | ( | ) | const [inline, virtual] |
Obtain an invalid (or the worst) metric generated by this analyzer.
This method can be used to obtain an invalid metric value for this analyzer. This value can be used to initialize metric values. By default this method returns -1, which should be ideal for similarity-based metrics.
Reimplemented in D2, D2Zim, MatrixFileAnalyzer, and TwoPassD2.
Definition at line 284 of file ESTAnalyzer.h.
Referenced by MSTClusterMaker::addEST(), analyze(), MSTMultiListCache::getBestEntry(), MSTHeapCache::getBestEntry(), LCFilter::initialize(), MSTMultiListCache::mergeList(), PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), and TransMSTClusterMaker::TransMSTClusterMaker().
virtual float ESTAnalyzer::getMetric | ( | const int | otherEST | ) | [protected, pure virtual] |
Analyze and compute a similarity or distance metric between a given EST and the reference EST using the heavy weight metric associated with this ESTAnalyzer.
This method can be used to compare a given EST with the reference EST (set via a call to the setReferenceEST() method).
[in] | otherEST | The index (zero based) of the EST with which the reference EST is to be compared. |
Implemented in CLU, D2, D2Zim, FWAnalyzer, MatrixFileAnalyzer, and TwoPassD2.
Referenced by analyze().
virtual std::string ESTAnalyzer::getName | ( | ) | const [pure virtual] |
Method to obtain human-readable name for this EST analyzer.
This method provides a human-readable string identifying the EST analyzer. This string is typically used for display/debugging purposes (particularly via the PEACE Interactive Console).
Implemented in CLU, D2, D2Zim, FMWSCA, MatrixFileAnalyzer, and TwoPassD2.
Referenced by InteractiveConsole::analyze().
virtual int ESTAnalyzer::getPreferredDummyESTLength | ( | ) | const [inline, virtual] |
Determine preferred dummy EST lengths to be used with this analyzer.
This method can be used to determine the preferred dummy EST lengths to be used with this EST analyzer. This method may be overridden in derived classes to provide a more appropriate dummy EST length.
Dummy ESTs are used for the following purpose: When clustering FASTA data that contains low complexity reads, the low complexity reads provide false relationships between ESTs, giving rise to very large clusters. These large clusters are created because transitive relationships are established between ESTs due to low complexity reads.
In order to avoid super-clusters that get formed due to low complexity reads, PEACE adds two dummy ESTs, one with all "AAAAA...." and another with all "CCCCCC...". The length of the ESTs must be appropriately chosen based on the type of analyzer used. This method helps the ClusterMaker hierarchy to determine the appropriate dummy EST length.
Reimplemented in FWAnalyzer.
Definition at line 327 of file ESTAnalyzer.h.
Referenced by LCFilter::initialize().
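As a rough illustration of the dummy ESTs described above, the following sketch builds the two homopolymer sequences for a given length. The helper name makeDummyESTs() is hypothetical, and how PEACE actually creates and registers its dummy entries is not shown in this reference.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Builds the two dummy sequences: one of all 'A' and one of all 'C'.
// 'length' would come from getPreferredDummyESTLength() in a real caller.
inline std::pair<std::string, std::string> makeDummyESTs(int length) {
    assert(length > 0);
    return std::make_pair(std::string(length, 'A'), std::string(length, 'C'));
}
```

Low complexity reads can then be matched against these two sequences rather than chaining unrelated ESTs together.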
virtual float ESTAnalyzer::getValidMetric | ( | ) | const [inline, virtual] |
Obtain a valid (or the best) metric generated by this analyzer.
This method can be used to obtain a valid metric value for this analyzer. This value can be used to initialize metric values. By default this method returns 0, which should be ideal for distance-based metrics.
Reimplemented in CLU.
Definition at line 300 of file ESTAnalyzer.h.
Referenced by analyze(), and MSTClusterMaker::manager().
virtual int ESTAnalyzer::initialize | ( | ) | [pure virtual] |
Method to begin EST analysis.
This method is invoked just before commencement of EST analysis. This method typically loads the list of ESTs from a given input file. In addition, it may perform any pre-processing as the case may be. Some EST analyzers may also add dummy entries to aid in various operations.
Implemented in CLU, D2, D2Zim, FWAnalyzer, MatrixFileAnalyzer, and TwoPassD2.
Referenced by MSTClusterMaker::initialize(), InteractiveConsole::initialize(), and PMSTClusterMaker::makeClusters().
virtual bool ESTAnalyzer::isDistanceMetric | ( | ) | const [inline, virtual] |
Determine if this EST analyzer provides distance metrics or similarity metrics.
This method can be used to determine if this EST analyzer provides distance metrics or similarity metrics. If this method returns true, then this EST analyzer returns distance metrics (smaller is better). On the other hand, if this method returns false, then this EST analyzer returns similarity metrics (bigger is better).
Returns false (by default) to indicate that this EST analyzer operates using similarity metrics. If it operates using distance metrics, then this method returns true.
Reimplemented in D2, D2Zim, MatrixFileAnalyzer, and TwoPassD2.
Definition at line 268 of file ESTAnalyzer.h.
Referenced by InteractiveConsole::analyze().
bool ESTAnalyzer::loadFASTAFile | ( | const char * | fileName, | |
const bool | unpopulate = false | |||
) |
Method to load EST information from a FASTA file.
This method can be used to load information regarding ESTs from a FASTA file. The file name from where the data is to be loaded must be passed in as the parameter.
[in] | fileName | The file name of the FASTA file from where the EST information is to be loaded. |
[in] | unpopulate | If this parameter is true then the header and sequence information in each EST is discarded to minimize memory footprint. |
Definition at line 122 of file ESTAnalyzer.cpp.
References analyzerName, EST::create(), EST::getID(), MPI_GET_RANK, noMaskBases, and EST::unpopulate().
Referenced by FWAnalyzer::initialize().
ESTAnalyzer & ESTAnalyzer::operator= | ( | const ESTAnalyzer & | src | ) | [private] |
A dummy operator=.
The operator=() is suppressed for this class as it has constant members whose values are set when the object is created. These values cannot be changed during the lifetime of this object.
[in] | src | The source object from where data is to be copied. Currently this value is ignored. |
Definition at line 179 of file ESTAnalyzer.cpp.
bool ESTAnalyzer::parseArguments | ( | int & | argc, | |
char ** | argv | |||
) | [virtual] |
Process command line arguments.
This method is used to process command line arguments specific to this EST analyzer. This method is typically used from the main method just after the EST analyzer has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.
[in,out] | argc | The number of command line arguments to be processed. |
[in,out] | argv | The array of command line arguments. |
Returns true if all arguments are consumed successfully and if a valid estID and estFileName have been specified. Otherwise this method returns false.
Reimplemented in CLU, D2, D2Zim, FMWSCA, FWAnalyzer, MatrixFileAnalyzer, and TwoPassD2.
Definition at line 104 of file ESTAnalyzer.cpp.
References analyzerName, arg_parser::check_args(), and estFileName.
Referenced by main(), and FWAnalyzer::parseArguments().
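The idea of "consuming" valid arguments, so that argc and argv afterwards hold only what was not recognized, can be sketched as below. This is not the arg_parser implementation used by PEACE: SketchArgs and parseSketchArgs() are hypothetical, and only the option names (--estFile, --html, --no-mask-bases) are taken from the common argument list documented for this class.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical holder for the values that the common options would set.
struct SketchArgs {
    bool html;
    bool noMaskBases;
    const char* estFile;
};

// Consumes recognized options from argv, compacts the unrecognized ones to the
// front, and updates argc. Returns true only if an EST file was specified.
inline bool parseSketchArgs(int& argc, char** argv, SketchArgs& out) {
    assert(argc >= 1 && argv != nullptr);
    int kept = 1;  // argv[0] (the program name) is always kept.
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "--html") == 0) {
            out.html = true;
        } else if (std::strcmp(argv[i], "--no-mask-bases") == 0) {
            out.noMaskBases = true;
        } else if (std::strcmp(argv[i], "--estFile") == 0 && i + 1 < argc) {
            out.estFile = argv[++i];  // The option value follows the flag.
        } else {
            argv[kept++] = argv[i];   // Leave it for another parser.
        }
    }
    argc = kept;
    return out.estFile != nullptr;
}
```

Leaving unrecognized arguments in place is what would allow a derived class or main() to run a further parsing pass over the remainder.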
int ESTAnalyzer::setHeuristicChain | ( | HeuristicChain * | chain | ) | [virtual] |
Method to attach a heuristic chain to this EST analyzer.
[in] | chain | The heuristic chain to be attached. |
Definition at line 75 of file ESTAnalyzer.cpp.
References chain.
Referenced by main().
virtual int ESTAnalyzer::setReferenceEST | ( | const int | estIdx | ) | [pure virtual] |
Set the reference EST id for analysis.
This method is invoked just before a batch of ESTs is analyzed via calls to the analyze(const int) method. Setting the reference EST gives analyzers an opportunity to optimize certain operations, if possible.
Implemented in CLU, D2, D2Zim, FWAnalyzer, MatrixFileAnalyzer, and TwoPassD2.
Referenced by InteractiveConsole::analyze(), PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), and LCFilter::runFilter().
void ESTAnalyzer::showArguments | ( | std::ostream & | os | ) | [virtual] |
Display valid command line arguments for this analyzer.
This method must be used to display all valid command line options that are supported by this analyzer. Note that derived classes may override this method to display additional command line options that are applicable to them. This method is typically used in the main() method when displaying usage information.
[out] | os | The output stream to which the valid command line arguments must be written. |
Reimplemented in CLU, D2, D2Zim, FMWSCA, FWAnalyzer, MatrixFileAnalyzer, and TwoPassD2.
Definition at line 96 of file ESTAnalyzer.cpp.
Referenced by showUsage().
const std::string ESTAnalyzer::analyzerName [protected] |
The name of this analyzer.
This instance variable contains the human-readable name for this analyzer. This value is set when the analyzer is instantiated (in the constructor) and is never changed during the lifetime of this analyzer. This information is used when generating errors, warnings, and other output messages.
Definition at line 525 of file ESTAnalyzer.h.
Referenced by FWAnalyzer::dumpHeader(), loadFASTAFile(), TwoPassD2::parseArguments(), MatrixFileAnalyzer::parseArguments(), FWAnalyzer::parseArguments(), parseArguments(), D2Zim::parseArguments(), D2::parseArguments(), CLU::parseArguments(), and FWAnalyzer::showArguments().
HeuristicChain* ESTAnalyzer::chain [protected] |
The heuristic chain associated with this EST analyzer.
The heuristic chain contains a sequence of heuristics that must be used to minimize the number of pairs of ESTs that must be actually analyzed (using heavy weight algorithms such as D2). The chain is created in the main method via a call to the HeuristicChain::setupChain method and is set by the main method via a call to the setHeuristicChain method.
Definition at line 488 of file ESTAnalyzer.h.
Referenced by analyze(), getHeuristicChain(), TwoPassD2::getMetric(), FWAnalyzer::initialize(), D2::runD2(), setHeuristicChain(), TwoPassD2::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().
arg_parser::arg_record ESTAnalyzer::commonArgsList [static, private] |
{
  {"--readAhead", "Use a read head thread to load next EST data (NYI)", &ESTAnalyzer::readAhead, arg_parser::BOOLEAN},
  {"--estFile", "Name of EST file (in FASTA format) to be processed", &ESTAnalyzer::estFileName, arg_parser::STRING},
  {"--html", "Generate analysis report in HTML format", &ESTAnalyzer::htmlLog, arg_parser::BOOLEAN},
  {"--no-mask-bases", "Don't mask out all lower case neucleotides in reads", &ESTAnalyzer::noMaskBases, arg_parser::BOOLEAN},
  {NULL, NULL, NULL, arg_parser::BOOLEAN}
}
The set of common arguments for all EST analyzers.
This class variable contains a static list of arguments that are common to all the EST analyzers. The common argument list is statically defined and shared by all EST analyzer instances.
Reimplemented in FWAnalyzer.
Definition at line 536 of file ESTAnalyzer.h.
char * ESTAnalyzer::estFileName = NULL [static, protected] |
The FASTA file from where EST data is to be read.
This member object is used to hold the file name from where all the EST data is to be loaded. This static member is initialized to NULL. However, the value is changed by the parseArguments method when the --estFile command line argument is specified by the user.
Definition at line 497 of file ESTAnalyzer.h.
Referenced by FWAnalyzer::dumpHeader(), getInputFileName(), FWAnalyzer::initialize(), and parseArguments().
bool ESTAnalyzer::htmlLog = false [static, protected] |
Flag to indicate if output results must be in HTML format.
This member is initialized to false. However, the value is changed by the parseArguments method depending on the actual value specified by the user.
Definition at line 505 of file ESTAnalyzer.h.
Referenced by FWAnalyzer::analyze(), FWAnalyzer::dumpEST(), CLU::dumpEST(), and FWAnalyzer::dumpHeader().
bool ESTAnalyzer::noMaskBases = false [static, protected] |
Flag to indicate if lower-case characters must be masked out of reads.
Typically lower-case characters ('a', 't', 'c', 'g') are used to indicate bases that must be masked out of reads. This notation is used by the DUST utility (part of NCBI BLAST), which identifies and tags low complexity regions with lower-case letters. If this flag is false (default), then these lower-case characters are converted to 'N', causing them to be ignored by PEACE. If this flag is true, then these bases are converted to their upper-case equivalents. This flag is passed to EST::create, which actually does the conversions.
Definition at line 468 of file ESTAnalyzer.h.
Referenced by loadFASTAFile().
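The conversion controlled by this flag can be illustrated with the short sketch below. The function normalizeBases() is a hypothetical helper written for this example; the real conversion is performed inside EST::create, whose code is not part of this reference.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Applies the masking rule to one sequence:
//  - noMaskBases == false: lower-case bases become 'N' (masked out).
//  - noMaskBases == true : lower-case bases become their upper-case equivalents.
inline std::string normalizeBases(const std::string& sequence, bool noMaskBases) {
    std::string result(sequence);
    for (std::size_t i = 0; i < result.size(); i++) {
        const unsigned char base = static_cast<unsigned char>(result[i]);
        if (std::islower(base)) {
            result[i] = noMaskBases ? static_cast<char>(std::toupper(base)) : 'N';
        }
    }
    // The conversion never changes the sequence length.
    assert(result.size() == sequence.size());
    return result;
}
```

For example, the read "ACgtNa" becomes "ACNNNN" with the default setting and "ACGTNA" when masking is disabled.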
const std::string ESTAnalyzer::outputFileName [protected] |
The file to which results must be written.
This member object is used to hold the file name to which all the analysis results are to be written. This member is set in the constructor and, being constant, is never changed during the lifetime of this object.
Definition at line 515 of file ESTAnalyzer.h.
Referenced by FWAnalyzer::analyze().
bool ESTAnalyzer::readAhead = false [static, protected] |
Flag to indicate if a read ahead thread must be used.
This boolean value is set to false by default. However, the value is changed by the parseArguments method depending on whether the user wishes to use the read-ahead feature.
Definition at line 453 of file ESTAnalyzer.h.
int ESTAnalyzer::refESTidx [protected] |
The index of the reference EST in a given file.
This member object is used to hold the index of a reference EST in a given file. The index values begin from 0 (zero). This member is initialized in the constructor and is changed by the setReferenceEST() method.
Definition at line 477 of file ESTAnalyzer.h.
Referenced by FWAnalyzer::analyze(), FWAnalyzer::dumpHeader(), TwoPassD2::getMetric(), MatrixFileAnalyzer::getMetric(), FWAnalyzer::getMetric(), D2Zim::getMetric(), D2::getMetric(), CLU::getMetric(), TwoPassD2::setReferenceEST(), MatrixFileAnalyzer::setReferenceEST(), FWAnalyzer::setReferenceEST(), D2Zim::setReferenceEST(), D2::setReferenceEST(), and CLU::setReferenceEST().