A Minimum Spanning Tree (MST) based parallel cluster maker.
#include <MSTClusterMaker.h>
Public Types | |
enum | MessageTags { REPOPULATE_REQUEST, COMPUTE_SIMILARITY_REQUEST, SIMILARITY_LIST, SIMILARITY_COMPUTATION_DONE, COMPUTE_MAX_SIMILARITY_REQUEST, MAX_SIMILARITY_RESPONSE, ADD_EST, TRANSITIVITY_LIST, COMPUTE_TOTAL_ANALYSIS_COUNT } |
The set of tags exchanged between various processes. | |
Public Member Functions | |
virtual | ~MSTClusterMaker () |
The destructor. | |
virtual void | showArguments (std::ostream &os) |
Display valid command line arguments for this cluster maker. | |
virtual bool | parseArguments (int &argc, char **argv) |
Process command line arguments. | |
virtual int | makeClusters () |
Method to begin clustering. | |
virtual void | displayStats (std::ostream &os) |
Method to display performance statistics. | |
virtual int | addDummyCluster (const std::string name) |
Add a dummy cluster to the cluster maker. | |
virtual void | addEST (const int clusterID, const int estIdx) |
Add an EST directly to a given cluster. | |
Protected Member Functions | |
virtual int | manager () |
Helper method to perform manager tasks. | |
virtual int | worker () |
Helper method to perform worker tasks. | |
virtual int | initialize () |
A method to handle initialization tasks for the MSTClusterMaker. | |
virtual int | populateMST () |
Helper method to generate (or compute) or load MST data from file. | |
virtual int | buildAndShowClusters () |
Utility method to do the final clustering step. | |
virtual float | analyze (const int otherEST) |
Helper method to call the actual heavy-weight analysis method(s). | |
virtual void | populateCache (const int estIdx, SMList *metricList=NULL) |
Computes, sends, and receives the similarity list for a given EST. | |
void | updateProgress (const int estsAnalyzed, const int totalESTcount) |
Method to generate progress logs (if requested by user). | |
int | managerUpdateCaches (int estIdx, const bool refreshEST=true) |
Helper method in Manager process to update distributed caches. | |
void | computeNextESTidx (int &parentESTidx, int &estToAdd, float &similarity, int &alignmentData, int &directionData) const |
Helper method in Manager process to collaboratively compute the next EST to be added to the MST. | |
int | getOwnerProcess (const int estIdx) const |
Determine the owner process Rank for a given estIdx. | |
void | getOwnedESTidx (int &startIndex, int &endIndex) |
Helper method to compute the start and ending indexes of the EST that this process owns. | |
void | workerProcessRequests () |
Helper method for a worker process. | |
void | sendToWorkers (int data, const int tag) const |
Distribute data and tag to all the workers. | |
bool | hasValidSMEntry (const SMList &list) const |
Method to detect if a given SMList has at least one valid entry. | |
void | estAdded (const int estIdx, std::vector< int > &repopulateList) |
Helper method to distribute index of newly added EST to all workers and gather cache repopulation requests. | |
void | addMoreChildESTs (const int parentESTidx, int &estToAdd, float &metric, int &alignmentData, int &directionData, int &pendingESTs) |
Helper method in Manager process to add as many child nodes as possible for the given parent. | |
MSTClusterMaker (ESTAnalyzer *analyzer, const int refESTidx, const std::string &outputFile) | |
The default constructor. | |
Protected Attributes | |
MSTCache * | cache |
The cache that holds similarity metrics for MST construction. | |
std::ofstream | progressFile |
File stream to log progress information. | |
Static Protected Attributes | |
static int | cacheSize = 128 |
Variable to indicate per-EST similarity cache size. | |
static bool | strictOrder = false |
Variable to indicate if strict ordering of worker Ranks must be followed. | |
static bool | dontCluster = false |
Command line option to avoid the clustering phase. | |
static bool | prettyPrint = false |
Command line option to print a pretty cluster tree. | |
static bool | guiPrint = false |
Command line option to print the cluster tree for PEACE GUI. | |
static char * | inputMSTFile = NULL |
Variable to indicate if MST information must be simply read from a given file. | |
static char * | outputMSTFile = NULL |
Variable to indicate if MST information must be written to a given file. | |
static bool | noCacheRepop = true |
Command line option to suppress cache repopulation. | |
static int | maxUse = -1 |
Command line option to enable maximum use of precomputed scores for building MST. | |
static float | clsThreshold = 1.0 |
Command line option to set the clustering threshold to be used in deriving clusters from the MST. | |
static char * | cacheType = DefCacheType |
Command line option to set the type of cache to be used by PEACE. | |
static char * | progFileName = NULL |
Name of file to report progress in during MST construction. | |
static arg_parser::arg_record | argsList [] |
The set of common arguments for the MST cluster maker. | |
Private Attributes | |
MST * | mst |
The Minimum Spanning Tree (MST) built by this class. | |
MSTCluster | root |
The top-level root cluster that contains all other clusters. | |
Friends | |
class | ClusterMakerFactory |
A Minimum Spanning Tree (MST) based parallel cluster maker.
This class encapsulates the core functionality needed to construct MST-based EST clusters in a parallel/distributed manner using the Message Passing Interface (MPI) library. This class includes functionality for both the Manager (MPI Rank == 0) and Worker (MPI Rank > 0) processes. The functionality needed to distinguish between, and operate as, either Manager or Worker is already built into the class. This class uses the MSTCache and MSTCluster classes to help in performing the various activities. Refer to the documentation on the various methods for a detailed description of their functionality and usage.
Definition at line 57 of file MSTClusterMaker.h.
The set of tags exchanged between various processes.
This enum provides meaningful names to the various tags (integers) exchanged between the manager and worker processes participating in the construction of an MST in a parallel/distributed manner.
REPOPULATE_REQUEST | |
COMPUTE_SIMILARITY_REQUEST | |
SIMILARITY_LIST | |
SIMILARITY_COMPUTATION_DONE | |
COMPUTE_MAX_SIMILARITY_REQUEST | |
MAX_SIMILARITY_RESPONSE | |
ADD_EST | |
TRANSITIVITY_LIST | |
COMPUTE_TOTAL_ANALYSIS_COUNT |
Definition at line 67 of file MSTClusterMaker.h.
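As a rough illustration of how such a tag set can be declared and logged, here is a minimal sketch. The tag names come from the list above, but the numeric values are an assumption (the values assigned in MSTClusterMaker.h are not shown on this page), and tagName() is a hypothetical helper that is not part of PEACE.

```cpp
#include <cassert>
#include <string>

// Tag names are taken from the documented enum; the numeric values are an
// ASSUMPTION made for this sketch (the real header may use other integers).
enum MessageTags {
    REPOPULATE_REQUEST = 1,
    COMPUTE_SIMILARITY_REQUEST,
    SIMILARITY_LIST,
    SIMILARITY_COMPUTATION_DONE,
    COMPUTE_MAX_SIMILARITY_REQUEST,
    MAX_SIMILARITY_RESPONSE,
    ADD_EST,
    TRANSITIVITY_LIST,
    COMPUTE_TOTAL_ANALYSIS_COUNT
};

// Hypothetical helper (not part of PEACE): map a tag to a printable name,
// which is handy when tracing messages between manager and workers.
inline std::string tagName(const int tag) {
    switch (tag) {
    case REPOPULATE_REQUEST:             return "REPOPULATE_REQUEST";
    case COMPUTE_SIMILARITY_REQUEST:     return "COMPUTE_SIMILARITY_REQUEST";
    case SIMILARITY_LIST:                return "SIMILARITY_LIST";
    case SIMILARITY_COMPUTATION_DONE:    return "SIMILARITY_COMPUTATION_DONE";
    case COMPUTE_MAX_SIMILARITY_REQUEST: return "COMPUTE_MAX_SIMILARITY_REQUEST";
    case MAX_SIMILARITY_RESPONSE:        return "MAX_SIMILARITY_RESPONSE";
    case ADD_EST:                        return "ADD_EST";
    case TRANSITIVITY_LIST:              return "TRANSITIVITY_LIST";
    case COMPUTE_TOTAL_ANALYSIS_COUNT:   return "COMPUTE_TOTAL_ANALYSIS_COUNT";
    default:                             return "UNKNOWN";
    }
}
```

In an MPI program the enum value would typically be passed as the tag argument of a send or receive call, so that the receiver can tell the message kinds apart.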
MSTClusterMaker::~MSTClusterMaker | ( | ) | [virtual] |
The destructor.
The destructor frees up any dynamic memory allocated by this object for its operations.
Definition at line 107 of file MSTClusterMaker.cpp.
References mst.
MSTClusterMaker::MSTClusterMaker | ( | ESTAnalyzer * | analyzer, | |
const int | refESTidx, | |||
const std::string & | outputFile | |||
) | [protected] |
The default constructor.
The default constructor for this class. The constructor is made protected so that this class cannot be directly instantiated. However, since the ClusterMakerFactory is a friend of this class, an object can be instantiated via the ClusterMakerFactory::create() method.
[in,out] | analyzer | The EST analyzer to be used for obtaining similarity metrics between two ESTs. This parameter is simply passed onto the base class. |
[in] | refESTidx | The reference EST index value to be used to root the spanning tree created by this method. This parameter should be >= 0. This value is simply passed onto the base class. |
[in] | outputFile | The name of the output file to which the raw MST cluster information is to be written. If this parameter is the empty string then output is written to standard output. This value is simply passed onto the base class. |
Definition at line 100 of file MSTClusterMaker.cpp.
int MSTClusterMaker::addDummyCluster | ( | const std::string | name | ) | [virtual] |
Add a dummy cluster to the cluster maker.
This method can be used to add a dummy cluster to the cluster maker. The dummy clusters are added as direct descendants of the root cluster with the given name.
[in] | name | A human readable name to be set for this cluster. No special checks are made on the contents of the string. |
Implements ClusterMaker.
Definition at line 696 of file MSTClusterMaker.cpp.
References MSTCluster::add(), MSTCluster::getClusterID(), and root.
void MSTClusterMaker::addEST | ( | const int | clusterID, | |
const int | estIdx | |||
) | [virtual] |
Add an EST directly to a given cluster.
This method can be used to add an EST directly to a cluster. This bypasses any traditional mechanism and directly adds the EST to the specified cluster.
[in] | clusterID | The unique ID of the cluster to which the EST is to be added. This value must have been obtained from an earlier (successful) call to the ClusterMaker::addDummyCluster method. |
[in] | estIdx | The EST to be added to the given cluster. Once the EST has been added to this cluster it will not be included in the clustering process performed by this cluster maker. |
Implements ClusterMaker.
Definition at line 704 of file MSTClusterMaker.cpp.
References MSTCluster::add(), ClusterMaker::analyzer, EST::getEST(), ESTAnalyzer::getInvalidMetric(), root, and EST::setProcessed().
void MSTClusterMaker::addMoreChildESTs | ( | const int | parentESTidx, | |
int & | estToAdd, | |||
float & | metric, | |||
int & | alignmentData, | |||
int & | directionData, | |||
int & | pendingESTs | |||
) | [protected] |
Helper method in Manager process to add as many child nodes as possible for the given parent.
This is a helper method that is used only in the Manager process, and only when the maxUse parameter is != -1. This method tries to add more children rooted at the given parent to the MST as long as the metric is better than the maxUse value. This method operates as follows:
1. First, this method sends a request to compute the best local choice to each of the worker processes.
2. Next, it computes its own local (at the Manager's end) best choice for the next EST node to be added.
3. It then collects the response for the best local choice from each worker process and tracks the best reported value.
4. If the next best entry is still rooted at this parent and the metric is better than maxUse, then the EST is added to the MST and the process is repeated from step 1. Otherwise, the parameters are updated to the last added EST and the method returns.
[in] | parentESTidx | The source EST index from where the similarity metric is being measured. The parentESTidx is already present in the MST. |
[in,out] | estToAdd | The EST that has just been added to the MST. This method updates this value if additional ESTs are added to the MST by this method. |
[in,out] | metric | The similarity/distance metric between the parentESTidx and the estToAdd. This method updates this value if additional ESTs are added to the MST by this method. |
[in,out] | alignmentData | The alignment information between the two ESTs represented by their index values in parentESTidx and estToAdd. This method updates this value if additional ESTs are added to the MST by this method. |
[in,out] | directionData | The direction information between the two ESTs represented by their index values in parentESTidx and estToAdd. This method updates this value if additional ESTs are added to the MST by this method. |
[in,out] | pendingESTs | The number of pending ESTs that have not yet been added to the MST. This value is used and updated by this method each time it adds an EST. |
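The control flow described for this method can be sketched without MPI as a single-process loop. This is only an illustration under stated assumptions: the collaborative "next best" computation is replaced by a precomputed queue, a larger metric is assumed to be better (the actual class delegates the comparison to ESTAnalyzer::compareMetrics()), and Candidate and addMoreChildren() are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical record for one "next best" choice (not a PEACE type).
struct Candidate {
    int parent;    // EST already in the MST
    int est;       // EST proposed for addition
    float metric;  // similarity between parent and est
};

// Sketch of the greedy loop: keep adding children while the next best
// candidate is still rooted at parentESTidx and its metric beats maxUse.
// ASSUMPTIONS: larger metric is better; the front of nextBest stands in
// for the result of the manager/worker "best choice" exchange.
// Returns the number of ESTs added.
int addMoreChildren(const int parentESTidx, const float maxUse,
                    std::deque<Candidate>& nextBest,
                    std::vector<Candidate>& mst,
                    int& estToAdd, float& metric, int& pendingESTs) {
    int added = 0;
    while (pendingESTs > 0 && !nextBest.empty()) {
        const Candidate c = nextBest.front();
        if (c.parent != parentESTidx || !(c.metric > maxUse)) {
            break;  // different parent or metric not good enough: stop
        }
        nextBest.pop_front();
        mst.push_back(c);   // add the EST to the MST
        estToAdd = c.est;   // report the last EST that was added
        metric = c.metric;
        --pendingESTs;
        ++added;
    }
    return added;
}
```

The loop stops at the first candidate that belongs to another parent, even if later candidates would qualify, which mirrors the "still rooted at this parent" condition in the description.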
Definition at line 247 of file MSTClusterMaker.cpp.
References MST::addNode(), ClusterMaker::analyzer, ASSERT, ESTAnalyzer::compareMetrics(), computeNextESTidx(), managerUpdateCaches(), maxUse, and mst.
Referenced by manager().
float MSTClusterMaker::analyze | ( | const int | otherEST | ) | [protected, virtual] |
Helper method to call the actual heavy-weight analysis method(s).
This is a helper method that is invoked from the populateCache() method to obtain the relationship metric (either via CLU or d2) between the current parent EST and the given otherEST. This method was introduced to enable child classes (such as TransMSTClusterMaker) to conveniently intercept analyzer calls and potentially short-circuit them using concepts of conditional-transitivity.
[in] | otherEST | The index of the other EST to which the metric is required. |
Reimplemented in TransMSTClusterMaker.
Definition at line 468 of file MSTClusterMaker.cpp.
References ESTAnalyzer::analyze(), and ClusterMaker::analyzer.
Referenced by populateCache().
int MSTClusterMaker::buildAndShowClusters | ( | ) | [protected, virtual] |
Utility method to do the final clustering step.
This is a refactored (primarily to keep the code clutter to a minimum) utility method that is used to perform the final step in clustering. This method essentially calls the MSTCluster::makeClusters method that builds the clusters using the MST. Once the clusters are built, this method dumps the cluster information to the user-specified (via command line arguments) output stream.
Definition at line 713 of file MSTClusterMaker.cpp.
References ClusterMaker::analyzer, clsThreshold, ESTAnalyzer::getInputFileName(), MST::getNodes(), guiPrint, MSTCluster::guiPrintClusterTree(), MSTCluster::makeClusters(), mst, NO_ERROR, ClusterMaker::outputFileName, prettyPrint, MSTCluster::printClusterTree(), and root.
Referenced by makeClusters().
void MSTClusterMaker::computeNextESTidx | ( | int & | parentESTidx, | |
int & | estToAdd, | |||
float & | similarity, | |||
int & | alignmentData, | |||
int & | directionData | |||
) | const [protected] |
Helper method in Manager process to collaboratively compute the next EST to be added to the MST.
This is a helper method that is used only in the Manager process to perform the following tasks:
First, this method sends request to compute the best local choice to each of the worker processes.
Next it computes its own local (at the Manager's end) best choice for the next EST node to be added.
It then collects response for best local choice from each worker process and tracks the best reported value.
[out] | parentESTidx | The source EST index from where the similarity metric is being measured. The parentESTidx is already present in the MST. |
[out] | estToAdd | The destination EST index that is the best choice to be added to the MST (based on the local information). |
[out] | similarity | The similarity metric between the parentESTidx and the estToAdd. |
[out] | alignmentData | The alignment information between the two ESTs represented by their index values in parentESTidx and estToAdd. |
[out] | directionData | The direction information between the two ESTs represented by their index values in parentESTidx and estToAdd. |
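The selection step (manager's own best choice plus one response per worker, keeping the best reported value) can be sketched as a simple reduction. This is a minimal single-process sketch: the responses that would arrive as MAX_SIMILARITY_RESPONSE messages are modeled as a vector, a larger similarity is assumed to be better, and BestEntry and pickGlobalBest() are hypothetical names, as is the use of -1 to mark "no candidate".

```cpp
#include <cassert>
#include <vector>

// Hypothetical record for one process's best local choice.
struct BestEntry {
    int parentESTidx;  // EST already in the MST
    int estToAdd;      // proposed EST, or -1 if the process has no candidate
    float similarity;  // metric between the two ESTs
};

// Pick the best entry among the manager's own choice (index 0) and the
// responses from the workers (remaining indexes).
// ASSUMPTION: larger similarity is better; ties keep the earliest entry.
BestEntry pickGlobalBest(const std::vector<BestEntry>& localBests) {
    BestEntry best = {-1, -1, 0.0f};
    for (const BestEntry& e : localBests) {
        if (e.estToAdd == -1) {
            continue;  // this process had nothing to offer
        }
        if (best.estToAdd == -1 || e.similarity > best.similarity) {
            best = e;
        }
    }
    return best;
}
```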
Definition at line 210 of file MSTClusterMaker.cpp.
References ClusterMaker::analyzer, cache, ESTAnalyzer::compareMetrics(), COMPUTE_MAX_SIMILARITY_REQUEST, MSTCache::getBestEntry(), MAX_SIMILARITY_RESPONSE, MPI_CODE, MPI_GET_SIZE, MPI_RECV, MPI_TYPE_INT, sendToWorkers(), strictOrder, and TRACK_IDLE_TIME.
Referenced by addMoreChildESTs(), and manager().
void MSTClusterMaker::displayStats | ( | std::ostream & | os | ) | [virtual] |
Method to display performance statistics.
This method overrides the empty implementation in the base class to display statistics on cache usage and MPI calls for tracking and reporting the performance and behavior of this class.
[out] | os | The output stream to which the statistics must be written. |
Reimplemented in TransMSTClusterMaker.
Definition at line 642 of file MSTClusterMaker.cpp.
References cache, MSTCache::displayStats(), and MPI_GET_RANK.
Referenced by makeClusters().
void MSTClusterMaker::estAdded | ( | const int | estIdx, | |
std::vector< int > & | repopulateList | |||
) | [protected] |
Helper method to distribute index of newly added EST to all workers and gather cache repopulation requests.
This is a helper method that was added to streamline the code in managerUpdateCaches method. This method performs the following tasks:
First it uses the sendToWorkers() method to distribute the estIdx (parameter) value to all the workers.
Next it prunes the local caches on the manager.
It then obtains repopulation requests from each worker and places EST indexes to be repopulated in the repopulateList parameter.
[in] | estIdx | The index of the newly added EST that must be distributed to all the workers. |
[out] | repopulateList | A vector that will contain the list of ESTs that need to be repopulated (based on requests received from various workers). |
Definition at line 139 of file MSTClusterMaker.cpp.
References ADD_EST, cache, MPI_CODE, MPI_GET_SIZE, MPI_PROBE, MPI_RECV, MPI_STATUS, MPI_TYPE_INT, MSTCache::pruneCaches(), REPOPULATE_REQUEST, sendToWorkers(), and strictOrder.
Referenced by managerUpdateCaches(), and worker().
void MSTClusterMaker::getOwnedESTidx | ( | int & | startIndex, | |
int & | endIndex | |||
) | [protected] |
Helper method to compute the start and ending indexes of the EST that this process owns.
This method was introduced to keep the math and logic clutter involved in computing the list of owned ESTs out of the methods that use the information. This method returns the range, such that: startIndex <= ownedESTidx < endIndex.
[out] | startIndex | The starting (zero-based) index value of the contiguous range of ESTs that this process owns. |
[out] | endIndex | The ending (zero-based) index value of the contiguous range of ESTs that this process owns. The value returned in this parameter is not included in the range of values. |
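A half-open range of this kind can be computed with a plain block partition. The sketch below is an assumption about the arithmetic, not a copy of the implementation: it gives each rank estCount / processCount ESTs and lets the last rank absorb the remainder. The function name ownedRange() and its explicit rank and process-count parameters are hypothetical (the real method obtains these from MPI).

```cpp
#include <cassert>

// Compute the half-open range [startIndex, endIndex) of ESTs owned by rank.
// ASSUMPTION: simple block partition; the last rank takes any remainder.
void ownedRange(const int estCount, const int rank, const int processCount,
                int& startIndex, int& endIndex) {
    assert(processCount > 0 && rank >= 0 && rank < processCount);
    const int perProcess = estCount / processCount;
    startIndex = rank * perProcess;
    endIndex = (rank == processCount - 1) ? estCount
                                          : startIndex + perProcess;
}
```

For example, with 10 ESTs and 3 processes this sketch yields the ranges [0, 3), [3, 6), and [6, 10).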
Definition at line 614 of file MSTClusterMaker.cpp.
References EST::getESTList(), MPI_GET_RANK, and MPI_GET_SIZE.
Referenced by populateCache(), populateMST(), TransMSTClusterMaker::processMetricList(), and TransMSTClusterMaker::~TransMSTClusterMaker().
int MSTClusterMaker::getOwnerProcess | ( | const int | estIdx | ) | const [protected] |
Determine the owner process Rank for a given estIdx.
This method is a convenience method to determine the Rank of the process that logically owns a given EST. The owning process is responsible for maintaining the cache for a given EST. The owners are assigned in a simple fashion and ESTs are evenly divided up amongst all the processes.
[in] | estIdx | The index of the EST whose owner process's rank is requested. It is assumed that the estIdx is valid. If invalid EST index values are supplied then the operation of this method is undefined. |
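One way to map an EST index to its owning rank, assuming ESTs are split into contiguous, equally sized blocks with the last rank taking any remainder, is sketched below. The partitioning rule, the function name ownerRank(), and its explicit estCount and processCount parameters are assumptions made for illustration; the page only states that ESTs are evenly divided among the processes.

```cpp
#include <algorithm>
#include <cassert>

// Return the rank that owns estIdx.
// ASSUMPTION: block partition with estCount / processCount ESTs per rank;
// indexes beyond the last full block belong to the last rank.
int ownerRank(const int estIdx, const int estCount, const int processCount) {
    assert(processCount > 0 && estIdx >= 0 && estIdx < estCount);
    const int perProcess = estCount / processCount;
    if (perProcess == 0) {
        return processCount - 1;  // fewer ESTs than processes
    }
    return std::min(estIdx / perProcess, processCount - 1);
}
```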
Definition at line 451 of file MSTClusterMaker.cpp.
References EST::getESTList(), and MPI_GET_SIZE.
Referenced by managerUpdateCaches(), TransMSTClusterMaker::populateCache(), and populateCache().
bool MSTClusterMaker::hasValidSMEntry | ( | const SMList & | list | ) | const [protected] |
Method to detect if a given SMList has at least one valid entry.
This method is used (in the populateCache() method) to determine if a given SMList has at least one valid entry. This method is particularly useful when an empty SMList is received from a remote process, in which case there is one entry in the SMList: (-1, -1).
[in] | list | The list to check if it has a valid entry. |
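The check can be pictured as follows. This is a minimal sketch: the layout of SMList is not shown on this page, so the entry type SMEntry, the alias SMListSketch, and the function hasValidEntry() are hypothetical stand-ins, and an entry is treated as a placeholder when its EST index is -1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one similarity-metric entry.
struct SMEntry {
    int estIdx;    // -1 marks the placeholder sent for an empty list
    float metric;
};
typedef std::vector<SMEntry> SMListSketch;

// True if the list holds at least one entry that is not the (-1, -1)
// placeholder. An empty list is also reported as having no valid entry.
bool hasValidEntry(const SMListSketch& list) {
    for (const SMEntry& e : list) {
        if (e.estIdx != -1) {
            return true;
        }
    }
    return false;
}
```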
Definition at line 789 of file MSTClusterMaker.cpp.
Referenced by populateCache().
int MSTClusterMaker::initialize | ( | ) | [protected, virtual] |
A method to handle initialization tasks for the MSTClusterMaker.
This method is called after the ESTs have been loaded into the ESTAnalyzer, in the makeClusters method.
This method does nothing as the MSTClusterMaker does not need to do any initialization. It is provided as a convenience method for inheriting subclasses to use for initialization.
Implements ClusterMaker.
Reimplemented in TransMSTClusterMaker.
Definition at line 473 of file MSTClusterMaker.cpp.
References ClusterMaker::analyzer, ESTAnalyzer::initialize(), and NO_ERROR.
int MSTClusterMaker::makeClusters | ( | ) | [virtual] |
Method to begin clustering.
This method must be used to create clusters based on a given EST analysis method. This method performs the following tasks:
Implements ClusterMaker.
Definition at line 744 of file MSTClusterMaker.cpp.
References ClusterMaker::analyzer, ASSERT, buildAndShowClusters(), cache, clsThreshold, displayStats(), dontCluster, ESTAnalyzer::getInputFileName(), inputMSTFile, mst, NO_ERROR, outputMSTFile, populateMST(), and MST::serialize().
int MSTClusterMaker::manager | ( | ) | [protected, virtual] |
Helper method to perform manager tasks.
This method has been introduced to streamline the operations of the MSTClusterMaker when it operates as the manager. The MPI process with Rank 0 (zero) acts as the manager and coordinates all the activities of the MSTClusterMaker. This method is invoked from the makeClusters() method.
Definition at line 310 of file MSTClusterMaker.cpp.
References ADD_EST, addMoreChildESTs(), MST::addNode(), ClusterMaker::analyzer, ASSERT, computeNextESTidx(), ESTAnalyzer::getAlignmentData(), EST::getEST(), EST::getESTCount(), EST::getESTList(), EST::getProcessedESTCount(), ESTAnalyzer::getValidMetric(), EST::hasBeenProcessed(), managerUpdateCaches(), maxUse, mst, NO_ERROR, ClusterMaker::refESTidx, sendToWorkers(), and updateProgress().
Referenced by populateMST().
int MSTClusterMaker::managerUpdateCaches | ( | int | estIdx, | |
const bool | refreshEST = true | |||
) | [protected] |
Helper method in Manager process to update distributed caches.
This is a helper method that is used only in the Manager process to perform the following tasks using the newly added estIdx value:
First, this method broadcasts the newly added EST index (estIdx) to all the workers.
Next it prunes its local cache via the MSTCache::pruneCaches() method.
It then collects requests to repopulate specific caches from all the workers.
It then adds the newly added EST to the list of caches to be repopulated, broadcasts a request to repopulate caches to each worker, and participates in the cache repopulation task by calling the populateCache() method.
[in] | estIdx | The index of the newly added EST. |
[in] | refreshEST | If this flag is true (the default value), then the neighbors for the newly added EST (specified by estIdx) are computed and the caches are updated. |
Definition at line 174 of file MSTClusterMaker.cpp.
References COMPUTE_SIMILARITY_REQUEST, estAdded(), getOwnerProcess(), MANAGER_RANK, MPI_CODE, MPI_RECV, MPI_TYPE_INT, populateCache(), sendToWorkers(), SIMILARITY_COMPUTATION_DONE, and TRACK_IDLE_TIME.
Referenced by addMoreChildESTs(), and manager().
bool MSTClusterMaker::parseArguments | ( | int & | argc, | |
char ** | argv | |||
) | [virtual] |
Process command line arguments.
This method is used to process command line arguments specific to this cluster maker. This method is typically used from the main method just after the cluster maker has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.
[in,out] | argc | The number of command line arguments to be processed. |
[in,out] | argv | The array of command line arguments. |
Returns: true if the command line arguments were successfully processed. Otherwise this method returns false.
Reimplemented from ClusterMaker.
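The idea of "consuming" recognized arguments, so that argc and argv are left holding only what this cluster maker did not understand, can be sketched as below. This is not the arg_parser implementation: Options and consumeArguments() are hypothetical, only two of the documented options are handled, and the exact parsing rules (such as how a missing value is reported) are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for two of the documented settings.
struct Options {
    int cacheSize = 128;       // documented default for --cache
    bool dontCluster = false;  // documented default for --dont-cluster
};

// Remove recognized options from argv (compacting it in place) and update
// argc to the number of arguments left over. Returns false if --cache is
// missing its value. ASSUMPTION: this behavior only mimics the description.
bool consumeArguments(int& argc, char** argv, Options& opts) {
    int kept = 0;
    for (int i = 0; i < argc; i++) {
        if (std::strcmp(argv[i], "--cache") == 0) {
            if (i + 1 >= argc) {
                return false;  // value for --cache not supplied
            }
            opts.cacheSize = std::atoi(argv[++i]);
        } else if (std::strcmp(argv[i], "--dont-cluster") == 0) {
            opts.dontCluster = true;
        } else {
            argv[kept++] = argv[i];  // not ours: keep for other parsers
        }
    }
    argc = kept;
    return true;
}
```

Leaving unrecognized arguments in place lets a base class or another component parse them afterwards.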
Definition at line 123 of file MSTClusterMaker.cpp.
References cacheSize, arg_parser::check_args(), and strictOrder.
void MSTClusterMaker::populateCache | ( | const int | estIdx, | |
SMList * | metricList = NULL | |||
) | [protected, virtual] |
Computes, sends, and receives the similarity list for a given EST.
This method is a shared method that is used by both the manager and workers. This method is used to compute the similarity metric and cache the highest set of similarity metrics. This method operates as follows:
Each process computes a subset of the EST similarity metric in the range k*Rank <= otherEstIdx < k*(Rank+1), where k=estList.size() / MPI::COMM_WORLD.Get_size(), and Rank is the MPI rank of this process.
If this process is the cache owner for the est, (that is, estIdx % Rank == 0), then it receives data from other processes and merges the information with its own list, retaining the top-most similarity metrics.
[in] | estIdx | The index of the EST that was just added to the MST and for which the adjacent neighbors need to be determined. |
[out] | metricList | If this pointer is not NULL, then this vector is populated with the set of metrics that were computed for estIdx only on the owner process. This list contains the metrics collated from all the processes participating in the distributed computing process. Currently, this feature is used by TransMSTClusterMaker to obtain the list of metrics computed. |
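The merge performed on the owner process (combine the partial lists from all processes and retain only the top-most metrics) can be sketched as follows. This is an illustration under assumptions: a larger similarity is treated as better, placeholder entries with EST index -1 are skipped, and Metric and mergeTopK() are hypothetical names rather than the MSTCache::mergeList() interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical similarity entry (not the PEACE type).
struct Metric {
    int estIdx;
    float similarity;
};

// Merge the partial lists computed by each process and keep only the
// cacheSize best entries. ASSUMPTION: larger similarity is better.
std::vector<Metric> mergeTopK(
        const std::vector<std::vector<Metric>>& partialLists,
        const std::size_t cacheSize) {
    std::vector<Metric> merged;
    for (const std::vector<Metric>& part : partialLists) {
        for (const Metric& m : part) {
            if (m.estIdx != -1) {  // skip "empty list" placeholders
                merged.push_back(m);
            }
        }
    }
    // Best entries first; stable so equal metrics keep arrival order.
    std::stable_sort(merged.begin(), merged.end(),
                     [](const Metric& a, const Metric& b) {
                         return a.similarity > b.similarity;
                     });
    if (merged.size() > cacheSize) {
        merged.resize(cacheSize);
    }
    return merged;
}
```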
Reimplemented in TransMSTClusterMaker.
Definition at line 486 of file MSTClusterMaker.cpp.
References analyze(), ClusterMaker::analyzer, ASSERT, cache, ESTAnalyzer::compareMetrics(), ESTAnalyzer::getAlignmentData(), EST::getEST(), HeuristicChain::getHeuristicChain(), HeuristicChain::getHint(), ESTAnalyzer::getInvalidMetric(), getOwnedESTidx(), getOwnerProcess(), EST::hasBeenProcessed(), hasValidSMEntry(), MSTCache::isESTinMST(), MANAGER_RANK, MSTCache::mergeList(), MPI_CODE, MPI_GET_RANK, MPI_GET_SIZE, MPI_PROBE, MPI_RECV, MPI_SEND, MPI_STATUS, MPI_TYPE_CHAR, MPI_TYPE_INT, MSTCache::preprocess(), ESTAnalyzer::setReferenceEST(), SIMILARITY_COMPUTATION_DONE, SIMILARITY_LIST, and strictOrder.
Referenced by managerUpdateCaches(), and worker().
int MSTClusterMaker::populateMST | ( | ) | [protected, virtual] |
Helper method to generate (or compute) or load MST data from file.
This is a helper method that is invoked from the makeClusters() method. This method performs one of the following tasks:
If an inputMSTFile has not been specified, then this method builds an MST (either on one process or many MPI processes). This method first creates a local cache that contains information to build the MST. It then builds the MST by calling the manager() or worker() method, depending on the MPI rank of this process.
If an inputMSTFile has indeed been specified as a command line parameter, then this method loads the MST from the specified MST file.
Definition at line 650 of file MSTClusterMaker.cpp.
References ClusterMaker::analyzer, ASSERT, cache, cacheSize, cacheType, MST::deSerialize(), EST::getESTList(), getOwnedESTidx(), inputMSTFile, manager(), MANAGER_RANK, MPI_GET_RANK, mst, NO_ERROR, noCacheRepop, and worker().
Referenced by makeClusters().
void MSTClusterMaker::sendToWorkers | ( | int | data, | |
const int | tag | |||
) | const [protected] |
Distribute data and tag to all the workers.
This method provides a convenient mechanism to broadcast a given integer data and tag to all the workers.
[in] | data | The integer to be sent to each and every worker. |
[in] | tag | The message tag to be sent to each and every process. |
Definition at line 781 of file MSTClusterMaker.cpp.
References MPI_GET_SIZE, MPI_SEND, and MPI_TYPE_INT.
Referenced by computeNextESTidx(), estAdded(), manager(), and managerUpdateCaches().
void MSTClusterMaker::showArguments | ( | std::ostream & | os | ) | [virtual] |
Display valid command line arguments for this cluster maker.
This method must be used to display all valid command line options that are supported by this cluster maker (and its base classes).
[out] | os | The output stream to which the valid command line arguments must be written. |
Reimplemented from ClusterMaker.
Definition at line 114 of file MSTClusterMaker.cpp.
References ClusterMaker::name.
void MSTClusterMaker::updateProgress | ( | const int | estsAnalyzed, | |
const int | totalESTcount | |||
) | [protected] |
Method to generate progress logs (if requested by user).
This method is a helper method that is called from the core manager() method loop to generate progress logs as ESTs are analyzed and updated. This method writes logs only if the progFileName command line argument was specified and the progress file could be created.
[in] | estsAnalyzed | The number of ESTs analyzed thus far. |
[in] | totalESTcount | The total number of ESTs to be analyzed. |
Definition at line 291 of file MSTClusterMaker.cpp.
References progFileName, and progressFile.
Referenced by manager().
int MSTClusterMaker::worker | ( | ) | [protected, virtual] |
Helper method to perform worker tasks.
This method has been introduced to streamline the operations of the MSTClusterMaker when it operates as a worker. All the MPI processes with non-zero rank act as workers and collaborate with the manager to assist in various activities of the MSTClusterMaker. This method is invoked from the makeClusters() method.
Definition at line 387 of file MSTClusterMaker.cpp.
References ADD_EST, cache, COMPUTE_MAX_SIMILARITY_REQUEST, COMPUTE_SIMILARITY_REQUEST, estAdded(), MSTCache::getBestEntry(), MANAGER_RANK, MAX_SIMILARITY_RESPONSE, MPI_CODE, MPI_PROBE, MPI_RECV, MPI_SEND, MPI_STATUS, MPI_TYPE_INT, NO_ERROR, populateCache(), MSTCache::pruneCaches(), and REPOPULATE_REQUEST.
Referenced by populateMST().
void MSTClusterMaker::workerProcessRequests | ( | ) | [protected] |
Helper method for a worker process.
This method is invoked from the worker() method to receive and process various requests from the manager process. This method currently handles the following requests:
COMPUTE_SIMILARITY_REQUEST: Computes the subset of the similarity metric for the given EST index and returns the partial list back to the owner process.
COMPUTE_MAX_SIMILARITY_REQUEST: Computes the highest similarity value between all the ESTs on this cluster and returns the top entry back to the manager. Once this request has been processed this method returns control back.
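The request loop described here can be sketched without MPI by treating incoming messages as a queue. This is an illustrative sketch only: the inbox vector replaces probing and receiving from the manager, the tag values are assumptions, the actual similarity computations are reduced to recording the requested EST index, and SketchTag, Message, and processRequests() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tag values are ASSUMED for this sketch only.
enum SketchTag { COMPUTE_SIMILARITY = 1, COMPUTE_MAX_SIMILARITY = 2 };

// Hypothetical message: a tag plus one integer of payload.
struct Message {
    int tag;
    int data;
};

// Handle messages in order. Similarity requests are served (here: just
// recorded) and the loop continues; a max-similarity request is served
// and control is then handed back to the caller.
// Returns the index of the first message that was not processed.
std::size_t processRequests(const std::vector<Message>& inbox,
                            std::vector<int>& similarityRequests) {
    std::size_t i = 0;
    while (i < inbox.size()) {
        const Message& msg = inbox[i++];
        if (msg.tag == COMPUTE_SIMILARITY) {
            similarityRequests.push_back(msg.data);
        } else if (msg.tag == COMPUTE_MAX_SIMILARITY) {
            break;  // the best local entry would be sent back here
        }
    }
    return i;
}
```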
friend class ClusterMakerFactory [friend] |
Reimplemented in TransMSTClusterMaker.
Definition at line 58 of file MSTClusterMaker.h.
arg_parser::arg_record MSTClusterMaker::argsList [static, protected] |
{
    {"--cache", "#similarity metrics to cache per EST", &MSTClusterMaker::cacheSize, arg_parser::INTEGER},
    {"--no-cache-repop", "Suppress EST cache repopulation", &MSTClusterMaker::noCacheRepop, arg_parser::BOOLEAN},
    {"--no-order", "Disable strict order of processing messages", &MSTClusterMaker::strictOrder, arg_parser::BOOLEAN},
    {"--input-mst-file", "Read MST data from file (skip parallel MST building)", &MSTClusterMaker::inputMSTFile, arg_parser::STRING},
    {"--output-mst-file", "Output MST data to file", &MSTClusterMaker::outputMSTFile, arg_parser::STRING},
    {"--dont-cluster", "Just generate MST data. Don't do clustering", &MSTClusterMaker::dontCluster, arg_parser::BOOLEAN},
    {"--pretty-print", "Print a pretty cluster tree.", &MSTClusterMaker::prettyPrint, arg_parser::BOOLEAN},
    {"--gui-print", "Print the cluster tree for GUI processing.", &MSTClusterMaker::guiPrint, arg_parser::BOOLEAN},
    {"--maxUse", "Set a threshold to aggressively use metrics (default=0)", &MSTClusterMaker::maxUse, arg_parser::INTEGER},
    {"--clsThreshold", "Set a threshold for clustering (default=1.0)", &MSTClusterMaker::clsThreshold, arg_parser::FLOAT},
    {"--cacheType", "Set type of cache (heap or mlist) to use (default=heap)", &MSTClusterMaker::cacheType, arg_parser::STRING},
    {"--progress", "Log MST construction progress in a file (used by GUI)", &MSTClusterMaker::progFileName, arg_parser::STRING},
    {NULL, NULL, NULL, arg_parser::BOOLEAN}
}
The set of common arguments for the MST cluster maker.
This instance variable contains a static list of arguments that are common to all the MST cluster maker objects.
Definition at line 804 of file MSTClusterMaker.h.
MSTCache* MSTClusterMaker::cache [protected] |
The cache that holds similarity metrics for MST construction.
This object is used to cache the similarity metrics for all ESTs that are owned by this process (that is, estIdx % Rank == 0, where Rank is the MPI rank of this process). The cache contains similarity metrics to facilitate rapid construction of the MST. Both the manager and worker processes have their own caches and manage them independently. This spreads out the memory requirement for the caches across multiple processes enabling large (in 10s of GB) caches.
The cache is created just before the clustering process commences and is deleted immediately after the clustering process (to minimize memory footprint).
Definition at line 821 of file MSTClusterMaker.h.
Referenced by computeNextESTidx(), displayStats(), estAdded(), makeClusters(), populateCache(), populateMST(), and worker().
int MSTClusterMaker::cacheSize = 128 [static, protected] |
Variable to indicate per-EST similarity cache size.
This variable is used to indicate the number of similarity metrics that must be cached for a given EST. This value is initialized to 128. The value is changed by the parseArguments() method if the user has specified an option to override the default.
Definition at line 187 of file MSTClusterMaker.h.
Referenced by parseArguments(), and populateMST().
char * MSTClusterMaker::cacheType = DefCacheType [static, protected] |
Command line option to set the type of cache to be used by PEACE.
This member variable is used to indicate the type of cache that must be used to store metrics to facilitate rapid construction of the MST. The default cache is the MSTHashCache (indicated by cacheType set to "hash"). The alternative cache is the MSTMultiListCache (indicated by a cacheType value of "mlist"). The user may override the default using the command line parameter --cacheType.
Definition at line 341 of file MSTClusterMaker.h.
Referenced by populateMST().
float MSTClusterMaker::clsThreshold = 1.0 [static, protected] |
Command line option to set the clustering threshold to be used in deriving clusters from the MST.
This member variable is used to set the clustering threshold. There are two "special" options here:
1.0 -- Corresponds to the TwoPassD2 analyzer which uses different window lengths, each with different thresholds. If TwoPassD2 is used then clsThreshold must be set to 1.0.
-1 -- Corresponds to the mean/variance-based threshold. Intended to be used with the CLU analyzer.
Definition at line 328 of file MSTClusterMaker.h.
Referenced by buildAndShowClusters(), and makeClusters().
bool MSTClusterMaker::dontCluster = false [static, protected] |
Command line option to avoid the clustering phase.
If this member variable is true, then this class only generates MST information and does not do clustering. By default this variable is initialized to false. However, the value can be changed by the user through command line arguments. The change of value occurs in the parseArguments() method if the user has specified an option to override the default.
Definition at line 236 of file MSTClusterMaker.h.
Referenced by makeClusters().
bool MSTClusterMaker::guiPrint = false [static, protected] |
Command line option to print the cluster tree for PEACE GUI.
If this member variable is true, then this class prints the ClusterTree in a format that the GUI can process. By default this variable is initialized to false. However, the value can be changed by the user through the command line argument (--gui-print). The change of value occurs in the parseArguments() method if the user has specified an option to override the default.
Definition at line 260 of file MSTClusterMaker.h.
Referenced by buildAndShowClusters().
char * MSTClusterMaker::inputMSTFile = NULL [static, protected] |
Variable to indicate if MST information must be simply read from a given file.
This member variable is used to hold the name of the file (with full path) from where MST information must be read. This instance variable is initialized to NULL. However, if the input MST file is specified then MST building is skipped and MST data read from the file is used for further processing.
Definition at line 271 of file MSTClusterMaker.h.
Referenced by makeClusters(), and populateMST().
int MSTClusterMaker::maxUse = -1 [static, protected] |
Command line option to enable maximum use of precomputed scores for building MST.
If this member variable is set to a value other than -1, then the MSTClusterMaker will try to use all the ESTs that have a metric better than the value specified for maxUse. Maximally using good metrics ultimately reduces the total number of analyses that need to be performed, thereby reducing the overall time for clustering.
Definition at line 313 of file MSTClusterMaker.h.
Referenced by addMoreChildESTs(), and manager().
MST* MSTClusterMaker::mst [private] |
The Minimum Spanning Tree (MST) built by this class.
This instance variable holds a pointer to the MST created by this class when it operates as a manager process. This pointer is initialized to NULL and an MST is created in the manager() method.
Definition at line 840 of file MSTClusterMaker.h.
Referenced by addMoreChildESTs(), buildAndShowClusters(), makeClusters(), manager(), populateMST(), and ~MSTClusterMaker().
bool MSTClusterMaker::noCacheRepop = true [static, protected] |
Command line option to suppress cache repopulation.
If this member variable is true, then this class does not repopulate caches once an EST cache becomes empty. By default this variable is initialized to false. However, the value can be changed by the user through the command line argument (--no-cache-repop). The change of value occurs in the parseArguments() method if the user has specified an option to override the default.
If this parameter is not specified then the MSTCache will request lists to be repopulated when needed. Repopulating lists guarantees that an MST is ultimately produced. If repopulation is suppressed via this parameter then the resulting spanning tree may not be an MST; however, computation time decreases.
Definition at line 301 of file MSTClusterMaker.h.
Referenced by populateMST().
char * MSTClusterMaker::outputMSTFile = NULL [static, protected] |
Variable to indicate if MST information must be written to a given file.
This member variable is used to hold the name of the file (with full path) to which MST information must be written. This instance variable is initialized to NULL. However, if the output MST file is specified then MST data built by this program is written to the specified file.
Definition at line 282 of file MSTClusterMaker.h.
Referenced by makeClusters().
bool MSTClusterMaker::prettyPrint = false [static, protected] |
Command line option to print a pretty cluster tree.
If this member variable is true, then this class prints a pretty ASCII tree with the cluster information. By default this variable is initialized to false. However, the value can be changed by the user through the command line argument (--pretty-print). The change of value occurs in the parseArguments() method if the user has specified an option to override the default.
Definition at line 248 of file MSTClusterMaker.h.
Referenced by buildAndShowClusters().
char * MSTClusterMaker::progFileName = NULL [static, protected] |
Name of file to report progress in during MST construction.
This command line argument provides the name of the log file where progress information is to be written. The progress information is in the form: #estsProcessed, #ests. This value is specified via a command line argument.
Definition at line 350 of file MSTClusterMaker.h.
Referenced by updateProgress().
std::ofstream MSTClusterMaker::progressFile [protected] |
File stream to log progress information.
This output stream is created when the first progress information is logged and closed after the last progress information has been logged. The progress information is generated by the updateProgress() method if progFileName is not NULL.
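The lazy open and final close described above can be sketched as follows. This is a minimal illustration, not the actual updateProgress() implementation: the ProgressLogger type and its member names are hypothetical, and appending one "#estsProcessed, #ests" line per update is an assumption about how the file is written.

```cpp
#include <fstream>
#include <string>

// Hypothetical stand-in for the progress-logging state; these are not the
// actual MSTClusterMaker members.
struct ProgressLogger {
    std::string   fileName;  // Empty means progress logging was not requested.
    std::ofstream file;      // Opened lazily on the first update.

    // Writes one "estsAnalyzed, totalESTcount" line per call and closes the
    // stream once the last EST has been reported.
    void update(const int estsAnalyzed, const int totalESTcount) {
        if (fileName.empty()) {
            return;  // No progress file requested by the user.
        }
        if (!file.is_open()) {
            file.open(fileName.c_str());  // First progress report: create file.
        }
        if (!file.good()) {
            return;  // Could not open or write; skip logging silently.
        }
        file << estsAnalyzed << ", " << totalESTcount << std::endl;
        if (estsAnalyzed >= totalESTcount) {
            file.close();  // Last progress report has been logged.
        }
    }
};
```

A caller would set fileName once (for example from the --progress argument) and call update() each time an EST is added to the MST.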
Definition at line 830 of file MSTClusterMaker.h.
Referenced by updateProgress().
MSTCluster MSTClusterMaker::root [private] |
The top-level root cluster that contains all other clusters.
This member represents the top-level root cluster that contains all other clusters created by this cluster maker. This cluster also contains dummy clusters that are created by Filter objects used in conjunction with clustering.
Definition at line 849 of file MSTClusterMaker.h.
Referenced by addDummyCluster(), addEST(), and buildAndShowClusters().
bool MSTClusterMaker::strictOrder = false [static, protected] |
Variable to indicate if strict ordering of worker Ranks must be followed.
If this member variable is true, then messages dispatched by workers and the manager are always read in a fixed order of increasing ranks. That is, messages from rank 0 (zero) are processed first, then messages from the process with rank 1, and so on. On the other hand, if this variable is false, then messages are processed in the order they are received.
The strictOrder approach guarantees consistent results for each run (involving the same number of processes) and the resulting MSTs are all identical. However, a process may have to wait (idling and wasting time) until a message from the appropriate process (with a given rank) is actually received. This may slow down the overall computational rate, particularly when the workload gets skewed toward the end of MST construction.
On the other hand, if strictOrder is relaxed (by setting the strictOrder variable to false) then messages are processed as soon as they are received, in the order in which they arrive. This approach minimizes wait times. However, the MSTs constructed in multiple runs may not be identical, because equidistant nodes (nodes with the same similarity metric) may be processed in a different order. Reordering of equidistant nodes occurs because in this mode a total order is not enforced and only a partial order of nodes is maintained.
By default strictOrder is enabled. However, the value can be changed by the user through command line arguments. The change of value occurs in the parseArguments() method if the user has specified an option to override the default.
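The effect of message ordering on equidistant nodes can be illustrated with a small, MPI-free sketch. Everything here is hypothetical and only models the idea: the Response type, the pickNextEST() function, and the convention that a smaller metric is better are assumptions, not part of MSTClusterMaker. Responses are listed in arrival order, and the first response processed wins ties.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical response from one process: its rank, the EST it proposes, and
// that EST's metric (assumed here: smaller is better).
struct Response {
    int   rank;
    int   estIdx;
    float metric;
};

// Picks the next EST from responses given in arrival order. Ties are won by
// whichever response is processed first.
int pickNextEST(std::vector<Response> responses, const bool strictOrder) {
    if (strictOrder) {
        // Process messages in increasing rank order, ignoring arrival order.
        std::stable_sort(responses.begin(), responses.end(),
                         [](const Response& a, const Response& b) {
                             return a.rank < b.rank;
                         });
    }
    bool  found      = false;
    int   bestEST    = -1;
    float bestMetric = 0.0f;
    for (const Response& r : responses) {
        if (!found || r.metric < bestMetric) {
            found      = true;
            bestEST    = r.estIdx;
            bestMetric = r.metric;
        }
    }
    return bestEST;
}
```

With two equidistant candidates reported by ranks 0 and 2, the strict mode always selects the candidate from rank 0, whereas the relaxed mode selects whichever message happened to arrive first, so the chosen node can differ between runs.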
Definition at line 224 of file MSTClusterMaker.h.
Referenced by computeNextESTidx(), estAdded(), parseArguments(), and populateCache().