PMSTClusterMaker Class Reference

A Minimum Spanning Tree (MST) based parallel cluster maker.

#include <PMSTClusterMaker.h>



Public Types

enum  MessageTags {
  REPOPULATE_REQUEST, COMPUTE_SIMILARITY_REQUEST, SIMILARITY_LIST, SIMILARITY_COMPUTATION_DONE,
  COMPUTE_MAX_SIMILARITY_REQUEST, MAX_SIMILARITY_RESPONSE, ADD_EST, TRANSITIVITY_LIST,
  COMPUTE_TOTAL_ANALYSIS_COUNT
}
 

The set of tags exchanged between various processes.

Public Member Functions

virtual ~PMSTClusterMaker ()
 The destructor.
virtual void showArguments (std::ostream &os)
 Display valid command line arguments for this cluster maker.
virtual bool parseArguments (int &argc, char **argv)
 Process command line arguments.
virtual int makeClusters ()
 Method to begin clustering.
virtual void displayStats (std::ostream &os)
 Method to display performance statistics.

Protected Member Functions

virtual int manager ()
 Helper method to perform manager tasks.
virtual int worker ()
 Helper method to perform worker tasks.
virtual float analyze (const int otherEST)
 Helper method to call the actual heavy-weight analysis method(s).
virtual void populateCache (const int estIdx, SMList *metricList=NULL)
 Computes, sends, and receives the similarity list for a given EST.
void getOwnedESTidx (const int estIdx, int &startIndex, int &endIndex)
int managerUpdateCaches (int estIdx, const bool refreshEST=true)
 Helper method in Manager process to update distributed caches.
void computeNextESTidx (int &parentESTidx, int &estToAdd, float &similarity, int &alignmentData) const
 Helper method in Manager process to collaboratively compute the next EST to be added to the MST.
int getOwnerProcess (const int estIdx) const
 Determine the owner process Rank for a given estIdx.
void getOwnedPartition ()
 Helper method to compute the start and end indexes of the ESTs that this process owns.
void workerProcessRequests ()
 Helper method for a worker process.
void sendToWorkers (int data, const int tag) const
 Distribute data and tag to all the workers.
bool hasValidSMEntry (const SMList &list) const
 Method to detect if a given SMList has at least one valid entry.
void estAdded (const int estIdx, std::vector< int > &repopulateList)
 Helper method to distribute index of newly added EST to all workers and gather cache repopulation requests.
void addMoreChildESTs (const int parentESTidx, int &estToAdd, float &metric, int &alignmentData, int &pendingESTs)
 Helper method in Manager process to add as many child nodes as possible for the given parent.
int mergeManager (MSTCluster &rootCluster, const int threshold)
int mergeWorker ()
 PMSTClusterMaker (ESTAnalyzer *analyzer, const int refESTidx, const std::string &outputFile)
 The constructor.

Protected Attributes

MSTCache * cache
 The cache that holds similarity metrics for MST construction.
PartitionData * pData

Static Protected Attributes

static int cacheSize = 128
 Variable to indicate per-EST similarity cache size.
static double percentile = 1.0
 Command line option to set percentile value to compute clustering threshold.
static bool strictOrder = false
 Variable to indicate if strict ordering of worker Ranks must be followed.
static bool dontCluster = false
 Command line option to avoid the clustering phase.
static bool prettyPrint = false
 Command line option to print a pretty cluster tree.
static char * inputMSTFile = NULL
 Variable to indicate if MST information must be simply read from a given file.
static char * outputMSTFile = NULL
 Variable to indicate if MST information must be written to a given file.
static bool noCacheRepop = true
 Command line option to suppress cache repopulation.
static int maxUse = 0
 Command line option to enable maximum use of precomputed scores for building MST.
static char * cacheType = PDefCacheType
 Command line option to set the type of cache to be used by PEACE.
static arg_parser::arg_record argsList []
 The set of common arguments for the MST cluster maker.

Private Attributes

MST * mst
 The Minimum Spanning Tree (MST) built by this class.

Friends

class ClusterMakerFactory

Detailed Description

A Minimum Spanning Tree (MST) based parallel cluster maker.

This class encapsulates the core functionality needed to construct MST-based EST clusters in a parallel/distributed manner using the Message Passing Interface (MPI) library. It includes functionality for both the Manager (MPI Rank == 0) and Worker (MPI Rank > 0) processes. The logic that decides whether to operate as the Manager or as a Worker is already built into the class. This class uses the MSTCache and MSTCluster classes to help perform its various activities. Refer to the documentation of the individual methods for a detailed description of their functionality and usage.

Definition at line 56 of file PMSTClusterMaker.h.


Member Enumeration Documentation

The set of tags exchanged between various processes.

This enum provides meaningful names for the various tags (integers) exchanged between the manager and worker processes that participate in constructing an MST in a parallel/distributed manner.

Enumerator:
REPOPULATE_REQUEST 
COMPUTE_SIMILARITY_REQUEST 
SIMILARITY_LIST 
SIMILARITY_COMPUTATION_DONE 
COMPUTE_MAX_SIMILARITY_REQUEST 
MAX_SIMILARITY_RESPONSE 
ADD_EST 
TRANSITIVITY_LIST 
COMPUTE_TOTAL_ANALYSIS_COUNT 

Definition at line 66 of file PMSTClusterMaker.h.


Constructor & Destructor Documentation

PMSTClusterMaker::~PMSTClusterMaker (  )  [virtual]

The destructor.

The destructor frees all dynamic memory allocated by this object for its operations.

Definition at line 105 of file PMSTClusterMaker.cpp.

References mst.

PMSTClusterMaker::PMSTClusterMaker ( ESTAnalyzer *  analyzer,
const int  refESTidx,
const std::string &  outputFile 
) [protected]

The constructor.

The constructor for this class. It is protected so that this class cannot be instantiated directly. However, because ClusterMakerFactory is a friend of this class, an object can be instantiated via the ClusterMakerFactory::create() method.

Parameters:
[in,out] analyzer The EST analyzer to be used for obtaining similarity metrics between two ESTs. This parameter is simply passed on to the base class.
[in] refESTidx The reference EST index value to be used to root the spanning tree created by this class. This parameter should be >= 0. This value is simply passed on to the base class.
[in] outputFile The name of the output file to which the raw MST cluster information is to be written. If this parameter is the empty string, then output is written to standard output. This value is simply passed on to the base class.

Definition at line 98 of file PMSTClusterMaker.cpp.


Member Function Documentation

void PMSTClusterMaker::addMoreChildESTs ( const int  parentESTidx,
int &  estToAdd,
float &  metric,
int &  alignmentData,
int &  pendingESTs 
) [protected]

Helper method in Manager process to add as many child nodes as possible for the given parent.

This helper method is used only in the Manager process, and only when the maxUse parameter is not -1. It tries to add more children rooted at the given parent to the MST as long as their metric is better than the maxUse value. This method operates as follows:

  1. First, this method sends each worker process a request to compute its best local choice.

  2. Next, it computes its own local (Manager-side) best choice for the next EST node to be added.

  3. It then collects the best local choice from each worker process and tracks the best reported value.

  4. If the next best entry is still rooted at this parent and its metric is better than maxUse, then the EST is added to the MST and the process repeats from step 1. Otherwise, the parameters are updated to reflect the last added EST and the method returns.
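The loop in steps 1-4 can be sketched serially as follows. This is a minimal sketch under stated assumptions, not PEACE's implementation: a full similarity matrix stands in for the distributed caches, bestEdge() stands in for computeNextESTidx(), larger metrics are assumed to be better (PEACE delegates this to ESTAnalyzer::compareMetrics()), and Edge, isBetter(), bestEdge(), and this free-function addMoreChildESTs() are hypothetical.

```cpp
#include <vector>

// Hypothetical MST edge: `parent` is already in the tree, `child` is not.
struct Edge {
    int parent;
    int child;
    float metric;
};

// Assumption: metrics are similarities, so a larger value is better.
static bool isBetter(float a, float b) { return a > b; }

// Serial stand-in for computeNextESTidx(): best edge leaving the current tree.
static Edge bestEdge(const std::vector<std::vector<float>>& sim,
                     const std::vector<bool>& inTree) {
    Edge best{-1, -1, 0.0f};
    const int n = static_cast<int>(sim.size());
    for (int p = 0; p < n; ++p) {
        if (!inTree[p]) continue;
        for (int c = 0; c < n; ++c) {
            if (inTree[c]) continue;
            if (best.child == -1 || isBetter(sim[p][c], best.metric)) {
                best = Edge{p, c, sim[p][c]};
            }
        }
    }
    return best;  // child == -1 when no EST is left to add
}

// Keeps adding children of parentESTidx while the globally best edge is
// still rooted at that parent and its metric beats maxUse.
void addMoreChildESTs(const std::vector<std::vector<float>>& sim,
                      std::vector<bool>& inTree, std::vector<Edge>& mstEdges,
                      int parentESTidx, float maxUse,
                      int& estToAdd, float& metric, int& pendingESTs) {
    while (pendingESTs > 0) {
        const Edge next = bestEdge(sim, inTree);
        if (next.parent != parentESTidx || !isBetter(next.metric, maxUse)) {
            return;  // best edge hangs off another parent, or is not good enough
        }
        inTree[next.child] = true;
        mstEdges.push_back(next);
        estToAdd = next.child;  // parameters track the last EST added
        metric = next.metric;
        --pendingESTs;
    }
}
```

Each iteration adds at most one EST. The method stops as soon as the best edge is rooted at a different parent, which is when the manager has to fall back to the general computeNextESTidx() path.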

Parameters:
[in] parentESTidx The source EST index from where the similarity metric is being measured. The parentESTidx is already present in the MST.
[in,out] estToAdd The EST that has just been added to the MST. This method updates this value if additional ESTs are added to the MST by this method.
[in,out] metric The similarity/distance metric between the parentESTidx and the estToAdd. This method updates this value if additional ESTs are added to the MST by this method.
[in,out] alignmentData The alignment information between the two ESTs represented by their index values in parentESTidx and estToAdd. This method updates this value if additional ESTs are added to the MST by this method.
[in,out] pendingESTs The number of pending ESTs that have not yet been added to the MST. This value is used and updated by this method each time it adds an EST.

Definition at line 240 of file PMSTClusterMaker.cpp.

References MST::addNode(), ClusterMaker::analyzer, ASSERT, ESTAnalyzer::compareMetrics(), computeNextESTidx(), managerUpdateCaches(), maxUse, and mst.

Referenced by manager().

float PMSTClusterMaker::analyze ( const int  otherEST  )  [protected, virtual]

Helper method to call the actual heavy-weight analysis method(s).

This helper method is invoked from the populateCache() method to obtain the relationship metric (either via CLU or d2) between the current parent EST and the given otherEST. It was introduced so that child classes (such as TransMSTClusterMaker) can conveniently intercept analyzer calls and potentially short-circuit them using conditional transitivity.

Parameters:
[in] otherEST The index of the other EST to which the metric is required.
Returns:
This method returns a similarity/distance metric obtained by comparing the ESTs. It may return -1 if otherEST is so different from the reference EST (possibly warranting no further analysis) that a meaningful metric cannot be generated.

Definition at line 422 of file PMSTClusterMaker.cpp.

References ESTAnalyzer::analyze(), and ClusterMaker::analyzer.

Referenced by populateCache().

void PMSTClusterMaker::computeNextESTidx ( int &  parentESTidx,
int &  estToAdd,
float &  similarity,
int &  alignmentData 
) const [protected]

Helper method in Manager process to collaboratively compute the next EST to be added to the MST.

This helper method is used only in the Manager process to collaboratively determine the next EST to be added to the MST. It performs the following tasks:

  1. First, this method sends each worker process a request to compute its best local choice.

  2. Next, it computes its own local (Manager-side) best choice for the next EST node to be added.

  3. It then collects the best local choice from each worker process and tracks the best reported value.
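Step 3 is a reduction over one candidate per process. The sketch below shows that reduction on values that have already been received. It assumes a hypothetical LocalBest record holding the four output values, lets index i of the input vector stand for rank i, and treats a larger similarity as better, standing in for ESTAnalyzer::compareMetrics(). MPI communication is deliberately left out.

```cpp
#include <vector>

// Hypothetical record for the four values each process reports
// in its MAX_SIMILARITY_RESPONSE.
struct LocalBest {
    int parentESTidx;   // EST already in the MST (-1 if no candidate)
    int estToAdd;       // EST not yet in the MST (-1 if no candidate)
    float similarity;   // metric between parentESTidx and estToAdd
    int alignmentData;  // alignment information for the pair
};

// Picks the global best from per-process local bests; index i of
// `responses` stands for rank i. Assumption: larger similarity is better.
// The strict '>' keeps the lowest rank on ties, which is one way to get
// the run-to-run determinism that strictOrder aims for.
LocalBest pickGlobalBest(const std::vector<LocalBest>& responses) {
    LocalBest best{-1, -1, 0.0f, 0};
    for (const LocalBest& r : responses) {
        if (r.estToAdd < 0) continue;  // this process had nothing to offer
        if (best.estToAdd < 0 || r.similarity > best.similarity) best = r;
    }
    return best;
}
```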

Parameters:
[out] parentESTidx The source EST index from which the similarity metric is measured. The parentESTidx is already present in the MST.
[out] estToAdd The destination EST index that is the best choice to be added to the MST (based on the local information).
[out] similarity The similarity metric between parentESTidx and estToAdd.
[out] alignmentData The alignment information between the two ESTs represented by their index values in parentESTidx and estToAdd.

Definition at line 206 of file PMSTClusterMaker.cpp.

References ClusterMaker::analyzer, cache, ESTAnalyzer::compareMetrics(), COMPUTE_MAX_SIMILARITY_REQUEST, MSTCache::getBestEntry(), PartitionData::getWorkerCount(), MAX_SIMILARITY_RESPONSE, MPI_GET_RANK, MPI_RECV, MPI_TYPE_INT, pData, sendToWorkers(), strictOrder, and TRACK_IDLE_TIME.

Referenced by addMoreChildESTs(), and manager().

void PMSTClusterMaker::displayStats ( std::ostream &  os  )  [virtual]

Method to display performance statistics.

This method overrides the empty implementation in the base class to display statistics on cache usage and MPI calls for tracking and reporting the performance and behavior of this class.

Parameters:
[out] os The output stream to which the statistics must be written.

Definition at line 713 of file PMSTClusterMaker.cpp.

References cache, MSTCache::displayStats(), and MPI_GET_RANK.

Referenced by makeClusters().

void PMSTClusterMaker::estAdded ( const int  estIdx,
std::vector< int > &  repopulateList 
) [protected]

Helper method to distribute index of newly added EST to all workers and gather cache repopulation requests.

This helper method was added to streamline the code in the managerUpdateCaches() method. It performs the following tasks:

  1. First it uses the sendToWorkers() method to distribute the estIdx (parameter) value to all the workers.

  2. Next it prunes the local caches on the manager.

  3. It then obtains repopulation requests from each worker and places the indexes of the ESTs to be repopulated in the repopulateList parameter.

Note:
This method must be invoked only on the manager.
Parameters:
[in] estIdx The index of the newly added EST that must be distributed to all the workers.
[out] repopulateList A vector that will contain the list of ESTs that need to be repopulated (based on requests received from various workers).

Definition at line 137 of file PMSTClusterMaker.cpp.

References ADD_EST, cache, PartitionData::getWorkerCount(), MPI_GET_RANK, MPI_PROBE, MPI_RECV, MPI_STATUS, MPI_TYPE_INT, pData, MSTCache::pruneCaches(), REPOPULATE_REQUEST, sendToWorkers(), and strictOrder.

Referenced by managerUpdateCaches(), and worker().

void PMSTClusterMaker::getOwnedESTidx ( const int  estIdx,
int &  startIndex,
int &  endIndex 
) [protected]
void PMSTClusterMaker::getOwnedPartition (  )  [protected]

Helper method to compute the start and end indexes of the ESTs that this process owns.

This method was introduced to keep the math and logic involved in computing the list of owned ESTs out of the methods that use the information. It computes the range such that: startIndex <= ownedESTidx < endIndex.

Note:
This method must be invoked only after MPI has been initialized and the ESTs to be processed have been loaded (so that EST::getESTList() returns a valid list of ESTs).
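One way to compute such a half-open range for an even split is sketched below. The helper name ownedRange() and its parameters are hypothetical. How PEACE distributes leftover ESTs when the count does not divide evenly is not stated here, so this sketch assumes that the first estCount % procCount ranks each take one extra EST.

```cpp
#include <algorithm>

// Hypothetical helper: splits estCount ESTs into procCount contiguous blocks
// and returns the half-open range [startIndex, endIndex) owned by `rank`.
// Assumption: the first (estCount % procCount) ranks each own one extra EST.
void ownedRange(int estCount, int procCount, int rank,
                int& startIndex, int& endIndex) {
    const int base = estCount / procCount;
    const int extra = estCount % procCount;
    startIndex = rank * base + std::min(rank, extra);
    endIndex = startIndex + base + (rank < extra ? 1 : 0);
}
```

For example, 10 ESTs over 3 processes yields the ranges [0, 4), [4, 7), and [7, 10).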

Definition at line 596 of file PMSTClusterMaker.cpp.

References EST::getESTList(), manager(), MPI_GET_RANK, MPI_GET_SIZE, and pData.

Referenced by makeClusters().

int PMSTClusterMaker::getOwnerProcess ( const int  estIdx  )  const [protected]

Determine the owner process Rank for a given estIdx.

This convenience method determines the Rank of the process that logically owns a given EST. The owning process is responsible for maintaining the cache for that EST. Owners are assigned in a simple fashion: the ESTs are evenly divided among all the processes.
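With contiguous, evenly divided blocks, finding the owner is the inverse of the range computation. The sketch below is hypothetical (ownerRank() is not a PEACE function). It assumes that the first estCount % procCount ranks each own one extra EST and, as the parameter description requires, that estIdx is valid.

```cpp
// Hypothetical inverse of a block partition: returns the rank whose
// half-open range [start, end) contains estIdx.
// Assumptions: the first (estCount % procCount) ranks own one extra EST,
// and 0 <= estIdx < estCount.
int ownerRank(int estIdx, int estCount, int procCount) {
    const int base = estCount / procCount;
    const int extra = estCount % procCount;
    const int boundary = extra * (base + 1);  // ESTs held by the larger blocks
    if (estIdx < boundary) return estIdx / (base + 1);
    // Only reached when base > 0, because a valid estIdx is < estCount.
    return extra + (estIdx - boundary) / base;
}
```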

Parameters:
[in] estIdx The index of the EST whose owner process's rank is requested. It is assumed that the estIdx is valid. If invalid EST index values are supplied then the operation of this method is undefined.
Note:
This method must be invoked only after MPI has been initialized and the ESTs to be processed have been loaded (so that EST::getESTList() returns a valid list of ESTs).
Returns:
The rank of the owner process for the given estIdx.

Definition at line 391 of file PMSTClusterMaker.cpp.

References PartitionData::getWorkerCount(), MPI_GET_RANK, and pData.

Referenced by managerUpdateCaches(), and populateCache().

bool PMSTClusterMaker::hasValidSMEntry ( const SMList &  list  )  const [protected]

Method to detect if a given SMList has at least one, valid entry.

This method is used (in populateCache()) to determine whether a given SMList has at least one valid entry. It is particularly useful when an empty SMList is received from a remote process: in this case, the SMList contains a single placeholder entry (-1, -1).
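The check amounts to scanning for any entry that is not the placeholder. In the sketch below, SMEntry and SMListSketch are hypothetical stand-ins, because this reference does not show the real SMList element type. It also assumes that a placeholder is recognized by an EST index of -1.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for one SMList entry.
struct SMEntry {
    int estIdx;    // index of the neighboring EST, or -1 in a placeholder
    float metric;  // similarity/distance metric, or -1 in a placeholder
};
using SMListSketch = std::vector<SMEntry>;

// True if at least one entry is a real metric rather than the (-1, -1)
// placeholder that stands in for an empty list.
bool hasValidSMEntry(const SMListSketch& list) {
    return std::any_of(list.begin(), list.end(),
                       [](const SMEntry& e) { return e.estIdx != -1; });
}
```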

Parameters:
[in] list The list to check if it has a valid entry.

Definition at line 1067 of file PMSTClusterMaker.cpp.

Referenced by populateCache().

int PMSTClusterMaker::makeClusters (  )  [virtual]
int PMSTClusterMaker::manager (  )  [protected, virtual]

Helper method to perform manager tasks.

This method has been introduced to streamline the operations of the MSTClusterMaker when it operates as the manager. The MPI process with Rank 0 (zero) acts as the manager and coordinates all the activities of the MSTClusterMaker. This method is invoked from the makeClusters() method.

Returns:
This method returns 0 (zero) if clusters were created successfully. Otherwise this method returns a non-zero value indicating an error.

Definition at line 280 of file PMSTClusterMaker.cpp.

References ADD_EST, addMoreChildESTs(), MST::addNode(), ClusterMaker::analyzer, ASSERT, computeNextESTidx(), PartitionData::estCount, ESTAnalyzer::getAlignmentData(), managerUpdateCaches(), maxUse, MPI_GET_RANK, mst, NO_ERROR, pData, sendToWorkers(), and PartitionData::startESTidx.

Referenced by getOwnedPartition(), and makeClusters().

int PMSTClusterMaker::managerUpdateCaches ( int  estIdx,
const bool  refreshEST = true 
) [protected]

Helper method in Manager process to update distributed caches.

This is a helper method that is used only in the Manager process to perform the following tasks using the newly added estIdx value:

  1. First, this method broadcasts the newly added EST index (estIdx) to all the workers.

  2. Next, it prunes its local cache via the MSTCache::pruneCaches() method.

  3. It then collects requests to repopulate specific caches from all the workers.

  4. It then adds the newly added EST to the list of caches to be repopulated, broadcasts a request to repopulate caches to each worker, and participates in the cache repopulation task by calling the populateCache() method.

Parameters:
[in] estIdx The index of the newly added EST.
[in] refreshEST If this flag is true (the default value), then the neighbors for the newly added EST (specified by estIdx) are computed and the caches are updated.
Returns:
This method returns 0 on success or a suitable error code on failure.

Definition at line 172 of file PMSTClusterMaker.cpp.

References COMPUTE_SIMILARITY_REQUEST, estAdded(), getOwnerProcess(), MPI_GET_RANK, MPI_RECV, MPI_TYPE_INT, populateCache(), sendToWorkers(), SIMILARITY_COMPUTATION_DONE, and TRACK_IDLE_TIME.

Referenced by addMoreChildESTs(), and manager().

int PMSTClusterMaker::mergeManager ( MSTCluster &  rootCluster,
const int  threshold 
) [protected]
int PMSTClusterMaker::mergeWorker (  )  [protected]

Definition at line 881 of file PMSTClusterMaker.cpp.

References MST::getNodes(), MANAGER_RANK, MPI_SEND, MPI_TYPE_CHAR, mst, NO_ERROR, and SIMILARITY_LIST.

Referenced by makeClusters().

bool PMSTClusterMaker::parseArguments ( int &  argc,
char **  argv 
) [virtual]

Process command line arguments.

This method is used to process command line arguments specific to this cluster maker. This method is typically used from the main method just after the cluster maker has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.

Note:
This method consumes its custom command line arguments first and then calls the base class's parseArguments() method.
Parameters:
[in,out] argc The number of command line arguments to be processed.
[in,out] argv The array of command line arguments.
Returns:
This method returns true if the command line arguments were successfully processed. Otherwise this method returns false.

Reimplemented from ClusterMaker.

Definition at line 121 of file PMSTClusterMaker.cpp.

References cacheSize, arg_parser::check_args(), and strictOrder.

void PMSTClusterMaker::populateCache ( const int  estIdx,
SMList *  metricList = NULL 
) [protected, virtual]

Computes, sends, and receives the similarity list for a given EST.

This method is shared by the manager and the workers. It computes similarity metrics and caches the best of them. This method operates as follows:

  1. Each process computes a subset of the EST similarity metrics, namely those for otherEstIdx in the range k*Rank <= otherEstIdx < k*(Rank+1), where k = estList.size() / MPI::COMM_WORLD.Get_size() and Rank is the MPI rank of this process.

  2. If this process is the cache owner for the EST (that is, getOwnerProcess(estIdx) returns this process's Rank), then it receives data from the other processes and merges that information with its own list, retaining the best similarity metrics.

  3. If this process is not the cache owner for the EST, then it sends the similarity metrics it computed to the owner process.
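The owner-side merge in step 2 can be pictured as "concatenate, sort, keep the best cacheSize entries". This is a sketch that assumes a hypothetical Metric pair and treats larger metric values as better. PEACE performs this step through MSTCache::mergeList() and ESTAnalyzer::compareMetrics(), whose exact behavior is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical (estIdx, metric) pair; not the actual SMList element type.
struct Metric {
    int estIdx;
    float metric;
};

// Owner-side merge: combine the locally computed metrics with a partial list
// received from another process and keep the best `cacheSize` entries.
// Assumption: larger metric values are better.
std::vector<Metric> mergeTopK(std::vector<Metric> local,
                              const std::vector<Metric>& received,
                              std::size_t cacheSize) {
    local.insert(local.end(), received.begin(), received.end());
    std::stable_sort(local.begin(), local.end(),
                     [](const Metric& a, const Metric& b) { return a.metric > b.metric; });
    if (local.size() > cacheSize) local.resize(cacheSize);
    return local;
}
```

The owner would call this once per received partial list. Using std::stable_sort keeps equal metrics in arrival order, so the result also depends on the order in which messages are read (compare strictOrder).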
Parameters:
[in] estIdx The index of the EST that was just added to the MST and for which the adjacent neighbors need to be determined.
[out] metricList If this pointer is not NULL, then, on the owner process only, this vector is populated with the set of metrics computed for estIdx. This list contains the metrics collated from all the processes participating in the distributed computation. Currently, this feature is used by TransMSTClusterMaker to obtain the list of computed metrics.

Definition at line 470 of file PMSTClusterMaker.cpp.

References analyze(), ClusterMaker::analyzer, ASSERT, cache, ESTAnalyzer::compareMetrics(), ESTAnalyzer::getAlignmentData(), EST::getEST(), ESTAnalyzer::getInvalidMetric(), getOwnedESTidx(), getOwnerProcess(), PartitionData::getPartitionManager(), PartitionData::getWorkerCount(), EST::hasBeenProcessed(), hasValidSMEntry(), MSTCache::isESTinMST(), MSTCache::mergeList(), MPI_GET_RANK, MPI_PROBE, MPI_RECV, MPI_SEND, MPI_STATUS, MPI_TYPE_CHAR, MPI_TYPE_INT, pData, MSTCache::preprocess(), ESTAnalyzer::setReferenceEST(), SIMILARITY_COMPUTATION_DONE, SIMILARITY_LIST, and strictOrder.

Referenced by managerUpdateCaches(), and worker().

void PMSTClusterMaker::sendToWorkers ( int  data,
const int  tag 
) const [protected]

Distribute data and tag to all the workers.

This method provides a convenient mechanism to broadcast a given integer data and tag to all the workers.

Parameters:
[in] data The integer to be sent to each and every worker.
[in] tag The message tag to be sent to each and every process.

Definition at line 1057 of file PMSTClusterMaker.cpp.

References PartitionData::getPartitionManager(), PartitionData::getWorkerCount(), MPI_GET_RANK, MPI_SEND, MPI_TYPE_INT, and pData.

Referenced by computeNextESTidx(), estAdded(), manager(), and managerUpdateCaches().

void PMSTClusterMaker::showArguments ( std::ostream &  os  )  [virtual]

Display valid command line arguments for this cluster maker.

This method must be used to display all valid command line options that are supported by this cluster maker (and its base classes).

Note:
This method calls the base class's showArguments first.
Parameters:
[out] os The output stream to which the valid command line arguments must be written.

Reimplemented from ClusterMaker.

Definition at line 112 of file PMSTClusterMaker.cpp.

References ClusterMaker::name.

int PMSTClusterMaker::worker (  )  [protected, virtual]

Helper method to perform worker tasks.

This method has been introduced to streamline the operations of the MSTClusterMaker when it operates as a worker. All the MPI processes with non-zero rank act as workers and collaborate with the manager to assist in the various activities of the MSTClusterMaker. This method is invoked from the makeClusters() method.

Returns:
This method returns 0 (zero) if clusters were created successfully. Otherwise this method returns a non-zero value indicating an error.

Definition at line 328 of file PMSTClusterMaker.cpp.

References ADD_EST, cache, COMPUTE_MAX_SIMILARITY_REQUEST, COMPUTE_SIMILARITY_REQUEST, estAdded(), MSTCache::getBestEntry(), PartitionData::getPartitionManager(), MAX_SIMILARITY_RESPONSE, MPI_PROBE, MPI_RECV, MPI_SEND, MPI_STATUS, MPI_TYPE_INT, NO_ERROR, pData, populateCache(), MSTCache::pruneCaches(), and REPOPULATE_REQUEST.

Referenced by makeClusters().

void PMSTClusterMaker::workerProcessRequests (  )  [protected]

Helper method for a worker process.

This method is invoked from the worker() method to receive and process various requests from the manager process. This method currently handles the following requests:

  • COMPUTE_SIMILARITY_REQUEST : Computes the subset of the similarity metric for the given EST index and returns the partial list back to the owner process.

  • COMPUTE_MAX_SIMILARITY_REQUEST : Computes the highest similarity value among all the ESTs owned by this process and returns the top entry to the manager. Once this request has been processed, this method returns.
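The request loop described by these two bullets can be sketched as a tag dispatch. The enum values reuse the MessageTags names documented above. Message, processRequests(), the in-memory queue, and the two callbacks are hypothetical, and they replace the MPI probe/receive calls that a real worker would use.

```cpp
#include <deque>
#include <functional>

// Tag names from the MessageTags enum (declaration order as documented).
enum MessageTags {
    REPOPULATE_REQUEST, COMPUTE_SIMILARITY_REQUEST, SIMILARITY_LIST,
    SIMILARITY_COMPUTATION_DONE, COMPUTE_MAX_SIMILARITY_REQUEST,
    MAX_SIMILARITY_RESPONSE, ADD_EST, TRANSITIVITY_LIST,
    COMPUTE_TOTAL_ANALYSIS_COUNT
};

// Hypothetical message: an integer payload plus its tag.
struct Message { int tag; int data; };

// Serves requests until a COMPUTE_MAX_SIMILARITY_REQUEST has been handled.
// Returns false if the inbox runs dry first.
bool processRequests(std::deque<Message>& inbox,
                     const std::function<void(int)>& computeSimilarity,
                     const std::function<void()>& reportMaxSimilarity) {
    while (!inbox.empty()) {
        const Message msg = inbox.front();
        inbox.pop_front();
        switch (msg.tag) {
        case COMPUTE_SIMILARITY_REQUEST:
            computeSimilarity(msg.data);  // partial list goes to the EST's owner
            break;
        case COMPUTE_MAX_SIMILARITY_REQUEST:
            reportMaxSimilarity();        // top local entry goes to the manager
            return true;                  // control returns to the caller
        default:
            break;                        // other tags are handled elsewhere
        }
    }
    return false;
}
```

Any messages that arrive after the max-similarity request stay queued for the next call, which mirrors how the method hands control back to worker() once that request is served.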


Friends And Related Function Documentation

friend class ClusterMakerFactory [friend]

Definition at line 57 of file PMSTClusterMaker.h.


Member Data Documentation

arg_parser::arg_record PMSTClusterMaker::argsList[] [static, protected]

Initial value:
 {
    {"--cache", "#similarity metrics to cache per EST",
     &PMSTClusterMaker::cacheSize, arg_parser::INTEGER},
    {"--no-cache-repop", "Suppress EST cache repopulation",
     &PMSTClusterMaker::noCacheRepop, arg_parser::BOOLEAN},    
    {"--percentile", "Percentile deviation to use to compute threshold value",
     &PMSTClusterMaker::percentile, arg_parser::DOUBLE},
    {"--no-order", "Disable strict order of processing messages",
     &PMSTClusterMaker::strictOrder, arg_parser::BOOLEAN},
    {"--input-mst-file", "Read MST data from file (skip parallel MST building)",
     &PMSTClusterMaker::inputMSTFile, arg_parser::STRING},
    {"--output-mst-file", "Output MST data to file",
     &PMSTClusterMaker::outputMSTFile, arg_parser::STRING},
    {"--dont-cluster", "Just generate MST data. Don't do clustering",
     &PMSTClusterMaker::dontCluster, arg_parser::BOOLEAN},
    {"--pretty-print", "Print a pretty cluster tree.",
     &PMSTClusterMaker::prettyPrint, arg_parser::BOOLEAN},
    {"--maxUse", "Set a threshold to aggressively use metrics (default=0)",
     &PMSTClusterMaker::maxUse, arg_parser::INTEGER},
    {"--cacheType", "Set type of cache (heap or mlist) to use (default=heap)",
     &PMSTClusterMaker::cacheType, arg_parser::STRING},   
    {NULL, NULL, NULL, arg_parser::BOOLEAN}
}

The set of common arguments for the MST cluster maker.

This instance variable contains a static list of arguments that are common to all the MST cluster maker objects.

Definition at line 660 of file PMSTClusterMaker.h.

MSTCache* PMSTClusterMaker::cache [protected]

The cache that holds similarity metrics for MST construction.

This object is used to cache the similarity metrics for all ESTs owned by this process (that is, ESTs for which getOwnerProcess() returns the MPI rank of this process). The cache holds similarity metrics to facilitate rapid construction of the MST. The manager and the worker processes each have their own caches and manage them independently. This spreads the memory requirement for the caches across multiple processes, enabling large caches (tens of GB).

The cache is created just before the clustering process commences and is deleted immediately after the clustering process (to minimize memory footprint).

Definition at line 677 of file PMSTClusterMaker.h.

Referenced by computeNextESTidx(), displayStats(), estAdded(), makeClusters(), populateCache(), and worker().

int PMSTClusterMaker::cacheSize = 128 [static, protected]

Variable to indicate per-EST similarity cache size.

This variable is used to indicate the number of similarity metrics that must be cached for a given EST. This value is initialized to 128. The value is changed by the parseArguments() method if the user has specified an option to override the default.

Definition at line 144 of file PMSTClusterMaker.h.

Referenced by makeClusters(), and parseArguments().

char * PMSTClusterMaker::cacheType = PDefCacheType [static, protected]

Command line option to set the type of cache to be used by PEACE.

This member variable indicates the type of cache that must be used to store metrics to facilitate rapid construction of the MST. The default cache is selected by a cacheType value of "heap" (as listed for the --cacheType option). The alternative cache is the MSTMultiListCache, selected by a cacheType value of "mlist". The user may override the default using the command line parameter --cacheType.

Definition at line 288 of file PMSTClusterMaker.h.

Referenced by makeClusters().

bool PMSTClusterMaker::dontCluster = false [static, protected]

Command line option to avoid the clustering phase.

If this member variable is true, then this class only generates MST information and does not do clustering. By default this variable is initialized to false. However, the value can be changed by the user through command line arguments. The change of value occurs in the parseArguments() method if the user has specified an option to override the default.

Definition at line 210 of file PMSTClusterMaker.h.

Referenced by makeClusters().

char * PMSTClusterMaker::inputMSTFile = NULL [static, protected]

Variable to indicate if MST information must be simply read from a given file.

This member variable is used to hold the name of the file (with full path) from where MST information must be read. This instance variable is initialized to NULL. However, if the input MST file is specified then MST building is skipped and MST data read from the file is used for further processing.

Definition at line 233 of file PMSTClusterMaker.h.

Referenced by makeClusters().

int PMSTClusterMaker::maxUse = 0 [static, protected]

Command line option to enable maximum use of precomputed scores for building MST.

If this member variable is set to a value other than -1, then the MSTClusterMaker tries to use all the ESTs that have a metric better than the value specified for maxUse. Maximally using good metrics ultimately reduces the total number of analyses that need to be performed, thereby reducing the overall time for clustering.

Definition at line 275 of file PMSTClusterMaker.h.

Referenced by addMoreChildESTs(), and manager().

MST* PMSTClusterMaker::mst [private]

The Minimum Spanning Tree (MST) built by this class.

This instance variable holds a pointer to the MST created by this class when it operates as a manager process. This pointer is initialized to NULL and a MST is created in the manager() method.

Definition at line 689 of file PMSTClusterMaker.h.

Referenced by addMoreChildESTs(), makeClusters(), manager(), mergeManager(), mergeWorker(), and ~PMSTClusterMaker().

bool PMSTClusterMaker::noCacheRepop = true [static, protected]

Command line option to suppress cache repopulation.

If this member variable is true, then this class does not repopulate caches once an EST cache becomes empty. By default this variable is initialized to false. However, the user can change the value through a command line argument (--no-cache-repop). The change of value occurs in the parseArguments() method if the user has specified an option to override the default.

If this parameter is not specified, then the MSTCache requests that lists be repopulated when needed. Repopulating lists guarantees that an MST is ultimately built. If repopulation is suppressed via this parameter, then the resulting spanning tree may not be an MST; however, computation time decreases.

Definition at line 263 of file PMSTClusterMaker.h.

Referenced by makeClusters().

char * PMSTClusterMaker::outputMSTFile = NULL [static, protected]

Variable to indicate if MST information must be written to a given file.

This member variable is used to hold the name of the file (with full path) to which MST information must be written. This instance variable is initialized to NULL. However, if the output MST file is specified then MST data built by this program is written to the specified file.

Definition at line 244 of file PMSTClusterMaker.h.

Referenced by makeClusters().

double PMSTClusterMaker::percentile = 1.0 [static, protected]

Command line option to set percentile value to compute clustering threshold.

This variable is used to indicate the percentile value that must be used to determine the threshold for clustering. This value is initialized to 1.0. This value is ultimately used in the MSTCluster::calculateThreshold() method to compute the threshold using the formula:

threshold = mean + (stDev * percentile);

The value is changed by the parseArguments() method if the user has specified the --percentile option to override the default.
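A small worked version of the formula follows. It assumes that stDev is the population standard deviation of a hypothetical list of metrics. This reference does not say which estimator MSTCluster::calculateThreshold() uses or which values it averages, and clusteringThreshold() is not a PEACE function.

```cpp
#include <cmath>
#include <vector>

// Sketch of threshold = mean + (stDev * percentile).
// Assumption: stDev is the population standard deviation of `metrics`.
double clusteringThreshold(const std::vector<double>& metrics, double percentile) {
    if (metrics.empty()) return 0.0;  // hypothetical choice for an empty list
    double sum = 0.0;
    for (double m : metrics) sum += m;
    const double mean = sum / metrics.size();
    double sqDiff = 0.0;
    for (double m : metrics) sqDiff += (m - mean) * (m - mean);
    const double stDev = std::sqrt(sqDiff / metrics.size());
    return mean + (stDev * percentile);
}
```

For the metrics {1, 2, 3, 4, 5} with the default percentile of 1.0, the mean is 3 and the population standard deviation is sqrt(2), which gives a threshold of about 4.414.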

Definition at line 161 of file PMSTClusterMaker.h.

Referenced by makeClusters().

bool PMSTClusterMaker::prettyPrint = false [static, protected]

Command line option to print a pretty cluster tree.

If this member variable is true, then this class prints a pretty ASCII tree with the cluster information. By default this variable is initialized to false. However, the user can change the value through a command line argument (--pretty-print). The change of value occurs in the parseArguments() method if the user has specified an option to override the default.

Definition at line 222 of file PMSTClusterMaker.h.

Referenced by makeClusters().

bool PMSTClusterMaker::strictOrder = false [static, protected]

Variable to indicate if strict ordering of worker Ranks must be followed.

If this member variable is true, then messages dispatched by workers and the manager are always read in a fixed order of increasing ranks. That is, messages from rank 0 (zero) are processed first, then messages from process with rank 1, so on and so forth. On the other hand if this variable is false, then messages are processed in the order they are received.

The strictOrder approach guarantees consistent results for each run (involving the same number of processes), so the resulting MSTs are all identical. However, a process may have to wait idly until a message from the appropriate process (with a given rank) is actually received. This may slow down the overall computation, particularly when the workload becomes skewed toward the end of MST construction.

On the other hand, if strictOrder is relaxed (by setting the strictOrder variable to false), then messages are processed as soon as they are received, in the order in which they arrive. This approach minimizes wait times. However, the MSTs constructed in different runs may not be identical, because equidistant nodes (nodes with the same similarity metrics) may be processed in a different order. Equidistant nodes can be reordered because this mode does not enforce a total order; it only maintains a partial order of nodes.

By default strictOrder is enabled. However, the value can be changed by the user through command line arguments. The change of value occurs in the parseArguments() method if the user has specified an option to override the default.

Definition at line 198 of file PMSTClusterMaker.h.

Referenced by computeNextESTidx(), estAdded(), parseArguments(), and populateCache().


The documentation for this class was generated from the following files:

  • PMSTClusterMaker.h
  • PMSTClusterMaker.cpp

Generated on 19 Mar 2010 for PEACE by  doxygen 1.6.1