MSTCache Class Reference

A cache to maintain similarity/distance metrics to build a Minimum Spanning Tree (MST). More...

#include <MSTCache.h>

Inheritance diagram for MSTCache:
Inheritance graph
[legend]
Collaboration diagram for MSTCache:
Collaboration graph
[legend]

List of all members.

Public Member Functions

virtual ~MSTCache ()
 The destructor.
bool isESTinMST (const int estIdx) const
 Determine if a EST (given its index) is already present in the MST.
virtual void pruneCaches (const int estIdxAdded, std::vector< int > &repopulateList, const bool prefixListSize=true)=0
 Suggest to clear caches that contain entries for a newly added EST.
virtual void preprocess (SMList &list)
 Give the MSTCache a chance to preprocess lists before distributing them over the network.
virtual void mergeList (const int estIdx, const SMList &list)=0
 Add/merges a batch of new cache entries with the current entries in the cache for a given EST.
virtual void getBestEntry (int &srcESTidx, int &destESTidx, float &metric, int &alignmentData, int &directionData) const =0
 Obtains the top-most similar entry from the MSTCache.
virtual void displayStats (std::ostream &os, const int MyRank) const =0
 Display cache usage statistics.

Protected Member Functions

 MSTCache (const int totalESTCount, const int startOwnESTidx, const int numOwnESTs, const ESTAnalyzer *analyzer, const bool repopulateCaches, const int maxCachePerEST)
 The constructor.
MSTCacheoperator= (const MSTCache &src)
 A dummy operator=.

Static Protected Member Functions

static void copy_n (const SMList &input, const size_t count, SMList &output)
 Utility method to copy first n-entries from input to output SMList.

Protected Attributes

const ESTAnalyzer *const analyzer
 The analyzer to be used for any EST comparison.
std::vector< bool > nodesAdded
 Bit-vector to determine if a given node has been added to the MST.
const int startOwnESTidx
 The index of the first EST whose information is cached.
const int numOwnESTs
 The number of ESTs that this cache logically owns and manages.
const int maxCachePerEST
 The suggested maximum number of cached entries per EST.
const bool repopulateCaches
 Flag to control if caches must be repopulated.

Detailed Description

A cache to maintain similarity/distance metrics to build a Minimum Spanning Tree (MST).

This class is the base class of all distributed-cache maintainers used for rapdily building a Minimum Spanning Tree (MST) for clustering. This class defines the core API for all ESTCache classes. In addition, it includes a few helper methods that provides some convenience methods to manage the caches. Furthermore, this class streamlines the operation of the Master and Worker processes when building a MST in a distributed manner. This class (and its children) essentially cache similarity/distance metrics to minimize the number of times the expensive operation of computing similarity metrics is performed via an EST analyzer.

Definition at line 74 of file MSTCache.h.


Constructor & Destructor Documentation

virtual MSTCache::~MSTCache (  )  [inline, virtual]

The destructor.

The parent class (std::vector) manages all the memory operations for this class. Consequently, the destructor does not have any special tasks to perform. It here present merely to adhere to coding conventions.

Definition at line 83 of file MSTCache.h.

MSTCache::MSTCache ( const int  totalESTCount,
const int  startOwnESTidx,
const int  numOwnESTs,
const ESTAnalyzer analyzer,
const bool  repopulateCaches,
const int  maxCachePerEST 
) [protected]

The constructor.

The MST cache requires information about the total number of ESTs in the system in order to effectively track the ESTs already added to the MST. This value must be passed in as the only parameter.

Parameters:
[in] totalESTCount The total number of ESTs to be processed (that is, to be added to the MST).
[in] startOwnESTidx The starting index of the EST that is owned by the process that is using this cache. This value is used to internally normalize est index values to 0 for the first EST owned by this process.
[in] numOwnESTs The number of ESTs owned by the process that is using this cache. This information is used to reserve necessary space to hold SMLists for each EST owned by the manager/worker process.
[in] analyzer The analyzer to be used for EST comparisons, comparing metrics, and ordering entries in this cache depending on the type of EST analyzer (whether it is similarity based or distance metric based).
[in] repopulateCaches If this parameter is true then the MSTCache will request lists to be repopulated when needed. Repopulating lists guarantees that ultimately a MST will be developed. If repopulation is suppressed then the resulting spanning tree may not be a MST.
[in] maxCachePerEST This parameter suggests the maximum number of cached entries that must be maintained for a given EST. This parameter may not be really used by all dervied classes (this is only a suggestion).

Definition at line 42 of file MSTCache.cpp.


Member Function Documentation

void MSTCache::copy_n ( const SMList input,
const size_t  count,
SMList output 
) [static, protected]

Utility method to copy first n-entries from input to output SMList.

This is a rather straightforward method that can be used to copy the first cout entries from the input list and append them to the output list.

Parameters:
[in] input The list of entries that must be copied to the output list.
[in] count The number of entries to be copied. If this value is greater then input.size(), then only input.size() entries are copied.
[out] output The output list to which the entries are to be copied.

Definition at line 54 of file MSTCache.cpp.

Referenced by MSTMultiListCache::mergeList().

virtual void MSTCache::displayStats ( std::ostream &  os,
const int  MyRank 
) const [pure virtual]

Display cache usage statistics.

This method can be used to print the current cache usage statistics. Different derived classes track and report different statistics. Consequently, the actual statistics reported by this method will vary.

Note:
Derived classes must override this method.
Parameters:
[out] os The output stream to which statistics must be written.
[in] MyRank The MPI rank of the process to which this cache is assocaited. This information is used merely for displaying more informative statistics and does not influence the actual values displayed by this method.

Implemented in MSTHeapCache, and MSTMultiListCache.

Referenced by PMSTClusterMaker::displayStats(), and MSTClusterMaker::displayStats().

virtual void MSTCache::getBestEntry ( int &  srcESTidx,
int &  destESTidx,
float &  metric,
int &  alignmentData,
int &  directionData 
) const [pure virtual]

Obtains the top-most similar entry from the MSTCache.

This method searches the similarity metrics stored in this cache for various ESTs to locate the best entry with the highest similarity. It populates the parameters with the appropriate value. Note that the parameters are initialized to -1, -1, -1.0f, and -1 respectively.

Parameters:
[out] srcESTidx The source EST index from where the similarity metric is being measured. The srcESTidx is already present in the MST.
[out] destESTidx The destination EST index that is the best choice to be added to the MST (based on the local information).
[out] metric The similarity/distance metric between the srcESTidx and the destESTidx.
[out] alignmentData The alignment data associated with the srcESTidx and the destESTidx.
[out] directionData The direction data associated with the srcESTidx and the destESTidx.

Implemented in MSTHeapCache, and MSTMultiListCache.

Referenced by PMSTClusterMaker::computeNextESTidx(), MSTClusterMaker::computeNextESTidx(), PMSTClusterMaker::worker(), and MSTClusterMaker::worker().

bool MSTCache::isESTinMST ( const int  estIdx  )  const [inline]

Determine if a EST (given its index) is already present in the MST.

This method can be used to determine if a given EST has been added to the MST. This method was introduces to make the code a bit more readable at certain spots.

Parameters:
[in] estIdx The index of the EST to be tested.
Returns:
This method returns true if the EST (with the given index) is present in the MST. Otherwise it returns false.

Definition at line 97 of file MSTCache.h.

References nodesAdded.

Referenced by PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), and MSTHeapCache::pruneCaches().

virtual void MSTCache::mergeList ( const int  estIdx,
const SMList list 
) [pure virtual]

Add/merges a batch of new cache entries with the current entries in the cache for a given EST.

This method merges a given batch of new cache entries with existing cache entries for the given EST. Some dervied classes may ensure that only MaxCacheSize entries are retained in the cache (retained entries must be chosen so that they have the best metrics). Entries in the cache are removed to accommodate new entries.

Note:
Derived classes must override this method. However, they may choose to ignore the MaxCacheSize suggestion and retain more entries than necessary to reduce unnecessary computations.
Parameters:
[in] estIdx The zero-based index of the reference EST against which the batch of cache entries in list were analyzed and generated.
[in] list A SMList containing the batch of new cache entries to be merged with any existing entries in the cache for the given EST (whose zero-based index is estIdx).

Implemented in MSTHeapCache, and MSTMultiListCache.

Referenced by PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().

MSTCache& MSTCache::operator= ( const MSTCache src  )  [protected]

A dummy operator=.

The operator=() is supressed for this class as it has constant members whose value is set when the object is created. These values cannot be changed during the lifetime of this object.

Parameters:
[in] src The source object from where data is to be copied. Currently this value is ignored.
Returns:
Reference to this.

Reimplemented in MSTHeapCache.

virtual void MSTCache::preprocess ( SMList list  )  [inline, virtual]

Give the MSTCache a chance to preprocess lists before distributing them over the network.

This method is a helper method that can be used by the MSTClusterMaker to have a local metric list to be preprocessed prior to dispatching it over the network. Preprocessing a local list possibly reduces the size and distributes some of the processing overheads (thereby accelerating overall processing).

Note:
The default method implementation in the base class does absolutely nothing.
Parameters:
[in,out] list The list of entries that must be preprocessed.

Reimplemented in MSTMultiListCache.

Definition at line 160 of file MSTCache.h.

Referenced by PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().

virtual void MSTCache::pruneCaches ( const int  estIdxAdded,
std::vector< int > &  repopulateList,
const bool  prefixListSize = true 
) [pure virtual]

Suggest to clear caches that contain entries for a newly added EST.

This method can be used to suggest to the cache to prune entries corresponding to the estIdxAdded from the various data structures maintained by this cache. The pruning operation is useful for the following two purposes:

  1. First it removes entries from various lists corresponding to estIdxAdded (because these entries are vestigial as the node as already been added to the MST) thereby reducing memory usage.

  2. Second, if a list becomes empty then the cache needs to be repopulated with fresh entries for further MST computations.

Note:
Some dervied classes may choose not to prune caches or add entries to the repopulateList. Consequently, this method must be viewed as a suggestion to the MSTCache to give it a chance to prune caches as needed.
Parameters:
[in] estIdxAdded The index of the EST that has just been added to the MST.
[out] repopulateList The index values of ESTs whose cache needs to be repopulated. The list of EST indexes added to this list are only the ESTs that are owned by this process. If none of the caches need to be recomputed then this list is empty (all existing entries are removed).
[in] prefixListSize If this parameter is set to true, then the number of entries added to the repopulateList is inserted at the beginning of the vector. This eases transmission of lists via MPI to the master process.

Implemented in MSTHeapCache, and MSTMultiListCache.

Referenced by PMSTClusterMaker::estAdded(), MSTClusterMaker::estAdded(), PMSTClusterMaker::worker(), and MSTClusterMaker::worker().


Member Data Documentation

const ESTAnalyzer* const MSTCache::analyzer [protected]

The analyzer to be used for any EST comparison.

This pointer is initialized in the constructor to point to the EST analyzer that is used for EST comparisons. This pointer is essentially used for comparing EST metrics for ordering information in the cache.

Definition at line 302 of file MSTCache.h.

Referenced by MSTMultiListCache::getBestEntry(), MSTHeapCache::getBestEntry(), MSTMultiListCache::mergeList(), and MSTMultiListCache::preprocess().

const int MSTCache::maxCachePerEST [protected]

The suggested maximum number of cached entries per EST.

This instance variable is a suggested maximum number of cached entries that must be maintained for a given EST. This parameter may not be really used by all dervied classes (as this is only a suggestion).

Definition at line 345 of file MSTCache.h.

Referenced by MSTMultiListCache::mergeList(), and MSTMultiListCache::preprocess().

std::vector<bool> MSTCache::nodesAdded [protected]

Bit-vector to determine if a given node has been added to the MST.

This vector is used to track if a given EST node has been been added to the MST. This std::vector<bool> is a specialization that optimizes memory usage. So this member should not occupy much memory. The maximum number of nodes is set in the constructor and all entries are initialized to false. The pruneCache() method sets the suitable entry to true after the EST has been added to the MST. The isESTinMST() method uses this vector.

Definition at line 316 of file MSTCache.h.

Referenced by isESTinMST(), MSTMultiListCache::pruneCaches(), and MSTHeapCache::pruneCaches().

const int MSTCache::numOwnESTs [protected]

The number of ESTs that this cache logically owns and manages.

This instance variable maintains the number of ESTs that are logically owned and managed by this cache. This value is initialized in the constructor and is never changed during the lifetime of this object.

Definition at line 336 of file MSTCache.h.

const bool MSTCache::repopulateCaches [protected]

Flag to control if caches must be repopulated.

If this parameter is true then the MSTCache will request lists to be repopulated when needed. Repopulating lists guarantees that ultimately a MST will be developed. If repopulation is suppressed then the resulting spanning tree may not be a MST.

This value is set in the constructor and is never changed during the life time of this object.

Definition at line 359 of file MSTCache.h.

Referenced by MSTMultiListCache::pruneCaches().

const int MSTCache::startOwnESTidx [protected]

The index of the first EST whose information is cached.

The starting index of the EST that is owned by the process that is using this cache. This value is used by some derived classes to internally normalize EST index values to zero for the first EST owned by this process. This value is initialized in the constructor and is never changed during the lifetime of this object.

Definition at line 327 of file MSTCache.h.

Referenced by MSTMultiListCache::mergeList().


The documentation for this class was generated from the following files:

Generated on 19 Mar 2010 for PEACE by  doxygen 1.6.1