A cache to maintain similarity/distance metrics to build a Minimum Spanning Tree (MST). More...
#include <MSTCache.h>
Public Member Functions | |
virtual | ~MSTCache () |
The destructor. | |
bool | isESTinMST (const int estIdx) const |
Determine if a EST (given its index) is already present in the MST. | |
virtual void | pruneCaches (const int estIdxAdded, std::vector< int > &repopulateList, const bool prefixListSize=true)=0 |
Suggest to clear caches that contain entries for a newly added EST. | |
virtual void | preprocess (SMList &list) |
Give the MSTCache a chance to preprocess lists before distributing them over the network. | |
virtual void | mergeList (const int estIdx, const SMList &list)=0 |
Add/merges a batch of new cache entries with the current entries in the cache for a given EST. | |
virtual void | getBestEntry (int &srcESTidx, int &destESTidx, float &metric, int &alignmentData, int &directionData) const =0 |
Obtains the top-most similar entry from the MSTCache. | |
virtual void | displayStats (std::ostream &os, const int MyRank) const =0 |
Display cache usage statistics. | |
Protected Member Functions | |
MSTCache (const int totalESTCount, const int startOwnESTidx, const int numOwnESTs, const ESTAnalyzer *analyzer, const bool repopulateCaches, const int maxCachePerEST) | |
The constructor. | |
MSTCache & | operator= (const MSTCache &src) |
A dummy operator=. | |
Static Protected Member Functions | |
static void | copy_n (const SMList &input, const size_t count, SMList &output) |
Utility method to copy first n-entries from input to output SMList. | |
Protected Attributes | |
const ESTAnalyzer *const | analyzer |
The analyzer to be used for any EST comparison. | |
std::vector< bool > | nodesAdded |
Bit-vector to determine if a given node has been added to the MST. | |
const int | startOwnESTidx |
The index of the first EST whose information is cached. | |
const int | numOwnESTs |
The number of ESTs that this cache logically owns and manages. | |
const int | maxCachePerEST |
The suggested maximum number of cached entries per EST. | |
const bool | repopulateCaches |
Flag to control if caches must be repopulated. |
A cache to maintain similarity/distance metrics to build a Minimum Spanning Tree (MST).
This class is the base class of all distributed-cache maintainers used for rapdily building a Minimum Spanning Tree (MST) for clustering. This class defines the core API for all ESTCache classes. In addition, it includes a few helper methods that provides some convenience methods to manage the caches. Furthermore, this class streamlines the operation of the Master and Worker processes when building a MST in a distributed manner. This class (and its children) essentially cache similarity/distance metrics to minimize the number of times the expensive operation of computing similarity metrics is performed via an EST analyzer.
Definition at line 74 of file MSTCache.h.
virtual MSTCache::~MSTCache | ( | ) | [inline, virtual] |
The destructor.
The parent class (std::vector) manages all the memory operations for this class. Consequently, the destructor does not have any special tasks to perform. It here present merely to adhere to coding conventions.
Definition at line 83 of file MSTCache.h.
MSTCache::MSTCache | ( | const int | totalESTCount, | |
const int | startOwnESTidx, | |||
const int | numOwnESTs, | |||
const ESTAnalyzer * | analyzer, | |||
const bool | repopulateCaches, | |||
const int | maxCachePerEST | |||
) | [protected] |
The constructor.
The MST cache requires information about the total number of ESTs in the system in order to effectively track the ESTs already added to the MST. This value must be passed in as the only parameter.
[in] | totalESTCount | The total number of ESTs to be processed (that is, to be added to the MST). |
[in] | startOwnESTidx | The starting index of the EST that is owned by the process that is using this cache. This value is used to internally normalize est index values to 0 for the first EST owned by this process. |
[in] | numOwnESTs | The number of ESTs owned by the process that is using this cache. This information is used to reserve necessary space to hold SMLists for each EST owned by the manager/worker process. |
[in] | analyzer | The analyzer to be used for EST comparisons, comparing metrics, and ordering entries in this cache depending on the type of EST analyzer (whether it is similarity based or distance metric based). |
[in] | repopulateCaches | If this parameter is true then the MSTCache will request lists to be repopulated when needed. Repopulating lists guarantees that ultimately a MST will be developed. If repopulation is suppressed then the resulting spanning tree may not be a MST. |
[in] | maxCachePerEST | This parameter suggests the maximum number of cached entries that must be maintained for a given EST. This parameter may not be really used by all dervied classes (this is only a suggestion). |
Definition at line 42 of file MSTCache.cpp.
void MSTCache::copy_n | ( | const SMList & | input, | |
const size_t | count, | |||
SMList & | output | |||
) | [static, protected] |
Utility method to copy first n-entries from input to output SMList.
This is a rather straightforward method that can be used to copy the first cout
entries from the input
list and append them to the output
list.
[in] | input | The list of entries that must be copied to the output list. |
[in] | count | The number of entries to be copied. If this value is greater then input.size(), then only input.size() entries are copied. |
[out] | output | The output list to which the entries are to be copied. |
Definition at line 54 of file MSTCache.cpp.
Referenced by MSTMultiListCache::mergeList().
virtual void MSTCache::displayStats | ( | std::ostream & | os, | |
const int | MyRank | |||
) | const [pure virtual] |
Display cache usage statistics.
This method can be used to print the current cache usage statistics. Different derived classes track and report different statistics. Consequently, the actual statistics reported by this method will vary.
[out] | os | The output stream to which statistics must be written. |
[in] | MyRank | The MPI rank of the process to which this cache is assocaited. This information is used merely for displaying more informative statistics and does not influence the actual values displayed by this method. |
Implemented in MSTHeapCache, and MSTMultiListCache.
Referenced by PMSTClusterMaker::displayStats(), and MSTClusterMaker::displayStats().
virtual void MSTCache::getBestEntry | ( | int & | srcESTidx, | |
int & | destESTidx, | |||
float & | metric, | |||
int & | alignmentData, | |||
int & | directionData | |||
) | const [pure virtual] |
Obtains the top-most similar entry from the MSTCache.
This method searches the similarity metrics stored in this cache for various ESTs to locate the best entry with the highest similarity. It populates the parameters with the appropriate value. Note that the parameters are initialized to -1, -1, -1.0f, and -1 respectively.
[out] | srcESTidx | The source EST index from where the similarity metric is being measured. The srcESTidx is already present in the MST. |
[out] | destESTidx | The destination EST index that is the best choice to be added to the MST (based on the local information). |
[out] | metric | The similarity/distance metric between the srcESTidx and the destESTidx. |
[out] | alignmentData | The alignment data associated with the srcESTidx and the destESTidx. |
[out] | directionData | The direction data associated with the srcESTidx and the destESTidx. |
Implemented in MSTHeapCache, and MSTMultiListCache.
Referenced by PMSTClusterMaker::computeNextESTidx(), MSTClusterMaker::computeNextESTidx(), PMSTClusterMaker::worker(), and MSTClusterMaker::worker().
bool MSTCache::isESTinMST | ( | const int | estIdx | ) | const [inline] |
Determine if a EST (given its index) is already present in the MST.
This method can be used to determine if a given EST has been added to the MST. This method was introduces to make the code a bit more readable at certain spots.
[in] | estIdx | The index of the EST to be tested. |
true
if the EST (with the given index) is present in the MST. Otherwise it returns false
. Definition at line 97 of file MSTCache.h.
References nodesAdded.
Referenced by PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), and MSTHeapCache::pruneCaches().
virtual void MSTCache::mergeList | ( | const int | estIdx, | |
const SMList & | list | |||
) | [pure virtual] |
Add/merges a batch of new cache entries with the current entries in the cache for a given EST.
This method merges a given batch of new cache entries with existing cache entries for the given EST. Some dervied classes may ensure that only MaxCacheSize entries are retained in the cache (retained entries must be chosen so that they have the best metrics). Entries in the cache are removed to accommodate new entries.
MaxCacheSize
suggestion and retain more entries than necessary to reduce unnecessary computations.[in] | estIdx | The zero-based index of the reference EST against which the batch of cache entries in list were analyzed and generated. |
[in] | list | A SMList containing the batch of new cache entries to be merged with any existing entries in the cache for the given EST (whose zero-based index is estIdx ). |
Implemented in MSTHeapCache, and MSTMultiListCache.
Referenced by PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().
A dummy operator=.
The operator=() is supressed for this class as it has constant members whose value is set when the object is created. These values cannot be changed during the lifetime of this object.
[in] | src | The source object from where data is to be copied. Currently this value is ignored. |
Reimplemented in MSTHeapCache.
virtual void MSTCache::preprocess | ( | SMList & | list | ) | [inline, virtual] |
Give the MSTCache a chance to preprocess lists before distributing them over the network.
This method is a helper method that can be used by the MSTClusterMaker to have a local metric list to be preprocessed prior to dispatching it over the network. Preprocessing a local list possibly reduces the size and distributes some of the processing overheads (thereby accelerating overall processing).
[in,out] | list | The list of entries that must be preprocessed. |
Reimplemented in MSTMultiListCache.
Definition at line 160 of file MSTCache.h.
Referenced by PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().
virtual void MSTCache::pruneCaches | ( | const int | estIdxAdded, | |
std::vector< int > & | repopulateList, | |||
const bool | prefixListSize = true | |||
) | [pure virtual] |
Suggest to clear caches that contain entries for a newly added EST.
This method can be used to suggest to the cache to prune entries corresponding to the estIdxAdded
from the various data structures maintained by this cache. The pruning operation is useful for the following two purposes:
First it removes entries from various lists corresponding to estIdxAdded (because these entries are vestigial as the node as already been added to the MST) thereby reducing memory usage.
Second, if a list becomes empty then the cache needs to be repopulated with fresh entries for further MST computations.
repopulateList
. Consequently, this method must be viewed as a suggestion to the MSTCache
to give it a chance to prune caches as needed.[in] | estIdxAdded | The index of the EST that has just been added to the MST. |
[out] | repopulateList | The index values of ESTs whose cache needs to be repopulated. The list of EST indexes added to this list are only the ESTs that are owned by this process. If none of the caches need to be recomputed then this list is empty (all existing entries are removed). |
[in] | prefixListSize | If this parameter is set to true , then the number of entries added to the repopulateList is inserted at the beginning of the vector. This eases transmission of lists via MPI to the master process. |
Implemented in MSTHeapCache, and MSTMultiListCache.
Referenced by PMSTClusterMaker::estAdded(), MSTClusterMaker::estAdded(), PMSTClusterMaker::worker(), and MSTClusterMaker::worker().
const ESTAnalyzer* const MSTCache::analyzer [protected] |
The analyzer to be used for any EST comparison.
This pointer is initialized in the constructor to point to the EST analyzer that is used for EST comparisons. This pointer is essentially used for comparing EST metrics for ordering information in the cache.
Definition at line 302 of file MSTCache.h.
Referenced by MSTMultiListCache::getBestEntry(), MSTHeapCache::getBestEntry(), MSTMultiListCache::mergeList(), and MSTMultiListCache::preprocess().
const int MSTCache::maxCachePerEST [protected] |
The suggested maximum number of cached entries per EST.
This instance variable is a suggested maximum number of cached entries that must be maintained for a given EST. This parameter may not be really used by all dervied classes (as this is only a suggestion).
Definition at line 345 of file MSTCache.h.
Referenced by MSTMultiListCache::mergeList(), and MSTMultiListCache::preprocess().
std::vector<bool> MSTCache::nodesAdded [protected] |
Bit-vector to determine if a given node has been added to the MST.
This vector is used to track if a given EST node has been been added to the MST. This std::vector<bool> is a specialization that optimizes memory usage. So this member should not occupy much memory. The maximum number of nodes is set in the constructor and all entries are initialized to false. The pruneCache() method sets the suitable entry to true
after the EST has been added to the MST. The isESTinMST() method uses this vector.
Definition at line 316 of file MSTCache.h.
Referenced by isESTinMST(), MSTMultiListCache::pruneCaches(), and MSTHeapCache::pruneCaches().
const int MSTCache::numOwnESTs [protected] |
The number of ESTs that this cache logically owns and manages.
This instance variable maintains the number of ESTs that are logically owned and managed by this cache. This value is initialized in the constructor and is never changed during the lifetime of this object.
Definition at line 336 of file MSTCache.h.
const bool MSTCache::repopulateCaches [protected] |
Flag to control if caches must be repopulated.
If this parameter is true
then the MSTCache will request lists to be repopulated when needed. Repopulating lists guarantees that ultimately a MST will be developed. If repopulation is suppressed then the resulting spanning tree may not be a MST.
This value is set in the constructor and is never changed during the life time of this object.
Definition at line 359 of file MSTCache.h.
Referenced by MSTMultiListCache::pruneCaches().
const int MSTCache::startOwnESTidx [protected] |
The index of the first EST whose information is cached.
The starting index of the EST that is owned by the process that is using this cache. This value is used by some derived classes to internally normalize EST index values to zero for the first EST owned by this process. This value is initialized in the constructor and is never changed during the lifetime of this object.
Definition at line 327 of file MSTCache.h.
Referenced by MSTMultiListCache::mergeList().