#include <EST.h>
Classes | |
struct | LessEST |
Functor for EST sorting. | |
Public Member Functions | |
EST (const int id, const char *info, const char *sequence=NULL, const int offset=-1) | |
EST Constructor. | |
void | dumpEST (std::ostream &os) |
Dump this EST information in FASTA format. | |
int | getID () const |
Obtain the ID of this EST. | |
const char * | getInfo () const |
Obtain the information associated with this EST. | |
const char * | getSequence () const |
Obtain the actual sequence of base pairs for this EST. | |
float | getSimilarity () const |
Obtain the similarity metric for this EST. | |
void | setSimilarity (const float sim) |
Set the similarity metric for an EST. | |
void | unpopulate () |
Method to clear general information and sequence data. | |
bool | repopulate (FILE *fastaFile) |
Repopulate necessary information from a given fasta file. | |
void | setCustomData (std::auto_ptr< ESTCustomData > &src) |
Change the custom data associated with this EST. | |
void | setCustomData (ESTCustomData *src) |
Change the custom data associated with this EST. | |
std::auto_ptr< ESTCustomData > & | getCustomData () |
Obtain a mutable reference to custom data associated with this EST. | |
const std::auto_ptr < ESTCustomData > & | getCustomData () const |
Obtain an immutable reference to custom data associated with this EST. | |
~EST () | |
The destructor. | |
bool | hasBeenProcessed () const |
Determine if this EST has already been processed. | |
void | setProcessed (const bool processedFlag) |
Set if this EST has already been processed. | |
Static Public Member Functions | |
static EST * | create (const int id, const char *info, const char *sequence=NULL, const long offset=-1) |
Create a valid EST. | |
static EST * | create (FILE *fastaFile, int &lineNum, const bool maskBases=true) |
Loads data from a FASTA file to create an EST. | |
static std::vector< EST * > & | getESTList () |
Obtain the list of ESTs. | |
static int | getProcessedESTCount () |
Obtain count of ESTs that have been flagged as being processed. | |
static int | getESTCount () |
Obtain the number of ESTs in this list. | |
static size_t | getMaxESTLen () |
Helper method to determine the longest EST. | |
static EST * | getEST (const int estIdx) |
Obtain a given EST from the EST list. | |
static void | dumpESTList (std::ostream &os) |
Dump currently loaded ESTs in FASTA format. | |
static void | dumpESTList (std::ostream &os, const bool processed) |
Dump currently loaded and (un)processed ESTs in FASTA format. | |
static void | deleteAllESTs () |
Delete and clear all ESTs. | |
static void | deleteLastESTs (const int count) |
Delete and clear out the last count ESTs in the list. | |
static std::string | getLine (FILE *fp) |
Helper method to read a line from a given file. | |
Protected Attributes | |
const int | id |
The unique ID for this EST. | |
char * | info |
The name and other information associated with the EST. | |
char * | sequence |
The actual sequence of base pairs associated with this EST. | |
const long | offset |
The offset in the FASTA file to load the data from. | |
float | similarity |
A similarity value for this EST with respect to another EST. | |
bool | processed |
Instance variable to track if EST has gone through some processing. | |
std::auto_ptr< ESTCustomData > | customData |
Place holder for some other custom data. | |
Static Protected Attributes | |
static size_t | maxESTlen = 0 |
Size of the longest EST. | |
Private Member Functions | |
EST () | |
The default constructor. | |
EST & | operator= (const EST &src) |
A dummy operator=. | |
Static Private Member Functions | |
static char * | duplicate (const char *src) |
A utility method to duplicate a c-string. | |
static void | normalizeBases (std::string &sequence, const bool maskBases=true) |
Helper method to normalize a given nucleotide sequence. | |
Static Private Attributes | |
static std::vector< EST * > | estList |
The list of EST's currently being used. |
A single EST.
This class is used to represent a single EST. An EST object instance consists of the following information:
id: A unique identifier (usually a number) for this EST.
info: The name and other information associated with the EST. This information is typically the first header line read from a FASTA file.
offset: The offset in the FASTA file from which this EST was read. This information can be used to conditionally and rapidly load ESTs from a file.
Definition at line 67 of file EST.h.
EST::EST | ( | const int | id, | |
const char * | info, | |||
const char * | sequence = NULL , |
|||
const int | offset = -1 | |||
) |
EST Constructor.
This constructor is used to instantiate an EST object.
[in] | id | The unique ID value to be set for this EST. |
[in] | info | The name and other information associated with the EST. This information is typically the first header line read from a FASTA file. This information can be NULL. |
[in] | sequence | The actual sequence of base pairs associated with this EST, used to create it. The sequence can be NULL. |
[in] | offset | The offset in the FASTA file from which this EST was read. This information can be used to conditionally and rapidly load ESTs from a file. |
Definition at line 51 of file EST.cpp.
References duplicate(), info, processed, sequence, and similarity.
EST::~EST | ( | ) |
The destructor.
The destructor for the EST essentially releases the memory used to hold the information and sequence data for a given EST.
Definition at line 61 of file EST.cpp.
References unpopulate().
EST::EST | ( | ) | [private] |
EST * EST::create | ( | FILE * | fastaFile, | |
int & | lineNum, | |||
const bool | maskBases = true | |||
) | [static] |
Loads data from a FASTA file to create an EST.
This method provides a convenient interface for loading information regarding an EST from a given FASTA file and using the information to create either a fully populated or partially populated EST.
[in,out] | fastaFile | The FASTA file from which the EST data is to be loaded. If this pointer is NULL then this method performs no action and returns immediately with NULL. |
[in,out] | lineNum | A line number counter to be updated to provide the user with a more meaningful error message. |
[in] | maskBases | If this flag is true, then all lowercase bases are converted to 'N' rather than uppercase characters, causing them to be ignored by downstream processing. |
Definition at line 115 of file EST.cpp.
References EST(), estList, getLine(), maxESTlen, normalizeBases(), offset, and sequence.
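The loading step described above can be illustrated with a minimal, self-contained sketch. It is not the code in EST.cpp: FastaRecord, readLineSketch() and readRecordSketch() are hypothetical names, and the sketch only shows one plausible way to read a header line and its sequence lines while tracking the line counter and the record's file offset.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record type for this sketch; not part of EST.h.
struct FastaRecord {
    long offset;           // file position of the '>' header line
    std::string info;      // header text without the leading '>'
    std::string sequence;  // concatenated sequence lines
};

// Read one line from fp, dropping the trailing newline if present.
static std::string readLineSketch(FILE* fp) {
    std::string line;
    char buf[256];
    while (std::fgets(buf, sizeof(buf), fp) != nullptr) {
        line += buf;
        if (!line.empty() && line.back() == '\n') {
            line.pop_back();
            break;
        }
    }
    return line;
}

// Read the next FASTA record. Returns false at end of file, on a NULL
// file pointer, or when the next line is not a '>' header. lineNum is
// advanced once for every line consumed.
bool readRecordSketch(FILE* fp, int& lineNum, FastaRecord& rec) {
    if (fp == nullptr) return false;
    rec.offset = std::ftell(fp);
    const std::string header = readLineSketch(fp);
    if (header.empty() || header[0] != '>') return false;
    ++lineNum;
    rec.info = header.substr(1);
    rec.sequence.clear();
    for (;;) {
        const long pos = std::ftell(fp);
        const std::string line = readLineSketch(fp);
        if (line.empty() && std::feof(fp)) break;   // no more data
        if (!line.empty() && line[0] == '>') {
            std::fseek(fp, pos, SEEK_SET);          // leave next header unread
            break;
        }
        rec.sequence += line;
        ++lineNum;
    }
    return true;
}
```

Recording the offset before the header is read is what would later allow a partially populated entry to be reloaded without rescanning the file.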
EST * EST::create | ( | const int | id, | |
const char * | info, | |||
const char * | sequence = NULL , |
|||
const long | offset = -1 | |||
) | [static] |
Create a valid EST.
This method must be used to create a valid EST in the system. The information required to create the EST must be passed in as parameters. The EST names are expected to be unique in a given file.
id must be equal to estList.size().
[in] | id | The unique ID value to be set for this EST. |
[in] | info | The name and other information associated with the EST. This information is typically the first header line read from a FASTA file. This information can be NULL. |
[in] | sequence | The actual sequence of base pairs associated with this EST, used to create it. The sequence can be NULL. |
[in] | offset | The offset in the FASTA file from which this EST was read. This information can be used to conditionally and rapidly load ESTs from a file. |
Definition at line 100 of file EST.cpp.
References EST(), and estList.
Referenced by LCFilter::addDummyEntry(), and ESTAnalyzer::loadFASTAFile().
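The requirement that id equals estList.size() means the ID doubles as the index into the list. A minimal sketch of that invariant follows; EstSketch and EstRegistrySketch are hypothetical stand-ins, and returning a null pointer on a mismatch is an assumption of this sketch rather than documented behavior of EST::create().

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for EST; only the fields needed here.
struct EstSketch {
    int id;
    std::string info;
    std::string sequence;
};

// Hypothetical registry mirroring the role of the static estList.
class EstRegistrySketch {
public:
    // Adds an entry only when id equals the current list size, so that
    // the id can be used directly as an index. Returning nullptr on a
    // mismatch is an assumption of this sketch.
    EstSketch* create(int id, const std::string& info,
                      const std::string& sequence) {
        if (id < 0 || static_cast<std::size_t>(id) != list.size()) {
            return nullptr;
        }
        list.push_back(std::unique_ptr<EstSketch>(
            new EstSketch{id, info, sequence}));
        return list.back().get();
    }

    std::size_t count() const { return list.size(); }

    // Caller must pass a valid index, as with EST::getEST().
    EstSketch* get(int idx) const {
        return list[static_cast<std::size_t>(idx)].get();
    }

private:
    std::vector<std::unique_ptr<EstSketch>> list;
};
```

With this invariant in place, a lookup by ID is a plain vector index and needs no search.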
void EST::deleteAllESTs | ( | ) | [static] |
Delete and clear all ESTs.
This method can be used to delete and clear all the ESTs from the internal list of ESTs currently loaded.
Definition at line 177 of file EST.cpp.
Referenced by CLU::~CLU(), and FWAnalyzer::~FWAnalyzer().
void EST::deleteLastESTs | ( | const int | count | ) | [static] |
Delete and clear out the last count ESTs in the list.
This method can be used to delete ESTs from the end of the list. This method resets the maximum EST length instance variable as needed. This method is typically used to remove dummy ESTs that are added to the end of the list by some filters.
[in] | count | The number of ESTs to be removed from the list. |
Definition at line 187 of file EST.cpp.
References estList.
Referenced by LCFilter::finalize().
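As an illustration of trimming the tail of the list and refreshing the maximum length, here is a minimal sketch. It works on a plain vector of sequence strings instead of EST pointers, and dropLastSketch() is a hypothetical name; how EST.cpp actually recomputes maxESTlen is not shown on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Removes up to count entries from the end of seqs and returns the
// recomputed maximum sequence length of the entries that remain.
// Negative counts are treated as zero (an assumption of this sketch).
std::size_t dropLastSketch(std::vector<std::string>& seqs, int count) {
    const std::size_t n =
        std::min(seqs.size(), static_cast<std::size_t>(std::max(count, 0)));
    seqs.resize(seqs.size() - n);

    std::size_t maxLen = 0;
    for (const std::string& s : seqs) {
        maxLen = std::max(maxLen, s.size());
    }
    return maxLen;
}
```

Rescanning the remaining entries is the simplest way to stay correct when one of the removed entries happened to be the longest.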
void EST::dumpEST | ( | std::ostream & | os | ) |
Dump this EST information in FASTA format.
This method can be used to dump the information associated with the EST in FASTA format to a given output stream.
[in] | os | The output stream to which the EST's information must be written in FASTA format. |
Definition at line 217 of file EST.cpp.
References getInfo(), and getSequence().
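For readers unfamiliar with FASTA output, a record is a '>' header line followed by the sequence. The sketch below shows that layout; dumpFastaSketch() and toFastaSketch() are hypothetical helpers, and writing NULL fields as empty text is an assumption. The real dumpEST() may format or wrap lines differently.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes one record as a '>' header line followed by the sequence.
// Null pointers (partially loaded entries) are written as empty text;
// that handling is an assumption of this sketch.
void dumpFastaSketch(std::ostream& os, const char* info,
                     const char* sequence) {
    os << '>' << (info ? info : "") << '\n'
       << (sequence ? sequence : "") << '\n';
}

// Convenience wrapper that returns the FASTA text as a string.
std::string toFastaSketch(const char* info, const char* sequence) {
    std::ostringstream os;
    dumpFastaSketch(os, info, sequence);
    return os.str();
}
```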
void EST::dumpESTList | ( | std::ostream & | os, | |
const bool | processed | |||
) | [static] |
Dump currently loaded and (un)processed ESTs in FASTA format.
This method can be used to dump the currently loaded ESTs in FASTA file format to a given output stream.
[out] | os | The output stream to which EST data is to be dumped. |
[in] | processed | If this flag is true , then this method dumps only those ESTs that have been flagged as having been processed. If this flag is false , then this method dumps only un-processed ESTs. |
void EST::dumpESTList | ( | std::ostream & | os | ) | [static] |
Dump currently loaded ESTs in FASTA format.
This method can be used to dump the currently loaded ESTs in FASTA file format to a given output stream.
[out] | os | The output stream to which EST data is to be dumped. |
Definition at line 196 of file EST.cpp.
Referenced by applyFilters().
char * EST::duplicate | ( | const char * | src | ) | [static, private] |
A utility method to duplicate a c-string.
This method is a simple utility method that can be used to duplicate a given C-string. This method uses the standard C++ new operator to duplicate the given C-string.
Definition at line 235 of file EST.cpp.
Referenced by EST().
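A minimal sketch of duplicating a C-string with operator new[] is given below. duplicateSketch() is a hypothetical name, and returning a null pointer for a null input is an assumption made here because the constructor accepts NULL for info and sequence.

```cpp
#include <cstddef>
#include <cstring>

// Returns a new[]-allocated copy of src, or nullptr when src is null.
// The caller releases the copy with delete[].
char* duplicateSketch(const char* src) {
    if (src == nullptr) return nullptr;
    const std::size_t len = std::strlen(src) + 1;  // include terminator
    char* copy = new char[len];
    std::memcpy(copy, src, len);
    return copy;
}
```

Because the copy comes from new[], it must be released with delete[] and not with free().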
const std::auto_ptr<ESTCustomData>& EST::getCustomData | ( | ) | const [inline] |
Obtain an immutable reference to custom data associated with this EST.
This method can be used to obtain an immutable reference to the custom data associated with this EST. This method essentially returns the custom value set by the last successful call to one of the polymorphic setCustomData() methods in this class. By default this method returns NULL.
Definition at line 405 of file EST.h.
References customData.
std::auto_ptr<ESTCustomData>& EST::getCustomData | ( | ) | [inline] |
Obtain a mutable reference to custom data associated with this EST.
This method can be used to obtain a mutable reference to the custom data associated with this EST. This method essentially returns the custom value set by the last successful call to one of the polymorphic setCustomData() methods in this class. By default this method returns NULL.
Definition at line 389 of file EST.h.
References customData.
Referenced by CLU::buildHashMaps(), and CLU::getMetric().
static EST* EST::getEST | ( | const int | estIdx | ) | [inline, static] |
Obtain a given EST from the EST list.
This method is a convenience method that can be used to obtain a given EST from the list of ESTs.
[in] | estIdx | The zero-based index of the EST that is desired from the list of ESTs in this class. If this index is invalid then the behavior of this method is undefined. |
Definition at line 211 of file EST.h.
References estList.
Referenced by MSTClusterMaker::addEST(), InteractiveConsole::analyze(), FilterChain::applyFilters(), UVSampleHeuristic::computeHash(), NewUVHeuristic::computeHash(), TVHeuristic::countCommonWords(), MSTNode::getESTInfo(), TwoPassD2::getMetric(), D2Zim::getMetric(), CLU::getMetric(), MSTClusterMaker::manager(), PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), InteractiveConsole::print(), D2::runD2(), LengthFilter::runFilter(), TVHeuristic::runHeuristic(), UVSampleHeuristic::setReferenceEST(), TwoPassD2::setReferenceEST(), NewUVHeuristic::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().
static int EST::getESTCount | ( | ) | [inline, static] |
Obtain the number of ESTs in this list.
This method may be used to determine the number of ESTs that have been defined and added to this list.
Definition at line 183 of file EST.h.
References estList.
Referenced by LCFilter::addDummyEntry(), FilterChain::applyFilters(), TwoPassD2::getMetric(), D2Zim::getMetric(), D2::getMetric(), InteractiveConsole::initialize(), MSTClusterMaker::manager(), LengthFilter::runFilter(), UVSampleHeuristic::runHeuristic(), NewUVHeuristic::runHeuristic(), UVSampleHeuristic::setReferenceEST(), TwoPassD2::setReferenceEST(), NewUVHeuristic::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().
static std::vector<EST*>& EST::getESTList | ( | ) | [inline, static] |
Obtain the list of ESTs.
This method may be used to obtain a reference to the list of ESTs currently defined.
Definition at line 158 of file EST.h.
References estList.
Referenced by FWAnalyzer::analyze(), FWAnalyzer::dumpHeader(), InteractiveConsole::getESTIndex(), FWAnalyzer::getMetric(), MSTClusterMaker::getOwnedESTidx(), FilterChain::getOwnedESTidx(), PMSTClusterMaker::getOwnedPartition(), MSTClusterMaker::getOwnerProcess(), TransMSTClusterMaker::initialize(), CLU::initialize(), InteractiveConsole::list(), PMSTClusterMaker::makeClusters(), MSTClusterMaker::manager(), PMSTClusterMaker::mergeManager(), MSTClusterMaker::populateMST(), InteractiveConsole::printStats(), FWAnalyzer::setReferenceEST(), and CLU::setReferenceEST().
int EST::getID | ( | ) | const [inline] |
const char* EST::getInfo | ( | ) | const [inline] |
Obtain the information associated with this EST.
The name and other information associated with the EST. This information is typically the first header line read from a FASTA file. This information can be NULL if the EST is only partially loaded.
Definition at line 284 of file EST.h.
Referenced by InteractiveConsole::analyze(), FWAnalyzer::dumpEST(), dumpEST(), CLU::dumpEST(), MSTNode::getESTInfo(), and InteractiveConsole::print().
std::string EST::getLine | ( | FILE * | fp | ) | [static] |
size_t EST::getMaxESTLen | ( | ) | [static] |
Helper method to determine the longest EST.
This method can be used to determine the length of the longest EST loaded thus far. This information is typically used to allocate buffers and other data structures for analysis.
Definition at line 292 of file EST.cpp.
References estList, getSequence(), and maxESTlen.
Referenced by D2::buildWordTable(), TwoPassD2::initialize(), TVHeuristic::initialize(), and D2Zim::initialize().
int EST::getProcessedESTCount | ( | ) | [static] |
Obtain count of ESTs that have been flagged as being processed.
This method can be used to determine the number of ESTs that have been flagged as being processed. Subtracting this number from the total number of ESTs indicates the number of ESTs to be processed.
Definition at line 280 of file EST.cpp.
References estList, and processed.
Referenced by MSTClusterMaker::manager().
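The subtraction mentioned above can be sketched as follows. FlagSketch, processedCountSketch() and pendingCountSketch() are hypothetical names used only for illustration; they operate on a plain vector of flags, not on the EST list.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in holding only the processed flag.
struct FlagSketch {
    bool processed;
};

// Counts entries whose processed flag is set.
std::size_t processedCountSketch(const std::vector<FlagSketch>& ests) {
    return static_cast<std::size_t>(
        std::count_if(ests.begin(), ests.end(),
                      [](const FlagSketch& e) { return e.processed; }));
}

// Entries still awaiting processing: total minus processed.
std::size_t pendingCountSketch(const std::vector<FlagSketch>& ests) {
    return ests.size() - processedCountSketch(ests);
}
```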
const char* EST::getSequence | ( | ) | const [inline] |
Obtain the actual sequence of base pairs for this EST.
Note that sequence information for an EST can be NULL if it is only partially loaded from a file. Entries are partially loaded to reduce the memory footprint when processing large data sets.
Definition at line 296 of file EST.h.
References sequence.
Referenced by CLU::buildHashMaps(), UVSampleHeuristic::computeHash(), NewUVHeuristic::computeHash(), TVHeuristic::countCommonWords(), dumpEST(), CLU::dumpEST(), FWAnalyzer::getFrame(), getMaxESTLen(), D2Zim::getMetric(), CLU::getMetric(), InteractiveConsole::print(), D2::runD2(), LengthFilter::runFilter(), TVHeuristic::runHeuristic(), UVSampleHeuristic::setReferenceEST(), TwoPassD2::setReferenceEST(), NewUVHeuristic::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().
float EST::getSimilarity | ( | ) | const [inline] |
Obtain the similarity metric for this EST.
The similarity metric is a quantitative representation of the similarity between two ESTs. The similarity metric is generated during analysis when one EST is compared with another. The similarity value is initialized to -1.
Definition at line 307 of file EST.h.
References similarity.
Referenced by FWAnalyzer::dumpEST(), and CLU::dumpEST().
bool EST::hasBeenProcessed | ( | ) | const [inline] |
Determine if this EST has already been processed.
This method exposes a generic flag that is provided as a convenience for algorithms to mark if this EST has gone through their processing.
This method returns true if this EST has been flagged as having been processed. Otherwise this method returns false.
Definition at line 448 of file EST.h.
References processed.
Referenced by FilterChain::applyFilters(), MSTClusterMaker::manager(), PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().
void EST::normalizeBases | ( | std::string & | sequence, | |
const bool | maskBases = true | |||
) | [static, private] |
Helper method to normalize a given nucleotide sequence.
This method is used to normalize fragments read from a FASTA file. This method normalizes the sequences such that the resulting sequence is over the set {'A', 'T', 'C', 'G', 'N'} in the following manner:
If the maskBases flag is true, then all lowercase nucleotides are converted to 'N'. Otherwise they are converted to uppercase equivalents.
All nucleotides that are not in "ATCG" are converted to 'N'.
[in,out] | sequence | The sequence of nucleotides to be normalized by this method. |
[in] | maskBases | If this flag is true , then all lowercase "atcg" bases are converted to 'N'. Otherwise they are converted to uppercase letters. |
Definition at line 250 of file EST.cpp.
Referenced by create().
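The two normalization rules above can be sketched in a few lines. normalizeSketch() is a hypothetical free function written from the description on this page, not a copy of the private method in EST.cpp.

```cpp
#include <cctype>
#include <string>

// Normalizes seq in place so that it contains only A, T, C, G and N.
// Rule 1: lowercase letters become 'N' when maskBases is true,
//         otherwise they are converted to uppercase.
// Rule 2: anything that is not A, T, C or G afterwards becomes 'N'.
void normalizeSketch(std::string& seq, bool maskBases = true) {
    for (char& c : seq) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (std::islower(u)) {
            c = maskBases ? 'N' : static_cast<char>(std::toupper(u));
        }
        if (c != 'A' && c != 'T' && c != 'C' && c != 'G') {
            c = 'N';
        }
    }
}
```

For the input "ACgtXn-", masking yields "ACNNNNN", while calling with maskBases set to false yields "ACGTNNN".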
A dummy operator=.
The operator=() is suppressed for this class as it has constant members whose values are set when the object is created. These values cannot be changed during the lifetime of this object.
[in] | src | The source object from where data is to be copied. Currently this value is ignored. |
bool EST::repopulate | ( | FILE * | fastaFile | ) |
Repopulate necessary information from a given fasta file.
This method can be used to request an EST to repopulate its FASTA header and actual sequence (base pair) information from a given FASTA file. This method uses the offset (saved when this EST was originally loaded) to load the information from the file.
[in,out] | fastaFile | The file from where the EST information is to be loaded. If the file changes during EST analysis the behavior of this method is undefined. |
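The unpopulate/repopulate cycle relies on the saved offset. A minimal sketch of that idea, assuming a hypothetical LazyEstSketch type and readLineAt() helper, could look like this; the actual error handling in EST.cpp may differ.

```cpp
#include <cstdio>
#include <string>

// Reads one line (without the newline) into line. Returns false only
// when end of file is reached before any character is read.
static bool readLineAt(FILE* fp, std::string& line) {
    line.clear();
    int c;
    while ((c = std::fgetc(fp)) != EOF && c != '\n') {
        line += static_cast<char>(c);
    }
    return c != EOF || !line.empty();
}

// Hypothetical skeleton entry that keeps only the file offset while
// its header and sequence are unloaded.
struct LazyEstSketch {
    long offset;
    std::string info;
    std::string sequence;

    // Release the memory held by the header and sequence.
    void unpopulate() {
        std::string().swap(info);
        std::string().swap(sequence);
    }

    // Seek to the saved offset and reload the header and sequence.
    bool repopulate(FILE* fp) {
        if (fp == nullptr || offset < 0) return false;
        if (std::fseek(fp, offset, SEEK_SET) != 0) return false;
        std::string line;
        if (!readLineAt(fp, line) || line.empty() || line[0] != '>') {
            return false;  // offset does not point at a header line
        }
        info = line.substr(1);
        sequence.clear();
        while (readLineAt(fp, line) && (line.empty() || line[0] != '>')) {
            sequence += line;
        }
        return true;
    }
};
```

The sketch also shows why the file must not change between loading and repopulating: a stale offset would no longer point at the record's header line.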
void EST::setCustomData | ( | ESTCustomData * | src | ) | [inline] |
Change the custom data associated with this EST.
This method can be used to change (or set) the custom data associated with this EST. Note that any earlier custom data associated with this EST is lost (and deleted if necessary by auto_ptr) before the new value is set.
[in,out] | src | The new custom data to be set for this EST. After this call, this EST owns the data referred by src. |
Definition at line 372 of file EST.h.
References customData.
void EST::setCustomData | ( | std::auto_ptr< ESTCustomData > & | src | ) | [inline] |
Change the custom data associated with this EST.
This method can be used to change (or set) the custom data associated with this EST. Note that any earlier custom data associated with this EST is lost (and deleted if necessary by auto_ptr) before the new value is set.
[in,out] | src | The new custom data to be set for this EST. After this call, this EST owns the data referred by src. |
Definition at line 360 of file EST.h.
References customData.
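The ownership transfer described for both setCustomData() overloads can be sketched with std::unique_ptr. This is a deliberate substitution: the class itself uses std::auto_ptr, which was deprecated in C++11 and removed in C++17. OwnerSketch, CustomDataSketch and CountingData are hypothetical types used only to make the deletion of the previous payload observable.

```cpp
#include <memory>
#include <utility>

// Hypothetical base class playing the role of ESTCustomData.
struct CustomDataSketch {
    virtual ~CustomDataSketch() {}
};

// Payload that tracks how many instances are currently alive.
struct CountingData : CustomDataSketch {
    explicit CountingData(int& live) : liveRef(live) { ++liveRef; }
    ~CountingData() override { --liveRef; }
    int& liveRef;
};

// Hypothetical owner showing the two setter overloads.
class OwnerSketch {
public:
    // Takes ownership from a smart pointer; any previous payload is
    // deleted when the member is overwritten.
    void setCustomData(std::unique_ptr<CustomDataSketch> src) {
        customData = std::move(src);
    }

    // Takes ownership of a raw pointer; any previous payload is
    // deleted by reset().
    void setCustomData(CustomDataSketch* src) { customData.reset(src); }

    // Non-owning access; nullptr until a payload has been set.
    CustomDataSketch* getCustomData() const { return customData.get(); }

private:
    std::unique_ptr<CustomDataSketch> customData;
};
```

After either setter returns, the caller must not delete the payload, since the owner releases it on replacement or destruction.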
void EST::setProcessed | ( | const bool | processedFlag | ) | [inline] |
Set if this EST has already been processed.
This method provides a generic flag as a convenience for algorithms to mark if this EST has gone through their processing. By default ESTs are marked as not processed when they are instantiated (the processed member is initialized to false).
[in] | processedFlag | If this flag is true then this EST is flagged as having been processed. If this flag is false then the EST is flagged as not-processed (and requiring processing). |
Definition at line 462 of file EST.h.
References processed.
Referenced by LCFilter::addDummyEntry(), and MSTClusterMaker::addEST().
void EST::setSimilarity | ( | const float | sim | ) | [inline] |
Set the similarity metric for an EST.
This method must be used to change the similarity metric for this EST. The similarity metric is a quantitative representation of the similarity between two ESTs. The similarity metric is generated during analysis when one EST is compared with another.
[in] | sim | The similarity metric value to which this EST's similarity must be changed. |
Definition at line 320 of file EST.h.
References similarity.
void EST::unpopulate | ( | ) |
Method to clear general information and sequence data.
This method can be used to unpopulate the FASTA header and actual sequence (base pairs) information from this EST. This frees up memory allocated to hold this data, thereby minimizing the memory footprint for this EST. This enables holding a large number of skeleton ESTs in memory.
Definition at line 66 of file EST.cpp.
References info, and sequence.
Referenced by ESTAnalyzer::loadFASTAFile(), and ~EST().
std::auto_ptr<ESTCustomData> EST::customData [protected] |
Place holder for some other custom data.
This pointer acts as a convenient place holder for other classes to associate some uninterpreted user data (or data structure). This member is initialized to NULL in the constructor. Note that this pointer is managed using an auto_ptr that automatically deletes the data when the auto_ptr loses ownership of the data object.
Definition at line 541 of file EST.h.
Referenced by getCustomData(), and setCustomData().
std::vector< EST * > EST::estList [static, private] |
The list of EST's currently being used.
This list contains the complete set of ESTs that are currently defined. This list includes partially loaded ESTs as well. New entries are added to the list by the create method.
Definition at line 598 of file EST.h.
Referenced by create(), deleteAllESTs(), deleteLastESTs(), dumpESTList(), getEST(), getESTCount(), getESTList(), getMaxESTLen(), and getProcessedESTCount().
const int EST::id [protected] |
The unique ID for this EST.
This member holds the unique ID for this EST. The ID is set when the EST is instantiated and is never changed during the lifetime of this EST. The ID is used to access and extract EST information.
Definition at line 473 of file EST.h.
Referenced by deleteAllESTs(), and dumpESTList().
char* EST::info [protected] |
The name and other information associated with the EST.
This information is typically the first header line read from a FASTA file. The information may be dynamically loaded on demand to reduce memory footprint when processing large data sets.
Definition at line 481 of file EST.h.
Referenced by EST(), and unpopulate().
size_t EST::maxESTlen = 0 [static, protected] |
Size of the longest EST.
This static instance variable is used to track the size (in number of nucleotides) of the longest EST ever instantiated. The size of the longest EST can be used by algorithms to optimally allocate memory for processing ESTs.
Definition at line 530 of file EST.h.
Referenced by create(), and getMaxESTLen().
const long EST::offset [protected] |
The offset in the FASTA file to load the data from.
The offset in the FASTA file from which this EST was read. This information can be used to conditionally and rapidly load ESTs from a file. This value is initialized when the EST is instantiated and is never changed during the lifetime of the object.
Definition at line 498 of file EST.h.
Referenced by create().
bool EST::processed [protected] |
Instance variable to track if EST has gone through some processing.
This is a generic flag that is provided as a convenience for algorithms to mark if this EST has gone through their processing. By default this instance variable is initialized to false. Once it has been processed, the setProcessed() method can be used to set/reset this flag. The hasBeenProcessed() method can be used to determine if this EST has already been processed.
Definition at line 521 of file EST.h.
Referenced by EST(), getProcessedESTCount(), hasBeenProcessed(), and setProcessed().
char* EST::sequence [protected] |
The actual sequence of base pairs associated with this EST.
This information is typically read from a FASTA file. The information may be dynamically loaded on demand to reduce memory footprint when processing large data sets.
Definition at line 488 of file EST.h.
Referenced by create(), EST(), getSequence(), and unpopulate().
float EST::similarity [protected] |
A similarity value for this EST with respect to another EST.
This instance variable is used to hold a similarity metric for this EST. The similarity metric is generated during analysis when one EST is compared with another. The similarity value is initialized to -1. It is accessed via the getSimilarity() method and changed via the setSimilarity() method.
Definition at line 508 of file EST.h.
Referenced by EST(), getSimilarity(), EST::LessEST::operator()(), and setSimilarity().