EST Class Reference

A single EST.

#include <EST.h>

Classes

struct  LessEST
 Functor for EST sorting.

Public Member Functions

 EST (const int id, const char *info, const char *sequence=NULL, const int offset=-1)
 EST Constructor.
void dumpEST (std::ostream &os)
 Dump this EST information in FASTA format.
int getID () const
 Obtain the ID of this EST.
const char * getInfo () const
 Obtain the information associated with this EST.
const char * getSequence () const
 Obtain the actual sequence of base pairs for this EST.
float getSimilarity () const
 Obtain the similarity metric for this EST.
void setSimilarity (const float sim)
 Set the similarity metric for an EST.
void unpopulate ()
 Method to clear general information and sequence data.
bool repopulate (FILE *fastaFile)
 Repopulate necessary information from a given fasta file.
void setCustomData (std::auto_ptr< ESTCustomData > &src)
 Change the custom data associated with this EST.
void setCustomData (ESTCustomData *src)
 Change the custom data associated with this EST.
std::auto_ptr< ESTCustomData > & getCustomData ()
 Obtain a mutable reference to custom data associated with this EST.
const std::auto_ptr< ESTCustomData > & getCustomData () const
 Obtain an immutable reference to custom data associated with this EST.
 ~EST ()
 The destructor.
bool hasBeenProcessed () const
 Determine if this EST has already been processed.
void setProcessed (const bool processedFlag)
 Set if this EST has already been processed.

Static Public Member Functions

static EST * create (const int id, const char *info, const char *sequence=NULL, const long offset=-1)
 Create a valid EST.
static EST * create (FILE *fastaFile, int &lineNum, const bool maskBases=true)
 Loads data from a FASTA file to create an EST.
static std::vector< EST * > & getESTList ()
 Obtain the list of ESTs.
static int getProcessedESTCount ()
 Obtain count of ESTs that have been flagged as being processed.
static int getESTCount ()
 Obtain the number of ESTs in this list.
static size_t getMaxESTLen ()
 Helper method to determine the longest EST.
static EST * getEST (const int estIdx)
 Obtain a given EST from the EST list.
static void dumpESTList (std::ostream &os)
 Dump currently loaded ESTs in FASTA format.
static void dumpESTList (std::ostream &os, const bool processed)
 Dump currently loaded and (un)processed ESTs in FASTA format.
static void deleteAllESTs ()
 Delete and clear all ESTs.
static void deleteLastESTs (const int count)
 Delete and clear out the last ESTs in the list.
static std::string getLine (FILE *fp)
 Helper method to read a line from a given file.

Protected Attributes

const int id
 The unique ID for this EST.
char * info
 The name and other information associated with the EST.
char * sequence
 The actual sequence of base pairs associated with this EST.
const long offset
 The offset in the FASTA file to load the data from.
float similarity
 A similarity value for this EST with respect to another EST.
bool processed
 Instance variable to track if EST has gone through some processing.
std::auto_ptr< ESTCustomData > customData
 Place holder for some other custom data.

Static Protected Attributes

static size_t maxESTlen = 0
 Size of the longest EST.

Private Member Functions

 EST ()
 The default constructor.
EST & operator= (const EST &src)
 A dummy operator=.

Static Private Member Functions

static char * duplicate (const char *src)
 A utility method to duplicate a c-string.
static void normalizeBases (std::string &sequence, const bool maskBases=true)
 Helper method to normalize a given nucleotide sequence.

Static Private Attributes

static std::vector< EST * > estList
 The list of ESTs currently being used.

Detailed Description

A single EST.

This class is used to represent a single EST. An EST object instance consists of the following information:

Definition at line 67 of file EST.h.


Constructor & Destructor Documentation

EST::EST ( const int  id,
const char *  info,
const char *  sequence = NULL,
const int  offset = -1 
)

EST Constructor.

This constructor is used to instantiate an EST object.

Parameters:
[in] id The unique ID value to be set for this EST.
[in] info The name and other information associated with the EST. This information is typically the first header line read from a FASTA file. This information can be NULL.
[in] sequence The actual sequence of base pairs associated with this EST, used to create this EST. The sequence information can be NULL.
[in] offset The offset in the FASTA file from where this EST was read. This information can be used to conditionally and rapidly load ESTs from a file.

Definition at line 51 of file EST.cpp.

References duplicate(), info, processed, sequence, and similarity.

EST::~EST (  ) 

The destructor.

The destructor for the EST essentially releases the memory used to hold the information and sequence data for a given EST.

Definition at line 61 of file EST.cpp.

References unpopulate().

EST::EST (  )  [private]

The default constructor.

The default constructor has been made private to ensure that EST's are never directly created. Instead, a valid EST must be created using other constructors.

Definition at line 271 of file EST.cpp.

Referenced by create().


Member Function Documentation

EST * EST::create ( FILE *  fastaFile,
int &  lineNum,
const bool  maskBases = true 
) [static]

Loads data from a FASTA file to create an EST.

This method provides a convenient interface for loading information regarding an EST from a given FASTA file and using the information to create either a fully populated or partially populated EST.

Parameters:
[in,out] fastaFile The FASTA file from where the EST data is to be currently loaded. If this pointer is NULL then this method performs no action and returns immediately with NULL.
[in,out] lineNum A line number counter to be updated to provide the user with a more meaningful error message.
[in] maskBases If this flag is true, then all lowercase bases are converted to 'N' rather than uppercase characters, causing them to be ignored by downstream processing.
Note:
At the end of this method the fastaFile's file pointer will point at the beginning of the next EST (if any) in the file.

Definition at line 115 of file EST.cpp.

References EST(), estList, getLine(), maxESTlen, normalizeBases(), offset, and sequence.

EST * EST::create ( const int  id,
const char *  info,
const char *  sequence = NULL,
const long  offset = -1 
) [static]

Create a valid EST.

This method must be used to create a valid EST in the system. The information required to create the EST must be passed in as parameters. The EST names are expected to be unique in a given file.

Note:
If the new EST is successfully instantiated, then this method adds the newly created EST to the end of the list of ESTs maintained by this class. Consequently, the parameter id must be equal to estList.size().
Parameters:
[in] id The unique ID value to be set for this EST.
[in] info The name and other information associated with the EST. This information is typically the first header line read from a FASTA file. This information can be NULL.
[in] sequence The actual sequence of base pairs associated with this EST, used to create this EST. The sequence information can be NULL.
[in] offset The offset in the FASTA file from where this EST was read. This information can be used to conditionally and rapidly load ESTs from a file.
Returns:
If the id is valid and a duplicate EST with the same ID is not present, then this method creates a new EST and returns a pointer to that EST back to the caller.
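
The ID rule in the note above can be illustrated with a minimal, self-contained sketch. RecordSketch and RegistrySketch are hypothetical stand-ins, not the PEACE classes, and returning NULL for a rejected ID is an assumption, since the documentation only describes the success case.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type standing in for EST.
struct RecordSketch {
    int id;
    std::string info;
};

// Hypothetical registry that mirrors the documented rule: a new entry is
// appended to the list, so its id must equal the current list size.
class RegistrySketch {
public:
    RegistrySketch() {}
    ~RegistrySketch() {
        for (std::size_t i = 0; i < list.size(); i++) {
            delete list[i];
        }
    }
    RecordSketch* create(const int id, const std::string& info) {
        if (id < 0 || static_cast<std::size_t>(id) != list.size()) {
            return NULL;  // Assumption: invalid or duplicate ids yield NULL.
        }
        RecordSketch* rec = new RecordSketch();
        rec->id = id;
        rec->info = info;
        list.push_back(rec);
        return rec;
    }
    std::size_t count() const { return list.size(); }

private:
    // Copying is disabled because the registry owns its entries.
    RegistrySketch(const RegistrySketch&);
    RegistrySketch& operator=(const RegistrySketch&);
    std::vector<RecordSketch*> list;
};
```

In this sketch, creating entries with ids 0, 1, 2 in order succeeds, while skipping ahead or reusing an id is rejected.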

Definition at line 100 of file EST.cpp.

References EST(), and estList.

Referenced by LCFilter::addDummyEntry(), and ESTAnalyzer::loadFASTAFile().

void EST::deleteAllESTs (  )  [static]

Delete and clear all ESTs.

This method can be used to delete and clear all the EST's from the internal list of EST's currently loaded.

Definition at line 177 of file EST.cpp.

References estList, and id.

Referenced by CLU::~CLU(), and FWAnalyzer::~FWAnalyzer().

void EST::deleteLastESTs ( const int  count  )  [static]

Delete and clear out the last ESTs in the list.

This method can be used to delete the last count ESTs in the list. This method resets the maximum EST length variable as needed. This method is typically used to remove dummy ESTs that are added to the end of the list by some filters.

Parameters:
[in] count The number of ESTs to be removed from the list.
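
A rough sketch of this behavior, under stated assumptions: the list is modeled as a vector of heap-allocated strings, and the longest length is simply recomputed after the removal. The names longestSketch and deleteLastSketch are hypothetical, and the actual implementation may update the maximum differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest sequence in the list (0 for an empty list).
std::size_t longestSketch(const std::vector<std::string*>& list) {
    std::size_t maxLen = 0;
    for (std::size_t i = 0; i < list.size(); i++) {
        if (list[i]->size() > maxLen) {
            maxLen = list[i]->size();
        }
    }
    return maxLen;
}

// Delete up to count entries from the end of the list, then refresh the
// cached maximum length so it never refers to a removed entry.
void deleteLastSketch(std::vector<std::string*>& list, int count,
                      std::size_t& maxLen) {
    while (count-- > 0 && !list.empty()) {
        delete list.back();
        list.pop_back();
    }
    maxLen = longestSketch(list);
}
```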

Definition at line 187 of file EST.cpp.

References estList.

Referenced by LCFilter::finalize().

void EST::dumpEST ( std::ostream &  os  ) 

Dump this EST information in FASTA format.

This method can be used to dump the information associated with the EST in FASTA format to a given output stream.

Parameters:
[in] os The output stream to which the EST's information must be written in FASTA format.
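
As an illustration of the FASTA layout, the sketch below writes a '>' header line followed by the sequence on the next line. It assumes the stored info does not already include the leading '>' and that the sequence is written unwrapped on a single line; neither detail is confirmed by this page. The names dumpFastaSketch and toFastaSketch are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Write one record in FASTA form: ">info" on one line, sequence on the next.
// NULL info or sequence (a partially loaded record) is written as empty text.
void dumpFastaSketch(std::ostream& os, const char* info, const char* sequence) {
    os << '>' << (info != NULL ? info : "") << '\n'
       << (sequence != NULL ? sequence : "") << '\n';
}

// Convenience wrapper returning the FASTA text as a string.
std::string toFastaSketch(const char* info, const char* sequence) {
    std::ostringstream out;
    dumpFastaSketch(out, info, sequence);
    return out.str();
}
```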

Definition at line 217 of file EST.cpp.

References getInfo(), and getSequence().

void EST::dumpESTList ( std::ostream &  os,
const bool  processed 
) [static]

Dump currently loaded and (un)processed ESTs in FASTA format.

This method can be used to dump the currently loaded EST's in FASTA file format to a given output stream.

Parameters:
[out] os The output stream to which EST data is to be dumped.
[in] processed If this flag is true, then this method dumps only those ESTs that have been flagged as having been processed. If this flag is false, then this method dumps only un-processed ESTs.

Definition at line 205 of file EST.cpp.

References estList, and id.

void EST::dumpESTList ( std::ostream &  os  )  [static]

Dump currently loaded ESTs in FASTA format.

This method can be used to dump the currently loaded EST's in FASTA file format to a given output stream.

Parameters:
[out] os The output stream to which EST data is to be dumped.

Definition at line 196 of file EST.cpp.

References estList, and id.

Referenced by applyFilters().

char * EST::duplicate ( const char *  src  )  [static, private]

A utility method to duplicate a c-string.

This method is a simple utility method that can be used to duplicate a given C-string. This method uses the standard C++ new operator to duplicate the given C-string.

Returns:
This method simply returns NULL if src is NULL. Otherwise this method returns a pointer to a duplicate version of the specified string.
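
A minimal sketch of such a helper follows. The name duplicateSketch is hypothetical, and the use of new[] (so that the caller releases the copy with delete[]) is an assumption based on the description above.

```cpp
#include <cstddef>
#include <cstring>

// Return a heap-allocated copy of src, or NULL if src is NULL.
// The caller owns the result and must release it with delete[].
char* duplicateSketch(const char* src) {
    if (src == NULL) {
        return NULL;
    }
    const std::size_t len = std::strlen(src) + 1;  // include the terminator
    char* copy = new char[len];
    std::memcpy(copy, src, len);
    return copy;
}
```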

Definition at line 235 of file EST.cpp.

Referenced by EST().

const std::auto_ptr<ESTCustomData>& EST::getCustomData (  )  const [inline]

Obtain an immutable reference to custom data associated with this EST.

This method can be used to obtain an immutable reference to the custom data associated with this EST. This method essentially returns the custom value set by the last successful call to one of the polymorphic setCustomData() methods in this class. By default this method returns NULL.

Note:
The custom data set in this class is returned as an auto_ptr.
Returns:
The custom data (if any) associated with this EST.

Definition at line 405 of file EST.h.

References customData.

std::auto_ptr<ESTCustomData>& EST::getCustomData (  )  [inline]

Obtain a mutable reference to custom data associated with this EST.

This method can be used to obtain a mutable reference to the custom data associated with this EST. This method essentially returns the custom value set by the last successful call to one of the polymorphic setCustomData() methods in this class. By default this method returns NULL.

Note:
The custom data set in this class is returned as an auto_ptr.
Returns:
The custom data (if any) associated with this EST.

Definition at line 389 of file EST.h.

References customData.

Referenced by CLU::buildHashMaps(), and CLU::getMetric().

static EST* EST::getEST ( const int  estIdx  )  [inline, static]

Obtain a given EST from the EST list.

This method is a convenience method that can be used to obtain a given EST from the list of ESTs.

Parameters:
[in] estIdx The zero-based index of the EST that is desired from the list of ESTs in this class. If this index is invalid then the behavior of this method is undefined.
Returns:
A mutable pointer to the EST at the provided estIdx index position in the EST list.

Definition at line 211 of file EST.h.

References estList.

Referenced by MSTClusterMaker::addEST(), InteractiveConsole::analyze(), FilterChain::applyFilters(), UVSampleHeuristic::computeHash(), NewUVHeuristic::computeHash(), TVHeuristic::countCommonWords(), MSTNode::getESTInfo(), TwoPassD2::getMetric(), D2Zim::getMetric(), CLU::getMetric(), MSTClusterMaker::manager(), PMSTClusterMaker::populateCache(), MSTClusterMaker::populateCache(), InteractiveConsole::print(), D2::runD2(), LengthFilter::runFilter(), TVHeuristic::runHeuristic(), UVSampleHeuristic::setReferenceEST(), TwoPassD2::setReferenceEST(), NewUVHeuristic::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().

static int EST::getESTCount (  )  [inline, static]

Obtain the number of ESTs in this list.

This method may be used to determine the number of ESTs that have been defined and added to this list.

Returns:
The number of ESTs currently defined.

Definition at line 183 of file EST.h.

References estList.

Referenced by LCFilter::addDummyEntry(), FilterChain::applyFilters(), TwoPassD2::getMetric(), D2Zim::getMetric(), D2::getMetric(), InteractiveConsole::initialize(), MSTClusterMaker::manager(), LengthFilter::runFilter(), UVSampleHeuristic::runHeuristic(), NewUVHeuristic::runHeuristic(), UVSampleHeuristic::setReferenceEST(), TwoPassD2::setReferenceEST(), NewUVHeuristic::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().

static std::vector<EST*>& EST::getESTList (  )  [inline, static]

Obtain the list of ESTs.

int EST::getID (  )  const [inline]

Obtain the ID of this EST.

Returns:
The ID of the EST that was set when this EST was created.

Definition at line 273 of file EST.h.

Referenced by ESTAnalyzer::loadFASTAFile().

const char* EST::getInfo (  )  const [inline]

Obtain the information associated with this EST.

The name and other information associated with the EST. This information is typically the first header line read from a FASTA file. This information can be NULL if the EST is only partially loaded.

Returns:
Any information available for this EST.

Definition at line 284 of file EST.h.

Referenced by InteractiveConsole::analyze(), FWAnalyzer::dumpEST(), dumpEST(), CLU::dumpEST(), MSTNode::getESTInfo(), and InteractiveConsole::print().

std::string EST::getLine ( FILE *  fp  )  [static]

Helper method to read a line from a given file.

This is a helper method that can be used to read a long line from a given file.

Parameters:
[in] fp The file from where the line is to be read.
Returns:
The string read from the file.
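
One way to read an arbitrarily long line from a FILE* is to accumulate fixed-size chunks until a newline is seen. The sketch below shows that approach; getLineSketch is a hypothetical name, and stripping the trailing newline is an assumption, since this page does not say whether the returned string includes it.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Read one line of any length by appending fgets() chunks until a newline
// or end of file is reached. Returns an empty string at end of file.
std::string getLineSketch(FILE* fp) {
    std::string line;
    if (fp == NULL) {
        return line;
    }
    char buffer[64];
    while (std::fgets(buffer, sizeof(buffer), fp) != NULL) {
        line += buffer;
        const std::size_t len = line.size();
        if (len > 0 && line[len - 1] == '\n') {
            line.erase(len - 1);  // Assumption: the newline is not returned.
            break;
        }
    }
    return line;
}
```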

Definition at line 76 of file EST.cpp.

Referenced by create().

size_t EST::getMaxESTLen (  )  [static]

Helper method to determine the longest EST.

This method can be used to determine the length of the longest EST loaded thus far. This information is typically used to allocate buffers and other data structures for analysis.

Note:
This method computes the length of the longest EST the first time it is invoked. Consequently, it should be called only after all the ESTs have been loaded.
Returns:
The length of the longest EST to be processed.

Definition at line 292 of file EST.cpp.

References estList, getSequence(), and maxESTlen.

Referenced by D2::buildWordTable(), TwoPassD2::initialize(), TVHeuristic::initialize(), and D2Zim::initialize().

int EST::getProcessedESTCount (  )  [static]

Obtain count of ESTs that have been flagged as being processed.

This method can be used to determine the number of ESTs that have been flagged as being processed. Subtracting this number from the total number of ESTs indicates the number of ESTs to be processed.

Note:
This method iterates over the list of ESTs to determine the current number of processed ESTs. So use this method sparingly.
Returns:
The number of ESTs that have been flagged as having been processed.
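
The linear scan mentioned in the note can be sketched as follows. FlaggedSketch and countProcessedSketch are hypothetical names that model only the processed flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record carrying only the processed flag.
struct FlaggedSketch {
    bool processed;
};

// Count flagged entries with one pass over the list. The cost grows with the
// list size, which is why the count should be requested sparingly.
int countProcessedSketch(const std::vector<FlaggedSketch>& list) {
    int count = 0;
    for (std::size_t i = 0; i < list.size(); i++) {
        if (list[i].processed) {
            count++;
        }
    }
    return count;
}
```

The number of entries still awaiting processing is then the list size minus this count.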

Definition at line 280 of file EST.cpp.

References estList, and processed.

Referenced by MSTClusterMaker::manager().

const char* EST::getSequence (  )  const [inline]

Obtain the actual sequence of base pairs for this EST.

Note that sequence information for an EST can be NULL if it is only partially loaded from a file. Entries are partially loaded to reduce memory footprint when processing large data sets.

Returns:
The actual sequence of base pairs for this EST if available. Otherwise this method returns NULL.

Definition at line 296 of file EST.h.

References sequence.

Referenced by CLU::buildHashMaps(), UVSampleHeuristic::computeHash(), NewUVHeuristic::computeHash(), TVHeuristic::countCommonWords(), dumpEST(), CLU::dumpEST(), FWAnalyzer::getFrame(), getMaxESTLen(), D2Zim::getMetric(), CLU::getMetric(), InteractiveConsole::print(), D2::runD2(), LengthFilter::runFilter(), TVHeuristic::runHeuristic(), UVSampleHeuristic::setReferenceEST(), TwoPassD2::setReferenceEST(), NewUVHeuristic::setReferenceEST(), D2Zim::setReferenceEST(), and D2::setReferenceEST().

float EST::getSimilarity (  )  const [inline]

Obtain the similarity metric for this EST.

The similarity metric is a quantitative representation of the similarity between two ESTs. The similarity metric is generated during analysis when one EST is compared with another. The similarity value is initialized to -1.

Returns:
The similarity metric for this EST.

Definition at line 307 of file EST.h.

References similarity.

Referenced by FWAnalyzer::dumpEST(), and CLU::dumpEST().

bool EST::hasBeenProcessed (  )  const [inline]

Determine if this EST has already been processed.

This method exposes a generic flag that is provided as a convenience for algorithms to mark if this EST has gone through their processing.

Returns:
This method returns true if this EST has been flagged as having been processed. Otherwise this method returns false.

Definition at line 448 of file EST.h.

References processed.

Referenced by FilterChain::applyFilters(), MSTClusterMaker::manager(), PMSTClusterMaker::populateCache(), and MSTClusterMaker::populateCache().

void EST::normalizeBases ( std::string &  sequence,
const bool  maskBases = true 
) [static, private]

Helper method to normalize a given nucleotide sequence.

This method is used to normalize fragments read from a FASTA file. This method normalizes the sequences such that the resulting sequence is over the set {'A', 'T', 'C', 'G', 'N'} in the following manner:

  • If the maskBases flag is true, then all lowercase nucleotides are converted to 'N'. Otherwise they are converted to uppercase equivalents.

  • All nucleotides that are not in "ATCG" are converted to 'N'.

Parameters:
[in,out] sequence The sequence of nucleotides to be normalized by this method.
[in] maskBases If this flag is true, then all lowercase "atcg" bases are converted to 'N'. Otherwise they are converted to uppercase letters.
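
The two rules above can be sketched as a single pass over the sequence. This is an illustration written from the description, not the PEACE source, and normalizeBasesSketch is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Normalize a nucleotide sequence in place so that it contains only
// 'A', 'T', 'C', 'G', and 'N'.
void normalizeBasesSketch(std::string& sequence, const bool maskBases = true) {
    for (std::string::size_type i = 0; i < sequence.size(); i++) {
        const unsigned char base = static_cast<unsigned char>(sequence[i]);
        char out = static_cast<char>(base);
        if (std::islower(base)) {
            // Rule 1: lowercase bases are masked or converted to uppercase.
            out = maskBases ? 'N' : static_cast<char>(std::toupper(base));
        }
        // Rule 2: anything that is not in "ATCG" becomes 'N'.
        if (out != 'A' && out != 'T' && out != 'C' && out != 'G') {
            out = 'N';
        }
        sequence[i] = out;
    }
}
```

For example, with masking enabled the input "ACgtXn" becomes "ACNNNN", and with masking disabled it becomes "ACGTNN".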

Definition at line 250 of file EST.cpp.

Referenced by create().

EST & EST::operator= ( const EST &  src  )  [private]

A dummy operator=.

The operator=() is suppressed for this class as it has constant members whose value is set when the object is created. These values cannot be changed during the lifetime of this object.

Parameters:
[in] src The source object from where data is to be copied. Currently this value is ignored.
Returns:
Reference to this.

Definition at line 275 of file EST.cpp.

bool EST::repopulate ( FILE *  fastaFile  ) 

Repopulate necessary information from a given fasta file.

This method can be used to request an EST to repopulate its FASTA header and actual sequence (base pair) information from a given FASTA file. This method uses the offset (saved when this EST was originally loaded) to load the information from the file.

Parameters:
[in,out] fastaFile The file from where the EST information is to be loaded. If the file changes during EST analysis the behavior of this method is undefined.
Returns:
This method returns true if the data was successfully repopulated. On errors this method returns false.

void EST::setCustomData ( ESTCustomData *  src  )  [inline]

Change the custom data associated with this EST.

This method can be used to change (or set) the custom data associated with this EST. Note that any earlier custom data associated with this EST is lost (and deleted if necessary by auto_ptr) before the new value is set.

Parameters:
[in,out] src The new custom data to be set for this EST. After this call, this EST owns the data referred by src.

Definition at line 372 of file EST.h.

References customData.

void EST::setCustomData ( std::auto_ptr< ESTCustomData > &  src  )  [inline]

Change the custom data associated with this EST.

This method can be used to change (or set) the custom data associated with this EST. Note that any earlier custom data associated with this EST is lost (and deleted if necessary by auto_ptr) before the new value is set.

Parameters:
[in,out] src The new custom data to be set for this EST. After this call, this EST owns the data referred by src.
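
The ownership transfer described here can be sketched with std::unique_ptr, which is swapped in for std::auto_ptr because auto_ptr was deprecated in C++11 and removed in C++17. CustomDataSketch and HolderSketch are hypothetical types; the live-object counter exists only to make the deletion of replaced data observable.

```cpp
#include <memory>
#include <utility>

// Hypothetical custom data type that counts live instances.
struct CustomDataSketch {
    static int liveCount;
    int value;
    explicit CustomDataSketch(int v) : value(v) { ++liveCount; }
    ~CustomDataSketch() { --liveCount; }
};
int CustomDataSketch::liveCount = 0;

// Hypothetical holder mirroring the two setCustomData() overloads.
class HolderSketch {
public:
    // Take ownership from another smart pointer.
    void setCustomData(std::unique_ptr<CustomDataSketch> src) {
        customData = std::move(src);  // previously held data is deleted
    }
    // Take ownership of a raw pointer.
    void setCustomData(CustomDataSketch* src) {
        customData.reset(src);  // previously held data is deleted
    }
    const std::unique_ptr<CustomDataSketch>& getCustomData() const {
        return customData;
    }

private:
    std::unique_ptr<CustomDataSketch> customData;  // empty by default
};
```

After either call the holder owns the data, so the caller must not delete it.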

Definition at line 360 of file EST.h.

References customData.

void EST::setProcessed ( const bool  processedFlag  )  [inline]

Set if this EST has already been processed.

This method provides a generic flag as a convenience for algorithms to mark if this EST has gone through their processing. By default ESTs are marked as processed when they are instantiated.

Parameters:
[in] processedFlag If this flag is true then this EST is flagged as having been processed. If this flag is false then the EST is flagged as not-processed (and requiring processing).

Definition at line 462 of file EST.h.

References processed.

Referenced by LCFilter::addDummyEntry(), and MSTClusterMaker::addEST().

void EST::setSimilarity ( const float  sim  )  [inline]

Set the similarity metric for an EST.

This method must be used to change the similarity metric for this EST. The similarity metric is a quantitative representation of the similarity between two ESTs. The similarity metric is generated during analysis when one EST is compared with another.

Parameters:
[in] sim The similarity metric value to which this EST's similarity must be changed.

Definition at line 320 of file EST.h.

References similarity.

void EST::unpopulate (  ) 

Method to clear general information and sequence data.

This method can be used to unpopulate the FASTA header and actual sequence (base pairs) information from this EST. This frees up memory allocated to hold this data thereby minimizing the memory footprint for this EST. This enables holding a large number of skeleton EST's in memory.
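
The skeleton idea can be sketched as a record that keeps only its file offset while unpopulated and reloads the header and sequence on demand. SkeletonSketch is a hypothetical type; the parsing below assumes each record starts with a '>' header line and runs until the next '>' or end of file, which may differ from the actual loader.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical skeleton record: the offset is always kept, while the header
// and sequence can be dropped and reloaded.
struct SkeletonSketch {
    long offset;
    std::string info;
    std::string sequence;

    explicit SkeletonSketch(const long fileOffset) : offset(fileOffset) {}

    // Release the memory held by the header and sequence.
    void unpopulate() {
        std::string().swap(info);
        std::string().swap(sequence);
    }

    // Seek to the saved offset and reload the header and sequence.
    bool repopulate(FILE* fastaFile) {
        if (fastaFile == NULL || offset < 0) {
            return false;
        }
        if (std::fseek(fastaFile, offset, SEEK_SET) != 0) {
            return false;
        }
        if (std::fgetc(fastaFile) != '>') {
            return false;  // the offset does not point at a header line
        }
        info.clear();
        sequence.clear();
        int c;
        while ((c = std::fgetc(fastaFile)) != EOF && c != '\n') {
            info += static_cast<char>(c);
        }
        while ((c = std::fgetc(fastaFile)) != EOF) {
            if (c == '>') {  // start of the next record
                std::ungetc(c, fastaFile);
                break;
            }
            if (c != '\n' && c != '\r') {
                sequence += static_cast<char>(c);
            }
        }
        return true;
    }
};
```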

Definition at line 66 of file EST.cpp.

References info, and sequence.

Referenced by ESTAnalyzer::loadFASTAFile(), and ~EST().


Member Data Documentation

std::auto_ptr<ESTCustomData> EST::customData [protected]

Place holder for some other custom data.

This pointer acts as a convenient place holder for other classes to associate some uninterpreted user data (or data structure). This member is initialized to NULL in the constructor. Note that this pointer is managed using an auto_ptr that automatically deletes the data when the auto_ptr loses ownership of the data object.

Definition at line 541 of file EST.h.

Referenced by getCustomData(), and setCustomData().

std::vector< EST * > EST::estList [static, private]

The list of ESTs currently being used.

This list contains the complete set of ESTs that are currently defined. This list includes partially loaded ESTs as well. New entries are added to the list by the create method.

Definition at line 598 of file EST.h.

Referenced by create(), deleteAllESTs(), deleteLastESTs(), dumpESTList(), getEST(), getESTCount(), getESTList(), getMaxESTLen(), and getProcessedESTCount().

const int EST::id [protected]

The unique ID for this EST.

This member holds the unique ID for this EST. The ID is set when the EST is instantiated and is never changed during the life time of this EST. The id is used to access and extract EST information.

Definition at line 473 of file EST.h.

Referenced by deleteAllESTs(), and dumpESTList().

char* EST::info [protected]

The name and other information associated with the EST.

This information is typically the first header line read from a FASTA file. The information may be dynamically loaded on demand to reduce memory footprint when processing large data sets.

Definition at line 481 of file EST.h.

Referenced by EST(), and unpopulate().

size_t EST::maxESTlen = 0 [static, protected]

Size of the longest EST.

This static instance variable is used to track the size (in number of nucleotides) of the longest EST ever instantiated. The size of the longest EST can be used by algorithms to optimally allocate memory for processing ESTs.

Definition at line 530 of file EST.h.

Referenced by create(), and getMaxESTLen().

const long EST::offset [protected]

The offset in the FASTA file to load the data from.

The offset in the FASTA file from where this EST was read. This information can be used to conditionally and rapidly load ESTs from a file. This value is initialized when the EST is instantiated and is never changed during the lifetime of the object.

Definition at line 498 of file EST.h.

Referenced by create().

bool EST::processed [protected]

Instance variable to track if EST has gone through some processing.

This is a generic flag that is provided as a convenience for algorithms to mark if this EST has gone through their processing. By default this instance variable is initialized to false. Once it has been processed, the setProcessed() method can be used to set/reset this flag. The hasBeenProcessed() method can be used to determine if this EST has already been processed.

Definition at line 521 of file EST.h.

Referenced by EST(), getProcessedESTCount(), hasBeenProcessed(), and setProcessed().

char* EST::sequence [protected]

The actual sequence of base pairs associated with this EST.

This information is typically read from a FASTA file. The information may be dynamically loaded on demand to reduce memory footprint when processing large data sets.

Definition at line 488 of file EST.h.

Referenced by create(), EST(), getSequence(), and unpopulate().

float EST::similarity [protected]

A similarity value for this EST with respect to another EST.

This instance variable is used to hold a similarity metric for this EST. The similarity metric is generated during analysis when one EST is compared with another. The similarity value is initialized to -1. It is accessed via the getSimilarity() method and changed via the setSimilarity() method.

Definition at line 508 of file EST.h.

Referenced by EST(), getSimilarity(), EST::LessEST::operator()(), and setSimilarity().


The documentation for this class was generated from the following files:

  • EST.h

  • EST.cpp

Generated on 19 Mar 2010 for PEACE by  doxygen 1.6.1