A filter to weed out reads with Low Complexity (LC) sections.
#include <LCFilter.h>
Public Member Functions | |
virtual void | showArguments (std::ostream &os) |
Display valid command line arguments for this filter. | |
virtual bool | parseArguments (int &argc, char **argv) |
Process command line arguments. | |
virtual int | initialize () |
Method to begin filter analysis (if any). | |
virtual void | finalize () |
Method to indicate completion of filter analysis. | |
virtual | ~LCFilter () |
The destructor. | |
Protected Member Functions | |
LCFilter (ClusterMaker *clusterMaker) | |
The default constructor. | |
virtual int | runFilter (const int estIdx) |
Apply filter rules to determine if a given EST should be filtered out. | |
virtual void | addDummyEntry (const std::string &fastaID, const std::string &seq, const int length) |
Add a dummy entry with given sequence and length to the list of ESTs. | |
Private Member Functions | |
LCFilter & | operator= (const LCFilter &src) |
A dummy operator=. | |
Private Attributes | |
std::vector< DummyESTInfo > | dummyESTList |
A list containing information about dummy ESTs. | |
Static Private Attributes | |
static arg_parser::arg_record | argsList [] |
The set of arguments specific to this filter. | |
static char * | patternList = LCFilter::DefaultPatternList |
The list of patterns to be used for generating dummy ESTs. | |
static char | DefaultPatternList [] = "A,C" |
The default pattern used by this filter. | |
static int | threshold = -1 |
The filter's similarity/distance threshold metric. | |
Friends | |
class | FilterFactory |
A filter to weed out reads with Low Complexity (LC) sections.
This class provides a filter that can be used to filter out ESTs that contain regions of Low Complexity (LC) reads. This filter is needed when clustering FASTA data that contains low complexity reads, because the LC sections create "false" relationships between ESTs, giving rise to very large clusters. These large clusters arise because the low complexity reads establish transitive relationships between otherwise unrelated ESTs.
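The transitive effect described above can be illustrated with a small union-find structure. This is only a sketch of the reasoning, not PEACE's clustering code: the class `DisjointSets` and the three-read scenario are hypothetical names introduced for illustration.

```cpp
#include <numeric>
#include <vector>

// Minimal union-find (disjoint sets), used here only to illustrate
// how pairwise links become transitive cluster membership.
class DisjointSets {
public:
    explicit DisjointSets(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each read starts alone
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void link(int a, int b) { parent[find(a)] = find(b); }
    bool sameCluster(int a, int b) { return find(a) == find(b); }

private:
    std::vector<int> parent;
};

// Hypothetical scenario: reads 0 and 2 are unrelated, but each looks
// similar to read 1 because all three contain a low complexity stretch.
bool unrelatedReadsMerge() {
    DisjointSets clusters(3);
    clusters.link(0, 1);  // "false" relationship via the LC section
    clusters.link(1, 2);  // another "false" relationship
    return clusters.sameCluster(0, 2);
}
```

In this sketch reads 0 and 2 are never compared directly, yet they end up in the same cluster, which is how a handful of low complexity reads can pull many ESTs together.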
In order to avoid super-clusters that get formed due to low complexity reads, PEACE adds dummy ESTs (such as one with all "AAAAA...." and another with all "CCCCCC...") based on the patterns specified by the user. The length of each dummy EST is twice the length of the largest window used for analysis. All ESTs are subjected to the same analysis, and ESTs that are sufficiently similar to the dummy entries are filtered out.
This filter creates several dummy clusters (one per pattern specified) with the meta name "Low Complexity ESTs (filtered by LCFilter Pattern AA)" and adds any ESTs filtered out by this filter to the appropriate cluster.
This class has been developed by extending the Filter base class and implementing the necessary API methods specified by the base class. This enables the LCFilter to be used in the FilterChain along with other filters to filter out ESTs. In addition, note that this class cannot be directly instantiated. Instead, the FilterFactory::create() method must be used to obtain an instance of this class.
Definition at line 96 of file LCFilter.h.
virtual LCFilter::~LCFilter | ( | ) | [inline, virtual] |
The destructor.
The destructor for the filter. The destructor currently has no specific tasks to perform as this filter does not use any dynamic memory.
Definition at line 164 of file LCFilter.h.
LCFilter::LCFilter | ( | ClusterMaker * | clusterMaker | ) | [protected] |
The default constructor.
The constructor has been made protected to ensure that this class is never directly instantiated. Instead an instance should be created via a suitable call to the FilterFactory API method(s).
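The protected-constructor arrangement can be sketched as follows. The names `SketchFilter`, `SketchFactory`, and the string key `"lcFilter"` are hypothetical and chosen for illustration only; the actual signature of FilterFactory::create() is not shown in this page and is not assumed here.

```cpp
#include <memory>
#include <string>

class SketchFactory;  // hypothetical factory, declared ahead for the friend clause

class SketchFilter {
    friend class SketchFactory;  // only the factory may call the constructor

public:
    virtual ~SketchFilter() {}
    const std::string& name() const { return filterName; }

protected:
    // Protected: client code cannot instantiate this class directly.
    explicit SketchFilter(const std::string& n) : filterName(n) {}

private:
    std::string filterName;
};

class SketchFactory {
public:
    // Returns a filter for a known name, or a null pointer otherwise.
    static std::unique_ptr<SketchFilter> create(const std::string& name) {
        if (name != "lcFilter") {
            return nullptr;
        }
        // 'new' is used directly because std::make_unique is not a friend.
        return std::unique_ptr<SketchFilter>(new SketchFilter(name));
    }
};
```

With this layout, `SketchFilter f("x");` in client code fails to compile, while `SketchFactory::create("lcFilter")` succeeds.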
[in] | clusterMaker | The cluster maker class that is being used for analysis. This parameter is simply passed on to the base class for its use. It is used by the initialize method to create a dummy cluster for use by this filter. |
Definition at line 58 of file LCFilter.cpp.
void LCFilter::addDummyEntry | ( | const std::string & | fastaID, | |
const std::string & | seq, | |||
const int | length | |||
) | [protected, virtual] |
Add a dummy entry with given sequence and length to the list of ESTs.
This method is invoked from the initialize method to add a dummy EST. Typically two dummy ESTs (one all "AAAAA..." and another one with all "CCCC....") are added. This is dummyESTList.
[in] | fastaID | The FASTA ID to be assigned to the dummy EST. This FASTA ID is not really useful but is included for completeness. |
[in] | seq | The nucleotide sequences to be repeated in order to generate the complete FASTA sequence for the dummy EST. This sequence must be at least one character in length. Only valid nucleotide base pairs must be |
[in] | length | The minimum length of the generated sequence. Note that if the length is not an integral multiple of the pattern length, then this method may generate a slightly longer EST. |
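The relationship between the seq pattern and the minimum length can be sketched as below. This is a minimal illustration assuming the pattern is always appended whole; the function name `makeDummySequence` is hypothetical and is not part of the LCFilter API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Repeat 'pattern' (appended whole each time) until the result holds
// at least 'minLength' characters. The result may exceed 'minLength'
// when 'minLength' is not a multiple of the pattern length.
std::string makeDummySequence(const std::string& pattern,
                              std::size_t minLength) {
    if (pattern.empty()) {
        throw std::invalid_argument("pattern must have at least one character");
    }
    std::string seq;
    seq.reserve(minLength + pattern.size());
    while (seq.size() < minLength) {
        seq += pattern;
    }
    return seq;
}
```

For example, under these assumptions the pattern "AG" with a minimum length of 5 yields the six-character sequence "AGAGAG".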
Definition at line 122 of file LCFilter.cpp.
References ClusterMaker::addDummyCluster(), Filter::clusterMaker, EST::create(), dummyESTList, EST::getESTCount(), and EST::setProcessed().
Referenced by initialize().
void LCFilter::finalize | ( | ) | [virtual] |
Method to indicate completion of filter analysis.
This method is invoked after all the filtration operations have been successfully completed. This method removes all the dummy ESTs that were created and added by this filter.
Implements Filter.
Definition at line 146 of file LCFilter.cpp.
References EST::deleteLastESTs(), and dummyESTList.
int LCFilter::initialize | ( | ) | [virtual] |
Method to begin filter analysis (if any).
This method is invoked just before commencement of filtration. This method creates a set of dummy ESTs and corresponding dummy clusters for filtering out entries with low complexity regions. It uses the patternList to add dummy ESTs.
Implements Filter.
Definition at line 85 of file LCFilter.cpp.
References addDummyEntry(), ASSERT, Filter::clusterMaker, DefaultPatternList, ClusterMaker::getAnalyzer(), ESTAnalyzer::getInvalidMetric(), ESTAnalyzer::getPreferredDummyESTLength(), patternList, and threshold.
LCFilter & LCFilter::operator= | ( | const LCFilter & | src | ) | [private] |
A dummy operator=.
The operator=() is suppressed for this class as it has constant members whose value is set when the object is created. These values cannot be changed during the lifetime of this object.
[in] | src | The source object from where data is to be copied. Currently this value is ignored. |
Reimplemented from Filter.
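The reasoning behind suppressing assignment can be shown with a small stand-alone class. This sketch uses the C++11 `= delete` form rather than the private dummy operator declared by LCFilter; the class `ConstMemberFilter` and its member are hypothetical.

```cpp
#include <type_traits>

class ConstMemberFilter {
public:
    explicit ConstMemberFilter(int id) : filterId(id) {}
    int id() const { return filterId; }

    // A const member cannot be overwritten after construction, so
    // assignment is disabled explicitly instead of being left to fail
    // with a less obvious compiler diagnostic.
    ConstMemberFilter& operator=(const ConstMemberFilter&) = delete;

private:
    const int filterId;  // fixed for the lifetime of the object
};

// Compile-time check that assignment is indeed unavailable.
static_assert(!std::is_copy_assignable<ConstMemberFilter>::value,
              "assignment must be unavailable");
```

Declaring the operator private without a usable body, as this class does, achieves the same effect in pre-C++11 code.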
bool LCFilter::parseArguments | ( | int & | argc, | |
char ** | argv | |||
) | [virtual] |
Process command line arguments.
This method is used to process command line arguments specific to this filter. This method is typically used from the main method just after the filter has been instantiated. This method consumes all valid command line arguments. If the command line arguments were valid and successfully processed, then this method returns true.
[in,out] | argc | The number of command line arguments to be processed. This value is updated when valid command line arguments are consumed by the filter. |
[in,out] | argv | The array of command line arguments. The number of entries in this array is updated when valid arguments are consumed by the filter. |
Returns true if the command line arguments were successfully processed. Otherwise this method returns false.
Implements Filter.
Definition at line 71 of file LCFilter.cpp.
References arg_parser::check_args(), Filter::filterName, and patternList.
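The "consume and update" behavior of argc and argv can be sketched as follows. This is an illustration only and does not reproduce arg_parser::check_args(): the struct `LcOptions` and the function `consumeLcArguments` are hypothetical, and the assumption that each option is followed by a separate value argument is made for the sketch.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical holder for the two filter options.
struct LcOptions {
    std::string patterns = "A,C";
    int threshold = -1;
};

// Consume "--lcPatterns <list>" and "--lcThreshold <int>" from argv.
// Unrecognized arguments are kept, in order, at the front of argv and
// argc is reduced to the number of arguments kept. Returns false when
// an option is missing its value; argv is then left in an unspecified
// order and argc is unchanged.
bool consumeLcArguments(int& argc, char** argv, LcOptions& opts) {
    int kept = 0;
    for (int i = 0; i < argc; ++i) {
        const bool isPatterns = std::strcmp(argv[i], "--lcPatterns") == 0;
        const bool isThreshold = std::strcmp(argv[i], "--lcThreshold") == 0;
        if (!isPatterns && !isThreshold) {
            argv[kept++] = argv[i];  // not ours: leave for other parsers
            continue;
        }
        if (i + 1 >= argc) {
            return false;  // option given without a value
        }
        if (isPatterns) {
            opts.patterns = argv[i + 1];
        } else {
            opts.threshold = std::atoi(argv[i + 1]);
        }
        ++i;  // skip the value that was just consumed
    }
    argc = kept;
    return true;
}
```

Leaving unrecognized arguments in place lets several filters in a chain parse the same command line one after another.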
int LCFilter::runFilter | ( | const int | estIdx | ) | [protected, virtual] |
Apply filter rules to determine if a given EST should be filtered out.
This method is invoked from the applyFilter() method to perform the actual filtering. The filtering is performed on the given EST in the following manner:
This method obtains the analyzer from the cluster maker and sets the given estIdx as the reference EST for analysis.
For each dummy EST in dummyESTList this method performs the following tasks:
[in] | estIdx | The index of the EST to be tested and filtered by this method. |
Implements Filter.
Definition at line 157 of file LCFilter.cpp.
References ESTAnalyzer::analyze(), ASSERT, Filter::clusterMaker, ESTAnalyzer::compareMetrics(), dummyESTList, ClusterMaker::getAnalyzer(), ESTAnalyzer::setReferenceEST(), and threshold.
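The per-dummy comparison loop can be sketched as below. The individual steps and the return convention of runFilter() are not listed in this page, so everything here is an assumption: the shared-word count is a stand-in for whatever metric the ESTAnalyzer provides, and the names `sharedWords` and `runFilterSketch`, the word length `k`, and the "index or -1" return value are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Stand-in similarity metric: the number of length-k words in 'read'
// that also occur somewhere in 'dummy'. This is NOT the analyzer used
// by PEACE; it only gives the loop below something to compare.
int sharedWords(const std::string& read, const std::string& dummy,
                std::size_t k) {
    if (k == 0 || read.size() < k || dummy.size() < k) {
        return 0;
    }
    std::unordered_set<std::string> dummyWords;
    for (std::size_t i = 0; i + k <= dummy.size(); ++i) {
        dummyWords.insert(dummy.substr(i, k));
    }
    int shared = 0;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        if (dummyWords.count(read.substr(i, k)) != 0) {
            ++shared;
        }
    }
    return shared;
}

// Compare one read against every dummy sequence. Returns the index of
// the first dummy whose metric reaches 'threshold' (the read would be
// filtered into that dummy's cluster), or -1 to keep the read.
int runFilterSketch(const std::string& read,
                    const std::vector<std::string>& dummies,
                    std::size_t k, int threshold) {
    for (std::size_t d = 0; d < dummies.size(); ++d) {
        if (sharedWords(read, dummies[d], k) >= threshold) {
            return static_cast<int>(d);
        }
    }
    return -1;
}
```

In this sketch a larger metric means "more similar"; a distance-based analyzer would reverse the comparison, which is presumably why the real method delegates to ESTAnalyzer::compareMetrics().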
void LCFilter::showArguments | ( | std::ostream & | os | ) | [virtual] |
Display valid command line arguments for this filter.
This method must be used to display all valid command line options that are supported by this filter. This method overrides the corresponding method in the base class API. This method is typically used in the main() method when displaying usage information.
[out] | os | The output stream to which the valid command line arguments must be written. |
Implements Filter.
Definition at line 64 of file LCFilter.cpp.
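Writing an option table to a caller-supplied stream can be sketched as follows. The struct `OptionHelp`, the table `kLcOptions`, and both functions are hypothetical; they mirror the option names and help strings from argsList but do not use the arg_parser types, whose interface is not shown here.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical help record; a null 'name' marks the end of a table.
struct OptionHelp {
    const char* name;
    const char* help;
};

static const OptionHelp kLcOptions[] = {
    {"--lcPatterns", "List of (, separated) patterns to generate dummy ESTs"},
    {"--lcThreshold", "Threshold value to detect low complexity sequences"},
    {nullptr, nullptr}  // sentinel
};

// Write one line per option to the given stream.
void showOptions(std::ostream& os, const OptionHelp* table) {
    for (const OptionHelp* opt = table; opt->name != nullptr; ++opt) {
        os << "  " << opt->name << "  " << opt->help << '\n';
    }
}

// Convenience wrapper that captures the help text in a string.
std::string optionsAsText() {
    std::ostringstream out;
    showOptions(out, kLcOptions);
    return out.str();
}
```

Taking a `std::ostream&` rather than printing to a fixed stream lets the same routine serve usage messages on the console and captured text in tests.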
friend class FilterFactory [friend] |
Definition at line 97 of file LCFilter.h.
arg_parser::arg_record LCFilter::argsList [static, private] |
{
  {"--lcPatterns", "List of (, separated) patterns to generate dummy ESTs", &LCFilter::patternList, arg_parser::STRING},
  {"--lcThreshold", "Threshold value to detect low complexity sequences", &LCFilter::threshold, arg_parser::INTEGER},
  {NULL, NULL, NULL, arg_parser::BOOLEAN}
}
The set of arguments specific to this filter.
This static member contains the list of arguments that are specific to this filter class. This argument list is statically defined and shared by all instances of this class.
Definition at line 277 of file LCFilter.h.
char LCFilter::DefaultPatternList[] = "A,C" [static, private] |
The default pattern used by this filter.
This constant defines the default pattern that is used by this filter to generate dummy ESTs for identifying low complexity regions. The current default pattern is "A,C". This causes two dummy ESTs (one with all AAAAA... and another EST with all CCCC...) to be created and used for filtering.
Definition at line 298 of file LCFilter.h.
Referenced by initialize().
std::vector<DummyESTInfo> LCFilter::dummyESTList [private] |
A list containing information about dummy ESTs.
This list is used to hold the core information about dummy ESTs created, used, and (finally) removed by this filter. For each pattern specified by the user (as a command line argument), a dummy EST is added (to the global list of ESTs) and this list maintains information about them. Entries are added to this list in the initialize() method. The entries are used in the runFilter() method. The finalize() method clears out the dummy ESTs and the entries in this list.
Definition at line 265 of file LCFilter.h.
Referenced by addDummyEntry(), finalize(), and runFilter().
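The add, use, and remove life cycle of the dummy entries can be sketched as below. This is an assumption-laden illustration: a plain `std::vector<std::string>` stands in for the global list of ESTs, and the names `DummyInfo` and `DummyLifecycle` are hypothetical (the fields of the real DummyESTInfo are not shown in this page).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record kept per dummy entry.
struct DummyInfo {
    std::size_t estIdx;   // position of the dummy in the shared read list
    std::string pattern;  // pattern the dummy was generated from
};

class DummyLifecycle {
public:
    explicit DummyLifecycle(std::vector<std::string>& allReads)
        : reads(allReads) {}

    // Append one dummy per non-empty pattern to the end of the shared list.
    void initialize(const std::vector<std::string>& patterns,
                    std::size_t length) {
        for (const std::string& p : patterns) {
            if (p.empty()) {
                continue;
            }
            std::string seq;
            while (seq.size() < length) {
                seq += p;
            }
            dummies.push_back(DummyInfo{reads.size(), p});
            reads.push_back(seq);
        }
    }

    // Dummies were appended last, so dropping the trailing entries
    // restores the original list.
    void finalize() {
        reads.resize(reads.size() - dummies.size());
        dummies.clear();
    }

    std::size_t dummyCount() const { return dummies.size(); }

private:
    std::vector<std::string>& reads;
    std::vector<DummyInfo> dummies;
};
```

The sketch relies on nothing else being appended to the shared list between initialize() and finalize(); otherwise removing trailing entries would drop the wrong reads.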
char * LCFilter::patternList = LCFilter::DefaultPatternList [static, private] |
The list of patterns to be used for generating dummy ESTs.
This variable is used to refer to the set of patterns that must be used to generate the dummy ESTs for identifying low complexity regions. The list is in the form "A,C,AG,TC", that is, a comma-separated list of patterns. The patterns are repeated to generate dummy entries. The pattern list can be set via the --lcPatterns command line argument.
Definition at line 288 of file LCFilter.h.
Referenced by initialize(), and parseArguments().
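Splitting a list of the form "A,C,AG,TC" into individual patterns can be sketched as follows. The function name `splitPatterns` is hypothetical, and skipping empty items (for example from "A,,C") is a choice made for this sketch rather than documented behavior.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated pattern list into its individual patterns.
// Empty items are skipped.
std::vector<std::string> splitPatterns(const std::string& list) {
    std::vector<std::string> patterns;
    std::istringstream in(list);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (!item.empty()) {
            patterns.push_back(item);
        }
    }
    return patterns;
}
```

Each resulting pattern would then be repeated to build one dummy EST.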
int LCFilter::threshold = -1 [static, private] |
The filter's similarity/distance threshold metric.
This variable is used to contain the similarity/distance metric to be used as the threshold value. This value is compared with the metric provided by the ESTAnalyzer to determine if a given EST contains a low complexity section. This value is an important metric. This value is set via the --lcThreshold command line argument.
Definition at line 309 of file LCFilter.h.
Referenced by initialize(), and runFilter().