dbStat
Class ClusterUtils

java.lang.Object
  extended by dbStat.ClusterUtils

public class ClusterUtils
extends java.lang.Object

This class defines the worker functions that do most of the work of implementing k-means clustering.


Field Summary
private static java.util.ArrayList attributeVectors
          The collection of AttributeVectors which are vectors of statistical values for each individual stock in the stockstat table.
private static AttributeVector paramVector
          The cluster parameter vector that indicates which components are chosen for the current clustering run (1 if chosen, 0 otherwise).
static java.util.Random randGen
          A class-wide random number generator to be used for all random number generation
 
Constructor Summary
ClusterUtils()
           
 
Method Summary
static double calculateDistance(AttributeVector vector1, AttributeVector vector2)
          Calculates the Euclidean distance between two n-dimensional vectors
static AttributeVector cloneVector(AttributeVector originalVector)
          Returns an exact clone of the given vector
static void fetchAndLoadStatistics()
          Runs a query and retrieves the results set to get every ticker and value from the table containing statistical information.
static int findMinCluster(AttributeVector vector, double threshold)
          Finds the cluster to which the vector is closest
static double findVectorClusterDist(AttributeVector vector, Cluster cluster)
          Finds the distance between a vector and a cluster's centroid vector
static java.lang.String generateDetailedDescription()
          Creates a detailed description of the outcome of the clustering
static AttributeVector getParamVector()
          Gets the parameter vector for the clustering run
static double getSampleD()
          Computes an initial threshold distance: a vector whose distance to every existing cluster's centroid exceeds this value starts a new cluster.
static java.util.HashMap getStockCompanyHash()
          Gets a HashMap whose keys are stock tickers and whose values are company names
static java.util.ArrayList getVectorSpace()
          Gets the vectors that compose the vector space
static void initializeParamVector()
          Initializes the parameter vector, setting every component to 1 to indicate that all statistics are chosen for the clustering run
static void initializeParamVector(java.util.ArrayList changedVector)
          Takes a list of "1" and "0" strings and sets the paramVector class variable to the corresponding integer values
static boolean isVectorSpaceNull()
          Returns whether the vector space is null.
static void normalizeVectors()
          Normalizes all vectors' values to the range [0, 1]
static java.util.ArrayList randomizeCollection(java.util.ArrayList list)
          Randomizes an ArrayList by repeatedly swapping pairs of objects
static void randomizeInternalCollection()
          Randomizes the vector space by repeatedly swapping pairs of randomly chosen vectors
static void runIncrementalD(double d)
          Runs the clustering with the inter-cluster cutoff distance set to d to produce an initial separation of the vector space into groups
static void runKmeans(int maxIter, WaitMessage wm)
          Runs the iterative k-means clustering algorithm until the maximum number of iterations is reached or no vectors move between clusters
static void setParamVector(AttributeVector pVector)
          Sets the parameter vector for the clustering run
static void writeResultsToFile(java.io.File fileName, java.lang.String data)
          Prints the results of the clustering to the specified file
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

attributeVectors

private static java.util.ArrayList attributeVectors
The collection of AttributeVectors which are vectors of statistical values for each individual stock in the stockstat table.


paramVector

private static AttributeVector paramVector
The cluster parameter vector that indicates which components are chosen for the current clustering run (1 if chosen, 0 otherwise).


randGen

public static java.util.Random randGen
A class-wide random number generator to be used for all random number generation

Constructor Detail

ClusterUtils

public ClusterUtils()

Method Detail

getParamVector

public static AttributeVector getParamVector()
Gets the parameter vector for the clustering run

Returns:
the parameter vector used in the clustering run

initializeParamVector

public static void initializeParamVector()
Initializes the parameter vector, setting every component to 1 to indicate that all statistics are chosen for the clustering run


initializeParamVector

public static void initializeParamVector(java.util.ArrayList changedVector)
Takes a list of "1" and "0" strings and sets the paramVector class variable to the corresponding integer values

Parameters:
changedVector - the list of "1"/"0" strings used to update the paramVector
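The string-to-integer conversion that initializeParamVector(java.util.ArrayList) describes can be sketched as follows. This is a minimal sketch, not the dbStat implementation: because the internals of AttributeVector are not documented here, it assumes the list holds only "1"/"0" strings and returns a plain int[]. The class ParamVectorSketch and its method are hypothetical names.

```java
import java.util.List;

// Hypothetical helper: AttributeVector's real API is not documented,
// so the parameter vector is modeled here as a plain int[].
final class ParamVectorSketch {

    // Converts a list of "1"/"0" strings into 0/1 integer components.
    // Any other value is rejected rather than silently accepted.
    static int[] parseParamVector(List<String> changedVector) {
        int[] params = new int[changedVector.size()];
        for (int i = 0; i < params.length; i++) {
            String s = changedVector.get(i).trim();
            if (!s.equals("0") && !s.equals("1")) {
                throw new IllegalArgumentException(
                        "Expected \"0\" or \"1\" at index " + i + " but found: " + s);
            }
            params[i] = Integer.parseInt(s);
        }
        return params;
    }
}
```

Rejecting values other than "0" and "1" is a design choice of the sketch; the documentation does not say how the real method handles malformed input.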

setParamVector

public static void setParamVector(AttributeVector pVector)
Sets the parameter vector for the clustering run

Parameters:
pVector - The parameter vector to be used for the clustering run

cloneVector

public static AttributeVector cloneVector(AttributeVector originalVector)
Returns an exact clone of the given vector

Parameters:
originalVector - The original vector to be cloned
Returns:
The clone

getVectorSpace

public static java.util.ArrayList getVectorSpace()
Gets the vectors that compose the vector space

Returns:
the collection of vectors that inhabit the clustering space

isVectorSpaceNull

public static boolean isVectorSpaceNull()
Returns whether the vector space is null.

A null vector space means that the system has not yet been initialized.

Returns:
true if the vector space is null, false otherwise

fetchAndLoadStatistics

public static void fetchAndLoadStatistics()
Runs a query that retrieves every ticker and value from the table containing statistical information.

The results are then loaded into the class's collection of AttributeVectors.


normalizeVectors

public static void normalizeVectors()
Normalizes all vectors' values to the range [0, 1]
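One common way to map values into [0, 1] is per-component min-max scaling, x' = (x - min) / (max - min). The documentation does not say which scheme normalizeVectors uses, so the sketch below assumes min-max scaling applied to each component independently, with vectors modeled as double[] rows. NormalizeSketch is a hypothetical name.

```java
// Hypothetical sketch: each vector is a double[] and every component
// (column) is rescaled independently with min-max normalization.
final class NormalizeSketch {

    // Rescales each component in place so its values span [0, 1].
    // A component whose values are all equal is set to 0 to avoid dividing by zero.
    static void normalize(double[][] vectors) {
        if (vectors.length == 0) {
            return;
        }
        int dims = vectors[0].length;
        for (int j = 0; j < dims; j++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] v : vectors) {
                min = Math.min(min, v[j]);
                max = Math.max(max, v[j]);
            }
            double range = max - min;
            for (double[] v : vectors) {
                v[j] = (range == 0.0) ? 0.0 : (v[j] - min) / range;
            }
        }
    }
}
```

Normalizing per component keeps a statistic with large raw values from dominating the Euclidean distances used later in clustering.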


runIncrementalD

public static void runIncrementalD(double d)
Runs the clustering with the inter-cluster cutoff distance set to d to produce an initial separation of the vector space into groups

Parameters:
d - the cutoff point for inter-cluster distances
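A single threshold-based pass of this kind, often called leader or sequential clustering, can be sketched as shown below. Each vector joins its nearest cluster if that cluster's centroid lies closer than d, and otherwise starts a new cluster. This is a sketch under stated assumptions, not the dbStat code: vectors are double[] arrays, each centroid is updated as a running mean after every addition, and clusters are returned as lists of vector indices. IncrementalDSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a single threshold-based clustering pass.
final class IncrementalDSketch {

    // Each vector joins the nearest cluster if its centroid is closer than d;
    // otherwise the vector starts a new cluster. Returns member indices per cluster.
    static List<List<Integer>> runIncrementalD(double[][] vectors, double d) {
        List<List<Integer>> clusters = new ArrayList<>();
        List<double[]> centroids = new ArrayList<>();
        for (int v = 0; v < vectors.length; v++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.size(); k++) {
                double dist = distance(vectors[v], centroids.get(k));
                if (dist < bestDist) {
                    bestDist = dist;
                    best = k;
                }
            }
            if (best >= 0 && bestDist < d) {
                List<Integer> members = clusters.get(best);
                members.add(v);
                // Running mean update (assumed): c += (x - c) / n
                double[] c = centroids.get(best);
                for (int i = 0; i < c.length; i++) {
                    c[i] += (vectors[v][i] - c[i]) / members.size();
                }
            } else {
                List<Integer> members = new ArrayList<>();
                members.add(v);
                clusters.add(members);
                centroids.add(vectors[v].clone());
            }
        }
        return clusters;
    }

    // Euclidean distance between two vectors of equal length.
    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

The result depends on the order in which vectors are visited, which is one plausible reason the class also offers methods for randomizing the vector space.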

runKmeans

public static void runKmeans(int maxIter,
                             WaitMessage wm)
Runs the iterative k-means clustering algorithm until the maximum number of iterations is reached or no vectors move between clusters

Parameters:
maxIter - the maximum number of iterations to run
wm - the wait message box to which messages are posted
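The two stopping conditions described above (an iteration cap and no vector movement) fit a standard Lloyd-style k-means loop. Below is a minimal sketch, not the dbStat implementation. It assumes vectors and centroids are double[] arrays, that initial centroids come from an earlier pass such as the threshold clustering of runIncrementalD, and that a cluster left empty keeps its previous centroid. The WaitMessage progress reporting is omitted, and KMeansSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical Lloyd-style k-means sketch; WaitMessage reporting is omitted.
final class KMeansSketch {

    // Alternately assigns each vector to its nearest centroid and recomputes each
    // centroid as the mean of its members. Stops after maxIter iterations or when
    // no vector changes cluster. Returns the cluster label of each vector.
    static int[] runKmeans(double[][] vectors, double[][] initialCentroids, int maxIter) {
        int k = initialCentroids.length;
        int dims = initialCentroids[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone(); // leave the caller's array untouched
        }
        int[] labels = new int[vectors.length];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIter; iter++) {
            boolean moved = false;
            // Assignment step: move each vector to its nearest centroid.
            for (int v = 0; v < vectors.length; v++) {
                int best = nearest(vectors[v], centroids);
                if (best != labels[v]) {
                    labels[v] = best;
                    moved = true;
                }
            }
            if (!moved) {
                break; // no vector changed cluster
            }
            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int v = 0; v < vectors.length; v++) {
                counts[labels[v]]++;
                for (int i = 0; i < dims; i++) {
                    sums[labels[v]][i] += vectors[v][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // assumption: an empty cluster keeps its old centroid
                }
                for (int i = 0; i < dims; i++) {
                    centroids[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return labels;
    }

    // Index of the centroid with the smallest squared Euclidean distance to x.
    private static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - centroids[c][i];
                sum += diff * diff;
            }
            if (sum < bestDist) {
                bestDist = sum;
                best = c;
            }
        }
        return best;
    }
}
```

Comparing squared distances picks the same nearest centroid as comparing true Euclidean distances while skipping the square root.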

calculateDistance

public static double calculateDistance(AttributeVector vector1,
                                       AttributeVector vector2)
Calculates the Euclidean distance between two n-dimensional vectors

Parameters:
vector1 - the first vector
vector2 - the second vector
Returns:
the Euclidean distance between the two vectors
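For two n-dimensional vectors x and y, the Euclidean distance is d(x, y) = sqrt((x_1 - y_1)^2 + ... + (x_n - y_n)^2). The sketch below computes it over plain double[] arrays, because the component accessors of AttributeVector are not documented here. DistanceSketch is a hypothetical name, and this is not the dbStat method itself.

```java
// Hypothetical sketch: vectors are modeled as double[] instead of AttributeVector.
final class DistanceSketch {

    // Euclidean distance: square root of the sum of squared component differences.
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

For example, the distance between (0, 0) and (3, 4) is sqrt(9 + 16) = 5.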

findVectorClusterDist

public static double findVectorClusterDist(AttributeVector vector,
                                           Cluster cluster)
Finds the distance between a vector and a cluster's centroid vector

Parameters:
vector - the vector to compare to the cluster's centroid
cluster - the cluster to compare the vector to
Returns:
the distance between the vector and the cluster's centroid
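A cluster's centroid is conventionally the component-wise mean of its member vectors, and the vector-to-cluster distance is then the Euclidean distance to that mean. The sketch assumes this definition and models a cluster as a List of double[] members, since the Cluster class's API is not documented here. CentroidSketch is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch: a cluster is modeled as a list of double[] member vectors.
final class CentroidSketch {

    // Component-wise mean of the member vectors (assumed centroid definition).
    static double[] centroid(List<double[]> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("Cluster has no members");
        }
        int dims = members.get(0).length;
        double[] c = new double[dims];
        for (double[] m : members) {
            for (int i = 0; i < dims; i++) {
                c[i] += m[i];
            }
        }
        for (int i = 0; i < dims; i++) {
            c[i] /= members.size();
        }
        return c;
    }

    // Euclidean distance from a vector to the centroid of the cluster's members.
    static double vectorClusterDist(double[] vector, List<double[]> members) {
        double[] c = centroid(members);
        double sum = 0.0;
        for (int i = 0; i < vector.length; i++) {
            double diff = vector[i] - c[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```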

findMinCluster

public static int findMinCluster(AttributeVector vector,
                                 double threshold)
Finds the cluster to which the vector is closest

Parameters:
vector - the vector trying to find the closest cluster
threshold - a threshold distance: if the distance from the vector to its closest cluster is less than the threshold, the vector is added to that cluster; otherwise the method returns 0, which signals that a new cluster should be created.
Returns:
the index of the cluster to which this vector is closest
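The threshold test described for findMinCluster can be sketched as below. Because the method uses 0 as its "create a new cluster" signal, this sketch assumes that successful results are 1-based cluster indices; the documentation does not state the indexing, so treat that as a guess. Centroids are passed as double[] arrays rather than Cluster objects, and the sketch only returns the index, leaving the addition of the vector to the caller. MinClusterSketch is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of the threshold-based nearest-cluster lookup.
final class MinClusterSketch {

    // Returns the 1-based index of the closest centroid if its distance is below
    // the threshold; returns 0 to signal that a new cluster should be created.
    static int findMinCluster(double[] vector, List<double[]> centroids, double threshold) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.size(); k++) {
            double d = distance(vector, centroids.get(k));
            if (d < bestDist) {
                bestDist = d;
                best = k + 1; // 1-based index (assumption)
            }
        }
        return bestDist < threshold ? best : 0;
    }

    // Euclidean distance between two vectors of equal length.
    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With no existing clusters the best distance stays infinite, so the method returns 0 and the caller creates the first cluster.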

randomizeCollection

public static java.util.ArrayList randomizeCollection(java.util.ArrayList list)
Randomizes an ArrayList by repeatedly swapping pairs of objects

Parameters:
list - the list to randomize
Returns:
the randomized list
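Randomization by repeated pair swaps can be sketched with java.util.Collections.swap. The number of swaps is not documented, so this sketch assumes one swap per element; it also assumes the list is modified in place and returned, and takes the Random instance as a parameter (standing in for the class's randGen field). RandomizeSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of randomizing a list by repeated random pair swaps.
final class RandomizeSketch {

    // Swaps list.size() randomly chosen pairs of elements in place and returns the
    // same list. The swap count is an assumption; the documentation does not state it.
    static <T> List<T> randomizeCollection(List<T> list, Random rand) {
        int n = list.size();
        if (n < 2) {
            return list;
        }
        for (int s = 0; s < n; s++) {
            int i = rand.nextInt(n);
            int j = rand.nextInt(n);
            Collections.swap(list, i, j);
        }
        return list;
    }
}
```

Random swaps never lose or duplicate elements, but a fixed number of them is not guaranteed to produce a uniformly random permutation. If uniformity matters, java.util.Collections.shuffle(list, rand), which uses a Fisher-Yates style shuffle, is the usual alternative.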

randomizeInternalCollection

public static void randomizeInternalCollection()
Randomizes the vector space by repeatedly swapping pairs of randomly chosen vectors


getSampleD

public static double getSampleD()
Computes an initial threshold distance: a vector whose distance to every existing cluster's centroid exceeds this value starts a new cluster.

Returns:
the average distance between randomly chosen vectors in the vector space
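Estimating the threshold as the mean distance over randomly sampled pairs of vectors can be sketched as follows. The sample count and the pairing strategy are not documented, so both are assumptions here: the caller supplies the number of pairs, and each pair is drawn as two distinct random indices. SampleDSketch is a hypothetical name.

```java
import java.util.Random;

// Hypothetical sketch of estimating an initial threshold from random vector pairs.
final class SampleDSketch {

    // Averages the Euclidean distance over `samples` randomly chosen pairs of
    // distinct vectors. The sample count is a hypothetical parameter.
    static double sampleD(double[][] vectors, int samples, Random rand) {
        if (vectors.length < 2 || samples <= 0) {
            throw new IllegalArgumentException("Need at least two vectors and one sample");
        }
        double total = 0.0;
        for (int s = 0; s < samples; s++) {
            int i = rand.nextInt(vectors.length);
            int j = rand.nextInt(vectors.length - 1);
            if (j >= i) {
                j++; // shift past i so that j != i
            }
            total += distance(vectors[i], vectors[j]);
        }
        return total / samples;
    }

    // Euclidean distance between two vectors of equal length.
    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Drawing j from one fewer index and shifting it past i picks a different vector without a retry loop.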

generateDetailedDescription

public static java.lang.String generateDetailedDescription()
Creates a detailed description of the outcome of the clustering

Returns:
a descriptive synopsis of the results of the clustering run

writeResultsToFile

public static void writeResultsToFile(java.io.File fileName,
                                      java.lang.String data)
Prints the results of the clustering to the specified file

Parameters:
fileName - the file to which the results are written
data - the clustering results to write to the file

getStockCompanyHash

public static java.util.HashMap getStockCompanyHash()
Gets a HashMap whose keys are stock tickers and whose values are company names

Returns:
a HashMap containing ticker/company pairs