stockDataRetrieval
Class DownloadManager

java.lang.Object
  extended by stockDataRetrieval.DownloadManager
Direct Known Subclasses:
DownloadManagerAMEX, DownloadManagerNASDAQ, DownloadManagerNYSE

public abstract class DownloadManager
extends java.lang.Object

Manages the downloading of stock/company information from a specific set of web pages and the downloading of stock history files from Yahoo's web servers.

The manager downloads all stocks from NYSE, AMEX, and NASDAQ and saves them to a directory structure created in the local file system. To download the thousands of individual stock history files, the manager spawns threads but keeps at most a fixed number of them running at once, so that downloads overlap and less time is spent waiting on the network.


Nested Class Summary
private  class DownloadManager.DownloaderThread
          Thread class that downloads web pages concurrently rather than sequentially.
 
Field Summary
private static int completedThreads
          the number of completed threads
private  java.lang.Object lock
          a lock object to synchronize threads
private  java.lang.String market
          the market to which the list of tickers belongs
private  int MAX_CONNECTIONS
          the maximum number of HTTP connections allowed to be open at once
private static int numThreads
          the number of currently active threads
private  java.util.Vector threadVector
          a collection of threads that are waiting their turn to execute
private  java.lang.String tickerFile
          the file in which to save the list of tickers
private static java.lang.String YAHOO_DefaultDownloadPage
          direct link to Yahoo's page to download a stock's .csv file, with date and ticker information removed and replaced with easy-to-find identifiers
private static java.lang.String YAHOO_DownloadPage
          Yahoo's URL with date and ticker information removed and replaced with easy-to-find identifiers
 
Constructor Summary
DownloadManager(int maxConnections, java.lang.String tickerFile, java.lang.String market)
          Constructor allows user to specify the maximum number of parallel connections to be active and downloading from the internet at one time.
 
Method Summary
private  java.lang.String addCurrentDate(java.lang.String oldString, java.lang.String newString)
          Inserts the current date into the download URL so that when fetching the history files, the correct date range is used
private  java.lang.String constructTickerURL(java.lang.String ticker)
          Constructs a URL with the ticker symbol passed suitable for downloading stock information from finance.yahoo.com.
 boolean createDirectory(java.lang.String directory)
          Create the directory given as a parameter
 void createDirectoryStructure()
          Creates all the directories and files necessary to store information from the web pages containing ticker information
private  int getCounter()
          Gets the current number of threads running.
 void getNewsStories()
          Fetch all news stories for every ticker in the current market
 void getStockHistoryFiles()
          Retrieves files containing all historical information about a stock and stores it to a local directory.
abstract  void getStockTickers()
          Compiles a list of all stock ticker symbols from the given market.
 void getTickerAndCompany(java.lang.String targetURL, java.lang.String embeddedTickerCode)
          Series of commands to download the web page specified, parse it for relevant information, and write the extracted data to a file.
private  void incrementCounter()
          Called when a thread starts; increments the count of currently running threads.
static void main(java.lang.String[] args)
           
private  void threadFinished()
          Called when a thread finishes.
 void writeDataToFile(java.util.ArrayList data)
          Write all data gathered from the web page containing ticker/company information to the file "./Data/tickers.txt".
private  void writeHistoryFile(java.lang.String data, java.lang.String ticker)
          Writes downloaded stock ticker data to a file in a specific folder, naming the file "'ticker'_'market'.txt".
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YAHOO_DownloadPage

private static java.lang.String YAHOO_DownloadPage
Yahoo's URL with date and ticker information removed and replaced with easy-to-find identifiers


YAHOO_DefaultDownloadPage

private static java.lang.String YAHOO_DefaultDownloadPage
direct link to Yahoo's page to download a stock's .csv file, with date and ticker information removed and replaced with easy-to-find identifiers


tickerFile

private java.lang.String tickerFile
the file in which to save the list of tickers


market

private java.lang.String market
the market to which the list of tickers belongs


MAX_CONNECTIONS

private int MAX_CONNECTIONS
the maximum number of HTTP connections allowed to be open at once


numThreads

private static int numThreads
the number of currently active threads


completedThreads

private static int completedThreads
the number of completed threads


lock

private java.lang.Object lock
a lock object to synchronize threads


threadVector

private java.util.Vector threadVector
a collection of threads that are waiting their turn to execute

Constructor Detail

DownloadManager

public DownloadManager(int maxConnections,
                       java.lang.String tickerFile,
                       java.lang.String market)
Constructor allows user to specify the maximum number of parallel connections to be active and downloading from the internet at one time.

Parameters:
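maxConnections - the maximum number of files to download in parallel. If a number less than 1 is given, the default of 50 is used.
tickerFile - the file in which to save the list of tickers
market - the market to which the list of tickers belongs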
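The fallback described for maxConnections could look roughly like the sketch below. It is not the class's actual source: the class name ConnectionLimit, the method resolve, and the constant DEFAULT_MAX_CONNECTIONS are hypothetical. Only the threshold (less than 1) and the default (50) come from the parameter description.

```java
// Hypothetical helper illustrating the documented fallback; not the DownloadManager source.
final class ConnectionLimit {
    // Default value taken from the maxConnections parameter description.
    static final int DEFAULT_MAX_CONNECTIONS = 50;

    // Returns the requested limit, or the default when the request is below 1.
    static int resolve(int maxConnections) {
        return maxConnections < 1 ? DEFAULT_MAX_CONNECTIONS : maxConnections;
    }
}
```

Under these assumptions, ConnectionLimit.resolve(0) yields 50 and ConnectionLimit.resolve(8) yields 8.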
Method Detail

addCurrentDate

private java.lang.String addCurrentDate(java.lang.String oldString,
                                        java.lang.String newString)
Inserts the current date into the download URL so that when fetching the history files, the correct date range is used

Parameters:
oldString - the original portion of the URL to replace
newString - the replacement of the date portion of the URL
Returns:
the URL with the oldString replaced by the newString
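A minimal sketch of this kind of placeholder substitution follows. The template URL, the END_DATE and TICKER identifiers, and the yyyy-MM-dd date format are all assumptions made for illustration; the real values of YAHOO_DownloadPage and its identifiers are not shown in this documentation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of replacing an easy-to-find identifier in a URL template.
final class UrlTemplateSketch {
    // Assumed template; the actual download URL is not documented here.
    static final String TEMPLATE = "http://example.com/history.csv?s=TICKER&end=END_DATE";

    // Replaces every occurrence of oldString in the template with newString.
    static String substitute(String template, String oldString, String newString) {
        return template.replace(oldString, newString);
    }

    // Formats a date as yyyy-MM-dd; the format the real URL expects is an assumption.
    static String formatDate(LocalDate date) {
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }
}
```

The same substitute call would serve for the ticker identifier, which is presumably how constructTickerURL fills in its part of the link.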

createDirectoryStructure

public void createDirectoryStructure()
Creates all the directories and files necessary to store information from the web pages containing ticker information


createDirectory

public boolean createDirectory(java.lang.String directory)
Create the directory given as a parameter

Parameters:
directory - the directory path to create
Returns:
true if the directory was created successfully (or if it already exists), false otherwise
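The documented return contract can be sketched with java.io.File as below. The class name DirectorySketch is hypothetical, and the use of mkdirs(), which also creates missing parent directories, is an assumption; the original may create only a single level.

```java
import java.io.File;

// Hypothetical sketch of the documented contract; not the DownloadManager source.
final class DirectorySketch {
    // Returns true if the directory already exists or was created by this call.
    static boolean createDirectory(String directory) {
        File dir = new File(directory);
        if (dir.isDirectory()) {
            return true; // already present, nothing to do
        }
        // mkdirs() returns false if the path could not be created.
        return dir.mkdirs();
    }
}
```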

getStockTickers

public abstract void getStockTickers()
Compiles a list of all stock ticker symbols from the given market.

Delegates to helper functions that visit a specific website and repeatedly fetch and parse the pages containing company names and ticker symbols.


getNewsStories

public void getNewsStories()
Fetch all news stories for every ticker in the current market


getTickerAndCompany

public void getTickerAndCompany(java.lang.String targetURL,
                                java.lang.String embeddedTickerCode)
Series of commands to download the web page specified, parse it for relevant information, and write the extracted data to a file.

Parameters:
targetURL - the URL from which to download all ticker/company names
embeddedTickerCode - the pattern of the HTML that contains an individual ticker name
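The parsing step could be sketched as follows, assuming that embeddedTickerCode is a regular expression with two capture groups (ticker, then company name). That interpretation, the class name TickerParseSketch, and the HTML shape in the usage note are all assumptions; the documentation does not say how the pattern is actually matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: treats the pattern as a regex with two capture groups.
final class TickerParseSketch {
    // Returns one {ticker, companyName} array per match found in the page.
    static List<String[]> parse(String html, String embeddedTickerCode) {
        List<String[]> result = new ArrayList<>();
        Matcher m = Pattern.compile(embeddedTickerCode).matcher(html);
        while (m.find()) {
            result.add(new String[] {m.group(1).trim(), m.group(2).trim()});
        }
        return result;
    }
}
```

For example, a made-up pattern such as <td class="sym">([^<]+)</td><td>([^<]+)</td> would pull the symbol and name out of each matching table row.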

writeDataToFile

public void writeDataToFile(java.util.ArrayList data)
Write all data gathered from the web page containing ticker/company information to the file "./Data/tickers.txt".

Parameters:
data - an ArrayList containing entries of Ticker/CompanyName information in a String array of size [2]
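A minimal sketch of writing such a list to disk is shown below. The tab separator, the UTF-8 encoding, and the names TickerFileSketch, toLines, and write are assumptions; the documentation states only the shape of the entries and the target file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the on-disk format of the real ticker file is not documented.
final class TickerFileSketch {
    // Renders each {ticker, companyName} pair as one tab-separated line.
    static List<String> toLines(List<String[]> data) {
        List<String> lines = new ArrayList<>();
        for (String[] entry : data) {
            lines.add(entry[0] + "\t" + entry[1]);
        }
        return lines;
    }

    // Writes the lines to the given file, creating parent directories first.
    static void write(List<String[]> data, String file) throws IOException {
        Path path = Paths.get(file);
        if (path.getParent() != null) {
            Files.createDirectories(path.getParent());
        }
        Files.write(path, toLines(data), StandardCharsets.UTF_8);
    }
}
```

Calling TickerFileSketch.write(data, "./Data/tickers.txt") would then produce one line per ticker.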

getStockHistoryFiles

public void getStockHistoryFiles()
Retrieves files containing all historical information about a stock and stores it to a local directory.


constructTickerURL

private java.lang.String constructTickerURL(java.lang.String ticker)
Constructs a URL with the ticker symbol passed suitable for downloading stock information from finance.yahoo.com.

Parameters:
ticker - the ticker symbol to be embedded into the initial link
Returns:
the URL containing the specified ticker suitable for passing to the functions to download that stock's historical data

writeHistoryFile

private void writeHistoryFile(java.lang.String data,
                              java.lang.String ticker)
Writes downloaded stock ticker data to a file in a specific folder, naming the file "'ticker'_'market'.txt".

Parameters:
data - the historical data file downloaded from YAHOO
ticker - the ticker symbol used to determine the file name to save
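The naming scheme can be illustrated with a short sketch. The folder argument and the names HistoryFileNameSketch and historyFilePath are hypothetical; the documentation states only the "'ticker'_'market'.txt" pattern, not the folder that is used.

```java
import java.io.File;

// Hypothetical sketch of the documented file-naming pattern.
final class HistoryFileNameSketch {
    // Builds "<ticker>_<market>.txt" inside the given folder.
    static String historyFilePath(String folder, String ticker, String market) {
        return new File(folder, ticker + "_" + market + ".txt").getPath();
    }
}
```

With ticker "IBM" and market "NYSE", the resulting file name is IBM_NYSE.txt.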

getCounter

private int getCounter()
Gets the current number of threads running.

Returns:
the number of threads currently running

incrementCounter

private void incrementCounter()
Called when a thread starts; increments the count of currently running threads.


threadFinished

private void threadFinished()
Called when a thread finishes.

Accesses a Vector of threads that have not yet begun, picks the first thread from the Vector, removes it from the Vector and tells it to start.

Updates the counter of currently running threads as well as the counter of successfully completed threads.
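The hand-off described here, together with the lock, numThreads, completedThreads, and threadVector fields, suggests a scheme along the lines of the sketch below. It is an illustration under assumptions, not the actual source: the class ThrottleSketch and the methods submit, awaitCompleted, and completed are hypothetical, and the real class may start and queue its DownloaderThread instances differently.

```java
import java.util.Vector;

// Hypothetical sketch of capping the number of running threads with a waiting Vector.
final class ThrottleSketch {
    private final Object lock = new Object();
    private final Vector<Thread> waiting = new Vector<>(); // threads not yet started
    private final int maxConnections;
    private int numThreads = 0;       // threads currently running
    private int completedThreads = 0; // threads that have finished

    ThrottleSketch(int maxConnections) {
        this.maxConnections = maxConnections;
    }

    // Wraps the task so it reports back when done, then starts it or queues it.
    void submit(Runnable task) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } finally {
                threadFinished();
            }
        });
        synchronized (lock) {
            if (numThreads < maxConnections) {
                numThreads++;
                t.start();
            } else {
                waiting.add(t);
            }
        }
    }

    // Updates both counters, then starts the first waiting thread, if any.
    private void threadFinished() {
        synchronized (lock) {
            numThreads--;
            completedThreads++;
            if (!waiting.isEmpty()) {
                Thread next = waiting.remove(0);
                numThreads++;
                next.start();
            }
            lock.notifyAll(); // wake anyone blocked in awaitCompleted
        }
    }

    // Blocks until at least the expected number of tasks has completed.
    void awaitCompleted(int expected) {
        synchronized (lock) {
            while (completedThreads < expected) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    int completed() {
        synchronized (lock) {
            return completedThreads;
        }
    }
}
```

Because every change to the counters and the Vector happens while holding the same lock, the number of running threads never exceeds maxConnections. In new code, a fixed-size pool from java.util.concurrent.Executors would be a common alternative to this hand-rolled hand-off.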


main

public static void main(java.lang.String[] args)