java.lang.Object
  stockDataRetrieval.WebPageParser
This class contains methods for parsing a web page for specific information.
Each stock market has its own page format, and the methods adapt to that format to retrieve the required information.
| Constructor Summary | |
WebPageParser()
|
|
| Method Summary | |
private static java.lang.String |
cleanUpURL(java.lang.String url)
Removes filler text and the session ID number from the URL pointing to the next page of older stories, and formats the URL so that it is ready to be used to fetch those stories |
static java.util.regex.Matcher |
createMatcher(java.lang.String expressionToMatch,
java.lang.String dataToSearch)
Compiles the given pattern into a regular expression and applies it to the given text, returning the resulting matcher |
static java.util.ArrayList |
extractNewsStories(java.lang.String pageSource,
java.lang.String ticker)
Identifies all news stories and creates objects for each news story |
static java.lang.String |
extractNextPage(java.lang.String pageSource)
Looks in the page for the link to the next available page containing older stories |
static java.util.ArrayList |
getTickerSymbolAndCompany(java.lang.String pageSource,
java.lang.String embeddedTickerCode)
Parses the web page that contains stock ticker and company name information and returns an ArrayList in which each entry holds a ticker symbol and company name pair |
static boolean |
isLastNewsPage(java.lang.String pageSource)
Parses the News Page for specific information that signals all available news stories pages have been traversed. |
private static java.lang.String[] |
parseStoryProperties(java.lang.String completeString)
Returns the date, time, and source of the given news story |
private static java.lang.String |
prepareForParsing(java.lang.String pageSource)
Removes new lines and large white space gaps that make regular expression matching troublesome and extracts the section of the page that contains the news stories |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public WebPageParser()
| Method Detail |
public static java.util.regex.Matcher createMatcher(java.lang.String expressionToMatch,
java.lang.String dataToSearch)
The caller passes in the expression to compile into a regular expression and the text to search. The method returns the Matcher object produced by applying that expression to the text.
expressionToMatch - regular expression to try to match against the text dataToSearch
dataToSearch - string in which to find instances of expressionToMatch
Matcher object containing all information about the results of applying
the regular expression to the text
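The description above maps closely onto the standard java.util.regex calls. The following is a minimal sketch of how such a helper could look, not the actual body of createMatcher, which this page does not show. The class name MatcherSketch and the sample markup are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for WebPageParser.createMatcher; the real body is not documented here.
final class MatcherSketch {

    // Compiles the expression and applies it to the text, returning the resulting Matcher.
    static Matcher createMatcher(String expressionToMatch, String dataToSearch) {
        Pattern pattern = Pattern.compile(expressionToMatch);
        return pattern.matcher(dataToSearch);
    }

    public static void main(String[] args) {
        // Assumed sample markup; the real page format is not described on this page.
        Matcher m = createMatcher("<b>(\\w+)</b>", "<b>ABC</b> and <b>XYZ</b>");
        while (m.find()) {
            // group(1) is the text captured between the <b> tags.
            System.out.println(m.group(1));
        }
    }
}
```

The returned Matcher has not been advanced yet, so the caller decides whether to call find(), matches(), or lookingAt() on it.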
public static java.util.ArrayList getTickerSymbolAndCompany(java.lang.String pageSource,
java.lang.String embeddedTickerCode)
pageSource - page source of the page from which to parse relevant information
embeddedTickerCode - the pattern of HTML containing the ticker information
ArrayList object containing ticker symbol (index [0])
and company name (index [1])
public static boolean isLastNewsPage(java.lang.String pageSource)
pageSource - the source for the page of interest
public static java.lang.String extractNextPage(java.lang.String pageSource)
pageSource - page source code containing the link to the older news stories
private static java.lang.String cleanUpURL(java.lang.String url)
url - "dirty" url with extraneous information
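As an illustration of the kind of cleanup described for cleanUpURL, here is a small sketch. The documentation does not say what the filler or session ID looks like, so the `;jsessionid=` path parameter and the `&amp;` entity handled below are assumptions, as is the class name UrlCleanupSketch.

```java
// Hypothetical sketch of a URL cleanup step; the real filler and session ID format are not documented.
final class UrlCleanupSketch {

    static String cleanUpURL(String url) {
        // Assumption: the session ID appears as a ";jsessionid=..." path parameter.
        String withoutSession = url.replaceAll(";jsessionid=[^?&#]*", "");
        // Assumption: the link was copied from HTML source, so "&" is still escaped as "&amp;".
        return withoutSession.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(cleanUpURL("/news;jsessionid=ABC123?ticker=XYZ&amp;page=2"));
    }
}
```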
private static java.lang.String prepareForParsing(java.lang.String pageSource)
pageSource - the page source to be chopped up
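The two steps named in the summary for prepareForParsing, collapsing whitespace and cutting out the news section, could be sketched as follows. The start and end markers are hypothetical placeholders, since the real page layout is not given here, and PrepareSketch is an illustrative class name.

```java
// Hypothetical sketch of the whitespace cleanup and section extraction; markers are assumed.
final class PrepareSketch {

    // Assumed delimiters around the news section; the real page markers are not documented.
    static final String START = "<!-- news start -->";
    static final String END = "<!-- news end -->";

    static String prepareForParsing(String pageSource) {
        // Collapse newlines and runs of whitespace into single spaces so patterns need not span lines.
        String flattened = pageSource.replaceAll("\\s+", " ");
        int start = flattened.indexOf(START);
        int end = (start < 0) ? -1 : flattened.indexOf(END, start);
        if (start < 0 || end < 0) {
            // Markers not found: fall back to the flattened page.
            return flattened.trim();
        }
        return flattened.substring(start + START.length(), end).trim();
    }

    public static void main(String[] args) {
        String page = "<html>\n<!-- news start -->\n  <a>Story   one</a>\n\n<!-- news end -->\n</html>";
        System.out.println(prepareForParsing(page));
    }
}
```

Flattening the page first means later regular expressions do not need the DOTALL flag or explicit newline handling.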
private static java.lang.String[] parseStoryProperties(java.lang.String completeString)
completeString - the string containing the date, time and source with filler text
public static java.util.ArrayList extractNewsStories(java.lang.String pageSource,
java.lang.String ticker)
pageSource - the source code containing the news stories