java.lang.Object
  |
  +--WordGrabber
This class is responsible for reading words one at a time from a set of files. The tasks it is responsible for include:
- When startNextFile is called, open the next file from the list of files.
- posNextWord and nextWord are called by the user to read words one at a time from the file. Common words and short words are ignored. The byte position of the word in the file is also returned to the user.
- … and … are called.
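The calling protocol described above (call posNextWord to get a position or -1, then nextWord to get the lowercase word) can be illustrated with a small, self-contained sketch. MiniGrabber is a hypothetical class that reads from an in-memory String rather than from files. Its tokenization (runs of letters or digits) and its 0-based character positions are assumptions, not the real WordGrabber's behavior, which the documentation describes in terms of file positions.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the posNextWord/nextWord protocol; not the real WordGrabber.
// Assumptions: a word is a run of letters or digits, and positions are 0-based
// character offsets into an in-memory String.
class MiniGrabber {
    private final String text;
    private final Set<String> commonWords; // expected in lowercase
    private int cursor = 0;      // next character to scan
    private int wordStart = -1;  // start of the word found by posNextWord, or -1
    private int wordEnd = -1;    // end (exclusive) of that word

    MiniGrabber(String text, Set<String> commonWords) {
        this.text = text;
        this.commonWords = commonWords;
    }

    // Words of 1, 2 or 3 symbols, and words in the common list, are ignored.
    boolean isIgnored(String s) {
        return s.length() <= 3 || commonWords.contains(s.toLowerCase(Locale.ROOT));
    }

    // Returns the start position of the next non-ignored word, or -1 if there is none.
    int posNextWord() {
        if (wordStart >= 0) {
            return wordStart; // already found and not yet consumed by nextWord
        }
        while (cursor < text.length()) {
            while (cursor < text.length() && !Character.isLetterOrDigit(text.charAt(cursor))) {
                cursor++; // skip spaces and punctuation
            }
            int start = cursor;
            while (cursor < text.length() && Character.isLetterOrDigit(text.charAt(cursor))) {
                cursor++; // consume the word
            }
            if (cursor > start && !isIgnored(text.substring(start, cursor))) {
                wordStart = start;
                wordEnd = cursor;
                return wordStart;
            }
        }
        return -1;
    }

    // Returns the word located by posNextWord, in lowercase, and moves past it.
    String nextWord() {
        if (posNextWord() < 0) {
            throw new IllegalStateException("no more words");
        }
        String word = text.substring(wordStart, wordEnd).toLowerCase(Locale.ROOT);
        wordStart = -1;
        return word;
    }

    public static void main(String[] args) {
        MiniGrabber g = new MiniGrabber("The Quick brown fox, and the LAZY dog.", Set.of("quick"));
        int pos;
        while ((pos = g.posNextWord()) >= 0) {
            System.out.println(pos + " " + g.nextWord()); // prints "10 brown", then "29 lazy"
        }
    }
}
```

Calling posNextWord twice without nextWord returns the same position, so checking before reading is always safe in this sketch.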
Field Summary
static java.io.FileFilter txtFileFilter
    This is the default file filter.
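A filter with the behavior documented for txtFileFilter under Field Detail (accepting names ending in .txt or .TXT) could be written as a lambda, since java.io.FileFilter declares the single method accept(File). TxtFilterSketch is a hypothetical sketch, not the field's actual source. It rejects mixed-case extensions such as .Txt, which the documentation does not mention either way.

```java
import java.io.File;
import java.io.FileFilter;

// Hypothetical sketch of a filter with the documented behavior of txtFileFilter;
// not its actual source. Only the all-lowercase and all-uppercase forms are accepted.
class TxtFilterSketch {
    static final FileFilter TXT_FILTER = file -> {
        String name = file.getName(); // last path component only
        return name.endsWith(".txt") || name.endsWith(".TXT");
    };

    public static void main(String[] args) {
        for (String name : new String[] {"a.txt", "B.TXT", "c.Txt", "notes.md"}) {
            System.out.println(name + " -> " + TXT_FILTER.accept(new File(name)));
        }
    }
}
```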
Constructor Summary
WordGrabber()
    Start a new WordGrabber object.
WordGrabber(java.lang.String directory, java.lang.String commonWordsFile)
    Start a new WordGrabber, but use a different directory than the default one.
WordGrabber(java.lang.String directory, java.lang.String commonWordsFile, java.io.FileFilter filter)
    Start a new WordGrabber.
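The directory and filter parameters described under Constructor Detail imply a recursive search that may also be given a single file. The sketch below shows one way to collect files under that reading, using java.io.File.listFiles(). CollectFilesSketch and collect are hypothetical names. In this sketch, subdirectories are always descended into regardless of the filter, and a plain file name is used as-is without consulting the filter; both are assumptions about the real class.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of resolving the directory parameter into a list of files;
// not the real constructor's code.
class CollectFilesSketch {
    static List<File> collect(File root, FileFilter filter) {
        List<File> files = new ArrayList<>();
        if (root.isFile()) {
            files.add(root); // a single file name: only that one file is used
        } else {
            walk(root, filter, files);
        }
        return files;
    }

    private static void walk(File dir, FileFilter filter, List<File> files) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        Arrays.sort(entries); // deterministic order (an assumption, not documented)
        for (File entry : entries) {
            if (entry.isDirectory()) {
                walk(entry, filter, files); // recurse regardless of the filter
            } else if (filter.accept(entry)) {
                files.add(entry);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        FileFilter txt = f -> f.getName().endsWith(".txt") || f.getName().endsWith(".TXT");
        for (File f : collect(root, txt)) {
            System.out.println(f.getPath());
        }
    }
}
```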
Method Summary
java.lang.String getFileInfo(int fileNum)
    Gets identifying information about the file's contents, such as the file name and, in some cases, its title and author.
boolean isCommon(java.lang.String s)
    Tests whether a word is "common" and should be ignored.
boolean isIgnored(java.lang.String s)
    Tests whether a word will be ignored, either because it is too short or because it is a common word.
java.lang.String nextWord()
    Returns the next non-common word from the file.
int posNextWord()
    Tests whether the current file has a next word to be processed and, if so, returns its starting position; otherwise returns -1.
java.lang.String printableExtract(int fileNum, int start, int end)
    Gets a small portion of a file and formats it as a String suitable for printing.
java.lang.String printableExtractLen(int fileNum, int mid, int len)
    Gets a small portion of a file and formats it as a String suitable for printing.
int startNextFile()
    Start reading from a new file.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail

public static final java.io.FileFilter txtFileFilter
A FileFilter that accepts files with the extension .txt. The extension may also be in uppercase (.TXT).

Constructor Detail
public WordGrabber() throws java.io.FileNotFoundException, java.io.IOException
public WordGrabber(java.lang.String directory, java.lang.String commonWordsFile) throws java.io.FileNotFoundException, java.io.IOException
directory - the name of a directory that will be searched recursively for files ending with .txt or .TXT. It is also permissible to use the name of a file, in which case only that one file is used.
commonWordsFile - the name of a file containing a list of common words that will be ignored. All words of 1, 2, or 3 symbols are automatically treated as common words. If this parameter is null, there is no file of common words.

public WordGrabber(java.lang.String directory, java.lang.String commonWordsFile, java.io.FileFilter filter) throws java.io.FileNotFoundException, java.io.IOException
directory - the name of a directory that will be searched recursively for files ending with .txt or .TXT. It is also permissible to use the name of a file, in which case only that one file is used. If null, the default directory is used.
filter - a FileFilter that determines which files are read. (Intended for use by Java experts only.) If null, the default file filter is used.

Method Detail
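public int startNextFile() throws java.io.IOException
… hasNextWord.
Returns: a file number; printableExtract and printableExtractLen document their fileNum parameter as obtained from startNextFile.

public int posNextWord()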
To start reading the next file, call startNextFile.
Common words are skipped over and are not returned (see nextWord).
Returns: a positive integer if there is a next word, meaning that when nextWord is called it will return a word. This positive integer is the starting position of the word in the file. Otherwise returns -1.

public java.lang.String nextWord() throws java.io.IOException
Always call posNextWord() before calling nextWord() to get the position of the word.
The returned word is in all lowercase, no matter how it appears in the file.
public boolean isCommon(java.lang.String s)
… isIgnored() if you want to test that condition.

public boolean isIgnored(java.lang.String s)
public java.lang.String getFileInfo(int fileNum)
The usual format of the returned information is:
File: the file name.
"Title" by Author
The second line of the string is omitted if the title and author are unknown.
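A hypothetical helper that produces the two-line format above might look like the following. FileInfoSketch is an illustrative name, not part of this class. It assumes the second line appears only when both title and author are known, and it treats the period after "the file name" as part of the description rather than the output; both are guesses about the real method.

```java
// Hypothetical helper mirroring the documented getFileInfo format; not the real method.
// Assumptions: the second line needs both title and author, and no period follows the name.
class FileInfoSketch {
    static String fileInfo(String fileName, String title, String author) {
        StringBuilder info = new StringBuilder("File: ").append(fileName);
        if (title != null && author != null) {
            info.append('\n').append('"').append(title).append("\" by ").append(author);
        }
        return info.toString();
    }

    public static void main(String[] args) {
        System.out.println(fileInfo("alice.txt", "Alice's Adventures in Wonderland", "Lewis Carroll"));
        System.out.println(fileInfo("notes.txt", null, null)); // one line only
    }
}
```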
public java.lang.String printableExtract(int fileNum, int start, int end) throws java.lang.IllegalArgumentException
prettyPrint(printableExtract(fileNum, start, end));
fileNum - the file number, as obtained from startNextFile
start - the start position in the file, measured in characters.
end - the end position in the file, measured in characters.
Throws: java.lang.IllegalArgumentException - if fileNum is invalid.

public java.lang.String printableExtractLen(int fileNum, int mid, int len)
Equivalent to calling printableExtract(fileNum, start, end), where
start = mid - len/2
end = start + len
mid - the middle position of the string to extract
len - the length of the string to extract
fileNum - the file number, as obtained from startNextFile
Throws: java.lang.IllegalArgumentException - if fileNum is invalid.
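The start/end arithmetic for printableExtractLen can be checked with a tiny sketch. ExtractBoundsSketch is a hypothetical name. Because len/2 is Java integer division, mid = 100 with len = 41 gives start = 80 and end = 121, and end - start always equals len. The documentation does not say whether printableExtract accepts or clamps a negative start (for example, mid = 10 and len = 40 give start = -10).

```java
// Hypothetical sketch of the start/end arithmetic documented for printableExtractLen.
class ExtractBoundsSketch {
    // Returns {start, end} with start = mid - len/2 (integer division) and end = start + len.
    static int[] bounds(int mid, int len) {
        int start = mid - len / 2;
        int end = start + len;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int[] b = bounds(100, 41);
        System.out.println(b[0] + " " + b[1]); // prints "80 121"
    }
}
```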