CSS 341 - Machine Problem #4

 

Problem Solving with Arrays

Working with Text files, Word Processors, and Spreadsheets

 

Due:  November 14, 2009 (midnight)  (note change!!!!)

              

 

Based on all the great systems we have delivered, the Keystone Daily has finally awarded us a contract to develop a solution for their core business: a newspaper article analysis system (NAAS). The editors at Keystone Daily want us to deliver a system to help them analyze the news articles their reporters submit.  A marketing study has indicated that the use of buzz words is far more important to increased readership than is the quality of the reporting or writing.  The editors need an automated system to help them analyze the articles submitted.  Here are the details:

 

Keystone Daily reporters submit their articles as Microsoft Word documents (either *.doc or *.docx). Each morning before the press starts to run, all editors get together to articulate the DailyWordList. The DailyWordList is a list of words that the editors feel are important that morning. The result of this process often appears to be rather random.  The editors then go back to their individual departments, go through each submitted article, and count the number of occurrences of the words that they have placed in the DailyWordList. Based on this count, the editors prepare a statistics report for each article detailing the number of words on that DailyWordList, each of those words and the number of times it is found, and finally the statistics:

              

Total Words Counted - the total number of times words in the DailyWordList are encountered in the submitted document.

Max Occurrence - maximum number of times a single word in the DailyWordList has been counted in the article

Min Occurrence - minimum number of times a single word in the DailyWordList has been counted in the article

Avg Occurrence - average number of times all words on the DailyWordList have been counted
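
As a rough illustration of the four statistics above, here is a minimal sketch that computes them from an array of per-word counts. The name ComputeStats and the sample counts are made up for this example, and it assumes that Avg Occurrence means the total divided by the number of words on the list (words found zero times included). Treat it as one possible reading, not the required design.

```vbscript
Option Explicit

' Hypothetical helper: fills total, maxCount, minCount and avgCount from an
' array of per-word counts (one entry per DailyWordList word).
' VBScript passes arguments ByRef by default, so the caller sees the results.
Sub ComputeStats(counts, total, maxCount, minCount, avgCount)
    Dim i
    total = 0
    maxCount = counts(LBound(counts))
    minCount = counts(LBound(counts))
    For i = LBound(counts) To UBound(counts)
        total = total + counts(i)
        If counts(i) > maxCount Then maxCount = counts(i)
        If counts(i) < minCount Then minCount = counts(i)
    Next
    ' Assumption: the average is taken over every word on the list,
    ' including words that were found zero times.
    avgCount = total / (UBound(counts) - LBound(counts) + 1)
End Sub

' Made-up counts for a three-word list.
Dim t, mx, mn, av
ComputeStats Array(4, 0, 2), t, mx, mn, av
WScript.Echo "Total: " & t & "  Max: " & mx & "  Min: " & mn & "  Avg: " & av
```

For the made-up counts 4, 0 and 2, the total is 6, the max is 4, the min is 0 and the average is 2.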

 

For example, here is an article a novice reporter has submitted. On that day, the editors got together and came up with this list of words as the DailyWordList, and here is the report the editor wants. Notice that the DailyWordList is really a random list of words that depends on the mood of the editors, and the number of words on this list may be different each day. Also notice that the editors are too busy to be careful: they often leave blank lines in the DailyWordList and sometimes leave a space in the middle of a word.  They are, however, careful never to place more than one word on each line, even if a space has been inserted accidentally.  The editors are also unable to shift case accurately, so the article searches should be deliberately case insensitive.  Our job is to write a VBScript program to automate the above process for the editors.

 

Here is what the editors want from our system:

 

When our system first starts, it should ask the editors to select and open a text file (*.txt) containing the DailyWordList. This file will have been provided by the editors that day.  As mentioned, the editors are too busy to be responsible and may leave blank lines in this text file and unwanted spaces in some words. However, we are told that each non-blank line contains one word and only one.

We must then report to the editors how many words are in today's DailyWordList. For whatever reason, the editors demand we report this number directly to the user before going to the next step.  Besides leaving blank lines, they are apparently too busy to count the words.

We must allow the editors to select a Microsoft Word document (*.doc or *.docx) for analysis. 

Analysis and final reporting are to be accomplished with a Microsoft Excel spreadsheet that can be saved.
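
The first two steps above (reading the DailyWordList and reporting its size) might be sketched as follows. The names CleanWord and ReadWordList and the sample path are hypothetical. The sketch assumes the file path has already been obtained from whatever file-selection dialog you use, and that a stray space inside a word can simply be deleted because each non-blank line holds exactly one word.

```vbscript
Option Explicit

' Hypothetical helper: strip every space and tab from one line of the file.
' Safe only because each non-blank line is promised to hold a single word.
Function CleanWord(rawLine)
    Dim s
    s = Replace(rawLine, " ", "")
    s = Replace(s, vbTab, "")
    CleanWord = s
End Function

' Hypothetical helper: read the text file at path into an array of cleaned
' words, skipping blank lines. wordCount receives the number of words kept.
Function ReadWordList(path, wordCount)
    Const ForReading = 1
    Dim fso, ts, words(), w
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(path, ForReading)
    wordCount = 0
    Do Until ts.AtEndOfStream
        w = CleanWord(ts.ReadLine)
        If Len(w) > 0 Then
            ReDim Preserve words(wordCount)
            words(wordCount) = w
            wordCount = wordCount + 1
        End If
    Loop
    ts.Close
    If wordCount = 0 Then
        ReadWordList = Array()   ' empty list: return a zero-element array
    Else
        ReadWordList = words
    End If
End Function

' Possible use (the path is made up):
'   Dim list, n
'   list = ReadWordList("C:\temp\DailyWordList.txt", n)
'   MsgBox "Words in today's DailyWordList: " & n
```

Growing the array with ReDim Preserve one element at a time is not fast, but for a word list of this size it keeps the code short.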

 

Based on the above input, we must generate a Microsoft Excel report describing the statistics. In this report, we must include the word count for each word in the DailyWordList. These counts must be sorted such that the word with the highest count is shown first on the list. After this detailed word count, we must then report the above statistics so that the editors can decide which article can be published in the Keystone Daily.  Each editor should be given the opportunity to save the spreadsheet with a reasonable name when they are finished with it.
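
One way to get the "highest count first" ordering and put it into a spreadsheet is sketched below. The names SortByCountDescending and WriteReport, the two-column layout, and the sample words are all assumptions made for this example. The sort is a plain selection sort on two parallel arrays, and the Excel part simply fills cells in a new, visible workbook so that the editor can save it from within Excel.

```vbscript
Option Explicit

' Hypothetical helper: sort words() and counts() together, highest count first.
' A simple selection sort is plenty for a list of a few dozen words.
Sub SortByCountDescending(words, counts)
    Dim i, j, best, tmp
    For i = LBound(counts) To UBound(counts) - 1
        best = i
        For j = i + 1 To UBound(counts)
            If counts(j) > counts(best) Then best = j
        Next
        If best <> i Then
            tmp = counts(i) : counts(i) = counts(best) : counts(best) = tmp
            tmp = words(i) : words(i) = words(best) : words(best) = tmp
        End If
    Next
End Sub

' Hypothetical helper: write the word/count table to a new, visible workbook.
' Column A holds the word and column B its count (an assumed layout).
Sub WriteReport(words, counts)
    Dim xl, wb, ws, i, row
    Set xl = CreateObject("Excel.Application")
    xl.Visible = True
    Set wb = xl.Workbooks.Add()
    Set ws = wb.Worksheets(1)
    ws.Cells(1, 1).Value = "Word"
    ws.Cells(1, 2).Value = "Count"
    row = 2
    For i = LBound(words) To UBound(words)
        ws.Cells(row, 1).Value = words(i)
        ws.Cells(row, 2).Value = counts(i)
        row = row + 1
    Next
End Sub

' Made-up data: after sorting, the order is storm (5), budget (2), mayor (0).
Dim w, c
w = Array("budget", "storm", "mayor")
c = Array(2, 5, 0)
SortByCountDescending w, c
```

Calling WriteReport w, c afterwards would open Excel with the sorted table. The summary statistics could go in the rows below it.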

 

This programming assignment contributes approximately 11% towards your final grade for this course.

 

Hints

 

•  The *.txt file should be read and parsed into an array using the TextStream Read and ReadLine methods that we discussed earlier.

•  The *.txt file lines can be parsed and manipulated after reading into VBScript with the string functions found on p.443 of Lomax.  Note that I used some of these functions in the subprogram in MP3 that found the file path from its URL.  You should not need to use the RegExp object for this assignment.  It is more elaborate than needed, and we will discuss it later.

•  There are different ways to handle and search in the *.doc file:  I would do the searching using Microsoft Word and its Find object.  We have seen this example before.  Note that the Find object has many properties that can be set to affect the nature of the search.

•  Be careful with complete words:  e.g. a search for the word "class" must not count the word "classroom".

•  Do not just run the small test case I have provided.  You need to design your own tests as with any program you develop.
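
To make the Find-object hint and the whole-word warning above concrete, here is a minimal sketch of counting one word in a Word document. The function name CountWord and the file path are made up for this example, and the numeric constants are written out because VBScript does not know Word's named constants. It assumes Microsoft Word is installed and that the search word has already been cleaned of stray spaces.

```vbscript
Option Explicit

' Hypothetical helper: count whole-word, case-insensitive matches of
' searchWord in an already opened Word document.
Function CountWord(doc, searchWord)
    Const wdFindStop = 0      ' numeric value of Word's wdFindStop
    Const wdCollapseEnd = 0   ' numeric value of Word's wdCollapseEnd
    Dim rng, n
    Set rng = doc.Content
    With rng.Find
        .ClearFormatting
        .Text = searchWord
        .MatchWholeWord = True    ' "class" will not count "classroom"
        .MatchCase = False        ' editors cannot be trusted with case
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop        ' stop at the end instead of wrapping around
    End With
    n = 0
    Do While rng.Find.Execute
        n = n + 1
        rng.Collapse wdCollapseEnd   ' continue just after the match found
    Loop
    CountWord = n
End Function

Dim wordApp, doc
Set wordApp = CreateObject("Word.Application")
wordApp.Visible = False
' Hypothetical path; the third argument opens the document read-only.
Set doc = wordApp.Documents.Open("C:\temp\article.doc", , True)
MsgBox "class: " & CountWord(doc, "class")
doc.Close False
wordApp.Quit
```

Setting Wrap to stop at the end of the document is what keeps the loop from running forever. It is worth testing this on a document where the word is the very last thing in the text.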

 

 

This assignment is slightly more challenging than the ones before. In particular, here we must work with three different systems, including two major commercial applications. There are many traps that you may fall into and get stuck. Please do start early and let me know how I can help.

 

Documentation of process: 

 

•  Intermediate submission:  Sometime prior to noon, November 6, your team should turn in (to the MP4 drop box) one page giving your estimates of the amount of time each of these steps will require for the team as a whole:  Design, Detailed Design, Code Development, Testing, Code Revision, Final Testing.

•  In the final report, please hand in one page that gives your estimates of the actual time spent on each of the activities above.  Also, include in this document a description of the software development process you and your partner applied to this problem.  Give the reader a description of how you conceptualized the project, addressed it as a team, and brought it to completion.  Focus on the team process rather than on the code design.