README
------

How To Set Up Files And Directories
-----------------------------------

To install NEWSAGENT, follow these steps:

1. The tar file is put in the directory ~newsagent/install. Use the
   following two commands to extract and set up all the files and
   directories of the system:

   a. "gunzip newsagent.tar.gz", then
   b. "tar xvf newsagent.tar"

   After executing these two commands, check that there are three
   directories, /user_fun, /spool and /daemon, under the current
   directory.

2. Go to the directory /user_fun and type

   a. "make gpcontent"
   b. "make getartnum"
   c. "make readart"

   to compile the user functions.

3. Then go to the directory /daemon/src and compile the daemon with the
   command 'make'. After compilation, an executable called "agent"
   appears in the same directory. This completes the installation of
   the daemon.

How To Execute The Daemon
-------------------------

To execute the daemon, type the following command:

    >agent

However, if you are executing the daemon for the first time, please note
the following.

There MUST be at least one user's directory under the spool directory.
This user's directory is created automatically when you run the user
interface "na". Therefore you must compile all the files for the daemon,
then run the interface na, before you execute the daemon. However, the
file ".pf" MUST be created by the user. For details about the ".pf" file,
see the description of the preference file.

There is a file called LastExe. It stores the time of the last execution
of the daemon. In effect, the daemon examines the new articles that have
arrived at the NNTP server since that time. The time MUST be specified
before you execute the daemon for the first time. The format of the time
(24-hour clock) is as follows:

    "YYMMDD HHMMSS"

where 'YY' represents the year,
      'MM' represents the month,
      'DD' represents the day,
      'HH' represents the hour,
      'MM' represents the minute,
      'SS' represents the second.

For example, noon on 1st May 1995 is written as "950501 120000". The
first time the daemon is run, this time MUST be written into the data
file 'LastExe' by the user. We suggest a time 24 hours before you start
executing the daemon.

How To Execute And Use The User Interface
-----------------------------------------

The user interface is named 'na' and is a script written in Tcl/Tk. You
can execute it after you have added 'NEWAGENT_DIR/user_fun' to your
path, where 'NEWAGENT_DIR' is the directory in which the whole system is
installed.

It is suggested that the user interface be executed only after the
daemon has finished its first execution; otherwise no articles will have
been extracted for the user.

How To Manipulate The Preference File
-------------------------------------

Upon the first use of the user interface, a directory named after the
user's account name is created under the directory /spool, e.g. user
peter will have the directory /peter. Each user is required to create
his/her own preference file '.pf' under that directory. (The preference
file MUST be created before the first execution of the daemon.)

The preference file stores the user's interests, which have to be
specified by the user. An interest consists of options and an interest
pattern, and each interest should, by default, be written on a separate
line of the preference file. There should be no blank lines between the
interests. Moreover, an interest can make use of 1) wild cards,
2) boolean operators, 3) regular expression operators.

The general form of an interest is:

    [OPTION] [INTEREST_PATTERN]

OPTIONS

  -i    Ignore the case of letters when making comparisons, i.e. upper
        case and lower case are considered identical.

  -e [INTEREST_PATTERN]
        Same as a simple interest argument, but useful when the
        interest begins with a `-'.

  -k    No symbol in the interest pattern is treated as a meta
        character used in regular expressions.
        For example, with -k 'a(b|c)*d' the daemon looks for the
        literal string a(b|c)*d in the subject of articles, whereas
        with 'a(b|c)*d' the daemon looks for substrings of the subject
        that match the regular expression 'a(b|c)*d'.

  -v    Only articles whose subject does not contain the interest
        pattern are extracted.

  -w    Treat the interest pattern as a word. For example,
        -w 'computer' will match computers.

  -x    The daemon finds articles whose subject matches the whole line
        of the interest pattern (including spaces), approximately 25
        characters.

INTEREST_PATTERN

The daemon supports a large variety of patterns, including simple
strings, strings with classes of characters, sets of strings, boolean
expressions, wild cards, exact and approximate matching, and regular
expressions.

1. Strings
   Any sequence of characters. If the user wants to include a special
   character such as `$', `^', `*', `[', `|', `(', `)', `!', or `\',
   the character must be preceded by `\'. The character `^' denotes the
   beginning of a line and `$' denotes the end of a line.

2. Classes of characters
   A list of characters inside [] (in order) corresponds to any
   character from the list. For example, [0-9] is any character between
   0 and 9. `^' is treated as complement when it is enclosed by `[' and
   `]'. For example, [^a-g] denotes any character except 'a' to 'g'.
   The symbol '.' stands for any symbol except the newline character.

3. Boolean operations
   `;' is treated as logical AND whereas `,' is treated as logical OR.
   They cannot be used in the same interest pattern.

4. Wild cards
   The symbol '#' denotes a wild card. '#' matches zero or any number
   of arbitrary characters.

5. Combination of exact and approximate matching
   Any pattern inside angle brackets <> must match the text exactly
   even if the match is with errors. For example, <comp>uter matches
   computing, but com<pute>r does not match computation.

6. Regular expressions
   The union operation `|', Kleene closure `*', and parentheses () are
   all supported. Currently '+' is not supported. Regular expressions
   are currently limited to approximately 30 characters (generally
   excluding the meta characters). The '-w' option does not currently
   work with regular expressions.

EXAMPLES

  -w -i 'computer'
        find articles whose subject contains the word 'computer' in
        upper or lower case.

  'computer;science'
        find articles whose subject contains both the words 'computer'
        and 'science'.

  '^Re'
        find articles whose subject has 'Re' as its first two
        characters.

  'f#t'
        find articles whose subject has 'f' as the first character and
        't' as the last character. So fast or fastest will be matched.

  'abc[b-k](cd)*(ka|ca)'
        find articles whose subject starts with abc followed by one
        character in the range 'b' to 'k', followed by zero or more
        repetitions of cd, followed by either ka or ca.

Note: Characters such as `$', `^', `*', `[', `]', `|', `(', `)' and `!'
are meaningful to the shell. When these characters are included in an
interest, unexpected results may occur. To avoid this problem, enclose
the interest entirely in single quotes, i.e. 'interest'.
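
As a recap of the first-run preparation described in the sections above,
the following Python sketch formats a time in the LastExe format
"YYMMDD HHMMSS" and writes a '.pf' file with one interest per line. It is
only an illustration and not part of NEWSAGENT: the function names, the
location of the LastExe file, and the assumption that the time is stored
without quotes and followed by a newline are all hypothetical, so check
them against your installation before relying on it.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterable


def format_lastexe(moment: datetime) -> str:
    """Format a time as "YYMMDD HHMMSS" (two-digit year, 24-hour clock)."""
    return moment.strftime("%y%m%d %H%M%S")


def prepare_first_run(lastexe_file: Path, user_dir: Path,
                      interests: Iterable[str], now: datetime) -> None:
    """Hypothetical helper: write LastExe and the user's .pf file.

    Assumptions (not confirmed by the README): LastExe holds the bare
    time stamp followed by a newline, and its path is supplied by the
    caller because the README does not say where the file lives.
    """
    # The README suggests a start time 24 hours before the first run.
    start = now - timedelta(hours=24)
    lastexe_file.write_text(format_lastexe(start) + "\n")

    # One interest per line, with blank entries dropped so that no
    # empty lines end up between interests.
    lines = [entry.strip() for entry in interests if entry.strip()]
    (user_dir / ".pf").write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # The README's own example: noon on 1st May 1995.
    print(format_lastexe(datetime(1995, 5, 1, 12, 0, 0)))  # 950501 120000
```

The interests passed in would be lines such as -w -i 'computer' or
'^Re', written exactly as in the EXAMPLES section, including the single
quotes.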