CS 161 Lab H - Twitter Searching

Due Fri Mar 26 at the start of class

Overview

In this lab you will practice writing methods that search and process information--what computers are best at! Moreover, you'll have a chance to see your code work on real data from that most abundant of data sources: Twitter!

You will be implementing a program that is able to search through a selection of tweets, as well as to rate the sentiment of the tweets (how positive or negative the tweet is) and inform the user about the sentiment of their search. Your program will work through the terminal using the Scanner, and produce something like this:

Welcome to Sentimental Twitter Search!

Enter some text to search for, or type QUIT to stop.
sandwich
(1.5) aaang_: I work for such an awesome company. Today, we did something
WONDERFUL.  We ordered a sandwich and... https://t.co/yRzbXqZMnu
(-0.75) nataliebmarquez: Last night I had Brussels sprouts and now I am eating an
egg salad sandwich. My apologies to the state of Washington.
(0.5) fatcakeangelo: Best sandwich I've ever made...
(-0.125) chicanaCalvillo: My dog ate my chicken sandwich. I'm beyond pissed and
hungry right now
(-0.375) m_chappell2: Chef Mandy is at once again with another killer sandwich 😎
(-0.25) DannyBoyLyd: Suspicions confirmed. My kids like ice cream sandwiches more
than me.
(-0.5) lyssajthompson: How do I react when I'm eating food from my own job and
there's a hair in my sandwich 😑
(0.625) RandomArron: Okay so I wouldn this T Shirt girls giving sandwiches😉
http://t.co/6MHWfdcqBc
Number of tweets found: 8
Average sentiment of tweets: 0.078125
Most positive tweet:
  (1.5) aaang_: I work for such an awesome company. Today, we did something
WONDERFUL.  We ordered a sandwich and... https://t.co/yRzbXqZMnu
Tweets are mostly positive: false

This lab should be completed in pairs. You will need to find a different partner than you had for the last lab. Remember to switch off who is driving and who is navigating!

Objectives

Necessary Files

You will need to download and extract the BlueJ project from the Twitter.zip file. This project will supply you with the following classes for you to use. Note that you will NOT need to modify any of these classes.

Assignment Details

In this lab, you will be creating two new classes: a very simple Tweet class to represent a single Tweet, and a TweetAnalyzer class that will contain your searching methods. You will also likely want a separate tester class, so you can test that your methods work without retyping the same code in the codepad.

Part 1: Modeling Tweets

The first thing you should do is make a class to represent a single Tweet. This will be a very simple modeling class (with instance variables and a constructor), similar to what you've created before.
  1. A Tweet is made up of an author (represented as a String---the user name) and the content (also represented as a String). Your Tweet constructor should take in both the author and the content and assign them to instance variables as appropriate.
  2. You will also need getters for the author and the content.
  3. Finally, you should include a toString() method that returns a String representation of the tweet.
  4. By now this kind of simple model class should be quick and easy to put together! You can quickly test your class with the following code (either in your tester or in the codepad):
    Tweet myTweet = new Tweet("cs161","computer science is totally my favorite!");
    System.out.println(myTweet);

    which should print out

    cs161: computer science is totally my favorite!

Part 2: All the Tweets!

  1. Once you have your Tweet model in place, we can start analyzing Tweets! Create a new TweetAnalyzer class to do this work.
    • You may want to fill in the method signature for the constructor and the printSearchResults() method now, so that everything compiles.
  2. Your TweetAnalyzer will be keeping track of a collection of Tweet objects using an ArrayList (stored as an instance variable). Remember, you can declare and assign ArrayLists using:
    ArrayList<DataType> myList;
    myList = new ArrayList<DataType>();

    Be sure to declare the variable as an instance variable, but assign a value to it in the constructor.

    • Also remember to import java.util.ArrayList at the very top of the class!
  3. After you've created the ArrayList in your constructor, you'll need to fill it with Tweets! We have provided a file tweets.txt that contains a sampling of tweets from the Tacoma area over the last few days.

    WARNING: THIS FILE CONTAINS REAL, UNSANITIZED TWITTER DATA

    This data is taken from the twitter "firehose"--all public tweets made. Tweets may include offensive, inappropriate, or triggering language or content. If you are concerned about this data (or find anything that we should definitely remove), please let us know.

    As with the rest of the Internet, any given moment on Twitter can reveal both the peaks and valleys of human behavior. This assignment should remind you that, when communicating publicly, it is worth considering how what you say may affect others.

  4. You can read the provided Tweet data from the file by using the provided TwitterDataFile class. Instantiate a new TwitterDataFile object by passing the constructor the name of the file to read (tweets.txt) as a parameter. The filename should be "hardcoded" into your TweetAnalyzer class.
  5. You can then get an ArrayList of Strings from this file object by calling the getTwitterDataList() method on it.
    • Remember to assign the results of this method to a local variable.
  6. Once you have the ArrayList<String> of twitter data, you can loop through the list and use those Strings to create new Tweet objects for your instance variable list.
    • Although it is possible to loop through an ArrayList using a while or for loop, it's usually easier, cleaner, and better practice to use a for each loop. Recall that a for each loop looks like:
      //read as: for EACH variable IN listName
      for(DataType variable : listName)
      {
        //do stuff with variable
      }

      For example:

      ArrayList<Robot> robots = new ArrayList<Robot>(); //the list
      for(Robot theRobot : robots)
      {
        //do something with theRobot
      }
    • You might check your loop syntax by (temporarily) using it to print out each String in the twitter data list, so you can see what they look like.
  7. Each line in the tweets.txt file is a tab-separated list of tweet characteristics. It lists the time that the tweet was posted, then a TAB, then the author of the tweet, then another TAB, then the content of the tweet. You will need to extract the relevant portions of data from each line in order to construct a new Tweet object.
    • You could use the String class's indexOf() and substring() methods to pull out the relevant pieces (like we did for the Fraction homework). However, since there are lots of pieces separated by tabs, there is an easier solution available.
    • The .split(String) method of the String class returns a String[] of all the different substrings divided by some "separator" String. For example:
      String text = "Hello world how are you?";
      String[] parts = text.split(" "); //split using a space as a separator
      //parts => {"Hello", "world", "how", "are", "you?"}
    • You should use the .split() method to separate the line of the file using a tab (written using a tab character, "\t") as a separator. You can then access the appropriate element of this array to get the author and the content of the tweet. Use that to instantiate a new Tweet object, and add it to the TweetAnalyzer's ArrayList of Tweets!
    • Note: you don't need a loop to work with your small "parts" array--you can just access each item by index. For example:
      String firstPart = parts[0]; //start at index 0
  8. You can test that your program works by instantiating a new TweetAnalyzer object (right-click on the class in BlueJ to make a new red box), and then inspecting the box to see that the ArrayList instance variable exists and has data. Note that you can double-click on the "bent arrow" symbol to inspect the object (the ArrayList) inside the other object (the TweetAnalyzer).
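
As a rough illustration of the tab-splitting step above, the sketch below pulls the author and content out of made-up lines. The class name SplitSketch, the helper authorAndContent(), and the sample lines (including the timestamp format) are assumptions for illustration only. They are not part of the provided project, and the real lines in tweets.txt may look different.

```java
import java.util.ArrayList;

public class SplitSketch {
    // Hypothetical helper: builds "author: content" from one line laid out as
    // time, TAB, author, TAB, content (the layout described in step 7).
    public static String authorAndContent(String line) {
        String[] parts = line.split("\t"); // split using a tab as the separator
        String author = parts[1];          // parts[0] is the time, which we skip
        String content = parts[2];
        return author + ": " + content;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the timestamp format is only a guess.
        ArrayList<String> lines = new ArrayList<String>();
        lines.add("12:00\tcs161\tcomputer science is totally my favorite!");
        lines.add("12:05\trobotfan\tbeep boop");

        for (String line : lines) {
            System.out.println(authorAndContent(line));
        }
    }
}
```

In your own constructor you would use the extracted pieces to make a new Tweet rather than a String, and add that Tweet to your instance variable list.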

Part 3: Searching Tweets

Now that you have the data modeled and loaded, you can begin writing methods to work with it! These methods will all go in your TweetAnalyzer class.

  1. Start by creating a method called printTweets() that takes as a parameter an ArrayList of Tweets. The method should print each Tweet in the given ArrayList on its own line.
    • Use a for each loop like you did to process the String data!
    • You might test this method by (temporarily) using it to print out the Tweets saved in the TweetAnalyzer's instance variable (e.g., at the end of the constructor).
  2. Next, implement a method called find() that takes as a parameter a single String searchTerm and returns an ArrayList of all the Tweets that include that search term either in the content OR as the author.
    • This method should perform a linear search--it should loop through the list of Tweets and check each one to see if it is what you are looking for.
    • You'll need some way to figure out if a Tweet's content contains the searchTerm. Take a minute and think about how you might do this using String methods we've talked about. Remember that the searchTerm might contain multiple words, so .split() isn't likely to be useful here.
    • The easiest way to do this is to use another new method from the String class: the .contains() method. This method takes in a String and performs its own linear search, returning whether or not the given String is contained within the String you called the method on. For example:
        String text = "hello world";
        boolean wor = text.contains("wor"); //is true
        boolean word = text.contains("word"); //is false
    • You'll also need to check whether the Tweet's author is the same as the searchTerm. Remember to compare Strings with the .equals() method, not with the == operator.
    • Note that both .contains() and .equals() are case-sensitive, which is fine for this assignment. If you want (and have time), you can ignore case by converting both the searchTerm and the Tweet's content or author to lower case before searching/comparing (use the .toLowerCase() method).
  3. Now you can start filling in the printSearchResults() method. This method should take in a searchTerm, and then call your find() method to get a list of all the Tweets that match the search. Your method should then print out all the found Tweets (using your printTweets() method).
    • Your method should also print out the number of Tweets found by the search (as in the example at the top of the page). Remember you can get the number of items in an ArrayList using the .size() method.
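
To make the linear search and the optional case-insensitive comparison concrete, here is a minimal sketch that searches a list of plain Strings instead of Tweets. The class name ContainsSketch, the helper findIgnoringCase(), and the sample data are all made up for illustration; your find() method will work with Tweet objects and their getters instead.

```java
import java.util.ArrayList;

public class ContainsSketch {
    // Hypothetical helper: a linear search that returns every String in the
    // list containing the search term, ignoring upper/lower case.
    public static ArrayList<String> findIgnoringCase(ArrayList<String> items, String searchTerm) {
        ArrayList<String> matches = new ArrayList<String>();
        String lowerTerm = searchTerm.toLowerCase();
        for (String item : items) {
            // Lower-case BOTH sides so the comparison ignores case
            if (item.toLowerCase().contains(lowerTerm)) {
                matches.add(item); // keep the original, unmodified text
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ArrayList<String> foods = new ArrayList<String>();
        foods.add("Egg salad SANDWICH");
        foods.add("ice cream");
        foods.add("chicken sandwich");

        ArrayList<String> found = findIgnoringCase(foods, "Sandwich");
        for (String food : found) {
            System.out.println(food);
        }
        System.out.println("Number found: " + found.size()); // prints Number found: 2
    }
}
```

Notice that the method builds up a new list of matches and returns it, leaving the original list unchanged.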

At this point, you can start running the TwitterSearcher class and see some results for different searches. But don't get caught up, there's still more to do...

Extension: Tweet Sentiment

Now you're able to search for tweets, but it might be nice to get some more information about what tweets you found. In particular, we'll add in the ability to measure the sentiment of the tweets. This sentiment will give us a (partially accurate) measure of whether the tweet is "positive" or "negative". Sentiment is measured as a number from -1 (all negative) to 1 (all positive).

  1. Our technique for determining the sentimentality of a Tweet is to calculate the sentiment of each word in the Tweet, and then sum up those values to determine the overall sentiment of the Tweet. The provided Sentiments class contains a method to determine the sentimentality of a given word; see the description above for how to use this class.
  2. Since you're determining the sentiment of each Tweet, you'll want to modify the Tweet class you made to support that. Create a new instance variable to store the sentiment. (Think: what data type should this variable be?).
  3. After you've declared an instance variable at the top of the class, you'll need to assign a value to it in the constructor. This means you'll need to calculate the sentiment of each word, and then sum those sentiments together.
    • You can once again use the String class's .split() method to break up the content into different words (like in the previous example).
    • You can then loop through this array and sum up the word sentiments. Hint: try using a "running total" variable to calculate your sum.
  4. Now that you've calculated the Tweet's sentiment, make a getter for that value, and modify your toString() method so that your String representation includes the sentiment value as well.
  5. You can test your sentiment calculation with code similar to the following:
    Tweet tweet1 = new Tweet("cs161","computer science is totally my favorite!");
    System.out.println(tweet1);
    Tweet tweet2 = new Tweet("cs161","life without computers: awful or awesome?");
    System.out.println(tweet2);

    which should print out

    (0.75) cs161: computer science is totally my favorite!
    (-0.125) cs161: life without computers: awful or awesome?
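
The "running total" hint above can be sketched roughly as follows. This example does NOT use the provided Sentiments class: the scoreOf() helper and its word scores are made-up stand-ins, as are the class name RunningTotalSketch and the method totalScore(). In your Tweet class you would ask the Sentiments class for each word's value instead.

```java
public class RunningTotalSketch {
    // Made-up stand-in for a word-scoring method (NOT the lab's Sentiments
    // class); the words and values here are invented for illustration.
    public static double scoreOf(String word) {
        if (word.equals("great")) {
            return 0.5;
        }
        if (word.equals("awful")) {
            return -0.75;
        }
        return 0.0; // any other word counts as neutral in this sketch
    }

    // Splits the text into words and sums their scores with a running total.
    public static double totalScore(String text) {
        double total = 0.0; // the running total starts at zero
        String[] words = text.split(" ");
        for (String word : words) {
            total = total + scoreOf(word); // add each word's score in turn
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalScore("great day but awful traffic")); // prints -0.25
    }
}
```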

Extension: Sentimental Searches

Finally, add in some searching functionality to the TweetAnalyzer to actually use these newly calculated sentiment values.

  1. Start by adding a method called getAverageSentiment() that takes as a parameter an ArrayList of Tweets, and returns the average sentiment of those tweets. Remember to use a for each loop to iterate through the ArrayList.
    • Calculating an average is a lot like calculating a sum (what you just did to get the sentiment of a tweet).
    • If the given list of tweets is empty, this method should return 0.
  2. Implement a method called getMostPositive() that takes as a parameter an ArrayList of Tweets, and returns the Tweet that has the highest sentiment value.
    • This method should use a king-of-the-hill search, which works as follows:
      1. Assign one item (e.g., the first item in the list) to be the "king" (the biggest/smallest/bestest) item.
      2. Loop through everyone else
      3. Compare each item to the king in a kind of challenge--if they are bigger/smaller/better than the king, then they become the new king (and should be assigned as such)
      4. Whoever is the king in the end must be the biggest/smallest/bestest item in the list!
    • If the given list of tweets is empty, this method should return null
  3. Last but not least, implement a method called isMostlyPositive() that takes as a parameter an ArrayList of Tweets, and returns whether or not the majority of those tweets are positive (have a sentiment > 0). Think: what is a good data type for the return value?
    • Like the others, this method is also a linear search. You'll need to count the number of positive and negative tweets, and then make some decision about what to return based on those counts.
    • A set of tweets needs to have strictly more positive tweets than negative tweets to be considered "mostly positive".
  4. Finally finally, modify your printSearchResults() so it also prints out the average sentiment of the found tweets, the most positive tweet, and whether or not the found tweets are mostly positive. You should call the methods you just wrote.
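
The three patterns above (averaging, king-of-the-hill, and counting) can be sketched on a plain list of numbers. The class name ListStatsSketch and its method names are hypothetical, and the sketch works with Double values rather than Tweets. The sample values are the sentiments from the example output at the top of the page. Treating a value of exactly 0 as neither positive nor negative is an assumption of this sketch.

```java
import java.util.ArrayList;

public class ListStatsSketch {
    // Average via a running total; an empty list gives 0.
    public static double average(ArrayList<Double> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (double value : values) {
            total = total + value;
        }
        return total / values.size();
    }

    // King-of-the-hill search for the largest value; an empty list gives null.
    public static Double largest(ArrayList<Double> values) {
        if (values.isEmpty()) {
            return null;
        }
        Double king = values.get(0); // the first item starts as the king
        for (Double challenger : values) {
            if (challenger > king) {
                king = challenger;   // the challenger wins and becomes king
            }
        }
        return king;
    }

    // True only when strictly more values are above 0 than below 0.
    public static boolean mostlyPositive(ArrayList<Double> values) {
        int positives = 0;
        int negatives = 0;
        for (double value : values) {
            if (value > 0) {
                positives++;
            } else if (value < 0) {
                negatives++;
            }
        }
        return positives > negatives;
    }

    public static void main(String[] args) {
        double[] sample = {1.5, -0.75, 0.5, -0.125, -0.375, -0.25, -0.5, 0.625};
        ArrayList<Double> values = new ArrayList<Double>();
        for (double value : sample) {
            values.add(value);
        }
        System.out.println(average(values));        // prints 0.078125
        System.out.println(largest(values));        // prints 1.5
        System.out.println(mostlyPositive(values)); // prints false
    }
}
```

Your own methods will follow the same shapes, but they will take an ArrayList of Tweets and compare the sentiment you get from each Tweet's getter.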

And that's it! You should now be able to run the TwitterSearcher class and get details about the sentimentality of tweets that match your search!

Extensions

No extra credit on this one, but we'd love to see/hear suggestions for different kinds of analysis you might perform on Tweets!

Submitting

  1. Make sure both of your names are in a JavaDoc comment on both of your new classes (Tweet and TweetAnalyzer). If your names aren't on your assignment, we can't give you credit!
  2. Right-click on the project folder you downloaded, then:
    • If using Linux, select Compress...
    • If using Windows, select Send to and then Zip file
    • If using Mac, select Compress ... items

    This will let you take the selected folder (or files) and generate a new compressed .zip file.

  3. Navigate to the course on Moodle (Lecture Section A or Lecture Section B). Upload your .zip file to the Lab H Submission page. Remember to click "Save Changes"! You may submit as often as you'd like before the deadline; we will grade the most recent copy.
  4. While you're on Moodle, remember to fill out the Lab H Partner Evaluation. Both partners need to submit evaluations.

Grading

This assignment will be graded out of 21 points.