CS 161 Lab H - Twitter Searching
Due Fri Mar 26 at the start of class
Overview
In this lab you will practice writing methods that search and process information--what computers are best at! Moreover, you'll have a chance to see your code work on real data from that most abundant of data sources: Twitter!
You will be implementing a program that is able to search through a selection of tweets, as well as to rate the sentiment of the tweets (how positive or negative the tweet is) and inform the user about the sentiment of their search. Your program will work through the terminal using the Scanner, and produce something like this:
Welcome to Sentimental Twitter Search!
Enter some text to search for, or type QUIT to stop.
sandwich
(1.5) aaang_: I work for such an awesome company. Today, we did something
WONDERFUL. We ordered a sandwich and... https://t.co/yRzbXqZMnu
(-0.75) nataliebmarquez: Last night I had Brussels sprouts and now I am eating an
egg salad sandwich. My apologies to the state of Washington.
(0.5) fatcakeangelo: Best sandwich I've ever made...
(-0.125) chicanaCalvillo: My dog ate my chicken sandwich. I'm beyond pissed and
hungry right now
(-0.375) m_chappell2: Chef Mandy is at once again with another killer sandwich 😎
(-0.25) DannyBoyLyd: Suspicions confirmed. My kids like ice cream sandwiches more
than me.
(-0.5) lyssajthompson: How do I react when I'm eating food from my own job and
there's a hair in my sandwich 😑
(0.625) RandomArron: Okay so I wouldn this T Shirt girls giving sandwiches😉
http://t.co/6MHWfdcqBc
Number of tweets found: 8
Average sentiment of tweets: 0.078125
Most positive tweet:
(1.5) aaang_: I work for such an awesome company. Today, we did something
WONDERFUL. We ordered a sandwich and... https://t.co/yRzbXqZMnu
Tweets are mostly positive: false
This lab should be completed in pairs. You will need to find a different partner than you had for the last lab. Remember to switch off who is driving and who is navigating!
Objectives
- To practice working with arrays and ArrayLists (especially ArrayLists)
- To practice using loops (for and foreach) and conditionals to write searching methods
- To learn a few more String methods
- To have a chance to play with real data!
Necessary Files
You will need to download and extract the BlueJ project from the Twitter.zip file. This project will supply you with the following classes for you to use. Note that you will NOT need to modify any of these classes.
- TwitterDataFile
  This class represents a file full of Twitter data, which you can use to load an ArrayList of data into your program.
  - Instantiate this class by passing it a filename as a parameter. This is the name of the text file you want to get data from. The starter code includes a tweets.txt file for you to use.
  - You can then use the provided getTwitterDataList() method to fetch an ArrayList<String> of the data (where each item represents a different tweet).
- Sentiments
  This class will allow you to determine the "sentiment" of a given word. A sentiment is a double between -1 and 1. A -1 means the word is "very negative" and a +1 means the word is "very positive".
  - This class has a single static method called getWordSentiment that returns a double value representing the sentiment of the word. Thus you can get the sentiment of a word by using:
    double wordSentiment = Sentiments.getWordSentiment(word);
- TwitterSearcher
  This class contains a single static method, search(), that will be used to start your searching program. You will need to implement the TweetAnalyzer class and provide a method
    public void printSearchResults(String searchTerm)
  for this code to compile.
  - You can start the program by right-clicking on the class in BlueJ and selecting the search() method.
Assignment Details
In this lab, you will be creating two new classes: a very simple Tweet class to represent a single Tweet, and a TweetAnalyzer class that will contain your searching methods. You will likely also want a separate tester class so you can easily test that your methods work without repeating work in the codepad.
Part 1: Modeling Tweets
The first thing you should do is make a class to represent a single Tweet. This will be a very simple modeling class (with instance variables and a constructor), similar to what you've created before.
- A Tweet is made up of an author (represented as a String, the user name) and the content (also represented as a String). Your Tweet constructor should take in both the author and the content and assign them to instance variables as appropriate.
- You will also need getters for the author and the content.
- Finally, you should include a toString() method that returns a String representation of the tweet.
By now this kind of simple model class should be quick and easy to put together! You can quickly test your class with the following code (either in your tester or in the codepad):
    Tweet myTweet = new Tweet("cs161", "computer science is totally my favorite!");
    System.out.println(myTweet);
which should print out
cs161: computer science is totally my favorite!
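As a rough guide to the shape Part 1 describes, here is a minimal sketch of such a model class. The field names and layout are illustrative assumptions, not a required design; only the constructor parameters, the getters, and the toString() format come from the description above.

```java
/** A minimal sketch of a Tweet model class (field names are illustrative). */
class Tweet {
    private String author;   // user name of the person who posted the tweet
    private String content;  // text of the tweet

    /** Stores the given author and content in the instance variables. */
    public Tweet(String author, String content) {
        this.author = author;
        this.content = content;
    }

    public String getAuthor() {
        return author;
    }

    public String getContent() {
        return content;
    }

    /** Returns the tweet in the form "author: content". */
    @Override
    public String toString() {
        return author + ": " + content;
    }
}
```

Running the two test lines above against this sketch would exercise the constructor and toString() together.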
Part 2: All the Tweets!
- Once you have your Tweet model in place, we can start analyzing Tweets! Create a new TweetAnalyzer class to do this work.
  - You may want to fill in the method signature for the constructor and the printSearchResults() method now, so that everything compiles.
- Your TweetAnalyzer will be keeping track of a collection of Tweet objects using an ArrayList (stored as an instance variable). Remember, you can declare and assign ArrayLists using:
      ArrayList<DataType> myList;
      myList = new ArrayList<DataType>();
  Be sure to declare the variable as an instance variable, but assign a value to it in the constructor.
  - Also remember to import java.util.ArrayList at the very top of the class!
- After you've created the ArrayList in your constructor, you'll need to fill it with Tweets! We have provided a file tweets.txt that contains a sampling of tweets from the Tacoma area over the last few days.
  WARNING: THIS FILE CONTAINS REAL, UNSANITIZED TWITTER DATA
  This data is taken from the Twitter "firehose"--all public tweets made. Tweets may include offensive, inappropriate, or triggering language or content. If you are concerned about this data (or find anything that we should definitely remove), please let us know.
  As with the rest of the Internet, any given moment on Twitter can reveal both the peaks and valleys of human behavior. This assignment should remind you that, when communicating publicly, it is worth considering how what you say may affect others.
- You can read the provided Tweet data from the file by using the provided TwitterDataFile class. Instantiate a new TwitterDataFile object by passing the constructor the name of the file to read (tweets.txt) as a parameter. The filename should be "hardcoded" into your TweetAnalyzer class.
- You can then get an ArrayList of Strings from this file object by calling the getTwitterDataList() method on it.
  - Remember to assign the results of this method to a local variable.
- Once you have the ArrayList<String> of twitter data, you can loop through the list and use those Strings to create new Tweet objects for your instance variable list.
  - Although it is possible to loop through an ArrayList using a while or for loop, it's usually easier, cleaner, and better practice to use a for each loop. Recall that a for each loop looks like:
        //read as: for EACH variable IN listName
        for(DataType variable : listName) {
            //do stuff with variable
        }
    For example:
        ArrayList<Robot> robots = new ArrayList<Robot>(); //the list
        for(Robot theRobot : robots) {
            //do something with theRobot
        }
  - You might check your loop syntax by (temporarily) using it to print out each String in the twitter data list, so you can see what they look like.
- Each line in the tweets.txt file is a tab-separated list of tweet characteristics. It lists the time that the tweet was posted, then a TAB, then the author of the tweet, then another TAB, then the content of the tweet. You will need to extract the relevant portions of data from each line in order to construct a new Tweet object.
  - You could use the String class's indexOf() and substring() methods to pull out the relevant pieces (like we did for the Fraction homework). However, since there are lots of pieces separated by tabs, there is an easier solution available.
  - The .split(String) method of the String class returns a String[] of all the different substrings divided by some "separator" String. For example:
        String text = "Hello world how are you?";
        String[] parts = text.split(" "); //split using a space as a separator
        //parts => {"Hello", "world", "how", "are", "you?"}
  - You should use the .split() method to separate the line of the file using a tab (written using a tab character, "\t") as a separator. You can then access the appropriate elements of this array to get the author and the content of the tweet. Use those to instantiate a new Tweet object, and add it to the TweetAnalyzer's ArrayList of Tweets!
  - Note: you don't need a loop to work with your small "parts" array--you can just access each item by index. For example:
        String firstPart = parts[0]; //start at index 0
- You can test that your program works by instantiating a new TweetAnalyzer object (right-click on the class in BlueJ to make a new red box), and then inspecting the box to see that the ArrayList instance variable exists and has data. Note that you can double-click on the "bent arrow" symbol to inspect the object (the ArrayList) inside the other object (the TweetAnalyzer).
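The tab-splitting step described above can be sketched on its own, separate from the rest of the TweetAnalyzer. This is only an illustration: the class and method names are hypothetical, and it assumes every line has exactly the three fields listed above (time, author, content) with no extra tabs inside the content.

```java
/** Sketch of pulling the author and content out of one tab-separated line. */
class TabSplitSketch {
    /**
     * Returns {author, content} from a line shaped like
     * "time<TAB>author<TAB>content" (assumed layout, per the description above).
     */
    static String[] authorAndContent(String line) {
        String[] parts = line.split("\t"); // parts[0] is the time, which is not needed here
        String author = parts[1];          // second field
        String content = parts[2];         // third field
        return new String[] { author, content };
    }
}
```

For a made-up line such as "Mon Mar 22" + TAB + "cs161" + TAB + "hello world", the method would hand back "cs161" and "hello world", which are the two values a Tweet constructor needs.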
Part 3: Searching Tweets
Now that you have the data modeled and loaded, you can begin writing methods to work with it! These methods will all go in your TweetAnalyzer class.
- Start by creating a method called printTweets() that takes as a parameter an ArrayList of Tweets. The method should print each Tweet in the given ArrayList on its own line.
  - Use a foreach loop like you did to process the String data!
  - You might test this method by (temporarily) using it to print out the Tweets saved in the TweetAnalyzer's instance variable (e.g., at the end of the constructor).
- Next, implement a method called find() that takes as a parameter a single String searchTerm and returns an ArrayList of all the Tweets that include that search term either in the content OR as the author.
  - This method should perform a linear search--it should loop through the list of Tweets and check each one to see if it is what you are looking for.
  - You'll need some way to figure out if a Tweet's content contains the searchTerm. Take a minute and think about how you might do this using String methods we've talked about. Remember that the searchTerm might contain multiple words, so .split() isn't likely to be useful here.
  - The easiest way to do this is to use another new method from the String class: the .contains() method. This method takes in a String and performs its own linear search, returning whether or not the given String is contained within the String you called the method on. For example:
        String text = "hello world";
        boolean wor = text.contains("wor"); //is true
        boolean word = text.contains("word"); //is false
  - You'll also need to check whether the Tweet's author is the same as the searchTerm. Remember to compare Strings with the .equals() method, not with the == operator.
  - Note that both .contains() and .equals() are case-sensitive, which is fine for this assignment. If you want (and have time), you can ignore case by searching/comparing with the lower-case versions of both the search term and the Tweet's content (use the .toLowerCase() method).
- Now you can start filling in the printSearchResults() method. This method should take in a searchTerm, and then call your find() method to get a list of all the Tweets that match the search. Your method should then print out all the found Tweets (using your printTweets() method).
  - Your method should also print out the number of Tweets found by the search (as in the example at the top of the page). Remember you can get the number of items in an ArrayList using the .size() method.
At this point, you can start running the TwitterSearcher class and see some results for different searches. But don't get caught up, there's still more to do...
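The linear search pattern from Part 3 can be sketched on plain Strings, which keeps the example independent of your own Tweet class. The class and method names here are hypothetical, and the case-insensitive comparison is the optional variation mentioned above rather than a requirement. In the lab itself you would loop over Tweets, check the content with .contains(), and also compare the author with .equals().

```java
import java.util.ArrayList;

/** Sketch of a linear search that collects every matching item. */
class LinearSearchSketch {
    /** Returns every item whose text contains searchTerm, ignoring case. */
    static ArrayList<String> findMatches(ArrayList<String> items, String searchTerm) {
        ArrayList<String> matches = new ArrayList<String>();
        String lowerTerm = searchTerm.toLowerCase(); // lower-case once, outside the loop
        for (String item : items) {
            // check each item in turn; keep the ones that match
            if (item.toLowerCase().contains(lowerTerm)) {
                matches.add(item);
            }
        }
        return matches; // an empty list (not null) when nothing matches
    }
}
```

Returning an empty list when nothing matches means the caller can still call .size() on the result without a special case.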
Extension: Tweet Sentiment
Now you're able to search for tweets, but it might be nice to get some more information about the tweets you found. In particular, we'll add the ability to measure the sentiment of the tweets. This sentiment will give us a (partially accurate) measure of whether the tweet is "positive" or "negative". Each word's sentiment is a number between -1 (very negative) and 1 (very positive), so a tweet's total can fall outside that range (as in the example at the top of the page).
- Our technique for determining the sentiment of a Tweet is to calculate the sentiment of each word in the Tweet, and then sum up those values to determine the overall sentiment of the Tweet. The provided Sentiments class contains a method to determine the sentiment of a given word; see the description above for how to use this class.
- Since you're determining the sentiment of each Tweet, you'll want to modify the Tweet class you made to support that. Create a new instance variable to store the sentiment. (Think: what data type should this variable be?)
- After you've declared an instance variable at the top of the class, you'll need to assign a value to it in the constructor. This means you'll need to calculate the sentiment of each word, and then sum those sentiments together.
  - You can once again use the String class's .split() method to break up the content into different words (like in the previous example).
  - You can then loop through this array and sum up the word sentiments. Hint: try using a "running total" variable to calculate your sum.
- Now that you've calculated the Tweet's sentiment, make a getter for that value, and modify your toString() method so that your String representation includes the sentiment value as well.
- You can test your sentiment calculation with code similar to the following:
      Tweet tweet1 = new Tweet("cs161", "computer science is totally my favorite!");
      System.out.println(tweet1);
      Tweet tweet2 = new Tweet("cs161", "life without computers: awful or awesome?");
      System.out.println(tweet2);
  which should print out
      (0.75) cs161: computer science is totally my favorite!
      (-0.125) cs161: life without computers: awful or awesome?
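The "running total" hint can be sketched as follows. Because the provided Sentiments class is not reproduced here, this example uses a hypothetical stand-in lookup (wordScore) with made-up scores; in your own code you would call Sentiments.getWordSentiment(word) instead. Treating unknown words as 0 is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of summing per-word scores with a running total. */
class SentimentSumSketch {
    // Hypothetical stand-in for the provided Sentiments class; scores are made up.
    static final Map<String, Double> SCORES = new HashMap<String, Double>();
    static {
        SCORES.put("awesome", 0.75);
        SCORES.put("awful", -0.5);
    }

    /** Looks up a word's score; unknown words count as neutral (0) in this sketch. */
    static double wordScore(String word) {
        Double score = SCORES.get(word);
        return (score == null) ? 0.0 : score;
    }

    /** Splits the content on spaces and adds up the score of each word. */
    static double totalSentiment(String content) {
        String[] words = content.split(" ");
        double total = 0.0; // the running total
        for (int i = 0; i < words.length; i++) {
            total += wordScore(words[i]);
        }
        return total;
    }
}
```

With these made-up scores, totalSentiment("awful or awesome") adds -0.5, 0, and 0.75 to give 0.25. The sketch uses a plain for loop because it is walking over an array rather than an ArrayList.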
Extension: Sentimental Searches
Finally, add in some searching functionality to the TweetAnalyzer to actually use these newly calculated sentiment values.
- Start by adding a method called getAverageSentiment() that takes as a parameter an ArrayList of Tweets, and returns the average sentiment of those tweets. Remember to use a for each loop to iterate through the ArrayList.
  - Calculating an average is a lot like calculating a sum (what you just did to get the sentiment of a tweet).
  - If the given list of tweets is empty, this method should return 0.
- Implement a method called getMostPositive() that takes as a parameter an ArrayList of Tweets, and returns the Tweet that has the highest sentiment value.
  - This method should use a king-of-the-hill search, which works as follows:
    - Assign one item (e.g., the first item in the list) to be the "king" (the biggest/smallest/bestest) item.
    - Loop through everyone else.
    - Compare each item to the king in a kind of challenge--if they are bigger/smaller/better than the king, then they become the new king (and should be assigned as such).
    - Whoever is the king in the end must be the biggest/smallest/bestest item in the list!
  - If the given list of tweets is empty, this method should return null.
- Last but not least, implement a method called isMostlyPositive() that takes as a parameter an ArrayList of Tweets, and returns whether or not the majority of those tweets are positive (have a sentiment > 0). Think: what is a good data type for the return value?
  - Like the others, this method is also a linear search. You'll need to count the number of positive and negative tweets, and then make some decision about what to return based on those counts.
  - A set of tweets needs to have strictly more positive tweets than negative tweets to be considered "mostly positive".
- Finally finally, modify your printSearchResults() so it also prints out the average sentiment of the found tweets, the most positive tweet, and whether or not the found tweets are mostly positive. You should call the methods you just wrote.
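The three patterns in this section (average, king-of-the-hill, and counting) can be sketched on a list of plain numbers. The class and method names are hypothetical, and the sketch works on Double values rather than Tweets, so in your own methods you would read each Tweet's sentiment through its getter. It also assumes that a value of exactly 0 counts as neither positive nor negative.

```java
import java.util.ArrayList;

/** Sketch of average, king-of-the-hill, and counting searches over numbers. */
class SentimentStatsSketch {
    /** Returns the average of the values, or 0 for an empty list. */
    static double average(ArrayList<Double> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total / values.size();
    }

    /** King-of-the-hill: returns the largest value, or null for an empty list. */
    static Double largest(ArrayList<Double> values) {
        if (values.isEmpty()) {
            return null;
        }
        Double king = values.get(0); // the first item starts as king
        for (Double challenger : values) {
            if (challenger > king) { // a bigger challenger takes over
                king = challenger;
            }
        }
        return king;
    }

    /** Returns true only if strictly more values are positive than negative. */
    static boolean mostlyPositive(ArrayList<Double> values) {
        int positive = 0;
        int negative = 0;
        for (double v : values) {
            if (v > 0) {
                positive++;
            } else if (v < 0) {
                negative++;
            }
        }
        return positive > negative;
    }
}
```

Fed the eight sentiment values from the sample output at the top of the page, this sketch gives an average of 0.078125, a largest value of 1.5, and false for mostlyPositive (three positive values against five negative), which agrees with that sample.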
And that's it! You should now be able to run the TwitterSearcher class's search() method and get details about the sentiment of tweets that match your search!
- Try searching for Puget Sound's Twitter account (univpugetsound). How positive has it been recently?
- You might also try searching for things like "computer", "morning", "school", "rain", etc.
Extensions
No extra credit on this one, but we'd love to see/hear suggestions for different kinds of analysis you might perform on Tweets!
Submitting
- Make sure both of your names are in a JavaDoc comment on both of your new classes (Tweet and TweetAnalyzer). If your names aren't on your assignment, we can't give you credit!
- Right-click on the project folder you downloaded, then:
  - If using Linux, select Compress...
  - If using Windows, select Send to and then Zip file
  - If using Mac, select Compress ... items
  This will let you take the selected folder (or files) and generate a new compressed .zip file.
- Navigate to the course on Moodle (Lecture Section A), (Lecture Section B). Upload your .zip file to the Lab H Submission page. Remember to click "Save Changes"! You may submit as often as you'd like before the deadline; we will grade the most recent copy.
- While you're on Moodle, remember to fill out the Lab H Partner Evaluation. Both partners need to submit evaluations.
Grading
This assignment will be graded out of 21 points.
- Tweet Class
  - [1pt] You have implemented a Tweet class with appropriate instance variables and constructors
  - [1pt] Your Tweet class has required getters
  - [1pt] Your Tweet class has a working toString() method
- TweetAnalyzer Class
  - [1pt] You have implemented a TweetAnalyzer class with appropriate instance variables and constructors
  - [2pt] Your TweetAnalyzer loads and loops through the TwitterDataFile's data
  - [2pt] You use the .split() method to break a twitter data string into parts
  - [1pt] You fill the instance variable ArrayList with Tweet objects
- Search Methods
  - [1pt] You have implemented the printTweets() method with an appropriate signature
  - [1pt] Your printTweets() method prints out the given tweets
  - [1pt] You have implemented the find() method with an appropriate signature
  - [1pt] Your find() method uses the .contains() method to check if a tweet contains the search term
  - [1pt] Your find() method also returns tweets when the search term exactly matches the author
  - [2pt] Your find() method returns a list of tweets that match the search criteria
  - [1pt] Your printSearchResults() method prints out the results of a search
  - [1pt] Your printSearchResults() method prints out the number of results found
- Sentiment (extra credit)
  - [+1pt] Your Tweet class calculates its sentiment
  - [+2pt] You have implemented sentiment-based searches (1pt each, 2 max)
- Style & Documentation
  - [1pt] You use appropriate loops (for loops for arrays, foreach loops for ArrayLists)
  - [1pt] Your code shows proper style--including well-named variables and private instance variables
  - [1pt] You submitted a lab partner evaluation