CS 261 Homework 9 - Analysis of Sorting Algorithms

Due Wed May 07 at 11:59pm

Overview

For this assignment, you will be experimenting with and investigating the sorting algorithms we've discussed in class, in order to get a better understanding of their differences (especially in regards to their efficiency). You will be running a number of tests on these algorithms, then analyzing the results of these tests and writing up a report on your findings. Thus this assignment will involved writing a bit of code, but focuses primarily on considering and writing about the results of that code, rather than on the implementation. Yes, we do in fact do writing in computer science classes sometimes!

Objectives

To practice creating testing frameworks for methods
To improve your understanding of the differences between and uses of various sorting algorithms
To practice comparing theoretical results with the results of real implementations
To practice researching new algorithms!

Necessary Files

You will need a copy of the Sorting algorithm code that we've gone over in lecture. This code is posted on Moodle and will be updated after lecture each day's class (and you'll need ALL the sorts). Note that implementations of the sorting algorithms can also be found in the textbook--or you could even implement them yourself!

You will also want a copy of the Hwk9.zip file. This file contains a class PermutationSet that can be used to generate and iterate through all possible orderings of a small-sized array.

Because of memory limitations (the number of permutations has factorial growth), it is not possible to directly access the different permutations. Instead, you will need to use the provided Iterator. Calling the .iterator() method on your PermutationSet object will give you access to the iterator, which you can then walk through using the .hasNext() and .next() methods (using the same pattern/structure you've used with the Scanner, for example).
- Moreover, since a foreach loop uses an iterator in its underlying implementation, it becomes the easiest way to use the PermutationSet class. The class has a main() method at the bottom with an example of usage.
The iterator returns a new array of elements in the permuted order; you can sort this list in place without worrying about messing anything else up (though make sure you don't try and apply two sorting algorithms to the same array!).
Note that generating permutations for large lists is slow. On my workhorse machine, it takes a bit over 2 minutes to just generate permutations for a list of size 11---not counting any time for sorting! And because permutations grow factorially, this means it would take 12 times as long for a list of size 12 (around 30 minutes). A list of size 13 would take 13 times as long as that... or close to 7 hours. So although the PermutationSet class can handle lists up to 20 items, you do not want to use lists that long. Permutations of 12 item lists is the required maximum for the assignment.

Note that there is no README for this assignment. Instead, your submitted report will need to include answers to a list of familiar questions in the final section.

Finally, you will also want a copy of a word processing program (like Word), as well as a spreadsheet program (like Excel) to help you with recording, organizing, and presenting the data of your experiments.

Assignment Details

There are multiple parts to this assignment, each of which should be written as a different "section" of your report. You should use appropriate section headings to separate out each part of your report, so it's easy to read!

See the Submission section below for more details about the proper format of your report.
There is no minimum page count for this report. However, a robust analysis of these algorithms will likely require 3-5 pages of writing (in addition to any tables and graphs). Being succinct is fine, but make sure you include the required depth in your analysis!

Part 1. Quadratic Sort Comparisons

In class we discussed three different quadratic sorts, or sorting algorthms that run in O(n²):

Bubble Sort
Selection Sort
Insertion Sort

While all these have an average run-time of O(n²), they have different "best-case" and "worse-case" complexities, and different run-time coefficients that give them differnet efficiencies in practice.

Your first task will be to test and confirm these differences by calculate the exact number of comparisons used for all orderings of (a small-sized) list of elements. This will let you be able to explicitly calculate the best-case number of comparisons (the smallest number of comparisons), the average number of comparisons, and the worse-case number of comparisons (the largest number of comparisons).

You should use the provided PermutationSet class to calculate the smallest, average, and largest number of comparisons across the permutations, for all list sizes from n=2 to n=12. Do this for each of the three quadratic sorting algorithms (bubble, selection, and insertion sort). Present your results in a table, with a column for each list size (2 to 12) and a row for the smallest, average, and largest number of comparisons for each sorting algorithm (9 rows total).

Be sure and give your table an appropriate title and/or caption, so the reader know what it is referring to!
You are welcome to include further information as desired (e.g., if you test larger list sizes,)--whatever helps inform your analysis. You may use supplementary tables as needed.

In addition to your data table, you should include prose analysis of the data--that is, a written description of your results! You should not rewrite what numbers you got, but instead provide further analysis of the results. What do they mean? How do you interpret this data? What does it tell you about the way these sorting algorithms work? Points you should consider (in no particular order):

What appears to be the best-, average-, and worst-case run time (in Big-O notation) of each sort? Does that match what we had discussed in class?
Which permutations produced the smallest, average, and most number of comparisons? You will likely need to re-run your tests to also print out the permutation that had the appropriate number of comparisons.
In what circumstances are each of the three sorting algorithms most effective (in comparison to the others)? In particular:
- What effect do "turtles" and "rabbits" have when using Bubble Sort? Be sure and support your claim with data (hard numbers!) You might consider: does having a large number of turtles make the algorithm even slower?
- What is the difference between best-, average-, and worst-case number of comparisons in Selection Sort? What is the significance of the result?
What effects does having the list already in order / partial in order / in reverse order have on these sorting algorithms? Why would such differences be important in practice?

You can of course include other insights or analysis. The goal is to give a full description and consideration of these sorting algorithms based on the data produced by your experiments. Note that answering these questions may involve further experiments--for example, hard-coding in a specific set of elements to sort (rather than relying simply on the PermutationSet).

As a final note, you can also optionally use this testing strategy (considering all permutations) in analyzing the other sorting algorithms we've talked about, including that data and analysis in your report.

Part 2. Fast Sort Comparisons

In addition to the quadratic sorts, we also discussed a number of sorting algorithms that run in "linearithmic" O(n*lg(n)) time:

Merge Sort
Heap Sort
Quick Sort

(while some of these sorts may have worse-case run-times that are worse than O(n lg(n), we're interested in the average-case of these algorithms for this report).

Although they all share the same Big-O complexity, in practice the algorithms tend to have different performance (e.g., with a different run-time coefficient). Your second task will be to experiment and test these algorithms to determine which is best "in practice" (on average)---and how much better it is!

Since these are fast sorting algorithms, you should test their efficiency on large lists (with thousands or even millions of items), on small lists, and all the lists in between! Thus you will not be able to use the PermutationSet class, but instead should randomly generate lists of different sizes to test your sorting algorithms on.

But because any particular random list may be either a "good" or "bad" list (maybe you get lucky and it's already in order!), you will want to tests lots (10s, maybe 100s) of different lists of the same size. Then you can take the "average" number of comparisons from these tests to get a good "average" number of comparisons for that list size. You can run as many tests as you have time--the more you run, the closer to the actual average your results will be!
- Nested loops wll be helpful here!
- Keep in mind the amount of time it will take to run these tests, and adjust your numbers (maximum list size, number of list sizes, number of iterations, etc) accordingly.
You are welcome to use the "random array" generating code from lecture... or to write your own!
Note that for Quick Sort, you can use any pivot-choosing method you wish--but be sure to note your choice in your write-up!

For your report, you should test these three sorting algorithms with a variety of different list sizes--the more the better (you might try increasing the list size by incrments of 10 or 100 or 1000). Document the average number of comparisons for each of these list sizes in a data table (I recommend you use list sizes as rows and sorting algorithm as columns). You sohuld then generate a graph of the results, including a line indicating the "trend" (Excel calls this a trendline, and it can be automatically generated).

Your graph should include a line for each of the three sorting algorithms, as well as the graph of n*lg(n). You can calculate this last for each n (list size), and then graph those results. Include these values in your data table as well.
Make sure and include appropriate titles/labels/captions, so the reader knows how to understand your graph!

Similar to with Part I, you must include a written analysis of your results. Your analysis should consider the following points:

Give an estimate of the run-time coeffcient of each algorithm--that is, about what number can you multiply the n*lg(n) value to get the number of comparisons of the sort?
Based on the above, which sorting algorithm appears to be faster (uses fewer comparisons) on average? Why do you think this is?
As an extension, does the "clock-time" efficiency of the algorithms correlate with the number of comparisons (so algorithms with fewer comparisons are actually faster?)
For small n (e.g., less than 10 or 20), how do these fast sorting algorithms compare with your results for the quadratic sorts (above)? What does this say about the way we use Big-O to evaluate algorithmic efficiency?

Again, you can include other insights or analysis as appropriate. The goal is to consider the efficiency and trade-offs of these algorithms based on the data produced by your experiments.

Note that once you have a setup in place to test these algorithms, it is trivial to run with other sorts (such as the quadratic sorts, or Shell Sort, or any other algorithm!)

Part 3. A New Sorting Algorithm

Finally, this report will give you a chance to explore further sorting algorithms that we haven't discussed in class (and there are a lot of them!) You will pick a sorting algorithm other than one of the 7 we discussed in class, implement it, and perform a similar analysis as you did in Part 1 or 2 (or both!)

Wikipedia has an excellent list of sorting algorithms, and is a good place to start looking for options. Note that we've only talked about comparison sorts for this class, but you are welcome to explore non-comparison sorts. Most descriptions include pseudocode you can use to help you get started. Remember to give proper credit when using information from the Internet!

Your analysis should include a clear explanation of how this algorithm works (in your own words--not just a rewrite of Wikipedia!). Also include data demonstrating the algorithm's run-time (e.g., calculate the average number of comparisons and then graph it, showing this graph is O(whatever)).

If you have any problems understanding an algorithm, please let me know.

Part 4. Readme

Finally, you should include a section that answers the following (now standard) README questions:

Did you receive any help from others (classmates, tutors, etc) on this assignment? If so, please list who.
Approximately how many hours did it take you to complete assignment?
Describe any problems you encountered in this assignment. What was hard, or what should we warn students about in the future? How can I make this assignment better?
List any other comments (about the assignment or your submission) here. Feel free to provide any feedback on what you learned from doing the assignment, whether you enjoyed doing it, etc.

Timeline

We will be going over the required Sorting Algorithms in class as you are working on this assignment--I'll be updating the lecture code on Moodle as we do so. Because of this, it makes sense to complete Part 1 of the assignment first (since we'll go over those algorithms first), followed by Part 2. Part 3 you can complete at any time. Thus the order of completion is entirely up to you, and there are no real "guides" for how fast you should be completing the assignment

That said, the amount of time needed to test and generate the data can sneak up on you (and writing might go slower than you expect), so get started early! Try to do a little each day, and you'll be fine.

Extensions

It may be possible to earn extra credit by performing extra analysis. For example, you might also count the number of swaps (or assignments at all!) that occur, and add that into your efficiency counting. So maybe you track both comparisons and assignments, or call any comparision or assignment an "operation" and count those!

Submitting

Before you submit, please take a few minutes and proof read your report. Your goal is to give a clear and insightful analysis; nothing hinders clarity quite like inadvertant typos, uncompleted sentences, etc. (as you may have noticed from these assignments). Read your report through twice before you turn it in!

Syntax and readability matters for written analysis of code just like it does for code itself!

Submit your report in .pdf format (not .docx, .txt, .rtf, or any other format). Most Word processors have ways to "Save as" or "Print to" a pdf; if you have any problems, please let me know. Your document should include all required tables, charts, and anaysis.

Also be sure and put your name on your report! A cover sheet would with your name and date would be ideal.

In addition to your written report, also submit the code you used to generate your data. This will allow me to check if there were any flaws in you methodology (e.g., that you were counting comparisons correctly). And of course, your code should include an implementation of the sorting method you learned about in Part 3.

Make sure your code is fully documented and functional, so I can see what you did (don't delete old code because you're done with it--you may want to go back and re-run your tests!). Also, make sure your name is at the top of all your classes!

Submit your report and your code to the Hwk9 submission folder on hedwig. Make sure you upload your work to the correct folder!.

The homework is due at midnight on Wednesday, May 07 (the last day before reading period).

Grading

As noted above, this assignment is primarily focused on you data and written report, rather than the implemented code. Grading will reflect that.

This assignment will be graded on approximately the following criteria:

Your report includes data for the quadratic sorts (Part 1) [15%]
Your report includes analysis of the quadratic sorts (Part 1) [15%]
Your report includes data and a graph of the fast sorts (Part 2) [15%]
Your report includes a written analysis of the fast sorts (Part 2) [15%]
Your report includes an implementation of another sorting algorithm (Part 3) [10%]
Your report includes an analysis and explanation of this other algorithm (Part 3) [10%]
Your report answers the README questions (Part 4) [5%]
Your algorithm testing code is useful and extendable [15%]