CS 261 Homework 9 - Analysis of Sorting Algorithms

Due Wed May 07 at 11:59pm

Overview

For this assignment, you will be experimenting with and investigating the sorting algorithms we've discussed in class, in order to get a better understanding of their differences (especially in regards to their efficiency). You will be running a number of tests on these algorithms, then analyzing the results of these tests and writing up a report on your findings. Thus this assignment will involved writing a bit of code, but focuses primarily on considering and writing about the results of that code, rather than on the implementation. Yes, we do in fact do writing in computer science classes sometimes!

Objectives

Necessary Files

You will need a copy of the Sorting algorithm code that we've gone over in lecture. This code is posted on Moodle and will be updated after lecture each day's class (and you'll need ALL the sorts). Note that implementations of the sorting algorithms can also be found in the textbook--or you could even implement them yourself!

You will also want a copy of the Hwk9.zip file. This file contains a class PermutationSet that can be used to generate and iterate through all possible orderings of a small-sized array.

Note that there is no README for this assignment. Instead, your submitted report will need to include answers to a list of familiar questions in the final section.

Finally, you will also want a copy of a word processing program (like Word), as well as a spreadsheet program (like Excel) to help you with recording, organizing, and presenting the data of your experiments.

Assignment Details

There are multiple parts to this assignment, each of which should be written as a different "section" of your report. You should use appropriate section headings to separate out each part of your report, so it's easy to read!

Part 1. Quadratic Sort Comparisons

In class we discussed three different quadratic sorts, or sorting algorthms that run in O(n2):

  1. Bubble Sort
  2. Selection Sort
  3. Insertion Sort

While all these have an average run-time of O(n2), they have different "best-case" and "worse-case" complexities, and different run-time coefficients that give them differnet efficiencies in practice.

Your first task will be to test and confirm these differences by calculate the exact number of comparisons used for all orderings of (a small-sized) list of elements. This will let you be able to explicitly calculate the best-case number of comparisons (the smallest number of comparisons), the average number of comparisons, and the worse-case number of comparisons (the largest number of comparisons).

You should use the provided PermutationSet class to calculate the smallest, average, and largest number of comparisons across the permutations, for all list sizes from n=2 to n=12. Do this for each of the three quadratic sorting algorithms (bubble, selection, and insertion sort). Present your results in a table, with a column for each list size (2 to 12) and a row for the smallest, average, and largest number of comparisons for each sorting algorithm (9 rows total).

In addition to your data table, you should include prose analysis of the data--that is, a written description of your results! You should not rewrite what numbers you got, but instead provide further analysis of the results. What do they mean? How do you interpret this data? What does it tell you about the way these sorting algorithms work? Points you should consider (in no particular order):

You can of course include other insights or analysis. The goal is to give a full description and consideration of these sorting algorithms based on the data produced by your experiments. Note that answering these questions may involve further experiments--for example, hard-coding in a specific set of elements to sort (rather than relying simply on the PermutationSet).

As a final note, you can also optionally use this testing strategy (considering all permutations) in analyzing the other sorting algorithms we've talked about, including that data and analysis in your report.

Part 2. Fast Sort Comparisons

In addition to the quadratic sorts, we also discussed a number of sorting algorithms that run in "linearithmic" O(n*lg(n)) time:

  1. Merge Sort
  2. Heap Sort
  3. Quick Sort

(while some of these sorts may have worse-case run-times that are worse than O(n lg(n), we're interested in the average-case of these algorithms for this report).

Although they all share the same Big-O complexity, in practice the algorithms tend to have different performance (e.g., with a different run-time coefficient). Your second task will be to experiment and test these algorithms to determine which is best "in practice" (on average)---and how much better it is!

Since these are fast sorting algorithms, you should test their efficiency on large lists (with thousands or even millions of items), on small lists, and all the lists in between! Thus you will not be able to use the PermutationSet class, but instead should randomly generate lists of different sizes to test your sorting algorithms on.

For your report, you should test these three sorting algorithms with a variety of different list sizes--the more the better (you might try increasing the list size by incrments of 10 or 100 or 1000). Document the average number of comparisons for each of these list sizes in a data table (I recommend you use list sizes as rows and sorting algorithm as columns). You sohuld then generate a graph of the results, including a line indicating the "trend" (Excel calls this a trendline, and it can be automatically generated).

Similar to with Part I, you must include a written analysis of your results. Your analysis should consider the following points:

Again, you can include other insights or analysis as appropriate. The goal is to consider the efficiency and trade-offs of these algorithms based on the data produced by your experiments.

Note that once you have a setup in place to test these algorithms, it is trivial to run with other sorts (such as the quadratic sorts, or Shell Sort, or any other algorithm!)

Part 3. A New Sorting Algorithm

Finally, this report will give you a chance to explore further sorting algorithms that we haven't discussed in class (and there are a lot of them!) You will pick a sorting algorithm other than one of the 7 we discussed in class, implement it, and perform a similar analysis as you did in Part 1 or 2 (or both!)

Your analysis should include a clear explanation of how this algorithm works (in your own words--not just a rewrite of Wikipedia!). Also include data demonstrating the algorithm's run-time (e.g., calculate the average number of comparisons and then graph it, showing this graph is O(whatever)).

If you have any problems understanding an algorithm, please let me know.

Part 4. Readme

Finally, you should include a section that answers the following (now standard) README questions:

  1. Did you receive any help from others (classmates, tutors, etc) on this assignment? If so, please list who.
  2. Approximately how many hours did it take you to complete assignment?
  3. Describe any problems you encountered in this assignment. What was hard, or what should we warn students about in the future? How can I make this assignment better?
  4. List any other comments (about the assignment or your submission) here. Feel free to provide any feedback on what you learned from doing the assignment, whether you enjoyed doing it, etc.

Timeline

We will be going over the required Sorting Algorithms in class as you are working on this assignment--I'll be updating the lecture code on Moodle as we do so. Because of this, it makes sense to complete Part 1 of the assignment first (since we'll go over those algorithms first), followed by Part 2. Part 3 you can complete at any time. Thus the order of completion is entirely up to you, and there are no real "guides" for how fast you should be completing the assignment

That said, the amount of time needed to test and generate the data can sneak up on you (and writing might go slower than you expect), so get started early! Try to do a little each day, and you'll be fine.

Extensions

It may be possible to earn extra credit by performing extra analysis. For example, you might also count the number of swaps (or assignments at all!) that occur, and add that into your efficiency counting. So maybe you track both comparisons and assignments, or call any comparision or assignment an "operation" and count those!

Submitting

Before you submit, please take a few minutes and proof read your report. Your goal is to give a clear and insightful analysis; nothing hinders clarity quite like inadvertant typos, uncompleted sentences, etc. (as you may have noticed from these assignments). Read your report through twice before you turn it in!

Submit your report in .pdf format (not .docx, .txt, .rtf, or any other format). Most Word processors have ways to "Save as" or "Print to" a pdf; if you have any problems, please let me know. Your document should include all required tables, charts, and anaysis.

In addition to your written report, also submit the code you used to generate your data. This will allow me to check if there were any flaws in you methodology (e.g., that you were counting comparisons correctly). And of course, your code should include an implementation of the sorting method you learned about in Part 3.

Submit your report and your code to the Hwk9 submission folder on hedwig. Make sure you upload your work to the correct folder!.

The homework is due at midnight on Wednesday, May 07 (the last day before reading period).

Grading

As noted above, this assignment is primarily focused on you data and written report, rather than the implemented code. Grading will reflect that.

This assignment will be graded on approximately the following criteria: