UW | College of the Environment | School of Oceanography | Physical Oceanography
Parker MacCready

Assignments for Effective Computing


1. Introduce yourself to me

This is for me to get an idea of how you currently use computers, software, and programming for your research.
Answer the questions in the Intro PPT.


2. Installation feedback

Write me some notes on how your linux and python installations worked. If there were no issues this can be very short ("It worked."). The goal is for me to find out any problems so I can help, especially with Windows issues.


3. Linux practice and Your Workflow

1. Read through the PPTs Linux 1 and 2 and do the exercises in blue, and any others that you want. This is to give you some practice using linux from the command line.
2. Download the example shell script. Read it using a text editor. Then, following the directions from the last page of the Linux 2 PPT, make a directory for it on your machine and run it from the terminal.
3. This is the only part you hand in: consider a sequence of computer-based tasks (a "workflow") that you do as part of your research. Write out this workflow in text form, making notes on what computer or software you use for each step and how long it takes.


For me an example would be what I do to keep part of my model-data comparison system up to date:
Goal: Compare LiveOcean 2019 output to Washington Dept. of Ecology CTD/Bottle data
Steps:
* Contact friends at Ecology and see what the current protocol is for data requests. May have to trade in some favors. (1 day)
* Get Excel Spreadsheet from them. (2 weeks)
* Spend 2 days editing my python code (local mac) to ingest their data, accounting for new column names and error codes.
* Run python cast extraction code for 2019 on LiveOcean (remote machine). Along the way I have to use git to make sure that code is up to date on the remote machine. (10 minutes)
* Download results of cast extraction to local mac using Transmit (could have used scp) (1 minute)
* Run python code (local mac) that does model-data comparison. Deal with a few new bugs, inevitably. (2 hours)
* Result: a folder of plots (.png) showing the results, and some summary statistics.
* Question to ask myself: could I have used a shell script to make any part of this easier? Answer: maybe not in this case.


4. README

Please find a folder or collection of programs you have written or edited as part of a workflow. If you have not already done so, create a README file to go with these programs. The goal is to make a document that you or someone else can use to quickly orient yourself to what all the programs do. The exact format is a personal choice - think of what you would want to read if you were picking up an old workflow after 6 months of doing other things. Here is an example from one of my projects: README.txt. I do not expect yours to be as long as this! It could even document just one program.


5. numpy & argparse

For this week's assignment, please:
1. Use the directions from the first GitHub lecture PPT to get GitHub going on your laptop, and to clone a repo of your choice (the one where you will keep your code from this class) to fjord.
2. Referring to the code examples in my repo pmec/ex_numpy (which you have to get using git), please write a program that achieves the following minimum goals:
* Make some arrays in numpy and try 5 methods of your choice on them
* Have the code save some output to your output directory ([my_code]_output/) as a pickle file, and have it read that file back in. Have the code make the output directory if needed.
* Use argparse to add command-line arguments to your code, allowing the user to make some choice about what happens.
* Push your code to github, and then pull it to your account on fjord (in /data1), and try running it there.
3. Submit a copy of your code, and let me know your GitHub username. Please detail any problems you had. The goal, as usual, is not just to get the job done but to clearly understand what made it difficult (or impossible!).


6. matplotlib

1. Find a figure you have created as part of your research, and save it as a "BEFORE" .png.
2. Write a python program to remake the figure and save it as an "AFTER" .png.
3. Look through papers written by others, select a figure you think is really great, and save a copy of it (I often do this as a "selected screen shot" while reading the PDF, using the shortcut SHIFT-COMMAND-4 on my mac).
4. Upload the code and all three figures, with a note in the comment section about graphical choices you made or things you wish you could get python to do.
-- Things to think about:
* Make all the figure creation happen in the code, not waiting for later hand editing.
* Are your fonts big enough to read? All of them?
* Are all lines thick enough to see?
* Is the figure "caption-ready" (panels have (a), (b), etc.)?
* Is there too much information on the plot?
* Are your colors doing what you want?
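
To make these choices concrete, here is a small sketch of a figure made entirely in code. The data, labels, and filename are invented for illustration, not tied to any real figure.

```python
import matplotlib
matplotlib.use('Agg')  # run without a display (e.g. on a remote machine); remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

plt.rc('font', size=14)  # set a readable size for ALL fonts at once

x = np.linspace(0, 10, 200)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, label, y in zip(axes, ['(a)', '(b)'], [np.sin(x), np.cos(x)]):
    ax.plot(x, y, linewidth=2)                       # lines thick enough to see
    ax.set_xlabel('Time (s)')                        # label axes, with units
    ax.text(.05, .9, label, transform=ax.transAxes)  # caption-ready panel labels
axes[0].set_ylabel('Amplitude')
fig.tight_layout()
fig.savefig('AFTER.png', dpi=200)  # all figure creation happens in the code
```

Note that everything - fonts, panel labels, layout - is set in the code, so the figure can be regenerated without any hand editing.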


7. Read and manipulate data using pandas

1. Please read Chapters 5 and 6 of the book "Python for Data Analysis". The book is available as a free pdf here. These introduce the essential pandas data structure, called a DataFrame, and show how to read common data sources like csv and Excel files into it.
2. Try reading in one of your own text, .csv or .xlsx files using pandas methods and see if you can (1) manipulate the data in some way like choosing subsets or filtering, and (2) plot some of it. You can see examples of this in the code in pmec/ex_pandas, available as always using git pull.
3. Assignment to turn in: Write a question that arises during your file reading exercise, and describe how you tried to answer it (and whether you succeeded). Be prepared to ask your question in class.
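
A self-contained sketch of step 2 might look like the following. It writes a tiny made-up csv first so it can run anywhere; substitute one of your own files, and note the column names and filenames are invented.

```python
from pathlib import Path

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove to plot to screen
import pandas as pd

# write a tiny csv so this sketch is self-contained; use your own file instead
fn = Path('example.csv')
fn.write_text('station,depth,temp\nA,5,12.1\nA,30,8.4\nB,5,11.7\nB,30,7.9\n')

# read it into a DataFrame
df = pd.read_csv(fn)

# (1) manipulate: choose a subset with boolean filtering, then aggregate
shallow = df[df['depth'] == 5]
mean_temp = df.groupby('station')['temp'].mean()
print(shallow)
print(mean_temp)

# (2) plot some of it (pandas plotting wraps matplotlib)
ax = df.plot(x='depth', y='temp', kind='scatter')
ax.figure.savefig('temp_vs_depth.png')
```

The same pattern works for Excel files by swapping read_csv for read_excel.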


8. Final Project

This is the last assignment of the term, and is due the first day of finals week. It involves working in teams of 3-4 to write a program collaboratively. Each of the 6 teams will give a group presentation during the last week of class.
* The challenge: create a program (plus module if needed) that
(i) gets data from the web,
(ii) does some processing of the data, and
(iii) saves one or more informative, beautiful plots.
The program should be well commented, and it should be able to be run without editing by anyone who downloads it. Make use of the [mydir], [mydir]_output folder structure. The program should make use of input commands or command line arguments to allow the user to make some choices about what it does. It should be able to run on fjord and on both Windows (using Ubuntu) and mac laptops.
* Each team should appoint a Leader who will own the GitHub repo where the code lives. Try to use the GitHub clone-and-fork techniques we discussed in lecture to work on the code collaboratively.
* Deliverables: the team Leader should turn in the GitHub URL where the repo is (by email to me) and the PPT of the team presentation. Each team member should provide a half-page written discussion of how they contributed to the effort. During the last two classes of the term the teams will give presentations, three teams each day. Each team has a total of 20 minutes (including questions and handoff, so plan on talking for 12 minutes) and each team member should deliver a part of the presentation.
* Teams: please be prepared to tell the class what you are working on. Also identify the team Leader, the team name, and the URL of the repo for me to clone.
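
As a rough illustration of the required structure (not a prescription), the program could be organized into three functions matching (i)-(iii). The URL, column names, and output folder below are placeholders for whatever data source your team chooses.

```python
from pathlib import Path
from urllib.request import urlopen

import pandas as pd

def get_data(url):
    """(i) Get data from the web; assumes the URL returns a csv."""
    with urlopen(url) as response:
        return pd.read_csv(response)

def process(df):
    """(ii) Process the data: here, daily means of a 'value' column."""
    df = df.copy()
    df['time'] = pd.to_datetime(df['time'])
    return df.set_index('time')['value'].resample('D').mean()

def make_plot(series, out_dir):
    """(iii) Save an informative plot to the output folder."""
    out_dir.mkdir(exist_ok=True)
    ax = series.plot(linewidth=2, title='Daily mean value')
    ax.set_ylabel('Value')
    ax.figure.savefig(out_dir / 'daily_mean.png')

# Example usage (needs a real csv URL with 'time' and 'value' columns):
# df = get_data('https://some.data.source/data.csv')
# make_plot(process(df), Path('my_project_output'))
```

Splitting the work along these function boundaries also makes it easier for team members to develop in parallel on the shared repo.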