MEDED 598A: DB & Applications in the Health Sciences (3 credits).
Mon & Wed, 3pm - 4:20pm.
Fall, 2002.

HOMEWORK ASSIGNMENTS (Fall 2002)


Homework #1: UML class diagrams. Due Monday, Oct 7th, 3pm

1) Exercise #2 in the Post textbook, p 60. In addition to the class diagram, you must also provide three sample questions that the resulting database might be able to answer. (I.e., please think a little bit about the possible uses of this database.)

2) Modeling for "The Regional Epidemiology Center" (see handout). Think about some use cases for this domain -- what sort of actors or people would interact with such a database system? What sort of output might the produce to answer what sort of questions. For this domain, list 4 questions that a good database could answer, and include what type of actor would be interested in these answers.

Next, produce a class diagram for this domain. Unlike the simpler exercise above, the appropriate business rules are much less clear. There are a variety of different assumptions you could make, which lead to different class diagrams. For each association you draw between classes in your diagram, you must write down what business rule you are assuming. For associations that are non-obvious, discuss what alternative business rules or assumptions you might make.

For extra credit, talk to Bill Lober and/or Bryant Karras. They do exactly this work (but in the real world!), so they will have opinions about the right way to model things. Don't go overboard with this assignment -- try to limit your discussion of assumptions and associations to a few paragraphs, and no more than two pages of text.

You may hand in this homework on paper (old-fashioned pen & ink technology), or as a diagram drawn in Word or Powerpoint, or the drawing tool of your choice. It will probably be a bit more work, but as an alternative, you may hand in (via email) a Poseidon file (.zargo) with the class diagrams.


Homework #2: Normalized DB tables. Due Wednesday, Oct 16th, 3pm

Exercise #11 in the Post texbook, p. 125. However, I had a hard time understanding what all the data elements are supposed to mean in the forms shown. So here are my clarifying statements (and these are sort of hints):

An election (which occurs on a particular date) consists of a set of contests. Each contest is for a particular office (e.g. Mayor, Judge, Senator). You must also distinguish between primaries (a run-off within one political party) versus final elections (competitions between political parties). For those who may not be that familiar with the US voting system, a precinct is smaller than a district, and both are geographic regions.

You must record the time that each voter voted, but not who they voted for. Thus your DB cannot actually carry out a recount (to see which candidate won), but it can verify the number of votes cast in each contest, and then the names of the people that voted in each contest. Ignore the "IP Address" data element -- I'm not sure what that's supposed to be for.

I'm also confused by the "VoteID" and "ContestCandidate" terms shown in the bottom of the second form. I would assume this form is for recording what candidates were part of what contests on what election. But I would recommend using different (more descriptive) names for storing this information in some table(s).

Your system should be able to answer such questions as:

On what day was the final election for Mayor of bloomsburg held?
How many voters voted in the race for state senate in district 42?
How many republicans voted in that contest?
On Nov 11th, 2002, in precinct 13, how many different contests were held?
Who were the candidates for State senator on June 3rd 2002, in district 13?

For this assignment you do not need to figure out how to actually answer these questions. Just design your tables so that there is sufficient data to be able to answer them.

Your task (not mentioned in the book!) is to define a set of normalized database tables for this problem. More specifically, you must hand in list of tables, with primary keys (underline these), and associations indicated by the use of foreign keys (use dashed underlines). You do not need to give data types for the attributes of your tables. You may find it useful to also draw a class diagram, but this is optional. All of your tables should be in 3rd normal form.


Homework #3: SQL Queries. Due Oct 23rd, 3pm

Suppose you are building a database (a really toy database!) to store physician orders. Each order has an ordering physician, a patient, a date and a set of "orders". Orders can be either drug orders (prescriptions), procedures (such as Cat scan, MRI, X-ray), or lab tests. Every order has an order description (a text field) and a cost. (Ah, if only life were really this simple!) If an order is a lab-test, then it has additional information such as a specimen name and an "expect results by" date (but no values yet!). If an order is a drug prescription, then it has information specific to the drug (name, brand name, & supplier) as well as information specific to that particular instance of or-der (Drug order details such as Dose, route of administration, # of refills). You also need to store some simple demographic information about patients and physicians (full name & telephone number). Physicians should also include their area of specialization.

Build a set of normalized tables for this task for use within MySQL. Then, populate these tables with some data, and write SQL queries to answer the following questions:

  1. In Jan, 2002, how many times was Zofran (a brand name) prescribed? How many times was the route oral and how many times was it IV?
  2. In Jan 2002, how many labtests were ordered? How many procedures? How many drugs?
  3. What is the total cost of all orders placed by Physician xxx in Feb 2002?
  4. List all of the physicians that placed any order for Patient yyy
  5. List all physicians that ever ordered an MRI. Also list the dates of these orders.
  6. List all of the patient names and ordering physicians for orders of Zofran with a dose > 2 mg
  7. What is the total cost of all orders placed for drugs that were supplied by supplier zzz?
  8. List patients who had an order of Digoxin (generic name of drug) who did not have an order for a serum potassium test within 1 week of the Digoxin order. (An extra tricky one!)

Technology details: You must send email with a plain text file of your queries to answer a) thru h) above. You must also email your database. This can be done in a variety of ways, but probably the simplest is to export you database as a "dump", or a set of SQL commands. MySQL front can do this via "Ex-port tables". Your database must of a reasonable size to make things interesting and fun! You must have at least 20 patients, at least 10 doctors, at least 5 different drugs, and at least 30 orders.


Homework #4: More SQL, this time in PostGres. Due Nov 25th 3pm

I apologize in advance for switching technologies. PostGreSQL seems to be much better suited for teaching a number of advanced querying concepts. On the other hand, it is sort of nice for you all to realize that different DB systems look and feel somewhat different.

I have created accounts for all of you on the PostGres server running on olympic.informatics.washington.edu. Log in (with the pgAdmin II client), and load the partial database as specified by the file I emailed to all of you. You'll see that this is the same tired schema we've been using in class about MDs, Patients, and Orders. As supplied to you, the DB just has three tables: MD, PT and Orders.

1) Complete the DB -- add tables to store information about the three types of orders (labs, drugs, & procedures), including cost. As per the solutions to the last homework, there are at least two reasonable schemas to do this. Populate your tables (pgAdmin II provides a simple way to do this) so that your DB has enough information to answer the query in part 2 (with a non-empty result). You must also have some order details for Jeff Klein (see part 5).

2) List the cost and drug names of all the drugs ordered by Dr. Brooks.

3) On pgAdmin, there is an "Explain" button on the SQL window. Hit this button for the query you used for part 2). What the heck does this output mean? (I don't expect complete details about the Postgres implementation, but based on class lectures, make some educated guesses.) Give an example of one relational algebra equivalence rule that the optimizer might have used to help produce this plan.

4) Write an SQL query that lists all patient names that have orders from more than one physician. Note that this is different than listing patients with more than one order (from any physician). Hint: I would recommend the use of some temporary tables or views. This means that your answer will depend on other commands (like "create view...").

5) Suppose poor Jeffrey Klein dies. We've decided that we should therefore remove all orders and database entries associated with Jeffrey. How do we do this?

Technology details: Your PostGres login is as follows: Your user name is your email address name (the stuff before "@"), and your password is your last 4 digits of your student ID. Send Laurel Rees and/or me email if you have trouble logging in.

To get started in pgAdmin, create a database (using a selection from the "toolbox" icon), and then open an SQL window (the wierd alien icon), and load the file "HW4dumpfile.sql" (which I sent via email). Assuming you do not get an error, you should be able to browse thru the starter DB.


Homework #5: Project Preview. Due Dec. 10th, 5pm.

This final homework is simply a "check-point" in your work toward your final presentation (on Dec 18th).

For this homework, you should hand in a description of the domain in which you will be presenting two alternative designs. You should sketch out a some of the queries that you expect a "good" design to be able to handle. I would like a relatively rich description of the domain. One of the goals of this final project is to allow you to get a bit beyond the "toy" example databases we have used in class so far. You might, for example, want to try to import some large set of data available from a web site or some other source. It would also be good if you had relatively complex inter-relations among the concepts in your design. (Remember, one of the deliverables for the project will be a UML diagram.) I will not expect completed designs at this check-point, but you also must be able to describe (at least at the "buzzword" level) what alternatives you will be considering (E.g. normalized vs. Non-normalized, or relational versus XML, or EAV vs. XML, etc...)

Since the final project is pretty open-ended with respect to what sort of domain and what sorts of alternative designs, it would probably be a good idea to contact me (email or in person) about your project ideas prior to Dec. 10th. Otherwise, if I do not see enough details, or if I see other problems, you will have to re-submit this homework.