Mebi 550 Project 1

Knowledge Representation and Biomedical Applications
MEBI 550, Fall 2013

Project 1: Modeling with RDF
Due: Oct 9th

In this assignment, you will explore some web resources that provide biomedical informatics data and knowledge. Your goal is to develop your own small model of some tiny portion of the data, and then instantiate this model using the RDF triple-store formalism.

Although open-ended, I am strongly suggesting one of four possible starting points:

Striated muscle contraction, from Reactome.
Patient data as stored in the MIMIC II database. The full data set is far too large to examine in its entirety, so I've pulled a tiny sample of meds, labs, and notes for two patients into this zipped directory of comma-delimited text files. This sample will be plenty of data to use for the assignment, but you may want to look at the schema information available from the MIMIC II web site for additional relationships. On the other hand, this is a full-on relational database schema in UML notation, so it may just confuse you.
Gene queries from the JAX Mouse Genome Informatics resource. As one example starting point, look at information about the Bace1 protein and/or the APP protein (both implicated in mouse models of Alzheimer's disease). If you choose this starting point, I would expect you to have some background in genomics, as one could otherwise be easily overwhelmed and lost.
Patient data stored at the Personal Genome Project. This isn't data as it might appear in a commercial electronic medical record system, but it is real data. As a starting point, try looking at procedures, conditions, or medications. Be warned that the data is very sparse; you might select only patients with some specific condition.

Each of these resources contains data, and, in their entirety, the data are large and complex enough that there is no single, simple "right" answer for knowledge representation. These resources are also vetted -- they've been around a long time, and are well used and robust. Starting from one of these four pages, read and browse. Additionally, unless you are already an expert in one of these topics, I'd strongly recommend using related Wikipedia pages. Learn something. Then, your goal is to formalize and store a small portion of this knowledge as a set of RDF triples.
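To make "a set of RDF triples" a bit more concrete, here is a minimal sketch in Python of turning rows from a comma-delimited file into (subject, predicate, object) tuples. Everything specific in it is an assumption for illustration only: the column names (subject_id, drug, dose), the sample rows, the `ex:` prefix, and the predicate names are made up and are not taken from the MIMIC II sample files.

```python
import csv
import io

# Hypothetical sample data; the real MIMIC II extract may use different columns.
SAMPLE_CSV = """subject_id,drug,dose
101,Heparin,5000 units
101,Aspirin,81 mg
102,Insulin,10 units
"""

def literal(text):
    """Wrap a plain string in double quotes, as a simple RDF literal."""
    return '"' + text + '"'

def rows_to_triples(csv_text):
    """Turn each CSV row into three (subject, predicate, object) triples."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        patient = "ex:patient" + row["subject_id"]
        # One node per row, so that a drug stays linked to its own dose.
        order = "ex:medOrder" + str(i)
        triples.append((patient, "ex:hasMedicationOrder", order))
        triples.append((order, "ex:drugName", literal(row["drug"])))
        triples.append((order, "ex:dose", literal(row["dose"])))
    return triples

triples = rows_to_triples(SAMPLE_CSV)
for s, p, o in triples:
    print(s, p, o, ".")
```

Giving each row its own "order" node is one representational choice among several; attaching the drug directly to the patient would be simpler, but would lose the link between a drug and its dose.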

Although I'm asking you to look at "real" domain knowledge -- each of these sites has incredibly deep and detailed knowledge -- you should only capture a very small, toy amount of knowledge in this exercise. Furthermore, although it may be tempting (and appropriate in the real world) to start by looking at other knowledge bases or ontologies, for learning purposes, I want you to try to build an RDF knowledge base from scratch. For example, the BioPAX ontology provides an excellent framework for storing knowledge about biochemical reactions and pathways such as are stored in Reactome. However, although you are welcome to look at BioPAX (and perhaps you should), for this assignment I want you to re-create those ideas in your own RDF store.

In addition, since a learning goal is to use and understand RDF, you may actually "make up" additional facts that aren't explicitly listed in your data source, if those facts help you "connect the dots" in your RDF graph. These new facts should be reasonable, not nonsensical -- e.g., that a patient has a particular primary care physician, even though your data source (MIMIC or PGP) might not list such a fact.

For this very small, toy RDF store, you can just use Turtle syntax and plain text, using your text editor of choice. I would recommend that you validate your RDF syntax via one (or more) of the tools listed below. Although I expect these RDF knowledge bases to be small, they should be "big enough" to explore some interesting relationships and a variety of bits of information you may have learned and extracted from the web bio-informatics knowledge resources. Under "deliverables", I've defined some (somewhat arbitrary) boundaries for what is "big enough".
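If you would rather generate your Turtle file than type it by hand, the following is a rough sketch of one way to do that in Python. The namespace http://example.org/mebi550# and the class and property names (ex:Protein, ex:bindsTo) are placeholders I made up for illustration; the serializer only handles prefixed names written as plain strings, so it is no substitute for running the result through a real validator.

```python
# Placeholder namespace; pick your own for the assignment.
PREFIX = "@prefix ex: <http://example.org/mebi550#> .\n\n"

# A few toy triples about striated muscle proteins.
# "a" is Turtle shorthand for rdf:type.
triples = [
    ("ex:Myosin", "a", "ex:Protein"),
    ("ex:Myosin", "ex:bindsTo", "ex:Actin"),
    ("ex:Actin", "a", "ex:Protein"),
    ("ex:TroponinC", "a", "ex:Protein"),
    ("ex:TroponinC", "ex:bindsTo", "ex:CalciumIon"),
]

def to_turtle(triples):
    """Group triples by subject and emit Turtle using ';' between predicates."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    blocks = []
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(p + " " + o for p, o in pairs)
        blocks.append(s + " " + body + " .")
    return PREFIX + "\n\n".join(blocks) + "\n"

turtle_text = to_turtle(triples)
print(turtle_text)

# To save it as a plain text file (the file name is arbitrary):
# with open("project1.ttl", "w") as f:
#     f.write(turtle_text)
```

Grouping by subject is purely cosmetic: writing every triple on its own line, each ending in a period, is equally valid Turtle.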

Resources:

RDF syntax validator and converter (i.e., to and from N3 / Turtle syntax).
RDF Gravity -- RDF visualizer and query program (not editing). Requires RDF/XML syntax.
Welkin project -- An RDF visualizer designed for large RDF or OWL data sets.

Deliverables (due Noon, Wednesday, Oct 9th):

You must write 1-2 pages of text explaining what you learned from browsing the web pages, and what you chose to capture as your RDF knowledge base. If appropriate, you may wish to let me know what your level of prior knowledge was about the topic. Your essay must be clear and well-written, but need not be overly formal. Use first-person voice. I encourage reflective learning, where you consider which aspects of the assignment were satisfying / effective for you, and which were frustrating / not helpful.

You must hand in an N3 / Turtle syntax plain text file. If you successfully visualized your RDF knowledge with a tool, please let me know which tool, and I will use the same when I evaluate your work. Your RDF KB must have at least 5 different sorts of properties (predicates) and at least 50 total triples.
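As a quick self-check against these minimums, here is a small sketch that counts distinct triples and distinct predicates. It assumes you already have your triples as a Python list of (subject, predicate, object) tuples; it does not parse a Turtle file, and the toy data and property names below are invented purely to exercise the check.

```python
def summarize(triples):
    """Return (number of distinct triples, number of distinct predicates)."""
    unique = set(triples)  # repeated statements count only once
    predicates = {p for _, p, _ in unique}
    return len(unique), len(predicates)

def meets_minimums(triples, min_triples=50, min_predicates=5):
    """Check the 'big enough' boundaries: 50 triples, 5 kinds of predicate."""
    n_triples, n_predicates = summarize(triples)
    return n_triples >= min_triples and n_predicates >= min_predicates

# Made-up toy data: 12 patients with 5 statements each.
toy = []
for i in range(1, 13):
    patient = "ex:patient" + str(i)
    toy.append((patient, "a", "ex:Patient"))
    toy.append((patient, "ex:hasCondition", "ex:condition" + str(i % 3)))
    toy.append((patient, "ex:takesMedication", "ex:drug" + str(i % 4)))
    toy.append((patient, "ex:hasPrimaryCarePhysician", "ex:physician" + str(i % 2)))
    toy.append((patient, "ex:hadProcedure", "ex:procedure" + str(i % 5)))

print(summarize(toy))
print(meets_minimums(toy))
```

Meeting the counts says nothing about whether the content is interesting; a store that repeats one pattern twelve times, like the toy data here, would pass this check without showing much modeling.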

Both documents should be handed in via the course Catalyst drop box, as will be true for all assignments.

Grading rubric:

	Less than 3.0	3.0 -- 3.2	3.3 -- 3.7	3.8 -- 4.0
Essay	Report is unclear and/or poorly written and/or does not meet requirements.	Some writing problems, and student minimally satisfies requirements.	Shows reflective learning, and is a clearly written report.	Student explores interesting issues.
Content	No evidence that the student learned new material, or the student did not satisfy requirements.	Satisfied requirements, content is reasonably interesting.	Student demonstrates an exploration of representational choices for their content; essay describes these choices.	Student explores open issues in KR for their content.
RDF correctness	RDF file is not correct	RDF is correct/valid, and matches the choice of content selected.	RDF elegantly represents the selected content; student uses appropriate constructs.

Last Updated:
Sept, '12

Contact the instructor at: gennari@u.washington.edu