My overarching research goal is to discover approaches, techniques, and tools that facilitate the
management of information. In an increasingly data-driven world, the ability to efficiently connect
related information, which may be scattered across different files, software tools, machines, or groups of
people, is becoming necessary in many domains. Thus far, my investigations have been in the domains of
software engineering, e-Science, and software acquisition.
Connecting related information in software engineering: Software Traceability
My dissertation work, Architecture-Centric Traceability for Stakeholders (ACTS), has focused on effectively managing heterogeneous and distributed information in the domain of software engineering. ACTS addresses the multi-faceted software traceability challenges from the economic, technical, and social perspectives using insights from the fields of e-Science, software architecture, and open hypermedia. Insights were also gleaned from a successful traceability project at Wonderware (see abstract). ACTS prospectively captures links as users perform their development tasks (see mp4).
Later, I combined the prospective link capture in ACTS with topic modeling, a machine learning technique, to automatically analyze linked information (see ICSE preprint). Our technique enables software engineers to gain a comprehensive view of the software and its related files (e.g., specifications, bug reports, test results, email).
More recently, I have investigated how to use traceability techniques to manage changes across the software lifecycle, including changes to source code and its accompanying documentation. Change management is a challenging task in software development, as numerous heterogeneous artifacts change at different times and changes to one artifact can impact many other artifacts.
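To illustrate how a change to one artifact can ripple to many others, here is a minimal sketch that follows trace links transitively from a changed artifact. It is not the ACTS implementation: the function name `impacted_artifacts`, the dictionary representation of links, and the file names are all hypothetical, chosen only for this example.

```python
# Hypothetical trace links: artifact -> artifacts that depend on it.
# These file names are invented for illustration only.
TRACE_LINKS = {
    "parser.c": ["parser_spec.md", "test_parser.log"],
    "parser_spec.md": ["user_guide.md"],
    "user_guide.md": ["parser_spec.md"],  # links may form cycles
}


def impacted_artifacts(trace_links, changed):
    """Return every artifact reachable from `changed` by following trace links."""
    impacted = set()
    stack = [changed]
    while stack:
        artifact = stack.pop()
        for linked in trace_links.get(artifact, ()):
            # Skip artifacts already recorded so that cycles terminate.
            if linked != changed and linked not in impacted:
                impacted.add(linked)
                stack.append(linked)
    return impacted


if __name__ == "__main__":
    for name in sorted(impacted_artifacts(TRACE_LINKS, "parser.c")):
        print(name)
```

In this sketch, a change to `parser.c` flags the specification, the test log, and (indirectly) the user guide for review, even though the user guide is not linked to the source file directly.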
Connecting related information in e-Science: Data Provenance
In the domain of e-Science, the ability to trace the origin of the data and its connections to the eventual results, referred to as data provenance, is crucial in supporting the repeatability of analyses or experiments. There is a growing need to support data provenance, since more and more researchers are using computational resources to collect data, run experiments, and perform scientific analyses.
One way to enable researchers to capture provenance is to provide automated support within a popular analysis tool, such as a spreadsheet. Our research group has developed techniques for both capturing changes within a spreadsheet environment (see journal article) and tracking the source of information as it enters the spreadsheet environment (see paper).
In collaboration with researchers at the Lawrence Livermore National Laboratory, we have also examined ways to increase the accessibility of experiment files (see paper and conference presentation by an undergraduate student). Most recently, we have investigated ways to recover provenance for experiments or analyses that occurred in the past (see paper).
Connecting related information in software acquisition: License Analysis
In the domain of software acquisition, I investigated the tracing of software license conflicts in heterogeneously composed software systems, in collaboration with Dr. Alspaugh and Dr. Scacchi at UC Irvine (see abstract). To support automated license analysis, we formally described licenses and open architecture (OA) systems. We then linked the components in the architecture to their corresponding license information. By traversing the architectural graph, we can automatically detect conflicting rights and obligations between the licenses in a system.
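The traversal described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the formal license model from the work itself: each license is reduced to a set of actions it requires (`must`) and a set it forbids (`must_not`), and a conflict is reported when two connected components carry licenses where one requires what the other forbids. The component names, license names, and the function `find_license_conflicts` are hypothetical.

```python
from collections import deque

# Hypothetical architecture graph: component -> components it connects to.
ARCHITECTURE = {
    "app": ["parser", "renderer"],
    "parser": ["codec"],
    "renderer": [],
    "codec": [],
}

# Hypothetical link from each component to its license.
COMPONENT_LICENSE = {
    "app": "LicenseA",
    "parser": "LicenseC",
    "renderer": "LicenseA",
    "codec": "LicenseB",
}

# Assumed, highly simplified license model: required and forbidden actions.
LICENSES = {
    "LicenseA": {"must": {"disclose-source"}, "must_not": set()},
    "LicenseB": {"must": set(), "must_not": set()},
    "LicenseC": {"must": set(), "must_not": {"disclose-source"}},
}


def find_license_conflicts(architecture, component_license, licenses, root):
    """Walk the graph breadth-first and report (component, neighbor, action) conflicts."""
    conflicts = []
    seen = {root}
    queue = deque([root])
    while queue:
        component = queue.popleft()
        a = licenses[component_license[component]]
        for neighbor in architecture.get(component, []):
            b = licenses[component_license[neighbor]]
            # An action conflicts if one side requires it and the other forbids it.
            clash = (a["must"] & b["must_not"]) | (b["must"] & a["must_not"])
            for action in sorted(clash):
                conflicts.append((component, neighbor, action))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return conflicts


if __name__ == "__main__":
    for conflict in find_license_conflicts(ARCHITECTURE, COMPONENT_LICENSE, LICENSES, "app"):
        print(conflict)
```

With this sample data, the only conflict found is on the connection between `app` and `parser`, where one license requires source disclosure and the other forbids it. A fuller model would also need to capture rights and how obligations propagate across different kinds of connectors, which this sketch leaves out.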