My overarching research goal is to discover approaches, techniques, and tools that facilitate the
management of information. In an increasingly data-driven world, efficiently connecting
related information, which may be scattered across different files, software tools, machines, or groups of
people, is becoming a necessary task in many domains. Thus far, my investigations have been in the domains of
software engineering, eScience, and software acquisition.
Connecting related information in software engineering: Software Traceability
My dissertation work, Architecture-Centric Traceability for Stakeholders (ACTS), focused on effectively managing heterogeneous and distributed information in the domain of software engineering. ACTS addresses the multi-faceted software traceability challenges from the economic, technical, and social perspectives using insights from the fields of eScience, software architecture, and open hypermedia. Insights were also gleaned from a successful traceability project at Wonderware (see abstract). ACTS prospectively captures links as users perform their development tasks (see mp4).
Later, I combined the prospective link capture in ACTS with a machine learning technique known as topic modeling to automatically analyze linked information (see ICSE preprint). Our technique enables software engineers to gain a comprehensive view of the software and its related files (e.g., specifications, bug reports, test results, email).
More recently, I investigated how to use traceability techniques to manage changes across the software life cycle, including changes to source code and its accompanying documentation (see FACTS). Change management is a challenging task in software development, as numerous heterogeneous artifacts change at different times and changes to one artifact can impact many other artifacts.
Connecting related information in eScience: Data Provenance
In the domain of eScience, the ability to trace the origin of data and its connections to the eventual results, referred to as data provenance, is crucial in supporting the repeatability of analyses or experiments. There is a growing need to support data provenance since more and more researchers are using computational resources to collect data, run experiments, and perform scientific analyses.
Thus far, our research group has investigated techniques for reconstructing provenance for digital objects (see paper), recovering provenance for experiments or analyses that occurred in the past (see paper), and supporting provenance in widely used third-party tools (see journal article). We also investigated how to support provenance in different domains: computational neuroscience, atmospheric research, and multi-disciplinary research for the North Creek Wetlands restoration project.
More recently, we developed ProvMASS, a technique for tracking the execution of software (i.e., simulation logic) in a distributed and parallel environment such as MASS. By capturing provenance while a simulation runs, ProvMASS aims to assist researchers in multi-agent modeling for applications such as traffic or disaster response simulations.
Connecting related information in software acquisition: License Analysis
In the domain of software acquisition, I investigated the tracing of software license conflicts in heterogeneously composed software systems, in collaboration with Dr. Alspaugh and Dr. Scacchi at UC Irvine (see abstract). To support automated license analysis, we formally described licenses and open architecture (OA) systems. We then linked the components in the architecture to their corresponding license information. By traversing the architectural graph, we can automatically detect conflicting rights and obligations between the licenses in a system.
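
The traversal idea can be illustrated with a minimal sketch. It is not the formal license model from the work above; it assumes a deliberately simplified one in which each license carries a set of obligations and a set of prohibitions, and a conflict arises when one connected component's license obliges something that the other's prohibits. The names License, find_conflicts, and the example components and license terms are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """Simplified, hypothetical license model (an assumption of this sketch)."""
    name: str
    obligations: frozenset = frozenset()
    prohibitions: frozenset = frozenset()


def find_conflicts(edges, licenses, root):
    """Walk the architecture breadth-first from `root` and report clashes.

    edges:    dict mapping a component to the components it connects to
    licenses: dict mapping each component to its License
    Returns (obliging_component, prohibiting_component, term) tuples.
    """
    conflicts = []
    seen = {root}
    queue = deque([root])
    while queue:
        comp = queue.popleft()
        for neighbor in edges.get(comp, ()):
            # Check the connection in both directions.
            for x, y in ((comp, neighbor), (neighbor, comp)):
                clash = licenses[x].obligations & licenses[y].prohibitions
                for term in sorted(clash):
                    conflicts.append((x, y, term))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return conflicts


# Hypothetical three-component system.
licenses = {
    "editor": License("StrongCopyleft", obligations=frozenset({"disclose-source"})),
    "plugin": License("Proprietary", prohibitions=frozenset({"disclose-source"})),
    "runtime": License("Permissive"),
}
edges = {"editor": ["plugin", "runtime"]}

for obliging, prohibiting, term in find_conflicts(edges, licenses, "editor"):
    print(f"{obliging} obliges '{term}', which {prohibiting} prohibits")
```

In this sketch a conflict is checked only along connections in the architecture, which reflects the idea that license terms interact where components are actually composed rather than across every pair of components in the system.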