My research interests lie in data management and information
retrieval, crowdsourcing, and health informatics, among other areas. My PhD research
has focused on designing novel online data exploration techniques over
large underlying data repositories (structured data
and the web) that extend existing ranked-retrieval-based query-answering
Sincere thanks to all my collaborators and to Microsoft Research, Multicare Health Systems, and Edifecs for their support of my research.
Data Cleaning Efforts in Microsoft Academic Search Engine Data (collaborator: Microsoft Academic Search team)
ALIAS: Identifying Duplicate Authors in Microsoft Academic Search
Big Data in Health Informatics (collaborators: Ankur Teredesai, Multicare Health Systems)
Predicting Risk-of-Readmission for Congestive Heart Failure Patients
Program Committee - SIGMOD 2015, W3PHI-2015, SIGMOD 2014, EDBT 2014, AMIA 2014, CIKM 2014, BDCA 2014, IIWeb 2014, GDM 2014, KDD Cup 2013, WebDB 2013, ICHI 2013, GDM 2012,
IIWeb 2012, SIGMOD Travel Award Committee 2012, GDM 2013.
Journal Review - TKDE, Information Systems, TOIS, Distributed and Parallel Databases, VLDB Journal.
Organizing Committee - Guest Editor of JMLR for the KDD Cup Special Issue, KDD Cup 2013
MSR Blog on KDDCup 2013.
Organizing Committee - HRPCRM 2013 in Conjunction with ICHI 2013.
Advisory board - UW Certificate Program in Machine Learning.
Atsuyuki Morishima, Sihem Amer-Yahia, Senjuti Basu Roy.
“Crowd4U: An Initiative for Constructing an Open Academic Crowdsourcing Network”, to appear in HCOMP 2014.
Senjuti Basu Roy, Tina Eliassi-Rad, Spiros Papadimitriou.
“Fast Best-Effort Search on Graphs with Multiple Attributes”, TKDE 2014.
Vivek Rao, Kiyana Zolfaghar, Vani Mandava, Senjuti Basu Roy, Ankur Teredesai
“Readmission Score as a Service (RaaS)”, Data Science for Social Good, in conjunction with KDD 2014.
Gigaom Article, July 14, 2014
DataConomy Article, July 15, 2014
Davide Mottin, Senjuti Basu Roy, Alice Marascu, Themis Palpanas, Yannis Velegrakis, Gautam Das
“IQR: An Interactive Query Relaxation System for the Empty-Answer Problem” (demo), SIGMOD 2014.
Michael Pitts, Swapna Savvana, Senjuti Basu Roy, and Vani Mandava.
“Author Disambiguation in Microsoft Academic Search Engine Dataset”, EDBT 2014.
Senjuti Basu Roy, Saravanan T., Sihem Amer-Yahia, Gautam Das, and Cong Yu.
“Exploiting Group Recommendation Functions for Flexible Preferences”, ICDE 2014.
Senjuti Basu Roy, Si-chi Chin.
“Prediction and Management of Readmission Risk for Congestive Heart Failure”, HealthInf 2014.
Si-chi Chin, Kiyana Zolfaghar, Senjuti Basu Roy, Ankur Teredesai, Paul Amoroso.
“Divide-n-Discover : Discretization based Data Exploration Framework for Healthcare Analytics”, HealthInf 2014.
Kiyana Zolfaghar, Naren Meadem, Ankur Teredesai, Senjuti Basu Roy, Si-Chi Chin, Brian Muckian
“Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure Patients”, BigData'13, Big Data in Bioinformatics and Health Informatics.
Senjuti Basu Roy, Martine De Cock, Vani Mandava, Swapna Savvana, Brian Dalessandro, Claudia Perlich, William Cukierski, Ben Hamner
“The Microsoft Academic Search Dataset and KDD Cup 2013”, KDD Cup 2013.
Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis
“A Probabilistic Optimization Framework for the Empty-Answer Problem”, PVLDB 2013.
Senjuti Basu Roy, Ioanna Lykourentzou, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das
“Crowds, not Drones: Modeling Human Factors in Interactive Crowdsourcing”, DBCrowd, in conjunction with PVLDB 2013.
Naren Meadem, Nele Verbiest, Kiyana Zolfaghar, Jayshree Agarwal, Si-chi Chin, Senjuti Basu Roy, Ankur Teredesai
“Exploring Preprocessing Techniques for Prediction of Risk
of Readmission for Congestive Heart Failure Patients”, Data Mining in Healthcare, in conjunction with KDD 2013.
Kiyana Zolfaghar, Jayshree Agarwal, Nele Verbiest, Si-chi Chin, Senjuti Basu Roy, Ankur Teredesai
“Risk-O-Meter: An Intelligent Healthcare Risk Calculator”, KDD 2013.
Priya Govindan, Tina Eliassi-Rad, Senjuti Basu Roy
“Learning to Predict the Presence of Nodes in Anonymized Graphs”, WIN Workshop 2012.
Senjuti Basu Roy, Gautam Das, Sajal K. Das
“Algorithms for Computing Best Coverage Paths in the Presence of
Obstacles in a Sensor Field”, JDA 2012.
Senjuti Basu Roy, Kaushik Chakrabarti.
“Location-Aware Type Ahead Search on Spatial Databases (submitted version)”, SIGMOD 2011.
Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das and Cong Yu.
“Interactive Itinerary Planning”, Accepted in ICDE 2011.
We consider the problem of suggesting a personalized itinerary to a user in an interactive manner.
The iterative process works as follows: (1) the user provides feedback on POIs selected by the system, (2) the system
recommends the best itineraries based on all feedback so far, and (3) the system further
selects a new set of POIs, with optimal utility, to solicit feedback for, at the next step. This
iterative process stops when the user is satisfied with the recommended itinerary.
Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das and Cong Yu.
“Space Efficiency in Group Recommendations”,
VLDB Journal 2010 (Special Issue on Data Management and
Mining for Social Networks and Social Media).
In this journal article, we explore the impact
of space constraints on maintaining per-user and pairwise item lists and
develop two complementary solutions that leverage shared user behavior
to maintain the efficiency of our group recommendation algorithms within a
space budget. The first solution, behavior factoring, factors out user
agreements from disagreement lists, while
the second solution, partial materialization, selectively materializes a
subset of disagreement lists.
Ning Yan, Chengkai Li, Senjuti Basu Roy, Rakesh Ramegowda, Gautam Das.
“Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia”. Demo paper, accepted in CIKM 2010.
Facetedpedia is a faceted search system that dynamically discovers
query-dependent faceted interfaces for Wikipedia search.
Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das and Cong Yu.
“Constructing and Exploring Composite Items”. Accepted in SIGMOD 2010.
We investigate how to assist a user in online shopping by suggesting
packages that go with the item she is primarily interested in purchasing
(the central item), e.g., iPhone accessories offered as a package during
an iPhone purchase. In particular, we propose to build composite items,
which associate a central item with a set of packages formed by
satellite items, and to help users explore them.
We define and study the problem of effective construction and
exploration of large sets of packages associated with the central item,
and design and implement efficient algorithms for solving the problem in
two stages: summarization, and visual effect optimization.
Ning Yan, Chengkai Li, Senjuti Basu Roy, Lekhendro Lisham and Gautam Das.
“Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia”. Accepted in WWW 2010.
Wikipedia has become the largest encyclopedia ever created, with
close to 3 million English articles so far. We propose FacetedPedia, a faceted retrieval system for information
discovery and exploration over Wikipedia. Given the set of articles resulting from a
keyword search query, FacetedPedia dynamically and automatically discovers a
faceted interface for navigating and exploring the result articles.
Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das and Cong Yu.
“Group Recommendation: Semantics and Efficiency”. VLDB 2009.
The need for group recommendation arises in many scenarios: a movie for
friends to watch
together, a travel destination for a family to spend a holiday
break, and so on. We consider the problem of group recommendation where
each group is formed by a set of users. The central idea is to
return items that are likely to be liked by each member of the group.
We investigate several properties critical to group recommendation and
design efficient algorithms.
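To illustrate the idea of scoring an item for a whole group, here is a minimal sketch. It is not the paper's algorithm: it assumes two aggregation rules that are common in the group-recommendation literature (average rating and "least misery", i.e. the minimum rating in the group), and the names `group_scores`, `recommend`, and the toy ratings are hypothetical.

```python
from statistics import mean

def group_scores(ratings, group, strategy="average"):
    """Aggregate per-user predicted ratings into one score per item.

    ratings: dict user -> dict item -> predicted rating (assumed given).
    strategy: "average" or "least_misery" (assumed aggregation rules).
    """
    # Only consider items that every group member has a rating for.
    items = set.intersection(*(set(ratings[u]) for u in group))
    aggregate = mean if strategy == "average" else min
    return {i: aggregate(ratings[u][i] for u in group) for i in items}

def recommend(ratings, group, k, strategy="average"):
    """Return the k items with the highest group score (ties broken by name)."""
    scores = group_scores(ratings, group, strategy)
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

# Toy data (hypothetical): two friends, two movies.
ratings = {
    "ann": {"m1": 5, "m2": 3},
    "bob": {"m1": 2, "m2": 3},
}
print(recommend(ratings, ["ann", "bob"], k=1, strategy="average"))
print(recommend(ratings, ["ann", "bob"], k=1, strategy="least_misery"))
```

With the average rule the group prefers `m1` (3.5 versus 3), while least misery prefers `m2`, because Bob rates `m1` only 2. The second rule is closer to the goal of returning items that each member is likely to enjoy.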
Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das and Mukesh Mohania.
“DynaCet: Building Dynamic Faceted Search Systems over Databases”. Demo paper, ICDE 2009.
In this demo, we present DynaCet, a domain-independent system that provides effective
minimum-effort-based dynamic faceted search solutions over
enterprise databases. At every step, DynaCet suggests facets
depending on the user response at the previous step. Facets are
selected based on their ability to rapidly drill down to the
most promising tuples, as well as on the ability of the user to
provide desired values for them. The benefits provided include
faster access to information stored in databases while taking into
consideration the variance in user knowledge and preferences.
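The two selection criteria described above (how quickly a facet narrows the result set, and how likely the user is to know a value for it) can be sketched as a greedy choice. This is an illustrative sketch, not DynaCet's actual cost model: it assumes the target tuple is uniformly distributed over the current rows, and `p_known` (the probability that the user can answer a facet) and the toy rows are hypothetical.

```python
from collections import Counter

def expected_remaining(rows, attr, p_known):
    """Expected number of rows left after asking the user about `attr`."""
    n = len(rows)
    counts = Counter(r[attr] for r in rows)
    # If the user answers, value v is chosen with probability n_v / n
    # and leaves n_v rows, so the expectation is sum(n_v^2) / n.
    if_answered = sum(c * c for c in counts.values()) / n
    # If the user cannot answer, nothing is filtered and n rows remain.
    return p_known * if_answered + (1 - p_known) * n

def pick_facet(rows, p_known):
    """Pick the facet with the smallest expected remaining result size."""
    return min(p_known, key=lambda a: (expected_remaining(rows, a, p_known[a]), a))

# Toy table (hypothetical).
rows = [
    {"make": "Honda", "color": "red"},
    {"make": "Honda", "color": "red"},
    {"make": "Toyota", "color": "red"},
    {"make": "Ford", "color": "blue"},
]
print(pick_facet(rows, {"make": 1.0, "color": 1.0}))
print(pick_facet(rows, {"make": 0.2, "color": 0.9}))
```

When the user can always answer, `make` wins because it splits the rows more evenly. When the user rarely knows the make, `color` becomes the better question even though it discriminates less.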
Senjuti Basu Roy and Gautam Das
“Top-k Implementation Techniques of Minimum Effort Driven Faceted Search For Databases”. Accepted in COMAD 2009.
We investigate opportunities to improve the performance of minimum-effort-driven
faceted search techniques. The main idea is motivated by the early stopping
techniques used in the TA-family of algorithms for top-k computations.
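For readers unfamiliar with the early stopping mentioned above, here is a minimal sketch of the classic Threshold Algorithm (TA) for top-k computation. It shows the general technique only, not the paper's adaptation to faceted search. It assumes non-negative scores, a sum as the aggregate function, and a score of 0 for items missing from a list; the function name and toy data are hypothetical.

```python
import heapq

def threshold_topk(lists, k):
    """Return (top-k items by summed score, number of sorted-access rounds).

    lists: non-empty list of dicts mapping item -> non-negative score.
    """
    # Sorted access: each list is read in descending score order.
    ranked = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = set()
    top = []  # min-heap of (total_score, item), holds at most k entries
    depth = 0
    while depth < max(len(r) for r in ranked):
        threshold = 0.0
        for r in ranked:
            if depth >= len(r):
                continue  # exhausted list: unseen items score 0 here
            item, score = r[depth]
            threshold += score  # best score any unseen item could still have
            if item in seen:
                continue
            seen.add(item)
            # Random access: look up the item's score in every list.
            total = sum(l.get(item, 0.0) for l in lists)
            if len(top) < k:
                heapq.heappush(top, (total, item))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, item))
        depth += 1
        # Early stop: the k-th best total already meets the threshold.
        if len(top) == k and top[0][0] >= threshold:
            break
    return [item for _, item in sorted(top, reverse=True)], depth

lists = [
    {"a": 0.9, "b": 0.7, "c": 0.1, "d": 0.05},
    {"a": 0.8, "b": 0.9, "c": 0.2, "d": 0.1},
]
print(threshold_topk(lists, 2))
```

On this toy input the scan stops after two rounds of sorted access instead of four: once the second-best total is at least the sum of the scores at the current depth, no unseen item can enter the top 2.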
Senjuti Basu Roy, Haidong Wang, Gautam Das, Ullas Nambiar and Mukesh Mohania
“Minimum-Effort Driven Dynamic Faceted Search in Structured Databases”. In Proc. CIKM 2008: 13-22.
We investigate how faceted search can be enabled over structured
databases for tuple search. Our facet selection techniques choose
facets dynamically, based on the user's responses, following a
minimum-effort principle.
Senjuti Basu Roy, Gautam Das and Sajal K. Das.
“Computing Best Coverage Path in the Presence
of Obstacles in a Sensor Field”. WADS 2007: 577-588.
We develop computational-geometry-based algorithms and approximation algorithms
for intrusion detection in wireless sensor networks in the presence of obstacles. Details
can be found at http://dbxlab.uta.edu/sensor.htm.
Senjuti Basu Roy, Kausik Kayal and Jaya Sil.
“Edge Preserving Image Compression Technique using Adaptive Feed Forward Neural Network.” EuroIMSA 2005: 467-471.
The aim of this work is to develop an edge-preserving image compression
technique using a feed-forward neural network with one hidden layer, whose
neurons are determined adaptively. The network is trained using the
single processed image block. The work proposes initialization of
weights between the input and lone hidden layer by transforming pixel
coordinates of the input pattern block into its equivalent