Research Interest

My research interests lie in data management, information retrieval, crowdsourcing, and health informatics. My PhD research focused on designing novel online data exploration techniques over large underlying data repositories (structured data and the web) that extend the existing ranked-retrieval-based query-answering paradigm.

Sincere thanks to all my collaborators, and to Microsoft Research, Multicare Health Systems, and Edifecs for their support of my research.

Online Tools:

  • Data Cleaning Efforts in Microsoft Academic Search Engine Data (collaborator: Microsoft Academic Search team)

    ALIAS: Identifying Duplicate Authors in Microsoft Academic Search

  • Big Data in Health Informatics (collaborators: Ankur Teredesai, Multicare Health Systems)

    Predicting Risk-of-Readmission for Congestive Heart Failure Patients

Professional Activities:

  • Program Committee - Sigmod 2017, VLDB 2017, CIKM 2016, SIGKDD 2016, IJCAI 2016, EDBT 2016 (poster track), Sigmod 2015, AMIA 2015, KDD 2015, W3PHI-2015, CIKM 2015, ECML PKDD 2015 (PhD Track), Sigmod 2014, EDBT 2014, AMIA 2014, CIKM 2014, BDCA 2014, IIWeb 2014, GDM 2014, KDD Cup 2013, WebDB 2013, ICHI 2013, GDM 2013, GDM 2012, IIWeb 2012, SIGMOD Travel Award Committee 2012.

  • Journal Review - TKDE, Information Systems, TOIS, Distributed and Parallel Databases, VLDB Journal, Expert Systems, DAMI, DMKD, TKDD.

  • Program Co-Chair:

    ExploreDB 2016 (Co-located with Sigmod 2016).

  • Building a Predictive Readmissions Model

    Executive Insight Article on Building a predictive readmission model.

  • All that RaaS: saving lives and transforming healthcare economics

    MSR Blog on Azure for Research.

  • Data mining competition takes center stage in Chicago

    MSR Blog on KDD Cup 2013.

  • Organizing Committee - HRPCRM 2013 in Conjunction with ICHI 2013.

  • Advisory board - UW Certificate Program in Machine Learning.

Publications:

  • Davide Mottin, Alice Marascu, Senjuti Basu Roy, Themis Palpanas, Yannis Velegrakis, Gautam Das

    “A Holistic and Principled Approach for the Empty-Answer Problem”, VLDB Journal 2016.

  • Kosetsu Ikeda, Atsuyuki Morishima, Habibur Rahman, Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, and Gautam Das

    “Collaborative Crowdsourcing with Crowd4U”, to be presented at VLDB 2016.

  • Sihem Amer-Yahia, Senjuti Basu Roy.

    “Human Factors in Crowdsourcing (tutorial)”, to be presented at VLDB 2016.

  • Senjuti Basu Roy, Tina Eliassi-Rad, Spiros Papadimitriou.

    “Fast Best-Effort Search on Graphs with Multiple Attributes”, to appear at ICDE 2016 (poster track).

  • Habibur Rahman, Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, and Gautam Das.

    “Task Assignment Optimization in Collaborative Crowdsourcing”, ICDM 2015.

  • Senjuti Basu Roy, Ankur Teredesai, Kiyana Zolfaghar, Rui Liu, David Hazel, Stacey Newman, Albert Marinez.

    “Dynamic Hierarchical Classification for Patient Risk-of-Readmission”, ACM SIGKDD 2015 (industry and government track).

  • Habibur Rahman, Saravanan Thirumuruganathan, Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das.

    “Worker Skill Estimation in Team-Based Tasks”, VLDB 2015.

  • Senjuti Basu Roy, Ioanna Lykourentzou, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das.

    “Task-Assignment Optimization in Knowledge Intensive Crowdsourcing”, VLDB Journal 2015.

  • Senjuti Basu Roy, Laks V.S. Lakshmanan, Rui Liu.

    “From Group Recommendations to Group Formation”, Sigmod 2015.

  • Sihem Amer-Yahia, Senjuti Basu Roy.

    “From Complex Object Exploration to Complex Crowdsourcing”, tutorial at WWW 2015.

  • Senjuti Basu Roy, Sihem Amer-Yahia, Lucas Joppa.

    “ECCO- A Framework for Ecological Data Collection and Management Involving Human Workers”, EDBT 2015.

  • Sihem Amer-Yahia, Behrooz Omidvar-Tehrani, Senjuti Basu Roy, Nafiseh Shabib.

    “Group Recommendation with Temporal Affinities”, EDBT 2015.

  • Rui Liu, Kiyana Zolfaghar, Si-Chi Chin, Senjuti Basu Roy, Ankur Teredesai.

    “A Framework to Recommend Interventions for 30-Day Heart Failure Readmission Risk”, ICDM 2014.

  • Rui Liu, Raj Velamur Srinivasan, Kiyana Zolfaghar, Si-Chi Chin, Senjuti Basu Roy, Aftab Hasan, David Hazel

    “Pathway-Finder: An Interactive Recommender System for Supporting Personalized Care Pathways”, ICDM 2014 (Demo).

  • Atsuyuki Morishima, Sihem Amer-Yahia, Senjuti Basu Roy.

    “Crowd4U: An Initiative for Constructing an Open Academic Crowdsourcing Network”, HCOMP 2014.

  • Senjuti Basu Roy, Tina Eliassi-Rad, Spiros Papadimitriou.

    “Fast Best-Effort Search on Graphs with Multiple Attributes”, TKDE 2014.

  • Vivek Rao, Kiyana Zolfaghar, Vani Mandava, Senjuti Basu Roy, Ankur Teredesai

    “Readmission Score as a Service (RaaS)”, Data Science for Social Good, in conjunction with KDD 2014.

    Gigaom Article, July 14, 2014

    DataConomy Article, July 15, 2014

  • Davide Mottin, Senjuti Basu Roy, Alice Marascu, Themis Palpanas, Yannis Velegrakis, Gautam Das

    “IQR: An Interactive Query Relaxation System for the Empty-Answer Problem” (demo), SIGMOD 2014.

  • Michael Pitts, Swapna Savvana, Senjuti Basu Roy, and Vani Mandava.

    “Author Disambiguation in Microsoft Academic Search Engine Dataset”, EDBT 2014.

  • Senjuti Basu Roy, Saravanan T., Sihem Amer-Yahia, Gautam Das, and Cong Yu.

    “Exploiting Group Recommendation Functions for Flexible Preferences”, ICDE 2014.

  • Senjuti Basu Roy, Si-chi Chin.

    “Prediction and Management of Readmission Risk for Congestive Heart Failure”, HealthInf 2014.

  • Si-chi Chin, Kiyana Zolfaghar, Senjuti Basu Roy, Ankur Teredesai, Paul Amoroso.

    “Divide-n-Discover: Discretization based Data Exploration Framework for Healthcare Analytics”, HealthInf 2014.

  • Kiyana Zolfaghar, Naren Meadem, Ankur Teredesai, Senjuti Basu Roy, Si-Chi Chin, Brian Muckian

    “Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure Patients”, BigData'13, Big Data in Bioinformatics and Health Informatics.

  • Senjuti Basu Roy, Martine De Cock, Vani Mandava, Swapna Savvana, Brian Dalessandro, Claudia Perlich, William Cukierski, Ben Hamner

    “The Microsoft Academic Search Dataset and KDD Cup 2013”, KDDCup 2013.

  • Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis

    “A Probabilistic Optimization Framework for the Empty-Answer Problem”, PVLDB 2013.

  • Senjuti Basu Roy, Ioanna Lykourentzou, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das

    “Crowds, not Drones: Modeling Human Factors in Interactive Crowdsourcing”, DBCrowd, in conjunction with PVLDB 2013.

  • Naren Meadem, Nele Verbiest, Kiyana Zolfaghar, Jayshree Agarwal, Si-chi Chin, Senjuti Basu Roy, Ankur Teredesai

    “Exploring Preprocessing Techniques for Prediction of Risk of Readmission for Congestive Heart Failure Patients”, Data Mining in Healthcare, in conjunction with KDD 2013.

  • Kiyana Zolfaghar, Jayshree Agarwal, Nele Verbiest, Si-chi Chin, Senjuti Basu Roy, Ankur Teredesai

    “Risk-O-Meter: An Intelligent Healthcare Risk Calculator”, KDD 2013.

  • Priya Govindan, Tina Eliassi-Rad, Senjuti Basu Roy

    “Learning to Predict the Presence of Nodes in Anonymized Graphs”, WIN Workshop 2012.

  • Senjuti Basu Roy, Gautam Das, Sajal K. Das

    “Algorithms for Computing Best Coverage Paths in the Presence of Obstacles in a Sensor Field”, JDA 2012.

  • Senjuti Basu Roy, Kaushik Chakrabarti.

    “Location-Aware Type Ahead Search on Spatial Databases (submitted version)”, Sigmod 2011.

  • Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das and Cong Yu.

    “Interactive Itinerary Planning”, Accepted in ICDE 2011.

    We consider the problem of suggesting a personalized itinerary to a user in an interactive manner. The iterative process works as follows: (1) the user provides feedback on points of interest (POIs) selected by the system, (2) the system recommends the best itineraries based on all feedback so far, and (3) the system selects a new set of POIs, with optimal utility, on which to solicit feedback at the next step. The process stops when the user is satisfied with the recommended itinerary.

  • Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das and Cong Yu.

    “Space Efficiency in Group Recommendations”, VLDB Journal 2010 (Special Issue on Data Management and Mining for Social Networks and Social Media).

    In this journal article, we explore the impact of space constraints on maintaining per-user and pairwise item lists, and develop two complementary solutions that leverage shared user behavior to keep our group recommendation algorithms efficient within a space budget. The first solution, behavior factoring, factors out user agreements from disagreement lists, while the second solution, partial materialization, selectively materializes a subset of disagreement lists.

  • Ning Yan, Chengkai Li, Senjuti Basu Roy, Rakesh Ramegowda, Gautam Das.

    “Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia”. Demo paper, accepted in CIKM 2010.

    Facetedpedia is a faceted search system that dynamically discovers query-dependent faceted interfaces for Wikipedia search result articles.

  • Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das and Cong Yu.

    “Constructing and Exploring Composite Items”. Accepted in SIGMOD 2010.

    We investigate how to assist a user in online shopping by suggesting packages that accompany the item she is primarily interested in purchasing, the central item (e.g., iPhone accessories offered as a package during an iPhone purchase). In particular, we propose to build composite items, each of which associates a central item with a set of packages formed by satellite items, and to help users explore them. We define and study the problem of effectively constructing and exploring large sets of packages associated with the central item, and design and implement efficient algorithms that solve the problem in two stages: summarization and visual effect optimization.

  • Ning Yan, Chengkai Li, Senjuti Basu Roy, Lekhendro Lisham and Gautam Das.

    “Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia”. Accepted in WWW 2010.

    Wikipedia has become the largest encyclopedia ever created, with close to 3 million English articles so far. We propose Facetedpedia, a faceted retrieval system for information discovery and exploration over Wikipedia. Given the set of articles resulting from a keyword search query, Facetedpedia dynamically and automatically discovers a faceted interface for navigating and exploring the result articles.

  • Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das and Cong Yu.

    “Group Recommendation: Semantics and Efficiency”. VLDB 2009.

    The need for group recommendation arises in many scenarios: a movie for friends to watch together, a travel destination for a family holiday, and so on. We consider the problem of group recommendation where each group is formed by a set of users. The central idea is to return items that each member of the group is likely to like. We investigate several properties critical to group recommendation, and design efficient algorithms.

  • Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das and Mukesh Mohania.

    “DynaCet: Building Dynamic Faceted Search Systems over Databases”. Demo paper, ICDE 2009.

    In this demo, we present DynaCet, a domain-independent system that provides effective minimum-effort-based dynamic faceted search solutions over enterprise databases. At every step, DynaCet suggests facets depending on the user's response at the previous step. Facets are selected based on their ability to rapidly drill down to the most promising tuples, as well as on the ability of the user to provide desired values for them. The benefits include faster access to information stored in databases while taking into account the variance in user knowledge and preferences.
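The two facet-selection criteria above (how quickly a facet narrows the result set, and how likely the user is to know a value for it) can be sketched with a simple greedy rule. This is only an illustration under stated assumptions, not DynaCet's actual algorithm: the cost model, the uniform-target assumption, and the names `expected_remaining`, `pick_facet`, and `p_answer` are all hypothetical.

```python
from collections import Counter


def expected_remaining(tuples, facet, p_answer):
    """Expected number of tuples left after asking about one facet.

    Assumptions (illustrative): the target tuple is uniformly random, and
    the user can answer with probability p_answer. If the user cannot
    answer, no tuples are eliminated.
    """
    n = len(tuples)
    counts = Counter(t[facet] for t in tuples)
    # The target has value v with probability n_v / n, leaving n_v tuples.
    if_answered = sum(c * c for c in counts.values()) / n
    return p_answer * if_answered + (1 - p_answer) * n


def pick_facet(tuples, p_answer):
    """Greedily choose the facet with the smallest expected remainder.

    p_answer maps each candidate facet to the probability that the user
    can supply a value for it.
    """
    return min(p_answer, key=lambda f: expected_remaining(tuples, f, p_answer[f]))


# Hypothetical car database with two candidate facets.
cars = [
    {"make": "A", "color": "red"},
    {"make": "B", "color": "red"},
    {"make": "C", "color": "red"},
    {"make": "D", "color": "blue"},
]
print(pick_facet(cars, {"make": 0.9, "color": 0.9}))  # users usually know the make
print(pick_facet(cars, {"make": 0.1, "color": 0.9}))  # users rarely know the make
```

A highly selective facet is preferred only when the user is likely to be able to answer it, which reflects the trade-off the demo abstract describes.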

  • Senjuti Basu Roy and Gautam Das

    “Top-k Implementation Techniques of Minimum Effort Driven Faceted Search For Databases”. Accepted in COMAD 2009.

    We investigate opportunities to improve the performance of minimum-effort-driven faceted search techniques. The main idea is motivated by the early-stopping techniques used in the TA family of algorithms for top-k computation.
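For readers unfamiliar with the early-stopping idea, the following is a minimal sketch of the classic Threshold Algorithm (TA) for top-k retrieval, which the abstract cites as motivation. It is not the paper's faceted search method. The sketch assumes a sum aggregate and that every list scores every object; the function name `threshold_topk` is hypothetical.

```python
import heapq


def threshold_topk(score_lists, k):
    """Top-k objects by summed score, stopping early as in TA.

    score_lists: list of dicts mapping object id -> score. Assumes every
    dict contains every object. Returns (top_k, depth), where depth is
    the number of sorted-access rounds performed.
    """
    # Sorted access: each list ordered by descending score.
    ranked = [sorted(s.items(), key=lambda kv: kv[1], reverse=True)
              for s in score_lists]
    seen = set()
    heap = []  # min-heap of the best k (total, obj) pairs found so far
    depth = 0
    n = len(ranked[0])
    while depth < n:
        threshold = 0.0
        for row in ranked:
            obj, score = row[depth]
            threshold += score  # best total any unseen object could reach
            if obj not in seen:
                seen.add(obj)
                # Random access: fetch the object's score in every list.
                total = sum(s[obj] for s in score_lists)
                if len(heap) < k:
                    heapq.heappush(heap, (total, obj))
                elif total > heap[0][0]:
                    heapq.heapreplace(heap, (total, obj))
        depth += 1
        # Early stop: no unseen object can beat the current k-th best.
        if len(heap) == k and heap[0][0] >= threshold:
            break
    return sorted(heap, reverse=True), depth


# Hypothetical scores for four objects under two attributes.
a = {"x": 0.9, "y": 0.8, "z": 0.1, "w": 0.05}
b = {"x": 0.8, "y": 0.9, "z": 0.2, "w": 0.1}
top, depth = threshold_topk([a, b], k=2)
print([obj for _, obj in top], depth)
```

On this input the loop stops after two rounds of sorted access instead of scanning all four entries, because the threshold drops below the score of the second-best object found.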

  • Senjuti Basu Roy, Haidong Wang, Gautam Das, Ullas Nambiar and Mukesh Mohania

    “Minimum-Effort Driven Dynamic Faceted Search in Structured Databases”. In Proc. CIKM 2008: 13-22.

    We investigate how faceted search can be enabled over structured databases for tuple search. Our facet selection techniques choose facets dynamically, based on the user's responses, following a minimum-effort principle.

  • Senjuti Basu Roy, Gautam Das and Sajal K. Das.

    “Computing Best Coverage Path in the Presence of Obstacles in a Sensor Field”. WADS 2007: 577-588.

    We develop computational-geometry-based algorithms and approximation algorithms for intrusion detection in wireless sensor networks in the presence of obstacles. Details can be found at

  • Senjuti Basu Roy, Kausik Kayal and Jaya Sil.

    “Edge Preserving Image Compression Technique using Adaptive Feed Forward Neural Network.” EuroIMSA 2005: 467-471.

    The aim of this work is to develop an edge-preserving image compression technique using a feed-forward neural network with one hidden layer, whose neurons are determined adaptively. The network is trained using a single processed image block. The work proposes initializing the weights between the input layer and the lone hidden layer by transforming the pixel coordinates of the input pattern block into an equivalent one-dimensional representation.