Large language models have proliferated across multiple domains in a short period of time. There is, however, hesitation in the medical and healthcare domain towards their adoption because of issues like factuality, coherence, and hallucinations. Given the high-stakes nature of healthcare, many researchers have even cautioned against their usage until these issues are resolved. The key to the implementation and deployment of LLMs in healthcare is to make these models trustworthy, transparent (as much as possible), and explainable. In this paper we describe the key elements in creating reliable, trustworthy, and unbiased models as a necessary condition for their adoption in healthcare. Specifically, we focus on the quantification, validation, and mitigation of hallucinations in the context of healthcare. Lastly, we discuss what the future of LLMs in healthcare may look like.
Surgical Clinics
Show Your Work: Responsible Model Reporting in Health Care Artificial Intelligence
Standardized and thorough model reporting is an integral component in the development and deployment of machine learning models in health care. Model reporting includes sharing multiple model performance metrics and incorporating metadata to provide the necessary context for model evaluation. Thorough model reporting addresses common concerns about artificial intelligence in health care including model explainability, transparency, fairness, and generalizability. All stages in the model development lifecycle, from initial design to data capture to model deployment, can be communicated openly to stakeholders with responsible model reporting. Physician involvement throughout these processes can ensure clinical concerns and potential consequences are considered.
ICHI
Validation of a Hospital Digital Twin with Machine Learning
Muhammad Aurangzeb Ahmad, Vijay Chickarmane, Farinaz Sabz Ali Pour, and 2 more authors
Recently there has been a surge of interest in developing Digital Twins of process flows in healthcare to better understand bottlenecks and areas of improvement. A key challenge is the validation process. We describe work in progress on a digital twin that uses an agent-based simulation model to determine bed turnaround time for patients in hospitals. We employ a machine learning strategy for validating the model and implementing sensitivity analysis.
2022
arXiv
Machine Learning for Deferral of Care Prediction
Muhammad Aurangzeb Ahmad, Raafia Ahmed, Steve Overman, and 3 more authors
Care deferral is the phenomenon where patients defer or are unable to receive healthcare services, such as seeing doctors, obtaining medications, or undergoing planned surgery. Care deferral can be the result of patient decisions, service availability, service limitations, or restrictions due to cost. Continual care deferral in populations may lead to a decline in population health and compound health issues, leading to higher social and financial costs in the long term [1]. Consequently, identification of patients who may be at risk of deferring care is important for improving population health and reducing total care costs. Additionally, minority and vulnerable populations are at greater risk of care deferral due to socioeconomic factors. In this paper, we (a) address the problem of predicting care deferral for well-care visits; (b) observe that social determinants of health are relevant explanatory factors for predicting care deferral; and (c) compute how fair the models are with respect to demographics, socioeconomic factors, and selected comorbidities. Many health systems currently use rules-based techniques to retroactively identify patients who previously deferred care. The objective of this model is to identify patients at risk of deferring care and allow the health system to prevent care deferrals through direct outreach or social determinant mediation.
2021
IEEE ICHI
Machine learning approaches for patient state prediction in pediatric ICUs
Muhammad Aurangzeb Ahmad, Eduardo Antonio Trujillo Rivera, Murray Pollack, and 3 more authors
In 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI), 2021
We consider the problem of characterizing and predicting the condition of pediatric patients in intensive care units (ICUs). This population is often typified by rapid changes in patient conditions, which necessitate predictions that can capture transitions in patient states. While assessment of a patient’s condition is currently done using domain-based scoring systems, we employ machine learning models for predicting the state of the pediatric patient. Additionally, we explore how model explainability could affect the usage of predictive models in real-world settings.
IEEE ICHI
Interpretable phenotyping for electronic health records
Christine Allen, Juhua Hu, Vikas Kumar, and 2 more authors
In 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI), 2021
Datasets from Electronic Health Records (EHRs) are increasingly large and complex, creating challenges in their use for predictive modeling. The two major challenges are large scale and high dimensionality. One of the common ways to address the large-scale challenge is through the use of data phenotypes: clinically relevant characteristic groupings that can be expressed as logical queries (e.g., “senior patients with diabetes”). With the increasing use of machine learning across the continuum of care, phenotypes play an important role in modeling for population management, clinical trials, observational and interventional research, and quality measures. Yet, phenotype interpretation can often be difficult and require post-hoc clarifications from experienced clinicians. For example, detailed analysis may be needed to find that all patients in a phenotype are diabetic seniors with complications from previous surgery. Moreover, the high-dimensionality problem is often addressed either separately or simultaneously with phenotyping by dimension reduction methods that may further hinder interpretability. In this paper, we introduce the notion of interpretable data phenotypes generated by an unsupervised learning technique. Methods are designed to disambiguate relative feature memberships, thus facilitating general clinical validation and alleviating the problem of high dimensionality. The empirical study applies the proposed unsupervised interpretable phenotyping method to a real-world healthcare dataset (MIMIC), then uses hospital length of stay as a reference prediction task. The results demonstrate that the proposed method produces phenotypes with improved interpretability without diminishing the quality of prediction results.
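The core idea, clustering patients and then describing each phenotype by its most prevalent features, can be sketched in a few lines. Below is a minimal illustration of unsupervised phenotyping with per-cluster feature summaries; it is not the paper's actual method, and the clinical flags are hypothetical.

```python
# Minimal sketch: unsupervised phenotyping with per-cluster feature summaries.
# Illustrative only; the feature names are hypothetical, not from the paper.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = ["age_over_65", "diabetes", "hypertension", "prior_surgery"]
# Toy binary EHR-style matrix: rows = patients, columns = clinical flags.
X = rng.integers(0, 2, size=(500, len(features)))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Interpret each phenotype by the flags most prevalent in its centroid.
for k, center in enumerate(kmeans.cluster_centers_):
    top = sorted(zip(features, center), key=lambda t: -t[1])[:2]
    desc = " & ".join(f"{name} ({p:.0%})" for name, p in top)
    print(f"Phenotype {k}: {desc}")
```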
Annals of Surgery
A surgeon’s guide to machine learning
Daniel T Lammers, Carly M Eckert, Muhammad A Ahmad, and 2 more authors
Machine learning (ML) represents a collection of advanced data modeling techniques beyond the traditional statistical models and tests with which most clinicians are familiar. While a subset of artificial intelligence, ML is far from the science fiction impression frequently associated with AI. At its most basic, ML is about pattern finding, sometimes with complex algorithms. The advanced mathematical modeling of ML is seeing expanding use throughout healthcare and increasingly in the day-to-day practice of surgeons. As with any new technique or technology, a basic understanding of principles, applications, and limitations is essential for appropriate implementation. This primer is intended to provide the surgical reader an accelerated introduction to applied ML and considerations in potential research applications or the review of publications, including ML techniques.
IEEE ICHI
Machine learning approaches for pressure injury prediction
Muhammad Aurangzeb Ahmad, Barrett Larson, Steve Overman, and 5 more authors
In 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI), 2021
Pressure injuries are localized damage to the skin caused by sustained pressure. They are a common yet preventable condition affecting millions of patients. While there are multiple scales to determine if a patient has a pressure injury, these methods suffer from high inter-rater subjectivity. To address this problem we create predictive models for pressure injury using Centers for Medicare & Medicaid Services claims data. The models show relatively good predictive performance; we also explore aspects of deploying the models in real-world clinical settings.
IEEE ICHI
Fairness in healthcare AI
Muhammad Aurangzeb Ahmad, Carly Eckert, Christine Allen, and 3 more authors
In 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI), 2021
The issue of bias and fairness in healthcare has been around for centuries. With the integration of AI in healthcare, the potential to discriminate and perpetuate unfair and biased practices increases manyfold. The tutorial focuses on the challenges, requirements, and opportunities in the area of fairness in healthcare AI and the various nuances associated with it. Healthcare is framed as a multi-faceted, systems-level problem that necessitates careful mapping of different notions of fairness in healthcare to corresponding concepts in machine learning, elucidated via different real-world examples.
arXiv
Machine Learning Approaches for Type 2 Diabetes Prediction and Care Management
Aloysius Lim, Ashish Singh, Jody Chiam, and 4 more authors
Prediction of diabetes and its various complications has been studied in a number of settings, but a comprehensive overview of the problem setting for diabetes prediction and care management has not been addressed in the literature. In this document we seek to remedy this omission with an encompassing overview of diabetes complication prediction as well as situating this problem in the context of real-world healthcare management. We illustrate various problems encountered in real-world clinical scenarios via our own experience with building and deploying such models. In this manuscript we illustrate a Machine Learning (ML) framework for addressing the problem of predicting Type 2 Diabetes Mellitus (T2DM) together with a solution for risk stratification, intervention, and management. These ML models align with how physicians think about disease management and mitigation, which comprises four steps: Identify, Stratify, Engage, Measure.
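As a rough illustration of the Stratify step in the Identify, Stratify, Engage, Measure loop, the sketch below buckets predicted risk scores into care-management tiers. The thresholds and tier names are hypothetical, not the framework's actual cut-points.

```python
# Minimal sketch: risk stratification from predicted T2DM probabilities.
# The thresholds and tier names are illustrative assumptions only.
import numpy as np

def stratify(risk: np.ndarray) -> np.ndarray:
    """Map predicted probabilities to care-management tiers."""
    tiers = np.empty(len(risk), dtype=object)
    tiers[risk < 0.10] = "monitor"
    tiers[(risk >= 0.10) & (risk < 0.30)] = "lifestyle outreach"
    tiers[risk >= 0.30] = "clinical intervention"
    return tiers

risk_scores = np.array([0.04, 0.18, 0.52, 0.27])
for score, tier in zip(risk_scores, stratify(risk_scores)):
    print(f"risk={score:.2f} -> {tier}")
```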
arXiv
Assessing fairness in classification parity of machine learning models in healthcare
Ming Yuan, Vikas Kumar, Muhammad Aurangzeb Ahmad, and 1 more author
Fairness in AI and machine learning systems has become a fundamental problem in the accountability of AI systems. While the need for accountability of AI models is near ubiquitous, healthcare in particular is a challenging field where accountability of such systems takes on additional importance, as decisions in healthcare can have life-altering consequences. In this paper we present preliminary results on fairness in the context of classification parity in healthcare. We also present some exploratory methods to improve fairness and to choose appropriate classification algorithms in the context of healthcare.
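For readers unfamiliar with classification parity, the sketch below computes two common parity checks, demographic parity and equal opportunity, across a binary protected attribute. It is a generic illustration on synthetic labels, not the paper's evaluation code.

```python
# Minimal sketch: two classification-parity checks across a binary group.
# Synthetic data; illustrative only.
import numpy as np

def parity_gaps(y_true, y_pred, group):
    g0, g1 = (group == 0), (group == 1)
    # Demographic parity: difference in positive prediction rates.
    dp_gap = y_pred[g1].mean() - y_pred[g0].mean()
    # Equal opportunity: difference in true positive rates.
    tpr = lambda m: y_pred[m & (y_true == 1)].mean()
    return dp_gap, tpr(g1) - tpr(g0)

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 1000)   # ground-truth outcomes
y_pred = rng.integers(0, 2, 1000)   # model decisions
group = rng.integers(0, 2, 1000)    # protected attribute
dp, eo = parity_gaps(y_true, y_pred, group)
print(f"demographic parity gap={dp:.3f}, TPR gap={eo:.3f}")
```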
arXiv
Emergency Department Optimization and Load Prediction in Hospitals
Karthik K Padthe, Vikas Kumar, Carly M Eckert, and 4 more authors
Over the past several years, across the globe, there has been an increase in people seeking care in emergency departments (EDs). ED resources, including nurse staffing, are strained by such increases in patient volume. Accurate forecasting of incoming patient volume in EDs is crucial for efficient utilization and allocation of ED resources. Working with a suburban ED in the Pacific Northwest, we developed a tool powered by machine learning models to forecast ED arrivals and ED patient volume to assist end-users, such as ED nurses, in resource allocation. In this paper, we discuss the results from our predictive models, the challenges, and the learnings from users’ experiences with the tool in active clinical deployment in a real-world setting.
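A minimal sketch of the forecasting setup described here, predicting hourly arrivals from lagged counts, might look like the following. The synthetic Poisson series and gradient-boosted model are stand-ins, not the deployed tool's data or architecture.

```python
# Minimal sketch: forecast hourly ED arrivals from the previous 24 hours.
# Synthetic data and model choice are assumptions, not the deployed system.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
hours = np.arange(24 * 60)  # 60 days of hourly observations
arrivals = rng.poisson(8 + 4 * np.sin(2 * np.pi * hours / 24))

window = 24  # use the last day of counts as features
X = np.array([arrivals[t - window : t] for t in range(window, len(arrivals))])
y = arrivals[window:]

split = len(X) - 24  # hold out the final day
model = GradientBoostingRegressor().fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(f"held-out MAE: {np.abs(pred - y[split:]).mean():.2f} arrivals/hour")
```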
KDD
Software as a medical device: regulating AI in healthcare via responsible AI
Muhammad Aurangzeb Ahmad, Steve Overman, Christine Allen, and 3 more authors
In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021
With the increased adoption of AI in healthcare, there is a growing recognition and demand to regulate AI in healthcare to avoid potential harm and unfair bias against vulnerable populations. Around a hundred governmental bodies and commissions, as well as leaders in the tech sector, have proposed principles to create responsible AI systems. However, most of these proposals are short on specifics, which has led to charges of ethics washing. In this tutorial we offer a guide to help navigate complex governmental regulations and explain the various constituent practical elements of a responsible AI system in healthcare in light of proposed regulations. Additionally, we break down and emphasize that the recommendations from regulatory bodies like the FDA or the EU are necessary but not sufficient elements of creating a responsible AI system. We elucidate how regulations and guidelines often focus on epistemic concerns to the detriment of practical concerns, e.g., a requirement for fairness without explicating what fairness means for a given use case. The FDA’s Software as a Medical Device document and the EU’s GDPR, among other AI governance documents, talk about the need for implementing sufficiently good machine learning practices. In this tutorial we elucidate what that would mean from a practical perspective for real-world use cases in healthcare throughout the machine learning cycle, i.e., data management, data specification, feature engineering, model evaluation, model specification, model explainability, model fairness, reproducibility, and checks for data leakage and model leakage. We note that conceptualizing responsible AI as a process rather than an end goal accords well with how AI systems are used in practice. We also discuss how a domain-centric stakeholder perspective translates into balancing requirements for multiple competing optimization criteria.
Book Chapter
Survey of explainable machine learning with visual and granular methods beyond quasi-explanations
Boris Kovalerchuk, Muhammad Aurangzeb Ahmad, and Ankur Teredesai
Interpretable artificial intelligence: A perspective of granular computing, 2021
This chapter surveys and analyses visual methods of explainability of Machine Learning (ML) approaches, with a focus on moving from the quasi-explanations that dominate in ML to actual domain-specific explanations supported by granular visuals. The importance of visual and granular methods for increasing the interpretability and validity of ML models has grown in recent years. Visuals have an appeal to human perception, which other methods do not. ML interpretation is fundamentally a human activity, not a machine activity. Thus, visual methods are more readily interpretable. Visual granularity is a natural way to achieve efficient ML explanation. Understanding complex causal reasoning can be beyond human abilities without “downgrading” it to human perceptual and cognitive limits. The visual exploration of multidimensional data at different levels of granularity for knowledge discovery is a long-standing research focus. While multiple efficient methods for visual representation of high-dimensional data exist, the loss of interpretable information, occlusion, and clutter continue to be a challenge, which leads to quasi-explanations. This chapter starts with the motivation and the definitions of different forms of explainability and how these concepts and information granularity can integrate in ML. The chapter focuses on a clear distinction between quasi-explanations and actual domain-specific explanations, as well as between a potentially explainable and an actually explained ML model, distinctions that are critically important for the further progress of the ML explainability domain. We discuss foundations of interpretability, overview visual interpretability, and present several types of methods to visualize ML models. Next, we present methods of visual discovery of ML models, with a focus on interpretable models, based on the recently introduced concept of General Line Coordinates (GLC). This family of methods takes the critical step of creating visual explanations that are not merely quasi-explanations but actual domain-specific visual explanations, while the methods themselves are domain-agnostic. The chapter includes results on theoretical limits to preserving n-D distances in lower dimensions, based on the Johnson-Lindenstrauss lemma, point-to-point and point-to-graph GLC approaches, and real-world case studies. The chapter also covers traditional visual methods for understanding multiple ML models, which include deep learning and time series models. We illustrate that many of these methods are quasi-explanations and need further enhancement to become actual domain-specific explanations. The chapter concludes by outlining open problems and current research frontiers.
2020
FAccT
Fairness, accountability, transparency in AI at scale: Lessons from national programs
Muhammad Aurangzeb Ahmad, Ankur Teredesai, and Carly Eckert
In Proceedings of the 2020 conference on fairness, accountability, and transparency, 2020
The panel aims to elucidate how different national governmental programs are implementing accountability of machine learning systems in healthcare and how accountability is operationalized in different cultural settings in legislation, policy, and deployment. We have representatives from three governments, the UAE, Singapore, and the Maldives, who will discuss what accountability of AI and machine learning means in their contexts and use cases. We hope to have a fruitful conversation around FAT ML as it is operationalized across cultures, national boundaries, and legislative constraints.
KDD
Physics inspired models in artificial intelligence
Muhammad Aurangzeb Ahmad and Şener Özönder
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020
Ideas originating in physics have informed progress in artificial intelligence and machine learning for many decades. However, the pedigree of many such ideas is often neglected in the Computer Science community. The tutorial focuses on current and past ideas from physics that have helped in furthering AI and machine learning. Recent advances in physics-inspired ideas in AI are also explored, especially how insights from physics may hold the promise of opening the black box of deep learning. Lastly, we outline current and future trends in this area and a research agenda for how physics-inspired models can benefit AI and machine learning.
KDD
Fairness in machine learning for healthcare
Muhammad Aurangzeb Ahmad, Arpit Patel, Carly Eckert, and 2 more authors
In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020
The issue of bias and fairness in healthcare has been around for centuries. With the integration of AI in healthcare, the potential to discriminate and perpetuate unfair and biased practices increases manyfold. The tutorial focuses on the challenges, requirements, and opportunities in the area of fairness in healthcare AI and the various nuances associated with it. Healthcare is framed as a multi-faceted, systems-level problem that necessitates careful mapping of different notions of fairness in healthcare to corresponding concepts in machine learning, elucidated via different real-world examples.
2019
IJCAI
The challenge of imputation in explainable artificial intelligence models
Muhammad Aurangzeb Ahmad, Carly Eckert, and Ankur Teredesai
Explainable models in Artificial Intelligence are often employed to ensure transparency and accountability of AI systems. The fidelity of the explanations is dependent upon the algorithms used as well as on the fidelity of the data. Many real-world datasets have missing values that can greatly influence explanation fidelity. The standard way to deal with such scenarios is imputation. This can, however, lead to situations where the imputed values correspond to counterfactual settings. Acting on explanations from AI models with imputed values may lead to unsafe outcomes. In this paper, we explore different settings where AI models with imputation can be problematic and describe ways to address such scenarios.
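A toy example makes the hazard concrete: with a skewed feature, mean and median imputation fill in different values, and any attribution computed on the imputed value shifts accordingly. The linear attribution below is a deliberate simplification for illustration, not one of the explanation methods examined in the paper.

```python
# Minimal sketch: the same missing value, imputed two ways, yields two
# different "explanations". Linear attribution is an illustrative stand-in.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
X[:, 0] = rng.exponential(size=300)      # skewed, so mean != median
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)
model = LogisticRegression().fit(X, y)

x_missing = np.array([[np.nan, 2.0, -1.0]])  # feature 0 unobserved
for strategy in ("mean", "median"):
    x_hat = SimpleImputer(strategy=strategy).fit(X).transform(x_missing)
    # Attribution of feature 0 = coefficient * imputed value: an explanation
    # built on a value the patient never actually had.
    attr = model.coef_[0][0] * x_hat[0][0]
    print(f"{strategy:>6}: imputed={x_hat[0][0]:.3f}, attribution={attr:.3f}")
```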
ACI
Development and prospective validation of a machine learning-based risk of readmission model in a large military hospital
Carly Eckert, Neris Nieves-Robbins, Elena Spieker, and 8 more authors
Thirty-day hospital readmissions are a quality metric for health care systems. Predictive models aim to identify patients likely to readmit to more effectively target preventive strategies. Many risk of readmission models have been developed on retrospective data, but prospective validation of readmission models is rare. To the best of our knowledge, none of these developed models have been evaluated or prospectively validated in a military hospital. The objectives of this study are to demonstrate the development and prospective validation of machine learning (ML) risk of readmission models to be utilized by clinical staff at a military medical facility and demonstrate the collaboration between the U.S. Department of Defense’s integrated health care system and a private company. We evaluated multiple ML algorithms to develop a predictive model for 30-day readmissions using data from a retrospective cohort of all-cause inpatient readmissions at Madigan Army Medical Center (MAMC).
2018
ACM-BCB
Interpretable machine learning in healthcare
Muhammad Aurangzeb Ahmad, Carly Eckert, and Ankur Teredesai
Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics, 2018
This tutorial extensively covers the definitions, nuances, challenges, and requirements for the design of interpretable and explainable machine learning models and systems in healthcare. We discuss many uses in which interpretable machine learning models are needed in healthcare and how they should be deployed. Additionally, we explore the landscape of recent advances to address the challenges of model interpretability in healthcare and also describe how one would go about choosing the right interpretable machine learning algorithm for a given problem in healthcare.
Thorax
S45 predicting likelihood of emergency department admission prior to triage: utilising machine learning within a COPD cohort
C Eckert, M Ahmad, K Zolfaghar, and 3 more authors
Acute exacerbation of COPD is one of the commonest reasons for emergency department (ED) attendance and admission. Optimising ED patient flow requires early identification of patients needing inpatient care to initiate the admissions process, improve bed management, and reduce ED length of stay. Stratifying COPD patients by likelihood of admission and length of stay at triage would potentially facilitate these and other operational efficiencies, such as targeting early supported discharge team review and the COPD care bundle. We propose a machine learning (ML) based approach to predicting the need for admission from the ED amongst a cohort of COPD patients.
IAAI
Death vs. data science: predicting end of life
Muhammad Ahmad, Carly Eckert, Greg McKelvey, and 3 more authors
In Proceedings of the AAAI Conference on Artificial Intelligence, 2018
Death is an inevitable part of life and while it cannot be delayed indefinitely it is possible to predict with some certainty when the health of a person is going to deteriorate. In this paper, we predict risk of mortality for patients from two large hospital systems in the Pacific Northwest. Using medical claims and electronic medical records (EMR) data we greatly improve prediction for risk of mortality and explore machine learning models with explanations for end of life predictions. The insights that are derived from the predictions can then be used to improve the quality of patient care towards the end of life.
SIAM
Automatic detection of excess healthcare spending and cost variation in ACOs
Eric Liu, Muhammad A Ahmad, Carly Eckert, and 5 more authors
There are more than nine hundred Accountable Care Organizations (ACOs) in the United States, both in the public and private sector, serving millions of patients across the country in a process to transition from fee-for-service to a value-based-care model for healthcare delivery in an effort to contain expenditures. Identifying fraud, waste, and abuse resulting in superfluous expenditures associated with care delivery is central to the success of ACOs and for making the cost of healthcare sustainable. In theory, such expenditures should be easily identifiable with large amounts of historical data. However, to the best of our knowledge there is no data mining framework that systematically addresses the problem of identifying unwarranted variation in expenditures on high-dimensional claims data using unsupervised machine learning techniques. In this paper we propose methods to uncover unwarranted variation in healthcare spending by automatically extracting reference groups of peer providers from the data and then detecting high-cost outliers within these groups. We demonstrate the utility of our proposed framework on datasets from a large ACO in the United States to successfully identify unwarranted variation in therapeutic procedures even in low-cost claims that had previously gone unnoticed.
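The two-stage idea, deriving peer groups from claim-mix profiles and then flagging high-cost outliers within each group, can be illustrated schematically. The sketch below uses synthetic data and a simple threshold rule; it is not the paper's framework.

```python
# Minimal sketch: peer-group providers by procedure mix, then flag high-cost
# outliers within each group. Synthetic data; illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
claim_mix = rng.dirichlet(np.ones(5), size=200)     # procedure mix per provider
cost = rng.gamma(shape=2.0, scale=500.0, size=200)  # avg cost per episode

peers = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(claim_mix)
for g in np.unique(peers):
    in_group = cost[peers == g]
    cutoff = in_group.mean() + 2 * in_group.std()   # simple z-style threshold
    n_outliers = int((in_group > cutoff).sum())
    print(f"peer group {g}: {len(in_group)} providers, {n_outliers} high-cost outliers")
```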
2016
CHI
After death: big data and the promise of resurrection by proxy
Muhammad Aurangzeb Ahmad
In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 2016
With the advent of Big Data and the possibility of capturing massive personal data it is possible to simulate some aspects of a person’s personality. The imitation game is based on the observation that it is possible to convince a person of a fake identity if sufficient information is available about the identity being faked. Imitation, however, is not limited to the living; we address the question of simulating a deceased person so that the simulation can interact with a living person. Various challenges and background considerations for such an endeavor are discussed. The goal of the paper is to open up discussion on this subject and examine its feasibility.
2014
Book
Predicting real world behaviors from virtual world data
Muhammad Aurangzeb Ahmad, Cuihua Shen, Jaideep Srivastava, and 1 more author
There is a growing body of literature that focuses on the similarities and differences between how people behave in the offline world and how they behave in virtual environments. Data mining has aided in discovering interesting insights with respect to how people behave in these virtual environments. The book addresses prediction, mining, and analysis of offline characteristics and behaviors from online data and vice versa. Each chapter focuses on a different aspect of virtual-world-to-real-world prediction, e.g., demographics, personality, location, etc.
WSDM
Behavioral data mining and network analysis in massive online games
Muhammad Aurangzeb Ahmad and Jaideep Srivastava
In Proceedings of the 7th ACM international conference on web search and data mining, 2014
The last decade has been characterized by an explosion of social media in a variety of forms. Since the data is captured in digital form, it has become possible for the first time to study human behavior at a massive scale. Not only is it possible to address traditional questions in the social sciences regarding the collective dynamics of human behavior, but it is also possible to study new types of human behaviors which have arisen from the usage of new mediums like Twitter, YouTube, Facebook, online games, etc. Each of these mediums has its respective limitations and affordances. Of all these mediums, the most complex and data-rich is that of Massive Online Games (MOGs). MOGs refer to massive online persistent environments (World of Warcraft, EVE Online, EverQuest, etc.) shared by millions of people. In general these environments are characterized by a rich array of activities and social interactions with a wide array of behaviors, e.g., cooperation, trade, quests, deceit, mentoring, etc. Such environments allow one to study human behavior at a level of granularity that was not possible previously. Given the challenges associated with analyzing this type of data, traditional techniques in data mining and social network analysis have to be extended with insights from the social sciences. The tutorial will cover predictive and generative models in the study of MOGs. Additionally we will cover some SNA techniques which are more appropriate for MOGs given the multi-dimensionality of the data (P*/ERGM models, IR-based network analysis, hypergraph-based techniques, coextensive social networks, etc.). We also describe the various ways in which MOGs exhibit similarities to the real world, e.g., economic behaviors, clandestine behaviors, mentoring, etc.
Encyclopedia
Dark Sides of Social Networking
Brian C Keegan and Muhammad Aurangzeb Ahmad
In Encyclopedia of Social Network Analysis and Mining, 2014
Availability of massive amounts of data about the social and behavioral characteristics of a large subset of the population opens up new possibilities that allow researchers to not only observe people’s behaviors in a natural, rather than artificial, environment but also conduct predictive modeling of those behaviors and characteristics. Thus an emerging area of study is the prediction of real world characteristics and behaviors of people in the offline or “real” world based on their behaviors in the online virtual worlds. We explore the challenges and opportunities in the emerging field of prediction of real world characteristics based on people’s virtual world characteristics, i.e., what are the major paradigms in this field, what are the limitations in current predictive models, limitations in terms of generalizability, etc. Lastly, we also address the future challenges and avenues of research in this area.
2013
Patent
Systems and methods for programming implantable medical devices
Embodiments of the invention are directed to systems and methods for programming implantable medical devices, amongst other things. In an embodiment, the invention includes a method of programming an implantable medical device. The method can include gathering parameter data representing a set of previously programmed parameter values from a plurality of implanted medical devices. The method can further include performing association analysis on the parameter data to form a set of association rules. The method can further include suggesting parameter choices to a system user regarding a specific patient based on the set of association rules. In an embodiment, the invention can include a medical system including a server configured to perform association analysis on a set of data representing previously programmed parameter values from a plurality of implanted medical devices to derive a set of association rules. Other embodiments are also included herein.
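As a rough illustration of the claimed pipeline, mining association rules from previously programmed parameters and surfacing high-confidence rules as suggestions, one might write something like the sketch below. The parameter names are hypothetical, and mlxtend merely stands in for whatever rule miner an implementation would use.

```python
# Minimal sketch: association analysis over programmed device parameters.
# Hypothetical parameter names; mlxtend is an assumed stand-in library.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per programmed device; True = that parameter/setting was used.
records = pd.DataFrame(
    [
        {"rate_60bpm": True, "mode_DDD": True, "output_high": False},
        {"rate_60bpm": True, "mode_DDD": True, "output_high": True},
        {"rate_60bpm": False, "mode_VVI": True, "output_high": False},
        {"rate_60bpm": True, "mode_DDD": True, "output_high": False},
    ]
).fillna(False).astype(bool)

frequent = apriori(records, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
# Each rule (e.g., rate_60bpm -> mode_DDD) becomes a suggested parameter choice.
print(rules[["antecedents", "consequents", "confidence"]])
```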
Journal
Robust features of trust in social networks
Zoheb Hassan Borbora, Muhammad Aurangzeb Ahmad, Jehwan Oh, and 3 more authors
We identify robust features of trust in social networks; these are features which are discriminating yet uncorrelated and can potentially be used to predict trust formation between agents in other social networks. The features we investigate are based on an agent’s individual properties as well as those based on the agent’s location within the network. In addition, we analyze features which take into account the agent’s participation in other social interactions within the same network. Three datasets were used in our study: Sony Online Entertainment’s EverQuest II game dataset, a large email network with sentiments, and the publicly available Epinions dataset. The first dataset captures activities from a complex persistent game environment characterized by several types of in-game social interactions, whereas the second dataset has anonymized information about people’s email and instant messaging communication. We formulate the problem as one of link prediction, intra-network and inter-network, in social networks. We first build machine learning models and then perform an ablation study to identify robust features of trust. Results indicate that shared skills and interests between two agents, their level of activity, and their level of expertise are the top three predictors of trust in a social network. Furthermore, if only network topology information were available, then an agent’s propensity to connect or communicate, the cosine similarity between two agents, and the shortest distance between them are the top three predictors of trust. In our study, we have identified the generic characteristics of the networks used as well as the features investigated so that they can be used as guidelines for studying the problem of predicting trust formation in other social networks.
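The ablation methodology, retraining with one feature removed and measuring the performance drop, is straightforward to illustrate. The sketch below uses synthetic data and hypothetical feature names that echo the paper's findings; it is not the study's actual pipeline.

```python
# Minimal sketch: ablation study to rank trust-predictive features.
# Synthetic data and hypothetical feature names; illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
features = ["shared_skills", "activity_level", "expertise", "cosine_sim"]
X = rng.normal(size=(600, len(features)))
y = (X[:, 0] + 0.7 * X[:, 2] + rng.normal(scale=0.5, size=600) > 0).astype(int)

base = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=3).mean()
for i, name in enumerate(features):
    X_abl = np.delete(X, i, axis=1)  # drop one feature at a time
    acc = cross_val_score(RandomForestClassifier(random_state=0), X_abl, y, cv=3).mean()
    print(f"without {name:>14}: accuracy drop = {base - acc:+.3f}")
```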
ASONAM
Guilt by association? Network based propagation approaches for gold farmer detection
Muhammad Aurangzeb Ahmad, Brian Keegan, Atanu Roy, and 3 more authors
In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2013
The term ’Gold Farmer’ refers to a class of players in massive online games (MOGs) involved in a set of interrelated activities which are considered deviant. Consequently these gold farmers are actively banned by game administrators. The task of gold farmer detection is to identify gold farmers in a population of players, but just like other clandestine actors they are not labeled as such. In this paper the problem of extending the label of gold farmers to players who are not labeled as such is considered. Two main classes of techniques are described and evaluated: network-based approaches and similarity-based approaches. It is also explored how dividing the problem further by relabeling the data based on behavioral patterns can further improve the results.
The literature on social networks has explored a large number of classes of human networks, e.g., citation networks, online social networks, co-authorship networks, event networks, co-location networks, genealogy networks, etc. The coverage of networks in the literature is, however, not uniform; one such neglected area is narrative networks. These networks constitute a class of networks formed by chains of narration from one person to another; such networks can even span generations. One reason the field has been neglected is the lack of datasets for analysis. In this paper an outline for the network science of narrative networks is given, and a historical narrative network is constructed from a 9th-century Middle Eastern source book. This network is used for network analysis and to illustrate the major problems in the field; lastly, a set of research questions is proposed for researchers to address.
Patent
Automatic detection of deviant players in massively multiplayer online role playing games (mmogs)
Dmitri Williams, Muhammad Aurangzeb Ahmad, Jaideep Srivastava, and 2 more authors
Gold farming refers to the illicit practice of gathering and selling virtual goods in online games for real money. Although around one million gold farmers engage in gold farming related activities, to date a systematic study of identifying gold farmers has not been done. Here data is used from the Massively Multiplayer Online Role Playing Game (MMOG) EverQuest II to identify gold farmers. This is posed as a binary classification problem and a set of features is identified for classification purposes. Given the cost associated with investigating gold farmers, criteria are also given for evaluating gold farming detection techniques, and suggestions provided for future testing and evaluation techniques.
Journal
Trust, distrust and lack of confidence of users in online social media-sharing communities
With the proliferation of online communities, the deployment of knowledge, skills, experiences, and user-generated content is generally facilitated among participant users. In online social media-sharing communities, the success of social interactions for content sharing and dissemination among completely unknown users depends on ’trust’. Therefore, providing a satisfactory trust model to evaluate the quality of content and to recommend personalized trustworthy content providers is vital for a successful online social media-sharing community. Current research on trust prediction strongly relies on a web of trust, which is directly collected from users. However, the web of trust is not always available in online communities and, even when it is available, it is often too sparse to accurately predict the trust value between two unacquainted people. Moreover, most of the extant trust research studies have not paid attention to the importance of distrust, even though distrust is a distinct concept from trust with different impacts on behavior. In this paper, we adopt the concepts of ’trust’, ’distrust’, and ’lack of confidence’ in social relationships and propose a novel unifying framework to predict trust and distrust as well as to distinguish the confidently-made decisions (trust or distrust) from lack of confidence without a web of trust. This approach uses interaction histories among users, including rating data that is available and much denser than explicit trust/distrust statements (i.e. a web of trust).
2012
SIGSPATIAL
Streaming driving behavior data
Anas Basalamah, Muhammad Aurangzeb Ahmad, Mohamed Elidrisi, and 2 more authors
In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on GeoStreaming, 2012
People’s driving behavior patterns have significant effects on modern transportation systems. Public safety, traffic congestion, and driving convenience are all affected by drivers’ behaviors on the road. With recent developments in data communications, streaming, and mining technologies, it has become possible to develop driving behavior monitoring systems that can stream driving behavioral patterns and discover and predict traffic violations. In this paper, we describe our vision for the realization of such technologies from the viewpoints of data streaming and analysis.
Social Computing
The ones that got away: False negative estimation based approaches for gold farmer detection
Atanu Roy, Muhammad Aurangzeb Ahmad, Chandrima Sarkar, and 2 more authors
In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, 2012
The problem of gold farmer detection is the problem of detecting players with illicit behaviors in massively multiplayer online games (MMOs) and has been studied extensively. Detecting gold farmers or other deviant actors in social systems is traditionally understood as a binary classification problem, but the issue of false negatives is significant for administrators, as residual actors can serve as the backbone for subsequent clandestine organizing. In this paper we address this gap in the literature by addressing the problem of false negative estimation for gold farmers in MMOs, employing the capture-recapture technique for false negative estimation and combining it with graph clustering techniques to determine "hidden" gold farmers in social networks of farmers and normal players. This paper redefines the problem of gold farming as a false negative estimation problem and estimates the number of gold farmers in co-extensive MMO networks previously undetected by the game administrators. It also identifies these undetected gold farmers using graph partitioning techniques and applies network data to address the rare-class classification problem. The experiments in this research found 53% of the gold farmers who were previously undetected by the game administrators.
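The capture-recapture idea transfers directly from ecology: treat two independent detection passes as two captures, and the Lincoln-Petersen estimator gives the total population, hence the false negatives both passes missed. Below is a minimal sketch with illustrative numbers, not the paper's actual estimation procedure.

```python
# Minimal sketch: Lincoln-Petersen capture-recapture estimate of undetected
# gold farmers from two independent detection passes. Illustrative numbers.
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Estimate total population from two capture sizes and their overlap."""
    if overlap == 0:
        raise ValueError("need at least one recaptured individual")
    return n1 * n2 / overlap

n1, n2, overlap = 120, 90, 40   # flagged by pass 1, pass 2, and both
n_total = lincoln_petersen(n1, n2, overlap)
n_known = n1 + n2 - overlap     # distinct farmers caught by either pass
print(f"estimated farmers: {n_total:.0f}, still undetected: {n_total - n_known:.0f}")
```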
Trust is a ubiquitous phenomenon in human societies. Computational trust refers to the mediation of trust via a computational infrastructure. It has been studied in a variety of contexts, e.g., peer-to-peer systems, multi-agent systems, recommendation systems, etc. While this is an active area of research, the types of questions that have been explored in this field have been limited, mainly because of limitations in the types of datasets available to researchers. In this thesis, questions related to trust in complex social environments represented by Massively Multiplayer Online Games (MMOGs) are explored. The main emphasis is that trust is a multi-level phenomenon, both in terms of how it operates at multiple levels of network granularity and how trust relates to other social phenomena like homophily, expertise, mentoring, clandestine behaviors, etc. Social contexts and social environments affect not just the qualitative aspects of trust; the phenomenon is also manifested in the network and structural signatures of trust networks. Additionally, trust is explored in the context of predictive tasks: previously described prediction tasks like link prediction are studied within the link prediction family of problems (link formation, link breakage, change in links, etc.). We also define and explore new trust-related prediction problems, i.e., trust propensity prediction, trust prediction across networks (which can be generalized to the inter-network link prediction problem), and success prediction based on using network measures of a person’s social capital as a proxy.