Background: Cancer remains one of the leading causes of death worldwide, with nearly 10 million deaths recorded in 2020. As incidence rises, interest is growing in using machine learning (ML) to improve prediction, diagnosis, and treatment. Despite these advances, little attention has been paid to integrating sociodemographic variables, which are key determinants of health equity, into ML models in oncology.
Objective: This review investigates how ML techniques have been used to identify predictive associations between sociodemographic factors and cancer-related outcomes. Specifically, it maps current research by detailing the types of algorithms employed, the sociodemographic variables examined, and the validation methods used.
Methods: We conducted a systematic literature review in accordance with the PRISMA guidelines. We searched seven databases for primary studies that used ML to investigate the relationship between sociodemographic characteristics and cancer-related outcomes. The search strategy was informed by the PICO framework, and predefined inclusion criteria were used to screen the studies. The methodological quality of each included paper was assessed.
Results: Of the 328 records examined, 19 met the inclusion criteria. Most studies used supervised ML techniques, with Random Forest and XGBoost the most common. Frequently analysed variables included age, sex, education level, income, and geographic location. Cross-validation was the predominant method for evaluating model performance. However, the integration of clinical and sociodemographic data was limited, and external validation was infrequent.
Conclusions: ML holds significant potential for discerning patterns associated with the social determinants of cancer. Nevertheless, research in this domain remains fragmented and inconsistent. Future investigations should prioritize integrating contextual factors, improving model transparency, and strengthening external validation. These measures are crucial for developing more equitable, generalizable, and actionable ML applications in cancer care.
This study will be published in https://www.jmir.org
