Implementation of a Hybrid RAG Architecture in a Mental Health Screening Chatbot

M Dhimas Hadid Fachrezy; Nazruddin Safaat; Muhammad Irsyad; Febi Yanto

doi:10.52060/juptik.v4i1.4342

Authors

M Dhimas Hadid Fachrezy (Universitas Islam Negeri Sultan Syarif Kasim Riau)

Nazruddin Safaat (Universitas Islam Negeri Sultan Syarif Kasim Riau)

Muhammad Irsyad (Universitas Islam Negeri Sultan Syarif Kasim Riau)

Febi Yanto (Universitas Islam Negeri Sultan Syarif Kasim Riau)

DOI:

https://doi.org/10.52060/juptik.v4i1.4342

Abstract

Worker mental health is a pressing global challenge. In Indonesia, only 57% of people with mental disorders receive mental health services, well below the national target of 90%. Conventional Retrieval-Augmented Generation (RAG) systems are not designed to handle psychometric assessment data and clinical literature retrieval at the same time without increasing the risk of hallucination. This study proposes a Hybrid RAG architecture with a layered Query Router that integrates deterministic psychometric interpretation with semantically relevant retrieval of clinical knowledge. A mental health screening chatbot was developed on Telegram using a dual-path architecture that separates relational processing through PostgreSQL from semantic retrieval through Qdrant, BM25, and Distribution-Based Score Fusion (DBSF). The system is governed by a four-layer Query Router and a Routing-Aware Quality Gate with a Fail-Closed policy. Five psychometric instruments were implemented: WHO-5, GAD-7, M-TBI, K10, and NAQ-R. Evaluation used LaBSE BERTScore, E5 Cosine Similarity, and RAGAS LLM-as-a-Judge. The results show that the Query Router achieved 100% classification accuracy on 100 test queries. The Hybrid RAG configuration with DBSF obtained the highest LaBSE F1 score, 0.7039, and an E5 Cosine Similarity of 0.9234. In addition, a Faithfulness score of 0.9350 indicates that most generated claims are supported by the clinical documents the system retrieved. The Answer Correctness score of 0.7447 was affected by the limitations of cross-lingual evaluation, not by a weakness in the proposed architecture.
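
The fusion step named in the abstract can be illustrated with a minimal sketch. It assumes DBSF as described in Qdrant's documentation: each retriever's scores are normalized using the mean plus or minus three standard deviations as limits, and the normalized scores of the same document are then summed. The function names, the clamping to [0, 1], the neutral value for identical scores, and the sample scores are illustrative assumptions. The abstract does not state whether the authors used Qdrant's built-in fusion or their own implementation.

```python
from statistics import mean, pstdev


def dbsf_normalize(scores):
    """Scale one retriever's scores using mean +/- 3 standard deviations as limits.

    `scores` maps a document id to a raw score (hypothetical input shape).
    """
    if not scores:
        return {}
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Assumption: identical scores carry no ranking signal, so use a neutral value.
        return {doc_id: 0.5 for doc_id in scores}
    low, high = mu - 3 * sigma, mu + 3 * sigma
    # Assumption: clamp to [0, 1] so outliers cannot dominate the sum.
    return {
        doc_id: min(1.0, max(0.0, (s - low) / (high - low)))
        for doc_id, s in scores.items()
    }


def dbsf_fuse(*result_lists):
    """Sum normalized scores per document across retrievers, best first."""
    fused = {}
    for scores in result_lists:
        for doc_id, s in dbsf_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: cosine similarities from a dense retriever, BM25 scores from a sparse one.
dense = {"a": 0.9, "b": 0.8, "c": 0.1}
sparse = {"b": 12.0, "c": 7.0, "d": 3.0}
ranked = dbsf_fuse(dense, sparse)
```

In this toy input, document `b` ranks first because it scores well in both lists, even though the raw dense and BM25 scores are on very different scales. Normalizing per retriever before summing is what makes the two score distributions comparable.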

Keywords	:	Chatbot, Hybrid RAG, Query Router, Mental Health Screening, Telegram
Galleys	:	PDF (English)
Published	:	2026-06-01
Issue	:	Vol 4 No 1 (2026): JURNAL PENGEMBANGAN TEKNOLOGI INFORMASI DAN KOMUNIKASI (JUPTIK), Articles section
