Implementation of Fine-Tuning Using the QLoRA Method in a Hadith Question-Answering System
DOI:
https://doi.org/10.52060/juptik.v4i1.4349
Abstract
Hadith is the second source of law in Islamic teaching, but applying Large Language Models (LLMs) to this domain is hampered by factual hallucination and citation misattribution. This study develops a hadith question-answering system based on Qwen 2.5 7B Instruct in three stages. First, Supervised Fine-Tuning (SFT) with Quantized Low-Rank Adaptation (QLoRA) is performed on 988 instruction-response pairs, selected from 1,730 raw examples using Instruction Following Difficulty (IFD) scoring, a perplexity-ratio measure, and keeping scores within the P20–P80 range. Second, Direct Preference Optimization (DPO) Iteration 1, which used an off-policy strategy, caused a regression in model behavior because of a mismatch in data distributions. Third, DPO Iteration 2 uses a fully on-policy strategy in which both the chosen (T=0.1) and rejected (T=0.9) responses are generated by the same SFT model, yielding 509 valid pairs. The Hybrid Retrieval-Augmented Generation (RAG) component indexes 65,811 hadiths from 11 hadith books in Qdrant Cloud using BGE-M3 and BM25 with Reciprocal Rank Fusion. Evaluation with RAGAS v0.2.6, using GPT-4o as the judge, and with BERTScore based on xlm-roberta-base shows that DPO Iteration 2 achieves higher Faithfulness (0.676 vs 0.633), perfect Context Precision (1.000) for both models, and comparable BERTScore F1 (0.8621 vs 0.8615). These findings confirm that the on-policy DPO strategy yields more stable behavioral alignment for domain-specific language models.
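Two steps named in the abstract, the IFD percentile filter and Reciprocal Rank Fusion, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the IFD scores have already been computed per example, that the P20 and P80 bounds are inclusive and obtained by linear interpolation, and that the fusion constant is k = 60, a commonly used default. The abstract specifies none of these details, and the function names `percentile`, `select_by_ifd`, and `reciprocal_rank_fusion` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile of a sorted, non-empty sequence."""
    position = (len(sorted_values) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def select_by_ifd(
    ifd_scores: Dict[Hashable, float], low: float = 20.0, high: float = 80.0
) -> List[Hashable]:
    """Keep example ids whose IFD score lies between the low and high percentiles.

    Assumption: bounds are inclusive; the source only states the P20-P80 range.
    """
    ordered = sorted(ifd_scores.values())
    low_cut = percentile(ordered, low)
    high_cut = percentile(ordered, high)
    return [key for key, score in ifd_scores.items() if low_cut <= score <= high_cut]


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked id lists: score(d) = sum over lists of 1 / (k + rank).

    Assumption: k = 60; the source does not report the constant it used.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (stable sort).
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy IFD scores for ten hypothetical examples.
    toy_scores = {i: float(i) for i in range(1, 11)}
    print(select_by_ifd(toy_scores))

    # Toy dense (e.g. embedding) and sparse (e.g. BM25) rankings of hadith ids.
    dense = ["a", "b", "c"]
    sparse = ["b", "c", "d"]
    print(reciprocal_rank_fusion([dense, sparse]))
```

In this sketch, documents that appear in both the dense and the sparse ranking accumulate two reciprocal terms, so they tend to rise above documents returned by only one retriever. That is the usual motivation for combining an embedding model with BM25.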
Keywords: DPO, Hybrid RAG, QLoRA, BERTScore, RAGAS
Published: 2026-06-01
Copyright (c) 2026 Jurnal Pengembangan Teknologi Informasi dan Komunikasi (JUPTIK)

This work is licensed under a Creative Commons Attribution 4.0 International License.
