Analyzing Medium and Long Text Indonesian Tourism Feedback Using Topic Modeling and Sentiment Analysis

Sulisetyo Puji Widodo; Isnaeni Noviyanti

doi:10.34123/jurnalasks.v18i1.895

Authors

Sulisetyo Puji Widodo BPS-Statistics Indonesia, Jakarta, Indonesia
Isnaeni Noviyanti Universitas Indonesia, Depok, Indonesia

DOI:

https://doi.org/10.34123/jurnalasks.v18i1.895

Keywords:

Feedback, Indonesian Tourism, Natural Language Processing, Sentiment Analysis, Topic Modeling

Abstract

Introduction/Main Objectives: Tourism is a vital sector supporting Indonesia’s economic growth, which makes the effective use of public feedback essential for improving service quality. Most feedback is collected through web-based forms as open-text responses. These responses provide rich insights but remain underutilized because they are unstructured. Background Problems: This study addresses the challenge of identifying the most suitable topic modeling and sentiment analysis techniques for medium- and long-text feedback in the Indonesian tourism context. Novelty: The novelty lies in a comparative evaluation of classical topic modeling algorithms against modern embedding-based approaches, combined with multiple Indonesian transformer models. This combination has not been extensively explored on tourism-related datasets. Research Methods: The study compares LDA and NMF with BERTopic, Top2Vec, kBERT, and kUSE using coherence scores, and evaluates sentiment analysis using majority voting across transformer architectures. Finding/Results: The results show that BERTopic performed best on medium-length text, NMF performed best on long text, and a RoBERTa-based model achieved the highest sentiment agreement. Positive sentiment appeared frequently in feedback on facilities and fees, whereas negative sentiment dominated topics on environmental and governance issues. These findings can help tourism managers and policymakers prioritize improvements and refine their strategies.
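The abstract evaluates sentiment by majority voting across several transformer models, but it does not state the exact voting rule or how agreement is scored. The following Python code is a minimal sketch under stated assumptions, not the authors' method:

- It uses plurality voting over per-text labels.
- A tie for the top label is treated as "no consensus", and those texts are skipped.
- Each model's agreement is the share of the remaining texts on which it matches the majority label.

The model names ("model_a", "model_b", "model_c") and the labels in the usage example are hypothetical placeholders. They are not the study's models or data.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Return the most frequent label, or None if the top count is tied (assumed tie rule)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no consensus label for this text
    return counts[0][0]


def agreement_with_majority(predictions: Dict[str, List[str]]) -> Dict[str, float]:
    """Share of texts on which each model's label equals the majority label.

    `predictions` maps a (hypothetical) model name to its per-text labels,
    with every list in the same text order. Texts with a tied vote are skipped.
    """
    if not predictions:
        return {}
    models = list(predictions)
    n_texts = len(predictions[models[0]])
    matches = {m: 0 for m in models}
    decided = 0  # number of texts that produced a clear majority
    for i in range(n_texts):
        majority = majority_vote([predictions[m][i] for m in models])
        if majority is None:
            continue
        decided += 1
        for m in models:
            if predictions[m][i] == majority:
                matches[m] += 1
    return {m: (matches[m] / decided if decided else 0.0) for m in models}


# Hypothetical labels from three placeholder models on four feedback texts.
preds = {
    "model_a": ["positive", "negative", "neutral", "positive"],
    "model_b": ["positive", "negative", "negative", "positive"],
    "model_c": ["negative", "negative", "neutral", "neutral"],
}
print(agreement_with_majority(preds))
# -> {'model_a': 1.0, 'model_b': 0.75, 'model_c': 0.5}
```

Skipping tied texts keeps ambiguous items from shifting any single model's score. A study could instead break ties with a fixed model priority or report tied texts separately.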

References

D. Angelov, “Top2Vec: Distributed representations of topics,” arXiv preprint arXiv:2008.09470, 2020.

L. Hong and B. D. Davison, “Empirical study of topic modeling in Twitter,” in Proc. First Workshop on Social Media Analytics (SOMA ’10), Washington, DC, USA, Jul. 2010, pp. 80–88.

X. Yan, J. Guo, Y. Lan, and X. Cheng, “A biterm topic model for short texts,” in Proc. 22nd Int. World Wide Web Conf. (WWW ’13), Rio de Janeiro, Brazil, May 2013, pp. 1445–1456.

J. Qiang, Z. Qian, Y. Li, Y. Yuan, and X. Wu, “Short text topic modeling techniques, applications, and performance: A survey,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 3, pp. 1427–1445, Mar. 2022.

W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in Advances in Information Retrieval (ECIR 2011), LNCS, vol. 6611. Berlin, Germany: Springer, 2011, pp. 338–349.

Z. Ji, Z. Lu, and H. Li, “An information retrieval approach to short text conversation,” arXiv preprint arXiv:1408.6988, 2014.

J. Yin and J. Wang, “A Dirichlet multinomial mixture model-based approach for short text clustering,” in Proc. 20th ACM Int. Conf. Inf. Knowl. Manag. (CIKM ’11), Glasgow, U.K., Oct. 2011, pp. 2333–2336.

M. Grootendorst, “BERTopic: Neural topic modeling with a class-based TF–IDF procedure,” arXiv preprint arXiv:2203.05794, 2022.

B. Bianchi, G. Lami, and F. Sebastiani, “CombinedTM: Combining topic models for improved short text modeling,” Inf. Process. Manage., vol. 58, no. 2, 2021.

A. B. Dieng, F. J. R. Ruiz, and D. M. Blei, “The embedded topic model,” arXiv preprint arXiv:1707.01417, 2020.

M. Röder, A. Both, and A. Hinneburg, “Exploring the space of topic coherence measures,” in Proc. 8th ACM Int. Conf. Web Search Data Mining (WSDM), Shanghai, China, 2015, pp. 399–408.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL-HLT, Minneapolis, MN, USA, 2019, pp. 4171–4186.

Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

P. He, X. Liu, J. Gao, and W. Chen, “DeBERTa: Decoding-enhanced BERT with disentangled attention,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.

D. Araci, “FinBERT: Financial sentiment analysis with pre-trained language models,” arXiv preprint arXiv:1908.10063, 2019.

M. A. Jahin, M. N. Uddin, and M. A. Hossain, “TRABSA: Transformer and attention-based bidirectional LSTM for sentiment analysis,” Sci. Rep., vol. 14, 2024.

R. Artstein and M. Poesio, “Inter-coder agreement for computational linguistics,” Comput. Linguist., vol. 34, no. 4, pp. 555–596, 2008.

K. Gwet, Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. Gaithersburg, MD: Advanced Analytics, LLC, 2014.

K. Krippendorff, Content Analysis: An Introduction to Its Methodology. Thousand Oaks, CA: SAGE Publications, 2018.

J. M. Serrano-Guerrero, J. A. Olivas, F. P. Romero, and E. Herrera-Viedma, “Sentiment analysis: A review and comparative analysis of web services,” Inf. Sci., vol. 311, pp. 18–38, 2015.
