Named Entity Recognition on A Collection of Research Titles

Authors

  • Siti Mariyah, The Center of Computational Statistics Study, Institute of Statistics

DOI:

https://doi.org/10.34123/jurnalasks.v9i1.95

Keywords:

research titles, named entity recognition, information extraction, contextual features, naïve bayes classifier

Abstract

A title gives the reader a general overview of an article and an initial understanding before reading the full content. In technical research papers, the title states essential information. In this study, we develop information extraction techniques to recognize and extract the problem, method, and domain of research contained in a title. We apply supervised learning to 671 computer science research titles drawn from various online journals and international conference proceedings. We conducted experiments with different schemes to examine the influence of the features and the performance of the algorithms, comparing contextual, syntactic, and bag-of-words feature sets with Naïve Bayes and Maximum Entropy classifiers. The Naïve Bayes classifier trained on the first group of features successfully predicts the category of each token in the title dataset: accuracy and F1-score exceed 0.80 for every class. We attribute this to the first feature group taking into account the location of a token within the sentence, the tokens and POS tags of several tokens before and after it, and the rules that apply to the token. In contrast, the Naïve Bayes classifier trained on the second group of features is better suited to classifying phrase tokens than word tokens.
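The token-level setup described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the author's implementation: the feature names (`loc=`, `prev1_w=`, `next1_tag=`, and so on), the orthographic rules, the label set `PROBLEM`/`METHOD`/`DOMAIN`/`O`, the toy titles, and their hand-assigned Penn Treebank POS tags are all assumptions. The classifier is a plain multinomial Naïve Bayes with add-one smoothing written with the Python standard library; the Maximum Entropy variant is not shown.

```python
import math
from collections import Counter, defaultdict


def token_features(tokens, pos_tags, i, window=1):
    """Return contextual feature strings for tokens[i] (hypothetical feature set)."""
    n = len(tokens)
    # Location of the token within the title.
    loc = "begin" if i == 0 else "end" if i == n - 1 else "middle"
    feats = [f"loc={loc}", f"w={tokens[i].lower()}", f"tag={pos_tags[i]}"]
    # Words and POS tags of up to `window` tokens before and after.
    for off in range(1, window + 1):
        for name, j in (("prev", i - off), ("next", i + off)):
            if 0 <= j < n:
                feats.append(f"{name}{off}_w={tokens[j].lower()}")
                feats.append(f"{name}{off}_tag={pos_tags[j]}")
            else:
                feats.append(f"{name}{off}=<none>")
    # Simple orthographic rules (assumed; the paper's rules may differ).
    if tokens[i][:1].isupper():
        feats.append("rule=capitalized")
    if "-" in tokens[i]:
        feats.append("rule=hyphen")
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over feature strings, with add-one smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.class_counts}
        self.n = len(y)
        return self

    def predict(self, feats):
        V = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            # log P(c) plus log P(f | c) for every feature seen during training.
            score = math.log(count / self.n)
            for f in feats:
                if f in self.vocab:
                    score += math.log((self.feat_counts[c][f] + 1) / (self.totals[c] + V))
            if score > best_score:
                best, best_score = c, score
        return best


def titles_to_xy(titles):
    """Flatten (tokens, tags, labels) triples into per-token examples."""
    X, y = [], []
    for tokens, tags, labels in titles:
        for i in range(len(tokens)):
            X.append(token_features(tokens, tags, i))
            y.append(labels[i])
    return X, y


# Toy, hand-tagged titles (illustrative only, not the paper's dataset).
train = [
    ("Sentiment analysis using Naive Bayes on tweets".split(),
     ["NN", "NN", "VBG", "NNP", "NNP", "IN", "NNS"],
     ["PROBLEM", "PROBLEM", "O", "METHOD", "METHOD", "O", "DOMAIN"]),
    ("Topic detection using LDA on news".split(),
     ["NN", "NN", "VBG", "NNP", "IN", "NN"],
     ["PROBLEM", "PROBLEM", "O", "METHOD", "O", "DOMAIN"]),
]
X, y = titles_to_xy(train)
clf = NaiveBayes().fit(X, y)

test_tokens = "Spam filtering using SVM on emails".split()
test_tags = ["NN", "NN", "VBG", "NNP", "IN", "NNS"]
pred = [clf.predict(token_features(test_tokens, test_tags, i))
        for i in range(len(test_tokens))]
print(list(zip(test_tokens, pred)))
```

On this toy data the unseen title is labeled `['PROBLEM', 'PROBLEM', 'O', 'METHOD', 'O', 'DOMAIN']`: even though "SVM" never appears in training, its location, POS tag, and neighbors ("using" before, "on" after) match the context of the training methods. This is the kind of signal the contextual feature group is meant to capture.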

Published

2017-06-30

How to Cite

Mariyah, S. (2017). Named Entity Recognition on A Collection of Research Titles. Jurnal Aplikasi Statistika & Komputasi Statistik, 9(1), 12. https://doi.org/10.34123/jurnalasks.v9i1.95