Search Engine Augmented Chinese Named Entity Recognition for Big Data Content Security
Published in CBDCom 2025, 2025
Recommended citation: Qinghua Mao, Jiatong Li, Kui Meng, Yuanyuan Sun, Pengjiao Wang, Jianhua Li. (2025). Search Engine Augmented Chinese Named Entity Recognition for Big Data Content Security. CBDCom 2025. https://ieeexplore.ieee.org/document/11344245
Named Entity Recognition (NER) is a critical component of real-time public opinion monitoring and threat perception in big data systems. Compared with English, Chinese suffers from more grammatical ambiguities, such as fuzzy word boundaries and polysemous words, so the context of a sentence alone is often insufficient to support Chinese NER. Recent work suggests that retrieving background knowledge can help, yet how to obtain and leverage relevant background knowledge for the NER task remains a challenge. In this paper, we propose a neural approach that uses auxiliary knowledge from search engines for Chinese NER in big data systems. Specifically, we generate a query from the input sentence and use it to retrieve related external texts via a search engine. A multi-channel semantic fusion model then aggregates the original sentence with the retrieved texts. Experiments on four NER datasets, covering both formal and social media language, demonstrate the superiority of our model and confirm the effectiveness of our approach. By improving the accuracy and robustness of Chinese NER, this work provides a practical solution for safeguarding big data content security.
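The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_query`, `retrieve`, `build_channels`), the truncation-based query, the `[SEP]` separator, and the toy search results are all assumptions made for illustration. The search engine is passed in as a plain callable so that no real search API is implied, and the fusion model itself is omitted; the sketch only prepares one input "channel" per retrieved text.

```python
from typing import Callable, List

# Hypothetical search interface: takes a query string, returns result snippets.
SearchFn = Callable[[str], List[str]]


def build_query(sentence: str, max_len: int = 38) -> str:
    """Assumed query generation: remove whitespace and truncate the sentence."""
    return "".join(sentence.split())[:max_len]


def retrieve(sentence: str, search_fn: SearchFn, top_k: int = 3) -> List[str]:
    """Query the injected search function and keep up to top_k unique, non-empty texts."""
    seen = set()
    texts: List[str] = []
    for text in search_fn(build_query(sentence)):
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            texts.append(text)
    return texts[:top_k]


def build_channels(sentence: str, texts: List[str], sep: str = "[SEP]") -> List[str]:
    """Channel 0 is the sentence alone; each further channel pairs it with one retrieved text."""
    return [sentence] + [f"{sentence}{sep}{text}" for text in texts]


def fake_search(query: str) -> List[str]:
    """Toy stand-in for a search engine, returning fixed snippets (illustrative data only)."""
    return [
        "南京长江大桥是长江上的一座桥梁。",
        "",
        "南京长江大桥是长江上的一座桥梁。",
        "南京市是江苏省省会。",
    ]


sentence = "南京市长江大桥今日通车"
channels = build_channels(sentence, retrieve(sentence, fake_search))
for channel in channels:
    print(channel)
```

In this sketch each channel would be encoded separately and the encodings fused by a downstream model, which is where a multi-channel semantic fusion model would sit. Injecting `search_fn` keeps the retrieval source swappable and makes the step easy to test offline.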
