[1] DUAN Yan-hui, LI Xiao-lin, et al. Extraction of administrative division of Chinese address based on conditional random fields[J]. Journal of Wuhan Institute of Technology, 2015, 37(11): 47-51. [doi:10.3969/j.issn.1674-2869.2015.11.010]
Extraction of administrative division of Chinese address based on conditional random fields
DUAN Yan-hui 1,2; LI Xiao-lin 1,2*; HUANG Shuang 1,2
1. Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, China; 2. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
location information parsing; conditional random fields; training corpus
doi:10.3969/j.issn.1674-2869.2015.11.010
To extract administrative division information effectively from non-standard Chinese addresses, a method based on conditional random fields is proposed. Drawing on the way administrative divisions are expressed in Chinese addresses, the method uses a discriminative probabilistic model that models the target sequence conditioned on the known observation sequence. A training corpus is constructed and corresponding feature templates are defined to obtain an expression model of administrative divisions. The model is then applied to a test set, and its output is compared with the annotated test data to verify its performance. Experiments show that the conditional random field model outperforms the maximum entropy model on the overall performance metrics, and that the accuracy of address information parsing reaches 89.93%.
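The abstract does not give the tag set, the feature template, or the exact accuracy definition, so the following is only a minimal sketch under stated assumptions: a character-level BIO labeling with hypothetical level names (`PROV`, `CITY`, `DIST`), a window-based unigram/bigram template of the kind commonly used with linear-chain CRF toolkits, and per-character accuracy against annotated data. The example address and all function names are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple


def bio_tags(address: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, level) spans into one BIO tag per character.

    The level names are hypothetical; the paper's tag set is not given.
    """
    tags = ["O"] * len(address)
    for start, end, level in spans:
        tags[start] = "B-" + level
        for i in range(start + 1, end):
            tags[i] = "I-" + level
    return tags


def char_features(chars: str, i: int, window: int = 2) -> Dict[str, str]:
    """Context features for position i (an assumed template, not the paper's)."""

    def at(j: int) -> str:
        # Pad positions outside the address with boundary markers.
        if j < 0:
            return "<BOS>"
        if j >= len(chars):
            return "<EOS>"
        return chars[j]

    # Unigram features: the characters at offsets -window .. +window.
    feats = {"U[%d]" % off: at(i + off) for off in range(-window, window + 1)}
    # Bigram features: the current character paired with each neighbour.
    feats["B[-1,0]"] = at(i - 1) + "/" + at(i)
    feats["B[0,1]"] = at(i) + "/" + at(i + 1)
    return feats


def tag_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of characters whose predicted tag matches the annotation."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Illustrative address: province, city, district, then two unlabeled characters.
address = "湖北省武汉市洪山区光谷"
gold = bio_tags(address, [(0, 3, "PROV"), (3, 6, "CITY"), (6, 9, "DIST")])
features = [char_features(address, i) for i in range(len(address))]
```

The `features` and `gold` lists are the kind of per-character input a linear-chain CRF trainer would consume. Training and decoding are left out here because the source does not name the toolkit used.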
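Received: 2015-10-13. Funding: National 863 Program (2013AA12A202); Wuhan Institute of Technology Graduate Education Innovation Fund (CX2014090). About the author: DUAN Yan-hui (born 1993), female, from Gong'an, Hubei; master's student. Research interests: data mining and machine learning. * Corresponding author.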
