2021, 13(2): 7-14. doi: 10.16670/j.cnki.cn11-5823/tu.2021.02.02
LDA-Based Hazard Troubleshooting Keys Mining and Visualization Analysis in Metro Construction
School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Citation: Xing Pan, Botao Zhong, Yongjian Hei, Hanbin Luo. LDA-Based Hazard Troubleshooting Keys Mining and Visualization Analysis in Metro Construction[J]. Journal of Information Technology in Civil Engineering and Architecture, 2021, 13(2): 7-14. doi: 10.16670/j.cnki.cn11-5823/tu.2021.02.02
Abstract: With the rapid construction of metro lines and the establishment of hazard troubleshooting systems, a large number of hazard troubleshooting records have accumulated in these systems. The records are voluminous and cluttered, and analyzing them relies heavily on guidelines and expert experience, which requires significant labor. To improve the efficiency of hazard troubleshooting and the quality of safety management decisions, and to move the troubleshooting process toward full automation, this paper proposes a framework that combines text mining and visualization techniques to analyze hazard troubleshooting text automatically. The framework comprises four steps. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm provides an overall summary of the keywords in the hazard descriptions. Second, keywords with high TF-IDF values are selected, and a Latent Dirichlet Allocation (LDA) model with Gibbs sampling identifies the thematic information and key troubleshooting points hidden in the large-scale corpus of hazard descriptions. Third, Word Cloud (WC) technology is combined with the time dimension to visualize the hazard descriptions as a series of word clouds that show how hazards evolve. Fourth, a Word Co-occurrence Network (WCN) model is used to mine co-occurrence relationships among hazards, such as those between hazard categories and sites. The framework was applied to and validated on the construction safety hazard troubleshooting records of the Wuhan metro from 2016 to 2018. The results show that it effectively extracts the key troubleshooting points and visual information for 34 categories of hazards.
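To make the first and fourth steps of the framework concrete, the following is a minimal sketch in plain Python of TF-IDF keyword weighting and word co-occurrence counting. It is not the authors' implementation: the sample records, the function names `tf_idf` and `cooccurrence`, and the choice of the `tf * ln(N / df)` weighting variant are all assumptions made for illustration, and real hazard descriptions would first need Chinese word segmentation.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized hazard descriptions (not data from the paper).
records = [
    ["foundation", "pit", "guardrail", "missing"],
    ["scaffold", "guardrail", "missing"],
    ["foundation", "pit", "water", "accumulation"],
    ["scaffold", "plank", "loose"],
]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def cooccurrence(docs):
    """Count how many documents contain each unordered pair of terms."""
    pairs = Counter()
    for doc in docs:
        # Sorting makes each pair a canonical (a, b) key with a < b.
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs


weights = tf_idf(records)
edges = cooccurrence(records)
```

Under these assumptions, terms that appear in few records (such as `plank`) receive higher weights than terms shared across records, and the pair counts in `edges` can serve as edge weights when drawing a co-occurrence network.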