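  • ISSN: 1674-7461
  • CN: 11-5823/TU
  • Supervised by: China Association for Science and Technology
  • Sponsored by: China Graphics Society
  • Hosted by: China Academy of Building Research Co., Ltd.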

LDA-Based Hazard Troubleshooting Keys Mining and Visualization Analysis in Metro Construction

Abstract: With the rapid construction of metro lines and the establishment of hazard troubleshooting systems, a large number of hazard troubleshooting records have accumulated in these systems. However, the records are lengthy and cluttered, and analyzing them relies heavily on guidelines and expert experience, which requires significant labor. To improve the efficiency of hazard troubleshooting and the quality of safety management decisions, and to move troubleshooting toward full automation, this paper presents a framework that combines text mining and visualization technologies to analyze hazard troubleshooting records automatically. The framework consists of four steps. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm gives an overall summary of the keywords in the hazard descriptions. Second, keywords with high TF-IDF values are selected, and a Latent Dirichlet Allocation (LDA) model estimated with Gibbs sampling identifies the latent topics and hazard troubleshooting key points in the large-scale corpus of hazard descriptions. Third, the hazard descriptions are visualized over time with Word Cloud (WC) technology, producing word cloud evolution maps of the hazards. Finally, a Word Co-occurrence Network (WCN) model is used to mine co-occurrence relationships, such as those between hazard categories and sites. The framework has been applied to and validated on the construction safety hazard troubleshooting records of the Wuhan metro from 2016 to 2018. The results show that it effectively mines the troubleshooting key points and visual information corresponding to 34 categories of hazards.
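The abstract names the four techniques but gives no parameters, so the following is only a minimal Python sketch of how the steps could fit together, not the authors' implementation. Assumptions: the records are hypothetical and already segmented into English tokens; high-value keywords are chosen by summing TF-IDF over all records (the abstract does not say how they are selected); LDA is fitted with a plain collapsed Gibbs sampler with assumed hyperparameters `alpha=0.1`, `beta=0.01`; the word-cloud step is reduced to per-year term frequencies, and the co-occurrence network to weighted term pairs within a record. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (year, pre-segmented hazard description).
# Real Chinese records would first need word segmentation and stop-word removal.
RECORDS = [
    (2016, ["guardrail", "missing", "foundation", "pit", "edge"]),
    (2016, ["guardrail", "damaged", "foundation", "pit"]),
    (2017, ["electrical", "cable", "exposed", "distribution", "box"]),
    (2017, ["distribution", "box", "unlocked", "cable"]),
    (2018, ["guardrail", "missing", "platform", "edge"]),
    (2018, ["cable", "exposed", "electrical", "box"]),
]


def tfidf(docs):
    """Step 1: one {term: tf * idf} dict per record, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def top_keywords(docs, k):
    """Assumed selection rule: the k terms with the largest TF-IDF summed over records."""
    score = defaultdict(float)
    for weights in tfidf(docs):
        for term, w in weights.items():
            score[term] += w
    return set(sorted(score, key=lambda t: (-score[t], t))[:k])


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, top_n=3, seed=0):
    """Step 2: LDA fitted by collapsed Gibbs sampling; returns top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per record
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts ...
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ... and resample its topic from the full conditional p(z = j | rest).
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [sorted(vocab, key=lambda w: (-nkw[k][w], w))[:top_n] for k in range(n_topics)]


def yearly_frequencies(records):
    """Step 3: per-year term frequencies, i.e. the sizes a word cloud would draw."""
    freq = defaultdict(Counter)
    for year, doc in records:
        freq[year].update(doc)
    return freq


def cooccurrence(docs):
    """Step 4: weighted edges of a word co-occurrence network (pairs within a record)."""
    edges = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges


docs = [doc for _, doc in RECORDS]
keywords = top_keywords(docs, k=10)
filtered = [[w for w in doc if w in keywords] for doc in docs]
print(lda_gibbs(filtered, n_topics=2))
print(yearly_frequencies(RECORDS)[2016].most_common(2))
print(cooccurrence(filtered).most_common(3))
```

Segmenting Chinese text and rendering the word clouds and the network graph are left out; the functions only produce the counts those steps would consume. Summing TF-IDF rather than taking its maximum keeps frequent terms such as "guardrail" in the keyword set, which matters when the goal is recurring troubleshooting points rather than rare wording.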

     
