Abstract:
With the rapid construction of metro systems and the establishment of hazard troubleshooting systems, a large number of hazard records generated during construction are now stored in these systems. These records are rich in information, but analyzing them carefully against guidelines and expert experience requires significant labor. To improve the efficiency of the hazard troubleshooting process and the quality of safety management decision-making, this paper presents a novel framework that combines text mining and visualization technologies to analyze hazard records automatically. The framework follows a four-step modelling approach. Firstly, a quantitative overview of the hazard records is obtained by weighting keywords with TF-IDF. Secondly, the thematic information and key points hidden in the large-scale hazard troubleshooting corpus are identified using a Latent Dirichlet Allocation algorithm. Thirdly, a visual overview of the hazard records is generated by rendering the keywords as a Word Cloud. Finally, a Word Co-occurrence Network is produced to determine the interrelations between hazard categories and sites. The framework has been applied to and verified on the hazard troubleshooting records of the Wuhan metro from 2016 to 2018, showing that it can mine 34 categories of hazard troubleshooting key points together with visual information.
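As a rough illustration of the first and last steps, the following minimal Python sketch scores keywords with one common TF-IDF variant and counts word co-occurrences across records. The toy records, the function names `tfidf_scores` and `cooccurrence_edges`, and the specific weighting formula are assumptions made for illustration; they are not taken from the paper or from the Wuhan metro data.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical, pre-tokenized hazard records (illustrative only).
records = [
    ["scaffold", "guardrail", "missing", "station"],
    ["guardrail", "loose", "shaft"],
    ["cable", "exposed", "station"],
    ["scaffold", "guardrail", "station"],
]

def tfidf_scores(docs):
    """Sum each term's per-record TF-IDF weight over all records.

    Assumes tf = count / record length and idf = ln(N / document frequency),
    which is one common variant among several.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores

def cooccurrence_edges(docs):
    """Count how many records contain each unordered pair of terms."""
    edges = Counter()
    for doc in docs:
        # Sorting gives every pair a single canonical (a, b) key.
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges

keyword_scores = tfidf_scores(records)
edges = cooccurrence_edges(records)
```

The pair counts in `edges` can serve as edge weights of a co-occurrence network, while the summed scores in `keyword_scores` give a simple keyword ranking; a real pipeline would first need word segmentation and stop-word removal on the raw records.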