2024, 16(4): 7-13. doi: 10.16670/j.cnki.cn11-5823/tu.2024.04.02
Research on Multimodal Retrieval Methods for Historical Buildings
1. School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2. National Digital Construction Technology Innovation Center, Wuhan 430074, China
Citation: Jiameng Yuan, Lang Chen, Weiya Chen, Hanbin Luo. Research on Multimodal Retrieval Methods for Historical Buildings[J]. Journal of Information Technology in Civil Engineering and Architecture, 2024, 16(4): 7-13. doi: 10.16670/j.cnki.cn11-5823/tu.2024.04.02
Abstract: Information retrieval in HBIM (Historic Building Information Modeling) databases faces three main problems: 1) there are no universal rules for judging the similarity between buildings; 2) the historical and cultural information carried by the buildings themselves is not taken into account; 3) most queries are keyword-based, so information not covered by the keywords is hard to retrieve. To address these problems, this paper proposes a multimodal retrieval method for historical buildings: given an input image or a natural-language text query, it retrieves the buildings that match the input features and returns them as a ranked list. For image-based retrieval, the "dino_vit16" model is used to extract image features, and the proposed image-building retrieval method achieves a retrieval accuracy of 90.08%. For text-based retrieval, the CLIP (Contrastive Language-Image Pre-training) model is used to relate images and text; the weights of the image-text similarity and the text similarity were studied, and m = 0.6, n = 0.4 was selected as the best configuration. Experiments show that the proposed text-building retrieval algorithm performs best on queries that contain a specific appearance feature and worst on queries that describe a function and an architectural style. When a query contains four or more mixed features that together describe the basic appearance of a building, the matching buildings are retrieved accurately.
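The weighted combination of the two similarity scores described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that m weights the image-text similarity and n weights the text similarity (the abstract gives the values but not the mapping explicitly), that both similarities are cosine similarities, and that they are combined linearly. The function names and the toy vectors are hypothetical stand-ins for real CLIP embeddings.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_buildings(query_vec, buildings, m=0.6, n=0.4):
    """Rank buildings by a weighted sum of two similarity scores.

    buildings maps a building name to (image_vec, text_vec).
    Assumption: m weights query-vs-image similarity and n weights
    query-vs-text similarity; the paper's exact formula may differ.
    """
    scored = []
    for name, (image_vec, text_vec) in buildings.items():
        score = m * cosine(query_vec, image_vec) + n * cosine(query_vec, text_vec)
        scored.append((name, score))
    # Highest fused score first, matching the ranked-list output in the abstract.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy 2-D vectors (hypothetical, not real model outputs).
toy_buildings = {
    "A": ([1.0, 0.0], [0.0, 1.0]),  # image matches the query, text does not
    "B": ([0.0, 1.0], [1.0, 0.0]),  # text matches the query, image does not
    "C": ([1.0, 0.0], [1.0, 0.0]),  # both match
}
ranking = rank_buildings([1.0, 0.0], toy_buildings)
```

With these toy vectors, building C ranks first, followed by A and then B, because the larger weight m favors a match on the image side over a match on the text side.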