Research on Multimodal Retrieval Methods for Historical Buildings

Jiameng Yuan; Lang Chen; Weiya Chen; Hanbin Luo

doi:10.16670/j.cnki.cn11-5823/tu.2024.04.02

Citation: Jiameng Yuan, Lang Chen, Weiya Chen, Hanbin Luo. Research on Multimodal Retrieval Methods for Historical Buildings. Journal of Information Technology in Civil Engineering and Architecture, 2024, 16(4): 7-13. doi: 10.16670/j.cnki.cn11-5823/tu.2024.04.02

Jiameng Yuan ^1, Lang Chen ^1, Weiya Chen ^1,2, Hanbin Luo ^1,2

1.	School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2.	National Digital Construction Technology Innovation Center, Wuhan 430074, China

Corresponding author: Weiya Chen (陈维亚)

Web Publishing Date: 2024-08-20

Fund Project: National Natural Science Foundation of China (Grant No. 72001086)

Retrieving historical buildings from an HBIM database faces three main problems: 1) there are no universal rules for determining the similarity between buildings; 2) the historical and cultural information inherent to the buildings is neglected; 3) most queries rely on keywords, which limits the information a query can convey. To address these challenges, this paper introduces a multimodal retrieval approach for historical buildings. Users can retrieve a list of buildings matching their input features, supplied either as images or as natural-language text. For image-based building retrieval, the "dino_vit16" model is used for feature extraction, and the proposed image-building retrieval method achieves a retrieval accuracy of 90.08%. For text-based building retrieval, images and text are connected through the CLIP model. The study explores the weights assigned to image-text similarity and text similarity and selects m=0.6 and n=0.4 as the optimal configuration. Experimental results show that the proposed text-based building retrieval algorithm performs best when the query contains a specific visual feature and worst when the query describes a particular function and architectural style. However, when the query includes four or more mixed features that accurately describe the basic appearance of a building, the algorithm can accurately retrieve buildings that meet the criteria.
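The weighted combination of image-text similarity and text similarity can be illustrated with a short sketch. This is not the authors' implementation. It assumes that m weights the image-text similarity and n weights the text similarity (the abstract does not state the mapping explicitly), that both similarities are cosine similarities between embedding vectors, and that a building's image-text score is the maximum over its images. The function names, the record layout, and the use of one query vector for both comparisons are hypothetical; in practice the vectors would come from models such as CLIP.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_score(image_text_sim, text_sim, m=0.6, n=0.4):
    """Weighted sum of the two similarities (assumed mapping: m -> image-text, n -> text)."""
    return m * image_text_sim + n * text_sim


def rank_buildings(query_vec, buildings, m=0.6, n=0.4):
    """Return (name, score) pairs sorted from best to worst match.

    Each building is a dict with hypothetical keys:
      "name"       - building identifier
      "image_vecs" - list of image embedding vectors
      "text_vec"   - embedding of the building's textual description
    """
    scored = []
    for b in buildings:
        # Assumption: the best-matching image represents the building.
        image_text_sim = max(cosine(query_vec, v) for v in b["image_vecs"])
        text_sim = cosine(query_vec, b["text_vec"])
        scored.append((b["name"], fused_score(image_text_sim, text_sim, m, n)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With m=0.6 and n=0.4 the image-text term dominates, so under these assumptions a building whose images match the query closely can outrank one whose textual description alone matches.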

Key words: Historical Building, Historical Building Information Modeling (HBIM), Vision Transformer (ViT), Similarity Measurement, Multimodal Retrieval
