• ISSN: 1674-7461
  • CN: 11-5823/TU
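  • Supervised by: China Association for Science and Technology
  • Sponsored by: China Graphics Society
  • Organized by: China Academy of Building Research Co., Ltd.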

2024, 16(4): 7-13. doi: 10.16670/j.cnki.cn11-5823/tu.2024.04.02

Research on Multimodal Retrieval Methods for Historical Buildings

1. School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

2. National Digital Construction Technology Innovation Center, Wuhan 430074, China

Corresponding author: Weiya Chen

Available Online: 2024-08-20

About the author: Jiameng Yuan (b. 1999), female, master's degree; main research interests: computer vision and historical building preservation

Funding: National Natural Science Foundation of China, Grant 72001086

Citation: Jiameng Yuan, Lang Chen, Weiya Chen, Hanbin Luo. Research on Multimodal Retrieval Methods for Historical Buildings[J]. Journal of Information Technology in Civil Engineering and Architecture, 2024, 16(4): 7-13. doi: 10.16670/j.cnki.cn11-5823/tu.2024.04.02

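Abstract: Querying information in an HBIM (Historic Building Information Modeling) database faces three problems: first, there is no universally applicable rule for judging the similarity between buildings; second, the historical and cultural information carried by the buildings themselves is not taken into account; third, queries are mostly keyword-based, which makes it difficult to retrieve information that the keywords do not cover. To address these problems, a multimodal retrieval method for historical buildings is proposed: users input an image or natural-language text and retrieve the buildings that match the input features, returned as a ranked list. For image-based retrieval, the "dino_vit16" model is used to extract image features, and the proposed image-to-building retrieval method achieves a retrieval precision of 90.08%. For text-based retrieval, the CLIP (Contrastive Language-Image Pre-training) model is used to associate images with text; the weights of the image-text similarity and the text similarity were studied, and m = 0.6, n = 0.4 was selected as the best configuration. Experiments show that the proposed text-to-building retrieval algorithm performs best on queries that contain a particular appearance feature and worst on queries that describe a particular function and architectural style. When a query contains four or more mixed features and can describe the basic appearance of a building, the buildings that meet the conditions are retrieved accurately.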
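The weighted ranking mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that m weights the image-text similarity and n the text similarity, that both similarities are cosine similarities, and that each building is stored with one image embedding and one text-description embedding. The names `cosine`, `rank_buildings`, and the toy vectors are hypothetical; in practice the embeddings would come from a model such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_buildings(query_vec, buildings, m=0.6, n=0.4):
    """Rank buildings by m * image-text similarity + n * text similarity.

    `buildings` maps a building name to a pair (image_vec, text_vec).
    Assumption: all vectors share one embedding space with `query_vec`.
    """
    scores = {
        name: m * cosine(query_vec, image_vec) + n * cosine(query_vec, text_vec)
        for name, (image_vec, text_vec) in buildings.items()
    }
    # Highest combined score first, as in a ranked result list.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional embeddings standing in for real model outputs.
buildings = {
    "villa": ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    "church": ([0.0, 1.0, 0.0], [0.1, 0.9, 0.0]),
}
ranking = rank_buildings([1.0, 0.1, 0.0], buildings)
```

With m + n = 1 and cosine similarities in [-1, 1], the combined score stays in the same range, so scores remain comparable across queries; other normalizations are equally plausible.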

Keywords: historical buildings, HBIM, ViT, similarity measurement, multimodal retrieval
