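Spatial Big Data
Surveys of Spatial Big Data Research and the Evolution of Paradigms
Reviews the background, characteristics, technology stack, evolution of research paradigms, and open challenges of spatial big data, providing a field-wide perspective.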
- The Era of Big Spatial Data: A Survey(Ahmed Eldawy, Mohamed F. Mokbel, 2016, Foundations and Trends in Databases)
- Geospatial Big Data: Challenges and Opportunities(Jae-Gil Lee, Minseo Kang, 2015, Big Data Research)
- 多模态地理大数据时空分析方法(2020)
- 空间位置信息的多源POI数据融合
- 面向多源地理空间数据的知识图谱构建(2020)
- Introduction to Big Data Computing for Geospatial Applications(Zhenlong Li, Wenwu Tang, Qunying Huang, Eric Shook, Q. Guan, 2020, ISPRS International Journal of Geo-Information)
- Considerations on geospatial big data(Z Liu, H Guo, C Wang, 2016, IOP Conference Series: Earth and …)
- Paradigms for spatial and spatio-temporal data mining(H. Miller, Jiawei Han, 2001, Geographic Data Mining and Knowledge Discovery)
- Big Geospatial Data or Geospatial Big Data? A Systematic Narrative Review on the Use of Spatial Data Infrastructures for Big Geospatial Sensing Data in Public Health(K. Koh, A. Hyder, Yogita Karale, M. Boulos, 2022, Remote Sensing)
- The era of Big Spatial Data(Ahmed Eldawy, M. Mokbel, 2015, 2016 IEEE 32nd International Conference on Data Engineering (ICDE))
- Geospatial big data: theory, methods, and applications(Lei Zou, Yongze Song, Guido Cervone, 2024, Annals of GIS)
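Spatial Database Management and Distributed Indexing
Focuses on storage architectures, efficient index construction, and high-performance query processing for large-scale geospatial data in cloud and distributed environments.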
- CQRtree空间数据库索引结构及实现算法(卢炎生, 向祥兵, 潘鹏, 2006, 计算机工程与科学)
- Storing and Indexing Spatial Data in P2P Systems(Verena Kantere, Spiros Skiadopoulos, T. Sellis, 2009, IEEE Transactions on Knowledge and Data Engineering)
- Indexing non-uniform spatial data(K. Kanth, A. El Abbadi, D. Agrawal, Ambuj K. Singh, 1997, Proceedings of the 1997 International Database Engineering and Applications Symposium (Cat. No.97TB100166))
- 空间数据库中连接运算的处理与优化(李立言, 秦小麟, 2003, 中国图象图形学报)
- RISK: Efficiently processing rich spatial-keyword queries on encrypted geo-textual data(Zhen Lv, C. Cao, H. Huo, Jiangtao Cui, Yanguo Peng, Hui Li, Yingfan Liu, 2026, arXiv.org)
- Lightweight Indexing and Querying Services for Big Spatial Data(Kisung Lee, Ling Liu, R. Ganti, M. Srivatsa, Qi Zhang, Yang Zhou, Qingyang Wang, 2019, IEEE Transactions on Services Computing)
- Indexing Issues in Spatial Big Data Management(Shahnawaz Khan, Thirunavukkarasu Kannapiran, 2019, SSRN Electronic Journal)
- DAPR-tree: a distributed spatial data indexing scheme with data access patterns to support Digital Earth initiatives(J. Xia, Sicheng Huang, Shaobiao Zhang, Xiaoming Li, Jianrong Lyu, Wenqun Xiu, Wei Tu, 2020, International Journal of Digital Earth)
- Spatial indexing and analytics on Hadoop(Randall T. Whitman, Michael B. Park, S. Ambrose, E. Hoel, 2014, Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems)
- Spatial Indexing for Data Searching in Mobile Sensing Environments(Yuchao Zhou, Suparna De, Wei Wang, K. Moessner, M. Palaniswami, 2017, Sensors)
- Spatial Indexing with a Scale Dimension(Mike Hörhammer, M. Freeston, 1999, Lecture Notes in Computer Science)
- SpatialHadoop: A MapReduce framework for spatial data(Ahmed Eldawy, M. Mokbel, 2015, 2015 IEEE 31st International Conference on Data Engineering)
- Mining, indexing, and querying historical spatiotemporal data(N. Mamoulis, H. Cao, G. Kollios, Marios Hadjieleftheriou, Yufei Tao, D. Cheung, 2004, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining)
- 遥感大数据分布式技术研究与实现(罗敬宁, 刘立葳, 2017, 航天返回与遥感)
- Geospatial Big Data Platforms: A Comprehensive Review(Yassine Loukili, Y. Lakhrissi, Safae Elhaj Ben Ali, 2022, KN - Journal of Cartography and Geographic Information)
- Indexing spatial data in cloud data managements(Ling-Yin Wei, Ya-Ting Hsu, Wen-Chih Peng, Wang-Chien Lee, 2014, Pervasive and Mobile Computing)
- Optimal Bounds-Only Pruning for Spatial AkNN Joins(Dominik Winecki, 2026, arXiv.org)
- Spatial indexing of distributed multidimensional datasets(Beomseok Nam, Alan Sussman, 2005, CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.)
- Hybrid mist-cloud systems for large scale geospatial big data analytics and processing: opportunities and challenges(Rabindra Kumar Barik, C. Misra, R. K. Lenka, Harishchandra Dubey, K. Mankodiya, 2019, Arabian Journal of Geosciences)
- Retrieving and Indexing Spatial Data in the Cloud Computing Environment(Yonggang Wang, Sheng Wang, D. Zhou, 2009, Lecture Notes in Computer Science)
- SksOpen: Efficient Indexing, Querying, and Visualization of Geo-spatial Big Data(Yun Lu, Mingjin Zhang, S. Witherspoon, Y. Yesha, Y. Yesha, N. Rishe, 2013, 2013 12th International Conference on Machine Learning and Applications)
- 空间索引技术的研究(李萍, 2003, 盐城工学院学报)
- Performance Comparison of Spatial Data Indexing Using Distributed Systems(Hatem Mosa, Amro Saleh, A. Al-badarneh, 2025, 2025 International Conference on New Trends in Computing Sciences (ICTCS))
- 大数据时代的GIS软件技术发展(宋关福, 钟耳顺, 李绍俊, 蔡文文, 王少华, 2018, 测绘地理信息)
- Spatial Data Management in IoT systems: A study of available storage and indexing solutions(Maria Krommyda, Verena Kantere, 2020, 2020 Second International Conference on Transdisciplinary AI (TransAI))
- Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data(Hao Xu, Yuanbin Man, Mingyang Yang, Jichao Wu, Qi Zhang, Jing Wang, 2023, arXiv.org)
- Cloud Computing for Geospatial Big Data Analytics(H Das, RK Barik, H Dubey, DS Roy, 2019, Studies in Big Data)
- Uncertain spatial data handling: Modeling, indexing and query(Rui Li, B. Bhanu, C. Ravishankar, M. Kurth, Jinfeng Ni, 2007, Computers & Geosciences)
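Spatiotemporal Big Data Mining, Modeling, and Intelligent Prediction
Covers pattern discovery, machine learning algorithms, graph neural networks, and dynamical modeling for spatiotemporal trajectories and related data, with the goal of accurately predicting spatiotemporal signals.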
- Chronnet: a network-based model for spatiotemporal data analysis(L. N. Ferreira, D. Vega-Oliveros, M. Cotacallapa, M. Cardoso, M. G. Quiles, Liang Zhao, E. Macau, 2020, arXiv.org)
- Spatiotemporal Data Mining: Issues, Tasks and Applications(K. .. Rao, A. Govardhan, K. Rao, 2012, International Journal of Computer Science & Engineering Survey)
- Spatiotemporal Data Mining: A Computational Perspective(S. Shekhar, Zhe Jiang, Reem Y. Ali, E. Eftelioglu, Xun Tang, Venkata M. V. Gunturi, Xun Zhou, 2015, ISPRS International Journal of Geo-Information)
- Spatiotemporal Data Mining: A Survey(Arun Sharma, Zhe Jiang, S. Shekhar, 2022, arXiv.org)
- Spatiotemporal Data Mining Problems and Methods(Eleftheria Koutsaki, George Vardakis, N. Papadakis, 2023, Analytics)
- Spatiotemporal data mining in the era of big spatial data: algorithms and applications(Ranga Raju Vatsavai, A. Ganguly, V. Chandola, A. Stefanidis, S. Klasky, S. Shekhar, 2012, Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data)
- 论空间数据挖掘和知识发现(Li, D, Wang, S, Shi, W, Wang, X, 2001, 武汉大学学报. 信息科学版 (Geomatics and information science of Wuhan University))
- Deep learning-based urban big data fusion in smart cities: Towards traffic monitoring and flow-preserving fusion(Sulaiman Khan, S. Nazir, I. García-Magariño, Anwar Hussain, 2021, Computers & Electrical Engineering)
- URBAN-SPIN: A street-level bikeability index to inform design implementations in historical city centres(Haining Ding, Chenxi Wang, M. Gath-Morad, 2026, arXiv.org)
- Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs(Abdul Rashid Mussah, Maged Shoman, M. Amo-Boateng, Y. Adu-Gyamfi, 2023, arXiv.org)
- Traffic Flow Data Completion and Anomaly Diagnosis via Sparse and Low-Rank Tensor Optimization(Junxi Man, Yumin Lin, Xiaoyu Li, 2025, arXiv.org)
- Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach(Qianlong Wang, Y. Guo, Lixing Yu, Pan Li, 2020, IEEE Transactions on Emerging Topics in Computing)
- SENDAI: A Hierarchical Sparse-measurement, EfficieNt Data AssImilation Framework(Xing Zhang, Y. Bao, M. Gao, J. N. Kutz, 2026, arXiv.org)
- Mining GIS Data to Predict Urban Sprawl(Anita Pampoore-Thampi, A. Varde, Danlin Yu, 2021, arXiv.org)
- Risk Prediction on Traffic Accidents using a Compact Neural Model for Multimodal Information Fusion over Urban Big Data(Wenshan Wang, Su Yang, Weishan Zhang, 2021, arXiv.org)
- Explore Spatiotemporal and Demographic Characteristics of Human Mobility via Twitter: A Case Study of Chicago(Feixiong Luo, G. Cao, K. Mulligan, Xiang Li, 2015, arXiv.org)
- SpatCode: Rotary-based Unified Encoding Framework for Efficient Spatiotemporal Vector Retrieval(Bingde Hu, Enhao Pan, Wanjing Zhou, Yang Gao, Zunlei Feng, Hao Zhong, 2026, arXiv.org)
- A Dataset for Spatiotemporal-Sensitive POI Question Answering(Xiao Han, Dayan Pan, Xiangyu Zhao, Xuyuan Hu, Zhaolin Deng, Xiangjie Kong, Guojiang Shen, 2025, arXiv.org)
- Representation Learning of Pedestrian Trajectories Using Actor-Critic Sequence-to-Sequence Autoencoder(Ka-Ho Chow, Anish Hiranandani, Yifeng Zhang, Shueng-Han Gary Chan, 2018, arXiv.org)
- SparseST: Exploiting Data Sparsity in Spatiotemporal Modeling and Prediction(Junfeng Wu, Hadjer Benmeziane, K. E. Maghraoui, Liu Liu, Yinan Wang, 2025, arXiv.org)
- Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation(Zhiyang Zhang, Ning Chen, Xin Zhang, Yanhua Li, Shen Su, Hui Lu, Jun Luo, 2025, arXiv.org)
- Spatiotemporal data mining: a survey on challenges and open problems(Ali Hamdi, K. Shaban, A. Erradi, Amr Mohamed, Shakila Khan Rumi, Flora D. Salim, 2021, Artificial Intelligence Review)
- A Survey on Spatial and Spatiotemporal Prediction Methods(Zhe Jiang, 2020, arXiv.org)
- Spatiotemporal Contrastive Learning for Cross-View Video Localization in Unstructured Off-road Terrains(Zhiyun Deng, Dongmyeong Lee, Amanda Adkins, Jesse Quattrociocchi, Christian Ellis, Joydeep Biswas, 2025, arXiv.org)
- A bibliography of temporal, spatial and spatio-temporal data mining research(J. Roddick, M. Spiliopoulou, 1999, ACM SIGKDD Explorations Newsletter)
- An Updated Bibliography of Temporal, Spatial, and Spatio-temporal Data Mining Research(J. Roddick, K. Stewart, M. Spiliopoulou, 2000, Lecture Notes in Computer Science)
- Graph Learning for Spatiotemporal Signal with Long Short-Term Characterization(Yueliang Liu, Wenbin Guo, Kangyong You, Lei Zhao, Tao Peng, Wenbo Wang, 2019, arXiv.org)
- 出租车轨迹数据挖掘进展(吴华意, 黄蕊, 游兰, 向隆刚, 2019, 测绘学报)
- AgriPINN: A Process-Informed Neural Network for Interpretable and Scalable Crop Biomass Prediction Under Water Stress(Yue Shi, Liangxiu Han, Xin Zhang, Tam Sobeih, T. Gaiser, Nguyen Huu Thuy, D. Behrend, A. Srivastava, Krishnagopal Halder, Frank Ewert, 2026, arXiv.org)
- CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras(Rong Fu, Wenxin Zhang, Yibo Meng, Jia Yee Tan, Jiaxuan Lu, Rui Lu, Jie Wu, Zhaolu Kang, Simon Fong, 2026, arXiv.org)
- Spatio-temporal Data Mining for Climate Data: Advances, Challenges, and Opportunities(James H. Faghmous, Vipin Kumar, 2014, Studies in Big Data)
- Temporal and spatial heterogeneity research of urban anthropogenic heat emissions based on multi-source spatial big data fusion for Xi’an, China(Duo Xu, Dian Zhou, Yupeng Wang, Xiangzhao Meng, Zhaolin Gu, Yujun Yang, 2021, Energy and Buildings)
- Spatial-Morphological Modeling for Multi-Attribute Imputation of Urban Blocks(V. Starikov, Ruslan Kozliak, G. Kontsevik, Sergey A. Mityagin, 2026, arXiv.org)
- LID Framework: A new method for geospatial and exploratory data analysis of potential innovation determinants at the neighborhood level(Eleni Oikonomaki, Dimitris Belivanis, C. Kakderi, 2026, arXiv.org)
- Combining data from multiple sources for urban travel mode choice modelling(Maciej Grzenda, Marcin Luckner, Jakub Zawieska, Przemyslaw Wrona, 2024, arXiv.org)
- GeoAI for Knowledge Graph Construction: Identifying Causality Between Cascading Events to Support Environmental Resilience Research(Yuanyuan Tian, Wenwen Li, 2022, arXiv.org)
- Spatial and spatiotemporal data mining: Recent advances(S Shekhar, RR Vatsavai, 2008, … generation of data mining)
- Towards Daily High-resolution Inundation Observations using Deep Learning and EO(A. Dasgupta, Lasse Hybbeneth, B. Waske, 2022, arXiv.org)
- Geospatial Big Data, Analytics and IoT: Challenges, Applications and Potential(R. Kashyap, 2018, Studies in Big Data)
- A Data as a Service (DaaS) Model for GPU-based Data Analytics(John Olorunfemi Abe, Burak Berk Ustundaug, 2018, arXiv.org)
- Kriformer: A Novel Spatiotemporal Kriging Approach Based on Graph Transformers(Renbin Pan, Feng Xiao, Hegui Zhang, Minyu Shen, 2024, arXiv.org)
- An Empirical Survey and Benchmark of Learned Distance Indexes for Road Networks(G. Choudhary, Libin Zhou, Yeasir Rayhan, Walid G. Aref, 2026, arXiv.org)
- p-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data(Julie Yixuan Zhu, Chao Zhang, Huichu Zhang, Shi Zhi, V. Li, Jiawei Han, Yu Zheng, 2016, arXiv.org)
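Fusion and Collaborative Analysis of Multi-Source Heterogeneous Spatial Data
Examines fusion and reconstruction techniques for multi-source, multi-temporal, and multi-scale spatial data, aiming to improve the accuracy of environmental sensing and applied analysis through cross-source data collaboration.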
- Geospatial Big Data Analytics Engine for Spark(Shaohua Wang, Y. Zhong, Hao Lu, E. Wang, W. Yun, W. Cai, 2017, Proceedings of the 6th ACM SIGSPATIAL Workshop on Analytics for Big Geospatial Data)
- Quantifying production-living-ecology functions with spatial detail using big data fusion and mining approaches: A case study of a typical karst region in Southwest China(Jingxin Li, Hongqi Zhang, Erqi Xu, 2022, Ecological Indicators)
- Multi-source data fusion of big spatial-temporal data in soil, geo-engineering and environmental studies.(D. Di Curzio, A. Castrignanò, S. Fountas, M. Romić, R. V. Viscarra Rossel, 2021, Science of The Total Environment)
- 时空大数据背景下并行数据处理分析挖掘的进展及趋势(关雪峰, 曾宇媚, 2018, 地理科学进展)
- Big data environment for geospatial data analysis(P. Praveen, Ch. Jayanth Babu, B. Rama, 2016, 2016 International Conference on Communication and Electronics Systems (ICCES))
- Multi-source imagery fusion using deep learning in a cloud computing platform(Carlos A. Theran, Michael A. Álvarez, Emmanuel Arzuaga, H. Sierra, 2023, arXiv.org)
- 多源气象数据融合格点实况产品研制进展(师春香, 潘旸, 谷军霞, 徐宾, 韩帅, 朱智, 张雷, 孙帅, 姜志伟, 2019, 气象学报)
- Spatial data fusion in Spatial Data Infrastructures using Linked Data(S. Wiemann, L. Bernard, 2016, International Journal of Geographical Information Science)
- Evaluation of Polycentric Spatial Structure in the Urban Agglomeration of the Pearl River Delta (PRD) Based on Multi-Source Big Data Fusion(Xiong He, Yongwang Cao, Chunshan Zhou, 2021, Remote Sensing)
- Spatio-Temporal Data Fusion for Very Large Remote Sensing Datasets(H. Nguyen, M. Katzfuss, N. Cressie, A. Braverman, 2014, Technometrics)
- Multisource Open Geospatial Big Data Fusion: Application of the Method to Demarcate Urban Agglomeration Footprints(Nelunika Priyashani, N. Kankanamge, Tan Yigitcanlar, 2023, Land)
- Urban big data fusion based on deep learning: An overview(Jia Liu, Tianrui Li, Peng Xie, Shengdong Du, Fei Teng, Xin Yang, 2020, Information Fusion)
- Multivariate Spatial Data Fusion for Very Large Remote Sensing Datasets(H. Nguyen, N. Cressie, A. Braverman, 2017, Remote Sensing)
- 基于多源数据融合的地表覆盖数据重建研究进展综述(陈迪, 吴文斌, 陆苗, 胡琼, 周清波, 2016, 中国农业资源与区划)
- Utilizing Spatiotemporal Data Analytics to Pinpoint Outage Location(Reddy Mandati, Po-Chen Chen, V. Anderson, Bishwa Sapkota, Michael Jarrell Warren, Bobby Besharati, Ankush Agarwal, Samuel Johnston, 2024, arXiv.org)
- Assessing the livability within the 15-minute city concept based on mobile phone data(Tianqiong Wang, Teemu Jama, H. Tenkanen, 2026, arXiv.org)
- Keyword-based Community Search in Bipartite Spatial-Social Networks (Technical Report)(Kovan A. Bavi, Xiang Lian, 2026, arXiv.org)
- 多源矢量空间数据融合处理技术研究进展(孙群, 2017, 测绘学报)
- 服务于海洋碳通量研究的时空分布式存算一体化架构平台——2018年中国地理学会地理大数据计算环境“优秀实用案例”(2018)
- Study for Multi-Resources Spatial Data Fusion Methods in Big Data Environment(Zhiquan Huang, Yu Fu, Fuchu Dai, 2018, Intelligent Automation and Soft Computing)
- 面向大规模空间Agent建模的分布式地理模拟框架(2022)
- 多源卫星遥感影像时空融合研究的现状及展望(黄波, 赵涌泉, 2017, 测绘学报)
- The Spatiotemporal Data Fusion (STDF) Approach: IoT-Based Data Fusion Using Big Data Analytics(D. Fawzy, Sherin M. Moussa, N. Badr, 2021, Sensors)
- Spatial data fusion for large non‐Gaussian remote sensing datasets(Hongxiang Shi, E. Kang, 2017, Stat)
- Remote Sensing and Geospatial Analysis in the Big Data Era: A Survey(Elias Dritsas, M. Trigka, 2025, Remote Sensing)
- Australian Bushfire Intelligence with AI-Driven Environmental Analytics(Tanvi K Jois, Hussain Ahmad, F. Noor, Faheem Ullah, 2026, arXiv.org)
- 时空大数据时代的地图学(王家耀, 2017, 测绘学报)
- Geospatial Big Data for Environmental and Agricultural Applications(Athanasios Karmas, A. Tzotsos, K. Karantzalos, 2016, Big Data Concepts, Theories, and Applications)
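Privacy Protection and Secure Computation for Spatial Data
Covers privacy and security mechanisms, compliance methods, and secure data mining techniques for mining and processing spatial data in big data environments.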
- Privacy-Aware Data Fusion and Prediction With Spatial-Temporal Context for Smart City Industrial Environment(Lianyong Qi, Chunhua Hu, Xuyun Zhang, M. Khosravi, Suraj Sharma, Shaoning Pang, Tian Wang, 2021, IEEE Transactions on Industrial Informatics)
- Privacy in Spatiotemporal Data Mining(F. Bonchi, Y. Saygin, V. Verykios, Maurizio Atzori, A. Gkoulalas-Divanis, Selim Volkan Kaya, E. Savaş, 2008, Mobility, Data Mining and Privacy)
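This report groups the literature on spatial big data into five core dimensions: surveys and research paradigms; storage and indexing infrastructure; data mining and intelligent modeling; multi-source data fusion applications; and privacy and security. This structure covers the full technical chain from low-level storage and access through high-level applications to security and compliance, and reflects the field's shift from early work on spatial index optimization toward dynamic spatiotemporal knowledge discovery that combines artificial intelligence with high-performance computing.
114 related publications in total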
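The main arena of cartographic research is the map, but maps from different historical periods differ in their information sources, themes, content, media, forms, production methods, and modes of use, and therefore also in their overall value. Viewed through the changes of scientific paradigm in the history of science, the arrival of the big data era has brought us into the age of the "data-intensive" scientific paradigm, and cartography is no exception: it now shows the clear characteristics of a big data science. All big data are generated by the movement and change of things and phenomena in the geographic world, human activity included; all have spatial and temporal properties and therefore cannot be separated from spatial and temporal reference. In essence, then, big data is spatiotemporal big data. Modern cartography since the late 1950s and early 1960s, that is, cartography in the information age, has taken spatiotemporal data as its object, and its core is the processing and representation of spatiotemporal data. It has never before, however, faced large-scale, multi-source, heterogeneous, multidimensional, and dynamic data streams that integrate space, air, land, and sea. The real-time dynamics of maps, the specificity of their themes, the composite nature of their content, the diversity of their media, the personalization of their forms of presentation, the modernization of their production methods, and the ubiquity of their applications are unmatched by any earlier period, and this has brought great changes to the theoretical, technical, and application systems of cartography. All of these changes have taken place within the roughly 60 years since the late 1950s and early 1960s, and this paper is therefore dedicated to the 60th anniversary of the founding of 《测绘学报》 (Acta Geodaetica et Cartographica Sinica).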
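With the development and application of modern technology and sensors, complex and changeable spatial data are growing rapidly, far beyond human capacity to interpret them, and data mining and knowledge discovery are urgently needed to extract knowledge from them. This paper examines the meaning of spatial data mining and knowledge discovery (SDMKD); the kinds of knowledge that can be discovered, such as spatial association, characterization, classification, and clustering; and the relationship of SDMKD to related disciplines, including data mining and knowledge discovery, machine learning, geoscience data analysis, spatial databases, spatial data warehouses, and Digital Earth. It also outlines the origin and development of SDMKD and analyzes and anticipates its application development.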
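With the rapid development of the Internet, the Internet of Things, and cloud computing, data related to time and space are growing explosively, and the era of spatiotemporal big data has arrived. Beyond the typical "4V" characteristics of big data, spatiotemporal big data also carry rich semantic features and dynamic spatiotemporal correlations, and they have become an important resource for geographers analyzing the natural geographic environment and sensing patterns of human social activity. In practical research and applications, however, traditional data processing and analysis methods can no longer meet the performance requirements of efficient storage and access, real-time processing, and intelligent mining of spatiotemporal big data. The integration of spatiotemporal big data with high-performance computing and cloud computing is therefore an inevitable trend. Against this background, the paper first traces the origin of big data, reviewing how the concept developed and what distinguishes spatiotemporal big data; it then analyzes the performance requirements arising from research and applications of spatiotemporal big data and summarizes the current state of the underlying hardware and software platforms; next, it reviews the state of parallelization from three perspectives (storage management, spatiotemporal analysis, and domain-specific mining) and describes the problems that remain; finally, it identifies development trends in spatiotemporal big data research.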
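Advances in big data, the Internet of Things, and precise positioning have driven progress in urban sensing. As social activity grows, taxi trajectory data record not only the routes taxis travel but also contain information about road traffic conditions, residents' travel patterns, urban structure, and other social issues. In-depth study of taxi trajectory data through data analysis and mining is of great significance for intelligent transportation, urban planning, and related fields. This paper reviews domestic and international research on taxi trajectory big data over the past decade and describes the state of research under four categories of methods: spatial statistics, time series, graph theory and networks, and machine learning. It then analyzes the application areas, hot topics, and trends of existing research. Finally, it identifies the challenges facing taxi trajectory data mining and directions for future research.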
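The core goals of territorial spatial planning in the new era give equal weight to "meeting people's needs for a better life" and "protecting natural resources". As a new type of planning that followed the restructuring of national ministries, territorial spatial planning still lacks research on compilation methods. Scholars have focused mainly on the endowment of natural resources themselves, using traditional statistical, spatial, and survey data together with statistical analysis, spatial analysis, and inductive and deductive methods to make static evaluations of the carrying capacity and suitability of territorial space at national and provincial levels, and to rigidly delineate ecological protection red lines, basic farmland protection lines, and urban development boundaries. Existing research, however, rarely considers the dynamic influence of human activity on the use of territorial space, and it lacks a scientific arrangement of ecological, agricultural, and urban space under new development trends. This paper introduces big data that directly reflect spatiotemporal changes in human activity and discusses directions and concrete methodological frameworks for applying such data in four stages: suitability evaluation of territorial space, ecological space planning, agricultural space planning, and urban space planning. It emphasizes a scientific path for compiling territorial spatial plans under the interaction of "natural space" and "socioeconomic activity".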
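The rapid growth of massive satellite remote sensing data poses entirely new challenges for data analysis and value mining. This paper introduces the distributed model that drives big data applications and builds a grid model suited to satellite remote sensing big data. The model removes the spatial and temporal fragmentation and limits of the data, so that the data can be stored, computed on, and used as a whole; its basic structure of grids, time slices, and a physical layer is designed to support future cloud computing deployment. The paper proposes a grid hashing algorithm based on the Hilbert curve; the distributed system built on it has excellent parallel read/write performance and good load balancing. The resulting distributed system for remote sensing big data achieves high-speed distributed parallel reads and writes and supports precise spatiotemporal matching and dynamic retrieval of data. The system's capacity scales linearly, and it is implemented on general-purpose hardware and software platforms, enabling flexible, on-demand, and simple use of satellite remote sensing big data.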
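The Hilbert-curve grid hashing mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the paper's actual algorithm, so everything here is an assumption: `hilbert_index` is the standard cell-to-curve-position conversion, and `assign_node` is a hypothetical placement rule that gives each storage node one contiguous segment of the curve.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve covering
    a 2**order by 2**order grid (standard bitwise conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def assign_node(order: int, x: int, y: int, num_nodes: int) -> int:
    """Hypothetical placement rule (an assumption, not the paper's):
    split the curve into num_nodes contiguous segments."""
    total_cells = 1 << (2 * order)
    return hilbert_index(order, x, y) * num_nodes // total_cells


if __name__ == "__main__":
    # Count how many cells of an 8x8 grid land on each of 4 nodes.
    loads = [0] * 4
    for gx in range(8):
        for gy in range(8):
            loads[assign_node(3, gx, gy, 4)] += 1
    print(loads)
```

Because consecutive positions on the curve are always neighboring cells, each node in this sketch holds a spatially compact block of the grid, which is the usual reason for preferring a Hilbert ordering over row-major order when balancing load.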
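Spatial big data places new requirements and challenges on the development of GIS software technology. The industry's understanding of spatial big data, however, still needs to be clarified, and doubts remain about how to extract value from big data. The paper first explains what spatial big data means, then on that basis proposes basic GIS software technologies for the big data era and analyzes their application prospects. Big data GIS software technology includes spatial big data technology for processing and mining spatial big data, as well as the distributed re-engineering of traditional GIS functions for managing and processing classical spatial data. It also needs cloud GIS and cross-platform GIS technologies as support, to provide elastic computing resources and services and to enable cross-platform access and applications. The study shows that big data GIS software technologies and products can effectively lower the technical barrier to big data mining and reduce the cost of mining spatial big data.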
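Vector spatial data are both an important component of information about human society and the geographic environment and an important carrier of related social information, and they play a very important role in the national economy and in the modernization of national defense. Multi-source vector spatial data fusion is an effective way to resolve inconsistencies among multi-source data in geometric position, attribute features, and other respects, and the related technologies and applications have developed substantially in recent years. After analyzing the problems facing the application of two-dimensional vector spatial data, this paper reviews and evaluates the state of research on theories, algorithms, and techniques for geometric feature fusion and attribute feature fusion of two-dimensional vector spatial data, and, based on current research, outlines the key future directions in theory and application.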
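This paper describes the progress and trends of research on major multi-source meteorological data fusion products in China and abroad. It focuses on the development status of multi-source fused gridded products produced by the National Meteorological Information Center of the China Meteorological Administration, including land-surface meteorological elements (air temperature, precipitation, humidity, wind, pressure, radiation, and others), soil temperature and soil moisture, sea-surface temperature and sea-surface wind, and three-dimensional cloud. It also reports progress on the Center's pilot platform for multi-source data fusion and its unified quality verification and evaluation system, and looks ahead to the future development of multi-source meteorological data fusion products.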
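Dynamic monitoring of the land surface or atmospheric environment at high spatial resolution requires satellite remote sensing imagery with both high temporal and high spatial resolution. Because of objective constraints such as satellite sensor hardware and launch costs, a convenient, efficient, and low-cost way to obtain such imagery is spatiotemporal fusion: combining multi-source remote sensing images that individually have high temporal resolution or high spatial resolution to generate the high spatiotemporal resolution imagery that different studies and applications need. Although researchers in China and abroad have studied spatiotemporal fusion algorithms extensively, this work is constrained by specific data types, algorithmic principles, and application purposes, and its development has been diverse. This paper summarizes the mainstream spatiotemporal fusion algorithms and divides them into four types: (1) fusion based on ground-object components; (2) fusion based on land-surface spatial information; (3) fusion based on temporal changes of ground objects; and (4) combined fusion. It also analyzes the problems and challenges in spatiotemporal fusion algorithms and offers an outlook on future directions.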
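Land cover data are of great importance for global environmental change, biodiversity, and development policy making, and remote sensing has become an important means of acquiring them. Current land cover products, however, such as GlobeLand30, FROM-GLC, MODIS Collection5, MODIS Cropland, GlobCover, and GLC2000, suffer from limited accuracy, poor consistency with one another, and large discrepancies with statistical data. Data reconstruction based on multi-source data fusion has therefore become a hot research topic. This article surveys the literature of the past 10 years on the use of multi-source data fusion in land cover data reconstruction, outlines how multi-source data are currently applied in reconstruction, and summarizes the reconstruction methods, reviewing the characteristics and use of each, clarifying their strengths and weaknesses, discussing open problems, and anticipating future directions. Reconstruction methods based on multi-source data fusion fall into two groups: fusion of multi-source remote sensing data, and fusion of multi-source remote sensing and non-remote-sensing data. For the first group, the article mainly discusses the two most widely used fusion methods, fusion based on data consistency and fusion based on regression analysis; for the other methods in this group, such as fusion based on Dempster-Shafer (D-S) evidence theory, on data integration, and on statistical models, it discusses the most representative literature. For the second group, it mainly discusses three spatial allocation methods: full dependence, partial dependence, and dynamic dependence. In examining current research, the article summarizes and analyzes the study regions, data sources, land cover types, spatial resolutions, fusion methods, and literature sources, with particular attention to the fusion methods. From the use of these fusion methods in data reconstruction, it identifies the main problems in current research: shortcomings in study objects and regions (study regions are mostly global or in Europe and North America, with too little work on other regions, and study objects are mostly all land cover types or forests, with too little work on cropland and grassland); shortcomings in fusion algorithms; and shortcomings in the accuracy of reconstruction results. Finally, it points to the future direction of reconstruction based on multi-source data fusion: combining the two groups of methods to obtain long time-series land cover datasets with detailed and complete spatial information.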
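Indexing is a key technology for improving the storage efficiency and spatial retrieval performance of spatial databases. Building on the R-tree index, this paper proposes a new spatial database index structure, the CQRtree; presents its data structure and its insertion, deletion, and query algorithms, together with performance analysis and comparison; and finally points out directions for further research.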
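Performance problems severely limit the application and development of spatial databases. Because the spatial join is the most complex and time-consuming basic operation in a spatial database, its processing efficiency largely determines the overall performance of the database. Although many spatial join algorithms exist, cost estimation and query optimization for spatial joins still need further study. Most spatial join algorithms are implemented on R-tree indexes; if the relations taking part in a spatial join have no index, or only some of them are indexed, special algorithms are needed. In addition, the cost models of the various algorithms need a relatively uniform method of calculation, and practice shows that, given the actual characteristics of spatial databases, it is reasonable to estimate algorithm complexity by I/O cost. On this basis, since complex spatial queries may involve several relations in spatial joins, a dynamic programming algorithm should be applied to find the join order with the lowest cost, so that a general algorithmic framework can finally be formed. Complexity analysis of this framework shows that a spatial database query optimization system built on it would be efficient in both time and space and able to handle very complex spatial queries.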
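The idea in the abstract above, choosing a join order by dynamic programming over an I/O-based cost estimate, can be sketched as follows. This is only an illustration under stated assumptions: the cost model (read both inputs once per join), the page counts, and the selectivities are invented for the example and are not taken from the paper, and only left-deep orders are considered.

```python
from itertools import combinations


def best_join_order(pages, selectivity):
    """Dynamic programming over subsets of relations (left-deep plans).

    pages:       dict mapping relation name -> size in disk pages
    selectivity: dict mapping frozenset({a, b}) -> fraction of the
                 cross product that survives the spatial join predicate
    Returns (total_io_cost, result_pages, join_order).
    """
    rels = sorted(pages)
    # best[subset] = cheapest known plan for joining exactly that subset
    best = {frozenset([r]): (0.0, float(pages[r]), (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            subset = frozenset(combo)
            for r in combo:  # r is the relation joined last
                rest = subset - {r}
                cost, out_pages, order = best[rest]
                sel = 1.0
                for other in rest:
                    sel *= selectivity.get(frozenset((other, r)), 1.0)
                join_io = out_pages + pages[r]  # toy model: scan both inputs once
                cand = (cost + join_io, out_pages * pages[r] * sel, order + (r,))
                if subset not in best or cand[0] < best[subset][0]:
                    best[subset] = cand
    return best[frozenset(rels)]


# Hypothetical relations and selectivities, for illustration only.
example_pages = {"roads": 1000, "parcels": 200, "rivers": 50}
example_sel = {
    frozenset(("roads", "parcels")): 0.001,
    frozenset(("parcels", "rivers")): 0.005,
    frozenset(("roads", "rivers")): 0.002,
}

if __name__ == "__main__":
    print(best_join_order(example_pages, example_sel))
```

With these made-up numbers the plan that joins the two small relations first and `roads` last is cheapest, because the large relation is then scanned against the smallest intermediate result.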
The geography of innovation offers a framework to understand how territorial characteristics shape innovation, often via spatial and cognitive proximity. Empirical research has focused largely on national and regional scales, while urban and sub-regional geographies receive less attention. Local studies typically rely on limited indicators (e.g., firm-level data, patents, basic socioeconomic measures), with few offering a systematic framework integrating urban form, mobility, amenities, and human-capital proxies at the neighborhood scale. Our study investigates innovation at a finer spatial resolution, going beyond proprietary or static indicators. We develop the Local Innovation Determinants (LID) database and framework to identify key enabling factors across regions, combining traditional government data with publicly available data via APIs for a more granular understanding of spatial dynamics shaping innovation capacity. Using exploratory big and geospatial data analytics and random forest models, we examine neighborhoods in New York and Massachusetts across four dimensions: social factors, economic characteristics, land use and mobility, morphology, and environment. Results show that alternative data sources offer significant yet underexplored potential to enhance insights into innovation dynamics. City policymakers should consider neighborhood-specific determinants and characteristics when designing and implementing local innovation strategies.
Understanding the exact fault location in the post-event analysis is the key to improving the accuracy of outage management. Unfortunately, the fault location is not generally well documented during the restoration process, creating a big challenge for post-event analysis. By utilizing various data source systems, including outage management system (OMS) data, asset geospatial information system (GIS) data, and vehicle location data, this paper creates a novel method to pinpoint the outage location accurately to create additional insights for distribution operations and performance teams during the post-event analysis.
The rapid accumulation of Earth observation data presents a formidable challenge for the processing capabilities of traditional remote sensing desktop software, particularly when it comes to analyzing expansive geographical areas and prolonged temporal sequences. Cloud computing has emerged as a transformative solution, surmounting the barriers traditionally associated with the management and computation of voluminous datasets. This paper introduces the Analytical Insight of Earth (AI Earth), an innovative remote sensing intelligent computing cloud platform, powered by the robust Alibaba Cloud infrastructure. AI Earth provides an extensive collection of publicly available remote sensing datasets, along with a suite of computational tools powered by a high-performance computing engine. Furthermore, it provides a variety of classic deep learning (DL) models and a novel remote sensing large vision segmentation model tailored to different recognition tasks. The platform enables users to upload their unique samples for model training and to deploy third-party models, thereby increasing the accessibility and openness of DL applications. This platform will facilitate researchers in leveraging remote sensing data for large-scale applied research in areas such as resources, environment, ecology, and climate.
Several approaches have recently been proposed for community search in bipartite graphs. These methods have shown promising results in identifying communities in real-world bipartite networks, such as social and biological networks. Given a query user $q$, community search in bipartite graphs involves identifying a group of users containing $q$, with common characteristics or functions within a given bipartite graph. These problems are particularly challenging because bipartite graphs have two distinct sets of nodes, and community search algorithms must account for this structure. However, finding communities in keyword-based bipartite spatial-social networks has not yet been sufficiently investigated. Spatial-social networks are naturally structured as bipartite graphs. Thus, this paper proposes a new community search problem in bipartite spatial-social networks with a novel $(\omega, \pi)\mbox{-}keyword\mbox{-}core$, named Keyword-based Community Search in Bipartite Spatial-Social Networks ($KCS\mbox{-}BSSN$). The $KCS\mbox{-}BSSN$ returns a tightly knit community with significant social influence and minimal travel distance that includes a $(\omega, \pi)\mbox{-}keyword\mbox{-}core$. To address the $KCS\mbox{-}BSSN$ problem, we have developed pruning methods that effectively filter out irrelevant users and points of interest. To improve query-answering efficiency, we have also proposed an indexing technique named the bipartite-spatial-social index. Our pruning techniques and indexing approach have proven effective and efficient through experiments with real and artificial data sets.
Symmetric searchable encryption (SSE) for geo-textual data has attracted significant attention. However, existing schemes rely on task-specific, incompatible indices for isolated specific secure queries (e.g., range or k-nearest neighbor spatial-keyword queries), limiting practicality due to prohibitive multi-index overhead. To address this, we propose RISK, a model for rich spatial-keyword queries on encrypted geo-textual data. In a textual-first-then-spatial manner, RISK is built on a novel k-nearest neighbor quadtree (kQ-tree) that embeds representative and regional nearest neighbors, with the kQ-tree further encrypted using standard cryptographic tools (e.g., keyed hash functions and symmetric encryption). Overall, RISK seamlessly supports both secure range and k-nearest neighbor queries, is provably secure under the IND-CKA2 model, and is extensible to multi-party scenarios and dynamic updates. Experiments on three real-world datasets and one synthetic dataset show that RISK outperforms state-of-the-art methods by at least 0.5 and 4 orders of magnitude in response time for 1% range queries and 10-nearest neighbor queries, respectively.
City-scale person re-identification across distributed cameras must handle severe appearance changes from viewpoint, occlusion, and domain shift while complying with data protection rules that prevent sharing raw imagery. We introduce CityGuard, a topology-aware transformer for privacy-preserving identity retrieval in decentralized surveillance. The framework integrates three components. A dispersion-adaptive metric learner adjusts instance-level margins according to feature spread, increasing intra-class compactness. Spatially conditioned attention injects coarse geometry, such as GPS or deployment floor plans, into graph-based self-attention to enable projectively consistent cross-view alignment using only coarse geometric priors without requiring survey-grade calibration. Differentially private embedding maps are coupled with compact approximate indexes to support secure and cost-efficient deployment. Together these designs produce descriptors robust to viewpoint variation, occlusion, and domain shifts, and they enable a tunable balance between privacy and utility under rigorous differential-privacy accounting. Experiments on Market-1501 and additional public benchmarks, complemented by database-scale retrieval studies, show consistent gains in retrieval precision and query throughput over strong baselines, confirming the practicality of the framework for privacy-critical urban identity matching.
Accurate reconstruction of missing morphological indicators of a city is crucial for urban planning and data-driven analysis. This study presents the spatial-morphological (SM) imputer tool, which combines data-driven morphological clustering with neighborhood-based methods to reconstruct missing values of the floor space index (FSI) and ground space index (GSI) at the city block level, inspired by the SpaceMatrix framework. This approach combines city-scale morphological patterns as global priors with local spatial information for context-dependent interpolation. The evaluation shows that while SM alone captures meaningful morphological structure, its combination with inverse distance weighting (IDW) or spatial k-nearest neighbor (sKNN) methods provides superior performance compared to existing SOTA models. Composite methods demonstrate the complementary advantages of combining morphological and spatial approaches.
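The inverse distance weighting (IDW) step that the abstract above combines with morphological clustering can be sketched on its own. This is a generic textbook IDW, not the SM imputer: the function name, the `power` parameter, and the sample coordinates and values are assumptions made for illustration, and the morphological prior is left out entirely.

```python
import math


def idw_estimate(target, known, power=2.0):
    """Estimate a missing value at `target` (x, y) from known blocks.

    known: list of ((x, y), value) pairs for blocks with observed values.
    Each neighbor is weighted by 1 / distance**power, so nearby blocks
    dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with an observed block
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one known block is required")
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical block centroids with an observed index value.
    blocks = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
    print(idw_estimate((1.0, 0.0), blocks))
```

A target that is equally far from two observed blocks gets the plain average of their values; raising `power` makes the estimate track the nearest block more closely.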
We propose a bounds-only pruning test for exact Euclidean AkNN joins on partitioned spatial datasets. Data warehouses commonly partition large tables and store row group statistics for them to accelerate searches and joins, rather than maintaining indexes. AkNN joins can benefit from such statistics by constructing bounds and localizing join evaluations to a few partitions before loading them to build spatial indexes. Existing pruning methods are overly conservative for bounds-only spatial data because they do not fully capture its directional semantics, thereby missing opportunities to skip unneeded partitions at the earliest stages of a join. We propose a three-bound proximity test to determine whether all points within a partition have a closer neighbor in one partition than in another, potentially occluded partition. We show that our algorithm is both optimal and efficient.
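For context, the kind of conservative bounds-only test that the abstract above contrasts itself with can be sketched using only partition bounding boxes. This is the classic minimum/maximum box-distance comparison for the nearest-neighbor case, not the paper's three-bound proximity test; the function names and the box layout `(xmin, ymin, xmax, ymax)` are assumptions for illustration.

```python
import math


def min_dist(a, b):
    """Smallest possible distance between a point in box a and a point in box b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def max_dist(a, b):
    """Largest possible distance between a point in box a and a point in box b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return math.hypot(dx, dy)


def can_skip(query_box, near_box, far_box):
    """Conservative 1-NN pruning: if even the worst-case distance to the
    (non-empty) near partition beats the best-case distance to the far
    partition, no point of the far partition can be a nearest neighbor."""
    return max_dist(query_box, near_box) < min_dist(query_box, far_box)


if __name__ == "__main__":
    q = (0.0, 0.0, 1.0, 1.0)
    print(can_skip(q, (1.0, 0.0, 2.0, 1.0), (10.0, 0.0, 11.0, 1.0)))
    print(can_skip(q, (1.0, 0.0, 2.0, 1.0), (2.0, 0.0, 3.0, 1.0)))
```

The test is safe but loose: it ignores where the partitions lie relative to one another in direction, which is the weakness the abstract attributes to existing bounds-only methods.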
The calculation of shortest-path distances in road networks is a core operation in navigation systems, location-based services, and spatial analytics. Although classical algorithms, e.g., Dijkstra's algorithm, provide exact answers, their latency is prohibitive for modern real-time, large-scale deployments. Over the past two decades, numerous distance indexes have been proposed to speed up query processing for shortest distance queries. More recently, with the advancement in machine learning (ML), researchers have designed and proposed ML-based distance indexes to answer approximate shortest path and distance queries efficiently. However, a comprehensive and systematic evaluation of these ML-based approaches is lacking. This paper presents the first empirical survey of ML-based distance indexes on road networks, evaluating them along four key dimensions: Training time, query latency, storage, and accuracy. Using seven real-world road networks and workload-driven query datasets derived from trajectory data, we benchmark ten representative ML techniques and compare them against strong classical non-ML baselines, highlighting key insights and practical trade-offs. We release a unified open-source codebase to support reproducibility and future research on learned distance indexes.
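As a point of reference for the exact classical baseline named in the abstract above, here is a minimal sketch of Dijkstra's algorithm with a binary heap. The adjacency-list format and the tiny `roads` graph are assumptions for illustration and have nothing to do with the paper's benchmark networks.

```python
import heapq


def dijkstra(graph, source):
    """Exact shortest-path distances from `source`.

    graph: dict mapping node -> list of (neighbor, non_negative_weight).
    Returns a dict of distances for every reachable node.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical four-intersection road network with travel costs.
roads = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 1.0)],
    "C": [("B", 2.0), ("D", 7.0)],
    "D": [],
}

if __name__ == "__main__":
    print(dijkstra(roads, "A"))
```

Every query re-explores the graph from the source, which is why per-query latency grows with network size and why precomputed or learned distance indexes trade storage and accuracy for speed.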
Cycling is reported by an average of 35% of adults at least once per week across 28 countries, and as vulnerable road users directly exposed to their surroundings, cyclists experience the street at an intensity unmatched by other modes. Yet the street-level features that shape this experience remain under-analysed, particularly in historical urban contexts where spatial constraints rule out large-scale infrastructural change and where typological context is often overlooked. This study develops a perception-led, typology-based, and data-integrated framework that explicitly models street typologies and their sub-classifications to evaluate how visual and spatial configurations shape cycling experience. Drawing on the Cambridge Cycling Experience Video Dataset (CCEVD), a first-person and handlebar-mounted corpus developed in this study, we extract fine-grained streetscape indicators with computer vision and pair them with built-environment variables and subjective ratings from a Balanced Incomplete Block Design (BIBD) survey, thereby constructing a typology-sensitive Bikeability Index that integrates subjective and perceived dimensions with physical metrics for segment-level comparison. Statistical analysis shows that perceived bikeability arises from cumulative, context-specific interactions among features. While greenness and openness consistently enhance comfort and pleasure, enclosure, imageability, and building continuity display threshold or divergent effects contingent on street type and subtype. AI-assisted visual redesigns further demonstrate that subtle, targeted changes can yield meaningful perceptual gains without large-scale structural interventions. The framework offers a transferable model for evaluating and improving cycling conditions in heritage cities through perceptually attuned, typology-aware design strategies.
Bridging the gap between data-rich training regimes and observation-sparse deployment conditions remains a central challenge in spatiotemporal field reconstruction, particularly when target domains exhibit distributional shifts, heterogeneous structure, and multi-scale dynamics absent from available training data. We present SENDAI, a hierarchical Sparse-measurement, EfficieNt Data AssImilation Framework that reconstructs full spatial states from hyper-sparse sensor observations by combining simulation-derived priors with learned discrepancy corrections. We demonstrate the performance on satellite remote sensing, reconstructing MODIS (Moderate Resolution Imaging Spectroradiometer) derived vegetation index fields across six globally distributed sites. Using seasonal periods as a proxy for domain shift, the framework consistently outperforms established baselines that require substantially denser observations -- SENDAI achieves a maximum SSIM improvement of 185% over traditional baselines and a 36% improvement over recent high-frequency-based methods. These gains are particularly pronounced for landscapes with sharp boundaries and sub-seasonal dynamics; more importantly, the framework effectively preserves diagnostically relevant structures -- such as field topologies, land cover discontinuities, and spatial gradients. By yielding corrections that are more structurally and spectrally separable, the reconstructed fields are better suited for downstream inference of indirectly observed variables. The results therefore highlight a lightweight and operationally viable framework for sparse-measurement reconstruction that is applicable to physically grounded inference, resource-limited deployment, and real-time monitoring and control.
Accurate prediction of crop above-ground biomass (AGB) under water stress is critical for monitoring crop productivity, guiding irrigation, and supporting climate-resilient agriculture. Data-driven models scale well but often lack interpretability and degrade under distribution shift, whereas process-based crop models (e.g. DSSAT, APSIM, LINTUL5) require extensive calibration and are difficult to deploy over large spatial domains. To address these limitations, we propose AgriPINN, a process-informed neural network that integrates a biophysical crop-growth differential equation as a differentiable constraint within a deep learning backbone. This design encourages physiologically consistent biomass dynamics under water-stress conditions while preserving model scalability for spatially distributed AGB prediction. AgriPINN recovers latent physiological variables, including leaf area index (LAI), absorbed photosynthetically active radiation (PAR), radiation use efficiency (RUE), and water-stress factors, without requiring direct supervision. We pretrain AgriPINN on 60 years of historical data across 397 regions in Germany and fine-tune it on three years of field experiments under controlled water treatments. Results show that AgriPINN consistently outperforms state-of-the-art deep-learning baselines (ConvLSTM-ViT, SLTF, CNN-Transformer) and the process-based LINTUL5 model in terms of accuracy (RMSE reductions up to 43%) and computational efficiency. By combining the scalability of deep learning with the biophysical rigor of process-based modeling, AgriPINN provides a robust and interpretable framework for spatio-temporal AGB prediction, offering practical value for planning of irrigation infrastructure, yield forecasting, and climate-adaptation planning.
Many cities promote walkability through concepts such as the compact city and 15-minute city to enhance urban livability, yet few methods link spatial walkability features to empirically measured livability and account for temporal dynamics. The method developed for this study uses mobile phone data from the Helsinki Metropolitan Area (Finland) to assess whether commonly used, literature-derived livability indicators (diversity, density, proximity, accessibility) predict observed human activity patterns across different times of day. We constructed two key dimensions of livability: attractiveness and walkability with quantifiable sub-indicators that were selected based on literature. Our analysis shows that walkability, and even more so the combined livability index, correlates with activity patterns, outperforming the pure attractiveness perspective. However, this relationship is temporally unstable, significantly weakening at night and fluctuating daily. Moreover, based on Geographically Weighted Regression analysis, our results reveal significant spatial variation in the relationship between livability and the intensity of human activities. The findings suggest that traditional urban planning goals, such as functional diversity to enhance walkability, contribute to livability but have a limited impact on the 15-minute city's overall sustainable mobility objectives, necessitating a larger-scale perspective and more functionally profiled approaches for urban development.
Spatiotemporal vector retrieval has emerged as a critical paradigm in modern information retrieval, enabling efficient access to massive, heterogeneous data that evolve over both time and space. However, existing spatiotemporal retrieval methods are often extensions of conventional vector search systems that rely on external filters or specialized indices to incorporate temporal and spatial constraints, leading to inefficiency, architectural complexity, and limited flexibility in handling heterogeneous modalities. To overcome these challenges, we present a unified spatiotemporal vector retrieval framework that integrates temporal, spatial, and semantic cues within a coherent similarity space while maintaining scalability and adaptability to continuous data streams. Specifically, we propose (1) a Rotary-based Unified Encoding Method that embeds time and location into rotational position vectors for consistent spatiotemporal representation; (2) a Circular Incremental Update Mechanism that supports efficient sliding-window updates without global re-encoding or index reconstruction; and (3) a Weighted Interest-based Retrieval Algorithm that adaptively balances modality weights for context-aware and personalized retrieval. Extensive experiments across multiple real-world datasets demonstrate that our framework substantially outperforms state-of-the-art baselines in both retrieval accuracy and efficiency, while maintaining robustness under dynamic data evolution. These results highlight the effectiveness and practicality of the proposed approach for scalable spatiotemporal information retrieval in intelligent systems.
Bushfires are among the most destructive natural hazards in Australia, causing significant ecological, economic, and social damage. Accurate prediction of bushfire intensity is therefore essential for effective disaster preparedness and response. This study examines the predictive capability of spatio-temporal environmental data for identifying high-risk bushfire zones across Australia. We integrated historical fire events from NASA FIRMS, daily meteorological observations from Meteostat, and vegetation indices such as the Normalized Difference Vegetation Index (NDVI) from Google Earth Engine for the period 2015-2023. After harmonizing the datasets using spatial and temporal joins, we evaluated several machine learning models, including Random Forest, XGBoost, LightGBM, a Multi-Layer Perceptron (MLP), and an ensemble classifier. Under a binary classification framework distinguishing 'low' and 'high' fire risk, the ensemble approach achieved an accuracy of 87%. The results demonstrate that combining multi-source environmental features with advanced machine learning techniques can produce reliable bushfire intensity predictions, supporting more informed and timely disaster management.
Spatiotemporal data mining (STDM) has a wide range of applications in various complex physical systems (CPS), e.g., transportation, manufacturing, and healthcare. Among all the proposed methods, the Convolutional Long Short-Term Memory (ConvLSTM) has proved to be generalizable and extendable in different applications and has multiple variants achieving state-of-the-art performance in various STDM applications. However, ConvLSTM and its variants are computationally expensive, which makes them impractical on edge devices with limited computational resources. With the emerging need for edge computing in CPS, efficient AI is essential to reduce the computational cost while preserving the model performance. Common methods of efficient AI are developed to reduce redundancy in model capacity (e.g., model pruning and compression). However, spatiotemporal data mining naturally requires extensive model capacity, as the embedded dependencies in spatiotemporal data are complex and hard to capture, which limits the model redundancy. Instead, there is a fairly high level of data and feature redundancy that introduces an unnecessary computational burden, which has been largely overlooked in existing research. Therefore, we developed SparseST, a novel framework that pioneers the exploitation of data sparsity to build an efficient spatiotemporal model. In addition, we explore and approximate the Pareto front between model performance and computational efficiency by designing a multi-objective composite loss function, which provides a practical guide for practitioners to adjust the model according to computational resource constraints and the performance requirements of downstream tasks.
The widespread use of GPS devices has driven advances in spatiotemporal data mining, enabling machine learning models to simulate human decision making and generate realistic trajectories, addressing both data collection costs and privacy concerns. Recent studies have shown the promise of diffusion models for high-quality trajectory generation. However, most existing methods rely on convolution-based architectures (e.g., UNet) to predict noise during the diffusion process, which often results in notable deviations and the loss of fine-grained street-level details due to limited model capacity. In this paper, we propose Trajectory Transformer, a novel model that employs a transformer backbone for both conditional information embedding and noise prediction. We explore two GPS coordinate embedding strategies, location embedding and longitude-latitude embedding, and analyze model performance at different scales. Experiments on two real-world datasets demonstrate that Trajectory Transformer significantly enhances generation quality and effectively alleviates the deviation issues observed in prior approaches.
Robust cross-view 3-DoF localization in GPS-denied, off-road environments remains challenging due to (1) perceptual ambiguities from repetitive vegetation and unstructured terrain, and (2) seasonal shifts that significantly alter scene appearance, hindering alignment with outdated satellite imagery. To address this, we introduce MoViX, a self-supervised cross-view video localization framework that learns viewpoint- and season-invariant representations while preserving directional awareness essential for accurate localization. MoViX employs a pose-dependent positive sampling strategy to enhance directional discrimination and temporally aligned hard negative mining to discourage shortcut learning from seasonal cues. A motion-informed frame sampler selects spatially diverse frames, and a lightweight temporal aggregator emphasizes geometrically aligned observations while downweighting ambiguous ones. At inference, MoViX runs within a Monte Carlo Localization framework, using a learned cross-view matching module in place of handcrafted models. Entropy-guided temperature scaling enables robust multi-hypothesis tracking and confident convergence under visual ambiguity. We evaluate MoViX on the TartanDrive 2.0 dataset, training on under 30 minutes of data and testing over 12.29 km. Despite outdated satellite imagery, MoViX localizes within 25 meters of ground truth 93% of the time, and within 50 meters 100% of the time in unseen regions, outperforming state-of-the-art baselines without environment-specific tuning. We further demonstrate generalization on a real-world off-road dataset from a geographically distinct site with a different robot platform.
Spatiotemporal relationships are critical in data science, as many prediction and reasoning tasks require analysis across both spatial and temporal dimensions--for instance, navigating an unfamiliar city involves planning itineraries that sequence locations and timing cultural experiences. However, existing Question-Answering (QA) datasets lack sufficient spatiotemporal-sensitive questions, making them inadequate benchmarks for evaluating models' spatiotemporal reasoning capabilities. To address this gap, we introduce POI-QA, a novel spatiotemporal-sensitive QA dataset centered on Point of Interest (POI), constructed through three key steps: mining and aligning open-source vehicle trajectory data from GAIA with high-precision geographic POI data, rigorous manual validation of noisy spatiotemporal facts, and generating bilingual (Chinese/English) QA pairs that reflect human-understandable spatiotemporal reasoning tasks. Our dataset challenges models to parse complex spatiotemporal dependencies, and evaluations of state-of-the-art multilingual LLMs (e.g., Qwen2.5-7B, Llama3.1-8B) reveal stark limitations: even the top-performing model (Qwen2.5-7B fine-tuned with RAG+LoRA) achieves a top 10 Hit Ratio (HR@10) of only 0.41 on the easiest task, far below human performance at 0.56. This underscores persistent weaknesses in LLMs' ability to perform consistent spatiotemporal reasoning, while highlighting POI-QA as a robust benchmark to advance algorithms sensitive to spatiotemporal dynamics. The dataset is publicly available at https://www.kaggle.com/ds/7394666.
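The HR@10 figures quoted above follow the usual top-k hit-ratio idea: a query counts as a hit when its ground-truth answer appears among the first k ranked candidates. A minimal sketch is given below, assuming one ground-truth item per query. The function name and input layout are hypothetical, and the benchmark's exact evaluation protocol may differ.

```python
def hit_ratio_at_k(ranked_lists, truths, k=10):
    """Fraction of queries whose ground-truth item is among the top-k ranked candidates.

    ranked_lists: list of candidate lists, best candidate first (assumed layout).
    truths: list with one ground-truth item per query, aligned with ranked_lists.
    """
    if len(ranked_lists) != len(truths):
        raise ValueError("ranked_lists and truths must have the same length")
    if not truths:
        raise ValueError("at least one query is required")
    hits = sum(1 for ranked, truth in zip(ranked_lists, truths) if truth in ranked[:k])
    return hits / len(truths)
```

For example, with two queries of which only one has its true POI in the top k, the function returns 0.5.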
Spatiotemporal traffic time series, such as traffic speed data, collected from sensing systems are often incomplete, with considerable corruption and large amounts of missing values. A vast amount of data conceals implicit data structures, which poses significant challenges for data recovery issues, such as mining the potential spatio-temporal correlations of data and identifying abnormal data. In this paper, we propose a Tucker decomposition-based sparse low-rank high-order tensor optimization model (TSLTO) for data imputation and anomaly diagnosis. We decompose the traffic tensor data into low-rank and sparse tensors, and establish a sparse low-rank high-order tensor optimization model based on Tucker decomposition. By utilizing tools of non-smooth analysis for tensor functions, we explore the optimality conditions of the proposed tensor optimization model and design an ADMM optimization algorithm for solving the model. Finally, numerical experiments are conducted on both synthetic data and a real-world dataset: the urban traffic speed dataset of Guangzhou. Numerical comparisons with several representative existing algorithms demonstrate that our proposed approach achieves higher accuracy and efficiency in traffic flow data recovery and anomaly diagnosis tasks.
Accurately estimating data in sensor-less areas is crucial for understanding system dynamics, such as traffic state estimation and environmental monitoring. This study addresses challenges posed by sparse sensor deployment and unreliable data by framing the problem as a spatiotemporal kriging task and proposing a novel graph transformer model, Kriformer. This model estimates data at locations without sensors by mining spatial and temporal correlations, even with limited resources. Kriformer utilizes transformer architecture to enhance the model's perceptual range and solve edge information aggregation challenges, capturing spatiotemporal information effectively. A carefully constructed positional encoding module embeds the spatiotemporal features of nodes, while a sophisticated spatiotemporal attention mechanism enhances estimation accuracy. The multi-head spatial interaction attention module captures subtle spatial relationships between observed and unobserved locations. During training, a random masking strategy prompts the model to learn with partial information loss, allowing the spatiotemporal embedding and multi-head attention mechanisms to synergistically capture correlations among locations. Experimental results show that Kriformer excels in representation learning for unobserved locations, validated on two real-world traffic speed datasets, demonstrating its effectiveness in spatiotemporal kriging tasks.
Knowledge graph technology is considered a powerful and semantically enabled solution to link entities, allowing users to derive new knowledge by reasoning over data according to various types of reasoning rules. However, in building such a knowledge graph, event modeling, such as that of disasters, is often limited to single, isolated events. The linkages among cascading events are often missing in existing knowledge graphs. This paper introduces our GeoAI (Geospatial Artificial Intelligence) solutions to identify causality among events, in particular, disaster events, based on a set of spatially and temporally-enabled semantic rules. Through a use case of causal disaster event modeling, we demonstrate how these defined rules, including theme-based identification of correlated events, spatiotemporal co-occurrence constraints, and text mining of event metadata, enable the automatic extraction of causal relationships between different events. Our solution enriches the event knowledge base and allows for the exploration of linked cascading events in large knowledge graphs, thereby empowering knowledge query and discovery.
Spatiotemporal data mining aims to discover interesting, useful but non-trivial patterns in big spatial and spatiotemporal data. Such methods are used in various application domains such as public safety, ecology, epidemiology, and earth science. This problem is challenging because of the high societal cost of spurious patterns and the exorbitant computational cost. Recent surveys of spatiotemporal data mining need updating because the field has grown rapidly. In addition, they did not adequately survey parallel techniques for spatiotemporal data mining. This paper provides a more up-to-date survey of spatiotemporal data mining methods. Furthermore, it includes a detailed survey of parallel formulations of spatiotemporal data mining.
This paper addresses the interesting problem of processing and analyzing data in geographic information systems (GIS) to achieve a clear perspective on urban sprawl. The term urban sprawl refers to overgrowth and expansion of low-density areas with issues such as car dependency and segregation between residential versus commercial use. Sprawl has impacts on the environment and public health. In our work, spatiotemporal features related to real GIS data on urban sprawl such as population growth and demographics are mined to discover knowledge for decision support. We adapt data mining algorithms, Apriori for association rule mining and J4.8 for decision tree classification to geospatial analysis, deploying the ArcGIS tool for mapping. Knowledge discovered by mining this spatiotemporal data is used to implement a prototype spatial decision support system (SDSS). This SDSS predicts whether urban sprawl is likely to occur. Further, it estimates the values of pertinent variables to understand how the variables impact each other. The SDSS can help decision-makers identify problems and create solutions for avoiding future sprawl occurrence and conducting urban planning where sprawl already occurs, thus aiding sustainable development. This work falls in the broad realm of geospatial intelligence and sets the stage for designing a large scale SDSS to process big data in complex environments, which constitutes part of our future work.
With the advancement of GPS and remote sensing technologies, large amounts of geospatial and spatiotemporal data are being collected from various domains, driving the need for effective and efficient prediction methods. Given spatial data samples with explanatory features and targeted responses (categorical or continuous) at a set of locations, the problem aims to learn a model that can predict the response variable based on explanatory features. The problem is important with broad applications in earth science, urban informatics, geosocial media analytics and public health, but is challenging due to the unique characteristics of spatiotemporal data, including spatial and temporal autocorrelation, spatial heterogeneity, temporal non-stationarity, limited ground truth, and multiple scales and resolutions. This paper provides a systematic review on principles and methods in spatial and spatiotemporal prediction. We provide a taxonomy of methods categorized by the key challenge they address. For each method, we introduce its underlying assumption, theoretical foundation, and discuss its advantages and disadvantages. Our goal is to help interdisciplinary domain scientists choose techniques to solve their problems, and more importantly, to help data mining researchers to understand the main principles and methods in spatial and spatiotemporal prediction and identify future research opportunities.
The number and size of spatiotemporal data sets from different domains have been rapidly increasing in recent years, which demands the development of robust and fast methods to analyze and extract information from them. In this paper, we propose a network-based model for spatiotemporal data analysis called chronnet. It consists of dividing a geometrical space into grid cells represented by nodes connected chronologically. The main goal of this model is to represent consecutive recurrent events between cells with strong links in the network. This representation permits the use of network science and graph mining tools to extract information from spatiotemporal data. The chronnet construction process is fast, which makes it suitable for large data sets. In this paper, we describe how to use our model considering artificial and real data. For this purpose, we propose an artificial spatiotemporal data set generator to show how chronnets capture not just simple statistics, but also frequent patterns, spatial changes, outliers, and spatiotemporal clusters. Additionally, we analyze a real-world data set composed of global fire detections, in which we describe the frequency of fire events, outlier fire detections, and the seasonal activity, using a single chronnet.
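The construction described above (grid cells as nodes, linked in chronological order) can be illustrated with a simplified sketch. It links the cells of strictly consecutive events and counts repeated links as edge weights; the published model may differ, for example in how it windows time or treats self-loops. The `(t, x, y)` event layout, the square-grid assumption, and the function name are hypothetical.

```python
from collections import Counter

def build_chronnet(events, cell_size):
    """Build a weighted directed edge list from timestamped point events.

    events: iterable of (t, x, y) tuples (assumed layout).
    cell_size: side length of the square grid cells.
    Returns a Counter mapping (source_cell, target_cell) -> number of times
    an event in source_cell was immediately followed by one in target_cell.
    Self-loops are kept when consecutive events fall in the same cell.
    """
    ordered = sorted(events, key=lambda e: e[0])  # chronological order
    cells = [(int(x // cell_size), int(y // cell_size)) for _, x, y in ordered]
    edges = Counter()
    for src, dst in zip(cells, cells[1:]):
        edges[(src, dst)] += 1
    return edges
```

Heavily weighted edges then mark pairs of cells where events recur one after another, which is the signal the abstract describes as strong links.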
Mining natural associations from high-dimensional spatiotemporal signals has received significant attention in various fields, including biology, climatology, and financial analysis. Because correlation is widespread across these applications, approaches that take full advantage of correlation properties to extract meaningful insights from spatiotemporal signals have begun to emerge. In this paper, we study the problem of uncovering graphs that better reveal the relations behind data, with the help of the long- and short-term correlation properties of spatiotemporal signals. A spatiotemporal signal model considering both spatial and temporal relationships is first presented. In particular, a low-rank representation together with a Gaussian Markov process is adopted to describe the signals' time-correlated behavior. Next, we cast the graph learning problem as a joint low-rank component estimation and graph Laplacian inference problem. A Low-Rank and Spatiotemporal Smoothness-based graph learning method (GL-LRSS) is proposed, which introduces a spatiotemporal smoothness prior to the field of time-vertex signal analysis. By jointly exploiting the low-rank property of long-time observations and the smoothness of short-time observations, the overall performance is effectively improved. Experiments on both synthetic and real-world datasets demonstrate a significant improvement in the learning accuracy of the proposed GL-LRSS over current state-of-the-art low-rank estimation and graph learning methods.
Representation learning of pedestrian trajectories transforms variable-length timestamp-coordinate tuples of a trajectory into a fixed-length vector representation that summarizes spatiotemporal characteristics. It is a crucial technique to connect feature-based data mining with trajectory data. Trajectory representation is a challenging problem, because both environmental constraints (e.g., wall partitions) and temporal user dynamics should be meticulously considered and accounted for. Furthermore, traditional sequence-to-sequence autoencoders using maximum log-likelihood often require a dataset covering all the possible spatiotemporal characteristics to perform well. This is infeasible or impractical in reality. We propose TREP, a practical pedestrian trajectory representation learning algorithm which captures the environmental constraints and the pedestrian dynamics without the need for any training dataset. By formulating a sequence-to-sequence autoencoder with a spatial-aware objective function under the paradigm of actor-critic reinforcement learning, TREP intelligently encodes spatiotemporal characteristics of trajectories with the capability of handling diverse trajectory patterns. Extensive experiments on both synthetic and real datasets validate the high fidelity of TREP in representing trajectories.
Cloud-based services with resources to be provisioned for consumers are increasingly the norm, especially with respect to big data, spatiotemporal data mining, and application services that impose a user's agreed Quality of Service (QoS) rules or Service Level Agreement (SLA). Considering the pervasive nature of data centers and cloud systems, there is a need for real-time analytics of these systems that considers cost, utility, and energy. This work presents an overlay model of a GPU system for Data as a Service (DaaS) that provides real-time analysis of network data and of customers', investors', and users' data from data centers or cloud systems. Using a modeled layer to define a learning protocol and system, we give a custom, profitable system for DaaS on GPU. The GPU-enabled pre-processing and initial operations of the clustering model analysis are promising, as shown in the results. We examine the model on real-world data sets to model big data or spatiotemporal data mining services. We also produce results of our model with clustering, using neural-network Self-Organizing Feature Maps (SOFM or SOM), to produce a distribution of the clustering for the DaaS model. The experimental results thus far show a promising model that could enhance SLA- and/or QoS-based DaaS.
Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the spatiotemporal (ST) causal pathways for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis, (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high, and (3) the ST causal pathways are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and Bayesian learning to efficiently and faithfully identify the ST causal pathways. First, pattern mining helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduces the complexity by selecting the pattern-matched sensors as "causers". Then, Bayesian learning carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors, in three regions of China from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability.
Characterizing human mobility patterns is essential for understanding human behaviors and their interactions with the socioeconomic and natural environment, and plays a critical role in public health, urban planning, transportation engineering and related fields. With the widespread adoption of location-aware mobile devices and the continuing advancement of Web 2.0 technologies, location-based social media (LBSM) have gained widespread popularity in the past few years. With access to the locations of hundreds of millions of users, their profiles, and the contents of their social media posts, LBSM data provide a novel data source for human mobility studies. By exploiting the explicit location footprints and mining the latent demographic information implied in the LBSM data, this paper investigates the spatiotemporal characteristics of human mobility with a particular focus on the impact of demography. To serve this purpose, we first collect geo-tagged Twitter feeds posted in the conterminous United States, and organize the collection of feeds using the concept of a space-time trajectory corresponding to each Twitter user. Common human mobility measures, including detected home and activity centers, are derived for each user trajectory. We then select a subset of Twitter users that have detected home locations in the city of Chicago as a case study, and apply name analysis to the names provided in user profiles to learn the implicit demographic information of Twitter users, including race/ethnicity, gender and age. Finally, we explore the spatiotemporal distribution and mobility characteristics of Chicago Twitter users, and investigate the demographic impact by comparing the differences across three demographic dimensions (race/ethnicity, gender and age). We found that, although the human mobility measures of different demographic groups generally follow the generic laws (e.g., power law distribution), the demographic information, particularly the race/ethnicity group, significantly affects urban human mobility patterns.
Demand for sustainable mobility is particularly high in urban areas. Hence, there is a growing need to predict when people will decide to use different travel modes with an emphasis on environmentally friendly travel modes. As travel mode choice (TMC) is influenced by multiple factors, in a growing number of cases machine learning methods are used to predict travel mode choices given respondent and journey features. Typically, travel diaries are used to provide core relevant data. However, other features such as attributes of mode alternatives including, but not limited to travel times, and, in the case of public transport (PT), also walking distances have a major impact on whether a person decides to use a travel mode of interest. Hence, in this work, we propose an architecture of a software platform performing the data fusion combining data documenting journeys with the features calculated to summarise transport options available for these journeys, built environment and environmental factors such as weather conditions possibly influencing travel mode decisions. Furthermore, we propose various novel features, many of which we show to be among the most important for TMC prediction. We propose how stream processing engines and other Big Data systems can be used for their calculation. The data processed by the platform is used to develop machine learning models predicting travel mode choices. To validate the platform, we propose ablation studies investigating the importance of individual feature subsets calculated by it and their impact on the TMC models built with them. In our experiments, we combine survey data, GPS traces, weather and pollution time series, transport model data, and spatial data of the built environment. The growth in the accuracy of TMC models built with the additional features is up to 18.2% compared to the use of core survey data only.
Given the high availability of data collected by different remote sensing instruments, the data fusion of multi-spectral and hyperspectral images (HSI) is an important topic in remote sensing. In particular, super-resolution as a data fusion application using spatial and spectral domains is highly investigated because its fused images are used to improve the accuracy of classification and object tracking. On the other hand, the huge amount of data obtained by remote sensing instruments represents a key concern in terms of data storage, management and pre-processing. This paper proposes a Big Data Cloud platform using Hadoop and Spark to store, manage, and process remote sensing data. Also, a study of the chunk size parameter is presented to suggest the appropriate value for this parameter to download imagery data from Hadoop into a Spark application, based on the format of our data. We also developed an alternative approach based on Long Short Term Memory trained with different patch sizes for image super-resolution. This approach fuses hyperspectral and multispectral images. As a result, we obtain images with high spatial and high spectral resolution. The experimental results show that for a chunk size of 64k, an average of 3.5s was required to download data from Hadoop into a Spark application. The proposed model for super-resolution provides a structural similarity index of 0.98 and 0.907 for the used dataset.
Real-time traffic and sensor data from connected vehicles have the potential to provide insights that will lead to the immediate benefit of efficient management of the transportation infrastructure and related adjacent services. However, the growth of electric vehicles (EVs) and connected vehicles (CVs) has generated an abundance of CV data and sensor data that has put a strain on the processing capabilities of existing data center infrastructure. As a result, the benefits are either delayed or not fully realized. To address this issue, we propose a solution for processing state-wide CV traffic and sensor data on GPUs that provides real-time micro-scale insights in both temporal and spatial dimensions. This is achieved through the use of the Nvidia Rapids framework and the Dask parallel cluster in Python. Our findings demonstrate a 70x acceleration in the extraction, transformation, and loading (ETL) of CV data for the State of Missouri for a full day of all unique CV journeys, reducing the processing time from approximately 48 hours to just 25 minutes. Given that these results cover thousands of CVs and several thousand individual journeys with sub-second sensor data, they imply that we can model and obtain actionable insights for the management of the transportation infrastructure.
Large-scale flood inundation observations provide crucial information not only for emergency response and decision-making but also for future preparedness or damage assessment. Satellite remote sensing presents a cost-effective solution for synoptic flood monitoring, and satellite-derived flood maps provide a computationally efficient alternative to numerical flood inundation models traditionally used. While satellites do offer timely inundation information when they happen to cover an ongoing flood event, they are limited by their spatiotemporal resolution in terms of their ability to dynamically monitor flood evolution at various scales. Constantly improving access to new satellite data sources as well as big data processing capabilities has unlocked an unprecedented number of possibilities in terms of data-driven solutions to this problem, specifically using deep-learning algorithms with multi-sensor remotely sensed data. Specifically, the fusion of data from satellites, such as the Copernicus Sentinels, which have high spatial and low temporal resolution, with data from NASA’s SMAP and GPM missions, which have low spatial but high temporal resolutions could yield high-resolution flood inundation at a daily scale. Here, a convolutional neural network is trained using flood inundation maps derived from Sentinel-1 Synthetic Aperture Radar and various hydrological, topographical, and land-use based predictors for the first time, to predict high-resolution probabilistic maps of flood inundation. The performance of UNet and SegNet model architectures for this task is evaluated, using flood masks derived from Sentinel-1 and Sentinel-2, separately with 95%-confidence intervals. The Area under the Curve (AUC) of the Precision-Recall Curve (PR-AUC) is used as the main evaluation metric, due to the inherently imbalanced nature of classes in a binary flood mapping problem, with the best model delivering a PR-AUC of ~0.85. Feature importance analysis using the permutation feature importance method showed low importance of precipitation and soil moisture, due to the large spatial resolution mismatch and the consequent lack of spatial variability per output pixel. Results from this proof-of-concept study indicate that multi-sensor data fusion could yield daily scale high-resolution flood inundation maps, enabling new possibilities for flash flood and dynamic flood evolution monitoring using satellites.
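The PR-AUC metric named in the abstract above can be illustrated with a small sketch. The abstract does not say how the area is estimated, so this example assumes the common average-precision estimator (a step-wise sum over the ranked predictions); the function name `pr_auc` and the toy labels are hypothetical and not taken from the paper.

```python
from typing import Sequence


def pr_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: a step-wise estimate of the area under the
    precision-recall curve (assumed estimator, not the paper's code)."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples (e.g. pixels) from most to least confident prediction.
    # Tied scores are kept in input order; this sketch does not treat ties specially.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_pos += 1
            # Each positive raises recall by 1/positives; the strip it adds
            # to the area has height equal to the precision at this rank.
            area += (true_pos / rank) / positives
    return area


if __name__ == "__main__":
    # Toy, imbalanced example: 2 flooded pixels out of 8.
    y_true = [0, 0, 1, 0, 0, 0, 1, 0]
    y_score = [0.1, 0.4, 0.9, 0.2, 0.3, 0.05, 0.35, 0.15]
    print(round(pr_auc(y_true, y_score), 3))
```

Unlike plain accuracy, this score ignores true negatives, which is why it stays informative when the non-flooded class dominates.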
Predicting the risk map of traffic accidents is vital for accident prevention and early planning of emergency response. Here, the challenge lies in the multimodal nature of urban big data. We propose a compact neural ensemble model to alleviate overfitting in fusing multimodal features and develop some new features such as fractal measure of road complexity in satellite images, taxi flows, POIs, and road width and connectivity in OpenStreetMap. The solution is more promising in performance than the baseline methods and the solutions based on single-modality data. Visualization at the micro level reveals the visual patterns of scenes associated with high and low risk, providing lessons for future road design. At the city level, the predicted risk map is close to the ground truth and can serve as a basis for optimizing the spatial configuration of emergency response resources and warning signs. To the best of our knowledge, it is the first work to fuse visual and spatio-temporal features in traffic accident prediction, and it helps bridge the gap between data-mining-based urban computing and computer-vision-based urban perception.
… a lot of efforts to improve the value of geospatial big data as well as take advantage of its … analytics of geospatial big data, especially on interactive analytics of real-time or dynamic data…
… seas of accessible data [1]. A few critical uses of Machine learning algorithms for geospatial … As a rule, geospatial information is not just information considered in a topographical a few …
… , processing, and analysis of geospatial big data. Latest trend is … for geospatial analysis of data from various geospatial … MistGIS for analytics in mining domain from geospatial big data. …
… computing based geospatial information and geospatial web … to focus the provision of geospatial information towards the … for analytics in big data from various geospatial applications. …
… Health analytics benefit from geospatial big data by mapping disease outbreaks through crowdsourcing, assessing healthcare accessibility considering real-world visitations, and …
The present survey examines the role of big data analytics in advancing remote sensing and geospatial analysis. The increasing volume and complexity of geospatial data are driving the adoption of machine learning (ML) and artificial intelligence (AI) techniques, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, to extract meaningful insights from large, diverse datasets. These AI methods enhance the accuracy and efficiency of spatial and temporal data analysis, benefiting applications in environmental monitoring, urban planning, and disaster management. Despite these advancements, challenges related to computational efficiency, data integration, and model transparency remain. This paper also discusses emerging trends and highlights the potential of hybrid approaches, cloud computing, and edge processing in overcoming these challenges. The integration of AI with geospatial data is poised to significantly improve our ability to monitor and manage Earth systems, supporting more informed and sustainable decision-making.
… geospatial data in the scope of the four ‘science paradigms’. This paper proposes that geospatial big data … research methodology from ‘hypothesis to data’ to ‘data to questions’ and it is …
… traditional algorithms for geospatial big data processing. Traditional data mining algorithms … we concentrate on parallel clustering & outliers detection algorithms for geospatial big data. …
The convergence of big data and geospatial computing has brought challenges and opportunities to GIScience with regards to geospatial data management, processing, analysis, modeling, and visualization. This special issue highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges and meanwhile demonstrates the opportunities for using big data for geospatial applications. Crucial to the advancements highlighted here is the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models to concrete data structures and algorithms. This editorial first introduces the background and motivation of this special issue followed by an overview of the ten included articles. Conclusion and future research directions are provided in the last section.
Background: Often combined with other traditional and non-traditional types of data, geospatial sensing data have a crucial role in public health studies. We conducted a systematic narrative review to broaden our understanding of the usage of big geospatial sensing, ancillary data, and related spatial data infrastructures in public health studies. Methods: English-written, original research articles published during the last ten years were examined using three leading bibliographic databases (i.e., PubMed, Scopus, and Web of Science) in April 2022. Study quality was assessed by following well-established practices in the literature. Results: A total of thirty-two articles were identified through the literature search. We observed the included studies used various data-driven approaches to make better use of geospatial big data focusing on a range of health and health-related topics. We found the terms ‘big’ geospatial data and geospatial ‘big data’ have been inconsistently used in the existing geospatial sensing studies focusing on public health. We also learned that the existing research made good use of spatial data infrastructures (SDIs) for geospatial sensing data but did not fully use health SDIs for research. Conclusions: This study reiterates the importance of interdisciplinary collaboration as a prerequisite to fully taking advantage of geospatial big data for future public health studies.
… The goal is to put together geospatial big data repositories along with geospatial big data frameworks and create novel, sustainable, useful and cost-effective services. There have been …
… of geospatial data analysis. In this study, we developed a geospatial big data analytics engine … The geospatial big data analytics engine can increase the RDD representation ability of …
… geospatial big data using traditional approaches on local machines. In this article, a survey of geospatial big data … the main platforms for processing geospatial big data. This article is …
… In this section, we first give an example of how spatial indexes are implemented in the Hadoop MapReduce environment and then show how to generalize the described method to …
In this study, the efficiency and scalability of Geomesa and Mosaic, two well-known platforms for spatial data indexing and join operations in distributed environments, are investigated using Apache Spark and Scala/Python. The experiments investigated the impact of varying the input sizes, index utilization, and multi-node environments. Geomesa exhibits commendable performance, particularly without geohash indexing incorporation, demonstrating its efficiency in handling large-scale spatiotemporal datasets, as Geomesa requires 10.6% less time than Mosaic. By contrast, Mosaic employing the H3 indexing methodology consistently outperformed Geomesa, achieving 16.6% less time. These findings underscore the pivotal role of spatial indexing, with H3 indexing emerging as a superior technique in join operations. This research provides valuable insights into the comparative strengths and efficiencies of Geomesa and Mosaic, providing practitioners with informed perspectives for spatial data processing applications in distributed systems.
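To make the role of a cell-based spatial index in a join concrete, here is a minimal sketch of standard geohash encoding followed by a bucket join on shared cells. It is an illustration only: it does not use the GeoMesa or Mosaic APIs, it uses geohash rather than H3, and the helper names `geohash_encode` and `cell_join` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

Point = Tuple[float, float]  # (lat, lon) in degrees


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars: List[str] = []
    value, bit_count = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cell_join(left: List[Point], right: List[Point],
              precision: int = 5) -> List[Tuple[Point, Point]]:
    """Candidate pairs whose points fall in the same geohash cell.

    Filter step only: pairs that straddle a cell boundary are missed, so a
    real join would also probe neighbouring cells and refine by distance.
    """
    buckets: Dict[str, List[Point]] = defaultdict(list)
    for p in right:
        buckets[geohash_encode(p[0], p[1], precision)].append(p)
    pairs = []
    for p in left:
        for q in buckets.get(geohash_encode(p[0], p[1], precision), []):
            pairs.append((p, q))
    return pairs
```

The join cost drops because each left point is compared only with the right points in its own bucket instead of with the whole right-hand dataset; the choice of cell system and resolution then governs how many false candidates survive the filter.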
Data searching and retrieval is one of the fundamental functionalities in many Web of Things applications, which need to collect, process and analyze huge amounts of sensor stream data. The problem in fact has been well studied for data generated by sensors that are installed at fixed locations; however, challenges emerge along with the popularity of opportunistic sensing applications in which mobile sensors keep reporting observation and measurement data at variable intervals and changing geographical locations. To address these challenges, we develop the Geohash-Grid Tree, a spatial indexing technique specially designed for searching data integrated from heterogeneous sources in a mobile sensing environment. Results of the experiments on a real-world dataset collected from the SmartSantander smart city testbed show that the index structure allows efficient search based on spatial distance, range and time windows in a large time series database.
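The kind of query described above (a spatial range combined with a time window over mobile sensor readings) can be sketched with a much simpler structure. The example below is not the Geohash-Grid Tree, whose internals the abstract does not give; it assumes a plain uniform grid whose cells hold time-sorted readings, and the class name `GridTimeIndex`, the cell size, and the sample coordinates are all hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[float, float, float, float]  # (timestamp, lat, lon, value)


class GridTimeIndex:
    """Uniform grid over lat/lon; each cell keeps readings sorted by time."""

    def __init__(self, cell_deg: float = 0.01) -> None:
        self.cell_deg = cell_deg
        self._cells: Dict[Tuple[int, int], List[Reading]] = defaultdict(list)

    def _cell(self, lat: float, lon: float) -> Tuple[int, int]:
        return (int(lat // self.cell_deg), int(lon // self.cell_deg))

    def insert(self, t: float, lat: float, lon: float, value: float) -> None:
        # insort keeps each bucket ordered by timestamp (first tuple field).
        bisect.insort(self._cells[self._cell(lat, lon)], (t, lat, lon, value))

    def query(self, lat_min: float, lat_max: float, lon_min: float,
              lon_max: float, t_start: float, t_end: float) -> List[Reading]:
        r0, c0 = self._cell(lat_min, lon_min)
        r1, c1 = self._cell(lat_max, lon_max)
        hits: List[Reading] = []
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                bucket = self._cells.get((r, c))
                if not bucket:
                    continue
                # Binary search to the first reading with timestamp >= t_start.
                start = bisect.bisect_left(bucket, (t_start,))
                for t, lat, lon, value in bucket[start:]:
                    if t > t_end:
                        break  # bucket is time-sorted, nothing later matches
                    if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                        hits.append((t, lat, lon, value))
        return hits


if __name__ == "__main__":
    index = GridTimeIndex(cell_deg=0.01)
    index.insert(100.0, 43.462, -3.810, 21.5)
    index.insert(200.0, 43.463, -3.809, 22.0)
    index.insert(300.0, 43.500, -3.700, 19.0)
    print(index.query(43.46, 43.47, -3.82, -3.80, 150.0, 250.0))
```

Only the grid cells overlapping the query box are visited, and within each cell the time window is located by binary search rather than a full scan.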
… spatial data index services, it is very difficult to get the visualization of query results of their own spatial data… on using TerraFly sksOpen for spatial data indexing, query, and visualization. …
… data with spatial information, referred to as spatial data, is dramatically increasing. Cloud computing plays an important role handling large-scale data analysis, and several cloud data …
… and store big spatial data efficiently and effectively… spatial data and the key operations for spatial queries. After that, we have summarized the key methods and strategies for spatial data …
… in supporting spatial data as it deals with spatial data in the same way as non-spatial data. … In the rest of this section, we give an overview of spatial indexing in SpatialHadoop (Section …
… of extremely large volumes of spatial data has led to many … to indexing and performing key analytics on spatial data that … that it combines spatial indexing, data load balancing, and data …
Managing and manipulating uncertainty in spatial databases are important problems for … -based method to model and index uncertain spatial data is proposed. In this scheme, each …
… Our approach is not suited for spatial data sets with only minimal variations in object scales or more large-scale than small-scale objects. With a scale dimension, the index structure …
There has been a recent marked increase in the amount of spatial data collected by smart phones, space telescopes, and medical devices, among others. The increased volume has brought into focus the need for specialized systems to handle big spatial data. The Era of Big Spatial Data: A Survey summarizes the state-of-the-art in this area. It classifies the existing work by considering six aspects of big spatial data systems, namely, approach, architecture, language, indexing, querying, and visualization. It describes each of these components in detail and gives examples of how they are implemented in existing systems. It also provides the reader with case studies of real applications that make use of these systems to provide services for end users. The Era of Big Spatial Data: A Survey is an invaluable reference for researchers or practitioners interested in getting a handle on the state of the art for Big Spatial Data storage and querying.
… In order to solve the drawbacks of spatial data storage in common Cloud … , indexing, accessing and managing spatial data in the Cloud environment. An interoperable spatial data object …
With the widespread use of GPS-equipped smartphones and Internet of Things devices, a huge amount of data with location information is being generated at an unprecedented rate. To gain a deeper insight into such a plethora of spatial data, scientists and engineers are widely using spatial queries for their big data applications. However, because of not only the massive spatial data size but also the complexity of spatial query processing, they are struggling to efficiently process the spatial queries. In this paper, we propose lightweight and scalable indexing and querying services for big spatial data stored in distributed storage systems or graph-based systems. Our spatial services have several advantages over existing approaches. First, our services can be easily applied to existing storage systems or graph-based models without modifying the internal implementation of existing systems/models. Second, our services achieve high pruning power by efficiently selecting only relevant spatial objects based on a simple yet effective filter. Third, our services support a customizable and easy-to-use control of index data size by adjusting the precision of indexed geometries. Lastly, our services support efficient updates of spatial data. Our experimental results using real-world datasets validate the effectiveness and efficiency of our spatial services.
This paper proposes a novel data indexing scheme, the distributed access pattern R-tree (DAPR-tree), for spatial data retrieval in a distributed computing environment. As compared to traditional distributed indexing schemes, the DAPR-tree introduces the data access patterns during the indexing utilization stage so that a more balanced indexing structure can be provided for spatial applications (e.g. Digital Earth data warehouse). In this new indexing scheme, (a) an indexing penalty matrix is proposed by considering the balance of data number, topology and access load between different indexing nodes; (b) an ‘access possibility’ element is integrated to a classic ‘Master-Client’ structure for a distributed indexing environment; and (c) an indexing algorithm for the DAPR-tree is provided for index implementations. By using a duplicate of the official GEOSS Clearinghouse system as a case study, the DAPR-tree was evaluated in a number of scenarios. The results show that, by utilizing data access patterns, our indexing schemes generally outperform traditional distributed indices by around 9%. Finally, we discuss the applicability of the DAPR-tree and document DAPR-tree shortcomings to encourage researchers pursuing related topics in Big Data indexing for Digital Earth and other geospatial initiatives.
… In this paper, we examined efficient indexing strategies for non-uniform spatial data. We observed that the performance of a tree-based index depends on two factors - the height of the …
… work, we have focused on spatial data. We have presented the SPATIALP2P framework for handling spatial data in a P2P network. SPATIALP2P provides efficient storing, indexing, and …
As the Internet of Things (IoT) systems gain in popularity, an increasing number of Big Data sources are available. Ranging from small sensor networks designed for household use to large fully automated industrial environments, the Internet of Things systems create billions of measurements each second making traditional storage and indexing solutions obsolete. While research around Big Data has focused on scalable solutions that can support the datasets produced by these systems, the focus has been mainly on managing the volume and velocity of these data, rather than providing efficient solutions for their retrieval and analysis. A key characteristic of these data, which is, more often than not, overlooked, is the spatial information that can be used to integrate data from multiple sources and conduct multidimensional analysis of the collected information. We present here the solutions currently available for the storage and indexing of spatial datasets produced by the IoT systems and we discuss their applicability in real world scenarios.
… We have chosen Spatial Hybridtrees (SHtrees) [12] for the indexing data structure, which provides the same basic functionality as R-trees [2]. SH-trees have been shown to provide …
Explosive growth in geospatial and temporal data as well as the emergence of new technologies emphasize the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful patterns from large spatiotemporal databases. It has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. The complexity of spatiotemporal data and intrinsic relationships limits the usefulness of conventional data science techniques for extracting spatiotemporal patterns. In this survey, we review recent computational techniques and tools in spatiotemporal data mining, focusing on several major pattern families: spatiotemporal outlier, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change detection. Compared with other surveys in the literature, this paper emphasizes the statistical foundations of spatiotemporal data mining and provides comprehensive coverage of computational approaches for various pattern families. We also list popular software tools for spatiotemporal data analysis. The survey concludes with a look at future research needs. (ISPRS Int. J. Geo-Inf. 2015, 4, 2307)
Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differ from relational data, for which computational approaches have been developed in the data-mining community for multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data-mining community. In this article, we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data-mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data-mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data-mining problems in each of these categories.
Spatiotemporal data mining (STDM) discovers useful patterns from the dynamic interplay between space and time. Several available surveys capture STDM advances and report a wealth of important progress in this field. However, STDM challenges and problems are not thoroughly discussed and presented in articles of their own. We attempt to fill this gap by providing a comprehensive literature survey on state-of-the-art advances in STDM. We describe the challenging issues and their causes and open gaps of multiple STDM directions and aspects. Specifically, we investigate the challenging issues in regards to spatiotemporal relationships, interdisciplinarity, discretisation, and data characteristics. Moreover, we discuss the limitations in the literature and open research problems related to spatiotemporal data representations, modelling and visualisation, and comprehensiveness of approaches. We explain issues related to STDM tasks of classification, clustering, hotspot detection, association and pattern mining, outlier detection, visualisation, visual analytics, and computer vision tasks. We also highlight STDM issues related to multiple applications including crime and public safety, traffic and transportation, earth and environment monitoring, epidemiology, social media, and Internet of Things.
Many scientific fields show great interest in the extraction and processing of spatiotemporal data, such as medicine with an emphasis on epidemiology and neurology, geology, social sciences, meteorology, and a great interest is also observed in the study of transport. Spatiotemporal data differ significantly from spatial data, since spatiotemporal data refer to measurements, which take into account both the place and the time in which they are received, with their respective characteristics, while spatial data refer to and describe information related only to place. The innovation brought about by spatiotemporal data mining has caused a revolution in many scientific fields, and this is because through it we can now provide solutions and answers to complex problems, as well as provide useful and valuable predictions, through predictive learning. However, combining time and place in data mining presents significant challenges and difficulties that must be overcome. Spatiotemporal data mining and analysis is a relatively new approach to data mining which has been studied more systematically in the last decade. The purpose of this article is to provide a good introduction to spatiotemporal data, and through this detailed description, we attempt to introduce descriptive logic and gain a complete knowledge of these data. We aim to introduce a new way of describing them, aiming for future studies, by combining the expressions that arise by type of data, using descriptive logic, with new expressions, that can be derived, to describe future states of objects and environments with great precision, providing accurate predictions. In order to highlight the value of spatiotemporal data, we proceed to give a brief description of ST data in the introduction. 
We describe the relevant work carried out to date, the types of spatiotemporal (ST) data, their properties and the transformations that can be made between them, attempting, to a small extent, to introduce constraints and rules using descriptive logic by data type when initially presenting the ST data. The data snapshots by species and similarities between the cases are then described. We describe methods, introducing clustering, dynamic ST clusters, predictive learning, pattern mining frequency, and pattern emergence, and problems such as anomaly detection, identifying time points of changes in the behavior of the observed object, and development of relationships between them. We describe the application of ST data in various fields today, as well as the future work. Finally, we conclude that the representation and study of spatiotemporal data can, in combination with other properties which accompany all natural phenomena and through their appropriate processing, lead to safe conclusions regarding the study of problems, and to great precision in the extraction of predictions by accurately determining future states of an environment or an object. Thus, the importance of ST data makes them particularly valuable today in various scientific fields, and their extraction is a particularly demanding challenge for the future.
… These patterns and trends can be used for understanding spatiotemporal phenomena and … mining. Depending on kind of knowledge to be mined, various spatiotemporal data mining …
… big data. In this paper, we review major spatial data mining algorithms by closely looking at the … and I/O requirements and allude to few applications dealing with big spatial data. …
… on spatial data. … data mining, especially in the areas of prediction and classification, outlier detection, spatial colocation rules, and clustering techniques. Spatiotemporal (ST) data mining …
… mining will not have the same powerful impact that it has had on fields such as … to spatio-temporal data mining (STDM) as a collection of methods that mine the data’s spatio-temporal …
… , apart from unveiling important information to the data analyst, can facilitate data … We define the spatiotemporal periodic pattern mining problem and propose an effective and fast mining …
… However, we also include here papers that discuss new ways of viewing data mining … In general, most have not focussed specifically on temporal, spatial or spatio-temporal data mining …
… vacy of spatiotemporal data, we need to develop privacy-preserving data mining techniques. … as normal, nonsequential, tabular data as spatiotemporal observations of an object are not …
… data mining techniques to accommodate space and time. The bibliography is organized into contributions for temporal, spatial and spatio-temporal data mining. … Since data mining is an …
… the discovery of spatial and spatio-temporal knowledge different with an emphasis on geographical data. We discuss how the new opportunities of data mining can be integrated into a …
Earthquake prediction is a very important problem in seismology, the success of which can potentially save many human lives. Various kinds of technologies have been proposed to address this problem, such as mathematical analysis, machine learning algorithms like decision trees and support vector machines, and precursor signal studies. Unfortunately, they usually do not have very good results due to the seemingly dynamic and unpredictable nature of earthquakes. In contrast, we notice that earthquakes are spatially and temporally correlated because of the crust movement. Therefore, earthquake prediction for a particular location should not be based only on the historical data of that location, but on the historical data of a larger area. In this paper, we employ a deep learning technique called long short-term memory (LSTM) networks to learn the spatio-temporal relationship among earthquakes in different locations and make predictions by taking advantage of that relationship. Simulation results show that the LSTM network with two-dimensional input developed in this paper is able to discover and exploit the spatio-temporal correlations among earthquakes to make better predictions than before.
Urban big data fusion creates substantial value for urban computing in solving urban problems. In recent years, various deep learning models and algorithms have been proposed to extract knowledge from urban big data. To clarify the methodologies of urban big data fusion based on deep learning (DL), this paper classifies them into three categories: DL-output-based fusion, DL-input-based fusion, and DL-double-stage-based fusion. These methods use deep learning to learn feature representations from multi-source big data. Each category of fusion methods is then introduced with examples. The difficulties of handling urban big data, and ideas for addressing them, are also discussed.
… research fields data fusion has been widely applied when efficiently using different types of data (… original research papers dealing with data fusion and utilizing statistical, geostatistical, …
Enormous heterogeneous sensory data are generated in the Internet of Things (IoT) for various applications. These big data are characterized by additional features related to IoT, including trustworthiness, timing and spatial features. This reveals more perspectives to consider while processing, posing vast challenges to traditional data fusion methods at different fusion levels for collection and analysis. In this paper, an IoT-based spatiotemporal data fusion (STDF) approach for low-level data in–data out fusion is proposed for real-time spatial IoT source aggregation. It grants optimum performance through leveraging traditional data fusion methods based on big data analytics while exclusively maintaining the data expiry, trustworthiness and spatial and temporal IoT data perspectives, in addition to the volume and velocity. It applies cluster sampling for data reduction upon data acquisition from all IoT sources. For each source, it utilizes a combination of k-means clustering for spatial analysis and Tiny AGgregation (TAG) for temporal aggregation to maintain spatiotemporal data fusion at the processing server. STDF is validated via a public IoT data stream simulator. The experiments examine diverse IoT processing challenges in different datasets, reducing the data size by 95% and decreasing the processing time by 80%, with an accuracy level up to 90% for the largest used dataset.
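The spatial-then-temporal pipeline described above can be illustrated with a minimal sketch. It is not the STDF implementation: the function names `kmeans` and `fuse`, the `(x, y, t, value)` reading layout, the deterministic centroid initialization, and the choice of a windowed mean as the aggregate are all assumptions made for illustration. The temporal step only mimics the TAG idea of carrying a mergeable partial state, here a `(sum, count)` pair.

```python
from collections import defaultdict
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on 2-D points, initialized from the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


def fuse(readings, k, window):
    """Fuse (x, y, t, value) readings into {(cluster, window_index): mean value}."""
    labels, _ = kmeans([(x, y) for x, y, _, _ in readings], k)
    # Partial state per (cluster, time window): [sum, count], mergeable by addition.
    partial = defaultdict(lambda: [0.0, 0])
    for (_, _, t, value), cluster in zip(readings, labels):
        state = partial[(cluster, int(t // window))]
        state[0] += value
        state[1] += 1
    return {key: total / count for key, (total, count) in partial.items()}


# Hypothetical readings: two sensors near the origin, two near (5, 5).
readings = [
    (0.0, 0.0, 1, 20.0),
    (5.0, 5.0, 1, 30.0),
    (0.1, 0.2, 2, 22.0),
    (5.2, 4.9, 12, 34.0),
]
fused = fuse(readings, k=2, window=10)
```

With a window of 10 time units, the two readings near the origin collapse into one fused value, while the two near (5, 5) stay separate because they fall into different windows. The data expiry and trustworthiness handling mentioned in the abstract is not modeled here.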
… Spatial data fusion: classification and decomposition The term data fusion is widely used in electronic data … In this article, spatial data fusion refers to the synthesis of spatial data from …
… To expound the formation and developing trends of multi-resources spatial data fusion … multi-resources spatial data fusion methods, and foresees the prospects of data fusion in big data …
Global maps of total-column carbon dioxide (CO2) mole fraction (in units of parts per million) are important tools for climate research since they provide insights into the spatial distribution of carbon intake and emissions as well as their seasonal and annual evolutions. Currently, two main remote sensing instruments for total-column CO2 are the Orbiting Carbon Observatory-2 (OCO-2) and the Greenhouse gases Observing SATellite (GOSAT), both of which produce estimates of CO2 concentration, called profiles, at 20 different pressure levels. Operationally, each profile estimate is then convolved into a single estimate of column-averaged CO2 using a linear pressure weighting function. This total-column CO2 is then used for subsequent analyses such as Level 3 map generation and colocation for validation. In principle, total-column CO2 in these applications may be more efficiently estimated by making optimal estimates of the vector-valued CO2 profiles and applying the pressure weighting function afterwards. These estimates will be more efficient if there is multivariate dependence between CO2 values in the profile. In this article, we describe a methodology that uses a modified Spatial Random Effects model to account for the multivariate nature of the data fusion of OCO-2 and GOSAT. We show that multivariate fusion of the profiles has improved mean squared error relative to scalar fusion of the column-averaged CO2 values from OCO-2 and GOSAT. The computations scale linearly with the number of data points, making it suitable for the typically massive remote sensing datasets. Furthermore, the methodology properly accounts for differences in instrument footprint, measurement-error characteristics, and data coverages.
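The argument for fusing profiles before applying the pressure weighting can be written out as a short worked equation. The notation is assumed here rather than taken from the article: $\hat{\mathbf{x}}$ is a fused estimate of the 20-level CO2 profile, $\mathbf{h}$ is the vector of linear pressure weights, and $\boldsymbol{\Sigma}$ is the covariance of the estimation error of $\hat{\mathbf{x}}$.

```latex
\hat{X}_{\mathrm{CO_2}} = \mathbf{h}^{\top}\hat{\mathbf{x}} = \sum_{i=1}^{20} h_i\,\hat{x}_i,
\qquad
\operatorname{Var}\!\left(\hat{X}_{\mathrm{CO_2}}\right)
  = \mathbf{h}^{\top}\boldsymbol{\Sigma}\,\mathbf{h}
  = \sum_{i=1}^{20}\sum_{j=1}^{20} h_i\,h_j\,\Sigma_{ij}.
```

The off-diagonal terms $\Sigma_{ij}$ carry the dependence between pressure levels. Fusing the column-averaged scalars directly has no access to these terms, which is one way to read the abstract's claim that profile-level fusion is more efficient when the levels are multivariately dependent.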
Anthropogenic heat emission (AHE) influences the local energy balance and intensifies the urban heat island (UHI) effect. An accurate calculation of the AHE can improve the precision of UHI predictions. However, reliable AHE calculations with high temporal and spatial resolution are still lacking in domestic research. Therefore, this study proposes an approach to estimate the dynamic AHE by integrating multi-source Internet big data and high-precision urban spatial data. First, we quantified the dynamic distribution of residents’ trajectories by tracking multi-stage Internet geographic location data and real-time traffic conditions in Xi’an city, supplemented by on-site drone monitoring. Then the parameters of cooling and the thermal load coefficient of building emissions, personnel cooling loads, and traffic densities were introduced. Finally, the temporal and spatial dynamics of the AHE were revealed. The results showed that the AHE varies over a wide range. The diurnal AHE values of 64% of the blocks ranged from 93 to 498 W/m², and in some core commercial areas the value could exceed 1000 W/m² during peak periods. Compared with previous research, this study dynamically evaluates the temporal and spatial heterogeneity of the AHE under different emission scenarios with a short update cycle and high spatial resolution.
… spatial information and causing scale biases, particularly in landscapes with strong heterogeneity. Combining big data mining and fusion … ) land functions with spatial detail. The scheme …
Rapid urban development has led to great changes in urban spatial structure, so analyzing polycentric urban spatial structures is important for understanding these changes. In order to accurately evaluate the polycentric spatial structure of urban agglomerations and judge the differences between the actual development situation and the overall planning of urban agglomerations, this study proposes a new method to identify the polycentric spatial structure of urban agglomerations in the Pearl River Delta based on the fusion of nighttime light (NTL) data, point of interest (POI) data, and Tencent migration data (TMG). In the first step, the NTL, POI, and TMG data are fused via wavelet transform. In the second step, Anselin local Moran’s I (LMI) and geographically weighted regression (GWR) are used to identify the main centers and subcenters, respectively. In the third step, the accuracy of the results is further verified and discussed in the context of overall planning. The results show that the accuracy of urban polycenter identification via LMI and GWR after data fusion was 92.84%, with a Kappa value of 0.8971, both higher than for polycenter identification via the traditional relative threshold. Comparing the identification results with the overall planning shows, first, that the fusion of multi-source big data can help to accurately evaluate the polycentric spatial structure within the urban agglomeration and, second, that the fusion of dynamic and static data can help identify the polycentric spatial structure of urban space more accurately. This study can therefore provide a new design for urban polycentric spatial structures and a reliable reference for the spatial optimization of urban agglomerations and the formulation of regional spatial development policies.
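The local Moran's I statistic used in the second step can be illustrated with a small sketch. It follows the standard Anselin definition, I_i = (x_i - mean) / m2 * sum_j w_ij (x_j - mean), with row-standardized binary weights. The function name `local_morans_i`, the neighbor dictionary, and the five-cell strip of values are hypothetical; the abstract does not state which spatial weights or significance test the study used.

```python
def local_morans_i(values, neighbors):
    """Anselin's local Moran's I with row-standardized binary weights.

    values: list of observations, one per spatial unit.
    neighbors: dict mapping a unit index to the list of its neighbor indices.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the deviations
    scores = []
    for i in range(n):
        nbrs = neighbors[i]
        # Row standardization: the spatial lag is the mean deviation of the neighbors.
        lag = sum(dev[j] for j in nbrs) / len(nbrs)
        scores.append(dev[i] / m2 * lag)
    return scores


# Hypothetical fused intensity along a 1-D strip of five cells.
values = [10.0, 9.0, 2.0, 1.0, 1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = local_morans_i(values, neighbors)
```

A positive score at a cell whose value is above the mean marks a high-high cluster, the kind of location a center-detection step would flag. A negative score, as at the middle cell here, marks a unit that differs from its neighbors. In practice the scores are usually paired with a permutation-based significance test, which this sketch omits.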
An urban agglomeration is a continuous urban spread that generally comprises a main city at the core and its adjoining growth areas. These agglomerations are studied using different concepts, theories, models, criteria, indices, and approaches, in which population distribution and its associated characteristics are the main parameters. Given the difficulties in accurately demarcating these agglomerations, novel methods and approaches have emerged in recent years. The use of geospatial big data sources to demarcate urban agglomerations is one of them. This promising method, however, has not yet been studied widely. This study explores a multisource open geospatial big data fusion approach to demarcating the urban agglomeration footprint. The paper uses the Southern Coastal Belt of Sri Lanka as the testbed to demonstrate the capabilities of this novel approach. The methodological approach considers parameters related to both the urban form and the urban functions of cities in defining the urban agglomeration footprint. It employs near-real-time data in defining the urban function-related parameters. The results showed that employing urban form and function-related parameters together delivers more accurate demarcation outcomes than using a single parameter. Hence, a multisource geospatial big data fusion approach to demarcating the urban agglomeration footprint can inform urban authorities in developing appropriate policies for managing urban growth.
Objective: In recent years, several techniques and models have been used to retrieve significant information from the urban big data of smart cities. This work aims to develop a data fusion-based traffic congestion control system for smart cities using a deep learning model. Methodology: A hybrid model based on the convolutional neural network (CNN) and long short-term memory (LSTM) architectures is used for region-based traffic flow prediction in smart cities. The CNN is used for the classification of spatial data and the LSTM for temporal data. Conclusion: The experiments used the CityPulse Traffic and CityPulse Pollution datasets and measured root mean square error (RMSE), time consumption, and accuracy. An RMSE of 49 and the highest accuracy, 92.3%, among the compared baseline models demonstrate the applicability of the proposed model to region-based traffic flow prediction in smart cities.
… 1.1 GOSAT and AIRS Data In this article, we carry out data fusion on ACOS data and AIRS data over the contiguous United States during the Boreal summer of 2010. The ACOS and …
As one of the cyber–physical–social systems that play a key role in people's daily activities, a smart city produces a considerable amount of industrial data associated with transportation, healthcare, business, social activities, and so on. Effectively and efficiently fusing and mining such data from multiple sources can contribute much to the development and improvement of various smart city applications. However, the industrial data collected from the smart city are often sensitive and contain private user information such as spatial–temporal context. It is therefore becoming necessary to protect the user privacy hidden in smart city data before these data are integrated for further mining, analysis, and prediction. Because of the inherent tradeoff between data privacy and data availability, it is often challenging to protect users’ context privacy while guaranteeing accurate data analysis and prediction results after data fusion. To address this challenge, this article puts forward a novel privacy-aware data fusion and prediction approach for the smart city industrial environment, based on the classic locality-sensitive hashing technique. Finally, the proposal is evaluated by a set of experiments on a real-world dataset. Experimental results show that the approach achieves better prediction performance than competing approaches.
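The abstract names only "the classic locality-sensitive hashing technique", so the following is a generic sketch of one common LSH family, random-hyperplane hashing for cosine similarity, and not the article's scheme. The helper names, the 4-dimensional user vectors, and the 16-bit signature length are all assumptions. The idea it illustrates is that parties can exchange short signatures rather than raw context vectors while similar vectors still land on similar signatures.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(h * v for h, v in zip(plane, vector)) >= 0)
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(dim=4, n_bits=16)

# Hypothetical per-user context vectors held by different data owners.
user_a = [5.0, 1.0, 0.0, 3.0]
user_a_scaled = [10.0, 2.0, 0.0, 6.0]   # same direction as user_a
user_b = [-5.0, -1.0, 0.0, -3.0]        # opposite direction

sig_a = lsh_signature(user_a, planes)
sig_a_scaled = lsh_signature(user_a_scaled, planes)
sig_b = lsh_signature(user_b, planes)
```

Vectors pointing in the same direction receive identical signatures, and the Hamming distance between signatures grows with the angle between the underlying vectors. A real deployment would also need to consider how much the signatures themselves leak, which this sketch does not address.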
… We propose a spatial data-fusion methodology that is able to take advantage of two (or potentially more) large remote sensing datasets with the exponential family of distributions. Our …
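This report organizes the spatial big data literature along five core dimensions: surveys and research paradigms, storage and indexing infrastructure, data mining and intelligent modeling, multi-source data fusion applications, and privacy and security. This structure covers the full technical chain from low-level storage and access through high-level applications to security and compliance, and it reflects the field's profound shift from early spatial index optimization toward dynamic spatio-temporal knowledge discovery that integrates artificial intelligence and high-performance computing.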