3.2. Data Modeling, Storage and Management
The spatial data captured by the data acquisition layer needs to be modeled and then stored in a data management system. Broadly, in the geospatial domain, spatial data models fall into two groups: raster data models and vector data models [37]. In raster data models, the whole space is divided into grid cells, and each cell is represented as a pixel that holds the cell's information. A matrix data structure (or an array of grid cells) is generally used for storing raster data. This is the most common data model in the GIS community. Continuous geospatial data such as satellite images and thematic maps are generally modeled using the raster data model. In contrast, in vector data models, spatial entities are represented using primitive geometric data types such as points, lines, and polygons. For example, the location of a school is represented as a point, a suburb can be represented as a polygon, and a road network can be represented as a graph. Each object can be associated with non-spatial attributes; e.g., a school has attributes such as name, ranking, number of students, and category. Vector modeling is the most commonly used paradigm in the spatial database community.
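To make the contrast between the two models concrete, the following is a minimal Python sketch, not taken from the paper. The grid values, the origin, the cell size, the `Feature` class, and the example school and suburb are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Raster model: space is split into grid cells and each cell holds one value.
# Hypothetical 3 x 4 elevation grid (metres); row 0 is the northern edge.
raster = [
    [12.0, 13.5, 14.0, 15.2],
    [11.8, 12.9, 13.7, 14.9],
    [11.1, 12.0, 13.0, 14.1],
]
ORIGIN_X, ORIGIN_Y = 100.0, 200.0  # assumed top-left corner in map units
CELL_SIZE = 10.0                   # assumed: each cell covers 10 x 10 map units


def cell_value(x: float, y: float) -> float:
    """Return the raster value of the cell containing map coordinate (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("coordinate lies outside the raster extent")
    return raster[row][col]


# Vector model: each entity is a geometry plus non-spatial attributes.
@dataclass
class Feature:
    geometry_type: str   # "Point", "LineString" or "Polygon"
    coordinates: list    # a coordinate pair, or a list of pairs
    attributes: dict = field(default_factory=dict)


school = Feature("Point", [115.0, 185.0],
                 {"name": "Example School", "students": 420})
suburb = Feature("Polygon",
                 [[100.0, 170.0], [140.0, 170.0], [140.0, 200.0], [100.0, 200.0]],
                 {"name": "Example Suburb"})

# The two models can be combined: sample the raster at a vector point.
elevation_at_school = cell_value(*school.coordinates)
print(school.attributes["name"], elevation_at_school)
```

The raster lookup is pure index arithmetic, while the vector features carry explicit geometry and attributes, which is why the two models suit different kinds of queries.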
A key component of an SDT is the data management system, as it needs to handle a wide range of data with varying levels of spatial and temporal granularity. Over the years, many spatial data management techniques and algorithms have been introduced to handle various spatial data types such as points, lines, polygons, and time series [38].
To handle large spatial data types and operations, traditional relational database management systems (RDBMSs) have been extended to support spatial data. For example, Oracle Spatial [39], Microsoft SQL Server [40], and PostgreSQL/PostGIS [41] support many important spatial data types and some key spatial operations. PostgreSQL also supports spatial raster data (e.g., images) [42]. As reported in recent studies [43, 44], PostgreSQL has the best spatial support in terms of spatial data types, queries, and scalability. It supports 2D and 3D geometry types such as points and polylines, and various queries such as joins and k-nearest neighbors (kNN). Other popular spatial RDBMSs (e.g., Oracle Spatial and SQL Server) are also continuously adopting new features to make them suitable for the era of big spatial data. Recent research [45] also discusses how 3D city models can be managed and stored using these spatial RDBMSs.
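As an illustration of what a kNN query computes, here is a brute-force Python sketch. It is not how any of the systems above implement the query; a spatial DBMS would typically use a spatial index rather than a linear scan. The function name `knn` and the sample points are hypothetical.

```python
import heapq
import math


def knn(points, query, k):
    """Return the k entries of `points` closest to `query`.

    `points` is a list of (label, (x, y)) tuples in planar coordinates.
    This is a linear scan, shown only to define the result of a kNN query.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p[1], query))


# Hypothetical school locations in planar map units.
schools = [
    ("A", (0.0, 0.0)),
    ("B", (3.0, 4.0)),
    ("C", (1.0, 1.0)),
    ("D", (10.0, 10.0)),
]

nearest_two = knn(schools, (0.0, 0.0), 2)
print(nearest_two)
```

The scan costs time linear in the number of points per query, which is the cost that index structures in spatial databases are designed to avoid.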
Unstructured data such as text, documents, and graphs augment spatial data in many applications. NoSQL (Not-Only-SQL) database systems [46] have become very popular for handling large volumes of unstructured data of various types due to their schema-free and scalable nature. Many NoSQL database systems have recently been extended to handle spatial data types and operations [47, 48, 49].
For example, MongoDB, a document database, provides support for basic GeoJSON objects (e.g., points, linestrings, and polygons). Similarly, Oracle NoSQL, a key-value store, supports common geometry objects and a set of spatial operators (e.g., intersect, inside, and near) for processing spatial data [47]. Neo4j, a popular graph database management system, has a spatial extension called Neo4j Spatial [48] that can store, index, and process spatial data.
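The semantics of operators such as inside and near can be sketched in plain Python over GeoJSON-style dictionaries. This is only an illustration: the function names `inside`, `near`, and `haversine_m` are hypothetical and are not the API of MongoDB, Oracle NoSQL, or Neo4j Spatial. The sketch also assumes simple polygons without holes and treats longitude/latitude as planar for the containment test.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, a common approximation


def inside(point, polygon):
    """Ray-casting test: is a GeoJSON Point inside a GeoJSON Polygon?

    Only the exterior ring is checked (holes are ignored in this sketch).
    """
    x, y = point["coordinates"]
    ring = polygon["coordinates"][0]
    result = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) pairs."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def near(point_a, point_b, max_distance_m):
    """Are two GeoJSON Points within max_distance_m of each other?"""
    return haversine_m(point_a["coordinates"], point_b["coordinates"]) <= max_distance_m


# GeoJSON stores coordinates as [longitude, latitude].
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]}
p_in = {"type": "Point", "coordinates": [2, 2]}
p_out = {"type": "Point", "coordinates": [5, 2]}

print(inside(p_in, square), inside(p_out, square))
print(near(p_in, p_out, 500_000))
```

A production system would evaluate such predicates with the help of spatial indexes and would handle holes, multi-geometries, and edge cases that this sketch leaves out.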
In summary, although both RDBMS and NoSQL database systems are continuously adopting spatial features, there is still no comprehensive evaluation of how these systems perform for different spatial operations on complex spatial objects such as 3D buildings. For SDTs, once the underlying database is built, there will be significantly more read operations than write operations, so strict adherence to the ACID properties of transactional database systems may not be needed. A combination of RDBMS and NoSQL systems can therefore be the best option for handling different use-case scenarios.
Apart from RDBMS and NoSQL systems, there are some popular file formats such as GeoJSON, Shapefile, GPX, and Keyhole Markup Language (KML) for storing and sharing geospatial data [50]. Among them, GeoJSON is the most common format for storing and sharing spatial objects in the form of points, lines, polygons, etc. Similarly, the Shapefile format was created by Esri, the company that develops the ArcGIS software, for storing and sharing spatial objects. The GPX format is an open XML-based standard for storing a collection of timestamps and latitude/longitude coordinates of moving objects. The KML format is commonly used to store two- and three-dimensional geographic data.
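To show how two of these formats relate, the sketch below reads track points from a small GPX 1.1 snippet and re-expresses them as a GeoJSON LineString feature using only the Python standard library. The sample track, the function name `gpx_to_geojson`, and the choice to keep timestamps in a `times` property are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical two-point track of a moving object.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="-37.9105" lon="145.1362"><time>2023-01-01T00:00:00Z</time></trkpt>
    <trkpt lat="-37.9110" lon="145.1370"><time>2023-01-01T00:00:10Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def gpx_to_geojson(gpx_text):
    """Convert all GPX track points into one GeoJSON LineString feature."""
    root = ET.fromstring(gpx_text)
    coords, times = [], []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        # GPX stores lat/lon as attributes; GeoJSON expects [lon, lat].
        coords.append([float(pt.get("lon")), float(pt.get("lat"))])
        times.append(pt.findtext("gpx:time", default=None, namespaces=GPX_NS))
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coords},
        "properties": {"times": times},  # assumed property name
    }


feature = gpx_to_geojson(SAMPLE_GPX)
print(json.dumps(feature, indent=2))
```

Note the axis order: GPX records latitude and longitude as separate attributes, while GeoJSON positions list longitude first, which is a frequent source of errors when converting between formats.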
Authors:
(1) Mohammed Eunus Ali, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, Dhaka, 1000, Bangladesh;
(2) Muhammad Aamir Cheema, Faculty of Information Technology, Monash University, 20 Exhibition Walk, Clayton, 3164, VIC, Australia;
(3) Tanzima Hashem, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, Dhaka, 1000, Bangladesh;
(4) Anwaar Ulhaq, School of Computing, Charles Sturt University, Port Macquarie, 2444, NSW, Australia;
(5) Muhammad Ali Babar, School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, 5005, SA, Australia.