Google Cloud recently introduced a preview of Bigtable tiered storage. The new feature allows developers to manage both hot and cold data within a single Bigtable instance, optimizing costs while keeping all data accessible.
With the new capability, developers can set an age-based tiering policy on a table, with a minimum age threshold of 30 days. The service then automatically moves data between the SSD and infrequent-access tiers, without requiring manual exports of infrequently accessed data.
Anton Gething, a senior product manager at Google, and Derek Lee, a software engineer at Google, explain how tiered storage reduces operational overhead and eliminates manual data migration. They write:
This feature works with Bigtable’s autoscaling to optimize your Bigtable instance resource utilization. Moreover, data in the infrequent access storage tier is still accessible alongside existing SSD storage through the same Bigtable API.
Data is moved to the infrequent-access tier based on the configured age threshold: when a cell's timestamp exceeds that age, the cell is moved from the SSD tier to the infrequent-access tier. Movement depends solely on the cell's timestamp and is unaffected by how often the data is read.
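The policy can be pictured as a pure function of cell age. The sketch below only illustrates the behavior described above; the cell representation, function name, and threshold check are assumptions, not Bigtable's implementation or client API.

```python
from datetime import datetime, timedelta, timezone

def tier_for_cell(cell_timestamp: datetime, age_threshold: timedelta,
                  now: datetime) -> str:
    """Return the storage tier implied by an age-based tiering policy.

    Placement depends only on the cell's timestamp, never on how often
    the cell is read; Bigtable enforces a minimum threshold of 30 days.
    """
    if age_threshold < timedelta(days=30):
        raise ValueError("age threshold must be at least 30 days")
    age = now - cell_timestamp
    return "infrequent-access" if age > age_threshold else "ssd"

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fresh = datetime(2025, 5, 20, tzinfo=timezone.utc)   # 12 days old
stale = datetime(2025, 1, 1, tzinfo=timezone.utc)    # ~5 months old
print(tier_for_cell(fresh, timedelta(days=30), now))  # ssd
print(tier_for_cell(stale, timedelta(days=30), now))  # infrequent-access
```

Note that a cell rewritten with a fresh timestamp would immediately classify as SSD again, which is exactly the manual re-tiering path described later in the article.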
Bigtable is a key-value and wide-column store on Google Cloud, providing a managed, low-latency, Cassandra- and HBase-compatible NoSQL database. The service is designed for fast access to structured, semi-structured, or unstructured data. Among common use cases for the service, the cloud provider suggests time-series data from sensors, equipment, and operations in industries such as manufacturing and automotive.
The documentation highlights that, to achieve optimal SSD performance and fully benefit from tiered storage, developers should add timestamp range filters to queries that only need data residing on the SSD tier.
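In practice this means deriving a lower timestamp bound from the policy's age threshold and attaching it to the read as a timestamp range filter. The helper below is a stdlib-only sketch of that computation; the actual filter object would come from the Bigtable client library and is not shown here.

```python
from datetime import datetime, timedelta, timezone

def ssd_only_window(policy_age: timedelta, now: datetime) -> tuple:
    """Compute the (start, end) timestamp range that stays on SSD.

    Cells newer than `now - policy_age` are still in the SSD tier, so a
    query restricted to this range avoids touching infrequent-access
    data. `end=None` stands for "up to the newest cell".
    """
    return (now - policy_age, None)

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
start, end = ssd_only_window(timedelta(days=30), now)
print(start.isoformat())  # 2025-05-02T00:00:00+00:00
```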
As Bigtable aims to support both operational and analytical workloads in a single database, Gething and Lee explain why the new tiered storage simplifies data accessibility for analytics and reporting:
Use Bigtable SQL to query infrequently used data. You can then build Bigtable logical views to present this data in a format that can be queried when needed. This feature is useful for giving specific users access to historical data for reports, without giving them complete access to the table.
Furthermore, the new feature increases the storage capacity of Bigtable nodes, with a tiered-storage node providing 540% more capacity than a regular SSD node. Florin Lungu, lead DevOps engineer and VP at Deutsche Bank, comments:
Bigtable tiered storage offers a solution to manage data costs without having to sacrifice data. (…) This could significantly impact how organizations optimize their data storage strategies.
To move data back to SSD, developers must either increase the tiering policy's age threshold so that older data again qualifies for the SSD tier, disable tiered storage, or rewrite the data with a new timestamp and delete the older copy.
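The rewrite option amounts to a read-modify-write on the cell: write the same value under a fresh timestamp, then delete the old cell so only the young copy remains. A sketch over an in-memory stand-in for a row (the dict layout and helper name are assumptions for illustration, not the client API):

```python
from datetime import datetime, timezone

def rewrite_to_ssd(row: dict, column: str, now: datetime) -> dict:
    """Re-tier a cell by rewriting its value with a fresh timestamp.

    `row` maps column -> list of (timestamp, value) cells, newest first.
    Writing the value at `now` and deleting the older cell leaves a
    single copy young enough for the SSD tier under an age-based policy.
    """
    _old_ts, value = row[column][0]
    row[column] = [(now, value)]  # fresh cell replaces the old copy
    return row

row = {"metrics:temp": [(datetime(2024, 1, 1, tzinfo=timezone.utc), b"21.5")]}
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
rewritten = rewrite_to_ssd(row, "metrics:temp", now)
print(rewritten["metrics:temp"])  # single cell, timestamped 2025-06-01
```

Deleting the older copy matters: leaving it in place would keep a stale duplicate accruing infrequent-access storage costs.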
Bigtable pricing is based on compute capacity, database storage, backup storage, and network usage, with colder storage up to 85% cheaper than SSD storage. Tiered storage is not available for Bigtable HDD instances. Furthermore, Bigtable Data Boost and hot backups are not supported.
Earlier this year, Google introduced tiered storage on Spanner, the managed distributed SQL database, as previously reported on InfoQ.
