06. Transactional vs Analytical Databases

This is the sixth chapter of the DB engineering series. As a prerequisite ensure that you read the previous chapters.

Jump to other chapters:

Introduction

In order to achieve better performance, efficiency, and maintainability for both transactional and analytical tasks, there’s a need to have specific designs for these databases. Transactional and analytical databases are designed to serve different purposes in managing and analyzing data.
Let’s understand how they differ from each other and why these differences need to be considered while designing a system.

These two fundamental database processing systems (transactional and analytical) are also referred to as OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) databases respectively.

Here’s an introduction to the key characteristics and differences between these two types of databases:

Transactional Databases

In most transactional tasks, there’s a requirement for faster operation of read and write queries and rapid transaction processing. Transactional databases are used in the live applications that you used every day. They are primarily designed for day-to-day transactional operations and are optimized for high-speed data processing and real-time updates. These databases are commonly used in operational systems, such as e-commerce platforms, banking systems, and inventory management systems.

Key features of transactional databases include:

1. ACID Compliance: Transactional databases adhere to ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring that transactions are executed reliably and accurately. ACID compliance guarantees that database transactions are either fully completed or fully rolled back in case of failures.

2. Fast Read and Write Operations: Transactional databases are optimized for quick and efficient read and write operations, enabling real-time updates and maintaining data integrity. They are designed to handle a large number of concurrent transactions, ensuring that data remains consistent during concurrent access.

3. Normalized Data Structure: Transactional databases typically use a normalized data model to minimize redundancy and maintain data consistency. This approach reduces data duplication but may require more complex joins and queries to retrieve information.

The terms “transactional databases” and “OLTP” may be used interchangeably, they both refer to the same concept of managing real-time transactional processing within a database system.

Analytical Databases

Analytical databases used data warehouses or decision support systems, are designed to support complex analytical queries and reporting. They are optimized for read-intensive operations and facilitate advanced data analysis and business intelligence activities. Key features of analytical databases include:

1. Aggregation and Summarization: Analytical databases allow for the efficient aggregation and summarization of large volumes of data. They enable the execution of complex queries involving joins, filtering, and calculations across multiple tables to provide meaningful insights.

2. Denormalized Data Structure: Unlike transactional databases, analytical databases often employ a denormalized data structure. Denormalization simplifies complex queries by storing redundant data, which enhances query performance but may increase storage requirements.

3. Historical Data Preservation: Analytical databases are designed to retain and analyze historical data over time. They provide the ability to store and access large volumes of data for long periods, allowing for trend analysis, forecasting, and comparisons across different time periods.

4. Read-Optimized Performance: Analytical databases prioritize query performance and are optimized for complex, ad-hoc queries involving large datasets. They utilize indexing, partitioning, and other optimization techniques to deliver faster query response times.

The terms “analytical databases” and “OLAP” may be used interchangeably, they both refer to the same concept of managing and analyzing large volumes of data for complex analysis and reporting purposes.

Row-oriented and column-oriented storage

Row-oriented and column-oriented storage are two different approaches for organizing and storing data in a database. Each method offers specific advantages and is suited for different types of data processing workloads.

Let’s understand how both work in detail. Consider a stock market ticker.
Let’s say the denormalized values are stored in a table with columns as below.

id	ticker	company_name	current_price	change	percent_change	timestamp
699	AAPL	Apple Inc.	$135.25	+1.50	+1.12%	2023-07-03 14:00:01
700	GOOGL	Alphabet Inc.	$2,780.50	-3.20	-0.12%	2023-07-03 14:00:01
701	MSFT	Microsoft Corporation	$305.75	-0.60	-0.20%	2023-07-03 14:00:01
702	AMZN	Amazon.com, Inc.	$3,500.00	+10.25	+0.29%	2023-07-03 14:00:01
703	TSLA	Tesla, Inc.	$700.50	-5.80	-0.82%	2023-07-03 14:00:01
704	AAPL	Apple Inc.	$136.25	-1.00	-1.1%	2023-07-03 14:00:02
705	GOOGL	Alphabet Inc.	$2,760.50	-20.00	-0.2%	2023-07-03 14:00:02
706	MSFT	Microsoft Corporation	$300.75	-5.00	-0.30%	2023-07-03 14:00:02
707	AMZN	Amazon.com, Inc.	$3,505.00	+5.00	+0.30%	2023-07-03 14:00:02
708	TSLA	Tesla, Inc.	$699.50	-1.00	-0.12%	2023-07-03 14:00:02

Row-oriented storage

In row-oriented storage, data is stored in a sequential manner, where each row of a table is stored together in a contiguous block on the disk. This means that all the columns of a row are kept together in the storage.

For example for the above table, all values of columns of row 700 are stored together.

This approach is commonly used in traditional relational databases and is well-suited for OLTP (Online Transaction Processing) workloads that involve frequent read-and-write operations on individual rows.

Advantages of row-oriented storage:

– Efficient for transactional operations: Since OLTP workloads often involve retrieving entire rows at once, row-oriented storage is efficient for handling transactional operations that deal with individual records.

– Simplified data retrieval: Row-oriented storage allows for faster retrieval of the required data when fetching complete rows, such as in single-record lookups.

– Better suited for updates and inserts: Row-oriented storage performs well when updating or inserting new records, as the data of a complete row is stored together, making it easy to modify a single row without affecting other columns.

Now, what if we wanted to find the average price of TSLA on July 3rd, 2023? As you might have guessed, we will take an average of the column current_price. Under the hood, however, Postgres will scan all the matching rows, pick up all the columns and drop the non-relevant ones i.e. the values of columns like company_name, percent_change are scanned unnecessarily. These kinds of aggregation queries are where column-oriented databases shine.

Column-oriented storage

In column-oriented storage, data is organized and stored by columns rather than by rows. Each column of a table is stored together, and data belonging to the same column is kept in contiguous blocks on disk.

So, in our stock market example, all values of current_price are stored in one place on the disk which helps in picking up only the relevant column values for aggregation.

This approach is commonly used in analytical databases and data warehouses, where the primary focus is on complex data analysis and reporting.

Advantages of column-oriented storage:

– Enhanced analytical query performance: Column-oriented storage excels in analytical workloads that involve aggregations, filtering, and processing large datasets. Since each column is stored separately, the database engine can skip over irrelevant columns during query execution, leading to significant performance improvements.

– Improved data compression: Columns typically contain repeated values, enabling better compression ratios, reducing storage requirements, and improving query performance by reading fewer data from disk.

As you might have guessed, the common RDBMS we use like MySQL, PostgreSQL, and Oracle for transactional workloads are all row-oriented databases.

On the other hand, databases for analytical processing like Apache Cassandra, Amazon Redshift, and Apache HBase are all column-oriented databases.

💡 Did you know?
AlloyDB launched by Google is a unified data platform that can support both transactional and analytical workloads.
Column-oriented storage is not just limited to databases. Apache Parquet adopts a column-oriented approach, which provides advantages in terms of data storage and retrieval performance.

DB Types Quiz

4 Questions

Let's have a quick knowledge check of what you have learned.

Conclusion

Choosing between a row-oriented and column-oriented database in system design can have significant implications on the performance, storage efficiency, and query capabilities of the system. Understanding the workload characteristics, data access patterns, and performance expectations will help determine the most suitable option for your system design.

Photo by Anna Hunko on Unsplash