Author: Balamurali M

Organizations rely on information systems and data to run their daily operations. Reliable data storage infrastructure is key to minimizing the risk of disruption to those operations. For any organization that depends heavily on data, and most do, data storage and operations play a very important role.

We’ll first explore the key terms related to Data Storage.

Database

A database is any collection of stored data.

Data Storage

Data is stored in a variety of formats and on a variety of devices, from hard drives to cloud storage. Data Storage also involves the collection, management, and retrieval of data in digital form.

Storage Medium

Storage Medium is the physical device or location where data is stored. Examples include hard drives (HDDs), solid-state drives (SSDs), etc.

Data Backup & Recovery

Regular data backups ensure that in case of hardware failure, data corruption, or other emergencies, one can restore the database to a previous state. Database systems often automate this process, using snapshots or incremental backups.
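As a small illustration of the backup and restore cycle described above, the sketch below uses Python's built-in sqlite3 module. The table, the names in it, and the in-memory databases are assumptions made for the example; it takes a full copy rather than an incremental backup, and a production system would write the backup to separate storage.

```python
import sqlite3

# Hypothetical "live" database with a single table.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany("INSERT INTO customers (name) VALUES (?)", [("Asha",), ("Ravi",)])
live.commit()

# Take a full backup into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate losing the live database.
live.close()

# Recovery: copy the backup into a fresh database and resume from there.
restored = sqlite3.connect(":memory:")
backup.backup(restored)
restored_names = [
    row[0] for row in restored.execute("SELECT name FROM customers ORDER BY id")
]
print(restored_names)
```

The restored database contains the rows that existed when the backup was taken, so anything written after that point would be lost. That gap is why backups are usually scheduled frequently.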

Data Compression

Data Compression refers to the process of reducing the size of data files to save storage space. Compression algorithms are used to make files smaller, allowing for more efficient use of storage. There are two main types of compression: lossless (where no data is lost) and lossy (where some data is lost to achieve higher compression rates).
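Lossless compression can be sketched with Python's standard zlib module. The sample data below is made up for the example; repetitive data like this compresses well, while real files may shrink much less.

```python
import zlib

# Hypothetical, highly repetitive CSV-like data.
original = b"customer_id,region,status\n" + b"1001,APAC,active\n" * 500

compressed = zlib.compress(original, level=9)  # 9 = strongest compression
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical after round trip:", restored == original)
```

Because the method is lossless, the decompressed bytes are identical to the original. A lossy method, as used for images or audio, would not give that guarantee.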

Encryption

Encryption ensures data security by converting data into an unreadable coded form (ciphertext) to prevent unauthorized access. Only someone with the correct decryption key can access the information.

Schema

A schema is a subset of database objects contained within the database or an instance.

There are some important goals related to data storage, such as Designing for Reuse, Implementing Best Practices, Supporting Requirements, Continuous Automation, and Establishing DBA Role.

Designing for Reuse

Always design data storage systems and processes in such a way that components can be reused in different scenarios. Ensure you are not reinventing the wheel. Creating database views, triggers, functions, stored procedures, data access layers, templates, and standard configurations with reusability in mind saves a lot of time and also helps ensure consistency and reliability across different systems.
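A database view is one of the simplest reusable components. In the sketch below, which uses Python's sqlite3 module with a made-up orders table, the rule for what counts as a paid order is defined once in a view and reused by two different queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT
    );
    INSERT INTO orders (customer, amount, status) VALUES
        ('Asha', 120.0, 'paid'),
        ('Ravi',  80.0, 'cancelled'),
        ('Asha',  40.0, 'paid');
    -- The filtering rule lives in one place and is reused by every caller.
    CREATE VIEW paid_orders AS
        SELECT id, customer, amount FROM orders WHERE status = 'paid';
""")

# Two different reports built on the same view.
total_paid = conn.execute("SELECT SUM(amount) FROM paid_orders").fetchone()[0]
paid_per_customer = dict(
    conn.execute("SELECT customer, COUNT(*) FROM paid_orders GROUP BY customer")
)
print(total_paid, paid_per_customer)
```

If the definition of a paid order changes later, only the view needs to be updated, and both reports pick up the change.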

Implementing Best Practices

Guidelines should be developed to ensure that data is stored, managed, and protected efficiently. Adhering to best practices helps prevent data loss, unauthorized access, and system failures.

Supporting Requirements

The Service Level Agreements (SLAs) should reflect the DBA-recommended methods and standards for ensuring data security, data integrity, and so on, and the development team should accept these methods and standards.

Continuous Automation

Automation in data storage refers to the use of tools and technologies to handle routine tasks that would otherwise require manual intervention. Tasks such as data backups, maintenance, and performance monitoring should be automated. Automation allows you to set these tasks to happen regularly without the need for direct oversight.

Establishing DBA Role

The Database Administrator (DBA) role is critical, as DBAs are responsible for the overall performance, integrity, and security of a database. Their job includes installing and configuring databases, setting up user accounts, monitoring system performance, ensuring that data is backed up, and so on. DBAs collaborate with developers and IT staff to optimize database performance, troubleshoot problems, and implement changes.

Now, we will discuss some of the key concepts in Data Storage.

Data Storage Architecture

Data Storage Architecture refers to the way storage systems are designed and organized.

Federated Database Architecture

A federated database system is essentially a collection of multiple autonomous databases that function as a single, unified system. In this architecture, each database retains its own autonomy, i.e. it operates independently but can communicate and exchange data with other databases in the federation. This system is ideal for organizations that need to integrate multiple databases across different departments, regions, or even different platforms, without giving up control over their individual systems. A federated architecture allows the databases to work together without merging them into one large system. This helps maintain the autonomy of each department.
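The idea can be sketched in a few lines of Python. The Federation class, the regional databases, and the sales table below are all hypothetical; real federated systems also deal with differing schemas, security, and network failures, which this sketch ignores.

```python
import sqlite3

def make_regional_db(rows):
    """Create an independent database owned by one region."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (product TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    db.commit()
    return db

class Federation:
    """Presents several autonomous databases as one queryable system."""

    def __init__(self, members):
        self.members = members  # mapping: member name -> connection

    def query(self, sql, params=()):
        results = []
        for name, db in self.members.items():
            # Each member runs the query itself; data is never merged into one store.
            for row in db.execute(sql, params):
                results.append((name, *row))
        return results

federation = Federation({
    "emea": make_regional_db([("widget", 100.0), ("gadget", 50.0)]),
    "apac": make_regional_db([("widget", 70.0)]),
})
totals = federation.query(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
)
print(totals)
```

Each region keeps full control of its own database, while the caller sees a single query interface.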

Non-Federated Database Architecture

Non-Federated Database Architecture is also known as a centralized or monolithic architecture. In this system, data is stored and managed within a single, unified database. Non-federated architectures are often simpler to manage because they don’t require coordination between different systems. The data is stored in one place, which makes it easier to maintain consistency and enforce data integrity.

Local vs. Cloud Storage

Local storage involves saving data on physical devices like hard drives or servers located on-site. On the other hand, cloud storage stores data on remote servers accessed via the internet.

Data Redundancy and Replication

Redundancy ensures that data is duplicated or backed up in multiple locations to prevent data loss in case of hardware failure. Replication refers to creating real-time copies of data in systems across geographic locations. This is particularly important in environments where uninterrupted data access is crucial, such as financial institutions or healthcare systems.

Data Scalability

Scalability refers to the ability of a storage system to grow with increasing data needs. Systems must be designed to handle ever-growing volumes of data without compromising performance. In cloud environments, scalability is often achieved through elastic storage, which dynamically allocates resources as demand increases.

Data Security

Data security refers to the essential measures for protecting sensitive data from unauthorized access or breaches. This involves encryption, access control, authentication, etc.

ACID

Database transactions must follow the ACID principles: Atomicity, Consistency, Isolation, and Durability.

Atomicity means a transaction is either fully completed or fully rolled back, i.e. if one part of the transaction fails, the entire transaction fails.

Consistency means the database remains in a valid state before and after the transaction. The transaction must satisfy all defined rules, and any half-completed transaction must be voided.

Isolation means transactions execute independently, without interfering with one another.

Durability means that once a transaction is committed, its changes remain permanent even if the system crashes.
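Atomicity and consistency are easy to see in a small money-transfer example. The sketch below uses Python's sqlite3 module; the accounts table, the CHECK rule that balances cannot go negative, and the transfer function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK rule was violated, nothing was applied

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer(conn, "alice", "bob", 500)  # alice cannot cover this
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failed, succeeded, after_success)
```

In the failed transfer, the credit to bob is applied first and then undone when the debit from alice breaks the rule, so the balances are left exactly as they were.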

CAP Theorem

The CAP theorem states that a distributed, shared-data system can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition Tolerance. Below are the properties:

1) Consistency means that the system must operate as designed and expected at all times. All the nodes see the same data at the same time.

2) Availability means that the system must be available when requested and must respond to each request.

3) Partition Tolerance means that the system should be able to continue operating during partial system failures or data loss. Partition Tolerance allows the system to function even if there’s a network split.
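The trade-off can be illustrated with a toy two-node cluster in Python. Everything here (the TwoNodeCluster class, the "CP" and "AP" modes, the price key) is hypothetical and greatly simplified; it only shows what each choice gives up when the network between the nodes is split.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a request during a partition."""

class TwoNodeCluster:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.node_a = {}
        self.node_b = {}
        self.partitioned = False  # True when A and B cannot talk to each other

    def write(self, key, value):
        """The write arrives at node A, which tries to replicate to node B."""
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than let the nodes diverge.
            raise Unavailable("cannot reach node B, write rejected")
        self.node_a[key] = value
        if not self.partitioned:
            self.node_b[key] = value
        # In AP mode during a partition, node B is now stale.

def run(mode):
    cluster = TwoNodeCluster(mode)
    cluster.write("price", 10)
    cluster.partitioned = True  # network split
    try:
        cluster.write("price", 12)
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, cluster.node_a["price"], cluster.node_b["price"]

cp_result = run("CP")
ap_result = run("AP")
print("CP:", cp_result)
print("AP:", ap_result)
```

In CP mode the second write is rejected, so both nodes still agree. In AP mode the write is accepted, but the two nodes hold different values until the partition heals.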

BASE Systems

A BASE-type system is an alternative to the traditional ACID model used in databases. BASE stands for ‘Basically Available, Soft state, Eventually consistent.’ These systems are designed for scalability, favoring availability and partition tolerance over immediate consistency. In BASE, data may be temporarily inconsistent, but it will eventually reach consistency. This approach is commonly used in distributed systems, like NoSQL databases, where high availability and fault tolerance are critical.
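Eventual consistency can be sketched with two replicas that are synchronized from time to time. The Replica class, the version numbers, and the synchronize function below are assumptions for the example; the "highest version wins" rule is one simple reconciliation strategy among several used in practice.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    data: dict = field(default_factory=dict)  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the entry only if it is newer than what we already have.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def synchronize(a, b):
    """Exchange entries so both replicas keep the highest version of each key."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)

r1, r2 = Replica(), Replica()
r1.write("stock", 10, version=1)
synchronize(r1, r2)

r1.write("stock", 9, version=2)   # accepted immediately by r1 only
stale_read = r2.read("stock")     # r2 has not heard about it yet
synchronize(r1, r2)               # background sync runs later
fresh_read = r2.read("stock")
print(stale_read, fresh_read)
```

Between the write and the next synchronization, a reader of the second replica sees the old value. That temporary inconsistency is the "soft state" in BASE.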

Understanding how databases process data is essential to managing, storing, and retrieving information effectively in any organization.

Database Processing refers to the actions taken to store, retrieve, update, and manage data within a database. When users interact with a database, whether they are retrieving information or inserting new records, they initiate a process that ensures data is stored in a structured manner and is quickly retrievable when needed.

Transaction Processing

One of the core aspects of database processing is transaction processing. A transaction refers to a single unit of work performed within a database, such as adding, deleting, or updating data. Transactions must follow the ACID principles explained earlier.

Query Processing

When one wants to retrieve specific data from a database, one can issue a query, often written in SQL (Structured Query Language). The database management system (DBMS) then processes that query and returns the requested data.
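A minimal example of issuing a SQL query, using Python's sqlite3 module, is shown below. The employees table and the employees_in function are made up for illustration. The department value is passed as a parameter instead of being pasted into the SQL text, which is the usual way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Asha", "Finance"), ("Ravi", "IT"), ("Meera", "Finance")],
)
conn.commit()

def employees_in(conn, department):
    """Return the names of all employees in one department."""
    cursor = conn.execute(
        "SELECT name FROM employees WHERE department = ? ORDER BY name",
        (department,),  # bound parameter, not string concatenation
    )
    return [row[0] for row in cursor]

finance = employees_in(conn, "Finance")
print(finance)
```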

Data Storage and Indexing

A crucial part of database processing is how the data is physically stored and indexed. Indexes allow the database to quickly find the information without scanning every record. When you store data in a database, especially in large systems, the DBMS creates these indexes to improve the performance of data retrieval.
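The effect of an index can be observed by asking the database how it plans to run a query. The sketch below uses SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module; the orders table and the index name idx_orders_customer are assumptions for the example, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)
conn.commit()

def query_plan(conn):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT amount FROM orders WHERE customer_id = ?",
        (42,),
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the readable detail

plan_before = query_plan(conn)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn)   # the planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan describes a scan of the whole table. Afterwards it names the index, which means only the matching rows are visited.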

Concurrency Control

Databases are able to handle requests from multiple users simultaneously. Concurrency control ensures that multiple transactions can occur at the same time without causing inconsistencies.
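One common technique is optimistic concurrency control, where each row carries a version number and an update succeeds only if the row has not changed since it was read. The article does not prescribe a particular method, so the sketch below is just one possibility; the inventory table and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES ('laptop', 5, 1)")
conn.commit()

def read_item(conn, item):
    return conn.execute(
        "SELECT quantity, version FROM inventory WHERE item = ?", (item,)
    ).fetchone()

def update_quantity(conn, item, new_quantity, expected_version):
    """Apply the update only if nobody changed the row since we read it."""
    with conn:
        cursor = conn.execute(
            "UPDATE inventory SET quantity = ?, version = version + 1 "
            "WHERE item = ? AND version = ?",
            (new_quantity, item, expected_version),
        )
    return cursor.rowcount == 1  # 0 rows means our copy was stale

# Two users read the same row at the same time.
qty_a, ver_a = read_item(conn, "laptop")
qty_b, ver_b = read_item(conn, "laptop")

first_ok = update_quantity(conn, "laptop", qty_a - 1, ver_a)
second_ok = update_quantity(conn, "laptop", qty_b - 2, ver_b)  # stale version
final_state = read_item(conn, "laptop")
print(first_ok, second_ok, final_state)
```

The second update is rejected instead of silently overwriting the first one. The second user would then re-read the row and try again with the current values.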

You can also refer to this article at: https://www.linkedin.com/pulse/key-concepts-data-storage-balamurali-m-1f2zc/