The 97/3 Problem
The 97/3 Problem describes a skew in enterprise data or work in which roughly 97 percent is cold or infrequently accessed and only about 3 percent is hot or frequently accessed, creating an imbalance across storage, governance, and cost structures.
Expanded Explanation
1. Technical Function and Core Characteristics
The 97/3 Problem refers to an observed imbalance between a small volume of highly active data and a large volume of inactive or archival data in enterprise environments. It characterizes how most storage capacity holds rarely accessed data while a small fraction receives most queries and processing.
This construct appears in discussions of information lifecycle management, data warehousing, and analytics platforms. It highlights how organizations maintain large datasets on primary or nearline storage even when only a small subset participates in operational workloads or analytics jobs.
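The imbalance described above can be checked against an inventory by comparing the share of capacity held by hot datasets with the share of reads they receive. The following is a minimal sketch, not a standard tool: the Dataset fields, the dataset names, the sample figures, and the hot_threshold cutoff of 100 reads per 30 days are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical inventory record; field names are illustrative.
    name: str
    size_gb: float
    reads_last_30d: int


def hot_cold_split(datasets, hot_threshold=100):
    """Return the share of capacity and the share of reads held by hot datasets.

    A dataset counts as hot when it was read at least `hot_threshold`
    times in the last 30 days (an assumed cutoff, not a standard one).
    """
    total_size = sum(d.size_gb for d in datasets)
    total_reads = sum(d.reads_last_30d for d in datasets)
    hot = [d for d in datasets if d.reads_last_30d >= hot_threshold]
    hot_size = sum(d.size_gb for d in hot)
    hot_reads = sum(d.reads_last_30d for d in hot)
    return {
        "hot_capacity_share": hot_size / total_size if total_size else 0.0,
        "hot_read_share": hot_reads / total_reads if total_reads else 0.0,
    }


# Made-up sample inventory: one small active table, two large archives.
inventory = [
    Dataset("orders_current", 30.0, 9500),
    Dataset("orders_2019", 400.0, 3),
    Dataset("clickstream_archive", 570.0, 2),
]

split = hot_cold_split(inventory)
print(f"hot capacity share: {split['hot_capacity_share']:.1%}")
print(f"hot read share:     {split['hot_read_share']:.1%}")
```

In this made-up inventory the hot dataset holds a small fraction of total capacity but receives nearly all reads, which is the pattern the 97/3 Problem names. Real inventories would draw these counts from access logs or storage metrics.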
2. Enterprise Usage and Architectural Context
Enterprises reference the 97/3 Problem when evaluating data tiering strategies, such as separating hot, warm, and cold data across storage classes. It informs policies for moving inactive data to lower-cost tiers while preserving access paths for compliance, audit, and historical analysis.
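A tiering policy of this kind is often expressed as a rule over time since last access. The sketch below illustrates one such rule. The 30-day and 180-day thresholds, the tier names, and the dataset names are assumptions chosen for the example, and a real policy would also account for retention and compliance requirements.

```python
from datetime import datetime, timedelta

# Assumed thresholds; real policies vary by organization and workload.
WARM_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=180)


def assign_tier(last_accessed, now):
    """Classify a dataset as hot, warm, or cold by time since last access."""
    age = now - last_accessed
    if age >= COLD_AFTER:
        return "cold"
    if age >= WARM_AFTER:
        return "warm"
    return "hot"


# Fixed reference time so the example is reproducible.
now = datetime(2024, 6, 1)

tiers = {
    "orders_current": assign_tier(datetime(2024, 5, 30), now),
    "orders_q1": assign_tier(datetime(2024, 3, 1), now),
    "orders_2019": assign_tier(datetime(2020, 1, 15), now),
}

for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

The rule only decides where data should live. Moving it, and keeping it reachable for audit or historical analysis, depends on the storage platform in use.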
Architects use the concept when planning data lakes, log retention systems, and backup architectures, where most retained data supports traceability and governance rather than daily processing. It also appears in cost models that compare primary storage, object storage, and archival services.
3. Related or Adjacent Technologies
The 97/3 Problem relates to hierarchical storage management, information lifecycle management, and tiered storage architectures that assign datasets to storage media based on access patterns and retention requirements. It also connects to data minimization and retention policies in regulatory frameworks.
Technologies such as object storage, cloud archival services, and nearline storage classes address the imbalance by offering lower-cost capacity for cold data. Query engines that separate compute from storage also intersect with this problem, because they can read cold data on demand without keeping it resident in hot tiers.
4. Business and Operational Significance
For enterprises, the 97/3 Problem affects storage spending, capacity planning, and total cost of ownership for data platforms. Organizations that keep large volumes of inactive data on premium storage pay premium rates for capacity that is rarely read, so spending does not track actual usage.
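The cost effect can be illustrated with simple arithmetic that compares keeping everything on premium storage with keeping only the hot fraction there. The sketch below uses hypothetical figures throughout: the 1,000 TB total and the per-terabyte monthly prices are placeholders rather than vendor quotes, and the model ignores retrieval fees, migration effort, and minimum retention charges.

```python
def monthly_cost(total_tb, hot_fraction, hot_price_per_tb, cold_price_per_tb):
    """Monthly storage cost when `hot_fraction` of the data sits on the hot tier.

    The remainder is priced at the cold-tier rate. Retrieval and
    migration costs are deliberately left out of this simple model.
    """
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * hot_price_per_tb + cold_tb * cold_price_per_tb


# Hypothetical inputs, for illustration only.
TOTAL_TB = 1000
HOT_PRICE = 100.0   # assumed premium-tier price per TB per month
COLD_PRICE = 5.0    # assumed archival-tier price per TB per month

all_premium = monthly_cost(TOTAL_TB, 1.0, HOT_PRICE, COLD_PRICE)
tiered = monthly_cost(TOTAL_TB, 0.03, HOT_PRICE, COLD_PRICE)
savings = all_premium - tiered

print(f"all premium: {all_premium:,.0f}")
print(f"97/3 tiered: {tiered:,.0f}")
print(f"difference:  {savings:,.0f}")
```

Under these assumed prices, most of the all-premium bill pays for capacity that holds cold data. How large the gap is in practice depends on actual tier pricing and on how often cold data must be retrieved.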
The concept also affects risk management and compliance because large cold datasets may contain regulated or sensitive information. Addressing the 97/3 Problem supports more precise retention practices, auditability, and alignment between storage policies, data governance, and financial planning.