cpp-know-hows

cpp related stuff

View on GitHub

Unordered Containers

Overview

Unordered containers use hash tables for storage, providing average O(1) access time. They don’t maintain any ordering of elements but offer significantly faster lookup, insertion, and deletion for most use cases.

┌──────────────────────────────────────────────────────────┐
│           UNORDERED CONTAINERS FAMILY                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  std::unordered_set        Hash-based unique elements    │
│  std::unordered_multiset   Hash-based with duplicates    │
│  std::unordered_map        Hash-based key-value pairs    │
│  std::unordered_multimap   Hash-based with dup keys      │
│                                                          │
│  All use hash tables (introduced in C++11)               │
│  Average O(1) lookup, worst case O(n)                    │
│                                                          │
└──────────────────────────────────────────────────────────┘

Hash Table Fundamentals

Hash Table Structure

Hash Table with Separate Chaining:

Hash Function: h(key) → bucket index

┌─────────────────────────────────────────────────┐
│              Bucket Array                       │
├─────────────────────────────────────────────────┤
│ [0] → [key1, val1] → [key5, val5] → nullptr     │
│ [1] → nullptr                                   │
│ [2] → [key3, val3] → nullptr                    │
│ [3] → [key2, val2] → [key9, val9] → nullptr     │
│ [4] → nullptr                                   │
│ [5] → [key4, val4] → nullptr                    │
│ [6] → nullptr                                   │
│ [7] → [key6, val6] → nullptr                    │
└─────────────────────────────────────────────────┘
         │                    │
         └────── Chain ───────┘
         (collision resolution)

Load Factor = num_elements / num_buckets
When load factor > threshold (usually 1.0), rehash occurs

Hash Function Requirements

// A good hash function should:
// 1. Be deterministic (same input → same output)
// 2. Distribute keys uniformly
// 3. Be fast to compute
// 4. Minimize collisions

Example hash function for string:
size_t hash(const string& s) {
    size_t h = 0;
    for (char c : s) {
        h = h * 31 + c;  // Prime multiplier reduces collisions
    }
    return h;
}

Bucket index = hash(key) % bucket_count

Collision Resolution

Method 1: Separate Chaining (used by C++ STL)
Each bucket points to a linked list of elements

Bucket[3]: [A→B→C→nullptr]

Method 2: Open Addressing (not used by STL)
Find next available bucket
Bucket[3]: [A]
Bucket[4]: [B]
Bucket[5]: [C]

1. std::unordered_set

Description

Hash-based set with unique elements. No ordering guarantee, but O(1) average access time.

Internal Structure

unordered_set<string> s = {"cat", "dog", "bird", "fish"};

Possible internal layout (bucket view):
┌──────────────────────────────────────┐
│ Bucket 0: →                          │
│ Bucket 1: → ["dog"] →                │
│ Bucket 2: → ["bird"] →               │
│ Bucket 3: →                          │
│ Bucket 4: → ["cat"] → ["fish"] →     │
│ Bucket 5: →                          │
│ Bucket 6: →                          │
│ Bucket 7: →                          │
└──────────────────────────────────────┘

Note: Order is determined by hash values, not insertion order!
Iteration order is NOT predictable!

Basic Operations

#include <unordered_set>
#include <iostream>
#include <string>

int main() {
    // Construction
    std::unordered_set<int> s1;
    std::unordered_set<int> s2 = {1, 5, 3, 9, 2};
    std::unordered_set<std::string> s3 = {"hello", "world"};
    
    // Insertion - O(1) average
    std::unordered_set<int> s;
    s.insert(42);
    s.insert(17);
    s.insert(42);  // Ignored - duplicate
    
    auto [it, success] = s.insert(99);
    if (success) {
        std::cout << "Inserted: " << *it << '\n';
    }
    
    s.emplace(56);  // Construct in-place
    
    // Search - O(1) average
    if (s.find(42) != s.end()) {
        std::cout << "Found 42\n";
    }
    
    // C++20: contains()
    if (s.contains(17)) {
        std::cout << "Contains 17\n";
    }
    
    // Count (returns 0 or 1)
    size_t count = s.count(99);
    
    // Erase - O(1) average
    s.erase(42);           // By value
    s.erase(s.find(17));   // By iterator
    
    // Size
    std::cout << "Size: " << s.size() << '\n';
    
    // Iteration (no guaranteed order!)
    for (int x : s) {
        std::cout << x << ' ';
    }
    
    return 0;
}

Hash Table Configuration

#include <unordered_set>

int main() {
    std::unordered_set<int> s;
    
    // Hash table properties
    std::cout << "Bucket count: " << s.bucket_count() << '\n';
    std::cout << "Max bucket count: " << s.max_bucket_count() << '\n';
    std::cout << "Load factor: " << s.load_factor() << '\n';
    std::cout << "Max load factor: " << s.max_load_factor() << '\n';
    
    // Set max load factor (default is usually 1.0)
    s.max_load_factor(0.75f);
    
    // Reserve space for N elements
    s.reserve(1000);  // Allocates enough buckets for 1000 elements
    
    // Rehash to specific bucket count
    s.rehash(128);    // Set bucket count to at least 128
    
    // Check which bucket an element is in
    s.insert(42);
    size_t bucket = s.bucket(42);
    std::cout << "42 is in bucket " << bucket << '\n';
    
    // Iterate through a specific bucket
    for (auto it = s.begin(bucket); it != s.end(bucket); ++it) {
        std::cout << *it << ' ';
    }
    
    return 0;
}

Performance Characteristics

Operation       | Average | Worst Case | Notes
----------------|---------|------------|----------------------
Search          | O(1)    | O(n)       | Worst when all hash to same bucket
Insert          | O(1)*   | O(n)       | *May trigger rehash: O(n)
Erase           | O(1)    | O(n)       | 
Iteration       | O(n)    | O(n)       | Must visit all buckets

Space complexity: O(n), but with overhead for buckets

When to Use unordered_set

✓ Need fast lookups (don’t need ordering)
✓ Checking membership frequently
✓ Large datasets where O(1) vs O(log n) matters
✗ Need elements in sorted order (use set)
✗ Need range queries (use set)
✗ Hash function is expensive
✗ Worst-case guarantees needed (use set)


2. std::unordered_multiset

Description

Like unordered_set, but allows duplicate elements.

Basic Operations

#include <unordered_set>
#include <iostream>

int main() {
    std::unordered_multiset<int> ms;
    
    // Insert duplicates
    ms.insert(5);
    ms.insert(3);
    ms.insert(5);
    ms.insert(5);
    ms.insert(3);
    // ms contains: {3, 3, 5, 5, 5} (order not guaranteed!)
    
    // Count occurrences - O(k) where k is count
    std::cout << "Count of 5: " << ms.count(5) << '\n';  // 3
    
    // Find (returns iterator to any occurrence)
    auto it = ms.find(5);
    
    // Get all occurrences
    auto range = ms.equal_range(5);
    std::cout << "All 5s: ";
    for (auto it = range.first; it != range.second; ++it) {
        std::cout << *it << ' ';
    }
    std::cout << '\n';
    
    // Erase all occurrences
    ms.erase(5);  // Removes all 5s
    
    // Erase single occurrence
    auto found = ms.find(3);
    if (found != ms.end()) {
        ms.erase(found);  // Removes only one 3
    }
    
    return 0;
}

Multiset Hash Distribution

unordered_multiset<int> with values {5, 5, 5, 3, 3}

Bucket layout:
┌──────────────────────────────────────┐
│ Bucket 0: →                          │
│ Bucket 1: →                          │
│ Bucket 2: →                          │
│ Bucket 3: → [3] → [3] →              │  ← All 3s in same bucket
│ Bucket 4: →                          │
│ Bucket 5: → [5] → [5] → [5] →        │  ← All 5s in same bucket
│ Bucket 6: →                          │
│ Bucket 7: →                          │
└──────────────────────────────────────┘

Duplicates with same hash always go to same bucket!

3. std::unordered_map

Description

Hash table implementation of key-value pairs. Keys are unique, providing O(1) average access to values.

Internal Structure

unordered_map<string, int> ages = {
    {"Alice", 25}, {"Bob", 30}, {"Charlie", 35}
};

Bucket layout:
┌─────────────────────────────────────────────┐
│ Bucket 0: →                                 │
│ Bucket 1: → ["Bob", 30] →                   │
│ Bucket 2: →                                 │
│ Bucket 3: → ["Alice", 25] →                 │
│ Bucket 4: →                                 │
│ Bucket 5: → ["Charlie", 35] →               │
│ Bucket 6: →                                 │
│ Bucket 7: →                                 │
└─────────────────────────────────────────────┘

Each bucket contains pairs: {key, value}

Basic Operations

#include <unordered_map>
#include <string>
#include <iostream>

int main() {
    // Construction
    std::unordered_map<std::string, int> ages;
    std::unordered_map<std::string, int> scores = {
        {"Alice", 95},
        {"Bob", 87},
        {"Charlie", 92}
    };
    
    // Insertion
    ages["Alice"] = 25;           // operator[]
    ages.insert({"Bob", 30});     // insert with pair
    ages.emplace("Charlie", 35);  // emplace
    
    // C++17: try_emplace (doesn't overwrite)
    ages.try_emplace("Alice", 99);  // Alice remains 25
    
    // C++17: insert_or_assign
    ages.insert_or_assign("Alice", 26);  // Updates to 26
    
    // Access - O(1) average
    int alice_age = ages["Alice"];      // Creates if doesn't exist!
    int bob_age = ages.at("Bob");       // Throws if doesn't exist
    
    // Safe access
    if (ages.contains("Charlie")) {     // C++20
        std::cout << ages["Charlie"] << '\n';
    }
    
    // Or pre-C++20:
    auto it = ages.find("David");
    if (it != ages.end()) {
        std::cout << it->second << '\n';
    } else {
        std::cout << "Not found\n";
    }
    
    // Erase
    ages.erase("Bob");              // By key
    ages.erase(ages.find("Charlie")); // By iterator
    
    // Iteration (no guaranteed order!)
    for (const auto& [name, age] : ages) {  // C++17 structured bindings
        std::cout << name << ": " << age << '\n';
    }
    
    return 0;
}

operator[] Behavior

std::unordered_map<std::string, int> m;

// operator[] ALWAYS succeeds (creates if needed)
int x = m["key1"];  
// Equivalent to:
// if (!m.contains("key1")) {
//     m["key1"] = int();  // Default construct (0 for int)
// }
// return m["key1"];

std::cout << m["key1"];  // 0 (default int value)

// Use find() if you don't want to create:
auto it = m.find("key2");
if (it != m.end()) {
    std::cout << it->second;
} else {
    std::cout << "Not found";
}

map vs unordered_map Comparison

                        std::map            std::unordered_map
────────────────────────────────────────────────────────────────
Internal structure      Red-Black Tree      Hash Table
Search                  O(log n)            O(1) average
Insert                  O(log n)            O(1) average
Delete                  O(log n)            O(1) average
Worst case search       O(log n)            O(n)
Iteration order         Sorted by key       No order
Range queries           Supported           Not supported
Memory overhead         Moderate            Higher
Predictable performance Yes                 Mostly

Custom Hash Function

#include <unordered_map>
#include <string>

// For standard types, std::hash is defined
// For custom types, provide hash function

struct Person {
    std::string name;
    int age;
    
    bool operator==(const Person& other) const {
        return name == other.name && age == other.age;
    }
};

// Method 1: Specialize std::hash
namespace std {
    template<>
    struct hash<Person> {
        size_t operator()(const Person& p) const {
            // Combine hashes
            size_t h1 = std::hash<std::string>{}(p.name);
            size_t h2 = std::hash<int>{}(p.age);
            return h1 ^ (h2 << 1);  // Simple combination
        }
    };
}

// Method 2: Provide custom hasher
struct PersonHash {
    size_t operator()(const Person& p) const {
        return std::hash<std::string>{}(p.name) ^ 
               (std::hash<int>{}(p.age) << 1);
    }
};

int main() {
    // Using specialized std::hash
    std::unordered_map<Person, std::string> m1;
    m1[{"Alice", 25}] = "Engineer";
    
    // Using custom hasher
    std::unordered_map<Person, std::string, PersonHash> m2;
    m2[{"Bob", 30}] = "Doctor";
    
    return 0;
}

Hash Combination Techniques

// Simple XOR combination (not ideal for all cases)
size_t hash1 = std::hash<T1>{}(value1);
size_t hash2 = std::hash<T2>{}(value2);
return hash1 ^ hash2;

// Better: XOR with shift (reduces symmetry problems)
return hash1 ^ (hash2 << 1);

// Boost's hash_combine approach
size_t hash_combine(size_t h1, size_t h2) {
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
}

// For multiple values
size_t hash = 0;
hash = hash_combine(hash, std::hash<T1>{}(v1));
hash = hash_combine(hash, std::hash<T2>{}(v2));
hash = hash_combine(hash, std::hash<T3>{}(v3));

4. std::unordered_multimap

Description

Hash table with multiple values allowed per key.

Basic Operations

#include <unordered_map>
#include <string>
#include <iostream>

int main() {
    std::unordered_multimap<std::string, int> grades;
    
    // Insert multiple values for same key
    grades.insert({"Alice", 85});
    grades.insert({"Alice", 92});
    grades.insert({"Alice", 88});
    grades.insert({"Bob", 78});
    grades.insert({"Bob", 91});
    
    // NO operator[] (which value would it return?)
    
    // Count
    std::cout << "Alice has " << grades.count("Alice") << " grades\n";  // 3
    
    // Find all values for a key
    auto range = grades.equal_range("Alice");
    std::cout << "Alice's grades: ";
    for (auto it = range.first; it != range.second; ++it) {
        std::cout << it->second << ' ';
    }
    std::cout << '\n';
    
    // Erase all values for a key
    grades.erase("Alice");  // Removes all of Alice's grades
    
    // Erase single value
    auto it = grades.find("Bob");
    if (it != grades.end()) {
        grades.erase(it);  // Removes only one Bob grade
    }
    
    return 0;
}

Bucket Distribution for Multimap

unordered_multimap<string, int> grades:
{"Alice", 85}, {"Alice", 92}, {"Alice", 88}

All entries with key "Alice" hash to same bucket:
┌─────────────────────────────────────────────┐
│ Bucket hash("Alice"): → ["Alice",85]        │
│                        → ["Alice",92]       │
│                        → ["Alice",88] →     │
└─────────────────────────────────────────────┘

All values for same key are in same bucket (adjacent in chain)

Advanced Topics

1. Load Factor and Rehashing

#include <unordered_set>
#include <iostream>

void analyze_hash_table() {
    std::unordered_set<int> s;
    
    std::cout << "Initial bucket count: " << s.bucket_count() << '\n';
    
    for (int i = 0; i < 100; ++i) {
        s.insert(i);
        
        float lf = s.load_factor();
        if (lf > 0.9) {  // Approaching max_load_factor
            std::cout << "Size: " << s.size() 
                      << ", Buckets: " << s.bucket_count()
                      << ", Load factor: " << lf << '\n';
        }
    }
}

Rehashing Visualization

Before rehash (load factor > threshold):
┌────────────────┐
│ Bucket 0: →[5] │
│ Bucket 1: →[1]→[6] │
│ Bucket 2: →[2]→[7] │
│ Bucket 3: →[3] │
└────────────────┘
Size: 6, Buckets: 4, Load: 1.5

After rehash (double bucket count):
┌────────────────┐
│ Bucket 0: →    │
│ Bucket 1: →[1] │
│ Bucket 2: →[2] │
│ Bucket 3: →[3] │
│ Bucket 4: →    │
│ Bucket 5: →[5] │
│ Bucket 6: →[6] │
│ Bucket 7: →[7] │
└────────────────┘
Size: 6, Buckets: 8, Load: 0.75

All elements rehashed to new buckets!
Cost: O(n) operation

2. Performance Optimization

// Tip 1: Reserve space if you know size
std::unordered_map<int, int> m;
m.reserve(10000);  // Prevents multiple rehashes
for (int i = 0; i < 10000; ++i) {
    m[i] = i * 2;
}

// Tip 2: Adjust max load factor
m.max_load_factor(0.75f);  // Lower = less collisions, more memory

// Tip 3: Use emplace instead of insert
m.emplace(42, 84);  // Constructs in-place
// vs
m.insert({42, 84}); // Creates temporary pair

// Tip 4: Use find() instead of [] for lookups
// BAD (creates element if not found):
if (m[key] == value) { /* ... */ }

// GOOD:
auto it = m.find(key);
if (it != m.end() && it->second == value) { /* ... */ }

3. Bucket Interface

#include <unordered_map>
#include <iostream>

void inspect_buckets() {
    std::unordered_map<int, std::string> m = {
        {1, "one"}, {2, "two"}, {3, "three"}, 
        {4, "four"}, {5, "five"}
    };
    
    std::cout << "Bucket count: " << m.bucket_count() << '\n';
    
    // Iterate through all buckets
    for (size_t i = 0; i < m.bucket_count(); ++i) {
        std::cout << "Bucket " << i << " has " 
                  << m.bucket_size(i) << " elements: ";
        
        // Iterate elements in this bucket
        for (auto it = m.begin(i); it != m.end(i); ++it) {
            std::cout << "{" << it->first << ":" << it->second << "} ";
        }
        std::cout << '\n';
    }
    
    // Find which bucket a key is in
    int key = 3;
    std::cout << "Key " << key << " is in bucket " 
              << m.bucket(key) << '\n';
}

4. C++20 Features

#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> m = {
        {1, "one"}, {2, "two"}, {3, "three"}
    };
    
    // contains() - much cleaner than find()
    if (m.contains(2)) {
        std::cout << "Found 2\n";
    }
    
    // erase_if() - remove elements matching predicate
    std::erase_if(m, [](const auto& pair) {
        return pair.first % 2 == 0;  // Remove even keys
    });
    
    return 0;
}

Common Use Cases

Use Case 1: Caching

#include <unordered_map>
#include <string>

class Cache {
    std::unordered_map<std::string, int> cache_;
    
public:
    int get(const std::string& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // Cache hit
        }
        
        // Cache miss - compute and store
        int value = expensive_computation(key);
        cache_[key] = value;
        return value;
    }
    
private:
    int expensive_computation(const std::string& key) {
        // ... expensive operation ...
        return 42;
    }
};

Use Case 2: Frequency Counter

#include <unordered_map>
#include <vector>
#include <string>

std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> frequency;
    
    for (const auto& word : words) {
        frequency[word]++;  // operator[] creates with 0 if not found
    }
    
    return frequency;
}

// Usage:
std::vector<std::string> text = {"hello", "world", "hello", "cpp"};
auto freq = count_words(text);
// freq = { {"hello", 2}, {"world", 1}, {"cpp", 1} }

Use Case 3: Graph Adjacency List

#include <unordered_map>
#include <unordered_set>
#include <vector>

class Graph {
    std::unordered_map<int, std::unordered_set<int>> adj_list_;
    
public:
    void add_edge(int from, int to) {
        adj_list_[from].insert(to);
    }
    
    const std::unordered_set<int>& neighbors(int node) const {
        static const std::unordered_set<int> empty;
        auto it = adj_list_.find(node);
        return it != adj_list_.end() ? it->second : empty;
    }
    
    bool has_edge(int from, int to) const {
        auto it = adj_list_.find(from);
        return it != adj_list_.end() && it->second.contains(to);
    }
};

Use Case 4: Two Sum Problem

#include <unordered_map>
#include <vector>

// Find two numbers that sum to target
std::vector<int> two_sum(const std::vector<int>& nums, int target) {
    std::unordered_map<int, int> seen;  // value → index
    
    for (int i = 0; i < nums.size(); ++i) {
        int complement = target - nums[i];
        
        if (seen.contains(complement)) {
            return {seen[complement], i};
        }
        
        seen[nums[i]] = i;
    }
    
    return {};  // Not found
}

// O(n) time, O(n) space
// Much better than O(n²) brute force!

Performance Analysis

Time Complexity Summary

Operation           | Average  | Worst Case | Amortized
--------------------|----------|------------|------------
Search              | O(1)     | O(n)       | -
Insert              | O(1)     | O(n)       | O(1)
Erase               | O(1)     | O(n)       | -
Count (multimap)    | O(k)     | O(n)       | -
Rehash              | -        | O(n)       | -

k = number of elements with same key

Space Complexity

Container occupies:
- Array of buckets: O(bucket_count)
- Elements with overhead: O(n) × (element_size + pointer_overhead)

Typical overhead per element: 8-16 bytes (pointers for chaining)

Example:
unordered_set<int>: ~16-20 bytes per element
unordered_map<int,int>: ~24-28 bytes per pair

Compare to:
set<int>: ~32 bytes per element (tree node overhead)
vector<int>: ~4 bytes per element (no overhead)

Benchmark Comparison

Operation: 1,000,000 searches

Data Structure          | Time      | Notes
------------------------|-----------|-------------------------
vector (unsorted)       | ~15 ms    | O(n) linear search
vector (sorted, binary) | ~0.5 ms   | O(log n) binary search
set                     | ~1.2 ms   | O(log n) tree search
unordered_set           | ~0.1 ms   | O(1) average hash lookup

Conclusion: unordered_set is 10x+ faster for large datasets!

Common Pitfalls

1. Forgetting operator[] Creates Elements

std::unordered_map<int, int> m;
std::cout << m[42];  // Creates m[42] = 0, then prints 0!

// After this line, m.size() == 1

// Use find() for checking without modification
auto it = m.find(42);
if (it != m.end()) {
    std::cout << it->second;
}

2. Poor Hash Function

// BAD: Always returns same hash (everything in one bucket!)
struct BadHash {
    size_t operator()(int x) const {
        return 0;  // Everything hashes to bucket 0!
    }
};

std::unordered_set<int, BadHash> s;
// All operations degrade to O(n)!

// GOOD: Use std::hash or implement properly

3. Pointer Keys Without Custom Hash

// BAD: Hashes pointer address, not pointed value
std::unordered_set<std::string*> s;
std::string str1 = "hello";
std::string str2 = "hello";
s.insert(&str1);
s.insert(&str2);  // Both inserted (different addresses)

// GOOD: Use values or provide custom hash
std::unordered_set<std::string> s2;  // Values, not pointers
s2.insert("hello");
s2.insert("hello");  // Second one ignored

4. Modifying Elements During Iteration

std::unordered_set<int> s = {1, 2, 3, 4, 5};

// BAD: Can invalidate iterators
for (auto it = s.begin(); it != s.end(); ++it) {
    if (*it % 2 == 0) {
        s.erase(it);  // Invalidates it!
    }
}

// GOOD: Use erase return value
for (auto it = s.begin(); it != s.end(); ) {
    if (*it % 2 == 0) {
        it = s.erase(it);  // Returns next valid iterator
    } else {
        ++it;
    }
}

5. Assuming Iteration Order

std::unordered_set<int> s = {1, 2, 3, 4, 5};

// BAD: Assuming order
for (int x : s) {
    std::cout << x << ' ';
}
// Might print: 3 1 4 2 5 (or any other order!)

// If you need order, use set or sort after
std::set<int> ordered_s(s.begin(), s.end());

Decision Guide

Unordered vs Ordered Containers

Choose UNORDERED when:
✓ Need fastest possible lookup
✓ Order doesn't matter
✓ No range queries needed
✓ Good hash function available

Choose ORDERED when:
✓ Need sorted iteration
✓ Range queries (lower_bound, upper_bound)
✓ Need worst-case guarantees
✓ Poor hash function for your type

Flow Chart

Need fast lookups?
├─ Yes
│  └─ Need sorted order?
│     ├─ Yes → use std::set / std::map
│     └─ No  → use std::unordered_set / std::unordered_map
└─ No → use std::vector

Practice Exercises

Exercise 1: Implement LRU Cache

// Implement Least Recently Used cache using:
// - unordered_map for O(1) lookups
// - list for O(1) move-to-front
class LRUCache {
    // Your implementation
};

Exercise 2: Group Anagrams

// Group anagrams together
// Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
// Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
// Hint: Use unordered_map with sorted string as key

Exercise 3: First Non-Repeating Character

// Find first character that appears only once
// Use unordered_map to count frequencies in O(n)

Complete Practical Example: Distributed Cache System

Here’s a comprehensive example integrating all unordered containers with custom hash functions and real-world caching:

#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <string>
#include <vector>
#include <chrono>
#include <list>
#include <functional>
#include <iomanip>

// 1. Custom type for cache keys
struct CacheKey {
    std::string user_id;
    std::string resource_id;
    
    bool operator==(const CacheKey& other) const {
        return user_id == other.user_id && resource_id == other.resource_id;
    }
};

// 2. Custom hash function for CacheKey
struct CacheKeyHash {
    size_t operator()(const CacheKey& key) const {
        // Combine hashes of both fields
        size_t h1 = std::hash<std::string>{}(key.user_id);
        size_t h2 = std::hash<std::string>{}(key.resource_id);
        
        // Good hash combination technique
        return h1 ^ (h2 << 1);
    }
};

// 3. Cache entry with metadata
struct CacheEntry {
    std::string data;
    std::chrono::system_clock::time_point timestamp;
    size_t access_count;
    
    CacheEntry(std::string d)
        : data(std::move(d))
        , timestamp(std::chrono::system_clock::now())
        , access_count(0) {}
};

// 4. LRU Cache implementation
class LRUCache {
private:
    size_t capacity_;
    std::list<std::string> lru_list_;  // Most recent at front
    std::unordered_map<std::string, std::pair<std::string, 
                      std::list<std::string>::iterator>> cache_;
    
public:
    explicit LRUCache(size_t capacity) : capacity_(capacity) {}
    
    std::string get(const std::string& key) {
        auto it = cache_.find(key);
        
        if (it == cache_.end()) {
            return "";  // Cache miss
        }
        
        // Move to front (most recently used)
        lru_list_.erase(it->second.second);
        lru_list_.push_front(key);
        it->second.second = lru_list_.begin();
        
        return it->second.first;
    }
    
    void put(const std::string& key, const std::string& value) {
        auto it = cache_.find(key);
        
        if (it != cache_.end()) {
            // Update existing
            lru_list_.erase(it->second.second);
            lru_list_.push_front(key);
            it->second = {value, lru_list_.begin()};
            return;
        }
        
        // Add new entry
        if (cache_.size() >= capacity_) {
            // Evict least recently used
            std::string lru_key = lru_list_.back();
            lru_list_.pop_back();
            cache_.erase(lru_key);
            std::cout << "  [LRU] Evicted: " << lru_key << "\n";
        }
        
        lru_list_.push_front(key);
        cache_[key] = {value, lru_list_.begin()};
    }
    
    size_t size() const { return cache_.size(); }
};

// 5. Distributed cache with consistent hashing
class DistributedCache {
private:
    // unordered_map: Custom key type with custom hash
    std::unordered_map<CacheKey, CacheEntry, CacheKeyHash> main_cache_;
    
    // unordered_multimap: Tag-based indexing (one tag, many entries)
    std::unordered_multimap<std::string, CacheKey> tag_index_;
    
    // unordered_set: Track active sessions
    std::unordered_set<std::string> active_sessions_;
    
    // unordered_multiset: Access frequency tracking
    std::unordered_multiset<std::string> access_log_;
    
    // LRU cache for hot data
    LRUCache hot_cache_;
    
    // Statistics
    size_t hits_ = 0;
    size_t misses_ = 0;
    
public:
    DistributedCache() : hot_cache_(100) {}
    
    // Store data with tags
    void put(const std::string& user_id, const std::string& resource_id,
             const std::string& data, const std::vector<std::string>& tags) {
        
        CacheKey key{user_id, resource_id};
        
        // Store in main cache (unordered_map with custom hash)
        main_cache_.emplace(key, CacheEntry{data});
        
        // Index by tags (unordered_multimap)
        for (const auto& tag : tags) {
            tag_index_.emplace(tag, key);
        }
        
        // Also store in hot cache
        std::string hot_key = user_id + ":" + resource_id;
        hot_cache_.put(hot_key, data);
        
        std::cout << "Stored: " << user_id << "/" << resource_id 
                  << " with " << tags.size() << " tags\n";
    }
    
    // Retrieve data
    std::string get(const std::string& user_id, const std::string& resource_id) {
        CacheKey key{user_id, resource_id};
        
        // Try hot cache first
        std::string hot_key = user_id + ":" + resource_id;
        std::string hot_data = hot_cache_.get(hot_key);
        if (!hot_data.empty()) {
            hits_++;
            std::cout << "  [HOT CACHE HIT] " << user_id << "/" << resource_id << "\n";
            return hot_data;
        }
        
        // Try main cache
        auto it = main_cache_.find(key);
        
        if (it != main_cache_.end()) {
            hits_++;
            it->second.access_count++;
            
            // Log access (unordered_multiset for frequency)
            access_log_.insert(user_id);
            
            std::cout << "  [CACHE HIT] " << user_id << "/" << resource_id 
                      << " (accessed " << it->second.access_count << " times)\n";
            
            return it->second.data;
        }
        
        misses_++;
        std::cout << "  [CACHE MISS] " << user_id << "/" << resource_id << "\n";
        return "";
    }
    
    // Find all entries with a tag (using unordered_multimap)
    std::vector<std::string> find_by_tag(const std::string& tag) {
        std::vector<std::string> results;
        
        auto range = tag_index_.equal_range(tag);
        
        for (auto it = range.first; it != range.second; ++it) {
            const CacheKey& key = it->second;
            auto entry_it = main_cache_.find(key);
            
            if (entry_it != main_cache_.end()) {
                results.push_back(entry_it->second.data);
            }
        }
        
        return results;
    }
    
    // Session management (unordered_set)
    void start_session(const std::string& session_id) {
        active_sessions_.insert(session_id);
        std::cout << "Started session: " << session_id 
                  << " (total: " << active_sessions_.size() << ")\n";
    }
    
    void end_session(const std::string& session_id) {
        active_sessions_.erase(session_id);
        std::cout << "Ended session: " << session_id 
                  << " (remaining: " << active_sessions_.size() << ")\n";
    }
    
    bool is_session_active(const std::string& session_id) const {
        return active_sessions_.contains(session_id);
    }
    
    // Get top users by access frequency
    std::vector<std::pair<std::string, size_t>> get_top_users(size_t n) const {
        // Count frequencies using unordered_map
        std::unordered_map<std::string, size_t> frequency;
        
        for (const auto& user : access_log_) {
            frequency[user]++;
        }
        
        // Convert to vector and sort
        std::vector<std::pair<std::string, size_t>> users(
            frequency.begin(), frequency.end()
        );
        
        std::partial_sort(users.begin(), 
                         users.begin() + std::min(n, users.size()),
                         users.end(),
                         [](const auto& a, const auto& b) {
                             return a.second > b.second;
                         });
        
        users.resize(std::min(n, users.size()));
        return users;
    }
    
    // Invalidate all entries with a tag
    void invalidate_by_tag(const std::string& tag) {
        auto range = tag_index_.equal_range(tag);
        size_t count = 0;
        
        for (auto it = range.first; it != range.second; ++it) {
            main_cache_.erase(it->second);
            count++;
        }
        
        tag_index_.erase(tag);
        
        std::cout << "Invalidated " << count << " entries with tag: " 
                  << tag << "\n";
    }
    
    // Get statistics
    void print_statistics() const {
        size_t total = hits_ + misses_;
        double hit_rate = total > 0 ? (100.0 * hits_ / total) : 0.0;
        
        std::cout << "\n=== Cache Statistics ===\n";
        std::cout << "Total requests: " << total << "\n";
        std::cout << "Hits: " << hits_ << "\n";
        std::cout << "Misses: " << misses_ << "\n";
        std::cout << "Hit rate: " << std::fixed << std::setprecision(2) 
                  << hit_rate << "%\n";
        std::cout << "Entries in cache: " << main_cache_.size() << "\n";
        std::cout << "Active sessions: " << active_sessions_.size() << "\n";
        std::cout << "Hot cache size: " << hot_cache_.size() << "\n";
        std::cout << "Total accesses logged: " << access_log_.size() << "\n";
        
        // Show bucket statistics
        std::cout << "\nHash table stats:\n";
        std::cout << "  Bucket count: " << main_cache_.bucket_count() << "\n";
        std::cout << "  Load factor: " << main_cache_.load_factor() << "\n";
        std::cout << "  Max load factor: " << main_cache_.max_load_factor() << "\n";
    }
    
    // Analyze hash distribution
    void analyze_hash_distribution() const {
        std::cout << "\n=== Hash Distribution Analysis ===\n";
        
        size_t max_bucket_size = 0;
        size_t empty_buckets = 0;
        
        for (size_t i = 0; i < main_cache_.bucket_count(); ++i) {
            size_t bucket_size = main_cache_.bucket_size(i);
            
            if (bucket_size == 0) {
                empty_buckets++;
            }
            
            if (bucket_size > max_bucket_size) {
                max_bucket_size = bucket_size;
            }
        }
        
        std::cout << "Empty buckets: " << empty_buckets 
                  << " (" << (100.0 * empty_buckets / main_cache_.bucket_count()) 
                  << "%)\n";
        std::cout << "Max bucket size: " << max_bucket_size << "\n";
        std::cout << "Average bucket size: " 
                  << (main_cache_.size() / (double)main_cache_.bucket_count()) 
                  << "\n";
        
        if (max_bucket_size > 5) {
            std::cout << "⚠ Warning: Large bucket detected. Consider rehashing.\n";
        }
    }
};

int main() {
    std::cout << "=== Distributed Cache System Demo ===\n\n";
    
    DistributedCache cache;
    
    // 1. Session management (unordered_set)
    std::cout << "--- Session Management ---\n";
    cache.start_session("session_001");
    cache.start_session("session_002");
    cache.start_session("session_003");
    std::cout << "Session active? " 
              << cache.is_session_active("session_001") << "\n\n";
    
    // 2. Store data with tags
    std::cout << "--- Storing Data ---\n";
    cache.put("user1", "profile", "User1 Profile Data", {"profile", "user_data"});
    cache.put("user1", "settings", "User1 Settings", {"settings", "user_data"});
    cache.put("user2", "profile", "User2 Profile Data", {"profile", "user_data"});
    cache.put("user3", "profile", "User3 Profile Data", {"profile", "admin"});
    cache.put("user2", "orders", "User2 Orders", {"orders", "user_data"});
    std::cout << "\n";
    
    // 3. Retrieve data
    std::cout << "--- Retrieving Data ---\n";
    cache.get("user1", "profile");  // Should hit
    cache.get("user1", "profile");  // Hot cache hit
    cache.get("user2", "profile");  // Should hit
    cache.get("user99", "profile"); // Should miss
    cache.get("user1", "settings"); // Should hit
    std::cout << "\n";
    
    // 4. Find by tag (unordered_multimap)
    std::cout << "--- Find by Tag ---\n";
    auto profile_entries = cache.find_by_tag("profile");
    std::cout << "Found " << profile_entries.size() 
              << " entries with tag 'profile'\n";
    
    auto admin_entries = cache.find_by_tag("admin");
    std::cout << "Found " << admin_entries.size() 
              << " entries with tag 'admin'\n\n";
    
    // 5. More accesses for frequency analysis
    for (int i = 0; i < 5; ++i) {
        cache.get("user1", "profile");
    }
    for (int i = 0; i < 3; ++i) {
        cache.get("user2", "profile");
    }
    
    // 6. Get top users
    std::cout << "\n--- Top Users by Access ---\n";
    auto top_users = cache.get_top_users(3);
    for (const auto& [user, count] : top_users) {
        std::cout << user << ": " << count << " accesses\n";
    }
    
    // 7. Invalidate by tag
    std::cout << "\n--- Invalidation ---\n";
    cache.invalidate_by_tag("profile");
    cache.get("user1", "profile");  // Should miss now
    std::cout << "\n";
    
    // 8. End session
    std::cout << "--- End Sessions ---\n";
    cache.end_session("session_001");
    cache.end_session("session_002");
    std::cout << "\n";
    
    // 9. Statistics
    cache.print_statistics();
    cache.analyze_hash_distribution();
    
    return 0;
}

Output (Sample):

=== Distributed Cache System Demo ===

--- Session Management ---
Started session: session_001 (total: 1)
Started session: session_002 (total: 2)
Started session: session_003 (total: 3)
Session active? 1

--- Storing Data ---
Stored: user1/profile with 2 tags
Stored: user1/settings with 2 tags
Stored: user2/profile with 2 tags
Stored: user3/profile with 2 tags
Stored: user2/orders with 2 tags

--- Retrieving Data ---
  [CACHE HIT] user1/profile (accessed 1 times)
  [HOT CACHE HIT] user1/profile
  [CACHE HIT] user2/profile (accessed 1 times)
  [CACHE MISS] user99/profile
  [CACHE HIT] user1/settings (accessed 1 times)

--- Find by Tag ---
Found 3 entries with tag 'profile'
Found 1 entries with tag 'admin'

  [HOT CACHE HIT] user1/profile
  [HOT CACHE HIT] user1/profile
...

=== Cache Statistics ===
Total requests: 15
Hits: 13
Misses: 2
Hit rate: 86.67%
Entries in cache: 5
Active sessions: 3
Hot cache size: 5
Total accesses logged: 13

=== Hash Distribution Analysis ===
Empty buckets: 5 (83%)
Max bucket size: 2
Average bucket size: 0.83

Concepts Demonstrated:

This example shows real-world use of hash tables in a caching system!


Next Steps


Part 3 of 22 - Unordered Containers