第 3 章：集合类型的底层数据结构

深入理解 List<T>、Dictionary<K,V>、HashSet<T>、ConcurrentDictionary 的底层实现、扩容机制和性能特征。

3.1 List<T> — 动态数组

底层结构

List<T> 内部是一个 T[] 数组：

// 简化的内部结构
public class List<T>
{
    private T[] _items;     // 底层数组
    private int _size;      // 实际元素数量
    private int _version;   // 版本号（用于检测遍历时修改）
}

扩容机制

var list = new List<int>();       // 初始容量 0，_items = Array.Empty<int>()
list.Add(1);                      // 首次添加，容量变为 4
list.Add(2); list.Add(3); list.Add(4);  // 容量 4，已满
list.Add(5);                      // 触发扩容：容量变为 8（翻倍）

扩容过程：

创建新数组，容量 = 旧容量 × 2（最小为 4）
将旧数组的元素复制到新数组（Array.Copy）
旧数组等待 GC 回收

容量变化：0 → 4 → 8 → 16 → 32 → 64 → 128 → ...

性能特征

操作	时间复杂度	说明
`Add`（末尾）	均摊 O(1)	扩容时 O(n)
`Insert`（中间）	O(n)	需要移动后续元素
`RemoveAt`（中间）	O(n)	需要移动后续元素
`this[index]`	O(1)	直接索引访问
`Contains`	O(n)	线性搜索
`Sort`	O(n log n)	内省排序（IntroSort）

最佳实践

// 已知大小时预分配容量
var list = new List<int>(10000);  // 避免多次扩容

// 批量添加用 AddRange
list.AddRange(otherCollection);

// 释放多余容量
list.TrimExcess();

// .NET 8+ CollectionsMarshal 直接访问内部数组
var span = CollectionsMarshal.AsSpan(list);

3.2 Dictionary<TKey, TValue> — 哈希表

底层结构

// 简化的内部结构
public class Dictionary<TKey, TValue>
{
    private int[] _buckets;       // 桶数组（存储 entries 的索引）
    private Entry[] _entries;     // 条目数组
    private int _count;           // 元素数量
    private int _freeList;        // 空闲链表头
    private int _freeCount;       // 空闲条目数量
}

private struct Entry
{
    public uint hashCode;         // 哈希值
    public int next;              // 同一个桶中下一个条目的索引（链表）
    public TKey key;
    public TValue value;
}

查找过程

1. 计算 key 的哈希值：hashCode = key.GetHashCode()
2. 定位桶：bucketIndex = hashCode % _buckets.Length
3. 遍历桶中的链表：_buckets[bucketIndex] → entry.next → ...
4. 比较 hashCode 和 key.Equals()

示例：查找 key = "Alice"
hashCode = "Alice".GetHashCode()  // 假设 = 12345
bucketIndex = 12345 % 7           // 假设 7 个桶 → 桶 4
遍历桶 4 的链表，找到匹配的 entry

_buckets:  [_, _, _, _, 2, _, _]
                        │
_entries:  [..., Entry{hash=12345, key="Alice", value=30, next=-1}, ...]
                  ↑ index=2

扩容机制

当元素数量达到容量时触发扩容：

新容量 = 大于当前容量 × 2 的最小质数
重新分配 _buckets 和 _entries
所有元素重新计算桶位置（rehash）

容量变化（质数）：3 → 7 → 17 → 37 → 71 → 163 → ...

为什么用质数？减少哈希冲突（取模运算分布更均匀）。

性能特征

操作	平均	最差（大量冲突）
`Add`	O(1)	O(n)
`TryGetValue`	O(1)	O(n)
`Remove`	O(1)	O(n)
`ContainsKey`	O(1)	O(n)
`ContainsValue`	O(n)	O(n)

最佳实践

// 预分配容量
var dict = new Dictionary<string, int>(10000);

// 使用 TryGetValue 而不是 ContainsKey + 索引器
if (dict.TryGetValue(key, out var value))
{
    // 一次查找
}

// 自定义比较器
var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

// .NET 6+ TryAdd
dict.TryAdd("key", 42);  // 不存在才添加，避免异常

// .NET 8+ GetAlternateLookup（避免字符串分配）
var lookup = dict.GetAlternateLookup<ReadOnlySpan<char>>();
if (lookup.TryGetValue("key".AsSpan(), out var val)) { }

3.3 HashSet<T> — 哈希集合

底层结构与 Dictionary 几乎相同，只是没有 Value：

// 内部结构类似 Dictionary，但 Entry 没有 value 字段
private struct Entry
{
    public int hashCode;
    public int next;
    public T value;  // 这里的 value 就是集合的元素
}

集合运算

var set1 = new HashSet<int> { 1, 2, 3, 4, 5 };
var set2 = new HashSet<int> { 3, 4, 5, 6, 7 };

set1.IntersectWith(set2);      // 交集：{3, 4, 5}（修改 set1）
set1.UnionWith(set2);          // 并集：{1, 2, 3, 4, 5, 6, 7}
set1.ExceptWith(set2);         // 差集：{1, 2}
set1.SymmetricExceptWith(set2); // 对称差集：{1, 2, 6, 7}

bool isSubset = set1.IsSubsetOf(set2);
bool overlaps = set1.Overlaps(set2);

FrozenSet / FrozenDictionary（.NET 8+）

创建后不可修改，查找性能更好：

using System.Collections.Frozen;

var frozenSet = new HashSet<string> { "a", "b", "c" }.ToFrozenSet();
var frozenDict = new Dictionary<string, int> { ["a"] = 1 }.ToFrozenDictionary();

// 查找性能比普通 HashSet/Dictionary 更好
// 因为创建时会选择最优的哈希策略
bool contains = frozenSet.Contains("a");

3.4 ConcurrentDictionary<TKey, TValue>

线程安全的字典，使用分段锁（Striped Locking）：

┌─────────────────────────────────────────┐
│         ConcurrentDictionary            │
│                                         │
│  Lock 0 → [Bucket 0] [Bucket 1]       │
│  Lock 1 → [Bucket 2] [Bucket 3]       │
│  Lock 2 → [Bucket 4] [Bucket 5]       │
│  ...                                    │
│                                         │
│  不同的锁保护不同的桶组                  │
│  不同桶组的操作可以并发执行              │
└─────────────────────────────────────────┘

var dict = new ConcurrentDictionary<string, int>();

// 线程安全的操作
dict.TryAdd("key", 42);
dict.TryGetValue("key", out var value);
dict.TryRemove("key", out _);

// 原子更新
dict.AddOrUpdate("counter",
    addValue: 1,                           // key 不存在时的初始值
    updateValueFactory: (key, old) => old + 1  // key 存在时的更新逻辑
);

// GetOrAdd（只创建一次）
var val = dict.GetOrAdd("key", key => ExpensiveComputation(key));
// 注意：valueFactory 可能被多个线程同时调用，但只有一个结果会被使用

ConcurrentDictionary vs Dictionary + lock

场景	推荐
读多写少	ConcurrentDictionary（读无锁）
写多读少	Dictionary + lock（锁粒度可控）
需要原子的复合操作	ConcurrentDictionary（AddOrUpdate）
需要遍历一致性	Dictionary + lock

3.5 其他常用集合

// LinkedList<T>：双向链表
// 插入/删除 O(1)，查找 O(n)，内存不连续
var linked = new LinkedList<int>();
var node = linked.AddLast(1);
linked.AddAfter(node, 2);

// Queue<T>：环形缓冲区
// Enqueue O(1)，Dequeue O(1)
var queue = new Queue<int>();
queue.Enqueue(1);
int item = queue.Dequeue();

// Stack<T>：数组实现
// Push O(1)，Pop O(1)
var stack = new Stack<int>();
stack.Push(1);
int top = stack.Pop();

// SortedDictionary<K,V>：红黑树
// 增删查 O(log n)，有序遍历
var sorted = new SortedDictionary<string, int>();

// PriorityQueue<T, TPriority>（.NET 6+）：最小堆
var pq = new PriorityQueue<string, int>();
pq.Enqueue("low", 10);
pq.Enqueue("high", 1);
string highest = pq.Dequeue();  // "high"（优先级 1 最小）

3.6 面试要点

List<T> 的扩容机制？
- 翻倍扩容（0→4→8→16→…），扩容时复制整个数组
Dictionary 的底层实现？
- 哈希表，桶数组 + 条目数组 + 链表解决冲突
- 容量是质数，扩容时 rehash
Dictionary 查找的时间复杂度？
- 平均 O(1)，最差 O(n)（大量哈希冲突时）
ConcurrentDictionary 的并发原理？
- 分段锁，不同桶组使用不同的锁，读操作无锁
什么时候用 HashSet 而不是 List？
- 需要去重、需要集合运算、需要 O(1) 的 Contains

练习

预分配 vs 不预分配 List 容量，用 BenchmarkDotNet 对比性能
实现一个简化版的 Dictionary，理解哈希表的工作原理
对比 Dictionary 和 ConcurrentDictionary 在多线程下的性能
使用 FrozenDictionary 优化只读查找场景

← 上一章：String 的底层实现 | 下一章：垃圾回收深入剖析 →

3.1 List<T> — 动态数组

底层结构

扩容机制

性能特征

最佳实践

3.2 Dictionary<TKey, TValue> — 哈希表

底层结构

查找过程

扩容机制

性能特征

最佳实践

3.3 HashSet<T> — 哈希集合

集合运算

FrozenSet / FrozenDictionary（.NET 8+）

3.4 ConcurrentDictionary<TKey, TValue>

ConcurrentDictionary vs Dictionary + lock

3.5 其他常用集合

3.6 面试要点

练习

评论