Advanced Python Memory Management - xiaorui.cc
    Object-specific allocators
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |
   [    Underlying general-purpose allocator (ex: C library malloc)   ]
 0 | <------ Virtual memory allocated for the python process -------> |
   =========================================================================
   [                OS-specific Virtual Memory Manager (VMM)               ]
-1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |
   [                                  ]   [                                  ]
-2 | <-- Physical memory: ROM/RAM --> |   | <-- Secondary storage (swap) --> |
Request in bytes     Size of allocated block     Size class idx
----------------------------------------------------------------
       1-8                      8                       0
       9-16                    16                       1
      17-24                    24                       2
      25-32                    32                       3
      33-40                    40                       4
      41-48                    48                       5
      49-56                    56                       6
      57-64                    64                       7
       ...                    ...                     ...
     497-504                  504                      62
     505-512                  512                      63
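The table above rounds every request up to the next multiple of 8 bytes. A minimal sketch of that arithmetic, assuming only the 8-byte step and the 512-byte upper bound shown in the table; the name `size_class` is hypothetical and is not a CPython API:

```python
ALIGNMENT = 8        # block sizes are multiples of 8 bytes, per the table
SMALL_LIMIT = 512    # largest request listed in the table


def size_class(nbytes):
    """Return (block_size, size_class_index) for a request of nbytes bytes."""
    if not 1 <= nbytes <= SMALL_LIMIT:
        raise ValueError("request is outside the small-object range of the table")
    idx = (nbytes - 1) // ALIGNMENT
    return (idx + 1) * ALIGNMENT, idx


for n in (1, 8, 9, 24, 25, 504, 505, 512):
    print(n, size_class(n))
```

Requests outside this range are rejected here; the layer diagram suggests they would be served by the underlying general-purpose allocator instead.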
Glossary
    process, heap, Arenas, Pool, UsedPools, FreePools
Methods
    POSIX malloc
    Python memory pool
    object buffer pool
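Arena
    [Figure: the process address space (stack, heap, bss, initialized data, text). The malloc'd heap holds Arenas, and each Arena holds Pools. UsedPools are indexed by size class (1-8 ... 249-256) and contain Free Blocks and Used Blocks. FreePools hold Pool headers only, with no Blocks.]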
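UsedPools design
    [Figure: UsedPools indexed by size class (1-8, 9-16, 17-24, ... 249-256); each entry leads to Pools made of a Pool Header followed by Free Blocks and Used Blocks.]
    Allocation and recycling
    All Blocks in the same Pool have the same size
    A single Pool is 4 KB
    Free Blocks in a Pool form a singly linked list; Pools are chained together through their headers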
FreePools design
    [Figure: FreePools as a chain of Pool Headers; these Pools contain no Blocks.]
    A Pool is 4 KB
    Pool cleanup
Where are variables stored?
    run-time: stack vs. heap
    list    [1, 2, 3]
    dict    {'n': 1}
    int     1
Why?
    In [1]: a = 123
    In [2]: b = 123
    In [3]: a is b
    Out[3]: True
    In [4]: a = 1000
    In [5]: b = 1000
    In [6]: a is b
    Out[6]: False

    In [7]: a = 'n'
    In [8]: b = 'n'
    In [9]: a is b
    Out[9]: True
    In [10]: a = "python"
    In [11]: b = "python"
    In [12]: a is b
    Out[12]: True
Why? Are there only references?
    In [10]: a = b = 'nima'
    In [11]: b = a
    In [12]: a is b
    Out[12]: True
    In [13]: b = 'hehe'
    In [14]: a is b
    Out[14]: False

    In [1]: def go(var):
       ...:     print(id(var))
       ...:
    In [2]: id(a)
    Out[2]: 4401335072
    In [3]: go(a)
    4401335072
How are Python objects stored in memory?
    Python has names, not variables!
    [Figure: several names, each pointing to an object.]
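The two "why?" sessions and the names slide can be reproduced in a few lines. This is a minimal sketch written for Python 3; `show_id` is a hypothetical helper standing in for the `go` function on the slide:

```python
def show_id(var):
    # The parameter is just another name bound to the caller's object.
    return id(var)


a = b = "nima"                       # two names, one object
same_inside = show_id(a) == id(a)    # the function sees the very same object

b = "hehe"                           # rebinding b does not touch a
print(same_inside, a is b, a)
```

Assignment and argument passing bind names to objects; neither copies the object.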
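Integer object pool
    Small integers (-5 ... 256): created when the interpreter initializes; names bound to the same value share the same address!
    Other integers (257 and up, below -5): each assignment creates its own object; not the same address!
    An int object takes 28 bytes
    [Figure: var_1, var_2, var_3 and var_4 pointing into the small-integer pool and at separate large-integer objects.]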
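A minimal sketch for checking the small-integer pool on your own interpreter. It assumes CPython (the cache is an implementation detail, not a language guarantee), and `make_int` is a hypothetical helper that builds the values at run time so the compiler cannot merge equal constants:

```python
import sys


def make_int(text):
    # Parsing at run time avoids constant sharing inside one code object.
    return int(text)


small_a, small_b = make_int("256"), make_int("256")
large_a, large_b = make_int("257"), make_int("257")

print("256 shares one object:", small_a is small_b)
print("257 shares one object:", large_a is large_b)
print("size of a small int in bytes:", sys.getsizeof(small_a))
```

The reported size depends on the interpreter version and platform, so it may differ from the figure on the slide.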
Integer object pool
    Block list: PyIntBlock -> PyIntBlock
    Free list:  PyIntBlock -> PyIntBlock
    This memory is never returned to the Arena or the OS!
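Character object pool
    Single characters: a, b, c, d, ...
    38 bytes each
    Initialized by the interpreter
    var_1 and var_2 bound to the same character: the same address!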
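String object pool
    [Figure: a hash table with slots 0, 1, 2, 3 holding strings such as aa, en, cao, oh, woyao, buyao, kuai, feile; each entry carries a ref field.]
    Stored by hash
    var_1 and var_2 share one address
    The reference count is recorded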
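String sharing can be requested explicitly. The sketch below assumes Python 3, where the function lives at `sys.intern` (in Python 2 it is the builtin `intern`); the sample words are taken from the slide:

```python
import sys

# Build two equal strings at run time so they start out as separate objects.
parts = ["woyao", "buyao"]
s1 = " ".join(parts)
s2 = " ".join(parts)
print("equal:", s1 == s2, "same object:", s1 is s2)

# Interning maps equal strings onto one shared object in the string pool.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print("same object after interning:", i1 is i2)
```

Which strings CPython interns automatically (short identifiers, compile-time constants) is an implementation detail and varies between versions.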
PyObject_GC_TRACK
    func: PyList_New      a new list's PyGC_Head is linked into the GC's list of tracked nodes
    func: list_dealloc    the list is untracked before it is freed
    ref: https://svn.python.org/projects/python/trunk/objects/listobject.c
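Tracking happens in C, but its effect is visible from Python through `gc.is_tracked`. A minimal sketch, assuming CPython:

```python
import gc

# Lists are containers, so the cyclic collector tracks them.
list_tracked = gc.is_tracked([1, 2, 3])

# Atomic objects cannot take part in a reference cycle and are not tracked.
int_tracked = gc.is_tracked(300)
str_tracked = gc.is_tracked("python")

print(list_tracked, int_tracked, str_tracked)
```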
ref count
    x = 300          # ref += 1
    y = x            # ref += 1
    z = [x, y]       # ref += 2
    References to 300 -> 4!
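The counting on this slide can be checked with `sys.getrefcount`. The sketch uses a fresh `object()` instead of the literal 300, because integer constants may be shared or cached by the interpreter and would distort the numbers. It compares deltas, since `getrefcount` itself adds a temporary reference:

```python
import sys

x = object()                     # one reference: the name x
base = sys.getrefcount(x)        # baseline, includes getrefcount's own argument

y = x                            # ref += 1
after_y = sys.getrefcount(x)

z = [x, y]                       # ref += 2 (two list slots)
after_z = sys.getrefcount(x)

print("after y = x:", after_y - base)
print("after z = [x, y]:", after_z - base)
```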
What does del do?
    x = 300
    y = x
    del x            # ref -= 1
    References to 300 -> 1!
    The del statement doesn't delete objects. It:
    - removes that name as a reference to that object
    - reduces the ref count by 1
ref count cases
    def go():
        w = 300      # ref count +1
    go()             # w is out of scope; ref count -1

    a = 'fuc.'
    del a            # del a; ref count -1

    b = 'en, a'
    b = None         # reassignment; ref count -1
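The three cases (leaving a scope, `del`, and reassignment) can be observed with `sys.getrefcount`. This is a minimal sketch; the class `Tracked` is hypothetical and only serves to create an object nobody else references:

```python
import sys


class Tracked(object):
    pass


obj = Tracked()
base = sys.getrefcount(obj)


def go(arg):
    w = arg                          # a local name adds a reference
    return sys.getrefcount(arg)


inside = go(obj)                     # higher while the call is running
after_call = sys.getrefcount(obj)    # locals are gone once go() returns

a = obj
del a                                # removing the name drops the reference
after_del = sys.getrefcount(obj)

b = obj
b = None                             # rebinding the name drops it as well
after_rebind = sys.getrefcount(obj)

print(inside > base, after_call == base, after_del == base, after_rebind == base)
```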
Cyclical references
    class Node:
        def __init__(self, va):
            self.va = va
        def next(self, next):
            self.next = next

    mid = Node('root')
    left = Node('left')
    right = Node('right')
    mid.next(left)
    left.next(right)
    right.next(left)

    [Figure: mid points to left; left and right point to each other.]
    If we del the mid node, how are left and right reclaimed?
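A minimal sketch of the situation on this slide, assuming CPython. Two nodes refer to each other, so reference counting alone never frees them; the cyclic collector does. The attribute name `peer` is hypothetical, and a weak reference is used only as a probe to see whether the object is still alive:

```python
import gc
import weakref


class Node(object):
    def __init__(self, value):
        self.value = value
        self.peer = None


gc_was_enabled = gc.isenabled()
gc.disable()                      # keep automatic collections out of the demo

left = Node("left")
right = Node("right")
left.peer = right
right.peer = left                 # left <-> right now form a cycle

probe = weakref.ref(left)         # observes the object without keeping it alive
del left, right                   # the names are gone, the cycle remains
alive_after_del = probe() is not None

gc.collect()                      # the cycle detector reclaims both nodes
alive_after_gc = probe() is not None

if gc_was_enabled:
    gc.enable()
print(alive_after_del, alive_after_gc)
```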
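Mark & sweep
    [Figure: an object graph with a GC root and nodes a, b, c, w, R, K, G. Objects reachable from the root are marked; unmarked objects are swept.]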
Generational collection
    PyGC_Head
    Young:      node -> node -> node -> ...
    Old:        node -> node -> node -> ...
    Permanent:  node -> node -> node -> ...
    Divide and conquer
    Improves efficiency
    Based on object lifetime
    Trades space for time
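When does GC run?
    import gc
    gc.set_threshold(700, 10, 5)

    Which counters?
    700:  allocation counter (PyMem API); generation 0 is collected when the counter exceeds 700
    10:   generation 1 is collected when N % 10 == 0, where N counts generation 0 collections
    5:    generation 2 is collected when N % 5 == 0, where N counts generation 1 collections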
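The thresholds and the live counters can be inspected at run time. A minimal sketch using only the standard `gc` module; it restores the previous thresholds at the end so the demo has no lasting effect:

```python
import gc

original = gc.get_threshold()      # remember the interpreter's current setting

gc.set_threshold(700, 10, 5)       # the values from the slide
current = gc.get_threshold()
counts = gc.get_count()            # current counters, one per generation

print("thresholds:", current)
print("counters:", counts)

gc.set_threshold(*original)        # put the previous thresholds back
```

How the second and third thresholds are interpreted has changed between CPython releases, so treat the exact trigger rule as version dependent.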
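Summary
    Allocate memory
    -> the threshold is found to be exceeded
    -> garbage collection is triggered
    -> the lists of all collectable objects are merged together
    -> traverse them and compute each object's effective reference count
    -> split them into two sets: effective reference count = 0 and effective reference count > 0
    -> objects with a count > 0 are moved to the next older generation
    -> objects with a count = 0 are collected
    -> collection walks the elements inside each container and decrements their reference counts (this breaks the reference cycle)
    -> the usual decrement logic runs; when an object's reference count reaches 0, memory reclamation is triggered
    -> Python's low-level memory management reclaims the memory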
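The "effective reference count" step can be sketched on a toy heap. This is a simplified model, not CPython's code: objects are plain names, `refcounts` and `referents` are hypothetical inputs, and the sketch adds one step the summary leaves implicit, namely that anything reachable from a survivor also survives:

```python
def find_unreachable(refcounts, referents):
    """refcounts: name -> total ref count; referents: name -> names it refers to."""
    # Copy the real counts so the originals stay untouched.
    effective = dict(refcounts)

    # Subtract every reference that comes from another tracked object.
    for owner in referents:
        for target in referents[owner]:
            effective[target] -= 1

    # Objects with an effective count > 0 are referenced from outside.
    stack = [name for name, count in effective.items() if count > 0]

    # Everything reachable from those objects is alive too.
    reachable = set()
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(referents.get(name, ()))

    return set(refcounts) - reachable


# Toy heap: "a" is held by a global name and refers to "b";
# "c" and "d" refer only to each other (a leaked cycle).
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
referents = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(refcounts, referents)
print(sorted(garbage))   # ['c', 'd']
```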
weakref (weak references)
    Do not take part in reference counting
    Solve circular references

    import weakref

    class Expensive(object):
        def __del__(self):
            print('(Deleting %s)' % self)

    obj = Expensive()
    r = weakref.ref(obj)
    del obj
    print('r(): %s' % r())

    class Parent(object):
        def __init__(self):
            self.children = [Child(self)]

    class Child(object):
        def __init__(self, parent):
            self.parent = weakref.proxy(parent)
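A runnable sketch of what the Parent/Child design buys, assuming CPython's immediate reference-counting behaviour. Because the child holds only a proxy, deleting the last strong reference frees the parent at once, and later use of the proxy raises `ReferenceError`:

```python
import weakref


class Child(object):
    def __init__(self, parent):
        self.parent = weakref.proxy(parent)   # weak: adds no reference


class Parent(object):
    def __init__(self):
        self.children = [Child(self)]


p = Parent()
probe = weakref.ref(p)
child = p.children[0]            # keep the child alive on its own

del p                            # no strong cycle, so the parent dies here
parent_freed = probe() is None

try:
    child.parent.children
    proxy_dead = False
except ReferenceError:
    proxy_dead = True

print(parent_freed, proxy_dead)
```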
Mutable vs. immutable (objects)
    Immutable: string, int, tuple
    Mutable:   list, dict
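Container objects
    a = [10, 10, 11]
    b = a
    [Figure: one PyListObject (type list, with rc, size and items fields) whose items point to PyObject integers. The integer 10 is shared by two slots (rc 2, value 10); the integer 11 has rc 1, value 11. b = a binds a second name to the same list; nothing is copied.]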
copy.copy
    a = [10, 10, [10, 11]]
    b = copy.copy(a)
    [Figure: two outer PyListObjects (type list, rc 1 each) whose items refer to the same objects: the shared integer 10 and the same inner PyListObject [10, 11]. Only the outer list is duplicated.]
copy.deepcopy
    a = [10, [10, 11]]
    b = copy.deepcopy(a)
    [Figure: two outer PyListObjects (type list, rc 1 each), each referring to its own inner PyListObject [10, 11]. The immutable integers 10 (rc 2) and 11 are still shared.]
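The three container slides (plain assignment, `copy.copy`, `copy.deepcopy`) can be compared side by side. A minimal sketch using the list from the `copy.copy` slide; the variable names are illustrative:

```python
import copy

a = [10, 10, [10, 11]]
alias = a                      # another name for the same list
shallow = copy.copy(a)         # new outer list, shared inner list
deep = copy.deepcopy(a)        # new outer list and new inner list

a[2].append(12)                # mutate the inner list through a

print("alias is a:", alias is a)
print("shallow is a:", shallow is a, "inner shared:", shallow[2] is a[2])
print("deep inner shared:", deep[2] is a[2])
print("shallow sees the change:", shallow[2])
print("deep does not:", deep[2])
```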
DIY gc
    import gc
    import sys

    gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)
    a = []
    b = []
    a.append(b)
    print('a refcount: %d' % sys.getrefcount(a))  # 2
    print('b refcount: %d' % sys.getrefcount(b))  # 3
    del a
    del b
    print(gc.collect())  # 0
Garbage collector optimization
    Memory bound: lower the threshold, trading time for space
    CPU bound: raise the threshold, trading space for time
    Pause GC and introduce a master/worker design
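One way to "pause GC" around an allocation-heavy section is a small context manager. This is a sketch under stated assumptions: `gc_paused` is a hypothetical helper, not part of the standard library, and it only disables the cyclic collector; reference counting keeps working:

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic collector inside the block, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()
    data = [[i] for i in range(10000)]   # many container allocations, no GC passes
restored = gc.isenabled()

print(inside, restored == before, len(data))
```

In a master/worker layout, the same idea would let workers run with the collector off while the master decides when to recycle or collect them.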
Q & A
    How do reference counting and the GIL affect each other?
    Is gc atomic?
    What about gc's stop-the-world behavior?
END xiaorui.cc