How do I use subprocesses to force Python to release memory?

Accepted answer
Score: 50

The important thing about the optimization suggestion is to make sure that my_function() is only invoked in a subprocess. The deepcopy and del are irrelevant: once you create five million distinct integers in a process, holding onto all of them at the same time, it's game over. Even if you stop referring to those objects, Python will free them by keeping references to five million empty integer-object-sized fields in a limbo where they await reuse for the next function that wants to create five million integers. This is the free list mentioned in the other answer, and it buys blindingly fast allocation and deallocation of ints and floats. It is only fair to Python to note that this is not a memory leak since the memory is definitely made available for further allocations. However, that memory will not get returned to the system until the process ends, nor will it be reused for anything other than allocating numbers of the same type.

Most programs don't have this problem because most programs do not create pathologically huge lists of numbers, free them, and then expect to reuse that memory for other objects. Programs using numpy are also safe because numpy stores the numeric data of its arrays in tightly packed native format. For programs that do follow this usage pattern, the way to mitigate the problem is by not creating a large number of integers at the same time in the first place, at least not in the process which needs to return memory to the system. It is unclear what exact use case you have, but a real-world solution will likely require more than a "magic decorator".

This is where subprocesses come in: if the list of numbers is created in another process, then all the memory associated with the list, including but not limited to storage of ints, is both freed and returned to the system by the mere act of terminating the subprocess. Of course, you must design your program so that the list can be both created and processed in the subprocess, without requiring the transfer of all these numbers. The subprocess can receive information needed to create the data set, and can send back the information obtained from processing the list.
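The design described above (send in only the parameters, get back only a small result) can be sketched as follows. This is a minimal illustration, not the answer's own method: it assumes Python 3.7 or later and uses the standard subprocess module with a fresh interpreter, and the names CHILD_CODE and sort_benchmark_in_child, the JSON message layout, and the seed parameter are all hypothetical choices made for the example.

```python
import json
import subprocess
import sys

# Hypothetical child program: it builds the big list, sorts it, and prints
# only a small JSON summary. The list itself never leaves the child process.
CHILD_CODE = r"""
import json, random, sys, time
params = json.load(sys.stdin)
rng = random.Random(params["seed"])
lst = [rng.randrange(sys.maxsize) for _ in range(params["list_size"])]
t0 = time.perf_counter()
lst.sort()
elapsed = time.perf_counter() - t0
summary = {
    "n": len(lst),
    "sorted_ok": all(a <= b for a, b in zip(lst, lst[1:])),
    "time": elapsed,
}
json.dump(summary, sys.stdout)
"""


def sort_benchmark_in_child(list_size, seed=0):
    """Run the benchmark in a separate interpreter and return its summary dict.

    All memory used for the list belongs to the child and goes back to the
    operating system when the child exits.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps({"list_size": list_size, "seed": seed}),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    result = sort_benchmark_in_child(200000)
    print("sorted %d elements in %.3f s" % (result["n"], result["time"]))
```

Only the parameter dictionary and the summary dictionary cross the process boundary, so the parent never holds the large list. Starting a new interpreter has a noticeable fixed cost, which is why this only pays off for genuinely large, short-lived data sets.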

To illustrate the principle, let's upgrade your example so that the whole list actually needs to exist - say we're benchmarking sorting algorithms. We want to create a huge list of integers, sort it, and reliably free the memory associated with the list, so that the next benchmark can allocate memory for its own needs without worrying about running out of RAM. To spawn the subprocess and communicate, this uses the multiprocessing module:

# To run this, save it to a file that looks like a valid Python module, e.g.
# "foo.py" - multiprocessing requires being able to import the main module.
# Then run it with "python foo.py".
# Note: this is Python 2 code (print statement, xrange, sys.maxint).

import multiprocessing, random, sys, os, time

def create_list(size):
    # utility function for clarity - runs in subprocess
    maxint = sys.maxint
    randrange = random.randrange
    return [randrange(maxint) for i in xrange(size)]

def run_test(state):
    # this function is run in a separate process
    size = state['list_size']
    print 'creating a list with %d random elements - this can take a while... ' % size,
    sys.stdout.flush()
    lst = create_list(size)
    print 'done'
    t0 = time.time()
    lst.sort()
    t1 = time.time()
    state['time'] = t1 - t0

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    state = manager.dict(list_size=5*1000*1000)  # shared state
    p = multiprocessing.Process(target=run_test, args=(state,))
    p.start()
    p.join()
    print 'time to sort: %.3f' % state['time']
    print 'my PID is %d, sleeping for a minute...' % os.getpid()
    time.sleep(60)
    # at this point you can inspect the running process to see that it
    # does not consume excess memory

Bonus Answer

It is hard to provide an answer to the bonus question, since the question is unclear. The "free list concept" is exactly that, a concept, an implementation strategy that needs to be explicitly coded on top of the regular Python allocator. Most Python types do not use that allocation strategy; for example, it is not used for instances of classes created with the class statement. Implementing a free list is not hard, but it is fairly advanced and rarely undertaken without good reason. If some extension author has chosen to use a free list for one of their types, it can be expected that they are aware of the tradeoff a free list offers: gaining extra-fast allocation/deallocation at the cost of some additional space (for the objects on the free list and the free list itself) and inability to reuse the memory for something else.
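To make the tradeoff concrete, here is a toy free list written in pure Python 3. It is only a conceptual sketch: real free lists in CPython and in extension modules are implemented in C inside the type's allocation and deallocation routines, and the Node class, its acquire and release methods, and the _max_free limit are hypothetical names invented for this illustration.

```python
class Node:
    """Hypothetical type that recycles its instances through a free list."""

    __slots__ = ("value",)

    _free = []        # released instances waiting to be reused
    _max_free = 1000  # cap on how many idle instances are kept around

    @classmethod
    def acquire(cls, value):
        # Fast path: reuse an idle instance instead of allocating a new one.
        if cls._free:
            node = cls._free.pop()
        else:
            node = cls.__new__(cls)
        node.value = value
        return node

    def release(self):
        # Instead of letting the object be deallocated, park it for reuse.
        # Its memory stays reserved for future Node objects only.
        self.value = None
        if len(Node._free) < Node._max_free:
            Node._free.append(self)


if __name__ == "__main__":
    first = Node.acquire("a")
    first.release()
    second = Node.acquire("b")
    print(second is first)  # the released instance was handed out again
```

The memory held by parked instances is exactly the "additional space" mentioned above: it makes the next acquire cheap, but nothing else in the program can use it while it sits on the list.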
