20

I am currently using a function that builds extremely long dictionaries (used to compare DNA strings) and sometimes I'm getting MemoryError. Is there a way to allot more memory to Python so it can deal with more data at once?

cs95
  • 64-bit Python has **a lot more** memory support. I would give real numbers but I don't remember them (I saw this in a Stack Overflow question). – Ender Look Jun 12 '17 at 20:36
  • I'm comparing strings 3-5 million characters long, in the process creating a dictionary for each that contains roughly as many keys as its length. Does that count as a lot? –  Jun 12 '17 at 20:40
  • @Maor That is definitely a lot. You should consider refactoring your code. – cs95 Jun 12 '17 at 20:41
  • Hey, if it's DNA, then how do these dictionaries have so many keys? – enedil Jun 12 '17 at 20:45
  • How much RAM are you working with? Can you add details about the data *in the question itself* instead of in the comments? Elaborate a bit more. If it is a 32-bit version of Python, you might benefit greatly by going 64-bit. Depends. – juanpa.arrivillaga Jun 12 '17 at 20:49
  • Bear in mind that Python objects incur some memory overhead on top of the "raw" data size. An empty string in 32 bit Python 3 consumes 25 bytes, each additional ASCII char will add 1 byte. If you use `bytes` strings instead the cost of an empty `b''` drops to 17 bytes. You can get this info via the `sys.getsizeof` function. Python 3.6 dicts are more space-efficient than previous versions, but they still have some unavoidable overheads. – PM 2Ring Jun 12 '17 at 21:04
  • Python libraries like [resource](https://docs.python.org/3/library/resource.html) _can_ impose a limit, though. – matanster Jun 07 '19 at 17:31

3 Answers

25

Python doesn't limit memory usage on your program. It will allocate as much memory as your program asks for, until your computer runs out. The most you can do is impose a fixed upper cap below that; this can be done with the `resource` module, but it isn't what you're looking for.
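
For completeness, a minimal sketch of such a cap using the `resource` module (Unix-only; the 1 GiB figure is an arbitrary example):

import resource

# Unix-only: cap this process's total address space at 1 GiB.
# Past the cap, allocations raise MemoryError instead of exhausting the machine.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1 * 1024**3, hard))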

You'd need to look at making your code more memory/performance friendly.
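
For example, rather than building one dictionary entry per character, you can often stream the comparison; a hypothetical sketch (not necessarily your actual algorithm):

def count_mismatches(dna1, dna2):
    # One pass, O(1) extra memory: no per-character dict is materialized.
    return sum(1 for a, b in zip(dna1, dna2) if a != b)

print(count_mismatches('ACGTACGT', 'ACGTTCGT'))  # 1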

cs95
  • Or until a limit imposed by the OS is reached (e.g. on Linux you can easily impose limits via configuration). – matanster Jun 07 '19 at 17:31
0

Python raises `MemoryError` when you hit the limit of your system RAM, unless you've defined a lower limit manually with the `resource` package.

Defining your class with `__slots__` tells the Python interpreter that the attributes/members of your class are fixed, which can lead to significant memory savings!

You can reduce per-instance dict creation by the interpreter by using `__slots__`. This tells the interpreter not to create a `__dict__` for each instance and to store attributes in a fixed-size structure instead.
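
A minimal sketch of the difference (the class names are illustrative):

import sys

class PlainDna:
    def __init__(self, seq):
        self.seq = seq

class SlottedDna:
    __slots__ = ('seq',)  # no per-instance __dict__ is created
    def __init__(self, seq):
        self.seq = seq

p, s = PlainDna('ACGT'), SlottedDna('ACGT')
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance plus its dict
print(sys.getsizeof(s))                              # slotted instance alone
# s.extra = 1  # would raise AttributeError: the attribute set is fixed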

If the memory consumed by your Python process continues to grow over time, this is usually a combination of:

  • How the C memory allocator in Python works. This is essentially memory fragmentation: the allocator cannot release a memory chunk back to the OS unless the entire chunk is unused, and chunk usage is usually not perfectly aligned with the objects you are creating and using.
  • Using a large number of small strings to compare data. Python interns some small strings internally, but creating millions of short-lived strings still puts load on the allocator (see the interning sketch after this list).
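
On interning: strings built at runtime are not interned automatically, but `sys.intern` lets you deduplicate repeated keys by hand; a small sketch:

import sys

a = ''.join(['AC', 'GT'])  # strings built at runtime are separate objects
b = ''.join(['AC', 'GT'])
print(a == b, a is b)      # True False: equal values, two objects

a, b = sys.intern(a), sys.intern(b)
print(a is b)              # True: one shared object after interning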

One way out is to do the work in a worker thread (or a single-threaded pool) and then kill/invalidate the worker to free up the resources attached to it.

The code below creates a single-threaded worker:

import concurrent.futures
import logging
import threading

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

lock = threading.Lock()
errorResultMap = []

def process_dna_compare(dna1, dna2):
    # max_workers=1 creates a single-threaded pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = {executor.submit(getDnaDict, lock, dna_key): dna_key
                   for dna_key in dna1}
        count = 0
        for future in concurrent.futures.as_completed(futures):
            result_dict = future.result()
            if result_dict:
                count += 1
                # do your processing XYZ here, e.g. compare against dna2
    logger.info('Total dna keys processed: %d', count)

def getDnaDict(lock, dna_key):
    '''Process dna_key here and return an item.'''
    try:
        # placeholder for the real per-key work
        return {dna_key: len(dna_key)}
    except Exception:
        with lock:  # guard the shared error list
            errorResultMap.append({'dna_key': dna_key,
                                   'error': 'No data for dna found'})
        logger.error('Error in processing dna: %s', dna_key)
        return None

if __name__ == "__main__":
    dna1 = ['ACGT', 'TGCA']  # get data for dna1
    dna2 = ['ACGT', 'TGAC']  # get data for dna2
    process_dna_compare(dna1, dna2)
    if errorResultMap:
        print(errorResultMap)  # or write errorResultMap to a file

The code below will help you understand memory usage:

import objgraph
import random

class Dna(object):
    def __init__(self):
        self.val = None
    def __str__(self):
        return "dna - val: {0}".format(self.val)

def f():
    l = []
    for i in range(3):
        dna = Dna()
        # print("id of dna: {0}".format(id(dna)))
        # print("dna is: {0}".format(dna))
        l.append(dna)
    return l

def main():
    d = {}
    l = f()
    d['k'] = l
    print("list l has {0} objects of type Dna()".format(len(l)))
    objgraph.show_most_common_types()
    objgraph.show_backrefs(random.choice(objgraph.by_type('Dna')),
                           filename="dna_refs.png")
    objgraph.show_refs(d, filename='myDna-image.png')

if __name__ == "__main__":
    main()

Output for memory usage:

list l has 3 objects of type Dna()
function                   2021
wrapper_descriptor         1072
dict                       998
method_descriptor          778
builtin_function_or_method 759
tuple                      667
weakref                    577
getset_descriptor          396
member_descriptor          296
type                       180

For more reading on slots, see: https://elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/

Avision
Mandy
-1

Try updating your Python from 32-bit to 64-bit.

Simply type `python` at the command line and the startup banner will show which build you have. The memory available to 32-bit Python is very low (a 32-bit process can address at most 4 GB, and often less in practice).
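
If the banner doesn't make it obvious, a quick standard-library check:

import struct
import sys

print(sys.version)               # the interpreter banner text
print(struct.calcsize('P') * 8)  # pointer width in bits: 32 or 64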

Tim