
I came across the following piece of code while studying Numpy:

import numpy as np
import time
import sys

S = range(1000)
print(sys.getsizeof(5) * len(S))   # size of one Python int object times the number of items

D = np.arange(1000)
print(D.size * D.itemsize)         # number of array elements times bytes per element

The output of this is:

14000
4000

So Numpy saves memory. But I want to know: how does Numpy do it?

Source: https://www.edureka.co/blog/python-numpy-tutorial/

Edit: The linked question only answers half of my question. It doesn't mention anything about what the Numpy module does.

Mithil Bhoras
  • Possible duplicate of ["sys.getsizeof(int)" returns an unreasonably large value?](https://stackoverflow.com/questions/10365624/sys-getsizeofint-returns-an-unreasonably-large-value) – FlyingTeller Jul 09 '18 at 07:31
  • @FlyingTeller my question is regarding Numpy. The link you posted only covers half of my question. – Mithil Bhoras Jul 09 '18 at 08:56

3 Answers


NumPy's arrays are more compact than Python lists -- a Python list of lists holding a million numbers would take at least 20 MB or so, while a NumPy 3D array with single-precision floats in the cells would fit in 4 MB. Access for reading and writing items is also faster with NumPy.

Maybe you don't care that much for just a million cells, but you definitely would for a billion cells -- neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with 4 GB or so, while Python alone would need at least about 12 GB (lots of pointers, which double in size) -- a much costlier piece of hardware!

The difference is mostly due to "indirectness" -- a Python list is an array of pointers to Python objects: at least 4 bytes per pointer, plus 16 bytes for even the smallest Python object (4 for the type pointer, 4 for the reference count, 4 for the value -- and the memory allocator rounds up to 16). A NumPy array is an array of uniform values -- single-precision numbers take 4 bytes each, double-precision ones 8 bytes. Less flexible, but you pay substantially for the flexibility of standard Python lists!
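A rough way to see that overhead in practice (a sketch only: the exact numbers vary with the Python build and platform, and summing getsizeof slightly over-counts because CPython shares its cached small ints):

import sys
import numpy as np

lst = list(range(1000))
# the list's pointer buffer plus every int object it points to
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(i) for i in lst)

arr = np.arange(1000)
# just the contiguous data buffer: size * itemsize
array_bytes = arr.nbytes

print(list_bytes)   # tens of kilobytes on a typical 64-bit CPython
print(array_bytes)  # 8000 if the default integer dtype is int64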

abhi krishnan

In your example, D.size == len(S), so the difference is due to the difference between D.itemsize (8) and sys.getsizeof(5) (28).

D.dtype shows you that NumPy used int64 as the data type, which uses (unsurprisingly) 64 bits == 8 bytes per item. This is really only the raw numerical data, similar to a data type in C (under the hood it pretty much is exactly that).

In contrast, Python uses an int object for storing each item, which (as pointed out in the question linked to by FlyingTeller) is more than just the raw numerical data.
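You can check those numbers yourself; the values in the comments assume a 64-bit CPython build where NumPy's default integer dtype is int64:

import sys
import numpy as np

D = np.arange(1000)
print(D.dtype)           # typically int64 on a 64-bit build
print(D.itemsize)        # 8 bytes of raw numeric data per element
print(sys.getsizeof(5))  # 28 on CPython 3.x: object header plus the digits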

Florian Brucker

An ndarray stores its data in a contiguous data buffer.

For example, in my current ipython session:

In [63]: x.shape
Out[63]: (35, 7)
In [64]: x.dtype
Out[64]: dtype('int64')
In [65]: x.size
Out[65]: 245
In [66]: x.itemsize
Out[66]: 8
In [67]: x.nbytes
Out[67]: 1960

The array referenced by x has a small block of memory holding info like the shape and strides, plus this data buffer, which takes up 1960 bytes (245 elements * 8 bytes each).

Identifying the memory use of a list, e.g. xl = x.tolist(), is trickier. len(xl) is 35; that is, its data buffer has 35 pointers. But each pointer references a different list of 7 elements, and each of those lists holds pointers to number objects. In my example the numbers are all small integers (less than 256), which CPython caches, so repeated values point to the same object. For larger integers and floats there will be a separate Python object for each. So the memory footprint of a list depends on the degree of nesting as well as the type of the individual elements.
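One rough way to estimate that nested footprint is a recursive getsizeof. This is only a sketch: the deep_sizeof helper and the (35, 7) stand-in list are made up for illustration, and because it ignores the sharing of cached small ints it gives an upper bound:

import sys

def deep_sizeof(obj):
    # crude recursive estimate: container size plus the sizes of everything it points to
    size = sys.getsizeof(obj)
    if isinstance(obj, list):
        size += sum(deep_sizeof(item) for item in obj)
    return size

xl = [[0] * 7 for _ in range(35)]  # same nesting as the (35, 7) array above
print(deep_sizeof(xl))  # outer list + 35 inner lists + the int objects they reference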

An ndarray can also have object dtype, in which case it too contains pointers to objects stored elsewhere in memory.
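A small sketch of that case; on a typical 64-bit build the itemsize is just the pointer size:

import numpy as np

a = np.array([1, 'two', 3.0], dtype=object)
print(a.itemsize)  # 8 on a 64-bit build: each cell is just a pointer
print(a.nbytes)    # 24: counts only the pointers, not the objects they reference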

And another nuance: the primary pointer buffer of a list is slightly over-allocated, to make append faster.
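You can watch that over-allocation by checking getsizeof as a list grows; the exact sizes depend on the CPython version:

import sys

lst = []
for i in range(20):
    print(len(lst), sys.getsizeof(lst))  # the reported size jumps in chunks, not one slot per append
    lst.append(i)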

hpaulj