-2

I keep getting the following error: AttributeError: 'tuple' object has no attribute 'size'. The split function is included below as well.

import numpy as np
def split(array):
    N = {}
    uniqe_array = np.unique(array)
    for i in uniqe_array:
        N[i] = np.where(array==i)
    
    return N




def information_gain(x_array, y_array): 
    parent_entropy = entropy(x_array)
    split_dict = split(y_array)
    for val in split_dict.values():
        freq = val.size / x_array.size
        child_entropy = entropy([x_array[i] for i in val])
        parent_entropy -= child_entropy* freq
    return parent_entropy

x = np.array([0, 1, 0, 1, 0, 1])
y = np.array([0, 1, 0, 1, 1, 1]) 
print(round(information_gain(x, y), 4))
x = np.array([0, 0, 1, 1, 2, 2])
y = np.array([0, 1, 0, 1, 1, 1]) 
print(round(information_gain(x, y), 4))
jon-hanson
PNN

2 Answers

1

It appears that the values of split_dict are tuples rather than the NumPy arrays your code expects. I would recommend checking what split returns to split_dict, because it is likely producing tuples instead of arrays.

Edit:

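Based on what's inside function split, it's returning {0: (array([0, 2], dtype=int64),), 1: (array([1, 3, 4, 5], dtype=int64),)} to split_dict. Each value is a one-element tuple whose only element is a NumPy array of indices (dtype=int64 is part of the array's printed form, not a separate tuple element). When np.where is called with only a condition, it returns a tuple of index arrays, one per dimension of the input. A tuple has no size attribute, which is what raises the AttributeError.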
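To see why np.where hands back a tuple, here is a small sketch comparing a 1-D and a 2-D input. The array named grid is made up for illustration and is not part of the question.

```python
import numpy as np

y = np.array([0, 1, 0, 1, 1, 1])
result = np.where(y == 1)          # 1-D input: tuple holding one index array
print(type(result).__name__, len(result))
print(result[0].tolist())

# Hypothetical 2-D example: one index array per axis
grid = np.array([[0, 1], [1, 0]])
rows, cols = np.where(grid == 1)
print(rows.tolist(), cols.tolist())
```

For the 1-D case the only index array is result[0], which is why indexing with [0] recovers an object that has a size attribute.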

A slightly modified split that does what you're looking for would look something like this:

def split(array):
    N = {}
    unique_array = np.unique(array)
    for i in unique_array:
        # np.where returns a one-element tuple here; [0] takes the index array out of it
        N[i] = np.where(array == i)[0]
    return N

See this answer for more information: What is the purpose of numpy.where returning a tuple?
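Putting the fix together, a complete run might look like the sketch below. The question does not show its entropy function, so the Shannon entropy (base 2) defined here is an assumption, and the numeric result depends on that choice.

```python
import numpy as np

def entropy(values):
    """Shannon entropy in bits. Assumed definition; not shown in the question."""
    values = np.asarray(values)
    _, counts = np.unique(values, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def split(array):
    """Map each unique value to the 1-D array of indices where it occurs."""
    return {value: np.where(array == value)[0] for value in np.unique(array)}

def information_gain(x_array, y_array):
    gain = entropy(x_array)
    for indices in split(y_array).values():
        weight = indices.size / x_array.size   # works now: indices is an ndarray
        gain -= weight * entropy(x_array[indices])
    return gain

x = np.array([0, 1, 0, 1, 0, 1])
y = np.array([0, 1, 0, 1, 1, 1])
print(round(information_gain(x, y), 4))
```

Because each dictionary value is now an index array, x_array[indices] can replace the list comprehension from the question.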

Chris Greening
0

I think you want the built-in len function rather than size, since tuples do not have a size attribute:

objects = (1,2,3)
len(objects)

This gives 3 as output.
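One caveat when applying len to this particular problem: calling it on the tuple returned by np.where counts the index arrays, not the matches. A quick sketch, using the y array from the question:

```python
import numpy as np

y = np.array([0, 1, 0, 1, 1, 1])
matches = np.where(y == 1)        # one-element tuple for a 1-D input

tuple_length = len(matches)       # number of index arrays in the tuple
match_count = len(matches[0])     # number of matching positions
print(tuple_length, match_count, matches[0].size)
```

So len only gives the frequency the question needs once it is applied to the array inside the tuple.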

Mihir