7

I used the following code to convert the scikit-learn breast cancer dataset to a DataFrame, but I am not getting any output. I am very new to Python and cannot figure out what is wrong.

def answer_one(): 

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer 
    cancer = load_breast_cancer()     
    data = numpy.c_[cancer.data, cancer.target]
    columns = numpy.append(cancer.feature_names, ["target"])
    return pandas.DataFrame(data, columns=columns)

answer_one()
talonmies
solly bennet

4 Answers

6

The following code works

def answer_one(): 
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer 
    cancer = load_breast_cancer()     
    data = np.c_[cancer.data, cancer.target]
    columns = np.append(cancer.feature_names, ["target"])
    return pd.DataFrame(data, columns=columns)

answer_one()

Your code did not work because you imported the packages under the aliases np and pd but then referred to them by their full names, numpy and pandas. Those names are never defined, so the calls raise a NameError.
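To see the aliasing issue in isolation, here is a minimal sketch that uses the standard-library statistics module instead of numpy, so it runs without any third-party packages. The alias st and the helper name alias_demo are made up for this illustration; the same behaviour applies to `import numpy as np`.

```python
import statistics as st  # binds only the name "st"; "statistics" itself stays undefined


def alias_demo():
    """Return the result obtained via the alias and the error from using the full name."""
    via_alias = st.mean([1, 2, 3])  # works: "st" is bound by the import above
    try:
        statistics.mean([1, 2, 3])  # fails: the name "statistics" was never bound
    except NameError as exc:
        error = str(exc)
    else:
        error = None
    return via_alias, error


result, message = alias_demo()
print(result)
print(message)
```

The second print shows the NameError text, which is the same kind of error the original function hits on its `numpy.c_` line.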

However, I suggest importing and aliasing the packages once at the top of the script, outside the function definition.

  • def answer_one(): data = numpy.c_[cancer.data, cancer.target] columns = numpy.append(cancer.feature_names, ["target"]) return pandas.DataFrame(data, columns=columns) answer_one() – solly bennet Feb 13 '18 at 17:22
  • I tried without defining the aliases, but I still do not get any output. Is anything wrong with the return statement? – solly bennet Feb 13 '18 at 17:23
6

Use pandas

There was a great answer here: How to convert a Scikit-learn dataset to a Pandas dataset?

The keys of the Bunch object give you an idea of which parts of the data you want to turn into columns.

df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = pd.Series(cancer.target)
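For completeness, here is a self-contained sketch of the same idea with the imports and the loading step included. It assumes pandas and scikit-learn are installed; the variable names are only illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# The Bunch behaves like a dict, so its keys show what is available
# (for example "data", "target" and "feature_names").
print(sorted(cancer.keys()))

# One column per feature, then the class labels in an extra "target" column.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target

print(df.shape)
print(df.head())
```

For this dataset the frame should have 569 rows and 31 columns (30 features plus target). Unlike the `np.c_` approach, assigning the target as its own column keeps it as an integer column rather than converting it to float.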
3
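def answer_one():
    # Assumes pandas is imported as pd and cancer = load_breast_cancer() has been run.
    dataframe = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
    dataframe['target'] = cancer.target
    return dataframe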
4b0
Marckhz
  • Welcome to Stack Overflow! Code-only answers are not particularly helpful. Please include a brief description of how this code solves the problem. – 4b0 Apr 18 '20 at 20:07
3

As of scikit-learn 0.23 you can do the following to get a DataFrame and save some keystrokes:

cancer = load_breast_cancer(as_frame=True)
df = cancer.frame
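One detail worth spelling out: with as_frame=True the function still returns a Bunch, and the DataFrame lives in its frame attribute. The sketch below assumes scikit-learn 0.23 or newer with pandas installed; the variable names are only illustrative.

```python
from sklearn.datasets import load_breast_cancer

# The return value is still a Bunch, not a DataFrame.
bunch = load_breast_cancer(as_frame=True)
print(type(bunch).__name__)

df = bunch.frame   # DataFrame with the features plus a "target" column
X = bunch.data     # DataFrame with the features only
y = bunch.target   # Series with the class labels

print(df.shape, X.shape, len(y))
```

If printing the type shows Bunch, that is expected; it is `bunch.frame` that should be a pandas DataFrame.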
jeffhale
  • Not working for me for some reason in Google Colab. Colab has 0.22, but I upgraded to 0.24 using pip (and __version__ shows the updated version); using as_frame=True still returns a Bunch :-/ – Levon Feb 17 '21 at 00:06