Dataframe has no column names. How to add a header?

Question

I am using a dataset to practice building a decision tree classifier.

Here is my code:

import pandas as pd 
tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', header=0)
tdf.info()

The columns have no names, and I have trouble adding them. I have already tried reindex, pd.melt, rename, etc.

The column names I want to assign are:

Sample code number: id number
Clump Thickness: 1 - 10
Uniformity of Cell Size: 1 - 10
Uniformity of Cell Shape: 1 - 10
Marginal Adhesion: 1 - 10
Single Epithelial Cell Size: 1 - 10
Bare Nuclei: 1 - 10
Bland Chromatin: 1 - 10
Normal Nucleoli: 1 - 10
Mitoses: 1 - 10
Class: (2 for benign, 4 for malignant)

Thanks,

score 20 · Accepted Answer · edited May 12 '22 at 15:44


For any DataFrame, say df, you can add or modify column names by assigning a list of names to the df.columns attribute. For example, if you want the column names to be 'A', 'B', 'C', 'D', use this:

df.columns = ['A', 'B', 'C', 'D']

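In your code, pass header=None instead of header=0. header=0 tells pandas to take the first row as the column headers, and simply removing the argument does the same thing, because the default (header='infer') also uses the first row as headers when no names are given. With header=None every row is read as data, and you can then use the above to assign the column names.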
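A minimal sketch of the read-then-assign approach. To keep it runnable offline it assumes a tiny in-memory CSV (via io.StringIO) in place of the UCI URL; the three rows and three column names are illustrative stand-ins, not the full 11-column file.

```python
import io

import pandas as pd

# Illustrative stand-in for the UCI file: three headerless rows, 3 columns.
raw = "1000025,5,1\n1002945,5,4\n1015425,3,1\n"

# header=None: every line is data; pandas labels the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(raw), header=None)

# Assign names afterwards; the list length must match the number of columns.
df.columns = ["Sample code number", "Clump Thickness", "Uniformity of Cell Size"]

print(df.shape)  # (3, 3): no row was consumed as a header
print(df.columns.tolist())

# A list of the wrong length is rejected with a ValueError.
try:
    df.columns = ["only", "two"]
except ValueError as err:
    print("length mismatch:", err)
```

For the real file, the same pattern applies with the URL from the question and a list of all 11 names.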

edited by Community

answered Feb 09 '19 at 18:39

Gyan Ranjan


Thank you for your response. I tried this, but it doesn't work. – user633599 Feb 09 '19 at 18:45

Can you remove header=0? This argument basically tells pandas to take the first row as the header. – Gyan Ranjan Feb 09 '19 at 18:48
It works, thank you very much. I am learning both DS and Python at the same time, it is really challenging. I appreciate your help. – user633599 Feb 09 '19 at 19:01
Quick follow up: – user633599 Feb 09 '19 at 19:02

Keep learning. Soon those efforts will pay off – Gyan Ranjan Feb 09 '19 at 19:03
If it worked, please accept and upvote the answer so others can benefit from it as well. Happy learning :) – Gyan Ranjan Feb 09 '19 at 19:10
I have a follow-up question: – user633599 Feb 09 '19 at 19:16

Hi, I tried the code above and you are missing the first line of data. I added a new answer which gets all the rows @GyanRanjan – daco Feb 11 '19 at 11:32

score 11 · Answer 2 · edited Aug 10 '19 at 14:02


df = pd.read_csv("Price Data.csv", names=['Date', 'Price'])

Use the names parameter of read_csv to add a header to your pandas DataFrame.
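A small sketch of how names interacts with header, assuming in-memory CSV text in place of "Price Data.csv" (the dates and prices are made up). Per the pandas read_csv documentation, passing names makes the default header behave like header=None, and passing header=0 together with names replaces an existing header line.

```python
import io

import pandas as pd

# Hypothetical two-column file with no header line.
raw = "2019-08-01,101.5\n2019-08-02,102.25\n"

# names= alone: the first line is kept as data.
df = pd.read_csv(io.StringIO(raw), names=["Date", "Price"])
print(df.shape)             # (2, 2)
print(df.columns.tolist())  # ['Date', 'Price']

# If a file already has a header line, header=0 plus names replaces it.
with_header = "d,p\n2019-08-01,101.5\n"
df2 = pd.read_csv(io.StringIO(with_header), header=0, names=["Date", "Price"])
print(df2.shape)            # (1, 2)
```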

edited by Stephen Rauch

answered Aug 10 '19 at 10:10

Priyanshu Khullar


score 0 · Answer 3 · answered Feb 11 '19 at 11:02

I tried the code above, and it drops the first line of data.

1. Original code:

tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', header=0)
tdf.shape

(698, 11)

2. As suggested in the previous answer, removing header=0:

tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',')
tdf.shape

(698, 11)

3. New answer: adding the column names while reading the CSV keeps all the rows:

tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', names=['Sample code number: id number','Clump Thickness: 1 - 10','Uniformity of Cell Size: 1 - 10','Uniformity of Cell Shape: 1 - 10','Marginal Adhesion: 1 - 10','Single Epithelial Cell Size: 1 - 10','Bare Nuclei: 1 - 10','Bland Chromatin: 1 - 10','Normal Nucleoli: 1 - 10','Mitoses: 1 - 10','Class: (2 for benign, 4 for malignant)'])
tdf.shape

(699, 11)
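The three cases above can be reproduced offline with a sketch like the following. It assumes a three-row, three-column in-memory CSV (illustrative values and short hypothetical names) instead of the 699-row UCI file, so the counts are 2 vs 3 rather than 698 vs 699.

```python
import io

import pandas as pd

# Illustrative headerless rows (3 columns instead of 11).
raw = "1000025,5,1\n1002945,5,4\n1015425,3,1\n"
names = ["id", "clump", "cell_size"]

as_header0 = pd.read_csv(io.StringIO(raw), header=0)     # case 1
as_default = pd.read_csv(io.StringIO(raw))               # case 2 (header='infer')
with_names = pd.read_csv(io.StringIO(raw), names=names)  # case 3

for label, frame in [("header=0", as_header0), ("default", as_default), ("names=", with_names)]:
    print(label, frame.shape)
# header=0 (2, 3)
# default (2, 3)
# names= (3, 3)

# In cases 1 and 2 the first data row was turned into the column labels.
print(as_default.columns.tolist())  # ['1000025', '5', '1']
```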

You can assign the column names when reading the CSV file:

import pandas as pd 
tdf = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', sep = ',', names=['Sample code number: id number','Clump Thickness: 1 - 10','Uniformity of Cell Size: 1 - 10','Uniformity of Cell Shape: 1 - 10','Marginal Adhesion: 1 - 10','Single Epithelial Cell Size: 1 - 10','Bare Nuclei: 1 - 10','Bland Chromatin: 1 - 10','Normal Nucleoli: 1 - 10','Mitoses: 1 - 10','Class: (2 for benign, 4 for malignant)'])

You can inspect the DataFrame with:

tdf.head()


You can check the code on https://gist.github.com/e94b31914dbaebda7d11f6bfe0cfbdec
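If the value-range notes such as ": 1 - 10" are unwanted in the labels, a shorter list can be derived from the descriptions given in the question. This is a hypothetical convenience, not part of the original answer: it keeps only the text before the first colon, and the resulting short_names list could then be passed as names= to read_csv.

```python
# Column descriptions as listed in the question.
descriptions = [
    "Sample code number: id number",
    "Clump Thickness: 1 - 10",
    "Uniformity of Cell Size: 1 - 10",
    "Uniformity of Cell Shape: 1 - 10",
    "Marginal Adhesion: 1 - 10",
    "Single Epithelial Cell Size: 1 - 10",
    "Bare Nuclei: 1 - 10",
    "Bland Chromatin: 1 - 10",
    "Normal Nucleoli: 1 - 10",
    "Mitoses: 1 - 10",
    "Class: (2 for benign, 4 for malignant)",
]

# Keep only the label before the first colon, e.g. "Clump Thickness".
short_names = [d.split(":", 1)[0].strip() for d in descriptions]
print(short_names)
```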
