
I import tab-delimited data with pandas and assign new column names via dataframe.columns = []. However, when I assign the column names, their order changes.

This is my data:

"ID_final"  "Value01"   "Value02"   "Value03"   "Value04"   "Value05"   "Value06"   "Value07"   "Value08"   "Value09"   "Value10"   "Value11"   "Value12"
724 0.00332 0.00224 0.00186 0.00131 0.00108 0.09092 0.14388 0.02926 0.01127 0.00829 0.00593 0.00448
1029    0.00317 0.00221 0.00193 0.00139 0.00128 0.04204 0.09327 0.02509 0.01035 0.00776 0.00561 0.00438
1700    0.0051  0.00353 0.00304 0.00233 0.00189 0.13548 0.21747 0.04044 0.01531 0.01173 0.00856 0.00667

And this is what I do:

import pandas as pd 

dataframe = pd.read_csv('data.txt', sep='\t') 

header = {
        'ID',
        'January',
        'February',
        'March',
        'April',
        'May',
        'June',
        'July',
        'August',
        'September',
        'October',
        'November',
        'December'}

dataframe.columns = header

After I've assigned the column names, the order of the header has changed: it starts with September, with the other months following more or less randomly. How can I keep the order of header?

Stücke

1 Answer


I believe you need to pass the values as a list to the names parameter of read_csv. It is also necessary to set header=0 so that the old column names are overwritten:

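import pandas as pd

# A list (square brackets) keeps the names in the order they are written.
header = [
        'ID',
        'January',
        'February',
        'March',
        'April',
        'May',
        'June',
        'July',
        'August',
        'September',
        'October',
        'November',
        'December']
# header=0 treats the first line of the file as the old header and replaces it with names.
dataframe = pd.read_csv('data.txt', sep='\t', header=0, names=header)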
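
To try this without a file on disk, here is a minimal sketch that reads a trimmed version of the sample data from an in-memory buffer. The name sample_text, the use of io.StringIO in place of data.txt, and the cut to three columns are assumptions made for illustration only.

```python
import io
import pandas as pd

# Trimmed stand-in for data.txt: three of the thirteen columns, tab-separated.
sample_text = (
    '"ID_final"\t"Value01"\t"Value02"\n'
    "724\t0.00332\t0.00224\n"
    "1029\t0.00317\t0.00221\n"
)

# A list keeps the names in the order written.
header = ['ID', 'January', 'February']

# header=0 marks the first line as the existing header, so it is replaced
# by `names` instead of being read as a data row.
dataframe = pd.read_csv(io.StringIO(sample_text), sep='\t', header=0, names=header)

print(list(dataframe.columns))
print(len(dataframe))
```

The printed column list should follow the order of header, and the row count should match the number of data lines, since the original header line is not kept as data.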

An alternative solution is to skip the first (header) row:

dataframe = pd.read_csv('data.txt', sep='\t', skiprows=1, names=header) 
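
As a rough check that the two variants agree, the sketch below reads the same in-memory sample with skiprows=1 and with header=0 and compares the results. It also shows what seems to happen when neither option is given. The names sample_text, skipped, replaced and naive are hypothetical, and the data is a trimmed three-column stand-in for data.txt.

```python
import io
import pandas as pd

# Trimmed stand-in for data.txt: three of the thirteen columns, tab-separated.
sample_text = (
    '"ID_final"\t"Value01"\t"Value02"\n'
    "724\t0.00332\t0.00224\n"
    "1029\t0.00317\t0.00221\n"
)
header = ['ID', 'January', 'February']

# Variant 1: drop the first line entirely, then apply the new names.
skipped = pd.read_csv(io.StringIO(sample_text), sep='\t', skiprows=1, names=header)

# Variant 2: declare the first line as the header and overwrite it.
replaced = pd.read_csv(io.StringIO(sample_text), sep='\t', header=0, names=header)

# With names alone, the old header line is read as an ordinary data row.
naive = pd.read_csv(io.StringIO(sample_text), sep='\t', names=header)

print(skipped.equals(replaced))
print(len(naive), naive['ID'].iloc[0])
```

In the naive case the old header text ends up in the first row, which is why one of header=0 or skiprows=1 is needed alongside names.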

EDIT: As @roganjosh mentioned, in your own solution you only need to pass a list (not a set) for the column names:

dataframe = pd.read_csv('data.txt', sep='\t')

# Square brackets create a list, which preserves order; curly braces would create a set.
header = [
        'ID',
        'January',
        'February',
        'March',
        'April',
        'May',
        'June',
        'July',
        'August',
        'September',
        'October',
        'November',
        'December']

dataframe.columns = header
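
A small sketch of the same idea on an in-memory sample, assuming a trimmed three-column version of the data (sample_text and unordered are hypothetical names). It assigns a list to columns and contrasts it with a set built from the same names, whose iteration order Python does not guarantee.

```python
import io
import pandas as pd

# Trimmed stand-in for data.txt: three of the thirteen columns, tab-separated.
sample_text = (
    '"ID_final"\t"Value01"\t"Value02"\n'
    "724\t0.00332\t0.00224\n"
    "1029\t0.00317\t0.00221\n"
)

dataframe = pd.read_csv(io.StringIO(sample_text), sep='\t')

# List: element order is part of the value, so columns map positionally.
header = ['ID', 'January', 'February']
dataframe.columns = header
print(list(dataframe.columns))

# Set: same elements, but no defined order; iterating it may yield the
# names in a different sequence from one run to the next.
unordered = {'ID', 'January', 'February'}
print(list(unordered))
```

The first print always follows the order of header. The second may or may not, which is the behaviour described in the question.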
jezrael
  • They could rename the columns afterwards; the issue is that the iterable of names they passed is a `set` and not a `list`, so the ordering isn't preserved. – roganjosh Aug 16 '19 at 11:12
  • @roganjosh - Sure, it is another solution. – jezrael Aug 16 '19 at 11:13