0

Situation

I have a dataframe attributes that holds some attribute information about cars in 3 series:

attributes = {'brand': ['Honda Civic','Honda Civic','Honda Civic','Toyota Corolla','Toyota Corolla','Audi A4'],
          'attributeName': ['wheels','doors','fuelType','wheels','color','wheels'],
          'attributeValue': ['4','2','hybrid','4','red','4']
        }

Expected result

result = {'brand':   ['Honda Civic','Toyota Corolla','Audi A4'],
          'wheels':  ['4','4','4'],
          'doors':   ['2','',''],
          'fuelType':['hybrid','',''],
          'color':   ['','red','']
         }

How can I realize this?

Transform the values from attributeName into series to represent its attributeValue for each brand/car in one row.

With get_dummies I get this transformation, but only with true/false values not with the original values.

HedgeHog
  • 12,487
  • 2
  • 11
  • 31

1 Answers1

0

This is a simple pivot:

attributes.pivot(index='brand',
                 columns='attributeName',
                 values='attributeValue').fillna('')

or, shorter as your columns are in right order:

attributes.pivot(*attributes).fillna('')

To format it exactly as your provided output (except column order, please give details on that), you can use:

(attributes.pivot(index='brand', columns='attributeName', values='attributeValue')
           .fillna('').rename_axis(None, axis=1)
           .reset_index()
)

output:

            brand color doors fuelType wheels
0         Audi A4                           4
1     Honda Civic           2   hybrid      4
2  Toyota Corolla   red                     4
mozway
  • 81,317
  • 8
  • 19
  • 49