
I have a satellite image that contains 3 bands. I'm using Python (Jupyter Notebook) to calculate a new band by applying random forest regression. My problem is that after predicting the values for all the pixels, I don't know how to join them back to the original dataframe with the original bands so that I can create a new image in the end.

This is what I did:

  1. open the 3-band image with rasterio; the array has this shape: (3, 869, 1202)
  2. create a pandas DataFrame in which each row represents a pixel and each column is a band.

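  3. split the data and fit a random forest:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # create the model
    rf = RandomForestRegressor()

    # flatten y_train to the 1-D shape the model expects
    y_train = y_train.values.ravel()

    # fit the model
    rf.fit(X_train, y_train)

    # predict on the test set
    rf_pred = rf.predict(X_test)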
    
  4. after checking the results, apply the model to the full dataset (not only to the training and test sets) in order to predict the new band:

    # build the input data
    data = df.iloc[:, 1:]

    pred_all = rf.predict(data)

    # reshape to one column (reshape returns a new array, so assign the result)
    pred_all = pred_all.reshape(1006560, 1)
    

    After this, I don't know how to get these predicted values back into my table or how to "link" them with the original pixels.

My end goal is to have these predicted values as a new band, so that I can create an image from them.

Noura
ReutKeller
  • As I understand it, you currently have a two-dimensional array of values and you want to convert it to a georeferenced TIFF. Is that right? – Comrade Che May 28 '20 at 06:45
  • Yes. It was originally a TIF image. I extracted it into pandas so that each pixel has its three band values, ran the RF calculation separately, and now I want to push the result back into pandas and get it back as an image – ReutKeller May 28 '20 at 06:47
  • Have you read this? https://gis.stackexchange.com/questions/37238/writing-numpy-array-to-raster-file – Comrade Che May 28 '20 at 06:54
  • Another solution: https://stackoverflow.com/questions/37648439/simplest-way-to-save-array-into-raster-file-in-python – Comrade Che May 28 '20 at 06:57
  • No, thank you, I'll check it out – ReutKeller May 28 '20 at 06:57
  • I'm afraid that having it as a pandas table makes me lose the location of the pixels – ReutKeller May 28 '20 at 08:20
  • You can export geo-referencing from original satellite imagery. – Comrade Che May 28 '20 at 09:07
  • You can use rasterio to write a GeoTIFF again: https://rasterio.readthedocs.io/en/latest/quickstart.html#opening-a-dataset-in-writing-mode Most importantly, you must retrieve the transform from the original image in order to apply it to the tabular data in your predicted df. – StefanBrand_EOX May 28 '20 at 10:42
  • Do you have a typo? 869 × 1202 = 1,044,538, not 1,006,560 – StefanBrand_EOX May 28 '20 at 10:42
  • @Stefan I used dropna to remove the null values because RF doesn't work with nulls – ReutKeller May 28 '20 at 11:28

1 Answer


The predictions are returned in the same order as the rows passed to predict. You can use pd.concat to join them back to the original data on axis=1. The example uses a classifier, but the ordering behaves the same for a regressor.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Re-run random forest using all the data available in our train set to predict across the map area
    random_forest_2 = RandomForestClassifier(n_estimators=1000, n_jobs=-1, oob_score=True)
    random_forest_2.fit(Model_data_X, Model_data_Y)

    # Run prediction on our apply dataset
    print('Performing prediction')
    Model_apply = apply_zStats.drop('FOREST_ID', axis=1)
    Model_apply_predict = random_forest_2.predict(Model_apply)

    # Create a dataframe from the predictions
    Model_apply_predict_df = pd.DataFrame(Model_apply_predict)

    # Join predictions to FID and output (reset the index so concat aligns rows by position)
    output = pd.DataFrame(apply_zStats['FID'])
    output_merge = pd.concat([output.reset_index(drop=True), Model_apply_predict_df], axis=1)
    output_merge.columns = ['FID', 'Class']

    # Join back the training data (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    output_final = pd.concat([output_merge, reference_data], ignore_index=True)
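
For the case in the question, where dropna removed some pixels before prediction, a minimal sketch of linking predictions back to pixel positions might look like this. It assumes the DataFrame was built by flattening the (bands, rows, cols) array, so that the index is the flat pixel position, and that dropna was called without resetting the index. The synthetic array, the column names, and the row mean used in place of `rf.predict` are stand-ins, not the asker's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the rasterio array of shape (bands, rows, cols)
bands, rows, cols = 3, 4, 5
rng = np.random.default_rng(0)
img = rng.random((bands, rows, cols))
img[:, 1, 2] = np.nan  # one nodata pixel

# One row per pixel, one column per band; the index is the flat pixel position
df = pd.DataFrame(img.reshape(bands, -1).T, columns=["b1", "b2", "b3"])

# dropna keeps the original index labels, so each row still knows its pixel
data = df.dropna()

# Stand-in for rf.predict(data): any array with one value per row of `data`
pred_all = data.mean(axis=1).to_numpy()

# Scatter the predictions into a full-size band; dropped pixels stay NaN
new_band = np.full(rows * cols, np.nan, dtype="float32")
new_band[data.index.to_numpy()] = pred_all
new_band = new_band.reshape(rows, cols)

# Same result in table form: column assignment aligns on the index
df["pred"] = pd.Series(pred_all, index=data.index)
```

The resulting `new_band` has the same rows and columns as the original image, so it can be written out as a single band using the original image's georeferencing.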
Pdavis327