I have a pandas DataFrame and want to identify, or keep only, the columns that show a trend.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9], [4, 2, 11], [4, 2, 13]]),
    columns=["a", "b", "c"],
)

The following thread describes numpy.polyfit(): "How can I detect if trend is increasing or decreasing in time series?" See also https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html

numpy.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False) 

If the result is much greater than zero, it shows that your data is increasing steadily.
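As a minimal sketch of that idea, the snippet below fits a straight line (deg=1) to the values of column "c" from the example above and reads off the slope as the "result". Using the row position 0, 1, 2, ... as the x axis is an assumption; the question does not say what the time index is.

```python
import numpy as np

# Values of column "c" from the example DataFrame.
y = np.array([3, 6, 9, 11, 13], dtype=float)

# Assumption: observations are evenly spaced, so the row position serves as x.
x = np.arange(len(y))

# With deg=1, polyfit returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(x, y, deg=1)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A clearly positive slope suggests an increasing trend and a clearly negative one a decreasing trend. How far from zero counts as "much greater" depends on the scale of the data.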

How can I apply this to a pandas DataFrame so that, given some random value xy, only the columns whose trend is greater than xy are kept?

Laurent
Adler Müller
    Use df['column_name'].to_numpy() to get numpy arrays that you can then pass to polyfit. As for the random threshold, pick a distribution from which you want to draw your variable. Then you can use the numpy.random or scipy.stats modules to draw a variable from your chosen distribution. – kubatucka Oct 11 '21 at 07:32
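One possible way to put the comment's suggestion into practice is sketched below. It is not a confirmed answer from the thread. The helper name `trend_slopes`, the fixed threshold `xy = 0.5`, and the use of the row position as the x axis are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 3, 6], [7, 2, 9], [4, 2, 11], [4, 2, 13]]),
    columns=["a", "b", "c"],
)


def trend_slopes(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: linear-fit slope of every column."""
    # Assumption: rows are evenly spaced, so the row position serves as x.
    x = np.arange(len(frame))
    # polyfit accepts a 2-D y and fits each column independently;
    # row 0 of the result holds the slopes, row 1 the intercepts.
    coeffs = np.polyfit(x, frame.to_numpy(dtype=float), deg=1)
    return pd.Series(coeffs[0], index=frame.columns)


# Threshold fixed here for reproducibility; it could instead be drawn from a
# distribution, e.g. np.random.default_rng().uniform(low, high).
xy = 0.5

slopes = trend_slopes(df)
trending = df.loc[:, slopes > xy]  # keep only columns whose slope exceeds xy

print(slopes)
print(trending.columns.tolist())
```

For this data the slopes are roughly 0.6, -0.1 and 2.5 for "a", "b" and "c", so a threshold of 0.5 keeps "a" and "c". To also catch decreasing trends, compare `slopes.abs()` against the threshold.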

0 Answers