-5

So, I have a dataset (the first few rows are pasted below). My goal is to plot a frequency distribution of the 'sample_date' column. It seemed pretty simple at first: convert the column to datetime, sort the values (dates) in ascending order, which is the default, and plot the bar chart. The problem is that the bars are displayed not in ascending order of dates (which is what I want) but in descending order of the value counts for those dates.

Here is the code:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('dataset.csv')
data['sample_date'] = pd.to_datetime(data['sample_date'])
data = data.sort_values(by='sample_date')
data['sample_date'].value_counts().plot(kind='bar')

Here is the dataset.csv:

,sequence_name,sample_date,epi_week,epi_date,lineage
1,England/MILK-1647769/2021,2021-06-07,76,2021-06-06,C.37
2,England/MILK-156082C/2021,2021-05-06,71,2021-05-02,C.37
3,England/CAMC-149B04F/2021,2021-03-30,66,2021-03-28,C.37
4,England/CAMC-13962F4/2021,2021-03-04,62,2021-02-28,C.37
5,England/CAMC-13238EB/2021,2021-02-23,61,2021-02-21,C.37
0,England/PHEC-L304L78C/2021,2021-05-12,72,2021-05-09,B.1.617.3
1,England/MILK-15607D4/2021,2021-05-06,71,2021-05-02,B.1.617.3
2,England/MILK-156C77E/2021,2021-05-05,71,2021-05-02,B.1.617.3
4,England/PHEC-K305K062/2021,2021-04-25,70,2021-04-25,B.1.617.3
5,England/PHEC-K305K080/2021,2021-04-25,70,2021-04-25,B.1.617.3
6,England/ALDP-153351C/2021,2021-04-23,69,2021-04-18,B.1.617.3
7,England/PHEC-30C13B/2021,2021-04-22,69,2021-04-18,B.1.617.3
8,England/PHEC-30AFE8/2021,2021-04-22,69,2021-04-18,B.1.617.3
9,England/PHEC-30A935/2021,2021-04-21,69,2021-04-18,B.1.617.3
10,England/ALDP-152BC6D/2021,2021-04-21,69,2021-04-18,B.1.617.3
11,England/ALDP-15192D9/2021,2021-04-17,68,2021-04-11,B.1.617.3
12,England/ALDP-1511E0A/2021,2021-04-15,68,2021-04-11,B.1.617.3
13,England/PHEC-306896/2021,2021-04-12,68,2021-04-11,B.1.617.3
14,England/PORT-2DFB70/2021,2021-04-06,67,2021-04-04,B.1.617.3

Here is what I get and do not want to get: a bar chart of the 'sample_date' column in descending order of the dates' value counts

Ameerah

2 Answers

1
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('dataset.csv')
data['sample_date'] = pd.to_datetime(data['sample_date'])
data['sample_date'].value_counts().sort_index().plot(kind='bar')  # sort_index() orders the bars by date instead of by count

plt.tight_layout()
plt.show()

Sorted by date
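
To see why sort_index() helps, here is a minimal sketch on a handful of sample_date values copied from the question's dataset (the variable names by_count and by_date are just for illustration). value_counts() returns a Series indexed by date and ordered by count, so sorting that index puts the result back in chronological order:

```python
import pandas as pd

# A few sample_date values taken from the question's dataset.csv
dates = pd.to_datetime(pd.Series([
    "2021-06-07", "2021-05-06", "2021-05-06",
    "2021-04-25", "2021-04-25", "2021-04-22",
]))

# Default behaviour: rows are ordered by count, largest first
by_count = dates.value_counts()

# The dates are the index of the result, so sort_index() orders by date
by_date = dates.value_counts().sort_index()

print(by_count)
print(by_date)
```

Chaining .plot(kind='bar') onto by_date should then draw the bars from the earliest date to the latest.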

filiabel
-1

value_counts() accepts an ascending flag. Setting it to True orders the bars by ascending count; note that this sorts by the counts, not by the dates. Either way, you don't need the sort_values() call at all, because value_counts() reorders the result.

Check out value_counts() documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html

Code:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('dataset.csv')
data['sample_date'] = pd.to_datetime(data['sample_date'])
data['sample_date'].value_counts(ascending=True).plot(kind='bar')
plt.show()  
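
A quick way to check what ascending=True orders by is to run it on a tiny series. The dates below are made up for illustration (they are not the question's rows) and are chosen so that no two dates share a count:

```python
import pandas as pd

# Illustrative dates: three distinct counts (3, 2 and 1), so no ties
dates = pd.to_datetime(pd.Series([
    "2021-04-21", "2021-04-21", "2021-04-21",
    "2021-04-22", "2021-04-22",
    "2021-05-12",
]))

counts_ascending = dates.value_counts(ascending=True)

# The counts rise from the first row to the last
print(counts_ascending.tolist())

# The dates in the index, however, are not in chronological order here
print(counts_ascending.index.is_monotonic_increasing)

# Sorting the index restores chronological order
print(counts_ascending.sort_index().index.is_monotonic_increasing)
```

On this sample the flag orders the rows by count, so a bar chart drawn from counts_ascending would run from the rarest date to the most frequent one rather than from the earliest to the latest.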


Eitan Rosati