
I have a pandas DataFrame with date values. I need to convert them not to date strings but to the number Excel shows when the cell format is General, so that they match the primary key values in SQL, which are unfortunately stored in that General format. Is it possible to do this in Python, or is the only way to convert the column to General format in Excel?

Here is what the DataFrame's column looks like:

   ID         Desired Output
1/1/2022        44562
7/21/2024       45494
1/1/1931        11324
FObersteiner
Camilla
  • What does the output represent? – Psidom Dec 30 '21 at 20:53
  • Does [this](https://stackoverflow.com/questions/9574793/how-to-convert-a-python-datetime-datetime-to-excel-serial-date-number#comment49502311_9574948) answer your question? – BrokenBenchmark Dec 30 '21 at 20:55
  • They are these dates in General format: if you change the format of this date column in Excel from Short Date to General, it shows this output. – Camilla Dec 30 '21 at 20:55

2 Answers

Yes, it's possible. In the General format, Excel shows a date's underlying serial number, which counts days starting from 1900-01-01 as day 1.

You can calculate the time delta between each date in ID and 1900-01-01.

Inspired by this post, you could do:

import pandas as pd
from datetime import date

# create a data frame
data = pd.DataFrame({'ID': ['1/1/2022', '7/21/2024', '1/1/1931']})

# Parse the strings in ID as datetimes, then take the plain Python date part,
# which keeps the subtraction below simple.
sr = pd.to_datetime(data['ID'], format='%m/%d/%Y').dt.date

# Subtract 1900-01-01 from each date, add 2 (see the note on the +2 below)
# and store the result in the 'General format' column of data.
data['General format'] = sr.apply(lambda x: (x - date(1900, 1, 1)).days + 2)

print(data)

          ID  General format
0   1/1/2022           44562
1  7/21/2024           45494
2   1/1/1931           11324

Here it is a bit less condensed:

import pandas as pd
from datetime import date

data = pd.DataFrame({'ID': ['1/1/2022', '7/21/2024', '1/1/1931']})

# Parse the strings as datetimes, then take the date part of each one
ID_to_datetime = pd.to_datetime(data['ID'], format='%m/%d/%Y')
ID_to_date = ID_to_datetime.dt.date

General_format = []
for a_date in ID_to_date:
    delta = a_date - date(1900, 1, 1)
    General_format.append(delta.days + 2)

data['General format'] = General_format

print(data)

          ID  General format
0   1/1/2022           44562
1  7/21/2024           45494
2   1/1/1931           11324

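The +2 covers two separate offsets: Excel numbers 1900-01-01 as day 1 rather than day 0, and it treats 1900 as a leap year, so every date from 1900-03-01 onward gets one extra day. For the dates you provided +2 gives the expected values, but for dates before 1900-03-01 it is one too high.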
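A minimal sketch of a conversion that applies each offset only where it belongs, so dates in January and February 1900 also come out right. The function name `excel_serial` is hypothetical, and the sketch assumes Excel's default 1900 date system (not the 1904 date system some workbooks use).

```python
from datetime import date, datetime


def excel_serial(d: date) -> int:
    """Hypothetical helper: Excel serial number for d in the 1900 date system."""
    if d < date(1900, 1, 1):
        raise ValueError("the 1900 date system starts at 1900-01-01")
    serial = (d - date(1900, 1, 1)).days + 1  # +1: Excel counts from 1, not 0
    if d >= date(1900, 3, 1):
        serial += 1  # +1: Excel also counts the nonexistent 1900-02-29
    return serial


for text in ["1/1/2022", "7/21/2024", "1/1/1931", "2/28/1900", "3/1/1900"]:
    d = datetime.strptime(text, "%m/%d/%Y").date()
    print(text, excel_serial(d))
```

Unlike a flat +2, this returns 59 for 1900-02-28 and 61 for 1900-03-01, skipping the serial 60 that Excel assigns to the phantom leap day.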

EDIT

Using pandas only, as suggested by MrFuppes:

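import pandas as pd

data = pd.DataFrame({'ID': ['1/1/2022', '7/21/2024', '1/1/1931']})
# 1899-12-30 acts as Excel's day zero for dates from 1900-03-01 on: it absorbs
# both the count-from-1 offset and the nonexistent 1900-02-29
data['General format'] = (pd.to_datetime(data['ID'], format='%m/%d/%Y') - pd.Timestamp('1899-12-30')).dt.days
print(data)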

I guess pandas is taking care of the leap years?
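Since the goal is to match SQL primary keys stored as General-format text, here is a sketch that turns the serials into strings, merges them against the keys, and converts back as a sanity check. It assumes the SQL keys arrive as plain integer strings such as "44562"; `sql_keys`, `pk`, `value` and "Back to date" are hypothetical names standing in for your real table and columns.

```python
import pandas as pd

# Effective day zero of Excel's 1900 date system for dates from 1900-03-01 on
EXCEL_EPOCH = pd.Timestamp("1899-12-30")

data = pd.DataFrame({"ID": ["1/1/2022", "7/21/2024", "1/1/1931"]})

# Date string -> Excel serial (int) -> text such as "44562"
serial = (pd.to_datetime(data["ID"], format="%m/%d/%Y") - EXCEL_EPOCH).dt.days
data["General format"] = serial.astype(str)

# Round trip as a check: text serial -> datetime, counted in days from the epoch
data["Back to date"] = pd.to_datetime(
    data["General format"].astype(int), unit="D", origin=EXCEL_EPOCH
)

# Hypothetical stand-in for the SQL table whose primary keys are text serials
sql_keys = pd.DataFrame({"pk": ["44562", "11324"], "value": ["a", "b"]})
matched = data.merge(sql_keys, left_on="General format", right_on="pk", how="inner")
print(matched)
```

Comparing strings with strings avoids a silent mismatch between an int64 column and a text key column.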

RSale
  • Regarding your question, yes pandas datetime takes care of leap years (as Python datetime does). Adding 2 days to 1900-01-01 is necessary because 1) Excel starts counting at one, not zero and 2) it considers 1900 to be a leap year which is a bug inherited from Lotus 123 I think. – FObersteiner Dec 31 '21 at 12:31
  • See [my answer here](https://stackoverflow.com/a/65460255/10197418) and follow the links if you want some more background info. – FObersteiner Dec 31 '21 at 12:33

Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.
– Microsoft's documentation

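So you can calculate (difference between your date and January 1, 1900) + 1. Note that the quoted 39,447 days includes February 29, 1900, which does not exist in the real calendar (Excel treats 1900 as a leap year). Python's and pandas' date arithmetic skips that day, so for dates from March 1, 1900 onward you need to add 2 instead of 1.

See How to calculate number of days between two given dates.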
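A quick check against the example in the quoted documentation, as a sketch: with Python's real calendar, 2008-01-01 is 39,446 days after 1900-01-01. "Difference + 1" therefore gives 39447, one short of the documented 39448, and counting Excel's phantom 1900-02-29 closes the gap.

```python
from datetime import date

target = date(2008, 1, 1)  # documented by Microsoft as serial 39448
days_after = (target - date(1900, 1, 1)).days  # real calendar, no 1900-02-29

naive = days_after + 1      # "difference + 1"
corrected = days_after + 2  # also count the phantom 1900-02-29
print(days_after, naive, corrected)
```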

tzman