Splitting test data in a csv column into several using Python

Question

I am relatively new to python.

I need to split out some test data (humidity,temperature) from a csv file. The file has no headers but column 0 contains the date, column 1 contains time and column 2 contains the data which I need to split out. The data comes from an Arduino where I measured humidity and temperature.

I can open the file using pandas csv_read but I can't seem to seperate column 2 out using the sep=',' and I can't work out why. Once I have sepearted the data out I need to write this to a new file.

import pandas as pd

file = open('RH_TEMP-13-01-2020-161111.csv', 'r')

df = pd.read_csv(file, sep = ',', header = None)

print(df)

Output:

        0         1                          2

0   13/01  16:11:13  41.84,20.36,37.19,21.59\n

1   13/01  16:11:15  41.84,20.36,37.17,21.59\n

2   13/01  16:11:16  41.79,20.37,37.25,21.59\n

3   13/01  16:11:18  41.79,20.36,37.25,21.59\n

4   13/01  16:11:19  41.77,20.37,37.04,21.61\n

5   13/01  16:11:20  41.77,20.37,36.95,21.59\n

6   13/01  16:11:22  41.74,20.37,37.69,21.61\n

7   13/01  16:11:23  41.72,20.37,38.48,21.61\n

8   13/01  16:11:25  41.72,20.39,37.94,21.61\n

9   13/01  16:11:26  41.72,20.39,37.54,21.62\n

10  13/01  16:11:27  41.70,20.39,37.25,21.62\n

11  13/01  16:11:29  41.70,20.37,37.04,21.61\n

12  13/01  16:11:30  41.70,20.40,36.95,21.61\n

13  13/01  16:11:32  41.67,20.40,36.90,21.61\n

14  13/01  16:11:33  41.67,20.40,36.92,21.62\n

15  13/01  16:11:34  41.67,20.41,36.87,21.61\n

16  13/01  16:11:36  41.64,20.40,36.87,21.62\n

17  13/01  16:11:37  41.64,20.41,36.87,21.62\n

18  13/01  16:11:39  41.64,20.41,36.90,21.64\n

19  13/01  16:11:40  41.62,20.41,36.90,21.62\n

20  13/01  16:11:42  41.62,20.41,36.90,21.62\n

21  13/01  16:11:43  41.62,20.43,39.02,21.62\n

Original csv file

13/01,16:11:13,"41.84,20.36,37.19,21.59 " 13/01,16:11:15,"41.84,20.36,37.17,21.59 " 13/01,16:11:16,"41.79,20.37,37.25,21.59 " 13/01,16:11:18,"41.79,20.36,37.25,21.59 " 13/01,16:11:19,"41.77,20.37,37.04,21.61 " 13/01,16:11:20,"41.77,20.37,36.95,21.59 "

Can you please mention first few lines of the raw CSV file ( RH_TEMP-13-01-2020-161111.csv before importing or processing)? — instinct246, Jan 19 '20 at 18:10
[Please, don't post images of text.](https://unix.meta.stackexchange.com/questions/4086/psa-please-dont-post-images-of-text). Open your CSV on notepad or wordpad (assuming you are on Windows because of the screenshot), select a few lines, copy them, and then edit your post and paste the selected text on it. — accdias, Jan 19 '20 at 18:27
without the data (not a picture of part of it) its hard to see whats going on. I assume you are not actually saving a .csv file with comma delimited columns. read_csv does not need sep= because read_csv is read_table with sep=',' Open your ".csv" file in a text reader (notepad, gedit, etc.), copy a few rows for us and post above. — merit_2, Jan 19 '20 at 18:31
Another article that can help you improve your question: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — accdias, Jan 19 '20 at 18:33
sorry I get you now I have pasted the csv data into the question. First time I've used this so bit of a noob. Ok I will remove the 'sep='. I want to split the data out of that column into multiple columns, then save it as a new csv for later analysis. — Seanbean666, Jan 19 '20 at 18:40

score 1 · Answer 1 · answered Jan 19 '20 at 18:57

I created a .csv file from what you provided and imported it like this (gave the columns names)

import pandas as pd

df = pd.read_csv('splitting.csv', header=None, names=["date", "time", "nums"])

the data looks like this after I import them,

   date      time                       nums
0  13/01  16:11:13  41.84,20.36,37.19,21.59  
1  13/01  16:11:15  41.84,20.36,37.17,21.59  
2  13/01  16:11:16  41.79,20.37,37.25,21.59  
3  13/01  16:11:18  41.79,20.36,37.25,21.59  
4  13/01  16:11:19  41.77,20.37,37.04,21.61

the issue you have is python see's column 'nums' as a string. We need to split that up.

Like this,

new= df["nums"].str.split(",", n = 5, expand = True)

I set n to 5 to be arbitrarily long. You only need 4.

print(new.head())

      0      1      2        3
0  41.84  20.36  37.19  21.59  
1  41.84  20.36  37.17  21.59  
2  41.79  20.37  37.25  21.59  
3  41.79  20.36  37.25  21.59  
4  41.77  20.37  37.04  21.61

From here you can name each column and/or put these new columns back into your original dataframe.

That's great it works! Just one more thing, when you print(new) it comes up with /n. Is there a way of removing this or is there no point? When I merge the file the final value appears blank but when I click the box in excel it is there (weird). — Seanbean666, Jan 19 '20 at 19:46

Splitting test data in a csv column into several using Python

1 Answers1