Pandas: Reading TSV into DataFrame

Question

I'm using Python 2.7 and have a TSV formatted as follows (368 rows × 3 columns):

date    dayOfWeek    pageviews
2016    4            3920
...

I have a Jupyter notebook saved in the same location as the TSV. I'm running this code:

import pandas as pd
pd.read_table('query_explorer.tsv')

I get back a DataFrame that's 736 rows × 3 columns and filled with NaNs. Interestingly, I should have only 368 rows, exactly half of what I get.

Any idea what's going on here?

Do you have blank lines at the end of the file, then? Check the number of _lines_ (not _rows_) in your TSV file. — DYZ, Jun 12 '17 at 15:37
@DYZ I just ran "wc -l query_explorer.tsv" and got back 369 which is what I expected... — anon_swe, Jun 12 '17 at 15:44
Where are the NaNs in your DataFrame? At the end or scattered around? — DYZ, Jun 12 '17 at 16:00
I cannot reproduce your issue if the file is truly a tab-separated values (TSV) file. — Parfait, Jun 12 '17 at 17:26
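DYZ's two checks (physical line count and where the NaNs sit) can also be done from Python. This is a minimal sketch that assumes Python 3 and a recent pandas. diagnose_tsv is a hypothetical helper name, and the temporary file holds only the header and the single data row shown in the question:

```python
import os
import tempfile

import pandas as pd


def diagnose_tsv(path):
    """Return (physical line count, DataFrame shape, indices of all-NaN rows)."""
    # Count raw newline bytes, which is what `wc -l` reports.
    with open(path, "rb") as fh:
        line_count = fh.read().count(b"\n")
    # Parse with an explicit tab separator.
    df = pd.read_csv(path, sep="\t")
    # Rows where every column is NaN; many of these may point to blank or mis-split lines.
    all_nan_rows = df.index[df.isna().all(axis=1)].tolist()
    return line_count, df.shape, all_nan_rows


# Hypothetical stand-in for query_explorer.tsv with the question's header and first row.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("date\tdayOfWeek\tpageviews\n2016\t4\t3920\n")
    path = tmp.name

result = diagnose_tsv(path)
os.remove(path)
print(result)  # (2, (1, 3), [])
```

With the real file, a parsed row count well above `line_count - 1` (one line is the header) or a long list of all-NaN row indices would show that lines are being split somewhere other than on tabs.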

score 4 · Answer 1 · answered Jun 12 '17 at 16:02


How about:

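import pandas as pd
# delim_whitespace=True splits on any run of spaces or tabs (deprecated since pandas 2.2; sep=r"\s+" is the equivalent)
pd.read_table('query_explorer.tsv', delim_whitespace=True, header=0)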
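What delim_whitespace=True does can be checked on a small in-memory sample. This is a minimal sketch assuming pandas 2.2 or later, where sep=r"\s+" is the recommended spelling. The two strings reuse the question's header and first row, once tab-separated and once space-aligned, and are only illustrative:

```python
import io

import pandas as pd

# The question's header and first row, once tab-separated and once space-aligned.
tab_text = "date\tdayOfWeek\tpageviews\n2016\t4\t3920\n"
space_text = "date    dayOfWeek    pageviews\n2016    4            3920\n"

# sep=r"\s+" treats any run of spaces or tabs as a single delimiter,
# the same behaviour delim_whitespace=True gave before its deprecation.
frames = [
    pd.read_table(io.StringIO(text), sep=r"\s+", header=0)
    for text in (tab_text, space_text)
]

print(frames[0].shape)  # (1, 3)
print(frames[0].equals(frames[1]))  # True
```

Both layouts produce the same three-column frame, which is why a whitespace separator is a reasonable fallback when it is unclear whether a file really contains tabs.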

answered Jun 12 '17 at 16:02

suvy


Same result as before, unfortunately – anon_swe Jun 12 '17 at 17:26
Can you share a link to the file, or a subset of it? – suvy Jun 12 '17 at 19:30

score 1 · Answer 2 · edited May 15 '22 at 19:25


In CSV files, the comma is the separator; in TSV files, a tab character separates each field. pandas splits the columns according to the separator you pass in sep.

import pandas as pd
pd.read_csv('query_explorer.tsv', sep="\t")
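The effect of the separator is easy to see on an in-memory copy of the question's header and first row. This is a minimal sketch assuming Python 3 and a recent pandas, parsing the same text with the default separator and with sep="\t":

```python
import io

import pandas as pd

tsv_text = "date\tdayOfWeek\tpageviews\n2016\t4\t3920\n"

# read_csv defaults to sep=",", so each tab-separated line lands in a single column.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With sep="\t", pandas splits every line on tabs into three columns.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)  # (1, 1)
print(right.shape)  # (1, 3)
```

pd.read_table uses a tab as its default separator, so read_table(path) and read_csv(path, sep="\t") parse a file the same way.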

edited May 15 '22 at 19:25

marc_s


answered Aug 01 '20 at 22:25

sha_hla


Please edit the answer to include an explanation that covers why this will work. – Jason Aller Aug 02 '20 at 00:33
@JasonAller Done. – sha_hla Nov 15 '21 at 01:36
