1

I'm using Python 2.7 and have a TSV formatted as follows (368 rows × 3 columns):

date    dayOfWeek    pageviews
2016    4            3920
...

I have a Jupyter notebook saved in the same location as the TSV. I'm running this code:

import pandas as pd
pd.read_table('query_explorer.tsv')

I get back a dataframe that's 736 rows × 3 columns and filled with NaNs. Oddly, that is exactly twice the 368 rows I should have.

Any idea what's going on here?

smci
anon_swe

2 Answers

4

How about:

pd.read_table('query_explorer.tsv', delim_whitespace=True, header=0)
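
A minimal sketch of the whitespace-splitting idea, assuming the file's columns are padded with runs of spaces rather than separated by single tabs. The in-memory `sample` string is made up for illustration and stands in for `query_explorer.tsv`. It uses `sep=r"\s+"`, which the pandas documentation describes as equivalent to `delim_whitespace=True`, and is written for Python 3 (on 2.7 the sample would need to be a `u"..."` string).

```python
import io

import pandas as pd

# Hypothetical sample mimicking the layout shown in the question:
# columns aligned with several spaces instead of a single tab.
sample = (
    "date    dayOfWeek    pageviews\n"
    "2016    4            3920\n"
    "2016    5            4100\n"
)

# sep=r"\s+" treats any run of whitespace (spaces or tabs) as one delimiter,
# so the padding between columns does not produce extra empty fields.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=0)

print(df.shape)
print(df.columns.tolist())
```

If the real file parses cleanly this way, the row count should match the number of data lines in the file.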
suvy
1

In CSV files the comma is the separator; in TSV files, the tab character separates each field. pandas splits each line into columns according to the separator you pass it.

import pandas as pd
pd.read_csv('query_explorer.tsv',sep="\t")
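
As a quick way to check that an explicit tab separator gives the expected shape, here is a small sketch on an in-memory sample. The `sample` string is an assumption standing in for a truly tab-delimited `query_explorer.tsv`; it is written for Python 3.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample: one header line, two data rows.
sample = (
    "date\tdayOfWeek\tpageviews\n"
    "2016\t4\t3920\n"
    "2016\t5\t4100\n"
)

# With sep="\t", each tab marks exactly one column boundary.
df = pd.read_csv(io.StringIO(sample), sep="\t")

# A clean parse has one row per data line and no missing values.
print(df.shape)
print(df.isnull().values.any())
```

Running the same two checks on the real file (row count and presence of NaNs) shows whether the separator matches what is actually in the file.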
marc_s
sha_hla