
I have a huge DataFrame, and I want to calculate the correlation between some columns. The problem is that those columns contain strings here and there, which prevents the calculation. How can I delete the rows that contain strings in one particular column? Note: I don't know what the strings look like, there are many of them, and I need just one line of code that deletes them all at once.

This is what I tried, but it didn't work for some reason:

df = df[df['column_name'].apply(lambda x: str(x).isdigit())]
L0987
  • Does [this post](https://stackoverflow.com/a/28680078/7375347) answer your question? – tax evader May 22 '22 at 18:05
  • Do you know how to check an object's type? See [What's the canonical way to check for type in Python?](/q/152580/4518341) – wjandrea May 22 '22 at 18:07
  • It'd help if you provided a [reproducible pandas example](/q/20109391/4518341). See also [mre]. – wjandrea May 22 '22 at 18:09
  • @taxevader No, because that post deals with one particular string. I have several, and I can't go looking for them because the data is too large – L0987 May 22 '22 at 18:10
  • @wjandrea Yes, the columns are of type object – L0987 May 22 '22 at 18:10
  • @L0987 No, not the Pandas dtype, the Python type. That is, a column of type `object` might have objects of any Python type inside it, like `str`, `int`, `list`, etc. – wjandrea May 22 '22 at 18:11
  • I see, no I don't know how to do that @wjandrea – L0987 May 22 '22 at 18:16
  • @L0987 OK, I've closed your question as a duplicate then. LMK if anything's unclear. – wjandrea May 22 '22 at 18:16
  • @L0987 Actually, I should probably clarify up front: you already know how to use all the Pandas stuff properly, you just need to change what the lambda does. – wjandrea May 22 '22 at 18:19
  • @L0987 Oh I see, I think you can convert the column into a numeric type by using [pd.to_numeric](https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.to_numeric.html): `df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce')`, which turns any value that can't be converted to a numeric type into `NaN`. Then all you need to do is filter out the rows with NaN in that column with `df = df[df['column_name'].notna()]` – tax evader May 22 '22 at 18:40
  • @taxevader This solves my problem. Thank you all very much! – L0987 May 23 '22 at 06:36
  • @taxevader Any idea how to do the opposite, i.e. turn the numeric values into NaN? Just out of curiosity. – L0987 May 23 '22 at 08:11
  • @L0987 Hey, I just realized I might have misunderstood what you're trying to accomplish, but it looks like Tax Evader already got it :) Sorry, I'm still learning Pandas myself. There's another existing question about that, so I tacked it on :) – wjandrea May 23 '22 at 18:03
  • @L0987 If you want to filter out the rows with a numeric value in the column and keep the rows with non-numeric values, you can use the same `pd.to_numeric` function, but assign the result to a different column so it doesn't overwrite the existing one: `df['column_numeric'] = pd.to_numeric(df['column_name'], errors='coerce')`. Then keep only the rows that have NaN in the `column_numeric` column by using `notna` with the inverse `~` operator: `df[~df['column_numeric'].notna()]` – tax evader May 24 '22 at 06:22
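
The two suggestions from the comments can be sketched together on a small frame. The sample data and the names `numeric`, `clean` and `non_numeric_rows` are made up for illustration. The sketch also shows one likely reason the `str(x).isdigit()` filter misbehaves: it rejects floats and negative numbers, because `.` and `-` are not digits.

```python
import pandas as pd

# Hypothetical sample data: an object column mixing numbers and strings.
df = pd.DataFrame({"column_name": [1, "2", 3.5, -4, "abc", "n/a"]})

# The filter from the question: only non-negative integers pass,
# so 3.5 and -4 are rejected together with the real strings.
isdigit_mask = df["column_name"].apply(lambda x: str(x).isdigit())

# Coerce to numbers; anything that cannot be parsed becomes NaN.
numeric = pd.to_numeric(df["column_name"], errors="coerce")

# Keep only the numeric rows, storing the converted values so the
# column has a numeric dtype and can be used in a correlation.
clean = df.assign(column_name=numeric).dropna(subset=["column_name"])

# The opposite selection: rows whose original value is not numeric.
non_numeric_rows = df[numeric.isna()]

print(isdigit_mask.tolist())
print(clean)
print(non_numeric_rows)
```

Note that values that were already missing in the original column also end up as NaN after coercion, so they are dropped from `clean` and appear in `non_numeric_rows`.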

0 Answers