1

I am trying to plan out how long it will take me to clean my survey data. I have about 200 responses. The survey takes about 15 minutes, about 40-60 questions (depending on the logic). I have very few open-ended questions (maybe three total). Someone told me it should only take a few days to clean the data while others say 2 weeks. I am not really sure how to plan for this stage, so any guidance about how long it might take to clean a survey of this scale would be very helpful. I have never cleaned survey data before, and while I have some experience with Stata, I have never used it to clean my own data. Also, if you have any recommendations for resources that detail the cleaning process, I would be very grateful.

  • 1
    There is no single answer to this question; it is opinion-based. The answer depends on your data, skills, dedication, conscientiousness, etc. – Tim Feb 06 '19 at 19:53
  • 1
    Some Stata-specific advice here and here and here. For R and Python, this is great. All these book-length resources cover more than just cleaning, and I don't always follow their advice to the letter, but certainly in spirit. – dimitriy Feb 06 '19 at 22:02
  • Thank you so much for these resources - this is very helpful. I came across Mitchell's and Long's book, but have not purchased yet. Do you recommend one over the other? – user3424836 Feb 06 '19 at 22:24
  • How long is a piece of string? – Ben Feb 07 '19 at 01:41
  • @user3424836 I would go with Long if I had to get just one. – dimitriy Feb 07 '19 at 09:11

1 Answer

2

It will depend on your skill, the "cleanliness" required, and the messiness of the data when you receive it.

It sounds like you are inexperienced, so it will take longer. I'd recommend you stick to Excel and not mess with STATA (but either way, make sure you save versions, check things against the original, etc.).

Cleanliness: sometimes this just means fixing things like non-numeric values in a numeric field. Other times, as part of the analysis process, it means recoding values (if age is a question and you get a single number, you might decide later to group it into 10-20, 21-30, etc.). The better you can define this up front, the less time it will take you (and the more research you can do in advance).
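To make those two kinds of cleaning concrete, here is a minimal sketch in plain Python (standard library only). The function names `to_number` and `age_group` and the sample values are hypothetical, made up for illustration; the 10-20, 21-30 bins follow the grouping mentioned above, and treating ages under 10 as missing is an assumption. The same steps could be written in Stata or R.

```python
import math
from typing import Optional


def to_number(raw: str) -> Optional[float]:
    """Return the entry as a finite float, or None if it is not numeric."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    # Treat "nan" and "inf" as missing as well.
    return value if math.isfinite(value) else None


def age_group(age: Optional[float]) -> Optional[str]:
    """Recode an age into the bins 10-20, 21-30, 31-40, and so on."""
    if age is None or age < 10:  # assumption: ages under 10 count as missing
        return None
    if age <= 20:
        return "10-20"
    lower = ((math.ceil(age) - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"


# Hypothetical raw answers to an "age" question, as exported text.
raw_ages = ["34", " 19 ", "n/a", "", "abc", "30"]
cleaned = [to_number(v) for v in raw_ages]
groups = [age_group(a) for a in cleaned]
print(groups)  # ['31-40', '10-20', None, None, None, '21-30']
```

Because the raw values are never overwritten and every step lives in a script, the recoding can be rerun or changed later if you decide on different groups.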

Perhaps someone who has done it before (preferably someone who says it only takes a few days) can help you, either by sitting with you as you start or by showing you examples of what they've done before.

To summarize: there is no one-size-fits-all approach to data cleaning. You will find many examples, tips, and packages to help under names like data cleaning/cleansing/tidying/wrangling, but those generally only help with tasks you already understand conceptually that you need to do (they'll just help you do them better, faster, or more accurately).

  • Thank you very much - I appreciate your advice. I have read some articles/chapters about cleaning data but not entirely sure yet what steps apply for my dataset. Thank you for putting this into perspective. – user3424836 Feb 06 '19 at 21:19
  • 2
    I would avoid cleaning the data in Excel. This is not a replicable or auditable process, and every project I have ever participated in has required iterating on the data wrangling. Using Excel is also arguably more error-prone. – dimitriy Feb 06 '19 at 21:30
  • Thank you. I was actually thinking about that too because I know that in stata you can document all the code but wasn't sure how that worked in Excel. Thank you for pointing that out. – user3424836 Feb 06 '19 at 21:31
  • 1
    I do not disagree with @Dimitriy V. Masterov. That being said, this may not be the time to get better at STATA. If you don't use Excel, learn to do this with R, not STATA. In grad school for econ, many of my profs used STATA but did their data cleanup in Excel. There are high-profile examples of the pitfalls of using Excel for peer-reviewed publishing; I'm assuming your stakes are not that high. – Chris Umphlett Feb 06 '19 at 21:41
  • Thank you. This is part of my dissertation so (in my mind) the stakes are high :) I am a qualitative researcher, but I did mixed-methods for my project. All of my coursework used Stata. A number of people I've talked with also recommend R, but I don't have any programming background, and from the very little I know about R, it seems like the syntax is less intuitive than stata. It seems like a lot of people think R is a better program (as an aside - I wonder why we only learned stata/sas in grad school?) – user3424836 Feb 06 '19 at 21:52
  • 1
    There are many examples of prominent economists screwing things up with Excel data cleaning, of which this is the most prominent one. Using R and Stata is the way to go for anything that is worth doing well. Publishing something wrong can wreak havoc on your career, but so can making a stupid business decision. – dimitriy Feb 06 '19 at 21:52
  • Use SAS. Way more resources are available than for STATA when you need help. I learned SAS first; it's more than adequate if you don't have to pay for your own license. The SAS community message board will help you a lot. You can also get in touch with me for SAS help. – Chris Umphlett Feb 06 '19 at 22:05
  • TY for your offer and help - are there resources you can recommend that outline steps for data cleaning in SAS? I am currently learning about data cleaning using this book: The Practice of Survey Research, Theory, and Applications – Erin Ruel, William Edward Wagner, III, Brian Joseph Gillespie. This book addresses the conceptual side of data cleaning you mentioned earlier. Also, because I only learned STATA and SPSS and I have a month turnaround (I am hoping to submit this final article by end of March), do you think it would be feasible to learn SAS and do the analysis in that time frame? – user3424836 Feb 06 '19 at 22:13
  • Just to clarify - we learned a little SAS but primarily worked with SPSS and STATA – user3424836 Feb 06 '19 at 22:28
  • Learn R then. SAS will work well if you understand the mechanics of the DATA step but using R and the tidyverse packages will probably be faster to learn and apply. You can use this book which is free online: https://r4ds.had.co.nz/ – Chris Umphlett Feb 06 '19 at 23:02
  • Thank you so much for your advice and the resource. I very much appreciate your time and help. – user3424836 Feb 07 '19 at 00:27
  • 1
    The spelling is Stata. – Nick Cox Feb 07 '19 at 01:01