Error while estimating a production function with prodest package in R

Question

I have an unbalanced panel with 15063 firms between 2012 and 2018.

I am using this code to estimate the production function with Levinsohn & Petrin method:

levpet <- prodestLP(Y = base$c_y,
                    fX = base$c_l,
                    sX = base$c_k,
                    pX = base$c_m,
                    idvar = base$ruc,
                    timevar = base$year,
                    R = 100)

where

c_y is log of value added
c_l is log of wages
c_k is log of capital
c_m is log of materials
ruc is the firm identifier (a string)
year is a numeric.

The error I am getting looks like this:

Error in `[[<-.data.frame`(`*tmp*`, i, value = c(42719L, 82109L, 82678L,  : 
  replacement has 469326 rows, data has 78221

I don't know what it means or what I have to do about it.

How can I solve it? My data is very similar to this one:

data(chilean)

The chilean data and the prodestLP function are both in library(prodest).

These are all the libraries I have in the current script:

library(tidyverse)
library(dplyr)
library(foreign)
library(haven)
library(readxl)
library(stringr)
library(expss)
library(lubridate)
library(prodest)
library(plm)

Thanks in advance.

Edit: With the dataset that's in the prodest package it runs just fine. Here's an example: https://rpubs.com/hacamvan/319728

My dataset is very similar, just way more observations, and a string idvar.

I think I've got it working.

I had to assign a numeric id to every firm; then it worked.

Now I just hope I can export the regressions.

Can you post your own solution as an answer and accept it, so that the question can be closed? — Jesper Hybel, Aug 08 '21 at 16:16

score 3 · Accepted Answer · answered Aug 10 '21 at 02:29

Well, I figured out that the error meant that something of the wrong length was being written into a data frame: the replacement had 469326 rows, while my dataset had 78221 obs.
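For readers who have not met this message before, here is a minimal base R sketch of the same kind of failure. It is only an illustration, not a reproduction of what happens inside prodestLP: the data frame `toy` and the vector `too_long` are made-up names, and the sizes are arbitrary. Assigning a column that is longer than the data frame triggers the error from `[[<-.data.frame`.

```r
# Hypothetical toy data: a 3-row data frame and a 6-element vector.
toy <- data.frame(x = 1:3)
too_long <- 1:6

# Try to store the 6-element vector as a column of the 3-row data frame.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    toy[["y"]] <- too_long
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

The message has the same shape as the one in the question, just with smaller numbers. In the question, 469326 is exactly 6 times 78221, though this sketch does not try to establish which step of the estimation builds that longer vector.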

Everything else was like data(chilean), so the only difference had to be the id for each firm.

So I did this:

ids <- as.data.frame(table(ruc = base$ruc), stringsAsFactors = FALSE) # one row per firm, keyed by its string id (column named ruc)
ids$idvar <- seq_len(nrow(ids)) # number the firms from 1
ids <- select(ids, ruc, idvar) # we don't need the Freq column
base <- base %>% left_join(y = ids, by = "ruc") # join it to the original dataset

And then it worked using base$idvar instead of using base$ruc.

Thank you, this really helped. I believe the issue was the IDs not being ordered numbers starting from 1, which is really annoying — Bob, Oct 03 '22 at 10:03

Bob · Answer 2 · 2022-10-18T09:14:35.203

Elaborating on the answer by @Jor Jorge Paredes, it is worth noting that idvar needs to follow the original ID variable. The above answer does not guarantee that this is the case and may create a wrong numbering.

A safer solution (edited):

idlist <- df[, c("time", "original_id")] # firm id and period for every row
idlist <- transform(idlist, idvar = as.numeric(factor(original_id))) # numeric code per firm
df <- merge(df, idlist, by = c("original_id", "time"), all.x = TRUE) # join it to the original dataset

EDIT: Or even smoother:

df <- df %>%
  mutate(idvar = as.numeric(factor(original_id)))
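Whichever variant is used, it may be worth checking the numeric id before passing it to prodestLP. The following is a small base R sketch under stated assumptions: `panel`, `firm` and the identifier values are invented for illustration, with `firm` standing in for a string identifier such as ruc. It checks that each firm maps to exactly one number and that the numbers run from 1 to the number of firms without gaps.

```r
# Hypothetical toy panel; `firm` plays the role of the string identifier.
panel <- data.frame(
  firm = c("C-77", "A-10", "C-77", "B-05", "A-10"),
  year = c(2012, 2012, 2013, 2013, 2013),
  stringsAsFactors = FALSE
)

# factor() sorts the unique identifiers; as.integer() returns their codes.
panel$idvar <- as.integer(factor(panel$firm))

# Distinct (firm, idvar) pairs actually present in the data.
pairs <- unique(panel[, c("firm", "idvar")])

# One-to-one: no firm appears with two codes, no code with two firms.
one_to_one <- !anyDuplicated(pairs$firm) && !anyDuplicated(pairs$idvar)

# Consecutive: the codes are exactly 1, 2, ..., number of firms.
consecutive <- identical(sort(unique(panel$idvar)), seq_len(nrow(pairs)))

print(c(one_to_one = one_to_one, consecutive = consecutive))
```

If either check comes back FALSE on real data, the id column is worth inspecting before estimation, for example for missing identifiers or stray whitespace in the strings.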
