I have an unbalanced panel with 15063 firms between 2012 and 2018.

I am using this code to estimate the production function with the Levinsohn & Petrin method:

levpet <- prodestLP(Y=base$c_y,
 fX=base$c_l,
 sX=base$c_k,
 pX=base$c_m,
 idvar=base$ruc,
 timevar = base$year,
 R=100  )

where

  • c_y is log of value added
  • c_l is log of wages
  • c_k is log of capital
  • c_m is log of materials
  • ruc is the firm identifier (a string)
  • year is a numeric.

The error I am getting looks like this:

Error in `[[<-.data.frame`(`*tmp*`, i, value = c(42719L, 82109L, 82678L,  : 
  replacement has 469326 rows, data has 78221

I don't know what it means or what I have to do about it.

How can I solve it? My data is very similar to this one:

data(chilean)

Both the data and the prodestLP function come from library(prodest).


These are all the libraries I have in the current script

library(tidyverse)
library(dplyr)
library(foreign)
library(haven)
library(readxl)
library(stringr)
library(expss)
library(lubridate)
library(prodest)
library(plm)

Thanks in advance.


Edit: With the dataset that's in the prodest package it runs just fine. Here's an example: https://rpubs.com/hacamvan/319728

My dataset is very similar, just way more observations, and a string idvar.


I think I've solved it.

I had to assign a numerical id to every firm. Then it worked.

Now I just hope I can export the regressions.

2 Answers

Well, I figured out that the error meant a value being assigned did not fit in the dataset: the replacement had 469326 rows, while my dataset had 78221 obs.
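For anyone unfamiliar with this message, here is a minimal sketch of how the same kind of error arises in plain R. The `toy` data frame is hypothetical and has nothing to do with prodest internals; it only shows that R refuses to assign a column with more elements than the data frame has rows. (As an aside, 469326 is exactly 6 × 78221, though I can't say for sure why the function produced that length.)

```r
# Hypothetical three-row data frame, for illustration only.
toy <- data.frame(a = 1:3)

# Assigning a six-element column to a three-row data frame fails;
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    toy[["b"]] <- 1:6
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

The message has the same shape as the one in the question, just with smaller numbers.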

Everything else matched data(chilean), so the only difference had to be the id for each firm.

So I did this:

firm_ids <- data.frame(ruc = sort(unique(base$ruc)), stringsAsFactors = FALSE) # one row per firm, keyed by its string id
firm_ids$idvar <- seq_len(nrow(firm_ids)) # number the firms 1, 2, 3, ...

base <- base %>% left_join(firm_ids, by = "ruc") # join the numeric id back onto the original dataset

And then it worked using base$idvar instead of using base$ruc.

  • thank you, this is really helping. I believe the issue was the IDs not being an ordered number starting from 1, which is really annoying – Bob Oct 03 '22 at 10:03

Elaborating on the answer by @Jor Jorge Paredes: it is worth noting that the new idvar needs to follow the original ID variable. The answer above does not guarantee that this is the case and may create a wrong numbering.

A safer solution (edited):

idlist <- df[, c("time", "original_id")] # firm id and time for every row
idlist <- transform(idlist, idvar = as.numeric(factor(original_id))) # same firm string, same number
df <- merge(df, idlist, by = c("original_id", "time"), all.x = TRUE) # join it to the original dataset

EDIT: Or even smoother

 df <- df %>%
   mutate(idvar = as.numeric(factor(original_id)))
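
To convince yourself the numbering is right before running the estimation, you can check the mapping on a small example. This is only a sketch: the `ruc` strings and years below are made up, and the requirement that ids be consecutive integers starting at 1 is my reading of the behaviour, not something I have confirmed in the package documentation.

```r
# Toy panel; the ruc strings and years are made up for illustration.
base <- data.frame(
  ruc  = c("B200", "A100", "B200", "C300", "A100"),
  year = c(2012, 2012, 2013, 2012, 2013),
  stringsAsFactors = FALSE
)

# One integer per distinct firm string (factor levels are sorted).
base$idvar <- as.integer(factor(base$ruc))

# Each ruc should map to exactly one idvar and vice versa.
pairs <- unique(base[, c("ruc", "idvar")])
n_firms <- length(unique(base$ruc))
one_to_one <- nrow(pairs) == n_firms && !anyDuplicated(pairs$idvar)

# The ids should also be the consecutive integers 1..n_firms.
consecutive <- identical(sort(unique(base$idvar)), seq_len(n_firms))

print(pairs)
print(c(one_to_one = one_to_one, consecutive = consecutive))
```

If both checks come back TRUE on your real data, every row of a given firm carries the same number and no two firms share one.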
Bob