
I'm trying to download data from the SEC EDGAR database using this R code:

library(httr)   # GET(), add_headers()
library(rvest)  # read_html(), html_nodes(), html_text(), %>%

text <- "https://www.sec.gov/Archives/edgar/data/1602065/0001602065-20-000056.txt" %>%
  GET(., add_headers("User-Agent" = "your@email.com")) %>%
  read_html(.) %>%
  html_nodes("p") %>%
  html_text()

Running the code yields the following error message:

"Error in read_xml.raw(x, encoding = encoding, ..., as_html = TRUE, options = options) : Name t5o:2j is not XML Namespace compliant [202]"

(The "t5o:2j" part may be different for every document that returns an error.)

An error like this is returned for about 40% of the roughly 500 documents I have tried.

Does anyone know how this error can be fixed?
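
One possible explanation, offered as an assumption and not a confirmed diagnosis: the `.txt` URL is a full submission file that bundles several `<DOCUMENT>` blocks, and some of them may hold encoded binary attachments whose bytes happen to look like malformed tags. The sketch below splits the text on `<DOCUMENT>` and parses each block separately, skipping blocks that fail to parse. The helper names `fetch_submission` and `extract_paragraphs` are hypothetical and not part of httr or rvest.

```r
library(httr)
library(rvest)

# Hypothetical helper: download the raw submission as one character string.
fetch_submission <- function(url, user_agent) {
  resp <- GET(url, add_headers("User-Agent" = user_agent))
  content(resp, as = "text", encoding = "UTF-8")
}

# Hypothetical helper: parse each <DOCUMENT> block on its own so that one
# unparseable block does not abort the whole submission.
extract_paragraphs <- function(submission_text) {
  # Everything before the first <DOCUMENT> marker is header text, so drop it.
  pieces <- strsplit(submission_text, "<DOCUMENT>", fixed = TRUE)[[1]][-1]

  paragraphs <- lapply(pieces, function(piece) {
    tryCatch(
      html_text(html_nodes(read_html(piece), "p")),
      # Assumption: blocks that raise a parser error are not needed here.
      error = function(e) character(0)
    )
  })

  as.character(unlist(paragraphs))
}

# Usage (requires network access):
# raw_text <- fetch_submission(
#   "https://www.sec.gov/Archives/edgar/data/1602065/0001602065-20-000056.txt",
#   "your@email.com"
# )
# text <- extract_paragraphs(raw_text)
```

Blocks that are skipped this way are silently lost, so it may be worth logging the error message inside the `error` handler to check which documents were dropped.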

lovalery
MariusJ
