"Name is not XML Namespace compliant" error when crawling the SEC EDGAR database

Asked Nov 07 '21 at 17:17


I'm trying to download data from the SEC EDGAR database using this R code:

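library(httr)   # GET(), add_headers()
library(rvest)  # read_html(), html_nodes(), html_text(); re-exports %>%

# Fetch one filing and extract the text of every <p> node
text <- "https://www.sec.gov/Archives/edgar/data/1602065/0001602065-20-000056.txt" %>%
  GET(add_headers("User-Agent" = "your$email.com")) %>%
  read_html() %>%
  html_nodes("p") %>%
  html_text()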

Running the code yields the following error message:

"Error in read_xml.raw(x, encoding = encoding, ..., as_html = TRUE, options = options) : Name t5o:2j is not XML Namespace compliant [202]"

(The "t5o:2j" part may differ from one failing document to the next.)

An error like this is returned for about 40% of the 500 documents I have tried.

Does anyone know how this error can be fixed?
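
While the root cause is open, one way to keep a crawl running and record which filings fail is to wrap the parsing step in `tryCatch`. This is only a sketch and does not fix the parse error itself. The helper name `safe_paragraphs` is hypothetical and is not part of httr, rvest, or xml2.

```r
library(rvest)  # read_html(), html_nodes(), html_text(); re-exports %>%

# Hypothetical helper: return the text of all <p> nodes in `x`,
# or NULL (with a message) if the parser raises an error.
# `x` can be anything read_html() accepts, e.g. an httr response.
safe_paragraphs <- function(x) {
  tryCatch(
    x %>% read_html() %>% html_nodes("p") %>% html_text(),
    error = function(e) {
      message("Could not parse document: ", conditionMessage(e))
      NULL
    }
  )
}
```

Applied to the result of `GET(...)`, a failing filing then yields `NULL` and can be set aside for inspection instead of stopping the loop.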

edited Nov 07 '21 at 20:53

lovalery


asked Nov 07 '21 at 17:17

MariusJ

https://stackoverflow.com/questions/39281889/name-is-not-xml-namespace-compliant – QHarr Nov 07 '21 at 22:43


0 Answers