I'm trying to download data from the SEC EDGAR database using this code in R:
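library(httr)   # GET(), add_headers()
library(rvest)  # read_html(), html_nodes(), html_text(), and the %>% pipe

# Request the filing with a contact e-mail as User-Agent, then extract all <p> text
text <- "https://www.sec.gov/Archives/edgar/data/1602065/0001602065-20-000056.txt" %>%
  GET(add_headers("User-Agent" = "your@email.com")) %>%
  read_html() %>%
  html_nodes("p") %>%
  html_text()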
Running the code yields the following error message:
"Error in read_xml.raw(x, encoding = encoding, ..., as_html = TRUE, options = options) : Name t5o:2j is not XML Namespace compliant [202]"
(The "t5o:2j" part differs from one failing document to the next.)
An error like this is returned for about 40% of the 500 documents I have tried.
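For anyone who wants to reproduce the failure rate, here is a minimal sketch of how the parser errors could be captured per document instead of stopping the whole run. The helper name `try_parse` is hypothetical, and the inline HTML string is only a stand-in for a downloaded filing:

```r
library(xml2)
library(rvest)

# Hypothetical helper: parse one document and return either the <p> text
# or the parser's error message, so failures can be counted afterwards.
try_parse <- function(doc) {
  tryCatch(
    list(ok = TRUE,
         text = html_text(html_nodes(read_html(doc), "p")),
         error = NA_character_),
    error = function(e) {
      list(ok = FALSE, text = character(0), error = conditionMessage(e))
    }
  )
}

# Stand-in input; in practice this would be the content of an EDGAR response.
res <- try_parse("<html><body><p>hello</p></body></html>")
res$ok
res$text
```

Applying a helper like this over all documents with `lapply()` and checking the `ok` field would show which filings fail and with which message.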
Does anyone know how this error can be fixed?