This LaTeX document:
\documentclass{article}
\showoutput
\usepackage{lipsum}
\begin{document}
\lipsum
\end{document}
produces a log file showing the position of all the output, down to every box, glue item and character on each shipped-out page:
.....
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 4.
LaTeX Font Info: ... okay on input line 4.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 4.
LaTeX Font Info: ... okay on input line 4.
Completed box being shipped out [1]
\vbox(633.0+0.0)x407.0
.\glue 16.0
.\vbox(617.0+0.0)x345.0, shifted 62.0
..\vbox(12.0+0.0)x345.0, glue set 12.0fil
...\glue 0.0 plus 1.0fil
...\hbox(0.0+0.0)x345.0
..\glue 25.0
..\glue(\lineskip) 0.0
......
...\hbox(6.94444+1.94444)x345.0, glue set 0.85849
....\hbox(0.0+0.0)x15.0
....\OT1/cmr/m/n/10 L
....\OT1/cmr/m/n/10 o
....\OT1/cmr/m/n/10 r
....\OT1/cmr/m/n/10 e
....\OT1/cmr/m/n/10 m
....\glue 3.33333 plus 1.66666 minus 1.11111
....\OT1/cmr/m/n/10 i
....\OT1/cmr/m/n/10 p
.......
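In this box dump each line is one node, and the run of leading dots records how deeply that node is nested. The page \vbox has no dots, the \glue 16.0 inside it has one, and the characters of a typeset line sit four dots deep. The Perl sketch below splits a line into that depth and the item text, and can print the dump as an indented outline. It assumes the line format seen in the excerpt above, plus TeX's convention of marking the post-break material of a discretionary with `|` instead of `.`. The helper name parse_box_line and the script name showboxes.pl are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line of a \showoutput box dump into its
# nesting depth (number of leading '.' or '|' characters) and the item text.
sub parse_box_line {
    my ($line) = @_;
    chomp $line;
    my ($prefix, $item) = $line =~ /^([.|]*)(.*)$/;
    return (length $prefix, $item);
}

# Usage: perl showboxes.pl zz.log
# Prints each box-dump line indented by its depth, so the nesting is visible.
if (@ARGV) {
    while (my $line = <>) {
        # Rough filter: box-dump lines are dots (possibly none) then a backslash.
        next unless $line =~ /^[.|]*\\/;
        my ($depth, $item) = parse_box_line($line);
        print '  ' x $depth, $item, "\n";
    }
}
```

The filter is deliberately loose. Other log lines that happen to start with a backslash would also be printed, which is harmless for inspection.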
So, with a bit of Perl (which might need to be made smarter in a real example), you can reconstitute the text, adding the requested line and paragraph markup:
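#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    chomp;
    # A character node: dots, a font identifier such as \OT1/cmr/m/n/10,
    # then a single character.
    if (m@^\.[^ ]* (.)\s*$@) {
        print $1;
    }
    # A ligature node: print the characters it was formed from.
    if (m@ligature ([^ ]*)\)\s*$@) {
        print $1;
    }
    # Interword glue (natural width above 2pt) becomes a space.
    if (m@^\.*\\glue ([0-9.]+)@) {
        print " " if $1 > 2;
    }
    # \baselineskip glue separates lines; \parskip glue separates paragraphs.
    print "\n<br>" if m@\\baselineskip@;
    print "\n<p>" if m@\\parskip@;
    # Each shipped-out page is separated by a horizontal rule.
    print "\n\n<hr>\n\n" if m@Completed box being shipped@;
}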
Then, with the script saved as zz.pl and the example saved as zz.tex (so that its log file is zz.log), perl zz.pl zz.log > zz.html produces:
.....
<br>fau-cibus. Morbi do-lor nulla, male-suada eu, pul-v-inar at, mol-lis ac, nulla. Cur-
<br>abitur auc-tor sem-per nulla. Donec var-ius orci eget risus. Duis nibh mi, congue
<br>eu, ac-cum-san eleifend, sagit-tis quis, diam. Duis eget orci sit amet orci dig-nis-sim
<br>rutrum.
<p>
<br>Nam dui ligula, fringilla a, euismod sodales, sollicitudin vel, wisi. Morbi
<br>auctor lorem non justo. Nam lacus libero, pretium at, lobortis vitae, ultric
...

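The stray hyphens in words such as fau-cibus and pul-v-inar in this output are hyphenation points where TeX did not break. The script also prints the hyphen stored inside each unbroken \discretionary node. One way to make it smarter is sketched below. It relies on an assumption about TeX's box display: a discretionary's pre-break material is shown one level deeper than the \discretionary line, and its post-break material is marked with `|`. When TeX does break at a discretionary, it moves the pre-break hyphen into the line itself, at the same depth, so a real line-end hyphen such as the one in Cur- is kept. The subroutine name log_to_html and the file name zz2.pl are illustrative only, and apart from the discretionary handling the rules are the same as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: same rules as the simple script, but material nested inside an
# unbroken \discretionary (assumed to be any line deeper than the
# \discretionary line itself) is skipped.
sub log_to_html {
    my @lines = @_;
    my $out = '';
    my $disc_depth;    # depth of the \discretionary being skipped, if any
    for my $line (@lines) {
        chomp $line;
        # Leading '.' (or '|' for post-break material) gives the nesting depth.
        my ($prefix) = $line =~ /^([.|]*)/;
        my $depth = length $prefix;
        if (defined $disc_depth) {
            next if $depth > $disc_depth;   # inside the discretionary: drop it
            undef $disc_depth;              # back at its level: stop skipping
        }
        if ($depth > 0 && $line =~ /^[.|]+\\discretionary/) {
            $disc_depth = $depth;
            next;
        }
        $out .= $1 if $line =~ m@^\.[^ ]* (.)\s*$@;            # character
        $out .= $1 if $line =~ m@ligature ([^ ]*)\)\s*$@;      # ligature
        $out .= ' ' if $line =~ m@^\.*\\glue ([0-9.]+)@ && $1 > 2;
        $out .= "\n<br>" if $line =~ m@\\baselineskip@;
        $out .= "\n<p>" if $line =~ m@\\parskip@;
        $out .= "\n\n<hr>\n\n" if $line =~ m@Completed box being shipped@;
    }
    return $out;
}

# Usage: perl zz2.pl zz.log > zz.html
print log_to_html(<>) if @ARGV;
```

Collecting the output in a string, rather than printing as it goes, makes the conversion easy to call from other code or to check on a few hand-written log lines.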
\showoutput to the document then essentially all the information about where each character went is in the log file, but constructing any usable HTML from that format might be an interesting exercise. – David Carlisle Mar 02 '15 at 13:00