EDIT: I should clarify that, to the best of my knowledge, I DON'T have a parsing problem. I'm already using DOM and XPath, and that feels adequate. The problem I'm having is a logic riddle: how to use the already-parsed results within a loop.
I have a long string of HTML that I'm trying to parse with PHP. So far, I have preg_match calls that isolate what I want, but I would like some help refining the output. The HTML is poorly formatted: it is one long string that is not wrapped in any tags of its own, but it basically consists of a heading, followed by one or more paragraphs, then another heading, and so on.
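Since the text is one flat string, it may help to turn it into an ordered list of typed items before doing any counting. Below is a minimal sketch, assuming segments are separated by runs of `<br />` tags and that a segment consisting only of a single `<b>...</b>` element is a heading (which, as noted further down, isn't always true). The function name `tokenise_description()` and the item shape `['type' => ..., 'text' => ...]` are hypothetical, and `preg_split` is used here in place of the original preg_match calls.

```php
<?php
/**
 * Hypothetical helper: split the flat description into an ordered list of
 * ['type' => 'heading'|'paragraph', 'text' => string] items.
 * Assumptions: segments are separated by runs of <br> tags, and a segment
 * that is nothing but a single <b>...</b> element is a heading.
 */
function tokenise_description(string $html): array
{
    // Split on one or more <br>, <br/> or <br /> tags (line breaks inside the tag allowed).
    $segments = preg_split('~(?:\s*<br\s*/?>\s*)+~i', $html, -1, PREG_SPLIT_NO_EMPTY);

    $items = [];
    foreach ($segments as $segment) {
        // Collapse whitespace, including the hard line wraps in the source string.
        $segment = trim(preg_replace('~\s+~', ' ', $segment));
        if ($segment === '') {
            continue;
        }
        if (preg_match('~^<b>([^<]*)</b>$~i', $segment, $m)) {
            $items[] = ['type' => 'heading', 'text' => trim($m[1])];
        } else {
            // Inline <b> tags (e.g. names and phone numbers) are stripped from paragraphs.
            $items[] = ['type' => 'paragraph', 'text' => trim(strip_tags($segment))];
        }
    }
    return $items;
}
```

Because whitespace is collapsed first, wrapped headings such as "Porch/Vestibule 1.7m x 1.8m" come out on a single line.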
My preg_match regular expressions isolate the headings and paragraphs, and I'm using a for loop to grab at most 13 results, because the div they will be output into has limited dimensions.
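If the 13-item cap stays as it is, one simple fix is to cut the list at 13 and then drop any headings left dangling at the end. This is a minimal sketch, assuming the matches have already been collected into an array of items shaped like `['type' => 'heading'|'paragraph', 'text' => ...]`; that shape and the function name `take_ending_on_paragraph()` are hypothetical.

```php
<?php
/**
 * Hypothetical helper: take at most $limit items, then drop any headings
 * left at the end so the output always finishes on a paragraph.
 * Items are assumed to look like ['type' => 'heading'|'paragraph', 'text' => string].
 */
function take_ending_on_paragraph(array $items, int $limit = 13): array
{
    $slice = array_slice($items, 0, $limit);

    // Pop trailing headings; this also covers two or more headings in a row at the cut-off.
    while ($slice !== [] && end($slice)['type'] === 'heading') {
        array_pop($slice);
    }
    return $slice;
}
```

The trade-off is that the div may receive fewer than 13 items whenever the cut lands just after one or more headings.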
The number of headings is unknown, as is the number of paragraphs per heading. What I am trying to do is make sure the output ALWAYS ends with a paragraph, regardless of how many matches come before it (sometimes there are 2 headings in a row; this is a data input issue that I'm working on too, but I have much less control over it). I realise the 'less than or equal to 13' part of the for loop is problematic. I imagine the solution is to isolate heading-paragraph pairs and then count them; I'm just not making any headway in doing so.
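The heading-paragraph idea can be sketched by grouping the flat list into blocks (a heading plus the paragraphs that follow it) and then adding whole blocks while the running item count stays within the limit. This assumes a hypothetical item shape of `['type' => 'heading'|'paragraph', 'text' => ...]`, and the function name `take_whole_blocks()` is also hypothetical. A heading followed directly by another heading forms a block with no paragraph, and that block is skipped.

```php
<?php
/**
 * Hypothetical helper: keep whole "heading + paragraphs" blocks while the
 * total number of items stays within $limit.
 * Items are assumed to look like ['type' => 'heading'|'paragraph', 'text' => string].
 */
function take_whole_blocks(array $items, int $limit = 13): array
{
    // 1. Group: each heading opens a new block; paragraphs join the current block.
    $blocks = [];
    foreach ($items as $item) {
        if ($item['type'] === 'heading' || $blocks === []) {
            $blocks[] = [$item];
        } else {
            $blocks[count($blocks) - 1][] = $item;
        }
    }

    // 2. Select: add complete blocks until the next one would exceed the limit.
    $out = [];
    foreach ($blocks as $block) {
        if (end($block)['type'] !== 'paragraph') {
            continue; // heading with no paragraph, e.g. two headings in a row
        }
        if (count($out) + count($block) > $limit) {
            break;
        }
        $out = array_merge($out, $block);
    }
    return $out;
}
```

Because only complete blocks are added, a non-empty result always ends on a paragraph. If the very first block alone is longer than the limit, this sketch returns an empty array; truncating that block's paragraphs instead would be a reasonable alternative.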
Would some kind bright spark point me in the right direction? :)
An example of the string I have to work with is below:
<h2 class="border_top">Description:</h2>
<b>Surroundings</b><br /><br />Dunoon is the marine gateway to the Loch Lomond and the Trossachs National Park, and has a number of
local amenities including a cinema, two supermarkets, several choice
restaurants and bars, primary and secondary schools, library, Post
Office, community hospital, doctor surgeries and veterinary clinic,
leisure centre with swimming pool and gym.<br />Innellan has its own
primary school, shop and Post Office. There is a busy village hall and
a popular restaurant nearby. There are a number of clubs in the
Innellan area, including golf, bowling, tennis, cricket and
sailing.<br /><br /><b>Amenities</b><br /><br />A regular local bus
service operates in the area and there are bus and coach services to
popular towns across the West of Scotland. Frequent passenger ferry
services operate from Dunoon Pier to Gourock with regular bus and
train connections to Glasgow Central railway station. The train stops
at Paisley Gilmore Street en route, which is convenient for access to
Glasgow Airport. Western Ferries located at Hunter's Quay, offers
a regular vehicle service to McInroys Point in Gourock for access to
the national road network.<br /><br /><b>Contact</b><br /><br />To
view this property, please contact <b>Someone Somewhere</b> on <b>01234
467890</b> or <b>09876 543210</b><br /><br /><b>Porch/Vestibule 1.7m x
1.8m</b><br /><br />Modern PVCu wood effect front door set between glazed side panels. Cupboard with electricity meter and room for coats
and boots. Solid wooden flooring and shelving. Glazed door to lower
hall.<br /><br /><b>Lower Hall 2.7m x 3.65m at widest point</b><br
/><br />Provides access to double garage and WC, and to the winding
staircase to lower landing area. Solid wooden flooring.<br /><br
/><b>Downstairs Cloakroom 2m x 0.9m</b><br /><br />Under stairs WC and
wash basin with adequate headroom. Cork effect flooring. Extractor
fan. Fitted shelving.<br /><br /><br /><b>Lower Landing 1.5m x
3.8m</b><br /><br />Carpeted. Large cupboard with double doors, fitted with shelving and hanging rail.<br />Door access to the master bedroom
and stairs up to the next landing.<br /><br /><b>Master Bedroom 4.9m x
5.5m</b><br /><br />This large carpeted area featuring 2 large length windows, offering fantastic views of the Firth of Clyde and hills on
the opposite shore. There is also a side window making the room
bright. Access from here through door to the walk-in dressing room
area.<br /><br /><b>Dressing Room 3.85m x 2.2m</b><br /><br />Carpeted
with shelving and hanging rails and with plenty of room for additional
storage. Access through door to the en-suite bathroom. Extractor
fan.<br /><br /><b>En-suite 2m x 3.1m</b><br /><br />Walk-in shower
measuring 1100mm x 800mm, chrome towel rail, bidet, WC and wash basin
inset into fitted cupboards. Privacy glazed window and extractor
fan.<br /><br /><b>Main Landing 2.4m x 3.4m</b><br /><br />Solid wood
flooring. High ceiling with Velux roof light. Access to kitchen,
utility room and bedroom/study.<br /><br /><b>Bedroom/Study 4.1m x
4.2m</b><br /><br />This room which is currently used as a study, would be a generous double bedroom. Large fitted wardrobe. Loft
access. Twin windows with amazing views over the river. Solid wooden
flooring.<br /><br /><b>Kitchen 4.6m x 3.5m</b><br /><br
/>Double-doors from the main landing, leads into this bright and
spacious modern fitted kitchen with lots of storage and countertop,
some providing serving access to the dining area. Double window, tiled
splash-backs and hob extractor hood. Solid wooden flooring. Access to
dining area and to Sun Room/Family Room/Office.<br /><br /><b>Dining
Area 4.6m x 4.3m</b><br /><br />Single window in addition to glazed
French doors that open to a small guarded balcony with wonderful sea
views. Solid wooden flooring continues in from the kitchen to this
bright dining area.<br /><br /><b>Sun Room/Family Room/Office 3.1m x
4.3m</b><br /><br />Currently used as an office, this room has a window to the side and two to the back of the house, together with
patio doors which provide access to the decking. This bright and sunny
room has solid wooden flooring.<br /><br /><b>Utility Room 1.65m x
3.05m</b><br /><br />Worktop with stainless steel sink and tiled splash-back. Fitted shelving. Room for washing machine. Modern PVCu
back door with access to decking and garden. Window to garden.<br
/><br /><b>Upper Landing 2.7m x 2.3m</b><br /><br />Carpeted area with
fitted cupboard and access to lounge, bedroom and bathroom.<br /><br
/><b>Bathroom</b><br /><br />Fitted bath with mixer taps. WC and wash
basin inset into fitted cupboard. Shower unit with electric shower.
Cork effect flooring. Privacy glazed double windows and extractor
fan.<br /><br /><b>Bedroom 2.75m x 3.9m</b><br /><br />Peaceful
generous double bedroom with double window looking out onto the back
garden. Large, fitted, double-door wardrobe with fitted shelving and
hanging rail. Carpeted.<br /><br /><b>Lounge 5.5m x 4.2m</b><br /><br
/>Double-door entrance into this bright carpeted area. The lounge
features a total of 6 windows, 2 single side facing at each end, 2
facing the river as well as 2 x length windows offering fabulous
views of the Firth of Clyde and of the hills on the other side.
Elegant fire surround with living flame gas fire.<br /><br /><b>Double
Garage 6m x 6.5m</b><br /><br />Plenty of room in this double garage,
containing double up and over door and modern gas water heating
system.<br /><br /><b>Garden</b><br /><br />Terraced front garden with
mature planting and additional car parking. Side gate and pathway from
the front of the property leads along the side of the house and up
wooden stairs to timber decking. Decking extends along the rear and
steps up to a small grassed area with flower bed and planters. Pathway
at the other side of the house leads to a planted area at rear of the
house.<br /> <br />
<a name="map"></a>
As you can see, it lies between two tags (the h2 heading and the map anchor), which I'm using to locate it in the HTML document. I am using DOMDocument's loadHTMLFile and then XPath to query the HTML, but I'm then using preg_match with regular expressions to get the headings and paragraphs from the result.
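For reference, the extraction step described here could look roughly like the following sketch. It uses `DOMDocument::loadHTML()` on a string rather than `loadHTMLFile()` so it is easy to test, and it assumes the `h2` with class `border_top` and the `<a name="map">` anchor are siblings under the same parent element; `extract_description()` is a hypothetical name. `DOMDocument::saveHTML()` with a node argument serialises just that node.

```php
<?php
/**
 * Sketch: return the raw HTML between <h2 class="border_top"> and the next
 * <a name="map"> sibling. Assumes both markers share the same parent.
 */
function extract_description(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // suppress warnings for messy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $start = $xpath->query('//h2[@class="border_top"]')->item(0);
    if ($start === null) {
        return '';
    }

    $out = '';
    // Walk the siblings after the heading until the map anchor is reached.
    for ($node = $start->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node instanceof DOMElement && $node->nodeName === 'a'
            && $node->getAttribute('name') === 'map') {
            break;
        }
        $out .= $doc->saveHTML($node); // serialise only this node
    }
    return trim($out);
}
```

Note that `saveHTML()` writes `<br />` back out as `<br>`, so any regular expressions applied afterwards should accept both forms.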
Headings in the example above include Surroundings, Amenities, and anything with dimensions in the form 0.00m x 0.00m.
In this particular example, the headings are in bold tags, but this isn't always the case.
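When the bold tags can't be relied on, a heuristic is probably the best available option. The sketch below is one guess, not a rule from the data source: it treats a segment as a heading if it is a bold-only element, if it is short and contains room dimensions in the 0.00m x 0.00m form, or if it is a few words with no sentence-ending punctuation. The function name `looks_like_heading()`, the 80-character cap and the four-word threshold are all assumptions to tune against real data.

```php
<?php
/**
 * Heuristic sketch (assumed rules, not taken from the data source): decide
 * whether one segment of the description is a heading when <b> tags are absent.
 */
function looks_like_heading(string $segment): bool
{
    // Bold-only segment, e.g. "<b>Surroundings</b>".
    if (preg_match('~^\s*<b>[^<]*</b>\s*$~i', $segment)) {
        return true;
    }

    // Work on plain text with whitespace (including line breaks) collapsed.
    $text = trim(preg_replace('~\s+~', ' ', strip_tags($segment)));
    if ($text === '' || strlen($text) > 80) {
        return false;
    }

    // Room dimensions such as "Lower Hall 2.7m x 3.65m at widest point".
    if (preg_match('~\b\d+(?:\.\d+)?m\s*x\s*\d+(?:\.\d+)?m\b~i', $text)) {
        return true;
    }

    // Otherwise: a few words with no sentence-ending punctuation, e.g. "Amenities".
    return str_word_count($text) <= 4 && !preg_match('~[.!?]$~', $text);
}
```

A very short paragraph without a full stop would be misclassified as a heading by the last rule, so it is worth checking the results against a few real listings.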