3

I have seen several other StackOverflow posts such as this one: How to Use/Enable (RegExp object) Regular Expression using VBA (MACRO) in word on using regular expressions in Microsoft Word with VBA using the Microsoft VB script Regular Expressions 5.5 Reference.

That helped me prepare the following, which I use in Word to highlight US Dollar amounts:

Sub dollarHighlighter()
Set regExp = New regExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim offsetStart As Long
offsetStart = Selection.Start
regExp.Pattern = "\$([\,\d{1,3}]*(?:\.\d{2})?)"
regExp.Global = True
Set colMatches = regExp.Execute(Selection.Text)   ' Execute search.
For Each objMatch In colMatches   ' Iterate Matches collection.
  Set myRange = ActiveDocument.Range(objMatch.FirstIndex + offsetStart, 
    End:=offsetStart + objMatch.FirstIndex + objMatch.Length)
  myRange.FormattedText.HighlightColorIndex = wdYellow
Next
   End Sub

While this works as expected on a list of dollar amounts within text (for the most part - among its imperfections the regex is intentionally a bit loose) it does not work as anticipated when there are hyperlinks present in the Word document.

In that instance, there appears to be a shift in offset of the highlighted characters in a somewhat unpredictable manner. I assume this is because there is a lot of new xml/css in the document.xml source file.

Ultimately, my overarching questions is, can I use regex to highlight word document contents even if it contains hyperlinks? Is it an offset question or should I run the regex on the compressed xml, re compress and reopen for better results? As when I test various regex variations on the source code, I get the anticipated results, but not when formatting what would be the Word range.

I have also asked this here: https://social.msdn.microsoft.com/Forums/en-US/3a95c5e4-9e0c-4da9-970f-e0bf801c3170/macro-for-a-regexp-search-replace?forum=isvvba&prof=required but realize it was an ancient post...

Per question below, here are some possibly helpful links: an example document http://www.smithany.com/test.docx step 1 http://www.smithany.com/wordusd1.jpg Step 2 http://www.smithany.com/wordhighlighterrun.jpg and what happens http://www.smithany.com/whatactuallyhappens.jpg

Temporary Workaround: As suggested below Word's Wildcard find is fast if you do not stack the loops. try this:

Sub Macro2()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.highlight = True
With Selection.Find
    .Text = "$[0-9,]{1,}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.highlight = True
With Selection.Find
    .Text = "$[0-9,]{1,}.[0-9]{2,3}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll

End Sub

which basically gets all the dollar amounts highlighted. That said, complex expressions like matching a variety of date formats could get messy, but i suppose it is all possible doing them one step at a time.

Allan
  • 95
  • 6
  • Please provide an example of the input and the expected match. – 41686d6564 stands w. Palestine May 23 '18 at 03:21
  • 52 52 52. $52.52 ($) ($52) ($5.52) the above should highlight any dollar sign $ followed by a digit. When there is a hyper link present such as www.adsfasdf.com or asdf@asdfasdf.com preceded or followed by arbitrary contents $123.12, the incorrect contents are highlighted by some seemingly random offset, again, only inside MS word – Allan May 23 '18 at 03:23
  • specifically, here are some links that may be helpful: www.smithany.com/test.docx and www.smithany.com/wordusd1.jpg and www.smithany.com/wordhighlighterrun.jpg and also www.smithany.com/whatactuallyhappens.jpg – Allan May 23 '18 at 03:29
  • 1
    Put simply, you can't rely on the `Range.Start` and `.End` properties as "set" values. Word has too many non-printing (and non-visible) characters in the document that simply cannot be factored in. For hyperlinks, those are field codes. Does Word's wildcard find not work? – Cindy Meister May 23 '18 at 04:14
  • 1
    Is there a reason you can't use a *wildcard* Find in Word for this? In your MSDN post you seem to think that can't be used to find dates or dollar amounts, neither of which is true. A *wildcard* Find in Word in Word would have no trouble identifying the ranges correctly. – macropod May 23 '18 at 05:18
  • The MSDN article was not my OP, i just replied. Without alternation at a minimum, it seems like that'll be very difficult. Try this regex (ugly but seems functional). In Word it'll be a nightmare if even possible: (January|February|March|April|May|June|July|August|September|October|November|December|Jan[-\.]?|Feb[-\.]?|Mar[-\.]?|Apr[-\.]?|Jun[-\.]?|Jul[-\.]?|Aug[-\.]?|Sept?[-\.]?|Oct[-\.]?|Nov[-\.]?|Dec[-\.]?)|(\d{1,2})[\/-](\d{1,2})(?:[\/-])?(\d{4}|\d{2})?\s?|(\d{1,2})[,\s][,\s']*(\d{4}|\d{2})|(\d{1,2})[thsnrd]{0,2}[,\s(?=\b)][,\s']*(\d{4}|\d{2})? – Allan May 23 '18 at 14:17

1 Answers1

3

I had not touched VBA for years but I guess it's like bicycling.

Anyways, here is a sub that should help you out. It's based on Cindy Meister sound recommendation and fills the gap between Regex and Wildcard Find using a collection of match patterns for optional parts.

First, the Wildcard matches: $[0-9,]{1,} and $[0-9,]{1,}.[0-9]{2}

It's not that different after all, isn't it? However, to take the optional fraction part into account I have to use two patterns.

And here's the routine:

Sub WildcardsHighlightWords()
    Dim Word As Range
    Dim WildcardCollection(2) As String
    Dim Words As Variant
    WildcardCollection(0) = "$[0-9,]{1,}"
    WildcardCollection(1) = "$[0-9,]{1,}.[0-9]{2}"
    Options.DefaultHighlightColorIndex = wdYellow
    'Clear existing formatting and settings in Find
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    'Set highlight to replace setting.
    Selection.Find.Replacement.Highlight = True
    'Cycle through document and find wildcards patterns, highlight words when found
    For Each Word In ActiveDocument.Words
        For Each WildcardsPattern In WildcardCollection
            With Selection.Find
                .Text = WildcardsPattern
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindContinue
                .Format = True
                .MatchCase = False
                .MatchWholeWord = False
                .MatchWildcards = True
                .MatchSoundsLike = False
                .MatchAllWordForms = False
            End With
            Selection.Find.Execute Replace:=wdReplaceAll
        Next
    Next
End Sub

It should be easy to extend or modify this approach if needed.

This highlithts the Dollar amounts as desired on my end:

enter image description here

Note: The separator in the quantifier {n,m} is not the same in all localizations, e.g. it's {n;m} in the German version.

wp78de
  • 17,272
  • 6
  • 36
  • 68
  • For some reason, on larger documents, I believe any multi page document, your Collection of Word Wildcards subroutine above seems to hang. I will take a closer look and see if I can make it work on larger documents in which case I would be inclined to mark as an answer. Certainly my goal here was to be able to use my existing patterns from other sources to match dates (numbers and spelled months), emails etc which often require much broader pattern matching than I believe word can offer. – Allan May 23 '18 at 14:15