- Paste the HTML code you are working on into a new Word document.
- Press Ctrl+F to bring up the “Find and Replace” dialog box.
- In the “Find what:” field, enter your search pattern. In my particular case I used this code -> \”\>(*)\<\/a\>
- In the “Search Options” portion on the bottom half of the dialog box, check the “Use wildcards” box. This is what enables the regular expression functionality of Microsoft Word.
- To preview your results, click the “Reading Highlight” drop-down box and choose “Highlight All”.
- Once you’ve verified that the highlighted matches are the results you want, click “Find in” and choose “Main Document”.
- All of your highlighted text will now actually be selected. Right-click any of the selected areas in the document and click “Copy”.
- Press Ctrl+N to create a new document.
- Press Ctrl+V to paste what you’ve just copied into your new document.
- Press Ctrl+F to bring up the “Find and Replace” dialog box again, then click the “Replace” tab.
- Make sure the “Find what:” field still contains this code -> \”\>(*)\<\/a\>
- “Voilà. She is clean.” (Jacques in “Finding Nemo”)
This was difficult information to find. I searched for a method using Notepad++, Vim for Windows, and a few others, but finally found an easy way to do this using, of all things, Microsoft Word.
I’ll spare you the details of the other methods I attempted and just give you the detailed steps for accomplishing this with Microsoft Word.
I’ll use my own scenario as an example of what you might want to use this for.
I had a bunch of HTML code from which I needed to extract every instance of anchor text between the <a href=”…”></a> tags. In this particular case, here is how you accomplish this:
Your needs will most likely be different, and you will need to research a bit if you’re unfamiliar with regular expressions. Even if you are familiar with them (as I am), you’ll probably still need to do a little research on Microsoft Word regular expressions to get the exact expression that works for you. I’m sure there are regex gurus out there who dream in regex, but I always end up spending more time than I would like coming up with the syntax that gives me the results I am looking for.
Here are the results in my scenario. They are almost, but not exactly, what I want: I’d like to get just the text between the <a> and </a> tags, but I don’t know how to do that in one step. The only way I know to do it is to include the “> and the </a> at the beginning and end of the regular expression so that it captures what I’m looking for.
In the “Replace with:” field, enter \1 (with no quotation marks around it).
Make sure the “Use wildcards” checkbox is still checked, and then click “Replace All”.
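
For readers who would rather script this than click through Word, here is a minimal Python sketch of the same idea. It is not part of the Word workflow described here. The sample `html` string and the variable names are made up for illustration. The patterns are ordinary Python regular expressions, not Word wildcard syntax, with the non-greedy `.*?` standing in for Word’s `(*)`.

```python
import re

# Hypothetical sample input; substitute your own HTML.
html = (
    '<p><a href="https://example.com/one">First link</a> and '
    '<a href="https://example.com/two">Second link</a></p>'
)

# Pass 1: mirror the Word "Find" step by matching from the "> that
# closes the opening tag through the </a> that ends the link.
matches = re.findall(r'">.*?</a>', html)

# Pass 2: mirror the Word "Replace" step by keeping only the
# captured group, which is what \1 refers to.
anchor_texts = [re.sub(r'">(.*?)</a>', r'\1', m) for m in matches]

# One-step variant: when a pattern contains a single capture group,
# re.findall returns only the captured text.
one_step = re.findall(
    r'<a\b[^>]*>(.*?)</a>', html, flags=re.DOTALL | re.IGNORECASE
)

print(anchor_texts)
print(one_step)
```

Like the Word pattern, this assumes simple, well-formed links. Anchor text that contains nested tags, or markup that is malformed, would probably be better handled with a real HTML parser.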
Kudos and thanks to Ivaylo for posting his workaround which led to my finding this solution and expanding on it here.







