Here is an example of the HTML output from a DOCX file imported into WordPress (in this case with the Mammoth .docx converter plugin):
<h2><a id="post-44034-_7ttglipb267h"></a>Header Text</h2>
Some Content here
<h3><a id="post-44034-_lx7dlbt1tney"></a>Sub Header Text</h3>
Some Content here
The IDs vary from heading to heading, but the common factor is that the <a> element contains no content.
If we’re not linking to these headings from our own table of contents, we don’t need these anchors, and a plugin that creates a TOC will generate its own IDs as required. We can therefore remove them in Notepad++ using a RegEx (regular expression) in the Replace dialog.
Ensure that you have selected the “Regular expression” option under Search Mode at the bottom left of the Replace dialog, leave the “Replace with” field empty, and then search for the following:
<a[^>]*></a>
Explanation:
<a : Matches the start of the opening <a> tag.
[^>]* : Matches any attributes inside the <a> tag (if any), up to but not including the closing angle bracket.
></a> : Matches the end of the opening tag followed immediately by the closing </a> tag, ensuring there is no content between them.
This regex will find all <a> tags with no content inside them and can be used to remove those unwanted tags.
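If you would rather script the cleanup than run it by hand in Notepad++, the same pattern can be applied in Python. This is only a minimal sketch: the function name strip_empty_anchors and the sample string (including the example.com link) are made up for illustration, and it assumes the exported HTML is already available as a string.

```python
import re

# Same pattern as in the Notepad++ search box: an opening <a ...> tag
# followed immediately by </a>, with nothing in between.
EMPTY_ANCHOR = re.compile(r"<a[^>]*></a>")


def strip_empty_anchors(html: str) -> str:
    """Return html with every content-free <a ...></a> pair removed."""
    return EMPTY_ANCHOR.sub("", html)


# Hypothetical sample modelled on the converter output shown earlier,
# plus one ordinary link that must be left alone.
sample = (
    '<h2><a id="post-44034-_7ttglipb267h"></a>Header Text</h2>\n'
    'Some Content here\n'
    '<h3><a id="post-44034-_lx7dlbt1tney"></a>Sub Header Text</h3>\n'
    'Some <a href="https://example.com">linked</a> content here\n'
)

cleaned = strip_empty_anchors(sample)
print(cleaned)
```

Links that contain text are not touched, because the pattern requires </a> to follow the opening tag directly. Note that an anchor containing only whitespace, such as <a id="x"> </a>, would also be left in place by this pattern.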