RegEx To Replace Empty Tags with just IDs Using Notepad++

Here is an example of the HTML output from a DOCX file imported into WordPress (in this case with the Mammoth docx converter plugin):

<h2><a id="post-44034-_7ttglipb267h"></a>Header Text</h2>
Some Content here
<h3><a id="post-44034-_lx7dlbt1tney"></a>Sub Header Text</h3>
Some Content here

The IDs will vary, but in every case the <a> element contains no content.

If we’re not referencing the headings in our own table of contents, we don’t need these anchors. If we’re using a plugin to create a TOC, it will create its own IDs as required. Either way, we can remove them in Notepad++ with a RegEx (regular expression) in the Replace dialog:

[Screenshot: the Notepad++ Replace dialog showing our RegEx]

Ensure that you have selected the “Regular expression” option under Search Mode at the bottom left of the Replace dialog, leave the “Replace with” field empty so that each match is deleted, and then search for the following:

<a[^>]*></a>

Explanation:

<a: Matches the start of an opening <a> tag.
[^>]*: Matches any run of characters other than >, i.e. the tag’s attributes (if any).
></a>: Matches the > that ends the opening tag followed immediately by the closing </a> tag, ensuring there is no content between them.

This regex will find all <a> tags with no content inside them and can be used to remove those unwanted tags.
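
If you would rather script the clean-up than run it by hand in Notepad++, the same pattern can be applied elsewhere. The following is only a minimal sketch in Python, not part of the Notepad++ workflow above; the names EMPTY_ANCHOR and strip_empty_anchors are made up for this illustration, and the sample string is the example output from the top of this post.

```python
import re

# Same pattern as in the Notepad++ dialog: an opening <a ...> tag
# immediately followed by its closing </a>.
# EMPTY_ANCHOR and strip_empty_anchors are hypothetical names for this sketch.
EMPTY_ANCHOR = re.compile(r"<a[^>]*></a>")


def strip_empty_anchors(html: str) -> str:
    """Delete every empty <a ...></a> pair and leave everything else untouched."""
    return EMPTY_ANCHOR.sub("", html)


sample = (
    '<h2><a id="post-44034-_7ttglipb267h"></a>Header Text</h2>\n'
    "Some Content here\n"
    '<h3><a id="post-44034-_lx7dlbt1tney"></a>Sub Header Text</h3>\n'
    "Some Content here\n"
)

cleaned = strip_empty_anchors(sample)
print(cleaned)
```

Links that do contain text, such as <a href="/page">Read more</a>, are left alone because something sits between the opening and closing tags. Note that the pattern as written also skips anchors that contain only whitespace.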

Disclaimer: The code on this website is provided "as is" and comes with no warranty. The author of this website does not accept any responsibility for issues arising from the use of code on this website. Before making any significant changes, ensure you take a backup of all files and do not work directly on a live/production website without thoroughly testing your changes first.
