Removing HTML Tags from a String with Python

This tutorial demonstrates two methods for removing HTML tags from a string, such as the page source we retrieved in my previous tutorial on fetching a web page using Python.

Method 1

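This method removes HTML tags from a string using a regular expression. The pattern matches a <, followed by one or more characters that are not >, followed by a >, and every match is replaced with an empty string.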

import re

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)
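
To make the behaviour concrete, here is a small usage sketch of the regex approach. The sample strings are made up for illustration and are not taken from the previous tutorial.

```python
import re

# Same pattern as above: "<", one or more non-">" characters, then ">"
TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

# Hypothetical sample input
print(remove_tags('<p>Hello, <b>world</b>!</p>'))  # Hello, world!

# Only the tags themselves are removed, so the contents of a
# <script> element stay in the output
print(remove_tags('<script>alert(1)</script>Hi'))  # alert(1)Hi
```

Note that the regex only strips the tags. Text inside script or style elements is kept, and character entities such as &amp; are left as they are.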

Method 2

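This is another method we can use to remove HTML tags. It relies on xml.etree.ElementTree, which ships with the Python standard library, so there is nothing to install, although the module still has to be imported. Because it parses the input as XML, it only works on well-formed markup with a single root element.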

import xml.etree.ElementTree

def remove_tags(text):
    return ''.join(xml.etree.ElementTree.fromstring(text).itertext())
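
The following is a minimal, self-contained sketch of how the ElementTree approach might be used. The helper safe_remove_tags and the sample strings are hypothetical additions for illustration. It also shows what happens when the input is not well-formed XML, which is common with real web pages.

```python
import xml.etree.ElementTree as ET

def remove_tags(text):
    # Parse the string as XML, then join the text of the root
    # element and all of its descendants
    return ''.join(ET.fromstring(text).itertext())

def safe_remove_tags(text):
    # Hypothetical helper: return None instead of raising when
    # the input cannot be parsed as XML
    try:
        return remove_tags(text)
    except ET.ParseError:
        return None

print(remove_tags('<p>Hello, <b>world</b>!</p>'))  # Hello, world!

# The unclosed <br> makes this invalid XML, so parsing fails
print(safe_remove_tags('<p>unclosed <br> tag</p>'))  # None
```

If the pages you are working with are not guaranteed to be well-formed, the regex from Method 1 may be the more forgiving choice.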

Conclusions

In the coming tutorials we will learn how to calculate important SEO metrics such as keyword density, which will let us analyse competing sites and try to understand how they have achieved their success.

The methods for tag removal can be found here: http://stackoverflow.com/questions/9662346/python-code-to-remove-html-tags-from-a-string

Elliot Forbes

Twitter: @Elliot_f
