# Removing HTML Tags from a String with Python
This tutorial demonstrates two methods for removing HTML tags from a string, such as the page we retrieved in my previous tutorial on fetching a web page with Python.
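## Method 1
This method uses a regular expression to remove HTML tags. The pattern `<[^>]+>` matches a `<`, then one or more characters that are not `>`, then a closing `>`, and every match is replaced with an empty string.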
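
```python
import re

# Match "<", then one or more characters that are not ">", then ">"
TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    # Replace every matched tag with an empty string
    return TAG_RE.sub('', text)
```
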
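As a quick sketch of how the regex version behaves, the snippet below runs it on a couple of made-up strings. It shows that tags are stripped but character entities such as `&amp;` are left as-is; decoding them with the standard library's `html.unescape` is an extra step that is not part of the original method. It also shows a known weakness: a `>` inside an attribute value ends the match early and leaves debris behind.

```python
import html
import re

# Same pattern as Method 1
TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

# Tags are removed, but the &amp; entity survives
stripped = remove_tags('<p>Fish &amp; <b>chips</b></p>')
print(stripped)                 # Fish &amp; chips

# Optional extra step (not in the original method): decode entities
print(html.unescape(stripped))  # Fish & chips

# A ">" inside an attribute value ends the first match too early
print(remove_tags('<a title="x>y">link</a>'))  # y">link
```

For simple, trusted markup this is usually good enough; for arbitrary HTML the regex can leave fragments like the one above.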
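## Method 2
This method uses the `xml.etree.ElementTree` module from the Python standard library, so there are no third-party packages to install, although the module still has to be imported. It parses the string as XML and joins together all of the text inside the elements. Because it is a strict XML parser, it only works on well-formed markup with a single root element. Typical HTML, such as an unclosed `<br>` tag or an `&nbsp;` entity, makes `fromstring` raise a `ParseError`.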
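
```python
import xml.etree.ElementTree

def remove_tags(text):
    # Parse as XML, then join all text nodes (ParseError if not well-formed)
    return ''.join(xml.etree.ElementTree.fromstring(text).itertext())
```
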
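The sketch below exercises the ElementTree approach on a made-up, well-formed snippet and on ordinary HTML that is not valid XML. The `safe_remove_tags` helper, which falls back to the Method 1 regex when parsing fails, is a hypothetical addition and not part of the original method.

```python
import re
import xml.etree.ElementTree

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return ''.join(xml.etree.ElementTree.fromstring(text).itertext())

def safe_remove_tags(text):
    # Hypothetical helper: try the strict XML parser first,
    # and fall back to the cruder regex if the markup is not well-formed
    try:
        return remove_tags(text)
    except xml.etree.ElementTree.ParseError:
        return TAG_RE.sub('', text)

# Well-formed input: text and the whitespace between elements are kept
print(remove_tags('<div><p>Hello</p> <p>world</p></div>'))  # Hello world

# An unclosed <br> is fine in HTML but a "mismatched tag" error in XML
try:
    remove_tags('<p>line one<br>line two</p>')
except xml.etree.ElementTree.ParseError as err:
    print('ParseError:', err)

# The fallback strips the tags with the regex instead
print(safe_remove_tags('<p>line one<br>line two</p>'))  # line oneline two
```

Note that neither method inserts a space where a tag like `<br>` used to be, so adjacent words can run together.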
## Conclusions
In the coming tutorials we will learn how to calculate SEO metrics such as keyword density, which will let us analyze competing sites and try to understand how they achieved their success.
Both tag-removal methods can be found in this Stack Overflow question: http://stackoverflow.com/questions/9662346/python-code-to-remove-html-tags-from-a-string