Python — Extracting Domain Name From URLs Using Regular Expressions

Ryan Arjun
2 min read · Feb 26, 2020

As Python developers, we often have to do a lot of data cleansing on a file before carrying out any other business operations.

For example, you might have a raw text file of web-scraped data from which you need to read specific values, such as website URLs, and then apply regular expression matching to pull out the domain names.
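A minimal sketch of that first step, pulling URLs out of raw scraped text with `re.findall`. The sample text, `URL_PATTERN` and `find_urls` are hypothetical, and the pattern assumes every URL starts with `http://` or `https://`.

```python
import re

# Hypothetical sample of raw scraped text.
raw_text = """Visit https://www.example.com/about for details.
Mirror: http://blog.example.co.uk/post?id=7, and https://shop.example.com.au.
Contact: admin@example.org"""

# Assumption: URLs begin with http:// or https://. Match the scheme, then any run
# of characters that are not whitespace, angle brackets, quotes or commas.
URL_PATTERN = re.compile(r"https?://[^\s<>\"',]+")

def find_urls(text):
    """Return every http(s) URL in text, with trailing sentence punctuation stripped."""
    return [m.rstrip(".;:!?)") for m in URL_PATTERN.findall(text)]

print(find_urls(raw_text))
# ['https://www.example.com/about', 'http://blog.example.co.uk/post?id=7', 'https://shop.example.com.au']
```

To scan a real file, read it first, for example `with open(path, encoding="utf-8") as f: urls = find_urls(f.read())`, where `path` is your own file name.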

Extracting the domain name accurately can be tricky, mainly because the domain extension can have two parts (such as .com.au or .co.uk) and a subdomain prefix (such as www. or blog.) may or may not be present.
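To see both complications in one pattern, here is a minimal sketch that splits a bare host name into an optional subdomain, the name and a one- or two-part extension. `HOST_RE` and `split_host` are hypothetical names, and the extension list is a small hard-coded assumption, not a complete list.

```python
import re

# Hypothetical, hard-coded extension list. A bare "uk" or "au" is deliberately
# left out: with it, "example.co.uk" could also be read as subdomain "example",
# name "co", extension "uk", and the regex would pick that reading first.
HOST_RE = re.compile(
    r"^(?:(?P<subdomain>[\w-]+(?:\.[\w-]+)*)\.)?"  # optional prefix such as "www" or "a.b"
    r"(?P<name>[\w-]+)\."                           # the domain name itself
    r"(?P<extension>com\.au|co\.uk|com|org|net)$",  # one- or two-part extension
    re.IGNORECASE,
)

def split_host(host):
    """Split a bare host name into (subdomain, name, extension), or None if unmatched."""
    m = HOST_RE.match(host)
    return (m.group("subdomain"), m.group("name"), m.group("extension")) if m else None

for host in ["example.com", "www.example.com", "blog.example.co.uk", "shop.example.com.au"]:
    print(host, "->", split_host(host))
```

The optional `(?:...\.)?` group is what lets the subdomain be absent, while the anchored `$` forces the engine to backtrack until the extension alternation fits the end of the host.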

The hard part is knowing whether the registered name sits at the second level (example.com), the third level (example.co.uk), or deeper.
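A common way to make that decision, swapped in here because hard-coding every extension into one regex quickly becomes unwieldy, is a longest-suffix lookup: compare the host's trailing labels against a list of known extensions and keep one label in front of the longest match. This is a minimal sketch; `KNOWN_SUFFIXES` and `registered_domain` are hypothetical names, and the suffix set is deliberately tiny.

```python
# Hypothetical, deliberately tiny suffix set. Real code would load the full
# Public Suffix List instead of hard-coding a few entries.
KNOWN_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "au", "com.au"}

def registered_domain(host):
    """Return the label just left of the longest known suffix, joined with that suffix.

    Returns None if the host is itself a suffix or ends in no known suffix.
    """
    labels = host.lower().rstrip(".").split(".")
    # Check candidate suffixes from longest to shortest, so "co.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            # i == 0 means the whole host is a suffix, so there is no name in front of it.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None

for host in ["www.example.com", "blog.example.co.uk", "shop.example.com.au", "co.uk"]:
    print(host, "->", registered_domain(host))
```

Real code would typically load the full Public Suffix List, which also contains wildcard and exception rules that this sketch ignores.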

What is a Regular Expression and which module is used in Python?

A regular expression is a sequence of characters, written in a specialized syntax, that defines a search pattern. Regular expressions are mainly used to find and replace patterns in a string or file.
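A small illustration of the find and replace operations mentioned above, using the `re` module on a made-up string of links.

```python
import re

text = "Links: http://example.com and https://example.org/docs"
url_pattern = r"https?://\S+"

# Find: re.search returns the first match (a Match object) or None.
first = re.search(url_pattern, text)
print(first.group())  # http://example.com

# Find all: re.findall returns every non-overlapping match as a list of strings.
every = re.findall(url_pattern, text)
print(every)  # ['http://example.com', 'https://example.org/docs']

# Replace: re.sub rewrites each match; here every "http://" prefix becomes "https://".
secured = re.sub(r"http://", "https://", text)
print(secured)  # Links: https://example.com and https://example.org/docs
```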

The Python module re provides full support for Perl-like regular expressions. It raises the exception re.error if an error occurs while compiling or using a regular expression.
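A minimal sketch of catching `re.error` when a pattern fails to compile; `try_compile` is a hypothetical helper. The `msg` and `pos` attributes of `re.error` are available in Python 3.5 and later.

```python
import re

def try_compile(pattern):
    """Compile pattern, or report why it is invalid and return None."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # exc.msg is the bare error message; exc.pos is the index in the pattern
        # where the problem was detected (it may be None).
        print(f"Invalid pattern {pattern!r}: {exc.msg} (position {exc.pos})")
        return None

try_compile(r"(www\.)?example\.com")  # compiles fine
try_compile(r"(www\.?example\.com")   # unbalanced parenthesis, so re.error is raised and caught
```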

Example:

# Python program to extract domain names from a list of website URLs using regular expressions.

# Import the module required for regular expressions

import re
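A minimal sketch of how such a program could continue after the import; it is not necessarily the author's code. The URL list, `DOMAIN_RE` and `extract_domain` are hypothetical, and "domain name" here simply means the host with any leading www. removed.

```python
import re

# Hypothetical input; in practice these would come from the scraped file.
urls = [
    "https://www.example.com/about",
    "http://blog.example.co.uk/post?id=7",
    "https://shop.example.com.au:8443/cart",
    "www.example.org",
]

# Skip an optional scheme and an optional "www.", then capture everything up
# to the first ":", "/", "?" or "#" (the host name without port or path).
DOMAIN_RE = re.compile(r"^(?:[a-z][a-z0-9+.-]*://)?(?:www\.)?([^:/?#]+)", re.IGNORECASE)

def extract_domain(url):
    """Return the host of url with any leading "www." removed, or None if there is none."""
    match = DOMAIN_RE.match(url.strip())
    return match.group(1).lower() if match else None

domains = [extract_domain(u) for u in urls]
print(domains)
# ['example.com', 'blog.example.co.uk', 'shop.example.com.au', 'example.org']
```

This keeps subdomains such as blog.; reducing blog.example.co.uk to example.co.uk still needs the second-level versus third-level decision discussed earlier.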

Written by Ryan Arjun

BI Specialist || Azure || AWS || GCP — SQL|Python|PySpark — Talend, Alteryx, SSIS — PowerBI, Tableau, SSRS