Python — Extracting Email addresses and domain names from strings

Ryan Arjun
2 min read · Feb 27, 2020

As Python developers, we often have to handle tasks such as cleansing the data in a file before running other business operations.

For example, you may have a raw text file and need to pull out specific data, such as email addresses and domain names, by performing regular expression matching.

What is a Regular Expression and which module is used in Python?

A regular expression is a sequence of characters, some of them with special meaning, that defines a search pattern in a specialized syntax. It is mainly used to find and replace patterns in a string or file.
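To make "find and replace" concrete, here is a minimal sketch using the re module. The sample sentence and the date pattern are made up for illustration and are not part of the original example.

```python
import re

# Hypothetical sample text, used only for this illustration
sample = "Order 66 shipped on 2020-02-27; order 67 ships on 2020-03-01."

# Find: collect every substring shaped like YYYY-MM-DD
dates = re.findall(r"\d{4}-\d{2}-\d{2}", sample)

# Replace: mask each of those dates with a placeholder
masked = re.sub(r"\d{4}-\d{2}-\d{2}", "<date>", sample)

print(dates)
print(masked)
```

The same pattern drives both calls: re.findall returns the matching substrings, while re.sub returns a new string with each match replaced.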

The Python module re provides full support for Perl-like regular expressions. It raises the exception re.error if an error occurs while compiling or using a regular expression.
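As a small sketch of that behaviour, the snippet below tries to compile a valid and an invalid pattern and catches re.error for the invalid one. The helper name compile_or_none is hypothetical, chosen only for this example.

```python
import re

def compile_or_none(pattern):
    """Return a compiled pattern, or None if the pattern is invalid.

    Hypothetical helper for illustration; not part of the re module.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error carries a message describing what is wrong with the pattern
        print(f"Invalid pattern {pattern!r}: {exc}")
        return None

good = compile_or_none(r"\w+@\w+")   # valid pattern
bad = compile_or_none(r"(\w+@\w+")   # unbalanced parenthesis, so compilation fails
```

Catching re.error this way is mostly useful when the pattern comes from user input or a config file rather than being hard-coded.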

pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.

Example: a Python program to extract emails and domain names from a string using regular expressions.

# Importing the modules required for regular expressions and data handling
import re
import pandas as pd

# Example string
txt = "Ryan has sent an invoice email to john.d@yahoo.com by using his email id ryan.arjun@gmail.com and he also shared a copy to his boss rosy.gray@amazon.co.uk on the cc part."

# \w matches any word character (letters, digits and underscore)
# @ for as in…
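The original listing is cut off at this point. As a rough sketch of how the extraction could proceed, and not the author's actual code, the snippet below uses an assumed pattern (EMAIL_PATTERN is a name introduced here) to pull out the email addresses and then derives each domain from the part after the @ sign. It uses only the re module so that it runs on its own.

```python
import re

# Same example string as above, repeated so this sketch is self-contained
txt = ("Ryan has sent an invoice email to john.d@yahoo.com by using his email id "
       "ryan.arjun@gmail.com and he also shared a copy to his boss "
       "rosy.gray@amazon.co.uk on the cc part.")

# Assumed pattern (not from the original article):
#   [\w.+-]+        local part: word characters, dots, plus signs, hyphens
#   @               the literal at sign
#   [\w-]+          first domain label
#   (?:\.[\w-]+)+   one or more further labels, each preceded by a dot
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# findall returns the full matches because the only group is non-capturing
emails = EMAIL_PATTERN.findall(txt)

# The domain is everything after the first "@"
domains = [email.split("@", 1)[1] for email in emails]

print(emails)   # ['john.d@yahoo.com', 'ryan.arjun@gmail.com', 'rosy.gray@amazon.co.uk']
print(domains)  # ['yahoo.com', 'gmail.com', 'amazon.co.uk']
```

This pattern is deliberately loose and works for the sample sentence. Real-world data may need a stricter pattern, for instance to handle an address that sits directly before a sentence-ending period.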

