LinkedIn Profile Scraper in Python

LinkedIn Profile Scraping using Selenium and BeautifulSoup

Extracting profile details from LinkedIn is a valuable task for marketing, lead generation, talent acquisition, and market research. In this blog, we build a Python scraping script. It logs into our LinkedIn account, navigates to a target profile, and extracts structured fields such as name, location, job history, and education.

To build this tool, we use Selenium WebDriver for browser automation (including handling login and scrolling to trigger lazy-loaded sections) and BeautifulSoup to parse and structure the HTML content.

Prerequisites and Setup

First, install the required packages:

BASH

pip install selenium beautifulsoup4 lxml


Next, download the browser driver that matches our installed browser version. For Google Chrome, retrieve the appropriate binary from the ChromeDriver Downloads page and save it in a driver folder inside the project root, since the script loads it from driver/chromedriver.exe.

Additionally, create a config.txt file in the project root containing our LinkedIn credentials:

PLAINTEXT

your_linkedin_email@example.com
your_secure_password
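Reading this file line by line leaves a trailing newline on each value, and a newline typed into a login field can submit the form early. The sketch below is one way to read the file defensively. load_credentials is a hypothetical helper name, not part of the original script, and it assumes the email is on the first non-blank line and the password on the second.

```python
from pathlib import Path


def load_credentials(path="config.txt"):
    """Return (email, password) from a two-line credentials file."""
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops line endings; strip() removes stray spaces.
    lines = [line.strip() for line in text.splitlines()]
    # Ignore blank lines so a trailing empty line does not matter.
    lines = [line for line in lines if line]
    if len(lines) < 2:
        raise ValueError(f"{path} must contain an email line and a password line")
    return lines[0], lines[1]
```

Note that strip() would also remove leading or trailing spaces that are genuinely part of a password, so this sketch only suits credentials without them.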


This project builds directly on top of the authentication and connection workflows from the LinkedIn Auto Connect Bot tutorial.

The following imports bring in the necessary libraries.

PYTHON

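import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

The code below opens the Chrome driver, navigates to the LinkedIn login page, reads credentials from config.txt, and performs the automated login. find_element(By.ID, ...) returns the first element matching the given id (the older find_element_by_id() helper was removed in Selenium 4.3). send_keys() types text into a field. submit() submits the form.

PYTHON

# Path to the ChromeDriver binary, relative to the project root
browser = webdriver.Chrome(service=Service('driver/chromedriver.exe'))
browser.get('https://www.linkedin.com/uas/login')

# First line of config.txt is the email, second line is the password
with open('config.txt') as file:
    lines = file.readlines()
# strip() drops the trailing newline, which would otherwise be typed into the field
username = lines[0].strip()
password = lines[1].strip()

elementID = browser.find_element(By.ID, 'username')
elementID.send_keys(username)

elementID = browser.find_element(By.ID, 'password')
elementID.send_keys(password)

elementID.submit()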

link holds the URL of the profile to scrape. We can target any public profile, or loop over a list of links to scrape multiple profiles.

PYTHON

link = 'https://www.linkedin.com/in/rishabh-singh-61b706114/'
browser.get(link)


The full profile is not loaded immediately. Only the visible portion loads on first render, so the script must scroll to the bottom to trigger all lazy-loaded sections. The code below scrolls to the end of the page up to three times, stopping early once the page height no longer grows.

PYTHON

SCROLL_PAUSE_TIME = 5

# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")

for i in range(3):
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
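The same loop can be wrapped in a function so that the pause and the maximum number of scrolls become parameters. This is only a sketch: scroll_to_bottom and its parameter names are hypothetical, and it assumes nothing about the browser object beyond an execute_script() method.

```python
import time


def scroll_to_bottom(browser, pause=5.0, max_rounds=3):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    rounds = 0
    for _ in range(max_rounds):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazy-loaded sections time to render
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds
```

Calling scroll_to_bottom(browser) reproduces the behaviour of the loop above. A long profile may need a larger max_rounds.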

With the full page loaded, retrieve the page source and parse it into a BeautifulSoup object using the lxml parser.

PYTHON

src = browser.page_source
soup = BeautifulSoup(src, 'lxml')

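To extract anything from the webpage, first find the element in the browser's developer tools by right-clicking it and selecting 'Inspect'. This shows the tag, id, and class names needed to locate it in the parsed HTML.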

The block containing basic information is represented by a div tag with class flex-1 mr5.

PYTHON

name_div = soup.find('div', {'class': 'flex-1 mr5'})
name_div

OUTPUT

Rishabh Singh


3rd degree connection3rd



  Rishabh has a  account



            #futureshaper


              Bengaluru, Karnataka, India


                  500+ connections



                  Contact info

Inside name_div there are two ul tags. The first ul holds the name, and the second holds the location and connection count.

Fetch both ul tags with name_div.find_all('ul'). Find the li in the first ul using name_loc[0].find('li') and extract its text with get_text().

PYTHON

name_loc = name_div.find_all('ul')
name = name_loc[0].find('li').get_text().strip()
name

OUTPUT

'Rishabh Singh'

For the location, find the li in the second ul.

PYTHON

loc = name_loc[1].find('li').get_text().strip()
loc

OUTPUT

'Bengaluru, Karnataka, India'

The profile title is in the h2 tag, extracted via name_div.find('h2').get_text().

PYTHON

profile_title = name_div.find('h2').get_text().strip()
profile_title

OUTPUT

'#futureshaper'

The connection count is in the second li of the second ul. Find all li tags in the second ul with name_loc[1].find_all('li'), then get the text from the second one.

PYTHON

connection = name_loc[1].find_all('li')
connection = connection[1].get_text().strip()
connection

OUTPUT

'500+ connections'

Append all collected fields to info.

PYTHON

info = []
info.append(link)
info.append(name)
info.append(profile_title)
info.append(loc)
info.append(connection)
info

OUTPUT

['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']
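Each lookup above raises an exception if the expected tag is missing, for example when a profile has no headline. A guarded version of the same steps might look like the sketch below. parse_header and text_or_none are hypothetical names, the sample HTML is hand-written to mimic the structure described above rather than copied from LinkedIn, and html.parser is used here only so the example runs without lxml.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text().strip() if tag is not None else None


def parse_header(soup):
    """Extract name, title, location and connection count from the header block."""
    empty = {"name": None, "title": None, "location": None, "connections": None}
    name_div = soup.find("div", {"class": "flex-1 mr5"})
    if name_div is None:
        return empty
    uls = name_div.find_all("ul")
    first_items = uls[0].find_all("li") if len(uls) > 0 else []
    second_items = uls[1].find_all("li") if len(uls) > 1 else []
    return {
        "name": text_or_none(first_items[0]) if first_items else None,
        "title": text_or_none(name_div.find("h2")),
        "location": text_or_none(second_items[0]) if len(second_items) > 0 else None,
        "connections": text_or_none(second_items[1]) if len(second_items) > 1 else None,
    }


# Hand-written sample (an assumption, not real LinkedIn markup).
sample = """
<div class="flex-1 mr5">
  <ul><li>Rishabh Singh</li></ul>
  <h2>#futureshaper</h2>
  <ul><li>Bengaluru, Karnataka, India</li><li>500+ connections</li></ul>
</div>
"""
header = parse_header(BeautifulSoup(sample, "html.parser"))
```

Returning None for a missing field keeps a multi-profile run going instead of stopping at the first incomplete profile.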

Experience

The experience section is accessible via a section tag with id experience-section.

PYTHON

exp_section = soup.find('section', {'id': 'experience-section'})
exp_section

OUTPUT

Experience

FPGA Engineer
Company Name

      Honeywell


Dates Employed
Aug 2019 – Present

Employment Duration
1 yr 2 mos

Location
Bengaluru Area, India

FPGA Design Engineer
Company Name

      L&T Technology Services Limited
        Full-time

Dates Employed
Jan 2017 – Jul 2019

Employment Duration
2 yrs 7 mos

Location
Bengaluru Area, India

From exp_section, get the first ul, then the first div inside it, and then the first a tag inside that div.

PYTHON

exp_section = exp_section.find('ul')
div_tag = exp_section.find('div')
a_tag = div_tag.find('a')
a_tag

OUTPUT

FPGA Engineer
Company Name

      Honeywell


Dates Employed
Aug 2019 – Present

Employment Duration
1 yr 2 mos

Location
Bengaluru Area, India

Extract the job title from the h3 tag.

PYTHON

job_title = a_tag.find('h3').get_text().strip()
job_title

OUTPUT

'FPGA Engineer'

The company name is in the second p tag, accessed via a_tag.find_all('p')[1].get_text().

PYTHON

company_name = a_tag.find_all('p')[1].get_text().strip()
company_name

OUTPUT

'Honeywell'

For the employment dates (stored in joining_date), extract the first h4 tag using a_tag.find_all('h4')[0], then get the second span from it.

PYTHON

joining_date = a_tag.find_all('h4')[0].find_all('span')[1].get_text().strip()
joining_date

OUTPUT

'Aug 2019 – Present'

For the duration, extract the second h4 tag and its second span.

PYTHON

exp = a_tag.find_all('h4')[1].find_all('span')[1].get_text().strip()
exp

OUTPUT

'1 yr 2 mos'
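The four experience lookups share one pattern: take the n-th tag of a kind and read its text. A small helper can apply that pattern with a length check, so that an entry without a duration, for instance, yields None rather than an IndexError. The names nth_text and parse_experience_entry are hypothetical, and the sample markup below is a hand-written stand-in for the structure described above.

```python
from bs4 import BeautifulSoup


def nth_text(tags, index):
    """Return stripped text of tags[index], or None when the list is too short."""
    return tags[index].get_text().strip() if len(tags) > index else None


def parse_experience_entry(a_tag):
    """Pull title, company, dates and duration from one experience entry."""
    h4_tags = a_tag.find_all("h4")
    return {
        "job_title": nth_text(a_tag.find_all("h3"), 0),
        "company_name": nth_text(a_tag.find_all("p"), 1),
        "dates_employed": nth_text(h4_tags[0].find_all("span"), 1) if len(h4_tags) > 0 else None,
        "duration": nth_text(h4_tags[1].find_all("span"), 1) if len(h4_tags) > 1 else None,
    }


# Hand-written sample (an assumption, not real LinkedIn markup).
sample = """
<a>
  <h3>FPGA Engineer</h3>
  <p>Company Name</p>
  <p>Honeywell</p>
  <h4><span>Dates Employed</span><span>Aug 2019 – Present</span></h4>
  <h4><span>Employment Duration</span><span>1 yr 2 mos</span></h4>
</a>
"""
entry = parse_experience_entry(BeautifulSoup(sample, "html.parser").find("a"))
```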

Append the scraped experience data to info.

PYTHON

info

OUTPUT

['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']

PYTHON

info.append(company_name)
info.append(job_title)
info.append(joining_date)
info.append(exp)
info

OUTPUT

['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']

Education

The education section uses a section tag with id education-section. Retrieve the ul tag inside it to access all education entries.

PYTHON

edu_section = soup.find('section', {'id': 'education-section'}).find('ul')
edu_section

OUTPUT

Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021

Degree Name
Bachelor of Engineering (B.E.)

Field Of Study
Electrical, Electronics and Communications Engineering

Grade
FIRST

Dates attended or expected graduation

2012 – 2016




S.H.S.B.B

Field Of Study
PCM

The college name is directly in the h3 tag.

PYTHON

college_name = edu_section.find('h3').get_text().strip()
college_name

OUTPUT

'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021'

The degree name is in the second span of the p tag with class pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal.

PYTHON

degree_name = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
degree_name

OUTPUT

'Bachelor of Engineering (B.E.)'

The field of study is in the second span of the p tag with class pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal.

PYTHON

stream = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
stream

OUTPUT

'Electrical, Electronics and Communications Engineering'

The graduation years are in the second span of the p tag with class pv-entity__dates t-14 t-black--light t-normal.

PYTHON

degree_year = edu_section.find('p', {'class': 'pv-entity__dates t-14 t-black--light t-normal'}).find_all('span')[1].get_text().strip()
degree_year

OUTPUT

'2012 – 2016'
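Matching on the full class string is brittle, because the lookup fails as soon as one utility class such as t-14 changes or the class order shifts. One possible alternative is to match on the single descriptive class with a CSS selector. The sketch below assumes that classes like pv-entity__degree-name are the stable part of the markup, which the source does not confirm. labelled_value is a hypothetical helper, and the sample HTML is hand-written.

```python
from bs4 import BeautifulSoup


def labelled_value(section, css_class):
    """Return the text of the second <span> inside the first <p> carrying css_class."""
    p_tag = section.select_one(f"p.{css_class}")
    if p_tag is None:
        return None
    spans = p_tag.find_all("span")
    return spans[1].get_text().strip() if len(spans) > 1 else None


# Hand-written sample (an assumption, not real LinkedIn markup).
sample = """
<ul>
  <li>
    <h3>Example College</h3>
    <p class="pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal">
      <span>Degree Name</span><span>Bachelor of Engineering (B.E.)</span>
    </p>
    <p class="pv-entity__dates t-14 t-black--light t-normal">
      <span>Dates attended or expected graduation</span><span>2012 – 2016</span>
    </p>
  </li>
</ul>
"""
edu = BeautifulSoup(sample, "html.parser")
degree = labelled_value(edu, "pv-entity__degree-name")
years = labelled_value(edu, "pv-entity__dates")
missing_field = labelled_value(edu, "pv-entity__fos")  # not present in the sample
```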

Append the education fields to info.

PYTHON

info

OUTPUT

['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']

PYTHON

info.append(college_name)
info.append(degree_name)
info.append(stream)
info.append(degree_year)
info

OUTPUT

['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos', 'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021', 'Bachelor of Engineering (B.E.)', 'Electrical, Electronics and Communications Engineering', '2012 – 2016']

All target data points from the LinkedIn profile have been scraped. This modular script can be extended inside a loop to scrape multiple profiles sequentially.
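A loop over several profiles could be sketched as follows. scrape_many is a hypothetical helper, and scrape_one stands for a function we would write ourselves that wraps the steps above (load the link, scroll, parse, return info). The random pause between profiles and its default range are assumptions, not values from the original script.

```python
import random
import time


def scrape_many(links, scrape_one, pause_range=(2.0, 6.0)):
    """Call scrape_one(link) for each link, pausing a random interval between calls.

    Returns (results, failures), where failures holds (link, error message) pairs.
    """
    links = list(links)
    results, failures = [], []
    for index, link in enumerate(links):
        try:
            results.append(scrape_one(link))
        except Exception as error:  # keep going if one profile fails
            failures.append((link, str(error)))
        if index < len(links) - 1:
            time.sleep(random.uniform(*pause_range))
    return results, failures
```

Collecting failures separately makes it easy to retry only the profiles that did not parse.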

Warning

Web page structures, dynamic class names, and element IDs on LinkedIn change frequently. If the script fails to locate elements, inspect the target page and update the tag, id, and class selectors in the script accordingly.

Conclusion

In this blog, we built a LinkedIn profile scraper using Selenium and BeautifulSoup to extract professional histories, education, locations, and connection stats.

Key takeaways:

Using an official API is preferable where one is available, but browser automation via Selenium is effective for fetching dynamic, lazy-loaded content.
Simulating bottom-scrolling behavior ensures that all delayed elements (like historical job details and academic timelines) load properly in the DOM.
BeautifulSoup locates data in the parsed page through tag names, id and class attributes, and element nesting.
Saving scraped information into lists simplifies downstream processing, such as database insertion or CSV conversion.
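As an illustration of the CSV conversion mentioned in the last takeaway, the sketch below writes one row per profile with the standard csv module. The column names are hypothetical labels chosen to match the order in which the script appends fields to info, and write_profiles is likewise a made-up name.

```python
import csv

# Hypothetical column names, in the order the fields were appended to info.
COLUMNS = [
    "link", "name", "profile_title", "location", "connections",
    "company_name", "job_title", "dates_employed", "duration",
    "college_name", "degree_name", "field_of_study", "degree_year",
]


def write_profiles(rows, file_obj):
    """Write one CSV row per profile; each row is a list ordered like COLUMNS."""
    writer = csv.writer(file_obj)
    writer.writerow(COLUMNS)
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        writer.writerow(row)
```

To save a single profile, open a file with open('profiles.csv', 'w', newline='', encoding='utf-8') and pass it along with [info] to write_profiles.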

Next steps:

Read the LinkedIn Auto Connect Bot tutorial to learn how to automate outreach workflows using these scraped profile targets.
Extend this script to export the scraped profiles directly into a local .csv file or a relational database for further data analysis.
Implement explicit wait structures (WebDriverWait) in Selenium to replace static time.sleep() calls, improving execution speed and reliability.
