#aarya#beautifulsoup#kgptalkie#LinkedIn profile scrapper#python#selenium#web scrapping

LinkedIn Profile Scraper in Python

Scrape public LinkedIn profile data using Selenium and BeautifulSoup in Python. Covers automated login, profile extraction, and collecting the results in a structured form.

May 21, 2026 at 1:30 PM, 9 min read

Topics You Will Master

Automating LinkedIn login with Selenium WebDriver
Extracting profile fields: name, headline, location, connections, experience, and education
Parsing HTML with BeautifulSoup for structured data extraction
Collecting scraped profile data in a structured form ready for CSV or JSON export
Best For

Python developers building data collection tools for LinkedIn profile analysis.

Expected Outcome

A Python scraper that logs into LinkedIn, visits profiles, and extracts structured profile data.

LinkedIn Profile Scraping using Selenium and BeautifulSoup

Extracting profile details from LinkedIn is a valuable task for marketing, lead generation, talent acquisition, and market research. In this project, you will build a Python scraping script that programmatically logs into your LinkedIn account, navigates to a target profile, and extracts structured fields such as name, location, job history, and education.

To build this tool, you will use Selenium WebDriver for browser automation (including handling login and scrolling to trigger lazy-loaded sections) and BeautifulSoup to parse and structure the HTML content.

Prerequisites and Setup

First, install the required packages:

BASH
pip install selenium beautifulsoup4 lxml

Refer to the official Selenium, Beautiful Soup, and lxml documentation for details on each package.

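Next, download the browser driver matching your browser. For Google Chrome, retrieve the ChromeDriver binary that matches your installed Chrome version from the ChromeDriver Downloads page and save it in a driver folder inside your project root (the code below expects driver/chromedriver.exe). Selenium 4.6 and later also ship with Selenium Manager, which can locate or download a matching driver automatically when you call webdriver.Chrome() without a driver path.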

Additionally, create a config.txt file in your project root containing your LinkedIn credentials:

PLAINTEXT
your_linkedin_email@example.com
your_secure_password
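Because readlines() keeps the newline at the end of each line, it is worth stripping the credentials as you read them; a trailing newline sent through send_keys() can act as an Enter key press. Below is a minimal sketch of such a reader, assuming exactly the two-line layout above. read_credentials is a hypothetical helper name, not part of Selenium.

```python
from pathlib import Path


def read_credentials(path="config.txt"):
    """Return (email, password) read from a two-line credentials file.

    Assumes the layout above: email on line 1, password on line 2.
    splitlines() drops the newline characters, and strip() removes any
    stray surrounding spaces, so nothing extra is typed into the login form.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < 2:
        raise ValueError(f"{path} must contain an email line and a password line")
    return lines[0].strip(), lines[1].strip()
```

For example, username, password = read_credentials() returns the two values ready to pass to send_keys().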

This project builds directly on top of the authentication and connection workflows from the LinkedIn Auto Connect Bot tutorial.

Here we have imported the necessary libraries.

PYTHON
import time  # used for pauses while scrolling
from bs4 import BeautifulSoup
from selenium import webdriver

Here we launch Google Chrome through the ChromeDriver binary we downloaded. In Selenium 4 the driver path is passed through a Service object: browser = webdriver.Chrome(service=Service('driver/chromedriver.exe')). Then we open the LinkedIn login page using browser.get() and read the username and password from the config.txt file we created. We strip the trailing newline from each line, because send_keys() would otherwise send it as an Enter key press, which can submit the form before the password is entered.

Now we have to automate the login process. For that, we need the id attributes of the textboxes that accept the username and password. Right-click anywhere on the login page and click Inspect: the id of the username textbox is username and the id of the password textbox is password.

find_element(By.ID, ...) returns the first element whose id attribute matches the given value (the older find_element_by_id() helper was removed in Selenium 4.3). The send_keys() method simulates typing into an element such as a form input field; it adds to any text already in the field rather than replacing it. The submit() method submits the form that contains the element once the fields are filled in.

PYTHON
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Launch Chrome using the downloaded ChromeDriver binary (Selenium 4 syntax)
browser = webdriver.Chrome(service=Service('driver/chromedriver.exe'))
browser.get('https://www.linkedin.com/uas/login')

# config.txt holds the email on line 1 and the password on line 2;
# splitlines() drops the newline characters
with open('config.txt') as file:
    lines = file.read().splitlines()
username = lines[0].strip()
password = lines[1].strip()

# Fill in the login form and submit it
elementID = browser.find_element(By.ID, 'username')
elementID.send_keys(username)

elementID = browser.find_element(By.ID, 'password')
elementID.send_keys(password)

elementID.submit()

The variable link holds the URL of the profile we want to scrape. You can scrape any profile of your choice, or even scrape multiple profiles by looping over a list of links.

PYTHON
link = 'https://www.linkedin.com/in/rishabh-singh-61b706114/'
browser.get(link)

LinkedIn does not load the whole profile at once: only the visible part is loaded at first, and further sections load as you scroll. So we scroll to the bottom of the page, wait for new content to load, and repeat until the page height stops changing, making at most three attempts.

PYTHON
SCROLL_PAUSE_TIME = 5

# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")

for i in range(3):
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
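The loop above makes at most three scroll attempts, which may not be enough for a long profile. The same logic can be wrapped in a reusable function with a configurable limit. This is a sketch assuming browser is the WebDriver created earlier; scroll_to_bottom, pause, and max_scrolls are hypothetical names. It only calls execute_script(), so any object providing that method will work.

```python
import time


def scroll_to_bottom(browser, pause=5.0, max_scrolls=10):
    """Scroll until the page height stops growing or max_scrolls is reached.

    `browser` is any object with an execute_script() method, such as a
    Selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    for scrolls in range(1, max_scrolls + 1):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded sections time to render
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return scrolls  # nothing new loaded, so we reached the end
        last_height = new_height
    return max_scrolls
```

Capping max_scrolls prevents an endless loop on pages that keep growing, and the returned count makes it easy to log how much lazy loading a profile needed.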

Now that the full page is loaded, we can get the page source. We parse it with the lxml parser and store the result in the BeautifulSoup object soup.

PYTHON
src = browser.page_source
soup = BeautifulSoup(src, 'lxml')

To extract anything from the webpage we will have to inspect the webpage. We can do this by right-clicking anywhere on the webpage and then clicking on 'inspect'.

The block containing the basic information is a div tag whose class attribute is flex-1 mr5. Passing the full string 'flex-1 mr5' to find() matches the class attribute only when it has exactly that value.

PYTHON
name_div = soup.find('div', {'class': 'flex-1 mr5'})
name_div
OUTPUT
Rishabh Singh


3rd degree connection3rd



  Rishabh has a  account



            #futureshaper


              Bengaluru, Karnataka, India


                  500+ connections



                  Contact info

We will first get the name. As you can see in name_div, there are two ul tags: the first contains the name, and the second contains the location and the number of connections.

We first get both ul tags using name_div.find_all('ul'). Then we find the li in the first ul using name_loc[0].find('li') and get the text enclosed in it using get_text().

PYTHON
name_loc = name_div.find_all('ul')
name = name_loc[0].find('li').get_text().strip()
name
OUTPUT
'Rishabh Singh'

Similarly, for the location we find the li in the second ul tag.

PYTHON
loc = name_loc[1].find('li').get_text().strip()
loc
OUTPUT
'Bengaluru, Karnataka, India'

The profile title is enclosed in the h2 tag. So we can extract it using name_div.find('h2').get_text().

PYTHON
profile_title = name_div.find('h2').get_text().strip()
profile_title
OUTPUT
'#futureshaper'

The number of connections is in the second li of the second ul. So we first find all the li tags in the second ul using name_loc[1].find_all('li'), then get the text of the second li using connection[1].get_text().

PYTHON
connection = name_loc[1].find_all('li')
connection = connection[1].get_text().strip()
connection
OUTPUT
'500+ connections'

We will append everything we have scraped so far to the list info.

PYTHON
info = []
info.append(link)
info.append(name)
info.append(profile_title)
info.append(loc)
info.append(connection)
info
OUTPUT
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']
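Indexing into find_all() results, as above, raises an IndexError or AttributeError as soon as LinkedIn changes its markup. A more defensive sketch, assuming the same intro-card structure inspected above, gathers the same four fields into a dictionary keyed by field name and returns None for anything it cannot find. extract_intro and text_or_none are hypothetical helper names. select_one('div.flex-1.mr5') is a CSS selector that matches both classes in any order, whereas find() with the string 'flex-1 mr5' matches only that exact class attribute value.

```python
def text_or_none(tag):
    """Return a tag's text with whitespace collapsed, or None if it is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else None


def extract_intro(soup):
    """Collect the intro-card fields into a dict keyed by field name.

    Assumes the markup inspected above: a div with classes flex-1 and mr5,
    a first ul holding the name, an h2 headline, and a second ul holding
    the location and the number of connections.
    """
    card = soup.select_one("div.flex-1.mr5")
    if card is None:
        return {}
    lists = card.find_all("ul")
    name_items = lists[0].find_all("li") if lists else []
    detail_items = lists[1].find_all("li") if len(lists) > 1 else []
    return {
        "name": text_or_none(name_items[0]) if name_items else None,
        "headline": text_or_none(card.find("h2")),
        "location": text_or_none(detail_items[0]) if detail_items else None,
        "connections": text_or_none(detail_items[1]) if len(detail_items) > 1 else None,
    }
```

With the soup object from earlier, intro = extract_intro(soup) returns the fields under readable keys such as intro['name'] instead of list positions.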

Experience

Now we will scrape the experience section of the profile. We can access it through the section tag with the id experience-section.

PYTHON
exp_section = soup.find('section', {'id': 'experience-section'})
exp_section
OUTPUT
Experience

FPGA Engineer
Company Name

      Honeywell


Dates Employed
Aug 2019 – Present

Employment Duration
1 yr 2 mos

Location
Bengaluru Area, India

FPGA Design Engineer
Company Name

      L&T Technology Services Limited
        Full-time

Dates Employed
Jan 2017 – Jul 2019

Employment Duration
2 yrs 7 mos

Location
Bengaluru Area, India

From exp_section we get the first ul tag, then the first div inside that ul, and then the first a tag inside that div. This a tag holds the details of the first (most recent) position.

PYTHON
exp_section = exp_section.find('ul')
div_tag = exp_section.find('div')
a_tag = div_tag.find('a')
a_tag
OUTPUT
FPGA Engineer
Company Name

      Honeywell


Dates Employed
Aug 2019 – Present

Employment Duration
1 yr 2 mos

Location
Bengaluru Area, India

We can extract the job title from the h3 tag.

PYTHON
job_title = a_tag.find('h3').get_text().strip()
job_title
OUTPUT
'FPGA Engineer'

The company name is enclosed in the second p tag (the first p holds the Company Name label seen in the output), so we can get it with a_tag.find_all('p')[1].get_text().

PYTHON
company_name = a_tag.find_all('p')[1].get_text().strip()
company_name
OUTPUT
'Honeywell'

For the joining date, we extract the first h4 tag using a_tag.find_all('h4')[0] and then take its second span using find_all('span')[1]. The first span holds the Dates Employed label.

PYTHON
joining_date = a_tag.find_all('h4')[0].find_all('span')[1].get_text().strip()
joining_date
OUTPUT
'Aug 2019 – Present'

For the duration, we extract the second h4 tag using a_tag.find_all('h4')[1] and again take its second span using find_all('span')[1], since the first span holds the Employment Duration label.

PYTHON
exp = a_tag.find_all('h4')[1].find_all('span')[1].get_text().strip()
exp
OUTPUT
'1 yr 2 mos'
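Durations such as '1 yr 2 mos' are easier to sort or total (for example, to compute overall years of experience across positions) once converted to a number. A small sketch with regular expressions, assuming only the yr/yrs and mo/mos abbreviations seen in the outputs above; duration_to_months is a hypothetical helper.

```python
import re


def duration_to_months(text):
    """Convert a LinkedIn-style duration such as '1 yr 2 mos' into months.

    Recognises 'yr'/'yrs' and 'mo'/'mos' after a number; returns 0 if
    neither part is found.
    """
    years = re.search(r"(\d+)\s*yrs?\b", text)
    months = re.search(r"(\d+)\s*mos?\b", text)
    total = 0
    if years:
        total += 12 * int(years.group(1))
    if months:
        total += int(months.group(1))
    return total
```

For instance, duration_to_months(exp) turns '1 yr 2 mos' into 14, and '2 yrs 7 mos' becomes 31.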

We will append all the scraped experience data to info.

PYTHON
info.append(company_name)
info.append(job_title)
info.append(joining_date)
info.append(exp)
info
OUTPUT
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']
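The steps above capture only the first position, but the output of exp_section shows two (Honeywell and L&T Technology Services Limited). A sketch that loops over every entry, assuming each position is an li directly under the section's ul with the same inner structure as the first one. clean, second_span, and extract_experience are hypothetical helper names, and positions grouped under a single company may use different markup that this sketch does not handle.

```python
def clean(tag):
    """Collapse whitespace in a tag's text; None if the tag is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else None


def second_span(tag):
    """Text of the second span in a tag (the first span holds the label)."""
    spans = tag.find_all("span") if tag is not None else []
    return clean(spans[1]) if len(spans) > 1 else None


def extract_experience(soup):
    """Return one dict per position listed in the experience section.

    Assumes the markup inspected above: section#experience-section > ul > li,
    each li holding an a tag with an h3 title, the company in the second p,
    and two h4 tags (dates employed, duration).
    """
    section = soup.find("section", {"id": "experience-section"})
    if section is None or section.find("ul") is None:
        return []
    jobs = []
    for item in section.find("ul").find_all("li", recursive=False):
        a_tag = item.find("a")
        if a_tag is None or a_tag.find("h3") is None:
            continue  # skip entries that do not match the assumed layout
        paragraphs = a_tag.find_all("p")
        h4_tags = a_tag.find_all("h4")
        jobs.append({
            "job_title": clean(a_tag.find("h3")),
            "company": clean(paragraphs[1]) if len(paragraphs) > 1 else None,
            "dates": second_span(h4_tags[0]) if h4_tags else None,
            "duration": second_span(h4_tags[1]) if len(h4_tags) > 1 else None,
        })
    return jobs
```

For the profile above, extract_experience(soup) would be expected to return one dictionary per position, with the keys job_title, company, dates, and duration.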

Education

Now we will move to the education section. We can extract it using the section tag having id as education-section. Then we will get the ul tag which contains all the information.

PYTHON
edu_section = soup.find('section', {'id': 'education-section'}).find('ul')
edu_section
OUTPUT
Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021

Degree Name
Bachelor of Engineering (B.E.)

Field Of Study
Electrical, Electronics and Communications Engineering

Grade
FIRST

Dates attended or expected graduation

2012 – 2016





S.H.S.B.B

Field Of Study
PCM

We can get the name of the college directly using the h3 tag.

PYTHON
college_name = edu_section.find('h3').get_text().strip()
college_name
OUTPUT
'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021'

We will get the name of the degree from the second span of the p tag with class pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal.

PYTHON
degree_name = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
degree_name
OUTPUT
'Bachelor of Engineering (B.E.)'

We will get the stream from the second span of the p tag with class pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal.

PYTHON
stream = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
stream
OUTPUT
'Electrical, Electronics and Communications Engineering'

We will get the years of degree from the second span of the p tag with class pv-entity__dates t-14 t-black--light t-normal.

PYTHON
degree_year = edu_section.find('p', {'class': 'pv-entity__dates t-14 t-black--light t-normal'}).find_all('span')[1].get_text().strip()
degree_year
OUTPUT
'2012 – 2016'

We will append everything we have scraped from the education section to info.

PYTHON
info.append(college_name)
info.append(degree_name)
info.append(stream)
info.append(degree_year)
info
OUTPUT
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos', 'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021', 'Bachelor of Engineering (B.E.)', 'Electrical, Electronics and Communications Engineering', '2012 – 2016']
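The education output lists two entries (the engineering college and S.H.S.B.B), yet the code reads only the first one. Here is a sketch that loops over every entry, assuming each one is an li under the section's ul with the same markup as the first; extract_education and second_span_text are hypothetical helper names. Each p is matched on its one distinguishing class (pv-entity__degree-name, pv-entity__fos, pv-entity__dates) through a CSS selector, so the lookup keeps working if utility classes like t-14 change. Fields an entry lacks, such as the degree for S.H.S.B.B, come back as None.

```python
def second_span_text(entry, css_class):
    """Text of the second span inside the first p with css_class, or None.

    The first span holds the label (e.g. 'Degree Name'), so the value sits
    in the second one, as in the step-by-step code above.
    """
    paragraph = entry.select_one(f"p.{css_class}")
    spans = paragraph.find_all("span") if paragraph is not None else []
    return " ".join(spans[1].get_text().split()) if len(spans) > 1 else None


def extract_education(soup):
    """Return one dict per entry in the education section.

    Assumes each entry is an li directly under the section's ul, with the
    school name in an h3, as in the markup inspected above.
    """
    section = soup.find("section", {"id": "education-section"})
    entry_list = section.find("ul") if section is not None else None
    if entry_list is None:
        return []
    schools = []
    for entry in entry_list.find_all("li", recursive=False):
        school = entry.find("h3")
        schools.append({
            "college_name": " ".join(school.get_text().split()) if school is not None else None,
            "degree_name": second_span_text(entry, "pv-entity__degree-name"),
            "stream": second_span_text(entry, "pv-entity__fos"),
            "degree_year": second_span_text(entry, "pv-entity__dates"),
        })
    return schools
```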

You have now scraped all of the target data points from the LinkedIn profile. The same steps can be placed inside a loop to scrape multiple profiles one after another.
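Scraping several profiles amounts to repeating the visit, scroll, and parse steps for each URL. A minimal sketch of that loop under stated assumptions: scrape_profiles, parse_page, and prepare are hypothetical names, parse_page is expected to return a dict, and the 3 to 8 second random pause between visits is an arbitrary choice, not a documented LinkedIn limit.

```python
import random
import time


def scrape_profiles(browser, urls, parse_page, prepare=None, min_pause=3.0, max_pause=8.0):
    """Visit each profile URL in turn and turn its page source into a record.

    browser: any object with get(url) and a page_source attribute, such as
        the logged-in WebDriver created earlier.
    parse_page: function mapping page HTML to a dict, e.g. one that builds a
        BeautifulSoup object and runs the extraction steps above.
    prepare: optional function called with the browser before parsing, e.g.
        the scrolling loop, so lazy-loaded sections are present.
    A random pause between visits spaces the page loads out.
    """
    records = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(random.uniform(min_pause, max_pause))
        browser.get(url)
        if prepare is not None:
            prepare(browser)
        record = parse_page(browser.page_source)
        record["url"] = url  # remember which profile each record came from
        records.append(record)
    return records
```

Passing the parser and the scroll step in as functions keeps the loop independent of LinkedIn's markup, so only those functions need updating when selectors change.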

Warning

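Web page structures, dynamic class names, and element IDs on LinkedIn change frequently, and the ids and classes used in this tutorial (such as experience-section and pv-entity__degree-name) reflect the markup at the time the code was written. If the script fails to locate elements, inspect the target page and update the Selenium locators and BeautifulSoup selectors accordingly.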

Conclusion

In this tutorial, you built a LinkedIn profile scraper using Selenium and BeautifulSoup that extracts work experience, education, location, and connection stats.

Key takeaways:

  • Official APIs are preferable where they are available, but browser automation via Selenium is effective for fetching dynamic, lazy-loaded components.
  • Simulating bottom-scrolling behavior ensures that all delayed elements (like historical job details and academic timelines) load properly in the DOM.
  • BeautifulSoup parses structural data reliably using DOM attributes, class properties, and hierarchy nesting.
  • Collecting scraped information into structured containers like lists simplifies downstream processes (such as database insertion or CSV conversion).

Next steps:

  • Read the LinkedIn Auto Connect Bot tutorial to learn how to automate outreach workflows using these scraped profile targets.
  • Extend this script to export the scraped profiles directly into a local .csv file or a relational database for further data analysis.
  • Use explicit waits (WebDriverWait) in Selenium instead of static time.sleep() calls, improving execution speed and robustness.
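For the CSV export step, here is a minimal sketch using only Python's standard library, which also writes a JSON copy. It assumes each profile was collected as a dict (one record per profile rather than the positional info list), so that keys become column names; save_profiles and the default file names are hypothetical. If you keep the info list instead, a csv.writer with writerow(info) writes it as a single row.

```python
import csv
import json


def save_profiles(records, csv_path="profiles.csv", json_path="profiles.json"):
    """Write a list of profile dicts to a CSV file and a JSON file.

    The CSV header is the union of all keys in first-seen order, so records
    with missing fields still fit; missing cells are written as empty strings.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    with open(json_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters like the dash in '2012 – 2016' readable
        json.dump(records, f, ensure_ascii=False, indent=2)
```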
