LinkedIn Profile Scraping using Selenium and BeautifulSoup
Extracting profile details from LinkedIn is a valuable task for marketing, lead generation, talent acquisition, and market research. In this project, you will build a Python scraping script that programmatically logs into your LinkedIn account, navigates to a target profile, and extracts structured fields such as name, location, job history, and education.
To build this tool, you will use Selenium WebDriver for browser automation (including handling login and scrolling to trigger lazy-loaded sections) and BeautifulSoup to parse and structure the HTML content.
Prerequisites and Setup
First, install the required packages:
pip install selenium beautifulsoup4 lxml
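Next, download the ChromeDriver binary that matches your installed Chrome version from the ChromeDriver Downloads page and save it in a driver folder inside your project root. The code below loads it from driver/chromedriver.exe (on macOS or Linux, drop the .exe extension). Selenium 4.6 and later can also locate a matching driver automatically through Selenium Manager.
Additionally, create a config.txt file in your project root with your LinkedIn email on the first line and your password on the second: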
your_linkedin_email@example.com
your_secure_password
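Reading this file with readlines() and indexing lines[0] and lines[1] directly keeps each line's trailing newline, and a blank line at the top would shift both fields. The helper below is a minimal sketch, assuming the two-line layout above; read_credentials is a hypothetical name. It strips each line and fails loudly when a field is missing. Note that strip() also removes leading or trailing spaces, so it assumes your password does not start or end with a space.

```python
from pathlib import Path


def read_credentials(path="config.txt"):
    """Return (email, password) from a two-line credentials file.

    Assumes the email is on the first non-blank line and the password on the second.
    """
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() drops the newline characters; strip() removes stray spaces.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]  # skip blank lines
    if len(lines) < 2:
        raise ValueError(f"{path} must contain an email line and a password line")
    return lines[0], lines[1]
```

With this helper, the login code would call username, password = read_credentials() instead of indexing the raw lines.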
This project builds directly on top of the authentication and connection workflows from the LinkedIn Auto Connect Bot tutorial.
The following imports bring in the necessary libraries.
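import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
The code below opens the Chrome driver, navigates to the LinkedIn login page, reads credentials from config.txt, and performs the automated login. find_element(By.ID, ...) returns the first element matching the given id; it replaces the older find_element_by_id() helper, which recent Selenium 4 releases no longer provide. send_keys() types text into a field, and submit() submits the enclosing form. Each credential is stripped because readlines() keeps the trailing newline, and a newline typed into the username field acts like pressing Enter before the password is filled in.
browser = webdriver.Chrome(service=Service('driver/chromedriver.exe'))
browser.get('https://www.linkedin.com/uas/login')
# Read the email (line 1) and password (line 2), dropping trailing newlines
with open('config.txt') as file:
    lines = file.readlines()
username = lines[0].strip()
password = lines[1].strip()
elementID = browser.find_element(By.ID, 'username')
elementID.send_keys(username)
elementID = browser.find_element(By.ID, 'password')
elementID.send_keys(password)
elementID.submit()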
link holds the URL of the profile to scrape. You can target any public profile, or loop over a list of links to scrape multiple profiles.
link = 'https://www.linkedin.com/in/rishabh-singh-61b706114/'
browser.get(link)
The full profile is not loaded at once: only the visible portion renders first, and further sections load lazily as you scroll. The code below scrolls to the bottom up to three times, waiting SCROLL_PAUSE_TIME seconds after each scroll and stopping early once the page height stops growing.
SCROLL_PAUSE_TIME = 5
# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")
for i in range(3):
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
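The same stop-when-the-height-stops-changing logic can be pulled into a function so it can be reused for each profile and tested without a browser. In this sketch, scroll_until_stable and its callable parameters are hypothetical names, not part of Selenium; with Selenium, the two callables would wrap the browser.execute_script() calls shown above.

```python
import time


def scroll_until_stable(scroll_to_bottom, get_height, pause=5.0, max_rounds=3, sleep=time.sleep):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = get_height()
    rounds = 0
    for _ in range(max_rounds):
        scroll_to_bottom()
        rounds += 1
        sleep(pause)  # give lazy-loaded sections time to render
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new loaded, so the bottom has been reached
        last_height = new_height
    return rounds


# Example wiring with Selenium (assumes `browser` from the login step):
# scroll_until_stable(
#     lambda: browser.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
#     lambda: browser.execute_script("return document.body.scrollHeight"),
# )
```

Passing sleep as a parameter is a design choice for testing: a fake can replace time.sleep so the loop runs instantly.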
With the full page loaded, retrieve the page source and parse it into a BeautifulSoup object using the lxml parser.
src = browser.page_source
soup = BeautifulSoup(src, 'lxml')
To find the element that holds each field, open the profile in Chrome, right-click the element of interest, and select Inspect to view its HTML.
The block containing basic information is represented by a div tag with class flex-1 mr5.
name_div = soup.find('div', {'class': 'flex-1 mr5'})
name_div
Rishabh Singh
3rd degree connection3rd
Rishabh has a account
#futureshaper
Bengaluru, Karnataka, India
500+ connections
Contact info
Inside name_div there are two ul tags. The first ul holds the name, and the second holds the location and connection count.
Fetch both ul tags with name_div.find_all('ul'). Find the li in the first ul using name_loc[0].find('li') and extract its text with get_text().
name_loc = name_div.find_all('ul')
name = name_loc[0].find('li').get_text().strip()
name
'Rishabh Singh'
For the location, find the li in the second ul.
loc = name_loc[1].find('li').get_text().strip()
loc
'Bengaluru, Karnataka, India'
The profile title is in the h2 tag, extracted via name_div.find('h2').get_text().
profile_title = name_div.find('h2').get_text().strip()
profile_title
'#futureshaper'
The connection count is in the second li of the second ul. Find all li tags in the second ul with name_loc[1].find_all('li'), then get the text from the second one.
connection = name_loc[1].find_all('li')
connection = connection[1].get_text().strip()
connection
'500+ connections'
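Chained calls such as name_loc[1].find_all('li')[1] or name_div.find('h2').get_text() raise an IndexError or AttributeError as soon as one element is missing, for example on a profile without a title. The sketch below wraps the same lookups in guards that return None instead. The HTML is a simplified, hypothetical fragment that only mirrors the structure described above (it is not LinkedIn's real markup), and parse_basic_info and text_or_none are illustrative names. It uses Python's built-in html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mirrors the structure described above:
# a div with class "flex-1 mr5" holding two ul tags and an h2 title.
BASIC_HTML = """
<div class="flex-1 mr5">
  <ul><li> Jane Doe </li></ul>
  <h2> Data Engineer </h2>
  <ul><li> Pune, India </li><li> 500+ connections </li></ul>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text().strip() if tag is not None else None


def parse_basic_info(html):
    """Extract name, title, location, and connections; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    name_div = soup.find("div", {"class": "flex-1 mr5"})
    if name_div is None:
        return None  # layout changed, or the page had not finished loading
    name_loc = name_div.find_all("ul")
    # Check list lengths before indexing so a missing ul or li cannot raise IndexError.
    first_items = name_loc[0].find_all("li") if len(name_loc) > 0 else []
    second_items = name_loc[1].find_all("li") if len(name_loc) > 1 else []
    return {
        "name": text_or_none(first_items[0]) if first_items else None,
        "title": text_or_none(name_div.find("h2")),
        "location": text_or_none(second_items[0]) if second_items else None,
        "connections": text_or_none(second_items[1]) if len(second_items) > 1 else None,
    }
```

For example, parse_basic_info(BASIC_HTML)['name'] returns 'Jane Doe'. In the tutorial's script you would pass browser.page_source instead of the sample string.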
Create a list named info and append each collected field to it.
info = []
info.append(link)
info.append(name)
info.append(profile_title)
info.append(loc)
info.append(connection)
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']
Experience
The experience section is accessible via a section tag with id experience-section.
exp_section = soup.find('section', {'id': 'experience-section'})
exp_section
Experience
FPGA Engineer
Company Name
Honeywell
Dates Employed
Aug 2019 – Present
Employment Duration
1 yr 2 mos
Location
Bengaluru Area, India
FPGA Design Engineer
Company Name
L&T Technology Services Limited
Full-time
Dates Employed
Jan 2017 – Jul 2019
Employment Duration
2 yrs 7 mos
Location
Bengaluru Area, India
From exp_section, get the first ul, then the first div inside it, and then the first a tag inside that div.
exp_section = exp_section.find('ul')
div_tag = exp_section.find('div')
a_tag = div_tag.find('a')
a_tag
FPGA Engineer
Company Name
Honeywell
Dates Employed
Aug 2019 – Present
Employment Duration
1 yr 2 mos
Location
Bengaluru Area, India
Extract the job title from the h3 tag.
job_title = a_tag.find('h3').get_text().strip()
job_title
'FPGA Engineer'
The company name is in the second p tag, accessed via a_tag.find_all('p')[1].get_text().
company_name = a_tag.find_all('p')[1].get_text().strip()
company_name
'Honeywell'
For the joining date, extract the first h4 tag using a_tag.find_all('h4')[0], then get the second span from it.
joining_date = a_tag.find_all('h4')[0].find_all('span')[1].get_text().strip()
joining_date
'Aug 2019 – Present'
For the duration, extract the second h4 tag and its second span.
exp = a_tag.find_all('h4')[1].find_all('span')[1].get_text().strip()
exp
'1 yr 2 mos'
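Duration strings such as '1 yr 2 mos' are easier to compare or aggregate as a single number. The sketch below converts them to total months with regular expressions; duration_to_months is a hypothetical helper that assumes the yr/yrs and mo/mos abbreviations seen in this output and returns None for anything else.

```python
import re


def duration_to_months(duration):
    """Convert a string like '1 yr 2 mos' into a total number of months.

    Assumes LinkedIn-style 'yr'/'yrs' and 'mo'/'mos' abbreviations.
    Returns None if neither a year nor a month part is found.
    """
    years = re.search(r"(\d+)\s*yrs?\b", duration)
    months = re.search(r"(\d+)\s*mos?\b", duration)
    if years is None and months is None:
        return None
    total = 0
    if years:
        total += 12 * int(years.group(1))
    if months:
        total += int(months.group(1))
    return total
```

For example, duration_to_months('2 yrs 7 mos') returns 31.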
Append the scraped experience data to info.
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']
info.append(company_name)
info.append(job_title)
info.append(joining_date)
info.append(exp)
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']
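Because exp_section.find('div') returns only the first match, the code above captures only the first position in the list (here, the current one). The sketch below walks every li in the experience list instead, assuming each entry follows the same h3/p/h4 layout. The sample HTML and the helper names are hypothetical; the markup only mirrors the nesting described in this section.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the nesting used above:
# section#experience-section > ul > li > div > a > (h3, p, p, h4, h4)
EXPERIENCE_HTML = """
<section id="experience-section"><ul>
  <li><div><a>
    <h3>Engineer II</h3><p>Company Name</p><p>Acme Corp</p>
    <h4><span>Dates Employed</span><span>Jan 2021 – Present</span></h4>
    <h4><span>Employment Duration</span><span>3 yrs</span></h4>
  </a></div></li>
  <li><div><a>
    <h3>Engineer I</h3><p>Company Name</p><p>Globex</p>
    <h4><span>Dates Employed</span><span>Jun 2018 – Dec 2020</span></h4>
    <h4><span>Employment Duration</span><span>2 yrs 7 mos</span></h4>
  </a></div></li>
</ul></section>
"""


def text_or_none(tag):
    return tag.get_text().strip() if tag is not None else None


def second_span_text(tag):
    """Text of the second span inside tag (the value after its label), or None."""
    if tag is None:
        return None
    spans = tag.find_all("span")
    return spans[1].get_text().strip() if len(spans) > 1 else None


def parse_experience(html):
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", {"id": "experience-section"})
    job_list = section.find("ul") if section is not None else None
    if job_list is None:
        return []
    jobs = []
    # Visit every direct li child of the list, not just the first div.
    for li in job_list.find_all("li", recursive=False):
        a_tag = li.find("a")
        if a_tag is None:
            continue
        paragraphs = a_tag.find_all("p")
        headers = a_tag.find_all("h4")
        jobs.append({
            "job_title": text_or_none(a_tag.find("h3")),
            "company_name": text_or_none(paragraphs[1]) if len(paragraphs) > 1 else None,
            "joining_date": second_span_text(headers[0]) if headers else None,
            "duration": second_span_text(headers[1]) if len(headers) > 1 else None,
        })
    return jobs
```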
Education
The education section uses a section tag with id education-section. Retrieve the ul tag inside it to access all education entries.
edu_section = soup.find('section', {'id': 'education-section'}).find('ul')
edu_section
Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021
Degree Name
Bachelor of Engineering (B.E.)
Field Of Study
Electrical, Electronics and Communications Engineering
Grade
FIRST
Dates attended or expected graduation
2012 – 2016
S.H.S.B.B
Field Of Study
PCM
The college name is in the first h3 tag of the education list.
college_name = edu_section.find('h3').get_text().strip()
college_name
'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021'
The degree name is in the second span of the p tag with class pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal.
degree_name = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
degree_name
'Bachelor of Engineering (B.E.)'
The field of study is in the second span of the p tag with class pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal.
stream = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
stream
'Electrical, Electronics and Communications Engineering'
The graduation years are in the second span of the p tag with class pv-entity__dates t-14 t-black--light t-normal.
degree_year = edu_section.find('p', {'class': 'pv-entity__dates t-14 t-black--light t-normal'}).find_all('span')[1].get_text().strip()
degree_year
'2012 – 2016'
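The three lookups above repeat one pattern: find a p by its class, then read its second span, which holds the value after a label such as Degree Name. The sketch below factors that into a helper that returns None when the p or span is missing (as with the second entry above, which shows no degree name), plus a helper that splits a year range into start and end. The function names and the sample HTML are hypothetical, and the markup only mirrors the structure described in this section.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each p holds a label span followed by a value span.
EDUCATION_HTML = """
<section id="education-section"><ul><li>
  <h3>Example Institute of Technology</h3>
  <p class="pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal">
    <span>Degree Name</span><span>Bachelor of Science</span></p>
  <p class="pv-entity__dates t-14 t-black--light t-normal">
    <span>Dates attended or expected graduation</span><span>2015 – 2019</span></p>
</li></ul></section>
"""


def second_span_text(parent, p_class):
    """Return the second span's text inside the p with this exact class string, or None."""
    p_tag = parent.find("p", {"class": p_class})
    if p_tag is None:
        return None
    spans = p_tag.find_all("span")
    return spans[1].get_text().strip() if len(spans) > 1 else None


def split_years(value):
    """Split '2015 – 2019' into ('2015', '2019'); accepts an en dash or a hyphen."""
    if not value:
        return None, None
    parts = [part.strip() for part in value.replace("–", "-").split("-")]
    return (parts[0], parts[1]) if len(parts) == 2 else (parts[0], None)


soup = BeautifulSoup(EDUCATION_HTML, "html.parser")
edu_section = soup.find("section", {"id": "education-section"}).find("ul")
degree_name = second_span_text(edu_section, "pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal")
stream = second_span_text(edu_section, "pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal")
degree_year = second_span_text(edu_section, "pv-entity__dates t-14 t-black--light t-normal")
start_year, end_year = split_years(degree_year)
```

Here stream is None because the sample has no field-of-study paragraph, so a missing field no longer stops the script.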
Append the education fields to info.
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']
info.append(college_name)
info.append(degree_name)
info.append(stream)
info.append(degree_year)
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos', 'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021', 'Bachelor of Engineering (B.E.)', 'Electrical, Electronics and Communications Engineering', '2012 – 2016']
All the target fields from this profile are now stored in info. To scrape several profiles, run the same steps, from browser.get(link) onward, inside a loop over a list of profile URLs.
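One way to structure a multi-profile loop is sketched below. scrape_profile is a hypothetical function that would wrap the per-profile steps in this tutorial (loading the link, scrolling, and parsing) and return one info list; it is passed in as a parameter so the loop can be tried without a browser. The randomized pause between profiles is an assumption about sensible pacing, not a documented LinkedIn limit. Catching AttributeError and IndexError, the errors a failed find() or an out-of-range index produce, records profiles whose markup does not match instead of stopping the run.

```python
import random
import time


def scrape_many(links, scrape_profile, min_delay=3.0, max_delay=8.0, sleep=time.sleep):
    """Call scrape_profile(link) for each link; return (rows, failures)."""
    rows, failures = [], []
    for index, link in enumerate(links):
        try:
            rows.append(scrape_profile(link))
        except (AttributeError, IndexError) as exc:
            # A selector matched nothing on this profile; note it and move on.
            failures.append((link, repr(exc)))
        if index < len(links) - 1:
            # Randomized pause between profiles keeps the request rate modest.
            sleep(random.uniform(min_delay, max_delay))
    return rows, failures
```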
Warning
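Web page structures, class names, and element IDs on LinkedIn change frequently, so the selectors in this tutorial may no longer match the live site. If the script fails to locate elements, inspect the target page and update both the Selenium locators used for login and the BeautifulSoup selectors used for parsing.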
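One way to make these updates cheaper is to keep several candidate CSS selectors per field and use the first one that matches. In the sketch below, select_first and NAME_BLOCK_SELECTORS are hypothetical names; the first selector corresponds to the flex-1 mr5 class used earlier, and the second is a made-up placeholder for a future layout. BeautifulSoup's select_one() returns the first element matching a CSS selector, or None.

```python
from bs4 import BeautifulSoup


def select_first(soup, selectors):
    """Return the first element matched by any selector in order, or None."""
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            return element
    return None


# Hypothetical candidates: the tutorial's class first, then a placeholder
# for a redesigned layout. Add new selectors here when the markup changes.
NAME_BLOCK_SELECTORS = ["div.flex-1.mr5", "div.profile-top-card"]
```

With this approach, name_div = select_first(soup, NAME_BLOCK_SELECTORS) replaces the single hard-coded find() call, and a layout change only requires adding a string to the list.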
Conclusion
In this tutorial, you built a LinkedIn profile scraper using Selenium and BeautifulSoup to extract professional histories, education, locations, and connection stats.
Key takeaways:
- Use official APIs where they are available; when they are not, browser automation with Selenium is effective for fetching dynamic, lazy-loaded content.
- Scrolling to the bottom of the page triggers lazy-loaded sections (such as job history and education details) so they appear in the DOM before parsing.
- BeautifulSoup locates data by tag name, by attributes such as id and class, and by the nesting of elements.
- Saving scraped information into lists simplifies downstream processing, such as database insertion or CSV conversion.
Next steps:
- Read the LinkedIn Auto Connect Bot tutorial to learn how to automate outreach workflows using these scraped profile targets.
- Extend this script to export the scraped profiles directly into a local .csv file or a relational database for further data analysis.
- Implement explicit waits (WebDriverWait) in Selenium to replace static time.sleep() calls, improving execution speed and reliability.
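As a starting point for the CSV export, the sketch below writes one row per info list using Python's standard csv module. The column names in HEADER are assumptions chosen to match the order in which fields are appended to info in this tutorial, and write_rows is a hypothetical helper. The length check catches rows with missing or repeated fields before they reach the file.

```python
import csv
import io

# Assumed column names, in the order fields are appended to info above.
HEADER = [
    "link", "name", "profile_title", "location", "connections",
    "company_name", "job_title", "joining_date", "duration",
    "college_name", "degree_name", "stream", "degree_year",
]


def write_rows(rows, stream):
    """Write a header plus one CSV row per scraped info list to a text stream."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    for row in rows:
        if len(row) != len(HEADER):
            raise ValueError(f"expected {len(HEADER)} fields, got {len(row)}")
        writer.writerow(row)


# To write to disk instead of memory:
# with open("profiles.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows([info], f)
```

The csv module quotes fields that contain commas, such as the full college address, so they stay in a single column.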