LinkedIn Profile Scraping using Selenium and BeautifulSoup
Extracting profile details from LinkedIn is a valuable task for marketing, lead generation, talent acquisition, and market research. In this project, you will build a Python scraping script that programmatically logs into your LinkedIn account, navigates to a target profile, and extracts structured fields such as name, location, job history, and education.
To build this tool, you will use Selenium WebDriver for browser automation (including handling login and scrolling to trigger lazy-loaded sections) and BeautifulSoup to parse and structure the HTML content.
Prerequisites and Setup
First, install the required packages:
pip install selenium beautifulsoup4 lxml
Next, download the browser driver that matches your browser and its version. For Google Chrome, retrieve the appropriate binary from the ChromeDriver Downloads page and save it in a driver folder inside your project, since the code below expects it at driver/chromedriver.exe.
Additionally, create a config.txt file in your project root containing your LinkedIn credentials:
your_linkedin_email@example.com
your_secure_password
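A minimal sketch of reading this file, assuming exactly the two-line layout shown above (email first, password second). The helper name read_credentials is hypothetical and not part of the original script. Stripping each line matters because a trailing newline typed into a form field can act as an Enter key press.

```python
from pathlib import Path


def read_credentials(path="config.txt"):
    """Return (username, password) from a two-line text file."""
    # splitlines() drops the line endings; strip() removes stray spaces
    lines = [line.strip() for line in Path(path).read_text(encoding="utf-8").splitlines()]
    # Ignore blank lines so an empty trailing line does not count as a field
    lines = [line for line in lines if line]
    if len(lines) < 2:
        raise ValueError(f"{path} must contain an email line and a password line")
    return lines[0], lines[1]
```

Keep config.txt out of version control, since it holds a plain-text password.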
This project builds directly on top of the authentication and connection workflows from the LinkedIn Auto Connect Bot tutorial.
Start by importing the libraries the script uses: time for pauses, BeautifulSoup for parsing, and Selenium's webdriver for browser control.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
First, create the Chrome driver. In Selenium 4 the path to the driver binary is passed through a Service object, as in browser = webdriver.Chrome(service=Service('driver/chromedriver.exe')). Then open the LinkedIn login page using browser.get(), and read the username and password from the config.txt file you created. Each line read from the file ends with a newline character, so strip it before use.
Now we have to automate the login process. For that, we need the id of the textboxes that accept the username and password on the webpage. You can find them by right-clicking anywhere on the webpage and then clicking on 'inspect'. You will see that the id of the username textbox is username and the id of the password textbox is password.
find_element(By.ID, ...) returns the first element whose id attribute matches the given value. (The older find_element_by_id() helper was removed in Selenium 4.3.) The send_keys() method types text into an element such as a form input; it adds to whatever the field already contains rather than replacing it. The submit() method submits the form that the element belongs to.
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service('driver/chromedriver.exe'))
browser.get('https://www.linkedin.com/uas/login')
# Read the credentials and close the file when done
with open('config.txt') as file:
    lines = file.readlines()
# strip() removes the trailing newline that readlines() keeps
username = lines[0].strip()
password = lines[1].strip()
elementID = browser.find_element(By.ID, 'username')
elementID.send_keys(username)
elementID = browser.find_element(By.ID, 'password')
elementID.send_keys(password)
elementID.submit()
link contains the URL of the profile we want to scrape. You can scrape any profile of your choice, or scrape multiple profiles by iterating over a list of links with a for loop.
link = 'https://www.linkedin.com/in/rishabh-singh-61b706114/'
browser.get(link)
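The multi-profile loop mentioned above could look roughly like this. It is a sketch, not part of the original script: visit_profiles, scrape_one, and the delay bounds are hypothetical names and values. scrape_one stands for whatever function you write to extract data from the currently loaded page, and the random pause between profiles is an assumption about pacing the requests.

```python
import random
import time


def visit_profiles(browser, links, scrape_one, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Open each link in turn and collect whatever scrape_one returns."""
    results = []
    for index, link in enumerate(links):
        browser.get(link)
        results.append(scrape_one(browser))
        # Pause a random amount between profiles, but not after the last one
        if index < len(links) - 1:
            sleep(random.uniform(min_delay, max_delay))
    return results
```

The sleep parameter exists only so the pause can be replaced in tests; in normal use, leave it at its default.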
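LinkedIn does not load the whole profile at once. Only the visible part is loaded, and the sections below it are loaded lazily as you scroll. To get the complete profile into the page source, we scroll to the bottom repeatedly. The code given below scrolls up to three times, waits 5 seconds after each scroll, and stops early once the page height no longer changes.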
SCROLL_PAUSE_TIME = 5
# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")
for i in range(3):
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
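The same scrolling logic can be wrapped in a reusable function. This is a sketch under assumptions: the name scroll_to_bottom and the max_rounds parameter are hypothetical, and the cap of 10 rounds is an arbitrary choice for long profiles rather than a value from the original script.

```python
import time


def scroll_to_bottom(browser, pause=5, max_rounds=10, sleep=time.sleep):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    rounds = 0
    for _ in range(max_rounds):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        sleep(pause)  # give lazy-loaded sections time to render
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds
```

Returning the number of rounds makes it easy to see whether the loop stopped because the page finished loading or because it hit the cap.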
Now that the full page is loaded, you are ready to get the page source. We will parse it with the lxml parser and store the result in a BeautifulSoup object named soup.
src = browser.page_source
soup = BeautifulSoup(src, 'lxml')
To extract anything from the webpage we will have to inspect the webpage. We can do this by right-clicking anywhere on the webpage and then clicking on 'inspect'.
The block containing the basic information is represented using the div tag with class name as flex-1 mr5.
name_div = soup.find('div', {'class': 'flex-1 mr5'})
name_div
Rishabh Singh
3rd degree connection3rd
Rishabh has a account
#futureshaper
Bengaluru, Karnataka, India
500+ connections
Contact info
We will first get the name. As you can see, name_div contains 2 ul tags. The first ul holds the name, and the second ul holds the location and the number of connections.
Here we will first get both the ul tags using name_div.find_all('ul'). We will find the li in the first ul tag using name_loc[0].find('li') and get the text enclosed in it using get_text().
name_loc = name_div.find_all('ul')
name = name_loc[0].find('li').get_text().strip()
name
'Rishabh Singh'
Similarly, for the location we will find the li in the second ul tag.
loc = name_loc[1].find('li').get_text().strip()
loc
'Bengaluru, Karnataka, India'
The profile title is enclosed in the h2 tag. So we can extract it using name_div.find('h2').get_text().
profile_title = name_div.find('h2').get_text().strip()
profile_title
'#futureshaper'
The no. of connections is in 2nd li of the 2nd ul. Hence first we will find all the li tags in the second ul using name_loc[1].find_all('li'). Then we will get the text from the second li tag using connection[1].get_text().
connection = name_loc[1].find_all('li')
connection = connection[1].get_text().strip()
connection
'500+ connections'
We will append everything we have scraped so far to a list named info.
info = []
info.append(link)
info.append(name)
info.append(profile_title)
info.append(loc)
info.append(connection)
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']
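The chained calls above raise an AttributeError as soon as one find() returns None, which happens whenever a profile lacks a field. One way to guard against that is a small helper, sketched here under assumptions: text_or_none is a hypothetical name, and sample_html is made-up markup that only imitates the structure described above. The sketch uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text().strip() if tag is not None else None


# Hypothetical markup standing in for the real profile page
sample_html = """
<div class="flex-1 mr5">
  <ul><li> Rishabh Singh </li></ul>
  <h2> #futureshaper </h2>
  <ul><li> Bengaluru, Karnataka, India </li><li> 500+ connections </li></ul>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_div = sample_soup.find("div", {"class": "flex-1 mr5"})

sample_name = text_or_none(sample_div.find("li"))
sample_title = text_or_none(sample_div.find("h2"))
# The sample has no h3, so find() returns None and the helper passes it through
sample_missing = text_or_none(sample_div.find("h3"))
```

With this pattern a missing field becomes None in the output instead of stopping the script.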
Experience
Now we will scrape the information under the experience section in the profile. We can access the experience section using the section tag with the id experience-section.
exp_section = soup.find('section', {'id': 'experience-section'})
exp_section
Experience
FPGA Engineer
Company Name
Honeywell
Dates Employed
Aug 2019 – Present
Employment Duration
1 yr 2 mos
Location
Bengaluru Area, India
FPGA Design Engineer
Company Name
L&T Technology Services Limited
Full-time
Dates Employed
Jan 2017 – Jul 2019
Employment Duration
2 yrs 7 mos
Location
Bengaluru Area, India
From exp_section we are going to get the first ul tag. Then from the first ul tag we are going to get the first div tag. Then from the first div tag we are going to get the first a tag.
exp_section = exp_section.find('ul')
div_tag = exp_section.find('div')
a_tag = div_tag.find('a')
a_tag
FPGA Engineer
Company Name
Honeywell
Dates Employed
Aug 2019 – Present
Employment Duration
1 yr 2 mos
Location
Bengaluru Area, India
We can extract the job title using h3 tag.
job_title = a_tag.find('h3').get_text().strip()
job_title
'FPGA Engineer'
The company name is enclosed by the 2nd p tag. Hence we can get it by a_tag.find_all('p')[1].get_text().
company_name = a_tag.find_all('p')[1].get_text().strip()
company_name
'Honeywell'
For the joining date we will extract the first h4 tag using a_tag.find_all('h4')[0]. Then we will get the second span from the first h4 using find_all('span')[1].
joining_date = a_tag.find_all('h4')[0].find_all('span')[1].get_text().strip()
joining_date
'Aug 2019 – Present'
For the duration we will extract the second h4 tag using a_tag.find_all('h4')[1]. Then we will get the second span using find_all('span')[1].
exp = a_tag.find_all('h4')[1].find_all('span')[1].get_text().strip()
exp
'1 yr 2 mos'
We will append all the scraped data to info.
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections']
info.append(company_name)
info.append(job_title)
info.append(joining_date)
info.append(exp)
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']
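The steps above read only the first position. To collect every position, the same lookups can be applied to each entry in the list. The sketch below assumes that each position is a direct li child of the section's first ul and that every entry follows the h3, p, and h4 layout used above. That layout, the function name parse_positions, and the sample markup are assumptions for illustration, not confirmed details of the real page.

```python
from bs4 import BeautifulSoup


def parse_positions(section):
    """Return one dict per position found in the experience section."""
    positions = []
    for item in section.find("ul").find_all("li", recursive=False):
        a_tag = item.find("a")
        if a_tag is None:
            continue  # skip list items that do not hold a position
        headings = a_tag.find_all("h4")
        positions.append({
            "job_title": a_tag.find("h3").get_text().strip(),
            "company_name": a_tag.find_all("p")[1].get_text().strip(),
            "joining_date": headings[0].find_all("span")[1].get_text().strip(),
            "duration": headings[1].find_all("span")[1].get_text().strip(),
        })
    return positions


# Hypothetical markup imitating the structure described above
sample_html = """
<section id="experience-section"><ul>
  <li><a><h3>FPGA Engineer</h3><p>Company Name</p><p>Honeywell</p>
    <h4><span>Dates Employed</span><span>Aug 2019 – Present</span></h4>
    <h4><span>Employment Duration</span><span>1 yr 2 mos</span></h4></a></li>
  <li><a><h3>FPGA Design Engineer</h3><p>Company Name</p><p>L&amp;T Technology Services Limited</p>
    <h4><span>Dates Employed</span><span>Jan 2017 – Jul 2019</span></h4>
    <h4><span>Employment Duration</span><span>2 yrs 7 mos</span></h4></a></li>
</ul></section>
"""
sample_section = BeautifulSoup(sample_html, "html.parser").find("section", {"id": "experience-section"})
sample_positions = parse_positions(sample_section)
```

Returning dictionaries rather than appending to one flat list keeps the fields of different positions from running together.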
Education
Now we will move to the education section. We can extract it using the section tag having id as education-section. Then we will get the ul tag which contains all the information.
edu_section = soup.find('section', {'id': 'education-section'}).find('ul')
edu_section
Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021
Degree Name
Bachelor of Engineering (B.E.)
Field Of Study
Electrical, Electronics and Communications Engineering
Grade
FIRST
Dates attended or expected graduation
2012 – 2016
S.H.S.B.B
Field Of Study
PCM
We can get the name of the college directly using the h3 tag.
college_name = edu_section.find('h3').get_text().strip()
college_name
'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021'
We will get the name of the degree from the second span of the p tag with class pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal.
degree_name = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__degree-name t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
degree_name
'Bachelor of Engineering (B.E.)'
We will get the stream from the second span of the p tag with class pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal.
stream = edu_section.find('p', {'class': 'pv-entity__secondary-title pv-entity__fos t-14 t-black t-normal'}).find_all('span')[1].get_text().strip()
stream
'Electrical, Electronics and Communications Engineering'
We will get the years of degree from the second span of the p tag with class pv-entity__dates t-14 t-black--light t-normal.
degree_year = edu_section.find('p', {'class': 'pv-entity__dates t-14 t-black--light t-normal'}).find_all('span')[1].get_text().strip()
degree_year
'2012 – 2016'
We will append everything we have scraped to info.
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos']
info.append(college_name)
info.append(degree_name)
info.append(stream)
info.append(degree_year)
info
['https://www.linkedin.com/in/rishabh-singh-61b706114/', 'Rishabh Singh', '#futureshaper', 'Bengaluru, Karnataka, India', '500+ connections', 'Honeywell', 'FPGA Engineer', 'Aug 2019 – Present', '1 yr 2 mos', 'Technocrats Institute of Technology (Excellence), Anand Nagar, PB No. 24, Post Piplani, BHEL, Bhopal - 462021', 'Bachelor of Engineering (B.E.)', 'Electrical, Electronics and Communications Engineering', '2012 – 2016']
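A flat list relies on position alone to tell fields apart. As a sketch of one way to store the result, the 13 values appended to info can be paired with column names and written to CSV using the standard library. The column names, info_to_row, and write_profiles are hypothetical; they simply follow the order in which info was filled.

```python
import csv
import io

# Column names are assumptions matching the order in which info was filled
FIELDNAMES = ["link", "name", "profile_title", "location", "connections",
              "company_name", "job_title", "joining_date", "experience",
              "college_name", "degree_name", "stream", "degree_year"]


def info_to_row(info):
    """Pair each value in info with its column name."""
    if len(info) != len(FIELDNAMES):
        raise ValueError(f"expected {len(FIELDNAMES)} values, got {len(info)}")
    return dict(zip(FIELDNAMES, info))


def write_profiles(rows, handle):
    """Write a header line and one CSV line per profile to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Example with an in-memory buffer; for a real file, open it with
# open("profiles.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_profiles([info_to_row([f"value {i}" for i in range(len(FIELDNAMES))])], buffer)
```

The length check catches a list that was appended to twice, which would otherwise shift values into the wrong columns.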
You have successfully scraped all of the target data points from the LinkedIn profile. This modular script can be extended inside a loop to scrape multiple profiles sequentially.
Warning
Web page structures, dynamic class names, and element IDs on LinkedIn change frequently. If the script fails to locate elements, inspect the target page and update the tag names, class names, and ids used in the script accordingly.
Conclusion
In this tutorial, you built a LinkedIn profile scraper using Selenium and BeautifulSoup that extracts a profile's name, headline, location, connection count, most recent position, and education.
Key takeaways:
- Official APIs are preferable where they cover your needs, but browser automation with Selenium can reach dynamic, lazy-loaded content that a plain HTTP request does not return.
- Simulating bottom-scrolling behavior ensures that all delayed elements (like historical job details and academic timelines) load properly in the DOM.
- BeautifulSoup locates data by tag name, by attributes such as id and class, and by position in the document tree.
- Saving scraped information in simple structures like lists simplifies downstream processes (such as database insertion or CSV conversion).
Next steps:
- Read the LinkedIn Auto Connect Bot tutorial to learn how to automate outreach workflows using these scraped profile targets.
- Extend this script to export the scraped profiles directly into a local .csv file or a relational database for further data analysis.
- Implement explicit waits (WebDriverWait) in Selenium to replace static time.sleep() calls, improving execution speed and robustness.
