
Extract Script and CSS Files from Web URL Using Python

Prerequisite:-

In this article, we will learn how to extract the JavaScript and CSS files referenced by a web page using Python.

Libraries Required:-

  1. Requests
  2. BeautifulSoup

Installation:-

      
pip install requests
pip install beautifulsoup4
    
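Approach:-

  1. Fetch the HTML of the page with Requests.
  2. Parse the HTML with BeautifulSoup.
  3. Collect the src attribute of every script tag and the href attribute of every link tag.

Let's understand the step-by-step implementation:-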

Step 1 (Import Libraries)

      
# Import Required Library
import requests
from bs4 import BeautifulSoup
    

Step 2 (Parse HTML Content)


# Web URL
web_url = "Enter Web URL"

# get HTML content
html = requests.get(web_url).content

# parse HTML Content
soup = BeautifulSoup(html, "html.parser")
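
The fetch-and-parse step can be made a little more defensive. The sketch below is one possible way to do it, not part of the original script: `fetch_soup` is a hypothetical helper name, and the 10-second timeout is an assumed default. It relies only on the `timeout` argument of `requests.get` and on `Response.raise_for_status()`.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed (hypothetical helper)."""
    # timeout (assumed value) stops the call from hanging on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


# Example usage (needs network access):
# soup = fetch_soup("https://www.techroadmap.in/")
```

With this helper, a failed request surfaces as an exception at the fetch step rather than as an unexpectedly empty list of files later on.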
  

Step 3 (Get JavaScript and CSS Files)


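# urljoin (standard library) resolves relative links against the page URL
from urllib.parse import urljoin

js_files = []
cs_files = []

for script in soup.find_all("script"):
    if script.attrs.get("src"):
        # if the tag has the attribute 'src'
        url = script.attrs.get("src")
        js_files.append(urljoin(web_url, url))


for css in soup.find_all("link"):
    if css.attrs.get("href"):
        # if the link tag has the 'href' attribute
        # (this also matches non-stylesheet links such as icons)
        _url = css.attrs.get("href")
        cs_files.append(urljoin(web_url, _url))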
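
To see how relative, root-relative, and absolute references resolve against a page URL, and how link tags that are not stylesheets can be skipped, here is a minimal sketch. The page URL and the markup are hypothetical and exist only for illustration; the names `page_url`, `sample_html`, `script_urls`, and `stylesheet_urls` are not from the original script.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page URL and markup, used only for illustration
page_url = "https://example.com/blog/post.html"
sample_html = """
<html><head>
  <link rel="stylesheet" href="/static/site.css">
  <link rel="icon" href="/favicon.ico">
  <link rel="stylesheet" href="https://cdn.example.org/theme.css">
  <script src="js/app.js"></script>
  <script>console.log("inline script, no src")</script>
</head></html>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# src=True keeps only <script> tags that actually have a src attribute
script_urls = [
    urljoin(page_url, tag["src"])
    for tag in sample_soup.find_all("script", src=True)
]

# BeautifulSoup returns rel as a list of values, so test membership
stylesheet_urls = [
    urljoin(page_url, tag["href"])
    for tag in sample_soup.find_all("link", href=True)
    if "stylesheet" in (tag.get("rel") or [])
]

print(script_urls)
print(stylesheet_urls)
```

Here the relative path `js/app.js` resolves against the page's directory, `/static/site.css` resolves against the site root, the absolute CDN URL is left unchanged, and the favicon link is excluded because its rel value is not `stylesheet`.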
  
Below is the Implementation:-
      
# Import Required Library
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Web URL
web_url = "Enter Web URL"

# get HTML content
html = requests.get(web_url).content

# parse HTML Content
soup = BeautifulSoup(html, "html.parser")

js_files = []
cs_files = []

for script in soup.find_all("script"):
    if script.attrs.get("src"):
        # if the tag has the attribute 'src'
        url = script.attrs.get("src")
        # urljoin resolves relative links against the page URL
        js_files.append(urljoin(web_url, url))


for css in soup.find_all("link"):
    if css.attrs.get("href"):
        # if the link tag has the 'href' attribute
        _url = css.attrs.get("href")
        cs_files.append(urljoin(web_url, _url))
        
print(f"Total {len(js_files)} javascript files found")
print(f"Total {len(cs_files)} CSS files found")
    
Input:-
      
Web URL:- https://www.techroadmap.in/
    
Output:-
      
Total 2 javascript files found
Total 3 CSS files found
    

We can also use file handling to write the fetched links to text files.

      
# write one JavaScript link per line
with open("javascript_files.txt", "w") as f:
    for js_file in js_files:
        print(js_file, file=f)

# write one CSS link per line
with open("css_files.txt", "w") as f:
    for css_file in cs_files:
        print(css_file, file=f)
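
A page can reference the same file more than once, so it may be useful to drop duplicates before saving. The sketch below is one possible approach, not part of the original script: `save_links` is a hypothetical helper, and `example_links` is a made-up list standing in for the collected links.

```python
from pathlib import Path


def save_links(path, links):
    """Write one link per line and return how many were written (hypothetical helper)."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique_links = list(dict.fromkeys(links))
    text = "".join(f"{link}\n" for link in unique_links)
    Path(path).write_text(text, encoding="utf-8")
    return len(unique_links)


# Hypothetical list standing in for the collected JavaScript links
example_links = [
    "https://example.com/js/app.js",
    "https://example.com/js/vendor.js",
    "https://example.com/js/app.js",
]

# Example usage:
# save_links("javascript_files.txt", example_links)
```

Returning the count lets the caller report how many distinct links were saved.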