Setup for a Web Scraping Project

Whenever you want information, you Google it, and the search engine returns the results most relevant to your query. You can view the data you need, but what if you need to save it locally? What if you want the data from a hundred more pages?

Most web pages on the internet don’t offer an option to save their data locally. To keep a copy, you’ll have to copy and paste everything manually, which is very tedious. Moreover, when you have to save the data from hundreds (sometimes thousands) of web pages, the task becomes strenuous. You might end up spending days just copy-pasting bits from different websites. This is where web scraping, also known as data scraping or web harvesting, comes in.


Let's set up a web scraping project. Navigate to the directory where your virtual environment was created, activate the environment, and install the required libraries.

pip install requests
pip install bs4
pip install html5lib


It will take just a few seconds and you will be good to go.
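
To confirm that the installation succeeded, a quick check like the one below can be run with the virtual environment active. It is only a sketch: the helper name missing_packages is made up for this example, and the import names requests, bs4, and html5lib are assumed to match the packages installed above.

```python
import importlib.util

# Import names of the libraries installed with pip above.
REQUIRED = ("requests", "bs4", "html5lib")


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If a name is reported as not installed, the usual cause is that pip ran outside the activated virtual environment.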

Requests library: for making HTTP requests to a specified URL
Bs4 library (Beautiful Soup): for pulling data out of HTML and XML files
Html5lib library: a pure-Python library for parsing HTML
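
As a rough illustration of how the three libraries fit together, the sketch below fetches a page with Requests and parses it with Beautiful Soup using the html5lib parser. The function names fetch_html and extract_links, the sample HTML, and the example.com links are all invented for this example and are not part of the project itself.

```python
import requests
from bs4 import BeautifulSoup

# A small hypothetical page, so the parsing step can be tried without a network call.
SAMPLE_HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="https://example.com/one">One</a>
    <a href="https://example.com/two">Two</a>
  </body>
</html>
"""


def fetch_html(url):
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Parse HTML with Beautiful Soup (html5lib parser) and return (title, links)."""
    soup = BeautifulSoup(html, "html5lib")
    title = soup.title.string if soup.title else None
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links
```

Calling extract_links(SAMPLE_HTML) gives the page title and the two link URLs. For a live page, the text returned by fetch_html can be passed to extract_links in the same way.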

Create a new project in PyCharm.


Set the virtual environment you created earlier as the interpreter in the project settings.
File > Settings > Project: WebScrapingProject > Python Interpreter
Expand the list of available interpreters and click the Show All link. Alternatively, click the Configure project interpreter icon and select Show All, or add the interpreter manually.




Below we can see the libraries we installed previously.


We are now ready to start writing the script.











