Setup for a Web Scraping Project
Whenever you want information, you Google it, and the search engine returns the most relevant results for your query. You can view the data you need, but what if you need to save it locally? What if you need the data from a hundred more pages?
Most web pages don’t offer an option to save their data locally. To keep a copy, you have to copy and paste everything manually, which is tedious. When you need data from hundreds (sometimes thousands) of pages, the task becomes impractical: you might spend days copy-pasting bits from different websites. This is where web scraping, also called data scraping or web harvesting, comes in.
Let's set up a web scraping project. Navigate to the directory where your virtual environment was created, activate the environment, and then install the required libraries.
pip install requests
pip install bs4
pip install html5lib
It will take just a few seconds and you will be good to go.
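If you want to confirm that the three packages landed in the active virtual environment, a small check like the one below may help. It is only a sketch: the helper name check_installed and the REQUIRED list are made up for this example, and it relies solely on the standard library's importlib, so it runs even when a package is missing.

```python
from importlib.util import find_spec

# Import names of the packages installed above (assumed list for this sketch).
REQUIRED = ["requests", "bs4", "html5lib"]


def check_installed(modules):
    """Return a dict mapping each module name to True if it can be found."""
    # find_spec returns None when a top-level module is not importable.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installed(REQUIRED).items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```

Run it with the same interpreter you activated; if a package is reported as missing, the virtual environment was probably not active when pip ran.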
Requests library: for making HTTP requests to a specified URL.
Bs4 (Beautiful Soup 4) library: for pulling data out of HTML and XML files.
Html5lib library: a pure-Python library for parsing HTML.
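To see how the three libraries fit together, here is a minimal sketch. It is not part of the original setup: the function names fetch_html, extract_links, and page_title, and the SAMPLE_HTML string, are invented for illustration. Requests downloads a page, and Beautiful Soup parses the markup using html5lib as its parser.

```python
import requests
from bs4 import BeautifulSoup

# Tiny inline page (made up for this sketch) so parsing can be tried offline.
SAMPLE_HTML = (
    "<html><head><title>Demo</title></head>"
    "<body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
)


def fetch_html(url, timeout=10):
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html5lib")  # html5lib does the parsing
    return [a["href"] for a in soup.find_all("a", href=True)]


def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html5lib")
    return soup.title.string if soup.title else None


if __name__ == "__main__":
    print(page_title(SAMPLE_HTML))
    print(extract_links(SAMPLE_HTML))
```

For a live page, you would pass the result of fetch_html to the same two functions, using a URL you are allowed to scrape.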
Create a new project in PyCharm.
Set the virtual environment you created earlier as the interpreter in the project settings:
File > Settings > Project: WebScrapingProject > Python Interpreter