How to Scrape Newegg.com for Laptop Prices and Specs with Python Scrapy and Import the Data into SQLite? (Free)
Purpose
Search for laptops on the Newegg website, grab their information with Scrapy, and store the data in SQLite so that we can easily play around with it.
See the full version LINK.
Tools
- Python packages: scrapy, sqlite3, json
- SQLite
Target Website
Steps
1. Let’s set up the Scrapy environment first.
- You can refer to my other post to set up the environment. LINK
2. Go through the structure of the website to locate where the target elements are.
- When you inspect the page, pick the price of one example product and search for it in the page source. You will find that the information for each product sits inside an element with class="item-cell".
- Let’s search for the detailed info like model #, item description, price, etc.
a. model #: Take an example model number and search for where it resides. It is located under <ul class="item-features">, in the <li> whose <strong> text equals “Model #:”.
b. item description: It is the text of <a class="item-title">.
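c. price: It is located in the item's price element rather than in the title link. On Newegg listing pages this has typically been <li class="price-current">, so confirm the class name when you inspect the page.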
- Handle page change: Unlike the previous session on B&H, the page does not expose a next-page link whose URL we could grab and follow from the spider callbacks. Instead, we are going to rely on the pattern of the URL and a for loop to jump from one page to the next.
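The URL-pattern idea can be sketched as below. This is only an illustration: the search URL and the "&page=N" query parameter are assumptions, not taken from this post, so copy the real pattern from your browser's address bar while clicking through the result pages.

```python
# Assumed search URL; replace it with the one you see on Newegg.
BASE_URL = "https://www.newegg.com/p/pl?d=laptop"


def build_page_urls(base_url, last_page):
    """Return one URL per result page, from page 1 to last_page inclusive."""
    urls = []
    for page in range(1, last_page + 1):
        # Assumed pattern: the page number is passed as a "page" query parameter.
        urls.append(f"{base_url}&page={page}")
    return urls


# Hypothetical page count; check how many result pages the search really has.
page_urls = build_page_urls(BASE_URL, 3)
```

The resulting list can be handed to the spider as its start URLs, so no next-page link is needed.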
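3. Write the spider in “crawler.py” and run it:
- Open a command prompt in your project folder and type “scrapy crawl newegg_scrapy -o newegg_scrapy.json -t json”, which runs the spider and stores the data in JSON format. Recent Scrapy versions infer the format from the .json extension, so “-t json” is optional there.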
4. Import the JSON file into the table “table_egg” in the SQLite DB “web_scrapy”.
Here is an example of the data in the SQLite DB.
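One way to do the import with the standard json and sqlite3 modules is sketched below. The column names (model, description, price) and the database file name "web_scrapy.db" are assumptions; match them to the fields your spider actually emits.

```python
import json
import sqlite3


def import_json_to_sqlite(json_path, db_path, table="table_egg"):
    """Load a Scrapy JSON export and insert each record into a SQLite table.

    Returns the number of rows in the table after the import.
    """
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)  # the JSON export is a single array of objects

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            # The table name is a trusted constant here, not user input.
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} "
                "(model TEXT, description TEXT, price TEXT)"
            )
            conn.executemany(
                f"INSERT INTO {table} (model, description, price) VALUES (?, ?, ?)",
                [(i.get("model"), i.get("description"), i.get("price")) for i in items],
            )
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()
```

It could be called as import_json_to_sqlite("newegg_scrapy.json", "web_scrapy.db") once the crawl has finished. Prices are stored as text because the scraped values may contain thousands separators; convert them when querying if you need numeric comparisons.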
Thank you! Enjoy it:)