How to Scrape Newegg.com for Laptop Prices and Specs with Python Scrapy and Import the Data into SQLite (Free)

DigNo Ape
3 min read · Dec 3, 2020


Purpose

Search for laptops on the Newegg website, grab their information with Scrapy, and store the data in SQLite so that we can easily play around with it.

See the full version LINK.

Tool

  • Python packages: scrapy, sqlite3, json
  • SQLite

Target Website

Steps

  1. Let’s set up the Scrapy environment first.
  • You can refer to my other post to set up the environment. LINK

2. Go through the structure of the website to locate where the target elements are.

  • When you inspect the website, pick the price of one example product and search for it in the page source. You will find that each product's information sits inside an element with class="item-cell".
  • Within that element, let’s look for the detailed info such as model #, item description, and price.

a. model #: Take an example and search for where it resides. It is located under <ul class="item-features"> / <li>, in the <li> whose <strong> text equals “Model #:”.

b. item description: It is the text of <a class="item-title">.

c. price: It is located in the price element of the same item cell, typically <li class="price-current">.

  • Handle page change: Unlike the B&H site covered in the previous post, the results page does not expose a next-page link whose URL we could grab and follow from the spider callbacks. Instead, we rely on the URL pattern and a for loop to jump to the next page.
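The URL-pattern approach can be sketched as follows. This is only an illustration: the base URL and the query parameter names `d` and `page` are assumptions that should be checked against the address bar while paging through the search results, and `build_page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed search endpoint; confirm it in the browser before relying on it.
BASE_URL = "https://www.newegg.com/p/pl"


def build_page_urls(keyword, last_page):
    """Return one search URL per result page, from page 1 to last_page."""
    return [
        f"{BASE_URL}?{urlencode({'d': keyword, 'page': page})}"
        for page in range(1, last_page + 1)
    ]


# A plain for loop over the generated URLs stands in for "jumping" pages.
for url in build_page_urls("laptop", 3):
    print(url)
```

Building the query string with `urlencode` keeps keywords containing spaces or symbols safe to use in the URL.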

3. “crawler.py”:

  • Open a command prompt in your project folder and run “scrapy crawl newegg_scrapy -o newegg_scrapy.json -t json”, which runs the spider and stores the data in JSON format. On recent Scrapy versions the format is inferred from the .json extension, so “-t json” can be omitted.

4. Import the JSON file into the table “table_egg” in the SQLite DB “web_scrapy”.
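The import step could look roughly like the sketch below, using only the standard `json` and `sqlite3` modules. The table name `table_egg` comes from this post; the database file name `web_scrapy.db`, the column names, and the helper name `import_items` are assumptions and must match the fields your spider actually yields.

```python
import json
import sqlite3


def import_items(json_path, db_path="web_scrapy.db"):
    """Append scraped items from a JSON file to table_egg; return the row count."""
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)  # Scrapy's JSON export is a list of objects

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS table_egg "
                "(model TEXT, description TEXT, price REAL)"
            )
            # Assumed keys: every item has model, description, and price.
            conn.executemany(
                "INSERT INTO table_egg (model, description, price) "
                "VALUES (:model, :description, :price)",
                items,
            )
        return conn.execute("SELECT COUNT(*) FROM table_egg").fetchone()[0]
    finally:
        conn.close()
```

Calling `import_items("newegg_scrapy.json")` after the crawl would load the exported file. Note that running it twice appends the same rows again, so delete the database file or the table first if you re-import.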

Here is an example of the data in the SQLite DB.

Thank you! Enjoy it:)


Written by DigNo Ape

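In the spirit of evolving from our ape ancestors, we keep pursuing the accumulation of knowledge and the use of tools to raise productivity. We believe every member has unlimited potential and keeps growing and improving through learning and practice.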
