Web crawling and scraping : developing a sale-based website
Panta, Deepak (2015)
Turun ammattikorkeakoulu
All rights reserved
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-201505035716
Abstract
Most customers spend a fair amount of time searching for items that are on discount, and retailers use discounts to attract as many customers as possible. While there are ample websites that provide discount information on items such as clothes, electronics or household appliances, there are hardly any websites that offer this service for grocery items.
This thesis describes how Scrapy is used to crawl and scrape discount information from major Finnish grocery stores and how that information is displayed on a website built with the Django framework. Since the items are on discount for a limited period of time, the crawlers and scrapers are scheduled to run at predefined intervals. Once new items are available for scraping, the old information is removed and replaced with the new information.
This thesis also explores various aspects of the tools (Django, Scrapy, Dynamic Django Scraper, Celery) used to build the website. The outcome of the thesis is a fully functional sales-based website that displays the price, image and additional information about items in a grid format, along with the location of nearby markets.
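The refresh cycle described in the abstract (run at a fixed interval, then discard a store's old discount items and store the newly scraped ones) can be illustrated with a minimal sketch. It is written in plain Python and deliberately does not use Scrapy, Django or Celery; the names `DiscountItem`, `replace_store_items` and `is_due`, as well as the store names and prices, are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class DiscountItem:
    """Hypothetical record for one discounted grocery item."""
    store: str
    name: str
    price: float


def replace_store_items(
    stored: List[DiscountItem],
    scraped: List[DiscountItem],
    store: str,
) -> List[DiscountItem]:
    """Drop every stored item of `store` and add its freshly scraped items.

    Items belonging to other stores are left untouched.
    """
    kept = [item for item in stored if item.store != store]
    fresh = [item for item in scraped if item.store == store]
    return kept + fresh


def is_due(
    last_run: Optional[datetime],
    now: datetime,
    interval: timedelta,
) -> bool:
    """Return True if a scraper has never run or its interval has elapsed."""
    return last_run is None or now - last_run >= interval
```

In a setup like the one the thesis describes, a task scheduler would presumably take the role of `is_due` and a database-backed model would take the role of the in-memory list; the sketch only shows the replace-on-refresh idea.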