This repository contains the web scraper I used to crawl the Utopia.de website to collect German-language online user reviews of organic/fair trade coffee.
The dataset is available on Kaggle: https://www.kaggle.com/mldado/german-online-reviewsratings-of-organic-coffee
For each review, the scraper collects the following data:
- brand name of the coffee being reviewed
- user rating of the coffee (1-5 stars)
- user review in German
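One scraped review can be modeled as a small record with these three fields. The sketch below is an illustration only, not the scraper's actual code. The class name `Review`, the field names `brand`, `rating` and `review`, and the helper `to_csv` are all hypothetical and may not match the column names in the Kaggle dataset. It validates the 1-5 star range and serializes records to CSV using only the standard library.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Review:
    """One scraped review (hypothetical field names, not the dataset's schema)."""
    brand: str   # brand name of the reviewed coffee
    rating: int  # user rating on a 1-5 star scale
    review: str  # review text in German

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-5 star scale.
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


def to_csv(reviews: list[Review]) -> str:
    """Serialize reviews to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Review)])
    writer.writeheader()
    for r in reviews:
        writer.writerow(asdict(r))
    return buffer.getvalue()
```

For example, `to_csv([Review("ExampleBrand", 5, "Sehr aromatisch und mild.")])` produces a header line `brand,rating,review` followed by one data row. The brand and review text here are made up for illustration.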
There aren't that many NLP datasets in German. This one is a little small, but it should be enough to try out sentiment analysis and more advanced techniques such as aspect-based sentiment analysis. It would be interesting to extract features that represent the preferences of German coffee drinkers, to learn why they choose organic/fair trade coffee brands over conventional ones, and maybe even to find out what differentiates a 5-star coffee from 'just' a 4-star coffee.
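A common first step for sentiment analysis on star-rated reviews is to map ratings to coarse sentiment classes and check how balanced those classes are, which matters for a small dataset. The sketch below assumes the widely used convention of 1-2 stars as negative, 3 as neutral and 4-5 as positive. These thresholds are an assumption, not something defined by the dataset, and the function names are hypothetical. For the 5-star versus 4-star question, you would keep the raw ratings instead of collapsing them.

```python
from collections import Counter


def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 star rating to a coarse sentiment label.

    The thresholds are an assumed convention, not part of the dataset.
    """
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def label_distribution(ratings: list[int]) -> Counter:
    """Count how many reviews fall into each sentiment class."""
    return Counter(rating_to_sentiment(r) for r in ratings)
```

For example, `label_distribution([5, 4, 5, 3, 1])` counts 3 positive, 1 neutral and 1 negative review.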