How to Configure your Domain Filters and Website Blacklists

In this guide, we will show you how to configure your domain filters and blacklists so that the web scraper and email extractor only harvest data from websites that match your niche and your domain keyword filters.

Go to Settings and Domain Filters. You will see three columns.

Inside the first column, enter a list of keywords that must be present in the target website's URL. Think about your niche and pick out its most popular keywords. For example, if we want to scrape all hemp and CBD shops, we would enter keywords like cbd and hemp, as these are the two keywords that define the entire CBD industry.

However, you should be aware that not all websites will contain these keywords. For instance, a lot of websites are branded and may use different names. Therefore, whilst the domain name filters will increase the relevance of your scraped results, they will also trim the total number of results down to domains that contain your keywords. It is advisable to think very carefully about your strategy and industry.

Coming back to our previous example, https://justcbdstore.com and https://chillhempire.com contain our keywords and would be captured. But https://theeliquidboutique.co.uk would not be captured, because its URL contains neither keyword. Bear in mind that some websites may contain data about your business niche but may not be from your niche.
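To make the effect of the first column concrete, here is a minimal Python sketch of a "must contain" keyword filter. It is only an illustration, not the software's actual code: the function name `contains_any_keyword` is hypothetical, and the sketch assumes, based on the example above, that a URL is kept when it contains at least one of the keywords and that matching ignores letter case.

```python
def contains_any_keyword(url: str, keywords: list[str]) -> bool:
    """Return True if at least one keyword appears anywhere in the URL.

    Assumption: matching is a case-insensitive substring check.
    """
    url_lower = url.lower()
    return any(keyword.lower() in url_lower for keyword in keywords)


# Keywords from the first column of the Domain Filters settings.
must_contain = ["cbd", "hemp"]

urls = [
    "https://justcbdstore.com",
    "https://chillhempire.com",
    "https://theeliquidboutique.co.uk",
]

# Keep only the URLs that contain at least one keyword.
kept = [url for url in urls if contains_any_keyword(url, must_contain)]
print(kept)
```

Under these assumptions, the first two URLs are kept and the branded third one is dropped, which is the trade-off between relevance and total number of results described above.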

Inside the second column, you can enter a list of keywords that a URL must not contain. The web scraper will skip any domain whose URL contains one of these keywords.
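The second column works the other way round, and can be pictured with a short Python sketch. Again, this is only an assumed illustration of the behaviour, not the software's own code: the function name `contains_blocked_keyword`, the example keywords forum and wiki, and the two example.com and example.org URLs are all made up for the demonstration.

```python
def contains_blocked_keyword(url: str, blocked_keywords: list[str]) -> bool:
    """Return True if the URL contains any keyword it must not contain.

    Assumption: matching is a case-insensitive substring check.
    """
    url_lower = url.lower()
    return any(keyword.lower() in url_lower for keyword in blocked_keywords)


# Hypothetical entries for the second column.
must_not_contain = ["forum", "wiki"]

urls = [
    "https://justcbdstore.com",
    "https://cbd-forum.example.com",  # made-up URL
    "https://hempwiki.example.org",   # made-up URL
]

# Skip every URL that contains a blocked keyword.
remaining = [
    url for url in urls
    if not contains_blocked_keyword(url, must_not_contain)
]
print(remaining)
```

Note that both made-up URLs contain cbd or hemp, so they would pass a "must contain" check on their own. The "must not contain" list is what removes them.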

In the third column, you can enter websites that you would like the website scraping software to skip. These could include large marketplaces such as eBay and Amazon, as well as known spam and malware sites, PBNs, news sites, magazines and so on. Likewise, you can exclude all websites on your unsubscribe list.
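A website blacklist differs from a keyword filter in that it targets whole sites rather than fragments of a URL. The Python sketch below shows one way such a check could work. It is an assumption for illustration only: the function name `is_blacklisted`, the choice to also block subdomains such as www, and the example listing paths are not taken from the software.

```python
from urllib.parse import urlparse


def is_blacklisted(url: str, blacklist: set[str]) -> bool:
    """Return True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in blacklist
    )


# Example entries for the third column (large marketplaces).
blacklist = {"ebay.com", "amazon.com"}

urls = [
    "https://www.amazon.com/some-listing",  # made-up path
    "https://ebay.com/itm/123",             # made-up path
    "https://justcbdstore.com",
]

# Keep only the URLs whose site is not on the blacklist.
allowed = [url for url in urls if not is_blacklisted(url, blacklist)]
print(allowed)
```

Comparing against the host name rather than the full URL means a site is only skipped when it really is the blacklisted domain, not when the domain name merely appears somewhere in another site's URL.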
