Browsing tag: Crawler

Google Open Sources Its ‘Web Crawler’ After 20 Years

Google’s Robots Exclusion Protocol (REP), better known as robots.txt, is a standard many websites use to tell automated crawlers which parts of a site may be crawled and which may not. However, it has never been officially adopted as an internet standard, which has led to differing interpretations. In a bid to make REP an official web standard, Google has open-sourced […]
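As a quick illustration of what REP specifies: a robots.txt file is just a plain-text list of per-user-agent Allow/Disallow rules, and Python's standard library ships a parser for it. A minimal sketch follows; the site URL and user-agent name are placeholders, not anything from the article:

```python
# A robots.txt file is a plain-text set of rules, e.g.:
#   User-agent: *
#   Disallow: /private/
# Python's standard library can fetch and evaluate one directly.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder site
rp.read()  # download and parse the file

# can_fetch() answers: may this user-agent crawl this URL?
print(rp.can_fetch("MyCrawler", "https://www.example.com/private/page"))
```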

ACHE – A Web Crawler For Domain-Specific Search

ACHE is a focused web crawler. It collects web pages that satisfy specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier […]
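The page-classifier idea generalizes beyond ACHE: a focused crawler only follows links found on pages the classifier judges relevant. Below is a minimal sketch of that loop; it is not ACHE's code, and is_relevant, fetch, and extract_links are hypothetical stand-ins for a trained classifier and user-supplied fetch/parse helpers:

```python
# Minimal sketch of a focused-crawl loop (illustrative, not ACHE's code).
from collections import deque

def is_relevant(page_text):
    # Stand-in for a trained page classifier; here, a naive keyword check.
    return "user-specified pattern" in page_text.lower()

def focused_crawl(seeds, fetch, extract_links, max_pages=100):
    frontier = deque(seeds)
    seen = set(seeds)
    relevant = []
    while frontier and len(relevant) < max_pages:
        url = frontier.popleft()
        page = fetch(url)  # user-supplied: download the page, or None on error
        if page is None or not is_relevant(page):
            continue  # links on irrelevant pages are not followed
        relevant.append(url)
        for link in extract_links(page, url):  # user-supplied link extractor
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return relevant
```

In ACHE itself the classifier is typically trained rather than hard-coded, which is what lets the crawl stay inside a target domain.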

Dirhunt v0.6.0 – Find Web Directories Without Bruteforce

DEVELOPMENT BRANCH: The current branch is a development version. Go to the stable release by clicking on the master branch. Dirhunt is a web crawler optimized for searching and analyzing directories. This tool can find interesting things if the server has the “index of” mode enabled. Dirhunt is also useful if the directory listing is […]
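Dirhunt's central observation, that servers with directory listing enabled serve telltale “Index of” pages, is easy to sketch. The following standard-library check is illustrative only; the URL and marker string are assumptions, and Dirhunt itself performs far richer analysis:

```python
# Detect an open directory listing ("Index of" page) - an illustrative check only.
import urllib.request

def has_directory_listing(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read(65536).decode("utf-8", errors="replace")
    except OSError:
        return False
    # Apache/nginx autoindex pages typically carry this title text.
    return "Index of /" in html

print(has_directory_listing("http://example.com/static/"))  # placeholder URL
```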

EKFiddle v0.8.2 – A Framework Based On The Fiddler Web Debugger To Study Exploit Kits, Malvertising And Malicious Traffic In General

A framework based on the Fiddler web debugger to study Exploit Kits, malvertising and malicious traffic in general.

Installation:
1. Download and install the latest version of Fiddler: https://www.telerik.com/fiddler (special instructions for Linux and Mac here: https://www.telerik.com/blogs/fiddler-for-linux-beta-is-here and https://www.telerik.com/blogs/introducing-fiddler-for-os-x-beta-1).
2. Enable C# scripting (Windows only): launch Fiddler and go to Tools -> Options; in the Scripting tab, change the default (JScript.NET) […]

Paskto – Passive Web Scanner

Paskto will passively scan the web using the Common Crawl internet index, either by downloading the indexes on request or by parsing data from your local system. URLs are then processed through Nikto and known-URL lists to identify interesting content. Hash signatures are also used to identify known default content for some IoT devices or […]
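The Common Crawl index Paskto draws on is publicly queryable over HTTP through the CDX index API. A rough standard-library sketch follows; the collection name is just an example (a new one is published with every crawl), and this is not Paskto's own code:

```python
# Query the Common Crawl CDX index for URLs under a domain (illustrative).
import json
import urllib.parse
import urllib.request

def cc_index_urls(domain, collection="CC-MAIN-2018-05"):  # collection name is an example
    query = urllib.parse.urlencode({"url": domain + "/*", "output": "json"})
    api = "https://index.commoncrawl.org/{}-index?{}".format(collection, query)
    with urllib.request.urlopen(api, timeout=30) as resp:
        for line in resp:  # the API returns one JSON record per line
            yield json.loads(line)["url"]

for url in cc_index_urls("example.com"):  # placeholder domain
    print(url)
```

Because everything comes from an existing index, a tool built this way never touches the target itself, which is what makes the scan passive.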

How to Build a Basic Web Crawler in Python

Short Bytes: A web crawler is a program that browses the Internet (the World Wide Web) in a predetermined, configurable, and automated manner and performs a given action on the crawled content. Search engines like Google and Yahoo use spidering to provide up-to-date data. Webhose.io, a company which provides direct access to live data from hundreds of […]
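Stripped to its essentials, such a crawler is a fetch queue plus a link extractor. Here is a minimal, standard-library sketch of a breadth-first crawler; the seed URL is a placeholder, and a real crawler should also honor robots.txt and rate-limit its requests:

```python
# A very basic breadth-first web crawler using only the standard library.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    frontier, seen, fetched = deque([seed]), {seed}, 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        fetched += 1
        print(url)  # the "given action": here, just print each crawled URL
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link.startswith("http") and link not in seen:
                seen.add(link)
                frontier.append(link)

crawl("https://example.com/")  # placeholder seed URL
```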