DRKSpiderJava is a stand-alone website crawler for finding broken links and inspecting a website's structure. It builds a tree representing the hierarchical distribution of pages inside the site, analyzing every link found, including those that point to another domain. Crawling stops at external links, at a maximum depth level given by the user, and can optionally obey the site's robots.txt definitions. DRKSpiderJava can optionally keep the site's content in memory for global searching. Once the spider finishes its work, the user can export a sitemap, a list of all links, and a list of broken links. Each list provides a different amount of information according to common usage. The output can be plain text or CSV files.
When a node in the tree is selected, DRKSpiderJava displays contextual information in the right panel. For HTML nodes there is a detailed set of items about document metadata, along with the list of links found. Link information includes the link tag, URL, anchor text, nofollow status, link status code, and depth. Clicking a link in the list selects the corresponding text in the document source, which makes it easy to locate and fix problems.
There are two advanced search dialogs: one for links and another for nodes. The difference between links and nodes relates to the user interface and to DRKSpiderJava's internals. Links are the links shown in the HTML panel, while nodes are items in the navigation tree. There is exactly one node for every URL in the website, but (usually) there are many links pointing to the same URL; all the links pointing to the same URL resolve to the same node, and therefore share the same HTTP status code.
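The one-node-per-URL relationship can be illustrated with a small sketch (class and method names here are hypothetical, not DRKSpiderJava's actual internals): a map keyed by URL guarantees that every link pointing at the same address resolves to a single node, which carries the one HTTP status code they all share.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these are not DRKSpiderJava's real classes.
public class NodeRegistry {

    // One node per unique URL in the website.
    public static class Node {
        public final String url;
        public int statusCode; // shared by every link that targets this URL
        Node(String url) { this.url = url; }
    }

    // Many links may reference the same node.
    public static class Link {
        public final String anchorText;
        public final Node target;
        Link(String anchorText, Node target) {
            this.anchorText = anchorText;
            this.target = target;
        }
    }

    private final Map<String, Node> nodesByUrl = new HashMap<>();
    private final List<Link> links = new ArrayList<>();

    // Every link with the same URL resolves to the same Node instance.
    public Link addLink(String anchorText, String url) {
        Node node = nodesByUrl.computeIfAbsent(url, Node::new);
        Link link = new Link(anchorText, node);
        links.add(link);
        return link;
    }

    public int nodeCount() { return nodesByUrl.size(); }
    public int linkCount() { return links.size(); }
}
```

Because both links share one node, checking that URL once is enough to know the status of every link pointing at it.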
This is the first HD video tutorial for DRKSpiderJava. Use full-screen mode for better readability.
This video was made using CamStudio - Free Streaming Video Desktop Recording Software
The link search tool allows searching by tag, URL, anchor text, nofollow property, internal/external status, and error status. Results are displayed in a detailed table that can be exported to a CSV file. The node search tool filters by title, description, keywords, author, and robots meta tag. The result window can locate the node in the tree.
The crawler uses a single thread. While this makes the process slower, it ensures the navigation tree is built in the order links appear in the website, and it allows accurate link-depth computation. The crawl result can be saved to a file. After reloading a tree, it is possible to recheck only the broken links, or the whole tree.
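The single-threaded, order-preserving crawl with depth tracking can be sketched as a plain breadth-first traversal. In this sketch (names and structure are assumptions for illustration), the "site" is an in-memory map from each URL to its outgoing links; the real tool would fetch pages over HTTP instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Single-threaded breadth-first crawl sketch (illustrative, not the
// tool's actual code). Returns each discovered URL with its depth.
public class DepthCrawler {

    public static Map<String, Integer> crawl(
            Map<String, List<String>> site, String start, int maxDepth) {
        // LinkedHashMap preserves discovery order, mirroring how a single
        // thread builds the tree in the order links appear on the pages.
        Map<String, Integer> depthByUrl = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depthByUrl.put(start, 0);
        queue.add(start);

        while (!queue.isEmpty()) {
            String url = queue.removeFirst();
            int depth = depthByUrl.get(url);
            if (depth >= maxDepth) continue; // user-defined depth limit
            for (String next : site.getOrDefault(url, List.of())) {
                // First discovery wins, which yields the minimum link depth.
                if (!depthByUrl.containsKey(next)) {
                    depthByUrl.put(next, depth + 1);
                    queue.addLast(next);
                }
            }
        }
        return depthByUrl;
    }
}
```

Breadth-first order is what makes the depth value accurate: a page reachable from the start page is always recorded at its shortest link distance.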
The SEO tool performs a basic analysis of how well the title, description, and keywords match the page content. A 100% score means all the words in these fields were found in the page content. It does not detect over-optimization. This feature will be improved over time, but keep in mind that DRKSpiderJava isn't meant to be an SEO tool; the SEO analysis is there to help the user make the site more search-engine friendly.
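The 100%-match idea can be sketched as a simple word-overlap score: the fraction of distinct words in a metadata field that also appear in the page content. This is an assumption about the computation for illustration, not DRKSpiderJava's actual formula.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Word-overlap sketch: fraction of distinct words in a metadata field
// (title, description, or keywords) found in the page content.
// Assumed scoring for illustration, not DRKSpiderJava's real code.
public class SeoScore {

    // Split text into a set of distinct lowercase words.
    private static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Returns a percentage in [0, 100].
    public static double score(String field, String content) {
        Set<String> fieldWords = words(field);
        if (fieldWords.isEmpty()) return 0.0;
        Set<String> contentWords = words(content);
        long found = fieldWords.stream().filter(contentWords::contains).count();
        return 100.0 * found / fieldWords.size();
    }
}
```

Under this sketch, a title whose every word appears in the body scores 100%, while a title full of words absent from the page scores low, hinting that the metadata and content are out of sync.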
DRKSpiderJava is open-source software written in Java, which means you can use it, modify it, and distribute it to your friends, freely. And it's multiplatform: you can run it on Windows, Linux, Mac, or any other platform supporting the Java SE virtual machine.
In order to run DRKSpiderJava you have to download and install Java 1.6 or later. The binary distribution is portable and self-contained in a single directory. Any comments, feedback, or questions are welcome: Feedback form.
DRKSpiderJava.zip - 1.5 MBytes - July 30, 2014
Download and unzip, then execute drkspider.cmd or drkspider.sh
DRKSpiderJavaSrc.7z - 1.3 MBytes - July 22, 2014
DRKSpider_setup.exe - 1.1 MBytes - August 26, 2013
DRKSpider-src.7z - 80.1 kBytes - August 26, 2013
DRKSpider is licensed under the GNU General Public License
SWFRIP add-in is published under the GNU General Public License
ZLIB library is published under a proprietary license