DRKSpiderJava is a website crawler stand-alone tool for finding broken links and inspecting a website structure. It builds a tree representing the hierarchical page distribution inside the site. Analyzing every link found, including those which point to another domain. Crawling is limited by external links, a maximum depth level given by the user, URL exclusion list, and the optional setting for obeying robots.txt site definition. DRKSpiderJava can keep site's content in memory (optional) for doing global searches in content. Once the spider finishes its work, the user can export a sitemap, a list of all links and the list of broken links. Every list provides a different amount of information according to the common usage. The output can be XML, plain text or CSV files.
When a node in the tree is selected, DRKSpiderJava displays contextual information in the right panel. For HTML nodes there is a detailed set of items about document metadata, along with the list of links found. Link information is divided into pieces like: link tag, URL, anchor text, nofollow status, link status code, and depth. Clicking over a link in the list selects the text in the document source. This makes easy locating and fixing problems.
There are two advanced search dialogs: one for link searching and other for nodes. The difference between links and nodes is related to the user interface and DRKSpiderJava internal matters. Links are those links shown in the HTML panel, and nodes are items in the navigation tree. There is just one node for every URL in the website. But (usually) there are many links pointing to the same URL; all the links pointing to the same URL resolve to the same node, and therefore have the same HTTP status code.
This is the first HD video tutorial for DRKSpiderJava. Use full screen mode for better readability
This video was made using CamStudio - Free Streaming Video Desktop Recording Software
The link search tool allows searching by tag, URL, anchor text, nofollow property, internal/external and error status. Results display over a detailed table which can be exported to CSV file. The node search tool filters by title, description, keywords, author and robots meta tag. The result window can locate the node in the tree.
The crawler can be configured for using from 1 to 15 threads. Using a single thread helps building the tree in the order the links (repeated links) appear while crawling. Using two threads is faster and respects the HTTP protocol recomendation: not opening more than two connections to the same server. Using three or more thread increases speed a lot but could be a little aggresive for the target server.
The SEO tool does a basic analysis of title, description and keywords matching the page content. A 100% score means all the words in these fields were found in page content. It doesn't detect over optimization. This feature will be improved as time goes on. But keep in mind that DRKSpiderJava isn't mean to be a SEO tool. The SEO analysis is there to help the user make the site more search engine friendly.
DRKSpiderJAVA is a open source software made in Java, which means you can use it, modify it and distribute it to your friends, freely. And it's multiplatform; you can run it on Windows, Linux, Mac or any other platform supporting Java SE virtual machine.
In order to run DRKSpiderJava you have to download and install Java 1.6 or over. The binary distribution is portable and self contained into a single directory. Any comments, feedback or questions are welcome: Feedback form.
DRKSpiderJava.zip - 1.5 MBytes - August 11, 2014
DRKSpiderJava.tar.gz - 1.5 MBytes - August 11, 2014
Download the package of your choice, extract and double click on drkspiderjava.jar, or execute the script for your platform. CMD for Windows and SH for Linux or MacOSX.
DRKSpiderJavaSrc.7z - 1.3 MBytes - July 22, 2014
DRKSpider is licensed under the GNU General Public License
DRKSpider_setup.exe - 1.1 MBytes - August 26, 2013