Made By Hrushikesh Shinde

ReconSpider: HTB Web Enumeration Tool Guide (2026)

Learn how to install and use ReconSpider for web enumeration on HackTheBox. Covers setup, JSON output analysis, and practical recon workflow for pentesters.

Published on: February 19, 2026
Last Modified: March 04, 2026
Reading Time: 8 min read

TL;DR

ReconSpider is a Python-based web enumeration tool built by Hack The Box that crawls a target domain and writes structured reconnaissance data to a results.json file. Its standout capability is HTML comment extraction, a recon signal most tools skip entirely and one that frequently surfaces hidden credentials and developer notes in HTB challenges. Setup takes under five minutes, with Python and Scrapy as the only dependencies.


What Is ReconSpider?

ReconSpider is a web reconnaissance automation tool built by Hack The Box for use in authorized security assessments and HTB Academy labs. It crawls a target URL using Scrapy under the hood and outputs a structured JSON file containing every web-layer asset it discovers — emails, internal and external links, JavaScript files, PDFs, images, form fields, and HTML source comments.

The key reason to add it to your workflow: most recon tools map ports or brute-force directories. ReconSpider maps the content layer — what the application is exposing through its own HTML and resources. HTML comment extraction in particular is underused by most practitioners, and HTB challenge designers know it.

Type: Web content enumeration and asset extraction
Built by: Hack The Box
Best use: First-pass web recon to map assets, links, and hidden content
Not for: Port scanning, directory brute-forcing, vulnerability exploitation
Typical users: HTB players, penetration testers, bug bounty researchers

Prerequisites

Before downloading ReconSpider, confirm your environment meets two requirements.

Python 3.7 or higher:

python3 --version
# Must return Python 3.7.x or above

Scrapy (ReconSpider's crawling engine):

pip3 install scrapy

If Scrapy is already installed, skip directly to the download step. No other dependencies are required.
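
Both prerequisites can also be checked from Python in one go. The following is a minimal sketch, not part of ReconSpider itself; the function name check_environment is hypothetical, and it only tests whether a module named scrapy can be located, not whether the installed version is compatible.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this guide


def check_environment(version_info=sys.version_info, module_name="scrapy"):
    """Return (python_ok, module_found) for the two prerequisites."""
    python_ok = tuple(version_info[:2]) >= MIN_PYTHON
    # find_spec returns None when a top-level module cannot be located
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok, module_found


if __name__ == "__main__":
    python_ok, scrapy_found = check_environment()
    print(f"Python >= 3.7: {python_ok}")
    print(f"Scrapy importable: {scrapy_found}")
```

If either value prints False, fix that prerequisite before moving on to the download step.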


Installation

Official HTB Download

# Step 1: Download the zip from HTB Academy
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
 
# Step 2: Unzip
unzip ReconSpider.zip
 

If the wget URL returns a 404 or times out, use the community GitHub mirror instead: the ReconSpider-HTB GitHub repository (github.com/HowdoComputer/ReconSpider-HTB). Download the repository as a ZIP, unzip it, and cd into the extracted folder, then continue with the Running ReconSpider section below.


Running ReconSpider

Basic usage

python3 ReconSpider.py http://testfire.net

Replace http://testfire.net with your authorized target. The example uses http://testfire.net for demonstration only, because it is a publicly available, intentionally vulnerable website. ReconSpider will crawl the domain and save the results to results.json in the same directory.

You should see Scrapy's crawl log output in the terminal: request counts, item counts, and a completion message. The crawl depth and speed depend on the target site's size.

Reading the output

cat results.json

The terminal displays a formatted JSON object. Each key contains an array of discovered items. A site with active content will show populated emails, links, js_files, and comments arrays.
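
Before reading the raw file line by line, a per-key item count shows which arrays are worth opening first. This is a small sketch rather than anything shipped with ReconSpider: the helper names summarize and load_results are hypothetical, and the file name results.json is assumed to match the one used by the filter commands in this guide.

```python
import json


def summarize(results):
    """Map each top-level key to the number of items found under it."""
    return {key: len(items) for key, items in results.items()}


def load_results(path="results.json"):
    """Read the crawl output from disk (path is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    try:
        counts = summarize(load_results())
    except FileNotFoundError:
        print("results.json not found; run ReconSpider first")
    else:
        for key, count in counts.items():
            print(f"{key:15} {count}")
```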


Understanding the results.json Output

ReconSpider organizes all findings into a single JSON file with nine keys. Here is the full output structure from a real crawl:

{
    "emails": [],
    "links": [
        "http://testfire.net/index.jsp?content=privacy.htm",
        "https://github.com/AppSecDev/AltoroJ/",
        "http://testfire.net/disclaimer.htm?url=http://www.microsoft.com",
        "http://testfire.net/Privacypolicy.jsp?sec=Careers&template=US",
        "http://testfire.net/index.jsp?content=security.htm",
        "http://testfire.net/index.jsp?content=business_retirement.htm",
        "http://testfire.net/swagger/index.html",
        "http://testfire.net/default.jsp?content=security.htm",
        "http://testfire.net/index.jsp?content=business_insurance.htm",
        "http://testfire.net/index.jsp?content=pr/20061109.htm",
        "http://testfire.net/index.jsp?content=inside_internships.htm",
        "http://testfire.net/index.jsp?content=inside_jobs.htm&job=Teller:ConsumaerBanking",
        "http://testfire.net/index.jsp",
        "http://testfire.net/index.jsp?content=inside_community.htm",
        "http://testfire.net/index.jsp?content=inside_jobs.htm&job=ExecutiveAssistant:Administration",
        "http://testfire.net/survey_questions.jsp?step=email",
        "http://testfire.net/inside_points_of_interest.htm",
        "http://testfire.net/survey_questions.jsp",
        "http://testfire.net/index.jsp?content=personal_savings.htm",
        "http://testfire.net/index.jsp?content=inside_executives.htm",
        "http://testfire.net/survey_questions.jsp?step=a",
        "http://testfire.net/subscribe.jsp",
        "http://testfire.net/index.jsp?content=personal_other.htm",
        "http://testfire.net/disclaimer.htm?url=http://www.netscape.com",
        "http://testfire.net/login.jsp",
        "http://testfire.net/index.jsp?content=inside_investor.htm",
        "http://testfire.net/index.jsp?content=business_deposit.htm",
        "http://testfire.net/index.jsp?content=pr/20060928.htm",
        "http://testfire.net/index.jsp?content=pr/20060817.htm",
        "http://www.cert.org/",
        "http://testfire.net/index.jsp?content=inside_trainee.htm",
        "http://www.adobe.com/products/acrobat/readstep2.html",
        "http://testfire.net/index.jsp?content=pr/20060720.htm",
        "http://testfire.net/index.jsp?content=personal_checking.htm",
        "http://testfire.net/index.jsp?content=security.htm#top",
        "http://testfire.net/index.jsp?content=pr/20061005.htm",
        "http://testfire.net/index.jsp?content=business_lending.htm",
        "http://testfire.net/high_yield_investments.htm",
        "http://testfire.net/index.jsp?content=business_cards.htm",
        "http://testfire.net/index.jsp?content=business.htm",
        "http://testfire.net/index.jsp?content=inside_about.htm",
        "http://testfire.net/index.jsp?content=inside_volunteering.htm#gift",
        "http://testfire.net/Documents/JohnSmith/VoluteeringInformation.pdf",
        "http://testfire.net/pr/communityannualreport.pdf",
        "http://testfire.net/index.jsp?content=inside_jobs.htm&job=LoyaltyMarketingProgramManager:Marketing",
        "http://testfire.net/index.jsp?content=inside_contact.htm",
        "http://testfire.net/my%20documents/JohnSmith/Bank%20Site%20Documents/grouplife.htm",
        "http://testfire.net/admin/clients.xls",
        "http://www.watchfire.com/statements/terms.aspx",
        "http://www.newspapersyndications.tv",
        "https://www.hcl-software.com/appscan/",
        "http://testfire.net/index.jsp?content=personal_loans.htm",
        "http://testfire.net/index.jsp?content=inside_press.htm",
        "http://testfire.net/index.jsp?content=inside_contact.htm#ContactUs",
        "http://testfire.net/index.jsp?content=pr/20060518.htm",
        "http://testfire.net/index.jsp?content=inside_jobs.htm&job=MortgageLendingAccountExecutive:Sales",
        "http://testfire.net/survey_questions.jsp?step=d",
        "http://testfire.net/index.jsp?content=personal_cards.htm",
        "http://testfire.net/survey_questions.jsp?step=b",
        "http://testfire.net/cgi.exe",
        "http://testfire.net/index.jsp?content=pr/20060413.htm",
        "http://testfire.net/index.jsp?content=inside_jobs.htm&job=CustomerServiceRepresentative:CustomerService",
        "http://testfire.net/feedback.jsp",
        "http://testfire.net/index.jsp?content=pr/20060921.htm",
        "http://testfire.net/index.jsp?content=inside_volunteering.htm",
        "http://testfire.net/index.jsp?content=inside_benefits.htm",
        "http://testfire.net/index.jsp?content=inside_volunteering.htm#time",
        "http://testfire.net/index.jsp?content=personal_deposit.htm",
        "http://testfire.net/security.htm",
        "http://testfire.net/index.jsp?content=personal.htm",
        "http://testfire.net/index.jsp?content=inside_jobs.htm&job=OperationalRiskManager:RiskManagement",
        "http://testfire.net/default.jsp",
        "http://testfire.net/index.jsp?content=personal_investments.htm",
        "http://testfire.net/status_check.jsp",
        "http://testfire.net/index.jsp?content=business_other.htm",
        "http://testfire.net/index.jsp?content=inside_jobs.htm",
        "http://testfire.net/survey_questions.jsp?step=c",
        "http://testfire.net/index.jsp?content=inside.htm",
        "http://testfire.net/index.jsp?content=inside_careers.htm"
    ],
    "external_files": [
        "http://testfire.net/css",
        "http://testfire.net/xls",
        "http://testfire.net/pdf",
        "http://testfire.net/pr/communityannualreport.pdf",
        "http://testfire.net/swagger/css"
    ],
    "js_files": [
        "http://testfire.net/swagger/swagger-ui-bundle.js",
        "http://demo-analytics.testfire.net/urchin.js",
        "http://testfire.net/swagger/swagger-ui-standalone-preset.js"
    ],
    "form_fields": [
        "email_addr",
        "cfile",
        "btnSubmit",
        "uid",
        "submit",
        "query",
        "subject",
        "comments",
        "step",
        "reset",
        "name",
        "passw",
        "txtEmail",
        "email"
    ],
    "images": [
        "http://testfire.net/images/icon_top.gif",
        "http://testfire.net/images/b_lending.jpg",
        "http://testfire.net/images/cancel.gif",
        "http://www.exampledomainnotinuse.org/mybeacon.gif",
        "http://testfire.net/images/altoro.gif",
        "http://testfire.net/images/b_main.jpg",
        "http://testfire.net/images/inside7.jpg",
        "http://testfire.net/images/p_other.jpg",
        "http://testfire.net/images/p_cards.jpg",
        "http://testfire.net/images/logo.gif",
        "http://testfire.net/images/b_insurance.jpg",
        "http://testfire.net/images/inside1.jpg",
        "http://testfire.net/images/p_main.jpg",
        "http://testfire.net/images/inside5.jpg",
        "http://testfire.net/feedback.jsp",
        "http://testfire.net/images/home1.jpg",
        "http://testfire.net/images/inside3.jpg",
        "http://testfire.net/images/adobe.gif",
        "http://testfire.net/images/p_deposit.jpg",
        "http://testfire.net/images/ok.gif",
        "http://testfire.net/images/b_other.jpg",
        "http://testfire.net/images/home2.jpg",
        "http://testfire.net/images/inside4.jpg",
        "http://testfire.net/images/pf_lock.gif",
        "http://testfire.net/images/p_investments.jpg",
        "http://testfire.net/images/spacer.gif",
        "http://testfire.net/images/inside6.jpg",
        "http://testfire.net/images/b_deposit.jpg",
        "http://testfire.net/images/header_pic.jpg",
        "http://testfire.net/images/home3.jpg",
        "http://testfire.net/images/b_cards.jpg",
        "http://testfire.net/images/p_loans.jpg",
        "http://testfire.net/images/p_checking.jpg"
    ],
    "videos": [],
    "audio": [],
    "comments": [
        "<!-- Keywords:Altoro Mutual, business succession, wealth management, international trade services, mergers, acquisitions -->",
        "<!-- HTML for static distribution bundle build -->",
        "<!-- Keywords:Altoro Mutual, student internships, student co-op -->",
        "<!-- Keywords:Altoro Mutual -->",
        "<!-- Keywords:Altoro Mutual, security, security, security, we provide security, secure online banking -->",
        "<!-- Keywords:Altoro Mutual, disability insurance, insurince, life insurance -->",
        "<!-- Keywords:Altoro Mutual, executives, board of directors -->",
        "<!-- Keywords:Altoro Mutual, brokerage services, retirement, insurance, private banking, wealth and tax services -->",
        "<!-- TOC END -->",
        "<!-- Keywords:Altoro Mutual, job openings, benefits, student internships, management trainee programs -->",
        "<!-- Keywords:Altoro Mutual, management trainess, Careers, advancement -->",
        "<!-- Keywords:Altoro Mutual, Altoro Private Bank, Altoro Wealth and Tax -->",
        "<!-- Keywords:Altoro Mutual, privacy, information collection, safeguards, data usage -->",
        "<!-- Keywords:Altoro Mutual, stocks, stock quotes -->",
        "<!-- Keywords:Altoro Mutual, employee volunteering -->",
        "<!-- Keywords:Altoro Mutual, personal checking, checking platinum, checking gold, checking silver, checking bronze -->",
        "<!-- Keywords:Altoro Mutual, online banking, banking, checking, savings, accounts -->",
        "<!-- Keywords:Altoro Mutual, platinum card, gold card, silver card, bronze card, student credit -->",
        "<!-- Keywords:Altoro Mutual, deposit products, personal deposits -->",
        "<!-- Keywords:Altoro Mutual, press releases, media, news, events, public relations -->",
        "<!-- Keywords:Altoro Mutual, benefits, child-care, flexible time, health club, company discounts, paid vacations -->",
        "<!-- Keywords:Altoro Mutual, online banking, contact information, subscriptions -->",
        "<!-- BEGIN FOOTER -->",
        "<!--- Dave- Hard code this into the final script - Possible security problem.\n\t\t  Re-generated every Tuesday and old files are saved to .bak format at L:\\backup\\website\\oldfiles    --->",
        "<!-- Keywords:Altoro Mutual, auto loans, boat loans, lines of credit, home equity, mortgage loans, student loans -->",
        "<!-- Keywords:Altoro Mutual, careers, opportunities, jobs, management -->",
        "<!-- BEGIN HEADER -->",
        "<!-- END HEADER -->",
        "<!-- Keywords:Altoro Mutual, deposit products, lending, credit cards, insurance, retirement -->",
        "<!-- Keywords:Altoro Mutual, personal deposit, personal checking, personal loans, personal cards, personal investments -->",
        "<!-- Keywords:Altoro Mutual, community events, volunteering -->",
        "<!-- TOC BEGIN -->",
        "<!-- Keywords:Altoro Mutual Press Release -->",
        "<!-- END FOOTER -->",
        "<!-- Keywords:Altoro Mutual, real estate loans, small business loands, small business loands, equipment leasing, credit line -->",
        "<!-- To get the latest admin login, please contact SiteOps at 415-555-6159 -->",
        "<!-- Keywords:Altoro Mutual, credit cards, platinum cards, premium credit -->"
    ]
}

Each key maps to a distinct category of discovered data:

JSON Key | What it contains | Why it matters in recon
emails | Email addresses found on the domain | Staff enumeration, phishing surface, username patterns
links | Internal and external URLs | Maps application structure, reveals third-party dependencies
external_files | PDFs, docs, and downloadable files | Often contain metadata, internal paths, or sensitive content
js_files | JavaScript file URLs | Can reveal API endpoints, secret keys, and client-side logic
form_fields | Input field names from forms | Attack surface for injection, parameter discovery
images | Image URLs | Occasionally contain embedded metadata (EXIF)
videos | Video file URLs | Rarely populated but worth checking in media-heavy apps
audio | Audio file URLs | Rarely populated
comments | Raw HTML comment strings | Highest signal for HTB: developers leave credentials, debug notes, and versioning hints here
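
The links and external_files arrays are long, so it helps to pull out the downloadable file types before reading everything by hand. The sketch below groups URLs by file extension. The extension list and the name group_by_extension are assumptions made for illustration, not part of ReconSpider.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative extension list, an assumption; tune it per engagement
INTERESTING = {".pdf", ".xls", ".xlsx", ".doc", ".docx", ".bak", ".zip", ".exe"}


def group_by_extension(urls, wanted=INTERESTING):
    """Group URLs by the extension of their path, keeping only wanted ones."""
    groups = defaultdict(list)
    for url in urls:
        # urlsplit separates the path from the query string, so
        # "index.jsp?content=x.pdf" is treated as .jsp, not .pdf
        suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
        if suffix in wanted:
            groups[suffix].append(url)
    return dict(groups)
```

Run against the sample crawl above, this would surface entries such as /admin/clients.xls and /cgi.exe that are easy to miss among the index.jsp links.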

Why HTML Comments Are the Most Valuable Output

The comments key is the reason ReconSpider earns a permanent place in any HTB web recon workflow.

HTML comments (<!-- ... -->) are invisible to end users in the browser but present in raw page source. Developers routinely leave behind:

  • Commented-out login credentials from testing
  • Internal hostnames and file paths
  • Version strings that reveal vulnerable software
  • Debug notes that describe application behavior
  • Disabled features that hint at hidden functionality

Most automated scanners and directory fuzzers never touch HTML comment content. ReconSpider extracts it in every crawl, structured and ready to grep.

# Print only the comments from results.json using Python
python3 -c "import json; data=json.load(open('results.json')); [print(c) for c in data['comments']]"

Scan the output for anything that looks like a credential pattern, a hostname, a version number, or a path that doesn't appear in your visible sitemap.
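
That manual scan can be narrowed with a first-pass filter. The sketch below drops the SEO "Keywords:" comments that dominate the sample crawl and keeps comments matching a few signal patterns. The pattern list and the name triage_comments are hypothetical and far from exhaustive, so treat the result as a shortlist and still read the rest.

```python
import re

# Hypothetical signal patterns; an assumption for illustration only
SIGNALS = re.compile(
    r"passw|login|admin|secret|token|backup|\.bak|hard.?code"
    r"|[A-Za-z]:\\|\b\d{3}-\d{3}-\d{4}\b",
    re.IGNORECASE,
)
# SEO keyword comments are frequent in the sample output and rarely useful
NOISE = re.compile(r"^<!--\s*Keywords:", re.IGNORECASE)


def triage_comments(comments):
    """Return comments that match a signal pattern and are not keyword noise."""
    return [c for c in comments if not NOISE.search(c) and SIGNALS.search(c)]
```

On the testfire.net output shown earlier, a filter like this would keep the "Dave- Hard code this" note and the "latest admin login" comment while discarding the keyword and layout markers.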


ReconSpider in a Pentest Workflow

ReconSpider belongs at the start of web-layer recon, before active scanning or exploitation.

1. Confirm scope and authorization

2. Run ReconSpider → generates results.json

3. Triage results.json

  • emails → build username list for brute-force
  • js_files → manually review for API keys and endpoints
  • external_files → download and extract metadata
  • comments → manually review for credentials and hints

4. Feed findings into next-layer tools

  • Gobuster / ffuf → directory brute-force discovered paths
  • Nmap → port scan discovered subdomains
  • Burp Suite → proxy and test discovered endpoints

5. Document all findings with timestamps
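
Two of the triage steps above turn directly into input for the next-layer tools: emails become a username list, and discovered links become a path list for Gobuster or ffuf. The following is a rough sketch under stated assumptions: the function names are hypothetical, the example.com addresses in the usage note are placeholders, and usernames are taken naively from the local part of each address.

```python
from urllib.parse import urlsplit


def usernames_from_emails(emails):
    """Take the local part of each address as a candidate username."""
    return sorted({e.split("@", 1)[0].lower() for e in emails if "@" in e})


def paths_for_host(links, host):
    """Collect unique URL paths that belong to the given in-scope host."""
    paths = set()
    for link in links:
        parts = urlsplit(link)
        # Skip other hosts and bare site roots; query strings are dropped
        if parts.hostname == host and parts.path not in ("", "/"):
            paths.add(parts.path)
    return sorted(paths)
```

For example, usernames_from_emails(["J.Smith@example.com", "admin@example.com"]) yields ["admin", "j.smith"], and paths_for_host(data["links"], "testfire.net") yields a de-duplicated path list you can write to a wordlist file.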


ReconSpider vs. Complementary Tools

ReconSpider operates at the web content layer. Each tool below operates at a different layer — they are not substitutes.

Tool | Primary Strength | Recon Layer | Cost
ReconSpider | Web asset and comment extraction | Content layer | Free
Nmap | Port and service discovery | Network layer | Free
Gobuster / ffuf | Directory and file brute-forcing | URL layer | Free
OWASP Amass | Subdomain and ASN enumeration | DNS layer | Free
Sublist3r | Fast subdomain discovery | DNS layer | Free

Use all five in sequence. ReconSpider gives you the content map; the others give you the infrastructure map.


Quick Reference Cheat Sheet

# Install Scrapy dependency
pip3 install scrapy
 
# Download ReconSpider (HTB Academy)
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip && cd ReconSpider
 
# Download ReconSpider (GitHub mirror, if Academy URL fails)
# https://github.com/HowdoComputer/ReconSpider-HTB → download ZIP → unzip → cd into folder
 
# Run against target
python3 ReconSpider.py <target-domain>
 
# View full output
cat results.json
 
# Extract only comments
python3 -c "import json; data=json.load(open('results.json')); [print(c) for c in data['comments']]"
 
# Extract only emails
python3 -c "import json; data=json.load(open('results.json')); [print(e) for e in data['emails']]"
 
# Extract only JS files
python3 -c "import json; data=json.load(open('results.json')); [print(j) for j in data['js_files']]"
 
# Pretty-print the entire result
python3 -m json.tool results.json

Common Mistakes to Avoid

Running ReconSpider without reviewing js_files manually. JavaScript files frequently contain hardcoded API keys, endpoint URLs, and authentication tokens that don't appear anywhere else in the application. Skipping JS review means leaving the most exploitable content layer untouched. Use Burp Suite to proxy and inspect these endpoints directly after discovery.
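
A quick pattern pass over each downloaded JavaScript file can point the manual review at the right lines. The sketch below works on JavaScript source text you have already fetched. Both regular expressions and the name scan_js are assumptions for illustration: real secrets and endpoint shapes vary widely, so expect false positives and misses.

```python
import re

# Rough, assumed patterns for illustration; real secrets vary widely
KEY_ASSIGNMENT = re.compile(
    r"""(?i)\b(api[_-]?key|secret|token|passw(?:or)?d)\b\s*[:=]\s*["']([^"']{8,})["']"""
)
# Quoted strings that look like API paths, e.g. "/api/..." or "/v2/..."
ENDPOINT = re.compile(r"""["'](/(?:api|v\d+)/[^"'\s]*)["']""")


def scan_js(source):
    """Return (suspected secrets, endpoint paths) found in JavaScript text."""
    secrets = [(m.group(1), m.group(2)) for m in KEY_ASSIGNMENT.finditer(source)]
    endpoints = sorted(set(ENDPOINT.findall(source)))
    return secrets, endpoints
```

Anything this flags still needs to be confirmed by reading the surrounding code; an empty result does not mean the file is clean.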

Treating empty arrays as confirmed negatives. If form_fields or comments returns an empty array, it means ReconSpider didn't find any on the pages it crawled — not that none exist. Scrapy's crawl depth is finite. Manually check pages that ReconSpider may not have reached.

Ignoring external_files because they look harmless. PDFs and Word documents hosted on a target frequently contain author metadata, internal network paths, and revision history. Download and run exiftool against every file in this array before moving on.

Skipping the GitHub mirror when the Academy download fails. The academy.hackthebox.com wget URL occasionally returns a 404 or times out outside of active lab sessions. The GitHub mirror at github.com/HowdoComputer/ReconSpider-HTB is functionally identical — don't abandon the tool because one download link failed.

Running ReconSpider against out-of-scope targets. Scrapy will follow external links. Confirm your target scope before running and pass only in-scope domains. Crawling an unintended host — even accidentally — creates legal exposure.
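
A scope check over the links array, run before any finding is passed to another tool, helps keep out-of-scope hosts out of later steps. This is a minimal sketch with hypothetical function names. It assumes that subdomains of a scoped domain are themselves in scope, which is not true of every engagement, so adjust it to your rules of engagement.

```python
from urllib.parse import urlsplit


def in_scope(url, scope_domains):
    """True when the URL's host equals, or is a subdomain of, a scoped domain."""
    host = (urlsplit(url).hostname or "").lower()
    # Assumption: subdomains of a scoped domain count as in scope
    return any(host == d or host.endswith("." + d) for d in scope_domains)


def split_by_scope(links, scope_domains):
    """Partition links into (in_scope, out_of_scope) lists, preserving order."""
    inside, outside = [], []
    for link in links:
        (inside if in_scope(link, scope_domains) else outside).append(link)
    return inside, outside
```

With scope_domains set to ["testfire.net"], the sample output's www.cert.org and www.adobe.com links would land in the out-of-scope list.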



Conclusion

ReconSpider does one thing most recon tools skip: it reads what the application is openly exposing through its own content layer. Emails, JavaScript endpoints, external file references, and, most valuably, HTML comments all land in a structured JSON file after a single command. The workflow is: run ReconSpider first, triage results.json systematically, then feed discoveries into Nmap, Gobuster, and Burp Suite for the next recon layer. That sequencing keeps your coverage complete and your findings grounded in what the target is actually serving.


Sources

  • ReconSpider-HTB GitHub Repository — Community mirror of the ReconSpider tool with installation instructions
  • HackTheBox Academy — Footprinting Module — Official HTB module where ReconSpider is introduced
  • Scrapy Documentation — Official docs for Scrapy, the Python crawling framework powering ReconSpider
  • OWASP Web Security Testing Guide — Information Gathering — OWASP methodology for the recon phase ReconSpider supports
  • Python Documentation — Reference for Python 3.7+ environment requirements
