ReconSpider: HTB Web Enumeration Tool Guide (2026)
Learn how to install and use ReconSpider for web enumeration on HackTheBox. Covers setup, JSON output analysis, and practical recon workflow for pentesters.

TL;DR
ReconSpider is a Python-based web enumeration tool built by Hack The Box that crawls a target domain and writes structured reconnaissance data to a results.json file. Its standout capability is HTML comment extraction, a recon signal that many tools skip and one that frequently surfaces hidden credentials and developer notes in HTB challenges. Setup takes under five minutes, with Python and Scrapy as the only dependencies.
What Is ReconSpider?
ReconSpider is a web reconnaissance automation tool built by Hack The Box for use in authorized security assessments and HTB Academy labs. It crawls a target URL using Scrapy under the hood and outputs a structured JSON file containing every web-layer asset it discovers — emails, internal and external links, JavaScript files, PDFs, images, form fields, and HTML source comments.
The key reason to add it to your workflow: most recon tools map ports or brute-force directories. ReconSpider maps the content layer — what the application is exposing through its own HTML and resources. HTML comment extraction in particular is underused by most practitioners, and HTB challenge designers know it.
| Attribute | Details |
|---|---|
| Type | Web content enumeration and asset extraction |
| Built by | Hack The Box |
| Best use | First-pass web recon to map assets, links, and hidden content |
| Not for | Port scanning, directory brute-forcing, vulnerability exploitation |
| Typical users | HTB players, penetration testers, bug bounty researchers |
Prerequisites
Before downloading ReconSpider, confirm your environment meets two requirements.
Python 3.7 or higher:
python3 --version
# Must return Python 3.7.x or above
Scrapy (ReconSpider's crawling engine):
pip3 install scrapy
If Scrapy is already installed, skip directly to the download step. No other dependencies are required.
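Both prerequisites can also be checked from Python in one pass. The following is a minimal sketch, not part of ReconSpider itself: the helper names `python_ok` and `module_available` are hypothetical, and the 3.7 minimum is taken from this guide.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this guide


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("Scrapy installed:", module_available("scrapy"))
```

If the second line prints False, install Scrapy with pip3 before continuing.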
Installation
Official HTB Download
# Step 1: Download the zip from HTB Academy
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
# Step 2: Unzip
unzip ReconSpider.zip
If the wget URL returns a 404 or times out, use the community GitHub mirror instead: ReconSpider-HTB GitHub Repository. Download the repository as a ZIP, unzip it, and cd into the extracted folder, then continue with Running ReconSpider below.
Running ReconSpider
Basic usage
python3 ReconSpider.py http://testfire.net
Replace http://testfire.net with your authorized target. It is used here only for demonstration, because it is a publicly available, intentionally vulnerable website. ReconSpider will crawl the domain and save the results to results.json in the same directory.
Screenshot context: You should see Scrapy's crawl log output in the terminal: request counts, item counts, and a completion message. The crawl depth and speed depend on the target site's size.
Reading the output
cat results.json
Screenshot context: The terminal displays a formatted JSON object. Each key contains an array of discovered items. A site with active content will show populated emails, links, js_files, and comments arrays.
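On larger targets the raw file is long, so a per-key count is a quicker first look than scrolling through cat output. The sketch below is an illustration only: `load_results` and `summarize` are hypothetical helper names, and the default path assumes the output file is named results.json, as the one-liners in this guide do. Adjust it if your copy writes a different filename.

```python
import json


def load_results(path="results.json"):
    """Load the crawl output; the default filename is an assumption."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(results):
    """Map each top-level key to the number of items found under it."""
    return {key: len(items) for key, items in results.items()}


# Small inline sample so the sketch runs without a crawl.
sample = {
    "emails": [],
    "links": ["http://testfire.net/login.jsp"],
    "comments": ["<!-- TOC END -->"],
}

for key, count in summarize(sample).items():
    note = "" if count else "  (empty on crawled pages)"
    print(f"{key:<15}{count}{note}")
```

For a real crawl, replace `sample` with `load_results()`.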
Understanding the results.json Output
ReconSpider organizes all findings into a single JSON file with nine keys. Here is the full output structure from a real crawl:
{
"emails": [],
"links": [
"http://testfire.net/index.jsp?content=privacy.htm",
"https://github.com/AppSecDev/AltoroJ/",
"http://testfire.net/disclaimer.htm?url=http://www.microsoft.com",
"http://testfire.net/Privacypolicy.jsp?sec=Careers&template=US",
"http://testfire.net/index.jsp?content=security.htm",
"http://testfire.net/index.jsp?content=business_retirement.htm",
"http://testfire.net/swagger/index.html",
"http://testfire.net/default.jsp?content=security.htm",
"http://testfire.net/index.jsp?content=business_insurance.htm",
"http://testfire.net/index.jsp?content=pr/20061109.htm",
"http://testfire.net/index.jsp?content=inside_internships.htm",
"http://testfire.net/index.jsp?content=inside_jobs.htm&job=Teller:ConsumaerBanking",
"http://testfire.net/index.jsp",
"http://testfire.net/index.jsp?content=inside_community.htm",
"http://testfire.net/index.jsp?content=inside_jobs.htm&job=ExecutiveAssistant:Administration",
"http://testfire.net/survey_questions.jsp?step=email",
"http://testfire.net/inside_points_of_interest.htm",
"http://testfire.net/survey_questions.jsp",
"http://testfire.net/index.jsp?content=personal_savings.htm",
"http://testfire.net/index.jsp?content=inside_executives.htm",
"http://testfire.net/survey_questions.jsp?step=a",
"http://testfire.net/subscribe.jsp",
"http://testfire.net/index.jsp?content=personal_other.htm",
"http://testfire.net/disclaimer.htm?url=http://www.netscape.com",
"http://testfire.net/login.jsp",
"http://testfire.net/index.jsp?content=inside_investor.htm",
"http://testfire.net/index.jsp?content=business_deposit.htm",
"http://testfire.net/index.jsp?content=pr/20060928.htm",
"http://testfire.net/index.jsp?content=pr/20060817.htm",
"http://www.cert.org/",
"http://testfire.net/index.jsp?content=inside_trainee.htm",
"http://www.adobe.com/products/acrobat/readstep2.html",
"http://testfire.net/index.jsp?content=pr/20060720.htm",
"http://testfire.net/index.jsp?content=personal_checking.htm",
"http://testfire.net/index.jsp?content=security.htm#top",
"http://testfire.net/index.jsp?content=pr/20061005.htm",
"http://testfire.net/index.jsp?content=business_lending.htm",
"http://testfire.net/high_yield_investments.htm",
"http://testfire.net/index.jsp?content=business_cards.htm",
"http://testfire.net/index.jsp?content=business.htm",
"http://testfire.net/index.jsp?content=inside_about.htm",
"http://testfire.net/index.jsp?content=inside_volunteering.htm#gift",
"http://testfire.net/Documents/JohnSmith/VoluteeringInformation.pdf",
"http://testfire.net/pr/communityannualreport.pdf",
"http://testfire.net/index.jsp?content=inside_jobs.htm&job=LoyaltyMarketingProgramManager:Marketing",
"http://testfire.net/index.jsp?content=inside_contact.htm",
"http://testfire.net/my%20documents/JohnSmith/Bank%20Site%20Documents/grouplife.htm",
"http://testfire.net/admin/clients.xls",
"http://www.watchfire.com/statements/terms.aspx",
"http://www.newspapersyndications.tv",
"https://www.hcl-software.com/appscan/",
"http://testfire.net/index.jsp?content=personal_loans.htm",
"http://testfire.net/index.jsp?content=inside_press.htm",
"http://testfire.net/index.jsp?content=inside_contact.htm#ContactUs",
"http://testfire.net/index.jsp?content=pr/20060518.htm",
"http://testfire.net/index.jsp?content=inside_jobs.htm&job=MortgageLendingAccountExecutive:Sales",
"http://testfire.net/survey_questions.jsp?step=d",
"http://testfire.net/index.jsp?content=personal_cards.htm",
"http://testfire.net/survey_questions.jsp?step=b",
"http://testfire.net/cgi.exe",
"http://testfire.net/index.jsp?content=pr/20060413.htm",
"http://testfire.net/index.jsp?content=inside_jobs.htm&job=CustomerServiceRepresentative:CustomerService",
"http://testfire.net/feedback.jsp",
"http://testfire.net/index.jsp?content=pr/20060921.htm",
"http://testfire.net/index.jsp?content=inside_volunteering.htm",
"http://testfire.net/index.jsp?content=inside_benefits.htm",
"http://testfire.net/index.jsp?content=inside_volunteering.htm#time",
"http://testfire.net/index.jsp?content=personal_deposit.htm",
"http://testfire.net/security.htm",
"http://testfire.net/index.jsp?content=personal.htm",
"http://testfire.net/index.jsp?content=inside_jobs.htm&job=OperationalRiskManager:RiskManagement",
"http://testfire.net/default.jsp",
"http://testfire.net/index.jsp?content=personal_investments.htm",
"http://testfire.net/status_check.jsp",
"http://testfire.net/index.jsp?content=business_other.htm",
"http://testfire.net/index.jsp?content=inside_jobs.htm",
"http://testfire.net/survey_questions.jsp?step=c",
"http://testfire.net/index.jsp?content=inside.htm",
"http://testfire.net/index.jsp?content=inside_careers.htm"
],
"external_files": [
"http://testfire.net/css",
"http://testfire.net/xls",
"http://testfire.net/pdf",
"http://testfire.net/pr/communityannualreport.pdf",
"http://testfire.net/swagger/css"
],
"js_files": [
"http://testfire.net/swagger/swagger-ui-bundle.js",
"http://demo-analytics.testfire.net/urchin.js",
"http://testfire.net/swagger/swagger-ui-standalone-preset.js"
],
"form_fields": [
"email_addr",
"cfile",
"btnSubmit",
"uid",
"submit",
"query",
"subject",
"comments",
"step",
"reset",
"name",
"passw",
"txtEmail",
"email"
],
"images": [
"http://testfire.net/images/icon_top.gif",
"http://testfire.net/images/b_lending.jpg",
"http://testfire.net/images/cancel.gif",
"http://www.exampledomainnotinuse.org/mybeacon.gif",
"http://testfire.net/images/altoro.gif",
"http://testfire.net/images/b_main.jpg",
"http://testfire.net/images/inside7.jpg",
"http://testfire.net/images/p_other.jpg",
"http://testfire.net/images/p_cards.jpg",
"http://testfire.net/images/logo.gif",
"http://testfire.net/images/b_insurance.jpg",
"http://testfire.net/images/inside1.jpg",
"http://testfire.net/images/p_main.jpg",
"http://testfire.net/images/inside5.jpg",
"http://testfire.net/feedback.jsp",
"http://testfire.net/images/home1.jpg",
"http://testfire.net/images/inside3.jpg",
"http://testfire.net/images/adobe.gif",
"http://testfire.net/images/p_deposit.jpg",
"http://testfire.net/images/ok.gif",
"http://testfire.net/images/b_other.jpg",
"http://testfire.net/images/home2.jpg",
"http://testfire.net/images/inside4.jpg",
"http://testfire.net/images/pf_lock.gif",
"http://testfire.net/images/p_investments.jpg",
"http://testfire.net/images/spacer.gif",
"http://testfire.net/images/inside6.jpg",
"http://testfire.net/images/b_deposit.jpg",
"http://testfire.net/images/header_pic.jpg",
"http://testfire.net/images/home3.jpg",
"http://testfire.net/images/b_cards.jpg",
"http://testfire.net/images/p_loans.jpg",
"http://testfire.net/images/p_checking.jpg"
],
"videos": [],
"audio": [],
"comments": [
"<!-- Keywords:Altoro Mutual, business succession, wealth management, international trade services, mergers, acquisitions -->",
"<!-- HTML for static distribution bundle build -->",
"<!-- Keywords:Altoro Mutual, student internships, student co-op -->",
"<!-- Keywords:Altoro Mutual -->",
"<!-- Keywords:Altoro Mutual, security, security, security, we provide security, secure online banking -->",
"<!-- Keywords:Altoro Mutual, disability insurance, insurince, life insurance -->",
"<!-- Keywords:Altoro Mutual, executives, board of directors -->",
"<!-- Keywords:Altoro Mutual, brokerage services, retirement, insurance, private banking, wealth and tax services -->",
"<!-- TOC END -->",
"<!-- Keywords:Altoro Mutual, job openings, benefits, student internships, management trainee programs -->",
"<!-- Keywords:Altoro Mutual, management trainess, Careers, advancement -->",
"<!-- Keywords:Altoro Mutual, Altoro Private Bank, Altoro Wealth and Tax -->",
"<!-- Keywords:Altoro Mutual, privacy, information collection, safeguards, data usage -->",
"<!-- Keywords:Altoro Mutual, stocks, stock quotes -->",
"<!-- Keywords:Altoro Mutual, employee volunteering -->",
"<!-- Keywords:Altoro Mutual, personal checking, checking platinum, checking gold, checking silver, checking bronze -->",
"<!-- Keywords:Altoro Mutual, online banking, banking, checking, savings, accounts -->",
"<!-- Keywords:Altoro Mutual, platinum card, gold card, silver card, bronze card, student credit -->",
"<!-- Keywords:Altoro Mutual, deposit products, personal deposits -->",
"<!-- Keywords:Altoro Mutual, press releases, media, news, events, public relations -->",
"<!-- Keywords:Altoro Mutual, benefits, child-care, flexible time, health club, company discounts, paid vacations -->",
"<!-- Keywords:Altoro Mutual, online banking, contact information, subscriptions -->",
"<!-- BEGIN FOOTER -->",
"<!--- Dave- Hard code this into the final script - Possible security problem.\n\t\t Re-generated every Tuesday and old files are saved to .bak format at L:\\backup\\website\\oldfiles --->",
"<!-- Keywords:Altoro Mutual, auto loans, boat loans, lines of credit, home equity, mortgage loans, student loans -->",
"<!-- Keywords:Altoro Mutual, careers, opportunities, jobs, management -->",
"<!-- BEGIN HEADER -->",
"<!-- END HEADER -->",
"<!-- Keywords:Altoro Mutual, deposit products, lending, credit cards, insurance, retirement -->",
"<!-- Keywords:Altoro Mutual, personal deposit, personal checking, personal loans, personal cards, personal investments -->",
"<!-- Keywords:Altoro Mutual, community events, volunteering -->",
"<!-- TOC BEGIN -->",
"<!-- Keywords:Altoro Mutual Press Release -->",
"<!-- END FOOTER -->",
"<!-- Keywords:Altoro Mutual, real estate loans, small business loands, small business loands, equipment leasing, credit line -->",
"<!-- To get the latest admin login, please contact SiteOps at 415-555-6159 -->",
"<!-- Keywords:Altoro Mutual, credit cards, platinum cards, premium credit -->"
]
}
Each key maps to a distinct category of discovered data:
| JSON Key | What it contains | Why it matters in recon |
|---|---|---|
| emails | Email addresses found on the domain | Staff enumeration, phishing surface, username patterns |
| links | Internal and external URLs | Maps application structure, reveals third-party dependencies |
| external_files | PDFs, docs, and downloadable files | Often contain metadata, internal paths, or sensitive content |
| js_files | JavaScript file URLs | Reveals API endpoints, secret keys, and client-side logic |
| form_fields | Input field names from forms | Attack surface for injection, parameter discovery |
| images | Image URLs | Occasionally contain embedded metadata (EXIF) |
| videos | Video file URLs | Rarely populated but worth checking in media-heavy apps |
| audio | Audio file URLs | Rarely populated |
| comments | Raw HTML comment strings | Highest signal for HTB: developers leave credentials, debug notes, and versioning hints here |
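Because the links array mixes internal and external URLs, separating the two is a useful first triage step. The sketch below shows one way to do it with the standard library. `split_links` is a hypothetical helper, and treating subdomains of the target as internal is an assumption you may want to change for a strict scope.

```python
from urllib.parse import urlsplit


def split_links(links, target_host):
    """Partition URLs into (internal, external) relative to target_host.

    A URL counts as internal when its hostname equals target_host
    or is a subdomain of it (an assumption of this sketch).
    """
    internal, external = [], []
    for url in links:
        host = (urlsplit(url).hostname or "").lower()
        if host == target_host or host.endswith("." + target_host):
            internal.append(url)
        else:
            external.append(url)
    return internal, external


# URLs taken from the sample output above.
links = [
    "http://testfire.net/login.jsp",
    "https://github.com/AppSecDev/AltoroJ/",
    "http://demo-analytics.testfire.net/urchin.js",
    "http://www.cert.org/",
]
internal, external = split_links(links, "testfire.net")
print("internal:", internal)
print("external:", external)
```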
Why HTML Comments Are the Most Valuable Output
The comments key is the reason ReconSpider earns a permanent place in any HTB web recon workflow.
HTML comments (<!-- ... -->) are invisible to end users in the browser but present in raw page source. Developers routinely leave behind:
- Commented-out login credentials from testing
- Internal hostnames and file paths
- Version strings that reveal vulnerable software
- Debug notes that describe application behavior
- Disabled features that hint at hidden functionality
Most automated scanners and directory fuzzers never touch HTML comment content. ReconSpider extracts it in every crawl, structured and ready to grep.
# Filter just comments from results.json using Python
python3 -c "import json; data=json.load(open('results.json')); [print(c) for c in data['comments']]"
Scan the output for anything that looks like a credential pattern, a hostname, a version number, or a path that doesn't appear in your visible sitemap.
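In the sample crawl above, most comments are SEO keyword lists or layout markers, and only two deserve a closer look. A keyword filter can surface those first. The sketch below is a rough heuristic, not a ReconSpider feature: both regular expressions are assumptions to tune per target, and anything it drops should still get a manual skim.

```python
import re

# Assumed keyword list; extend it for the target at hand.
INTERESTING = re.compile(
    r"password|passwd|credential|login|admin|secret|token|api[_-]?key"
    r"|backup|\.bak|todo|fixme|security",
    re.IGNORECASE,
)
# Layout markers and keyword lists seen in the sample output.
NOISE = re.compile(r"^<!--\s*(Keywords:|BEGIN |END |TOC )", re.IGNORECASE)


def triage_comments(comments):
    """Keep comments that match a keyword and are not known layout noise."""
    return [c for c in comments if not NOISE.match(c) and INTERESTING.search(c)]


# Comments copied from the sample output above.
comments = [
    "<!-- Keywords:Altoro Mutual, security, security, security, we provide security, secure online banking -->",
    "<!-- BEGIN FOOTER -->",
    "<!-- To get the latest admin login, please contact SiteOps at 415-555-6159 -->",
    "<!--- Dave- Hard code this into the final script - Possible security problem.\n\t\t Re-generated every Tuesday and old files are saved to .bak format at L:\\backup\\website\\oldfiles --->",
]

for hit in triage_comments(comments):
    print(hit)
```

The noise pattern is applied first so that keyword-list comments containing words like "security" do not drown out the real findings.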
ReconSpider in a Pentest Workflow
ReconSpider belongs at the start of web-layer recon, before active scanning or exploitation.
1. Confirm scope and authorization
2. Run ReconSpider → generates results.json
3. Triage results.json
- emails → build username list for brute-force
- js_files → manually review for API keys and endpoints
- external_files → download and extract metadata
- comments → manually review for credentials and hints
4. Feed findings into next-layer tools
- Gobuster / ffuf → directory brute-force discovered paths
- Nmap → port scan discovered subdomains
- Burp Suite → proxy and test discovered endpoints
5. Document all findings with timestamps
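Steps 3 and 4 above can be partly scripted. The sketch below derives a username list from emails and a path wordlist from in-scope links, which could then be written to files for a brute-force or fuzzing tool. Both helper names are hypothetical, and the email addresses are invented for illustration because the sample crawl returned none.

```python
from urllib.parse import urlsplit


def usernames_from_emails(emails):
    """Return lowercase local parts, deduplicated, in first-seen order."""
    seen = []
    for address in emails:
        local = address.split("@", 1)[0].strip().lower()
        if local and local not in seen:
            seen.append(local)
    return seen


def path_wordlist(links, target_host):
    """Return sorted unique path segments from links on target_host."""
    words = set()
    for url in links:
        parts = urlsplit(url)
        if (parts.hostname or "").lower() != target_host:
            continue  # skip out-of-scope hosts
        words.update(segment for segment in parts.path.split("/") if segment)
    return sorted(words)


# Hypothetical emails; links are taken from the sample output.
emails = ["J.Smith@example.com", "j.smith@example.com", "admin@example.com"]
links = [
    "http://testfire.net/admin/clients.xls",
    "http://testfire.net/login.jsp",
    "http://www.cert.org/",
]

print(usernames_from_emails(emails))
print(path_wordlist(links, "testfire.net"))
```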
ReconSpider vs. Complementary Tools
ReconSpider operates at the web content layer. Each tool below operates at a different layer — they are not substitutes.
| Tool | Primary Strength | Recon Layer | Cost |
|---|---|---|---|
| ReconSpider | Web asset and comment extraction | Content layer | Free |
| Nmap | Port and service discovery | Network layer | Free |
| Gobuster / ffuf | Directory and file brute-forcing | URL layer | Free |
| OWASP Amass | Subdomain and ASN enumeration | DNS layer | Free |
| Sublist3r | Fast subdomain discovery | DNS layer | Free |
Use all five in sequence. ReconSpider gives you the content map; the others give you the infrastructure map.
Quick Reference Cheat Sheet
# Install Scrapy dependency
pip3 install scrapy
# Download ReconSpider (HTB Academy)
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip && cd ReconSpider
# Download ReconSpider (GitHub mirror, if Academy URL fails)
# https://github.com/HowdoComputer/ReconSpider-HTB → download ZIP → unzip → cd into folder
# Run against target
python3 ReconSpider.py <target-domain>
# View full output
cat results.json
# Extract only comments
python3 -c "import json; data=json.load(open('results.json')); [print(c) for c in data['comments']]"
# Extract only emails
python3 -c "import json; data=json.load(open('results.json')); [print(e) for e in data['emails']]"
# Extract only JS files
python3 -c "import json; data=json.load(open('results.json')); [print(j) for j in data['js_files']]"
# Pretty-print the entire result
python3 -m json.tool results.json
Common Mistakes to Avoid
Running ReconSpider without reviewing js_files manually. JavaScript files frequently contain hardcoded API keys, endpoint URLs, and authentication tokens that don't appear anywhere else in the application. Skipping JS review means leaving the most exploitable content layer untouched. Use Burp Suite to proxy and inspect these endpoints directly after discovery.
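A quick pattern scan can point the manual review at the right lines once a JavaScript file has been saved locally. The sketch below works on the file's text only. The three patterns are illustrative assumptions, not a complete secret-detection rule set, and `scan_js` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real reviews need broader rules.
PATTERNS = {
    "key_assignment": re.compile(
        r"""\b(api[_-]?key|secret|token)\b\s*[:=]\s*["'][^"']{8,}["']""",
        re.IGNORECASE,
    ),
    "absolute_url": re.compile(r"""https?://[^\s"'<>]+"""),
    "api_path": re.compile(r"""["']/api/[^"'\s]*["']"""),
}


def scan_js(source):
    """Return (label, matched_text) pairs for every pattern hit in source."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(source):
            findings.append((label, match.group(0)))
    return findings


# Hypothetical JavaScript snippet for demonstration.
snippet = 'const apiKey = "abcd1234efgh5678"; fetch("/api/v1/users");'

for label, text in scan_js(snippet):
    print(f"{label}: {text}")
```

Treat every hit as a lead to verify by hand, since patterns like these produce both false positives and misses.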
Treating empty arrays as confirmed negatives. If form_fields or comments returns an empty array, it means ReconSpider didn't find any on the pages it crawled — not that none exist. Scrapy's crawl depth is finite. Manually check pages that ReconSpider may not have reached.
Ignoring external_files because they look harmless. PDFs and Word documents hosted on a target frequently contain author metadata, internal network paths, and revision history. Download and run exiftool against every file in this array before moving on.
Skipping the GitHub mirror when the Academy download fails. The academy.hackthebox.com wget URL occasionally returns a 404 or times out outside of active lab sessions. The GitHub mirror at github.com/HowdoComputer/ReconSpider-HTB is functionally identical — don't abandon the tool because one download link failed.
Running ReconSpider against out-of-scope targets. Scrapy will follow external links. Confirm your target scope before running and pass only in-scope domains. Crawling an unintended host — even accidentally — creates legal exposure.
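After a crawl, it helps to list every hostname that appears in the output and compare it with the agreed scope before feeding anything into other tools. The sketch below does that over the URL-bearing keys. `hosts_seen` and `unexpected_hosts` are hypothetical helpers, and the list only shows hosts referenced in the results, not proof of which hosts were actually requested.

```python
from collections import Counter
from urllib.parse import urlsplit

# Keys in the output that hold URLs.
URL_KEYS = ("links", "external_files", "js_files", "images", "videos", "audio")


def hosts_seen(results):
    """Count how often each hostname appears across the URL arrays."""
    counts = Counter()
    for key in URL_KEYS:
        for url in results.get(key, []):
            host = urlsplit(url).hostname
            if host:
                counts[host.lower()] += 1
    return counts


def unexpected_hosts(results, allowed_hosts):
    """Return sorted hostnames that are not in the allowed scope list."""
    allowed = {host.lower() for host in allowed_hosts}
    return sorted(host for host in hosts_seen(results) if host not in allowed)


# Trimmed sample built from the output shown earlier.
results = {
    "links": ["http://testfire.net/login.jsp", "http://www.cert.org/"],
    "js_files": ["http://demo-analytics.testfire.net/urchin.js"],
    "images": ["http://www.exampledomainnotinuse.org/mybeacon.gif"],
}

print(unexpected_hosts(results, ["testfire.net"]))
```

Exact hostname matching is deliberate here: a subdomain such as demo-analytics.testfire.net is flagged until it is explicitly confirmed as in scope.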
Conclusion
ReconSpider does one thing most recon tools skip: it reads what the application openly exposes through its own content layer. Emails, JavaScript endpoints, external file references, and, most valuably, HTML comments all land in a structured JSON file after a single command. The workflow is: run ReconSpider first, triage results.json systematically, then feed discoveries into Nmap, Gobuster, and Burp Suite for the next recon layer. That sequencing keeps your coverage complete and your findings grounded in what the target is actually serving.
Sources
- ReconSpider-HTB GitHub Repository — Community mirror of the ReconSpider tool with installation instructions
- HackTheBox Academy — Footprinting Module — Official HTB module where ReconSpider is introduced
- Scrapy Documentation — Official docs for Scrapy, the Python crawling framework powering ReconSpider
- OWASP Web Security Testing Guide — Information Gathering — OWASP methodology for the recon phase ReconSpider supports
- Python Documentation — Reference for Python 3.7+ environment requirements