| Description | Category: New Tool Requests
Reproducibility: N/A
Severity: minor
Priority: normal
Summary: MetaExtract - Modern document metadata extraction tool for OSINT (Metagoofil replacement)
Description:
[Name] MetaExtract
[Version] 3.0.0
[Homepage] https://github.com/cereZ23/metaextract
[Download] https://github.com/cereZ23/metaextract/releases/tag/v3.0.0
[Author] Andrea Ceresoni [email protected]
(Based on original Metagoofil by Christian Martorella, Edge-Security)
[Licence] GPL-2.0
[Description] MetaExtract is a complete Python 3.12+ rewrite of Metagoofil, the document metadata extraction tool for OSINT. It searches for documents on a target domain, downloads them, and extracts metadata including usernames, email addresses, software versions, and file paths.
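To give reviewers a feel for the kind of metadata involved, here is a minimal standard-library sketch of reading author fields from an Office Open XML file (.docx, .xlsx, .pptx). It is not MetaExtract's code: the helper name `read_core_properties` is hypothetical, and the sketch relies only on the fact that these formats are ZIP archives that normally carry a `docProps/core.xml` part.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by docProps/core.xml in OOXML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(data: bytes) -> dict[str, str]:
    """Return author-related fields from an OOXML file (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    # Map output keys to the namespaced child elements we want to read.
    wanted = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    found: dict[str, str] = {}
    for key, tag in wanted.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text
    return found
```

Calling `read_core_properties(open("file.docx", "rb").read())` on a typical document would return whichever of the two fields are present. Files without a `docProps/core.xml` part raise `KeyError` in this sketch.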
The original Metagoofil in Kali is broken due to:
- Deprecated Python 2.x code
- Broken Google scraping
- Bundled obsolete libraries (hachoir 2011, old pdfminer)
MetaExtract fixes all these issues with:
- Async architecture using aiohttp
- DuckDuckGo search (no API key required)
- Modern metadata libraries (pdfminer.six, python-docx, openpyxl, python-pptx)
- Rich CLI with progress indicators
- HTML and JSON report export
- Anti-rate-limiting with User-Agent rotation
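The User-Agent rotation and request delay listed above can be illustrated with a small sketch. It is an assumption about the general technique, not MetaExtract's actual implementation: `fetch_all`, the `fetch` callback, and the placeholder User-Agent strings are all hypothetical, and the callback stands in for a real HTTP client such as an aiohttp session so the example runs without network access.

```python
import asyncio
import itertools
from collections.abc import Awaitable, Callable

# Placeholder values; a real tool would use realistic browser strings.
USER_AGENTS = ["ExampleAgent/1.0", "ExampleAgent/2.0"]

# A fetch callback takes a URL and request headers and returns the body.
Fetch = Callable[[str, dict[str, str]], Awaitable[str]]


async def fetch_all(urls: list[str], fetch: Fetch, delay: float = 0.0) -> list[str]:
    """Fetch URLs one by one, rotating User-Agent and pausing between requests."""
    agents = itertools.cycle(USER_AGENTS)  # round-robin rotation
    bodies: list[str] = []
    for index, url in enumerate(urls):
        if index > 0 and delay > 0:
            await asyncio.sleep(delay)  # spread requests out over time
        headers = {"User-Agent": next(agents)}
        bodies.append(await fetch(url, headers))
    return bodies
```

Requests are issued sequentially here so that the delay actually spaces them out; a concurrent variant would need a semaphore or similar limiter instead.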
[Dependencies]
- Python >= 3.12
- aiohttp >= 3.9.0
- pdfminer.six >= 20231228
- python-docx >= 1.1.0
- openpyxl >= 3.1.2
- python-pptx >= 0.6.23
- olefile >= 0.47
- lxml >= 5.1.0
- click >= 8.1.7
- rich >= 13.7.0
- pydantic >= 2.5.0
- jinja2 >= 3.1.2
[Similar tools]
- Metagoofil (original, currently broken in Kali)
- FOCA (Windows only)
- Recon-ng (broader scope)
[Activity]
- Project started: December 2024
- Actively maintained: Yes
- CI/CD: GitHub Actions (Python 3.12, 3.13, Docker, Debian packaging)
- Last commit: December 2024
[How to install]
pip install -e .
Or using a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
Or with Docker:
docker build -t metaextract .
[How to use]
Search a domain for documents and extract metadata:
metaextract -d example.com -t pdf,docx -n 50 -f report.html
Analyze local files:
metaextract --local -o ./documents -f report.json
With rate-limit protection:
metaextract -d example.com -t pdf --delay 5 -f report.html
[Packaged]
Yes - debian/ directory included with:
- debian/control
- debian/rules
- debian/changelog
- debian/copyright
- debian/source/format
CI validates Debian packaging on every commit.
|
|---|