View Issue Details

ID: 0009455
Project: Kali Linux
Category: New Tool Requests
View Status: public
Last Update: 2025-12-19 14:54
Reporter: cere23
Assigned To:
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: new
Resolution: open
Summary: 0009455: MetaExtract - Modern document metadata extraction tool for OSINT (Metagoofil replacement)
Description

Category: New Tool Requests

Reproducibility: N/A

Severity: minor

Priority: normal

Summary: MetaExtract - Modern document metadata extraction tool for OSINT (Metagoofil replacement)

Description:

[Name] MetaExtract

[Version] 3.0.0

[Homepage] https://github.com/cereZ23/metaextract

[Download] https://github.com/cereZ23/metaextract/releases/tag/v3.0.0

[Author] Andrea Ceresoni
(Based on original Metagoofil by Christian Martorella, Edge-Security)

[Licence] GPL-2.0

[Description] MetaExtract is a complete Python 3.12+ rewrite of Metagoofil, the document metadata extraction tool for OSINT. It searches for documents on a target domain, downloads them, and extracts metadata including usernames, email addresses, software versions, and file paths.
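
To illustrate the kind of metadata involved, here is a minimal sketch that reads the author fields from an Office Open XML file (docx, xlsx, pptx) using only the Python standard library. It is not MetaExtract's code: the function name `read_core_properties` is hypothetical, and the tool itself is stated to rely on python-docx, openpyxl and python-pptx for this job.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(data: bytes) -> dict[str, str]:
    """Return author-related fields found in an OOXML document (hypothetical helper)."""
    # An OOXML file is a ZIP archive; the core properties live in one XML part.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    fields = {
        "creator": root.find("dc:creator", NS),
        "last_modified_by": root.find("cp:lastModifiedBy", NS),
    }
    # Keep only the elements that are present and non-empty.
    return {name: el.text for name, el in fields.items() if el is not None and el.text}
```

Values such as `creator` and `last_modified_by` are often account names, which is why they matter for OSINT.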

The original Metagoofil in Kali is broken due to:

  • Deprecated Python 2.x code
  • Broken Google scraping
  • Bundled obsolete libraries (hachoir 2011, old pdfminer)

MetaExtract fixes all these issues with:

  • Async architecture using aiohttp
  • DuckDuckGo search (no API key required)
  • Modern metadata libraries (pdfminer.six, python-docx, openpyxl, python-pptx)
  • Rich CLI with progress indicators
  • HTML and JSON report export
  • Anti-rate-limiting with User-Agent rotation
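
The async download and User-Agent rotation items above can be sketched roughly as follows. This is an illustration only, not MetaExtract's implementation: `download_all`, the `fetch` callable (a stand-in for an aiohttp request) and the User-Agent strings are all hypothetical.

```python
import asyncio
import itertools
from typing import Awaitable, Callable

# Placeholder User-Agent strings (hypothetical); a real pool would be longer.
USER_AGENTS = ["ExampleAgent/1.0", "ExampleAgent/2.0", "ExampleAgent/3.0"]

# Any coroutine taking (url, headers) and returning the response body.
Fetch = Callable[[str, dict[str, str]], Awaitable[bytes]]


async def download_all(
    urls: list[str], fetch: Fetch, limit: int = 4, delay: float = 0.0
) -> list[bytes]:
    """Fetch every URL with at most `limit` requests in flight."""
    agents = itertools.cycle(USER_AGENTS)
    gate = asyncio.Semaphore(limit)

    async def one(url: str) -> bytes:
        async with gate:
            headers = {"User-Agent": next(agents)}  # rotate per request
            body = await fetch(url, headers)
            await asyncio.sleep(delay)  # pause before freeing the slot
            return body

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))
```

Passing the fetch function in as a parameter keeps the sketch independent of any HTTP library; in practice it would wrap an aiohttp session call.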

[Dependencies]

  • Python >= 3.12
  • aiohttp >= 3.9.0
  • pdfminer.six >= 20231228
  • python-docx >= 1.1.0
  • openpyxl >= 3.1.2
  • python-pptx >= 0.6.23
  • olefile >= 0.47
  • lxml >= 5.1.0
  • click >= 8.1.7
  • rich >= 13.7.0
  • pydantic >= 2.5.0
  • jinja2 >= 3.1.2

[Similar tools]

  • Metagoofil (original, currently broken in Kali)
  • FOCA (Windows only)
  • Recon-ng (broader scope)

[Activity]

  • Project started: December 2024
  • Actively maintained: Yes
  • CI/CD: GitHub Actions (Python 3.12, 3.13, Docker, Debian packaging)
  • Last commit: December 2024

[How to install]

From a checkout of the repository:

    pip install -e .

Or in a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e .

Or with Docker:

    docker build -t metaextract .

[How to use]

Search a domain for documents and extract metadata:

    metaextract -d example.com -t pdf,docx -n 50 -f report.html

Analyze local files:

    metaextract --local -o ./documents -f report.json

With rate-limit protection:

    metaextract -d example.com -t pdf --delay 5 -f report.html
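
For scripted runs, the domain-search invocations above could be assembled from Python along these lines. This is a sketch: `build_command` is a hypothetical helper, it uses only the flags that appear in the examples, and the flag meanings noted in the comments are inferred from those examples rather than taken from the tool's help output.

```python
import shlex
from typing import Optional, Sequence


def build_command(
    domain: str,
    file_types: Sequence[str],
    report: str,
    limit: Optional[int] = None,
    delay: Optional[int] = None,
) -> list[str]:
    """Build an argv list for a domain search (hypothetical helper)."""
    # -d: target domain, -t: comma-separated file types (inferred meanings).
    argv = ["metaextract", "-d", domain, "-t", ",".join(file_types)]
    if limit is not None:
        argv += ["-n", str(limit)]  # -n: result limit (inferred meaning)
    if delay is not None:
        argv += ["--delay", str(delay)]  # --delay: pause between requests (inferred)
    argv += ["-f", report]  # -f: report output file (inferred meaning)
    return argv


# Render the argv list as a shell-safe string for logging or review.
command_line = shlex.join(build_command("example.com", ["pdf", "docx"], "report.html", limit=50))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual domain or file names.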

[Packaged]
Yes. A debian/ directory is included with:

  • debian/control
  • debian/rules
  • debian/changelog
  • debian/copyright
  • debian/source/format

CI validates Debian packaging on every commit.
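
A reviewer could confirm that the listed packaging files are present in a checkout with a short script such as the one below. It is a sketch only: `missing_packaging_files` is a hypothetical helper, it checks for file presence and nothing more, and it says nothing about how the project's CI actually validates the packaging.

```python
from pathlib import Path

# Files listed in the request as present under debian/.
REQUIRED = ["control", "rules", "changelog", "copyright", "source/format"]


def missing_packaging_files(repo_root: str) -> list[str]:
    """Return the listed debian/ files that are absent from a checkout."""
    debian = Path(repo_root) / "debian"
    return [f"debian/{name}" for name in REQUIRED if not (debian / name).is_file()]
```

An empty list means every file named in the request was found.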

Activities

There are no notes attached to this issue.

Issue History

Date Modified      Username   Field       Change
2025-12-19 14:54   cere23     New Issue