View Issue Details

ID: 0009455
Project: Kali Linux
Category: New Tool Requests
View Status: public
Last Update: 2026-03-26 10:21
Reporter: cere23
Assigned To: daniruiz
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: won't fix
Summary: 0009455: MetaExtract - Modern document metadata extraction tool for OSINT (Metagoofil replacement)
Description

Category: New Tool Requests

Reproducibility: N/A

Severity: minor

Priority: normal

Summary: MetaExtract - Modern document metadata extraction tool for OSINT (Metagoofil replacement)

Description:

[Name] MetaExtract

[Version] 3.0.0

[Homepage] https://github.com/cereZ23/metaextract

[Download] https://github.com/cereZ23/metaextract/releases/tag/v3.0.0

[Author] Andrea Ceresoni [email protected]
(Based on original Metagoofil by Christian Martorella, Edge-Security)

[Licence] GPL-2.0

[Description] MetaExtract is a complete Python 3.12+ rewrite of Metagoofil, the document metadata extraction tool for OSINT. It searches for documents on a target domain, downloads them, and extracts metadata including usernames, email addresses, software versions, and file paths.

The original Metagoofil in Kali is broken due to:

  • Deprecated Python 2.x code
  • Broken Google scraping
  • Bundled obsolete libraries (hachoir 2011, old pdfminer)

MetaExtract addresses these issues with:

  • Async architecture using aiohttp
  • DuckDuckGo search (no API key required)
  • Modern metadata libraries (pdfminer.six, python-docx, openpyxl, python-pptx)
  • Rich CLI with progress indicators
  • HTML and JSON report export
  • Anti-rate-limiting with User-Agent rotation
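
The rate-limit handling mentioned above can be pictured with a small sketch: requests are issued one at a time, each with the next User-Agent from a rotating pool, with a pause between them. This is an illustration, not MetaExtract's code. The names `rotating_headers` and `fetch_all`, the `fetch` callable, and the User-Agent strings are all placeholders; in an aiohttp-based tool, `fetch` would typically wrap a call such as `session.get(url, headers=headers)`.

```python
import asyncio
import itertools

# Placeholder User-Agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0",
]


def rotating_headers(agents):
    """Yield request headers forever, cycling through the given User-Agents."""
    for agent in itertools.cycle(agents):
        yield {"User-Agent": agent}


async def fetch_all(urls, fetch, delay=1.0, agents=USER_AGENTS):
    """Call `fetch(url, headers)` for each URL, sleeping `delay` seconds between calls.

    `fetch` is a hypothetical async callable supplied by the caller.
    """
    headers = rotating_headers(agents)
    results = []
    for index, url in enumerate(urls):
        if index:
            # Pause before every request except the first.
            await asyncio.sleep(delay)
        results.append(await fetch(url, next(headers)))
    return results
```

The sketch is deliberately sequential. A concurrent variant would need a shared limiter, for example an `asyncio.Semaphore`, to keep the same pacing.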

[Dependencies]

  • Python >= 3.12
  • aiohttp >= 3.9.0
  • pdfminer.six >= 20231228
  • python-docx >= 1.1.0
  • openpyxl >= 3.1.2
  • python-pptx >= 0.6.23
  • olefile >= 0.47
  • lxml >= 5.1.0
  • click >= 8.1.7
  • rich >= 13.7.0
  • pydantic >= 2.5.0
  • jinja2 >= 3.1.2
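
For anyone checking an environment against this list, the following sketch reports which of the listed distributions are installed and at what version, using `importlib.metadata` from the standard library. The helper name `installed_versions` is an assumption for the example. It does not compare against the minimum versions above, since that would need a version parser such as the third-party `packaging` library.

```python
import sys
from importlib import metadata

# Distribution names copied from the dependency list above.
REQUIRED = [
    "aiohttp", "pdfminer.six", "python-docx", "openpyxl", "python-pptx",
    "olefile", "lxml", "click", "rich", "pydantic", "jinja2",
]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    if sys.version_info < (3, 12):
        print("Python 3.12 or newer is required")
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```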

[Similar tools]

  • Metagoofil (original, currently broken in Kali)
  • FOCA (Windows only)
  • Recon-ng (broader scope)

[Activity]

  • Project started: December 2024
  • Actively maintained: Yes
  • CI/CD: GitHub Actions (Python 3.12, 3.13, Docker, Debian packaging)
  • Last commit: December 2024

[How to install]

    pip install -e .

Or using a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e .

Or with Docker:

    docker build -t metaextract .

[How to use]

Search a domain for documents and extract metadata:

    metaextract -d example.com -t pdf,docx -n 50 -f report.html

Analyze local files:

    metaextract --local -o ./documents -f report.json

With rate-limit protection:

    metaextract -d example.com -t pdf --delay 5 -f report.html

[Packaged]

Yes. A debian/ directory is included with:

  • debian/control
  • debian/rules
  • debian/changelog
  • debian/copyright
  • debian/source/format

CI validates Debian packaging on every commit.

Activities

daniruiz (manager)   2026-03-26 10:21   ~0021478

Hello,

Thanks for your submission. We can’t package every infosec tool, so we prioritize those with wider adoption and community usage.

Best of luck with your project.

Issue History

Date Modified     Username  Field                 Change
2025-12-19 14:54  cere23    New Issue
2026-03-26 10:21  daniruiz  Note Added: 0021478
2026-03-26 10:21  daniruiz  Assigned To           => daniruiz
2026-03-26 10:21  daniruiz  Status                new => closed
2026-03-26 10:21  daniruiz  Resolution            open => won't fix