View Issue Details

ID: 0008064
Project: Kali Linux
Category: Queued Tool Addition
View Status: public
Last Update: 2024-04-13 21:12
Reporter: entropydaemon1
Assigned To:
Priority: normal
Severity: minor
Reproducibility: N/A
Status: acknowledged
Resolution: open
Summary: 0008064: ScrapPY - a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists
Description

Name: ScrapPY
Version: V1
HomePage/Download: https://github.com/RoseSecurity/ScrapPY
Author: RoseSecurity
License: WTFPL

Description: ScrapPY is a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists that offensive security tools can use to perform brute force, forced browsing, and dictionary attacks. ScrapPY performs word frequency, entropy, and metadata analysis, and can also run in a full output mode to craft custom wordlists for targeted attacks. The tool dives deep to discover keywords and phrases leading to potential passwords or hidden directories, and writes them to a text file that is readable by tools such as Hydra, Dirb, and Nmap. Expedite initial access, vulnerability discovery, and lateral movement with ScrapPY!
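
To make the word frequency and entropy analysis described above more concrete, here is a minimal sketch in Python. It is not ScrapPY's actual code: the function names (shannon_entropy, build_wordlist), the tokenising regex, and the min_length and top_n parameters are all hypothetical, and the input is assumed to be text that has already been extracted from a PDF.

```python
import math
import re
from collections import Counter


def shannon_entropy(word: str) -> float:
    """Shannon entropy of the characters in a word, in bits per character."""
    counts = Counter(word)
    total = len(word)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def build_wordlist(text: str, min_length: int = 4, top_n: int = 10) -> list:
    """Return the top_n most frequent tokens that are at least min_length long.

    Hypothetical helper for illustration; the tokenising rule is an assumption.
    """
    words = re.findall(r"[A-Za-z0-9_]+", text)
    candidates = [w for w in words if len(w) >= min_length]
    freq = Counter(candidates)
    return [word for word, _ in freq.most_common(top_n)]


if __name__ == "__main__":
    # Stand-in for text extracted from a PDF manual.
    sample = "admin portal admin backup admin portal x"
    for word in build_wordlist(sample, top_n=3):
        print(word, round(shannon_entropy(word), 3))
```

Frequency ranking surfaces terms a document repeats often, while per-word entropy can help flag random-looking strings such as keys or default credentials; writing the result one word per line would give a file that wordlist-driven tools can consume.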

Dependencies: scipy, collections, pandas, PyPDF2, and textract.
Similar Tools: Cewl
Activity: Released within the past month and actively being updated with new features, including image OCR analysis.
Install: Once the dependencies are installed, the tool only requires Python 3.
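
As a rough illustration of the install requirement above, the following sketch checks which of the listed dependencies can be imported in the current Python 3 environment. The helper name missing_modules is hypothetical, and the import names are assumed to match the dependency names given in this request (collections ships with the Python standard library, so it should always be present).

```python
from importlib.util import find_spec

# Import names assumed to match the dependency list in this request.
DEPENDENCIES = ["scipy", "collections", "pandas", "PyPDF2", "textract"]


def missing_modules(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

This only confirms that the modules can be found; it does not verify versions or any external programs a dependency may call at runtime.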

Activities

g0tmi1k

2023-02-03 15:39

administrator   ~0017455

@kali-team, could this please be packaged up?
@author, if you want to help with the packaging process, you can check the documentation here: https://www.kali.org/docs/development/public-packaging

Arszilla

2024-04-13 21:12

reporter   ~0019143

I've been trying to package this for the team; however, it requires the following libraries to be packaged up too:

  • PyPDF2
  • textract

That is easier said than done. One package that textract requires is pstotext, which does not exist anymore: https://github.com/deanmalmgren/textract/issues/504

pstotext is an archaic package that perhaps only remains in Ubuntu, and it would be a pain to bring over to Debian, especially given how old it is.

Issue History

Date Modified | Username | Field | Change
2022-11-21 03:20 | entropydaemon1 | New Issue |
2023-02-03 15:39 | g0tmi1k | Status | new => acknowledged
2023-02-03 15:39 | g0tmi1k | Category | New Tool Requests => Queued Tool Addition
2023-02-03 15:39 | g0tmi1k | Summary | ScrapPY: a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists => ScrapPY - a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists
2023-02-03 15:39 | g0tmi1k | Note Added: 0017455 |
2024-04-13 21:12 | Arszilla | Note Added: 0019143 |