Penetration Testing: What It Actually Involves and Why It’s Harder Than It Looks

The popular image of a penetration test is a lone hacker in a dark room, typing furiously, exploiting a vulnerability with a single dramatic keystroke. The reality is closer to a methodical multi-week process that involves more documentation, more communication, and more careful planning than any TV depiction suggests.

Understanding what a penetration test actually involves — and what the common variations on scope and methodology mean — matters whether you’re commissioning one, evaluating whether your organisation needs one, or working in security.

What a Penetration Test Is (And Isn’t)

A penetration test is an authorised simulation of an attack against a defined target — a network, an application, a specific system, or some combination — with the goal of identifying exploitable vulnerabilities before an actual attacker does.

The key word is authorised. Everything a penetration tester does is done with explicit written permission from the organisation that owns the target. That permission is what separates penetration testing from criminal intrusion, which is why the legal and contractual framework is as important as the technical work.

A penetration test is not a vulnerability scan. Automated vulnerability scanners (Nessus, Qualys, OpenVAS) identify known vulnerability signatures — software versions with CVEs, open ports, weak configurations. They generate output quickly and cheaply. What they can’t do is chain vulnerabilities together, exploit business logic flaws, or assess the real-world impact of a finding. A pen test does these things; a scan doesn’t.
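To make the limits of signature matching concrete, here is a minimal Python sketch of the core idea: compare an observed product version against a table of known-affected version ranges. It is an illustration only, not how Nessus, Qualys, or OpenVAS are implemented. The product name `exampled` and the advisory IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """One known-vulnerable version range for a product (illustrative data)."""
    advisory_id: str
    product: str
    first_affected: tuple[int, ...]
    fixed_in: tuple[int, ...]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def match_signatures(product: str, version: str,
                     signatures: list[Signature]) -> list[str]:
    """Return advisory IDs whose affected range contains the observed version."""
    observed = parse_version(version)
    return [
        sig.advisory_id
        for sig in signatures
        if sig.product == product
        and sig.first_affected <= observed < sig.fixed_in
    ]


# Hypothetical advisory data, not real CVE records.
SIGNATURES = [
    Signature("EXAMPLE-0001", "exampled", (2, 0, 0), (2, 4, 7)),
    Signature("EXAMPLE-0002", "exampled", (1, 0, 0), (1, 9, 0)),
]

hits = match_signatures("exampled", "2.3.1", SIGNATURES)
print(hits)
```

A lookup like this says nothing about whether the service is reachable, whether the flaw is exploitable in this configuration, or what an attacker gains from it. Those are the questions a tester answers by hand.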

A pen test is also not a comprehensive security audit. An audit assesses policies, processes, and controls against a standard. A pen test tests whether specific defences can be bypassed in practice.

The Scoping Phase: Where Most Value Is Lost

The quality of a penetration test is determined primarily in the scoping phase, before a single packet has been sent. Scope decisions determine what the tester can and cannot do, which determines what gets tested and what goes undiscovered.

What’s in scope: Specific IP ranges, domain names, applications, physical locations, user roles. “The entire internet-facing infrastructure” and “just the login page of the web application” are both valid but very different engagements.

Rules of engagement: Can the tester use social engineering? Can they test production systems that process live transactions, or only a staging environment? Are there hours during which testing is prohibited to avoid disrupting operations? Can they conduct denial-of-service testing?

Starting point and knowledge level: Black box testing starts with no information about the target, so the tester must discover it as a real external attacker would. White box (or crystal box) testing provides full information, such as network diagrams, source code, and architecture documentation, allowing the tester to spend the available time on depth of testing rather than on reconnaissance. Grey box testing sits between the two: the tester is given partial information, for example user credentials or a high-level architecture overview.

The choice between these isn’t purely about realism. Black box testing mirrors an external attacker’s experience; white box testing is more efficient at finding the most serious vulnerabilities when time is the constraint. Many mature organisations use grey box testing for application assessments, providing the context that an insider, or an external attacker who has completed initial reconnaissance, would already have.
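Some teams encode the agreed scope and rules of engagement so that tooling can refuse out-of-scope actions. The sketch below is one possible shape for that check, using Python's standard `ipaddress` module. The `Scope` class, its field names, and the single daily blackout window are assumptions made for illustration; the addresses come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class Scope:
    """Agreed targets and testing hours (hypothetical structure)."""
    in_scope: list[str]
    excluded: list[str] = field(default_factory=list)
    # Assumed rule: no testing between these times on any day.
    # A window that crosses midnight is not handled in this sketch.
    blackout_start: time = time(9, 0)
    blackout_end: time = time(17, 0)

    def allows_target(self, address: str) -> bool:
        """True if the address is inside an in-scope range and not excluded."""
        ip = ipaddress.ip_address(address)
        if any(ip in ipaddress.ip_network(net) for net in self.excluded):
            return False
        return any(ip in ipaddress.ip_network(net) for net in self.in_scope)

    def allows_time(self, when: datetime) -> bool:
        """True if the moment falls outside the blackout window."""
        return not (self.blackout_start <= when.time() < self.blackout_end)

    def permits(self, address: str, when: datetime) -> bool:
        return self.allows_target(address) and self.allows_time(when)


scope = Scope(in_scope=["192.0.2.0/24"], excluded=["192.0.2.10/32"])
evening = datetime(2024, 1, 15, 22, 30)
midday = datetime(2024, 1, 15, 12, 0)

print(scope.permits("192.0.2.25", evening))    # in range, outside blackout
print(scope.permits("192.0.2.10", evening))    # explicitly excluded host
print(scope.permits("192.0.2.25", midday))     # inside blackout window
print(scope.permits("198.51.100.7", evening))  # never in scope
```

A check like this supplements the signed scope document and does not replace it. Questions such as whether social engineering or denial-of-service testing is permitted still need to be settled in writing.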

The Methodology

Professional penetration testing follows a structured methodology. Different frameworks (PTES, OWASP Testing Guide, NIST SP 800-115) use different terminology but cover similar phases.

Reconnaissance: Understanding the target before touching it. Passive reconnaissance draws on OSINT, public records, job postings (which reveal the tech stack), certificate transparency logs, and historical DNS records to build a detailed picture of the attack surface without sending a single packet to the target. Active reconnaissance, such as port scanning and service enumeration, begins to touch the target and so must not start until the scope and written authorisation are in place.

Scanning and enumeration: Detailed mapping of what’s running on the target: services, versions, configurations, potential entry points. This is where automated tools contribute most, but their output still needs interpretation. An experienced tester weeds out false positives and recognises the significance of results that a tool alone would misclassify.

Exploitation: Actively attempting to exploit identified vulnerabilities. This requires both technical skill and judgment — understanding which vulnerabilities are worth exploiting given the engagement constraints, which exploits risk causing unintended damage, and how to chain multiple low-severity findings into a higher-severity path.

Post-exploitation: If initial access is obtained, how far can an attacker go? Privilege escalation, lateral movement, persistence, data exfiltration attempts — the post-exploitation phase determines whether an initial compromise leads to significant impact or is contained.

Reporting: The deliverable that most clients will spend the most time with. A good penetration test report communicates findings to two audiences: technical staff who need to understand and fix the specific vulnerability, and leadership who need to understand the business risk. Findings should be rated by severity, described with enough detail to reproduce, and accompanied by concrete remediation guidance.
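The idea of chaining several low-severity findings into a higher-severity path, mentioned under exploitation, can be modelled as a search over a graph. The following Python sketch assumes each finding moves an attacker from one position to another and uses breadth-first search to find the shortest chain. The positions and findings are invented for illustration and do not describe any real engagement.

```python
from collections import deque

# Hypothetical findings: from the key position, each finding reaches a new one.
EDGES = {
    "internet": [("web-server", "outdated CMS plugin (medium)")],
    "web-server": [("app-service-account", "credentials in config file (low)")],
    "app-service-account": [
        ("domain-admin", "over-privileged service account (medium)")
    ],
    "domain-admin": [],
}


def find_attack_path(edges, start, goal):
    """Breadth-first search: shortest list of findings from start to goal.

    Returns None when the goal cannot be reached from the start position.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        position, path = queue.popleft()
        if position == goal:
            return path
        for next_position, finding in edges.get(position, []):
            if next_position not in visited:
                visited.add(next_position)
                queue.append((next_position, path + [finding]))
    return None


path = find_attack_path(EDGES, "internet", "domain-admin")
for step, finding in enumerate(path, start=1):
    print(f"{step}. {finding}")
```

None of the three hypothetical findings is rated above medium on its own, yet together they reach the highest-value position in the graph. That is the argument for reporting paths as well as individual weaknesses.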

Application vs Network Penetration Testing

These are the two most common engagement types and they require different methodologies, different toolsets, and different expertise.

Network penetration testing focuses on the infrastructure layer: firewalls, routers, switches, servers, VPNs, wireless networks. The attack chains typically involve initial foothold through an internet-exposed service, privilege escalation on the host, and lateral movement through the internal network. Common findings include unpatched services, weak credentials, network misconfigurations, and overly permissive internal firewall rules.

Web application penetration testing focuses on the application layer: authentication mechanisms, authorisation controls, session management, input validation, API endpoints, business logic. The OWASP Top Ten provides a useful framework of the most common and consequential categories — injection flaws, broken access control, security misconfigurations, and so on. Web application testing requires understanding application architecture deeply enough to identify logic flaws that no automated scanner would catch.
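Broken access control is a good example of a flaw that has no signature for a scanner to match. The Python sketch below shows a hypothetical invoice lookup in two versions: one that trusts the requested ID and one that checks ownership. The `Invoice` type, the in-memory `INVOICES` store, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner: str
    amount: float


# Hypothetical in-memory data store standing in for a database.
INVOICES = {
    1: Invoice(1, "alice", 120.0),
    2: Invoice(2, "bob", 75.5),
}


def get_invoice_vulnerable(current_user: str, invoice_id: int) -> Invoice:
    """Returns any invoice to any logged-in user: the ID is trusted blindly."""
    return INVOICES[invoice_id]


def get_invoice_fixed(current_user: str, invoice_id: int) -> Invoice:
    """Returns the invoice only when it belongs to the requesting user."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner != current_user:
        # Same error for "missing" and "not yours" avoids leaking which IDs exist.
        raise PermissionError("invoice not found or not accessible")
    return invoice


# Alice changes the ID in her request from 1 to 2.
leaked = get_invoice_vulnerable("alice", 2)
print(leaked.owner)  # the other customer's invoice is returned

try:
    get_invoice_fixed("alice", 2)
except PermissionError as error:
    print(f"blocked: {error}")
```

Both versions return valid data for valid requests, so nothing looks wrong from the outside. Finding the flaw requires knowing that invoice 2 should not be visible to Alice, which is application knowledge rather than a pattern.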

Most modern organisations need both, and the team structure should reflect this — good network testers are not necessarily good application testers and vice versa.

Red Team Engagements

A red team engagement is a fundamentally different exercise from a penetration test, though the two are frequently conflated.

A penetration test aims to find as many vulnerabilities as possible within a defined scope and time period. A red team engagement simulates a realistic, goal-oriented adversary, typically with a defined objective such as “access the finance director’s email” or “exfiltrate customer payment data”. The team uses whatever techniques a sophisticated attacker would realistically employ, including physical security testing and social engineering, and works over an extended period without notifying the defensive teams.

The red team is testing the people and processes as much as the technology. Whether an organisation detects and responds to the simulated attack is as important a finding as whether the attack succeeded. Red team exercises require significantly more preparation, expertise, and budget than standard penetration tests, and are most valuable for organisations that have already addressed the fundamentals.

What the Report Should Tell You

The finding-by-finding list is the least important part of a penetration test report for most stakeholders. What matters:

Attack paths, not just vulnerabilities: A single critical vulnerability that’s exploitable from the internet is different from a chain of three medium-severity vulnerabilities that together give an attacker domain administrator access from a phishing email. The report should describe paths, not just individual weaknesses.

Actual versus theoretical impact: “SQL injection in the login form” is a finding. “SQL injection in the login form allowed extraction of all 200,000 customer records, including hashed passwords, within 15 minutes of discovery” is a finding with impact. The difference matters for prioritisation.

Remediation specificity: “Patch the application” is not helpful guidance. “The Django framework is running version 3.1.2; update to 3.2.18 or later to remediate CVE-2022-34265, and additionally review the custom authentication bypass in views.py line 143 which is unrelated to the CVE and requires code-level remediation” is actionable.

Risk-ranked prioritisation: Not everything can be fixed at once. A clear prioritisation of what to address first, based on exploitability and business impact, helps security teams make the case for remediation resources.
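Ranking by exploitability and business impact can be sketched in a few lines of Python. The 1 to 5 scales, the multiplication used for the score, and the example findings are all assumptions for illustration. Real reports often use an established scheme such as CVSS alongside the tester's own judgement.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    exploitability: int   # assumed scale: 1 (hard) to 5 (trivial)
    business_impact: int  # assumed scale: 1 (minor) to 5 (severe)
    remediation: str

    @property
    def risk_score(self) -> int:
        """Simple product of the two ratings; a real scheme may weight them."""
        return self.exploitability * self.business_impact


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Highest risk first; ties broken alphabetically for a stable order."""
    return sorted(findings, key=lambda f: (-f.risk_score, f.title))


# Hypothetical findings for illustration.
FINDINGS = [
    Finding("Verbose error messages", 4, 1,
            "Disable debug output in production"),
    Finding("SQL injection in login form", 5, 5,
            "Use parameterised queries"),
    Finding("Weak TLS configuration", 2, 3,
            "Disable legacy protocol versions"),
]

ranked = prioritise(FINDINGS)
for finding in ranked:
    print(f"{finding.risk_score:>2}  {finding.title}: {finding.remediation}")
```

Keeping the remediation text on the same record as the score means the ordered list doubles as a work queue for the team doing the fixes.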

Common Misconceptions

“We just did a vulnerability scan, that covers it”: A vulnerability scan and a penetration test address different questions. The scan asks “what known vulnerability signatures are present?” The pen test asks “can an attacker actually exploit these, and what can they do if they succeed?”

“We’ll do a pen test when we’re ready”: Waiting until everything is perfect before testing means never testing when it matters. Testing an imperfect system reveals the most severe problems; finding those problems earlier is the point.

“If the pen tester didn’t find anything, we’re secure”: A clean pen test report means the tester didn’t find anything within scope, with the techniques available, in the time allocated. It’s useful evidence of security, not proof of it.

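“Our cloud provider is responsible for security”: Cloud providers (AWS, Azure, GCP) are responsible for the security of the underlying infrastructure: the data centres, hardware, and virtualisation layer. You are responsible for the security of what you build and configure on top of it. Many breaches in cloud environments stem from customer misconfigurations, such as publicly readable storage or over-permissive access policies, rather than from vulnerabilities in the provider’s infrastructure.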
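As an example of the customer-side misconfiguration described in the last point, the Python sketch below inspects an S3-style bucket policy document and flags `Allow` statements that grant access to a wildcard principal with no condition attached. It is a deliberately simplified check. It ignores ACLs, account-level public access settings, and the detail of condition keys, and the bucket name and account ID are hypothetical.

```python
import json


def _is_wildcard_principal(principal) -> bool:
    """True for "*" or {"AWS": "*"} style principals."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        return aws == "*" or (isinstance(aws, list) and "*" in aws)
    return False


def public_statements(policy_json: str) -> list[dict]:
    """Return Allow statements open to everyone with no Condition block."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be a list
        statements = [statements]
    return [
        statement
        for statement in statements
        if statement.get("Effect") == "Allow"
        and _is_wildcard_principal(statement.get("Principal"))
        and "Condition" not in statement
    ]


# Hypothetical policy: one public statement, one restricted to an account.
POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Sid": "TeamAccess",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
})

flagged = public_statements(POLICY)
print([statement["Sid"] for statement in flagged])
```

A flagged statement is a lead for a tester to follow up, not a confirmed finding. Some buckets are public on purpose, and other controls may still block access.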

Getting Value from the Exercise

The organisations that get the most from penetration testing treat it as part of a continuous security programme, not a compliance checkbox. Findings are tracked to remediation, re-tested after fixes, and used to drive systemic improvements in development and operations processes.

The test that produces a report that sits unread in a shared drive is an expensive waste. The test whose findings drive actual remediations, whose attack paths inform how your incident response team thinks about detection, and whose results are discussed with engineering leadership — that’s a penetration test producing real security value.