How to create a CTF task: flag structure, storytelling, and author mistakes

Depov

Educational goal is the basis of a good CTF task
Before you open the code editor, answer one question: what specific skill will the participant practice while solving this task? Not "have fun" and not "feel like a hacker", but what exactly they will learn. A task without an educational goal is just a puzzle. A task with one is a mini-course on a specific attack or defense technique.

A pattern that any experienced organizer will confirm: tasks with limited real-world applicability get old quickly. A stego task that reuses the same tool with a different password, or endless look-alike Base64/ROT13 tasks, work fine as an icebreaker, but when overused they start to spoil the impression of the entire event.
How to Formulate the Goal
Three questions to ask about each task before development starts:
1. What real attack or defensive technique is behind the task? SQL injection, buffer overflow, memory forensics: the task should simulate a real scenario.
2. What tool or methodology is being practiced? If the participant learned nothing new about Ghidra, pwntools, or Wireshark after solving it, the task is empty.
3. Which category (web, crypto, reversing, forensics, pwn) does the task belong to, and what difficulty level is adequate for the target audience?
An example of a poorly formulated goal: "The participant must find the flag." An example of a good one: "The participant studies the Common Modulus Attack on RSA, a situation where the same message is encrypted twice with the same modulus but different exponents e1 and e2, and applies the extended Euclidean algorithm to recover the plaintext."
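To make the "good" goal above concrete, here is a minimal sketch of the attack the participant would end up writing. It assumes textbook RSA without padding, coprime exponents, and toy-sized numbers; the function names are made up for illustration.

```python
def egcd(a: int, b: int) -> tuple[int, int, int]:
    """Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y == g."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y


def common_modulus_attack(n: int, e1: int, c1: int, e2: int, c2: int) -> int:
    """Recover m from c1 = m^e1 mod n and c2 = m^e2 mod n when gcd(e1, e2) == 1."""
    g, x, y = egcd(e1, e2)
    if g != 1:
        raise ValueError("e1 and e2 must be coprime for this attack")
    # One of x, y is negative. pow() with a negative exponent and a modulus
    # computes the modular inverse first (Python 3.8+).
    # c1^x * c2^y = m^(e1*x + e2*y) = m^1 (mod n)
    return (pow(c1, x, n) * pow(c2, y, n)) % n


if __name__ == "__main__":
    n = 61 * 53          # toy modulus, far too small for real use
    m = 65               # the "secret" message
    e1, e2 = 17, 7       # two coprime public exponents
    c1, c2 = pow(m, e1, n), pow(m, e2, n)
    print(common_modulus_attack(n, e1, c1, e2, c2) == m)
```

A task built around this has a clear learning outcome: the participant has to understand why sharing a modulus across keys is dangerous, not just run a tool.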

Real-world context is especially critical for the forensics category. If the task is built entirely on Linux desktop artifacts, ask yourself: how often do you investigate incidents on Linux workstations? A "compromised Linux web server" or a "memory dump of a Windows machine with suspicious activity" is more plausible and teaches skills that will be useful in real incident response.
Target Audience Determines the Difficulty
If the audience is school students at their first CTF, an easy task can be trivial: "Which Linux command lists the files in a directory?" For experienced pentesters, an easy task should be technical but solvable in a single step.

The difficulty distribution depends on the context. For a training CTF, a workable proportion is about 35% easy, 35% medium, 25% hard, and 5% extreme. For a conference with an experienced audience the proportions shift to 10/25/40/25. The numbers are not dogma, just a starting point confirmed by organizers' practice.
Be honest with yourself: calibrating difficulty is hard even for an experienced author. If you are not sure, another medium is better than a broken extreme.
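A quick way to turn those percentages into a task plan, as a rough sketch. The helper name and the rounding rule (largest remainder, so the counts always add up to the total) are my own assumptions, not part of any platform.

```python
TRAINING = {"easy": 35, "medium": 35, "hard": 25, "extreme": 5}
CONFERENCE = {"easy": 10, "medium": 25, "hard": 40, "extreme": 25}


def split_by_difficulty(total_tasks: int, shares: dict[str, int]) -> dict[str, int]:
    """Split total_tasks across difficulty levels according to percentage shares."""
    raw = {level: total_tasks * pct / 100 for level, pct in shares.items()}
    counts = {level: int(value) for level, value in raw.items()}
    # Hand out the tasks lost to rounding down, largest fractional part first.
    leftover = total_tasks - sum(counts.values())
    by_remainder = sorted(raw, key=lambda level: raw[level] - counts[level], reverse=True)
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


if __name__ == "__main__":
    print(split_by_difficulty(20, TRAINING))
    print(split_by_difficulty(20, CONFERENCE))
```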
CTF flag structure: format, generation, cheating protection
The flag is a string that the participant submits to the platform as the answer. It looks like a minor detail, but a bad format can spoil the impression of the whole task.
Flag format and anti-patterns

What kills flag UX:
• No format. Just a string like s3cur1ty_m4st3r with no wrapper. The participant cannot tell whether this is the answer or random text from the task.
• Ambiguous characters. O vs 0, l vs 1, I vs l. At one event a participant spent 20 minutes trying spelling variants. That is not challenge design, it is a UX failure.
• Overlap with ordinary data. If the prefix is flag, the string flag can also show up in documentation or source code, creating false positives.
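The anti-patterns above are easy to check automatically before release. A minimal sketch, assuming a hypothetical CTF{...} wrapper and treating the letters O, l, and I as confusable (both choices are illustrative, adjust to your event):

```python
import re

# Assumption: the event uses a CTF{...} wrapper with letters, digits, underscores.
FLAG_PATTERN = re.compile(r"CTF\{[A-Za-z0-9_]+\}")
# Letters that are easily mistaken for digits or for each other in many fonts.
CONFUSABLE_LETTERS = set("OlI")


def flag_problems(flag: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the flag looks fine."""
    problems = []
    if not FLAG_PATTERN.fullmatch(flag):
        problems.append("missing or malformed CTF{...} wrapper")
    # Only inspect the part after the opening brace (or the whole string if there is none).
    body = flag[flag.find("{") + 1:]
    bad = sorted(CONFUSABLE_LETTERS & set(body))
    if bad:
        problems.append("confusable characters: " + ", ".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ["s3cur1ty_m4st3r", "CTF{Il0ve_fl4gs}", "CTF{p4th_w4s_h3r3}"]:
        print(candidate, flag_problems(candidate))
```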
Static and Dynamic Flags
A static flag is the same for all teams. The problem is obvious: one team solves the task, drops the answer in a chat, and everyone else submits it without solving anything. For an internal training CTF this is tolerable. For a competition it is destructive.

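A dynamic flag is generated individually for each team. The basic scheme: concatenate team_id, challenge_id, and a secret key, hash the result with SHA-256, and take the first 16 characters of the hex digest:
Python:
import hashlib
import os

def generate_flag(team_id: str, challenge_id: str) -> str:
    # The secret lives only on the server; without it a team cannot derive another team's flag
    secret = os.environ["FLAG_SECRET"]
    seed = f"{team_id}:{challenge_id}:{secret}"
    h = hashlib.sha256(seed.encode()).hexdigest()[:16]
    # Assumption: CTF{...} is a placeholder wrapper; use your event's flag format
    return f"CTF{{{h}}}"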
CTFd supports dynamic flags through plugins: you create a custom challenge type that generates a unique flag when a team accesses the task. rCTF and other platforms offer similar mechanisms. Answer sharing disappears, and every solve is verifiable.
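"Verifiable" here means the checker can tell not only whether a flag is correct, but whose flag it is. A minimal sketch of the checking side, reusing the same hashing scheme; the function names and the CTF{...} wrapper are illustrative assumptions, not CTFd's or rCTF's actual API:

```python
import hashlib
import hmac


def expected_flag(team_id: str, challenge_id: str, secret: str) -> str:
    """Per-team flag: first 16 hex chars of SHA-256 over team, challenge, and secret."""
    seed = f"{team_id}:{challenge_id}:{secret}"
    digest = hashlib.sha256(seed.encode()).hexdigest()[:16]
    return f"CTF{{{digest}}}"  # assumed wrapper


def check_submission(submitted: str, team_id: str, challenge_id: str, secret: str) -> bool:
    """True only if the team submitted its own flag (constant-time comparison)."""
    own = expected_flag(team_id, challenge_id, secret)
    return hmac.compare_digest(submitted.strip().encode(), own.encode())


def find_flag_owner(submitted: str, team_ids: list[str], challenge_id: str, secret: str):
    """If a wrong flag belongs to another team, return that team's id (possible sharing)."""
    for candidate in team_ids:
        if check_submission(submitted, candidate, challenge_id, secret):
            return candidate
    return None


if __name__ == "__main__":
    secret = "demo-secret"  # in production this would come from the environment
    shared = expected_flag("team-a", "web-01", secret)
    print(check_submission(shared, "team-b", "web-01", secret))
    print(find_flag_owner(shared, ["team-a", "team-b"], "web-01", secret))
```

If team-b submits a flag that find_flag_owner attributes to team-a, that is a strong signal for the organizers to look at, though not proof of cheating on its own.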
Protection against googling
Before deployment, run each flag through a search engine. If it turns up, change it. This applies not only to the flag itself but also to intermediate data: encryption keys, user passwords inside the task, database table names. Check your public repositories, Pastebin, and gists: anywhere test data could have leaked by accident.
CTF Storytelling: How to Write a CTF Task with a Story
A narrative is not mandatory for every task. But it is what distinguishes a "normal" CTF from one that people are still discussing six months later.
Theme of the competition
Successful CTFs are often united by a common theme: a film franchise, retro video games, corporate espionage, the universe of a particular series. The theme creates a shared space in which each task is an episode. The participant is not just "looking for an SQL injection": they are breaking into the antagonist's order system, and that creates the motivation to dig deeper.

Theming turns each task into a small story with a purpose (source: contrastsecurity.com, guide to organizing a CTF). The theme also gives the author a natural way to embed clues through plot elements rather than direct instructions.

Popular themes: classic films (Jurassic Park, Back to the Future), retro games, westerns, corporate detective stories. The main thing is that the theme supports the technical substance rather than replacing it.
Hints in the title and description
Good CTF challenge design includes embedded hints. Example: the task is called "Financial PATH", and the description tells the story of an employee whose "career path" led him to the company's financial secrets. The word PATH is a hint at Path Traversal. A newcomer will not notice it. For an attentive participant it is the attack vector.

A formal hint system with point penalties works well at events with a mixed-skill audience:
• Free hint: general direction ("Look at the request parameters")
• Hint for -50 points: more specific ("Try UNION SELECT")
• Hint for -100 points: almost the answer ("The table is called admin_secrets")
Teams of different levels can all finish the task, and the competition among strong participants does not suffer.
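The scoring side of such a hint ladder is a few lines. A sketch under my own assumptions: hints are opened strictly in order, penalties add up, and a task never scores below zero. The costs come from the list above; the 500-point base value is made up.

```python
# Penalties for the first, second, and third hint, in the order they are opened.
HINT_COSTS = [0, 50, 100]


def score_after_hints(base_points: int, hints_opened: int) -> int:
    """Points a team gets for a solve after opening the first `hints_opened` hints."""
    if not 0 <= hints_opened <= len(HINT_COSTS):
        raise ValueError("hints_opened must be between 0 and the number of hints")
    penalty = sum(HINT_COSTS[:hints_opened])
    return max(base_points - penalty, 0)


if __name__ == "__main__":
    for opened in range(len(HINT_COSTS) + 1):
        print(opened, score_after_hints(500, opened))
```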
Easter eggs and writeup as part of the narrative
Extra data that is not needed for the solution creates atmosphere: employee correspondence in a forensics image, humorous comments in source code, false trails with funny messages. In my experience, Easter eggs become a topic of discussion after the event and help build a community.

A side effect: extra data increases difficulty. If a forensics image contains three docx files and the question is "which file was opened on this date", the answer can be found by simple elimination. If there are thirty files, the participant has to look for artifacts the right way. The SANS DFIR team, for example, puts serious effort into generating realistic data for their capstone events, and then reuses these images for years, discovering new artifacts each time.

The writeup is as much a part of the story as the task itself. A writeup fixes the intended solution, explains the author's line of thought, and reinforces the educational goal. The rule is simple: if the writeup is not written before deployment, the task is not ready.
Testing CTF Tasks: Hunting for Unintended Solutions
Testing is the most underrated phase of CTF development. Most failures come not from a bad idea but from the fact that the author never tried to break their own task.
What unintended solutions are and how to look for them
An unintended solution is a path to the flag that the author did not plan for. Classics from real events:
• An open MySQL port in the Docker container: participants connect to the database directly instead of exploiting SQL injection through the web interface
• A .git directory on the web server: the source code leaks together with the flag in the config (hello from my intro)
• The flag is visible via strings on the binary, without full reversing in Ghidra or IDA
• flag.txt with 644 permissions: readable by everyone
• No .dockerignore: .env, Makefile, and test scripts with answers end up in the container
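Two of the leaks above (a flag visible to strings, a world-readable flag file) can be caught with a short pre-release check. This is a rough stand-in for running strings and ls -l by hand; the CTF{ prefix and the function names are assumptions for illustration.

```python
import re
import stat

# Runs of 4+ printable ASCII bytes, roughly what the `strings` utility reports by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def leaked_flags(blob: bytes, prefix: bytes) -> list[str]:
    """Return printable strings inside `blob` that contain the flag prefix."""
    return [
        match.group().decode("ascii")
        for match in PRINTABLE_RUN.finditer(blob)
        if prefix in match.group()
    ]


def is_world_readable(mode: int) -> bool:
    """True if the permission bits allow 'other' users to read the file (e.g. 644)."""
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    fake_binary = b"\x7fELF\x00\x01junk\x00CTF{hardcoded_oops}\x00\x02"
    print(leaked_flags(fake_binary, b"CTF{"))
    print(is_world_readable(0o644), is_world_readable(0o640))
```

On real files you would pass open(path, "rb").read() and os.stat(path).st_mode into these helpers.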
A technique that works: after writing the task, set it aside for a day. Then try to solve it yourself without looking at the source. Use only the tools available to participants. Forget the intended path and try to break the task in a non-standard way. Run nmap against the container, check the standard paths (/robots.txt, /.env, /.git/HEAD), run strings on every binary.
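The standard-paths check is easy to script so it runs on every deployment. A minimal sketch using only the standard library; the function names are hypothetical and the base URL in the example is a placeholder for your own test instance.

```python
import urllib.error
import urllib.request

# Paths from the checklist above; extend with whatever is relevant to your stack.
SENSITIVE_PATHS = ["/robots.txt", "/.env", "/.git/HEAD"]


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for `url`, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx still carry a status code
    except (urllib.error.URLError, OSError):
        return None


def exposed_paths(base_url: str, paths=SENSITIVE_PATHS, fetch=http_status) -> list[str]:
    """Return the paths that answer 200 and therefore deserve a manual look."""
    base = base_url.rstrip("/")
    return [path for path in paths if fetch(base + path) == 200]


if __name__ == "__main__":
    # Placeholder address of a locally running task container.
    print(exposed_paths("http://127.0.0.1:8080"))
```

A 200 on /robots.txt may be intentional, so treat the output as a list of things to review rather than confirmed leaks.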
Peer review: other people's eyes find your blind spots
Give the task to a colleague. Tell them only the category and the difficulty level, nothing else. Observe:
• They solved a hard-level task in 5 minutes: you overestimated the difficulty
• They are stuck for more than an hour: the task description is poor and lacks context
• They found an unintended path: you just saved your event
A recommendation from organizers of major CTFs: each author writes a solution to their own task and commits it to a shared repository. Not every author will be available during the competition, and other organizers must be able to answer participants' questions. A written solution is a mandatory requirement, not an option.
Environment requirements for testing
• OS: the same as on the production server (usually the latest Debian/Ubuntu LTS)
• Docker: docker-compose with the same limits as production (CPU ≤0.5, RAM ≤256M)
• Network: scan open ports from the outside. Anything that is not part of the task must be closed
• RAM: 4 GB is enough for local testing; if you run multiple containers at the same time, 8 GB
CTF author mistakes: anti-patterns from real events
Over several years of organizing events and writing tasks, I have accumulated a collection of mistakes, both my own and other people's. I'm sharing them so you don't repeat them.
Guesswork tasks
A task that cannot be solved logically. Examples confirmed by participants' complaints at real events (source: contrastsecurity.com): open a binary in one particular reversing tool, zoom the call graph to a certain scale, and read the flag from the shape of the graph. Or: get GPS coordinates, go to street view, rotate the panorama to a sign, take the name, and hash it with MD5.

Rule: if the solution depends on guessing the tool or on actions that do not follow from the task statement, it is a bad task. Every step should have a logical reason.
Exposed infrastructure
Open ports, default passwords, no isolation. A Docker container with the MySQL port exposed and a default root password turns a four-hour chain into a five-minute connection to the database. A minimal Dockerfile for a web task:
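Code:
FROM python:3.11-slim
# Unprivileged user that will run the service
RUN useradd -m -s /bin/bash ctfuser
WORKDIR /app
COPY --chown=ctfuser:ctfuser . /app/
# Assumption: requirements.txt exists and lists gunicorn plus the app's dependencies
RUN pip install --no-cache-dir -r requirements.txt
USER ctfuser
EXPOSE 8080
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "--workers", "2", "app:app"]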
An unprivileged user, one open port, no extra services. CPU and RAM limits are set in docker-compose.yml via deploy.resources.limits. Check separately that docker-compose.yml does not publish the database port on the host: this is the most common cause of unintended solutions in web tasks.
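Both checks from the paragraph above can be automated once the compose file is loaded into a dict (for example with a YAML parser; that step is left out here). This is a sketch with a hypothetical helper name, and it only understands the common layout of services with ports and deploy.resources.limits keys.

```python
def compose_findings(compose: dict, public_services: set[str]) -> list[str]:
    """Flag services that publish ports unexpectedly or have no resource limits."""
    findings = []
    for name, service in compose.get("services", {}).items():
        # Only services meant to face participants may publish ports on the host.
        if service.get("ports") and name not in public_services:
            findings.append(f"{name}: publishes ports {service['ports']}")
        limits = service.get("deploy", {}).get("resources", {}).get("limits")
        if not limits:
            findings.append(f"{name}: no deploy.resources.limits")
    return findings


if __name__ == "__main__":
    # Hand-written example equivalent to a small docker-compose.yml.
    compose = {
        "services": {
            "web": {
                "ports": ["8080:8080"],
                "deploy": {"resources": {"limits": {"cpus": "0.5", "memory": "256M"}}},
            },
            "db": {"ports": ["3306:3306"]},  # the classic mistake
        }
    }
    for finding in compose_findings(compose, public_services={"web"}):
        print(finding)
```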
Steganography for the sake of steganography
A stego task where the only difference is the steghide password gets tiresome by the second repetition. One or two stego tasks for the whole CTF is fine. Half of the tasks being "open Stegsolve and cycle through the channels" is lazy challenge design.
No monitoring
If a container crashed or the task became unsolvable because of a participant's actions, without monitoring you will find out from angry teams in the chat. The minimum: container health checks, logging of flag submission attempts, and alerts for unusual load or a service going down.
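The simplest form of a health check is "does the task's port still accept connections". A minimal sketch with the standard library; the function names and the task list are hypothetical, and a real setup would run this on a schedule and send the result to your alerting channel.

```python
import socket


def service_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def down_services(targets: dict[str, tuple[str, int]], check=service_is_up) -> list[str]:
    """Return the names of tasks whose service does not accept connections."""
    return [name for name, (host, port) in targets.items() if not check(host, port)]


if __name__ == "__main__":
    # Placeholder task list; replace with your own hosts and ports.
    tasks = {"web-01": ("127.0.0.1", 8080), "pwn-02": ("127.0.0.1", 9001)}
    print(down_services(tasks))
```

A TCP check only proves the port is open. For web tasks it is worth also requesting a known page, since a container can accept connections while the application inside is broken.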
Uncalibrated difficulty
"I solved it in 10 minutes, so it's easy" is a trap every novice author falls into. You know the answer, you wrote the code, you remember every line. The only way to get an objective difficulty estimate is peer review by someone who has not seen the source code. There is no other way.
 