DoHExfTlk — Exfiltration Guide¶
TL;DR — One command¶
# From the repo root (where docker-compose.yml lives)
cd exfiltration/client
bash scripts/run_pipeline.sh
The script picks up every *.json config in ./test_configs/, runs a test for each, pulls the artifacts back to the host, aggregates the ML stats, and ranks the configs by least detected.
What the pipeline does¶
- Starts containers (exfil client + traffic analyzer).
- For each config JSON:
  - Runs the exfiltration client (run_client.py) against your DoH endpoint, using the file you set (default: /app/test_data/image.png).
  - Waits ~30s for traffic to be captured.
  - Filters DoHLyzer output to keep only suspected exfil flows.
  - Classifies those flows with ml/predictor.py (per-model thresholds).
  - Copies the filtered CSV and predictor log back to the host.
- Prints a per‑model summary and a final ranking of configs (which one evaded best).
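For orientation, the per-config loop above boils down to roughly the following. This is a minimal sketch, not the actual run_pipeline.sh: container names and in-container paths are the defaults listed under "Environment switches", while the --config flag and which container hosts the filter/predictor scripts are assumptions.

```python
# Hypothetical sketch of the per-config loop; run_pipeline.sh is the real driver.
# The --config flag and the container that runs the filter/predictor are assumptions.
import glob
import os
import subprocess
import time

EXFIL = "exfil_client"
ANALYZER = "traffic_analyzer"
RUN_DIR = "results/run-example"

for cfg in sorted(glob.glob("test_configs/*.json")):
    name = os.path.splitext(os.path.basename(cfg))[0]
    out = os.path.join(RUN_DIR, name)
    os.makedirs(os.path.join(out, "logs"), exist_ok=True)
    os.makedirs(os.path.join(out, "captured"), exist_ok=True)

    # 1) Run the exfiltration client against the DoH endpoint.
    subprocess.run(["docker", "exec", EXFIL, "python", "/app/run_client.py",
                    "--config", f"/app/test_configs/{name}.json"], check=True)

    # 2) Give the analyzer time to capture the traffic.
    time.sleep(30)

    # 3) Keep only suspected exfil flows from the DoHLyzer output.
    subprocess.run(["docker", "exec", ANALYZER, "bash",
                    "/app/filter_detection_csv.sh"], check=True)

    # 4) Classify the filtered flows with the trained models.
    subprocess.run(["docker", "exec", ANALYZER, "python", "/ml/predictor.py"], check=True)

    # 5) Copy artifacts back to the host for aggregation.
    subprocess.run(["docker", "cp", f"{ANALYZER}:/app/captured/.",
                    os.path.join(out, "captured")], check=True)
```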
Files & folders that matter¶
- Configs (input): exfiltration/client/test_configs/*.json
- Results (output on host): exfiltration/client/results/run-YYYYmmdd-HHMMSS/
  - One subfolder per config, containing:
    - logs/client.log & logs/predictor_*.log
    - captured/*.only_*.csv (filtered flows used by the predictor)
- ML scripts: ml_analyzer/model_trainer.py, ml_analyzer/predictor.py
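If you want to poke at a run's artifacts by hand, a quick host-side sketch (the run timestamp and config name are placeholders for an actual run folder):

```python
# Count the suspected exfil flows kept for one config of one run.
import csv
import glob

run = "exfiltration/client/results/run-20250101-120000/stealth_burst"  # placeholder
for path in glob.glob(f"{run}/captured/*.only_*.csv"):
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    print(f"{path}: {len(rows)} suspected exfil flows")
```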
Environment switches (quick reference)¶
You can override these when launching the script:
TEST_CONFIG_DIR             # default ./test_configs
RESULTS_DIR                 # default ./results
DOCKER_COMPOSE_FILE         # default ../../docker-compose.yml
EXFIL_CONTAINER             # default exfil_client
ANALYZER_CONTAINER          # default traffic_analyzer
CLIENT_PY_PATH              # default /app/run_client.py
IN_CONTAINER_CONFIG_DIR     # default /app/test_configs
IN_CONTAINER_CAPTURED_DIR   # default /app/captured
IN_CONTAINER_FILTER_SCRIPT  # default /app/filter_detection_csv.sh
PREDICTOR_PY                # default /ml/predictor.py
FILE_TO_EXFILTRATE          # default /app/test_data/image.png
For example:
RESULTS_DIR=./my_runs FILE_TO_EXFILTRATE=/app/test_data/doc.pdf bash scripts/run_pipeline.sh
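The same overrides can also be supplied when driving the pipeline from Python, for instance to queue several runs; a small sketch with example values:

```python
# Launch the pipeline with overridden environment variables (values are examples).
import os
import subprocess

env = dict(os.environ,
           RESULTS_DIR="./my_runs",
           FILE_TO_EXFILTRATE="/app/test_data/doc.pdf")
subprocess.run(["bash", "scripts/run_pipeline.sh"],
               cwd="exfiltration/client",  # run from the client folder, as in the TL;DR
               env=env, check=True)
```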
Reading the output¶
At the end you’ll see two things:
1) Per‑model totals (aggregated over all configs):
Model Benign Malicious Total Detection Rate
random forest 280 528 808 65.35%
logistic regression 778 30 808 3.71%
...
2) Per-config ranking (each config scored with its chosen model):
Config Model used Detection rate
apt_simulation random_forest 12.50%
stealth_burst logistic_regression 3.23%
...
Best (least detected): stealth_burst — 3.23% — model: logistic_regression
If no logs are found for a config, you’ll see a message and it will be skipped.
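For reference, the ranking math is simple: per model, detection rate = malicious / (benign + malicious), and the winner is the config with the lowest rate under its chosen model. A toy illustration follows; the counts are made up to reproduce the example per-config rates above (not the totals table), and the real parsing of predictor_*.log is done by run_pipeline.sh.

```python
# Illustrative aggregation and ranking math over per-config predictor counts.
per_config = {
    # config -> model -> (benign, malicious); made-up counts for illustration
    "apt_simulation": {"random_forest": (7, 1)},        # 1/8  = 12.50%
    "stealth_burst":  {"logistic_regression": (30, 1)}, # 1/31 =  3.23%
}

# Per-model totals over all configs.
totals = {}
for models in per_config.values():
    for model, (benign, malicious) in models.items():
        b, m = totals.get(model, (0, 0))
        totals[model] = (b + benign, m + malicious)

for model, (b, m) in totals.items():
    rate = 100.0 * m / (b + m)
    print(f"{model:22s} {b:6d} {m:6d} {b + m:6d} {rate:6.2f}%")

# Rank configs by detection (malicious) rate; the lowest evades best.
ranking = []
for cfg, models in per_config.items():
    model, (b, m) = next(iter(models.items()))
    ranking.append((100.0 * m / (b + m), cfg, model))
ranking.sort()
rate, cfg, model = ranking[0]
print(f"Best (least detected): {cfg} — {rate:.2f}% — model: {model}")
```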
Exfiltration Server¶
- Listens on your chosen interface (default eth0) via DoHTrafficInterceptor.
- Detects suspicious DoH queries (e.g., domains containing exfill, data, leak).
- Parses chunks named like TIMESTAMP-INDEX-TOTAL-ENCODED... (e.g., 17545887-12-143-k8I4U...).
- Reassembles once all chunks arrive, then tries to decode in order: 1) URL-safe Base64 → 2) Base64 → 3) Base32 → 4) Hex → else raw bytes.
- Writes a file to /app/captured/ and guesses its type by magic bytes; renames with an extension when possible (e.g., .png, .pdf, .zip).
Tip: sessions are tracked in memory; once complete, the server cleans them up.
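A condensed sketch of that reassembly and decoding order. The helper names are illustrative and the real logic lives in the server; whether decoding happens per chunk or on the joined payload, and whether INDEX is 0-based, are assumptions.

```python
# Illustrative chunk parsing, reassembly, decode order and magic-byte guess.
import base64
import binascii

MAGIC = {b"\x89PNG\r\n\x1a\n": ".png", b"%PDF": ".pdf", b"PK\x03\x04": ".zip"}

def parse_chunk(label: str):
    """Split a TIMESTAMP-INDEX-TOTAL-ENCODED label taken from a DoH query name."""
    ts, idx, total, encoded = label.split("-", 3)
    return ts, int(idx), int(total), encoded

def decode_payload(data: str) -> bytes:
    """Try URL-safe Base64, Base64, Base32, then Hex; fall back to raw bytes."""
    for decoder in (base64.urlsafe_b64decode, base64.b64decode,
                    base64.b32decode, bytes.fromhex):
        try:
            return decoder(data)
        except (binascii.Error, ValueError):
            continue
    return data.encode()

def reassemble(chunks: dict[int, str], total: int):
    """Join chunks once every index is present (0-based indexing assumed)."""
    if len(chunks) < total:
        return None
    return decode_payload("".join(chunks[i] for i in range(total)))

def guess_extension(blob: bytes) -> str:
    """Guess the file type from leading magic bytes."""
    for magic, ext in MAGIC.items():
        if blob.startswith(magic):
            return ext
    return ".bin"
```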
Minimal troubleshooting¶
- No filtered CSV found: ensure the traffic actually hit /dns-query and the filter script path is correct (IN_CONTAINER_FILTER_SCRIPT).
- Predictor says no models: train first (ml_analyzer/model_trainer.py), then re-run the pipeline.
- TLS errors: for quick tests use curl -k, or import the generated CA into your lab machine (see the snippet after this list).
- Ranking empty: check the per-config logs/predictor_*.log for model sections.
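For the /dns-query and TLS checks above, a Python equivalent of curl -k; the hostname/port are placeholders, and requests plus dnspython are assumed to be available in your lab environment.

```python
# Quick manual DoH POST to verify the endpoint is reachable, with TLS
# verification disabled (lab only). Hostname/port are placeholders.
import dns.message          # dnspython
import requests
import urllib3

urllib3.disable_warnings()  # silence the self-signed-cert warning

query = dns.message.make_query("exfill-test.example.com", "A")
resp = requests.post(
    "https://doh-server.lab/dns-query",
    data=query.to_wire(),
    headers={"Content-Type": "application/dns-message"},
    verify=False,            # or point verify= at the generated CA instead
    timeout=5,
)
print(resp.status_code, len(resp.content), "bytes")
```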
Good to know¶
- Add new test scenarios by dropping more *.json files into test_configs/.
- Change the file you exfiltrate with FILE_TO_EXFILTRATE (a path inside the exfil client container).
- You can tune ML thresholds by retraining with a different --fpr target (illustrated below).
- The final winner is the config with the lowest malicious (detection) rate for its chosen model.
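Conceptually, an --fpr target picks each model's threshold from the score distribution of benign validation flows; a minimal illustration, not model_trainer.py's actual code, assuming scikit-learn-style malicious-class probabilities.

```python
# Conceptual threshold selection for a target false-positive rate.
import numpy as np

def threshold_for_fpr(benign_scores: np.ndarray, target_fpr: float = 0.01) -> float:
    """Pick the score above which only ~target_fpr of benign flows fall."""
    return float(np.quantile(benign_scores, 1.0 - target_fpr))

# Example with synthetic benign scores: a 1% FPR target yields a high cutoff.
scores = np.random.default_rng(0).beta(1, 20, size=10_000)  # mostly low scores
print(threshold_for_fpr(scores, target_fpr=0.01))
```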