#Taint analysis

Nyx tracks untrusted data from sources (where it enters the program) through assignments and function calls to sinks (where it's used dangerously). If the flow reaches a sink without passing a matching sanitizer, a finding fires.

The engine is a monotone forward dataflow analysis over a finite lattice, so it is guaranteed to terminate. It is flow-sensitive within a function and interprocedural across files via persisted per-function summaries.
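To make the termination argument concrete, here is a minimal sketch of a forward worklist solver. It is not Nyx's engine: the `Node` type, the `u8` bitmask encoding of taint, and the `solve` function are assumptions made for illustration only.

```rust
use std::collections::VecDeque;

/// Hypothetical CFG node (not a Nyx type).
struct Node {
    adds: u8,          // taint bits introduced here, e.g. by a source
    clears: u8,        // taint bits removed here, e.g. by a sanitizer
    succs: Vec<usize>, // indices of successor nodes
}

/// Forward worklist fixpoint over u8 bitmasks ordered by inclusion.
/// Join is bitwise OR. Returns the taint mask at the exit of each node.
fn solve(nodes: &[Node], entry: usize) -> Vec<u8> {
    let mut input = vec![0u8; nodes.len()];
    let mut output = vec![0u8; nodes.len()];
    let mut visited = vec![false; nodes.len()];
    let mut work = VecDeque::from([entry]);

    while let Some(n) = work.pop_front() {
        visited[n] = true;
        // Monotone transfer function: clear, then add.
        let out = (input[n] & !nodes[n].clears) | nodes[n].adds;
        output[n] = out;
        for &s in &nodes[n].succs {
            let joined = input[s] | out;
            // Re-queue a successor only if its input grew
            // or it has never been processed.
            if joined != input[s] || !visited[s] {
                input[s] = joined;
                work.push_back(s);
            }
        }
    }
    output
}
```

Each node's input can only gain bits, and a `u8` has finitely many values, so every node is re-queued a bounded number of times and the loop ends even when the graph contains cycles.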

#Rule ID

taint-unsanitised-flow (source <line>:<col>)

One rule ID, parameterized by the source location. Suppressions can target either the base ID or the full string.
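The matching rule described above can be sketched as follows. The function name `suppression_matches` and the way the base ID is split off are assumptions for illustration; Nyx's actual suppression syntax is not shown here.

```rust
/// Hypothetical helper: does a suppression string cover a finding ID?
/// A suppression matches when it equals the full ID or the base ID,
/// where the base ID is the text before the " (source ...)" suffix.
fn suppression_matches(suppression: &str, finding_id: &str) -> bool {
    if suppression == finding_id {
        return true;
    }
    let base = finding_id.split(" (").next().unwrap_or(finding_id);
    suppression == base
}
```

Under this sketch, suppressing the base ID silences every taint finding, while suppressing the full string silences only the flow that starts at that source location.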

#What it detects

User input flowing to shell execution: req.body.cmd → child_process.exec
User input flowing to code evaluation: req.query.code → eval
User input flowing to SQL: request.args.get('id') → cursor.execute(f"... {id}")
Environment variables flowing to shell: env::var("CMD") → Command::new("sh").arg("-c")
Request parameters flowing to HTML: req.query.name → innerHTML
File contents flowing to privileged sinks: fs::read_to_string → db.execute
Any other source-to-sink flow where the sink's required capability is not stripped along the way

#What it can't detect

Library calls without summaries. If a callee has no summary (its source code is unavailable, or it is a binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
Deep pointer aliasing. let y = &x; sink(*y) works through one level, but arbitrary chains of pointer arithmetic and aliased writes (*p, p->field in C/C++) are not tracked end-to-end. Function pointers and indirect calls resolve to no callee.
Implicit flows. Taint follows explicit data, not branching signal. if (secret) x = 1 else x = 0 does not taint x.
Globals and statics across functions. Not tracked across function boundaries.
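The implicit-flow limitation is easiest to see side by side. The two functions below are hypothetical illustrations, not Nyx test fixtures. Both return the same values, but only the first moves data from `secret` into the result.

```rust
/// Explicit flow: the result is computed from `secret`, which is the
/// kind of data movement an explicit-flow analysis follows.
fn explicit_flow(secret: bool) -> u8 {
    secret as u8
}

/// Implicit flow: both branches assign constants. The result still
/// reveals `secret`, but no data moves from `secret` into `x`,
/// so explicit taint tracking leaves `x` untainted.
fn implicit_flow(secret: bool) -> u8 {
    let x;
    if secret {
        x = 1;
    } else {
        x = 0;
    }
    x
}
```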

#Common false positives

Scenario	Why	Mitigation
Custom sanitizer not recognised	Only built-in + configured sanitizers match	Add a custom sanitizer rule in config
Container holds mixed-typed items the engine cannot tell apart	A `vector<int>` of port numbers and a `vector<string>` of user input share the same store/load model	Sanitize the values on the way in (numeric parse / explicit validator) so the values themselves carry no cap, not just the container
Dead branches	Path-insensitive within a function	Constraint solving catches trivially infeasible combos; path-validated findings are scored lower
Library wrapper re-introduces taint	Wrapper opaque, or summary marks it as propagating	Summarize the wrapper explicitly or add it as a sanitizer

#Common false negatives

Scenario	Why
Third-party library on the path	No summary available, callee treated opaquely
Globals / statics across function boundaries	Not tracked
Some closure captures	Closure analysis is limited. JS/TS/Ruby/Go anonymous functions passed as callbacks are analyzed as separate scopes
Very deep cross-file chains	Summary approximation loses precision at depth

#Confidence signals

Higher confidence:

Source + Sink both present in evidence with specific call locations.
source_kind: user_input (direct attacker control).
path_validated: false.
No dominating guard on the path.
Symex produced a witness string (rendered sink value visible in JSON/SARIF evidence.symbolic.witness).

Lower confidence:

Path-validated taint (path_validated: true).
Source is a database read or internal file (pre-validated at insertion is common).
Engine note ForwardBailed / PathWidened. Use --require-converged to drop these in strict gates.

#Tuning

#Custom sanitizer

# nyx.local
[[analysis.languages.javascript.rules]]
matchers = ["escapeHtml", "sanitizeInput"]
kind     = "sanitizer"
cap      = "html_escape"

Or: nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape.

#Filter by severity or confidence

nyx scan . --severity HIGH
nyx scan . --min-confidence medium

#Skip dataflow entirely

nyx scan . --mode ast

AST-only mode gives you structural pattern matches without taint.

In the browser UI, taint findings render as a numbered flow walk so you can see each hop the engine took:

[Screenshot: Nyx finding detail showing a HIGH taint-unsanitised-flow finding with numbered source → call → sink steps and How to fix guidance]

#Example

Rust:

use std::env;
use std::process::Command;

fn main() {
    let cmd = env::var("USER_CMD").unwrap();           // source
    Command::new("sh").arg("-c").arg(&cmd).output();   // sink
}

Finding:

[HIGH] taint-unsanitised-flow (source 5:15)  src/main.rs:6:5
       Unsanitised user input flows from env::var → Command::new
       Source: env::var (5:15)
       Sink:   Command::new

Safe rewrite: drop the shell and run the value as the program itself, so no shell parsing takes place (Command::new(&cmd).output()), or validate it against an allowlist before passing it to the shell.
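The allowlist option can be sketched as below. The `ALLOWED` list and the `checked_command` helper are hypothetical names chosen for this example. Whether Nyx treats such a check as a sanitizer depends on the configured rules, which this sketch does not assume.

```rust
use std::process::Command;

/// Hypothetical allowlist of programs this tool is willing to run.
const ALLOWED: [&str; 2] = ["ls", "date"];

/// Builds a `Command` only when `cmd` is exactly one of the allowed
/// program names. No shell is involved, so metacharacters in `cmd`
/// are never interpreted. The command is built but not executed here.
fn checked_command(cmd: &str) -> Option<Command> {
    if ALLOWED.contains(&cmd) {
        Some(Command::new(cmd))
    } else {
        None
    }
}
```

A caller would read `USER_CMD`, pass it through `checked_command`, and run the result only when it is `Some`.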

#Capabilities

Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer only clears taint for the cap it declares. A sink only fires when the remaining taint still carries its required cap.

Capability	Typical source	Typical sanitizer	Typical sink
`env_var`	`env::var`, `getenv`, `process.env`
`html_escape`		`html.escape`, `DOMPurify.sanitize`	`innerHTML`, `document.write`
`shell_escape`		`shlex.quote`, `shell_escape::escape`	`system`, `Command::new`, `eval`
`url_encode`		`encodeURIComponent`	`location.href`, HTTP client URL arg
`json_parse`		`JSON.parse`
`file_io`		`os.path.realpath`, `filepath.Clean`, canonicalise + `starts_with`-rooted guard	`open`, `fs::read_to_string`, `send_file`
`fmt_string`			`printf(var)`
`sql_query`		parameterized query binders	`cursor.execute`, `db.query` with concatenation
`deserialize`			`pickle.loads`, `yaml.load`, `Marshal.load`
`ssrf`		URL-prefix locks	`requests.get`, `fetch` URL arg, outbound HTTP destination
`code_exec`			`eval`, `exec`, `Function`
`crypto`			weak-algorithm constructors
`unauthorized_id`	request-bound scoped IDs (Rust auth analysis)	ownership check	row-level write
`ldap_injection`		`ldap-escape` filter / dn helpers, project-local `escapeLdapFilter`	`DirContext.search`, `LdapClient.search`, `ldap_search`, `Net::LDAP#search`, `ldap_search_ext_s`
`xpath_injection`		bound `XPathVariableResolver`, `escapeXpath` / `xpathEscape` helpers	`XPath.evaluate`, `DOMXPath::query`, `document.evaluate`, `xpath.select`, `etree.XPath`
`header_injection`		`stripCRLF` / `escapeHeader` / `sanitizeHeader`	`setHeader`, `res.set`, `res.append`, `headers["X-Foo"] = bar`, `Header().Set`, `header()`, `setcookie`
`open_redirect`		leading-slash check (`startsWith("/")`), URL-parse + host allowlist (`new URL(x).host === ALLOWED`)	`Redirect::to`, Spring `redirect:` view name, `flask.redirect`, `http.Redirect`, `redirect_to`
`ssti`			template constructors fed by tainted source: `Jinja2 Template(...)`, `freemarker.Template`, `Twig::createTemplate`, Handlebars `compile`, `ERB.new`, Mako `Template(...)`
`xxe`		hardened parser config (`secure_processing`, `disallow-doctype-decl`, `processEntities: false`, `LIBXML_NOENT` not set)	`DocumentBuilder.parse`, `SAXParser.parse`, `xml2js`, `fast-xml-parser`, `lxml.etree.parse`, `xmlReadFile`
`prototype_pollution`		constant-key fold, reject / allowlist guards on the key, `Object.create(null)` receivers	`obj[tainted] = v` synthetic `__index_set__`, `_.merge`, `_.set`, `dotProp.set`, `objectPath.set`, jQuery `extend(true, ...)`
`data_exfil`	cookies, headers, env, db rows, file reads (Sensitive-tier sources only)		`fetch` body / headers / json, `XMLHttpRequest.send` body
`all`	Sources typically use `all` so they match any sink

Sources typically use cap = "all" so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.
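One way to picture these three rules is as operations on a bitmask. The constants and function names below are assumptions for illustration; Nyx's real capability encoding may differ.

```rust
/// Hypothetical capability bits (illustrative values only).
const HTML_ESCAPE: u32 = 1 << 0;
const SHELL_ESCAPE: u32 = 1 << 1;
const SQL_QUERY: u32 = 1 << 2;
/// A source declared with cap = "all" carries every bit.
const ALL: u32 = u32::MAX;

/// A sanitizer clears only the cap it declares.
fn sanitize(taint: u32, cap: u32) -> u32 {
    taint & !cap
}

/// A sink fires when the remaining taint still carries its required cap.
fn sink_fires(taint: u32, required: u32) -> bool {
    (taint & required) != 0
}
```

In this model, a value that starts as `ALL` and passes through an HTML escaper no longer triggers an `innerHTML`-style sink, but still triggers shell and SQL sinks.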

#Source sensitivity

Some detector classes need to know not just that a value is attacker-influenced but what kind of value it is. Each source carries a SourceKind (UserInput, Cookie, Header, EnvironmentConfig, FileSystem, Database, CaughtException, Unknown) and a derived sensitivity tier:

Tier	Source kinds	Meaning
`Plain`	`UserInput` (request bodies, query strings, form fields, argv, stdin)	Attacker-controlled but already in the attacker's hands. Echoing it back to them is not a disclosure.
`Sensitive`	`Cookie`, `Header`, `EnvironmentConfig`, `FileSystem`, `Database`, `CaughtException`, `Unknown`	Operator-bound state that should not leak across boundaries.
`Secret`	(reserved for explicit credential sources)	Highest tier; treated identically to `Sensitive` today.

Cap::DATA_EXFIL only fires when the contributing source is at least Sensitive. Plain user input flowing into an outbound fetch body is suppressed at finding-emission time. This suppression targets the canonical false-positive class: API gateways and telemetry forwarders that proxy req.body. SSRF and other classes are unaffected; the gate is scoped to DATA_EXFIL.
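The tier table and the gate can be sketched as a small mapping. The `SourceKind` variant names come from the text above; the `Tier` enum, `tier`, and `data_exfil_fires` are assumed names for illustration, not Nyx's API.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SourceKind {
    UserInput,
    Cookie,
    Header,
    EnvironmentConfig,
    FileSystem,
    Database,
    CaughtException,
    Unknown,
}

/// Declaration order gives Plain < Sensitive < Secret.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Plain,
    Sensitive,
    Secret, // reserved; no source kind maps to it in this sketch
}

/// Maps a source kind to its sensitivity tier, following the table.
fn tier(kind: SourceKind) -> Tier {
    match kind {
        SourceKind::UserInput => Tier::Plain,
        _ => Tier::Sensitive,
    }
}

/// The DATA_EXFIL gate: fire only for sources at least Sensitive.
fn data_exfil_fires(kind: SourceKind) -> bool {
    tier(kind) >= Tier::Sensitive
}
```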

If a project legitimately classifies a request body as sensitive (e.g. an internal forwarder where req.body carries a pre-authenticated user token), override via custom rules in nyx.conf:

# Treat the forwarder's outbound payload as already-sanitized so the
# DATA_EXFIL gate stops firing on it.
[[analysis.languages.javascript.rules]]
matchers = ["sanitizeOutbound"]
kind     = "sanitizer"
cap      = "data_exfil"

Or re-classify the source itself with a custom Source rule whose name matches one of the Sensitive substrings (cookie, header).

#DATA_EXFIL suppression layers

Three suppression knobs ship by default so projects can match the cap to their architecture without per-call suppressions.

#1. Forwarding-wrapper sanitizer convention

A named function that exists to forward a payload across a known boundary is the developer's explicit decision to send the data. The default sanitizer rules treat the following identifiers as Sanitizer(data_exfil) in JavaScript and TypeScript:

serializeForUpstream
forwardPayload
tracker.send
analytics.track
metrics.report
logEvent

If your codebase follows this convention, the cap stops firing on these calls automatically. Extend the convention with your own forwarding wrappers via the standard custom-rule path:

[[analysis.languages.javascript.rules]]
matchers = ["dispatchTelemetry", "sendToBus"]
kind     = "sanitizer"
cap      = "data_exfil"

The rule of thumb: a function that only exists to ship a payload to a known boundary belongs in this list. A function that might leak (a generic HTTP wrapper, a logging helper that writes to an arbitrary destination) does not.

#2. Destination allowlist

Configure a set of trusted outbound prefixes once and the cap is dropped on every site whose destination argument has a static prefix that begins with one of them:

[detectors.data_exfil]
trusted_destinations = [
  "https://api.internal/",
  "https://telemetry.internal/",
]

Use full origins or origin-pinned paths so a partial-host match across unrelated origins cannot occur. https://api. would also match https://api.evil.example.com/, so the entry must include the path separator (/) at the end of the host.

The match consults the abstract string domain: a literal URL is a static prefix; a template literal `https://api.internal/${id}` exposes the prefix https://api.internal/; a fully dynamic URL has no prefix and the cap fires as usual.
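The prefix check itself can be sketched in a few lines. The `is_trusted` function and its `Option<&str>` representation of the static prefix are assumptions for illustration; how Nyx derives the prefix from the abstract string domain is not modelled.

```rust
/// `static_prefix` is the statically known start of the destination
/// URL, or `None` when the URL is fully dynamic.
/// Returns true when the cap would be dropped for this call site.
fn is_trusted(static_prefix: Option<&str>, trusted: &[&str]) -> bool {
    match static_prefix {
        Some(prefix) => trusted.iter().any(|t| prefix.starts_with(*t)),
        None => false, // no prefix: the cap fires as usual
    }
}
```

With the entry `https://api.internal/`, a lookalike host such as `https://api.internal.evil.example.com/` does not match, because the character after `internal` is `.` rather than `/`. An entry of `https://api.` would match `https://api.evil.example.com/`, which is why entries should end at the path separator.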

#3. Detector-class disable

Some projects forward user-bound payloads as a matter of architecture. Turn the entire detector class off when the noise is permanent:

[detectors.data_exfil]
enabled = false

enabled = false strips Cap::DATA_EXFIL from sink caps before event emission, so no taint-data-exfiltration finding reaches the report. The decision is per-project; other projects loaded by the same nyx serve instance keep their own settings.

#DATA_EXFIL sinks per language

Sinks Nyx ships with for Cap::DATA_EXFIL. The body, headers, or json payload arg fires; the URL arg routes through the SSRF gate and emits taint-unsanitised-flow instead.

Language	Sinks	Example
JavaScript, TypeScript	`fetch(url, {body, headers, json})` body-bind, `XMLHttpRequest.prototype.send`, type-qualified `HttpClient.send`	`fetch('/upload', {method: 'POST', body: req.cookies.session})`
Python	`requests.post / put / patch` body and json kwargs, `httpx.AsyncClient().post` json kwarg, `aiohttp.ClientSession().post` body, dict round-trip into json	`requests.post('https://api.internal/ingest', json={'k': os.environ.get('SECRET')})`
Java	`HttpClient.send` with `BodyPublishers.ofString`, OkHttp `newCall(req).execute` body chain, Apache `HttpClient.execute(HttpPost)`, `RestTemplate.postForEntity / exchange`, `WebClient.post().bodyValue / body`	`client.send(HttpRequest.newBuilder().uri(...).POST(BodyPublishers.ofString(token)).build(), ...)`
Go	`http.Post(url, ct, body)` body arg, `http.PostForm` form arg, `(http.Client).Do(req)` after `http.NewRequest`, `(http.Request).Body` assignment	`http.Post("https://analytics.internal/track", "text/plain", strings.NewReader(c.Value))`
Rust	`reqwest::Client.post().body / json / form / multipart().send()`, `ureq::post().send_string / send_form / send_json`, `surf::post().body_string / body_json`, `hyper::Request::builder().body()`	`reqwest::Client::new().post(url).form(&secret).send()`
Ruby	`Net::HTTP.post(uri, body)` body arg, `Net::HTTP::Post.new(uri).body=`, `RestClient.post / put`, `HTTParty.post(url, body: ...)` body	`Net::HTTP.post(URI('https://analytics.internal/track'), "session=#{request.cookies[:auth]}")`
C, C++	`curl_easy_setopt(handle, CURLOPT_POSTFIELDS, body)` and `CURLOPT_COPYPOSTFIELDS` gated sinks (macro-arg activation), `CURLOPT_POSTFIELDSIZE` body-bind	`curl_easy_setopt(curl, CURLOPT_POSTFIELDS, getenv("AUTH_TOKEN"));`
PHP	`curl_setopt($ch, CURLOPT_POSTFIELDS, $body)`, `Guzzle\Client.post($url, ['body' => $tainted])`, `Symfony\HttpClient->request('POST', $url, ['body' => $tainted])`	`curl_setopt($ch, CURLOPT_POSTFIELDS, $_COOKIE['session']);`

Add project-specific sinks with nyx config add-rule --kind sink --cap data_exfil --matcher <name> or the equivalent TOML rule.

#DATA_EXFIL calibration ranges

taint-data-exfiltration is calibrated below the other taint classes on purpose.

Source kind	Severity	Confidence ceiling
Cookie, environment variable	High	Medium
Header	Medium	Medium
File system, database	Medium	Medium
Caught exception	Medium	Low

Path-validated flows (path_validated: true) drop one severity tier. Confidence drops to Low when the abstract or symbolic domain cannot corroborate a concrete string reaching the outbound payload (for example, when the body comes from a callee with no summary).
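The severity column and the one-tier drop can be sketched as a small function. The type and function names are assumptions for illustration, and clamping at `Low` is also an assumption, since the text does not say what happens below the lowest tier.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Severity {
    Low,
    Medium,
    High,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum SourceKind {
    Cookie,
    EnvironmentConfig,
    Header,
    FileSystem,
    Database,
    CaughtException,
}

/// Base severity per source kind, following the calibration table.
fn base_severity(kind: SourceKind) -> Severity {
    match kind {
        SourceKind::Cookie | SourceKind::EnvironmentConfig => Severity::High,
        SourceKind::Header
        | SourceKind::FileSystem
        | SourceKind::Database
        | SourceKind::CaughtException => Severity::Medium,
    }
}

/// Path-validated flows drop one tier (assumed to stop at Low).
fn severity(kind: SourceKind, path_validated: bool) -> Severity {
    let base = base_severity(kind);
    if !path_validated {
        return base;
    }
    match base {
        Severity::High => Severity::Medium,
        Severity::Medium | Severity::Low => Severity::Low,
    }
}
```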

Attack-surface score ranges:

Finding shape	Score
High DATA_EXFIL, cookie or env source, body confirmed	around 76
Medium DATA_EXFIL, header, fs, db, or caught-exception source	40 to 45
Low DATA_EXFIL, no abstract corroboration, path-validated	18 to 25

For reference: High SSRF, SQL injection, and command injection findings land at 76 to 81; Medium taint findings with an env source land at 45 to 50; AST-only patterns sit around 10. Data-exfil sits below the direct-compromise classes but above informational AST patterns.