A working taxonomy of Python bugs and optimization opportunities
8 families · ~45 patterns · tool coverage map
Bug families, optimization patterns, and detection landscape for production Python codebases. Reference material for static analyzers, type checkers, security scanners, and LLM-assisted code review.
Premise
Python's flexibility is its footgun surface. Late binding, duck typing, mutable defaults, the GIL, async colored functions, two string types — every convenience has a documented failure mode. The bug families below are the ones that actually hit production, ranked by which tool tier catches them.
The taxonomy is not novel; many of these bugs have been known since Python 2. What's worth structuring is the detection landscape: which family is reliably caught by which tool tier, and which families require LLM-grade semantic review because no tool sees them.
Family map
Runtime exceptions
The visible bugs. IndexError, KeyError, AttributeError, TypeError show up at runtime; the program crashes; you find them fast. The hidden danger is the conditional path where the exception fires only under specific input — your test suite happens to miss that branch and the bug ships.
Subtypes worth calling out:
- None dereference: functions returning Optional[T] whose return value is used without a null check. Caught reliably by pyright --strict and mypy --strict. Missed by default ruff.
- Bare except: catches SystemExit, KeyboardInterrupt, and the bug you needed to see. Caught by ruff E722 in defaults.
- Off-by-one in slicing: Python's half-open intervals trip up people thinking in C-style inclusive ranges. Needs semantic review; no tool catches the intent mismatch.
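The None-dereference and slicing subtypes above can be illustrated with a minimal sketch. The lookup table and every function name here are hypothetical, chosen only for the example. A strict type checker is expected to flag the unsafe variant, while the slicing mistake is type-correct and only visible to a reader who knows the intent.

```python
from typing import Optional

# Hypothetical lookup table used only for illustration.
USERS = {1: "ada", 2: "grace"}


def find_user(user_id: int) -> Optional[str]:
    """Return the user name, or None when the id is unknown."""
    return USERS.get(user_id)


def greeting_unsafe(user_id: int) -> str:
    # find_user() may return None, and None has no .title() method,
    # so an unknown id raises AttributeError at runtime.
    return "Hello, " + find_user(user_id).title()


def greeting_safe(user_id: int) -> str:
    name = find_user(user_id)
    if name is None:  # explicit check narrows Optional[str] to str
        return "Hello, stranger"
    return "Hello, " + name.title()


def up_to_index_mistake(items: list, last_index: int) -> list:
    # Intent: "up to and including last_index". Slices are half-open,
    # so the element at last_index is silently dropped.
    return items[:last_index]


def up_to_index(items: list, last_index: int) -> list:
    return items[:last_index + 1]
```

The unsafe variant only fails for ids missing from the table, which is exactly the conditional path a test suite can miss.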
Logic and state
The most uniquely Pythonic bugs. Mutable default arguments and late binding in closures are interview questions because they keep biting real code. Knowing about them doesn't stop the bug from appearing: a tired engineer writes def f(x=[]): at 5pm, the review misses it, and it ships.
Other logic bugs in this family: dict ordering assumptions (subtle because Python 3.7+ guarantees insertion order, while legacy code may still have been written against arbitrary hash order), shadowed builtins (assigning to list or id at module level), truthy/falsy traps (a function returning 0 is indistinguishable from one returning None via a truth check).
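A short sketch of three of the patterns named in this family: the mutable default, the late-binding closure, and the truthiness trap. All names are hypothetical and the fixes shown are common idioms, not the only options.

```python
def append_shared(item, bucket=[]):
    # Bug: the default list is created once, at function definition time,
    # and shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Fix: use None as the sentinel and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


# Late binding: each lambda looks up `i` when it is called, not when it
# is defined, so all three see the final value of the loop variable.
late = [lambda: i for i in range(3)]

# Fix: capture the current value as a default argument.
bound = [lambda i=i: i for i in range(3)]


def parse_count(text):
    """Return an int, or None when the text is not a number."""
    return int(text) if text.isdigit() else None


count = parse_count("0")
truth_check = bool(count)        # 0 is falsy, so it looks like "missing"
none_check = count is not None   # distinguishes 0 from None
```

Calling append_shared twice without a bucket returns the same list both times, which is the behavior that surprises people in production.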
Data and types
Silent wrong-answer bugs. The most expensive class in the taxonomy because they don't crash. str/bytes mixing ships data with mojibake. Shallow copies share inner mutable state. Reference aliasing means a function that "modifies its argument" mutates the caller's object.
Type checkers help when configured. pyright --strict catches the str/bytes and Optional cases. mypy --strict catches them just as reliably, only more slowly. Neither catches reference aliasing, which is a semantic problem. Choosing between copy.deepcopy and copy.copy needs human review of intent.
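To make the silent wrong-answer behavior concrete, here is a small sketch using only the standard library. The config dict and the helper names are hypothetical. None of these lines raises, which is why the family is expensive.

```python
import copy

config = {"name": "job", "tags": ["a", "b"]}

shallow = copy.copy(config)     # new outer dict, same inner list object
deep = copy.deepcopy(config)    # new outer dict and a new inner list

shallow["tags"].append("c")     # also mutates config["tags"]
deep["tags"].append("d")        # leaves config untouched


def normalize_in_place(tags):
    # Aliasing: sorts the caller's list object in place.
    tags.sort()
    return tags


def normalized(tags):
    # Returns a new sorted list; the caller's object is unchanged.
    return sorted(tags)


# str/bytes: decoding with the wrong codec raises nothing and
# produces mojibake instead of the original text.
raw = "café".encode("utf-8")
wrong = raw.decode("latin-1")
right = raw.decode("utf-8")
```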
Resource lifecycle
Files, sockets, locks, db connections. Each is finite. The context manager pattern (with statement) is the idiomatic answer. Code that opens resources without a with block — especially in functions with early returns or exception paths — leaks under load.
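ruff SIM115 catches naked open() outside with. ResourceWarning fires at runtime if a file is garbage collected while still open, but the default warning filters ignore it, so it only shows up when enabled (for example with -W default or -X dev). Catching db connection leaks is harder: connection-pool APIs are library-specific and ruff has no model of them.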
Special mention: unbounded module-level caches. The classic "memoization" pattern that grows forever. functools.lru_cache(maxsize=N) is bounded; a bare dict is not.
Concurrency and async
The highest-LLM-value bug family. Tools rarely catch async/sync mismatches, shared state across await points, or improper task lifecycle. The asyncio gotchas catalog in the python-analyzer-research corpus documents 12 patterns that no default tool config detects.
Top hitters:
- Blocking calls in async functions: time.sleep(), requests.get(), open().read() inside async def block the entire event loop. ruff ASYNC251 and ASYNC210 catch the first two when the ASYNC rules are selected.
- Task not awaited: asyncio.create_task() return value discarded. The event loop keeps only a weak reference to the task, so it can be garbage collected mid-flight. ruff RUF006.
- Shared mutable state across awaits: code between two awaits runs without interleaving from other tasks, but await yields to the loop. Read-modify-write across an await races without a lock. No tool catches this.
- queue.Queue vs asyncio.Queue: same name, different semantics. The threading queue blocks the loop. No tool catches this.
- GIL assumptions: the GIL does not make counter += 1 atomic. The statement compiles to separate load, add, and store bytecodes, and a thread switch can land between them. Assuming otherwise is the most expensive misconception in Python concurrency.
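The shared-state and task-lifecycle items above can be sketched with asyncio alone. The balance variable and the withdraw functions are hypothetical, and asyncio.sleep(0) stands in for any real awaited I/O. This is one way to show the race, not a general recipe for fixing it.

```python
import asyncio

balance = 100


async def withdraw_racy(amount):
    global balance
    current = balance            # read
    await asyncio.sleep(0)       # yields to the loop; other tasks run here
    balance = current - amount   # write based on a possibly stale read


async def withdraw_locked(amount, lock):
    global balance
    async with lock:             # read-modify-write is one critical section
        current = balance
        await asyncio.sleep(0)
        balance = current - amount


async def run(make_worker):
    global balance
    balance = 100
    # Keep references to the tasks and await them, so none is garbage
    # collected mid-flight and exceptions surface.
    tasks = [asyncio.create_task(make_worker()) for _ in range(10)]
    await asyncio.gather(*tasks)
    return balance


async def main():
    lock = asyncio.Lock()
    racy = await run(lambda: withdraw_racy(10))
    safe = await run(lambda: withdraw_locked(10, lock))
    return racy, safe


racy_result, safe_result = asyncio.run(main())
```

Ten withdrawals of 10 from 100 should leave 0. The locked version does; the racy version loses updates because every task reads the balance before any task writes it.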
Security
Python ships unsafe primitives. pickle.loads is arbitrary code execution. SQL injection via f-strings is CWE-89 wrapped in syntactic sugar. yaml.load constructs arbitrary Python objects unless you use safe_load. eval and exec on user input are the same as giving them a shell.
bandit catches the obvious cases: pickle imports, hardcoded credentials, shell=True. Indirect cases (pickle hidden in joblib, SQL via SQLAlchemy text() with f-string content) often slip through. The pattern "tools catch direct, miss indirect" holds generally in Python security analysis.
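As an illustration of the f-string injection case, here is a minimal sketch using the standard library's sqlite3 with an in-memory database. The table, the rows, and the payload are made up for the example. The same contrast between interpolated SQL and bound parameters applies to other drivers, though placeholder syntax varies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ada", 1), ("bob", 0)])


def find_unsafe(name):
    # CWE-89: user input is interpolated into the SQL text itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # Parameter binding: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_unsafe(payload)   # the injected OR clause matches every row
blocked = find_safe(payload)    # no user has that literal name
conn.close()
```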
Performance opportunities
Most Python performance problems are algorithmic or idiomatic, not the GIL. The GIL is the convenient excuse; the actual cost is usually O(n²) inner loops from list-as-set membership tests, string concatenation in loops, or unnecessary materialization of generators into lists.
The biggest delta comes from vectorizing numpy/pandas code. A Python loop over a numpy array can be 100-1000x slower than the vectorized form. This is rarely caught by tools because it requires recognizing the array context.
perflint exists for this family. ruff PERF covers a subset. Most projects don't run either.
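The idiomatic costs named above (list-as-set membership, string concatenation in loops, needless materialization) can be sketched in pure Python. The function names are hypothetical. The vectorization case is left out because it would need numpy.

```python
def common_slow(xs, ys):
    # O(len(xs) * len(ys)): `in` on a list scans it element by element.
    return [x for x in xs if x in ys]


def common_fast(xs, ys):
    # O(len(xs) + len(ys)) on average: build the set once, then each
    # membership test is a hash lookup.
    lookup = set(ys)
    return [x for x in xs if x in lookup]


def render_slow(parts):
    out = ""
    for p in parts:
        out += p + ","  # may copy the growing string on each iteration
    return out


def render_fast(parts):
    # str.join builds the result in a single pass.
    return "".join(p + "," for p in parts)


def total_materialized(n):
    return sum([i * i for i in range(n)])  # builds the full list first


def total_streamed(n):
    return sum(i * i for i in range(n))    # generator: one item at a time
```

Each pair returns the same result, so the rewrite is safe to make mechanically once the pattern is recognized. How much it saves depends on input size and how hot the code path is.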
Detection coverage
| family | tier-1 tools | opt-in tools | needs LLM |
|---|---|---|---|
| runtime exceptions | pyright/mypy strict | ruff Optional access plugins | conditional paths missed by tests |
| logic & state | ruff B, SIM | ruff RUF, A (builtins) | cross-function mutation; intent vs code |
| data & types | pyright/mypy strict | ruff UP for str/bytes upgrades | reference aliasing; shallow vs deep intent |
| resource lifecycle | ruff SIM115 | pylint R1732 | cross-function leaks via early return |
| concurrency & async | — | ruff ASYNC | shared state races; lifecycle; semantic flow |
| security | bandit defaults | semgrep custom rules; ruff S rules | indirect injection through libraries |
| performance | — | perflint; ruff PERF | workload-dependent hotness; vectorize candidates |
| smells | radon; ruff C | xenon for thresholds | god-class refactor recommendations |
Roughly 30% of bug-finding value needs LLM-grade reasoning. The other 70% is achievable by tools that most projects don't enable. The python-analyzer skill's job is both: run the opt-in tools by default, then handle the 30% that needs semantic analysis.
Baseline configuration
For projects starting a serious lint regime, this reaches roughly the 70% of bug-finding value that tools can cover: