Whitepaper / Python bugs in production

A working taxonomy of Python bugs and optimization opportunities

8 families, ~45 patterns, tool coverage map

Bug families, optimization patterns, and detection landscape for production Python codebases. Reference material for static analyzers, type checkers, security scanners, and LLM-assisted code review.

bug families

~700 ruff rules

tier-1 tools

~30% needs cross-function review

~5% signal in default lint output

Premise

Python's flexibility is its footgun surface. Late binding, duck typing, mutable defaults, the GIL, async colored functions, two string types — every convenience has a documented failure mode. The bug families below are the ones that actually hit production, ranked by which tool tier catches them.

The taxonomy is not novel; many of these bugs have been known since Python 2. What's worth structuring is the detection landscape: which family is reliably caught by which tool tier, and which families require LLM-grade semantic review because no tool sees them.

Family map

runtime (exception bugs): IndexError, KeyError, None dereferences, TypeError on duck-typing. Visible failures.
logic (behavioral): Mutable defaults, late binding, off-by-one, condition inversion, shadowed builtins.
data & types (silent wrong): str/bytes, shallow vs deep copy, encoding, reference aliasing.
resource (lifecycle): Leaked files / sockets / locks / db connections; context manager misuse.
concurrency (races, deadlocks): Async/sync mismatches, GIL assumptions, shared state across awaits.
security (vulnerabilities): Injection, unsafe deserialization, hardcoded creds, eval/exec on user input.
performance (opportunities): Algorithmic O(n²), string concat in loops, materialization, repeated attr lookup.
smells (maintainability): God functions, deep nesting, magic numbers, duplication, eval/exec abuse.

Runtime exceptions

The visible bugs. IndexError, KeyError, AttributeError, TypeError show up at runtime; the program crashes; you find them fast. The hidden danger is the conditional path where the exception fires only under specific input — your test suite happens to miss that branch and the bug ships.

Subtypes worth calling out:

None dereference: functions returning Optional[T] whose return value is used without a None check. Caught reliably by pyright in strict mode and by mypy --strict. Missed by default ruff.
Bare except: catches SystemExit, KeyboardInterrupt, and the bug you needed to see. Caught by ruff E722 in defaults.
Off-by-one in slicing: Python's half-open intervals trip up people thinking in C-style inclusive ranges. Needs semantic review; no tool catches the intent mismatch.
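The first two subtypes above can be sketched as follows. This is a minimal illustration, not code from any real project: `find_user`, `greeting_unsafe`, `greeting_safe`, and `parse_port` are hypothetical names chosen for the example.

```python
from typing import Optional


def find_user(users: dict[str, str], user_id: str) -> Optional[str]:
    """Return the user's name, or None when the id is unknown."""
    return users.get(user_id)


def greeting_unsafe(users: dict[str, str], user_id: str) -> str:
    # Bug: find_user may return None, and None has no .upper().
    # A strict type checker flags this attribute access.
    return "hello " + find_user(users, user_id).upper()


def greeting_safe(users: dict[str, str], user_id: str) -> str:
    name = find_user(users, user_id)
    if name is None:  # explicit None check narrows Optional[str] to str
        return "hello stranger"
    return "hello " + name.upper()


def parse_port(raw: str, default: int = 8080) -> int:
    try:
        return int(raw)
    except ValueError:  # catch only the expected failure, never a bare except
        return default
```

The unsafe variant only fails for unknown ids, which is exactly the conditional path a test suite tends to miss.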

Logic and state

The most uniquely Pythonic bugs. Mutable default arguments and late binding in closures are interview questions because they keep biting real code. Knowing about them doesn't stop the bug from appearing: a tired engineer writes def f(x=[]): at 5pm, the reviewer misses it, and it ships.

```python
# the canonical bug
def add_item(item, items=[]):
    items.append(item)
    return items

add_item(1)  # [1]
add_item(2)  # [1, 2] -- not [2]!

# the fix
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
```

Other logic bugs in this family: dict ordering assumptions (subtle because Python 3.7+ guarantees insertion order, while code written for or run on older interpreters cannot rely on it), shadowed builtins (assigning to list or id at module level), and truthy/falsy traps (a function returning 0 is indistinguishable from one returning None via a truth check).
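Late binding and the truthiness trap are easier to see in code than in prose. The sketch below is illustrative only; `make_multipliers_buggy`, `make_multipliers_fixed`, and `first_zero_index` are hypothetical helpers.

```python
def make_multipliers_buggy():
    # Each lambda looks up i when it is called, after the loop has finished
    return [lambda x: x * i for i in range(3)]


def make_multipliers_fixed():
    # A default argument captures the value of i at each iteration
    return [lambda x, i=i: x * i for i in range(3)]


buggy = [f(10) for f in make_multipliers_buggy()]  # [20, 20, 20]
fixed = [f(10) for f in make_multipliers_fixed()]  # [0, 10, 20]


def first_zero_index(values):
    """Return the index of the first 0 in values, or None if there is none."""
    for idx, value in enumerate(values):
        if value == 0:
            return idx
    return None


idx = first_zero_index([0, 5])
found_by_truthiness = bool(idx)      # False, although index 0 was found
found_by_identity = idx is not None  # True: the check that matches the intent
```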

Data and types

Silent wrong-answer bugs. The most expensive class in the taxonomy because they don't crash. str/bytes mixing ships data with mojibake. Shallow copies share inner mutable state. Reference aliasing means a function that "modifies its argument" mutates the caller's object.

Type checkers help when configured. pyright in strict mode catches the str/bytes and Optional cases. mypy --strict catches them more slowly but as reliably. Neither catches reference aliasing, because that is a semantic property. Choosing between copy.deepcopy and copy.copy needs human review of intent.
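A small sketch of the shallow-copy and aliasing cases described above. The `config` dict and the two tagging helpers are made up for illustration.

```python
import copy

config = {"name": "svc", "tags": ["a", "b"]}

shallow = copy.copy(config)   # new outer dict, same inner list
deep = copy.deepcopy(config)  # new outer dict and new inner list

shallow["tags"].append("c")   # also visible through config["tags"]
deep["tags"].append("z")      # touches only deep's own list


def add_default_tag_in_place(tags):
    # Mutates the caller's list: the parameter is an alias, not a copy
    tags.append("default")
    return tags


def add_default_tag(tags):
    # Returns a new list and leaves the caller's object untouched
    return [*tags, "default"]


original = ["x"]
returned = add_default_tag(original)
aliased = add_default_tag_in_place(original)
```

After this runs, `config["tags"]` contains `"c"` even though only `shallow` was modified, and `aliased` is the same object as `original`.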

Resource lifecycle

Files, sockets, locks, db connections. Each is finite. The context manager pattern (with statement) is the idiomatic answer. Code that opens resources without a with block — especially in functions with early returns or exception paths — leaks under load.

ruff SIM115 catches naked open() outside with. ResourceWarning fires at runtime if a file is GC'd while open, but Python ignores that warning by default; it only shows up with -W default or -X dev. Catching db connection leaks is harder: they're typically per-library, and ruff doesn't know the connection-pool API.
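The early-return leak looks like this in practice. The function names are hypothetical, and the demo writes to a temporary directory so it can run anywhere.

```python
import os
import tempfile


def first_line_leaky(path):
    f = open(path)           # open() outside a with block
    line = f.readline()
    if not line:
        return None          # early return: f is never closed explicitly
    f.close()
    return line.rstrip("\n")


def first_line(path):
    with open(path) as f:    # closed on every exit path, including exceptions
        line = f.readline()
    return line.rstrip("\n") if line else None


with tempfile.TemporaryDirectory() as tmp:
    full_path = os.path.join(tmp, "demo.txt")
    empty_path = os.path.join(tmp, "empty.txt")
    with open(full_path, "w") as out:
        out.write("alpha\nbeta\n")
    with open(empty_path, "w"):
        pass
    result = first_line(full_path)
    empty_result = first_line(empty_path)
```

Only the empty-file branch of `first_line_leaky` leaks, which is why this kind of bug tends to survive happy-path tests.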

Special mention: unbounded module-level caches. The classic "memoization" pattern that grows forever. functools.lru_cache(maxsize=N) is bounded; a bare dict is not.
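To make the difference concrete, here is a sketch comparing a bare-dict cache with `functools.lru_cache`. The squaring functions stand in for any expensive computation, and 128 is an arbitrary bound.

```python
from functools import lru_cache

_cache = {}  # module-level dict: nothing ever evicts entries


def square_unbounded(n):
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]


@lru_cache(maxsize=128)  # least-recently-used entries are evicted past 128
def square_bounded(n):
    return n * n


for n in range(1000):
    square_unbounded(n)
    square_bounded(n)

unbounded_size = len(_cache)                       # 1000 and growing with input
bounded_size = square_bounded.cache_info().currsize  # capped at 128
```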

Concurrency and async

The highest-LLM-value bug family. Tools rarely catch async/sync mismatches, shared state across await points, or improper task lifecycle. The asyncio gotchas catalog in the python-analyzer-research corpus documents 12 patterns that no default tool config detects.

Top hitters:

Blocking calls in async functions: time.sleep(), requests.get(), open().read() inside async def block the entire event loop. ruff ASYNC251 and ASYNC210 catch these when the ASYNC rules are selected.
Task not awaited: asyncio.create_task() return value discarded, so the task can be garbage-collected mid-flight. Caught by ruff RUF006.
Shared mutable state across awaits: code between two awaits runs without interleaving from other tasks, but every await yields control to the event loop. Read-modify-write across an await races without a lock. No tool catches this.
queue.Queue vs asyncio.Queue: same name, different semantics. The threading queue blocks the loop. No tool catches this.
GIL assumptions: counter += 1 is not atomic at the bytecode level, so it can lose updates across threads. The most expensive misconception in Python concurrency.
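The shared-state item above is the hardest to picture, so here is a minimal sketch of a read-modify-write that spans an await, together with an `asyncio.Lock` fix. The `Counter` class and worker names are hypothetical, and `asyncio.sleep(0)` stands in for any real awaited call.

```python
import asyncio


class Counter:
    def __init__(self):
        self.value = 0
        self.lock = asyncio.Lock()


async def increment_racy(counter):
    current = counter.value      # read
    await asyncio.sleep(0)       # any await lets other tasks run here
    counter.value = current + 1  # write back a possibly stale value


async def increment_locked(counter):
    async with counter.lock:     # the whole read-modify-write is one critical section
        current = counter.value
        await asyncio.sleep(0)
        counter.value = current + 1


async def run(worker, n=50):
    counter = Counter()
    await asyncio.gather(*(worker(counter) for _ in range(n)))
    return counter.value


racy_total = asyncio.run(run(increment_racy))      # fewer than 50: updates are lost
locked_total = asyncio.run(run(increment_locked))  # 50
```

No thread is involved here; the lost updates come purely from task interleaving at the await point.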

Security

Python ships unsafe primitives. pickle.loads on untrusted data is arbitrary code execution. SQL injection via f-strings is CWE-89 wrapped in syntactic sugar. yaml.load constructs arbitrary Python objects unless you use safe_load. Calling eval or exec on user input is the same as giving the user a shell.

bandit catches the obvious cases: pickle imports, hardcoded credentials, shell=True. Indirect cases (pickle hidden in joblib, SQL via SQLAlchemy text() with f-string content) often slip through. The pattern "tools catch direct, miss indirect" is general in Python security analysis.
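As an illustration of the f-string injection case, the sketch below uses the standard-library `sqlite3` module with an in-memory database. The table, rows, and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_unsafe(conn, name):
    # Bug (CWE-89): user input is spliced into the SQL text itself
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(conn, name):
    # The ? placeholder passes the input as data, never as SQL
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_unsafe(conn, payload)  # the OR clause matches every row
safe = find_safe(conn, payload)      # no user has that literal name
conn.close()
```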

Performance opportunities

Most Python performance problems are algorithmic or idiomatic, not the GIL. The GIL is the convenient excuse; the actual cost is usually O(n²) inner loops from list-as-set membership tests, string concatenation in loops, or unnecessary materialization of generators into lists.
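Two of the idioms named above, sketched side by side. The function names are hypothetical, and the set-based version assumes the elements are hashable.

```python
def common_slow(a, b):
    # b is a list, so each `in` is a linear scan: O(len(a) * len(b)) overall
    return [x for x in a if x in b]


def common_fast(a, b):
    b_set = set(b)  # one pass to build, then average O(1) membership tests
    return [x for x in a if x in b_set]


def join_slow(parts):
    out = ""
    for part in parts:
        out += part  # may copy the growing string on each iteration
    return out


def join_fast(parts):
    return "".join(parts)  # sizes the result once and copies each part once


a = list(range(0, 2000, 2))
b = list(range(0, 2000, 3))
words = ["alpha", "beta", "gamma"]
```

Both pairs return identical results; only the cost curve differs as the inputs grow.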

The biggest delta comes from vectorizing numpy/pandas code. A Python loop over a numpy array can be 100-1000x slower than the vectorized form. This is rarely caught by tools because it requires recognizing the array context.

perflint exists for this family. ruff PERF covers a subset. Most projects don't run either.

Detection coverage

family	tier-1 tools	opt-in tools	needs LLM
runtime exceptions	pyright/mypy strict	ruff Optional access plugins	conditional paths missed by tests
logic & state	ruff `B`, `SIM`	ruff `RUF`, `A` (builtins)	cross-function mutation; intent vs code
data & types	pyright/mypy strict	ruff `UP` for str/bytes upgrades	reference aliasing; shallow vs deep intent
resource lifecycle	ruff `SIM115`	pylint `R1732`	cross-function leaks via early return
concurrency & async	—	ruff `ASYNC`	shared state races; lifecycle; semantic flow
security	bandit defaults	semgrep custom; ruff `S`	indirect injection through libraries
performance	—	perflint; ruff `PERF`	workload-dependent hotness; vectorize candidates
smells	radon; ruff `C`	xenon for thresholds	god-class refactor recommendations

Roughly 30% of bug-finding value needs LLM-grade reasoning. The other 70% is achievable with tools that most projects don't enable. The python-analyzer skill's job is both: run the opt-in tools by default, then handle the 30% that needs semantic analysis.

Baseline configuration

For projects starting a serious lint regime, this configuration covers most of the tool-detectable 70% described above:

```toml
# pyproject.toml
# current ruff releases read lint settings from [tool.ruff.lint]
[tool.ruff.lint]
select = [
    "E", "F", "W",  # pycodestyle + pyflakes
    "B",            # flake8-bugbear (mutable defaults, etc)
    "SIM",          # simplification hints
    "ASYNC",        # async pitfalls (high signal)
    "PERF",         # performance idioms
    "RUF",          # ruff-specific (RUF006 task storage)
    "S",            # bandit-equivalent
    "UP",           # pyupgrade
]

[tool.pyright]
typeCheckingMode = "strict"
```

References

ruff: ruff rule catalog (~700 rules across ~30 categories)
pyright: pyright type checker (strict mode is the floor)
bandit: bandit security linter (CWE-mapped plugin set)
asyncio: asyncio reference (the source of most async bugs is here)
Glyph: Glyph's async writeups (canonical pre-asyncio thinking; still relevant)
Brett Cannon: How async/await works (3-part series; required reading)
Hettinger: Raymond Hettinger PyCon talks (data structures + perf idioms)
CWE Python: CWE-89: SQL injection (core security primitive)
corpus: python-analyzer-research (private repo: pattern catalogs, false-positive DB, severity rubric)