SCOPE & THESIS
Python's flexibility is its footgun surface. Late binding, duck typing, mutable defaults, the GIL, async colored functions, two string types — every convenience has a failure mode. Most of those failure modes are silent: the program runs, returns a wrong answer, and you find out at 3am from a customer.
This teardown enumerates the bug families that actually hit production Python codebases. For each, the canonical bad code, the fix, the runtime mechanism, and whether existing tools catch it. The goal is a complete map: what tools catch, what needs human or LLM review, what hides forever.
Written for anyone who has typed def f(x=[]): and wanted to understand why.
// the categories
| family | what hits production | tool coverage |
|---|---|---|
| runtime | IndexError, KeyError, AttributeError on None, TypeError on duck-typing | partial — types catch some, runtime catches the rest |
| logic | mutable defaults, late binding, off-by-one, shadowed names | mixed: mutable defaults (B006) and late binding (B023) are caught only by opt-in rules; the rest needs review |
| types | str/bytes, shallow vs deep copy, falsy gotchas | type checkers, when configured |
| resource | leaked files / sockets / locks / db connections | partial — context manager hints exist |
| concurrency | races, deadlocks, async/sync mismatches, GIL assumptions | poor — most needs semantic review |
| security | injection (sql/shell/eval), pickle, hardcoded creds | bandit catches the obvious; subtle injection often missed |
| performance | O(n²) inner loops, string concat, list-as-set, materialization | perflint exists but rarely run |
| smells | god classes, deep nesting, magic numbers, eval/exec abuse | complexity tools catch metrics, not semantics |
RUNTIME EXCEPTIONS
The bugs that crash visibly. Less dangerous than logic bugs (you find them fast) but still ubiquitous because Python's dynamic typing hides them from static analysis unless the code is annotated.
// the none dereference
Almost every Python attribute or subscript bug eventually traces to a None sneaking in where an object was expected. Functions that can return None but usually don't are landmines.
bad

    # dict.get returns None when missing; no second arg = no default
    def user_age(users, name):
        return users.get(name).age  # AttributeError if name absent

good

    def user_age(users, name):
        user = users.get(name)
        if user is None:
            return None
        return user.age

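pyright reports this as reportOptionalMemberAccess, an error even in its basic mode, once users carries a type annotation. mypy reports it as union-attr under its default strict-optional checking; --no-implicit-optional is unrelated here (it governs None default arguments). ruff does not catch it, since it performs no type inference. The bug shows up because dict.get() is typed as returning the value type or None, and the attribute access ignores the None case.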
// bare except
Catches everything including SystemExit, KeyboardInterrupt, and the bug you needed to see. The most cited code smell that's also a real correctness bug.
bad

    try:
        do_work()
    except:  # swallows ctrl-c, swallows MemoryError, swallows the real bug
        log("oops")

good

    try:
        do_work()
    except (ValueError, KeyError) as e:
        log("recoverable: %s", e)

ruff E722 in default rules. Easy catch. Bigger issue is except Exception: — also wrong in many contexts, but not in the default ruleset.
// the index-and-pop race
Pattern that looks defensive but isn't:
bad

    if queue:
        item = queue.pop(0)  # IndexError if another thread popped first

The if queue check is not atomic with the pop. Under threads this races. Under asyncio across await points it races too if you yielded between them.
good

    try:
        item = queue.pop(0)
    except IndexError:
        item = None

// off-by-one in slicing
Python's half-open intervals are intuitive once internalized but trip up people thinking in inclusive C-style ranges.
bad

    # take the last N items; off-by-one if you forget slice semantics
    last_n = items[len(items) - n - 1:]  # takes n+1 items

good

    last_n = items[-n:]  # pythonic and correct for n > 0

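The n > 0 caveat matters because items[-0:] is the same slice as items[0:], so asking for the last zero items returns the whole list. A minimal sketch of a guarded helper follows; the name last_n is hypothetical and not from any library.

```python
def last_n(items, n):
    """Return the last n items of a sequence (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        # items[-0:] would be items[0:], i.e. the whole sequence
        return items[:0]
    return items[-n:]


data = [10, 20, 30, 40]
print(last_n(data, 2))  # [30, 40]
print(last_n(data, 0))  # []
print(data[-0:])        # [10, 20, 30, 40] -- the trap
```

Asking for more items than exist is harmless: the slice clamps to the start of the list.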
LOGIC & STATE
The most uniquely Pythonic bugs live here. Mutable defaults and late binding closures are interview questions because they keep biting production code. The fact that they're known doesn't stop them.
// THE classic: mutable default arguments
Default argument values are evaluated once, when the def statement executes, not on every call. A mutable default is shared across every invocation.
bad

    def add_item(item, items=[]):
        items.append(item)
        return items

    add_item(1)  # [1]
    add_item(2)  # [1, 2] -- not [2]!
    add_item(3)  # [1, 2, 3]

good

    def add_item(item, items=None):
        if items is None:
            items = []
        items.append(item)
        return items

ruff B006 in the B ruleset catches this. The rule is opt-in (not in defaults) but extremely high signal. Every project that runs ruff should have B selected. Pylint catches it as W0102 dangerous-default-value.
// late binding in closures
Closures capture variable names, not values. When the closure runs, it sees the variable's current value, not the value at capture time. This breaks loops that build callables.
bad

    handlers = []
    for i in range(3):
        handlers.append(lambda: i)
    [h() for h in handlers]  # [2, 2, 2] -- all see final i

good

    # bind i as a default arg, evaluated at lambda definition
    handlers = [lambda i=i: i for i in range(3)]
    [h() for h in handlers]  # [0, 1, 2]

ruff B023 catches function definitions that don't bind loop variables. Default-off. Real bug, often missed.
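The default-argument trick is one way to freeze the loop value. functools.partial is another, and it keeps the callable's signature free of a stray parameter. A small sketch, with both function names invented for illustration:

```python
from functools import partial


def make_handlers_buggy(n):
    # every lambda looks up i when called, after the loop has finished
    return [lambda: i for i in range(n)]


def make_handlers_partial(n):
    # partial stores the current value of i as a bound argument
    return [partial(lambda value: value, i) for i in range(n)]


print([h() for h in make_handlers_buggy(3)])    # [2, 2, 2]
print([h() for h in make_handlers_partial(3)])  # [0, 1, 2]
```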
// is vs ==
is compares identity (same object in memory). == compares value. They coincidentally agree for small ints (-5 to 256 are interned) and short strings — until they don't.
bad

    # two distinct objects with equal value
    a = [1, 2, 3]
    b = [1, 2, 3]
    a == b  # True -- same value
    a is b  # False -- different objects
    # small ints (-5..256) are cached, which masks the difference
    x, y = 5, 5
    x is y  # True -- implementation detail, not a guarantee

ruff E711 catches == None, ruff F632 catches is with a literal. The general "wrong identity check" needs semantic review.
// truthy/falsy traps
0, 0.0, "", [], {}, set(), None, and False are all falsy. A function returning 0 as a meaningful value is indistinguishable from one returning None by a truth check.
bad

    count = users.get_count()  # returns int, possibly 0
    if not count:
        handle_no_users()  # fires when count == 0 (correct) AND when count is None (might be wrong)

good

    if count is None:
        handle_missing()
    elif count == 0:
        handle_empty()

// dict ordering assumptions
Dict insertion order has been guaranteed since Python 3.7. Iteration order matches insertion order. But code written against earlier Python or ported from other languages still sometimes assumes alphabetical, hash, or arbitrary order. The fix is usually to wrap the dict in sorted() or use collections.OrderedDict for clarity of intent.
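A short illustration of the point, using made-up data: iteration follows insertion order, and any other order has to be requested explicitly where it is needed.

```python
scores = {"mallory": 7, "alice": 3, "bob": 9}

# iteration follows insertion order, not alphabetical order
print(list(scores))  # ['mallory', 'alice', 'bob']

# make the intended order explicit at the point of use
by_name = sorted(scores.items())
by_score = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(by_name)   # [('alice', 3), ('bob', 9), ('mallory', 7)]
print(by_score)  # [('bob', 9), ('mallory', 7), ('alice', 3)]
```

Note that sorted() returns a list of pairs, not a dict, which is usually what the consuming loop wants anyway.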
// shadowed builtins
list, dict, id, type, filter, map, sum: assigning to any of these at module scope shadows the builtin for every later use in that module. Code that expects the builtin then fails far from the assignment, or silently misbehaves if the replacement happens to be callable.
bad

    # at module top level
    list = get_users()
    # 200 lines later
    ids = list(map(get_id, items))  # TypeError: 'list' object is not callable

ruff A001/A002/A003, ported from the flake8-builtins plugin. Opt-in. Covers shadowing by variables, function arguments, and class attributes.
DATA & TYPE ISSUES
Python's flexibility around types makes "it works in dev" mean "it might break in prod under different input." The bugs in this family are about silent behavior changes when types differ.
// str / bytes mixing
The most-fought-over Python 3 transition. str and bytes don't compare equal, don't concatenate, don't substitute into each other. A function that accepts either silently does the wrong thing.
bad

    def starts_with_x(s):
        return s[0] == "x"

    starts_with_x("xyz")   # True
    starts_with_x(b"xyz")  # False -- s[0] is int 120, "x" is str

good

    def starts_with_x(s: str | bytes):
        if isinstance(s, bytes):
            return s.startswith(b"x")
        return s.startswith("x")

// shallow vs deep copy
copy.copy() copies the outer container. Nested mutable objects are still shared. list[:] and dict.copy() have the same behavior. copy.deepcopy() recursively copies.
bad

    import copy
    defaults = {"perms": ["read"]}
    user_config = copy.copy(defaults)
    user_config["perms"].append("write")
    defaults["perms"]  # ["read", "write"] -- defaults mutated!

good

    user_config = copy.deepcopy(defaults)
    user_config["perms"].append("write")
    defaults["perms"]  # ["read"]

// reference aliasing
Assignment in Python binds names to objects; it does not copy. Same for function arguments. Mutation through one name shows up through every other name pointing at the same object.
bad

    def scale(values, factor):
        for i in range(len(values)):
            values[i] *= factor  # mutates caller's list

    original = [1, 2, 3]
    scale(original, 2)
    original  # [2, 4, 6]

Returning a new list instead of mutating is almost always the right answer unless mutation is the documented contract.
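One way the non-mutating version might look; the name scaled is chosen here only to signal that it returns a value rather than acting in place.

```python
def scaled(values, factor):
    """Return a new list; the caller's list is left untouched."""
    return [v * factor for v in values]


original = [1, 2, 3]
doubled = scaled(original, 2)
print(original)  # [1, 2, 3]
print(doubled)   # [2, 4, 6]
```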
// encoding mismatches
Reading bytes as the wrong encoding doesn't always raise. It silently produces wrong-looking text (mojibake) that propagates through your system. latin-1 is the worst offender: it maps every possible byte to a character, so decoding never fails, and any non-ASCII UTF-8 sequence comes out as garbage.
bad

    open("data.txt").read()  # platform-dependent default encoding

good

    open("data.txt", encoding="utf-8").read()

ruff PLW1514 catches missing encoding on open(). Opt-in via the Pylint (PL) rule set. In Python 3.10+, set PYTHONWARNDEFAULTENCODING=1 to surface this at runtime as an EncodingWarning.
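The silent-corruption case is easy to reproduce in memory. The sketch below encodes a sample word as UTF-8 and decodes it three ways; the sample string is arbitrary.

```python
text = "café"
raw = text.encode("utf-8")  # the é becomes two bytes: 0xC3 0xA9

# latin-1 maps every byte to a character, so decoding never fails
wrong = raw.decode("latin-1")
right = raw.decode("utf-8")

print(wrong)  # cafÃ©
print(right)  # café

# a stricter codec at least fails loudly on the same bytes
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)  # True
```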
RESOURCE LIFECYCLE
Files, sockets, db connections, locks. Each is a finite resource. Leaks compound: small per-request leak times millions of requests equals process restart.
// the bare open()
The language doesn't guarantee that a file is closed when its last reference goes out of scope. CPython usually closes it promptly through reference counting, but PyPy and other implementations don't, and an exception traceback can keep the reference alive.
bad

    data = open("big.json").read()  # file may stay open until GC

good

    with open("big.json") as f:
        data = f.read()
    # closed on scope exit, even on exception

ruff SIM115 catches open() outside a with. pylint R1732 same. ResourceWarning fires at runtime if a file is GC'd while open, but it is ignored by default; run with -X dev or -W default to see it. Easy class of bug to find statically.
// db connections without context managers
Same pattern, higher stakes. A leaked connection can block the entire pool.
bad

    conn = pool.acquire()
    result = conn.execute(query)
    if result.empty():
        return None  # leak! conn never released on this path
    conn.release()

good

    with pool.acquire() as conn:
        result = conn.execute(query)
        if result.empty():
            return None  # released on exit anyway

// lock release in error paths
Acquiring a threading.Lock manually requires releasing on every code path. One exception with no finally and you have a permanently held lock.
bad

    lock.acquire()
    do_work()  # if this raises, lock is never released
    lock.release()

good

    with lock:
        do_work()

// __del__ surprises
__del__ runs when the object is garbage collected, which is not guaranteed to be when the last reference drops. Reference cycles can leave __del__ never called (pre-3.4) or called in undefined order (post-3.4). Don't put critical cleanup there.
The correct primitive for "do X when this resource is no longer needed" is a context manager, an explicit close method, or weakref.finalize.
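A sketch of how the three primitives can be combined on one object. The Connection class and the closed list are hypothetical stand-ins, not a real library API; weakref.finalize is the only stdlib piece doing the work.

```python
import weakref

closed = []  # records cleanup calls, for demonstration only


class Connection:
    """Hypothetical resource wrapper; not a real library class."""

    def __init__(self, name):
        self.name = name
        # the callback must not reference self, or the object never dies
        self._finalizer = weakref.finalize(self, closed.append, name)

    def close(self):
        # explicit close; a finalizer runs at most once
        self._finalizer()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


with Connection("primary") as conn:
    pass
conn.close()  # second call does nothing

print(closed)  # ['primary']
```

The finalizer acts as a safety net: if nobody calls close() or uses the with block, it still runs when the object is collected or at interpreter exit.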
// growing caches
Module-level dicts used as caches grow without bound. They survive every garbage collection. They survive request boundaries. A long-running process slowly leaks until OOM.
bad

    _cache = {}

    def expensive(key):
        if key not in _cache:
            _cache[key] = compute(key)
        return _cache[key]

good

    from functools import lru_cache

    # bounded, evicts least-recently-used
    @lru_cache(maxsize=1024)
    def expensive(key):
        return compute(key)

CONCURRENCY & ASYNC
The highest-LLM-value bug family. Tools rarely catch async/sync mismatches or shared-state races. Required reading: Glyph's posts, Brett Cannon on coroutines, and the asyncio-gotchas pattern catalog.
// blocking calls in async functions
An async def function that calls time.sleep(), requests.get(), or open().read() blocks the event loop. Every other coroutine waiting for the loop is stalled. Latency spikes for unrelated requests.
bad

    import time
    import requests

    async def fetch(url):
        time.sleep(1)          # blocks the loop
        r = requests.get(url)  # blocks the loop
        return r.text

good

    import asyncio
    import aiohttp

    async def fetch(url):
        await asyncio.sleep(1)
        async with aiohttp.ClientSession() as sess:
            async with sess.get(url) as r:
                return await r.text()

ruff check --select=ASYNC catches ASYNC251 (time.sleep), ASYNC210 (blocking HTTP). Default ruff misses both. This rule set should be table stakes for any async-heavy project.
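When a blocking call has no async equivalent, one option is to push it onto a worker thread with asyncio.to_thread (Python 3.9+). In this sketch blocking_lookup is a made-up stand-in for such a call, and heartbeat exists only to show that the loop keeps running meanwhile.

```python
import asyncio
import time


def blocking_lookup(key):
    """Stand-in for a blocking library call (hypothetical)."""
    time.sleep(0.05)
    return key.upper()


async def heartbeat(ticks):
    # makes progress only while the event loop is not blocked
    for _ in range(5):
        await asyncio.sleep(0.01)
        ticks.append(time.monotonic())


async def main():
    ticks = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_lookup, "abc"),
        heartbeat(ticks),
    )
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)  # ABC 5
```

This moves the blocking work off the loop; it does not make it faster, and the thread pool is itself a bounded resource.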
// task not awaited
asyncio.create_task() returns a task, and the event loop keeps only a weak reference to it. If you don't keep a strong reference yourself, the garbage collector can collect the task while it's still running. The work silently stops mid-flight.
bad

    async def main():
        asyncio.create_task(background_work())  # task may be GC'd
        await other_work()

good

    _background = set()

    async def main():
        t = asyncio.create_task(background_work())
        _background.add(t)
        t.add_done_callback(_background.discard)
        await other_work()

// shared mutable state across awaits
The deceptive one. Synchronous statements between two await points run without interruption from other coroutines on the same loop (the loop can't preempt them). But the moment you await, another coroutine can run, mutate shared state, and return.
bad

    balance = 100

    async def transfer(amount):
        global balance
        current = balance            # read
        await log_transfer(amount)   # yields! another coroutine can run
        balance = current - amount   # write based on stale read

good

    balance_lock = asyncio.Lock()

    async def transfer(amount):
        global balance
        async with balance_lock:
            current = balance
            await log_transfer(amount)
            balance = current - amount

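The lost update can be reproduced deterministically. The sketch below runs two transfers concurrently, with and without the lock; audit is a hypothetical stand-in for log_transfer that does nothing except yield to the loop.

```python
import asyncio


async def audit(amount):
    """Hypothetical stand-in for log_transfer; it only yields to the loop."""
    await asyncio.sleep(0)


async def run_transfers(use_lock):
    state = {"balance": 100}
    lock = asyncio.Lock()

    async def transfer(amount):
        current = state["balance"]           # read
        await audit(amount)                  # other coroutines run here
        state["balance"] = current - amount  # write

    async def locked_transfer(amount):
        async with lock:
            await transfer(amount)

    worker = locked_transfer if use_lock else transfer
    await asyncio.gather(worker(30), worker(50))
    return state["balance"]


unlocked_balance = asyncio.run(run_transfers(use_lock=False))
locked_balance = asyncio.run(run_transfers(use_lock=True))
print(unlocked_balance)  # 50 -- the 30 transfer was overwritten
print(locked_balance)    # 20
```

Where the logging does not need to sit between the read and the write, moving the await outside that window removes the race without any lock.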
// queue.Queue vs asyncio.Queue
Two queues, same class name. queue.Queue is built for threads: get() blocks the calling thread. asyncio.Queue is built for the event loop: get() is a coroutine that yields. Using queue.Queue inside a coroutine blocks the loop or deadlocks.
bad

    from queue import Queue  # thread-safe, blocking
    q = Queue()

    async def consumer():
        while True:
            item = q.get()  # blocks the entire event loop
            process(item)

good

    from asyncio import Queue  # yields to the loop
    q = Queue()

    async def consumer():
        while True:
            item = await q.get()
            process(item)

// GIL assumptions
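The Global Interpreter Lock ensures only one thread executes Python bytecode at a time, so a single bytecode instruction is effectively atomic. Most operations are not a single instruction. counter += 1 compiles to a load, an add, and a store, and a thread switch can land between any of them. Under threading, races appear at bytecode boundaries even with the GIL.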
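A minimal sketch of the fix for the shared counter, assuming four worker threads and an arbitrary iteration count. Without the lock the final value can come up short; with it the result is exact.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # the lock makes the read-add-write sequence indivisible
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```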
"Python has the GIL so I don't need locks" is the most expensive misconception in Python concurrency.
SECURITY VULNS
Python's flexibility extends to unsafe primitives. eval, exec, pickle, yaml.load, and string interpolation into shell or SQL are all ways to turn user input into code execution.
// pickle from untrusted sources
pickle.loads() is equivalent to executing arbitrary Python. A pickle payload can spawn shells, exfiltrate data, install backdoors. There is no way to make it safe.
bad

    data = pickle.loads(request.body)  # RCE

good

    data = json.loads(request.body)
    # or use a typed schema with msgpack, protobuf, etc.

bandit B301 (blacklist pickle), B403 (import pickle). Both default. Reliable catch for the obvious cases. Indirect pickle use through libraries (joblib, dill) often missed.
// sql injection via f-strings
bad

    cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")
    # name = "x'; DROP TABLE users; --"

good

    # placeholder style is driver-specific: %s for psycopg2, ? for sqlite3
    cursor.execute("SELECT * FROM users WHERE name = %s", (name,))

bandit B608 catches f-string SQL. SQLAlchemy's text() with bound params is the right primitive. Format-string SQL is a CWE-89 classic.
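The difference is easy to see against an in-memory SQLite database. This is a sketch with an invented table and payload; note that sqlite3 uses ? as its placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

hostile = "x' OR '1'='1"

# string formatting: the payload rewrites the WHERE clause
leaked = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}' ORDER BY name"
).fetchall()

# bound parameter: the payload is compared as a plain string
safe = conn.execute(
    "SELECT name FROM users WHERE name = ? ORDER BY name", (hostile,)
).fetchall()

print(leaked)  # [('alice',), ('bob',)]
print(safe)    # []
conn.close()
```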
// shell injection
bad

    os.system(f"convert {filename} out.png")
    # filename = "x.png; rm -rf /"

good

    subprocess.run(["convert", filename, "out.png"], check=True)

// yaml.load
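yaml.load() with an unsafe loader constructs arbitrary Python objects from the input, which includes running code. That was the default behavior before PyYAML 5.1; since PyYAML 6.0 the Loader argument is mandatory, so the danger now comes from passing yaml.Loader or yaml.UnsafeLoader. yaml.safe_load() is the safe variant: it only constructs basic types.
bad

    config = yaml.load(open("config.yaml"), Loader=yaml.Loader)  # can execute code

good

    config = yaml.safe_load(open("config.yaml"))
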
// eval / exec on user input
Self-explanatory. eval(user_input) is "please run my user's code as me."
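When the goal is only to parse a literal (a list, dict, number, or string), ast.literal_eval is a narrower tool: it accepts literal syntax and rejects calls, names, and attribute access. A short sketch with invented inputs:

```python
import ast

parsed = ast.literal_eval("[1, 2, {'a': 3}]")
print(parsed)  # [1, 2, {'a': 3}]

try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    # function calls are not literals, so nothing is executed
    rejected = True
print(rejected)  # True
```

It is not a sandbox for arbitrary input size: very large or deeply nested literals can still exhaust memory or the recursion limit.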
// hardcoded credentials
bad

    API_KEY = "sk-proj-AbCd1234..."

Beyond the obvious "don't commit secrets," API keys in code leak to logs, error reports, AI assistants, every clone of the repo. git filter-repo can scrub history but only if you notice.
bandit B105 / B106 / B107 catches hardcoded password literals. TruffleHog, gitleaks, and GitHub's secret scanning catch more patterns. Bare API keys still slip through if they don't match known token patterns.
PERFORMANCE PATTERNS
Most Python performance problems are algorithmic or idiomatic, not the GIL. The GIL is the convenient excuse; the actual cost is usually quadratic loops or unnecessary materialization.
// O(n²) via list membership
A list's in operator scans linearly. Doing it inside a loop over another collection is O(n × m).
bad

    def find_overlap(items, blacklist):
        return [x for x in items if x in blacklist]  # O(n * m) -- m is len(blacklist)

good

    def find_overlap(items, blacklist):
        bl = set(blacklist)
        return [x for x in items if x in bl]  # O(n + m)

x in [a, b, c] → x in {a, b, c}. ruff PLR1714 is the closest, but only flags the related x == a or x == b pattern. The cross-function case (list passed in) needs LLM-level reasoning either way.
// string concatenation in loops
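Strings are immutable. Each concat can allocate a new string and copy the old contents. CPython sometimes extends the string in place, but that optimization is an implementation detail and easy to defeat, so treat the loop as quadratic.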
bad

    result = ""
    for chunk in chunks:
        result += chunk  # O(n²) total

good

    result = "".join(chunks)  # O(n)

// list materialization where a generator would do
list(map(...)) when you only iterate once. [x for x in big] when you just need to check membership. Materializing means allocating, populating, and (often) tearing down a list you didn't need.
bad

    # reads whole file into memory just to check if any line matches
    if any([is_match(line) for line in open("huge.log")]):
        handle()

good

    # generator, short-circuits on first match
    if any(is_match(line) for line in open("huge.log")):
        handle()

// repeated attribute lookup in hot loops
Python looks up attributes by name at every access. obj.method in a loop hits the object's __getattribute__ every iteration. Hoisting saves real time on inner loops.
bad

    for item in items:
        self.processor.handler.process(item)  # 3 lookups per iter

good

    process = self.processor.handler.process
    for item in items:
        process(item)

// loops where pandas/numpy vectorize
The largest delta. A Python loop over a numpy array can be 100-1000x slower than the vectorized equivalent.
bad

    result = []
    for i in range(len(arr)):
        result.append(arr[i] * 2 + 1)

good

    result = arr * 2 + 1  # vectorized; runs in C

// loading whole files when streaming would do
JSON files of moderate size, CSVs that exceed RAM, log scans. f.read() is the default but rarely the right choice for files over a few MB.
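For the log-scan case, iterating over the file object keeps one line in memory at a time. A sketch follows; count_matches and the sample file contents are invented so the example can run on its own.

```python
import os
import tempfile


def count_matches(path, needle):
    """Count lines containing needle, holding one line in memory at a time."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if needle in line)


# build a small sample file so the sketch is runnable
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("ok\nERROR disk full\nok\nERROR timeout\n")
    sample_path = tmp.name

matches = count_matches(sample_path, "ERROR")
print(matches)  # 2
os.remove(sample_path)
```

The same shape applies to CSVs through csv.reader, which also consumes the file lazily.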
SMELLS & MAINTAINABILITY
Not bugs in the "wrong answer" sense, but causes of future bugs. Worth flagging in a code review but not blocking.
// god functions
A 500-line function with 20 parameters and 8 levels of nesting. Every call site has to reason about every branch. Refactoring fear is high; bug rate is higher.
radon cc measures cyclomatic complexity. Functions over CC=15 are warning territory; over 30 should be split. ruff C901 in the C90 (mccabe) ruleset flags by complexity threshold.
// magic numbers and strings
if status == 7: is a bug magnet. Six months later nobody knows what 7 means. StatusCode.PROCESSED is self-documenting and refactorable.
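A sketch of the named-constant version using IntEnum. Only PROCESSED = 7 comes from the text above; the other members and their values are invented for illustration.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    # values other than PROCESSED = 7 are invented for illustration
    PENDING = 1
    PROCESSED = 7
    FAILED = 9


def is_done(status):
    return status == StatusCode.PROCESSED


print(is_done(7))          # True -- IntEnum members compare equal to raw ints
print(StatusCode(7).name)  # PROCESSED
```

IntEnum keeps compatibility with code and storage that still pass the bare integer, which makes the migration incremental.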
// deep nesting
Each if or for level doubles the mental model. Early return / guard clauses flatten nesting and make every branch explicit.
bad

    def handle(req):
        if req:
            if req.user:
                if req.user.active:
                    if req.action:
                        return dispatch(req)
        return None

good

    def handle(req):
        if not req or not req.user or not req.user.active:
            return None
        if not req.action:
            return None
        return dispatch(req)

// duplicated logic
The same five-line block appearing in three places. The bug fix that lands in one place and misses the others. The classic case for extraction.
// eval / exec / globals() abuse
Every now and then someone uses globals() as a dispatcher or exec to build a class dynamically. The performance is terrible, the bug surface is enormous, and the alternative is almost always a dict of callables.
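The dict-of-callables alternative might look like the following. The command names and handlers are hypothetical; the point is that the set of reachable functions is an explicit allow-list.

```python
def start(name):
    return f"starting {name}"


def stop(name):
    return f"stopping {name}"


# explicit allow-list of commands instead of globals()[command]
COMMANDS = {"start": start, "stop": stop}


def dispatch(command, name):
    try:
        handler = COMMANDS[command]
    except KeyError:
        raise ValueError(f"unknown command: {command!r}") from None
    return handler(name)


print(dispatch("start", "worker"))  # starting worker
```

An unknown command fails with a clear error instead of reaching whatever name happens to exist in the module.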
DETECTION LANDSCAPE
What runs, what catches what, where the holes are.
// tool matrix
| tool | strong at | weak at | config burden |
|---|---|---|---|
| ruff | style, imports, simple bugs, perf hints; fast | cross-function semantic, opt-in rulesets often skipped | low |
| pyright | types, control flow, optional access | dynamic code, untyped third-party libs | low (strict mode adds work) |
| mypy | types, strict null safety | slow, weaker flow analysis than pyright | medium |
| bandit | security CWE patterns | narrow scope, lots of FPs in default config | low |
| vulture | dead code, unused symbols | FPs on plugins, reflection, public API | low |
| perflint | performance idioms | narrow rule set, rarely run in CI | low |
| semgrep | custom syntactic patterns; powerful | requires rules; setup investment | medium |
| radon / xenon | complexity metrics | metrics, not bugs; needs human interpretation | low |
// what no tool catches well
- Cross-function mutation: function A mutates an object B assumed unchanged
- State-machine violations: methods called in wrong order on a stateful object
- Algorithmic complexity that needs workload context (the in on a list might be hot or cold)
- Race conditions across async boundaries with mutable shared state
- Resource leaks via exception paths that skip cleanup in non-context-manager flows
- Security: indirect injection through dependency calls, untrusted deserialization buried in libraries
- Logic errors that compile and run correctly but don't match intent
- Documentation/comment lies (the comment says X, the code does Y)
The above is where LLM-assisted review earns its keep over tool-only output.
ruff check --select=ALL on a real codebase produces hundreds to thousands of findings. The hard problem isn't generating candidates; it's culling, ranking, and presenting the 10-15 that matter. That gap — between linter output and actionable report — is the python-analyzer skill's reason to exist.
// recommended baseline config
For a new project running x8r-style analysis discipline, start here:
pyproject.toml

    [tool.ruff.lint]
    select = [
        "E", "F", "W",  # pyflakes + pycodestyle
        "B",            # flake8-bugbear (mutable defaults, etc)
        "SIM",          # simplifications
        "ASYNC",        # async pitfalls
        "PERF",         # performance idioms
        "RUF",          # ruff-specific
        "S",            # bandit-equivalent security
        "UP",           # pyupgrade
    ]

    [tool.pyright]
    typeCheckingMode = "strict"

    [tool.mypy]
    strict = true

This alone catches ~70% of what the python-analyzer skill flags in the bug families above. The skill's job is the remaining ~30% that needs cross-function reasoning, plus the prioritization layer on top of the raw output.