Scalene is a high-performance CPU, GPU, and memory profiler for Python with AI-powered optimization proposals. It runs significantly faster than other Python profilers while providing detailed performance information. See the paper docs/osdi23-berger.pdf for technical details on Scalene's design.
Key features:
- CPU, GPU (NVIDIA/Apple), and memory profiling
- AI-powered optimization suggestions (OpenAI, Anthropic, Azure, Amazon Bedrock, Gemini, Ollama)
- Web-based GUI and CLI interfaces
- Jupyter notebook support via magic commands (`%scrun`, `%%scalene`)
- Line-by-line profiling with low overhead
- Separates Python time from native/C time
Platform support: Linux, macOS, WSL 2 (full support); Windows (partial support)
```bash
# Install in development mode
pip install -e .

# Run all tests
python3 -m pytest tests/

# Run tests for a specific Python version
python3.X -m pytest tests/

# Run linters
mypy scalene
ruff check scalene

# Run a single test file
python3 -m pytest tests/test_coverup_83.py -v
```

- `scalene_profiler.py` - Main profiler class (`Scalene`). Entry point for profiling. Uses signal-based sampling for CPU profiling. Coordinates all profiling subsystems.
- `scalene_statistics.py` - `ScaleneStatistics` class. Collects and aggregates profiling data. Key types: `ProfilingSample`, `MemcpyProfilingSample`. Uses `RunningStats` for statistical aggregation.
- `scalene_output.py` - Profile output formatting for CLI/HTML
- `scalene_json.py` - `ScaleneJSON` class for JSON output format
- `scalene_analysis.py` - Profile analysis logic
- `__main__.py` - Entry point for `python -m scalene`
- `profile.py` - Entry point for `--on`/`--off` control of background profiling
- `scalene_config.py` - Version info (`scalene_version`, `scalene_date`) and constants:
  - `SCALENE_PORT = 11235` - Default port for web UI
  - `NEWLINE_TRIGGER_LENGTH` - Must match `src/include/sampleheap.hpp`
- `scalene_arguments.py` - `ScaleneArguments` class (extends `argparse.Namespace`) with all profiler options and their defaults defined in `ScaleneArgumentsDict`
- `scalene_parseargs.py` - `ScaleneParseArgs.parse_args()` builds the argument parser. `RichArgParser` provides colored help output (uses Rich on Python < 3.14, native argparse colors on 3.14+)
- `scalene_signals.py` - Signal definitions for CPU sampling
- `scalene_signal_manager.py` - Manages signal handlers
- `scalene_sigqueue.py` - Signal queue management
- `scalene_client_timer.py` - Timer for periodic profiling
- `scalene_nvidia_gpu.py` - NVIDIA GPU profiling via `pynvml`
- `scalene_apple_gpu.py` - Apple GPU profiling (Metal)
- `scalene_accelerator.py` - Generic accelerator interface
- `scalene_neuron.py` - AWS Neuron support
- `scalene_memory_profiler.py` - Memory profiling logic
- `scalene_leak_analysis.py` - Memory leak detection (experimental, `--memory-leak-detector`)
- `scalene_mapfile.py` - `ScaleneMapFile` for memory-mapped communication with the native extension
- `scalene_preload.py` - Sets up `LD_PRELOAD`/`DYLD_INSERT_LIBRARIES` for native memory tracking
- `scalene_magics.py` - Jupyter magic commands (`%scrun` for line mode, `%%scalene` for cell mode)
- `scalene_jupyter.py` - Jupyter notebook support utilities
These modules monkey-patch standard library functions to capture profiling data during blocking operations:
- `replacement_fork.py` - Tracks `os.fork()`
- `replacement_exit.py` - Tracks `sys.exit()`
- `replacement_lock.py`, `replacement_mp_lock.py`, `replacement_sem_lock.py` - Lock acquisition timing
- `replacement_thread_join.py`, `replacement_pjoin.py` - Thread/process join timing
- `replacement_signal_fns.py` - Signal function replacements
- `replacement_poll_selector.py` - I/O polling timing
- `replacement_get_context.py` - Multiprocessing context
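The general pattern can be illustrated with a minimal lock wrapper (a hypothetical `TimedLock` class, not Scalene's actual replacement code, which patches the stdlib in place and feeds its sampler):

```python
import threading
import time


class TimedLock:
    """Illustrative sketch of the replacement pattern: a lock wrapper that
    records how long each acquire() call takes (i.e., blocking time when
    the lock is contended). Scalene's real replacement modules are more
    involved and integrate with its signal-based sampler."""

    def __init__(self, on_blocked=None):
        self._lock = threading.Lock()
        # Callback receiving the seconds spent inside acquire().
        self._on_blocked = on_blocked or (lambda seconds: None)

    def acquire(self, blocking=True, timeout=-1):
        start = time.perf_counter()
        acquired = self._lock.acquire(blocking, timeout)
        self._on_blocked(time.perf_counter() - start)
        return acquired

    def release(self):
        self._lock.release()

    # Support the `with` statement like a normal lock.
    __enter__ = acquire

    def __exit__(self, *exc):
        self.release()
```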
- `runningstats.py` - `RunningStats` class for online statistical calculations (mean, variance)
- `scalene_funcutils.py` - Function utilities
- `scalene_utility.py` - General utilities
- `sparkline.py` - Sparkline generation for memory visualization
- `syntaxline.py` - Syntax-highlighted source code lines
- `adaptive.py` - Adaptive sampling logic
- `time_info.py` - Time measurement utilities
- `sorted_reservoir.py` - Reservoir sampling for bounded-size sample collection
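The online mean/variance idea behind `RunningStats` can be sketched with Welford's algorithm (an illustrative standalone class, not the actual `runningstats.py` implementation):

```python
class OnlineStats:
    """Welford's online algorithm: single-pass mean and sample variance
    without storing the samples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Sample (n-1) variance; 0.0 when fewer than two samples.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```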
Web-based GUI built with TypeScript, bundled with esbuild.
Core Files:
- `index.html.template` - Jinja2 template for main GUI page (rendered by `scalene_utility.py`)
- `scalene-gui.ts` - Main TypeScript entry point, UI event handlers, initialization
- `scalene-gui-bundle.js` - Bundled JavaScript output (generated, do not edit directly)
AI Provider Modules:
- `openai.ts` - OpenAI API integration (`sendPromptToOpenAI`, `fetchOpenAIModels`)
- `anthropic.ts` - Anthropic Claude API integration
- `gemini.ts` - Google Gemini API integration (`sendPromptToGemini`, `fetchGeminiModels`)
- `optimizations.ts` - Provider dispatch logic, prompt generation
- `persistence.ts` - localStorage persistence with environment variable fallbacks
Support Files:
- `launchbrowser.py` - Opens browser to GUI (default port 11235)
- `find_browser.py` - Cross-platform browser detection
Vendored Assets (for offline support):
- `jquery-3.6.0.slim.min.js` - jQuery (vendored locally, not loaded from CDN)
- `bootstrap.min.css` - Bootstrap 5.1.3 CSS
- `bootstrap.bundle.min.js` - Bootstrap 5.1.3 JS with Popper
- `prism.css` - Syntax highlighting styles
- `favicon.ico` - Scalene favicon
- `scalene-image.png` - Scalene logo
These assets are copied to a temp directory when serving via HTTP, enabling the GUI to work in air-gapped/offline environments.
Building the GUI:
```bash
cd scalene/scalene-gui
npx esbuild scalene-gui.ts --bundle --outfile=scalene-gui-bundle.js --format=iife --global-name=ScaleneGUI
```

C++ code for low-overhead memory allocation tracking:
Headers (src/include/):
- `sampleheap.hpp` - Sampling heap allocator. Key constant `NEWLINE` must match the Python config.
- `memcpysampler.hpp` - Intercepts `memcpy` to track copy volume
- `pywhere.hpp` - Tracks Python file/line info for allocations
- `samplefile.hpp` - File-based communication with Python
- `sampler.hpp`, `poissonsampler.hpp`, `thresholdsampler.hpp` - Sampling strategies
- `scaleneheader.hpp` - Common header definitions
Sources (src/source/):
- `libscalene.cpp` - Main native library (loaded via `LD_PRELOAD`)
- `pywhere.cpp` - Python location tracking implementation
- `get_line_atomic.cpp` - Atomic line number access
- `traceconfig.cpp` - Trace configuration
- `Heap-Layers/` - Memory allocator infrastructure (by Emery Berger)
- `printf/` - Async-signal-safe printf implementation
The codebase supports Python 3.8-3.14. Version-specific code uses:
```python
if sys.version_info >= (3, 14):
    ...  # Python 3.14+ specific code
else:
    ...  # Older Python versions
```

Type Annotation Compatibility (Python 3.8/3.9):
- Do NOT use `X | Y` union syntax in runtime-evaluated annotations (PEP 604 requires Python 3.10+). Use `Optional[X]` or `Union[X, Y]` from `typing` instead.
- Do NOT use `list[X]`, `dict[K, V]`, `tuple[X, ...]` in runtime-evaluated annotations (PEP 585 lowercase generics require Python 3.9+). Use `List`, `Dict`, `Tuple` from `typing` for 3.8 support.
- Adding `from __future__ import annotations` makes all annotations strings (not evaluated at runtime), which allows modern syntax on older Python. However, this can break code that inspects annotations at runtime (e.g., dataclasses, pydantic).
- The safest approach for this codebase: use `typing.Optional`, `typing.Union`, `typing.List`, `typing.Tuple`, `typing.Dict` in all annotation positions that are evaluated at runtime (function signatures, variable annotations outside `if TYPE_CHECKING` blocks).
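A signature following these rules might look like this (a hypothetical function, shown only to illustrate the 3.8-compatible annotation style):

```python
from typing import Dict, List, Optional, Tuple


def summarize(
    samples: List[float],
    labels: Optional[Dict[str, int]] = None,
) -> Tuple[float, int]:
    """Python 3.8-compatible annotations: typing generics and Optional,
    no PEP 604 (`X | Y`) or PEP 585 (`list[X]`) syntax."""
    total = sum(samples)
    return total, len(labels or {})
```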
Python 3.13 Changes (dis module):
- `dis.Instruction.starts_line` changed from `int | None` (line number) to `bool`
- New `dis.Instruction.line_number` attribute (`int | None`) added for the actual line number
- On Python < 3.13, `starts_line` is only set on the first instruction of each source line; use a line-tracking loop to propagate line numbers to subsequent instructions
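A version-portable line-tracking loop along these lines (an illustrative helper, not Scalene's actual code):

```python
import dis
import sys


def instruction_lines(func):
    """Yield (offset, line_number) pairs for each bytecode instruction,
    propagating the current source line to instructions that don't start
    a new line. Works on Python < 3.13 and 3.13+."""
    current = func.__code__.co_firstlineno
    for instr in dis.get_instructions(func):
        if sys.version_info >= (3, 13):
            # 3.13+: starts_line is now a bool; line_number carries the line.
            if instr.line_number is not None:
                current = instr.line_number
        else:
            # < 3.13: starts_line is the line number, set only on the first
            # instruction of each source line.
            if instr.starts_line is not None:
                current = instr.starts_line
        yield instr.offset, current
```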
Bytecode/Opcode Compatibility (dis module):
- Never match specific opcode names (e.g., `JUMP_BACKWARD`, `JUMP_ABSOLUTE`, `POP_JUMP_IF_TRUE`). Opcode names change across Python versions — for example, Python 3.10 while loops use `POP_JUMP_IF_TRUE` for backward jumps, Python 3.11+ uses `JUMP_BACKWARD`, and `JUMP_ABSOLUTE` was removed in 3.12.
- Always use abstract `dis` module categories when possible: `dis.hasjabs` (absolute jump opcodes), `dis.hasjrel` (relative jump opcodes), `dis.hasconst`, `dis.hasname`, etc. These are maintained by CPython and work across all versions.
- For call detection, matching `opname.startswith("CALL")` is acceptable since that prefix has been stable, but prefer opcode integer sets over name strings for hot paths.
- When checking jump direction (forward vs backward), use `instr.argval` (which `dis` resolves to an absolute offset) and compare against `instr.offset`, rather than relying on opcode names to imply direction.
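A sketch of name-agnostic backward-jump detection using those categories (illustrative, not Scalene's actual analysis code):

```python
import dis


def backward_jumps(func):
    """Return offsets of backward-jump instructions without matching
    opcode names: combine dis.hasjabs/dis.hasjrel to identify jumps,
    then compare the resolved target (argval) against the offset."""
    jump_opcodes = set(dis.hasjabs) | set(dis.hasjrel)
    result = []
    for instr in dis.get_instructions(func):
        # For jump instructions, dis resolves argval to an absolute
        # target offset, so direction is simply target < current offset.
        if instr.opcode in jump_opcodes and instr.argval < instr.offset:
            result.append(instr.offset)
    return result
```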
Python 3.14 Changes:
- `argparse` now has built-in colored help output (`color=True` parameter)
- `RichArgParser` uses Rich for colors on Python < 3.14, native argparse colors on 3.14+
```python
class RichArgParser(argparse.ArgumentParser):
    """ArgumentParser that uses Rich for colored output on Python < 3.14."""

    def __init__(self, *args, **kwargs):
        if sys.version_info < (3, 14):
            from rich.console import Console
            self._console = Console()
        else:
            self._console = None
        super().__init__(*args, **kwargs)
```

The `_colorize_help_for_rich()` function applies Python 3.14-style colors using Rich markup:
- `usage:` and `options:` → bold blue
- Program name → bold magenta
- Long options (`--foo`) → bold cyan
- Short options (`-h`) → bold green
- Metavars (`FOO`) → bold yellow
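A rough sketch of this kind of markup-based colorizing, covering only part of the mapping above (a hypothetical helper; the real `_colorize_help_for_rich()` differs):

```python
import re


def colorize_help(text: str) -> str:
    """Wrap parts of argparse help text in Rich markup tags, in the
    spirit of Python 3.14's colored help. Handles section headers,
    long options, and short options only."""
    # Section headers at the start of a line -> bold blue.
    text = re.sub(r"^(usage:|options:)", r"[bold blue]\1[/bold blue]", text, flags=re.M)
    # Long options (--foo, --cpu-only) -> bold cyan.
    text = re.sub(r"(?<!\w)(--[\w-]+)", r"[bold cyan]\1[/bold cyan]", text)
    # Short options (-h) -> bold green.
    text = re.sub(r"(?<!\w)(-[a-zA-Z])\b", r"[bold green]\1[/bold green]", text)
    return text
```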
Preventing Browser Password Prompts:
Use autocomplete="one-time-code" on password/API key inputs to prevent browsers from offering to save them:
```html
<input type="password" id="api-key" autocomplete="one-time-code">
```

Show/Hide Password Toggle:
```typescript
function togglePassword(inputId: string, button: HTMLButtonElement): void {
  const input = document.getElementById(inputId) as HTMLInputElement;
  if (input.type === "password") {
    input.type = "text";
    button.textContent = "Hide";
  } else {
    input.type = "password";
    button.textContent = "Show";
  }
}
```

Provider Field Visibility: Use CSS classes to show/hide provider-specific fields:
```typescript
function toggleServiceFields(): void {
  const service = (document.getElementById("service") as HTMLSelectElement).value;
  // Hide all provider sections
  document.querySelectorAll(".provider-section").forEach((el) => {
    (el as HTMLElement).style.display = "none";
  });
  // Show selected provider section
  const section = document.querySelector(`.${service}-fields`);
  if (section) (section as HTMLElement).style.display = "block";
}
```

Persistent Form Elements:
Add class `persistent` to inputs that should be saved/restored from localStorage:

```html
<input type="text" id="api-key" class="persistent">
```

The `persistence.ts` module handles save/restore automatically.
Standalone HTML Generation:
The `generate_html()` function in `scalene_utility.py` supports a `standalone` parameter:
- When `standalone=False` (default): Assets are referenced as local files (e.g., `<script src="jquery-3.6.0.slim.min.js">`)
- When `standalone=True`: All assets are embedded inline (JS/CSS as text, images as base64)
The Jinja2 template uses conditionals:
```html
{% if standalone %}
<script>{{ jquery_js }}</script>
<style>{{ bootstrap_css }}</style>
{% else %}
<script src="jquery-3.6.0.slim.min.js"></script>
<link href="bootstrap.min.css" rel="stylesheet">
{% endif %}
```

When importing submodules, be explicit:
```python
# Correct - mypy can verify this
import importlib.util
importlib.util.find_spec(mod_name)

# Wrong - mypy error: Module has no attribute "util"
import importlib
importlib.util.find_spec(mod_name)
```

- `test_coverup_*.py` - Auto-generated coverage tests
- `test_runningstats.py` - Statistics tests (requires `hypothesis`)
- `test_scalene_json.py` - JSON output tests (requires `hypothesis`)
- `test_nested_package_relative_import.py` - Import handling tests
```bash
pip install pytest pytest-asyncio hypothesis

for v in 3.9 3.10 3.11 3.12 3.13 3.14; do
    python$v -m pytest tests/test_coverup_83.py -v
done
```

The smoketests in `test/` can be flaky due to timing/sampling issues inherent to profiling:
- "No non-zero lines in X" - The profiler didn't collect enough samples. This happens when the test runs too quickly or signal delivery timing varies.
- "Expected function 'X' not returned" - A function wasn't sampled. Common with short-running functions.
These failures are usually timing-related and pass on re-run. They're more common on CI due to variable machine load.
When testing port availability, never use hardcoded ports - they may already be in use on CI runners:
```python
# Bad - port 49200 might be in use
port = 49200
sock.bind(("", port))

# Good - find an available port first
port = find_available_port(49200, 49300)
if port is None:
    return  # Skip test if no ports available
sock.bind(("", port))
```

- `run-linters.yml` - Runs mypy and ruff on Python 3.9-3.14
- `tests.yml` - Runs pytest on Python 3.9-3.14
- `build-and-upload.yml` - Build and publish to PyPI
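A `find_available_port` helper matching the usage in the port-availability example above could be sketched as follows (illustrative; the actual test-suite helper may differ):

```python
import socket
from typing import Optional


def find_available_port(start: int, end: int) -> Optional[int]:
    """Return the first port in [start, end] that can be bound, or None
    if every port in the range is taken."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind(("", port))
            except OSError:
                continue  # Port in use; try the next one.
            return port
    return None
```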
- Add default value in `scalene_arguments.py`:

  ```python
  class ScaleneArgumentsDict(TypedDict, total=False):
      my_option: bool
  ```

- Add argument in `scalene_parseargs.py`:

  ```python
  parser.add_argument(
      "--my-option",
      dest="my_option",
      action="store_true",
      default=defaults.my_option,
      help="Description of option",
  )
  ```
- Create provider module (`scalene/scalene-gui/newprovider.ts`):

  ```typescript
  export async function sendPromptToNewProvider(
    prompt: string,
    apiKey: string
  ): Promise<string> {
    // API call implementation
  }

  export async function fetchNewProviderModels(apiKey: string): Promise<string[]> {
    // Optional: fetch available models from API
  }
  ```

- Update `optimizations.ts`:
  - Import the new module
  - Add case in `sendPromptToService()` switch statement

- Update `index.html.template`:
  - Add option to `#service` select dropdown
  - Add provider section with API key input, model selector, etc.
  - Add CSS for `.newprovider-fields` visibility

- Update `scalene-gui.ts`:
  - Add provider to `toggleServiceFields()` function
  - Add refresh handler if dynamic model fetching is supported
  - Update `getDefaultProvider()` if env var support is needed

- Update `persistence.ts` (for env var support):
  - Add mapping in `envKeyMap` for new fields

- Update `scalene_utility.py`:
  - Read environment variable in `api_keys` dict
  - Pass to template rendering

- Rebuild the bundle:

  ```bash
  cd scalene/scalene-gui
  npx esbuild scalene-gui.ts --bundle --outfile=scalene-gui-bundle.js --format=iife --global-name=ScaleneGUI
  ```
The GUI supports prepopulating API keys from environment variables:
| Element ID | Environment Variable | Provider |
|---|---|---|
| `api-key` | `OPENAI_API_KEY` | OpenAI |
| `anthropic-api-key` | `ANTHROPIC_API_KEY` | Anthropic |
| `gemini-api-key` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Gemini |
| `azure-api-key` | `AZURE_OPENAI_API_KEY` | Azure OpenAI |
| `azure-api-url` | `AZURE_OPENAI_ENDPOINT` | Azure OpenAI |
| `aws-access-key` | `AWS_ACCESS_KEY_ID` | Amazon Bedrock |
| `aws-secret-key` | `AWS_SECRET_ACCESS_KEY` | Amazon Bedrock |
| `aws-region` | `AWS_DEFAULT_REGION` or `AWS_REGION` | Amazon Bedrock |
Flow:
- `scalene_utility.py` reads env vars and passes them to the Jinja2 template
- Template injects an `envApiKeys` JavaScript object into the page
- `persistence.ts` uses env vars as fallbacks when localStorage is empty
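The Python side of this fallback can be sketched as follows (a hypothetical `env_api_keys()` helper mirroring the table above; the real logic lives in `scalene_utility.py` and `persistence.ts`):

```python
import os


def env_api_keys():
    """Build the element-ID -> value mapping from environment variables,
    taking the first non-empty variable where several are accepted."""
    def first(*names):
        for name in names:
            value = os.environ.get(name)
            if value:
                return value
        return ""

    return {
        "api-key": first("OPENAI_API_KEY"),
        "anthropic-api-key": first("ANTHROPIC_API_KEY"),
        "gemini-api-key": first("GEMINI_API_KEY", "GOOGLE_API_KEY"),
        "azure-api-key": first("AZURE_OPENAI_API_KEY"),
        "azure-api-url": first("AZURE_OPENAI_ENDPOINT"),
        "aws-access-key": first("AWS_ACCESS_KEY_ID"),
        "aws-secret-key": first("AWS_SECRET_ACCESS_KEY"),
        "aws-region": first("AWS_DEFAULT_REGION", "AWS_REGION"),
    }
```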
Edit scalene/scalene_config.py:
```python
scalene_version = "X.Y.Z"
scalene_date = "YYYY.MM.DD"
```

Key runtime dependencies:

- `rich` - Terminal formatting and colors
- `cloudpickle` - Serialization
- `pynvml` - NVIDIA GPU support (optional)

See `requirements.txt` for the full list.
Scalene uses a verb-based CLI with two main subcommands:
```bash
# Profile a program (saves to scalene-profile.json by default)
scalene run [options] yourprogram.py

# View an existing profile
scalene view [options] [profile.json]
```

```bash
scalene run prog.py                  # profile, save to scalene-profile.json
scalene run -o my.json prog.py       # save to custom file
scalene run --cpu-only prog.py       # profile CPU only (faster)
scalene run -c config.yaml prog.py   # load options from config file
scalene run prog.py --- --arg        # pass args to program
```

```bash
scalene view                   # open in browser
scalene view --cli             # view in terminal
scalene view --html            # save to scalene-profile.html
scalene view --standalone      # save as self-contained HTML (all assets embedded)
scalene view myprofile.json    # open specific profile
```

After profiling completes, Scalene prints instructions for viewing the profile:
```
Scalene: profile saved to scalene-profile.json
To view in browser: scalene view
To view in terminal: scalene view --cli
```
The filename is only included in the command if a non-default output file was used.
Create a scalene.yaml file with options:
```yaml
outfile: my-profile.json
cpu-only: true
profile-only: "mypackage,utils"
cpu-percent-threshold: 5
```

Load with: `scalene run -c scalene.yaml prog.py`
Use `scalene run --help-advanced` to see all options, including:

- `--profile-all` - profile all code, not just the target program
- `--profile-only PATH` - only profile files containing these strings
- `--profile-exclude PATH` - exclude files containing these strings
- `--profile-system-libraries` - profile Python stdlib and installed packages (skipped by default)
- `--gpu` - profile GPU time and memory
- `--memory` - profile memory usage
- `--stacks` - collect stack traces
- `--profile-interval N` - output profiles every N seconds
Smoke tests in test/ use the new CLI syntax:
```python
# test/smoketest.py
cmd = [sys.executable, "-m", "scalene", "run", "-o", str(outfile), *rest, fname]
```

Workflows in `.github/workflows/` use the new CLI:

```yaml
# Profile with interval, then view
- run: python -m scalene run --profile-interval=2 test/testme.py && python -m scalene view --cli

# Profile with module invocation
- run: python -m scalene run --- -m import_stress_test && python -m scalene view --cli
```

Scalene uses several Unix signals for profiling. The signal assignments are in `scalene_signals.py`:
| Signal | Purpose | Platform |
|---|---|---|
| `SIGVTALRM` | CPU profiling timer (default) | Unix |
| `SIGALRM` | CPU profiling timer (real time mode) | Unix |
| `SIGILL` | Start profiling (`--on`) | Unix |
| `SIGBUS` | Stop profiling (`--off`) | Unix |
| `SIGPROF` | memcpy tracking | Unix |
| `SIGXCPU` | malloc tracking | Unix |
| `SIGXFSZ` | free tracking | Unix |
Libraries like PyTorch Lightning may also use these signals. The `replacement_signal_fns.py` module handles conflicts:
On Linux: Uses real-time signals (SIGRTMIN+1 to SIGRTMIN+5) for redirection. When user code sets a handler for a Scalene signal, their handler is redirected to a real-time signal. Calls to raise_signal() and kill() are also redirected transparently.
On macOS/other platforms: Uses handler chaining. Both Scalene's handler and the user's handler are called when the signal fires.
```python
# Platform-specific signal handling
_use_rt_signals = sys.platform == "linux" and hasattr(signal, "SIGRTMIN")

if _use_rt_signals:
    # Linux: redirect to real-time signals
    rt_base = signal.SIGRTMIN + 1
    _signal_redirects[signal.SIGILL] = rt_base
else:
    # macOS: chain handlers
    def chained_handler(sig, frame):
        scalene_handler(sig, frame)
        user_handler(sig, frame)
```

In Python 3.11+, `frame.f_lineno` can be `None` in edge cases (e.g., during multiprocessing cleanup). Always use a fallback:

```python
lineno = frame.f_lineno if frame.f_lineno is not None else frame.f_code.co_firstlineno
```

The `vendor/printf/printf.h` header defines macros that conflict with the C++ standard library:
```c
#define vsnprintf vsnprintf_
#define snprintf snprintf_
```

This breaks `std::vsnprintf` in `<string>` and other headers. Fix: include C++ standard headers BEFORE vendor headers in `src/source/libscalene.cpp`:

```cpp
// Include C++ standard headers FIRST
#include <cstddef>
#include <string>

// Then vendor headers that define conflicting macros
#include <heaplayers.h>  // Eventually includes printf.h
```

See `Scalene-Agents.md` for detailed information about interpreting Scalene's profiling output, including Python vs C time, memory metrics, and optimization strategies.
See Scalene-Debugging.md for signal handler debugging, async profiling debugging, the profile output pipeline (three separate renderers!), and unbounded growth prevention patterns.
See Scalene-GUI.md for adding new columns, Vega-Lite chart types, pie chart best practices (two-wedge rendering, rotating pies), and the chart rendering flow.