2月17日 22:19

What are the techniques and best practices for Python performance optimization?

Performance Analysis Tools

1. timeit Module

The timeit module is used to measure the execution time of small code snippets.

python
import timeit

# Measure code execution time
code = """
sum(range(1000))
"""

execution_time = timeit.timeit(code, number=1000)
print(f"Execution time: {execution_time:.4f} seconds")

# Use timeit decorator
@timeit.timeit
def test_function():
    return sum(range(1000))

test_function()

2. cProfile Module

The cProfile module is used to analyze program performance bottlenecks.

python
import cProfile

def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

def fast_function():
    return sum(range(1000000))

def main():
    slow_function()
    fast_function()

# Performance profiling
cProfile.run('main()')

# Output analysis results to file
cProfile.run('main()', filename='profile_stats')

3. memory_profiler

memory_profiler is used to analyze memory usage.

python
# Install: pip install memory-profiler
from memory_profiler import profile

@profile
def memory_intensive_function():
    data = [i for i in range(1000000)]
    return sum(data)

if __name__ == '__main__':
    memory_intensive_function()

4. line_profiler

line_profiler is used to analyze function performance line by line.

python
# Install: pip install line_profiler
from line_profiler import LineProfiler

def complex_function():
    result = []
    for i in range(1000):
        result.append(i * 2)
    return sum(result)

# Create performance profiler
lp = LineProfiler()
lp_wrapper = lp(complex_function)
lp_wrapper()

# Display results
lp.print_stats()

Algorithm Optimization

1. Choose Appropriate Algorithms

python
# Bad practice - O(n²) complexity
def find_duplicates_slow(arr):
    duplicates = []
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j] and arr[i] not in duplicates:
                duplicates.append(arr[i])
    return duplicates

# Good practice - O(n) complexity
def find_duplicates_fast(arr):
    seen = set()
    duplicates = set()
    for item in arr:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return list(duplicates)

2. Use Built-in Functions

python
# Bad practice - Manual implementation
def manual_sum(arr):
    total = 0
    for item in arr:
        total += item
    return total

# Good practice - Use built-in functions
def builtin_sum(arr):
    return sum(arr)

# Performance comparison
import timeit
print(timeit.timeit(lambda: manual_sum(range(10000)), number=100))
print(timeit.timeit(lambda: builtin_sum(range(10000)), number=100))

3. Avoid Unnecessary Computations

python
# Bad practice - Repeated computation
def calculate_distances(points):
    distances = []
    for i in range(len(points)):
        for j in range(len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            distances.append((dx ** 2 + dy ** 2) ** 0.5)
    return distances

# Good practice - Avoid repeated computation
def calculate_distances_optimized(points):
    distances = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            distances.append((dx ** 2 + dy ** 2) ** 0.5)
    return distances

Data Structure Optimization

1. Use Appropriate Data Structures

python
# List lookup - O(n)
def find_in_list(lst, target):
    return target in lst

# Set lookup - O(1)
def find_in_set(s, target):
    return target in s

# Performance comparison
import timeit
lst = list(range(10000))
s = set(range(10000))

print("List lookup:", timeit.timeit(lambda: find_in_list(lst, 5000), number=1000))
print("Set lookup:", timeit.timeit(lambda: find_in_set(s, 5000), number=1000))

2. Use Generators Instead of Lists

python
# Bad practice - Use lists
def get_squares_list(n):
    return [i ** 2 for i in range(n)]

# Good practice - Use generators
def get_squares_generator(n):
    for i in range(n):
        yield i ** 2

# Memory usage comparison
import sys
list_obj = get_squares_list(1000000)
gen_obj = get_squares_generator(1000000)

print(f"List memory: {sys.getsizeof(list_obj)} bytes")
print(f"Generator memory: {sys.getsizeof(gen_obj)} bytes")

3. Use slots to Reduce Memory

python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class PersonWithSlots:
    __slots__ = ['name', 'age']
    
    def __init__(self, name, age):
        self.name = name
        self.age = age

# Memory comparison
import sys
p1 = Person("Alice", 25)
p2 = PersonWithSlots("Alice", 25)

print(f"Regular object: {sys.getsizeof(p1)} bytes")
print(f"With __slots__: {sys.getsizeof(p2)} bytes")

I/O Optimization

1. Batch Process I/O

python
# Bad practice - Write line by line
def write_lines_slow(filename, lines):
    with open(filename, 'w') as f:
        for line in lines:
            f.write(line + '\n')

# Good practice - Batch write
def write_lines_fast(filename, lines):
    with open(filename, 'w') as f:
        f.write('\n'.join(lines))

2. Use Buffering

python
# Bad practice - No buffering
def read_without_buffer(filename):
    with open(filename, 'r', buffering=0) as f:
        return f.read()

# Good practice - Use buffering
def read_with_buffer(filename):
    with open(filename, 'r', buffering=8192) as f:
        return f.read()

3. Asynchronous I/O

python
import asyncio
import aiohttp

async def fetch_url(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def fetch_all_urls(urls):
    tasks = [fetch_url(url) for url in urls]
    return await asyncio.gather(*tasks)

urls = [
    "https://www.example.com",
    "https://www.google.com",
    "https://www.github.com",
]

# Fetch all URLs asynchronously
results = asyncio.run(fetch_all_urls(urls))

Concurrency Optimization

1. Multiprocessing for CPU-Intensive Tasks

python
import multiprocessing

def process_data(data_chunk):
    return sum(x ** 2 for x in data_chunk)

def parallel_processing(data, num_processes=4):
    chunk_size = len(data) // num_processes
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    
    with multiprocessing.Pool(processes=num_processes) as pool:
        results = pool.map(process_data, chunks)
    
    return sum(results)

data = list(range(1000000))
result = parallel_processing(data)

2. Multithreading for I/O-Intensive Tasks

python
import threading
import requests

def download_url(url):
    response = requests.get(url)
    return len(response.content)

def parallel_download(urls):
    threads = []
    results = []
    
    def worker(url):
        result = download_url(url)
        results.append(result)
    
    for url in urls:
        thread = threading.Thread(target=worker, args=(url,))
        threads.append(thread)
        thread.start()
    
    for thread in threads:
        thread.join()
    
    return results

urls = ["url1", "url2", "url3"]
results = parallel_download(urls)

3. Use concurrent.futures

python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def process_item(item):
    return item ** 2

def with_thread_pool(items):
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(process_item, items))
    return results

def with_process_pool(items):
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(process_item, items))
    return results

items = list(range(1000))
thread_results = with_thread_pool(items)
process_results = with_process_pool(items)

Caching Optimization

1. Use functools.lru_cache

python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Fast calculation
print(fibonacci(100))

2. Custom Caching

python
class Cache:
    def __init__(self, max_size=128):
        self.cache = {}
        self.max_size = max_size
    
    def get(self, key):
        return self.cache.get(key)
    
    def set(self, key, value):
        if len(self.cache) >= self.max_size:
            self.cache.pop(next(iter(self.cache)))
        self.cache[key] = value

cache = Cache()

def expensive_computation(x):
    cached_result = cache.get(x)
    if cached_result is not None:
        return cached_result
    
    result = sum(i ** 2 for i in range(x))
    cache.set(x, result)
    return result

3. Use Redis Cache

python
import redis
import pickle

# Connect to Redis
r = redis.Redis(host='localhost', port=6379, db=0)

def cache_result(key, value, ttl=3600):
    """Cache result"""
    r.setex(key, ttl, pickle.dumps(value))

def get_cached_result(key):
    """Get cached result"""
    result = r.get(key)
    if result:
        return pickle.loads(result)
    return None

def expensive_operation(data):
    cache_key = f"result:{hash(str(data))}"
    
    # Try to get from cache
    cached = get_cached_result(cache_key)
    if cached:
        return cached
    
    # Execute computation
    result = complex_computation(data)
    
    # Cache result
    cache_result(cache_key, result)
    
    return result

String Optimization

1. Use join Instead of +

python
# Bad practice - Use +
def build_string_slow(parts):
    result = ""
    for part in parts:
        result += part
    return result

# Good practice - Use join
def build_string_fast(parts):
    return ''.join(parts)

# Performance comparison
import timeit
parts = ["part"] * 1000
print(timeit.timeit(lambda: build_string_slow(parts), number=100))
print(timeit.timeit(lambda: build_string_fast(parts), number=100))

2. Use String Formatting

python
# Bad practice - String concatenation
def format_message_slow(name, age):
    return "Name: " + name + ", Age: " + str(age)

# Good practice - Use f-string
def format_message_fast(name, age):
    return f"Name: {name}, Age: {age}"

# Performance comparison
print(timeit.timeit(lambda: format_message_slow("Alice", 25), number=10000))
print(timeit.timeit(lambda: format_message_fast("Alice", 25), number=10000))

3. Use String Methods

python
# Bad practice - Manual processing
def process_string_slow(s):
    result = ""
    for char in s:
        if char.isupper():
            result += char.lower()
        else:
            result += char
    return result

# Good practice - Use built-in methods
def process_string_fast(s):
    return s.lower()

# Performance comparison
print(timeit.timeit(lambda: process_string_slow("HELLO"), number=10000))
print(timeit.timeit(lambda: process_string_fast("HELLO"), number=10000))

Database Optimization

1. Use Connection Pool

python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Create connection pool
engine = create_engine(
    'postgresql://user:password@localhost/dbname',
    poolclass=QueuePool,
    pool_size=10,
    max_overflow=5
)

def execute_query(query):
    with engine.connect() as connection:
        result = connection.execute(query)
        return result.fetchall()

2. Batch Insert

python
# Bad practice - Insert one by one
def insert_slow(items):
    for item in items:
        db.execute("INSERT INTO table VALUES (%s)", (item,))

# Good practice - Batch insert
def insert_fast(items):
    db.executemany("INSERT INTO table VALUES (%s)", [(item,) for item in items])

3. Use Indexes

python
# Create index
CREATE INDEX idx_name ON users(name);

# Use index query
SELECT * FROM users WHERE name = 'Alice';

# Avoid full table scan
# Bad practice
SELECT * FROM users WHERE LOWER(name) = 'alice';

# Good practice
SELECT * FROM users WHERE name = 'Alice';

Best Practices

1. Pre-allocate Memory

python
# Bad practice - Dynamic growth
def build_list_slow():
    result = []
    for i in range(10000):
        result.append(i)
    return result

# Good practice - Pre-allocate
def build_list_fast():
    return [i for i in range(10000)]

2. Avoid Global Variables

python
# Bad practice - Use global variables
counter = 0

def increment_global():
    global counter
    counter += 1

# Good practice - Use local variables
def increment_local(counter):
    return counter + 1

3. Use Appropriate Data Types

python
# Bad practice - Use lists for numeric data
numbers = [1, 2, 3, 4, 5]

# Good practice - Use arrays
import array
numbers = array.array('i', [1, 2, 3, 4, 5])

# Bad practice - Use strings for binary data
data = "binary data"

# Good practice - Use bytes
data = b"binary data"

4. Lazy Loading

python
# Bad practice - Load all data immediately
def load_all_data():
    data = []
    for item in large_dataset:
        processed = process_item(item)
        data.append(processed)
    return data

# Good practice - Lazy loading
def load_data_lazy():
    for item in large_dataset:
        yield process_item(item)

Performance Monitoring

1. Use logging to Record Performance

python
import logging
import time

logging.basicConfig(level=logging.INFO)

def logged_function(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        logging.info(f"{func.__name__} execution time: {end_time - start_time:.4f} seconds")
        return result
    return wrapper

@logged_function
def expensive_function():
    time.sleep(1)
    return "Done"

expensive_function()

2. Use Performance Counters

python
import time
from collections import defaultdict

class PerformanceMonitor:
    def __init__(self):
        self.counters = defaultdict(list)
    
    def record(self, name, duration):
        self.counters[name].append(duration)
    
    def get_stats(self, name):
        durations = self.counters[name]
        return {
            'count': len(durations),
            'total': sum(durations),
            'average': sum(durations) / len(durations),
            'min': min(durations),
            'max': max(durations)
        }

monitor = PerformanceMonitor()

def monitored_function(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        monitor.record(func.__name__, end_time - start_time)
        return result
    return wrapper

Summary

Key points of Python performance optimization:

Performance Analysis Tools: timeit, cProfile, memory_profiler, line_profiler
Algorithm Optimization: Choose appropriate algorithms, use built-in functions, avoid unnecessary computations
Data Structure Optimization: Use appropriate data structures, use generators, use slots
I/O Optimization: Batch processing, use buffering, asynchronous I/O
Concurrency Optimization: Multiprocessing, multithreading, concurrent.futures
Caching Optimization: lru_cache, custom caching, Redis caching
String Optimization: Use join, string formatting, string methods
Database Optimization: Connection pooling, batch insertion, use indexes
Best Practices: Pre-allocate memory, avoid global variables, use appropriate data types, lazy loading
Performance Monitoring: logging, performance counters

Performance optimization principles:

Measure first, then optimize
Optimize bottlenecks, not all code
Balance readability and performance
Use built-in functions and libraries
Consider using C extensions or Cython

Mastering performance optimization techniques enables writing more efficient and faster Python programs.

标签：Python