Python is a programming language known for its simplicity and readability, but it is not known for its speed. This matters most in machine learning, where you routinely process huge amounts of data. Today, we'll look at a few simple ways to make your code faster and lighter on memory.
Using generators instead of lists
Let's start with memory optimization. Imagine we're faced with the task of looping through all the data and modifying it, and vectorized calculations (NumPy, pandas, etc.) are not an option. In this case, the first thing that comes to mind is to create a list and dump everything into it. Let's check how much memory this will take:
To do this, we'll write a simple function to display how much RAM an object takes up and then generate the data:
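The helper itself is not shown here, so the following is only a minimal sketch of what such a function could look like. The name `total_size_mb`, the use of `sys.getsizeof`, and the one-million-integer list are assumptions made for illustration, not the author's original code.

```python
import sys

def total_size_mb(obj):
    """Return the shallow size of obj in MBs (1 MB = 1024 * 1024 bytes here)."""
    return sys.getsizeof(obj) / 1024 / 1024

# Hypothetical data set: one million integers stored in a plain list.
data = [i for i in range(1_000_000)]

print(f"Total MBs size: {total_size_mb(data)}")
```

Note that `sys.getsizeof` is shallow: for a list it counts the list object and its array of references, not the integers the list points to. The exact figure also depends on the interpreter version and on how the list was built.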
After loading, the data takes up:
Total MBs size: 8.057334899902344
Now we create two functions:
# First function: builds a list
def squares_list(data):
    result = []
    for elem in data:
        result.append(elem ** 2)
    return result

# Second function: a generator
def squares_generator(data):
    for elem in data:
        yield elem ** 2

# Example usage; data is the collection generated earlier
squares_list(data)       # Takes a lot of memory: the whole result is stored
squares_generator(data)  # Saves memory: values are produced on demand
Here, the second function doesn't build anything up front. It returns a generator that produces values one at a time as you iterate over it, so you can keep only what you need.
Final measurements: the list-based solution takes the same amount of memory as the original data, 8.06 MBs, while the generator takes only 1e-6 MBs! The second solution is convenient when you need to create objects many times or store only part of the output. Keep in mind that a generator can be consumed only once.
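To illustrate storing only part of the output, here is a small sketch. The generator `lazy_squares` and the `range` stand-in for the data are hypothetical names chosen for this example; `itertools.islice` is the standard-library way to take the first few items from any iterator.

```python
import sys
from itertools import islice

def lazy_squares(data):
    """Yield the square of each element, one value at a time."""
    for elem in data:
        yield elem ** 2

data = range(1_000_000)  # stand-in for the real data

gen = lazy_squares(data)
gen_size = sys.getsizeof(gen)      # small and independent of len(data)
first_five = list(islice(gen, 5))  # only five values are computed and stored
print(gen_size, first_five)

# The generator remembers its position, so the next value continues from there.
next_value = next(gen)
print(next_value)
```

Because the generator keeps its position, iterating over it a second time does not start again from the beginning. Call the function again if you need a fresh pass over the data.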
Local variables
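When writing your pipelines, keep in mind that accessing global variables can slow down your code. In CPython, a local variable is read from a fast slot in the function's frame, while a global name has to be looked up in the module's dictionary.
If possible, don't use the global statement. It's better to pass the value in as an argument or copy it into a local variable and work with that.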
global_variable = 10

def func1():
    # The global statement is needed to rebind a module-level name
    global global_variable
    global_variable = 52
    print(global_variable)

def func2():
    local_variable = 10
    print(local_variable)

func1()  # Accesses the global variable
func2()  # Accesses a local variable
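The difference only shows up when the variable is read many times, for example inside a hot loop. The sketch below is one way to measure it with the standard `timeit` module. The names `FACTOR`, `scale_global`, and `scale_local` are hypothetical, and the size of the gap (if any) depends on your interpreter version.

```python
import timeit

FACTOR = 3  # hypothetical module-level (global) value

def scale_global(n):
    total = 0
    for i in range(n):
        total += i * FACTOR  # FACTOR is resolved as a global on every pass
    return total

def scale_local(n, factor=FACTOR):
    total = 0
    for i in range(n):
        total += i * factor  # factor is a local variable of the function
    return total

print("global:", timeit.timeit(lambda: scale_global(100_000), number=20))
print("local: ", timeit.timeit(lambda: scale_local(100_000), number=20))
```

Both functions return the same result, so only the lookup cost differs. Treat the printed timings as a rough indication and measure on your own setup before restructuring code.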
Using for instead of while
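In Python, for loops are often faster than while loops because they are optimized for iterating over sequences: the iteration itself runs inside the interpreter, whereas a while loop evaluates its condition and increments its counter in Python code on every pass. If you know the number of iterations in advance, use for instead of while.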
# `while` loop
i = 0
while i < 10:
    print(i)
    i += 1

# `for` loop
for i in range(10):
    print(i)
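To check the claim on your own machine, a rough benchmark along these lines can help. The function names `count_with_while` and `count_with_for` are made up for this sketch, and the timings will vary between interpreter versions.

```python
import timeit

def count_with_while(n):
    total = 0
    i = 0
    while i < n:  # the condition is evaluated in Python code on each pass
        total += i
        i += 1    # the counter is incremented in Python code
    return total

def count_with_for(n):
    total = 0
    for i in range(n):  # range hands out the next value directly
        total += i
    return total

print("while:", timeit.timeit(lambda: count_with_while(100_000), number=20))
print("for:  ", timeit.timeit(lambda: count_with_for(100_000), number=20))
```

Both functions compute the same sum, so any timing difference comes from the loop mechanics alone.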
Avoid append in loops
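Using append inside a loop can slow down a function: every call costs a method lookup, and the list has to reallocate its storage from time to time as it grows. If you know the final length in advance, create the list with the correct size and assign by index to avoid repeated memory reallocations.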
# Using `append`
result = []
for i in range(10):
    result.append(i)

# Creating a list of the required size up front
result = [0] * 10
for i in range(10):
    result[i] = i
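The reallocations can be observed directly. The sketch below assumes CPython, where `sys.getsizeof` reports a list's currently allocated size. The helper names `count_resizes` and `fill_preallocated` are hypothetical.

```python
import sys

def count_resizes(n):
    """Count how often the list's allocated size changes during n appends."""
    result = []
    last_size = sys.getsizeof(result)
    resizes = 0
    for i in range(n):
        result.append(i)
        size = sys.getsizeof(result)
        if size != last_size:  # the list grew its underlying storage
            resizes += 1
            last_size = size
    return resizes

def fill_preallocated(n):
    """Fill a list of known length and report whether its size stayed fixed."""
    result = [0] * n
    size_before = sys.getsizeof(result)
    for i in range(n):
        result[i] = i
    return result, size_before == sys.getsizeof(result)

print("resizes during 10,000 appends:", count_resizes(10_000))
print("preallocated size unchanged:", fill_preallocated(10_000)[1])
```

The list is resized far fewer times than there are appends, because it reserves spare capacity each time it grows. The preallocated list never has to be resized at all.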
Using map and filter
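These functions let you process an iterable (such as a list) without writing your own loop: map applies a function to each element, and filter keeps only the elements for which the function returns a true value. This makes the code more understandable and sometimes speeds up its execution. In Python 3, both return lazy iterators, which is why the examples wrap the result in list().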
# Using `map`
numbers = [1, 2, 3, 4]
squares = map(lambda x: x ** 2, numbers)
print(list(squares))
# Output: [1, 4, 9, 16]

# Using `filter`
numbers = [1, 2, 3, 4, 5]
even_numbers = filter(lambda x: x % 2 == 0, numbers)
print(list(even_numbers))
# Output: [2, 4]
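Because both functions are lazy, they can be chained without building intermediate lists. The sketch below sums the squares of the even numbers from 1 to 10. The variable names are illustrative, and a generator expression is shown next to it as an equivalent alternative rather than as part of the original examples.

```python
numbers = range(1, 11)

# filter feeds map one element at a time; no intermediate list is created
even_numbers = filter(lambda x: x % 2 == 0, numbers)
even_squares = map(lambda x: x ** 2, even_numbers)
total = sum(even_squares)  # 4 + 16 + 36 + 64 + 100
print(total)

# The same computation written as a generator expression
same_total = sum(x ** 2 for x in numbers if x % 2 == 0)
print(same_total)
```

Which form reads better is largely a matter of taste. With a lambda involved, the generator expression is often just as fast and easier to scan.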
Bottom line 🏖
These 5 ways to optimize Python functions will help you make your code faster and more efficient. It's important to remember that you don't always need to use all of these methods simultaneously. Experiment and choose the right solutions based on the specific task.
As a bonus, grab a mock interview where I explain other things you might encounter in a Data Science interview. I've written more about effective code optimization for work tasks here – take advantage!