visit
Generating a random number, communicating over the network, or controlling a robot are all examples of side effects in Python. However, if the software can’t affect the external world, it is pointless.
But unnecessary side effects can cause problems and better be avoided (see the previous post).
Key points for this post:
People whether the second property should be called a "side effect.” I found it important to identify and distinguish "input" vs. "output" side effects, at least in Python. So in the , I will call incoming side effects “side arguments” and outgoing external side effects “side results.” It seems like 65% of people are , so here are some pictures for the majority:
Dirty function | Example |
---|---|
|
|
Calling system time.time()
is an "input" side effect from the external system on the function. The print
statement is an "output" side effect from the function to the external world. If one removes all side effects from this red "dirty" function, we will get a side-effect-free green one:
Clean function | Example |
---|---|
|
|
Dirty application | Example |
---|---|
|
|
Clean application | Example |
---|---|
|
|
def inverse(x):
print("A message that I feel everyone"
"would benefit from!")
return -x
def algorithm_1(x):
os.mkdir("folder_with_results")
return -x
def algorithm(argument):
set_theano_flags(current_time=time.time()) # Init something used 5 years ago
result = (... complicated logic based on argument)
return result
This function has two side-arguments: it reads from the global random generator for the random mask and a cutout. It has four side-results:
This function is hard to test and will always bring you trouble by polluting /tmp
and the console.
We are taking a random generator in and returning the debug mask. The high-level function is free to choose whether to pass seeded rng
for reproducibility or use the global one for convenience. It will also decide how to save the debug masks if still needed.
def dump_default_config(path):
default_config = {'hidden_size': 128, 'learning_coeff': 0.01}
with open(path, 'wb') as f:
pickle.dump(default_config, f)
def run_network(network_config_path, image):
with open(network_config_path, 'rb') as f:
config = pickle.load(f)
network = create_network(**config)
prediction = network(image)
return prediction
def network_main(image):
config_path = 'my_config.pkl'
dump_default_config(config_path)
# update the learning coefficient in the config file
with open(config_path, 'rb') as f:
config = pickle.load(f)
config['learning_coeff'] = 1e-4 # better learning coefficient
with open(config_path, 'wb') as f:
pickle.dump(config)
return run_network(config_path, image)
def create_default_config():
return {'hidden_size': 128, 'learning_coeff': 0.01}
def run_network(config, image):
network = create_network(**config)
prediction = network(image)
return prediction
def network_main(image):
config_path = 'my_config.pkl'
default_config = create_default_config()
with open(config_path, 'wb') as f:
pickle.dump(default_config, f)
# update learning coefficient in the config file
with open(config_path, 'rb') as f:
config = pickle.load(f)
config['learning_coeff'] = 1e-4 # better learning coefficient
with open(config_path, 'wb') as f:
pickle.dump(config)
with open(config_path, 'rb') as f:
config = pickle.load(f)
return run_network(config, image)
def create_default_config():
return {'hidden_size': 128, 'learning_coeff': 0.01}
def run_network(config, image):
network = create_network(**config)
prediction = network(image)
return prediction
def network_main(image):
config = create_default_config()
config['learning_coeff'] = 1e-4 # better learning coefficient
return run_network(config, image)
Side effects might bring you problems, but hidden side effects are the worst. Imagine you decided to use an external library to make a friendly math-related application:
from external_library import compute_optimal_solution
def main():
x = input("Enter the number")
value = compute_optimal_solution(x)
print("Optimal value is :", exp(value))
You happily deploy it only to receive user complaints about database-related crashes. You're really surprised since you just wanted to provide some math utility and never intended to deal with databases. Looking into the source of compute_optimal_solution
, you might find something like:
def compute_optimal_solution(x):
result = 0
for i in range(1000):
result += i*x - log(i*x) + sin(i*x)
# to understand how people use our function,
# we log the results in the debug database
database_cache = sqlite3.connect(DEFAULT_DB)
cursor = database_cache.cursor()
cursor.execute(f"INSERT INTO MyDB (argument, solution) VALUES ({x}, {result})")
database_cache.commit()
cursor.close()
return result
compute_optimal_solution_and_cache_solution_in_database(x)
You also can expose side effects by splitting “clean” and “dirty” code on the module level. For example, a library-like folder should have only clean side-effect-free code. All side effects should go into an application-like folder (e.g., scripts
, app
, or runners
). Here is another reinforcing this point.
Dependency injection?
Injecting an object that might produce a side effect instead of producing it yourself is a common way to kick a can down the road:
pass Timer
instead of time.time()
pass logging.Logger
instead of a print
Return a functor?
Instead of causing a side effect right away, you can return a "lazy" function that would do it later. See about this technique. Also, see the "Lazy functions" section in . It's a fun long read, but it probably goes beyond the needs of a regular Python mortal.
Copy an input container?
Modifying an incoming list
or dict
is also a side effect. Quite often, it's worth copying, modifying, and returning it instead. See the discussion and . Is it slower? Probably, yes. But the actual question should be this: will your company spend more money on (A) executing slower code or (B) debugging bugs caused by side-effects?
Prints and loggers?
While logging
is a side effect, but it's not the worst one. At least, the majority of developers don't treat it as such. It's hard to advise anything specific without going on a long tangent. You can adopt configurable logging
, pass a logger as a dependency to every function, return string messages, or stick with print
(e.g., ).
Also published