In Python, pickling is the process of converting Python objects into a byte stream. A byte stream is a sequence of bytes that a program can use for any input or output operation. The main idea is that the byte stream produced from a Python object holds all the information needed to reconstruct that object later, for example in another Python script. In other words, pickling serializes the structure of objects. To implement this feature, Python provides the pickle module, which is used mainly to serialize and deserialize object structures. If a Python object needs to be saved on disk, it can be pickled first, which means it is serialized and then written to a file.
The image above (source: cppsecrets) shows that serialization stores Python objects in a file as bytes, and that deserializing the file yields the objects again.
Pickling Process:
To serialize an object hierarchy, the pickle module provides a ‘dump’ function. We pass it the required arguments and it serializes the object for us. The dump function looks like this:
pickle.dump(object, file_obj, protocol)
The function takes three main arguments:
The first argument is the Python object that needs to be serialized.
The second argument is the file object, opened in binary write mode, where the serialized Python object will be stored.
The third is the protocol. If it is not specified, pickle uses its default protocol (pickle.DEFAULT_PROTOCOL), which depends on the Python version; protocol 0 was the default only in Python 2. Newer versions of Python have introduced newer protocols with improved features for pickling.
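To make the protocol argument more concrete, here is a minimal sketch that prints the default and highest protocols of the running interpreter and serializes one object with every available protocol. The dictionary named sample and its contents are made up for illustration, and the sizes printed will vary between Python versions.

```python
import pickle

# Hypothetical sample object, used only for this illustration.
sample = {"id": 7, "tags": ["a", "b"]}

# Protocol used when the protocol argument is omitted (version dependent).
print("default protocol:", pickle.DEFAULT_PROTOCOL)
# Newest protocol supported by this interpreter.
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

# Serialize the same object with each protocol and record the byte count.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(sample, protocol=protocol)
    sizes[protocol] = len(data)
    # Whatever the protocol, loading must give back an equal object.
    assert pickle.loads(data) == sample

print(sizes)
```

Higher protocols are usually more compact, but a pickle written with a new protocol may not be readable by an older Python version, so the choice depends on who has to load the file.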
Example:
# Pickling Example in Python
import pickle
# Sample Python object
sample_list = [23, 'Hello World', 'Python']
# Pickling
with open("data.pickle", "wb") as file_handle:
    pickle.dump(sample_list, file_handle, pickle.HIGHEST_PROTOCOL)
print("Pickling finished!")
Output:
Pickling finished!
The pickle module, imported with the statement import pickle, accepts a Python object, converts it into a byte stream, and writes that stream to a file with the dump() function. This process is called pickling.
Some advantages of the Pickle Module:
The pickle module easily handles recursive objects. Recursive objects are those that contain a reference to themselves. A naive serializer could get stuck in an infinite loop on such an object and eventually crash, because it would keep following the same reference again and again. To handle this, the pickle module keeps track of every object it has already serialized, so when it meets a reference to one of them later, it does not serialize the object again.
One more great advantage of the pickle module is that it can serialize pretty much any Python object without much extra code.
Unpickling is the process of retrieving the original Python objects from a previously stored byte stream, in other words from a pickle file. It is simply the opposite of pickling: here a byte stream is converted back into a Python object.
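The handling of recursive objects described above can be checked with a short sketch. The variable names are chosen only for this example; it uses pickle.dumps and pickle.loads, the in-memory counterparts of dump and load.

```python
import pickle

# A list that contains a reference to itself.
recursive_list = [1, 2]
recursive_list.append(recursive_list)

# Pickling finishes normally because already-seen objects are remembered.
restored = pickle.loads(pickle.dumps(recursive_list))

# The restored list points back to itself, not to a copy.
print(restored[2] is restored)  # True

# The same bookkeeping preserves shared references inside one pickle.
shared = ["x"]
pair = [shared, shared]
restored_pair = pickle.loads(pickle.dumps(pair))
print(restored_pair[0] is restored_pair[1])  # True
```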
Unpickling Process:
To deserialize a file containing a byte stream and obtain the Python object from it, the pickle module provides a ‘load’ function. We pass it a file object opened in binary read mode, and it deserializes the contents and returns the Python object. Let’s see a sample implementation of the unpickling process. In this example we deserialize the same “data.pickle” file that we created earlier by pickling our Python object. We pass the file object to the pickle module’s load function and receive the Python object in a variable named ‘retrieved_data’.
Example:
# Unpickling example in Python
import pickle
# Unpickling
with open("data.pickle", "rb") as file_handle:
    retrieved_data = pickle.load(file_handle)
print(retrieved_data)
Output:
[23, 'Hello World', 'Python']
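The same round trip also works without a file. The sketch below assumes a small helper called round_trip (a name invented for this example) that pickles a value to bytes with pickle.dumps and immediately restores it with pickle.loads, then checks a few built-in values.

```python
import pickle

def round_trip(value):
    """Hypothetical helper: pickle a value to bytes, then unpickle it."""
    data = pickle.dumps(value)
    # The serialized form is a bytes object, not text.
    assert isinstance(data, bytes)
    return pickle.loads(data)

# A few built-in values, chosen only for illustration.
values = [42, 3.14, True, 2 + 3j, "text", (1, 2), [3, 4], {5, 6}]

for value in values:
    restored = round_trip(value)
    # The restored object is equal to the original and has the same type.
    assert restored == value
    assert type(restored) is type(value)

print("all values survived the round trip")
```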
What data types can be Pickled?
The data types that support pickling include integers, floats, Booleans, complex numbers and strings, as well as tuples, lists and sets, provided that the containers hold only picklable objects.
Pickling Use Cases:
Pickling can be used when a program’s state needs to be saved to disk so that, after a restart, the program can continue from where it left off. It is also very useful when Python data needs to be sent over a TCP connection in a multicore or distributed system. Whenever Python objects need to be stored in a database, pickling can be helpful. Pickling can also help with caching, where we convert an arbitrary Python object to bytes and use the result as a dictionary key.
Some Dangers of Pickling:
As the documentation of the pickle module states, pickle is not secure against erroneous or maliciously constructed data, because unpickling can execute arbitrary code contained in that data. It is therefore very easy to craft data that harms your device when it is unpickled. So never unpickle data received from an untrusted or unknown source.
Conclusion:
Finally, we can conclude that pickling and unpickling are quite simple yet very important and useful processes. As a data scientist, you need serialization for several reasons, such as saving a fitted model to disk. So the pickle module makes life much easier for data scientists who work with ML algorithms all the time. With just the ‘dump’ and ‘load’ functions, they can easily pickle and unpickle their data. Still, anyone using the module must keep its vulnerabilities in mind and should never exchange pickles with unknown parties. Parties that do exchange pickled data should also make sure the network connection between them is encrypted.
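To make the security warning concrete, here is a deliberately harmless sketch of why untrusted pickles are dangerous. The class name Payload is invented for this example. Its __reduce__ method tells pickle which callable to invoke when the data is loaded; here that callable is the built-in print, but an attacker could name any importable function instead.

```python
import pickle

class Payload:
    """Hypothetical stand-in for attacker-controlled data."""

    def __reduce__(self):
        # Instructs pickle to "rebuild" this object by calling print(...).
        return (print, ("this call ran during unpickling",))

untrusted_bytes = pickle.dumps(Payload())

# Loading the bytes calls print; no Payload instance is created at all.
result = pickle.loads(untrusted_bytes)
print(type(result).__name__)
```

The loader never asks whether the callable is safe, which is why data from an unknown source should not be passed to pickle.load or pickle.loads.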