How to Calculate MD5 Hash of a File in Python

MD5 (Message Digest Algorithm 5) is a widely used cryptographic hash function that generates a fixed-length 128-bit hash value. In this article, we’ll learn how to calculate the MD5 hash of a file in Python.

The MD5 hash value of a file in Python can be computed using the hashlib module. Calling hashlib.md5() creates an MD5 hash object. You pass the file's contents to it, then call the hexdigest() method to get the hash value as a hexadecimal string. hashlib is a built-in module in Python's standard library and supports various hash algorithms, including MD5.
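Here is a minimal sketch of that workflow. It hashes an in-memory byte string instead of a file, so it runs anywhere. The sample sentence is a well-known test input and is not taken from this article.

```python
import hashlib

# Create an MD5 hash object from some bytes (MD5 works on bytes, not str)
data = b"The quick brown fox jumps over the lazy dog"
hash_object = hashlib.md5(data)

# Finalize the hash and get it as a 32-character hexadecimal string
md5_hex = hash_object.hexdigest()
print(md5_hex)  # 9e107d9d372bb6826bd81d3542a419d6
```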

Why Compute an MD5 Hash?

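MD5 is a popular hash function used for data integrity checks. It generates a fixed-size 128-bit output regardless of the size of the input file, and even a tiny change in the input produces a completely different hash. This makes it useful for verifying that data has not been accidentally corrupted. However, MD5 is no longer considered cryptographically secure because practical collision attacks exist. If you need to protect against deliberate tampering, a stronger algorithm such as SHA-256 is the better choice.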
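The sketch below demonstrates both properties. The two inputs differ only by a trailing period, yet their hashes are completely different. A 10 MB input still yields a 32-character hex digest. The sample inputs are illustrative and do not come from the article.

```python
import hashlib

# Two inputs that differ only by a trailing period
original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy dog."

original_hash = hashlib.md5(original).hexdigest()
modified_hash = hashlib.md5(modified).hexdigest()

print(original_hash)  # 9e107d9d372bb6826bd81d3542a419d6
print(modified_hash)  # e4d909c290d0fb1ca068ffaddf22cbd0

# A much larger input still yields a 128-bit (32 hex character) digest
large_hash = hashlib.md5(b"x" * 10_000_000).hexdigest()
print(len(original_hash), len(large_hash))  # 32 32
```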

Data integrity can be checked by computing a file's MD5 hash before transmission and comparing it with the hash computed at the recipient's end. If the values match, the file is considered unaltered and free from corruption (as shown in the image below).

Calculate MD5 Hash of a File
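The recipient-side check could look roughly like the following sketch. The function name verify_md5 and the file name in the usage comment are hypothetical. The comparison is case-insensitive because published checksums are sometimes written in uppercase.

```python
import hashlib


def verify_md5(path, expected_md5):
    """Return True if the file at `path` has the expected MD5 hex digest."""
    with open(path, "rb") as f:
        actual_md5 = hashlib.md5(f.read()).hexdigest()
    # hexdigest() is lowercase; normalize the expected value before comparing
    return actual_md5 == expected_md5.strip().lower()


# Hypothetical usage: "downloaded_file.zip" and the checksum are placeholders
# print(verify_md5("downloaded_file.zip", "9E107D9D372BB6826BD81D3542A419D6"))
```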

What is the hashlib module?

hashlib is a module in the Python standard library that provides a common interface to many hash and message digest algorithms, such as SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, and MD5.
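Because the interface is shared, you can also select an algorithm by name with hashlib.new(), and every algorithm exposes the same methods. The following is a small sketch. The input b"abc" is an arbitrary example.

```python
import hashlib

# Algorithms that every Python build is guaranteed to provide
print("md5" in hashlib.algorithms_guaranteed)  # True

data = b"abc"

# hashlib.new() selects an algorithm by name; it matches the named constructor
md5_by_name = hashlib.new("md5", data).hexdigest()
md5_direct = hashlib.md5(data).hexdigest()
print(md5_by_name == md5_direct)  # True

# Other algorithms share the same interface (update/digest/hexdigest)
sha256_hex = hashlib.sha256(data).hexdigest()
print(len(md5_direct), len(sha256_hex))  # 32 64
```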

We have already discussed the SHA-256 algorithm in this article: How to Calculate SHA256 Hash of a File in Python

Each algorithm takes in an input (such as a string or file) and produces a fixed-size string of bytes as an output, which is known as a hash digest or hash value.
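For MD5, the digest is 16 bytes (128 bits). hashlib can return it either as raw bytes with digest() or as a hexadecimal string with hexdigest(), as this short sketch shows. The input bytes are arbitrary.

```python
import hashlib

hash_object = hashlib.md5(b"example input")

raw_digest = hash_object.digest()      # bytes: 16 raw bytes (128 bits)
hex_digest = hash_object.hexdigest()   # str: 32 hexadecimal characters

print(hash_object.digest_size)           # 16
print(len(raw_digest), len(hex_digest))  # 16 32
print(raw_digest.hex() == hex_digest)    # True
```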

How to calculate the MD5 hash of a file in Python using hashlib?

Below is a step-by-step guide to calculating the MD5 hash of a file in Python using the hashlib module:

Step 1: Import the hashlib module

To use the MD5 hash function, we need to import the hashlib module. Since it is a built-in module, we don't need to install it separately.

import hashlib

Step 2: Open the file in binary mode

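In this step, we open the file in binary mode using the open() function. Using it in a with statement closes the file automatically when the block ends. It also binds the open file to the name f, which the next step uses.

with open(filename, "rb") as f: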

NOTE: The ‘rb’ mode opens the file for reading in binary format. As a result, read() returns bytes, which is the type hashlib.md5() expects.
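Text mode ("r") would return a str instead, and hashlib refuses to hash a str directly. Text mode can also translate line endings, so the hashed data might not match the exact bytes on disk. The sketch below shows the type error. The sample string is arbitrary.

```python
import hashlib

# hashlib only accepts bytes-like objects; a str (what text mode returns) fails
raised = False
try:
    hashlib.md5("some text")
except TypeError as error:
    raised = True
    print("TypeError:", error)

# Bytes work, which is why the file is opened with "rb"
print(hashlib.md5(b"some text").hexdigest())
```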

Step 3: Read the file contents

Once the file is open, we can read its contents as bytes using the read() method.

file_contents = f.read()
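f.read() loads the whole file into memory at once. That is fine for small files but wasteful for very large ones. A common alternative, sketched below, reads the file in fixed-size chunks and feeds each chunk to the hash object with update(). The function name md5_of_file and the 8192-byte chunk size are illustrative choices, not requirements.

```python
import hashlib


def md5_of_file(filename, chunk_size=8192):
    """Compute a file's MD5 hex digest without loading it all into memory."""
    md5_hash = hashlib.md5()
    with open(filename, "rb") as f:
        # iter() keeps calling f.read(chunk_size) until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5_hash.update(chunk)
    return md5_hash.hexdigest()


# Hypothetical usage (the filename is a placeholder):
# print(md5_of_file("example_file.txt"))
```

The chunked result is identical to hashing the whole file at once, because update() accumulates the data in order.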

Step 4: Calculate the MD5 hash

With the file contents stored in the variable file_contents, we can calculate their MD5 hash by passing them to the hashlib.md5() function.

hashlib.md5(file_contents)
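Passing data to hashlib.md5() is a shorthand for creating an empty hash object and calling update() on it. Feeding the same bytes in pieces gives the same result. The sketch below uses an in-memory stand-in for file_contents.

```python
import hashlib

# Stand-in for the bytes read from a file
file_contents = b"The quick brown fox jumps over the lazy dog"

# Passing the data to the constructor...
one_shot = hashlib.md5(file_contents).hexdigest()

# ...is equivalent to creating an empty hash object and calling update()
incremental = hashlib.md5()
incremental.update(b"The quick brown fox ")
incremental.update(b"jumps over the lazy dog")

print(one_shot == incremental.hexdigest())  # True
```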

Step 5: Convert to a hexadecimal string

Finally, we can convert the hash value to a hexadecimal string using the hexdigest() method.

hashlib.md5(file_contents).hexdigest()

Final Code:

import hashlib

filename = "example_file.txt"

with open(filename, "rb") as f:
    # Read the contents of the file in binary mode
    file_contents = f.read()

    # Calculate the MD5 hash of the file contents
    md5_hash = hashlib.md5(file_contents).hexdigest()

print(f"The MD5 hash of {filename} is: {md5_hash}")

Output (from running the code with filename set to the path of a file named testing.py):

The MD5 hash of /Users/user1/PycharmProjects/pythonProject/testing.py is: 8491f54e6ce1ceca6b949ba37e393a12

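On Python 3.11 and later, hashlib also provides file_digest(), which reads a binary file object and returns a hash object. The sketch below prefers it when available and falls back to the read-and-hash approach used above on older versions. The function name file_md5 is hypothetical.

```python
import hashlib


def file_md5(filename):
    """Return the MD5 hex digest of a file, preferring hashlib.file_digest()."""
    with open(filename, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib reads the binary file object for us
            return hashlib.file_digest(f, "md5").hexdigest()
        # Older Python versions: read the whole file and hash it directly
        return hashlib.md5(f.read()).hexdigest()


# Hypothetical usage (the filename is a placeholder):
# print(file_md5("example_file.txt"))
```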

Practical Examples:

Calculating the MD5 hash of a text file

Create a text file named example.txt that contains the following sentence:

Codingspell is a platform to share knowledge in which others can able to learn from your mistake and experience. Visit codingspell.com

Once the file is created, we can calculate its MD5 hash using the code below.

NOTE: I used PyCharm as the IDE to run all the code in this article.

import hashlib

filename = "example.txt"

with open(filename, "rb") as f:
    # Read the contents of the file
    file_contents = f.read()

    # Calculate the MD5 hash
    md5_hash = hashlib.md5(file_contents).hexdigest()

print(f"MD5 hash value for the file: {filename} is: {md5_hash}")

Output:

MD5 hash value for the file: /Users/User1/PycharmProjects/pythonProject/example.txt is: e1bda1f0014334423d80f7f4082d3fa6


Calculating the MD5 hash of a binary file

This works the same way as calculating the MD5 hash of a text file. Before proceeding, we need to create a binary file containing some random binary data. Once it is created, we can calculate its MD5 hash by passing the binary file's name to the same code, as shown below. (I used the PyCharm IDE to run the Python code below.)
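If you don't have a binary file handy, you can generate one with random bytes, as in the sketch below. The file name example.bin matches the code that follows, and the 1 KiB size is an arbitrary choice. The contents are random, so the resulting hash will differ on every run and will not match the output shown in this article.

```python
import os

filename = "example.bin"  # hypothetical name, matching the example below

# Write 1 KiB of random bytes to create a sample binary file
with open(filename, "wb") as f:
    f.write(os.urandom(1024))

print(os.path.getsize(filename))  # 1024
```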

For a quick test, you can download sample binary files from this link and update the filename in the code below.

import hashlib

filename = "example.bin"

with open(filename, "rb") as f:
    # Read the contents of the file
    file_contents = f.read()

    # Calculate the MD5 hash
    md5_hash = hashlib.md5(file_contents).hexdigest()

print(f"MD5 hash value for the file: {filename} is: {md5_hash}")

Output:

MD5 hash value for the file: /Users/User1/PycharmProjects/pythonProject/example.bin is: e1bda1f0014334423d80f7f4082d3fa6

Conclusion:

In summary, we’ve learned how to calculate the MD5 hash of a file in Python using the built-in `hashlib` module. We discussed each step in detail: opening a file in binary mode, reading its contents, and calculating the MD5 hash of those contents with the `hashlib.md5()` function.

We also walked through examples that calculate the MD5 hash of a custom text file and a binary file. Calculating the MD5 hash of a file is a useful technique for verifying data integrity, detecting changes in files, and making sure files haven’t been corrupted. For protection against deliberate tampering, a stronger hash such as SHA-256 is preferable. With the `hashlib` module, calculating the MD5 hash of a file in Python is easy and straightforward.

Good luck with your learning!


Jerry Richard R