Python Program to Find Hash of Given File

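Hashing a file with an algorithm such as MD5, SHA-1, or SHA-256 produces a fixed-length digest that changes whenever the file's contents change. Comparing that digest against a known-good value is a common way to check data integrity and, when the reference value comes from a trusted source, to confirm that the file is the one its publisher intended. Hashing is widely used in cryptography, data verification, and digital forensics.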

In this article, we will explore how to compute the hash of a file using Python. We will mainly focus on using the MD5, SHA-1, and SHA-256 algorithms.

Prerequisites

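To compute the hash of a file using Python, we need the `hashlib` module, which provides the hashing algorithms used in this article. It is part of the Python standard library, so there is nothing extra to install.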
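
As a quick sanity check before going further, the short sketch below asks `hashlib` which algorithms it offers. `algorithms_guaranteed` is the set of names that `hashlib` supports on every platform; the variable name `guaranteed` is only for illustration.

```python
import hashlib

# Names that hashlib guarantees to support on every platform
guaranteed = sorted(hashlib.algorithms_guaranteed)
print(guaranteed)

# Check the three algorithms discussed in this article
for name in ("md5", "sha1", "sha256"):
    print(name, name in hashlib.algorithms_guaranteed)
```

`hashlib.algorithms_available` may list additional names, depending on how the interpreter was built.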

Methods to Find the Hash of a File

MD5 (Message Digest Algorithm 5):
MD5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, usually represented as a 32-digit hexadecimal number. Despite its popularity, MD5 is considered cryptographically broken because practical collision attacks exist, so it should not be used where security matters. It is still commonly seen as a checksum for detecting accidental corruption.

SHA-1 (Secure Hash Algorithm 1):
SHA-1 produces a 160-bit (20-byte) hash value, typically rendered as a 40-digit hexadecimal number. Like MD5, SHA-1 is considered broken for cryptographic security, since practical collision attacks against it have been demonstrated.

SHA-256 (Secure Hash Algorithm 256-bit):
SHA-256 is a member of the SHA-2 family of cryptographic hash functions and generates a hash value of 256 bits (32 bytes), represented as a 64-digit hexadecimal number. It is currently widely accepted and used for cryptographic purposes.
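
The digest sizes quoted above can be checked directly. The following is a minimal sketch that hashes an arbitrary sample byte string (chosen only for illustration) and reports each algorithm's size in bits and in hexadecimal digits; the `sizes` dictionary is a hypothetical helper for collecting the results.

```python
import hashlib

data = b"hello world"  # arbitrary sample input, for illustration only

sizes = {}
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, data)
    # digest_size is in bytes; the hex string is twice as long
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))
    print(f"{name}: {sizes[name][0]} bits, {sizes[name][1]} hex digits")

# md5: 128 bits, 32 hex digits
# sha1: 160 bits, 40 hex digits
# sha256: 256 bits, 64 hex digits
```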

A Practical Example

Let’s write a Python program that computes the hash of a file using the above algorithms:


import hashlib

def compute_hash(file_path, algorithm='sha256'):
    '''Compute and return the hash of a file using the specified algorithm.'''

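    # Create a hash object; reject unknown names instead of silently using SHA-256
    if algorithm == 'md5':
        hasher = hashlib.md5()
    elif algorithm == 'sha1':
        hasher = hashlib.sha1()
    elif algorithm == 'sha256':
        hasher = hashlib.sha256()
    else:
        raise ValueError(f"Unsupported algorithm: {algorithm}")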

    # Open the file in binary read mode
    with open(file_path, 'rb') as file:
        # Read and update hash in chunks to save memory
        for chunk in iter(lambda: file.read(4096), b""):
            hasher.update(chunk)

    # Return the hexadecimal representation of the hash
    return hasher.hexdigest()

# Example usage
file_path = "path_to_file.txt"
print(f"MD5: {compute_hash(file_path, 'md5')}")
print(f"SHA-1: {compute_hash(file_path, 'sha1')}")
print(f"SHA-256: {compute_hash(file_path)}")  # Default is sha256

Explanation

We define a function `compute_hash()` that computes the hash of a file using the requested algorithm, defaulting to SHA-256.
Inside the function, we create the hash object for the selected algorithm using `hashlib`.
We open the file in binary mode and read it in chunks of 4096 bytes. This matters for large files, because reading them all at once could consume a lot of memory.
For every chunk read, we feed the data to the hash object using the `update()` method.
Once the entire file has been read, we return the hexadecimal representation of the hash using `hexdigest()`.

In our example usage, we provide the file path and then compute the hash using the different algorithms.
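
On Python 3.11 and later, `hashlib.file_digest()` performs the chunked reading internally. The sketch below uses it when it is available and otherwise falls back to the same manual loop as above. The function name `hash_file` is hypothetical, and algorithm names are passed straight to `hashlib`, so an unknown name raises `ValueError`.

```python
import hashlib

def hash_file(file_path, algorithm="sha256"):
    """Return the hex digest of a file (hypothetical helper, for illustration)."""
    with open(file_path, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # Python 3.11+: hashlib reads the file in chunks internally
            return hashlib.file_digest(f, algorithm).hexdigest()
        # Older versions: fall back to a manual chunked loop
        hasher = hashlib.new(algorithm)
        for chunk in iter(lambda: f.read(4096), b""):
            hasher.update(chunk)
        return hasher.hexdigest()
```

Both branches should produce the same digest for the same file, so the choice only affects how much code has to be maintained.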

Conclusion

Hashing files is an important way to verify the integrity of data, especially when transmitting it over a network or storing it for archival purposes. Python’s `hashlib` module makes the process straightforward and efficient. However, it’s essential to choose a hashing algorithm that is secure and appropriate for the use case. MD5 and SHA-1 should be avoided wherever cryptographic security matters; prefer SHA-256, or another algorithm from the SHA-2 or SHA-3 family.
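
To tie this back to integrity checking, here is a minimal sketch of comparing a file's digest against a published value. The function name `verify_file` and its parameters are hypothetical; the expected digest is assumed to be a hexadecimal string obtained from a source you trust.

```python
import hashlib
import hmac

def verify_file(file_path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest matches expected_hex (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with open(file_path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: f.read(4096), b""):
            hasher.update(chunk)
    # Normalize case and surrounding whitespace, then compare in constant time
    return hmac.compare_digest(hasher.hexdigest(), expected_hex.strip().lower())
```

A call such as `verify_file("path_to_file.txt", published_digest)` returns `True` only when the computed and published digests match exactly.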
