Fixing `PIL.UnidentifiedImageError` When Training a Deep Learning Model for Image Classification
Have you ever had one of those moments where you feel like you’re on the cusp of something great, but then it all comes crashing down? That’s what happened to me while working on a deep learning project to classify images using a DenseNet transfer learning model. I was pumped to be making real progress and had just managed to finish the first epoch of training, a process that had taken over an hour, when out of nowhere the error below popped up.
PIL.UnidentifiedImageError? What does that even mean? How on earth did an unidentified image get into one of the folders?
That was where my battle began. LOL! It happened four times in total as I tried different approaches and checked different solutions online, leaving me frustrated and defeated. One option would have been to look at each image and find the corrupt one, but with over 50,000 images? I ain’t going to do that. But, as they say, when one door closes, another one opens, which meant I had to get Python to do it for me by finding which of the images was keeping me from my goal.
New challenge = New knowledge = New experience
Through my struggles, I found a solution: identifying and removing corrupted images from my dataset with a very small piece of code. In this blog post, I’ll take you on the same journey of discovery, where we’ll explore how to do this and get past this bully of a PIL error. Whether you’re a data science enthusiast stuck on this same issue or just passing by, this post is for you!
Fixing the Issue
To start with, we import the library and modules we will use.
import PIL
from PIL import Image
import os
- PIL is the Python Imaging Library (nowadays installed through its maintained fork, Pillow), which is used for opening, manipulating and saving many different image file formats.
- Image is a module within the PIL library that provides a class with the same name, which represents an image object. It provides many methods for working with images, such as resizing, cropping, rotating, and applying filters.
- os is a built-in Python module that provides a way of using operating system dependent functionality like reading or writing to the file system.
So, in summary, PIL and Image are related to image processing, while os is related to interacting with the operating system.
Secondly, we create a list called `folder_paths`, which contains the paths of the folders whose images we want to check for corruption. Since I was classifying images into 5 different classes, I had each image class in its own folder. Here I have used folder1 to folder5 as placeholders, so you will have to replace them with your own folder paths. Add entries if you have more folders, or remove some if you have fewer.
folder_paths = [
    '/data/folder1',
    '/data/folder2',
    '/data/folder3',
    '/data/folder4',
    '/data/folder5'
]
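If every class folder sits directly under one parent folder, the list does not have to be typed by hand. Here is a minimal sketch, assuming that layout; the helper name `list_class_folders` and the `/data` root are my own placeholders, not part of the original code.

```python
import os


def list_class_folders(root_dir):
    """Return the sorted paths of the immediate subfolders of root_dir."""
    with os.scandir(root_dir) as entries:
        # Keep only directories; loose files next to the class folders are ignored
        return sorted(entry.path for entry in entries if entry.is_dir())


# Hypothetical usage, assuming the class folders live directly under /data:
# folder_paths = list_class_folders('/data')
```

The result can be used in place of the hand-written `folder_paths` list.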
Now this is where the magic happens!!! Let me explain what we will do first.
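- We will use a for loop to iterate through each folder path in folder_paths. Within the first for loop, another for loop will iterate through each file in the current folder using the os.listdir() function.
- For each file, we will attempt to open the image using the Image.open() function provided by the PIL library. If PIL cannot recognise the file as an image, for example because it is not an image at all or its header is corrupted, an exception of type PIL.UnidentifiedImageError will be raised. Note that Image.open() is lazy: it identifies the file from its header and does not decode the pixel data, so a file that is damaged further in can still open without error.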
- If the image can’t be opened due to corruption, the exception is caught, and the error message is printed along with the file name using print() function.
- We will then use the os.remove() function to help us delete the corrupted image file from the folder with a message printed to confirm that the corrupted image has been removed successfully.
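To see that exception on its own before looping over any folders, here is a minimal sketch that feeds raw bytes to Image.open(). The helper name `can_identify` is hypothetical and only there for illustration.

```python
import io

from PIL import Image, UnidentifiedImageError


def can_identify(data):
    """Return True if PIL recognises the given bytes as an image, else False."""
    try:
        # Image.open() accepts a file-like object as well as a path
        with Image.open(io.BytesIO(data)):
            return True
    except UnidentifiedImageError:
        return False


# Build a tiny valid PNG in memory to compare against junk bytes
buffer = io.BytesIO()
Image.new("RGB", (2, 2)).save(buffer, format="PNG")
valid_png_bytes = buffer.getvalue()

print(can_identify(valid_png_bytes))
print(can_identify(b"this is not an image"))
```

The first call should report that the PNG is recognised, while the junk bytes trigger the same exception that interrupted my training run.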
So, we will have the following and that solves our problem.
# Identify and delete the corrupted images in each folder
for folder_path in folder_paths:
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        try:
            # Image.open() only reads the header; the with block closes the file again
            with Image.open(file_path):
                pass
        except PIL.UnidentifiedImageError as e:
            print(f"Error in file {filename}: {e}")
            os.remove(file_path)
            print(f"Removed file {filename}")
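If you would like a slightly stricter check that only reports problems, one option is to also call Pillow's `verify()` method on each opened image. This is a sketch of my own rather than what I ran on my dataset: the function name `find_unreadable_images` is hypothetical, it deletes nothing, and `verify()` does not decode pixel data, so it is still not a guarantee that every file will load during training.

```python
import os

from PIL import Image


def find_unreadable_images(folder_paths):
    """Return (path, error) pairs for files PIL cannot open or verify."""
    bad_files = []
    for folder_path in folder_paths:
        for filename in sorted(os.listdir(folder_path)):
            file_path = os.path.join(folder_path, filename)
            if not os.path.isfile(file_path):
                continue  # skip subfolders
            try:
                with Image.open(file_path) as image:
                    image.verify()  # integrity check without decoding pixels
            except (OSError, SyntaxError) as e:
                # UnidentifiedImageError is a subclass of OSError, so it lands here too
                bad_files.append((file_path, e))
    return bad_files


# Hypothetical usage with the folder_paths list from earlier:
# for path, error in find_unreadable_images(folder_paths):
#     print(f"{path}: {error}")
```

Returning a list instead of deleting on the spot lets you look at what was flagged before deciding what to do with it.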
In conclusion, identifying and removing corrupted images is an important step in image classification. With the method above, you can rid your dataset of images that PIL cannot identify, which saves you from a frustrating loop of error messages after hours of training. One thing to point out, though: you may not want to delete files automatically. In that case, remove the last two lines of the code above so that it only reports each corrupted image, and you can then delete it manually.
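A middle ground between deleting and only printing is to move the suspect files somewhere else so nothing is lost. Here is a minimal sketch, assuming a separate quarantine folder of your choosing; the function name `quarantine_unidentified_images` and the `quarantine_dir` parameter are hypothetical names I made up for this example.

```python
import os
import shutil

from PIL import Image, UnidentifiedImageError


def quarantine_unidentified_images(folder_paths, quarantine_dir):
    """Move files PIL cannot identify into quarantine_dir and return their new paths."""
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for folder_path in folder_paths:
        # Name of the class folder, used to keep moved filenames unique
        class_name = os.path.basename(os.path.normpath(folder_path))
        for filename in sorted(os.listdir(folder_path)):
            file_path = os.path.join(folder_path, filename)
            if not os.path.isfile(file_path):
                continue  # skip subfolders
            try:
                with Image.open(file_path):
                    pass
            except UnidentifiedImageError:
                target = os.path.join(quarantine_dir, f"{class_name}_{filename}")
                shutil.move(file_path, target)
                moved.append(target)
    return moved


# Hypothetical usage:
# moved = quarantine_unidentified_images(folder_paths, '/data/quarantine')
```

Prefixing each moved file with its class folder name makes it easy to put a file back if it turns out to be fine.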
With any luck, this little tidbit of knowledge will make someone’s day a bit brighter. If you have a better way of doing this, please share it in the comment box. I would be glad to know!
Feel free to connect with me via GitHub | LinkedIn | Twitter | YouTube