This is a project that I started a very long time ago (in 2006). I was inspired by a puzzle my friend had that was a picture of a Simpsons scene but made up of smaller pictures from The Simpsons. I asked myself, how could one go about doing this? I thought to myself, it might be possible to take an average of each of the RGB values from a picture, then use that average as if it were a single pixel inside a larger picture. I worked on it for a couple of nights and at the end I had something that worked.
I've re-written this project a couple of times since my original attempt. For several reasons. First among them was that this was written while I was teaching at Churchill high school, about two years before I started working towards my Computer Science degree at UTSA. I didn't use very much modularity (basically my methods were huge subroutines), I didn't use any outside libraries (just the Java2 API), and there was no optimizations of any kind. I updated it a few years later to incorporate an external package to handle resizing the images (java-image-scaling-0.8.5). I also attempted to add a threaded approach to processing the picture using the Callable Interface.
I finally re-wrote again this past year (2018), mainly to fix the parallelization issues. Also, being curious if there were already services to do this (there are), I was curious to see what their results were. One of the major improvements in this third attempt was something I noticed from an example I found from a website that does this professionally. I noticed that the individual pictures (inside the larger one), were covering more than just a single pixel. So I updated my database algorithm, to, instead of shrinking a picture into a single pixel, to store it as a 2D array of pixels (like a 3X3 or 2X2).
There are 5 main classes in this project (there are a few others that are more of helper classes, like handling the Callable Interface, handling file traversal, and a name generating class to ensure each resized picture in my database has a unique name):
find(...)
and addPicture(...)
. I chose to make this an
interface, rather than a concrete class, so that I would have
leeway in how it was implemented (i.e. this could be backed by an
actual database or XML or just plain text). In testing, I just
created a "DummyDB" that used a large text file (not the most
efficient but it was quick and easy and this interface allows me
to not be tied down to that, if I want to update it in the
future).PictureDB
.
It has a single method processFiles(final File
rootDir, final File dbDir)
. rootDir
is the root
folder from where this method will start looking for picture files
(and does a full search of all sub-folders). dbDir
is
the directory where all of the resized thumbnails will go (the
pictures that will actually be used for the final image).Comparable Interface
so that it can easily be
used in a sorted list. Thumbnails are sorted based on the smallest
Euclidean distance (squared), when comparing the RGB values, for
each region of the thumbnail subsampled.PictureDB
does most of the heavy lifting for this class since it is
responsible for finding the suitable thumbnail. It actually
populates a Java TreeSet
, I then randomly choose from
the top of this list so that the same picture is not used in
similarly colored areas. This still needs to be parallelized.This part is somewhat unfinished as I only implemented parallelization in creating the database. The creation of the database is very straightforward to parallelize since the processing of each picture is completely independent. Likewise generating the pixelated picture should be straightforward as well since the task is simply finding the best picture for each subsample--thus is independent.
The idea is simple. Each picture can be analyzed separately so multiple threads should be analyzing different pictures at all times. This presents some challenges though:
The first two are fairly straightforward to handle. I use an Executor to handle the threads in a fixed thread pool based on the Runtime method availableProcessors().
PictureFileTraversalParallel.java: I used the "poison pill" approach to signal when to stop the processing threads. Essentially each processing thread pops from a queue of files (to be processed). The thread populating the queue, once it has fully traversed the directory, will put a "poison pill" into its queue for each thread. Each processing thread perpetually tries to pop from the queue until it receives the "poison pill" which signals to the processing thread that it can quit and return its result.
The file traversal class uses a LinkedBlockingQueue.
This ensures that, in event the file queue becomes empty, the next
processing thread will wait until the next result comes in. The
method that returns the next file is
synchronized
so that only one processing thread can access the queue at a given
time.
Here are the files responsible for parallization of the database process:
synchronized
way of making sure no duplicates are
generated. This is a bit of a tradeoff. I could have either
created a class that holds the source file name and the thumbnail
path and then have the PictureFileTraversalParallel
object generate unique names (sequentially), and finally put those
objects (that hold the source and thumbnail path) into its queue.
Or I could create this NameGen
class which specifically handles name generation and push that
duty closer to the processing thread (my chosen approach). This
way seemed like a cleaner way of doing that. This was unnecessary
in the serial case because the single processing thread
could simply increment its counter to generate the next file name.
The results were very good for parallelization. In fact they are so good, I realize now that I never actually allowed the serial process to go through my entire set of pictures. As I'm writing this, I'm still waiting on the serial process to finish to give a time comparison:
Here is a screenshot comparison of the resource monitor in Windows 10 as each started. I'll leave it up to the reader to decide which one is the parallel vs. serial version.