Programming Computer Vision
Programming Computer Vision with Python
Jan Erik Solem

Copyright 2012 Jan Erik Solem.

This version of the work is a pre-production draft made available under the terms of the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
http://creativecommons.org/licenses/by-nc-nd/3.0/us/

Contents

Preface
    Prerequisites and Overview
    Introduction to Computer Vision
    Python and NumPy
    Notation and Conventions
    Acknowledgments
1 Basic Image Handling and Processing
    1.1 PIL - the Python Imaging Library
    1.2 Matplotlib
    1.3 NumPy
    1.4 SciPy
    1.5 Advanced example: Image de-noising
2 Local Image Descriptors
    2.1 Harris corner detector
    2.2 SIFT - Scale-Invariant Feature Transform
    2.3 Matching Geotagged Images
3 Image to Image Mappings
    3.1 Homographies
    3.2 Warping images
    3.3 Creating Panoramas
4 Camera Models and Augmented Reality
    4.1 The Pin-hole Camera Model
    4.2 Camera Calibration
    4.3 Pose Estimation from Planes and Markers
    4.4 Augmented Reality
5 Multiple View Geometry
    5.1 Epipolar Geometry
    5.2 Computing with Cameras and 3D Structure
    5.3 Multiple View Reconstruction
    5.4 Stereo Images
6 Clustering Images
    6.1 K-means Clustering
    6.2 Hierarchical Clustering
    6.3 Spectral Clustering
7 Searching Images
    7.1 Content-based Image Retrieval
    7.2 Visual Words
    7.3 Indexing Images
    7.4 Searching the Database for Images
    7.5 Ranking Results using Geometry
    7.6 Building Demos and Web Applications
8 Classifying Image Content
    8.1 K-Nearest Neighbors
    8.2 Bayes Classifier
    8.3 Support Vector Machines
    8.4 Optical Character Recognition
9 Image Segmentation
    9.1 Graph Cuts
    9.2 Segmentation using Clustering
    9.3 Variational Methods
10 OpenCV
    10.1 The OpenCV Python Interface
    10.2 OpenCV Basics
    10.3 Processing Video
    10.4 Tracking
    10.5 More Examples
A Installing Packages
    A.1 NumPy and SciPy
    A.2 Matplotlib
    A.3 PIL
    A.4 LibSVM
    A.5 OpenCV
    A.6 VLFeat
    A.7 PyGame
    A.8 PyOpenGL
    A.9 Pydot
    A.10 Python-graph
    A.11 Simplejson
    A.12 PySQLite
    A.13 CherryPy
B Image Datasets
    B.1 Flickr
    B.2 Panoramio
    B.3 Oxford Visual Geometry Group
    B.4 University of Kentucky Recognition Benchmark Images
    B.5 Other
C Image Credits

Preface

Today, images and video are everywhere. Online photo sharing sites and social networks have them in the billions. Search engines will produce images of just about any conceivable query. Practically all phones and computers come with built in cameras. It is not uncommon for people to have many gigabytes of photos and videos on their devices.

Programming a computer and designing algorithms for understanding what is in these images is the field of computer vision. Computer vision powers applications like image search, robot navigation, medical image analysis, photo management and many more.

The idea behind this book is to give an easily accessible entry point to hands-on computer vision with enough understanding of the underlying theory and algorithms to be a foundation for students, researchers and enthusiasts. The Python programming language, the language choice of this book, comes with many freely available powerful modules for handling images, mathematical computing and data mining.

When writing this book I have had the following principles as a guideline. The book should:

- be written in an exploratory style. Encourage readers to follow the examples on their computers as they are reading the text.
- promote and use free and open software with a low learning threshold. Python was the obvious choice.
- be complete and self-contained. Not complete as in covering all of computer vision (this book is far from that!) but rather complete in that all code is presented and explained. The reader should be able to reproduce the examples and build upon them directly.
- be broad rather than detailed, inspiring and motivational rather than theoretical.

In short: act as a source of
inspiration for those interested in programming computer vision applications.

Prerequisites and Overview

What you need to know

- Basic programming experience. You need to know how to use an editor and run scripts, how to structure code as well as basic data types. Familiarity with Python or other scripting style languages like Ruby or Matlab will help.
- Basic mathematics. To make full use of the examples it helps if you know about matrices, vectors, matrix multiplication, the standard mathematical functions and concepts like derivatives and gradients. Some of the more advanced mathematical examples can be easily skipped.

What you will learn

- Hands-on programming with images using Python.
- Computer vision techniques behind a wide variety of real-world applications.
- Many of the fundamental algorithms and how to implement and apply them yourself.

The code examples in this book will show you object recognition, content-based image retrieval, image search, optical character recognition, optical flow, tracking, 3D reconstruction, stereo imaging, augmented reality, pose estimation, panorama creation, image segmentation, de-noising, image grouping and more.

Chapter Overview

Chapter 1: Introduces the basic tools for working with images and the central Python modules used in the book. This chapter also covers many fundamental examples needed for the remaining chapters.

Chapter 2: Explains methods for detecting interest points in images and how to use them to find corresponding points and regions between images.

Chapter 3: Describes basic transformations between images and methods for computing them. Examples range from image warping to creating panoramas.

Chapter 4: Introduces how to model cameras, generate image projections from 3D space to image features and estimate the camera viewpoint.

Chapter 5: Explains how to work with several images of the same scene, the fundamentals of multiple-view geometry and how to compute 3D reconstructions from images.

Chapter 6: Introduces a number of clustering methods and shows how to use
them for grouping and organizing images based on similarity or content.

Chapter 7: Shows how to build efficient image retrieval techniques that can store image representations and search for images based on their visual content.

Chapter 8: Describes algorithms for classifying image content and how to use them for recognizing objects in images.

Chapter 9: Introduces different techniques for dividing an image into meaningful regions using clustering, user interactions or image models.

Chapter 10: Shows how to use the Python interface for the commonly used OpenCV computer vision library and how to work with video and camera input.

Introduction to Computer Vision

Computer vision is the automated extraction of information from images. Information can mean anything from 3D models, camera position, object detection and recognition to grouping and searching image content. In this book we take a wide definition of computer vision and include things like image warping, de-noising and augmented reality.*

* These examples produce new images and are more image processing than actually extracting information from images.

Sometimes computer vision tries to mimic human vision, sometimes it uses a data and statistical approach, and sometimes geometry is the key to solving problems. We will try to cover all of these angles in this book.

Practical computer vision contains a mix of programming, modeling, and mathematics and is sometimes difficult to grasp. I have deliberately tried to present the material with a minimum of theory in the spirit of "as simple as possible but no simpler". The mathematical parts of the presentation are there to help readers understand the algorithms. Some chapters are by nature very math heavy (chapters 4 and 5 mainly). Readers can skip the math if they like and still use the example code.

Python and NumPy

Python is the programming language used in the code examples throughout this book. Python is a clear and concise language with good support for input/output, numerics, images and plotting. The language
has some peculiarities such as indentation and compact syntax that takes getting used to. The code examples assume you have Python 2.6 or later as most packages are only available for these versions. The upcoming Python 3.x version has many language differences and is not backward compatible with Python 2.x or compatible with the ecosystem of packages we need (yet).

Some familiarity with basic Python will make the material more accessible for readers. For beginners to Python, Mark Lutz' book [20] and the online documentation at http://www.python.org/ are good starting points.

When programming computer vision we need representations of vectors and matrices and operations on them. This is handled by Python's NumPy module where both vectors and matrices are represented by the array type. This is also the representation we will use for images. A good NumPy reference is Travis Oliphant's free book [24]. The documentation at http://numpy.scipy.org/ is also a good starting point if you are new to NumPy. For visualizing results we will use the Matplotlib module and for more advanced mathematics, we will use SciPy. These are the central packages you will need and will be explained and introduced in Chapter 1.

Besides these central packages there will be many other free Python packages used for specific purposes like reading JSON or XML, loading and saving data, generating graphs, graphics programming, web demos, classifiers and many more. These are usually only needed for specific applications or demos and can be skipped if you are not interested in that particular application.

It is worth mentioning IPython, an interactive Python shell that makes debugging and experimentation easier. Documentation and download available at http://ipython.org/.

Notation and Conventions

Code is given in a special boxed environment with color highlighting (in the electronic version) and looks like this:

    # some points
    x = [100,100,400,400]
    y = [200,500,200,500]

    # plot the points
    plot(x,y)

Text is typeset according to these conventions:

- Italic is used
  for definitions, filenames and variable names.
- Typewriter is used for functions and Python modules.
- Small constant width is used for console printout and results from calls and APIs.
- Hyperlink is used for URLs (clickable in the electronic version).
- Plain text is used for everything else.

Mathematical formulas are given inline like this $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$ or centered independently

$$f(\mathbf{x}) = \sum_i w_i x_i + b ,$$

and are only numbered when a reference is needed.

In the mathematical sections we will use lowercase ($s$, $r$, ...) for scalars, uppercase ($A$, $V$, $H$, ...) for matrices (including $I$ for the image as an array) and lowercase bold ($\mathbf{t}$, $\mathbf{c}$, ...) for vectors. We will use $\mathbf{x} = [x, y]$ and $\mathbf{X} = [X, Y, Z]$ to mean points in 2D (images) and 3D respectively.

Acknowledgments

I'd like to express my gratitude to everyone involved in the development and production of this book. The whole O'Reilly team has been helpful. Special thanks to Andy Oram (O'Reilly) for editing, and Paul Anagnostopoulos (Windfall) for efficient production work.

Many people commented on the various drafts of this book as I shared them online. Klas Josephson and Håkan Ardö deserve lots of praise for thorough comments and feedback. Fredrik Kahl and Pau Gargallo helped with fact checks. Thank you all readers for encouraging words and for making the text and code examples better. Receiving emails from strangers sharing their thoughts on the drafts was a great motivator.

Finally, I'd like to thank my friends and family for support and understanding when I spent nights and weekends on writing. Most thanks of all to my wife Sara, my long time supporter.

Chapter 1
Basic Image Handling and Processing

This chapter is an introduction to handling and processing images. With extensive examples, it explains the central Python packages you will need for working with images. This chapter introduces the basic tools for reading images, converting and scaling images, computing derivatives, plotting or saving results, and so on. We will use these throughout the remainder of the book.

1.1 PIL - the Python Imaging Library

The Python Imaging
Library (PIL) provides general image handling and lots of useful basic image operations like resizing, cropping, rotating, color conversion and much more. PIL is free and available from http:/

With PIL you can read images from most formats and write to the most common ones. The most important module is the Image module. To read an image use

    from PIL import Image

    pil_im = Image.open('empire.jpg')

The return value, pil_im, is a PIL image object.

Color conversions are done using the convert() method. To read an image and convert it to grayscale, just add convert('L') like this:

    pil_im = Image.open('empire.jpg').convert('L')

Here are some examples taken from the PIL documentation, available at http:/

Output from the examples is shown in Figure 1.1.

Figure 1.1: Examples of processing images with PIL.

Convert images to another format

Using the save() method, PIL can save images in most image file formats. Here's an example that takes all image files in a list of filenames (filelist) and converts the images to JPEG files.

    from PIL import Image
    import os

    # filelist is a list of image filenames to convert
    for infile in filelist:
        # same name as the original, with the extension replaced by .jpg
        outfile = os.path.splitext(infile)[0] + ".jpg"
        if infile != outfile:
            try:
                Image.open(infile).save(outfile)
            except IOError:
                print("cannot convert " + infile)

The PIL function open() creates a PIL image object and the save() method saves the image to a file with the given filename. The new filename will be the same as the original with the file ending ".jpg" instead. PIL is smart enough to determine the image format from the file extension. There is a simple check that the file is not already a JPEG file and a message is printed to the console if the conversion fails.

Throughout this book we are going to need lists of images to process. Here's how you could create a list of filenames of all images in a folder. Create a file imtools.py to store some of these generally useful routines and add the following function.

    import os

    def get_imlist(path):
        """ Returns a list of filenames for
            all jpg images in a directory. """

        return [os.path.join(path,f) for f in os.listdir(path) if f.endswith('.jpg')]

Now, back to PIL.

Create thumbnails

Using PIL to create thumbnails is very simple. The thumbnail() method takes a tuple specifying the new size and converts the image to a thumbnail image with size that fits within the tuple. To create a thumbnail with longest side 128 pixels, use the method like this:

    pil_im.thumbnail((128,128))

Copy and paste regions

Cropping a region from an image is done using the crop() method.

    box = (100,100,400,400)
    region = pil_im.crop(box)

The region is defined by a 4-tuple, where coordinates are (left, upper, right, lower). PIL uses a coordinate system with (0, 0) in the upper left corner. The extracted region can for example be rotated and then put back using the paste() method like this:

    region = region.transpose(Image.ROTATE_180)
    pil_im.paste(region,box)

Resize and rotate

To resize an image, call resize() with a tuple giving the new size.

    out = pil_im.resize((128,128))

To rotate an image, use counter clockwise angles and rotate() like this:

    out = pil_im.rotate(45)

Some examples are shown in Figure 1.1. The leftmost image is the original, follow