Face Recognition Databases

by jackauk Mon Aug 17, 2015 10:02 am

Source: Face_rec.org

When benchmarking an algorithm, it is advisable to use a standard test data set so that researchers can compare results directly. Many databases are in use today, and the choice of an appropriate one should be based on the task at hand (aging, expressions, lighting, etc.). Another approach is to choose a data set specific to the property being tested (e.g. how the algorithm behaves when given images with lighting changes or images with different facial expressions). If, on the other hand, an algorithm needs to be trained with more images per class (like LDA), the Yale Face Database is probably more appropriate than FERET.

Further reading:

R. Gross, Face Databases, Handbook of Face Recognition, Stan Z. Li and Anil K. Jain, ed., Springer-Verlag, February 2005, 22 pages

Testing protocols:

Face Image ISO Compliance Verification Benchmark Area - FVC-onGoing is a web-based automated evaluation system developed to evaluate biometric algorithms. Algorithms submitted to the Face Compliance Verification to ISO standard (FICV) benchmark area are required to check the compliance of face images with the ISO/IEC 19794-5 standard. To the best of our knowledge, this is the first available benchmark that directly assesses the accuracy of algorithms for automatically verifying the compliance of face images with the ISO standard, in an attempt to semi-automate the issuing process.

P. Jonathon Phillips, A. Martin, C. L. Wilson, M. Przybocki, An Introduction to Evaluating Biometric Systems, IEEE Computer, Vol. 33, No. 2, February 2000, pp. 56-63

A. J. Mansfield, J. L. Wayman, Best Practices in Testing and Reporting Performance of Biometric Devices, NPL Report CMSC 14/02, August 2002

K. Delac, M. Grgic, S. Grgic, Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set, International Journal of Imaging Systems and Technology, Vol. 15, Issue 5, pp. 252-260

Here are some face recognition databases commonly used by researchers:

The Color FERET Database, USA

The FERET program set out to establish a large database of facial images that was gathered independently from the algorithm developers. Dr. Harry Wechsler at George Mason University was selected to direct the collection of this database. The database collection was a collaborative effort between Dr. Wechsler and Dr. Phillips. The images were collected in a semi-controlled environment. To maintain a degree of consistency throughout the database, the same physical setup was used in each photography session. Because the equipment had to be reassembled for each session, there was some minor variation in images collected on different dates. The FERET database was collected in 15 sessions between August 1993 and July 1996. The database contains 1564 sets of images for a total of 14,126 images that includes 1199 individuals and 365 duplicate sets of images. A duplicate set is a second set of images of a person already in the database and was usually taken on a different day. For some individuals, over two years had elapsed between their first and last sittings, with some subjects being photographed multiple times. This time lapse was important because it enabled researchers to study, for the first time, changes in a subject's appearance that occur over a year.

SCface - Surveillance Cameras Face Database

SCface is a database of static images of human faces. Images were taken in an uncontrolled indoor environment using five video surveillance cameras of various qualities. The database contains 4160 static images (in visible and infrared spectrum) of 130 subjects. Images from cameras of different quality mimic real-world conditions and enable robust testing of face recognition algorithms, emphasizing different law enforcement and surveillance use case scenarios. The SCface database is freely available to the research community. The paper describing the database is available here.

SCfaceDB Landmarks

The database is comprised of 21 facial landmarks (from 4160 face images) from 130 users annotated manually by a human operator, as described in this paper.

Multi-PIE

A close relationship exists between the advancement of face recognition algorithms and the availability of face databases that vary the factors affecting facial appearance in a controlled manner. The PIE database, collected at Carnegie Mellon University in 2000, has been very influential in advancing research in face recognition across pose and illumination. Despite its success, the PIE database has several shortcomings: a limited number of subjects, a single recording session and only a few expressions captured. To address these issues, researchers at Carnegie Mellon University collected the Multi-PIE database. It contains 337 subjects, captured under 15 view points and 19 illumination conditions in four recording sessions for a total of more than 750,000 images. The paper describing the database is available here.

The Yale Face Database

Contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.

The Yale Face Database B

Contains 5760 single light source images of 10 subjects each seen under 576 viewing conditions (9 poses x 64 illumination conditions). For every subject in a particular pose, an image with ambient (background) illumination was also captured.
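The image count follows directly from the acquisition grid: 10 subjects x 9 poses x 64 illumination conditions = 5760. As a minimal sketch, the Python snippet below enumerates that grid. The zero-based indices and the name `yale_b_grid` are illustrative assumptions, not the database's actual file naming.

```python
from itertools import product

# Counts taken from the description above.
N_SUBJECTS = 10
N_POSES = 9
N_ILLUMINATIONS = 64

def yale_b_grid():
    """Return every (subject, pose, illumination) index triple.

    The zero-based indices are hypothetical; the real database uses
    its own file naming scheme.
    """
    return list(product(range(N_SUBJECTS), range(N_POSES), range(N_ILLUMINATIONS)))

grid = yale_b_grid()
print(len(grid))                # total single-light-source images
print(len(grid) // N_SUBJECTS)  # viewing conditions per subject
```

The ambient-illumination images, one per subject and pose, would come on top of this count (10 x 9 = 90) if they are stored separately.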

PIE Database, CMU

A database of 41,368 images of 68 people, each person under 13 different poses, 43 different illumination conditions, and with 4 different expressions.

Project - Face In Action (FIA) Face Video Database, AMP, CMU

The capturing scenario mimics real-world applications, for example, when a person is going through the airport check-in point. Six cameras capture human faces from three different angles. Three of the six cameras have a smaller focal length, and the other three have a larger focal length. The plan is to capture 200 subjects in 3 sessions in different time periods. For one session, both indoor and outdoor scenarios will be captured. User-dependent pose and expression variation are expected from the video sequences.

AT&T "The Database of Faces" (formerly "The ORL Database of Faces")

Ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Cohn-Kanade AU Coded Facial Expression Database

Subjects in the released portion of the Cohn-Kanade AU-Coded Facial Expression Database are 100 university students. They ranged in age from 18 to 30 years. Sixty-five percent were female, 15 percent were African-American, and three percent were Asian or Latino. Subjects were instructed by an experimenter to perform a series of 23 facial displays that included single action units and combinations of action units. Image sequences from neutral to target display were digitized into 640 by 480 or 490 pixel arrays with 8-bit precision for grayscale values. Included with the image files are "sequence" files; these are short text files that describe the order in which images should be read.

MIT-CBCL Face Recognition Database

The MIT-CBCL face recognition database contains face images of 10 subjects. It provides two training sets: 1. High resolution pictures, including frontal, half-profile and profile view; 2. Synthetic images (324/subject) rendered from 3D head models of the 10 subjects. The head models were generated by fitting a morphable model to the high-resolution training images. The 3D models are not included in the database. The test set consists of 200 images per subject. We varied the illumination, pose (up to about 30 degrees of rotation in depth) and the background.

Image Database of Facial Actions and Expressions - Expression Image Database

24 subjects are represented in this database, yielding between about 6 to 18 examples of the 150 different requested actions. Thus, about 7,000 color images are included in the database, and each has a matching gray scale image used in the neural network analysis.

Face Recognition Data, University of Essex, UK

395 individuals (male and female), 20 images per individual. Contains images of people of various racial origins, mainly of first year undergraduate students, so the majority of individuals are between 18-20 years old but some older individuals are also present. Some individuals are wearing glasses and beards.

NIST Mugshot Identification Database

There are images of 1573 individuals (cases), 1495 male and 78 female. The database contains both front and side (profile) views when available. Separating front views and profiles, there are 131 cases with two or more front views and 1418 with only one front view. Profiles have 89 cases with two or more profiles and 1268 with only one profile. Cases with both fronts and profiles have 89 cases with two or more of both fronts and profiles, 27 with two or more fronts and one profile, and 1217 with only one front and one profile.

NLPR Face Database

450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people with different lighting/expressions/backgrounds.

M2VTS Multimodal Face Database (Release 1.00)

Database is made up from 37 different faces and provides 5 shots for each person. These shots were taken at one week intervals or when drastic face changes occurred in the meantime. During each shot, people have been asked to count from '0' to '9' in their native language (most of the people are French speaking), rotate the head from 0 to -90 degrees, again to 0, then to +90 and back to 0 degrees. Also, they have been asked to rotate the head once again without glasses if they wear any.

The Extended M2VTS Database, University of Surrey, UK

Contains four recordings of 295 subjects taken over a period of four months. Each recording contains a speaking head shot and a rotating head shot. Sets of data taken from this database are available including high quality colour images, 32 kHz 16-bit sound files, video sequences and a 3D model.

The AR Face Database, The Ohio State University, USA

4,000 color images corresponding to 126 people's faces (70 men and 56 women). Images feature frontal view faces with different facial expressions, illumination conditions, and occlusions (sun glasses and scarf).

The University of Oulu Physics-Based Face Database

Contains 125 different faces, each in 16 different camera calibration and illumination conditions, with an additional 16 if the person has glasses. Faces in frontal position were captured under Horizon, Incandescent, Fluorescent and Daylight illuminants. Includes 3 spectral reflectances of skin per person, measured from both cheeks and the forehead. Contains the RGB spectral response of the camera used and the spectral power distribution of the illuminants.

CAS-PEAL Face Database

The CAS-PEAL face database has been constructed under the sponsorship of the National Hi-Tech Program and ISVISION. The goals of creating the PEAL face database include: providing worldwide researchers in the FR community with a large-scale Chinese face database for training and evaluating their algorithms; facilitating the development of FR by providing large-scale face images with different sources of variations, especially Pose, Expression, Accessories, and Lighting (PEAL); advancing the state-of-the-art face recognition technologies aiming at practical applications especially for the oriental.

Japanese Female Facial Expression (JAFFE) Database

The database contains 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects.

BioID Face DB - HumanScan AG, Switzerland

The dataset consists of 1521 gray level images with a resolution of 384x286 pixels. Each one shows the frontal view of a face of one out of 23 different test persons. For comparison reasons the set also contains manually set eye positions.

Psychological Image Collection at Stirling (PICS)

This is a collection of images useful for research in Psychology, such as sets of faces and objects. The images in the database are organised into SETS, with each set often representing a separate experimental study.

The Sheffield Face Database (previously: The UMIST Face Database)

Consists of 564 images of 20 people, each covering a range of poses from profile to frontal views. Subjects cover a range of race/sex/appearance. Each subject exists in their own directory labelled 1a, 1b, ... 1t and images are numbered consecutively as they were taken. The files are all in PGM format, approximately 220 x 220 pixels in 256 shades of grey.
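The directory labels follow a simple pattern (the digit 1 followed by the letters a to t), so they can be generated rather than typed out. A minimal Python sketch; the function name is a hypothetical helper, and the labels should be checked against the actual download.

```python
import string

def sheffield_subject_dirs(n_subjects=20):
    """Return the directory labels 1a, 1b, ..., 1t described above."""
    return ["1" + letter for letter in string.ascii_lowercase[:n_subjects]]

dirs = sheffield_subject_dirs()
print(dirs[0], dirs[-1], len(dirs))  # first label, last label, subject count
```

With 564 images spread over 20 directories, each subject has about 28 images on average.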

Face Video Database of the Max Planck Institute for Biological Cybernetics

This database contains short video sequences of facial Action Units recorded simultaneously from six different viewpoints, recorded in 2003 at the Max Planck Institute for Biological Cybernetics. The video cameras were arranged at 18 degrees intervals in a semi-circle around the subject at a distance of roughly 1.3m. The cameras recorded 25 frames/sec at 786x576 video resolution, non-interlaced. In order to facilitate the recovery of rigid head motion, the subject wore a headplate with 6 green markers. The website contains a total of 246 video sequences in MPEG1 format.

Caltech Faces

450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people with different lighting/expressions/backgrounds.

EQUINOX HID Face Database

Human identification from facial features has been studied primarily using imagery from visible video cameras. Thermal imaging sensors are one of the most innovative emerging technologies in the market. Fueled by ever lowering costs and improved sensitivity and resolution, our sensors provide exciting new opportunities for biometric identification. As part of our involvement in this effort, Equinox is collecting an extensive database of face imagery in the following modalities: coregistered broadband-visible/LWIR (8-12 microns), MWIR (3-5 microns), SWIR (0.9-1.7 microns). This data collection is made available for experimentation and statistical performance evaluations.

VALID Database

With the aim to facilitate the development of robust audio, face, and multi-modal person recognition systems, the large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control on illumination or acoustic noise. The database consists of five recording sessions of 106 subjects over a period of one month. One session is recorded in a studio with controlled lighting and no background noise, the other 4 sessions are recorded in office type scenarios. The database contains uncompressed JPEG Images at resolution of 720x576 pixels.

The UCD Colour Face Image Database for Face Detection

The database has two parts. Part one contains colour pictures of faces having a high degree of variability in scale, location, orientation, pose, facial expression and lighting conditions, while part two has manually segmented results for each of the images in part one of the database. These images are acquired from a wide variety of sources such as digital cameras, pictures scanned using photo-scanner, other face databases and the World Wide Web. The database is intended for distribution to researchers.

Georgia Tech Face Database

The database contains images of 50 people and is stored in JPEG format. For each individual, there are 15 color images captured between 06/01/99 and 11/15/99. Most of the images were taken in two different sessions to take into account the variations in illumination conditions, facial expression, and appearance. In addition to this, the faces were captured at different scales and orientations.

Indian Face Database

The database contains a set of face images taken in February 2002 on the IIT Kanpur campus. There are eleven different images of each of 40 distinct subjects. For some subjects, some additional photographs are included. All the images were taken against a bright homogeneous background with the subjects in an upright, frontal position. The files are in JPEG format. The size of each image is 640x480 pixels, with 256 grey levels per pixel. The images are organized in two main directories, males and females. Each of these contains directories named with serial numbers, each corresponding to a single individual. Each individual's directory holds the eleven different images of that subject, with names of the form abc.jpg, where abc is the image number for that subject. The following orientations of the face are included: looking front, looking left, looking right, looking up, looking up towards left, looking up towards right, looking down. Available emotions are: neutral, smile, laughter, sad/disgust.
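The directory layout described above (gender, then subject serial number, then numbered JPEG files) lends itself to a simple index. The following Python sketch is one way to build it; the function name `index_indian_faces` is hypothetical, and the exact folder and file names in the real distribution are assumed from the description and may differ.

```python
from pathlib import Path

def index_indian_faces(root):
    """Map (gender, subject_id) to that subject's sorted JPEG paths.

    Assumes the layout <root>/males/<serial>/<n>.jpg and
    <root>/females/<serial>/<n>.jpg, as described in the text.
    """
    index = {}
    for gender in ("males", "females"):
        gender_dir = Path(root) / gender
        if not gender_dir.is_dir():
            continue  # tolerate a partial download
        # One sub-directory per individual, named with a serial number.
        for subject_dir in sorted(p for p in gender_dir.iterdir() if p.is_dir()):
            index[(gender, subject_dir.name)] = sorted(subject_dir.glob("*.jpg"))
    return index
```

Calling `index_indian_faces` on the unpacked database folder should yield one entry per subject, each with at least eleven image paths if the download matches the description.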

VidTIMIT Database

The VidTIMIT database is comprised of video and corresponding audio recordings of 43 people, reciting short sentences. It can be useful for research on topics such as multi-view face recognition, automatic lip reading and multi-modal speech recognition. The dataset was recorded in 3 sessions, with a space of about a week between each session. There are 10 sentences per person, chosen from the TIMIT corpus. In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down and finally return to center. The recording was done in an office environment using a broadcast quality digital video camera. The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. The corresponding audio is stored as a mono, 16 bit, 32 kHz WAV file.

Labeled Faces in the Wild

Labeled Faces in the Wild is a database of face photographs designed for studying the problem of unconstrained face recognition. The database contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1680 of the people pictured have two or more distinct photos in the database. The only constraint on these faces is that they were detected by the Viola-Jones face detector. Please see the database web page and the technical report linked there for more details.

The LFWcrop Database

LFWcrop is a cropped version of the Labeled Faces in the Wild (LFW) dataset, keeping only the center portion of each image (i.e. the face). In the vast majority of images almost all of the background is omitted. LFWcrop was created due to concern about the misuse of the original LFW dataset, where face matching accuracy can be unrealistically boosted through the use of background parts of images (i.e. exploitation of possible correlations between faces and backgrounds). As the location and size of faces in LFW was determined through the use of an automatic face locator (detector), the cropped faces in LFWcrop exhibit real-life conditions, including mis-alignment, scale variations, in-plane as well as out-of-plane rotations.

Labeled Faces in the Wild-a (LFW-a)

The "Labeled Faces in the Wild-a" image collection is a database of labeled face images intended for studying Face Recognition in unconstrained images. It contains the same images available in the original Labeled Faces in the Wild data set; however, here we provide them after alignment using a commercial face alignment software. Some of our results were produced using these images. We show this alignment to improve the performance of face recognition algorithms. We have maintained the same directory structure as in the original LFW data set, and so these images can be used as direct substitutes for those in the original image set. Note, however, that the images available here are grayscale versions of the originals.

3D_RMA database

The 3D_RMA database is a collection of two sessions (Nov 1997 and Jan 1998) consisting of 120 persons. For each session, three shots were recorded with different (but limited) orientations of the head. Details about the population and typical problems affecting the quality are given in the referred link. 3D was captured thanks to a first prototype of a proprietary system based on structured light (analog camera!). The quality was limited but sufficient to show the ability of 3D face recognition. For privacy reasons, the texture images are not made available. In the period 2003-2008, this database has been downloaded by about 100 researchers. A few papers present recognition results with the database (like, of course, papers from the author).

GavabDB: 3D face database, GAVAB research group, Universidad Rey Juan Carlos, Spain

GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female) having 9 images for each person. All of the individuals are Caucasian and their age is between 18 and 40 years old. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to the pose and the facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (±30°, looking up and looking down respectively) with neutral expression, 2 y-rotated views (±90°, left and right profiles respectively) with neutral expression and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).

FRAV2D Database

This database is formed by up to 109 subjects (75 men and 34 women), with 32 colour images per person. Each picture has a 320 x 240 pixel resolution, with the face occupying most of the image in an upright position. For one single person, all the photographs were taken on the same day, although the subject was forced to stand up and sit down again in order to change pose and gesture. In all cases, the background is plain and dark blue. The 32 images were classified in six groups according to the pose and lighting conditions: 12 frontal images, 4 15°-turned images, 4 30°-turned images, 4 images with gestures, 4 images with occluded face features and 4 frontal images with a change of illumination. This database is delivered for free exclusively for research purposes.

FRAV3D Database

This database contains 106 subjects, with approximately one woman for every three men. The data were acquired with a Minolta VIVID 700 scanner, which provides texture information (2D image) and a VRML file (3D image). If needed, the corresponding range data (2.5D image) can be computed by means of the VRML file. Therefore, it is a multimodal database (2D, 2.5D and 3D). At all times, a strict acquisition protocol was followed, with controlled lighting conditions. The person sat down on an adjustable stool opposite the scanner and in front of a blue wall. No glasses, hats or scarves were allowed. A total of 16 captures per person were taken in every session, with different poses and lighting conditions, trying to cover all possible variations, including turns in different directions, gestures and lighting changes. In every case only one parameter was modified between two captures. This is one of the main advantages of this database with respect to others. This database is delivered for free exclusively for research purposes.

BJUT-3D Chinese Face Database

The BJUT-3D is a three-dimensional face database including 500 Chinese persons. There are 250 females and 250 males in the database. Everyone has 3D face data with neutral expression and without accessories. The original high-resolution 3D face data was acquired by the CyberWare 3D scanner in a given environment. All 3D face data has been preprocessed and the redundant parts cut. The face database is now available for research purposes only. The Multimedia and Intelligent Software Technology Beijing Municipal Key Laboratory in Beijing University of Technology is serving as the technical agent for distribution of the database and reserves the copyright of all the data in the database.

The Bosphorus Database

The Bosphorus Database is a new 3D face database that includes a rich set of expressions, systematic variation of poses and different types of occlusions. This database is unique from three aspects: (1) The facial expressions are composed of judiciously selected subset of Action Units as well as the six basic emotions, and many actors/actresses are incorporated to obtain more realistic expression data; (2) A rich set of head pose variations are available; (3) Different types of face occlusions are included. Hence, this new database can be a very valuable resource for development and evaluation of algorithms on face recognition under adverse conditions and facial expression analysis as well as for facial expression synthesis.

PUT Face Database

PUT Face Database consists of almost 10000 hi-res images of 100 people. Images were taken in controlled conditions and the database is supplied with additional data including: rectangles containing face, eyes, nose and mouth, landmarks positions and manually annotated contour models. Database is available for research purposes.

The Basel Face Model (BFM)

The Basel Face Model (BFM) is a 3D Morphable Face Model constructed from 100 male and 100 female example faces. The BFM consists of a generative 3D shape model covering the face surface from ear to ear and a high quality texture model. The model can be used either directly for 2D and 3D face recognition or to generate training and test images for any imaging condition. Hence, in addition to being a valuable model for face analysis it can also be viewed as a meta-database which allows the creation of accurately labeled synthetic training and testing images. To allow for a fair comparison with other algorithms, we provide both the training data set (the BFM) and the model fitting results for several standard image data sets (CMU-PIE, FERET) obtained with our fitting algorithm. The BFM web page additionally provides a set of registered scans of ten individuals, together with a set of 270 renderings of these individuals with systematic pose and light variations. These scans are not included in the training set of the BFM and form a standardized test set with a ground truth for pose and illumination.

Plastic Surgery Face Database

The plastic surgery face database is a real world database that contains 1800 pre and post surgery images pertaining to 900 subjects. Different types of facial plastic surgeries have different impact on facial features. To enable the researchers to design and evaluate face recognition algorithms on all types of facial plastic surgeries, the database contains images from a wide variety of cases such as Rhinoplasty (nose surgery), Blepharoplasty (eyelid surgery), brow lift, skin peeling, and Rhytidectomy (face lift). For each individual, there are two frontal face images with proper illumination and neutral expression: the first is taken before surgery and the second is taken after surgery. The database contains 519 image pairs corresponding to local surgeries and 381 cases of global surgery (e.g., skin peeling and face lift). The details of the database and performance evaluation of several well known face recognition algorithms is available in this paper.

The Iranian Face Database (IFDB)

The Iranian Face Database (IFDB), the first image database in the Middle East, contains color facial imagery of a large number of Iranian subjects. IFDB is a large database that can support studies of age classification systems. It contains over 3,600 color images. IFDB can be used for age classification, facial feature extraction, aging, facial ratio extraction, percent of facial similarity, facial surgery, race detection and other similar research.

The Hong Kong Polytechnic University NIR Face Database

The Biometric Research Centre at The Hong Kong Polytechnic University developed a real time NIR face capture device and used it to construct a large-scale NIR face database. The NIR face image acquisition system consists of a camera, an LED light source, a filter, a frame grabber card and a computer. The camera used is a JAI camera, which is sensitive to NIR band. The active light source is in the NIR spectrum between 780nm - 1,100 nm. The peak wavelength is 850 nm. The strength of the total LED lighting is adjusted to ensure a good quality of the NIR face images when the camera face distance is between 80 cm - 120 cm, which is convenient for the users. By using the data acquisition device described above, we collected NIR face images from 335 subjects. During the recording, the subject was first asked to sit in front of the camera, and the normal frontal face images of him/her were collected. Then the subject was asked to make expression and pose changes and the corresponding images were collected. To collect face images with scale variations, we asked the subjects to move near to or away from the camera in a certain range. At last, to collect face images with time variations, samples from 15 subjects were collected at two different times with an interval of more than two months. In each recording, we collected about 100 images from each subject, and in total about 34,000 images were collected in the PolyU-NIRFD database.

The Hong Kong Polytechnic University Hyperspectral Face Database (PolyU-HSFD)

The Biometric Research Centre at The Hong Kong Polytechnic University established a Hyperspectral Face database. An indoor hyperspectral face acquisition system was built, consisting mainly of a CRI VariSpec LCTF and a halogen light. The database includes a hyperspectral dataset of 300 hyperspectral image cubes from 25 volunteers with an age range from 21 to 33 (8 female and 17 male). For each individual, several sessions were collected with an average time interval of 5 months. The minimal interval is 3 months and the maximum is 10 months. Each session consists of three hyperspectral cubes - frontal, right and left views with neutral expression. The spectral range is from 400 nm to 720 nm with a step length of 10 nm, producing 33 bands in all. Since the database was constructed over a long period of time, significant appearance variations of the subjects, e.g. changes of hair style and skin condition, are present in the data. In data collection, positions of the camera, light and subject are fixed, which allows us to concentrate on the spectral characteristics for face recognition without masking from environmental changes.
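The band count quoted above can be checked from the spectral range: (720 - 400) / 10 + 1 = 33. A minimal Python sketch of that arithmetic, where the variable names are illustrative and the wavelengths are simply the nominal values from the description:

```python
# Values taken from the description above (in nanometres).
START_NM, STOP_NM, STEP_NM = 400, 720, 10

# range() excludes its stop value, so add one step to include 720 nm.
band_wavelengths_nm = list(range(START_NM, STOP_NM + STEP_NM, STEP_NM))

print(len(band_wavelengths_nm))                         # number of bands
print(band_wavelengths_nm[0], band_wavelengths_nm[-1])  # first and last band
```

By the same kind of arithmetic, 300 cubes at three cubes per session correspond to 100 sessions, or about four sessions per volunteer on average.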

MOBIO - Mobile Biometry Face and Speech Database

The MOBIO database consists of bi-modal (audio and video) data taken from 152 people. The database has a female-male ratio of nearly 1:2 (100 males and 52 females) and was collected from August 2008 until July 2010 in six different sites from five different countries. This led to a diverse bi-modal database with both native and non-native English speakers. In total 12 sessions were captured for each client: 6 sessions for Phase I and 6 sessions for Phase II. The Phase I data consists of 21 questions with the question types ranging from: Short Response Questions, Short Response Free Speech, Set Speech, and Free Speech. The Phase II data consists of 11 questions with the question types ranging from: Short Response Questions, Set Speech, and Free Speech. The database was recorded using two mobile devices: a mobile phone and a laptop computer. The mobile phone used to capture the database was a NOKIA N93i mobile while the laptop computer was a standard 2008 MacBook. The laptop was only used to capture part of the first session; this first session consists of data captured on both the laptop and the mobile phone.

Texas 3D Face Recognition Database (Texas 3DFRD)

Texas 3D Face Recognition database (Texas 3DFRD) contains 1149 pairs of facial color and range images of 105 adult human subjects. The images were acquired at the company Advanced Digital Imaging Research (ADIR), LLC (Friendswood, TX), formerly a subsidiary of Iris International, Inc. (Chatsworth, CA), with assistance from research students and faculty from the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin. This project was sponsored by the Advanced Technology Program of the National Institute of Standards and Technology (NIST). The database is being made available by Dr. Alan C Bovik at UT Austin. The images were acquired using a stereo imaging system at a high spatial resolution of 0.32 mm. The color and range images were captured simultaneously and thus are perfectly registered to each other. All faces have been normalized to the frontal position and the tip of the nose is positioned at the center of the image. The images are of adult humans from all the major ethnic groups and both genders. For each face, information is also available about the subject's gender, ethnicity, facial expression, and the locations of 25 anthropometric facial fiducial points. These fiducial points were located manually on the facial color images using a computer-based graphical user interface. Specific data partitions (training, gallery, and probe) that were employed at LIVE to develop the Anthropometric 3D Face Recognition algorithm are also available.

Natural Visible and Infrared facial Expression database (USTC-NVIE)

The database contains both spontaneous and posed expressions of more than 100 subjects, recorded simultaneously by a visible and an infrared thermal camera, with illumination provided from three different directions. The posed database also includes expression images with and without glasses. The paper describing the database is available here.

FEI Face Database

The FEI face database is a Brazilian face database that contains a set of face images taken between June 2005 and March 2006 at the Artificial Intelligence Laboratory of FEI in Sao Bernardo do Campo, Sao Paulo, Brazil. There are 14 images for each of 200 individuals, a total of 2800 images. All images are in colour and were taken against a white homogeneous background in an upright frontal position with profile rotation of up to about 180 degrees. Scale might vary by about 10% and the original size of each image is 640x480 pixels. The subjects are mainly students and staff at FEI, between 19 and 40 years old, with distinct appearance, hairstyle, and adornments. The numbers of male and female subjects are exactly the same, 100 each.

ChokePoint

The ChokePoint video dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (i.e. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three-camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal. The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. In total, the dataset consists of 54 video sequences and 64,204 labelled face images.

UMB database of 3D occluded faces

The University of Milano Bicocca 3D face database is a collection of multimodal (3D + 2D colour images) facial acquisitions. The database is available to universities and research centers interested in face detection, face recognition, face synthesis, etc. The UMB-DB has been acquired with a particular focus on facial occlusions, i.e. scarves, hats, hands, eyeglasses and other types of occlusion which can occur in real-world scenarios.

VADANA: Vims Appearance Dataset for facial ANAlysis

The primary use of VADANA is for the problems of face verification and recognition across age progression. The main characteristics of VADANA, which distinguish it from current benchmarks, are the large number of intra-personal pairs (on the order of 168 thousand); natural variations in pose, expression and illumination; and the rich set of additional meta-data provided along with standard partitions for direct comparison and benchmarking efforts.

MORPH Database (Craniofacial Longitudinal Morphological Face Database)

The MORPH database is the largest publicly available longitudinal face database. It contains 55,000 images of more than 13,000 people aged 16 to 77. There is an average of 4 images per individual, with an average time span of 164 days between images. This data set was compiled for research on facial analytics and facial recognition.

Long Distance Heterogeneous Face Database (LDHF-DB)

LDHF database contains both visible (VIS) and near-infrared (NIR) face images at distances of 60 m, 100 m and 150 m outdoors and at a 1 m distance indoors. Face images of 100 subjects (70 males and 30 females) were captured; for each subject one image was captured at each distance in daytime and nighttime. All the images of individual subjects are frontal faces without glasses and collected in a single sitting.

PhotoFace: Face recognition using photometric stereo

This unique 3D face database is amongst the largest currently available, containing 3187 sessions of 453 subjects, captured in two recording periods of approximately six months each. The Photoface device was located in an unsupervised corridor allowing real-world and unconstrained capture. Each session comprises four differently lit colour photographs of the subject, from which surface normal and albedo estimations can be calculated (photometric stereo Matlab code implementation included). This allows for many testing scenarios and data fusion modalities. Eleven facial landmarks have been manually located on each session for alignment purposes. Additionally, the Photoface Query Tool is supplied (implemented in Matlab), which allows for subsets of the database to be extracted according to selected metadata e.g. gender, facial hair, pose, expression.
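For readers unfamiliar with photometric stereo, the sketch below shows how a surface normal and albedo can be estimated for a single pixel from four differently lit intensities. It is an illustration only, not the Matlab implementation supplied with the database: it assumes a Lambertian surface, no shadows, and known light vectors, and the light vectors and intensities used here are hypothetical values chosen for the example.

```python
import math

def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("light vectors are degenerate")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det(m) / d)
    return x

def photometric_stereo_pixel(lights, intensities):
    """Least-squares normal and albedo for one pixel (Lambertian model).

    Model: intensity_k = dot(lights[k], g), where g = albedo * normal.
    """
    # Normal equations (L^T L) g = L^T I for the over-determined system.
    ltl = [[sum(l[i] * l[j] for l in lights) for j in range(3)]
           for i in range(3)]
    lti = [sum(l[i] * v for l, v in zip(lights, intensities))
           for i in range(3)]
    g = solve3(ltl, lti)
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is completely dark")
    normal = [c / albedo for c in g]
    return normal, albedo

# Hypothetical light vectors (direction scaled by source strength).
lights = [(1, 0, 1), (-1, 0, 1), (0, 1, 1), (0, -1, 1)]
# Intensities synthesised from normal (0.6, 0, 0.8) and albedo 0.5.
intensities = [0.7, 0.1, 0.4, 0.4]
normal, albedo = photometric_stereo_pixel(lights, intensities)
print(normal, albedo)
```

Applying the same per-pixel solve across all four photographs of a session would give a normal map and an albedo map; a real pipeline would also need to handle shadows and specular highlights, which this sketch ignores.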

The EURECOM Kinect Face Dataset (EURECOM KFD)

The Dataset consists of multimodal facial images of 52 people (14 females, 38 males) acquired with a Kinect sensor. The data is captured in two sessions at different intervals (of about two weeks). In each session, 9 facial images are collected from each person according to different facial expressions, lighting and occlusion conditions: neutral, smile, open mouth, left profile, right profile, occluded eyes, occluded mouth, side occlusion with a sheet of paper and light on. An RGB color image, a depth map (provided both as a bitmap depth image and a text file containing the original depth levels sensed by Kinect) as well as the associated 3D data are provided for all samples. In addition, the dataset includes 6 manually labeled landmark positions for every face: left eye, right eye, tip of the nose, left side of mouth, right side of mouth and the chin. Other information, such as gender, year of birth, ethnicity, glasses (whether a person wears glasses or not) and the time of each session are also available.

YouTube Faces Database

The data set contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube. An average of 2.15 videos are available for each subject. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames. In designing our video data set and benchmarks we follow the example of the 'Labeled Faces in the Wild' LFW image collection. Specifically, our goal is to produce a large scale collection of videos along with labels indicating the identities of a person appearing in each video. In addition, we publish benchmark tests, intended to measure the performance of video pair-matching techniques on these videos. Finally, we provide descriptor encodings for the faces appearing in these videos, using well established descriptor methods.

YMU (YouTube Makeup) Dataset

The dataset consists of 151 subjects, specifically Caucasian females, from YouTube makeup tutorials. Images of the subjects before and after the application of makeup were captured. There are four shots per subject: two shots before the application of makeup and two shots after the application of makeup. For a few subjects, three shots each before and after the application of makeup were obtained. The makeup in these face images varies from subtle to heavy. The cosmetic alteration is mainly in the ocular area, where the eyes have been accentuated by diverse eye makeup products. Additional changes are in the quality of the skin due to the application of foundation and in lip color. This dataset includes some variations in expression and pose. The illumination condition is reasonably constant over multiple shots of the same subject. In a few cases, the hair style before and after makeup changes drastically.

VMU (Virtual Makeup) Dataset

The VMU dataset was assembled by synthetically adding makeup to 51 female Caucasian subjects in the FRGC dataset. We added makeup by using a publicly available tool from Taaz. Three virtual makeovers were created: (a) application of lipstick only; (b) application of eye makeup only; and (c) application of a full makeup consisting of lipstick, foundation, blush and eye makeup. Hence, the assembled dataset contains four images per subject: one before-makeup shot and three after-makeup shots.

MIW (Makeup in the "wild") Dataset

The MIW dataset contains 125 subjects with 1-2 images per subject. Total number of images is 154 (77 with makeup and 77 without makeup). The images are obtained from the internet and the faces are unconstrained.

3D Mask Attack Database (3DMAD)

The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76500 frames of 17 persons, recorded using Kinect for both real-access and spoofing attacks. Each frame consists of: (1) a depth image (640x480 pixels – 1x11 bits); (2) the corresponding RGB image (640x480 pixels – 3x8 bits); (3) manually annotated eye positions (with respect to the RGB image). The data is collected in 3 different sessions for all subjects and for each session 5 videos of 300 frames are captured. The recordings are done under controlled conditions, with frontal-view and neutral expression. The first two sessions are dedicated to the real access samples, in which subjects are recorded with a time delay of ~2 weeks between the acquisitions. In the third session, 3D mask attacks are captured by a single operator (attacker). If you use this database please cite this publication: N. Erdogmus and S. Marcel. "Spoofing in 2D Face Recognition with 3D Masks and Anti-spoofing with Kinect", in IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013. Source code to reproduce experiments in the paper: pypi.python.org/pypi/maskattack.lbp
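The total frame count quoted above can be checked against the recording protocol. A short Python check, using only the figures given in the description (the variable names are illustrative):

```python
subjects = 17
sessions_per_subject = 3      # two real-access sessions, one mask-attack session
videos_per_session = 5
frames_per_video = 300

total_videos = subjects * sessions_per_subject * videos_per_session
total_frames = total_videos * frames_per_video
print(total_videos, total_frames)  # 255 76500
```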

Senthilkumar Face Database (Version 1.0)

The Senthilkumar Face Database contains 80 grayscale face images of 5 people (all men), including frontal views of faces with different facial expressions, occlusions and brightness conditions. Each person has 16 different images. The face portion of each image is manually cropped to 140x188 pixels and then normalized. Facial images are available in both grayscale and colour.

McGill Real-world Face Video Database

This database contains 18000 video frames of 640x480 resolution from 60 video sequences, each recorded from a different subject (31 female and 29 male). Each video was collected in a different environment (indoor or outdoor), resulting in arbitrary illumination conditions and background clutter. Furthermore, the subjects were completely free in their movements, leading to arbitrary face scales, arbitrary facial expressions, head pose (in yaw, pitch and roll), motion blur, and local or global occlusions.

SiblingsDB Database

The SiblingsDB contains two different datasets depicting images of individuals related by sibling relationships. The first, called HQfaces, contains a set of high quality images depicting 184 individuals (92 pairs of siblings). A subset of 79 pairs contains profile images as well, and 56 of them also have smiling frontal and profile pictures. All the images are annotated with the positions of 76 landmarks on frontal images and 12 landmarks on profile images. For each individual, information on sex, birth date, age (the highest and average age differences between siblings are 30 and 4.6 years, respectively) and the votes of a panel of human raters (who were asked to evaluate whether the pairs depict siblings or not) is also available. The second dataset, called LQfaces, contains 98 pairs of siblings (196 individuals) found on the Internet, where most of the subjects are celebrities. The positions of the 76 frontal facial landmarks are provided as well, but this dataset does not include the age information, and human expert ratings were not collected since this dataset is composed mainly of well-known people and is hence likely to produce biased ratings.

The Adience image set and benchmark of unfiltered faces for age, gender and subject classification

The dataset consists of 26,580 images, portraying 2,284 individuals, classified for 8 age groups, gender and including subject labels (identity). It is unique in its construction: The sources of the images included in this set are Flickr albums, assembled by automatic upload from iPhone5 or later smartphone devices, and released by their authors to the general public under the Creative Commons (CC) license. This constitutes the largest, fully unconstrained collection of images for age, gender and subject recognition.

FaceScrub - A Dataset With Over 100,000 Face Images of 530 People

Large face datasets are important for advancing face recognition research, but they are tedious to build, because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we developed an approach to building face datasets that detects faces in images returned from searches for public figures on the Internet, followed by automatically discarding those not belonging to each queried person. The FaceScrub dataset was created using this approach, followed by manually checking and cleaning the results. It comprises a total of 107,818 face images of 530 celebrities, with about 200 images per person. As such, it is one of the largest public face databases.

LFW3D and Adience3D sets

Frontalization is the process of synthesizing frontal facing views of faces appearing in single unconstrained photos. Recent reports have suggested that this process may substantially boost the performance of face recognition systems by transforming the challenging problem of recognizing faces viewed from unconstrained viewpoints into the easier problem of recognizing faces in constrained, forward facing poses. The authors provide frontalized versions of both the widely used Labeled Faces in the Wild set (LFW) for face identity verification and the Adience collection for age and gender classification. These sets (LFW3D and Adience3D) are made available along with the authors' implementation of the method used for the frontalization.

Indian Movie Face database (IMFDB)

Indian Movie Face database (IMFDB) is a large unconstrained face database consisting of 34512 images of 100 Indian actors collected from more than 100 videos. All the images are manually selected and cropped from the video frames, resulting in a high degree of variability in terms of scale, pose, expression, illumination, age, resolution, occlusion, and makeup. IMFDB is the first face database that provides a detailed annotation of every image in terms of age, pose, gender, expression and type of occlusion, which may help other face-related applications.

Labeled Wikipedia Faces (LWF)

The goal of this project is to mine facial images and other important information for the Wikipedia Living People category. Currently, there are over 0.5 million biographic entries, and the number continues to grow. Unlike other data sets, such as the Labeled Faces in the Wild (LFW) and PubFig, the Labeled Wikipedia Faces (LWF) comes from Wikipedia, which is a Creative Commons resource. In addition to these faces, useful meta data are released: the source images, image captions (if available), and person name detection results (through a named entity detector), so mining experiments can also be performed. This is a unique property of this benchmark compared to others. The Labeled Wikipedia Faces (LWF) is a dataset with 8.5k faces for about 1.5k identities.

10k US Adult Faces Database

It is a database of 10,168 natural face photographs, each of a different individual, with major celebrities removed. This database was made by randomly sampling Google Images for randomly generated names based on name distributions in the 1990 US Census. Because of this methodology, the distribution of the faces matches the demographic distribution of the US (e.g., age, race, gender). The database also has a wide range of faces in terms of attractiveness and emotion. Ovals surround each face to eliminate any background effects. Additionally, for a random set of 2,222 of the faces, we have demographic information, attribute scores (attractiveness, distinctiveness, perceived personality, etc.), and memorability scores included with the images, to help researchers create their own stimulus sets.

Denver Intensity of Spontaneous Facial Action (DISFA) Database

The Denver Intensity of Spontaneous Facial Action (DISFA) Database is a non-posed facial expression database for those who are interested in developing computer algorithms for automatic detection of action units and their intensities as described by FACS. This database contains stereo videos of 27 adult subjects (12 females and 15 males) with different ethnicities. The images were acquired using a PtGrey stereo imaging system at high resolution (1024×768). The intensities of the AUs (0-5 scale) for all video frames were manually scored by two human FACS experts. The database also includes 66 facial landmark points for each image in the database.

BU-3DFE Database (Static Data)

BU-3DFE (Binghamton University 3D Facial Expression) includes 100 subjects with 2,500 facial expression models. The BU-3DFE database is available to the research community (areas of interest are as diverse as affective computing, computer vision, human computer interaction, security, biomedicine, law-enforcement, and psychology). The database contains 100 subjects (56% female, 44% male), ranging in age from 18 to 70 years, with a variety of ethnic/racial ancestries, including White, Black, East-Asian, Middle-east Asian, Indian, and Hispanic Latino.

BU-4DFE Database (Dynamic Data)

To extend the analysis of facial behavior from a static 3D space to a dynamic 3D space, the BU-3DFE Database was extended into a new database: BU-4DFE (3D + time): A 3D Dynamic Facial Expression Database. This newly created high-resolution 3D dynamic facial expression database is made available to the scientific research community. The 3D facial expressions are captured at video rate (25 frames per second). For each subject, there are six model sequences showing six prototypic facial expressions (anger, disgust, happiness, fear, sadness, and surprise), respectively. Each expression sequence contains about 100 frames. The database contains 606 3D facial expression sequences captured from 101 subjects, with a total of approximately 60,600 frame models. Each 3D model of a 3D video sequence has a resolution of approximately 35,000 vertices. The texture video has a resolution of about 1040×1329 pixels per frame. The resulting database consists of 58 female and 43 male subjects, with a variety of ethnic/racial ancestries, including Asian, Black, Hispanic/Latino, and White.
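The sequence and frame totals above are consistent with the per-subject figures. A small Python check using only numbers from the description; the frames-per-sequence value is the approximate figure quoted, so the frame total and the duration are approximate as well:

```python
subjects = 101
expressions_per_subject = 6   # anger, disgust, happiness, fear, sadness, surprise
frames_per_sequence = 100     # "about 100 frames" per the description
frame_rate = 25               # frames per second

sequences = subjects * expressions_per_subject
frame_models = sequences * frames_per_sequence
seconds_per_sequence = frames_per_sequence / frame_rate
print(sequences, frame_models, seconds_per_sequence)  # 606 60600 4.0
```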

BP4D-Spontaneous Database

Because posed and un-posed (aka “spontaneous”) 3D facial expressions differ along several dimensions including complexity and timing, well-annotated 3D video of un-posed facial behavior is needed. Therefore, a newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults is introduced: BP4D-Spontaneous, the Binghamton-Pittsburgh 3D Dynamic Spontaneous Facial Expression Database. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action. The database includes 41 participants (23 women, 18 men). They were 18-29 years of age; 11 were Asian, 6 were African-American, 4 were Hispanic, and 20 were Euro-American. An emotion elicitation protocol was designed to elicit emotions of participants effectively. Eight tasks, comprising an interview process and a series of activities, were used to elicit eight emotions. The database is organized by participant. Each participant is associated with 8 tasks. For each task, there are both 3D and 2D videos. The metadata include manually annotated action units (FACS AU), automatically tracked head pose and 2D/3D facial landmarks. The database is about 2.6 TB in size (without compression).