
SIGN LANGUAGE DETECTION


OVERVIEW

Sign language is poorly understood among the general public, which creates a communication barrier between those who rely on it to communicate and those who do not understand it. Our goal was to develop a visual classification method in which static images of signed alphabet letters could be translated into readable text. Doing so could serve as a first step toward a meaningful mode of communication between the hearing impaired and the rest of the community. This was the final project for our Machine Learning course; the team members were Nili Krausz, Athulya Simon, and Austin Lawrence.


GOALS

Our goal was to distinguish between a few letters of the signed alphabet, as a step toward converting sign language into written text. More specifically, we wanted to distinguish the letters c, e, h, l, o, w, and y, as these represent a good subset of the ASL alphabet. As a first step, we would distinguish between the letters that are very different from each other (e, h, l, o, and y), and then, if we made it past that stage with time to spare, add in two letters that resemble letters in the first group (c and w, which are similar to o and y). An additional goal was to recognize sign language regardless of the skin color of the user.


FEATURE EXTRACTION

The first step in performing the classification was to extract features from the images. To ensure that we could produce a good classifier of the different hand poses, we wanted to produce several features that would allow for cross-validation and selection of the best ones. Since our goal was to be able to extract hands of varying skin colors, we chose to use a single blue background, which would be easily detectable. We collected a large sample of images of the different sign language gestures against this background and varied the lighting between images. This allowed us to normalize for differences in skin color, or for variations in hand color (such as for one subject who had henna on their hand in several of the images).
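As a rough illustration of this segmentation step, the sketch below masks out a blue backdrop using OpenCV; the file name and HSV thresholds are assumptions, not values from the project.

```python
# Sketch of blue-background segmentation (illustrative thresholds).
import cv2
import numpy as np

img = cv2.imread("sign_c_01.jpg")                 # hypothetical sample image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Pixels whose hue falls in the blue range belong to the backdrop;
# everything else is treated as hand/foreground.
lower_blue = np.array([100, 80, 40])
upper_blue = np.array([130, 255, 255])
background = cv2.inRange(hsv, lower_blue, upper_blue)
hand_mask = cv2.bitwise_not(background)           # 255 where the hand is

hand_only = cv2.bitwise_and(img, img, mask=hand_mask)
```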

The next step in obtaining our features was to extract edges from each image. We did this using a Sobel edge detector. The resulting edges were faint and somewhat disconnected, so we improved the edge extraction using a standard image processing method known as dilation; our dilation procedure used a disk structuring element of size 2. Using these edges we then performed a Hough transform, which maps edge points into the Hough domain, where lines are represented in polar rather than Cartesian coordinates. This converts lines into points, and by finding the points with the highest magnitude we were able to select the most significant lines in the image.
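A minimal sketch of this edge-and-Hough step using scikit-image is shown below; the binarization threshold and the number of peaks kept are illustrative choices, and `hand_only` refers to the hypothetical segmentation output above.

```python
# Sketch of Sobel edges, dilation with a disk of size 2, and the Hough transform.
import numpy as np
from skimage import color, filters, morphology, transform

hand_gray = color.rgb2gray(hand_only[..., ::-1])   # BGR -> RGB -> grayscale

edge_mag = filters.sobel(hand_gray)                # Sobel edge magnitude
edges = edge_mag > 0.05                            # binarize the faint edges
edges = morphology.dilation(edges, morphology.disk(2))  # thicken/connect them

# Hough transform: each edge point votes for (angle, distance) line parameters
# in polar coordinates; strong accumulator peaks are the most significant lines.
hspace, thetas, rhos = transform.hough_line(edges)
votes, line_thetas, line_rhos = transform.hough_line_peaks(
    hspace, thetas, rhos, num_peaks=20)
```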

The next feature we extracted was a histogram of the Hough lines. We separated the Hough lines into bins based on direction, with each bin covering a range of 45°, and the magnitude of each bin was a feature that we used in our final classifier.
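A sketch of that direction histogram, assuming the `votes` and `line_thetas` arrays from the previous block, might look like the following; the Hough angles span 180°, which gives four 45° bins.

```python
# Sketch of the Hough-line direction histogram feature.
import numpy as np

angles_deg = np.degrees(line_thetas)               # line angles in [-90, 90)
bin_edges = np.arange(-90, 91, 45)                 # four 45-degree bins
hist, _ = np.histogram(angles_deg, bins=bin_edges, weights=votes)

# One magnitude per direction bin; these values go into the feature vector.
feature_vector = hist
```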


Afterwards, we considered a method to compensate for color variations in the middle of the hand (such as the henna color difference shown in some of the photos). To eliminate errors due to this color variation, we converted all of the foreground (hand) pixels to an intensity of 255 (white) and again found edges and Hough lines. The resulting Hough line histogram was then used as an additional feature for our classifier.
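The sketch below illustrates this intensity-flattening idea, reusing the hypothetical `hand_mask` from the segmentation sketch; full intensity is represented as 1.0 rather than 255 because scikit-image works with float images.

```python
# Sketch of flattening the hand to uniform intensity before re-running edges/Hough.
import numpy as np
from skimage import filters, morphology, transform

silhouette = (hand_mask > 0).astype(float)         # 1.0 inside the hand, 0.0 outside

sil_edges = morphology.dilation(filters.sobel(silhouette) > 0.05,
                                morphology.disk(2))
hspace2, thetas2, rhos2 = transform.hough_line(sil_edges)
# ...the same 45-degree direction histogram is then computed as a second feature set.
```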

Finally, we tried several other feature extraction methods, such as the Histogram of Oriented Gradients and the maximum Hough line, but found that they did not improve classification.


CROSS-VALIDATION FOR FEATURE SELECTION

Once we had developed a method for extracting several different types of features from our image sets, we needed to see which features could actually separate each class from the rest. We also wanted to determine, if the data were separable, whether they were linearly separable or not. If not, could we see a trend in the plot that would give us a hint as to how we could linearize the data?


Because we had so much data, we iterated through every combination of features and created a plot for each, with all of the classes shown in different colors so that they would be easy to distinguish from one another. This made it easy to go plot by plot (feature combination by feature combination), and then color by color, to see how well one letter could be distinguished from the rest.
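A plotting pass along these lines might look like the sketch below, assuming a feature matrix `X` (samples by features) and integer class labels `y`; the variable names and output file naming are illustrative.

```python
# Sketch of plotting every pairwise feature combination, colored by letter.
import itertools
import matplotlib.pyplot as plt

letters = ['e', 'h', 'l', 'o', 'y']
for i, j in itertools.combinations(range(X.shape[1]), 2):
    plt.figure()
    for k, letter in enumerate(letters):
        sel = (y == k)
        plt.scatter(X[sel, i], X[sel, j], label=letter, s=12)
    plt.xlabel(f"feature {i}")
    plt.ylabel(f"feature {j}")
    plt.legend()
    plt.savefig(f"feature_{i}_vs_{j}.png")
    plt.close()
```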


SUPPORT VECTOR MACHINE

Once we decided which features best separated the data, we needed to run them through a support vector machine (SVM) to find a classifier. We used a soft-margin SVM in a one-vs.-all scheme to classify our five classes. We chose a soft-margin SVM because we assumed our data were not perfectly linearly separable and wanted to relax the constraints of the original hard-margin SVM model. To accomplish this, we used the same method we were taught in class of adding a regularization parameter (lambda).
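The sketch below shows one way to set up such a one-vs.-all soft-margin linear SVM (not necessarily the exact in-class solver we used): each class gets a weight vector minimizing lambda times the squared norm plus the average hinge loss, trained here by plain subgradient descent. The function name, learning rate, and epoch count are illustrative.

```python
# Sketch of a one-vs-all soft-margin linear SVM trained by subgradient descent.
import numpy as np

def train_ova_svm(X, y, n_classes, lam=0.1, lr=0.01, epochs=200):
    """X: (n_samples, n_features), ideally with a bias column appended; y: int labels."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)            # +1 for class k, -1 for the rest
        w = np.zeros(d)
        for _ in range(epochs):
            margins = t * (X @ w)
            viol = margins < 1                     # points violating the margin
            # subgradient of lam*||w||^2 + mean hinge loss
            grad = 2 * lam * w - (t[viol, None] * X[viol]).sum(axis=0) / n
            w -= lr * grad
        W[k] = w
    return W
```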
To find the value of lambda that led to the smallest error, we performed a cross-validation scheme on the L2-regularized problem. In other words, we first split our data into training and testing sets. Then we used the soft-margin SVM to find the minimizing weight vector X for each class and evaluated the result on the test data. The evaluation was done by checking the sign of Transpose(D)*X for each test point: if Transpose(D)*X was greater than 0, the point was classified into that class; if it was less than or equal to 0, it was not. After computing the resulting errors, we found the minimum testing error and the lambda associated with it. To distinguish more than one letter, we repeated the process described above for each letter (class).
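A sketch of that lambda search, reusing the hypothetical `train_ova_svm` helper above, is shown below; the candidate lambda grid and the 80/20 split are illustrative choices.

```python
# Sketch of choosing lambda by training/testing error, with a per-class sign rule.
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
tr, te = idx[:split], idx[split:]

best_lam, best_err = None, np.inf
for lam in [0.001, 0.01, 0.1, 1.0, 10.0]:
    W = train_ova_svm(X[tr], y[tr], n_classes=5, lam=lam)
    scores = X[te] @ W.T                           # one score per class per test point
    targets = np.eye(5)[y[te]] * 2 - 1             # +1 if the point belongs to the class
    # score > 0 means "classified into that class"; compare against the true labels
    err = np.mean((scores > 0) != (targets > 0))
    if err < best_err:
        best_lam, best_err = lam, err
```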


CONCLUSIONS

We achieved relatively good accuracy using the algorithm described above, but there are always improvements that can be made. One improvement would be to run the cross-validation algorithm to check which degree of polynomial would best fit our nonlinear datasets; as stated before, we plotted the features and chose what we judged to be appropriate divisions. This worked to an extent, but the results could have benefited from more appropriate feature linearization techniques. Additionally, we would like to make the feature extraction more robust because, right now, we can only take in images that are taken against a solid background. As it stands, we are really proud of what we accomplished through this project; we learned a lot and are excited about future improvements we can make to the algorithm.
