As mentioned in the High Level Design Decisions log, we identified two fundamental features for a sign language interpreter:
- Two hands and arms, with full control of all degrees of freedom, to closely emulate human arms.
- A robust voice detection system that detects audio continuously.
We have implemented scaled-down versions of both features:
- One hand, with two degrees of freedom in each digit and one degree of freedom in the wrist.
- A voice detection system capable of capturing three seconds of audio.
For our product to be useful as a sign language interpreter, we would need to enhance each of the features that we developed.
On the mechanical side, we would need to improve the design of our robotic hand. Currently, the robot consists of a single hand with two degrees of freedom in each digit and only one degree of freedom in the wrist. Ideally, the robot would have two separate arms with full forearm control to form more complex signs, and the wrists would need full rotation and bending motion in all directions. The fingers of our robot can only move side-to-side and bend and extend; there is no independent motion in the knuckles. The robot would require an additional degree of freedom in the knuckles to form more signs accurately.
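The gap between the current build and a full interpreter can be made concrete with a small inventory of joints and their degrees of freedom. The "current" numbers below come from this log; the "target" counts (a third knuckle DOF per digit, a 3-DOF wrist, 2-DOF forearm control, doubled for two arms) are assumptions for illustration, not a finished mechanical spec:

```python
from dataclasses import dataclass

@dataclass
class Joint:
    name: str
    dof: int  # degrees of freedom

# Current build (from this log): two DOF per digit, one in the wrist.
current = [Joint(f"digit_{i}", 2) for i in range(5)] + [Joint("wrist", 1)]

# Hypothetical target for ONE arm: an extra knuckle DOF per digit,
# a fully articulated wrist, and forearm control. Counts are assumptions.
target_arm = [Joint(f"digit_{i}", 3) for i in range(5)] + [
    Joint("wrist", 3),
    Joint("forearm", 2),
]

current_dof = sum(j.dof for j in current)        # one hand today
target_dof = 2 * sum(j.dof for j in target_arm)  # across two arms
```

Even this rough count shows the actuator budget roughly quadrupling, which is why a full redesign (below) is out of scope for this project.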
Solving the mechanical limitations mentioned above would require our team to redesign the robotic hand completely. We used open-source files to 3D print our mechanical hand instead of completing the mechanical design ourselves, which would have proved difficult, time-consuming, and outside the scope of this project. Simply designing the mechanics of a robust robotic arm configuration could easily become a semester or year-long project, especially given our lack of mechanical design experience and training.
For the voice detection system, we would need to detect audio continuously while the system is operational. Currently, we record audio for 3 seconds and then devote resources to processing and recognizing it. Our system takes about 5-10 seconds to process 3 seconds' worth of audio, so we would need to speed up this step. The system would also need to record and recognize audio simultaneously, which it currently cannot do.
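The usual fix for "record, then stop and process" is a producer-consumer pipeline: one thread keeps capturing fixed-length chunks while another runs the slow recognition step, so recording never pauses. The sketch below uses Python's standard `queue` and `threading`; the `record_chunks` and `recognize` bodies are stand-ins for a real capture loop (e.g. PyAudio) and a real speech-to-text engine, which are assumptions here:

```python
import queue
import threading

def record_chunks(out_q, n_chunks=3):
    # Stand-in for a real capture loop that reads 3 s buffers from
    # the microphone; here we just emit dummy chunk labels.
    for i in range(n_chunks):
        out_q.put(f"audio-chunk-{i}")
    out_q.put(None)  # sentinel: recording finished

def recognize(chunk):
    # Stand-in for the slow (5-10 s) speech-recognition step.
    return chunk.upper()

results = []
q = queue.Queue(maxsize=8)  # bounded, so a slow recognizer applies backpressure

producer = threading.Thread(target=record_chunks, args=(q,))
producer.start()

# Consumer: drains chunks and recognizes them while recording continues.
while True:
    chunk = q.get()
    if chunk is None:
        break
    results.append(recognize(chunk))

producer.join()
```

Note that a bounded queue only hides the latency; if recognition stays slower than real time, the queue eventually fills, so the processing step itself would still need to get faster.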
Further, we could improve on the audio filtering built into the microphone. This could be done with an array of two or more microphones, which can be used to reduce background noise and focus solely on the closest speaker.
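The simplest microphone-array technique is delay-and-sum beamforming: shift each microphone's signal by the known arrival delay for the target direction and average them, so sound from that direction adds coherently while off-axis noise partially cancels. A minimal pure-Python sketch, with a toy two-mic example (the delay values would come from mic spacing and speaker direction in a real system):

```python
def delay_and_sum(mic_signals, delays):
    """Align each microphone signal by its delay (in samples) and
    average, reinforcing sound arriving from the steered direction."""
    n = min(len(s) - d for s, d in zip(mic_signals, delays))
    return [
        sum(s[d + i] for s, d in zip(mic_signals, delays)) / len(mic_signals)
        for i in range(n)
    ]

# Toy example: the same pulse reaches mic 2 one sample later than mic 1.
mic1 = [0.0, 1.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0]
aligned = delay_and_sum([mic1, mic2], delays=[0, 1])
```

After alignment the pulse from the steered direction survives at full amplitude, while uncorrelated noise would be attenuated by the averaging.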
Finally, a fully robust sign language interpreter robot would require a much larger vocabulary. Our current design implements only 24 letters of the alphabet. To improve our design, we would need to make significant changes to our robot's vocabulary to accommodate full words, phrases, and grammar in ASL.
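A static-letter vocabulary like ours is naturally a lookup table from letter to servo pose. The sketch below is illustrative only: the pose values are placeholders, not calibrated angles from our build, and it assumes the two missing letters are J and Z, which require motion in ASL fingerspelling and so have no single static pose:

```python
# Hypothetical pose table: each letter maps to a tuple of servo targets
# (one per degree of freedom). Angles are placeholders, not real calibration.
POSES = {
    "A": (10, 10, 10, 10, 90),
    "B": (170, 170, 170, 170, 20),
    # ... remaining static letters ...
}

def spell(word, poses=POSES):
    """Return the servo-pose sequence for a fingerspelled word,
    silently skipping letters with no static pose."""
    return [poses[c] for c in word.upper() if c in poses]
```

Scaling beyond letters to words and ASL grammar would break this table-lookup model: signs involve motion over time, so each vocabulary entry would become a trajectory rather than a single pose.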
If all the changes specified above could be made, the design could still be further improved by adding the capacity for detecting languages other than English and translating to sign languages other than ASL.