ChatGPT x Stack-chan act.2

A project log for Stack-chan - JavaScript driven super-kawaii robot

An easy-to-build and companion robot for everyone

Shinya Ishikawa, 04/13/2023 at 15:43

Aiming for a "Robot that Talks with People"

One of my goals was to create a "robot that talks with people" through Stack-chan.

Stack-chan is already cute without doing anything, but I have been working on development with the dream of a future where Stack-chan can provide advice to users, give praise and encouragement, and engage in playful communication with other Stack-chans.

I had thought that it would take another 1-2 years to fully integrate dialogue management, but with the arrival of ChatGPT, I feel like the future has come in a giant leap!

So, to achieve a "robot that talks with people," I've implemented and tested the ChatGPT integration feature!

When users talk to Stack-chan, Stack-chan responds with a cute voice. When users ask Stack-chan to introduce itself, it replies, "I am a robot called Stack-chan." In other words, ChatGPT is acting as Stack-chan. As those who use the web version of ChatGPT may know, by starting the conversation with the setting "You are a super cute robot called Stack-chan," ChatGPT will generate responses adhering to this setting.

Demo Mechanism

The mechanism is as follows. Heavy processes such as speech recognition and synthesis are performed on an external PC.

  1. The PC recognizes the user's voice (using the VOSK speech recognition library).
  2. The PC sends the recognized text to Stack-chan.
  3. Stack-chan sends the user's message to the ChatGPT API and receives a reply from the AI.
     - Along with the API key used for authentication, an array of chat messages is sent.
     - Each chat message has a role (role) and content (content). The role is user, assistant (the AI), or system, and the content is the message text itself.
     - Messages with the system role let you give ChatGPT character settings and instructions for its responses.
  4. The AI's reply is converted to audio data (using the VOICEVOX speech synthesis engine).
  5. The audio data is played back.
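
To make the ChatGPT request described in the steps above more concrete, here is a minimal sketch in plain JavaScript. It is not the actual Stack-chan firmware code: the helper names buildChatRequest and extractReply are hypothetical, and the model name "gpt-3.5-turbo" is an assumption. Only the endpoint URL, the Bearer authorization header, and the role/content message shape follow the public Chat Completions API.

```javascript
// Sketch only: helper names are hypothetical, not part of the Stack-chan firmware.
const API_URL = "https://api.openai.com/v1/chat/completions";

// The character setting quoted in this log, sent with the system role.
const SYSTEM_PROMPT = "You are a super cute robot called Stack-chan.";

// Build the URL and request options for one conversational turn.
// `history` is an array of earlier { role, content } messages (may be empty).
function buildChatRequest(apiKey, history, userText) {
  const messages = [
    { role: "system", content: SYSTEM_PROMPT },
    ...history,
    { role: "user", content: userText },
  ];
  return {
    url: API_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // The model name is an assumption; use whichever chat model you have access to.
      body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
    },
  };
}

// Pull the assistant's reply text out of a parsed API response.
function extractReply(responseJson) {
  return responseJson.choices[0].message.content;
}

// Usage, in an environment that provides fetch():
//   const { url, options } = buildChatRequest(apiKey, [], "Please introduce yourself.");
//   const reply = extractReply(await (await fetch(url, options)).json());
```

Keeping the system message at the front of every request is what keeps ChatGPT "in character" as Stack-chan. Appending each user message and assistant reply to the history array lets later turns refer back to earlier ones.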

You can try this demo on the Moddable version of the firmware I'm developing. Since all the basic modules are implemented in JavaScript, it is approachable for web engineers and others who are unfamiliar with Arduino (C++).

In the future, I plan to make it easier to get started and to use, for example by letting users write apps in a web browser without setting up a development environment.

Impressions of Using ChatGPT

First and foremost, I was amazed by the naturalness of the responses! I truly felt like I was having a proper conversation with a robot for the first time.

The API is also easy to use and can be quickly integrated into various systems.

On the other hand, ChatGPT is ultimately an AI designed for "text-based chat," and there are challenges when integrating it fully into a communication robot for "voice-based conversation."

Application Ideas

Various modifications can be made from this configuration.

What would you do with Stack-chan?
