Oct 15, 2024
In our last post, we introduced the Col·lectivaT Tech Lab, an initiative designed to share our insights, prototypes, and vision for language technology with the broader community. Today, we’re excited to take you further into this journey, with a focus on Large Language Models (LLMs) and their transformative potential in fields like education and language preservation. We’ll also present our demo of Bo, the open-source software-loving dog, that we’ve been working on, which demonstrates the conversational capabilities of LLMs.
Large Language Models (LLMs) is one of the most recent and most impactful development in Artificial Intelligence (AI), especially within the domain of natural language processing (NLP). These models are the latest evolution in language modeling, designed to capture the contextual relationship between words. This enables LLMs to generate coherent, context-sensitive responses, moving beyond mere pattern recognition to perform complex tasks such as conversation, translation, and content creation with remarkable fluency and adaptability.
From language translation to content creation, LLMs have quickly become a powerful tool that automates complex processes and tasks in professional environments, improving efficiency across various sectors such as customer service, marketing, and data analysis. Recent advancements have pushed these capabilities even further, enabling these models to handle more nuanced and complex conversations, as well as process and generate audio, images, and video.
At Col·lectivaT, we are particularly interested in how language technologies can be harnessed for social transformation. For instance, can LLMs be adapted to serve lower-resourced languages, which often lack sufficient digital representation? Could they help preserve marginalized languages by providing more accessible tools for learning and communication? These are the kinds of questions that drive our most recent experimentation.
We invite you to explore our latest demo, where you will interact with Bo, a friendly virtual dog powered by an LLM passionate about open-source software. To start talking with him, you just need to click on the microphone button, say ‘Hello’ and let the conversation flow. Bo will listen to you, answer your questions, and propose new ones to help you dive deeper into the topic you’re discussing.
By conversing with Bo, you’ll not only learn about open-source software, but you’ll also connect with it through your own experience. If you don’t have any knowledge of coding or programming, Bo will use examples from his world to make it more understandable. As the conversation progresses, you may discover you know more than you thought. If you’re already experienced in this field and mention an open-source project or repository, Bo will be excited to talk about it in depth, using the knowledge he draws from the language model he operates on.
This demo app is based on open-source code originally developed by Google to demonstrate the conversational capabilities of LLMs. We’ve adapted the code to suit our purposes by localizing the character to Catalan, giving Bo a unique history and background that reflects our local context. You can explore Bo’s personality and backstory in the configuration panel if you’re curious about it, accessible through the upper menu.
Multiple technologies work together to bring Bo to life. Automatic Speech Recognition (ASR) converts your spoken words into text. That text, along with Bo’s personality details, is processed by the LLM backend to create a meaningful, context-aware response. Finally, Text-to-Speech (TTS) technology converts Bo’s replies into spoken words, giving him a voice. The animation adds a visual touch that mimics an actual conversation, making the interaction feel more natural and engaging, as if you’re chatting with a real companion.
It’s hard not to notice Bo’s American English accent. This is due to the English-centric training of OpenAI’s TTS models. However, if you ask Bo about his owner, you’ll discover an intriguing twist to his linguistic background and how he become to be so fond of open source software!
Unlike traditional virtual assistants, Bo’s responses are not pre-programmed or stored—they are generated on the fly, based on the flow of interactions. This makes his conversations much more dynamic and fluid, without the rigid scripts typical of systems like banking assistants.
While we’re currently using OpenAI’s APIs for the sake of convenience in this experimentation, we’re committed to moving towards fully open-source alternatives that better align with our values of accessibility and transparency in the long run.
How can a LLM-powered virtual character like Bo serve as a tool for social transformation? While commercial applications often focus on customer service or personalized marketing, we at Col·lectivaT, place value on its tremendous potential for such applications to positively impact areas like language preservation, education and social awareness.
One exciting application is in language learning, especially for minorized languages that often lack the resources for comprehensive study and practice. Children learning these languages might only have limited exposure in school or at home. A fun, interactive companion like Bo can offer a much-needed space to practice, reinforcing the use of their language in engaging, pressure-free ways. Learners can converse without fear of making mistakes. This virtual interaction creates an opportunity to explore and play with language, helping bridge the gap where traditional resources fall short.
Beyond language learning, conversational companions can open the door to personalized conversations on topics like climate change, menstrual health, addiction, or LGBTI issues—topics that kids might find difficult to talk about with teachers or parents. These virtual companions provide a safe space for children and teens where no question is too embarrassing or awkward to ask.
We must say that this isn’t about replacing teachers or caregivers, but about complementing them and scaling their efforts within their oversight. Virtual companions can provide a judgment-free, personalized learning environment—whether that’s in practicing a language, discussing sensitive issues, or simply interacting with technology in a more intuitive way.
As we look ahead, we’re eager to continue experimenting with the platform, testing its capabilities and limitations. Our technical next steps include integrating the latest open sourced LLaMA models to enhance the language capabilities and refining our in-house text-to-speech models to ensure more natural and localized voice interactions. We’ll also explore how to seamlessly integrate these solutions into educational and social settings.
We’d love to hear your thoughts! If you’ve interacted with Bo or have ideas for how this technology can evolve, please reach out and share your feedback.
We’d also like to thank Yuxuan Peng, a full-stack web developer from our Awal initiative, for his invaluable volunteer contributions to this project.