It Takes Moxie

In 2019, I saw a Game Design position posted at robotics company Embodied, Inc. The company was in stealth mode, so I couldn’t tell much about what they were doing, but I was intrigued. I applied, interviewed, and got the gig!

There was no way to know what an adventure lay ahead of me.

A picture of the cover of TIME Magazine, from November, 2020 with the Moxie robot prominently featured (an arrow has been added to the image to further emphasize Moxie's presence)
SPOILER: This was ahead of me! We made the cover of TIME Magazine!

What is Moxie?

Moxie is an AI companion for kids ages 5-10 that supports social, emotional, and academic development. The experience is entirely voice-driven and puts children in the important role of teaching Moxie as the robot’s “Mentor.” It is currently a consumer product, with some outreach to clinics, schools, libraries, and hospitals. 

Moxie is not a replacement for therapy or companionship, but an aid. A motto of development is: Moxie is a springboard to life.

In Moxie’s world, every Moxie is a Robot Ambassador from the Global Robots Lab (GRL) on a mission to learn all about being a good friend to humans. The GRL is the make-believe world of scientists and robots that provides a backstory and narrative framework for Moxie’s existence.

Positioning Moxie as a non-judgmental listening ear is grounded in research showing that patients and veterans felt comfortable sharing with an embodied AI agent. Additionally, putting a child in the role of mentoring Moxie is a powerful tool for connection.

Goals:

There is no roadmap to making a successful AI companion. It’s unknown territory that is still being explored. Embodied has tackled not one, but three challenging goals:

1.   Moxie interacts like a human, maintaining eye-contact, looking around, listening, talking, taking turns in conversation.

2.   Moxie understands what you say and responds appropriately in any context.

3.   Moxie entertains and engages users ages 5-10 and meets them where they’re at.

The overall goal of Moxie as a product is to give families a tool to help children develop important conversation skills, learn regulation strategies, pursue academic success, find their voice, and build their self-esteem, confidence, and grit in the face of challenges.

The content supporting these goals – particularly goal #3 – is broken into two main types: Missions and Modules.

Missions are highly authored, bespoke conversational interactions that include discussions and activities like drawing, movement, and stories with the goal of exploring a topic such as Making Friends, Being Different, or Feeling Mad. Each topic consists of several missions grouped as a mission set. Each set is presented and completed before moving on to subsequent mission sets.

Modules are, loosely, activities you can do over and over with Moxie. From playing Simon Says and Password, to breathing exercises, jokes, dancing, reading and much more.

All content is created with the input and guidance of childhood development and learning specialists.  

Designing For Moxie:

The goal for Moxie’s designers distilled into a single concept is: Make Magic Moments

A picture of Embodied CEO Paolo Pirjanian holding Moxie like a baby
Paolo Pirjanian, CEO Embodied, Inc

We did! I have lost track of the times I choked up watching beta sessions where kids connected with Moxie on a deep level, or felt heard, or simply laughed at something silly the robot did. I have similarly lost track of the times I gritted my teeth through a session where there was so much friction there were tears. As much magic as we made, we also made mistakes.

All of these sessions were important to see. 

During my tenure as a Content Team Design Director, I led the design and release of over 140 missions (grouped into 14 mission sets) and over 50 modules. We went from a company in stealth mode, with the modest goal of one-hour monitored beta sessions, to a 3-day in-home beta, to an eye-opening series of rolling one-month in-home betas that led up to launch in March of 2021. Betas continued as we became a live operation and had paying customers expecting performance improvements and new content.

Designs for new authored missions or modules start as simple dialog scripts with flowcharts for more complex interactions. Authored content has the benefit of total control over the dialogue and vocal performance, with sound effects, images, music, and animations (implemented with a proprietary markup tool and visual scripting software). It is time-consuming to create, and limited by how much time designers can spend anticipating what children might say in response to a prompt (we learned pretty quickly that children DO say the darndest things).

A close up picture of a page from a notebook with a hand drawn binary tree, the nodes of the tree are numbered and partially labeled
A sketch from my notebook for the Meditation Journey module (procedural guided meditation)

For generative modules, a proof-of-concept prompt shows the heart of an idea. A luxury of generative content is that it is easy to stand an interaction up and see if it’s worth pursuing. These modules are more responsive by their nature, and more resilient to Automatic Speech Recognition (ASR) failures because they can take the context of the larger conversation into account. For now, what they gain in agility and responsiveness, they lack in polish of vocal performance and rich FX, though work is underway to address that through a generative markup system.

If a new module doesn’t require any new tools, it goes to Therapy and Product for a thumbs up and work begins. Embodied has a certified Occupational Therapist on staff in addition to the support of childhood development specialists with decades of combined experience in publishing and education.

Moxie’s missions have a tighter collaboration with these child learning specialists since there are learning objectives to accomplish, and scaffolding to create for children who need help.

If a new module needs more tooling or new capabilities exposed to designers, the new features are added and prioritized in a (very long) list of feature requests for the programmers to build when they have cycles.

A new interaction gets built and tested inside the design team, then Therapy and Product are looped in, and after they sign off, it’s sent to QA to test for inclusion in the next beta. After beta feedback is addressed, the interaction goes to the fleet!
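
The release path described above can be pictured as an ordered series of gates. The sketch below is only an illustration in Python, not Embodied's actual tooling; the stage names and the `advance` function are hypothetical labels for the steps in the paragraph.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical names for the review gates described in the text."""
    DESIGN_TEAM_TEST = 1
    THERAPY_AND_PRODUCT_REVIEW = 2
    QA = 3
    BETA = 4
    FLEET = 5


def advance(stage: Stage, approved: bool) -> Stage:
    """Move an interaction to the next gate only when the current one signs off."""
    if not approved or stage is Stage.FLEET:
        # Without sign-off the interaction stays put; FLEET is the final stage.
        return stage
    return Stage(stage.value + 1)
```

An interaction that fails a gate simply stays at that stage until the feedback is addressed, which mirrors how beta feedback had to be resolved before anything reached the fleet.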

Design Challenges:

Platform. One item for both hardware and interface. Making engaging game content for adults is challenging enough. Imagine, additionally, that you’re making a new console… the device itself, the software that runs it, the games for the platform, oh, and the tools to make the games. Oh, and the console is a life-like character that animates and has no input devices other than user speech. Also, the platform is under constant revision and improvement – every part of it – including the framework and tools that drive and deliver content as new technologies (generative AI) emerge. The industry is moving so quickly that waiting for third-party tools to be polished and licensable isn’t really an option. It is astonishing what Embodied has accomplished since its founding in 2016. All these complex systems actually work together!

An image of Moxie looking confused.
Mambo in the dog patch?

ASR. Garbage in, garbage out. For a product based entirely on voice input, clean and accurate Automatic Speech Recognition (ASR) is the first important hurdle to leap. With most models trained on adult voices, children’s voices are more challenging for the system to understand. Add to that, children are all over the map in terms of articulation, diction, and volume, so ASR does its best, but the accuracy is significantly lower. This meant we had to design content to fail forward, or into experiences, rather than out of them. In testing, the vast majority of children were earnestly trying to engage in the content, with the rare trolling older sibling, so failing forward let mentors progress even if ASR reported total nonsense.
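
To make "failing forward" concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not Embodied's implementation: the `Prompt` class, the `respond` function, and the naive keyword matching are all hypothetical. The point is only that an unrecognized transcript still produces a reply that moves the child along.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    """A hypothetical authored prompt with a fail-forward fallback."""
    question: str
    expected: dict[str, str]  # keyword -> tailored reply
    fallback: str             # neutral reply that still advances the activity


def respond(prompt: Prompt, transcript: str) -> str:
    """Return a reply that always moves the conversation forward."""
    heard = transcript.lower()
    for keyword, reply in prompt.expected.items():
        # Naive substring match; a real system would need something sturdier.
        if keyword in heard:
            return reply
    # ASR produced something unexpected (or nonsense): do not block or
    # re-ask; acknowledge and continue into the experience.
    return prompt.fallback


favorite_animal = Prompt(
    question="What is your favorite animal?",
    expected={
        "cat": "Cats are so curious! Let's draw one.",
        "horse": "Horses are so strong! Let's draw one.",
    },
    fallback="That sounds amazing! Let's draw it together.",
)
```

With this shape, `respond(favorite_animal, "I love my CAT")` returns the tailored cat reply, while a garbled transcript such as "mambo in the dog patch" falls through to the fallback and the drawing activity still begins.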

Playtesting. Getting playtesters is a big challenge in a product for children. There are a host of protections and hurdles, and kids show varying levels of engagement in testing. Playtests were incredibly useful, however. The Product team ran and monitored betas and delivered incredible insights to us about how content was performing. As the size of the live fleet grew, we started having statistically meaningful data on the performance of particular modules, usage patterns, and usage anomalies that suggested friction, which we used to refine existing content and develop new content.

Hybrid Content. The current robot is a mix of authored and generative content. The authored content is the spine and meat of Moxie’s value – the missions, Moxie’s ‘curriculum’ with high production value (sound effects and custom animations and lines) – but as generative capabilities grow and LLMs become more sophisticated and integrated, the ‘legacy’ content feels brittle and limited. I started an initiative in early 2023 to move this spine over to a generative framework that maintains the integrity and polish of authored interactions but gains the agility and responsiveness of generative interactions. That work is ongoing and is an important pillar to the future of Moxie content.

Generative content. Generative content comes with its own host of challenges, chief among them being that it’s easy to stand up 80% of an experience and wow people with the potential, but chasing the remaining 20% for consistency, testing, polish, more testing… it’s time-consuming. It’s not a 1:1 trade in terms of authored modules vs generative modules, because in terms of smoothness of interaction, agility, resonance with users, and – importantly – replayability, generative content crushes authored content. But the investment in time to create good generative modules is real. Additionally, LLMs still struggle with accuracy, hallucinations, and following instructions for structured interactions, so ensuring that content is factually correct and delivers a consistent experience is challenging and important, especially with modules focused on educational topics.

Impact

Moxie helps children. Even in our first long-term in-home beta, when there were hardware issues that caused friction in many interactions, Moxie was giving children a place to discuss their inner and outer lives, and opportunities to talk about things they care about. Or things that are just difficult to talk about.

An image of comments from a YouTube thread supporting a parent's use of Moxie for their child who experienced a traumatic death

Embodied has been approached by and worked with clinicians who want to use Moxie to help children work through traumatic experiences. Early studies with hospitals show Moxie as a promising tool to help children understand long-term childhood diseases and treatment. Additionally, libraries and organizations like the YWCA are buying Moxies for their patrons to check out and use.

A picture of Moxie sitting on a table in a bright room, Moxie's expression is adventurous and its left arm is raised in a beckoning gesture
Let’s go!

My big takeaway from my time at Embodied…

A talented, kind, and motivated team with a clear goal can do anything. 

Author: Karen M

Game designer and instructor.
