---
title: "The Blueprint: Translating stream-of-conscious speech into responsive, actionable task lists"
description: "Welcome to The Blueprint, a new feature where we highlight how Google Cloud customers are tackling unique and common challenges across industries usin"
tags: ["데이터베이스", "AI", "에이전트", "Google Cloud"]
created: "2026-05-07"
---

# The Blueprint: Translating stream-of-conscious speech into responsive, actionable task lists

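> This is a sample document generated from live IT news for layout checking. The body is written as explanatory commentary built from the title, summary, and link in the original RSS feed.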

## Source information

- Source: Google Cloud Blog
- Published: Wed, 06 May 2026 16:00:00 +0000
- Original link: [https://cloud.google.com/blog/topics/startups/the-blueprint-doist-stream-of-consciousness-ai-task-list-creation/](https://cloud.google.com/blog/topics/startups/the-blueprint-doist-stream-of-consciousness-ai-task-list-creation/)

## Quick summary

Welcome to The Blueprint, a new feature where we highlight how Google Cloud customers are tackling unique and common challenges across industries using the latest AI and cloud technologies. We hope to inspire others looking to innovate in their work.

Founded in 2007, Doist is a pioneer in async and remote-first work on a mission to simplify life’s complexities through apps like Todoist for task management and Twist for team communication.

The challenge: We launched Ramble to take our popular Todoist application to the next level by capturing non-stop, stream-of-consciousness talking. Our inspiration was that scene from The Devil Wears Prada where Miranda Priestly rapid-fires a dozen tasks at her assistant. We asked: What if anyone could capture tasks that way? No typing, no careful formatting. Just talk and let Todoist do the organizing. That use case became our north star.

At the outset, we identified four big technical hurdles:

- We needed fast and accurate real-time communication with tool-calling capabilities.
- Multilingual support at scale but with great support for slang, accents, and more.
- As traditional assertion-based testing would not work for our platform, we would have to find a way to achieve non-deterministic output testing and semantic validation.
- Reliable, flawless handling of audio across browsers.

The solution: We built Ramble using Gemini Enterprise Agent Platform and its previous iteration, Vertex AI; specifically, we’re using Agent Platform to access the Gemini Flash models. We chose these over other options primarily due to the quality of Google's state-of-the-art models and its clear terms and assurances about preserving privacy.

Gemini’s Live API (accessed via Agent Platform) powers Ramble’s core real-time interactions and key capabilities, including native audio streaming, proactive tool calling, session resumption, and multilingual understanding. Ramble sends the raw PCM audio directly to the model without pre-transcription.
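
As a rough illustration of what "raw PCM audio" means in practice, the sketch below converts floating-point samples to 16-bit little-endian PCM and splits the bytes into fixed-duration chunks that a client could stream. The 16 kHz sample rate, the 100 ms chunk size, and the function names are assumptions made for this example; the source does not state which audio format or chunk size Ramble uses, and no Gemini API call is shown.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000   # assumed; not stated in the source
CHUNK_MS = 100            # assumed chunk duration
BYTES_PER_SAMPLE = 2      # 16-bit PCM


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clip first so out-of-range input cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, sample_rate: int = SAMPLE_RATE_HZ,
              chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield fixed-duration chunks of PCM bytes; the last one may be shorter."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# A quarter second of silence becomes two full chunks and one partial chunk.
silence = float_to_pcm16([0.0] * 4000)
chunk_sizes = [len(c) for c in chunk_pcm(silence)]
```

Each chunk would then be handed to whatever streaming client the application uses; that part is deliberately left out because it depends on the provider SDK.
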
Gemini handles language detection, speech recognition, and semantic understanding in a single pass, reducing latency. It then invokes our purposefully designed tools (`addTask`, `editTask`, `deleteTask`, etc.) autonomously as the user speaks, without waiting for explicit commands. The APIs in Agent Platform provide resumption tokens that let users pause and continue sessions, which is essential for mobile users who might switch apps or lose connectivity. The end result is a clear, concise list of the tasks, regardless of how many, how inconsistently, or how confusingly they may have been rambled by the user.

The architecture: (figure not included in this summary.)

The outcome: Ramble has come to rely on the quality of Google’s AI models, particularly the reasoning and near-instant audio-processing capabilities of Gemini Flash. Other platforms and models offer similar capabilities, and we did bake in support for them, but none hit our internal quality bar as consistently as Gemini. When it came to a user's unstructured “rambling” and the need to fill in gaps, Gemini turned out to be the most intelligent of all the models we explored. The result was the clearest and most consistent breakdown of tasks, which was the exact magical user experience we wanted to create.

After an early rate-limit incident caused by unexpectedly high usage during alpha testing, we developed a deeper, more proactive partnership with Google, ensuring long-term sustainability and the support necessary for our high API usage. Since then, it's been easy for us to connect directly with Google Cloud staff, including engineers, when issues arise.

Here at Doist, Ramble took off both in a qualitative and quantitative sense. It’s become a hallmark experience that incentivizes us to explore tasteful applications of AI that can enhance our existing product experience, both in the B2C space as well as B2B. Beyond task creation, we’re considering several opportunities across the productivity journey, from capture to planning and even automation.
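
The autonomous tool calling described in this summary could be handled on the application side by a small dispatcher. The sketch below is a minimal, hypothetical version: the tool names `addTask`, `editTask`, and `deleteTask` come from the source, but the argument names (`content`, `task_id`), the shape of a tool call, and the in-memory `TaskList` are assumptions for illustration, not Doist's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class TaskList:
    """In-memory stand-in for a user's task list."""
    tasks: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def add_task(self, content: str) -> int:
        task_id = self.next_id
        self.tasks[task_id] = content
        self.next_id += 1
        return task_id

    def edit_task(self, task_id: int, content: str) -> int:
        if task_id not in self.tasks:
            raise KeyError(f"unknown task {task_id}")
        self.tasks[task_id] = content
        return task_id

    def delete_task(self, task_id: int) -> int:
        del self.tasks[task_id]
        return task_id


def dispatch(task_list: TaskList, call: Dict[str, Any]) -> int:
    """Route one model-issued tool call to the matching handler."""
    handlers: Dict[str, Callable[..., int]] = {
        "addTask": task_list.add_task,
        "editTask": task_list.edit_task,
        "deleteTask": task_list.delete_task,
    }
    handler = handlers.get(call["name"])
    if handler is None:
        raise ValueError(f"unsupported tool: {call['name']}")
    return handler(**call["args"])


# Tool calls as they might arrive while the user is still speaking.
calls = [
    {"name": "addTask", "args": {"content": "Call the dentist"}},
    {"name": "addTask", "args": {"content": "Buy milk"}},
    {"name": "editTask", "args": {"task_id": 2, "content": "Buy oat milk"}},
    {"name": "deleteTask", "args": {"task_id": 1}},
]
todo = TaskList()
for call in calls:
    dispatch(todo, call)
```

Because the model may revise or retract a task mid-sentence, applying each call as it arrives keeps the visible list in step with what the user has said so far.
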
The details: We structured our back-end to enable future voice-powered features. The architecture includes a provider-agnostic streaming layer; a dictation module for one-way audio; Ramble (our “brain dump” module); and a conversation module to support streaming bi-directional audio and future conversational features. This layered design means we can ship new voice features with minimal additional infrastructure work. It also enables provider flexibility; although we’re using Gemini Enterprise Agent Platform in production, our abstraction layer also easily supports other solutions.

In addition to helping us tackle three of our four key technical challenges, Agent Platform delivered some nice surprises.

First, session resumption was easier than we expected. We initially thought maintaining conversation state across reconnections would require complex server-side session management. But once we understood Agent Platform’s resumption token approach (the token is provided by the API and changes with each context update), implementation was straightforward across all platforms.

Second, context injection worked on the first try. We spent considerable time designing how to provide user context (projects, labels, preferences) to the model. We explored complex retrieval strategies and dynamic context windows. In the end, the simple "v1" approach (just passing most of the user's metadata in the system prompt) worked remarkably well.

For testing, we combined structural validation (task count, priority levels, date presence, etc.) with semantic validation (did the model understand the user's intent?) following the LLM-as-judge approach. A second Gemini model evaluates whether the output semantically matches the expected outcome.
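
One way to picture the two-stage check is shown below: cheap structural assertions run first, and a pluggable judge decides the semantic question. This is a sketch under assumptions, not the team's test harness. The `Task` fields, the 1 to 4 priority scale, and the keyword-based placeholder judge are all hypothetical; in a real setup the judge would call a second model rather than match keywords.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    content: str
    priority: int = 4             # assumed scale: 1 (highest) to 4 (lowest)
    due: Optional[str] = None     # free-form due date; format is assumed


@dataclass
class Expectation:
    task_count: int
    tasks_with_due_date: int
    description: str              # natural-language expected outcome


Judge = Callable[[str, List[Task]], bool]


def structural_check(tasks: List[Task], exp: Expectation) -> List[str]:
    """Return structural failures; an empty list means the check passed."""
    failures = []
    if len(tasks) != exp.task_count:
        failures.append(f"expected {exp.task_count} tasks, got {len(tasks)}")
    if any(t.priority not in (1, 2, 3, 4) for t in tasks):
        failures.append("priority outside the 1-4 range")
    dated = sum(1 for t in tasks if t.due)
    if dated != exp.tasks_with_due_date:
        failures.append(f"expected {exp.tasks_with_due_date} dated tasks, got {dated}")
    return failures


def evaluate(tasks: List[Task], exp: Expectation, judge: Judge) -> bool:
    """Pass only if the structure checks out and the judge accepts the semantics."""
    if structural_check(tasks, exp):
        return False
    return judge(exp.description, tasks)


def make_keyword_judge(keywords: List[str]) -> Judge:
    """Build a placeholder judge; a real setup would query a second model."""
    def judge(description: str, tasks: List[Task]) -> bool:
        text = " ".join(t.content.lower() for t in tasks)
        return all(k in text for k in keywords)
    return judge


expected = Expectation(3, 1, "3 tasks: call family, shopping, exercise on Saturday")
output = [Task("Call family"), Task("Go shopping"),
          Task("Exercise", due="Saturday 11:00")]
result = evaluate(output, expected,
                  make_keyword_judge(["family", "shopping", "exercise"]))
```

Running the structural check before the judge keeps obviously wrong outputs from consuming a model call.
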
Native speakers from our global team recorded real-world scenarios in their languages and local accents (15+ language variations and over 100 recordings total), with each scenario having expected semantic outcomes (e.g., "should create 3 tasks: one about calling family, one about shopping, one about exercise on Saturday at 11 AM"). We then created a defined pass-rate threshold for the test suite overall, while also monitoring per-language performance to catch regressions. This approach lets us evaluate new model versions systematically, understanding not just overall performance but also which specific languages might see improved or degraded experiences, and make data-informed decisions.

Ultimately, Ramble is a resounding success in helping our users handle the chaos of day-to-day life. It joins the ranks of Todoist’s Quick Add (our existing natural-language task input) in providing yet another way to capture tasks that is the best in its category.
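
The pass-rate threshold and per-language monitoring mentioned above can be sketched as a small aggregation step. The threshold of 0.90, the regression tolerance of 0.05, and the language codes are assumptions chosen for the example; the source gives no concrete numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OVERALL_THRESHOLD = 0.90   # assumed value; the source gives no number


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Tuple[float, Dict[str, float]]:
    """Compute the overall pass rate and a per-language breakdown."""
    per_lang: Dict[str, List[bool]] = defaultdict(list)
    for language, passed in results:
        per_lang[language].append(passed)
    total = sum(len(v) for v in per_lang.values())
    if total == 0:
        raise ValueError("no results to aggregate")
    overall = sum(sum(v) for v in per_lang.values()) / total
    return overall, {lang: sum(v) / len(v) for lang, v in per_lang.items()}


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """List languages whose pass rate dropped by more than `tolerance`."""
    return sorted(
        lang for lang, rate in current.items()
        if lang in baseline and baseline[lang] - rate > tolerance
    )


# One (language, passed) pair per recorded scenario.
results = [("en", True), ("en", True), ("pt-BR", True), ("pt-BR", False), ("ja", True)]
overall, by_language = pass_rates(results)
suite_passes = overall >= OVERALL_THRESHOLD
dropped = regressions({"en": 1.0, "pt-BR": 0.9, "ja": 1.0}, by_language)
```

Tracking the per-language rates separately matters because a healthy overall number can hide a sharp drop in a single language with few recordings.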

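This item is filed under the `문서/데이터베이스` (docs/database) category. In production, this could be extended so that Hermes collects news candidates and then organizes the title, summary, source, tags, and related internal documents together. For now, the body is intentionally on the long side so the layout can be checked.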

## Why it is worth reading

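First, this story is more than a product announcement or a collection of links: it is likely to connect to at least one of developer experience, infrastructure operations, security policy, cloud cost, or how AI tools are used. When running a technology news site, what matters is less "what happened" and more "what does this mean for my own environment or learning path".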

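Second, this document is a sample for checking how categories and tags appear on the actual screen. The document tree on the left shows the directory structure as-is, and the latest-document cards on the home screen show the title and description. On the search page, the title, summary, and part of the body go into the SQLite FTS5 index, so the density of real search results can be checked as well.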
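
To make the indexing step concrete, here is a minimal sketch of an FTS5 index over title, summary, and body using Python's built-in `sqlite3` module. The table name `docs`, the column names, and the sample rows are assumptions for illustration; the site's real schema is not described in this document. The example also assumes the local SQLite build has the FTS5 extension enabled, which is common but not guaranteed.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")

# An FTS5 virtual table indexes every listed column for full-text search.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, summary, body)")
conn.executemany(
    "INSERT INTO docs (title, summary, body) VALUES (?, ?, ?)",
    [
        ("The Blueprint: Ramble", "Voice capture for Todoist",
         "Gemini Live API streams raw PCM audio."),
        ("Layout sample", "Checks cards and tags",
         "Long titles should wrap cleanly."),
    ],
)


def search(query: str) -> List[str]:
    """Return matching document titles, best match first."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]


audio_hits = search("audio")
layout_hits = search("titles")
```

Because all three columns are indexed, a query can match text that appears only in the body, which is what makes the result density worth checking with long sample documents.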

## Commentary from an operator's perspective

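In a Hermes-based automated publishing system, posts like this are generated periodically, but it is important not to simply translate the original. It is better to keep the link to the original and add context and practical takeaways so that Korean-speaking readers can make a judgment right away. For security news, for example, readers need "patch status", "affected configurations", and "commands to run on my server"; for AI tooling news, it is useful to cover "actual workflow changes", "cost structure", and "automation potential".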

## Site layout checkpoints

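- Check that long titles wrap without looking awkward in cards and in the body.
- Check that the left sidebar and the document header do not become cluttered when there are many tags.
- Check that the original link does not break the body width.
- Check that the table of contents on the right picks up the H2 sections correctly.
- Check that the body, document tree, and search box collapse naturally on mobile screens.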

## Possible follow-up articles

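If this story is judged important, it can be expanded into a separate in-depth document. For example, it could lead to a concept overview, a hands-on guide, an operations checklist, or a comparison of related tools under the `데이터베이스` (database) topic. In the long run, news-style documents like this can accumulate, with some of them promoted to chapters of a book or knowledge base.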
