AI-Powered FAQ Generator from Video Transcripts

Building an automated FAQ generation system using Google Cloud Speech-to-Text and Vertex AI to extract Q&A from video content.

The Problem

You have hours of video content but users can't find specific information without watching everything. Let's build an automated system that generates FAQs from videos using AI.

Why Google Cloud?

What you need:

Convert video audio to text
Extract meaningful questions and answers
Do it automatically at scale

The stack:

Google Speech-to-Text for transcription
Vertex AI (Gemini) for FAQ generation
Cloud Run for the API
Cloud Storage for files

Flow:

Video → Extract Audio → Transcribe → AI Analysis → Generate FAQs

Step 1: Set Up Google Cloud

Enable the APIs you need:

gcloud services enable speech.googleapis.com
gcloud services enable aiplatform.googleapis.com
gcloud services enable storage.googleapis.com
gcloud services enable run.googleapis.com

Create storage buckets:

# For uploaded videos
gsutil mb gs://your-videos-bucket
 
# For audio files
gsutil mb gs://your-audio-bucket

Step 2: Extract Audio from Video

Use FFmpeg to extract audio:

# extract_audio.py
import subprocess
from google.cloud import storage
 
def extract_audio(video_path, output_path):
    """Extract audio from video and convert to WAV."""
    subprocess.run([
        'ffmpeg',
        '-i', video_path,
        '-acodec', 'pcm_s16le',
        '-ac', '1',  # mono
        '-ar', '16000',  # 16kHz
        output_path
    ], check=True)
 
def process_video(video_uri, bucket_name):
    # Download video
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    
    video_blob = bucket.blob(video_uri)
    video_blob.download_to_filename('/tmp/video.mp4')
    
    # Extract audio
    extract_audio('/tmp/video.mp4', '/tmp/audio.wav')
    
    # Upload audio
    audio_blob = bucket.blob(f'audio/{video_uri}.wav')
    audio_blob.upload_from_filename('/tmp/audio.wav')
    
    return f'gs://{bucket_name}/audio/{video_uri}.wav'

For long videos, split into chunks to avoid API limits.

Step 3: Transcribe with Speech-to-Text

Use Google Speech-to-Text API:

# transcribe.py
from google.cloud import speech_v1p1beta1 as speech
 
def transcribe_audio(audio_uri):
    """Transcribe audio file from Cloud Storage."""
    client = speech.SpeechClient()
    
    audio = speech.RecognitionAudio(uri=audio_uri)
    
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US',
        enable_automatic_punctuation=True,
        enable_word_time_offsets=True,
        model='video',  # Optimized for video
    )
    
    # Long-running operation for files > 1 min
    operation = client.long_running_recognize(
        config=config,
        audio=audio
    )
    
    print('Waiting for transcription...')
    response = operation.result(timeout=600)
    
    # Extract transcript
    transcript = ''
    for result in response.results:
        transcript += result.alternatives[0].transcript + ' '
    
    return transcript.strip()

Key settings:

Use model='video' for better accuracy on video audio
Enable punctuation for readable text
Word timestamps help link FAQs to video moments

Step 4: Generate FAQs with Vertex AI

Use Vertex AI to analyze the transcript and generate FAQs:

# generate_faqs.py
from vertexai.preview.generative_models import GenerativeModel
import vertexai
 
def generate_faqs(transcript):
    """Generate FAQs from transcript using Gemini."""
    
    vertexai.init(project='your-project', location='us-central1')
    model = GenerativeModel('gemini-pro')
    
    prompt = f"""
You are an expert at analyzing video transcripts and generating helpful FAQs.
 
Analyze this video transcript and generate 8-12 frequently asked questions with clear answers.
 
Requirements:
- Questions should be what users would actually ask
- Answers should be comprehensive but concise
- Use information only from the transcript
- Format as JSON array
 
Transcript:
{transcript}
 
Output format:
[
  {{
    "question": "What is...?",
    "answer": "...",
    "category": "general|technical|howto"
  }}
]
"""
    
    response = model.generate_content(prompt)
    return response.text
 
# Usage
transcript = transcribe_audio('gs://bucket/audio.wav')
faqs = generate_faqs(transcript)
print(faqs)

Prompt tips:

Be specific about what you want
Provide the output format
Set clear constraints (8-12 questions)
Ask for natural, user-focused questions

Step 5: Build the API

Create a simple API with FastAPI:

# main.py
from fastapi import FastAPI, UploadFile, BackgroundTasks
from google.cloud import storage
import uuid
 
app = FastAPI()
 
storage_client = storage.Client()
 
def process_video_task(video_id: str, video_path: str):
    """Background task to process video."""
    # Extract audio
    audio_uri = extract_audio(video_path, 'your-bucket')
    
    # Transcribe
    transcript = transcribe_audio(audio_uri)
    
    # Generate FAQs
    faqs = generate_faqs(transcript)
    
    # Save results (to Firestore, DB, etc.)
    save_faqs(video_id, faqs)
 
@app.post("/process-video")
async def process_video(
    file: UploadFile,
    background_tasks: BackgroundTasks
):
    """Upload video and start processing."""
    video_id = str(uuid.uuid4())
    
    # Upload to Cloud Storage
    bucket = storage_client.bucket('your-videos-bucket')
    blob = bucket.blob(f'{video_id}/{file.filename}')
    blob.upload_from_file(file.file)
    
    # Start background processing
    background_tasks.add_task(
        process_video_task,
        video_id,
        f'gs://your-videos-bucket/{video_id}/{file.filename}'
    )
    
    return {"job_id": video_id, "status": "processing"}
 
@app.get("/faqs/{video_id}")
async def get_faqs(video_id: str):
    """Get generated FAQs for a video."""
    faqs = load_faqs(video_id)
    return {"video_id": video_id, "faqs": faqs}

Deploy to Cloud Run:

# Build container
gcloud builds submit --tag gcr.io/your-project/faq-generator
 
# Deploy
gcloud run deploy faq-generator \
  --image gcr.io/your-project/faq-generator \
  --platform managed \
  --region us-central1 \
  --memory 2Gi \
  --timeout 900

Step 6: Optimize and Monitor

Key optimizations:

1. Chunk Long Videos

# Split audio into 10-minute chunks
def chunk_audio(audio_path, chunk_length=600):
    """Split audio into manageable chunks."""
    # Use pydub or ffmpeg to split
    # Process chunks in parallel
    # Merge transcripts

2. Cache Results

Cache transcripts to avoid re-processing
Store generated FAQs
Reuse for similar videos

3. Monitor Costs

# Track API costs
costs = {
    'speech_to_text': len(audio_minutes) * 0.024,
    'vertex_ai': len(transcript_chars) / 1000 * 0.0005,
}

4. Add Error Handling

from tenacity import retry, stop_after_attempt, wait_exponential
 
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def transcribe_with_retry(audio_uri):
    return transcribe_audio(audio_uri)