From c29e8ca246923bcd9a87e3947f5173ce875fe34b Mon Sep 17 00:00:00 2001
From: jin <32600939+madjin@users.noreply.github.com>
Date: Sat, 4 May 2024 21:28:27 -0400
Subject: [PATCH] Update README.md

---
 README.md | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/README.md b/README.md
index 44bb40f..d698b47 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,78 @@ Archiving and transforming community documentation notes into a memex
 - How can new people see a high level overview of past 5 meetups?
 - How can we streamline the process for organizations to have a longer memory?
 
+---
+
+## Meeting Summarization
+
+Requirements:
+- https://github.com/omigroup/recordings (private data source of mp3 files)
+- https://github.com/Vaibhavs10/insanely-fast-whisper
+- https://ollama.com/library/dolphin-mistral
+
+Convert recordings to mp3 using FFmpeg or a similar tool
+`ffmpeg -i file.mp4 out.mp3`
+
+Transcribe meeting recordings with insanely-fast-whisper
+`for i in *.mp3; do insanely-fast-whisper --file-name "$i" --transcript-path "$(basename "$i" .mp3).json"; done`
+
+Get the plain text from each transcript (`-r` strips the JSON quoting)
+`jq -r '.text' transcript.json > transcript.txt`
+
+Download dolphin-mistral
+`ollama run dolphin-mistral`
+
+Create a modelfile
+```
+FROM dolphin-mistral
+PARAMETER temperature 0.1
+PARAMETER num_ctx 32000
+SYSTEM """
+You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF.
+You carefully provide accurate, factual, thoughtful, nuanced responses, and are brilliant at reasoning.
+
+Your task is summarizing transcripts.
+You summarize podcasts into bullet points, aiming for 10 or fewer depending on the length of the podcast.
+The total length of the summary should be less than 300 words.
+It is from a meeting between one or more people.
+
+Please break the transcript into sections, and give a name to each section. Within each section of the outline write a 1-2 sentence summary followed by bullet points.
+
+The first section is a 1 paragraph overall summary followed by key points from the transcript.
+
+The second section lists the action items discussed.
+
+The third section is a bullet point list outline based on the timeline of topics discussed.
+
+The final section is simply notes, which are freeform, succinct bullet points based on non-obvious information from the meeting.
+
+Only output the summary and bullet points per section which can include questions asked, any interesting quotes, and any action items that were discussed. Do not repeat the prompt back, or say anything extra.
+
+Do not include any preamble, introduction or postscript about what you are doing. Assume I know.
+
+The input prompt is text containing the transcript of the podcast.
+The output is markdown containing a title, high level short summary, and then each section.
+"""
+```
+
+Save the modelfile
+`ollama create dolphin-summary -f ./modelfile`
+
+Summarize each of the transcripts
+```bash
+#!/bin/bash
+
+prompt="I have a transcript from one of the Open Metaverse Interoperability (OMI) group meetings. Can you summarize it? Here is the transcript:"
+
+for file in backlog-refinement/* champions-meeting/* gltf-extensions/* weekly-meeting/*; do
+    output_dir="output/$(dirname "$file")"
+    mkdir -p "$output_dir"
+    ollama run dolphin-summary "$prompt $(cat "$file")" > "$output_dir/notes_$(basename "$file" | sed 's/\.[^.]*$//').txt"
+done
+```
+
+
+---
 
 ## Posters