Update README.md
madjin authored May 5, 2024
1 parent 570fcb1 commit c29e8ca
Showing 1 changed file with 72 additions and 0 deletions.
@@ -22,6 +22,78 @@ Archiving and transforming community documentation notes into a memex
- How can new people see a high level overview of past 5 meetups?
- How can we streamline the process for organizations to have a longer memory?

---

## Meeting Summarization

Requirements:
- https://github.com/omigroup/recordings (private data source of mp3 files)
- https://github.com/Vaibhavs10/insanely-fast-whisper
- https://ollama.com/library/dolphin-mistral

Convert video recordings to mp3 with FFmpeg (or any equivalent tool)
`ffmpeg -i file.mp4 out.mp3`
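
When there are several recordings, the conversion can be looped. This is a minimal sketch, assuming the recordings are `.mp4` files in the current directory; the helper names `mp3_name_for` and `convert_all` are illustrative and not part of FFmpeg or of the original workflow.

```shell
#!/bin/bash

# Derive the output name for a recording: meeting.mp4 -> meeting.mp3
# (hypothetical helper, named here for illustration only).
mp3_name_for() {
  printf '%s.mp3\n' "${1%.*}"
}

# Convert every .mp4 in the current directory, skipping files
# that already have a matching .mp3.
convert_all() {
  local f out
  for f in *.mp4; do
    [ -e "$f" ] || continue        # the glob matched nothing
    out="$(mp3_name_for "$f")"
    [ -e "$out" ] && continue      # already converted
    ffmpeg -i "$f" "$out"
  done
}
```

Source the file and call `convert_all` from the directory that holds the recordings.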

Transcribe the meeting recordings with insanely-fast-whisper
`for i in *.mp3; do insanely-fast-whisper --file-name "$i" --transcript-path "$(basename "$i" .mp3).json"; done`

Get the text from each transcript (`-r` prints the raw string instead of a JSON-quoted one)
`jq -r '.text' transcript.json > transcript.txt`
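
The transcription step writes one JSON file per recording, so the extraction can be looped as well. A minimal sketch, assuming every `.json` file in the current directory is a transcript with a top-level `text` field (as the command above implies); `txt_name_for` and `extract_all` are illustrative names. The `-r` flag makes `jq` print the raw string rather than a JSON-quoted one.

```shell
#!/bin/bash

# Map a transcript's JSON name to its text name: a.json -> a.txt
# (hypothetical helper, named here for illustration only).
txt_name_for() {
  printf '%s.txt\n' "${1%.json}"
}

# Write the .text field of every JSON transcript to a matching .txt file.
extract_all() {
  local f
  for f in *.json; do
    [ -e "$f" ] || continue        # the glob matched nothing
    jq -r '.text' "$f" > "$(txt_name_for "$f")"
  done
}
```

Source the file and call `extract_all` from the directory that holds the JSON transcripts.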

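Download dolphin-mistral (`ollama run` pulls the model on first use and then opens an interactive prompt, which you can leave with `/bye`; `ollama pull dolphin-mistral` downloads it without starting a session)
`ollama run dolphin-mistral`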

Create a file named `modelfile` with the following contents
```
FROM dolphin-mistral
PARAMETER temperature 0.1
PARAMETER num_ctx 32000
SYSTEM """
You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF.
You carefully provide accurate, factual, thoughtful, nuanced responses, and are brilliant at reasoning.
Your task is summarizing transcripts.
You summarize podcasts into bullet points, aiming for 10 or fewer depending on the length of the podcast.
The total length of the summary should be less than 300 words.
It is from a meeting between one or more people.
Please break the transcript into sections, and give a name to each section. Within each section of the outline write a 1-2 sentence summary followed by bullet points.
The first section is a 1 paragraph overall summary followed by key points from the transcript.
The second section is the action items discussed.
The third section is a bullet point list outline based on the timeline of topics discussed.
The final section is simply notes, which are freeform, succinct bullet points based on non-obvious information from the meeting.
Only output the summary and bullet points per section which can include questions asked, any interesting quotes, and any action items that were discussed. Do not repeat the prompt back, or say anything extra.
Do not include any preamble, introduction or postscript about what you are doing. Assume I know.
The input prompt is text containing the transcript of the podcast.
The output is markdown containing a title, a short high-level summary, and then each section.
"""
```

Build the `dolphin-summary` model from the modelfile
`ollama create dolphin-summary -f ./modelfile`

Summarize each of the transcripts
```bash
#!/bin/bash

prompt="I have a transcript from one of the Open Metaverse Interoperability (OMI) group meetings. Can you summarize it? Here is the transcript:"

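# Each directory holds the transcript text files for one meeting series.
for file in backlog-refinement/* champions-meeting/* gltf-extensions/* weekly-meeting/*; do
  [ -f "$file" ] || continue   # skip unmatched globs and subdirectories

  # Mirror the input directory structure under output/.
  output_dir="output/$(dirname "$file")"
  mkdir -p "$output_dir"

  # Strip the file extension and write the summary to notes_<name>.txt.
  name="$(basename "$file")"
  ollama run dolphin-summary "$prompt $(cat "$file")" > "$output_dir/notes_${name%.*}.txt"
done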
```
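
Long meetings can produce transcripts larger than the `num_ctx 32000` window set in the modelfile, in which case the model may not see the whole transcript. The following is a rough pre-check, assuming about 4 characters per token, which is a common rule of thumb for English text and not an exact count for this model's tokenizer. The names `estimate_tokens` and `fits_context` and the 2000-token reserve are illustrative choices, not part of Ollama.

```shell
#!/bin/bash

NUM_CTX=32000   # matches PARAMETER num_ctx in the modelfile
RESERVE=2000    # assumed headroom for the system prompt and the summary

# Rough token estimate: byte count divided by 4 (heuristic, see above).
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( bytes / 4 ))
}

# Succeeds when the estimated token count leaves the reserve free.
fits_context() {
  [ "$(estimate_tokens "$1")" -le $(( NUM_CTX - RESERVE )) ]
}
```

For example, `for f in weekly-meeting/*; do fits_context "$f" || echo "may exceed context: $f"; done` lists the transcripts worth splitting or checking by hand before summarizing.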


---

## Posters
