Azure AI findings

  • Quality of video needs to be improved
  • Cannot analyze videos over 100MB
  • Compared results between Video Indexer and AI Studio with vision
  • Vision Studio placed incorrect timestamps on certain behaviors (e.g., pulling hair or finger in the mouth)
  • Vision Studio identified 5/60 correct instances of requested behavior (video tested: 20150217_165854.mp4)
  • Playground didn't load videos after 25+ min (video tested: 20230613_155442.mp4)
  • Vision Studio continuously added duplicate timestamps to all videos tested
  • Playground reports extended/continuous actions (e.g., holding head for 5 seconds) as repeated actions (e.g., "The person touches their head at the following timestamps: 00:00:31, 00:00:33, and 00:00:35")
  • Playground responds with an accurate summary of the emotional and physical actions captured in the input. This would be essential for post-session summaries.
  • Results are better when more details are entered in the system messages
  • AI Studio has access only to the transcript, so non-speech audio (e.g., guitar strumming) is not picked up
  • Even when given the definition of strumming, Drew's body movement while playing the guitar, and the hand movement cues that signal strumming, AI Studio could not identify the time span during which Drew was playing the guitar.
  • Only a single video can be used at a time. To use another video, we must clear the chat first; otherwise the upload button is grayed out.
  • Inputs (e.g., videos) with the subject centered in the frame (not taken from an angle or side view) lead to more accurate results, especially when watching for physical or behavioural actions.
  • When asked to provide the transcript, AI Studio could only provide a snippet. I asked for the full transcript of the video from 00:00:00-00:02:39, and it returned only this: "Has different stuff on it, like pizza, wraps, sandwiches. You can order three things". The video I used to test is 10-11 purchase #3.mp4
  • With a transcript provided, the AI got the questions correct.
  • AI Studio was able to recognize stimming with a definition in the system message. The system message I included is in the sample system messages file.
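
The duplicate timestamps and the splitting of continuous actions into repeated ones noted above could be cleaned up after the fact. The following is a minimal sketch, not part of any Azure tooling: it assumes timestamps arrive as `HH:MM:SS` strings, and the function names and the `max_gap` threshold of 2 seconds are hypothetical choices made for illustration.

```python
def parse_ts(ts: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in ts.split(":"))
    return h * 3600 + m * 60 + s


def format_ts(seconds: int) -> str:
    """Convert a number of seconds back to 'HH:MM:SS'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def merge_timestamps(timestamps, max_gap=2):
    """Drop duplicate timestamps and merge nearby ones into intervals.

    max_gap is an assumed threshold (in seconds): timestamps no more
    than max_gap apart are treated as one continuous action.
    """
    # A set removes exact duplicates; sorting puts them in time order.
    seconds = sorted({parse_ts(t) for t in timestamps})
    intervals = []
    for sec in seconds:
        if intervals and sec - intervals[-1][1] <= max_gap:
            intervals[-1][1] = sec  # extend the current interval
        else:
            intervals.append([sec, sec])  # start a new interval
    return [(format_ts(start), format_ts(end)) for start, end in intervals]


# Example input modeled on the head-touching response quoted above,
# with one duplicate and one unrelated later timestamp added.
reported = ["00:00:31", "00:00:33", "00:00:33", "00:00:35", "00:01:10"]
print(merge_timestamps(reported))
```

With this input, the three head-touching timestamps collapse into a single interval from 00:00:31 to 00:00:35, and the later timestamp stays separate. A suitable `max_gap` would need to be tuned per behavior, since a larger value risks merging actions that really are distinct.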