User:Ruben/Prototyping/Sound and Voice
A project using voice recognition ([http://cmusphinx.sourceforge.net/ Pocketsphinx]) with Python. <ref name='tutorial'>https://mattze96.safe-ws.de/blog/?p=640</ref>
This script has gone through many iterations.
The first version only extracted the spoken segments.
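The original script is not shown here. As a minimal sketch, "extracting the spoken pieces" could look like the code below, assuming the recognizer's output has already been turned into (start, end) times in seconds. The helper names `merge_segments` and `ffmpeg_cut_commands`, the 0.5 s gap threshold and the output file names are hypothetical; the ffmpeg commands are only built, not run.

```python
"""Sketch: turn detected speech segments into ffmpeg cut commands.

Assumption (not from the original script): speech has already been
detected and is given as (start, end) pairs in seconds.
"""


def merge_segments(segments, max_gap=0.5):
    """Merge segments separated by less than max_gap seconds (hypothetical helper)."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous segment: extend it instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ffmpeg_cut_commands(source, segments, prefix="speech"):
    """Build one ffmpeg command per segment (-vn: drop video, -ss: start, -t: duration)."""
    commands = []
    for i, (start, end) in enumerate(segments):
        commands.append([
            "ffmpeg", "-i", source, "-vn",
            "-ss", f"{start:.2f}", "-t", f"{end - start:.2f}",
            f"{prefix}_{i:03d}.wav",
        ])
    return commands


if __name__ == "__main__":
    detected = [(0.0, 1.0), (1.2, 2.0), (5.0, 6.5)]
    for cmd in ffmpeg_cut_commands("film.mp4", merge_segments(detected)):
        # Pass cmd to subprocess.run(cmd, check=True) to actually cut the audio.
        print(" ".join(cmd))
```

Merging nearby segments first avoids producing many tiny clips when the recognizer briefly loses track in the middle of a sentence.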
[[File:VoiceDetection1.png|200px|thumbnail|right|ugly graph of the second version]]
The second version created an ugly graph showing how much was spoken in each part of a film, according to the speech recognition (which often detects speech where there is none).
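One way to compute "how much was spoken in each part of a film" is to split the film into fixed-size bins and sum the speech that overlaps each bin. This is a hedged sketch, not the original code: the 60-second bin size, the function names and the text bar chart (standing in for a real plot) are assumptions, and the segments are again assumed to be (start, end) pairs in seconds.

```python
"""Sketch: seconds of detected speech per part (bin) of a film."""
import math


def speech_per_bin(segments, film_length, bin_size=60.0):
    """Return the number of seconds of detected speech in each bin."""
    n_bins = math.ceil(film_length / bin_size)
    totals = [0.0] * n_bins
    for start, end in segments:
        first = int(start // bin_size)
        last = min(int(end // bin_size), n_bins - 1)
        for b in range(first, last + 1):
            bin_start, bin_end = b * bin_size, (b + 1) * bin_size
            # Only the part of the segment that overlaps this bin counts.
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                totals[b] += overlap
    return totals


def text_graph(totals, bin_size=60.0, width=40):
    """Render the totals as a crude text bar chart, one line per bin."""
    lines = []
    for b, seconds in enumerate(totals):
        bar = "#" * round(width * seconds / bin_size)
        lines.append(f"{b * bin_size / 60:5.0f} min |{bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    totals = speech_per_bin([(10, 70), (130, 150)], film_length=180)
    print(text_graph(totals))
```

Because a segment can span a bin boundary, it is split between bins rather than counted entirely in the bin where it starts.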
[[File:VoiceDetection2.png|200px|thumbnail|right|A third version]]
The third version detected spoken language (speech) using Pocketsphinx. It then used ffmpeg and ImageMagick to extract frames from the film and append them into a single image. Wherever speech was detected, this image was overlaid with a black gradient, so as to 'hide' that part of the image.
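A hedged sketch of the third version's pipeline follows. It only builds the ffmpeg and ImageMagick commands and computes how dark the black overlay should be for each frame; drawing the overlay itself is left out. The frame interval, file names, function names and the linear fade of `fade` seconds around each speech segment are all assumptions, not details from the original script.

```python
"""Sketch of the third version's pipeline, under stated assumptions.

Assumptions (hypothetical, not taken from the original script):
- one frame is extracted every `interval` seconds,
- speech segments are (start, end) pairs in seconds,
- the black overlay is fully opaque during speech and fades out linearly
  over `fade` seconds on either side of a segment.
"""


def extract_frames_command(film, interval=10):
    # ffmpeg's fps filter keeps roughly one frame per `interval` seconds.
    # The image2 muxer numbers files from 1: frame_0001.png, frame_0002.png, ...
    return ["ffmpeg", "-i", film, "-vf", f"fps=1/{interval}", "frame_%04d.png"]


def append_command(frame_files, output="strip.png"):
    # ImageMagick's +append joins the images left to right into one image.
    # (ImageMagick 7 prefers "magick" over the legacy "convert" command.)
    return ["convert", *frame_files, "+append", output]


def overlay_opacity(t, segments, fade=5.0):
    """Opacity (0.0 to 1.0) of the black overlay at time t."""
    opacity = 0.0
    for start, end in segments:
        if start <= t <= end:
            return 1.0
        # Distance to the nearest edge of this segment.
        distance = start - t if t < start else t - end
        opacity = max(opacity, 1.0 - distance / fade)
    return opacity


def frame_opacities(n_frames, segments, interval=10, fade=5.0):
    # The i-th frame (0-based) is assumed to show the film at time i * interval.
    return [overlay_opacity(i * interval, segments, fade) for i in range(n_frames)]


if __name__ == "__main__":
    print(" ".join(extract_frames_command("film.mp4")))
    print(frame_opacities(5, [(20, 30)], interval=10, fade=20.0))
```

Each opacity value could then be used to darken the corresponding frame's column in the appended image, which produces the gradient effect around spoken passages.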
<references></references>
Revision as of 00:03, 15 January 2015