Deep Speech import

Loading TSV file:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/train.tsv
Saving new DeepSpeech-formatted CSV file to:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/clips/train.csv
Importing mp3 files...
Progress |##########################################################################################################| 100% completed
Writing CSV file for DeepSpeech.py as:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/clips/train.csv
Progress |##########################################################################################################| 100% completed
Imported 50975 samples.
Skipped 278 samples that failed on transcript validation.
Skipped 32 samples that were too short to match the transcript.
Skipped 254 samples that were longer than 10 seconds.
Final amount of imported audio: 52:28:24.
Loading TSV file:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/test.tsv
Saving new DeepSpeech-formatted CSV file to:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/clips/test.csv
Importing mp3 files...
Progress |######################################################################################################### |  99% completed
Writing CSV file for DeepSpeech.py as:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/clips/test.csv
Progress |##########################################################################################################| 100% completed
Imported 6374 samples.
Skipped 23 samples that failed on transcript validation.
Skipped 2 samples that were too short to match the transcript.
Skipped 31 samples that were longer than 10 seconds.
Final amount of imported audio: 6:34:09.
Loading TSV file:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/dev.tsv
Saving new DeepSpeech-formatted CSV file to:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/clips/dev.csv
Importing mp3 files...
Progress |######################################################################################################### |  99% completed
Writing CSV file for DeepSpeech.py as:  /home/kenneth/Projects/JSALT_NPLM_data/Speech/Deep_Speech/cro/clips/dev.csv
Progress |##########################################################################################################| 100% completed
Imported 6371 samples.
Skipped 36 samples that failed on transcript validation.
Skipped 2 samples that were too short to match the transcript.
Skipped 34 samples that were longer than 10 seconds.
Final amount of imported audio: 6:34:15.
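
To get combined figures for the three splits, the per-split numbers from the log above can be added up. This is a small sketch that assumes the "Final amount of imported audio" values are in H:MM:SS form; the helper names are made up for illustration and are not part of the importer.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an 'H:MM:SS' string (hours may exceed 24) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta back into 'H:MM:SS' without rolling over into days."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# (imported samples, imported audio) copied from the importer log
splits = {
    "train": (50975, "52:28:24"),
    "test": (6374, "6:34:09"),
    "dev": (6371, "6:34:15"),
}

total_samples = sum(count for count, _ in splits.values())
total_audio = sum((parse_hms(audio) for _, audio in splits.values()), timedelta())

print(f"Total imported samples: {total_samples}")
print(f"Total imported audio:   {format_hms(total_audio)}")
```

With the values above this gives 63720 samples and 65:36:48 of audio across train, test and dev.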

Unique characters to add to alphabet

['b', 'm', 'f', ' ', 'e', 'r', 'ú', 'l', 'd', '\uf009', 'k', 'c', 'g', 'i', 'a', 'h', 'p', 'u', 'o', 'w', 'á', 't', 's', 'n', 'x']
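
A list like the one above can be collected from the generated CSV files. The following is a minimal sketch, assuming the CSVs have a `transcript` column (the DeepSpeech CSV layout is `wav_filename`, `wav_filesize`, `transcript`); the function names are hypothetical and this is not necessarily how the list above was produced.

```python
import csv
import sys


def unique_characters(transcripts):
    """Return the sorted set of characters seen across all transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    return sorted(chars)


def read_transcripts(csv_path, column="transcript"):
    """Yield the transcript column of a DeepSpeech-style CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row[column]


if __name__ == "__main__":
    # Usage: python alphabet_chars.py train.csv dev.csv test.csv
    found = unique_characters(
        text for path in sys.argv[1:] for text in read_transcripts(path)
    )
    # repr() makes invisible or unusual code points easy to spot
    print([ch for ch in found])
```

Printing the list with escapes visible helps catch entries such as '\uf009', which is a Private Use Area code point and may be worth checking in the source transcripts before it goes into the alphabet.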

The KenLM model still needs to be built. Use the data Fran sent, and see https://github.com/mozilla/DeepSpeech/issues/1411 for reference.
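
As a starting point, the KenLM build could be scripted roughly as follows. This is only a sketch: it assumes the KenLM binaries `lmplz` and `build_binary` are on the PATH and that the text data has been prepared as one sentence per line. The file names, the n-gram order and the function names are placeholders, and any extra options DeepSpeech expects for its language model are not covered here.

```python
import shutil
import subprocess


def kenlm_commands(corpus, arpa, binary, order=5):
    """Build the two KenLM command lines: estimate the ARPA model, then binarize it."""
    return [
        ["lmplz", "-o", str(order), "--text", corpus, "--arpa", arpa],
        ["build_binary", arpa, binary],
    ]


def build_kenlm(corpus, arpa, binary, order=5):
    """Run both steps, stopping with an error if a tool is missing or fails."""
    for command in kenlm_commands(corpus, arpa, binary, order):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} not found on PATH")
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder file names; replace with the actual corpus and output paths.
    build_kenlm("corpus.txt", "lm.arpa", "lm.binary")
```

The corpus text should be normalized the same way as the transcripts (same casing and character set) so the language model and the alphabet agree.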
