Silent Speech Recognition

The recognition model performs direct EMG-to-text transcription using CTC loss and a KenLM-backed beam search decoder.

Core API

Main Script

The recognition_model.py script manages the training loop and Word Error Rate (WER) evaluation using character-based CTC.
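WER is the word-level edit distance between reference and hypothesis transcripts, normalized by reference length. A self-contained sketch of the metric (not the script's actual implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(round(wer("the cat sat", "the cat sit"), 3))  # prints 0.333
```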

Lexicon & LM Setup

Utilities for preparing the decoding environment.

download_kenlm

Downloads pre-trained KenLM 4-gram language model files (lm.bin, lexicon.txt, tokens.txt). Uses torchaudio utility with a fallback to direct requests for systems with SSL issues.

Parameters:

  • output_dir (str) –

    Directory where files will be saved.

Returns:

  • None –

    Files are saved to the specified output directory.

Source code in get_lexicon.py
def download_kenlm(output_dir: str) -> None:
    """
    Downloads pre-trained KenLM 4-gram language model files (lm.bin, lexicon.txt, tokens.txt).
    Uses torchaudio utility with a fallback to direct requests for systems with SSL issues.

    Args:
        output_dir: Directory where files will be saved.

    Returns:
        None. Files are saved to the specified output directory.
    """
    os.makedirs(output_dir, exist_ok=True)
    original_cwd = os.getcwd()
    os.chdir(output_dir)

    logging.info(f"Downloading KenLM files to {output_dir}...")
    try:
        download_pretrained_files("librispeech-4-gram")
    except Exception as e:
        logging.warning(f"Torchaudio download failed ({e}), attempting manual download...")
        files = {
            "lm.bin": "https://download.pytorch.org/torchaudio/decoder-assets/librispeech-4-gram/lm.bin",
            "lexicon.txt": "https://download.pytorch.org/torchaudio/decoder-assets/librispeech-4-gram/lexicon.txt",
            "tokens.txt": "https://download.pytorch.org/torchaudio/decoder-assets/librispeech-4-gram/tokens.txt",
        }
        for filename, url in files.items():
            if not os.path.exists(filename):
                logging.info(f"Downloading {filename}...")
                response = requests.get(url, stream=True, timeout=60)
                response.raise_for_status()
                with open(filename, "wb") as f:
                    f.write(response.content)

    os.chdir(original_cwd)

get_lexicon

Generates a lexicon file where each word is mapped to its character sequence. Format: word c h a r s |

Parameters:

  • vocab (set) –

    Set of unique words.

  • output_file (str) –

    Path to save the lexicon.

Returns:

  • None –

    Writes the lexicon to the specified output file.

Source code in get_lexicon.py
def get_lexicon(vocab: set, output_file: str) -> None:
    """
    Generates a lexicon file where each word is mapped to its character sequence.
    Format: word c h a r s |

    Args:
        vocab: Set of unique words.
        output_file: Path to save the lexicon.

    Returns:
        None. Writes the lexicon to the specified output file.
    """
    with open(output_file, "w", encoding="utf-8") as fout:
        for word in sorted(list(vocab)):
            # split word into char tokens
            chars = list(word)
            fout.write(f"{word} " + " ".join(chars) + " |\n")
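The lexicon format can be checked without touching disk; `write_lexicon` below mirrors the logic of `get_lexicon` but accepts any file-like object (a sketch, not the module's function):

```python
import io

def write_lexicon(vocab: set, fout) -> None:
    # Mirrors get_lexicon: one line per word, characters space-separated,
    # terminated by the word-boundary token "|".
    for word in sorted(vocab):
        fout.write(f"{word} " + " ".join(word) + " |\n")

buf = io.StringIO()
write_lexicon({"emg", "text"}, buf)
print(buf.getvalue())
# emg e m g |
# text t e x t |
```

The trailing "|" marks a word boundary for the character-level decoder, letting the beam search know where one lexicon entry ends and the next may begin.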

get_unigram

Extracts all unique words (unigrams) from the dataset's text samples.

Parameters:

  • dataset (H5EmgDataset) –

    The H5EmgDataset to process.

Returns:

  • set –

    A set of unique lowercase words.

Source code in get_lexicon.py
def get_unigram(dataset: H5EmgDataset) -> set:
    """
    Extracts all unique words (unigrams) from the dataset's text samples.

    Args:
        dataset: The H5EmgDataset to process.

    Returns:
        A set of unique lowercase words.
    """
    unigram = set()
    for example in tqdm.tqdm(dataset, "Building unigram", leave=False):
        clean = transform.clean_text(example["text"])
        for w in clean.split():
            unigram.add(w)
    return unigram
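The two utilities combine into a simple vocabulary pipeline: extract unigrams from the dataset, then write them out as a lexicon. A toy sketch, with a plain list standing in for H5EmgDataset and `str.lower` standing in for `transform.clean_text`:

```python
# End-to-end sketch of the lexicon pipeline on toy data. The real code
# iterates an H5EmgDataset and cleans text with transform.clean_text;
# a plain list and str.lower stand in for both here.
samples = [{"text": "Hello world"}, {"text": "hello EMG"}]

unigram = set()
for example in samples:
    clean = example["text"].lower()  # stand-in for transform.clean_text
    unigram.update(clean.split())

print(sorted(unigram))  # prints ['emg', 'hello', 'world']
```

The resulting set would then be passed to `get_lexicon` to produce the lexicon file the beam-search decoder consumes.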