Crate word2vec

§word2vec

A Word2Vec implementation in Rust supporting both Skip-gram and CBOW architectures with Negative Sampling.
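
At its core, Negative Sampling replaces the full softmax with a handful of binary logistic updates per training pair. Below is a minimal sketch of that update for a single (center, context) pair; it follows the textbook word2vec formulation and is not the crate's internal model code (all names here are illustrative).

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// One Negative Sampling step: a positive (center, context) pair plus
/// `negatives` words drawn from the noise distribution.
fn ns_update(
    input: &mut [Vec<f32>],   // center-word ("input") vectors, one row per word
    output: &mut [Vec<f32>],  // context-word ("output") vectors, one row per word
    center: usize,
    context: usize,
    negatives: &[usize],
    lr: f32,
) {
    let dim = input[center].len();
    let mut grad_center = vec![0.0f32; dim];

    // The positive example has label 1.0; each sampled negative has label 0.0.
    let targets = std::iter::once((context, 1.0f32))
        .chain(negatives.iter().map(|&w| (w, 0.0f32)));

    for (word, label) in targets {
        let dot: f32 = (0..dim).map(|i| input[center][i] * output[word][i]).sum();
        let g = (sigmoid(dot) - label) * lr; // gradient of the log-sigmoid loss
        for i in 0..dim {
            grad_center[i] += g * output[word][i];
            output[word][i] -= g * input[center][i];
        }
    }

    // Apply the accumulated gradient to the center-word vector once at the end.
    for i in 0..dim {
        input[center][i] -= grad_center[i];
    }
}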

§Architecture

  • vocab — Vocabulary construction, subsampling, and unigram noise table (see the sketch after this list)
  • model — Skip-gram and CBOW forward/backward pass
  • trainer — Training loop with monitoring and checkpointing
  • embeddings — Post-training embedding access, similarity, analogy
  • config — Hyperparameter configuration
  • error — Unified error type
  • plot — Loss curves and 2D PCA projection plots
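
As a rough illustration of what the vocab module's noise table and subsampling involve, here is a sketch of the standard word2vec recipes: the noise distribution raises each word's count to the power 0.75, and subsampling keeps a frequent word with probability sqrt(t/f) + t/f. The function names below are illustrative, not the crate's API.

/// Build a sampling table in which word index `i` appears in proportion
/// to counts[i]^0.75, the usual word2vec noise distribution.
fn build_noise_table(counts: &[u64], table_size: usize) -> Vec<usize> {
    let total: f64 = counts.iter().map(|&c| (c as f64).powf(0.75)).sum();
    let mut table = Vec::with_capacity(table_size);
    let mut cumulative = 0.0f64;
    for (idx, &c) in counts.iter().enumerate() {
        cumulative += (c as f64).powf(0.75) / total;
        // Fill the table up to this word's running cumulative share.
        let fill_to = ((cumulative * table_size as f64) as usize).min(table_size);
        while table.len() < fill_to {
            table.push(idx);
        }
    }
    table
}

/// Probability of keeping a word during subsampling, where `word_freq` is the
/// word's relative frequency and `t` is the threshold (commonly 1e-3 to 1e-5).
fn keep_probability(word_freq: f64, t: f64) -> f64 {
    ((t / word_freq).sqrt() + t / word_freq).min(1.0)
}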

§Quick Start

use word2vec::{Config, ModelType, Trainer};

let config = Config {
    embedding_dim: 100,
    window_size: 5,
    negative_samples: 5,
    epochs: 5,
    model: ModelType::SkipGram,
    ..Config::default()
};

let corpus = vec![
    "the quick brown fox jumps over the lazy dog".to_string(),
];

let mut trainer = Trainer::new(config);
let embeddings = trainer.train(&corpus).unwrap();

let similar = embeddings.most_similar("fox", 5);
println!("{:?}", similar);
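
The ranking returned by most_similar is, in typical word2vec implementations, based on cosine similarity between embedding vectors; a minimal sketch of that measure (not necessarily this crate's exact implementation):

/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}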

Re-exports§

pub use config::Config;
pub use config::ModelType;
pub use embeddings::Embeddings;
pub use error::Word2VecError;
pub use trainer::Trainer;

Modules§

config
Hyperparameter configuration for Word2Vec training.
embeddings
Post-training embedding access: similarity, analogy, save/load.
error
Unified error type for the word2vec crate.
model
Neural network weights and forward/backward pass for Skip-gram and CBOW with Negative Sampling.
plot
Training visualisation: loss curves and 2D PCA word projection plots.
trainer
Training loop with progress monitoring, learning rate decay, and optional checkpointing.
vocab
Vocabulary construction with frequency counting, subsampling, and the unigram noise distribution table for negative sampling.