§word2vec
A Word2Vec implementation in Rust supporting both Skip-gram and CBOW architectures with Negative Sampling.
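Both architectures share the same Negative Sampling update: the centre vector is pulled towards true context vectors and pushed away from sampled noise vectors. The sketch below is a minimal self-contained illustration of one Skip-gram step; the function names and update loop are illustrative, not this crate's internal API.

```rust
/// Logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// One Skip-gram step with Negative Sampling: the centre word's vector
/// is pulled towards its true context vector (label 1) and pushed away
/// from each sampled noise vector (label 0).
fn ns_update(center: &mut [f32], context: &mut [f32], noise: &mut [Vec<f32>], lr: f32) {
    let mut grad_center = vec![0.0f32; center.len()];
    // Positive pair: gradient of log sigma(center . context).
    let g = (1.0 - sigmoid(dot(center, context))) * lr;
    for i in 0..center.len() {
        grad_center[i] += g * context[i];
        context[i] += g * center[i];
    }
    // Negative pairs: gradient of log sigma(-center . noise_k).
    for n in noise.iter_mut() {
        let g = -sigmoid(dot(center, n)) * lr;
        for i in 0..center.len() {
            grad_center[i] += g * n[i];
            n[i] += g * center[i];
        }
    }
    // Apply the accumulated centre-vector gradient last, so every
    // output-side update above saw the same input vector.
    for i in 0..center.len() {
        center[i] += grad_center[i];
    }
}
```

CBOW differs only on the input side: the centre vector is replaced by the average of the context-window vectors, with the same positive/negative update applied to that average.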
§Architecture
- vocab — Vocabulary construction, subsampling, and unigram noise table
- model — Skip-gram and CBOW forward/backward pass
- trainer — Training loop with monitoring and checkpointing
- embeddings — Post-training embedding access, similarity, analogy
- config — Hyperparameter configuration
- error — Unified error type
- plot — Loss curves and 2D PCA projection plots
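The vocab module's subsampling and noise table follow the standard word2vec recipe: frequent words are randomly dropped with probability derived from their frequency, and negatives are drawn from a unigram distribution raised to the 0.75 power. The sketch below is an assumed, self-contained illustration of those two formulas, not code taken from the crate.

```rust
/// Probability of *keeping* a word during subsampling (Mikolov et al.):
/// p = sqrt(t / f) + t / f, clamped to 1.0, where f is the word's
/// relative frequency and t the subsampling threshold.
fn keep_prob(count: u64, total: u64, t: f64) -> f64 {
    let f = count as f64 / total as f64;
    ((t / f).sqrt() + t / f).min(1.0)
}

/// Unigram noise table for negative sampling: each word receives table
/// slots proportional to count^0.75, which flattens the raw frequency
/// distribution so rare words are drawn as negatives more often than
/// their raw counts would suggest.
fn noise_table(counts: &[u64], table_size: usize) -> Vec<usize> {
    let weights: Vec<f64> = counts.iter().map(|&c| (c as f64).powf(0.75)).collect();
    let total: f64 = weights.iter().sum();
    let mut table = Vec::with_capacity(table_size);
    for (word, w) in weights.iter().enumerate() {
        let slots = (w / total * table_size as f64).round() as usize;
        table.extend(std::iter::repeat(word).take(slots));
    }
    table
}
```

Sampling a negative is then a single uniform index into the table, which is why the table is precomputed once at vocabulary-build time.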
§Quick Start
```rust
use word2vec::{Config, ModelType, Trainer};

let config = Config {
    embedding_dim: 100,
    window_size: 5,
    negative_samples: 5,
    epochs: 5,
    model: ModelType::SkipGram,
    ..Config::default()
};

let corpus = vec![
    "the quick brown fox jumps over the lazy dog".to_string(),
];

let mut trainer = Trainer::new(config);
let embeddings = trainer.train(&corpus).unwrap();

let similar = embeddings.most_similar("fox", 5);
println!("{:?}", similar);
```

§Re-exports
- pub use config::Config;
- pub use config::ModelType;
- pub use embeddings::Embeddings;
- pub use error::Word2VecError;
- pub use trainer::Trainer;
§Modules
- config — Hyperparameter configuration for Word2Vec training.
- embeddings — Post-training embedding access: similarity, analogy, save/load.
- error — Unified error type for the word2vec crate.
- model — Neural network weights and forward/backward pass for Skip-gram and CBOW with Negative Sampling.
- plot — Training visualisation: loss curves and 2D PCA word projection plots.
- trainer — Training loop with progress monitoring, learning rate decay, and optional checkpointing.
- vocab — Vocabulary construction with frequency counting, subsampling, and the unigram noise distribution table for negative sampling.
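Similarity queries such as most_similar are conventionally ranked by cosine similarity over the learned vectors; the sketch below shows that metric in a self-contained form, as an assumed description of the behaviour rather than code from the crate's source.

```rust
/// Cosine similarity between two embedding vectors: the dot product
/// normalised by both vector lengths, giving a value in [-1, 1].
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```

Because cosine ignores vector magnitude, words with very different corpus frequencies can still rank as close neighbours, which is the usual reason it is preferred over raw dot product for nearest-neighbour queries.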