

MeoMaya Documentation
1. Introduction
MeoMaya is a lightweight, high-performance Natural Language Processing (NLP) framework built entirely in Python. It is designed to be simple, modular, and efficient, making it an ideal choice for developers, researchers, and students who need a powerful NLP toolkit without the overhead of larger, more complex libraries.
The framework provides a complete text processing pipeline, including normalization, tokenization, part-of-speech (POS) tagging, and parsing. Additionally, it features a pure-Python machine learning stack with a TF-IDF vectorizer and a centroid-based classifier, allowing for straightforward implementation of text classification and analysis tasks.
Philosophy: MeoMaya is built on the principle of providing core NLP functionalities in a clear and accessible manner. It prioritizes speed and low resource consumption, making it suitable for a wide range of applications, from web backends to embedded systems.
2. Installation
Prerequisites
- Python 3.11 or higher
- pip package manager
Steps
1. Clone the repository:
git clone https://github.com/KashyapSinh-Gohil/meomaya.git
cd meomaya
2. Create a virtual environment and install core dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -r meomaya/requirements.txt
3. Optional Dependencies:
For Indian language support, install indic-nlp-library:
pip install indic-nlp-library
For API and development tools, install the development dependencies:
pip install -r requirements-dev.txt
3. Quick Start
Using the Modelify Class
The Modelify class is the simplest entry point to MeoMaya's NLP pipeline. It encapsulates all the core components and runs them in the correct order.
import json
from meomaya.core.modelify import Modelify
# Initialize the model for text processing
m = Modelify(mode="text")
# Run the pipeline on your text
result = m.run("MeoMaya makes NLP easy and fun!")
# The result is a dictionary containing the processed text
print(json.dumps(result, indent=2))
Using the CLI
For quick tasks, you can use MeoMaya directly from your terminal. This is useful for testing or integrating with shell scripts.
python -m meomaya "MeoMaya is great for command-line use." --mode text
4. Core Components
MeoMaya's core functionality is built around a pipeline of four main components. You can use them individually or together to build custom NLP workflows.
Normalizer
The Normalizer cleans and standardizes text. Its primary function is to convert text to lowercase, but it can be extended to handle other normalization tasks like removing punctuation or expanding contractions.
from meomaya.core.normalizer import Normalizer
normalizer = Normalizer(lang="en")
normalized_text = normalizer.normalize("This is an EXAMPLE sentence with some UPPERCASE words.")
# Output: "this is an example sentence with some uppercase words."
Tokenizer
The Tokenizer breaks down a string of text into a list of individual tokens. This is a fundamental step in most NLP tasks. MeoMaya's tokenizer is designed to handle various languages and can be customized.
from meomaya.core.tokenizer import Tokenizer
tokenizer = Tokenizer(lang="en")
tokens = tokenizer.tokenize("Hello, world! This is MeoMaya.")
# Output: ['Hello', ',', 'world', '!', 'This', 'is', 'MeoMaya', '.']
Tagger
The Tagger, or part-of-speech (POS) tagger, assigns a grammatical category (noun, verb, adjective, and so on) to each token. This provides valuable grammatical information for further analysis.
from meomaya.core.tagger import Tagger
tagger = Tagger(lang="en")
tagged_tokens = tagger.tag(['MeoMaya', 'is', 'a', 'powerful', 'tool'])
# Output: [('MeoMaya', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('tool', 'NN')]
Parser
The Parser analyzes the grammatical structure of a sentence and creates a dependency parse tree. This reveals the relationships between words in the sentence. Currently, the parser in MeoMaya is a placeholder for a more complex implementation, but it demonstrates the structure of the pipeline.
from meomaya.core.parser import Parser
parser = Parser(lang="en")
parse_tree = parser.parse([('MeoMaya', 'NNP'), ('is', 'VBZ'), ('cool', 'JJ')])
# Output: {'tree': [('MeoMaya', 'NNP'), ('is', 'VBZ'), ('cool', 'JJ')]}
5. Machine Learning Utilities
MeoMaya includes a set of pure-Python machine learning tools for common NLP tasks like text classification.
Vectorizer
The Vectorizer converts text documents into numerical representations. It uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, which reflects how important a word is to a document in a collection or corpus.
from meomaya.ml.vectorizer import Vectorizer
texts = [
    "MeoMaya is a great NLP framework.",
    "I enjoy using Python for NLP.",
    "This framework is fast and efficient.",
]
vectorizer = Vectorizer()
X = vectorizer.fit_transform(texts)
# X is a list of TF-IDF vectors, one per input text
for i, doc in enumerate(texts):
    print(f"Document {i+1}: {X[i]}")
Classifier
The Classifier is a centroid-based classifier that uses cosine similarity to determine the category of a given text. It is a simple yet effective algorithm that works well for many text classification problems. For a more detailed example, including model training, saving, and prediction, see the Advanced Sentiment Demo section.
from meomaya.ml.classifier import Classifier
from meomaya.ml.vectorizer import Vectorizer
# Train on a small labelled corpus
texts = [
    "This product is great!",
    "I really enjoy using it.",
    "Terrible experience, would not recommend.",
    "I am very disappointed.",
]
labels = ["pos", "pos", "neg", "neg"]
vectorizer = Vectorizer()
X = vectorizer.fit_transform(texts)
classifier = Classifier()
classifier.train(X, labels)
# Classify a new text
X_new = vectorizer.transform(["This is a great product!"])
print(classifier.classify(X_new))  # e.g. ['pos']
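To see why this works, here is the centroid-and-cosine idea in isolation (an illustration of the algorithm, not MeoMaya's internal code): each class is represented by the mean of its training vectors, and a new vector is assigned to the class whose centroid is most similar.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy 2-dimensional centroids for two classes
centroids = {"pos": [0.9, 0.1], "neg": [0.1, 0.9]}
new_vec = [0.8, 0.3]
print(max(centroids, key=lambda c: cosine(new_vec, centroids[c])))  # -> pos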
6. REST API
Run the local FastAPI server (no third-party services required):
uvicorn meomaya.api.server:app --host 0.0.0.0 --port 8000
Call the API:
curl -X POST http://localhost:8000/run -H 'Content-Type: application/json' \
-d '{"input": "Hello from MeoMaya!", "mode": "text"}'
Batch endpoint:
curl -X POST http://localhost:8000/run/batch -H 'Content-Type: application/json' \
-d '{"inputs": ["hi", "there"], "mode": "text"}'
7. Hardware Selection
MeoMaya detects hardware automatically (CPU/CUDA/MPS) without requiring torch by default.
from meomaya.core.hardware import select_device
print(select_device()) # 'cpu', 'cuda', or 'mps'
Override via environment variable:
export MEOMAYA_DEVICE=cpu
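You can also set the override from Python before the device is selected; a minimal sketch, assuming select_device consults MEOMAYA_DEVICE when called:
import os

os.environ["MEOMAYA_DEVICE"] = "cpu"  # assumption: read by select_device

from meomaya.core.hardware import select_device
print(select_device())  # 'cpu'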
8. Advanced Sentiment Demo
The full_nlp_workflow_demo.py script in the meomaya/examples/ directory demonstrates the MeoMaya NLP workflow end-to-end, including vectorization and classification.
Training the Model
Before you can predict sentiment, you need to train the model. This command trains the model on a sample dataset and saves the trained vectorizer and classifier to the sentiment_model/ directory.
python meomaya/examples/full_nlp_workflow_demo.py
Note: if the script cannot locate the MeoMaya modules, set the PYTHONPATH environment variable (for example, PYTHONPATH=. from the repository root) before running it. You only need to train the model once.
Batch processing example
The demo also shows how to process multiple inputs using TextPipeline.process_batch, as sketched below.
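A minimal sketch of batch processing; the import path and the process_batch signature are assumptions based on the demo, so check the example script for the exact usage:
# Hypothetical usage; see meomaya/examples/full_nlp_workflow_demo.py for the
# authoritative version.
from meomaya.core.pipeline import TextPipeline  # assumed import path

pipeline = TextPipeline(lang="en")
results = pipeline.process_batch([
    "MeoMaya handles batches.",
    "Each input runs through the full pipeline.",
])
for result in results:
    print(result)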
9. Command-Line Interface (CLI)
MeoMaya's CLI provides a convenient way to access its features without writing any Python code.
Basic Usage
Use the module entry point:
python -m meomaya "Your text here" --mode text
Options
- --mode: Override the auto-detected mode: text, audio, image, video, 3d, fusion.
- --model: Model name (placeholder; reserved for future model selection).
10. Advanced Usage
Building a Custom Pipeline
One of MeoMaya's strengths is its modularity. You can easily create your own custom NLP pipelines by combining the core components in different ways.
from meomaya.core.normalizer import Normalizer
from meomaya.core.tokenizer import Tokenizer
from meomaya.core.tagger import Tagger
def custom_pipeline(text: str, lang: str = "en"):
    """A custom NLP pipeline that normalizes, tokenizes, and tags text."""
    normalizer = Normalizer(lang)
    tokenizer = Tokenizer(lang)
    tagger = Tagger(lang)
    normalized_text = normalizer.normalize(text)
    tokens = tokenizer.tokenize(normalized_text)
    tagged_tokens = tagger.tag(tokens)
    return {
        'normalized': normalized_text,
        'tokens': tokens,
        'tagged': tagged_tokens,
    }
# Run the custom pipeline
result = custom_pipeline("This is a demonstration of a custom pipeline.")
print(result)
11. API Reference
Core Module
Normalizer(lang: str = "en")
  normalize(text: str) -> str
Tokenizer(lang: str = "en")
  tokenize(text: str) -> list[str]
Tagger(lang: str = "en")
  tag(tokens: list[str]) -> list[tuple[str, str]]
Parser(lang: str = "en")
  parse(tagged_tokens: list[tuple[str, str]]) -> dict
ML Module
Vectorizer()
  fit(documents: list[str])
  transform(documents: list[str]) -> list[list[float]]
  fit_transform(documents: list[str]) -> list[list[float]]
Classifier()
  train(X: list[list[float]], y: list[str])
  classify(X: list[list[float]]) -> list[str]
12. Troubleshooting
Common Issues
- ImportError for indic_nlp_library: If you are working with Indian languages and get an import error, make sure you have installed the optional dependency: pip install indic-nlp-library.
- Incorrect Path for Corpus: When using the CLI with a corpus file, ensure you provide the correct path to the file.
- Performance: For very large datasets, consider processing the data in batches to manage memory usage (see the sketch below).
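For example, a minimal batching sketch using the Modelify class from the Quick Start:
from meomaya.core.modelify import Modelify

def chunks(items, size):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

m = Modelify(mode="text")
corpus = [f"document number {i}" for i in range(10_000)]
for batch in chunks(corpus, 1000):
    results = [m.run(text) for text in batch]
    # aggregate or persist `results` here before moving to the next batch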
Getting Help
If you're stuck, you can:
- Review the test files in the tests/ directory for more usage examples.
- Open an issue on the GitHub issue tracker.