6 things you can do with small LLMs (examples included)

Needone App
5 min read · Feb 6, 2025


I’ve summarized these examples into six categories: (1) Text classification and information extraction, (2) Office and productivity assistance, (3) Dialogue/message processing and assistant replies, (4) Web/application integration and automation, (5) Entertainment, creation, and games, and (6) Model deployment, technical bottlenecks, and thinking.

Note: “small models” here typically means models with roughly 0.5B–3B parameters (e.g., Gemma 2 2B, Llama 3.2 3B, Qwen 2.5 1.5B), or even smaller scenario-specific models.

1. Text classification and information extraction

1.1 Medical literature screening (Excel Add-In)

  • One user in the thread applied Gemma 2 2B, via an Excel Add-In, to classify thousands of paper titles and abstracts as “Include” or “Exclude” based on whether they studied diabetes-related nerve damage and stroke.
  • The user only needs to write a formula like =PROMPT(A1:B1, "...") in Excel and drag it down for thousands of rows to batch process without manual review.
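In plain Python, the same batch-classification loop might look like the sketch below. The thread's version lived behind an Excel =PROMPT() formula; the function names here are illustrative, and the actual model call (e.g., Gemma 2 2B via Ollama) is left as a comment.

```python
# Illustrative batch classification in plain Python; the model call is
# stubbed out, so only the prompt construction and defensive parsing run.
def build_prompt(title: str, abstract: str) -> str:
    """Wrap one paper's title/abstract in a strict Include/Exclude prompt."""
    return (
        "You are screening papers for a review on diabetes-related nerve "
        "damage and stroke. Answer with exactly one word: Include or Exclude.\n\n"
        f"Title: {title}\nAbstract: {abstract}"
    )

def parse_label(response: str) -> str:
    """Small models drift in formatting, so normalize the answer defensively."""
    words = response.strip().split()
    first = words[0].strip(".,:").lower() if words else ""
    return "Include" if first == "include" else "Exclude"

# In real use: label = parse_label(ask_model(build_prompt(title, abstract)))
print(parse_label("Include."))   # tolerates trailing punctuation: "Include"
```

Treating anything that isn't a clear "Include" as "Exclude" errs on the side of dropping papers; flipping the default is a one-line change if missing a paper is costlier.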

1.2 Hospital mother-infant assistance SMS recognition (government hotline)

  • Someone fine-tuned a 2B-class small model to pick out emergency messages from new parents or pregnant women and flag them as high priority, which is valuable in resource-constrained settings where privacy matters.

1.3 Automatic summarization and grouping of Hacker News or forum content

  • The GopherSignal project uses a small model to generate brief summaries or overviews of Hacker News posts, helping readers quickly grasp each post’s content, with further integration features (e.g., filtering, sorting).

1.4 Work/recruitment information crawling and classification

  • In “Who is hiring” threads, users crawl all comments, extract keywords (e.g., location, remote work, programming languages) using a small model, then generate RSS feeds or other formats for internal filtering.
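The RSS side of this pipeline is plain stdlib work once the model has extracted one record per comment. A minimal sketch, assuming extraction already happened; the field names are invented for illustration:

```python
# Turn model-extracted job records into a minimal RSS 2.0 feed.
import xml.etree.ElementTree as ET

def jobs_to_rss(jobs: list) -> str:
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Who is hiring (filtered)"
    for job in jobs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{job['company']} ({job['location']})"
        ET.SubElement(item, "description").text = ", ".join(job["languages"])
    return ET.tostring(rss, encoding="unicode")

feed = jobs_to_rss([{"company": "Acme", "location": "Remote",
                     "languages": ["Python", "Go"]}])
print(feed)
```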

1.5 OCR + structured parsing

  • Someone combined a small model with Tesseract (or another OCR tool) to convert image text into structured JSON: for example, reading food nutrition labels, company invoices, or paper documents. With well-crafted prompts, these smaller models can meet such narrow extraction needs.
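Small models often wrap their JSON in code fences or chatty prose, so the parsing side needs to be defensive. A sketch of just that step, assuming the OCR text already came out of Tesseract (e.g., via pytesseract.image_to_string); the prompt and field names are illustrative:

```python
# Defensive extraction of a JSON object from a possibly chatty model reply.
import json
import re

PROMPT_TEMPLATE = (
    "Extract the nutrition facts as JSON with keys 'calories' and "
    "'protein_g'. Label text:\n{text}"
)

def extract_json(model_reply: str) -> dict:
    """Pull the first {...} block out of the reply, ignoring surrounding prose."""
    match = re.search(r"\{.*\}", model_reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model reply")
    return json.loads(match.group(0))

# A typical small-model reply, code fences and all:
reply = 'Sure! ```json\n{"calories": 210, "protein_g": 8}\n``` Hope that helps.'
print(extract_json(reply))  # {'calories': 210, 'protein_g': 8}
```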

2. Office and productivity assistance

2.1 Automatically generating or optimizing Git commit messages

  • Someone used a small model to read the current git diff and generate several candidate commit messages for the developer to pick from and edit. This speeds up writing commit messages in simple cases but still benefits from human review.
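A rough sketch of that flow: read the staged diff, build a prompt, parse the model's candidates. The model call itself is omitted, and these helper names are illustrative, not from the thread:

```python
# Commit-message flow: diff -> prompt -> (model) -> candidate list.
import subprocess

def staged_diff() -> str:
    """Return the staged changes, i.e. the input the model summarizes."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True
    ).stdout

def commit_prompt(diff: str, n: int = 3) -> str:
    return (
        f"Write {n} one-line git commit messages for this diff, "
        f"one per line, no numbering:\n\n{diff}"
    )

def parse_candidates(reply: str) -> list:
    return [line.strip() for line in reply.splitlines() if line.strip()]

# The developer then picks and edits one candidate:
print(parse_candidates("fix typo in README\nadd missing install step\n"))
```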

2.2 Excel/table formula auto-completion

  • Microsoft published a paper (FLAME) exploring a 60M-parameter model for repairing and recommending Excel formulas, which performed remarkably well in that niche compared to much larger models. Others have implemented prompt-based functions for handling text data in Excel.

2.3 Filtering, auditing, or renaming

  • Someone used small models to automatically rename Linux ISOs, extract data, and reorganize files for human review; others generate product copy (e.g., marketing slogans, H1 titles) or translate text into multiple languages with small local models.
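For model-driven renaming, it pays to sanitize suggestions and stage the (old, new) pairs for human review instead of applying them blindly. A generic guardrail sketch, not the thread's code:

```python
# Sanitize model-proposed filenames and stage renames for human approval.
import re
from pathlib import Path

def safe_name(suggestion: str, max_len: int = 120) -> str:
    """Make a model-proposed filename filesystem-safe."""
    name = suggestion.strip().strip('"').strip("'")
    name = re.sub(r"[^\w.\- ]", "_", name)   # replace path separators etc.
    name = re.sub(r"\s+", " ", name).strip()
    return name[:max_len] or "unnamed"

def plan_rename(old: str, suggestion: str) -> tuple:
    """Return the (old, new) pair; a human approves before any os.rename."""
    return old, str(Path(old).with_name(safe_name(suggestion)))

print(plan_rename("dl/img-2024.iso", "Xubuntu 22.04 LTS desktop.iso"))
```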

2.4 Local log/summary generation

  • For meeting minutes, SMS records, personal to-do lists, and other local, privacy-sensitive data (often combined with RAG, Retrieval-Augmented Generation), small models offer better control over speed and resource usage.

2.5 Code assistance/small script generation

  • Someone used small models as command-line assistants to generate Bash or Python scripts (e.g., ffmpeg/awk/sed/find invocations), which is fast and doesn’t require exposing internal sensitive data to external APIs.

3. Dialogue/message processing and assistant replies

3.1 Automated reply to spam SMS

  • Someone used Ollama plus a small model to automatically reply to spam messages with feigned enthusiasm, even designing different personas (gym enthusiast, 19th-century British gentleman) to waste the spammer’s time and energy.
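The persona trick boils down to swapping the system prompt per conversation. A sketch that builds a request body in the shape of Ollama's /api/chat endpoint; the persona texts and model tag are invented examples:

```python
# Build a per-persona chat request body for a local Ollama server.
import json

PERSONAS = {
    "gym": "You are an overly enthusiastic gym goer. Steer every reply "
           "toward protein intake and leg day.",
    "gentleman": "You are a verbose 19th-century British gentleman. Reply "
                 "with elaborate, old-fashioned courtesy.",
}

def chat_payload(persona: str, incoming_sms: str,
                 model: str = "llama3.2:3b") -> str:
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": incoming_sms},
        ],
        "stream": False,
    }
    return json.dumps(body)

# POST this body to http://localhost:11434/api/chat on a machine running Ollama.
print(chat_payload("gentleman", "You have won a free cruise! Reply YES to claim."))
```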


4. Web/App Integration and Automation

4.1 Cookie Banner Detection and Ad Blocking

  • Some users use Llama 3B to detect “Cookie Consent Banners” on websites and generate filtering rules (e.g., for EasyList Cookie); because most pop-up structures are similar, the model can recognize the banner text for blocklisting. It may be less accurate with uncommon languages or age-restriction pop-ups.
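The loop can be sketched as classify-then-emit-rule. The yes/no decision would come from the local model (stubbed below with a keyword check), and the output uses EasyList-style element-hiding syntax (domain##selector); the selector and the stub heuristic are made up for illustration:

```python
# Classify a page element's text, then emit an element-hiding rule.
def is_cookie_banner(element_text: str, classify) -> bool:
    prompt = (
        "Does this text belong to a cookie-consent banner? "
        f"Answer yes or no.\n\n{element_text}"
    )
    return classify(prompt).strip().lower().startswith("yes")

def hiding_rule(domain: str, css_selector: str) -> str:
    return f"{domain}##{css_selector}"

# Keyword stub standing in for the 3B model:
stub = lambda prompt: "yes" if "we use cookies" in prompt.lower() else "no"

if is_cookie_banner("We use cookies to improve your experience.", stub):
    print(hiding_rule("example.com", "div.consent-overlay"))
```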

4.2 Browser Extensions and Frontend WebGPU/WebAssembly

  • Some users are experimenting with running small models directly in the browser, even using WebGPU to speed up inference, achieving fully offline web-based AI processing such as automatic text correction or translation.

4.3 Automatic Detection or Skipping Sponsored Content

  • For example, some users applied small models to identify “sponsored segments” in YouTube videos and skip them automatically, reducing reliance on human volunteers.

4.4 Wrapping into Backend Services

  • Some users built a dedicated microservice (implemented in C++ around llama.cpp) to provide a lower-latency inference interface on the local network or the same machine for various real-time applications.

5. Entertainment, Creation, and Games

5.1 Story/Novel/Dialogue Generation

  • Some users let 3B models generate endless stories (e.g., sci-fi, fantasy) that scroll on a small screen, letting them “read novels” at any time, or use the output as text-adventure scripts or joke generators.

5.2 Game NPC Dialogue

  • Integrating 2B–7B models into the Godot game engine to dynamically generate NPC dialogue enhances immersion; they are also used for simple interactions (e.g., haggling with a shopkeeper), which is more flexible than fixed dialogue trees.

5.3 Automated Music Generation/Playlists

  • Some users let the model generate new song suggestions based on personal style and existing songs; removing disliked songs and adding new ones to iteratively form personalized playlists. The model doesn’t need to be extremely accurate but can bring some inspiration.

5.4 Entertainment Chat/Role-Playing

  • Some users use small models as SMS joke generators or give them a certain persona for voice reports, such as Star Wars-style dialogue.
  • Others are trying to create a real-time host, adding commentary or judgment on speakers in dialogues or live streams.

6. Model Deployment, Technical Bottlenecks, and Thinking

6.1 Accuracy and Evaluation

  • Small models (e.g., 2B, 3B) often suffice for tasks like binary classification, simple summaries, and instruction rewriting but are weak in deeper reasoning (complex logic, mathematical precision).
  • Community advice: manually verify a small sample, or cross-verify when necessary; consider using larger or more specialized models for harder cases.
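One simple way to act on that advice: sample a handful of predictions for manual review and measure agreement between two models on the rest. A plain-Python sketch; the labels below are placeholders:

```python
# Spot-check sampling plus a two-model agreement metric.
import random

def spot_check_sample(items: list, k: int = 20, seed: int = 0) -> list:
    """Pick a reproducible random subset for manual review."""
    return random.Random(seed).sample(items, min(k, len(items)))

def agreement_rate(labels_a: list, labels_b: list) -> float:
    """Fraction of items where two models gave the same label."""
    pairs = list(zip(labels_a, labels_b))
    return sum(a == b for a, b in pairs) / len(pairs)

print(agreement_rate(["in", "out", "in"], ["in", "in", "in"]))  # 2/3, ~0.667
```

Low agreement does not tell you which model is right, only where to spend your manual-review budget.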

6.2 Advantages of Small Models

  • Privacy: Can be deployed offline, keeping sensitive data local and not relying on external APIs.
  • Speed/Cost Control: Not dependent on cloud GPUs, especially suitable for computers with good CPU performance.
  • Adaptation to Specific Scenarios: a small model fine-tuned on a small dataset or a single task can often match the performance of larger general-purpose models on that task.

6.3 Limitations and Notes

  • Context Window Limitations: some small models have short context windows, which makes long documents difficult; consider chunking or RAG (Retrieval-Augmented Generation).
  • Weakness in Deep Logic: Often struggle with strict logic/date-time calculations.
  • Chinese/Multilingual Capability Differences: many small models are trained predominantly on English data, so their performance in other languages is limited; they often need fine-tuning or multilingual variants.
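For the context-window limitation above, a common workaround is map-reduce-style chunking: split the document into overlapping word windows, summarize each chunk with the small model, then summarize the summaries. A minimal sketch of the splitting step (sizes are illustrative):

```python
# Split text into overlapping word windows for chunk-wise summarization.
def chunk_words(text: str, size: int = 400, overlap: int = 50) -> list:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

chunks = chunk_words("word " * 1000, size=400, overlap=50)
print(len(chunks), len(chunks[0].split()))  # 3 400
```

The overlap keeps sentences that straddle a boundary visible in both neighboring chunks, at the cost of some duplicated summarization work.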

More Ideas

  • It’s worth combining small models with retrieval systems (RAG) or specialized vertical tools to form a “multi-agent” or “multi-tool” pipeline that plays to each component’s strengths and compensates for its weaknesses.
  • Some people have called for sharing fine-tuned models or LoRAs, but the lack of discoverability, reusability, and documentation remains an issue.
