Blog

Building Semantic Search with Transformers.js and Sentence Embeddings

By Shittu Olumide on June 5, 2026 in Language Models

In this article, you will learn how sentence embeddings work and how to build a fully client-side semantic search engine using Transformers.js, with no server, no API key, and no backend infrastructure required.

Using Scikit-LLM with Open-Source LLMs

By Iván Palomares Carrascosa on June 4, 2026 in Language Models

Learn how to leverage Python’s novel Scikit-LLM library to use cutting-edge open-source LLMs in workflows that mirror classical machine learning pipelines, all for free.

Scikit-LLM vs. Traditional Text Classifiers: When Should You Use an LLM?

By Iván Palomares Carrascosa on June 2, 2026 in Language Models

From classic techniques to the state of the art: implementing a benchmark that compares three distinct approaches to text classification.

The Roadmap for Mastering LLMOps in 2026

By Shittu Olumide on June 1, 2026 in Language Models

In this article, you will learn how to build production-grade LLM systems by following a structured six-step LLMOps roadmap covering observability, evaluation, cost control, and agent orchestration. Topics we will cover include: how LLMOps differs from traditional MLOps, and what foundational skills you need before touching any LLMOps tooling; how to instrument LLM calls with […]

Serving Multiple Users at Once: How Continuous Batching Keeps LLM Inference Efficient

By Yoyo Chan on May 31, 2026 in Inference from Transformer Models

In the previous article, we saw how a language model processes a prompt during prefill, then generates tokens one at a time during decode, and uses a KV cache to avoid repeated computation. In the real world, inference servers handle hundreds or thousands of requests at the same time. How a server schedules those requests determines […]

Building a Context Pruning Pipeline for Long-Running Agents

By Iván Palomares Carrascosa on May 28, 2026 in Artificial Intelligence

This article presents the basic principles for implementing a context pruning pipeline for long-running agents, based on conversational continuity and semantic relevance.

The Statistics of Token Selection: Logits, Temperature, and Top-P Walkthrough

By Iván Palomares Carrascosa on May 27, 2026 in Language Models

In this article, you will learn how logits, temperature, and top-p sampling work together to control next-token prediction in large language models.

Building a Multi-Tool Gemma 4 Agent with Error Recovery

By Matthew Mayo on May 22, 2026 in Artificial Intelligence

This article builds on a previous tutorial by assuming that things will go wrong when working with an agent, and shows how to recover gracefully when they do.

Implementing Hybrid Semantic-Lexical Search in RAG

By Iván Palomares Carrascosa on May 25, 2026 in Language Models

In this article, you will learn how to implement a hybrid search strategy for RAG systems by combining BM25 lexical search with semantic search, fused together using Reciprocal Rank Fusion.

Building Context-Aware Search in Python with LLM Embeddings + Metadata

By Bala Priya C on May 22, 2026 in Language Models

In this article, you will learn how to build a context-aware semantic search engine in Python that combines embedding-based similarity with structured metadata filtering.