How the AI Terminal Finds Relevant Content

I started building my portfolio AI terminal with a simple goal but quickly hit a wall. My initial approach relied on MySQL full text lookup using BM25 ranking. While this was lightning fast for exact matches, it fell apart when someone asked a natural language question. If a user asked about backend work, my system returned nothing because the relevant content mentioned REST APIs and PHP instead. I realized that expecting users to know my exact vocabulary was a losing strategy.

To fix this, I decided to bridge the gap by converting my content into numerical vectors. I process my articles, projects, experiences, services, skills, testimonials, and GitHub activity using the OpenRouter API with the nvidia/llama-nemotron-embed-vl-1b-v2:free model. This lets my system understand that backend work is semantically related to building REST APIs with PHP. By turning text into vectors, I no longer rely on exact word overlap to surface relevant information.

I did not want to lose the raw speed of my existing keyword search, so I implemented a hybrid approach. I merge the BM25 results with my new semantic results using Reciprocal Rank Fusion. This makes sure that if a piece of content performs well in both systems, it hits the top. I also apply a 1.5 times weight to semantic hits because they are usually more helpful for natural language queries.

Here is why I built it this way:

The limitations of pure keyword search BM25 lacks any understanding of context. A question about my technologies would never find an entry for Docker or Redis if those words were missing from the query. Semantic search solves this by placing related concepts close together.
The advantage of hybrid search By combining methods, I get precise answers for specific technical queries and nuanced matches for vague questions. I get the best of both worlds without sacrificing performance.
Future proofing my code My admin panel lets me update the endpoint URL, API key, and model name. If I want to switch providers or adopt a better model, I can change these settings without ever touching my codebase.

Keeping this data fresh was my next big challenge. My admin interface includes an Embed All button that pulls from seven different tables. The system batches these into groups of twenty and sends them to the OpenRouter endpoint. Each vector is saved with a SHA 256 hash of the source text so I know exactly when content has changed and needs a refresh. I also set up a background sync that runs every six minutes to handle new GitHub activity automatically.

At query time, I load all vectors into an in memory cache that refreshes every five minutes. When a user sends a query, I embed it using the same model and cache that result for sixty seconds to keep my costs at zero. I compare the query vector against my stored vectors using cosine similarity. Anything scoring above 0.3 is treated as a semantic hit and merged into the final list. This ranked output is then sent to the AI model to generate a final answer. This setup makes my terminal feel much smarter and allows me to answer questions about my work even when users do not use my exact phrasing.