Working with AI is powerful and useful, but AI models have a tendency to confidently make things up. That's a problem, but there is an emerging solution.
One way to give an AI access to relevant external information is to do a vector search, based on similarity to the question asked, and include the results in the prompt.
Have you heard of vector databases?
They work by searching based on similarity rather than just “text contains X”.
You get embeddings from the AI model. These are like coordinates that locate a concept or piece of text relative to other concepts in the model.
Then you put them into a vector database like Pinecone.
That lets you search by meaning.
It's useful for a whole host of things, but in our situation it can help the AI stop hallucinating by giving it the right context for the topic we ask it to write a blog post about.
Vector databases are an important tool for AI development. By providing a way to store and search for meaning, rather than specific keywords, vector databases can enable more accurate results when using AI tools.
Vector databases use embeddings to store information. An embedding is a location in a multi-dimensional space: a simplified 2D version of an embedding would have just an X and a Y coordinate. The actual embeddings used in AI tools usually have hundreds or thousands of dimensions (OpenAI's text-embedding-ada-002 model, for instance, produces 1,536). The closeness of two points in this space is a measure of the similarity of the objects they represent. For example, the word 'dog' and the word 'cat' would be close together, while 'elephant' would be further away.
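To make the idea of closeness concrete, here is a minimal sketch using cosine similarity, one common way to compare embeddings (Euclidean distance is another). The 2D coordinates below are made up for illustration and are not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2D "embeddings", chosen by hand for this example only.
toy_embeddings = {
    "dog": [0.9, 0.8],
    "cat": [0.8, 0.9],
    "elephant": [0.9, 0.1],
}

def most_similar(word, embeddings):
    """Return the other word whose vector is most similar to `word`."""
    others = [w for w in embeddings if w != word]
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )

dog_cat = cosine_similarity(toy_embeddings["dog"], toy_embeddings["cat"])
dog_elephant = cosine_similarity(toy_embeddings["dog"], toy_embeddings["elephant"])
print(f"dog vs cat:      {dog_cat:.3f}")
print(f"dog vs elephant: {dog_elephant:.3f}")
print("closest to dog:", most_similar("dog", toy_embeddings))
```

With these hand-picked numbers 'dog' scores higher against 'cat' than against 'elephant'. A real embedding model would supply vectors with far more dimensions, but the comparison works the same way.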
Pinecone is a vector database that can be used to store and search for meaning. To follow this example, you will need an OpenAI API key, which can be created through the OpenAI website and is used to generate the embeddings; Pinecone itself also requires its own API key. You will also need to download a dataset; this example used a YouTube transcription dataset. After downloading the dataset, you will need to join the transcript text together into larger passages, and then create an index. Building this index is the heavy lifting of the process, and can take a while: in this example it took 28 minutes with 400 pieces of text.
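The joining step could look something like the sketch below. The course does not say exactly how the text was joined, so the overlapping-window approach, the function name merge_snippets, and the window and stride values are all assumptions made for illustration. Embedding the chunks and uploading them to Pinecone are left out here.

```python
def merge_snippets(snippets, window=4, stride=2):
    """Join consecutive short snippets into overlapping chunks.

    Each chunk covers `window` snippets, and a new chunk starts every
    `stride` snippets, so neighbouring chunks share some text.
    """
    chunks = []
    for start in range(0, len(snippets), stride):
        group = snippets[start:start + window]
        chunks.append({"id": f"chunk-{start}", "text": " ".join(group)})
        # Stop once a chunk has reached the end of the transcript.
        if start + window >= len(snippets):
            break
    return chunks

# Placeholder snippets standing in for lines of a transcript.
snippets = [
    "vector databases",
    "store embeddings",
    "and search",
    "by meaning",
    "not keywords",
]

chunks = merge_snippets(snippets, window=3, stride=2)
for chunk in chunks:
    print(chunk["id"], "->", chunk["text"])
```

Overlapping the chunks is one way to avoid cutting a thought in half at a chunk boundary; whether it is worth the extra storage depends on the dataset.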
Once the index is created, you can search it using the query with context approach. This injects the passages retrieved from the index into the prompt as context, which can lead to more accurate results. For example, if you were to ask “what training methods should I use for transformers when I only have pairs of related sentences?” without context, you may get a simple, generic answer. However, when you use the query with context approach, the model answers with the relevant passages in front of it, so you are likely to get a much more accurate result.
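The query with context idea can be sketched in plain Python as follows. This is a stand-in rather than Pinecone's or OpenAI's actual API: the in-memory list plays the role of the index, the three-number vectors and passage texts are invented placeholders, and the names retrieve and build_prompt are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, index, top_k=2):
    """Return the texts of the `top_k` items most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]

def build_prompt(question, contexts):
    """Inject the retrieved passages into the prompt ahead of the question."""
    context_block = "\n---\n".join(contexts)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Placeholder index: made-up passages with made-up vectors.
index = [
    {"text": "Passage about training with sentence pairs.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Passage about loss functions.", "vector": [0.7, 0.3, 0.1]},
    {"text": "Passage about cooking pasta.", "vector": [0.0, 0.1, 0.9]},
]

question = (
    "what training methods should I use for transformers "
    "when I only have pairs of related sentences?"
)
query_vector = [1.0, 0.2, 0.0]  # hypothetical embedding of the question

contexts = retrieve(query_vector, index, top_k=2)
prompt = build_prompt(question, contexts)
print(prompt)
```

In a real setup the query vector would come from the same embedding model used to build the index, and the finished prompt would be sent to the language model. The unrelated passage is left out because its vector points in a different direction from the query.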
Overall, vector databases can be a very useful tool for AI development, because they let you search by meaning rather than by specific keywords. Pinecone is one of the vector databases that can be used, and by downloading a dataset, creating an index, and using the query with context approach, you can get the most out of this tool.