A vector database is a type of database that stores and manages unstructured data, such as text, images, or audio, in vector embeddings (high-dimensional vectors) to make it easy to find and retrieve similar objects quickly.
How do you make that more tangible to a software engineer?
Some parts of this question:
- What is meant by "high-dimensional" vectors? Does that just mean the array of numbers is a big array? Why is that important enough to be highlighted?
- How is text, for example, converted to a vector of numbers (at a high level)?
- Why do they call them vector "embeddings"?
- I don't need to learn the theory of vectors. How can I understand a vector database's practical function and use (as someone who wants to build an AI assistant) without learning that theory?
I am an experienced software developer who has worked with relational databases like PostgreSQL, and with other databases like MongoDB (a document database) and Neo4j (a graph database). So how does a vector database work, at a high level? I would just like a practical sense of how it works, so I know what I'm saving when I save to Pinecone, what I'm querying, and roughly how a query works.
Relational databases are easy to understand: you just look at a spreadsheet, and the columns have names. Graphs are also easy to understand: you have your named models/types, and they have links between them. But vector databases? Is it literally a hashmap where the key is something and the value is an array of numbers? If not, what is it, and how can I get the same kind of practical sense I have for these other database types?
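To make my guess concrete, here is the naive mental model I currently have, written as a small Python sketch. Everything in it is made up for illustration: the `store` dict, the ids, the 3-number vectors, and the `query` helper are hypothetical, and I'm assuming "similar" means something like cosine similarity. I don't know whether Pinecone or any real vector database works this way internally; I'd expect them to use some kind of index rather than scanning every entry.

```python
import math

# Hypothetical toy store: id -> (vector, metadata).
# Real embeddings would have hundreds or thousands of numbers, not 3.
store = {
    "doc1": ([0.9, 0.1, 0.0], {"text": "cats are pets"}),
    "doc2": ([0.8, 0.2, 0.1], {"text": "dogs are pets"}),
    "doc3": ([0.0, 0.1, 0.9], {"text": "stock prices fell"}),
}

def cosine_similarity(a, b):
    """Similarity of direction between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query(vector, top_k=2):
    """Brute-force search: score every stored vector, return the closest ids."""
    scored = [
        (cosine_similarity(vector, vec), key)
        for key, (vec, _meta) in store.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _score, key in scored[:top_k]]

# A query vector pointing the same way as the "pets" documents.
print(query([1.0, 0.0, 0.0]))
```

Is this roughly the right picture (a key, an array of numbers, some metadata, and a "nearest arrays" query), or is the real thing structured differently?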
References I've used to get a slightly better understanding so far:
- https://towardsdatascience.com/explaining-vector-databases-in-3-levels-of-difficulty-fc392e48ab78 (unfortunately, pay-walled)
- https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture?ref=blog.apify.com
- https://www.pinecone.io/learn/vector-embeddings-for-developers/
- https://blog.apify.com/what-is-a-vector-database/
- https://blog.apify.com/what-is-pinecone-why-use-it-with-llms/
