Chapter 1: Debuting with PostgreSQL
YugaPlus - The Story Begins
The first version of the YugaPlus streaming platform was set for its prime-time debut. The plan involved initially launching the service for users on the US West Coast, followed by a gradual rollout to users nationwide.
The launch was successful, with the first users signing up for YugaPlus to enjoy their favorite movies, series, and sports events. The service, powered by PostgreSQL, handled the incoming traffic with ease.
In this chapter, you'll deploy one of the YugaPlus services—the movie recommendations service. This service takes user questions in plain English and uses an underlying generative AI stack (OpenAI, Spring AI, and PostgreSQL pgvector) or the full-text search capabilities of PostgreSQL to provide users with the most relevant movie recommendations.
You'll learn how to do the following:
- Deploy PostgreSQL with pgvector in Docker.
- Perform vector similarity or full-text searches in PostgreSQL.
Prerequisites
- Docker 20 or later.
- Docker Compose 1.29 or later.
- An OpenAI API key. Without the API key, the application will revert to performing full-text searches over the movie catalog, instead of vector similarity searches. Note that the full-text search capability is significantly less advanced.
Start PostgreSQL with pgvector
The pgvector extension transforms PostgreSQL into a vector database, capable of storing and accessing vectorized data. The movie recommendations service uses pgvector to provide users with highly relevant recommendations based on their input.
Follow these steps to start a PostgreSQL instance with pgvector and enable the extension:
-
Create a directory to serve as the volume for the PostgreSQL container:
rm -r ~/postgres-volume mkdir ~/postgres-volume
-
Create a custom Docker network that will be used by PostgreSQL and other containers throughout this tutorial:
docker network create yugaplus-network
-
Start a PostgreSQL container using the latest version of the image with pgvector:
docker run --name postgres --net yugaplus-network \ -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \ -p 5432:5432 \ -v ~/postgres-volume/:/var/lib/postgresql/data \ -d ankane/pgvector:latest
-
Check the logs to ensure the container is up and running and PostgreSQL has initialized successfully:
docker container logs postgres
-
Wait for PostgreSQL to finish the initialization and then connect to the container enabling the pgvector extension:
! while ! docker exec -it postgres pg_isready -U postgres; do sleep 1; done docker exec -it postgres psql -U postgres -c 'CREATE EXTENSION vector'
With the database operational, you're now ready to deploy the first version of YugaPlus on your machine!
Deploy YugaPlus movie recommendations service
The service is comprised of a React frontend and a Java backend. Prior knowledge of React or Java is not necessary, nor is the installation of any language-specific toolchains required. Both the frontend and backend are deployed using Docker, which automatically downloads all necessary libraries and frameworks.
Prepare for the deployment:
-
Clone the YugaPlus repository:
git clone https://github.com/YugabyteDB-Samples/yugaplus-build-and-learn-tutorial.git
-
(Optional) Create an OpenAI API key. The application requires an OpenAI embedding model for vector similarity search. If you opt not to use OpenAI, the application will default to a less advanced full-text search mode.
-
Set your OpenAI API in the
{yugaplus-project-dir}/docker-compose.yaml
file by updating theOPENAI_API_KEY
variable:- OPENAI_API_KEY=your-openai-key
Start the application:
-
Navigate to the YugaPlus project directory:
cd yugaplus-build-and-learn-tutorial
-
Build application images and start the containers:
docker-compose up --build
The yugaplus-backend
container connects to the PostgreSQL container, initializes the movie catalog, and preloads a sample dataset comprising over 2,800 movies. This dataset includes embeddings pre-generated for movie overviews using the OpenAI Embedding model (text-embedding-ada-002
). Upon successful startup, the backend will display the following messages in the terminal window:
INFO 1 --- [main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
INFO 1 --- [main] c.y.backend.YugaPlusBackendApplication : Started YugaPlusBackendApplication in 18.681 seconds (process running for 19.173)
The yugaplus-frontend
container starts in a few seconds and can be accessed at http://localhost:3000/.
Proceed to sign in to YugaPlus! The app automatically pre-populates the sign-in form with the following user credentials:
- Username:
user1@gmail.com
- Password:
MyYugaPlusPassword
Search for your favorite movies
After you sign in, you'll see the YugaPlus home page split into two parts. In Your Movies, you'll find movies you're watching or want to watch. The Search New Movies section lets you find new movies by typing what you're looking for in plain English.
Internally, the service uses the following database schema:
movie
- this table keeps track of movies on YugaPlus. Theoverview_vector
column has 1536-dimensional vectors made using OpenAI'stext-embedding-ada-002
model from movie descriptions in theoverview
column. Theoverview_lexemes
column stores the lexemes of movie overviews that are used for full-text search.user_account
- this table holds user-specific details.user_library
- this table lists the movies users have added to their libraries.
Next, type in the following to find a movie for a space adventure:
space travel
The service will use the prompt to perform a full-text search across movie overviews. The results will appear as follows:
PostgreSQL can filter movies by rank and category before doing the full-text search. For instance, set rank to 7, choose Science Fiction as the category, and repeat the search again:
Here's the SQL query that YugaPlus uses to find the movie recommendations:
SELECT id, title, overview, vote_average, release_date FROM movie
WHERE vote_average >= :rank
AND genres @> :category::jsonb
AND overview_lexemes @@ plainto_tsquery('english', :prompt)
LIMIT :max_results
I'd like to watch a movie about a space adventure
The service turns your prompt into an embedding with OpenAI's model, then searches for similar movie descriptions. The results should look like this:
PostgreSQL can filter movies by rank and category before doing the vector search. For instance, set rank to 7, choose Science Fiction as the category, and repeat the search again:
(Hint: Pick a movie you like and add it to your library with the Add to Library button.)
Here's the SQL query that YugaPlus uses to find the movie recommendations:
SELECT id, title, overview, vote_average, release_date FROM movie
WHERE vote_average >= :rank
AND genres @> :category::jsonb
AND 1 - (overview_vector <=> :prompt_vector::vector) >= :similarity_threshold
ORDER BY overview_vector <=> :prompt_vector::vector
LIMIT :max_results
Congratulations, you've finished Chapter 1! You've successfully deployed the first version of the YugaPlus movie recommendations service and made it work on a PostgreSQL instance.
Moving on to Chapter 2, where you'll learn how to scale data and workloads using a multi-node YugabyteDB cluster.