
Building an AI Tutor with Spring AI, Ollama, and Vaadin


AI Tutor is a web application that uses Spring Boot and Spring AI on the backend, an Ollama-hosted LLM (e.g. Google’s Gemma3) for natural language understanding, and Vaadin for the rich web UI. Its core feature is a Retrieval-Augmented Generation (RAG) pipeline: when the user uploads course materials (PDFs, text, etc.), the app splits them into chunks, creates vector embeddings, and stores them in a PGVector-enabled PostgreSQL database. At chat time, the most similar chunks are retrieved and included as context, so the LLM can answer questions specific to the uploaded documents and avoid “hallucinations”.


Tech Stack and Configuration


The project uses Spring AI to glue all pieces together.  In application.yml, we configure Ollama and PGVector.  For example:

spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/vectordb
    username: aagrahari
    password: password
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: gemma3:4b
          temperature: 0.7
      embedding:
        options:
          model: mxbai-embed-large
    vectorstore:
      pgvector:
        initialize-schema: true
        schema-name: public
        table-name: vector_store
        schema-validation: true
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1024
        max-document-batch-size: 10000 # Optional: Maximum number of documents per batch

Here we tell Spring AI to connect to the local Ollama server (base-url), use the gemma3:4b model for chat, and use the mxbai-embed-large model for embeddings. We also enable PGVector and set the index to HNSW with cosine distance, with dimensions set to 1024 to match the vectors produced by mxbai-embed-large. When the app starts, Spring AI initializes the vector_store table (via initialize-schema: true); the models themselves are served by the local Ollama instance (pull them with ollama pull, or configure Spring AI’s pull-model strategy), so all responses are generated on-prem.
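
As a quick sanity check (not part of the original project; the bean below is hypothetical), you can embed a test string with the auto-configured EmbeddingModel at startup and confirm it yields 1024-dimensional vectors:

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Hypothetical startup check: Spring AI auto-configures an Ollama-backed EmbeddingModel
// from application.yml, so we can simply inject it and embed a test string.
@Component
class EmbeddingDimensionCheck implements CommandLineRunner {

    private final EmbeddingModel embeddingModel;

    EmbeddingDimensionCheck(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @Override
    public void run(String... args) {
        float[] vector = embeddingModel.embed("hello world");
        // mxbai-embed-large should produce 1024 dimensions, matching the PGVector config
        System.out.println("Embedding dimensions: " + vector.length);
    }
}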


Document Ingestion and Embeddings


When the user uploads a file, the backend reads and chunks it, then creates and stores embeddings. In the Vaadin MainView, we set up an Upload component and handle successful uploads.

// In MainView class:
private static @NonNull Upload getUpload(FileUploadHandler fileUploadHandler) {
    Upload docUpload = new Upload(fileUploadHandler);
    // Only PDFs are processed by the RAG pipeline downstream
    docUpload.setAcceptedFileTypes("application/pdf", ".pdf");
    docUpload.setMaxFiles(1);
    int maxFileSizeInBytes = 10 * 1024 * 1024; // 10 MB
    docUpload.setMaxFileSize(maxFileSizeInBytes);
    docUpload.addFileRejectedListener(event -> {
        String errorMsg = event.getErrorMessage();
        Notification notification = Notification.show(errorMsg, 5000, Notification.Position.MIDDLE);
        notification.addThemeVariants(NotificationVariant.LUMO_ERROR);
    });
    return docUpload;
}

When a file is uploaded, we parse it and split it into chunks using Spring AI’s PagePdfDocumentReader and a TokenTextSplitter. In the RAG service:

public void uploadToVectorDB(String directoryName) {
    // Read all files from the directory, split the documents, and upload to the VectorStore
    log.info("RAG Service processing files under directory {}", directoryName);

    PdfDocumentReaderConfig readerConfig = PdfDocumentReaderConfig.builder()
            .withPageExtractedTextFormatter(new ExtractedTextFormatter.Builder()
                    .withNumberOfBottomTextLinesToDelete(0)
                    .withNumberOfTopPagesToSkipBeforeDelete(0)
                    .build())
            .withPagesPerDocument(1)
            .build();

    List<Resource> resources = getFilesUnderDirectory(directoryName);
    resources.forEach(resource -> {
        PagePdfDocumentReader pdfDocumentReader = new PagePdfDocumentReader(resource, readerConfig);
        var textSplitter = new TokenTextSplitter();
        vectorStore.accept(textSplitter.apply(pdfDocumentReader.get()));
    });
    log.info("Documents uploaded to vector store!");
}

Each Document chunk carries metadata identifying its source (the PagePdfDocumentReader records the file name and page number). Calling vectorStore.accept(docs) instructs Spring AI to invoke the configured embedding model (mxbai-embed-large) on each text chunk and store the resulting vectors in PostgreSQL. The uploadedFiles multi-select box is then updated so the user can choose which documents to include in the chat context.
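
The getFilesUnderDirectory helper referenced above is not shown; a minimal sketch of what it could look like (my assumption), using java.nio and Spring’s FileSystemResource:

// Hypothetical helper: list the files saved during upload and expose them as Resources
private List<Resource> getFilesUnderDirectory(String directoryName) {
    try (Stream<Path> paths = Files.list(Path.of(directoryName))) {
        return paths.filter(Files::isRegularFile)
                .map(path -> (Resource) new FileSystemResource(path))
                .toList();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}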


Chat Service (Spring AI Integration)


The core chat logic is encapsulated in a ChatService Spring component, which is injected into the UI. In this service we build a ChatClient using Spring AI’s fluent builder and attach a MessageChatMemoryAdvisor backed by an in-memory ChatMemory, so conversation history is carried across prompts. Retrieval for RAG is done explicitly: each question triggers a similarity search against the VectorStore, and the matching chunks are injected into a prompt template (shown below).

public ChatService(ChatClient.Builder chatClientBuilder,
                   ChatMemory chatMemory, VectorStore vectorStore) {
    this.vectorStore = vectorStore;

    // Add a memory advisor to the chat client
    var chatMemoryAdvisor = MessageChatMemoryAdvisor
            .builder(chatMemory)
            .build();

    // Build the chat client
    chatClient = chatClientBuilder
            .defaultAdvisors(chatMemoryAdvisor)
            .build();
}
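
The ChatMemory injected here can be Spring AI’s auto-configured default; if you prefer to declare it explicitly, a small @Bean suffices (a sketch assuming Spring AI 1.0’s MessageWindowChatMemory API):

import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical explicit configuration: keep the last 20 messages per conversation in memory
@Configuration
class ChatMemoryConfig {

    @Bean
    ChatMemory chatMemory() {
        return MessageWindowChatMemory.builder()
                .chatMemoryRepository(new InMemoryChatMemoryRepository())
                .maxMessages(20)
                .build();
    }
}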

The prompt template looks something like this:

You are a helpful and friendly AI assistant who answers questions strictly based on the uploaded document
provided in the DOCUMENTS section. For anything outside the uploaded document, respond with "This question
is not related to the uploaded document!". Use the DOCUMENTS section to answer accurately. If unsure, or if
the answer isn't found in the DOCUMENTS section, simply reply "I don't have the information to respond!"

QUESTIONS:
{input}

DOCUMENTS:
{documents}

The constructor above only wires up memory; the retrieval logic lives in the chatStream method shown below. findSimilarDocuments uses the VectorStore to fetch the chunks most relevant to the user’s question, and the prompt template places them in the DOCUMENTS section. The MessageChatMemoryAdvisor ensures conversation history is preserved across prompts. We do not explicitly specify the model here; Spring AI uses the Ollama configuration we set in application.yml, so gemma3:4b is the chat model.
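
As an aside, Spring AI also ships a QuestionAnswerAdvisor that can perform this retrieval automatically instead of the manual similarity search used here; a hedged sketch of that alternative wiring inside the same constructor:

// Alternative (not used in the project as shown): let the advisor fetch context itself
var ragAdvisor = QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder().topK(3).build())
        .build();

chatClient = chatClientBuilder
        .defaultAdvisors(chatMemoryAdvisor, ragAdvisor)
        .build();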


To ask a question, the service exposes a streaming method that fills the prompt template with the retrieved chunks and streams the model’s answer token by token:

public Flux<String> chatStream(String message, String chatId) {
    PromptTemplate template = new PromptTemplate(sbPromptTemplate);
    Map<String, Object> promptParam = new HashMap<>();
    promptParam.put("input", message);
    promptParam.put("documents", String.join("\n", findSimilarDocuments(message)));

    return chatClient.prompt(template.create(promptParam))
            .advisors(advisorSpec ->
                    advisorSpec.param(ChatMemory.CONVERSATION_ID, chatId))
            .stream()
            .content();
}

private List<String> findSimilarDocuments(String message) {
    List<Document> similarDocuments = vectorStore.similaritySearch(
            SearchRequest.builder().query(message).topK(3).build());
    return similarDocuments.stream()
            .map(Document::getText)
            .toList();
}
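
The sbPromptTemplate string used in chatStream is not shown above; one way to hold it (my assumption) is a text block mirroring the template from the previous section, with {input} and {documents} placeholders:

// Hypothetical field: the RAG prompt template fed to PromptTemplate above
private final String sbPromptTemplate = """
        You are a helpful and friendly AI assistant who answers questions strictly based on
        the uploaded document provided in the DOCUMENTS section. ...

        QUESTIONS:
        {input}

        DOCUMENTS:
        {documents}
        """;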

In chatStream we fill the template with the user’s question and the retrieved document chunks, pass the conversation ID as an advisor parameter (so the in-memory history is keyed to the chat session), and return the streamed content. findSimilarDocuments performs the vector search, returning the top three chunks by cosine similarity. Note that the result is a Flux<String> of tokens rather than a single ChatResponse, which lets the UI render the answer as it is generated.
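
If retrieval should be restricted to the files the user selected (as the file selector suggests), the similarity search can carry a filter expression on the metadata written by the PDF reader. A sketch, assuming the file_name metadata key:

// Hypothetical variant of findSimilarDocuments: only search chunks from a chosen file
private List<String> findSimilarDocuments(String message, String selectedFileName) {
    SearchRequest request = SearchRequest.builder()
            .query(message)
            .topK(3)
            .filterExpression("file_name == '" + selectedFileName + "'")
            .build();
    return vectorStore.similaritySearch(request).stream()
            .map(Document::getText)
            .toList();
}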


Vaadin UI and Chat View


On the front end, a Vaadin MainView assembles the upload component, an analyze button, and the chat interface. We use a MessageList inside a Scroller to display the conversation and a MessageInput for the user’s questions. For example:

//Analyze the uploaded document
Button analyzeButton = new Button("Analyze Uploaded Documents");
analyzeButton.addThemeVariants(ButtonVariant.LUMO_PRIMARY);
analyzeButton.setEnabled(triple.getLeft());
analyzeButton.addClickListener(event -> {
   //Call the RAGService to upload the document to Vector Database.
   ragService.uploadToVectorDB(triple.getMiddle().getName());
});

docUpload.addAllFinishedListener(event -> {
    analyzeButton.setEnabled(true);
});

buttonLayout.add(analyzeButton);
mainFeatures.add(
    docUpload, buttonLayout
);

Span span = new Span("Chatbot");
Scroller scroller = new Scroller(messageList);
scroller.setHeightFull();

var messageInput = new MessageInput();
messageInput.addSubmitListener(this::onSubmit);
messageInput.setWidthFull();

getContent().add(
         header,
         mainFeatures,
         span,
         scroller,
         messageInput
);

The onSubmit handler reads the user’s input, displays it in the message list, calls ChatService.chatStream(...), and appends each streamed token to the assistant’s message as it arrives:

private void onSubmit(MessageInput.SubmitEvent submitEvent) {
    //create and handle a prompt message
    var promptMessage = new MessageListItem(submitEvent.getValue(), Instant.now(), "User");
    promptMessage.setUserColorIndex(0);
    messageList.addItem(promptMessage);

    //create and handle the response message
    var responseMessage = new MessageListItem("", Instant.now(), "Bot");
    responseMessage.setUserColorIndex(1);
    messageList.addItem(responseMessage);

    //append a response message to the existing UI
    var userPrompt = submitEvent.getValue();
    var uiOptional = submitEvent.getSource().getUI();
    uiOptional.ifPresent(ui -> 
         chatService.chatStream(userPrompt, chatId)
                .subscribe(token ->
                    ui.access(() -> 
                        responseMessage.appendText(token))));
}

This code streams the answer from the ChatService. The chatId keys the conversation memory, so each browser session keeps its own chat history, and ui.access(...) pushes every token safely to the Vaadin UI as it arrives, so the assistant’s response appears incrementally in the conversation.
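
The chatId passed to chatStream is not shown in the view snippet; a simple approach (an assumption, not necessarily the author’s) is one ID per view instance, i.e. one conversation per browser session. Note also that pushing streamed tokens from a background thread requires Vaadin server push (@Push on the application shell) for the ui.access updates to reach the browser.

// Hypothetical: one conversation per MainView instance, so MessageChatMemoryAdvisor
// keeps each browser session's history separate
private final String chatId = UUID.randomUUID().toString();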


In summary, the MainView wires together:

  • An Upload component (for PDF) to upload documents.

  • A message display area (a MessageList in a Scroller) and a MessageInput for questions.

  • The ChatService, which runs the RAG-enabled Spring AI logic.



Putting It All Together

Figure: High-level RAG flow. Documents are split and embedded in a PGVector DB, and an Ollama-hosted LLM (Gemma3) is queried with retrieved context. (Illustration)

With these pieces, the AI Tutor works as follows:

  1. Document Upload: User uploads course materials. The backend chunks and embeds these, storing vectors in PGVector.

  2. Question Input: User asks a question in the chat UI.

  3. Retrieval: The ChatService runs a similarity search against PGVector for chunks similar to the question (optionally limited to the chosen source files).

  4. Response: The LLM (Gemma3 via Ollama) receives the question plus retrieved context, and generates a tailored answer.

  5. Display: The Vaadin UI shows the answer to the user.


You can refer to the complete codebase in my Github account @ankitagrahari.

If you like the content, do like and comment your feedback. It keeps the motivation machine running.


Happy learning :)

 
 
 
