Model
AI models are algorithms designed to process and generate information, often mimicking human cognitive functions. By learning patterns and insights from large datasets, these models can generate predictions, text, images, or other outputs, enhancing various applications across industries.
There are many different types of AI models, each suited for a specific use case. While ChatGPT and its generative AI capabilities have captivated users through text input and output, many models and companies offer diverse inputs and outputs. Before ChatGPT, many people were fascinated by text-to-image generation models such as Midjourney and Stable Diffusion.
The following table categorizes several models based on their input and output types:
Spring AI currently supports models that process input and output as language, image, and audio. The last row in the previous table, which accepts text as input and outputs numbers, is more commonly known as embedding text and represents the internal data structures used in an AI model. Spring AI has support for embeddings to enable more advanced use cases.
What sets models like GPT apart is their pre-trained nature, as indicated by the "P" in GPT (Generative Pre-trained Transformer). This pre-training turns AI into a general-purpose developer tool that does not require an extensive machine learning or model-training background.
Prompts
Prompts serve as the foundation for the language-based inputs that guide an AI model to produce specific outputs. For those familiar with ChatGPT, a prompt might seem like merely the text entered into a dialog box and sent to the API. However, it encompasses much more than that. In many AI models, the text of a prompt is not just a simple string.
ChatGPT’s API has multiple text inputs within a prompt, with each text input being assigned a role. For example, there is the system role, which tells the model how to behave and sets the context for the interaction. There is also the user role, which is typically the input from the user.
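To make the role idea concrete, here is a minimal plain-Java sketch; the Message record and the role strings below are illustrative stand-ins, not Spring AI's actual message types:

```java
import java.util.List;

public class RoleExample {
    // Illustrative message type: each entry in a prompt carries a role and its text.
    public record Message(String role, String content) {}

    // Build a two-message prompt: a system message setting context, then the user's input.
    public static List<Message> buildPrompt(String userInput) {
        return List.of(
                new Message("system", "You are a helpful assistant. Answer concisely."),
                new Message("user", userInput));
    }

    public static void main(String[] args) {
        buildPrompt("What is an embedding?")
                .forEach(m -> System.out.println(m.role() + ": " + m.content()));
    }
}
```

The API receives the whole list, so the system message shapes how the model treats every user message that follows.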
Crafting effective prompts is both an art and a science. ChatGPT was designed for human conversations, which is quite a departure from using something like SQL to "ask a question". One must communicate with the AI model as if conversing with another person.
Such is the importance of this interaction style that the term “Prompt Engineering” has emerged as its own discipline. There is a burgeoning collection of techniques that improve the effectiveness of prompts. Investing time in crafting a prompt can drastically improve the resulting output.
Sharing prompts has become a communal practice, and there is active academic research on the subject. As an example of how counter-intuitive crafting an effective prompt can be (in contrast with, say, SQL), a recent research paper found that one of the most effective prompts you can use starts with the phrase, "Take a deep breath and work on this step by step." That should give you an indication of why language matters so much. We do not yet fully understand how to make the most effective use of previous iterations of this technology, such as ChatGPT 3.5, let alone the new versions being developed.
Prompt Templates
Creating effective prompts involves establishing the context of the request and substituting parts of the request with values specific to the user’s input.
This process uses traditional text-based template engines for prompt creation and management. Spring AI employs the OSS library StringTemplate for this purpose.
For instance, consider the simple prompt template:
Tell me a {adjective} joke about {content}.
In Spring AI, prompt templates can be likened to the "View" in Spring MVC architecture. A model object, typically a java.util.Map, is provided to populate placeholders within the template. The "rendered" string becomes the content of the prompt supplied to the AI model.
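The substitution itself can be sketched in a few lines of plain Java. Spring AI's PromptTemplate delegates this work to StringTemplate; the render method below is a simplified stand-in, not the actual implementation:

```java
import java.util.Map;

public class TemplateExample {
    // Naive placeholder substitution: replace each {key} with its value from the model map.
    public static String render(String template, Map<String, String> model) {
        String result = template;
        for (var entry : model.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String prompt = render("Tell me a {adjective} joke about {content}.",
                Map.of("adjective", "funny", "content", "cows"));
        System.out.println(prompt); // Tell me a funny joke about cows.
    }
}
```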
The specific data format of the prompt sent to the model varies considerably. Prompts began as simple strings and have evolved to include multiple messages, where each message's string is assigned a distinct role for the model.
Embeddings
Embeddings are numerical representations of text, images, or videos that capture relationships between inputs.
Embeddings work by converting text, images, and videos into arrays of floating-point numbers called vectors. These vectors are designed to capture the meaning of the text, images, and videos. The length of the embedding array is called the vector's dimensionality.
By calculating the numerical distance between the vector representations of two pieces of text, an application can determine the similarity between the objects used to generate the embedding vectors.
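A common similarity measure is cosine similarity between the two vectors. Here is a minimal sketch; the three-dimensional vectors are made-up toy embeddings (real embeddings have hundreds or thousands of dimensions):

```java
public class SimilarityExample {
    // Cosine similarity: 1.0 for identical directions, near 0 for unrelated vectors.
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] cat = {0.9f, 0.1f, 0.0f};      // hypothetical embedding for "cat"
        float[] kitten = {0.85f, 0.15f, 0.05f}; // hypothetical embedding for "kitten"
        float[] car = {0.0f, 0.2f, 0.95f};      // hypothetical embedding for "car"
        System.out.printf("cat~kitten: %.3f%n", cosineSimilarity(cat, kitten));
        System.out.printf("cat~car:    %.3f%n", cosineSimilarity(cat, car));
    }
}
```

The semantically related pair scores closer to 1.0 than the unrelated pair, which is exactly the property vector databases exploit.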
As a Java developer exploring AI, it’s not necessary to comprehend the intricate mathematical theories or the specific implementations behind these vector representations. A basic understanding of their role and function within AI systems suffices, particularly when you’re integrating AI functionalities into your applications.
Embeddings are particularly relevant in practical applications like the Retrieval Augmented Generation (RAG) pattern. They enable the representation of data as points in a semantic space, which is akin to the 2-D space of Euclidean geometry, but in higher dimensions. Just as points on a plane in Euclidean geometry can be close or far based on their coordinates, in a semantic space the proximity of points reflects similarity in meaning. Sentences about similar topics are positioned closer together in this multi-dimensional space, much like points lying near each other on a graph. This proximity aids tasks such as text classification, semantic search, and even product recommendations, because it allows the AI to discern and group related concepts based on their "location" in this expanded semantic landscape.
You can think of this semantic space as a vector space.
Tokens
Tokens serve as the building blocks of how an AI model works. On input, models convert words to tokens. On output, they convert tokens back to words.
In English, one token roughly corresponds to 75% of a word. For reference, Shakespeare’s complete works, totaling around 900,000 words, translate to approximately 1.2 million tokens.
Perhaps more important: tokens = money. In the context of hosted AI models, your charges are determined by the number of tokens used. Both input and output contribute to the overall token count.
Also, models are subject to token limits, which restrict the amount of text processed in a single API call. This threshold is often referred to as the “context window”. The model does not process any text that exceeds this limit.
For instance, ChatGPT3 has a 4K token limit, while GPT4 offers varying options, such as 8K, 16K, and 32K. Anthropic’s Claude AI model features a 100K token limit, and Meta’s recent research yielded a 1M token limit model.
To summarize the collected works of Shakespeare with GPT4, you need to devise software engineering strategies to chop up the data and present the data within the model’s context window limits. The Spring AI project helps you with this task.
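The arithmetic behind this can be sketched as follows. The 0.75 words-per-token ratio and the 8K-token chunk budget are rough rules of thumb, not exact tokenizer behavior:

```java
public class TokenMath {
    // Rough rule of thumb for English: one token corresponds to about 0.75 words.
    public static long estimateTokens(long words) {
        return Math.round(words / 0.75);
    }

    public static void main(String[] args) {
        long shakespeare = estimateTokens(900_000);            // ~1.2 million tokens
        long chunks = (long) Math.ceil(shakespeare / 8_000.0); // naive split for an 8K context window
        System.out.println(shakespeare + " tokens, roughly " + chunks + " chunks");
    }
}
```

This ignores the tokens consumed by the prompt instructions themselves and any chunk overlap, both of which reduce the usable budget per call.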
Structured Output
The output of AI models traditionally arrives as a java.lang.String, even if you ask for the reply to be in JSON. The reply may be valid JSON, but it is not a JSON data structure; it is just a string. Also, asking "for JSON" as part of the prompt does not yield correct JSON 100% of the time.
This intricacy has led to the emergence of a specialized field involving the creation of prompts to yield the intended output, followed by converting the resulting simple string into a usable data structure for application integration.
Structured output conversion employs meticulously crafted prompts, often necessitating multiple interactions with the model to achieve the desired formatting.
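One small, common piece of that conversion can be sketched in plain Java: models often wrap a JSON reply in a Markdown code fence, which must be stripped before the string can be parsed into a data structure. This helper is illustrative, not Spring AI's actual converter:

```java
public class OutputCleanup {
    private static final String FENCE = "`".repeat(3); // Markdown code-fence marker

    // Strip a surrounding Markdown fence (e.g. a fenced "json" block) from a model reply.
    public static String stripMarkdownFence(String reply) {
        String s = reply.strip();
        if (s.startsWith(FENCE)) {
            int firstNewline = s.indexOf('\n');
            int lastFence = s.lastIndexOf(FENCE);
            if (firstNewline >= 0 && lastFence > firstNewline) {
                s = s.substring(firstNewline + 1, lastFence).strip();
            }
        }
        return s; // unfenced replies pass through unchanged
    }

    public static void main(String[] args) {
        String raw = FENCE + "json\n{\"joke\": \"Why did the cow cross the road?\"}\n" + FENCE;
        System.out.println(stripMarkdownFence(raw));
    }
}
```

After cleanup, the string can be handed to a JSON mapper and bound to an application type.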
Bringing Your Data & APIs to the AI Model
How can you equip the AI model with information on which it has not been trained?
Note that the GPT 3.5/4.0 dataset extends only until September 2021. Consequently, the model says that it does not know the answer to questions that require knowledge beyond that date. An interesting bit of trivia is that this dataset is around 650GB.
Three techniques exist for customizing the AI model to incorporate your data:
- Fine Tuning: This traditional machine learning technique involves tailoring the model and changing its internal weighting. However, it is a challenging process for machine learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option.
- Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model’s token limits, techniques are required to present relevant data within the model’s context window. This approach is colloquially referred to as “stuffing the prompt.” The Spring AI library helps you implement solutions based on the “stuffing the prompt” technique otherwise known as Retrieval Augmented Generation (RAG).
- Tool Calling: This technique allows registering tools (user-defined services) that connect the large language models to the APIs of external systems. Spring AI greatly simplifies code you need to write to support tool calling.
Retrieval Augmented Generation
A technique termed Retrieval Augmented Generation (RAG) has emerged to address the challenge of incorporating relevant data into prompts for accurate AI model responses.
The approach involves a batch processing style programming model, where the job reads unstructured data from your documents, transforms it, and then writes it into a vector database. At a high level, this is an ETL (Extract, Transform and Load) pipeline. The vector database is used in the retrieval part of RAG technique.
As part of loading the unstructured data into the vector database, one of the most important transformations is to split the original document into smaller pieces. The procedure of splitting the original document into smaller pieces has two important steps:
- Split the document into parts while preserving the semantic boundaries of the content. For example, for a document with paragraphs and tables, one should avoid splitting the document in the middle of a paragraph or table. For code, avoid splitting the code in the middle of a method's implementation.
- Split the document's parts further into parts whose size is a small percentage of the AI model's token limit.
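A naive version of these two steps can be sketched in plain Java, splitting on blank lines as the semantic boundary and using a character budget as a stand-in for a token budget:

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentSplitter {
    // Step 1: split on blank lines so paragraphs stay whole (a semantic boundary).
    // Step 2: greedily pack paragraphs into chunks under a character budget.
    // A single paragraph larger than the budget still becomes its own oversized chunk.
    public static List<String> split(String document, int maxChunkChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            if (current.length() > 0 && current.length() + paragraph.length() > maxChunkChars) {
                chunks.add(current.toString().strip());
                current.setLength(0);
            }
            current.append(paragraph).append("\n\n");
        }
        if (current.length() > 0) {
            chunks.add(current.toString().strip());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
        split(doc, 40).forEach(c -> System.out.println("--- " + c));
    }
}
```

Production splitters additionally count real tokens and overlap adjacent chunks, but the shape of the algorithm is the same.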
The next phase in RAG is processing user input. When a user’s question is to be answered by an AI model, the question and all the “similar” document pieces are placed into the prompt that is sent to the AI model. This is the reason to use a vector database. It is very good at finding similar content.
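Assembling such a "stuffed" prompt can be sketched as follows; the instruction wording and layout are illustrative, and the chunks would come from a vector-store similarity search:

```java
import java.util.List;

public class RagPromptBuilder {
    // "Stuffing the prompt": place retrieved document pieces above the question
    // so the model answers from the supplied context rather than its training data.
    public static String buildPrompt(String question, List<String> similarChunks) {
        StringBuilder sb = new StringBuilder(
                "Answer the question using only the context below.\n\nContext:\n");
        for (String chunk : similarChunks) {
            sb.append("- ").append(chunk).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        String prompt = buildPrompt("When was the refund policy updated?",
                List.of("The refund policy was updated in March 2024.",
                        "Refunds are processed within 5 business days."));
        System.out.println(prompt);
    }
}
```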
- The ETL Pipeline provides further information about orchestrating the flow of extracting data from data sources and storing it in a structured vector store, ensuring the data is in the optimal format for retrieval when passing it to the AI model.
- The ChatClient - RAG explains how to use the QuestionAnswerAdvisor to enable the RAG capability in your application.
Tool Calling
Large Language Models (LLMs) are frozen after training, leading to stale knowledge, and they are unable to access or modify external data.
The Tool Calling mechanism addresses these shortcomings. It allows you to register your own services as tools to connect the large language models to the APIs of external systems. These systems can provide LLMs with real-time data and perform data processing actions on their behalf.
Spring AI greatly simplifies the code you need to write to support tool invocation. It handles the tool invocation conversation for you. You can provide your tool as a @Tool-annotated method and supply it in your prompt options to make it available to the model. Additionally, you can define and reference multiple tools in a single prompt.
- When we want to make a tool available to the model, we include its definition in the chat request. Each tool definition comprises a name, a description, and the schema of the input parameters.
- When the model decides to call a tool, it sends a response with the tool name and the input parameters modeled after the defined schema.
- The application is responsible for using the tool name to identify and execute the tool with the provided input parameters.
- The result of the tool call is processed by the application.
- The application sends the tool call result back to the model.
- The model generates the final response using the tool call result as additional context.
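The application's side of steps 3 through 5 amounts to a lookup-and-execute loop. A minimal sketch follows; the tool name, argument map, and weather example are made up, and in practice Spring AI generates this plumbing from @Tool-annotated methods:

```java
import java.util.Map;
import java.util.function.Function;

public class ToolDispatcher {
    // Registry of user-defined services keyed by tool name.
    private final Map<String, Function<Map<String, String>, String>> tools;

    public ToolDispatcher(Map<String, Function<Map<String, String>, String>> tools) {
        this.tools = tools;
    }

    // Identify the tool the model named and execute it with the model-supplied arguments.
    public String execute(String toolName, Map<String, String> arguments) {
        var tool = tools.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        return tool.apply(arguments); // this result is sent back to the model as context
    }

    public static void main(String[] args) {
        Map<String, Function<Map<String, String>, String>> tools = Map.of(
                "currentWeather", arguments -> "22C and sunny in " + arguments.get("city"));
        var dispatcher = new ToolDispatcher(tools);
        // Pretend the model responded with this tool name and these arguments:
        System.out.println(dispatcher.execute("currentWeather", Map.of("city", "Paris")));
    }
}
```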
Follow the Tool Calling documentation for further information on how to use this feature with different AI models.
Evaluating AI responses
Effectively evaluating the output of an AI system in response to user requests is very important to ensuring the accuracy and usefulness of the final application. Several emerging techniques enable the use of the pre-trained model itself for this purpose.
This evaluation process involves analyzing whether the generated response aligns with the user’s intent and the context of the query. Metrics such as relevance, coherence, and factual correctness are used to gauge the quality of the AI-generated response.
One approach involves presenting both the user’s request and the AI model’s response to the model, querying whether the response aligns with the provided data.
Furthermore, leveraging the information stored in the vector database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance.
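One way to phrase such an evaluation request can be sketched as follows; the wording is illustrative of the "model as judge" idea, not the exact prompt Spring AI's Evaluator strategies use:

```java
public class EvaluationPrompt {
    // Ask the model itself to judge whether a response answers the request
    // and is grounded in the supplied context (e.g. retrieved vector-store data).
    public static String build(String userRequest, String modelResponse, String context) {
        return """
                Evaluate whether the RESPONSE answers the REQUEST and is supported by the CONTEXT.
                Reply with YES or NO and a one-sentence justification.

                REQUEST: %s
                RESPONSE: %s
                CONTEXT: %s
                """.formatted(userRequest, modelResponse, context);
    }

    public static void main(String[] args) {
        System.out.println(build("What is the capital of France?",
                "The capital of France is Paris.",
                "Paris has been the capital of France since 987."));
    }
}
```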
The Spring AI project provides an Evaluator API which currently gives access to basic strategies to evaluate model responses. Follow the Evaluation Testing documentation for further information.