Release notes for the Generative Models extension, version 2.0

Released: January 19, 2024

IMPORTANT: Due to the new and meaningful port names, the changed Python package management, and the use of connection objects instead of connection parameters, this version is no longer compatible with the previous version. If you have used the previous version in production, we do not recommend upgrading the extension. We are sorry for the inconvenience, but it was not possible to avoid these breaking changes.

Improvements

  • The extension now uses a unified and simplified package management and provides the complete environment definition for your convenience. Please follow the documentation to set up your Python environment correctly; otherwise, this extension will not work. IMPORTANT: You will need to re-create the rm_genai environment following the documentation, even if you have used the extension previously!

  • Greatly improved robustness: if single prompts or rows fail, the result for that specific row is set to "missing" while all other rows are still processed. The previous behavior was to abort processing completely, which is not desired if only a few of many prompts actually fail. In addition to the missing values for such erroneous rows, a log message points out the root cause of the specific problem.

  • The operators for working with local models (from Huggingface) now store all models in project / repository folders by default. This greatly simplifies the usage of large language models on AI Hub, where access to the underlying file system may not be possible for job agents. Storing models on and loading them from the file system, including using temporary folders, is still possible with the corresponding storage type settings.

  • New finetuning operators: Finetune Text Generation, Finetune Translation, and Finetune Text Classification.

  • New embedding operators: Embeddings (FastEmbed) and Embeddings (OpenAI).

    Embeddings are high-dimensional vectors which represent texts in a search space. Vectors are close to each other in that space if the underlying texts are similar. Embeddings can be useful to encode texts for machine learning tasks or in combination with vector databases.
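
    As a minimal sketch of this idea (not the operators' implementation), the following computes two embeddings with the fastembed Python package and compares them with cosine similarity; the model name is illustrative:

    ```python
    import numpy as np
    from fastembed import TextEmbedding

    # Illustrative model; fastembed downloads it on first use.
    model = TextEmbedding("BAAI/bge-small-en-v1.5")
    a, b = list(model.embed(["The cat sat on the mat.", "A cat is sitting on a mat."]))

    # Similar texts yield vectors that are close in the embedding space.
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    print(f"cosine similarity: {cosine:.3f}")
    ```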

  • New operator for generating (meta-)prompts: Generate Prompts.

    This operator can be used to generate a new column based on a prompt template and can refer to the values of input data columns with [[column_name]]. This operator, as well as all prompt input fields of task operators, now supports multiple lines. This multi-line support in particular makes the operator more versatile for creating complex prompts than the regular Generate Attributes operator.
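
    For illustration, such a template could be resolved with a simple regular-expression replacement in plain Python; fill_template is a hypothetical helper, not the operator's actual implementation:

    ```python
    import re

    def fill_template(template: str, row: dict) -> str:
        # Replace each [[column_name]] placeholder with the row's value.
        return re.sub(r"\[\[(\w+)\]\]", lambda m: str(row[m.group(1)]), template)

    prompt = fill_template(
        "Summarize this review in one sentence:\n\n[[review_text]]",
        {"review_text": "Great battery life, but the screen scratches easily."},
    )
    print(prompt)
    ```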

  • Added operators for managing and using vector databases: Qdrant and Milvus. Vector databases can be used to store texts together with their embedding vectors, which makes them a vital part of retrieval-augmented generation (RAG) approaches. All supported vector databases have operators for creating collections, deleting them, listing all available collections, retrieving information about a collection, inserting documents into a collection using embeddings, and retrieving documents from a collection, also using embeddings.
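
    As a sketch of what such operators do under the hood, assuming the qdrant-client Python package (collection name and vector size are illustrative):

    ```python
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(":memory:")  # local in-memory instance for testing

    # Create a collection whose vectors match the embedding dimensionality.
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

    # Insert a document together with its (here: dummy) embedding vector.
    client.upsert(
        collection_name="documents",
        points=[PointStruct(id=1, vector=[0.1] * 384, payload={"text": "hello world"})],
    )

    # Retrieve the most similar documents for a query embedding.
    hits = client.search(collection_name="documents", query_vector=[0.1] * 384, limit=3)
    ```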

  • Finetuning Huggingface models now supports PEFT / LoRA finetuning in addition to full finetuning. On Linux systems with a CUDA GPU, LoRA finetuning can also be combined with quantization (see below). For more information about LoRA and supported models, please refer to: https://github.com/huggingface/peft

  • Finetuning Huggingface models now supports 8-bit and 4-bit quantization on Linux systems with CUDA GPUs. This needs to be enabled via the "quantization" parameter on supporting systems.
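
    To give an idea of what LoRA finetuning combined with quantization looks like underneath, here is a minimal sketch using the transformers and peft libraries; the model id and LoRA settings are illustrative, and 4-bit loading requires a Linux system with a CUDA GPU and bitsandbytes installed:

    ```python
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    # Load the base model with 4-bit quantized weights.
    quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
    model = AutoModelForCausalLM.from_pretrained("gpt2", quantization_config=quant_config)

    # Attach small trainable LoRA adapters; the base weights stay frozen.
    lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    ```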

  • All Huggingface finetuning operators now support 16-bit (mixed) precision for training, which can reduce the memory footprint but may lead to less accurate models. The default is "false". This option is only available on CUDA GPUs.
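
    In transformers terms, this corresponds to the fp16 flag of the training arguments (a sketch, not the operators' exact configuration):

    ```python
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="finetuned-model",
        fp16=True,  # 16-bit mixed precision; only available on CUDA GPUs
    )
    ```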

  • Text2Text Finetuning no longer filters out data rows which are too long for either the input or output. The texts are truncated instead.

  • All OpenAI operators have been renamed to their original names plus " (OpenAI)" to differentiate them from operator sets connecting to other third-party vendors.

  • Download Model (from Huggingface) now supports a Dictionary Connection with a single key-value pair named "token" to provide the Huggingface token. This connection replaces the operator parameter which was used before. Please note that most models do not require a token; using a connection is therefore optional. If a connection is provided, it will also be delivered as an output of the operator.
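
    Conceptually, the connection's "token" value ends up in a download call like the following sketch, assuming the huggingface_hub package; the repository id and token are placeholders:

    ```python
    from huggingface_hub import snapshot_download

    # Most models need no token; gated models require one from your Huggingface account.
    path = snapshot_download(repo_id="gpt2", token="hf_...")  # placeholder token
    print(path)
    ```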

  • All OpenAI operators now use a Dictionary Connection providing the "api_key" as the only key-value pair, which greatly improves security and allows for better collaboration as well as deployment, e.g., on AI Hub. This connection replaces the API and organization key parameters which were used before. The connection is also delivered as an output of the operators for convenience.
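
    In terms of the OpenAI Python library, the connection's single key-value pair maps to the client initialization, roughly as in this sketch (the key value is a placeholder):

    ```python
    from openai import OpenAI

    connection = {"api_key": "sk-..."}  # stands in for the Dictionary Connection
    client = OpenAI(api_key=connection["api_key"])
    ```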

  • The task operator for Text Generation now supports additional parameters which control how texts are generated: num_beams, penalty_alpha, no_repeat_ngram_size, do_sample, temperature, top_k, and top_p. Changing those parameters can help to avoid repetitive texts or otherwise improve the quality of the produced outputs.
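
    These parameters correspond to the text generation options of the transformers library, roughly as in this sketch (model and values are illustrative):

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The quick brown fox", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,          # sample instead of greedy decoding
        temperature=0.8,         # soften or sharpen the token distribution
        top_k=50,                # keep only the 50 most likely tokens
        top_p=0.95,              # nucleus sampling
        no_repeat_ngram_size=3,  # never repeat the same 3-gram
        max_new_tokens=40,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```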

  • Upgraded all libraries, in particular the transformers library to 4.35.2, which allows for newer model types and the safetensors model format, among many more improvements; see: https://github.com/huggingface/transformers/releases

  • Upgraded to OpenAI version 1.3.6 and to the latest pricing information. One consequence of this is that you no longer need to provide the organization id as a parameter; models will from now on be associated with the users who created the finetuning jobs.

  • All operators now show meaningful port names and descriptions. Those names and descriptions are also shown in the help tooltips and in the help view.

  • All Huggingface operators now support a "data type" parameter so that users can set the used "torch.dtype" for the models. This specifies the data type under which the model should be loaded. Using lower precisions can reduce memory usage while leading to slightly less accurate results in some cases. If set to "auto" the data precision is derived from the model itself. Please note that some models have multiple versions or "revisions", and you can potentially already download a model with a lower floating-point precision.

  • All Huggingface operators now support a "revision" parameter so that users can select the version / revision of a model in case multiple ones do exist. The default is "main". The value can be a branch name, a tag name, or a commit id of the model in the Huggingface git repository. You can find the possible for each model in the file section of the model card.

  • All Huggingface task operators now support a "trust remote code" parameter. This parameter specifies whether or not to allow custom code defined on the Huggingface Hub in a model's own modeling, configuration, tokenization, or even pipeline files. This option should only be set to "true" for model repositories you trust and in which you have read the code, as it will execute code from the Huggingface Hub on your local machine. However, using this parameter can sometimes be necessary when models are much newer than the underlying transformers library and therefore custom workarounds are needed.
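
    The parameters from the last three items map directly to keyword arguments of the transformers loading functions, roughly as in this sketch (the model id is illustrative):

    ```python
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "gpt2",                     # illustrative model id
        torch_dtype=torch.float16,  # "data type": load the weights in half precision
        revision="main",            # "revision": branch name, tag name, or commit id
        trust_remote_code=False,    # only set to True for repositories you trust
    )
    ```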

  • Inputs / prompts are now trimmed before being sent to the model. Some models unfortunately deliver different results depending on whether the prompt begins or ends with whitespace. Since this was very confusing to users, we decided to always strip the whitespace surrounding prompts to guarantee deterministic and understandable results.

  • All Huggingface operators now use the global process random seed. Please note, however, that changing the seed does not necessarily lead to variations in the generated results. For example, if a Text Generation task operator does not use sampling, the results will be the same independent of the random seed. Unfortunately, there are also situations where multiple runs can still lead to different outcomes, in particular with finetuned models where model weights are already randomized during model initialization, which we cannot control.

  • Send Prompt (OpenAI) now also supports the global process random seed, which should make responses from OpenAI models more deterministic. However, please note that random seeds are currently a Beta feature for OpenAI, and there is no guarantee that this will work or will continue to work in the future.
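
    With the OpenAI Python library, the seed is passed along with each request, roughly as in this sketch (model and key are placeholders):

    ```python
    from openai import OpenAI

    client = OpenAI(api_key="sk-...")  # placeholder key
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write a haiku about autumn."}],
        seed=42,  # Beta feature: best-effort deterministic responses
    )
    print(response.choices[0].message.content)
    ```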

  • For MacOS systems with M-chip architecture, MPS support is now also activated in "Automatic" device mode. If a CUDA-enabled GPU is available, it will be used. If not, but MPS is available, then MPS will be used. If neither is available, computation will happen on the CPU. A similar logic applies if GPU or MPS is selected instead of "Automatic": if the selected device is not available, the operators will automatically fall back to computation on the CPU. Please check the log messages to see which device has been used. Please also note that some models do not support the MPS architecture; in this case, users need to switch from "Automatic" or "MPS" to "CPU" to force the computation on a CPU.
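
    The "Automatic" selection logic can be mirrored in a few lines of PyTorch, as a sketch:

    ```python
    import torch

    # CUDA first, then MPS (Apple M-chips), then CPU as the fallback.
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    print(f"computing on: {device}")
    ```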

  • Improved the tutorial process for the Text Generation task operator since the old tutorial did not work on MPS systems without changes and did not produce good results to begin with.

  • The Download Model operator now offers a parameter to set a proxy server.

  • The Huggingface model cache on AI Hub is now persistent as of AI Hub 10.3.1 (this is actually a change in AI Hub, but it is still related to this extension, which is why we added it here for completeness – please note, though, that you will need to upgrade AI Hub to make use of this).

Bugs

  • Columns used in prompts (referred to via [[column_name]]) can now have any data type, not just nominal.

  • Fill Mask now properly fails if used in targets mode and a target is unknown to the tokenizer of the selected model.

  • Fixed quoting in regular expressions for prompt replacements since the previous approach relied on functionality that is deprecated in future versions of Python.