
Create, Share, and Scale Enterprise AI Workflows with NVIDIA AI Workbench, Now in Beta

NVIDIA

The most common quantization used in this LoRA fine-tuning workflow is 4-bit quantization, which offers a reasonable balance between model quality and fine-tuning feasibility. In the example prompt, the base model's answer amounts to 2041; notice that the base model does not perform well out of the box, since the actual answer is 7 × 17 × 17.
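To give a feel for what 4-bit quantization does, here is a toy sketch in pure Python: weights are mapped to 15 signed integer levels (-7 to 7) using an absmax scale, then dequantized back. This is a simplified linear scheme for illustration only; real LoRA fine-tuning stacks use block-wise schemes such as NF4 via libraries like bitsandbytes, and the function names here are illustrative, not part of any library API.

```python
def quantize_4bit(values):
    """Toy absmax 4-bit quantization: map floats to integers in [-7, 7]."""
    # Scale so the largest-magnitude value lands on level 7
    # (fall back to 1.0 if all values are zero).
    scale = max(abs(v) for v in values) / 7 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(qvals, scale):
    """Recover approximate float values from the quantized levels."""
    return [q * scale for q in qvals]

weights = [0.1, -0.5, 0.7]
qvals, scale = quantize_4bit(weights)
restored = dequantize(qvals, scale)
```

Each weight is stored as a 4-bit level instead of a 16- or 32-bit float, which is the memory saving that makes fine-tuning large models feasible; the cost is the small rounding error visible in `restored`.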
