Cloudflare enhances its AI inference platform with a strong offering and a full host of features

Business Fortune
07 October, 2024

Cloudflare, Inc., a leading provider of cloud connectivity, has announced significant new features for Workers AI, its serverless AI platform, and for its collection of AI application building blocks, to help developers build faster, more powerful AI applications.

Applications built on Workers AI can now take advantage of larger models, faster inference, better performance analytics, and more. Workers AI is the simplest platform for building global AI applications and running AI inference close to the user, wherever they are in the world.

As large language models (LLMs) become smaller and more performant, network speeds will become the limiting factor for customer adoption and smooth AI interactions. Cloudflare's globally distributed network helps to minimize network latency, setting it apart from competing networks that are often made up of resources concentrated in a limited number of data centers.

With GPUs located in more than 180 cities worldwide, Workers AI, Cloudflare's serverless inference platform, was designed to be globally accessible and to offer low latency for end users everywhere. This network of GPUs gives Workers AI one of the largest global footprints of any AI platform. Its goal is to run AI inference locally, as close to the user as possible, and to help keep customer data closer to home.
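To illustrate why proximity to the user matters for latency, the following is a minimal sketch that estimates the lower bound on round-trip propagation delay over optical fiber. It assumes a signal speed of roughly 200,000 km/s (about two thirds of the speed of light in a vacuum); the distances and the function name are illustrative and do not come from Cloudflare. Real-world latency also includes routing, queuing, and inference time, so these figures are only a floor.

```python
# Approximate signal speed in optical fiber: about two thirds of c.
# This constant is an assumption for illustration, not a Cloudflare figure.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over fiber, in milliseconds."""
    one_way_seconds = distance_km / FIBER_SPEED_KM_PER_S
    return 2.0 * one_way_seconds * 1000.0


if __name__ == "__main__":
    # Hypothetical distances between a user and the nearest inference location.
    examples = [
        ("same metro area", 50),
        ("cross-continent", 4_000),
        ("intercontinental", 10_000),
    ]
    for label, km in examples:
        print(f"{label} ({km} km): at least {round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions, a request served from the same metro area incurs well under a millisecond of propagation delay per round trip, while one served from another continent incurs tens of milliseconds before any processing begins.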

Additionally, Cloudflare is launching new features that make it the most user-friendly platform for developing AI applications:

Upgraded performance and support for larger models: Cloudflare is upgrading its global network with more powerful GPUs for Workers AI to improve AI inference, enabling support for larger models such as Llama 3.1 70B and the Llama 3.2 family (1B, 3B, and 11B, with 90B coming soon). This improves response times and context capacity, allowing AI applications to handle complex tasks more efficiently and deliver better user experiences.

Improved monitoring and optimization of AI usage with persistent logs: New persistent logs in AI Gateway, available in open beta, let developers store users' prompts and model responses for better performance analysis. This feature provides insights into user experiences, including the cost and duration of each request, which helps developers refine their applications. Since its launch last year, AI Gateway has processed over two billion requests.

Faster and more affordable queries: Vector databases enhance models' recall of previous inputs, aiding search, recommendations, and text generation. Cloudflare's Vectorize is now available, supporting up to five million vectors, a significant increase from the previous limit of 200,000. Median query latency has dropped from 549 ms to 31 ms, enabling faster and more cost-effective AI applications.
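As a rough illustration of what a vector database query does, the sketch below ranks stored vectors by cosine similarity to a query vector and returns the closest matches. It is a brute-force toy in plain Python, not Vectorize's API or implementation; the document IDs, vectors, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical three-dimensional embeddings; real embeddings have
# hundreds or thousands of dimensions.
index = {
    "doc-pricing": [0.9, 0.1, 0.0],
    "doc-latency": [0.1, 0.9, 0.2],
    "doc-gpus": [0.0, 0.3, 0.9],
}

if __name__ == "__main__":
    query = [0.2, 0.8, 0.1]
    print(top_k(query, index, k=2))
```

A brute-force scan like this compares the query against every stored vector, which is why production vector databases rely on specialized indexes to keep query latency low as collections grow into the millions.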