Model Gallery

Discover and install AI models from our curated collection

8 models available from 1 repository

gpt-oss-20b
Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. The open models come in two flavors:
- gpt-oss-120b: for production, general-purpose, high-reasoning use cases; fits on a single H100 GPU (117B parameters, 5.1B active).
- gpt-oss-20b: for lower latency and local or specialized use cases (21B parameters, 3.6B active).

Both models were trained on the harmony response format and should only be used with it; they will not work correctly otherwise. This entry covers the smaller gpt-oss-20b model; see gpt-oss-120b for the larger one.

Highlights:
- Permissive Apache 2.0 license: build freely without copyleft restrictions or patent risk, which suits experimentation, customization, and commercial deployment.
- Configurable reasoning effort: adjust the reasoning effort (low, medium, high) to match your use case and latency needs.
- Full chain-of-thought: the model's complete reasoning process is accessible, which makes debugging easier and increases trust in outputs. It is not intended to be shown to end users.
- Fine-tunable: fully customize the models for your use case through parameter fine-tuning.
- Agentic capabilities: use the models' native support for function calling, web browsing, Python code execution, and Structured Outputs.
- Native MXFP4 quantization: the MoE layers are trained with native MXFP4 precision, so gpt-oss-120b runs on a single H100 GPU and gpt-oss-20b runs within 16GB of memory.

Repository: localai | License: apache-2.0
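The gpt-oss-20b description says the model must be used with the harmony response format and that its reasoning effort (low, medium, high) is configurable. The sketch that follows renders a simplified single-turn harmony prompt with the reasoning level declared in the system message. The token layout and the `Reasoning: <level>` line are assumptions based on OpenAI's published harmony format, and `render_harmony_prompt` is a hypothetical helper; in practice, let your serving stack's chat template render the prompt, and treat OpenAI's `openai-harmony` package as the reference implementation.

```python
# Simplified sketch of a harmony-style, single-turn prompt for gpt-oss.
# Assumption: the layout <|start|>{role}<|message|>{content}<|end|> and the
# "Reasoning: <level>" system line follow OpenAI's harmony docs; real
# deployments should rely on the model's chat template or `openai-harmony`.

VALID_EFFORTS = ("low", "medium", "high")


def render_harmony_prompt(system_text: str, user_text: str, effort: str = "medium") -> str:
    """Render a system + user turn and leave the assistant turn open."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    # The reasoning level is declared inside the system message.
    system_content = f"{system_text}\nReasoning: {effort}"
    turns = [("system", system_content), ("user", user_text)]
    rendered = "".join(f"<|start|>{role}<|message|>{content}<|end|>" for role, content in turns)
    # The model generates the assistant message starting from this open header.
    return rendered + "<|start|>assistant"


if __name__ == "__main__":
    print(render_harmony_prompt("You are a helpful assistant.", "What is 2 + 2?", effort="high"))
```

Lower effort levels trade reasoning depth for latency, so a helper like this is a convenient place to expose that choice to callers.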

gpt-oss-120b
gpt-oss-120b is the larger model in OpenAI's open-weight gpt-oss series, designed for production, general-purpose, high-reasoning use cases. It has 117B parameters (5.1B active) and fits on a single H100 GPU thanks to native MXFP4 quantization of its MoE layers. Like gpt-oss-20b, it was trained on the harmony response format and should only be used with that format; it will not work correctly otherwise. This entry covers the larger gpt-oss-120b model; see gpt-oss-20b for the smaller one.

Highlights (shared with gpt-oss-20b): permissive Apache 2.0 license, configurable reasoning effort (low, medium, high), full chain-of-thought access (not intended to be shown to end users), fine-tunability, and agentic capabilities including function calling, web browsing, Python code execution, and Structured Outputs.

Repository: localai | License: apache-2.0
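The gpt-oss-120b entry credits native MXFP4 quantization for letting the model fit on a single H100. A quick weight-storage estimate shows why. It assumes the OCP Microscaling (MX) layout, in which each 4-bit FP4 (E2M1) value shares one 8-bit E8M0 scale with the other values in a block of 32, giving 4 + 8/32 = 4.25 bits per parameter. It also optimistically treats every weight as MXFP4, whereas only the MoE layers use it.

```python
# Back-of-the-envelope weight-memory estimate: MXFP4 vs. bf16.
# Assumes the OCP Microscaling (MX) layout: 4-bit FP4 (E2M1) elements sharing
# one 8-bit E8M0 scale per block of 32 elements.

MX_BLOCK_SIZE = 32
FP4_ELEMENT_BITS = 4
E8M0_SCALE_BITS = 8


def mxfp4_bits_per_param() -> float:
    """Effective storage per parameter, including the amortized shared scale."""
    return FP4_ELEMENT_BITS + E8M0_SCALE_BITS / MX_BLOCK_SIZE


def weight_bytes(n_params: int, bits_per_param: float) -> float:
    """Raw weight storage in bytes (ignores KV cache, activations, runtime overhead)."""
    return n_params * bits_per_param / 8


if __name__ == "__main__":
    n = 117_000_000_000  # gpt-oss-120b total parameters, from the model card
    for label, bits in (("bf16", 16.0), ("MXFP4 (all weights)", mxfp4_bits_per_param())):
        gb = weight_bytes(n, bits) / 1e9
        print(f"{label:>20}: {bits:5.2f} bits/param -> {gb:6.1f} GB")
```

For 117B parameters this gives about 234 GB in bf16 versus roughly 62 GB in MXFP4. Real usage is higher (non-MoE weights stay at higher precision, and the KV cache and activations need memory too), but the comparison shows why bf16 weights alone could not fit on an 80 GB H100 while MXFP4 weights can.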

openai_gpt-oss-20b-neo
These are NEO Imatrix GGUFs, built with the NEO dataset by DavidAU. The NEO dataset improves overall performance and is suitable for all use cases. The upstream model card includes example (creative) output and the settings used to produce it. The model also passed a "hard" coding test (6 experts, IQ4_NL) with no issues; the test forces the model to write code with no dependencies and limited coding shortcuts, using multiple loops, running in real time without blocking, in a language that does not normally support this. Because of quantization issues with this model (which produce unusual quant sizes and mixtures), only tested quants will be uploaded for now.

Repository: localai | License: apache-2.0

huihui-ai_huihui-gpt-oss-20b-bf16-abliterated
This is an uncensored version of unsloth/gpt-oss-20b-BF16 created with abliteration (see remove-refusals-with-transformers for details).

Repository: localai | License: apache-2.0
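Abliteration removes a model's tendency to refuse by projecting a "refusal direction" out of its weights. The pure-Python sketch that follows illustrates that general idea under stated assumptions: the direction is the normalized difference of mean activations on harmful versus harmless prompts, and each weight matrix that writes into the residual stream is orthogonalized as W' = W - r rᵀ W. It is not the code used by huihui-ai or by the remove-refusals-with-transformers project, and the toy activations are hypothetical.

```python
# Minimal sketch of directional ablation ("abliteration").
# Assumptions: the refusal direction is the normalized difference between mean
# activations on harmful and harmless prompts, and each weight matrix has one
# row per residual-stream dimension. Plain lists stand in for real tensors.
from math import sqrt


def mean_vector(rows):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector from the mean harmless activation to the mean harmful activation."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate_direction(weight, r):
    """Return W' = W - r (r^T W), removing every column's component along unit vector r."""
    n_rows, n_cols = len(weight), len(weight[0])
    # proj[j] is the dot product of r with column j of W.
    proj = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[weight[i][j] - r[i] * proj[j] for j in range(n_cols)] for i in range(n_rows)]


if __name__ == "__main__":
    # Toy 3-dimensional "activations" (hypothetical data for illustration).
    harmful = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
    harmless = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
    r = refusal_direction(harmful, harmless)
    w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(r, ablate_direction(w, r))
```

After ablation, no column of the edited matrix has a component along r, so that layer can no longer write along the refusal direction. Side effects of this edit on unrelated behavior are one plausible source of the "rough around the edges" behavior noted for abliterated models.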

openai-gpt-oss-20b-abliterated-uncensored-neo-imatrix
These are NEO Imatrix GGUFs, built with the NEO dataset by DavidAU. The NEO dataset improves overall performance and is suitable for all use cases. This model uses Huihui-gpt-oss-20b-BF16-abliterated as its base, which de-censors the model and removes refusals. The upstream model card includes example (creative, IQ4_NL) output and recommended settings. Because of abliteration, this model can be a little rough around the edges, so review the recommended settings on the upstream card for best results. It can be creative, wildly unconventional, and rational as well. Enjoy!

Repository: localai | License: apache-2.0

meta-llama-3.1-8b-instruct:grammar-functioncall
This is the standard Llama 3.1 8B Instruct model with grammars and function calling enabled. When grammars are enabled in LocalAI, the LLM is forced to output valid tool calls, constrained by BNF grammars. This helps ensure that model outputs are valid and usable in a production environment. For more information on using grammars in LocalAI, see https://localai.io/features/openai-functions/#advanced and https://localai.io/features/constrained_grammars/.

Repository: localai | License: llama3.1
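With function calling enabled, a client offers tools in the OpenAI chat-completions format and reads structured tool calls back. The sketch that follows builds such a request and parses the `tool_calls` of a response. The `get_current_weather` tool is hypothetical, the model name assumes the entry is installed under its gallery name, and nothing is sent over the network.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}


def build_tool_request(model: str, user_text: str, tools: list) -> dict:
    """Build an OpenAI-style chat completion body that offers `tools` to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
    }


def extract_tool_calls(response: dict) -> list:
    """Return (function name, parsed arguments) pairs from a chat completion response."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Arguments arrive as a JSON string; grammar constraints aim to keep it valid.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


if __name__ == "__main__":
    body = build_tool_request("meta-llama-3.1-8b-instruct:grammar-functioncall",
                              "What's the weather in Rome?", [WEATHER_TOOL])
    print(json.dumps(body, indent=2))
```

To try it against a running server, POST the body to the `/v1/chat/completions` endpoint (LocalAI listens on port 8080 by default) and pass the JSON response to `extract_tool_calls`.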

meta-llama-3.1-8b-instruct:Q8_grammar-functioncall
This is the standard Llama 3.1 8B Instruct model with grammars and function calling enabled. When grammars are enabled in LocalAI, the LLM is forced to output valid tool calls, constrained by BNF grammars. This helps ensure that model outputs are valid and usable in a production environment. For more information on using grammars in LocalAI, see https://localai.io/features/openai-functions/#advanced and https://localai.io/features/constrained_grammars/.

Repository: localai | License: llama3.1
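The constrained-grammars page linked in this entry also covers passing a BNF grammar with an individual request. The sketch that follows builds a request body whose reply is restricted to a fixed set of answers. The `grammar` field name and the `root ::= (...)` rule are taken from the example in LocalAI's documentation and should be verified against the linked page for your LocalAI version; `build_grammar_request` is a hypothetical helper, and the model name assumes the gallery name is used.

```python
import json


def build_grammar_request(model: str, user_text: str, alternatives: list) -> dict:
    """Chat completion body whose reply is restricted to one of `alternatives`.

    Assumption: the `grammar` field and BNF rule syntax follow the example in
    LocalAI's constrained-grammars documentation.
    """
    # json.dumps quotes each literal (suitable for simple ASCII strings),
    # producing e.g.: root ::= ("yes" | "no")
    grammar = "root ::= (" + " | ".join(json.dumps(a) for a in alternatives) + ")"
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "grammar": grammar,
    }


if __name__ == "__main__":
    body = build_grammar_request("meta-llama-3.1-8b-instruct:Q8_grammar-functioncall",
                                 "Do you like apples?", ["yes", "no"])
    print(json.dumps(body, indent=2))
```

Restricting the root rule to a few literals is the simplest case; the same field accepts larger grammars when the output needs more structure.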

gpt-oss-20b-esper3.1-i1
**Model Name:** gpt-oss-20b-Esper3.1
**Repository:** [ValiantLabs/gpt-oss-20b-Esper3.1](https://huggingface.co/ValiantLabs/gpt-oss-20b-Esper3.1)
**Base Model:** openai/gpt-oss-20b
**Type:** Instruction-tuned, reasoning-focused language model
**Size:** 21 billion parameters (3.6B active)
**License:** Apache 2.0

---

### 🔍 **Overview**

gpt-oss-20b-Esper3.1 is a specialized, instruction-tuned variant of OpenAI's open-weight gpt-oss-20b model, developed by **Valiant Labs**. It excels at **advanced coding, software architecture, and DevOps reasoning**, making it well suited to technical problem-solving and AI-driven engineering tasks.

### ✨ **Key Features**

- **Expert in DevOps & Cloud Systems:** Trained on high-difficulty datasets (e.g., Titanium3, Tachibana3, Mitakihara), it delivers precise, actionable guidance for AWS, Kubernetes, Terraform, Ansible, Docker, Jenkins, and more.
- **Strong Code Reasoning:** Optimized for complex programming tasks, including full-stack development, scripting, and debugging.
- **High-Quality Inference:** Weights are published in `bf16` precision; quantized versions (e.g., GGUF) are available for efficient local inference.
- **Open Weights & Free to Use:** Openly available under Apache 2.0, built on the public gpt-oss-20b foundation and trained with community datasets.

### 📌 **Use Cases**

- Designing scalable cloud architectures
- Writing and optimizing infrastructure-as-code
- Debugging complex DevOps pipelines
- AI-assisted software development and documentation
- Real-time technical troubleshooting

### 💡 **Getting Started**

Use the standard `text-generation` pipeline from the `transformers` library. The model supports role-based prompting (e.g., `user`, `assistant`) and performs best with prompts that call for deep reasoning.

```python
from transformers import pipeline

# device_map="auto" places the model on the available GPU(s) or CPU.
pipe = pipeline("text-generation", model="ValiantLabs/gpt-oss-20b-Esper3.1", torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Design a Kubernetes cluster for a high-traffic web app with CI/CD via GitHub Actions."},
]
outputs = pipe(messages, max_new_tokens=2000)

# generated_text holds the whole conversation; the last entry is the model's reply.
print(outputs[0]["generated_text"][-1])
```

---

> 🔗 **Model Gallery Entry**:
> *gpt-oss-20b-Esper3.1: A powerful open-weight 20B model tuned for expert-level DevOps, coding, and system architecture. Built by Valiant Labs using high-quality technical datasets. Well suited to engineers, architects, and AI developers.*

Repository: localai | License: apache-2.0