
How To Use Open Source LLMs (Large Language Model)?

Written by Ajay Patel
Oct 22, 2025
5 Min Read

For code versioning, we use GitHub, a platform that lets us manage and store different versions of our code repositories. For Docker images, Docker Hub is where we store, manage, and distribute images. Similarly, for AI models, we have Hugging Face.

Hugging Face provides a centralized platform for sharing and managing AI models, allowing us to access and use pre-trained models as well as distribute our own with ease. It hosts over 800k models and 186k datasets, all open source and publicly available, where people can easily collaborate and build together. The Hub works as a central place where anyone can explore, experiment, collaborate, and build technology with AI/machine learning.
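
As a quick aside, the Hub can also be queried programmatically through the huggingface_hub library. Below is a minimal sketch of one way to browse it; the filter and sort values shown are illustrative choices, not the only options:

from huggingface_hub import HfApi

api = HfApi()

# List the five most-downloaded models tagged for text generation
for m in api.list_models(filter="text-generation", sort="downloads", limit=5):
    print(m.id)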

Getting Started with Hugging Face

1. Create your account on Hugging Face.

2. Navigate to the models section and choose your model. 

3. From the left sidebar, select the specific type of task or problem you're trying to solve. This could be anything from text generation and translation to question answering or summarization. Select the model that best fits your needs. 

For our tutorial, we are going to use the google/gemma-2-2b-it model.

Hardware Requirements To Use an LLM

For running a model, we can use either a CPU or a GPU to do the computation. A CPU performs most of the general computing tasks. On the other hand, a GPU is specifically designed to handle complex mathematical calculations. Therefore, when the model is computationally intensive, meaning it requires a lot of mathematical calculations, using a GPU can significantly reduce the inference time and make the process more efficient as compared to using a CPU. 

The output will be the same whichever we use; the only difference is inference time. GPUs are like super-fast assembly lines for mathematical calculations. In real life, imagine you have a large batch of packets that need to be labeled. A CPU (a regular computer) would label one packet at a time, while a GPU can label several packets at once, so it finishes the task much more quickly. GPUs are designed with a large number of cores that can handle parallel processing tasks efficiently. 

AI model inference often involves performing the same operation on a large set of data (e.g., matrix multiplication), and the parallel architecture of GPUs allows them to handle these operations simultaneously, leading to faster inference. 
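
To make this concrete, here is a minimal sketch (assuming PyTorch is installed and a CUDA GPU is attached) that times the same matrix multiplication on the CPU and the GPU:

import time
import torch

# A large matrix multiplication, the core operation in LLM inference
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# Time the multiply on the CPU
start = time.time()
_ = a @ b
print(f"CPU: {time.time() - start:.3f}s")

# Time the same multiply on the GPU, if one is available
if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    torch.cuda.synchronize()  # wait for the transfer to finish
    start = time.time()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the multiply to finish
    print(f"GPU: {time.time() - start:.3f}s")

On a T4, the GPU multiply typically finishes in a small fraction of the CPU time.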

Google Colab Notebook

What is Google Colab?

Google Colab, short for Google Colaboratory, is a free cloud-based platform that allows you to write and execute Python code through your browser. It's essentially a Jupyter notebook environment that requires no setup and runs entirely in the cloud. Colab provides free access to computing resources including GPUs, making it an invaluable tool for data scientists, machine learning practitioners, and researchers.

Key features of Google Colab include:


1. Free GPU and TPU access

2. Easy sharing and collaboration

3. Integration with Google Drive

4. Pre-installed popular libraries

5. Interactive code execution

Why are we using Google Colab?

We're utilizing Google Colab for several compelling reasons:

1. Accessibility: Colab eliminates the need for local setup, allowing us to start coding immediately without worrying about hardware constraints or software installations.

2. Free GPU access: For our LLM project, we require significant computational power. Colab provides free access to NVIDIA Tesla T4 GPUs, which are well-suited for machine learning tasks.

3. Cost-effectiveness: By leveraging Colab's free resources, we can experiment with and develop LLM models without incurring the high costs associated with purchasing or renting powerful hardware.

4. Collaboration: Colab notebooks are easy to share, making it simple to collaborate with team members or share our work with the community.

5. Flexibility: Colab supports a wide range of Python libraries and can be easily connected to other Google services like Drive, making data management and workflow integration seamless.

6. Learning and experimentation: The platform's user-friendly interface and pre-configured environment lower the barrier to entry for those new to machine learning or working with LLMs.

By using Google Colab, we can focus on the core aspects of our LLM project - coding, model development, and experimentation - without getting bogged down by infrastructure concerns or budget limitations. This allows for rapid prototyping and iteration, crucial in the fast-paced field of AI and machine learning.

Now let's create a new Google Colab notebook and change the runtime type to T4; Colab provides free access to Tesla T4 GPU machines. We are now ready to use the LLM on a GPU machine. 
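
Before going further, it is worth confirming that the notebook is actually attached to a GPU; a quick check:

import torch

# Confirm the runtime has a CUDA GPU attached
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # should report a Tesla T4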

Comprehensive Practical Guide: Setting Up and Using the Gemma-2-2b-it Model

1. Installing Required Packages

!pip install transformers torch bitsandbytes accelerate huggingface_hub

2. Logging into Hugging Face for Model Access

from huggingface_hub import notebook_login
notebook_login()
When prompted, paste your Hugging Face access token and log in.
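
If you are running outside a notebook (for example, in a plain Python script), huggingface_hub also provides a direct login function. The token string below is a hypothetical placeholder; use your own access token from your Hugging Face account settings:

from huggingface_hub import login

# "hf_xxx" is a placeholder, not a real token
login(token="hf_xxx")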

3. Accepting Google's Usage License

Now let's go to the Gemma-2-2b-it model page on Hugging Face and accept Google's usage license to proceed with the following steps.


4. Loading and Configuring Gemma-2-2b-it

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model; device_map="auto" places the weights
# on the available GPU, and bfloat16 halves the memory footprint
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
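
Because we installed bitsandbytes earlier, we can optionally load the model with 4-bit quantization instead, which cuts GPU memory use at a small cost in output quality. A minimal sketch of that alternative load:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Store weights in 4-bit precision, compute in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    quantization_config=bnb_config,
)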

5. Inference

query = "what is AI?"

# Tokenize the prompt and move the tensors to the GPU
input_ids = tokenizer(query, return_tensors="pt").to("cuda")

# Generate up to 1024 new tokens and decode them back into text
outputs = model.generate(**input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Output:

what is AI?

**Artificial Intelligence (AI)** is a broad field of computer science that aims to create machines capable of performing tasks that typically require human intelligence. 

**Key Concepts:**

* **Learning:** AI systems can learn from data and improve their performance over time.
* **Reasoning:** AI systems can use logic and rules to solve problems and make decisions.
* **Problem-solving:** AI systems can identify and solve complex problems.
* **Perception:** AI systems can interpret sensory information, such as images and sounds.
* **Natural Language Processing (NLP):** AI systems can understand and generate human language.

**Types of AI:**

* **Narrow or Weak AI:** Designed to perform a specific task, like playing chess or recommending products.
* **General or Strong AI:** Hypothetical AI that possesses human-level intelligence and can perform any intellectual task.
* **Super AI:** Hypothetical AI that surpasses human intelligence in all aspects.

**Applications of AI:**

AI is used in a wide range of applications, including:

* **Healthcare:** Diagnosis, treatment planning, drug discovery.
* **Finance:** Fraud detection, risk assessment, algorithmic trading.
* **Transportation:** Self-driving cars, traffic optimization.
* **Customer service:** Chatbots, virtual assistants.
* **Entertainment:** Content creation, personalized recommendations.

**Benefits of AI:**

* **Increased efficiency and productivity.**
* **Improved decision-making.**
* **Automation of tasks.**
* **New discoveries and innovations.**

**Challenges of AI:**

* **Job displacement.**
* **Bias and fairness.**
* **Privacy and security.**
* **Ethical considerations.**


**In summary:** AI is a rapidly evolving field with the potential to revolutionize many aspects of our lives. It involves creating machines that can learn, reason, and solve problems, leading to advancements in various industries and applications. However, it also presents challenges that need to be addressed to ensure its responsible and beneficial development.
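
Since gemma-2-2b-it is an instruction-tuned model, you can also format prompts with the tokenizer's built-in chat template rather than passing raw text. Here is a short sketch reusing the tokenizer and model loaded above:

# Wrap the query in a chat message and apply Gemma's chat template
messages = [{"role": "user", "content": "what is AI?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))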

In conclusion, leveraging open-source Large Language Models (LLMs) has become increasingly accessible thanks to platforms like Hugging Face and Google Colab. These tools provide a user-friendly environment for exploring state-of-the-art models without the need for extensive hardware resources. 

By following a comprehensive guide and understanding the hardware requirements, anyone can get started with LLMs, experiment with different models, and integrate advanced natural language understanding capabilities into their projects. Whether you're a researcher, developer, or enthusiast, the journey into the world of LLMs offers immense potential for innovation and discovery.

Ajay Patel

Hi, I am an AI engineer with 3.5 years of experience passionate about building intelligent systems that solve real-world problems through cutting-edge technology and innovative solutions.

