
How To Use Open Source LLMs (Large Language Model)?

Written by Ajay Patel
Oct 22, 2025
5 Min Read

For code versioning, we use GitHub, a platform that lets us manage and store different versions of our code repositories. For Docker images, Docker Hub is where we store, manage, and distribute images. Similarly, for AI models, we have Hugging Face.

Hugging Face provides a centralized platform for sharing and managing AI models, allowing us to access and use pre-trained models as well as distribute our own with ease. It hosts over 800k models and 186k datasets, all open source and publicly available, where people can easily collaborate and build together. The Hub works as a central place where anyone can explore, experiment, collaborate, and build technology with AI/machine learning.
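
As a quick aside, the Hub can also be queried programmatically through the huggingface_hub library. Below is a minimal sketch of one way to browse it; the filter and sort values shown are illustrative choices, not the only options:

from huggingface_hub import HfApi

api = HfApi()

# List the five most-downloaded models tagged for text generation
for m in api.list_models(filter="text-generation", sort="downloads", limit=5):
    print(m.id)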

Getting Started with Hugging Face

1. Create your account on Hugging Face.

2. Navigate to the models section and choose your model. 

3. From the left sidebar, select the specific type of task or problem you're trying to solve. This could be anything from text generation and translation to question answering or summarization. Select the model that best fits your needs. 

For our tutorial, we are going to use the google/gemma-2-2b-it model.

Hardware Requirements To Use an LLM

For running a model, we can use either a CPU or a GPU to do the computation. A CPU performs most of the general computing tasks. On the other hand, a GPU is specifically designed to handle complex mathematical calculations. Therefore, when the model is computationally intensive, meaning it requires a lot of mathematical calculations, using a GPU can significantly reduce the inference time and make the process more efficient as compared to using a CPU. 

The output will be the same whichever we use; the only difference is inference time. GPUs are like super-fast assembly lines for mathematical calculations. In real life, imagine you have a large batch of packets that need to be labeled. A CPU (a regular computer) would label one packet at a time, while a GPU can label several packets at once, so it finishes the task much more quickly. GPUs are designed with a large number of cores that can handle parallel processing tasks efficiently. 

AI model inference often involves performing the same operation on a large set of data (e.g., matrix multiplication), and the parallel architecture of GPUs allows them to handle these operations simultaneously, leading to faster inference. 
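
To make this concrete, here is a minimal sketch (assuming PyTorch is installed and a CUDA GPU is attached) that times the same matrix multiplication on the CPU and the GPU:

import time
import torch

# A large matrix multiplication, the core operation in LLM inference
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# Time the multiply on the CPU
start = time.time()
_ = a @ b
print(f"CPU: {time.time() - start:.3f}s")

# Time the same multiply on the GPU, if one is available
if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    torch.cuda.synchronize()  # wait for the transfer to finish
    start = time.time()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the multiply to finish
    print(f"GPU: {time.time() - start:.3f}s")

On a T4, the GPU multiply typically finishes in a small fraction of the CPU time.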

Google Colab Notebook

What is Google Colab?

Google Colab, short for Google Colaboratory, is a free cloud-based platform that allows you to write and execute Python code through your browser. It's essentially a Jupyter notebook environment that requires no setup and runs entirely in the cloud. Colab provides free access to computing resources including GPUs, making it an invaluable tool for data scientists, machine learning practitioners, and researchers.

Key features of Google Colab include:


1. Free GPU and TPU access

2. Easy sharing and collaboration

3. Integration with Google Drive

4. Pre-installed popular libraries

5. Interactive code execution

Why are we using Google Colab?

We're utilizing Google Colab for several compelling reasons:

1. Accessibility: Colab eliminates the need for local setup, allowing us to start coding immediately without worrying about hardware constraints or software installations.

2. Free GPU access: For our LLM project, we require significant computational power. Colab provides free access to NVIDIA Tesla T4 GPUs, which are well-suited for machine learning tasks.

3. Cost-effectiveness: By leveraging Colab's free resources, we can experiment with and develop LLM models without incurring the high costs associated with purchasing or renting powerful hardware.

4. Collaboration: Colab notebooks are easy to share, making it simple to collaborate with team members or share our work with the community.

5. Flexibility: Colab supports a wide range of Python libraries and can be easily connected to other Google services like Drive, making data management and workflow integration seamless.

6. Learning and experimentation: The platform's user-friendly interface and pre-configured environment lower the barrier to entry for those new to machine learning or working with LLMs.

By using Google Colab, we can focus on the core aspects of our LLM project - coding, model development, and experimentation - without getting bogged down by infrastructure concerns or budget limitations. This allows for rapid prototyping and iteration, crucial in the fast-paced field of AI and machine learning.

Now let's create a new Google Colab notebook and change the runtime type to T4; Colab provides free access to Tesla T4 GPU machines. We are now ready to use the LLM on a GPU machine. 
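
Before going further, it is worth confirming that the notebook is actually attached to a GPU; a quick check:

import torch

# Confirm the runtime has a CUDA GPU attached
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # should report a Tesla T4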

Comprehensive Practical Guide: Setting Up and Using the Gemma-2-2b-it Model

1. Installing Required Packages

!pip install transformers torch bitsandbytes accelerate huggingface_hub

2. Logging into Hugging Face for Model Access

from huggingface_hub import notebook_login
notebook_login()
When prompted, paste your Hugging Face access token and log in.
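
If you are running outside a notebook (for example, in a plain Python script), huggingface_hub also provides a direct login function. The token string below is a hypothetical placeholder; use your own access token from your Hugging Face account settings:

from huggingface_hub import login

# "hf_xxx" is a placeholder, not a real token
login(token="hf_xxx")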

3. Accepting Google's Usage License

Now let's go to the Gemma-2-2b-it model page on Hugging Face and accept Google's usage license to proceed with the following steps.


4. Loading and Configuring Gemma-2-2b-it

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model; device_map="auto" places the weights
# on the available GPU, and bfloat16 halves the memory footprint
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
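
Because we installed bitsandbytes earlier, we can optionally load the model with 4-bit quantization instead, which cuts GPU memory use at a small cost in output quality. A minimal sketch of that alternative load:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Store weights in 4-bit precision, compute in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    quantization_config=bnb_config,
)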

5. Inference

query = "what is AI?"

# Tokenize the prompt and move the tensors to the GPU
input_ids = tokenizer(query, return_tensors="pt").to("cuda")

# Generate up to 1024 new tokens and decode them back into text
outputs = model.generate(**input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Output:

what is AI?

**Artificial Intelligence (AI)** is a broad field of computer science that aims to create machines capable of performing tasks that typically require human intelligence. 

**Key Concepts:**

* **Learning:** AI systems can learn from data and improve their performance over time.
* **Reasoning:** AI systems can use logic and rules to solve problems and make decisions.
* **Problem-solving:** AI systems can identify and solve complex problems.
* **Perception:** AI systems can interpret sensory information, such as images and sounds.
* **Natural Language Processing (NLP):** AI systems can understand and generate human language.

**Types of AI:**

* **Narrow or Weak AI:** Designed to perform a specific task, like playing chess or recommending products.
* **General or Strong AI:** Hypothetical AI that possesses human-level intelligence and can perform any intellectual task.
* **Super AI:** Hypothetical AI that surpasses human intelligence in all aspects.

**Applications of AI:**

AI is used in a wide range of applications, including:

* **Healthcare:** Diagnosis, treatment planning, drug discovery.
* **Finance:** Fraud detection, risk assessment, algorithmic trading.
* **Transportation:** Self-driving cars, traffic optimization.
* **Customer service:** Chatbots, virtual assistants.
* **Entertainment:** Content creation, personalized recommendations.

**Benefits of AI:**

* **Increased efficiency and productivity.**
* **Improved decision-making.**
* **Automation of tasks.**
* **New discoveries and innovations.**

**Challenges of AI:**

* **Job displacement.**
* **Bias and fairness.**
* **Privacy and security.**
* **Ethical considerations.**


**In summary:** AI is a rapidly evolving field with the potential to revolutionize many aspects of our lives. It involves creating machines that can learn, reason, and solve problems, leading to advancements in various industries and applications. However, it also presents challenges that need to be addressed to ensure its responsible and beneficial development.
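
Since gemma-2-2b-it is an instruction-tuned model, you can also format prompts with the tokenizer's built-in chat template rather than passing raw text. Here is a short sketch reusing the tokenizer and model loaded above:

# Wrap the query in a chat message and apply Gemma's chat template
messages = [{"role": "user", "content": "what is AI?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))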

In conclusion, leveraging open-source Large Language Models (LLMs) has become increasingly accessible thanks to platforms like Hugging Face and Google Colab. These tools provide a user-friendly environment for exploring state-of-the-art models without the need for extensive hardware resources. 

By following a comprehensive guide and understanding the hardware requirements, anyone can get started with LLMs, experiment with different models, and integrate advanced natural language understanding capabilities into their projects. Whether you're a researcher, developer, or enthusiast, the journey into the world of LLMs offers immense potential for innovation and discovery.

Ajay Patel

Hi, I am an AI engineer with 3.5 years of experience passionate about building intelligent systems that solve real-world problems through cutting-edge technology and innovative solutions.

