OpenCoder - Open Cookbook for Code Large Language Models

The project is very interesting, and it is the first time I have seen a model release that makes all of the research data available:

OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. We provide not just the final models, but also the reproducible training data, the complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research.

Making AI models open source is not easy, and there is no settled definition of what that means, but I think this is as close as it gets for now.

The models are also available on ollama, and I gave the 1.5b model a try on my server. The results, and the execution speed on the CPU, are quite good:
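
If you want to try it yourself, this should be all that is needed (assuming the opencoder:1.5b tag in the ollama library; check the library page for the exact tags):

ollama run opencoder:1.5b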


Example 1

USER

A function to calculate the first 12 Fibonacci numbers

ASSISTANT

Here is a Python function that calculates and returns the first 12 Fibonacci numbers:

def fibonacci_numbers(n=12):
    fib_sequence = [0, 1]
    for i in range(2, n):
        fib_sequence.append(fib_sequence[i-1] + fib_sequence[i-2])
    return fib_sequence

print(fibonacci_numbers())

This function starts by defining the first two Fibonacci numbers as 0 and 1. Then it enters a loop that generates the remaining Fibonacci numbers. For each iteration of the loop, it calculates the next Fibonacci number as the sum of the previous two numbers and appends it to the list. The loop ends when it has generated n-2 Fibonacci numbers. Finally, it returns the list of Fibonacci numbers.

USER

now in JS

ASSISTANT

Here is a function that calculates the first 12 Fibonacci numbers:

function fibonacciNumbers(n = 12) {
    let fibSequence = [0, 1];
    for (let i = 2; i < n; i++) {
        fibSequence.push(fibSequence[i - 1] + fibSequence[i - 2]);
    }
    return fibSequence;
}

console.log(fibonacciNumbers());

This function starts by defining the first two Fibonacci numbers as 0 and 1. Then it enters a loop that generates the remaining Fibonacci numbers. For each iteration of the loop, it calculates the next Fibonacci number as the sum of the previous two numbers and appends it to the array. The loop ends when it has generated n-2 Fibonacci numbers. Finally, it returns the array of Fibonacci numbers.

USER

now in rust

ASSISTANT

Here is a function that calculates the first 12 Fibonacci numbers:

fn fibonacci_numbers(n: usize) -> Vec<usize> {
    let mut fib_sequence = vec![0, 1];
    for i in 2..n {
        let next_fib_number = fib_sequence[i - 1] + fib_sequence[i - 2];
        fib_sequence.push(next_fib_number);
    }
    return fib_sequence;
}

fn main() {
    println!("{:?}", fibonacci_numbers(12));
}

This function starts by defining the first two Fibonacci numbers as 0 and 1. Then it enters a loop that generates the remaining Fibonacci numbers. For each iteration of the loop, it calculates the next Fibonacci number as the sum of the previous two numbers and appends it to the vector. The loop ends when it has generated n-2 Fibonacci numbers. Finally, it returns the vector of Fibonacci numbers.


Example 2

USER

Create me a function that helps me find the first 33 prime numbers

ASSISTANT

Here is a Python function to do this:

def first_33_primes():
    primes = []
    candidate = 2

    while len(primes) < 33:
        if all(candidate % p > 0 for p in primes):
            primes.append(candidate)
        candidate += 1

    return primes

print(first_33_primes())

This script works by starting with the smallest prime number, 2, and checking if it is divisible by any of the previously found prime numbers. If it isn't, we know it's a prime number and add it to our list of prime numbers. We then move on to the next candidate number (incremented by one for each iteration).

This function will print out the first 33 prime numbers:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137]

USER

The code should be in PHP

ASSISTANT

Here is the equivalent PHP code for finding the first 33 prime numbers:

function first_33_primes() {
    $primes = [];
    $candidate = 2;

    while (count($primes) < 33) {
        if (all_primes($candidate, $primes)) {
            array_push($primes, $candidate);
        }
        $candidate += 1;
    }

    return $primes;
}

function all_primes($n, $arr) {
    foreach ($arr as $num) {
        if ($n % $num == 0) return false;
    }
    return true;
}

print_r(first_33_primes());

This PHP code does exactly what the Python function does. It continues generating prime numbers until it finds the first 33 primes.

Claude computer use, Claude 3.5 Sonnet and Claude 3.5 Haiku

Anthropic has introduced a new feature in Claude called computer use. The idea behind it is to use the computer screen as visual input and give the model control over the mouse cursor, buttons, and text input. The feature is still experimental and available in public beta.
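
Here is a minimal sketch of what a computer use request looks like with the Anthropic Python SDK, using the public beta identifiers at the time of writing (the tool type computer_20241022 and the computer-use-2024-10-22 beta flag); treat the exact names as subject to change:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",  # virtual display: screenshots in, mouse/keyboard actions out
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open a browser and search for OpenCoder."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)

The model then responds with tool-use actions (clicks, key presses, screenshot requests) that your own code has to execute against a real or virtual display and feed back as tool results.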

Claude 3.5 Sonnet received some impressive updates across several benchmarks (see the table below), and later this month a new model called Claude 3.5 Haiku will be released.

| Benchmark (category) | Claude 3.5 Sonnet (new) | Claude 3.5 Haiku | Claude 3.5 Sonnet | GPT-4o | GPT-4o mini | Gemini 1.5 Pro | Gemini 1.5 Flash |
|---|---|---|---|---|---|---|---|
| Graduate level reasoning (GPQA Diamond), 0-shot CoT | 65.0% | 41.6% | 59.4% | 53.6% | 40.2% | 59.1% | 51.0% |
| Undergraduate level knowledge (MMLU Pro), 0-shot CoT | 78.0% | 65.0% | 75.1% | n/a | n/a | 75.8% | 67.3% |
| Code (HumanEval), 0-shot | 93.7% | 88.1% | 92.0% | 90.2% | 87.2% | n/a | n/a |
| Math problem-solving (MATH), 0-shot CoT¹ | 78.3% | 69.2% | 71.1% | 76.6% | 70.2% | 86.5%¹ | 77.9%¹ |
| High school math competition (AIME 2024), 0-shot CoT | 16.0% | 5.3% | 9.6% | 9.3% | n/a | n/a | n/a |
| Visual Q/A (MMMU), 0-shot CoT | 70.4% | n/a | 68.3% | 69.1% | 59.4% | 65.9% | 62.3% |
| Agentic coding (SWE-bench Verified) | 49.0% | 40.6% | 33.4% | n/a | n/a | n/a | n/a |
| Agentic tool use, Retail (TAU-bench) | 69.2% | 51.0% | 62.6% | n/a | n/a | n/a | n/a |
| Agentic tool use, Airline (TAU-bench) | 46.0% | 22.8% | 36.0% | n/a | n/a | n/a | n/a |

¹ The Gemini 1.5 scores on MATH are 4-shot CoT; all other MATH scores are 0-shot CoT.

Note: According to Anthropic, the o1 models were omitted due to the extensive pre-response computation time and differences between the model approaches.

Claude 3.5 Haiku is an alternative to GPT-4o mini. While not as competitive in terms of pricing, the benchmarks suggest it should perform better.

Here are the current prices of the API:

Pricing for Claude 3.5 Sonnet

  • $15.00 / 1M output tokens
  • $3.00 / 1M input tokens
  • $3.75 / 1M prompt caching write tokens
  • $0.30 / 1M prompt caching read tokens

Pricing for Claude 3.5 Haiku

  • $1.25 / 1M output tokens
  • $0.25 / 1M input tokens
  • $0.30 / 1M prompt caching write tokens
  • $0.03 / 1M prompt caching read tokens

Pricing for Claude 3 Opus

  • $75.00 / 1M output tokens
  • $15.00 / 1M input tokens
  • $18.75 / 1M prompt caching write tokens
  • $1.50 / 1M prompt caching read tokens

Notes:

  • All models feature a 200K context window
  • 50% discount is available when using the Batches API
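
As a quick sanity check on these numbers, here is a small sketch that estimates the cost of a request from the token counts (prompt caching ignored for simplicity; the batch discount is applied as a flat 50% per the note above):

PRICES = {  # (input, output) in USD per 1M tokens, from the tables above
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-5-haiku": (0.25, 1.25),
    "claude-3-opus": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    in_price, out_price = PRICES[model]
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return cost * 0.5 if batch else cost

# e.g. a full 200K-token context and 5K output tokens on Claude 3.5 Sonnet:
print(estimate_cost("claude-3-5-sonnet", 200_000, 5_000))        # 0.675
print(estimate_cost("claude-3-5-sonnet", 200_000, 5_000, True))  # 0.3375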

What is I-XRAY?

I-XRAY is an interesting project that aims to raise awareness about the dangers of AI and leaked personal data.

The project feeds input captured by a camera (such as one mounted on a pair of glasses) into several online tools and LLMs. The images are used to identify individuals and to find other personal data that may be available or leaked on the internet.

The project is not open source, but some details are published here.

There are also some recommendations on how to remove private data from certain online platforms that distribute it.

What is semantic routing for LLMs?

Semantic routing is an approach used with AI agents. Similar to a decision tree, it determines which action path to follow based on matching the input against predefined routes.

For a chatbot, each route can be defined as a set of example questions; with their embeddings stored in a vector database, incoming messages can be matched against them.

This method is an effective way of routing questions to a specific knowledge base or other interactions, especially when combined with RAG or different LLMs.
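
A minimal sketch of the idea (the embed() function here is a toy stand-in; in practice it would call a real embedding model, and the routes and threshold are made up for illustration):

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: hash words into a
    # fixed-size bag-of-words vector. Good enough to show the mechanics.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Each route is defined by a set of example questions.
ROUTES = {
    "billing": ["How do I update my payment method?", "Where can I find my invoice?"],
    "tech_support": ["The app crashes on startup", "I cannot log in"],
}

# One embedding per example question (this is what a vector DB would store).
ROUTE_VECTORS = [(name, embed(q)) for name, qs in ROUTES.items() for q in qs]

def route(message: str, threshold: float = 0.3) -> str:
    m = embed(message)
    name, score = max(((n, cosine(m, v)) for n, v in ROUTE_VECTORS),
                      key=lambda pair: pair[1])
    # Below the threshold no route matches, so fall back to a default path.
    return name if score >= threshold else "fallback"

print(route("I can't log in to the app"))  # -> tech_support (with the toy embedding)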

Retrieval-augmented generation (RAG)

RAG is a technique used in generative text models.

A text generator model can only create a response based on the data it has been trained on. This limitation can lead to restricted responses or hallucinations.

RAG is a method of injecting new information into the model without modifying the model itself.

The context is typically provided in the system prompt, and the model is instructed to use this information when generating a response to a question.

Text models have context-size limits, and when used via an API, larger contexts also increase costs. Vector databases can be employed to select only the most relevant context.
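
Reusing the toy embed() and cosine() helpers from the routing sketch above, the retrieval step and the prompt assembly could look like this (the documents and wording are made up for illustration):

DOCUMENTS = [
    "OpenCoder is an open and reproducible code LLM family.",
    "Claude 3.5 Haiku will be released later this month.",
    "RAG injects retrieved context into the prompt at query time.",
]
DOC_VECTORS = [embed(d) for d in DOCUMENTS]  # in practice: stored in a vector DB

def retrieve(question: str, k: int = 2) -> list[str]:
    # Rank documents by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(zip(DOC_VECTORS, DOCUMENTS),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_messages(question: str) -> list[dict]:
    # The retrieved context goes into the system prompt, and the model is
    # instructed to answer from that context only.
    context = "\n".join(retrieve(question))
    return [
        {"role": "system",
         "content": "Answer using only the following context:\n" + context},
        {"role": "user", "content": question},
    ]

print(build_messages("When is Claude 3.5 Haiku coming out?"))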

System prompts and LLMs

Chatbots are among the most useful applications of LLMs. The rise of ChatGPT demonstrated this clearly; before that, LLMs attracted far less attention.

Here is the structure of a typical chat prompt:

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Who won the World Cup in 1998?"
  },
  {
    "role": "assistant",
    "content": "France"
  }
]

The most important part is the system prompt. It can be adapted in many ways, depending on the purpose the chatbot should serve.

The user message is the input from the chat, and the assistant message is what the LLM generates.
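
To illustrate, here is a minimal sketch using the OpenAI Python client (the model choice and the prompts are arbitrary): the same user question can produce very different answers under different system prompts.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(system_prompt: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Same question, two system prompts, two very different answers.
print(ask("You are a helpful assistant.", "Who won the World Cup in 1998?"))
print(ask("You answer only with the country name.", "Who won the World Cup in 1998?"))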

GPT-4o mini - pricing, performance and integrations

The model is currently ranked second in the LMSYS chatbot arena leaderboard based on overall performance.

The model can be used directly with the API and is available for workflow integration in dify 0.6.15.

Pricing for GPT-4o mini

Pricing with the regular API:

  • $0.600 / 1M output tokens
  • $0.150 / 1M input tokens

Pricing with the batch API:

  • $0.300 / 1M output tokens
  • $0.075 / 1M input tokens

Pricing for GPT-4o

Pricing with the regular API:

  • $15.00 / 1M output tokens
  • $5.00 / 1M input tokens

Pricing with the batch API:

  • $7.50 / 1M output tokens
  • $2.50 / 1M input tokens