OpenCoder - Open Cookbook for Code Large Language Models

The project is very interesting, and it is the first time I have seen a model release that makes all of the research data available:

OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. We provide not just the final models, but also the reproducible training data, the complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research.

Making AI models open source is not easy, and there is no settled definition of what that means, but I think this is as close as it gets for now.

The models are also available on ollama, and I gave the 1.5b model a try on my server. The results, and the execution speed on the CPU, are quite good:
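
If you want to try it yourself, this should be all that is needed (assuming the opencoder:1.5b tag in the ollama library; check the library page for the exact tags):

ollama run opencoder:1.5b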


Example 1

USER

A function to calculate the first 12 Fibonacci numbers

ASSISTANT

Here is a Python function that calculates and returns the first 12 Fibonacci numbers:

def fibonacci_numbers(n=12):
    fib_sequence = [0, 1]
    for i in range(2, n):
        fib_sequence.append(fib_sequence[i-1] + fib_sequence[i-2])
    return fib_sequence

print(fibonacci_numbers())

This function starts by defining the first two Fibonacci numbers as 0 and 1. Then it enters a loop that generates the remaining Fibonacci numbers. For each iteration of the loop, it calculates the next Fibonacci number as the sum of the previous two numbers and appends it to the list. The loop ends when it has generated n-2 Fibonacci numbers. Finally, it returns the list of Fibonacci numbers.

USER

now in JS

ASSISTANT

Here is a function that calculates the first 12 Fibonacci numbers:

function fibonacciNumbers(n = 12) {
    let fibSequence = [0, 1];
    for (let i = 2; i < n; i++) {
        fibSequence.push(fibSequence[i - 1] + fibSequence[i - 2]);
    }
    return fibSequence;
}

console.log(fibonacciNumbers());

This function starts by defining the first two Fibonacci numbers as 0 and 1. Then it enters a loop that generates the remaining Fibonacci numbers. For each iteration of the loop, it calculates the next Fibonacci number as the sum of the previous two numbers and appends it to the array. The loop ends when it has generated n-2 Fibonacci numbers. Finally, it returns the array of Fibonacci numbers.

USER

now in rust

ASSISTANT

Here is a function that calculates the first 12 Fibonacci numbers:

fn fibonacci_numbers(n: usize) -> Vec<usize> {
    let mut fib_sequence = vec![0, 1];
    for i in 2..n {
        let next_fib_number = fib_sequence[i - 1] + fib_sequence[i - 2];
        fib_sequence.push(next_fib_number);
    }
    return fib_sequence;
}

fn main() {
    println!("{:?}", fibonacci_numbers(12));
}

This function starts by defining the first two Fibonacci numbers as 0 and 1. Then it enters a loop that generates the remaining Fibonacci numbers. For each iteration of the loop, it calculates the next Fibonacci number as the sum of the previous two numbers and appends it to the vector. The loop ends when it has generated n-2 Fibonacci numbers. Finally, it returns the vector of Fibonacci numbers.


Example 2

USER

Create me a function that helps me find the first 33 prime numbers

ASSISTANT

Here is a Python function to do this:

def first_33_primes():
    primes = []
    candidate = 2

    while len(primes) < 33:
        if all(candidate % p > 0 for p in primes):
            primes.append(candidate)
        candidate += 1

    return primes

print(first_33_primes())

This script works by starting with the smallest prime number, 2, and checking if it is divisible by any of the previously found prime numbers. If it isn't, we know it's a prime number and add it to our list of prime numbers. We then move on to the next candidate number (incremented by one for each iteration).

This function will print out the first 33 prime numbers:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137]

USER

The code should be in PHP

ASSISTANT

Here is the equivalent PHP code for finding the first 33 prime numbers:

function first_33_primes() {
    $primes = [];
    $candidate = 2;

    while (count($primes) < 33) {
        if (all_primes($candidate, $primes)) {
            array_push($primes, $candidate);
        }
        $candidate += 1;
    }

    return $primes;
}

function all_primes($n, $arr) {
    foreach ($arr as $num) {
        if ($n % $num == 0) return false;
    }
    return true;
}

print_r(first_33_primes());

This PHP code does exactly what the Python function does. It continues generating prime numbers until it finds the first 33 primes.

Claude computer use, Claude 3.5 Sonnet and Claude 3.5 Haiku

Anthropic has introduced a new feature in Claude called computer use. The idea behind it is to use the computer screen as visual input and give the model control over the mouse cursor, buttons, and text input. The feature is still experimental and available in public beta.
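
Here is a minimal sketch of what a computer use request looks like with the Anthropic Python SDK, using the public beta identifiers at the time of writing (the tool type computer_20241022 and the computer-use-2024-10-22 beta flag); treat the exact names as subject to change:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",  # virtual display: screenshots in, mouse/keyboard actions out
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open a browser and search for OpenCoder."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)

The model then responds with tool-use actions (clicks, key presses, screenshot requests) that your own code has to execute against a real or virtual display and feed back as tool results.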

Claude 3.5 Sonnet received some impressive updates across several benchmarks (see the table below), and later this month a new model called Claude 3.5 Haiku will be released.

| Benchmark (category) | Claude 3.5 Sonnet (new) | Claude 3.5 Haiku | Claude 3.5 Sonnet | GPT-4o | GPT-4o mini | Gemini 1.5 Pro | Gemini 1.5 Flash |
|---|---|---|---|---|---|---|---|
| Graduate level reasoning (GPQA Diamond), 0-shot CoT | 65.0% | 41.6% | 59.4% | 53.6% | 40.2% | 59.1% | 51.0% |
| Undergraduate level knowledge (MMLU Pro), 0-shot CoT | 78.0% | 65.0% | 75.1% | n/a | n/a | 75.8% | 67.3% |
| Code (HumanEval), 0-shot | 93.7% | 88.1% | 92.0% | 90.2% | 87.2% | n/a | n/a |
| Math problem-solving (MATH), 0-shot CoT¹ | 78.3% | 69.2% | 71.1% | 76.6% | 70.2% | 86.5%¹ | 77.9%¹ |
| High school math competition (AIME 2024), 0-shot CoT | 16.0% | 5.3% | 9.6% | 9.3% | n/a | n/a | n/a |
| Visual Q/A (MMMU), 0-shot CoT | 70.4% | n/a | 68.3% | 69.1% | 59.4% | 65.9% | 62.3% |
| Agentic coding (SWE-bench Verified) | 49.0% | 40.6% | 33.4% | n/a | n/a | n/a | n/a |
| Agentic tool use, Retail (TAU-bench) | 69.2% | 51.0% | 62.6% | n/a | n/a | n/a | n/a |
| Agentic tool use, Airline (TAU-bench) | 46.0% | 22.8% | 36.0% | n/a | n/a | n/a | n/a |

¹ The Gemini 1.5 scores on MATH are 4-shot CoT; all other MATH scores are 0-shot CoT.

Note: According to Anthropic, the o1 models were omitted due to the extensive pre-response computation time and differences between the model approaches.

Claude 3.5 Haiku is an alternative to GPT-4o mini. While not as competitive in terms of pricing, the benchmarks suggest it should perform better.

Here are the current prices of the API:

Pricing for Claude 3.5 Sonnet

  • $15.00 / 1M output tokens
  • $3.00 / 1M input tokens
  • $3.75 / 1M prompt caching write tokens
  • $0.30 / 1M prompt caching read tokens

Pricing for Claude 3.5 Haiku

  • $1.25 / 1M output tokens
  • $0.25 / 1M input tokens
  • $0.30 / 1M prompt caching write tokens
  • $0.03 / 1M prompt caching read tokens

Pricing for Claude 3 Opus

  • $75.00 / 1M output tokens
  • $15.00 / 1M input tokens
  • $18.75 / 1M prompt caching write tokens
  • $1.50 / 1M prompt caching read tokens

Notes:

  • All models feature a 200K context window
  • 50% discount is available when using the Batches API
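
As a quick sanity check on these numbers, here is a small sketch that estimates the cost of a request from the token counts (prompt caching ignored for simplicity; the batch discount is applied as a flat 50% per the note above):

PRICES = {  # (input, output) in USD per 1M tokens, from the tables above
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-5-haiku": (0.25, 1.25),
    "claude-3-opus": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    in_price, out_price = PRICES[model]
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return cost * 0.5 if batch else cost

# e.g. a full 200K-token context and 5K output tokens on Claude 3.5 Sonnet:
print(estimate_cost("claude-3-5-sonnet", 200_000, 5_000))        # 0.675
print(estimate_cost("claude-3-5-sonnet", 200_000, 5_000, True))  # 0.3375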

What is I-XRAY?

I-XRAY is an interesting project that aims to raise awareness about the dangers of AI and leaked personal data.

The project feeds input captured by a camera (such as one mounted on a pair of glasses) into several online tools and LLMs. The images are used to identify individuals and to find other personal data that may be available or leaked on the internet.

The project is not open source, but some details are published here.

There are also some recommendations on how to remove private data from certain online platforms that distribute it.

What is semantic routing for LLMs?

Semantic routing is an approach used with AI agents. Similar to a decision tree, it determines which action path to follow based on matching the input against predefined routes.

For a chatbot, each route can be defined as a set of example questions; with their embeddings stored in a vector database, incoming messages can be matched against them.

This method is an effective way of routing questions to a specific knowledge base or other interactions, especially when combined with RAG or different LLMs.
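
A minimal sketch of the idea (the embed() function here is a toy stand-in; in practice it would call a real embedding model, and the routes and threshold are made up for illustration):

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: hash words into a
    # fixed-size bag-of-words vector. Good enough to show the mechanics.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Each route is defined by a set of example questions.
ROUTES = {
    "billing": ["How do I update my payment method?", "Where can I find my invoice?"],
    "tech_support": ["The app crashes on startup", "I cannot log in"],
}

# One embedding per example question (this is what a vector DB would store).
ROUTE_VECTORS = [(name, embed(q)) for name, qs in ROUTES.items() for q in qs]

def route(message: str, threshold: float = 0.3) -> str:
    m = embed(message)
    name, score = max(((n, cosine(m, v)) for n, v in ROUTE_VECTORS),
                      key=lambda pair: pair[1])
    # Below the threshold no route matches, so fall back to a default path.
    return name if score >= threshold else "fallback"

print(route("I can't log in to the app"))  # -> tech_support (with the toy embedding)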

Retrieval-augmented generation (RAG)

RAG is a technique used in generative text models.

A text generator model can only create a response based on the data it has been trained on. This limitation can lead to restricted responses or hallucinations.

RAG is a method of injecting new information into the model without modifying the model itself.

The context is typically provided in the system prompt, and the model is instructed to use this information when generating a response to a question.

Text models have context-size limits, and when used via an API, larger contexts also increase costs. Vector databases can be employed to select only the most relevant context.
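
Reusing the toy embed() and cosine() helpers from the routing sketch above, the retrieval step and the prompt assembly could look like this (the documents and wording are made up for illustration):

DOCUMENTS = [
    "OpenCoder is an open and reproducible code LLM family.",
    "Claude 3.5 Haiku will be released later this month.",
    "RAG injects retrieved context into the prompt at query time.",
]
DOC_VECTORS = [embed(d) for d in DOCUMENTS]  # in practice: stored in a vector DB

def retrieve(question: str, k: int = 2) -> list[str]:
    # Rank documents by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(zip(DOC_VECTORS, DOCUMENTS),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_messages(question: str) -> list[dict]:
    # The retrieved context goes into the system prompt, and the model is
    # instructed to answer from that context only.
    context = "\n".join(retrieve(question))
    return [
        {"role": "system",
         "content": "Answer using only the following context:\n" + context},
        {"role": "user", "content": question},
    ]

print(build_messages("When is Claude 3.5 Haiku coming out?"))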

System prompts and LLMs

Chatbots are among the most useful applications of LLMs. The rise of ChatGPT demonstrated this clearly; before that, LLMs attracted far less attention.

Here is the structure of a typical chat prompt:

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Who won the World Cup in 1998?"
  },
  {
    "role": "assistant",
    "content": "France"
  }
]

The most important part is the system prompt. It can be adapted in many ways, depending on the purpose the chatbot should serve.

The user message is the input from the chat, and the assistant message is what the LLM generates.
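
To illustrate, here is a minimal sketch using the OpenAI Python client (the model choice and the prompts are arbitrary): the same user question can produce very different answers under different system prompts.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(system_prompt: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Same question, two system prompts, two very different answers.
print(ask("You are a helpful assistant.", "Who won the World Cup in 1998?"))
print(ask("You answer only with the country name.", "Who won the World Cup in 1998?"))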

GPT-4o mini - pricing, performance and integrations

The model is currently ranked second in the LMSYS chatbot arena leaderboard based on overall performance.

The model can be used directly with the API and is available for workflow integration in dify 0.6.15.

Pricing for GPT-4o mini

Pricing with the regular API:

  • $0.600 / 1M output tokens
  • $0.150 / 1M input tokens

Pricing with the batch API:

  • $0.300 / 1M output tokens
  • $0.075 / 1M input tokens

Pricing for GPT-4o

Pricing with the regular API:

  • $15.00 / 1M output tokens
  • $5.00 / 1M input tokens

Pricing with the batch API:

  • $7.50 / 1M output tokens
  • $2.50 / 1M input tokens