Work in progress: State of AI Technology, Version 0.5.3, Nov 17, 2024
All from personal experience with great attention to practical details at high information density
No ads, no affiliate links, no paid content
Tips from dozens of the best sources for your quick, professional entry into Artificial Intelligence
Time for companies to take advantage of local multi-modal models for their valuable data,
and not to miss the next wave of advanced reasoning, debuting in the OpenAI o1 models
Website built with VS Code using Git, GitHub, GitHub Copilot, Chrome DevTools, and ChatGPT 4o
V 0.1: GPT introduced some hard-to-see regressions.
My changes / improvements to the code were reverted by GPT for no apparent reason.
I could resolve it using the GitHub history.
Generative AI, the next big thing after the internet and smartphones:
OpenAI ChatGPT 4o, Microsoft Copilot, Google Gemini, or Anthropic Claude,
depending on which office ecosystem you are in (Microsoft's used to be called "Office")
▶
GPT-4o ("omni") — multi-modal generative language model, with twin offerings from Microsoft
Speech-to-text module that understands (almost) everything (even without context)
DALL-E image generator that turns every scribbler into an illustrator
"Data Analyst" for breathtaking analyses
"Code" as a programming assistant
Text-to-speech with emotion; stumbles and filler words only on request, to make the robo-voice appear more human.
Currently free for private use, seemingly to seize the moment and win a broader audience for OpenAI (an audience with supposedly little interest in AI).
Even if qualitative tests of OpenAI vs. Google vs. Anthropic vary, OpenAI / MS seems to be well ahead in actual releases.
(Google) DeepMind should be credited for releasing models that play in entirely different leagues and often significantly outperform competitors (whether humans or conventional programs).
(AlphaFold protein folding and more, weather models with orders of magnitude better efficiency,…)
Page under construction...
MS Github Copilot & OpenAI ChatGPT 4o versus Codestral, Starcoder, DeepSeek Coder,… ▶
Even though clickbait headlines suggest otherwise: Open Source running locally is usually second best
For a local model to be similarly good, it currently needs well over 13b parameters, and at that size it is relatively slow on normal hardware.
But a better setup already exists: the Continue plugin for VS Code with Ollama as the runtime for a model combo (Codestral 22b for chat and "fill in the middle", Llama 3 8b or 70b, and StarCoder 2)
If the code is not a company secret, one indulges in ChatGPT 4o for free and/or risks $10 a month for GitHub Copilot
Open Source models on Hugging Face from Meta (Llama), Mistral, AI2 (Olmo)
conveniently run in LM Studio, Ollama, or GPT4All
▶
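As a minimal sketch of how such a locally run model can be queried from code (assuming a default Ollama install listening on localhost:11434 and a model already pulled with `ollama pull`; the model name and prompt below are just examples):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Assemble the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,          # e.g. "llama3" or "codestral", pulled beforehand
        "prompt": prompt,
        "stream": False,         # one complete JSON response instead of a token stream
        "options": {"temperature": temperature},
    }

def ask_local_model(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama instance):
# print(ask_local_model("llama3", "Explain unified memory in one sentence."))
```

Nothing beyond the Python standard library is needed; the data never leaves your machine, which is the whole point for company-secret code.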
RAG Retrieval Augmented Generation to integrate non-public data into one of the large AI models ▶
Google NotebookLM
RAG via GUI or framework (e.g., LlamaIndex)
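Stripped of frameworks, the RAG recipe is: score your documents against the query, keep the best matches, and prepend them to the prompt. A toy sketch, with simple word-overlap scoring standing in for the real embedding similarity (the example documents and names are made up):

```python
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"\w+", text.lower()))

def score(query: str, doc: str) -> int:
    """Relevance = number of shared words between query and document."""
    return sum((tokens(query) & tokens(doc)).values())

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def augmented_prompt(query: str, docs: list[str]) -> str:
    """Prepend the retrieved context to the question -- the core RAG move."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our return policy allows refunds within 30 days of purchase.",
    "The cafeteria opens at 8 am on weekdays.",
]
prompt = augmented_prompt("What is the return policy?", docs)
```

Tools like NotebookLM or LlamaIndex replace the overlap score with vector embeddings and add chunking and re-ranking, but the shape of the pipeline stays the same.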
Prompt "Engineering" ▶
Simple minds postulated "prompt engineering" as the next big job — as if it wasn't obvious that LLMs were destined to do a much better job in that area than humans.
Prompt-generation support in the OpenAI ChatGPT Playground
The Generate button in the Playground lets you generate prompts, functions, and schemas from just a description of your task.
Nvidia Digital Twins in Omniverse ▶
Digital twins (of robots, factories, and eventually the earth)
for planning and/or maintenance (comparing reality to the model in real time)
Robots (various providers) trained by the dozens in extreme time-lapse with Nvidia Isaac Sim and commanded in natural language
BMW is a long-known Nvidia showcase example for the digital factory twin
Specialized models like Google DeepMind's MedLM and AlphaFold ▶
AI accelerators: Nvidia GPUs or Apple Silicon M-series processors with unified memory
or an AMD GPU with 128GB HBM3e or Groq or 1TB CXL
▶
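A back-of-envelope calculation shows why memory capacity, not compute, is usually the bottleneck: the weights alone take roughly parameters × bytes per weight, and quantization divides that. The 20% headroom factor below is a rough assumption for KV cache and activations, not a measured value:

```python
def model_memory_gb(params_billions: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough memory footprint of an LLM: params * bytes/weight,
    plus ~20% headroom for KV cache and activations (a crude assumption)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# A 70b model at common quantization levels:
for bits in (16, 8, 4):
    print(f"70b @ {bits}-bit: ~{model_memory_gb(70, bits)} GB")
```

At 16-bit, a 70b model wants on the order of 168 GB; even 4-bit quantization still needs ~42 GB, which is exactly why 128GB of unified memory or HBM3e matters for local use.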
Dubious usage examples, often parroted uncritically by "testers" ▶
AI was supposed to learn everything from examples alone and failed miserably at basic arithmetic.
Students may use calculators; AI could not. Exams are often "open book"; AI had no Internet access. That situation has since changed.
AI agents (very trendy!) for weather forecasts, with the informational clarity / ambiguity of a "variable" icon and an accuracy on par with April weather
The same example questions have been asked of Siri, Google, and Alexa for years, such as:
What's the weather (going to be¹)? (¹ English is a crazy language!)
The only good answer came from Amazon years ago with a TV spot featuring a hidden joke, where the attentive observer wonders why the actor doesn't look out the window.
Ahh, he's blind! And now the question asked actually makes sense.
AI — at university level in many areas — often misused for everyday platitudes or sheer nonsense, featured in dozens of examples on chatbot homepages
"Explain the concept of nostalgia to a preschooler"
Logic puzzles linguistically maximally convoluted
Simplified: Three murderers. One more murderer enters and kills a murderer. How many murderers are there?
Three? Four? (Does a dead murderer count as a murderer?)
Simplified: Mary travelled to the kitchen. Sandra journeyed to the kitchen. Mary went back to the garden. Where is Mary?
Llama3-8B-1.58-100B-tokens gets it wrong. ChatGPT 4o knows that "travelled" and "journeyed to the kitchen" are bad choices.
Double obfuscation, to what end?
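The Mary/kitchen question is a classic bAbI-style state-tracking task, and the point of the critique is that a few deterministic lines solve it: the answer is simply the last recorded move, so the puzzle tests bookkeeping, not language understanding. A sketch (the verb list is illustrative, not exhaustive):

```python
import re

# "went back" must precede "went" so the longer alternative matches first
MOVE_VERBS = r"(?:travelled|journeyed|went back|went|moved)"

def last_location(story: str, person: str) -> str:
    """Track each 'X <verb> to the Y.' statement; the answer is the last move."""
    location = {}
    for name, place in re.findall(rf"(\w+) {MOVE_VERBS} to the (\w+)", story):
        location[name] = place
    return location[person]

story = ("Mary travelled to the kitchen. Sandra journeyed to the kitchen. "
         "Mary went back to the garden.")
print(last_location(story, "Mary"))  # garden
```

A model that fails here is tripping over the synonym shuffle ("travelled", "journeyed"), not over any genuinely hard reasoning.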