GeekCoding101

Where Curiosity Meets Code!
Start your tech adventure one line of code at a time.
Daily AI Insights

7 Key Insights on the Self-Attention Mechanism in AI Magic

"Self Attention", a pivotal advancement in deep learning, is at the core of the Transformer architecture, revolutionizing how models process and understand sequences. Unlike traditional Attention, which focuses on mapping relationships between separate input and output sequences, Self-Attention enables each element within a sequence to interact dynamically with every other element. This mechanism allows AI models to capture long-range dependencies more effectively than previous architectures like RNNs and LSTMs. By computing relevance scores between words in a sentence, Self-Attention ensures that key relationships—such as pronoun references or contextual meanings—are accurately identified, leading to more sophisticated language understanding and generation. 1. The Origin of the Attention Mechanism The Attention Mechanism is one of the most transformative innovations in deep learning. First introduced in the 2014 paper Neural Machine Translation by Jointly Learning to Align and Translate, it was designed to address a critical challenge: how can a model effectively focus on the most relevant parts of input data, especially in tasks involving long sequences? Simply put, the Attention Mechanism allows models to “prioritize,” much like humans skip unimportant details when reading and focus on the key elements. This breakthrough marks a shift in AI from rote memorization to dynamic understanding. 2. The Core Idea Behind the Attention Mechanism The Attention Mechanism’s main idea is simple yet powerful: it enables the model to assign different levels of importance to different parts of the input data. Each part of the sequence is assigned a weight, with higher weights indicating greater relevance to the task at hand. For example, when translating the sentence “I…

December 3, 2024 · 0 comments · 195 hotness · 0 likes · Geekcoding101 · Read all
Daily AI Insights

Why is the Transformer Model Called an "AI Revolution"?

1. What is the Transformer?

The Transformer is a deep learning architecture introduced by Google Research in 2017 through the seminal paper Attention is All You Need. Originally designed to tackle challenges in natural language processing (NLP), it has since become the foundation for state-of-the-art AI models in multiple domains, such as computer vision, speech processing, and multimodal learning. Traditional NLP models like RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks) had two significant shortcomings:

  • Sequential Processing: These models processed text one token at a time, slowing down computation and making them hard to parallelize.
  • Difficulty Capturing Long-Range Dependencies: For long sentences or documents, these models often lost crucial contextual information from earlier parts of the input.

The Transformer introduced a novel Self-Attention Mechanism, enabling it to process entire input sequences simultaneously and focus dynamically on the most relevant parts of the sequence. Think of it like giving the model a panoramic lens, allowing it to view the entire context at once rather than focusing on one word at a time.

2. Why is the Transformer Important?

The Transformer brought a paradigm shift to AI, fundamentally altering how models process, understand, and generate information. Here's why it's considered revolutionary:

(1) Parallel Processing

Unlike RNNs that process data step by step, Transformers can analyze all parts of the input sequence simultaneously. This parallelism significantly speeds up training and inference, making it feasible to train models on massive datasets.

(2) Better Understanding of Context

The Self-Attention Mechanism enables the Transformer to capture relationships between all tokens in a…
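The sequential-vs-parallel contrast can be seen directly in code. Below is a toy sketch (not either architecture in full): the RNN-style loop has a step-to-step dependency that cannot be parallelized across tokens, while the attention-style computation touches all token pairs in one matrix operation:

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 6, 4
X = rng.normal(size=(seq, d))        # 6 tokens, 4-dim embeddings
W = rng.normal(size=(d, d)) * 0.1    # toy recurrent weight matrix

# RNN-style: step t needs the hidden state from step t-1 (inherently sequential).
h = np.zeros(d)
for x in X:
    h = np.tanh(x + h @ W)

# Attention-style: all pairwise token interactions computed at once.
scores = X @ X.T / np.sqrt(d)        # (seq, seq) score matrix in one shot
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ X                    # every position attends to every other position
```

The loop produces one hidden state after `seq` dependent steps; the attention path produces all `seq` outputs from independent matrix products, which is what GPUs parallelize so well.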

December 2, 2024 · 0 comments · 149 hotness · 0 likes · Geekcoding101 · Read all
Coding Notes

Instantly Remove Duplicate Photos With A Handy Script

The Problem: Too Much Dust on Old Photos; I Needed a "Remove Duplicate Photos" Cleaner

Imagine sifting through tens of thousands of photos, manually. I mounted the NAS SMB partition on my MacBook, only to discover it was excruciatingly slow. After two days of copying files to my MacBook, my manual review session turned into a blur. My eyes hurt, my patience wore thin, and I knew there had to be a better way. When I turned to existing tools for the "remove duplicate photos" task, I hit a wall. Most were paid, overly complex, or simply didn't fit my needs. Even the so-called free solutions required learning arcane commands like find. I needed something powerful, flexible, and fast. And when all else fails, what's a tech enthusiast to do? Write their own solution, with a "little" help from ChatGPT.

The Power of ChatGPT

I'd dabbled with scripting this same task years ago but quickly gave up because of the time it required. Enter ChatGPT (no marketing here... I am a paid user, though...), the real hero of this story. With its assistance, I wrote the majority of the script in less than a day, before I had a chance to give up! Of course, I still have to thank the emergence of Large Language Models. Given the current code volume and quality, a single person working alone would need at least 10 to 15 days to achieve the same results, so I believe LLMs have improved my efficiency by at least 10 times! And they've helped me avoid all sorts of…
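The post's actual script isn't shown here, but the core idea of duplicate detection can be sketched by hashing file contents and grouping identical digests. The function name and structure below are my own, hypothetical:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(folder):
    """Group files under `folder` by SHA-256 of their contents.
    Any group with more than one path is a set of byte-identical duplicates."""
    by_hash = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

A production photo de-duplicator would read large files in chunks and might add perceptual hashing to catch resized or re-encoded copies; this sketch only catches byte-identical files.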

December 1, 2024 · 0 comments · 913 hotness · 1 like · Geekcoding101 · Read all
Azure AI

Honored to Pass AI-102!

So, I did a thing. I earned my first Microsoft certificate: Azure AI Engineer Associate! 🎉 Here is my story from training to passing the AI-102 exam.

The Learning Journey of AI-102

The journey began with a four-and-a-half-day company-provided AI-102 training session, a mix of online classes and labs. I was actually on vacation during this period, so I only managed to focus for about three days. The labs provided during the training were very useful. There were about 10 labs; each lab could be done up to 10 times, with each session lasting 1 to 3 hours, so I didn't need to pay Microsoft to get familiar with the Azure AI environment. By rough calculation, the training provided 100 to 200 hours of available lab time, but I only used about 20 hours before taking the exam. After the AI-102 training, I mainly stuck to Microsoft Learn: Designing and Implementing a Microsoft Azure AI Solution to fill gaps. Trust me, that's really helpful! The MS Learn modules helped me understand the concepts better.

Cramming and Building Knowledge for AI-102

As the exam date got closer, I quickly skimmed John Savill's Technical Training videos on YouTube once; his videos helped me build a complete knowledge framework in my head, and one pass was enough for me. Last but not least, please do read Areeb Pasha's AI-102 notes on Notion. Thanks to Areeb Pasha! They're so useful: a concise version of MS Learn that made my studying very efficient. I managed to cover all…

June 26, 2024 · 0 comments · 381 hotness · 4 likes · Geekcoding101 · Read all
Azure AI

Install Azure-Cli on Mac

Introduction

Are you ready to delve into the exciting realm of Azure AI? Whether you're a seasoned developer or just starting your journey in the world of artificial intelligence, Microsoft Build offers a transformative opportunity to harness the power of AI. Recently I came across several good tutorials on the Microsoft website, e.g. "CLOUD SKILLS CHALLENGE: Microsoft Build: Build multimodal Generative AI experiences", and I enjoyed learning from them. But I found that the very first step might seem like a challenge to many people: getting the az command to work on a Mac! So I decided to write down all my fixes. Let's go!

Resolution

I followed "Install Azure on Mac". I ran the command: But it failed with a permission issue on the openssl package: I fixed it by changing the ownership of /usr/local/lib to the current user, but that wasn't enough; I hit a Python permission issue at a different location: So I had to apply the same change to /usr/local. The command is: The screenshot of the brew unlink command: Finally, the installation finished successfully! Well done!

Ps. You're welcome to access my other AI Insights blog posts here.

June 11, 2024 · 0 comments · 187 hotness · 0 likes · Geekcoding101 · Read all
Machine Learning

Overfitting! Unlocking the Last Key Concept in Supervised Machine Learning – Day 11, 12

I finished the course! I have really enjoyed the learning experience in Andrew's course so far. Let's see what I've learned over these two days!

Overfitting: The Last Topic of this Course!

Overfitting occurs when a machine learning model learns the details and noise in the training data to an extent that it negatively impacts the model's performance on new data. The model becomes great at predicting or fitting the training data but performs poorly on unseen data, due to its inability to generalize from the training set to the broader population of data. The course explains that overfitting can be addressed by: We can't bypass underfitting either. Overfitting and underfitting are both undesirable effects that suggest a model is not well tuned to the task at hand, but they stem from opposite causes and have different solutions. Below are two screenshots captured from the course for my notes:

Questions That Helped Me Master the Content

Words From Andrew At The End!

"I want to say congratulations on how far you've come and I want to say great job for getting through all the way to the end of this video. I hope you also work through the practice labs and quizzes. Having said that, there are still many more exciting things to learn."

Awesome! I'm ready for my next machine learning journey!
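Overfitting as described above can be demonstrated numerically: a polynomial with enough parameters to memorize every noisy training point fits the training set almost perfectly yet misses held-out points badly. This is a self-contained sketch, not the course's lab code:

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy samples
x_test = np.linspace(0.05, 0.95, 10)                            # held-out points
y_test = np.sin(2 * np.pi * x_test)                             # true underlying curve

# A degree-9 polynomial has enough parameters to pass through all 10 training points.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
# train_mse is near zero; test_mse is much larger: the model memorized the noise.
```

Reducing the degree (or adding regularization, or more data) trades a little training error for much better generalization, which is exactly the remedy the course describes.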

May 12, 2024 · 0 comments · 211 hotness · 0 likes · Geekcoding101 · Read all
Machine Learning

Grinding Through Logistic regression: Exploring Supervised Machine Learning – Day 10

Let's continue! Today was mainly about "Decision boundary", "Cost function of logistic regression", "Logistic loss" and "Gradient Descent Implementation for logistic regression". We found out the decision boundary is where z equals 0 in the sigmoid function, because at that point its value sits exactly at the neutral position. Andrew gave an example with two variables: with z = x1 + x2 - 3 (w1 = w2 = 1), the decision boundary is the line x1 + x2 = 3. I'd say the cost function for logistic regression is the hardest thing I've seen in week 3 so far. I haven't quite figured out why the squared-error cost function isn't applicable and where the loss function came from; I'll have to re-watch the videos. The lab is also useful. The cost function above is derived from statistics using a principle called maximum likelihood estimation (MLE).

Questions and Answers

Some Thoughts of Today

Honestly, it feels like it's getting tougher and tougher. I can still get through the equations and derivations all right; it's just that as I age, I feel like my brain is not keeping up. At the end of each video, Andrew always congratulates me with a big smile, saying I've mastered the content of the session. But deep down, I really think what he's actually thinking is, "Ha, got you stumped again!" To be fair, though, Andrew really does explain things superbly well. I hope someday I can truly master this knowledge and use it effortlessly. Fighting!

Ps. Feel free to…
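Andrew's two-variable example can be checked directly in code: with w1 = w2 = 1 and b = -3, the prediction is exactly 0.5 on the line x1 + x2 = 3, which is what makes that line the decision boundary. A small sketch of that example, plus the log loss the post mentions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(x1, x2, w1=1.0, w2=1.0, b=-3.0):
    """Logistic regression output for Andrew's example: z = x1 + x2 - 3."""
    return sigmoid(w1 * x1 + w2 * x2 + b)

def log_loss(p, y):
    """Logistic (cross-entropy) loss for a single example with label y in {0, 1}."""
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

p_on = predict(1.0, 2.0)     # on the boundary: z = 0, sigmoid(0) = 0.5
p_above = predict(3.0, 3.0)  # z = 3  -> above 0.5, predict class 1
p_below = predict(0.0, 0.0)  # z = -3 -> below 0.5, predict class 0
```

Unlike squared error, this loss heavily penalizes confident wrong predictions (it grows without bound as p moves away from the true label), which is one intuition for why it is preferred for classification.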

May 10, 2024 · 0 comments · 351 hotness · 0 likes · Geekcoding101 · Read all
Machine Learning

Master Gradient Descent and Binary Classification: Supervised Machine Learning – Day 9

A Break Due to Sickness

Oh boy... I was sick for almost two weeks 🤒 After a brief break, I'm back to dive deep into machine learning, and today we'll revisit one of the core concepts in training models: gradient descent. This optimization technique is essential for minimizing the cost function and finding the optimal parameters for our machine learning models. Whether you're working with linear regression or more complex algorithms, understanding how gradient descent guides the learning process is key to achieving accurate predictions and efficient model training. Let's dive back into the data-drenched depths where we left off, shall we? 🚀

The First Coding Assessment

I couldn't recall all of the material, actually. It tests an implementation of gradient descent for one-variable linear regression. I walked through the previous lessons and found this summary really helpful: This exercise reinforced what I've learned about gradient descent this week.

Getting into Classification

I started week 3's material; it looks like it will be more interesting. I made a few notes: the probability that y is 1, given input vector x and parameters vector w and b. I couldn't focus on this for too long; I need to pause after watching a few videos. Bye now.

Ps. Feel free to check out my other posts in Supervised Machine Learning Journey.
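The coding assessment mentioned above, gradient descent for one-variable linear regression, can be sketched in a few lines. This is my own minimal version under the standard cost J = 1/(2m) · Σ(w·x + b - y)², not the course's lab code:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=10000):
    """Fit y ≈ w*x + b by repeatedly stepping down the gradient of the
    squared-error cost J(w, b) = 1/(2m) * sum((w*x + b - y)^2)."""
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(iters):
        err = w * x + b - y
        w -= alpha * (err @ x) / m   # partial derivative dJ/dw
        b -= alpha * err.sum() / m   # partial derivative dJ/db
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1                        # data generated from a known line
w, b = gradient_descent(x, y)        # converges toward w ≈ 2, b ≈ 1
```

With a learning rate this small the cost decreases monotonically; too large a rate makes the updates overshoot and diverge, which is exactly the trade-off the learning-rate lessons cover.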

May 8, 2024 · 0 comments · 153 hotness · 0 likes · Geekcoding101 · Read all
Machine Learning

Master Learning Rate and Feature Engineering: Supervised Machine Learning – Day 8

Today I started with "Choosing the learning rate", reviewed the Jupyter lab, and learned what feature engineering is.

Choosing the Learning Rate

The graph taught in "Choosing the learning rate" is helpful when developing models:

Feature Engineering

When I first started Andrew Ng's Supervised Machine Learning course, I didn't really realize how much of an impact feature engineering could have on a model's performance. But boy, was I in for a surprise! As I worked through the course, I quickly realized that the raw data we start with is rarely good enough for building a great model. Instead, it needs to be transformed, scaled, and cleaned up; that's where feature engineering comes into play. Feature engineering is all about making your data more useful for a machine learning algorithm. Think of it like preparing ingredients for a recipe: the better the quality of your ingredients, the better the final dish will be. Similarly, in machine learning, the features (the input variables) need to be well prepared to help the algorithm understand patterns more easily. Without this step, even the most powerful algorithms might not perform at their best. In the course, Andrew Ng really breaks it down and explains how important feature scaling and transformation are. In one of the early lessons, he used the example of linear regression, a simple algorithm that relies on understanding the relationship between input features and the output. If the features are on vastly different scales, it can throw off the whole process and make training the model take much longer. This…
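The feature scaling discussed above is simple to sketch: z-score normalization rescales every feature to a comparable range so gradient descent converges faster. A minimal illustration with made-up housing-style numbers:

```python
import numpy as np

def zscore_normalize(X):
    """Rescale each feature (column) of X to mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Two features on wildly different scales, e.g. house size (sqft) and bedrooms.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_norm, mu, sigma = zscore_normalize(X)
# After scaling, both columns have mean ~0 and standard deviation ~1.
```

The saved `mu` and `sigma` must be reused to scale any new input at prediction time, otherwise the model sees features on a different scale than it was trained on.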

April 26, 2024 · 0 comments · 153 hotness · 0 likes · Geekcoding101 · Read all
Machine Learning

Finished Machine Learning for Absolute Beginners - Level 1

As you know, I was in the middle of Andrew Ng's Supervised Machine Learning: Regression and Classification, and it's so dry! So I also spared some time for easier ML courses to help me understand. Today I came across Machine Learning for Absolute Beginners - Level 1, and it's really easy and beginner-friendly. I finished it in 2.5 hours, maybe because I've made some good progress in Supervised Machine Learning: Regression and Classification, so it felt easy. I want to share my notes in this blog post.

Applied AI or Shallow AI

An industrial robot that can handle only the specific small task it has been programmed for is called Applied AI or Shallow AI. Under-fitting and over-fitting are challenges for generalization.

Under-fitting

The trained model does not work well on the training data and can't generalize to new data. Reasons may be: An ideal training process would look like: under-fitting... better fitting... good fit.

Over-fitting

The trained model works well on the training data but can't generalize well to new data. Reasons may be:

Training dataset (labeled) -> ML training phase -> trained model
Input (unlabeled dataset) -> processed by trained model (inference phase) -> output (labeled dataset)

Approaches or learning algorithms of ML systems can be categorized into:

Supervised Learning

There are two very typical tasks performed using supervised learning:

Shallow Learning

One of the common classification algorithms under the shallow learning category is called Support Vector Machines (SVM).

Unsupervised Learning

The goal is to automatically identify meaningful patterns in unlabeled data.

Semi-supervised…

April 24, 2024 · 0 comments · 145 hotness · 0 likes · Geekcoding101 · Read all

COPYRIGHT © 2024 GeekCoding101. ALL RIGHTS RESERVED.

Theme Kratos Made By Seaton Jiang