HACKATHON

OVERVIEW

WORKSHOP: Introduction to Machine Learning with Python



Introduction to Machine Learning with Python

A Self-Paced Workshop Designed for the Deep Learning IndabaX Namibia Community

Welcome to this practical, self-paced introduction to Machine Learning (ML). ML is already being applied across industries in Africa, from optimising agricultural yields and supporting health diagnostics to building resilient cybersecurity systems. The purpose of this guide is to bridge the gap between basic Python syntax and building statistical models with Python.

Workshop Roadmap

  • Prerequisites & Environment Setup
  • Module 1: The Foundations of Machine Learning
  • Module 2: The Machine Learning Pipeline
  • Module 3: Hands-On Lab 1 – Property Price Tracking (Regression)
  • Module 4: Hands-On Lab 2 – Credit Risk Evaluation (Classification)
  • Module 5: The Capstone Project Challenge
  • Next Steps & Development Resources

Prerequisites & Environment Setup

To successfully run the technical labs inside this curriculum, you require a computing environment equipped with Python 3 and fundamental data processing libraries.

Option A: Google Colab (Highly Recommended)

The fastest zero-configuration option. You write code in your browser and it runs on free hosted compute, so nothing needs to be installed locally.

➔ Go to colab.research.google.com and select New Notebook.

Option B: Local Deployment via Anaconda

If you prefer to run everything on your own machine, download the Anaconda Distribution (formerly Anaconda Individual Edition) and open a new Jupyter Notebook.

To install the libraries used in the labs, run the following command in your terminal:

Terminal
pip install numpy pandas scikit-learn matplotlib seaborn
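
Once the install finishes, a quick check can confirm that the libraries are importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is our own, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the packages installed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All workshop libraries are available.")
```

If anything is reported as missing, re-run the install command in the same environment that your notebook uses.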

Module 1: The Foundations of Machine Learning

Before building models, we need to understand how machine learning differs from traditional, rule-based programming.

Traditional Programming Paradigm: Developers manually write explicit logic rules. The system passes input data through these rules to produce an output.

Machine Learning Paradigm: Input data and the corresponding target outputs (labels) are fed together into a learning algorithm, which infers the underlying patterns and derives its own decision rules.
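
To make the contrast concrete, here is a small illustrative sketch in plain Python. The moisture readings, the labels, and both function names are hypothetical and chosen only for this illustration. The first function hard-codes a cut-off; the second infers a cut-off from labelled examples, which is the learning idea in miniature.

```python
# Traditional programming: the developer chooses the rule and its cut-off.
def rule_based_needs_water(moisture):
    return 1 if moisture < 25 else 0


# Machine learning (in miniature): the cut-off is inferred from labelled examples.
def learn_threshold(values, labels):
    """Return the cut-off t with the fewest errors for 'predict 1 if value < t'."""
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best_t, best_errors = None, len(values) + 1
    for t in candidates:
        errors = sum((1 if v < t else 0) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical soil-moisture readings with labels (1 = needs water).
moisture = [12.0, 18.0, 22.0, 24.0, 28.0, 33.0, 41.0, 47.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

learned_t = learn_threshold(moisture, labels)
print(f"Hand-written cut-off: 25 | Learned cut-off: {learned_t}")
```

The learned cut-off depends entirely on the examples provided: change the labelled data and the rule changes with it, without anyone editing the code.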

Mathematically, the goal of supervised learning is to estimate a function f that maps input features to the target such that:

y = f(X) + ε

Where y is the dependent target variable, X is the matrix of independent predictive features, and ε is random noise that no model can explain (the irreducible error).
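
A short simulation can illustrate what "irreducible" means. This is a sketch under stated assumptions: the function `f`, the noise level `noise_sd`, and the feature range are all made up for the demonstration. Because we simulate the data, we know the true f, and we can see that even it cannot predict the noise term.

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    """The 'true' relationship, known here only because we simulate the data."""
    return 250 + 4.5 * x


noise_sd = 50.0  # assumed standard deviation of the noise term
X = rng.uniform(50, 250, size=10_000)
eps = rng.normal(0, noise_sd, size=X.shape)
y = f(X) + eps

# Even the true f leaves an error: its MSE is close to the noise variance.
mse_true_f = np.mean((y - f(X)) ** 2)
print(f"MSE of the true f: {mse_true_f:.1f} | noise variance: {noise_sd ** 2:.1f}")
```

In practice f is unknown, so a trained model can at best approach this floor on new data.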

Module 2: The Machine Learning Pipeline

A reliable, production-grade ML system depends on a systematic, repeatable workflow. The end-to-end lifecycle is iterative and moves through these stages:

[Data Collection] ➔ [Data Preprocessing] ➔ [Feature Engineering] ➔ [Model Training] ➔ [Model Evaluation] ➔ [Model Deployment]

  1. Problem Definition: Specifying the target variable and the quantitative success criteria.
  2. Data Ingestion: Loading raw datasets from storage.
  3. Data Preprocessing: Imputing or removing missing values, dropping duplicate records, and handling outliers.
  4. Feature Engineering: Selecting and transforming input features to improve predictive power.
  5. Model Training: Optimising model weights by fitting the chosen algorithm to the training data.
  6. Evaluation: Scoring predictions on held-out test data with metrics such as MSE or classification accuracy.
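
The preprocessing, training, and evaluation stages can be chained in code. The following is a minimal sketch using scikit-learn's `Pipeline`; the synthetic data, the step names, and the choice of median imputation are assumptions made for illustration, not a prescribed recipe.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data ingestion (simulated): one feature, a noisy linear target.
rng = np.random.default_rng(1)
X = rng.normal(150, 40, size=(200, 1))
y = 250 + 4.5 * X[:, 0] + rng.normal(0, 50, size=200)

# Knock out ten feature values to mimic missing observations.
X[rng.choice(200, size=10, replace=False), 0] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing, feature scaling, and the model as one object.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardise the feature
    ("model", LinearRegression()),                 # fit the estimator
])
pipeline.fit(X_train, y_train)

# Evaluation on data the pipeline has not seen.
mse = mean_squared_error(y_test, pipeline.predict(X_test))
print(f"Test MSE: {mse:.2f}")
```

Wrapping the steps in one object means the imputer and scaler are fitted on the training rows only and then reused unchanged on the test rows, which avoids leaking test information into training.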

Module 3: Hands-On Lab 1 – Property Price Tracking (Regression)

In this exercise, we will fit a basic **Linear Regression** model to predict property prices in Windhoek, expressed in Namibian Dollars (NAD), from floor area in square metres.

Step 1: Generate the Synthetic Dataset

Python Script
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Fix the random seed so the results are reproducible
np.random.seed(42)

# Generate synthetic data: size (sqm) vs price (in thousands of NAD)
sizes = np.random.normal(150, 40, 100).reshape(-1, 1)
prices = 250 + (sizes * 4.5) + np.random.normal(0, 50, 100).reshape(-1, 1)

# Combine both columns into a pandas DataFrame
df = pd.DataFrame(data=np.hstack((sizes, prices)), columns=['Size_sqm', 'Price_NAD_k'])
print(df.head())

Step 2: Visualise the Data

Python Script
plt.figure(figsize=(8, 5))
plt.scatter(df['Size_sqm'], df['Price_NAD_k'], color='blue', alpha=0.7)
plt.title('Property Size vs Price')
plt.xlabel('Size (Square Metres)')
plt.ylabel('Price (Thousands NAD)')
plt.grid(True)
plt.show()
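
A scatter plot shows the trend visually; a correlation coefficient puts a number on it. The sketch below regenerates the data with the same recipe as Step 1 (flattened to 1-D arrays for convenience) and computes the Pearson correlation with NumPy. The variable name `r` is our own.

```python
import numpy as np

# Same recipe and seed as Step 1, kept as 1-D arrays.
np.random.seed(42)
sizes = np.random.normal(150, 40, 100)
prices = 250 + 4.5 * sizes + np.random.normal(0, 50, 100)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
r = np.corrcoef(sizes, prices)[0, 1]
print(f"Pearson correlation between size and price: {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship, which is a reasonable sign that a linear model is worth trying.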

Step 3: Split the Data into Training and Test Sets

Python Script
X = df[['Size_sqm']]
y = df['Price_NAD_k']

# Hold out 20% of the rows for testing; the remaining 80% are used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training rows: {X_train.shape[0]} | Test rows: {X_test.shape[0]}")

Step 4: Instantiate and Train the Regression Model

Python Script
# Create the model
regressor = LinearRegression()

# Fit the model to the training data
regressor.fit(X_train, y_train)

print("Training finished.")
print(f"Intercept (w0): {regressor.intercept_:.2f}")
print(f"Coefficient for Size_sqm (w1): {regressor.coef_[0]:.2f}")
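
The two fitted numbers define a straight line: predicted price = w0 + w1 × size. The sketch below checks this by hand. For brevity it regenerates the Step 1 data and fits on all rows rather than on the training split, so its numbers will differ slightly from the lab output; the 120 sqm property is a hypothetical example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same recipe and seed as Step 1.
np.random.seed(42)
sizes = np.random.normal(150, 40, 100).reshape(-1, 1)
prices = 250 + 4.5 * sizes[:, 0] + np.random.normal(0, 50, 100)

model = LinearRegression().fit(sizes, prices)
w0, w1 = model.intercept_, model.coef_[0]

size_sqm = 120.0  # hypothetical property
manual = w0 + w1 * size_sqm                         # the line equation by hand
library = model.predict(np.array([[size_sqm]]))[0]  # the same value via predict()

print(f"w0 = {w0:.2f}, w1 = {w1:.2f}")
print(f"Manual: {manual:.2f} | predict(): {library:.2f} (thousands NAD)")
```

Read w1 as the estimated change in price, in thousands of NAD, for each additional square metre.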

Step 5: Evaluate Predictions on the Test Set

Python Script
# Predict prices for the held-out test rows
y_pred = regressor.predict(X_test)

# Compute the evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Coefficient of Determination (R2): {r2:.2f}")
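
It helps to see what these two metrics compute. The sketch below works them out by hand on four hypothetical price pairs (in thousands of NAD) and compares the result with scikit-learn's `r2_score`. The arrays are invented for the illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted prices (thousands NAD).
y_true = np.array([400.0, 650.0, 900.0, 1150.0])
y_hat = np.array([450.0, 600.0, 950.0, 1100.0])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)  # average squared error
rmse = np.sqrt(mse)            # back in the units of the target

ss_res = np.sum(residuals ** 2)                # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total variation
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.1f} | RMSE: {rmse:.1f} | R2: {r2:.3f}")
print(f"sklearn r2_score: {r2_score(y_true, y_hat):.3f}")
```

Because MSE is in squared units, the RMSE is often easier to interpret: here every prediction is off by 50, so the RMSE is 50 thousand NAD.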

Module 4: Hands-On Lab 2 – Credit Risk Evaluation (Classification)

Classification predicts discrete categories rather than continuous values. This lab fits a **Logistic Regression** model to predict whether a credit applicant is Low Risk (0) or High Risk (1).

Step 1: Generate the Synthetic Risk Data

Python Script
# Imports from Lab 1 are repeated so this lab also runs on its own
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

# Generate a synthetic two-feature, two-class dataset
X_class, y_class = make_classification(
    n_samples=200, n_features=2, n_redundant=0,
    n_informative=2, random_state=24, n_clusters_per_class=1, flip_y=0.05
)

df_class = pd.DataFrame(X_class, columns=['Scaled_Credit_Score', 'Debt_to_Income_Ratio'])
df_class['Risk_Status'] = y_class

# Plot the two features, coloured by risk label
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df_class, x='Scaled_Credit_Score', y='Debt_to_Income_Ratio', hue='Risk_Status', palette='coolwarm')
plt.title('Credit Risk Data by Feature')
plt.show()

Step 2: Train the Classification Model

Python Script
X_c_train, X_c_test, y_c_train, y_c_test = train_test_split(
    df_class[['Scaled_Credit_Score', 'Debt_to_Income_Ratio']], 
    df_class['Risk_Status'], test_size=0.25, random_state=42
)

classifier = LogisticRegression()
classifier.fit(X_c_train, y_c_train)
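
Under the hood, a fitted logistic regression turns a weighted sum of the features into a probability with the sigmoid function, and `predict` labels a row as class 1 when that probability exceeds 0.5. The sketch below reproduces the probability by hand. It regenerates the Step 1 data and fits on all rows for brevity; the 0.3 threshold is an arbitrary example of a more cautious cut-off, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0,
    n_informative=2, random_state=24, n_clusters_per_class=1, flip_y=0.05
)
clf = LogisticRegression().fit(X, y)

# Weighted sum of features plus intercept, then the sigmoid.
z = X @ clf.coef_[0] + clf.intercept_[0]
p_manual = 1 / (1 + np.exp(-z))

# The same probabilities from the library (column 1 = class 1, High Risk).
p_library = clf.predict_proba(X)[:, 1]
print("Manual and library probabilities agree:", np.allclose(p_manual, p_library))

# Lowering the threshold flags more applicants as High Risk.
default_flags = (p_library > 0.5).astype(int)
cautious_flags = (p_library > 0.3).astype(int)
print(f"Flagged at 0.5: {default_flags.sum()} | Flagged at 0.3: {cautious_flags.sum()}")
```

Choosing the threshold is a business decision: in credit risk, missing a High Risk applicant may cost more than wrongly flagging a Low Risk one.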

Step 3: Compute the Evaluation Metrics

Python Script
y_c_pred = classifier.predict(X_c_test)

print(f"Overall Model Accuracy: {accuracy_score(y_c_test, y_c_pred) * 100:.2f}%\n")
print("Classification Report:")
print(classification_report(y_c_test, y_c_pred))
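
The precision and recall figures in the report are derived from the confusion matrix, which counts each combination of true and predicted label. The sketch below uses ten hypothetical labels (not the lab data) so the counts are easy to verify by eye.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels (0 = Low Risk, 1 = High Risk).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()
print(cm)

precision = tp / (tp + fp)  # of those flagged High Risk, how many really were
recall = tp / (tp + fn)     # of the real High Risk cases, how many were flagged
print(f"Precision: {precision:.2f} | Recall: {recall:.2f}")
print(f"sklearn: {precision_score(y_true, y_hat):.2f} | {recall_score(y_true, y_hat):.2f}")
```

Here one Low Risk applicant was wrongly flagged (a false positive) and one High Risk applicant was missed (a false negative), giving precision and recall of 0.80 each.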

Module 5: The Capstone Project Challenge

Scenario: You are building an automated irrigation decision engine for an AgriTech operation in Namibia. Your objective is to train a model that predicts whether a crop plot requires Active Irrigation (1) or No Action (0), based only on local Soil Moisture and Temperature readings.

Your Code Objectives: Fill in the missing code under each task comment to implement a classification workflow from end to end.

Challenge Workspace
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Simulate soil moisture and temperature readings
np.random.seed(7)
moisture = np.random.uniform(10, 50, 150)
temperature = np.random.uniform(18, 42, 150)
needs_water = ((moisture < 25) & (temperature > 30)).astype(int)

agri_df = pd.DataFrame({'Soil_Moisture': moisture, 'Temperature': temperature, 'Needs_Water': needs_water})

# =========================================================
# TODO: FILL IN THE CODE FOR EACH TASK BELOW
# =========================================================

# Task 1: Select the feature columns (X) and the label column (y) from agri_df


# Task 2: Hold out exactly 20% of the rows as a test set


# Task 3: Initialise your estimator (e.g., LogisticRegression)


# Task 4: Fit the model on the training set


# Task 5: Generate predictions for the test set


# Task 6: Compute and print the test accuracy as a percentage

# =========================================================
# END OF ASSIGNMENT CODE BLOCK
# =========================================================
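
Before judging your accuracy, it is worth knowing what a trivial model would score. The simulation labels a plot as needing water only when it is both dry and hot, so positives are the minority (about 19% in expectation, since 15/40 × 12/24 = 0.1875). The sketch below is a hint rather than part of the solution: it computes the accuracy of always predicting the majority class. The names `positive_rate` and `baseline_accuracy` are our own.

```python
import numpy as np

# Same simulation and seed as the challenge workspace.
np.random.seed(7)
moisture = np.random.uniform(10, 50, 150)
temperature = np.random.uniform(18, 42, 150)
needs_water = ((moisture < 25) & (temperature > 30)).astype(int)

# Share of plots labelled 1, and the accuracy of always guessing the majority class.
positive_rate = needs_water.mean()
baseline_accuracy = max(positive_rate, 1 - positive_rate)

print(f"Plots needing water: {positive_rate * 100:.1f}%")
print(f"Majority-class baseline accuracy: {baseline_accuracy * 100:.1f}%")
```

Your trained model should beat this baseline; if it does not, it has learned little beyond "never irrigate".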

Next Steps & Development Resources

Congratulations on completing this introductory course! To build on these machine learning skills, consider these learning paths:

  • Scikit-Learn Documentation: Explore the library's user guide at scikit-learn.org.
  • Kaggle Competitions: Test your models on beginner-friendly challenges such as the Titanic classification competition.
  • Deep Learning Indaba: Connect with the African AI community and look for mentorship and other community opportunities.

DEEP LEARNING INDABA

OVERVIEW

What is the Deep Learning Indaba?


The Deep Learning Indaba is the annual meeting of the African machine learning and AI community. Since 2017, our mission has been to Strengthen African AI, and to ensure that Africans are owners and shapers of the coming advances in AI. As an educational charity, our work focuses on learning, teaching, research, and the role of peer learning and community building.

In 2024, the Deep Learning Indaba will be held in Dakar, Senegal from the 1st to the 7th of September at Amadou Mahtar Mbow University (UAM).

The theme for this edition is “Xam Xamlé”


ATTEND

INDABA X NAMIBIA

Shape the Future of African AI—Join Us In-Person or Online!

Welcome to Deep Learning IndabaX Namibia, the premier gathering for grassroots artificial intelligence and machine learning innovation. Whether you are joining us on-site to network with local pioneers or tuning in virtually from across the globe, you are part of a vibrant ecosystem driving digital sovereignty, local capacity building, and state-of-the-art research. Secure your spot today and help build the future of technology for Namibia and the continent.
