
A Self-Paced Workshop Designed for the Deep Learning IndabaX Namibia Community
Welcome to this practical, self-paced introduction to Machine Learning (ML). ML is transforming industries across Africa, from optimising agricultural yields and predicting health diagnostic outcomes to building resilient cybersecurity systems. This guide bridges the gap between basic Python syntax and building statistical models in Python.
To run the technical labs in this guide, you need a computing environment with Python 3 and the core data processing libraries.
Google Colab is the fastest zero-configuration option. Code runs entirely in your browser on free hosted compute.
➔ Go to colab.research.google.com and select New Notebook.
If you prefer to work locally, download the Anaconda Individual Edition and open a fresh Jupyter Notebook.
To install the libraries used in the labs, run the following command in your terminal:
pip install numpy pandas scikit-learn matplotlib seaborn
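Once the install finishes, a quick import check can confirm that every library is available. The snippet below is a minimal sketch, not part of the original labs: the helper name `check_environment` is hypothetical, and it simply tries to import each package and reports its version (note that scikit-learn is imported as `sklearn`).

```python
import importlib

# Import names for the packages installed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]


def check_environment(modules=REQUIRED):
    """Return {module_name: version string, or None if the import fails}."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Some modules do not expose __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


versions = check_environment()
for name, version in versions.items():
    print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

Any line showing `NOT INSTALLED` means the `pip install` step should be repeated for that package.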
Before designing analytical algorithms, we must understand the shift away from deterministic application logic.
| Traditional Programming Paradigm | Machine Learning Paradigm |
|---|---|
| Developers manually write explicit logic rules. The system takes raw input data and passes it through these rules to produce the output. | Input data and the corresponding target outputs (labels) are fed together into a learning algorithm. The algorithm infers the mathematical patterns and derives its own decision rules. |
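
The contrast in the table can be sketched in a few lines. This is an illustration under stated assumptions rather than part of the labs: the rainfall scenario, the 20 mm threshold, and the function name `is_dry_rule` are all hypothetical, and a depth-one decision tree (a technique not otherwise covered in this guide) is used only because its learned rule is easy to read back.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


# Traditional programming: a developer hard-codes the rule.
def is_dry_rule(rainfall_mm):
    return int(rainfall_mm < 20.0)  # hypothetical hand-picked threshold


# Machine learning: the rule is inferred from labelled examples instead.
rng = np.random.default_rng(0)
rainfall = rng.uniform(0, 50, 500).reshape(-1, 1)  # synthetic feature values
labels = (rainfall.ravel() < 20.0).astype(int)     # synthetic labels

# A depth-one tree learns a single threshold on the single feature.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(rainfall, labels)

learned_threshold = stump.tree_.threshold[0]  # split value at the root node
print("Hand-written threshold: 20.00 mm")
print(f"Learned threshold:      {learned_threshold:.2f} mm")
print(f"Rule says {is_dry_rule(12.0)}, model says {stump.predict([[12.0]])[0]}")
```

The learned threshold should land close to the hand-written one, but it comes from the data rather than from a developer's decision.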
Mathematically, the primary objective in Supervised Learning is to estimate a function $f$ that relates the features to the target such that:
$$y = f(X) + \varepsilon$$
where $y$ is the dependent target variable, $X$ is the matrix of independent predictive features, and $\varepsilon$ accounts for unavoidable noise (irreducible error).
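To make the role of the noise term concrete, the sketch below simulates data from a known function plus noise. Everything in it is an assumption chosen for illustration: the "true" function `f(x) = 3x + 10`, the noise standard deviation of 2, and the sample size. It shows that a fitted line can recover `f` closely, while even the true `f` cannot push the mean squared error below the noise variance.

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    """Hypothetical 'true' relationship between feature and target."""
    return 3.0 * x + 10.0


noise_std = 2.0
X = rng.uniform(0, 10, 10_000)
epsilon = rng.normal(0, noise_std, X.shape)  # irreducible noise
y = f(X) + epsilon                           # y = f(X) + epsilon

# Estimate f with a straight-line least-squares fit.
slope, intercept = np.polyfit(X, y, 1)

# Even the true f leaves an error roughly equal to the noise variance.
mse_of_true_f = np.mean((y - f(X)) ** 2)

print(f"Estimated f(x) = {slope:.2f} * x + {intercept:.2f}")
print(f"Noise variance:  {noise_std ** 2:.2f}")
print(f"MSE of true f:   {mse_of_true_f:.2f}")
```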
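A production-grade deployment relies on a systematic, repeatable workflow. The end-to-end lifecycle is structured and iterative, and the labs below walk through its core stages in order: preparing data, visualising it, splitting it, training a model, and evaluating the result.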
In this exercise, we will fit a basic **Linear Regression** model to predict continuous property prices in Windhoek, expressed in Namibian Dollars (NAD), from floor area in square metres.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Establish deterministic sequence seeds
np.random.seed(42)
# Generate synthetic arrays: Size (sqm) vs Price (expressed in thousands NAD)
sizes = np.random.normal(150, 40, 100).reshape(-1, 1)
prices = 250 + (sizes * 4.5) + np.random.normal(0, 50, 100).reshape(-1, 1)
# Format structured pandas data frame workspace
df = pd.DataFrame(data=np.hstack((sizes, prices)), columns=['Size_sqm', 'Price_NAD_k'])
print(df.head())
plt.figure(figsize=(8, 5))
plt.scatter(df['Size_sqm'], df['Price_NAD_k'], color='blue', alpha=0.7)
plt.title('Property Size metric vs Baseline Valuation Profile')
plt.xlabel('Size (Square Metres)')
plt.ylabel('Price Index (Thousands NAD)')
plt.grid(True)
plt.show()
X = df[['Size_sqm']]
y = df['Price_NAD_k']
# Split the data: 80% for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training samples: {X_train.shape[0]} | Test samples: {X_test.shape[0]}")
# Initialise the model
regressor = LinearRegression()
# Fit the model to the training data
regressor.fit(X_train, y_train)
print("Training finished.")
print(f"Intercept (w0): {regressor.intercept_:.2f}")
print(f"Coefficient for Size_sqm (w1): {regressor.coef_[0]:.2f}")
# Predict on the held-out test set
y_pred = regressor.predict(X_test)
# Compute evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Coefficient of Determination (R2): {r2:.2f}")
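To see what the two scores measure, it can help to compute them by hand once. The sketch below uses four made-up prices (hypothetical values in thousands NAD, not taken from the lab data) and checks the manual formulas against scikit-learn's `mean_squared_error` and `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted prices (thousands NAD), for illustration only.
y_true = np.array([900.0, 1100.0, 750.0, 1300.0])
y_hat = np.array([950.0, 1050.0, 800.0, 1200.0])

residuals = y_true - y_hat

# MSE: the average of the squared residuals.
mse_manual = np.mean(residuals ** 2)

# R2: one minus (unexplained variation / total variation around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE is in the same units as the target, which makes it easier to interpret.
rmse_manual = np.sqrt(mse_manual)

print(f"MSE  manual: {mse_manual:.2f} | sklearn: {mean_squared_error(y_true, y_hat):.2f}")
print(f"R2   manual: {r2_manual:.4f} | sklearn: {r2_score(y_true, y_hat):.4f}")
print(f"RMSE manual: {rmse_manual:.2f} (thousands NAD)")
```

Because MSE is expressed in squared units, taking its square root gives a typical error size in thousands NAD, which is often easier to explain than the MSE itself.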
Classification predicts discrete categories rather than continuous values. This section fits a **Logistic Regression** model to predict whether a credit applicant is Low Risk (0) or High Risk (1).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import seaborn as sns
# Generate a synthetic two-feature dataset standing in for applicant features
X_class, y_class = make_classification(
n_samples=200, n_features=2, n_redundant=0,
n_informative=2, random_state=24, n_clusters_per_class=1, flip_y=0.05
)
df_class = pd.DataFrame(X_class, columns=['Scaled_Credit_Score', 'Debt_to_Income_Ratio'])
df_class['Risk_Status'] = y_class
# Plot the raw data points, coloured by risk class
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df_class, x='Scaled_Credit_Score', y='Debt_to_Income_Ratio', hue='Risk_Status', palette='coolwarm')
plt.title('Risk Profile Data Metric Map')
plt.show()
X_c_train, X_c_test, y_c_train, y_c_test = train_test_split(
df_class[['Scaled_Credit_Score', 'Debt_to_Income_Ratio']],
df_class['Risk_Status'], test_size=0.25, random_state=42
)
# Fit the classifier on the training split
classifier = LogisticRegression()
classifier.fit(X_c_train, y_c_train)
# Predict on the held-out test split
y_c_pred = classifier.predict(X_c_test)
print(f"Overall Model Accuracy: {accuracy_score(y_c_test, y_c_pred) * 100:.2f}%\n")
print("Classification Report:")
print(classification_report(y_c_test, y_c_pred))
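The precision and recall figures in the classification report all derive from the confusion matrix, which the lab imports but does not print. The sketch below works through a tiny hand-made example; the ten labels and predictions are hypothetical, chosen so the counts are easy to verify by eye.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 0 = Low Risk, 1 = High Risk.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

precision_manual = tp / (tp + fp)  # of those flagged High Risk, how many truly are
recall_manual = tp / (tp + fn)     # of the truly High Risk, how many were flagged
accuracy_manual = (tp + tn) / cm.sum()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Precision: {precision_manual:.2f} | Recall: {recall_manual:.2f} | Accuracy: {accuracy_manual:.2f}")
```

In a credit setting the two error types carry different costs: a false negative is a High Risk applicant treated as Low Risk, while a false positive is a Low Risk applicant flagged unnecessarily.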
Scenario: You are building an automated irrigation decision engine for an AgriTech operation in Namibia. Your objective is to train a model that predicts whether a crop plot requires Active Irrigation (1) or No Action (0), based only on local Soil Moisture and Temperature readings.
Your task: Fill in the missing code under each numbered comment below to implement an end-to-end classification workflow.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Simulate soil moisture and temperature readings
np.random.seed(7)
moisture = np.random.uniform(10, 50, 150)
temperature = np.random.uniform(18, 42, 150)
needs_water = ((moisture < 25) & (temperature > 30)).astype(int)
agri_df = pd.DataFrame({'Soil_Moisture': moisture, 'Temperature': temperature, 'Needs_Water': needs_water})
# =========================================================
# TODO: FILL IN THE CODE BELOW
# =========================================================
# Task 1: Separate the feature matrix (X) and the labels (y) from agri_df
# Task 2: Hold out exactly 20% of the records as a test set
# Task 3: Initialise your estimator (e.g., LogisticRegression)
# Task 4: Fit the model on the training subset
# Task 5: Generate predictions for the test set
# Task 6: Compute and print the test accuracy as a percentage
# =========================================================
# END OF EXERCISE CODE
# =========================================================
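One possible way to complete the six tasks is sketched below, for comparison after attempting the exercise yourself. It regenerates the same synthetic data so that it runs on its own. The variable names (`X_train`, `model`, and so on), `random_state=42`, and `max_iter=1000` are choices made here, not requirements of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Recreate the exercise data exactly as given above.
np.random.seed(7)
moisture = np.random.uniform(10, 50, 150)
temperature = np.random.uniform(18, 42, 150)
needs_water = ((moisture < 25) & (temperature > 30)).astype(int)
agri_df = pd.DataFrame({'Soil_Moisture': moisture, 'Temperature': temperature, 'Needs_Water': needs_water})

# Task 1: features and labels
X = agri_df[['Soil_Moisture', 'Temperature']]
y = agri_df['Needs_Water']

# Task 2: hold out 20% of the 150 records (30 rows) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Task 3: initialise the estimator (extra iterations because the features are unscaled)
model = LogisticRegression(max_iter=1000)

# Task 4: fit on the training subset
model.fit(X_train, y_train)

# Task 5: predict on the test set
y_pred = model.predict(X_test)

# Task 6: report accuracy as a percentage
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy * 100:.2f}%")
```

Only a minority of plots need water under the labelling rule, so accuracy alone can look flattering. Printing `classification_report(y_test, y_pred)`, as in the previous lab, gives a fuller picture.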
Congratulations on completing this introductory course! To build on these skills, consider these learning paths:






