MuhammadLab
Machine learning studio · Frontend-only · Browser-local

Classification Studio — Random Forest vs XGBoost

Interactive classification playground: upload a CSV, choose the features and a target, and compare two tree-based classifiers (Random Forest mode and XGBoost gradient boosting). View accuracy, the confusion matrix, and precision/recall/F1, then export the equivalent Python code.

Data

One dataset, many classifiers

Load a sample dataset or upload your own CSV. Choose the target (y) and the feature columns (X), then compare models.

Column mapping
Feature columns (X)

Teaching note: this studio focuses on tabular classification. It can handle multiple feature columns, unlike the Regression Studio, which currently fits a single x.

Data preview

Showing up to 8 columns and the first 8 rows.

Models

Random Forest vs XGBoost

Both models are trained in the browser with XGBoost compiled to WebAssembly (WASM). Random Forest mode is emulated with XGBoost's num_parallel_tree parameter combined with subsampling.
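
As a rough illustration of how one library can serve both modes, the sketch below builds two sets of keyword arguments for xgboost.XGBClassifier: one for boosting and one for random-forest mode. The helper names (boosting_params, random_forest_params, total_trees) are hypothetical, and the forest size and subsampling values in random-forest mode are assumptions, not the studio's actual settings. The boosting values mirror the exported Python code.

```python
# Sketch: hyperparameter sets for the two modes, written as keyword
# arguments for xgboost.XGBClassifier. Values are illustrative assumptions.

def boosting_params(n_rounds=120, learning_rate=0.2, max_depth=4):
    """Gradient boosting: many sequential rounds, one tree per round."""
    return {
        "n_estimators": n_rounds,       # number of boosting rounds
        "num_parallel_tree": 1,         # one tree grown per round
        "learning_rate": learning_rate,
        "max_depth": max_depth,
        "subsample": 0.9,
        "colsample_bytree": 0.9,
    }


def random_forest_params(n_trees=100, max_depth=4):
    """Random-forest mode: a single round that grows many trees side by side."""
    return {
        "n_estimators": 1,              # a single boosting round
        "num_parallel_tree": n_trees,   # forest size
        "learning_rate": 1.0,           # no shrinkage, trees are averaged
        "max_depth": max_depth,
        "subsample": 0.8,               # below 1 so each tree sees a random row subset
        "colsample_bynode": 0.8,        # random column subset at each split
    }


def total_trees(params):
    """Rounds multiplied by trees per round."""
    return params["n_estimators"] * params["num_parallel_tree"]


print(total_trees(boosting_params()))       # 120 rounds x 1 tree
print(total_trees(random_forest_params()))  # 1 round x 100 trees
```

Either dictionary can be unpacked into the classifier, for example XGBClassifier(**random_forest_params(), random_state=42). XGBoost also ships an XGBRFClassifier wrapper that applies comparable defaults.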

Train a model to see the confusion matrix and metrics.
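
To make the reported metrics concrete, here is a minimal pure-Python sketch of how accuracy, precision, recall, and F1 follow from a confusion matrix. The helper names and the toy labels are assumptions for illustration; the studio and scikit-learn have their own implementations.

```python
# Sketch: derive the headline metrics from a confusion matrix.
# Rows are true classes, columns are predicted classes.

def build_confusion_matrix(y_true, y_pred, labels):
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp   # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                  # truly label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy data (made up for the example).
labels = ["cat", "dog"]
y_true = ["cat", "cat", "cat", "cat", "dog", "dog"]
y_pred = ["cat", "cat", "cat", "dog", "dog", "cat"]

matrix = build_confusion_matrix(y_true, y_pred, labels)
report = per_class_metrics(matrix, labels)
print(matrix)            # [[3, 1], [1, 1]]
print(accuracy(matrix))
print(report)
```

Precision asks how many predictions of a class were right; recall asks how many true members of a class were found. F1 is their harmonic mean.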
Generated code (Python)
scikit-learn / XGBoost equivalent
# Classification Studio — Python export
# pip install pandas scikit-learn xgboost

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from xgboost import XGBClassifier

CSV_PATH = "your_dataset.csv"
FEATURE_COLS = [
  "FEATURE_1",
  "FEATURE_2"
]
TARGET_COL = "TARGET"
ENCODING = "onehot"
STANDARDIZE_NUMERIC = False
TEST_SIZE = 0.25
RANDOM_STATE = 42

df = pd.read_csv(CSV_PATH)
X = df[FEATURE_COLS]
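# XGBClassifier expects integer class labels 0..K-1, so encode the target.
# class_names[i] holds the original label for encoded class i.
y, class_names = pd.factorize(df[TARGET_COL], sort=True)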

numeric_cols = X.select_dtypes(include=["number"]).columns.tolist()
categorical_cols = [c for c in X.columns if c not in numeric_cols]

if ENCODING == "onehot":
    cat_transformer = OneHotEncoder(handle_unknown="ignore")
elif ENCODING == "label":
    # Teaching note: label encoding is usually NOT recommended for tree models on
    # categorical features, because it imposes an arbitrary order on the categories.
    # This branch deliberately falls back to one-hot encoding as a placeholder,
    # which keeps the pipeline structure simple.
    cat_transformer = OneHotEncoder(handle_unknown="ignore")
else:
    cat_transformer = "passthrough"

num_steps = []
if STANDARDIZE_NUMERIC:
    num_steps.append(("scaler", StandardScaler()))
num_transformer = Pipeline(steps=num_steps) if num_steps else "passthrough"

preprocess = ColumnTransformer(
    transformers=[
        ("num", num_transformer, numeric_cols),
        ("cat", cat_transformer, categorical_cols),
    ],
    remainder="drop",
)

clf = XGBClassifier(
    max_depth=4,
    learning_rate=0.2,
    n_estimators=120,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_lambda=1,
    reg_alpha=0,
    random_state=RANDOM_STATE,
    tree_method="hist",
)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=TEST_SIZE,
    random_state=RANDOM_STATE,
    stratify=y,
)

pipe = Pipeline(steps=[("preprocess", preprocess), ("model", clf)])
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))
print("\nClassification report:\n", classification_report(y_test, pred))

Teaching note: preprocessing choices (encoding/standardization) must match between the studio and Python for comparable results.
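
To see why mismatched preprocessing makes results hard to compare, the sketch below encodes the same categorical column in two ways. The helpers label_encode and one_hot_encode are hypothetical pure-Python stand-ins, not the studio's or scikit-learn's implementation, and the color column is made-up data.

```python
# Sketch: the same column produces different feature matrices
# depending on the encoding, so the model sees different inputs.

def label_encode(values):
    """Map each category to one integer, in sorted order."""
    categories = sorted(set(values))
    code = {category: i for i, category in enumerate(categories)}
    return [[code[v]] for v in values], categories


def one_hot_encode(values):
    """Map each category to its own 0/1 indicator column."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]

label_rows, categories = label_encode(colors)
onehot_rows, _ = one_hot_encode(colors)

print(categories)    # ['blue', 'green', 'red']
print(label_rows)    # [[2], [1], [0], [1]]
print(onehot_rows)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

With label encoding a tree can split on "color <= 1", which groups blue with green purely because of alphabetical order. With one-hot encoding each category is tested separately, so a model trained on one representation is not directly comparable to a model trained on the other.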