Classification Studio — Random Forest vs XGBoost
Interactive classification playground: upload a CSV, choose the features and target, and compare two tree-based classifiers (Random Forest mode and XGBoost gradient boosting). View accuracy, the confusion matrix, and per-class precision/recall/F1, then export the setup as Python code.
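All of the metrics listed above can be derived from the confusion matrix. The minimal pure-Python sketch below shows how. The helper confusion_and_scores and the cat/dog labels are hypothetical illustrations, not the studio's code. A score whose denominator is zero is set to 0.0, which matches scikit-learn's default behaviour apart from the warning it emits.

```python
def confusion_and_scores(y_true, y_pred):
    """Confusion matrix plus per-class precision/recall/F1 and overall accuracy.

    Rows are true classes and columns are predicted classes, the same layout
    as sklearn.metrics.confusion_matrix.
    """
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1

    scores = {}
    for label, i in index.items():
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as this class, but wrong
        fn = sum(matrix[i]) - tp                 # truly this class, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
    return labels, matrix, scores, accuracy


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
labels, matrix, scores, accuracy = confusion_and_scores(y_true, y_pred)
print(labels)    # ['cat', 'dog']
print(matrix)    # [[1, 1], [1, 2]]
print(accuracy)  # 0.6
```

Here "cat" has one true positive, one false positive, and one false negative, so its precision, recall, and F1 are all 0.5.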
Data
One dataset, many classifiers
Load a sample dataset or upload your own CSV. Choose the target (y) and the feature columns (X), then compare models.
Teaching note: this studio focuses on tabular classification. Unlike the Regression Studio, which currently fits a single x column, it can use multiple feature columns.
Models
Random Forest vs XGBoost
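Both models are trained with XGBoost compiled to WebAssembly (WASM). XGBoost mode boosts: it adds trees one round at a time, with each new tree correcting the errors of the ensemble so far. Random Forest mode instead uses num_parallel_tree to grow a forest of trees in parallel, with row and column subsampling so that the trees differ from one another.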
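The sketch below shows how one XGBoost classifier can be configured for either mode. It follows the random-forest recipe in the XGBoost documentation: a single round (n_estimators=1), num_parallel_tree set to the forest size, learning_rate=1.0, and subsampling below 1. The helper xgb_params is hypothetical. The boosting values copy the exported script below. The random-forest subsampling values (0.8) are assumptions, because the studio's own Random Forest defaults are not shown. The code only builds keyword arguments, so it runs without xgboost installed.

```python
def xgb_params(mode, n_trees=100, max_depth=4, seed=42):
    """Return keyword arguments for xgboost.XGBClassifier (hypothetical helper).

    mode="boosting": n_trees sequential rounds; each tree fits the errors
        of the ensemble so far, shrunk by learning_rate.
    mode="random_forest": one round that grows n_trees trees at once via
        num_parallel_tree; row/column subsampling decorrelates the trees.
    """
    common = {"max_depth": max_depth, "random_state": seed, "tree_method": "hist"}
    if mode == "boosting":
        return {**common, "n_estimators": n_trees, "learning_rate": 0.2,
                "subsample": 0.9, "colsample_bytree": 0.9}
    if mode == "random_forest":
        # Assumed subsampling rates; learning_rate=1.0 per the XGBoost RF guide.
        return {**common, "n_estimators": 1, "num_parallel_tree": n_trees,
                "learning_rate": 1.0, "subsample": 0.8, "colsample_bynode": 0.8}
    raise ValueError(f"unknown mode: {mode!r}")


print(xgb_params("random_forest", n_trees=200))
```

With xgboost installed, XGBClassifier(**xgb_params("random_forest", n_trees=200)) builds the forest. xgboost also ships an XGBRFClassifier class that applies similar random-forest defaults.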
```python
# Classification Studio: Python export
# pip install pandas scikit-learn xgboost
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from xgboost import XGBClassifier

CSV_PATH = "your_dataset.csv"
FEATURE_COLS = [
    "FEATURE_1",
    "FEATURE_2",
]
TARGET_COL = "TARGET"
ENCODING = "onehot"  # "onehot", "label", or anything else for passthrough
STANDARDIZE_NUMERIC = False
TEST_SIZE = 0.25
RANDOM_STATE = 42

df = pd.read_csv(CSV_PATH)
X = df[FEATURE_COLS]

# XGBClassifier expects integer class labels 0..k-1, so encode the target
# (string labels such as "yes"/"no" would otherwise raise an error).
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df[TARGET_COL])

# Split features by dtype so each group gets its own preprocessing.
numeric_cols = X.select_dtypes(include=["number"]).columns.tolist()
categorical_cols = [c for c in X.columns if c not in numeric_cols]

if ENCODING == "onehot":
    cat_transformer = OneHotEncoder(handle_unknown="ignore")
elif ENCODING == "label":
    # Teaching note: label (ordinal) encoding imposes an arbitrary order on categories,
    # so one-hot encoding is usually the safer choice. Categories are numbered in
    # sorted order (assumed to match the studio); unseen test categories map to -1.
    cat_transformer = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
else:
    # Passthrough only works when there are no categorical (string) columns.
    cat_transformer = "passthrough"

num_steps = []
if STANDARDIZE_NUMERIC:
    num_steps.append(("scaler", StandardScaler()))
num_transformer = Pipeline(steps=num_steps) if num_steps else "passthrough"

preprocess = ColumnTransformer(
    transformers=[
        ("num", num_transformer, numeric_cols),
        ("cat", cat_transformer, categorical_cols),
    ],
    remainder="drop",
)

clf = XGBClassifier(
    max_depth=4, learning_rate=0.2, n_estimators=120,
    subsample=0.9, colsample_bytree=0.9,
    reg_lambda=1, reg_alpha=0,
    random_state=RANDOM_STATE, tree_method="hist",
)

# Stratify so the train and test sets keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
)

pipe = Pipeline(steps=[("preprocess", preprocess), ("model", clf)])
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)

# Decode back to the original class names for readable reports.
y_test_names = label_encoder.inverse_transform(y_test)
pred_names = label_encoder.inverse_transform(pred)
print("Accuracy:", accuracy_score(y_test_names, pred_names))
print("Confusion matrix:\n", confusion_matrix(y_test_names, pred_names))
print("\nClassification report:\n", classification_report(y_test_names, pred_names))
```
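The ENCODING setting changes which splits a tree can make. The pure-Python sketch below illustrates this; the helpers one_hot, label_encode, and label_split_groups and the colour data are hypothetical. Under label encoding, a single threshold split code <= t can only cut the sorted category list into a prefix and a suffix. As a result, a middle category such as "green" can never be isolated in one split. With one-hot encoding, a single split on the green column does that. A deeper tree can still isolate "green" with two splits, so the cost is extra depth rather than lost information.

```python
def one_hot(values):
    """Encode each category as its own 0/1 column (like sklearn's OneHotEncoder)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def label_encode(values):
    """Encode each category as one integer, numbered in sorted order (like OrdinalEncoder)."""
    categories = sorted(set(values))
    code = {c: i for i, c in enumerate(categories)}
    return categories, [code[v] for v in values]


def label_split_groups(categories):
    """Every left/right grouping a single split `code <= t` can make under label encoding.

    Codes follow the sorted category list, so each threshold cuts that list in two.
    """
    return [(categories[:t], categories[t:]) for t in range(1, len(categories))]


colors = ["red", "green", "blue", "green"]
cats, onehot_rows = one_hot(colors)
_, codes = label_encode(colors)
print(cats)         # ['blue', 'green', 'red']
print(onehot_rows)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(codes)        # [2, 1, 0, 1]
for left, right in label_split_groups(cats):
    print(left, "|", right)
# ['blue'] | ['green', 'red']
# ['blue', 'green'] | ['red']
```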
Teaching note: preprocessing choices (encoding/standardization) must match between the studio and Python for comparable results.
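Of the two preprocessing choices, encoding is the one that can change a tree's splits. Standardization (z-scoring with a positive standard deviation) is a strictly increasing transform, so it preserves the order of a feature's values. Because a threshold split only depends on that order, standardization leaves the chosen split unchanged. The sketch below demonstrates this with a one-feature decision stump. It scores splits with Gini impurity, as a classic decision tree does, rather than with XGBoost's gradient-based gain, but the ordering argument is the same. The helpers and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y):
    """Exhaustively search thresholds on one feature for the lowest weighted Gini.

    Returns the set of row indices sent to the left child (x <= threshold).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold separates equal values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score, best_left = score, frozenset(order[:k])
    return best_left


def standardize(x):
    """Z-score a feature: subtract the mean, divide by the (population) std."""
    mu, sigma = mean(x), pstdev(x)
    return [(v - mu) / sigma for v in x]


feature = [1.0, 50.0, 3.0, 200.0, 7.0, 120.0]
target = [0, 1, 0, 1, 0, 1]
print(sorted(best_split(feature, target)))               # [0, 2, 4]
print(sorted(best_split(standardize(feature), target)))  # [0, 2, 4]
```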