Wednesday, March 25, 2026

How I Fixed My Sluggish Mac in Minutes Using Kiro CLI

My MacBook had been crawling for days. Apps took forever to open, switching between windows felt like wading through mud, and I had no idea why. I only had 4 browser tabs open — nothing unusual. Then I tried Kiro CLI, and within minutes I had my answer and my fix.

Here's exactly how it went down.


The Problem

Everything was slow. Not "a little laggy" slow — genuinely unusable slow. Spinning beach balls, delayed keystrokes, the works. I'd already tried the usual suspects: restarting apps, clearing cache, the classic "turn it off and on again." Nothing helped.


Installing Kiro CLI

Getting started took less than a minute. Kiro CLI is a terminal-based AI assistant that can interact directly with your system.

brew install kiro-cli

Then just launch it:

kiro-cli chat

That's it. No complex setup, no config files to edit.


Identifying the Issue

I typed one line to Kiro:

"my system is very slow"

Kiro immediately ran a system diagnostic and surfaced this:

Load Avg: 19.14, 69.88, 67.35
PhysMem: 7489M used, 142M unused
VM: 384609609 swapins, 394054546 swapouts

Kiro's analysis was direct: a load average of 19 is dangerously high (a healthy load stays at or below your CPU core count, typically 1–4), RAM was nearly exhausted, and the system was thrashing swap: constantly reading and writing to disk, which is orders of magnitude slower than RAM.
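You can apply the same rule of thumb yourself. The sketch below parses the load line quoted above; `cores=8` is an assumed value (on a real Mac you'd read it from `sysctl -n hw.ncpu`):

```shell
# Sketch: compare the 1-minute load average against the CPU core count.
# The load line is copied from the diagnostic above; cores=8 is an assumption.
load_line="Load Avg: 19.14, 69.88, 67.35"
load1=$(echo "$load_line" | awk '{print int($3)}')  # 1-minute average, truncated
cores=8                                             # real value: sysctl -n hw.ncpu
if [ "$load1" -gt "$cores" ]; then
  echo "overloaded: 1-min load $load1 exceeds $cores cores"
fi
```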

It also spotted the likely culprit immediately: 136 Chrome Helper processes running simultaneously.

I pushed back — I only had 4 tabs open. Kiro dug deeper and found something I hadn't considered:

"There are two user accounts running Chrome — your active session and a background session via Fast User Switching. Chrome is running in both."

That was the "aha" moment. I'd switched users earlier and never logged out. The other account had Chrome fully running in the background, invisible to me.


Fixing the Root Cause

Kiro walked me through the fix step by step.

First attempt — a clean kill signal:

sudo pkill -u [other-user] "Google Chrome"

That reduced Chrome processes from 136 to 55, but didn't finish the job. Kiro checked again and found 39 processes still running under the background account. So it gave me a harder fix:

sudo kill -9 $(ps aux | grep -i "Google Chrome" | grep [other-user] | awk '{print $2}' | tr '\n' ' ')
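If you want to see what a pipeline like that would kill before actually running it, drop the `sudo kill -9` wrapper and print the PIDs first. The `ps` output below is fabricated for illustration ("otheruser" and the PIDs are placeholders), but the filtering logic is identical:

```shell
# Dry run of the PID-extraction pipeline on made-up ps output.
# "otheruser" and the PIDs are invented for illustration only.
ps_sample='otheruser   501  0.0  1.2  Google Chrome Helper
otheruser   502  0.0  0.9  Google Chrome Helper (GPU)
me          777  0.1  2.0  Google Chrome'
pids=$(echo "$ps_sample" | grep -i "Google Chrome" | grep otheruser | awk '{print $2}' | tr '\n' ' ')
echo "$pids"   # only the background user's PIDs survive the filter
```

Only once the printed list looks right do you plug it into `sudo kill -9 ...`.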

After that, Kiro ran another check:

Load Avg: 2.49
CPU idle: 83%
Chrome processes: 19 (normal — just my active session)

Done. System restored.


How Kiro CLI Saved My Day

What would have taken me an hour of Googling, Stack Overflow rabbit holes, and trial-and-error took about 10 minutes of conversation. Kiro didn't just tell me "Chrome uses a lot of memory" — it:

  • Ran live diagnostics on my system
  • Identified the non-obvious root cause (Fast User Switching + dual Chrome sessions)
  • Gave me the exact commands to fix it
  • Verified the fix actually worked after each step

It felt less like using a tool and more like having a sysadmin sitting next to me.


Try Kiro CLI for Free

Kiro offers a 500-credit free trial when you sign up — more than enough to explore what it can do. After the trial, there's a free tier to keep using it, with paid plans starting at $20/month if you need more capacity.

👉 kiro.dev

If your Mac (or any system) ever feels inexplicably slow, just open a terminal and ask. You might be surprised how fast you get an answer.




Thursday, March 5, 2026

Spark Connect vs RDD: Understanding Modern Spark Architecture

TL;DR: Spark Connect represents a shift toward remote, DataFrame-centric development, leaving behind the low-level RDD API. Here's what that means for your data pipelines.

The Evolution of Spark APIs

Apache Spark has always offered multiple levels of abstraction, but Spark Connect marks a deliberate move up the stack. Understanding these layers is crucial for modern data engineering.

Three Layers, One Engine

At the foundation sits Spark Core — the execution engine handling task scheduling, memory management, and fault tolerance. Everything else is built on top.

The RDD (Resilient Distributed Dataset) API gave developers fine-grained control with operations like map(), filter(), and reduceByKey(). It's powerful but requires manual optimization and deep Spark knowledge.

The DataFrame/SQL API provides a declarative, schema-aware interface. Think df.groupBy().count() or pure SQL queries. The Catalyst optimizer handles query planning automatically, often outperforming hand-tuned RDD code.

What Makes Spark Connect Different?

Spark Connect introduces a client-server architecture that fundamentally changes how you interact with Spark:

Traditional Spark: Your laptop runs the full Spark runtime. You have access to everything — DataFrames, RDDs, SparkContext — but need the entire Spark distribution installed locally.

Spark Connect: Your laptop runs a thin client that sends DataFrame operations to a remote Spark cluster via gRPC. Only DataFrame/SQL APIs are supported. RDDs and SparkContext? Not available.

A Real-World Example

Let's analyze web server logs to find 404 errors and top pages.

With RDDs:

logs_rdd = sc.textFile("s3://bucket/logs/*.log")
parsed = logs_rdd.map(parse_log)
errors = parsed.filter(lambda x: x['status'] == '404').count()
top_pages = parsed.map(lambda x: (x['url'], 1)) \
                  .reduceByKey(lambda a, b: a + b) \
                  .takeOrdered(10, key=lambda x: -x[1])



With Spark Connect (DataFrames):

spark = SparkSession.builder.remote("sc://cluster:15002").getOrCreate()
logs_df = spark.read.text("s3://bucket/logs/*.log")
parsed = logs_df.select(split(col("value"), " ")...)
errors = parsed.filter(col("status") == "404").count()
top_pages = parsed.groupBy("url").count() \
                  .orderBy(col("count").desc()).limit(10)



Or even simpler with SQL:

spark.sql("SELECT url, COUNT(*) FROM logs GROUP BY url ORDER BY 2 DESC LIMIT 10")



The DataFrame approach is more readable, automatically optimized, and runs remotely without a full Spark installation.

Why the Restrictions?

Spark Connect's limitations are intentional design choices:

Simpler API surface → easier to maintain and evolve

Remote-friendly → DataFrames serialize well over the network; RDD closures don't

Better practices → encourages modern, optimized patterns

Stability → client crashes don't affect server-side jobs


When You Still Need RDDs

RDDs aren't obsolete — they're just specialized. You need them for:

Custom partitioning logic (rdd.partitionBy())

Complex stateful transformations outside DataFrame capabilities

Working with truly unstructured data that doesn't fit tabular models

Fine-grained control over shuffle and execution


But here's the catch: if you need RDDs, you can't use Spark Connect.

The Bottom Line

For most data engineering workloads — ETL, analytics, aggregations — Spark Connect with DataFrames is simpler, faster, and more maintainable. The Catalyst optimizer often outperforms manually-tuned RDD code, and remote execution from notebooks is incredibly convenient.

RDDs remain available for those edge cases requiring low-level control, but the industry trend is clear: DataFrame APIs are the future of Spark development.

Decision Framework

Choose Spark Connect when:

You want remote development (Jupyter, IDEs)

Your workload fits DataFrame/SQL patterns

You value automatic optimization

You want simplified dependency management


Stick with traditional Spark when:

You need RDD-level control

Working with DynamicFrames (AWS Glue)

Custom partitioning or stateful operations

Legacy codebases that can't be refactored


The good news? Most new Spark applications can — and should — be built using DataFrames, making Spark Connect a natural fit for modern data platforms.

-------------------------

TRADITIONAL SPARK                    SPARK CONNECT
=================                    =============

┌──────────────────┐                ┌─────────────┐
│  Your Laptop     │                │ Your Laptop │
│                  │                │   (Thin)    │
│  ┌────────────┐  │                │             │
│  │ Full Spark │  │                │  ┌───────┐  │
│  │ Runtime    │  │    vs.         │  │Client │  │
│  │            │  │                │  │Library│  │
│  │ RDD + DF   │  │                │  │       │  │
│  │ APIs       │  │                │  │DF API │  │
│  └────────────┘  │                │  │ only  │  │
│                  │                │  └───┬───┘  │
│  Executes        │                │      │      │
│  Locally         │                │      │      │
└──────────────────┘                └──────┼──────┘
                                           │
                                           │ Network
                                           │ (gRPC)
                                           │
                                    ┌──────▼──────┐
                                    │Spark Cluster│
                                    │             │
                                    │  Executes   │
                                    │  Remotely   │
                                    │             │
                                    │ RDD + DF    │
                                    │ (Internal)  │
                                    └─────────────┘

Thursday, December 14, 2023

xargs

  1.  Run rm recursively in all subdirectories. Navigate to the target dir and run the command below (each subdirectory gets its own subshell, so no cd .. is needed)
    1. ls | xargs -I % sh -c 'cd "%" && pwd && rm -rf ./*'
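A safer equivalent uses find, which handles recursion and odd filenames itself. The sandbox below is purely illustrative (the directory and file names are made up):

```shell
# Build a throwaway sandbox: a parent dir with two subdirectories holding files.
tmp=$(mktemp -d)
mkdir -p "$tmp/proj1" "$tmp/proj2"
touch "$tmp/proj1/old.log" "$tmp/proj2/old.log"

# Delete everything below the first level, keeping the subdirectories themselves.
find "$tmp" -mindepth 2 -delete

find "$tmp" -mindepth 1   # lists only the two (now empty) subdirectories
```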

Wednesday, June 29, 2022

TMUX cheat sheet

Description        cmd
Scroll             ctrl+b [  (enter scroll mode; up/down keys to scroll, q to quit)
New Window         ctrl+b c
Rename Window      ctrl+b ,

Friday, May 28, 2021

Productivity Tools

  1. Form History - https://stephanmahieu.github.io/fhc-home/
  2. Shell Directory Management - https://github.com/mcwoodle/shell-directory-management/blob/master/README.md
  3. Browser Extensions
    1. https://addons.mozilla.org/en-US/firefox/addon/screenshot-capture-annotate/?utm_source=addons.mozilla.org&utm_medium=referral&utm_content=search