Isen Kasa

Building an Expense Analytics Dashboard with TypeScript, React, and AI

A technical walkthrough of how I built a local expense analytics proof of concept that parses bank statements, categorizes transactions, generates JSON reports, and renders a React dashboard.

#typescript #react #ai #fintech

I had a simple pain point: reviewing expenses from bank statements is boring, repetitive, and weirdly easy to put off.

The data is already there. The bank gives you CSV files. The transactions have dates, descriptions, and amounts. But the useful questions still take manual work:

  • How much came in this month?
  • Where is the money going?
  • What vendors show up the most?
  • Which purchases are meals, utilities, cloud hosting, software, or ads?
  • What would this look like as a dashboard instead of a spreadsheet?

So I built a proof of concept.

The goal was not to build a full SaaS product immediately. I wanted something I could run on my own machine, point at statement files, generate structured analytics, and show in a React dashboard.

That gave me a nice constraint:

CSV statement files
-> TypeScript scripts
-> generated JSON
-> React dashboard

No database. No auth. No backend API. No upload flow. No deployment story yet.

Just the smallest version of the workflow that proves the idea.


The Shape of the App

The project ended up with this structure:

expense-poc/
  statements/
    sample-statement.csv
    01_01_2026_05_22_2026_transactions.csv
 
  scripts/
    parse-statements.ts
    categorize-transactions.ts
    ai-categorize-transactions.ts
    generate-report.ts
    logger.ts
    types.ts
 
  output/
    report.json
    ai-category-cache.json
 
  dashboard/
    public/
      report.json
    src/
      App.tsx
      report.ts
      styles.css

The important architectural choice is that the dashboard does not know how to parse statements.

The scripts own data processing. The dashboard owns presentation.

That gives me a clean boundary:

Data pipeline produces report.json
React app consumes report.json

This is also what makes the project easy to evolve. The local script could eventually become a backend endpoint. The generated JSON could eventually become database rows. The dashboard could eventually become a real hosted web app.

But for the POC, a generated JSON file is enough.


Why I Chose This Approach

I wanted this to run locally because bank data is sensitive, and for a first pass I did not want to build around cloud uploads.

But I also did not want to over-romanticize the architecture. Nothing about the design requires it to stay a local tool forever.

The current version runs locally because that is the fastest and safest way to prove the workflow:

Put CSV files in /statements
Run a script
Open the dashboard

Later, the same system could become a full web app:

  • CSV upload instead of /statements
  • API route instead of a local script
  • Database storage instead of JSON output
  • Auth and account separation
  • Team access
  • Manual category correction
  • Exports to QuickBooks or CSV

But I did not want the first version to be swallowed by infrastructure.

That is a common trap. You start building auth, storage, queues, deployments, and permissions before you even know if the actual workflow is useful.

For this POC, I wanted to prove the data transformation first.


The TypeScript Types

I started by defining the shape of the data.

This is the core Transaction type:

export type TransactionType = "income" | "expense";
 
export interface Transaction {
  id: string;
  date: string;
  description: string;
  merchant: string;
  amount: number;
  type: TransactionType;
  category: string;
  sourceFile: string;
}

The generated report has summaries for categories, merchants, monthly trends, totals, and the full transaction list:

export interface Report {
  generatedAt: string;
  dateRange: {
    start: string;
    end: string;
  };
  totals: {
    income: number;
    expenses: number;
    net: number;
    transactionCount: number;
  };
  categories: CategorySummary[];
  merchants: MerchantSummary[];
  monthlyTrend: MonthlyTrendItem[];
  transactions: Transaction[];
  uncategorized: Transaction[];
}

This type-first approach helped keep the project from becoming a pile of loosely related scripts.

The parser produces Transaction[].

The categorizer accepts Transaction[] and returns Transaction[].

The report generator accepts Transaction[] and returns a Report.

The dashboard fetches a Report.

That is the whole contract.
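
One practical consequence of that contract is that categorization passes can be chained, because each one takes and returns Transaction[]. The sketch below is illustrative only: `CategorizeStage`, `runStages`, and `labelIncome` are hypothetical names, not exports from the project.

```typescript
export type TransactionType = "income" | "expense";

export interface Transaction {
  id: string;
  date: string;
  description: string;
  merchant: string;
  amount: number;
  type: TransactionType;
  category: string;
  sourceFile: string;
}

// Hypothetical stage type: every categorizer takes and returns Transaction[].
export type CategorizeStage = (transactions: Transaction[]) => Transaction[];

// Hypothetical helper: run the stages left to right.
export function runStages(
  transactions: Transaction[],
  stages: CategorizeStage[],
): Transaction[] {
  return stages.reduce((current, stage) => stage(current), transactions);
}

// Illustrative stage: label income rows that are still uncategorized.
export const labelIncome: CategorizeStage = (transactions) =>
  transactions.map((transaction) =>
    transaction.type === "income" && transaction.category === "Uncategorized"
      ? { ...transaction, category: "Income" }
      : transaction,
  );
```

With this shape, an optional pass such as the AI categorizer is just one more entry in the stages array.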


Parsing Bank Statement CSVs

The parser supports two CSV formats.

The first is a clean generic format:

Date,Description,Amount
2026-05-01,OPENAI CHATGPT SUBSCRIPTION,-20.00
2026-05-02,AWS SERVICES,-84.22
2026-05-03,CLIENT PAYMENT,2500.00

The second is a more realistic headerless bank export:

01/03/2026,6800.00,,INVOICE PAYMENT NORTHSTAR DESIGN INV-1001
01/05/2026,-312.40,26005001,ACME ELECTRIC BUSINESS UTILITIES JAN
01/08/2026,-486.22,26008001,AWS SERVICES CLOUD HOSTING JAN

I modeled parsers with a small interface:

interface StatementParser {
  supports(fileName: string, fileContents: string): boolean;
  parse(fileName: string, fileContents: string): Transaction[];
}

That makes it easy to add more importers later.

For example, one parser can support the generic headered format:

// `parse` is assumed to be csv-parse's synchronous API.
import { parse } from "csv-parse/sync";

const genericCsvParser: StatementParser = {
  supports(fileName, fileContents) {
    if (!isCsvFile(fileName)) {
      return false;
    }
 
    const [headerRow] = parseCsvRows(fileContents);
    return Boolean(headerRow && hasGenericCsvColumns(headerRow));
  },
  parse(fileName, fileContents) {
    const rows = parse(fileContents, {
      bom: true,
      columns: true,
      skip_empty_lines: true,
      trim: true,
    }) as Record<string, string>[];
 
    return rows.map((row, index) =>
      normalizeCsvRow(row as unknown as GenericCsvRow, fileName, index + 1),
    );
  },
};

Another parser can support the headerless bank export:

const dateAmountDescriptionCsvParser: StatementParser = {
  supports(fileName, fileContents) {
    if (!isCsvFile(fileName)) {
      return false;
    }
 
    const [firstRow] = parseCsvRows(fileContents);
    return Boolean(firstRow && isDateAmountDescriptionRow(firstRow));
  },
  parse(fileName, fileContents) {
    const rows = parseCsvRows(fileContents);
 
    return rows.map((row, index) =>
      normalizeDateAmountDescriptionRow(row, fileName, index + 1),
    );
  },
};

This gives the code a nice extension point:

const statementParsers: StatementParser[] = [
  genericCsvParser,
  dateAmountDescriptionCsvParser,
];

If I later want to support Chase, Amex, Stripe exports, or PDF statements, I do not need to rewrite the dashboard. I add another importer.
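
The parser list only helps if something picks the right parser for each file. A minimal sketch of that dispatch step follows, assuming the first parser whose supports() returns true should win. `parseStatement`, `ParsedTransaction`, and the toy `headeredParser` are hypothetical stand-ins, not the project's code.

```typescript
// Simplified stand-in for the article's Transaction type.
interface ParsedTransaction {
  date: string;
  description: string;
  amount: number;
}

interface StatementParser {
  supports(fileName: string, fileContents: string): boolean;
  parse(fileName: string, fileContents: string): ParsedTransaction[];
}

// Hypothetical dispatcher: the first parser that claims the file handles it.
export function parseStatement(
  parsers: StatementParser[],
  fileName: string,
  fileContents: string,
): ParsedTransaction[] {
  const parser = parsers.find((candidate) => candidate.supports(fileName, fileContents));

  if (!parser) {
    throw new Error(`No parser supports ${fileName}.`);
  }

  return parser.parse(fileName, fileContents);
}

// Toy parser for demonstration only: it splits on commas and ignores quoting.
export const headeredParser: StatementParser = {
  supports: (fileName, fileContents) =>
    fileName.toLowerCase().endsWith(".csv") &&
    fileContents.startsWith("Date,Description,Amount"),
  parse: (_fileName, fileContents) =>
    fileContents
      .trim()
      .split("\n")
      .slice(1)
      .map((line) => {
        const [date = "", description = "", amount = ""] = line.split(",");
        return { date, description, amount: Number(amount) };
      }),
};
```

Failing loudly when no parser matches seems safer than silently skipping a statement file, since a skipped file would quietly understate the totals.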


Normalizing Transactions

The parser normalizes every row into the same shape.

For the headerless bank export, the normalizer looks like this:

function normalizeDateAmountDescriptionRow(
  row: string[],
  sourceFile: string,
  rowIndex: number,
): Transaction {
  const [rawDate, rawAmount, , rawDescription] = row;
  const date = normalizeDate(rawDate, sourceFile, rowIndex);
  const description = requireValue(rawDescription, "Description", sourceFile, rowIndex);
  const amount = normalizeAmount(rawAmount, sourceFile, rowIndex);
 
  return {
    id: createTransactionId(sourceFile, rowIndex, date, description, amount),
    date,
    description,
    merchant: normalizeMerchant(description),
    amount,
    type: amount >= 0 ? "income" : "expense",
    category: "Uncategorized",
    sourceFile,
  };
}

There are a few small but important details here.

Zero and positive amounts become income, and negative amounts become expenses:

type: amount >= 0 ? "income" : "expense"

The signed amount is preserved in the transaction, but expense analytics use absolute values later.

Dates are normalized into YYYY-MM-DD, even if the source file uses MM/DD/YYYY.

Merchant names are cleaned up so the dashboard does not show raw bank noise everywhere.
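
The date and amount helpers are not shown above, so here is one way they might look. This is a sketch under assumptions: it accepts only YYYY-MM-DD and MM/DD/YYYY dates, does not range-check months or days, and treats parenthesized amounts as negative, which the source files may or may not need.

```typescript
// Hypothetical implementation: returns YYYY-MM-DD for the two formats shown earlier.
export function normalizeDate(rawDate: string, sourceFile: string, rowIndex: number): string {
  const value = rawDate.trim();

  if (/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    return value;
  }

  const match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(value);

  if (!match) {
    throw new Error(`${sourceFile} row ${rowIndex}: unrecognized date "${rawDate}".`);
  }

  const [, month = "", day = "", year = ""] = match;
  return `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`;
}

// Hypothetical implementation: strips "$", commas, and whitespace before parsing.
export function normalizeAmount(rawAmount: string, sourceFile: string, rowIndex: number): number {
  const trimmed = rawAmount.trim();
  const isParenthesized = /^\(.*\)$/.test(trimmed);
  const cleaned = trimmed.replace(/[$,()\s]/g, "");
  const amount = Number(cleaned);

  if (cleaned === "" || Number.isNaN(amount)) {
    throw new Error(`${sourceFile} row ${rowIndex}: unrecognized amount "${rawAmount}".`);
  }

  // Accounting-style "(45.10)" is treated as a negative amount.
  return isParenthesized ? -Math.abs(amount) : amount;
}
```

Including the file name and row number in each error makes a bad row easy to find in the original CSV.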


Transaction IDs and AI Cache Safety

One of the more interesting bugs came from transaction IDs.

Originally, I generated IDs from:

sourceFile + rowIndex

That worked until I replaced a CSV file with sanitized demo data. With the file name unchanged, row 10 in the old file and row 10 in the new file got the same ID, even though they described different transactions.

That is a problem because AI category assignments are cached by transaction ID.

So I changed the ID to include the row content:

import { createHash } from "node:crypto";

function createTransactionId(
  sourceFile: string,
  rowIndex: number,
  date: string,
  description: string,
  amount: number,
): string {
  return createHash("sha256")
    .update(`${sourceFile}:${rowIndex}:${date}:${description}:${amount}`)
    .digest("hex")
    .slice(0, 16);
}

Now the ID is still deterministic, but it changes when the row content changes.

That makes the AI cache much safer.
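
A quick way to see the effect is to hash the same row twice and then hash a replaced row at the same position. The sketch below reuses the function above; the file name and the "SANITIZED DEMO VENDOR" row are made-up sample values.

```typescript
import { createHash } from "node:crypto";

export function createTransactionId(
  sourceFile: string,
  rowIndex: number,
  date: string,
  description: string,
  amount: number,
): string {
  return createHash("sha256")
    .update(`${sourceFile}:${rowIndex}:${date}:${description}:${amount}`)
    .digest("hex")
    .slice(0, 16);
}

// Same file, same row, same content: the ID is stable across runs.
const original = createTransactionId(
  "statement.csv", 10, "2026-01-08", "AWS SERVICES CLOUD HOSTING JAN", -486.22,
);
const sameRowAgain = createTransactionId(
  "statement.csv", 10, "2026-01-08", "AWS SERVICES CLOUD HOSTING JAN", -486.22,
);

// Same file and row index, different content: the ID changes.
const replacedRow = createTransactionId(
  "statement.csv", 10, "2026-01-08", "SANITIZED DEMO VENDOR", -486.22,
);

console.log(original === sameRowAgain); // true
console.log(original === replacedRow); // false
```

Because the file name and row index are still part of the hash input, renaming a file or inserting a row above an existing one also produces new IDs. The cost of that is a cache miss and a repeated lookup, not a wrong category.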


Cleaning Merchant Names

Bank descriptions are messy.

They often include reference numbers, processor names, card prefixes, locations, phone numbers, or truncated merchant text.

I started with simple normalization:

function normalizeMerchant(description: string): string {
  const normalizedDescription = description
    .toUpperCase()
    .replace(/\b(POS|ACH|DEBIT|CREDIT|PURCHASE|CARD|ONLINE|RECURRING)\b/g, "")
    .replace(/X{3,}\d+\b/g, "")
    .replace(/[^\w\s&.-]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
 
  const merchantAlias = merchantAliases.find(({ pattern }) =>
    pattern.test(normalizedDescription),
  );
 
  return merchantAlias?.name ?? normalizedDescription;
}

Then I added aliases for recurring vendors:

const merchantAliases = [
  { pattern: /\b(AMAZON WEB SERVICES|AWS SERVICES)\b/, name: "AWS" },
  { pattern: /^VERCEL\b/, name: "Vercel" },
  { pattern: /^NETLIFY\b/, name: "Netlify" },
  { pattern: /^DIGITALOCEAN\b/, name: "DigitalOcean" },
  { pattern: /^ACME ELECTRIC\b/, name: "Acme Electric" },
  { pattern: /^COMCAST BUSINESS\b/, name: "Comcast Business" },
  { pattern: /^VERIZON BUSINESS\b/, name: "Verizon Business" },
  { pattern: /^META ADS\b/, name: "Meta Ads" },
  { pattern: /^LINKEDIN ADS\b/, name: "LinkedIn Ads" },
];

This made the "Top Merchants" chart much more useful.

Instead of seeing twelve separate AWS-like descriptions, I see AWS aggregated as a single merchant.
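
The aggregation behind that chart can be sketched as a group-by on the cleaned merchant name. The `MerchantSummary` shape (name, total, count) and the `summarizeMerchants` helper are assumptions for illustration, since the article does not show the real ones.

```typescript
interface MerchantSpend {
  merchant: string;
  amount: number;
  type: "income" | "expense";
}

// Hypothetical summary shape.
interface MerchantSummary {
  name: string;
  total: number;
  count: number;
}

// Group expenses by merchant and return the largest totals first.
export function summarizeMerchants(
  transactions: MerchantSpend[],
  limit = 10,
): MerchantSummary[] {
  const summaries = new Map<string, MerchantSummary>();

  for (const transaction of transactions) {
    if (transaction.type !== "expense") {
      continue;
    }

    const current = summaries.get(transaction.merchant) ?? {
      name: transaction.merchant,
      total: 0,
      count: 0,
    };

    current.total += Math.abs(transaction.amount);
    current.count += 1;
    summaries.set(transaction.merchant, current);
  }

  return Array.from(summaries.values())
    .map((summary) => ({ ...summary, total: Math.round(summary.total * 100) / 100 }))
    .sort((left, right) => right.total - left.total)
    .slice(0, limit);
}
```

The grouping key is the normalized merchant, so the quality of this chart depends directly on the alias list.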


Local Rules Before AI

I wanted AI in the workflow, but I did not want to use it for everything.

Some transactions are obvious:

AWS SERVICES CLOUD HOSTING JAN -> Cloud Hosting
COMCAST BUSINESS INTERNET JAN -> Utilities
INVOICE PAYMENT NORTHSTAR DESIGN -> Income
TST*HARBOR CAFE CLIENT MEETING -> Meals
META ADS LEAD CAMPAIGN -> Advertising

So I added a rule-based categorizer first:

interface CategoryRule {
  category: string;
  terms: string[];
}
 
const categoryRules: CategoryRule[] = [
  {
    category: "Cloud Hosting",
    terms: ["AWS", "AMAZON WEB SERVICES", "VERCEL", "NETLIFY", "DIGITALOCEAN"],
  },
  {
    category: "Utilities",
    terms: ["BUSINESS UTILITIES", "COMCAST BUSINESS", "VERIZON BUSINESS", "INTERNET"],
  },
  {
    category: "Meals",
    terms: ["TST*", "CAFE", "COFFEE", "BISTRO", "GRILL", "TAVERN", "UBER EATS"],
  },
  {
    category: "Income",
    terms: ["INVOICE PAYMENT", "CLIENT PAYMENT", "DEPOSIT", "STRIPE"],
  },
];

Then the categorizer is just:

export function categorizeDescription(description: string): string {
  const normalizedDescription = description.toUpperCase();
  const matchingRule = categoryRules.find((rule) =>
    rule.terms.some((term) => normalizedDescription.includes(term)),
  );
 
  return matchingRule?.category ?? "Uncategorized";
}

This is not fancy, but it is valuable.

Rules are cheap, transparent, and fast. They reduce the number of transactions that need AI.
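
One behavior worth knowing: the categorizer uses find, so the first matching rule wins and the order of the list acts as a priority. The sketch below uses a trimmed copy of the rules in the same relative order, and the sample descriptions other than the Harbor Cafe one are invented.

```typescript
interface CategoryRule {
  category: string;
  terms: string[];
}

// Trimmed copy of the rule list, kept in the same relative order.
const categoryRules: CategoryRule[] = [
  { category: "Utilities", terms: ["COMCAST BUSINESS", "INTERNET"] },
  { category: "Meals", terms: ["TST*", "CAFE", "COFFEE"] },
];

export function categorizeDescription(description: string): string {
  const normalizedDescription = description.toUpperCase();
  const matchingRule = categoryRules.find((rule) =>
    rule.terms.some((term) => normalizedDescription.includes(term)),
  );

  return matchingRule?.category ?? "Uncategorized";
}

console.log(categorizeDescription("TST*HARBOR CAFE CLIENT MEETING")); // Meals
// "INTERNET" matches before "CAFE" because Utilities is listed first.
console.log(categorizeDescription("Internet Cafe Downtown")); // Utilities
console.log(categorizeDescription("HARDWARE STORE 1142")); // Uncategorized
```

Putting more specific rules earlier in the list is one way to keep broad terms like "INTERNET" from claiming too much.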


Where AI Fits

After local rules run, some transactions may still be uncategorized.

That is where the OpenAI API fits.

I built the AI step as an optional enrichment pass:

parse statements
-> apply local rules
-> send only remaining uncategorized items to OpenAI
-> cache AI category assignments
-> generate report

The AI script sends a small payload:

transactions.map(({ amount, description, id, merchant, type }) => ({
  id,
  merchant,
  description,
  amount,
  type,
}))

The model is instructed to choose from a fixed category list.

I used structured outputs so the response comes back as category assignments instead of free-form prose:

import { z } from "zod";
import { zodTextFormat } from "openai/helpers/zod";

const AiCategoryResult = z.object({
  categorizations: z.array(
    z.object({
      id: z.string(),
      category: z.enum(categorySchemaValues),
    }),
  ),
});

The call looks like this:

const response = await openai.responses.parse({
  model: process.env.OPENAI_MODEL || "gpt-4o-mini",
  input: [
    {
      role: "system",
      content: [
        "Categorize bank statement transactions for an expense analytics report.",
        `Choose exactly one category from: ${aiCategories.join(", ")}.`,
        "Do not judge tax deductibility or whether a purchase is business-related.",
        "Use Meals for restaurants, cafes, bars serving food or drink, coffee shops, bakeries, prepared-food delivery, quick-service restaurants, and clear dining merchant names.",
        "Use Groceries for grocery stores, supermarkets, and grocery delivery purchases when the text indicates groceries rather than prepared meals.",
        "Use Travel for rideshare trips, lodging, flights, tolls, parking, and transit; do not use Travel for Uber Eats.",
        "Return one categorization for each transaction id.",
      ].join(" "),
    },
    {
      role: "user",
      content: JSON.stringify(transactionsToCategorize),
    },
  ],
  text: {
    format: zodTextFormat(AiCategoryResult, "transaction_categories"),
  },
});

The interesting part is not just "add AI."

The interesting part is putting AI in the right place.

I did not want the model to own the entire pipeline. I wanted it to help with the ambiguous classification step, after deterministic rules had already handled the easy cases.
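
The "send only remaining uncategorized items" step can be sketched as a filter plus the field selection shown earlier. `buildAiPayload` and `chunk` are hypothetical helper names, and splitting the payload into fixed-size batches is an assumption about how requests might be grouped.

```typescript
type TransactionType = "income" | "expense";

interface Transaction {
  id: string;
  date: string;
  description: string;
  merchant: string;
  amount: number;
  type: TransactionType;
  category: string;
  sourceFile: string;
}

// Only these fields leave the machine; date, category, and sourceFile stay local.
type AiPayloadItem = Pick<Transaction, "id" | "merchant" | "description" | "amount" | "type">;

// Hypothetical helper: keep what the local rules could not categorize.
export function buildAiPayload(transactions: Transaction[]): AiPayloadItem[] {
  return transactions
    .filter((transaction) => transaction.category === "Uncategorized")
    .map(({ id, merchant, description, amount, type }) => ({
      id,
      merchant,
      description,
      amount,
      type,
    }));
}

// Hypothetical helper: split items into batches of at most `size`.
export function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer.");
  }

  const batches: T[][] = [];

  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }

  return batches;
}
```

Smaller batches mean more requests, but they also limit how much has to be retried when a response comes back incomplete.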


Caching AI Results

AI categorization is useful, but I do not want to pay for the same answer every time I regenerate the report.

So the script writes AI assignments to:

output/ai-category-cache.json

The cache is keyed by deterministic transaction ID.

That means repeated runs can reuse existing AI categories:

const pending = refreshCache
  ? candidates
  : candidates.filter((transaction) => !cache[transaction.id]);

If I change the prompt or category definitions, I can force a refresh:

npm run generate-report:ai:refresh

This gives me the best parts of AI without making every report generation a new API bill.
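
The two halves of the caching logic can be sketched as pure functions: one decides what still needs an API call, and the other applies cached answers. The cache shape (a plain object mapping transaction ID to category) and the names `selectPending` and `applyCachedCategories` are assumptions, not the project's actual code.

```typescript
interface CacheableTransaction {
  id: string;
  category: string;
}

// Assumed cache shape: transaction id -> category name.
type CategoryCache = Record<string, string>;

// Mirrors the filter above: with refresh on, every candidate is sent again.
export function selectPending<T extends CacheableTransaction>(
  candidates: T[],
  cache: CategoryCache,
  refreshCache: boolean,
): T[] {
  return refreshCache
    ? candidates
    : candidates.filter((transaction) => !cache[transaction.id]);
}

// Fill in cached categories without overriding what local rules already decided.
export function applyCachedCategories<T extends CacheableTransaction>(
  transactions: T[],
  cache: CategoryCache,
): T[] {
  return transactions.map((transaction) => {
    const cached = cache[transaction.id];

    return transaction.category === "Uncategorized" && cached
      ? { ...transaction, category: cached }
      : transaction;
  });
}
```

Keeping these free of file and network access makes them easy to test without an API key.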


Handling Imperfect AI Responses

One thing I ran into during testing: even with structured output, a batch can occasionally omit a transaction ID.

So I added a retry path for missing IDs:

const missingTransactions = mergeResponseIntoCache(transactions, response, cache);
 
if (missingTransactions.length === 0) {
  return;
}
 
if (attempt >= 3) {
  throw new AiCategorizationError(
    `OpenAI categorization omitted ${missingTransactions.length} transaction ids after ${attempt} attempts.`,
    "api_error",
  );
}
 
await categorizeAndCacheBatch(openai, missingTransactions, cache, attempt + 1);

That is the kind of boring reliability code that makes an AI feature feel more like software and less like a demo trick.
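
The merge step that feeds that retry is not shown above, so here is a hypothetical version. It takes the already-parsed categorizations rather than the raw SDK response, and it is named `mergeCategorizationsIntoCache` to make clear it is a sketch, not the article's mergeResponseIntoCache.

```typescript
interface Categorization {
  id: string;
  category: string;
}

interface PendingTransaction {
  id: string;
  description: string;
}

// Assumed cache shape: transaction id -> category name.
type CategoryCache = Record<string, string>;

// Writes valid answers into the cache and returns the transactions the model skipped.
export function mergeCategorizationsIntoCache(
  transactions: PendingTransaction[],
  categorizations: Categorization[],
  cache: CategoryCache,
): PendingTransaction[] {
  const requestedIds = new Set(transactions.map((transaction) => transaction.id));
  const answeredIds = new Set<string>();

  for (const { id, category } of categorizations) {
    // Ignore ids that were never part of this batch.
    if (requestedIds.has(id)) {
      cache[id] = category;
      answeredIds.add(id);
    }
  }

  return transactions.filter((transaction) => !answeredIds.has(transaction.id));
}
```

Tracking answered IDs separately from the cache matters during a forced refresh, when the cache may still hold an older entry for a transaction the model skipped this time.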


Building the Analytics Report

Once the transactions are categorized, the report generator builds summaries.

For example, expense totals use absolute values:

const expenses = roundCurrency(
  orderedTransactions
    .filter((transaction) => transaction.type === "expense")
    .reduce((total, transaction) => total + Math.abs(transaction.amount), 0),
);

Category summaries are grouped from expense transactions:

function summarizeCategories(
  transactions: Transaction[],
  totalExpenses: number,
): CategorySummary[] {
  const summaries = new Map<string, Omit<CategorySummary, "percentage">>();
 
  for (const transaction of transactions.filter((item) => item.type === "expense")) {
    const current = summaries.get(transaction.category) ?? {
      name: transaction.category,
      total: 0,
      count: 0,
    };
 
    current.total += Math.abs(transaction.amount);
    current.count += 1;
    summaries.set(transaction.category, current);
  }
 
  return [...summaries.values()]
    .map((summary) => ({
      ...summary,
      total: roundCurrency(summary.total),
      percentage: totalExpenses === 0 ? 0 : roundPercentage((summary.total / totalExpenses) * 100),
    }))
    .sort((left, right) => right.total - left.total);
}

The final report is written to two places:

output/report.json
dashboard/public/report.json

The second file is what the React dashboard reads.
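
The monthlyTrend field of the report can be built with the same grouping pattern used for categories. The sketch below assumes a `MonthlyTrendItem` shape of month, income, and expenses, and a `roundCurrency` that rounds to two decimals. Both are guesses at details the article does not show.

```typescript
interface TrendInput {
  date: string; // YYYY-MM-DD, as produced by the normalizer
  amount: number;
  type: "income" | "expense";
}

// Hypothetical shape for one point on the monthly chart.
interface MonthlyTrendItem {
  month: string; // YYYY-MM
  income: number;
  expenses: number;
}

// Assumed behavior: round to two decimal places.
const roundCurrency = (value: number): number => Math.round(value * 100) / 100;

export function summarizeMonthlyTrend(transactions: TrendInput[]): MonthlyTrendItem[] {
  const months = new Map<string, MonthlyTrendItem>();

  for (const transaction of transactions) {
    const month = transaction.date.slice(0, 7);
    const current = months.get(month) ?? { month, income: 0, expenses: 0 };

    if (transaction.type === "income") {
      current.income += transaction.amount;
    } else {
      current.expenses += Math.abs(transaction.amount);
    }

    months.set(month, current);
  }

  return Array.from(months.values())
    .map((item) => ({
      month: item.month,
      income: roundCurrency(item.income),
      expenses: roundCurrency(item.expenses),
    }))
    .sort((left, right) => left.month.localeCompare(right.month));
}
```

Because dates are already normalized to YYYY-MM-DD, slicing the first seven characters is enough to get a sortable month key.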


The React Dashboard

The dashboard is a Vite + React app with Recharts.

It fetches the generated JSON:

export async function loadReport(): Promise<Report> {
  const response = await fetch("/report.json", { cache: "no-store" });
 
  if (!response.ok) {
    throw new Error("Report file could not be loaded.");
  }
 
  const report: unknown = await response.json();
 
  if (!isReport(report)) {
    throw new Error("Report file does not match the expected shape.");
  }
 
  return report;
}

I added runtime validation because report.json is a generated file. If the shape changes or the file is missing, the dashboard should fail gracefully.
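
A type guard like isReport does not need a schema library. The sketch below is a hypothetical minimal version that checks only a few top-level fields; the real guard may validate more of the structure.

```typescript
// Subset of the report that this sketch validates.
interface ReportShape {
  generatedAt: string;
  totals: { income: number; expenses: number; net: number; transactionCount: number };
  categories: unknown[];
  transactions: unknown[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical guard: returns true only when the checked fields have the expected types.
export function isReport(value: unknown): value is ReportShape {
  if (!isRecord(value)) {
    return false;
  }

  const totals = value["totals"];

  if (!isRecord(totals)) {
    return false;
  }

  const totalKeys = ["income", "expenses", "net", "transactionCount"];

  return (
    typeof value["generatedAt"] === "string" &&
    totalKeys.every((key) => typeof totals[key] === "number") &&
    Array.isArray(value["categories"]) &&
    Array.isArray(value["transactions"])
  );
}
```

Checking the shape at the fetch boundary means a stale or partial report.json produces one clear error instead of a broken chart.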

The dashboard includes:

  • Summary cards
  • Expense category breakdown
  • Monthly income vs expenses chart
  • Top merchants chart
  • Transaction table
  • Uncategorized transaction section

I originally had a pie chart for expenses by category. It looked fine, but it was not very useful.

For this kind of dashboard, a business owner needs to inspect categories line by line.

So I replaced the pie chart with a breakdown list:

{report.categories.map((category) => (
  <article className="category-row" key={category.name}>
    <div className="category-line">
      <strong>{category.name}</strong>
      <span>{currency.format(category.total)}</span>
    </div>
    <div className="category-meta">
      <span>{category.count} transactions</span>
      <span>{category.percentage}%</span>
    </div>
    <span className="category-track" aria-hidden="true">
      <span style={{ width: `${category.percentage}%` }} />
    </span>
  </article>
))}

That small UI change made the dashboard feel much more useful.


Using Codex While Building

AI showed up in two different ways in this project.

The first was product-facing AI: the OpenAI API helps categorize ambiguous transactions.

The second was development-facing AI: I used Codex while building the proof of concept.

That changed the workflow.

Instead of treating the project like a long checklist, I could work iteratively:

  • Sketch the architecture
  • Generate the first parser
  • Inspect real statement edge cases
  • Adjust merchant normalization
  • Replace real data with synthetic demo data
  • Improve dashboard usability
  • Add AI categorization
  • Add caching and retry behavior
  • Add demo-friendly CLI logs
  • Rewrite the README into a runbook

The useful part was not just "AI wrote code."

The useful part was having a collaborator that could keep the entire project shape in mind while moving between backend scripts, React components, README docs, and demo polish.

That is especially helpful for proof-of-concept work, where the job is not only to make code run, but to make the idea understandable.


Making It Demo-Ready

One of my favorite small additions was the CLI logger.

When you run:

npm run generate-report:demo

the terminal shows progress like:

[info] Expense Analytics POC
[info] Local bank statement analytics
... Scanning local statement files...
[ok] Loaded 92 transactions.
... Analyzing 92 transactions...
[ok] Applied local rules to 92 transactions.
... Building JSON analytics...
[ok] Calculated 7 categories and 40 merchants.
... Writing dashboard report files...
[ok] Dashboard report files updated.
[info] Income: $88,500.00
[info] Expenses: $17,075.48
[info] Net: $71,424.52

That is not necessary for the core logic, but it matters for presenting the project.

A good POC is partly technical and partly narrative. The terminal output helps people understand what the system is doing as it runs.
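
A logger that produces lines in that style can be very small. This is a guess at the shape of logger.ts, with hypothetical names (`formatLogLine`, `logger`), and it omits extras such as colors or delays that a demo mode might add.

```typescript
type LogLevel = "info" | "ok" | "step";

// Formats a line in the style of the output above.
export function formatLogLine(level: LogLevel, message: string): string {
  return level === "step" ? `... ${message}` : `[${level}] ${message}`;
}

export const logger = {
  info: (message: string): void => console.log(formatLogLine("info", message)),
  ok: (message: string): void => console.log(formatLogLine("ok", message)),
  step: (message: string): void => console.log(formatLogLine("step", message)),
};

const currency = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });

logger.step("Scanning local statement files...");
logger.ok("Loaded 92 transactions.");
logger.info(`Income: ${currency.format(88500)}`); // [info] Income: $88,500.00
```

Keeping the formatting in a pure function makes the output easy to test without capturing console output.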


The Demo Dataset

I started by validating the parser against real statement exports, then replaced the data with synthetic demo transactions.

The current demo CSV represents a year of activity for a consulting business:

  • Invoice revenue
  • Cloud hosting
  • Utilities
  • Business internet
  • Business mobile
  • Client meals
  • Team meals
  • Ads
  • Software subscriptions
  • Office supplies

The generated report gives a clean demo story:

Income: $88,500.00
Expenses: $17,075.48
Net: $71,424.52

The top expense categories are:

  • Cloud Hosting
  • Utilities
  • Meals

That is realistic enough to feel like a real business, but safe to show publicly.


Commands

To install and run everything:

npm install
 
cd dashboard
npm install
cd ..
 
npm run generate-report
npm run dev

The dashboard usually runs at:

http://127.0.0.1:5173/

For AI-assisted categorization:

npm run generate-report:ai

For a screen-recording friendly run:

npm run generate-report:demo

What I Would Add Next

This is still a proof of concept.

The next things I would build are:

  • Manual category correction in the dashboard
  • Category rule export/import
  • Better bank-specific CSV importers
  • PDF statement parsing
  • Receipt matching
  • CSV export
  • QuickBooks export
  • Multi-business support
  • A hosted version with auth and secure file storage

The nice thing is that the architecture gives me a path to get there.

The script can become an API.

The JSON report can become database records.

The local dashboard can become a hosted dashboard.

The category cache can become a user-specific rules system.

The POC is small, but it is not a dead end.


Conclusion

This project started with a boring problem: reviewing expenses from bank statements.

The solution became a small but complete analytics pipeline:

statement files
-> parser
-> normalizer
-> rule categorizer
-> optional AI categorizer
-> JSON report
-> React dashboard

What I like about this build is that it keeps the first version honest.

It does not pretend to be a full accounting platform. It does not start with infrastructure. It starts with the workflow.

That is often the best way to build custom software for a real business pain point.

Prove the transformation first.

Then decide how big the product should become.