Building a Multi-Tenant AI Platform: How Weam Handles Isolation and Security

By Kshitij Varma - Developer Advocate 12 Min Read

When you’re building an AI platform that serves multiple companies, you can’t just throw everyone’s data into the same bucket and hope for the best. Company A shouldn’t see Company B’s conversations, documents, or custom agents. Ever.

This sounds obvious, but getting it right is tricky. You need to think about isolation at every layer: database queries, file storage, API access, real-time connections, and even vector embeddings. Miss one spot and you’ve got a security nightmare.

Let me explain how we built Weam, a multi-tenant AI platform, starting from the database and moving up through the application stack.

The Foundation for a Multi-Tenant AI Platform

Here’s the core principle: every piece of data in Weam belongs to a company. Not a user, not a workspace, but a company. This companyId becomes your primary isolation boundary.

When you sign up for Weam, we create a company for you. This identifier is tagged on every user, brain, document, agent, and chat message. It’s not optional, and it’s not nullable.

// MongoDB Schema Example
const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

const ChatMessageSchema = new Schema({
  content: { type: String, required: true },
  messageType: { type: String, enum: ['user', 'assistant', 'system'] },
  companyId: { type: ObjectId, required: true, index: true },
  createdBy: { type: ObjectId, required: true },
  session: { type: ObjectId, required: true },
  // ... other fields
});

// Critical: Index on companyId for query performance
ChatMessageSchema.index({ companyId: 1, session: 1 });

Notice the index on companyId. Every query that touches this collection will filter by company, so you want that lookup to be fast.

Session-Based Access Control

We use iron-session for managing user sessions. When you log in, your session stores your companyId along with your user ID and role. This session data becomes the source of truth for every API request.

Note

We use iron-session, a lightweight session management library for Next.js that stores encrypted session data in cookies. That makes it a good fit for server-side environments without an external session store.

// Session Structure
{
  _id: "user-id-here",
  email: "user@company.com",
  roleCode: "USER",
  companyId: "company-id-here"
}

Before any operation happens, we check the session. No session means no access. Wrong companyId means no access. It’s that simple.

Here’s what the middleware looks like:

// Middleware for Protected Routes
async function checkAccess(req, res, next) {
  const session = await req.session.get();
  if (!session || !session._id) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // Attach company context to request
  req.user = {
    userId: session._id,
    companyId: session.companyId,
    role: session.roleCode
  };

  next();
}

Every protected endpoint uses this middleware. No exceptions.
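The middleware above only checks that a user ID is present. A slightly stricter variant, sketched below, also rejects sessions that carry no companyId and blocks requests whose route names a different company. The `requireCompany` name and the `:companyId` route parameter are assumptions for illustration, not Weam's actual code.

```javascript
// Hypothetical Express-style middleware, meant to run after checkAccess.
// It rejects requests without a companyId and requests whose :companyId
// route parameter names another tenant.
function requireCompany(req, res, next) {
  const user = req.user;
  if (!user || !user.companyId) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  const requested = req.params && req.params.companyId;
  if (requested && String(requested) !== String(user.companyId)) {
    // Same 404 as a missing resource, so the response reveals nothing
    return res.status(404).json({ error: 'Resource not found' });
  }

  return next();
}
```

Chained after `checkAccess`, this keeps the "wrong companyId means no access" rule in one place instead of in every handler.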

Query-Level Isolation

The session gives you the companyId, but you still need to use it correctly in every query. This is where developers often mess up. They write a query that forgets to filter by company, and suddenly, there’s a data leak.

We enforce this pattern everywhere:

// WRONG - Missing company filter
const chats = await ChatMessage.find({ session: sessionId });

// RIGHT - Always filter by company
const chats = await ChatMessage.find({
  companyId: req.user.companyId,
  session: sessionId
});

For extra safety, we built repository classes that automatically inject the companyId:

class ChatRepository {
  constructor(companyId) {
    this.companyId = companyId;
  }

  async findMessages(sessionId) {
    return await ChatMessage.find({
      companyId: this.companyId,
      session: sessionId
    });
  }

  async createMessage(data) {
    return await ChatMessage.create({
      ...data,
      companyId: this.companyId
    });
  }
}

// Usage in route handler

const chatRepo = new ChatRepository(req.user.companyId);
const messages = await chatRepo.findMessages(sessionId);

Note

Repository classes are a clean architecture practice: they wrap direct database access and enforce consistent data rules, such as always including the companyId filter.

This pattern makes it harder to accidentally write an unscoped query.
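For queries that do not fit a dedicated repository method, the same idea can be reduced to a small helper. This is a minimal sketch; `scopedFilter` is a hypothetical name, not part of Weam's codebase.

```javascript
// Hypothetical helper: merge a caller's filter with the tenant scope.
// companyId is spread last, so a caller-supplied companyId cannot override it.
function scopedFilter(companyId, filter = {}) {
  if (!companyId) {
    throw new Error('companyId is required for every query');
  }
  return { ...filter, companyId };
}

// Usage inside a route handler (ChatMessage is the model from the schema above):
// const messages = await ChatMessage.find(
//   scopedFilter(req.user.companyId, { session: sessionId })
// );
```

Throwing on a missing companyId matters as much as the merge: an `undefined` value would otherwise be dropped or mishandled silently, which is exactly the kind of bug this pattern is meant to prevent.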

Vector Database Isolation

Documents in Weam get chunked and stored in Pinecone for semantic search. A vector index knows nothing about your tenants, so you have to scope every read and write yourself. Pinecone offers namespaces for partitioning an index, but we handle isolation with metadata filters.

When we store embeddings, we attach the companyId and agentId as metadata:

// Storing Vectors with Metadata

await pinecone.upsert({
  vectors: [{
    id: `chunk-${chunkId}`,
    values: embedding,
    metadata: {
      companyId: companyId,
      agentId: agentId,
      fileId: fileId,
      chunkIndex: index,
      content: chunkText
    }
  }]
});

When querying, we filter by this metadata:

// Querying with Company Isolation

const results = await pinecone.query({
  vector: queryEmbedding,
  topK: 5,
  filter: {
    companyId: { $eq: req.user.companyId },
    agentId: { $eq: agentId }
  }
});


Without that filter, you’d get results from other companies. That’s a major security problem.

Note

Vector databases like Pinecone store numerical embeddings of text for semantic search. Since the index does not know which company a vector belongs to, we rely on metadata filters to scope queries to each company.
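Because a forgotten filter is the whole risk here, it can help to build the filter in one place rather than inline at each call site. The sketch below assumes a hypothetical `tenantFilter` helper; it only constructs a plain object using Pinecone's `$and` and `$eq` metadata operators.

```javascript
// Hypothetical helper: build a metadata filter that always pins companyId.
function tenantFilter(companyId, extra = {}) {
  if (!companyId) {
    throw new Error('companyId is required for vector queries');
  }
  const clauses = [{ companyId: { $eq: String(companyId) } }];
  for (const [field, value] of Object.entries(extra)) {
    if (field === 'companyId') continue; // never let callers override the tenant
    clauses.push({ [field]: { $eq: value } });
  }
  return { $and: clauses };
}

// Usage with the query shown above:
// filter: tenantFilter(req.user.companyId, { agentId })
```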

File Storage and Access

We use either MinIO or S3 for file storage. Files get organized by company in the bucket structure:

bucket-name/
  company-abc123/
    files/
      document1.pdf
      document2.docx
  company-xyz789/
    files/
      report.pdf

When generating presigned URLs or serving files, we verify the requesting user's companyId matches the file's company:

async function getFileUrl(fileId, req) {
  const file = await File.findOne({
    _id: fileId,
    companyId: req.user.companyId  // Verify ownership
  });

  if (!file) {
    throw new Error('File not found');
  }

  // Generate presigned URL

  return await s3.getSignedUrl('getObject', {
    Bucket: process.env.AWS_S3_BUCKET,
    Key: file.s3Key,
    Expires: 3600
  });
}

No company check means no file access.
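The bucket layout above can also be enforced when keys are built, so a crafted file name cannot land in another company's folder. This is a sketch under assumptions: the helper names are hypothetical, and the `company-<id>/files/` prefix is taken from the layout shown earlier.

```javascript
// Hypothetical helpers for the company-<id>/files/ layout.
function companyPrefix(companyId) {
  // Allow only characters that cannot introduce extra path segments
  if (!/^[A-Za-z0-9_-]+$/.test(String(companyId || ''))) {
    throw new Error('Invalid companyId');
  }
  return `company-${companyId}/files/`;
}

function buildObjectKey(companyId, fileName) {
  // Keep only the final path segment so "../" cannot escape the prefix
  const baseName = String(fileName).split(/[\\/]/).pop();
  if (!baseName || baseName === '.' || baseName === '..') {
    throw new Error('Invalid file name');
  }
  return companyPrefix(companyId) + baseName;
}

function keyBelongsToCompany(key, companyId) {
  return String(key).startsWith(companyPrefix(companyId));
}
```

A check like `keyBelongsToCompany(file.s3Key, req.user.companyId)` before signing a URL gives a second line of defense if a database record ever points at the wrong key.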

Real-Time Isolation with Socket.IO

Chat responses stream over WebSockets using Socket.IO. When a user connects, we authenticate their socket connection and store the companyId in the socket’s metadata:

io.use(async (socket, next) => {
  const session = await getSession(socket.request);
  if (!session || !session.companyId) {
    return next(new Error('Authentication failed'));
  }

  socket.companyId = session.companyId;
  socket.userId = session._id;
  next();

});

When emitting events, we can filter by company:

// Emit to all sockets in a company

io.to(`company-${companyId}`).emit('notification', data);

// Or just to a specific user in that company

io.to(`user-${userId}`).emit('message', data);

This prevents cross-company event leakage in real-time communication.

Note

Each authenticated socket automatically joins a company-{id} room after validation, so messages stay scoped to that company.
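The room join itself is not shown in the snippets above. A minimal sketch of that step could look like the following; `joinTenantRooms` is a hypothetical name, and the room names are chosen to match the `io.to(...)` calls in the emit example.

```javascript
// Hypothetical helper: put an authenticated socket into its company and
// user rooms. Relies on companyId and userId set by the io.use() middleware.
function joinTenantRooms(socket) {
  if (!socket.companyId || !socket.userId) {
    throw new Error('Socket is not authenticated');
  }
  const rooms = [`company-${socket.companyId}`, `user-${socket.userId}`];
  for (const room of rooms) {
    socket.join(room);
  }
  return rooms;
}

// Usage:
// io.on('connection', (socket) => {
//   joinTenantRooms(socket);
// });
```

Room names are derived from the server-side session, never from anything the client sends, which is what keeps a client from subscribing to another company's room.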

Role-Based Access Within Companies

Multi-tenancy handles company isolation, but you also need role-based access control within each company. Weam has three roles: User, Manager, and Admin.

The check-access endpoint validates role permissions and returns the user’s companyId so callers can scope whatever happens next:

async function checkAccess(userId, resourceType, requiredRole) {
  const user = await User.findById(userId);
  if (!user) {
    return { allowed: false, reason: 'User not found' };
  }

  // Check if user has required role
  const hasPermission = hasRequiredPermission(user.role, requiredRole);
  return {
    allowed: hasPermission,
    companyId: user.companyId,
    role: user.role
  };
}
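The `hasRequiredPermission` helper is not shown in the article. One plausible shape, assuming the three roles form a simple hierarchy (Admin above Manager above User), is sketched below. Only the `USER` code appears in the session example; `MANAGER` and `ADMIN` are assumed names.

```javascript
// Hypothetical role ranking. A Map avoids accidental matches on
// inherited object keys such as "constructor".
const ROLE_RANK = new Map([
  ['USER', 1],
  ['MANAGER', 2],
  ['ADMIN', 3],
]);

function hasRequiredPermission(userRole, requiredRole) {
  const actual = ROLE_RANK.get(userRole);
  const required = ROLE_RANK.get(requiredRole);
  // Unknown roles on either side are denied rather than guessed
  if (actual === undefined || required === undefined) {
    return false;
  }
  return actual >= required;
}
```

Denying unknown roles by default means a typo in a role code fails closed instead of granting access.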

Testing Multi-Tenancy

You can’t just assume your isolation works. You need to test it. Here’s what we test:

  1. Cross-company data access attempts: Try to query data with a different companyId
  2. Missing company filters: Deliberately remove filters and verify queries fail
  3. Session hijacking: Attempt to modify session data to access other companies
  4. Vector search leakage: Query vectors without metadata filters

// Example test case
describe('Multi-tenant isolation', () => {
  it('should not return data from other companies', async () => {
    const company1 = await createCompany();
    const company2 = await createCompany();
    const user1 = await createUser({ companyId: company1._id });
    const user2 = await createUser({ companyId: company2._id });

    const chat1 = await createChat({
      companyId: company1._id,
      createdBy: user1._id
    });

    // User2 should not see Company1's chat
    const result = await ChatMessage.find({
      companyId: company2._id  // User2's company
    });

    // Compare by ID; deep equality on Mongoose documents is unreliable
    const ids = result.map((message) => String(message._id));
    expect(ids).not.toContain(String(chat1._id));
  });
});
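Item 2 in the list above only works if an unscoped query fails loudly instead of quietly returning everything. One way to get that behavior is a Mongoose plugin that rejects any query whose filter has no companyId. The plugin name and the list of guarded operations are assumptions, not Weam's code; `schema.pre()` and `Query.prototype.getFilter()` are standard Mongoose APIs.

```javascript
// Hypothetical Mongoose plugin: fail any guarded query that is not tenant-scoped.
const GUARDED_OPS = ['find', 'findOne', 'countDocuments', 'updateMany', 'deleteMany'];

function requireCompanyScope(schema) {
  for (const op of GUARDED_OPS) {
    // Query middleware: `this` is the Query being executed
    schema.pre(op, function () {
      const filter = this.getFilter();
      if (filter.companyId == null) {
        throw new Error(`Unscoped ${op} blocked: companyId filter is missing`);
      }
    });
  }
}

// Usage (ChatMessageSchema is the schema defined earlier):
// ChatMessageSchema.plugin(requireCompanyScope);
```

One consequence to be aware of: `findById` runs as a `findOne` on `_id`, so it would be blocked on any schema that uses this plugin. That is usually the desired behavior for tenant-owned collections.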

Common Pitfalls

Here are the mistakes we’ve seen (and fixed):

Forgetting to filter aggregation pipelines: Aggregations need the companyId filter in the first stage:

// WRONG
const stats = await Message.aggregate([
  { $group: { _id: '$session', count: { $sum: 1 } } }
]);

// RIGHT
const stats = await Message.aggregate([
  { $match: { companyId: new ObjectId(companyId) } },
  { $group: { _id: '$session', count: { $sum: 1 } } }
]);
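To make the "first stage" rule hard to forget, the `$match` can be prepended by a helper instead of being typed into every pipeline. `scopedPipeline` below is a hypothetical name used for illustration.

```javascript
// Hypothetical helper: prepend the tenant $match so it is always stage one.
function scopedPipeline(companyId, stages = []) {
  if (!companyId) {
    throw new Error('companyId is required for aggregations');
  }
  return [{ $match: { companyId } }, ...stages];
}

// Usage (pass an ObjectId, as in the RIGHT example above):
// const stats = await Message.aggregate(
//   scopedPipeline(new ObjectId(companyId), [
//     { $group: { _id: '$session', count: { $sum: 1 } } }
//   ])
// );
```

The explicit `new ObjectId(...)` matters because Mongoose does not cast values inside aggregation pipelines the way it does for `find()` filters, so a string companyId would match nothing.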

Using user-provided IDs without validation: Never trust user input for cross-references:

// Validate that the resource belongs to the user's company

const brain = await Brain.findOne({
  _id: req.body.brainId,
  companyId: req.user.companyId
});

if (!brain) {
  return res.status(403).json({ error: 'Access denied' });
}

Leaking data in error messages: Don’t reveal whether resources exist in other companies:

// BAD - Reveals that the resource exists
if (!resource) {
  return res.status(404).json({ error: 'Resource not found' });
}

// Compare as strings: companyId is an ObjectId, the session value is a string
if (String(resource.companyId) !== String(req.user.companyId)) {
  return res.status(403).json({ error: 'Access denied' });
}

// GOOD - Same response for both cases
if (!resource || String(resource.companyId) !== String(req.user.companyId)) {
  return res.status(404).json({ error: 'Resource not found' });
}

Monitoring and Auditing

We log all access attempts with company context:

logger.info('Data access', {
  userId: req.user.userId,
  companyId: req.user.companyId,
  resource: 'chat-messages',
  action: 'read',
  timestamp: new Date()
});

This audit trail helps catch isolation bugs in production and provides compliance documentation.

The Bottom Line

Multi-tenancy isn’t something you bolt on later. It needs to be in your data model from day one. Every query, every file access, every WebSocket message needs company scoping.

The good news is that once you get the patterns right, they become second nature. Company-scoped repositories, metadata filtering in vector stores, and session-based access control give you a strong foundation.

Just remember: every new feature needs to answer the question “how does this respect company boundaries?” If you can’t answer that, you’re not ready to ship it.

Frequently Asked Questions

1. What does “multi-tenancy” mean in an AI platform?

Multi-tenancy is a software architecture where a single application serves multiple customers, called tenants, from shared infrastructure. Each tenant’s data, users, and configurations are logically separated, even though they’re using the same underlying codebase and servers.

In SaaS or AI platforms like Weam, this architecture allows efficient scaling and centralized updates, but it introduces a major security responsibility: isolation.

2. Why is isolation important in multi-tenant AI systems?

If company data isn’t properly isolated in a multi-tenant system, it can lead to:

  • Compliance violations – Breaches of GDPR or SOC 2 requirements.
  • Data leaks – One company’s users see another’s data.
  • Unauthorized access – Sessions or APIs allow cross-tenant access.
  • File exposure – Incorrect file validation reveals private documents.
  • Vector leaks – AI embeddings mix data between companies.

3. How do you enforce isolation at the database layer?

Every document carries a required, indexed companyId, every query filters on it, and repository classes inject it automatically so an unscoped query is harder to write.

4. How does Weam implement company-level data isolation?

Weam tags every entity (users, agents, chats, documents) with a companyId.

5. How is user access controlled per company?

Through session-based authentication (iron-session), middleware checks, and request validation.

About Weam

Weam helps digital agencies adopt their favorite Large Language Models with a simple plug-and-play approach, so every team in your agency can leverage AI, save billable hours, and contribute to growth.

You can bring your favorite AI models like ChatGPT (OpenAI) in Weam using simple API keys. Now, every team in your organization can start using AI, and leaders can track adoption rates in minutes.

We are onboarding early adopters for Weam. If you’re interested, opt in through our signup.

Kshitij Varma is a Developer Advocate and AI enthusiast at Weam.AI, passionate about building smarter workflows that empower developers and organizations. With hands-on experience in AI, automation, and full-stack development, he bridges the gap between technology and practical solutions. Through his work and writing, he explores how AI can augment human creativity and productivity, helping teams achieve more while driving innovation.