When you’re building an AI platform that serves multiple companies, you can’t just throw everyone’s data into the same bucket and hope for the best. Company A shouldn’t see Company B’s conversations, documents, or custom agents. Ever.
This sounds obvious, but getting it right is tricky. You need to think about isolation at every layer: database queries, file storage, API access, real-time connections, and even vector embeddings. Miss one spot and you’ve got a security nightmare.
Let me explain how we built Weam, our multi-tenant AI platform, starting from the database and moving up through the application stack.
The Foundation for a Multi-Tenant AI Platform
Here’s the core principle: every piece of data in Weam belongs to a company. Not a user, not a workspace, but a company. This companyId becomes your primary isolation boundary.
When you sign up for Weam, we create a company for you. This identifier is tagged on every user, brain, document, agent, and chat message. It’s not optional, and it’s not nullable.
// MongoDB Schema Example (Mongoose)
const { Schema } = require('mongoose');
const { ObjectId } = Schema.Types;
const ChatMessageSchema = new Schema({
  content: { type: String, required: true },
  messageType: { type: String, enum: ['user', 'assistant', 'system'] },
  companyId: { type: ObjectId, required: true, index: true },
  createdBy: { type: ObjectId, required: true },
  session: { type: ObjectId, required: true },
  // ... other fields
});
// Critical: compound index so company-scoped session lookups stay fast
ChatMessageSchema.index({ companyId: 1, session: 1 });
Notice the index on companyId. Every query that touches this collection will filter by company, so you want that lookup to be fast.
Session-Based Access Control
We use iron-session for managing user sessions. When you log in, your session stores your companyId along with your user ID and role. This session data becomes the source of truth for every API request.
Note
iron-session is a lightweight session management library for Next.js that stores encrypted session data in cookies. That makes it a good fit for server-side environments without an external session store.
// Session Structure
{
  _id: "user-id-here",
  email: "user@company.com",
  roleCode: "USER",
  companyId: "company-id-here"
}
Before any operation happens, we check the session. No session means no access. Wrong companyId means no access. It’s that simple.
Here’s what the middleware looks like:
// Middleware for Protected Routes
async function checkAccess(req, res, next) {
  const session = await req.session.get();
  // Reject requests with no session, no user, or no company context
  if (!session || !session._id || !session.companyId) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  // Attach company context to request
  req.user = {
    userId: session._id,
    companyId: session.companyId,
    role: session.roleCode
  };
  next();
}
Every protected endpoint uses this middleware. No exceptions.
Query-Level Isolation
The session gives you the companyId, but you still need to use it correctly in every query. This is where developers often mess up. They write a query that forgets to filter by company, and suddenly, there’s a data leak.
We enforce this pattern everywhere:
// WRONG - Missing company filter
const chats = await ChatMessage.find({ session: sessionId });
// RIGHT - Always filter by company
const chats = await ChatMessage.find({
  companyId: req.user.companyId,
  session: sessionId
});
For extra safety, we built repository classes that automatically inject the companyId:
class ChatRepository {
  constructor(companyId) {
    this.companyId = companyId;
  }
  async findMessages(sessionId) {
    return await ChatMessage.find({
      companyId: this.companyId,
      session: sessionId
    });
  }
  async createMessage(data) {
    return await ChatMessage.create({
      ...data,
      companyId: this.companyId
    });
  }
}
// Usage in route handler
const chatRepo = new ChatRepository(req.user.companyId);
const messages = await chatRepo.findMessages(sessionId);
Note
Repository classes are a clean architecture practice that wrap direct database access, enforcing consistent data rules like always including companyId filters.
This pattern makes it harder to accidentally write an unscoped query.
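The same guarantee can be pushed into a small helper that every repository method calls. The following is a minimal sketch, not Weam’s actual code: `scopedFilter` is a hypothetical name, and the sketch assumes the trusted companyId always comes from the session rather than from request input.

```javascript
// Hypothetical helper (not from Weam's codebase): builds a query filter
// that is always scoped to exactly one company.
function scopedFilter(companyId, filter = {}) {
  if (!companyId) {
    // Fail loudly instead of silently running an unscoped query.
    throw new Error('companyId is required for every query');
  }
  // Spread the caller's filter first so a caller-supplied companyId
  // can never override the trusted value from the session.
  return { ...filter, companyId };
}

// Example: a companyId smuggled in through user input is overwritten.
const exampleFilter = scopedFilter('company-abc123', {
  session: 'session-1',
  companyId: 'company-xyz789',
});
console.log(exampleFilter.companyId); // prints company-abc123
```

Throwing on a missing companyId is a deliberate choice in this sketch: a crash in development is easier to notice than a query that quietly returns every company’s rows.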
Vector Database Isolation
Documents in Weam get chunked and stored in Pinecone for semantic search. But vector databases don’t have built-in multi-tenancy. You need to handle it yourself using metadata filters.
When we store embeddings, we attach the companyId and agentId as metadata:
// Storing Vectors with Metadata
await pinecone.upsert({
  vectors: [{
    id: `chunk-${chunkId}`,
    values: embedding,
    metadata: {
      companyId: companyId,
      agentId: agentId,
      fileId: fileId,
      chunkIndex: index,
      content: chunkText
    }
  }]
});
When querying, we filter by this metadata:
// Querying with Company Isolation
const results = await pinecone.query({
  vector: queryEmbedding,
  topK: 5,
  filter: {
    companyId: { $eq: req.user.companyId },
    agentId: { $eq: agentId }
  }
});
Without that filter, you’d get results from other companies. That’s a major security problem.
Note
Vector databases like Pinecone store numerical embeddings of text for semantic search. Since they don’t support multi-tenancy natively, we rely on metadata filters to scope queries to each company.
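Because a single forgotten filter leaks data, it can be worth re-checking the results before any chunk text reaches the model. The sketch below is an assumption-laden illustration rather than Weam’s implementation: `keepOwnMatches` is a hypothetical helper, and it assumes the query response exposes a `matches` array whose items carry the stored `metadata` (which requires asking for metadata in the query).

```javascript
// Hypothetical defense-in-depth check (not from Weam's codebase):
// drop any match whose metadata does not belong to the caller's company.
function keepOwnMatches(matches, companyId) {
  if (!companyId) {
    // Without this guard, undefined would "match" chunks with no companyId.
    throw new Error('companyId is required');
  }
  return matches.filter(
    (match) => match.metadata && match.metadata.companyId === companyId
  );
}

// Example with hand-written sample data shaped like a query response.
const sampleMatches = [
  { id: 'chunk-1', score: 0.91, metadata: { companyId: 'abc123' } },
  { id: 'chunk-2', score: 0.88, metadata: { companyId: 'xyz789' } },
  { id: 'chunk-3', score: 0.8 }, // no metadata: dropped
];
const safeMatches = keepOwnMatches(sampleMatches, 'abc123');
console.log(safeMatches.map((match) => match.id)); // prints [ 'chunk-1' ]
```

This does not replace the metadata filter in the query itself. It only limits the damage if a future code path forgets it.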
File Storage and Access
We use either MinIO or S3 for file storage. Files get organized by company in the bucket structure:
bucket-name/
  company-abc123/
    files/
      document1.pdf
      document2.docx
  company-xyz789/
    files/
      report.pdf
When generating presigned URLs or serving files, we verify the requesting user's companyId matches the file's company:
async function getFileUrl(fileId, req) {
  const file = await File.findOne({
    _id: fileId,
    companyId: req.user.companyId // Verify ownership
  });
  if (!file) {
    throw new Error('File not found');
  }
  // Generate presigned URL
  return await s3.getSignedUrl('getObject', {
    Bucket: process.env.AWS_S3_BUCKET,
    Key: file.s3Key,
    Expires: 3600
  });
}
No company check means no file access.
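The bucket layout above also allows a second check on the object key itself. Here is a minimal sketch, assuming keys follow the `company-<id>/files/<name>` pattern shown in the tree; `buildObjectKey` and `keyBelongsToCompany` are hypothetical helper names, not functions from Weam.

```javascript
// Hypothetical helpers (not from Weam's codebase) for the key layout
// company-<companyId>/files/<fileName>.
function buildObjectKey(companyId, fileName) {
  const unsafeName =
    !fileName ||
    fileName === '.' ||
    fileName === '..' ||
    fileName.includes('/') ||
    fileName.includes('\\');
  if (!companyId || unsafeName) {
    // Reject names that could point outside the company's prefix.
    throw new Error('Invalid companyId or file name');
  }
  return `company-${companyId}/files/${fileName}`;
}

function keyBelongsToCompany(key, companyId) {
  // The trailing slash matters: without it, company "abc" would match
  // keys that belong to company "abc123".
  return Boolean(companyId) && key.startsWith(`company-${companyId}/`);
}

const exampleKey = buildObjectKey('abc123', 'document1.pdf');
console.log(exampleKey); // prints company-abc123/files/document1.pdf
console.log(keyBelongsToCompany(exampleKey, 'abc')); // prints false
```

A check like `keyBelongsToCompany(file.s3Key, req.user.companyId)` could run just before the presigned URL is generated, so a mislabelled database record alone is not enough to expose another company’s file.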
Real-Time Isolation with Socket.IO
Chat responses stream over WebSockets using Socket.IO. When a user connects, we authenticate their socket connection and store the companyId in the socket’s metadata:
io.use(async (socket, next) => {
  const session = await getSession(socket.request);
  if (!session || !session.companyId) {
    return next(new Error('Authentication failed'));
  }
  socket.companyId = session.companyId;
  socket.userId = session._id;
  next();
});
When emitting events, we can filter by company:
// Emit to all sockets in a company
io.to(`company-${companyId}`).emit('notification', data);
// Or just to a specific user in that company
io.to(`user-${userId}`).emit('message', data);
This prevents cross-company event leakage in real-time communication.
Note
Each authenticated socket automatically joins a company-{id} room after validation, so messages stay scoped to that company.
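Socket.IO does not put a socket into a room on its own, so the join has to happen in a connection handler. The sketch below shows one way that step might look. `joinTenantRooms` is a hypothetical helper, and it assumes the authentication middleware has already set `socket.companyId` and `socket.userId`.

```javascript
// Hypothetical helper (not from Weam's codebase): joins the rooms that
// the emit examples target, using only values set during authentication.
function joinTenantRooms(socket) {
  if (!socket.companyId || !socket.userId) {
    throw new Error('Socket is not authenticated');
  }
  const rooms = [`company-${socket.companyId}`, `user-${socket.userId}`];
  // socket.join() is the standard Socket.IO call for entering a room.
  rooms.forEach((room) => socket.join(room));
  return rooms;
}

// Wiring, assuming the `io` server from the middleware example:
// io.on('connection', (socket) => joinTenantRooms(socket));
```

Room names are derived only from server-side session values. A client should never be able to pass a room name and have the server join it.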
Role-Based Access Within Companies
Multi-tenancy handles company isolation, but you also need role-based access control within each company. Weam has three roles: User, Manager, and Admin.
The check-access endpoint validates both company membership and role permissions:
async function checkAccess(userId, resourceType, requiredRole) {
  const user = await User.findById(userId);
  if (!user) {
    return { allowed: false, reason: 'User not found' };
  }
  // Check if user has required role
  const hasPermission = hasRequiredPermission(user.role, requiredRole);
  return {
    allowed: hasPermission,
    companyId: user.companyId,
    role: user.role
  };
}
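The `hasRequiredPermission` helper is not shown above. One plausible shape is a simple rank comparison, sketched here under two assumptions that the article does not confirm: that the role codes are `USER`, `MANAGER`, and `ADMIN`, and that each role includes the permissions of the ones below it.

```javascript
// Assumed role codes and ordering (illustrative, not confirmed by Weam).
const ROLE_RANK = new Map([
  ['USER', 1],
  ['MANAGER', 2],
  ['ADMIN', 3],
]);

// Returns true when userRole is at least as privileged as requiredRole.
function hasRequiredPermission(userRole, requiredRole) {
  const userRank = ROLE_RANK.get(userRole);
  const requiredRank = ROLE_RANK.get(requiredRole);
  // Unknown roles on either side are denied rather than guessed at.
  if (userRank === undefined || requiredRank === undefined) {
    return false;
  }
  return userRank >= requiredRank;
}

console.log(hasRequiredPermission('MANAGER', 'USER')); // prints true
console.log(hasRequiredPermission('USER', 'ADMIN')); // prints false
```

Denying unknown roles by default means a typo in a role code locks a feature instead of opening it.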
Testing Multi-Tenancy
You can’t just assume your isolation works. You need to test it. Here’s what we test:
- Cross-company data access attempts: Try to query data with a different companyId
- Missing company filters: Deliberately remove filters and verify queries fail
- Session hijacking: Attempt to modify session data to access other companies
- Vector search leakage: Query vectors without metadata filters
// Example test case
describe('Multi-tenant isolation', () => {
  it('should not return data from other companies', async () => {
    const company1 = await createCompany();
    const company2 = await createCompany();
    const user1 = await createUser({ companyId: company1._id });
    const user2 = await createUser({ companyId: company2._id });
    const chat1 = await createChat({
      companyId: company1._id,
      createdBy: user1._id
    });
    // User2 should not see Company1's chat
    const result = await ChatMessage.find({
      companyId: company2._id // User2's company
    });
    // Compare by ID so the assertion does not depend on document equality
    const ids = result.map((message) => String(message._id));
    expect(ids).not.toContain(String(chat1._id));
  });
});
Common Pitfalls
Here are the mistakes we’ve seen (and fixed):
Forgetting to filter aggregation pipelines: Aggregations need the companyId filter in the first stage:
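// WRONG
const stats = await Message.aggregate([
  { $group: { _id: '$session', count: { $sum: 1 } } }
]);
// RIGHT
// aggregate() does not cast pipeline values, so convert the ID explicitly
// (ObjectId here is the constructor from mongoose.Types)
const stats = await Message.aggregate([
  { $match: { companyId: new ObjectId(companyId) } },
  { $group: { _id: '$session', count: { $sum: 1 } } }
]);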
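To make the first-stage filter hard to forget, the pipeline can be built through a wrapper. This is a minimal sketch with a hypothetical `scopedPipeline` helper. It assumes the caller passes a companyId that is already in the type stored in the collection, since aggregation stages are not cast for you.

```javascript
// Hypothetical helper (not from Weam's codebase): prepends the company
// filter so it is always the first stage of the pipeline.
function scopedPipeline(companyId, stages = []) {
  if (!companyId) {
    throw new Error('companyId is required for every aggregation');
  }
  // companyId must already match the stored type (for example an ObjectId).
  return [{ $match: { companyId } }, ...stages];
}

// Example: the grouping stage from above, now company-scoped.
const examplePipeline = scopedPipeline('company-abc123', [
  { $group: { _id: '$session', count: { $sum: 1 } } },
]);
console.log(Object.keys(examplePipeline[0])); // prints [ '$match' ]
```

Putting `$match` first also lets the database use the companyId index before any grouping work starts.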
Using user-provided IDs without validation: Never trust user input for cross-references:
// Validate that the resource belongs to the user's company
const brain = await Brain.findOne({
  _id: req.body.brainId,
  companyId: req.user.companyId
});
if (!brain) {
  return res.status(403).json({ error: 'Access denied' });
}
Leaking data in error messages: Don’t reveal whether resources exist in other companies:
// BAD - Reveals that the resource exists
if (!resource) {
  return res.status(404).json({ error: 'Resource not found' });
}
if (String(resource.companyId) !== String(req.user.companyId)) {
  return res.status(403).json({ error: 'Access denied' });
}
// GOOD - Same response for both cases
// (compare as strings: an ObjectId never strictly equals a session string)
if (!resource || String(resource.companyId) !== String(req.user.companyId)) {
  return res.status(404).json({ error: 'Resource not found' });
}
Monitoring and Auditing
We log all access attempts with company context:
logger.info('Data access', {
  userId: req.user.userId,
  companyId: req.user.companyId,
  resource: 'chat-messages',
  action: 'read',
  timestamp: new Date()
});
This audit trail helps catch isolation bugs in production and provides compliance documentation.
The Bottom Line
Multi-tenancy isn’t something you bolt on later. It needs to be in your data model from day one. Every query, every file access, every WebSocket message needs company scoping.
The good news is that once you get the patterns right, they become second nature. Company-scoped repositories, metadata filtering in vector stores, and session-based access control give you a strong foundation.
Just remember: every new feature needs to answer the question “how does this respect company boundaries?” If you can’t answer that, you’re not ready to ship it.
Frequently Asked Questions
1. What does “multi-tenancy” mean in an AI platform?
Multi-tenancy is a software architecture where a single application serves multiple customers, called tenants, from a shared infrastructure. Each tenant’s data, users, and configurations are logically separated, even though they’re using the same underlying codebase and servers.
In SaaS or AI platforms like Weam, this architecture allows efficient scaling and centralized updates, but it introduces a major security responsibility: isolation.
2. Why is isolation important in multi-tenant AI systems?
If company data isn’t properly isolated in a multi-tenant system, it can lead to:
- Compliance violations: Breaches of GDPR or SOC 2 requirements.
- Data leaks: One company’s users see another’s data.
- Unauthorized access: Sessions or APIs allow cross-tenant access.
- File exposure: Incorrect file validation reveals private documents.
- Vector leaks: AI embeddings mix data between companies.
3. How do you enforce isolation at the database layer?
Every record carries a required, indexed companyId, and every query and aggregation filters on it. Company-scoped repository classes inject that filter automatically.
4. How does Weam implement company-level data isolation?
Weam tags every entity (users, agents, chats, documents) with a companyId.
5. How is user access controlled per company?
Through session-based authentication (iron-session), middleware checks, and request validation.
