Microsoft Teams Clone is a WebRTC-based project for free, high-quality video conversations with friends and colleagues.
Check out the app here
- Clone the repository
- Run the following commands in the command prompt:

```
npm install
npm install --prefix client
npm run dev
```

- The app will now run on `localhost:3000`.

Note: You need to set up your own OAuth credentials to log in, else the app won't work locally!
- Video Meetings
- Built-in Chat (before, during and after meetings)
- Waiting Rooms
- User Authentication
- Fun Face Masks
- Screen Share
- Responsive design for mobile
- Video Meetings
- Chats
- Screen Share
- User Authentication
- Fun Face Masks
Problems faced (with solutions)
- High Level Design
- Video Meetings
- Chats
- Screen Share
- User Authentication
- Fun Face Masks
- Low Level Design
- Video Meetings
- Chats
- Screen Share
- User Authentication
- Fun Face Masks
- ReactJs
- NodeJs
- Socket.io
- WebRTC (PeerJS)
- PassportJS
- MongoDB
- ReactJS is used to build the entire front end, since it allows me to use hooks and other modern features that make the website feel smooth, with no reloading whatsoever like in a traditional website.
- NodeJS is used for the server; it handles GET/POST requests through Express and also handles socket events.
- Socket.io is used for back-and-forth communication between the server and the client. It handles all the events like mute, cam off, etc. and informs other peers in the meet to adjust the UI accordingly, thus keeping all the users in sync. Since it is event driven, we have much more control, and all we need to do to communicate is emit and listen for an event.
- WebRTC is used to transfer video streams on a peer-to-peer basis. It is one of the best ways to stream video using only the resources of the browser, without relying on any external service, and the quality is simply awesome.
- PassportJS is a middleware used to implement user authentication in NodeJS. Since it is open source, I preferred it over any BaaS.
- MongoDB provides a NoSQL database, where we store all the users' information, along with the teams they are a part of and the chat messages for all of those teams.
- Supports up to 5 people.
- Compatible on mobile as well as the web.
- Options to turn the cam on/off, mute, etc. available.
- The chat is made so that you can chat before, during and even after the meeting.
- All your messages are stored in the database.
- People in the meeting can also chat, through the meeting chat, with team members who are not currently in the meeting, since the two chats are interlinked.
- Links can also be shared easily and become clickable.
- Push notifications are sent to a user who receives a message.
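Making shared links clickable can be done with a small regex-based helper. This is a hedged sketch of the idea (the app's actual implementation may use a library or a stricter pattern):

```js
// Wrap anything that looks like an http(s) URL in an anchor tag so it
// renders as a clickable link in the chat box. A deliberately simple
// pattern; real-world URL detection has more edge cases.
function linkify(text) {
  const urlPattern = /(https?:\/\/[^\s]+)/g;
  return text.replace(urlPattern, (url) => `<a href="${url}" target="_blank">${url}</a>`);
}
```

For example, `linkify("join at https://example.com now")` wraps just the URL portion in an anchor and leaves the surrounding text untouched.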
- A single user can share their screen at a time.
- The video of the other users is also shown simultaneously.
- No hassle of filling forms.
- Log in with your favourite social media.
- Supported: Google, Facebook, GitHub
- Virtual face masks and stickers for your face.
- Choose from a large collection.
- The host of the meeting has full control of the meeting.
- Whenever someone tries to join, the host can accept them or deny them entry.
- The host can also put them in the waiting room and admit them at a later point.
- In the meantime, the person in the waiting room can choose the status of their audio/video before joining the meeting.
Face Masks
- The face detection algorithm (by TensorFlow) is made for static images. How to make use of it in a video setting, and also make it work for the streams of remote peers?
- To implement the algorithm on video, I ran it in a loop at an interval of 100 ms, which captures frames of the video as images fast enough to work.
- To run it on remote peers, instead of sending the masked stream, I ran the algorithm locally on the other system against that person's video.
- This also saved a lot of bandwidth and made the process simpler and faster overall. Now the starting/stopping of a sticker is controlled by the server for others' video streams, and by you for your own!
Chats
- How to store the chats and allow users to communicate before and after the meetings as well?
- I used MongoDB to store the messages and implemented the concept of teams, where you can invite people to your team to chat with them before the meeting. I also linked the existing meeting chat to this new team chat, and loaded all the messages whenever the page rendered, so that users can chat easily before and after meetings.
Prevent Unauthorised Access
- Since people would be sharing links, I wanted to make sure that no one is allowed to join or use the app without signing in, but I didn't want to redirect them to the home page to sign in, since that hampers the user experience. The challenge was to remember which page the person came from and take them back to that same page after login.
- To implement this I used protected routes, which inform the server about the route the user came from; the server stores it in the user's session. Later, once the user is authenticated, those details are taken from the session and sent to the protected route along with a secret key. If the protected route verifies the secret key, it takes the person to the original page they wanted to visit!
Screen Share
- Sharing the screen requires sending a different video stream altogether to the other users, so I had to figure out how to do this without audio/video loss: replacing tracks in a WebRTC connection can trigger renegotiation, which causes a couple of seconds of audio/video loss.
- I am still looking for a better approach. Right now I use `RTCRtpSender.replaceTrack()`, which swaps the outgoing track, in most cases avoids a full renegotiation, and is in any case far better than re-establishing the entire call.
- Video Meetings are created using a combination of WebRTC and Socket.io.
- All the clients are connected to the server through WebSockets, while they are connected to each other through WebRTC.
- All the communication between the clients (e.g. when someone mutes) is handled through the server, except for the video streams, which are transferred through WebRTC.
- Since it is a mesh topology, the number of connections increases quadratically, hence it cannot support very large meetings.
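The quadratic growth above follows from every pair of participants holding a direct peer connection, so an n-person mesh needs n(n-1)/2 connections; a quick illustration (which is why the 5-person cap keeps things manageable):

```js
// Number of peer-to-peer connections in a full-mesh call of n participants:
// each of the n participants connects to the other n-1, and each
// connection is shared by two participants.
function meshConnections(n) {
  return (n * (n - 1)) / 2;
}

// Growth is quadratic: doubling participants roughly quadruples connections.
const counts = [2, 5, 10].map(meshConnections); // [1, 10, 45]
```

At 5 participants each client uploads its stream 4 times, which is already demanding on a typical home connection; this is why larger meetings usually switch to an SFU instead of a mesh.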
- Chat is made using Socket.io. The user can choose to send the message to everyone.
- The server listens for the message and, on reception, broadcasts it to the applicable users per the sender's request.
- All the messages are stored in the database for later access, and stay available in your team chats.
- The chats are stored in the database keyed by roomId, so that all the messages remain there and can be presented to people when they access them.
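A hedged sketch of the roomId-keyed storage idea, using an in-memory `Map` in place of the real MongoDB collection (the function names here are illustrative, not the app's actual code):

```js
// In-memory stand-in for the chats collection: one entry per roomId,
// holding that room's ordered message history.
const chatStore = new Map();

function saveMessage(roomId, message) {
  if (!chatStore.has(roomId)) chatStore.set(roomId, []);
  chatStore.get(roomId).push(message);
}

function loadMessages(roomId) {
  // Returns the full history so the chat box can be repopulated on render.
  return chatStore.get(roomId) || [];
}
```

With MongoDB the lookup key is the same: every read and write is scoped by roomId, which is what lets pre-meeting, in-meeting and post-meeting chat all see one shared history.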
- WebRTC allows us to capture the screen.
- We emit an event informing the others that we are sharing the screen so that they can adjust the UI accordingly.
- The video track in the current connection with all other clients is then replaced with the track containing the screen.
- Face masks are added using TensorFlow.js' face-api.
- It detects faces in static images and draws over them.
- We repeat this every 100 ms to get a mask that follows your face.
- They are implemented by protecting the meeting link using react-router-dom.
- Whenever a person tries to join the meeting, they are first redirected to the waiting room, and an emitted event informs the host about the new person.
- If the host accepts, an event is emitted that allows the other user to join the meeting; similarly, if the host rejects, another event fires that denies the person entry.
- User authentication is implemented using OAuth 2.0.
- The details fetched from the provider include only the name and profile photo, since we don't need anything else in the app.
```js
function setUpSocketsAndPeerEvents({ socket, myPeer, stream, myPic }, cb) {
  myPeer.on("open", (userId) => {
    setVideos((prev) => {
      // Add our own tile with the current audio/video status, picture and name.
      return [
        ...prev,
        { userId, audio: audioStatus.current, video: videoStatus.current, picurL: myPicRef.current, userName: titleCase(myNameRef.current) },
      ];
    });
    const roomId = window.location.pathname.split("/")[2];
    socket.emit("join-room", roomId, userId, { audio: audioStatus.current, video: videoStatus.current, picurL: myPicRef.current, name: myNameRef.current });
  });
  socket.on("user-connected", (userId, socketId, { audio: userAudio, video: userVideo, picurL: userPicUrl, name: userName }) => {
    const call = myPeer.call(userId, stream);
    call.on("stream", (userVideoStream) => {
      if (connectedPeers.current[call.peer]) {
        return;
      }
      connectedPeers.current[call.peer] = call;
      addVideoStream(userVideoStream, call.peer, { userAudio, userVideo, userName, userPicUrl });
      const roomId = window.location.pathname.split("/")[2];
      socket.emit("acknowledge-connected-user", {
        video: videoStatus.current,
        audio: audioStatus.current,
        socketId,
        userId,
        roomId,
        picurL: myPicRef.current,
        name: myNameRef.current,
        screenShareStatus: someOneSharingScreenRef.current,
      });
    });
    call.on("close", () => {
      setVideos((prev) => {
        return prev.filter((video) => video.userId !== call.peer);
      });
    });
  });
  socket.on("user-disconnected", ({ userId, name }) => {
    connectedPeers.current[userId].close();
    setOpenSnackBar({ value: true, name });
    if (someOneSharingScreenRef.current.userId === userId) {
      setSomeOneSharingScreen({ value: false, userId: null });
      someOneSharingScreenRef.current = { value: false, userId: null };
    }
    delete connectedPeers.current[userId];
  });
  myPeer.on("call", (call) => {
    call.answer(stream);
    call.on("stream", (userVideoStream) => {
      if (connectedPeers.current[call.peer]) {
        return;
      }
      connectedPeers.current[call.peer] = call;
      addVideoStream(userVideoStream, call.peer, { userAudio: true, userVideo: false });
    });
    call.on("close", () => {
      setVideos((prev) => {
        return prev.filter((video) => video.userId !== call.peer);
      });
    });
  });
}
```
- Here, when a user connects to the meet, we capture their video/audio and emit an event asking all the people in the room to call this user.
- When the others call this user, they answer with their stream and a peer-to-peer connection is established.
```js
socket.on("join-room", (roomId, userId, { audio, video, picurL, name }) => {
  if (waitingRooms[roomId] === undefined) {
    waitingRooms[roomId] = socket.id;
  }
  socket.join(roomId);
  socket.to(roomId).emit("user-connected", userId, socket.id, { audio, video, picurL, name });
  socket.on("disconnect", () => {
    if (waitingRooms[roomId] === socket.id) {
      delete waitingRooms[roomId];
    }
    socket.to(roomId).emit("user-disconnected", { userId, name: getNameFromSocketId[socket.id], audio, video });
  });
});
```
- Here the server listens for a new user connection through join-room and then emits a user-connected event that tells the existing peers the user id of the user they need to call and connect to.
```js
// to send a message with all the required information
const chat = {
  from: {
    name: myNameRef.current,
    userId: myId,
    picurL: myPicRef.current,
  },
  all: sendTo === "all",
  to: sendTo === "all" ? { roomId: window.location.pathname.split("/")[2] } : JSON.parse(sendTo),
  message: chatMessage,
};
setChatMessage("");
setChatMessagges((prev) => [...prev, chat]);
socket.emit("send-chat", chat);

// to listen for reception
socket.on("recieved-chat", (chat) => {
  if (chatOpenRef.current === false) setShowChatPopUp((prev) => prev + 1);
  else setShowChatPopUp(0);
  setChatMessagges((prev) => [...prev, chat]);
});
```
- The code basically emits the message to the server along with the socketId/roomId, depending on whether we need to send it to everyone or to an individual.
- The other event listener listens for any received message and uses it to show a push notification and add the message to the chat box.
```js
socket.on("send-chat", (chat) => {
  if (chat.all === true && chat.to && chat.to.roomId) {
    socket.to(chat.to.roomId).emit("recieved-chat", chat);
  } else if (chat.to && chat.to.userId) {
    socket.to(getSocketIdByUserId[chat.to.userId]).emit("recieved-chat", chat);
  }
});
```
- This code listens for the chat event fired from the client and broadcasts the message to everyone else in the meet, or relays it to the single intended recipient.
Users
- UserId
- PicUrl
- Name
- Rooms (array of RoomIds)
Rooms
- RoomId
- RoomName
- Participants (array of {UserId, PicUrl, Name}): shows all the users in the room; we can also access their images, names, etc. from here.
Chats
- RoomId
- Chats (array of the following)
- from {UserId, Name, PicUrl}
- to {UserId, Name, PicUrl}
- content of the message
- Date (date and time when the message was sent)
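The chat document shape above can be captured in a small validator. This is an illustrative sketch only (field names follow the lists above; the real app presumably defines a Mongoose schema instead):

```js
// Checks that a chat entry matches the documented shape:
// from/to carry user identity, plus the message content and a timestamp.
// The field names mirror the schema listed above and are assumptions here.
function isValidChatEntry(entry) {
  const hasUserShape = (u) =>
    u && typeof u.UserId === "string" && typeof u.Name === "string" && typeof u.PicUrl === "string";
  return (
    hasUserShape(entry.from) &&
    hasUserShape(entry.to) &&
    typeof entry.message === "string" &&
    entry.date instanceof Date
  );
}
```

Embedding the sender's name and picture in each message, rather than just a UserId, trades storage for simpler reads: the chat box can render without a join against the Users collection.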
- They are implemented by protecting the meeting link using react-router-dom.
- The concept of protected routes is used, where the meeting link acts as a protected route, guarding against both users who are not authenticated and those who are not yet admitted to the meeting.
- The URL is given an extra parameter to identify where it is coming from; the user is then redirected to different routes based on that, and the parameter is removed from the URL so that no one else can misuse it to enter our meetings.
```js
function ProtectedRoute({ component: Component, ...rest }) {
  const [isLoggedIn, setIsLoggedIn] = useLogin();
  const { state } = useLocation();
  const [verifiedFromServer, setVerifiedFromServer] = useState(false);
  useEffect(() => {
    axios.get("/authenticated").then((response) => {
      if (response.data !== "unauthorised") {
        setIsLoggedIn(true);
      } else {
        setIsLoggedIn(false);
      }
      setVerifiedFromServer(true);
    });
  }, []);
  if (!verifiedFromServer)
    return (
      <div style={{ position: "absolute", top: "45vh", left: "44vw", textAlign: "center" }}>
        <Loader type="Puff" color="#00BFFF" height={100} width={100} timeout={3000} />
      </div>
    );
  // since the url has an extra parameter when it comes from the server,
  // after verification we take care of that here
  let url = window.location.pathname;
  const allow = url.split("/").length === 4;
  url = url.slice(0, url.lastIndexOf("/"));
  if (allow) {
    window.history.replaceState({}, "", url);
  }
  if (isLoggedIn === false) {
    return <Redirect to={{ pathname: "/signinfirst", state: { from: rest.location.pathname, prevFrom: state && state.from === "/" ? "home" : null } }} />;
  } else if (allow || (state && state.from === "/")) return <Route {...rest} render={(props) => <Component {...props} />} />;
  else return <Redirect to={`/waitingroom/${window.location.pathname.split("/")[2]}`} />;
}
```
- The redirects here are based on where the person is coming from and where they intend to go.
- It is implemented using PassportJS in NodeJS.
- The session is maintained server-side using express-session; the default MemoryStore keeps things simple, though a shared session store would be needed to scale across multiple server instances.
```js
passport.use(
  new GoogleStrategy(
    {
      clientID: process.env.GOOGLE_CLIENT_ID,
      clientSecret: process.env.GOOGLE_CLIENT_SECRET,
      callbackURL:
        process.env.NODE_ENV === "development"
          ? "http://localhost:5000/auth/google/callback"
          : "https://hidden-beyond-12562.herokuapp.com/auth/google/callback",
      userProfileURL: "https://www.googleapis.com/oauth2/v3/userinfo",
    },
    function (accessToken, refreshToken, profile, cb) {
      User.findOrCreate(
        { googleId: profile.id },
        { name: profile.displayName, picurL: profile.photos[0].value },
        function (err, user) {
          return cb(err, user);
        }
      );
    }
  )
);
```
- This is the Passport strategy that authenticates the user, provides us with their details, and stores them.
```js
app.get("/auth/github", (req, res, next) => {
  req.session.redirectDetails = { join: req.query.join, room: req.query.room, prev: req.query.prev };
  passport.authenticate("github")(req, res, next);
});

app.get(
  "/auth/github/callback",
  passport.authenticate("github", {
    scope: ["email", "username"],
    failureRedirect: "/login",
  }),
  function (req, res) {
    if (req.session.redirectDetails && req.session.redirectDetails.join && req.session.redirectDetails.prev) {
      res.redirect(`${useDomain}/join/${req.session.redirectDetails.room}/${req.session.redirectDetails.prev}`);
    } else if (req.session.redirectDetails && req.session.redirectDetails.join) {
      res.redirect(`${useDomain}/join/${req.session.redirectDetails.room}`);
    } else {
      res.redirect(`${useDomain}/`);
    }
  }
);
```
- The user requests to log in and is redirected and logged in through the Passport middleware, after which they reach the callback. We also store details about where to redirect them in the session.
- From here they are redirected to the correct page depending on the redirect details stored in their session.
```js
axios.get("/production").then((response) => {
  const useDomain = response.data === "development" ? "http://localhost:5000" : "";
  if (cameFrom) {
    // since we need the second and third argument we destructure the array in such a way
    const [extra, join, room] = cameFrom.split("/");
    if (prev) window.open(`${useDomain}/auth/${service}?join=${join}&room=${room}&prev=${prev}`, "_self");
    else window.open(`${useDomain}/auth/${service}?join=${join}&room=${room}`, "_self");
  } else window.open(`${useDomain}/auth/${service}`, "_self");
});
```
- This is a utility function used to send the request to the server. It takes in the service to use, where the user is coming from, and where they need to be redirected, and converts all of those into URL parameters.
- The user sees the waiting room screen, where they can select the status of their audio and video for when they join the meet.
- These details are stored in the user's session, which is broadcast to all others when they are admitted to the meeting.
```js
useEffect(() => {
  showUserVideo();
  const roomId = window.location.pathname.split("/")[2];
  socket.emit("check-valid-room", roomId, ({ status }) => {
    if (status === "invalid room") {
      alert("Link is invalid");
      setStatus("invalid room");
    }
  });
  socket.on("you-are-admitted", () => {
    setStatus("allowed");
  });
  socket.on("you-are-denied", () => {
    setStatus("denied");
    alert("host denied entry to the meeting");
  });
  return () => {
    socket.off("you-are-admitted");
    socket.off("you-are-denied");
  };
}, []);

function askToJoin() {
  axios.get("/authenticated").then((response) => {
    if (response.data !== "unauthorized") {
      setHasAskedToJoin(true);
      const roomId = window.location.pathname.split("/")[2];
      socket.emit("req-join-room", roomId, response.data.name);
    } else {
      alert("you are not logged in");
      setStatus("denied");
    }
  });
}
```
- All of these are set up to send and receive events to and from the server, and actions are executed as explained above.
- They are made using TensorFlow's face-api to detect facial landmarks and get their coordinates.
- A canvas is placed on top of the video so we can draw over it.
- We use setInterval and run face-api on each frame to get the detections, then draw accordingly on the canvas to give a mask effect.
- For our stream on other users' screens, the mask is drawn locally on their screen; this behaviour is controlled by socket events that inform each client where to draw which mask.
```js
async function startCanvasDrawing() {
  const myId = videoStream.userId;
  if (videoRefs.current[myId] === undefined) return;
  videoRefs.current[myId].canvasRef.innerHTML = await faceapi.createCanvasFromMedia(videoRefs.current[myId].videoRef);
  const displaySize = videoRefs.current[myId].videoRef.getBoundingClientRect();
  faceapi.matchDimensions(videoRefs.current[myId].canvasRef, displaySize);
  startInterval.current = () => {
    clearMe.current = setInterval(async () => {
      try {
        const detections = await faceapi
          .detectAllFaces(videoRefs.current[myId].videoRef, new faceapi.TinyFaceDetectorOptions())
          .withFaceLandmarks();
        if (detections && detections.length > 0) {
          errCnt.current = 0;
          const resizedDetections = faceapi.resizeResults(detections, displaySize);
          const headCoods = resizedDetections[0].landmarks.getLeftEyeBrow();
          const jawCoods = resizedDetections[0].landmarks.getJawOutline();
          videoRefs.current[myId].canvasRef
            .getContext("2d")
            .clearRect(0, 0, videoRefs.current[myId].canvasRef.width, videoRefs.current[myId].canvasRef.height);
          // Position and scale the sticker image relative to the jaw and eyebrow landmarks.
          videoRefs.current[myId].canvasRef
            .getContext("2d")
            .drawImage(
              img.current,
              jawCoods[4].x - (jawCoods[16].x - jawCoods[0].x) * 0.48,
              jawCoods[0].y - (jawCoods[8].y - headCoods[3].y) * 0.9,
              (jawCoods[16].x - jawCoods[0].x) * 1.7,
              (jawCoods[8].y - headCoods[3].y) * 1.8
            );
        } else {
          // After several consecutive misses, clear the canvas so a stale mask doesn't linger.
          if (errCnt.current > 10) {
            videoRefs.current[myId].canvasRef
              .getContext("2d")
              .clearRect(0, 0, videoRefs.current[myId].canvasRef.width, videoRefs.current[myId].canvasRef.height);
            errCnt.current = 0;
          }
          errCnt.current++;
        }
      } catch (err) {
        console.log(err);
      }
    }, 200);
  };
}
```
- The detections are used to draw an image on top of the person's face to give a mask-like effect.
- It cannot run at more than about 10 FPS, since the detection algorithm takes roughly that long per frame.
- For better performance we have kept it at 5 FPS (the 200 ms interval above), but if the mask feels laggy you can change it to 10 FPS as well.
```js
useEffect(() => {
  if (typeof stopInterval.current === "function") {
    stopInterval.current();
  }
  socket.on("start-sticker", (userId, key) => {
    if (userId === videoStream.userId) {
      if (typeof stopInterval.current === "function") stopInterval.current();
      if (img.current) {
        const currImg = allStickers[key];
        const currImgName = Object.keys(currImg)[0];
        img.current.src = currImg[currImgName];
      }
      if (typeof startInterval.current === "function") startInterval.current();
    }
  });
  socket.on("stop-sticker", (userId) => {
    if (userId === videoStream.userId) {
      stopInterval.current();
    }
  });
});
```
- These are the events that start/stop a sticker in the video stream.
- They use refs, which hold the functions used to start and stop the sticker's drawing interval.