Edvantis and KPC Labs formed an IT partnership in 2010, and have since collaborated on a number of software development and big data analytics projects.
Service: Data & Analytics
Industry: Real Estate
Location: United States
Challenge
KPC Labs aggregates and analyzes data, helping businesses gain valuable insights.
The primary goal of our long-term cooperation with KPC Labs was to address diverse challenges in the real estate sector, ranging from real estate data aggregation and augmentation services to developing a dialer service and incorporating machine learning (ML) solutions. Together, we needed to improve the workflow for real estate agents and help brokers efficiently identify and secure properties with a high likelihood of selling quickly.
“The members of their team adhere to a workflow with great discipline. They have been very transparent about what they can deliver and kept their promises. We keep up the successful partnership. Quality, speed, cost, and skills are well-balanced!”
Seth Krauss, Partner at KPC Labs
Main Goals
- Create a web crawling framework that makes the automated extraction of data from public websites easy to extend and maintain
- Design, develop and maintain the lead management web portal
- Optimize database queries to efficiently handle large amounts of data
- Introduce a machine learning solution to predict sales in the US real estate market
- Migrate from a legacy system to AWS-based infrastructure
- Modularize the monolith system and componentize deployments
- Create data integration pipelines and provide data acquisition & enhancement services
Technologies Used
Java 8, Spring Framework, JavaScript, jQuery, MySQL, Amazon Web Services
Team Size and Composition
We assembled a cross-functional team of 7 Software Developers, 1 ML Engineer, and 2 QA Engineers who contributed to software development, quality assurance, technical design, and machine learning/artificial intelligence research and development. During different project stages, we scaled the team up to 15 specialists to match the Client’s requirements.
Solution
Throughout more than 13 years of cooperation, our team has successfully migrated all data and the portal website to AWS, implemented a new web-based application, and improved the performance and stability of the data acquisition, crawling, and ingestion platforms.
We brought in practical technologies like Amazon SageMaker and used methods such as gradient boosting and random forest to improve house value predictions. We also added a language model (BERT) to analyze the notes left in the dialer data.
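To illustrate what gradient boosting does, here is a minimal sketch, not KPC Labs' production model: each round fits a one-split decision stump to the residuals of the current prediction and adds a shrunken copy of it to the ensemble. The single feature, the toy data, and the names `Stump`, `fit_stump`, and `fit_gbm` are hypothetical; libraries such as XGBoost or LightGBM use deeper trees, many features, and regularization.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A one-split regression tree on a single numeric feature."""
    threshold: float
    left_value: float   # prediction when x <= threshold
    right_value: float  # prediction when x > threshold

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def fit_stump(xs: list[float], targets: list[float]) -> Stump:
    """Choose the split that minimizes squared error, predicting leaf means."""
    best, best_err = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # skipping the max keeps both leaves non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best, best_err = Stump(t, lv, rv), err
    if best is None:  # every x is identical: fall back to a constant prediction
        mean = sum(targets) / len(targets)
        best = Stump(xs[0], mean, mean)
    return best


def fit_gbm(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared-error loss; returns a predict function."""
    base = sum(ys) / len(ys)  # start from the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for p, x in zip(preds, xs)]

    def predict(x: float) -> float:
        return base + learning_rate * sum(s.predict(x) for s in stumps)

    return predict


# Toy example: living area (1,000 sq ft) vs. sale price ($100k); values are made up.
area = [1.0, 1.5, 2.0, 2.5, 3.0]
price = [2.0, 2.5, 4.0, 4.5, 6.0]
model = fit_gbm(area, price)
print(round(model(2.2), 2))
```

The small learning rate makes each round a cautious correction; with squared-error loss, every round is guaranteed not to increase the training error.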
Tasks and ML Models Applied
- Predicting likely-to-list (likely-to-sell) homes among off-market properties
Models: gradient boosting machines (including XGBoost, LightGBM, AdaBoost), random forest, stacking ensemble, deep learning
- Competitive market analysis based on pricing models that forecast home value ranges and outliers
Models: random forest and gradient boosting machines, with controls on the predictions
- Similarity analysis to identify homes close to a targeted home based on significant classifier characteristics
Models: k-nearest neighbors (k-NN) in the vector space
- Predicting homes that are most likely to become leads based on contacting behavior
Models: gradient boosting machines (including XGBoost, LightGBM, AdaBoost), random forest, stacking ensemble, deep learning
- Predicting homes that are most likely to be receptive to approach and the best times to make contact
Models: gradient boosting machines (including XGBoost, LightGBM, AdaBoost), random forest, stacking ensemble, deep learning
- Automated natural language processing of agent-supplied notes and text to infer lead disposition
Models: BERT language model fine-tuned as a sequence classifier, stacked with TF-IDF-based logistic regression
- Insight curves that identify longitudinal trends and lead-rate peaks over time
Models: radial basis function (RBF) neural networks and Nadaraya-Watson kernel regression
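The insight curves rely on kernel smoothing. As a hedged illustration of the Nadaraya-Watson part only (the RBF network is not shown), the sketch below smooths a hypothetical weekly lead-rate series with a Gaussian kernel and reports the week where the smoothed curve peaks. The estimate at a point x0 is the kernel-weighted average sum_i K((x_i - x0)/h) * y_i / sum_i K((x_i - x0)/h), where h is the bandwidth. The function names, bandwidth, and data are assumptions for illustration.

```python
import math


def nadaraya_watson(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-kernel weighted mean of ys."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def insight_curve(xs, ys, grid, bandwidth):
    """Evaluate the smoothed curve at every grid point."""
    return [nadaraya_watson(xs, ys, g, bandwidth) for g in grid]


def peak(grid, curve):
    """Return the grid point where the smoothed curve is highest."""
    best_index = max(range(len(curve)), key=curve.__getitem__)
    return grid[best_index]


# Hypothetical weekly lead rates (percent of contacted homes that became leads).
weeks = list(range(12))
lead_rate = [1.1, 1.3, 1.2, 1.8, 2.4, 2.9, 3.1, 2.7, 2.0, 1.6, 1.4, 1.2]
curve = insight_curve(weeks, lead_rate, weeks, bandwidth=1.5)
print(peak(weeks, curve))
```

A larger bandwidth averages over more weeks and yields a flatter long-term trend; a smaller one tracks short-lived peaks more closely.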
During Our Partnership We:
- Designed and developed a low-burden, code-free web crawling language and framework to automatically discover and extract relevant data from public websites
- Designed and developed a lead management web application that searches extracted data for potential leads using geo-targeted searches by lead type and supports the prospecting contact management workflow. The application uses two very large databases under highly concurrent, real-time transactional workloads (serving thousands of live users and hundreds of concurrent users)
- Applied ongoing performance, stability, and security optimizations to all platforms over the course of a decade
- Introduced 20+ microservices, refactoring the monolith to support more agile, higher-velocity development and feature releases
- Separated the monolith into a white-labeling framework supporting two different portals, and developed a robust referral framework and flexible subscription models for over a dozen affiliates
- Integrated a third-party dialing service and its application workflow seamlessly into the portals
- Added Amazon Redshift as a data warehouse to support statistical calculations on the data
- Migrated the portal website to Elastic Beanstalk inside a virtual private cloud (VPC), enabling auto scaling
- Added Elasticsearch, resulting in a significant improvement in website and query performance
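The lead management application searches extracted data by location and lead type, and Elasticsearch was added to speed up queries. Assuming such a search is served from an Elasticsearch index (the case study does not state this explicitly), it could be expressed with the `bool` filter, `geo_distance` query, and `_geo_distance` sort from the Elasticsearch Query DSL. The sketch below only builds the request body; the field names `lead_type` and `location` (a `geo_point` field), the value `expired_listing`, and the helper `build_lead_search` are hypothetical.

```python
import json


def build_lead_search(lat, lon, radius_km, lead_type, size=50):
    """Build an Elasticsearch request body for a geo-targeted lead search.

    Hypothetical mapping: `lead_type` is a keyword field and `location` is a
    geo_point field on each property document.
    """
    origin = {"lat": lat, "lon": lon}
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter clauses do not affect scoring and may be cached.
                "filter": [
                    {"term": {"lead_type": lead_type}},
                    {"geo_distance": {"distance": f"{radius_km}km", "location": origin}},
                ]
            }
        },
        # Return the nearest properties first.
        "sort": [{"_geo_distance": {"location": origin, "order": "asc", "unit": "km"}}],
    }


body = build_lead_search(40.7128, -74.0060, 5, "expired_listing")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a `_search` request against the (hypothetical) property index.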
Results
With the help of Edvantis engineers, KPC Labs gained functional IT services (web solutions and portals) to support its operations. Currently, the team is developing an innovative Recommendations & Insights software module backed by industry-leading AI models.
Delivered Value in Numbers
- 180+ sites crawled per day
- Stable execution of complex database queries against hundreds of millions of rows in under 5 seconds at the 99th percentile, with horizontal scaling
- Approximately 250 million record updates processed concurrently each month without any impact on read performance
- A mature software engineering lifecycle with CI/CD that allows services to be patched at any time, with monthly minor releases and quarterly major releases
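The query-latency figure is a percentile target: at least 99% of queries finish in under 5 seconds, which is equivalent to the 99th-percentile (p99) latency being below 5 seconds. A minimal sketch of checking such a target against recorded latencies, using the nearest-rank percentile definition; the function names and sample data are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(latencies_s, threshold_s=5.0, pct=99):
    """True if the pct-th percentile latency is strictly below threshold_s."""
    return percentile_nearest_rank(latencies_s, pct) < threshold_s


# Hypothetical latencies in seconds: 98 fast queries and 2 slow outliers.
latencies = [0.8] * 98 + [7.5, 9.0]
print(percentile_nearest_rank(latencies, 99))  # 7.5
print(meets_latency_target(latencies))         # False
```

Here only 98% of queries finish under 5 seconds, so the target is missed; with a single slow outlier instead of two, p99 would be 0.8 seconds and the target would be met.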