Hot on the heels of our acquisition of Adaptive Real Estate Services, we can today announce the initial rollout of edgeio’s relevance-based search engine. This is the first step in our efforts to make edgeio.com the best place to find “stuff” anywhere in the world.
By way of background, edgeio launched in March with zero listings and took in about 100 new listings per day at that time. Today we take in about 700,000 new listings per day. The search engine we began with (free-text matching, with results shown in reverse chronological order) simply could not cope with this volume of listings.
We now have a dedicated search team, and this is their first push. It is not yet perfect, but it is a vast improvement on what was there before.
In this upgrade we are acknowledging how partners and users actually use edgeio, and trying to improve their experience. Many listings-based sites upload their listings to us, and we send search traffic back to them. We are being used as a listings search service both by companies with listings and by users looking for listings: a “search engine for stuff”, if you will.
Here are a few searches to try:
These are all global searches (edgeio has data from about 15,000 cities worldwide). You can use the geography widget (top right of the results screen) to choose a city. Once you have done that, you can use the slider control to fine-tune the results by zip, city, state, country, continent or world. Of course, you can also sort by price or by date listed.
Arun Jagota, Josh Myer and Dale Johnson, mostly quite new at edgeio, are the team working on search. They have moved us from a reverse chronological display of results to a relevance-ranked display. Of course, they have had a lot of help from others, most notably our technical advisors, and they still have a lot of work to do to make the results the best there is.
Going forward, as edgeio strives to bring together, organize and distribute the world’s marketplaces, edgeio.com will be where our organizing efforts are most visible. It will be the place to find “stuff”.
From here on relevance will be our default sorting method. Of course we will enable users to modify the sort order (by time, by price, and in the future by other criteria). Our outbound APIs will eventually reflect these options also.
There is a whole lot more to come from us. This is a baby step in many ways, but a significant directional move. Let us know what you think. In future posts we will talk about the “bring together” and “distribute” parts of our vision, which are realized through our edgedirect product.
But for now, let’s meet the team working on search:
I am a search engineer at edgeio, where I design and evaluate algorithms for improving relevance in particular and search in general.
One of the key challenges is the relevance problem itself, a tough nut to crack. The challenge is to find methods that are simple and efficient, yet effective at returning relevant results. Another challenge, specific to edgeio, is to fetch relevant results from a variety of sources in real time, recompute their relevance internally in real time, and merge them into the single set of results that the user sees. A third issue, also specific to edgeio, is that our documents are not general web pages but listings in verticals with varying degrees of structure. This raises relevance and search problems that are peculiar to finding “stuff” rather than web pages.
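The re-score-and-merge step described above can be sketched roughly as follows. This is an illustrative Python sketch under our own assumptions, not edgeio’s actual code: `Listing`, `local_score` and `merge_results` are hypothetical names, the scoring function is a deliberately simple term-overlap measure standing in for a real relevance model, and the real-time fetching from each source is left out. What it shows is the shape of the problem: scores returned by different sources are not comparable, so every listing is re-scored with one internal function, duplicate URLs are dropped, and a single ranked list comes out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    url: str
    title: str
    source: str  # e.g. our own index or an external provider (illustrative)


def local_score(query: str, listing: Listing) -> float:
    """Hypothetical relevance score: fraction of query terms found in the title."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    title_terms = set(listing.title.lower().split())
    return len(terms & title_terms) / len(terms)


def merge_results(query, result_sets, limit=10):
    """Re-score listings from several sources with one scoring function,
    keep one copy per URL, and return a single ranked list."""
    best = {}  # url -> (score, listing)
    for results in result_sets:
        for listing in results:
            score = local_score(query, listing)
            # Keep the first copy seen unless a later one scores strictly higher.
            if listing.url not in best or score > best[listing.url][0]:
                best[listing.url] = (score, listing)
    # Highest score first; URL breaks ties so the order is deterministic.
    ranked = sorted(best.values(), key=lambda pair: (-pair[0], pair[1].url))
    return [listing for _, listing in ranked[:limit]]


internal = [Listing("http://a.example/1", "red mountain bike", "internal")]
external = [
    Listing("http://b.example/2", "Mountain bike for sale", "partner"),
    Listing("http://a.example/1", "red mountain bike", "partner"),
]
merged = merge_results("red mountain bike", [internal, external])
print([l.url for l in merged])  # ['http://a.example/1', 'http://b.example/2']
```

A real system would also have to deal with sources that time out or answer slowly, which is a large part of what makes doing this in real time hard.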
What keeps me motivated is that “relevance and search” supply me with a constant source of challenging (but not impossible) problems, while computer science, statistics and information retrieval offer me solution methods to consider and evaluate. Another thing that keeps me going is constant incremental progress and quick feedback: you have an idea, you try it out, and when it improves relevance you notice quickly.
Before working at edgeio, I worked at another start-up, Xoom Corporation, as a data analyst and machine learner. There I designed improved algorithms for predictive modeling in an e-commerce setting, as well as for fuzzy matching of people’s names and addresses. Before that, I taught graduate courses in computer engineering as an adjunct faculty member at Santa Clara University, including one on “Information Retrieval And Search Algorithms”.
Hi, I’m Josh. I’m the Young Guy at the office, but I make up for it with an intense background. Before going to college, I spent a few years working as a reverse engineer and general puzzle-solver, in fields ranging from accounting to instant messaging. I just wrapped up two degrees from the University of North Carolina (Chapel Hill), one in Linguistics and one in Mathematics. I focused on the typically impractical formal aspects of both, but they’ve actually come in handy when working on search problems.
I spend a lot of time in the plumbing of edgeio, but have been working more on search lately. The user-visible bits that I’ve done so far are the real-time search results from external providers. I’m currently working on several things to make search better, faster, and more user-friendly.
Working here has been great: there’s always a new problem to solve and the freedom to solve it the way you want to. All told, I get to use my entire background at work, ranging from Unix arcana to the acquisition of language in children. It’s all the fun parts of college (laid-back, lots of new knowledge) combined with the fun parts of a job (making useful things, getting paid).
I am variously “search engineer”, “sphinx developer”, “data platform engineer”, “senior database engineer”, roughly in that order.
I have 15 years of database experience, spanning relational databases, data warehousing and search. I have worked on PostgreSQL internals and have studied MySQL internals. Most recently I worked at Tellme, where I designed and developed a 1.5 TB data repository to drive data warehouse reporting on the call details of each of more than 1 billion phone calls answered by Tellme. It ran on a redundant, reliable cluster of more than 50 MySQL servers built from inexpensive off-the-shelf hardware, and stored its data in a combination of MySQL and record-oriented raw data files.
I am currently writing C++ extensions to our search engine, for example parsing Chinese sentences into searchable blocks to support http://mulu100.com. I have also recently implemented some statistical approaches in our full-text search: gathering a corpus profile and applying it in real time to search terms to improve the selectivity of results.
The key challenge, I think, is to be flexible enough to implement a solution as we discover the most natural way for a user to navigate through millions of items; to provide a back end that can support the dynamic, state-of-the-art interactive experience people now expect; and to deliver these results in real time. Many requests need to distinguish between tens of thousands of documents that contain one or more of the search terms, and determine the top 10 / 100 / 1000 of those items in under a quarter of a second. Under these operational constraints, the traditional relational database approach completely falls down; quite the fun engineering challenge.
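One common way to turn Chinese text into searchable blocks, and one simple kind of corpus profile, can be sketched in Python as below. This is an illustration under our own assumptions rather than a description of the C++ extensions: overlapping character bigrams are a standard way to index Chinese, which has no spaces between words, without needing a word dictionary, and inverse document frequency (IDF) is a standard corpus statistic for weighting query terms by how selective they are. The names `tokenize`, `build_profile` and `term_weights`, and the smoothing formula, are illustrative choices.

```python
import math
import re
from collections import Counter

# Either a run of CJK Unified Ideographs or a run of lowercase ASCII letters/digits.
TOKEN_RE = re.compile("[\u4e00-\u9fff]+|[a-z0-9]+")


def tokenize(text):
    """Split text into searchable blocks.

    ASCII words are kept whole; each run of Chinese characters is split into
    overlapping two-character blocks (bigrams).
    """
    tokens = []
    for run in TOKEN_RE.findall(text.lower()):
        if "\u4e00" <= run[0] <= "\u9fff" and len(run) > 1:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


def build_profile(docs):
    """Corpus profile: in how many documents each token occurs, plus corpus size."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return doc_freq, len(docs)


def term_weights(query, profile):
    """Weight each query token by smoothed inverse document frequency.

    Tokens that occur in many listings get a low weight, so matches on rare,
    selective terms dominate the ranking.
    """
    doc_freq, n_docs = profile
    return {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0
        for tok in tokenize(query)
    }


# "second-hand bicycle", "second-hand car", "bicycle accessories"
profile = build_profile(["二手自行车", "二手汽车", "自行车配件"])
weights = term_weights("二手汽车", profile)
# "汽车" (car) occurs in one listing, "二手" (second-hand) in two.
print(weights["汽车"] > weights["二手"])  # True
```

Bigrams over-generate (the block “手汽” straddles 二手 and 汽车), which is the usual trade-off for not needing a word dictionary.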
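Picking the top 10 / 100 / 1000 out of tens of thousands of matching documents does not require sorting all of them. A minimal Python sketch of the standard bounded min-heap approach follows, assuming each candidate already carries a relevance score; `top_k` is a hypothetical name, and this is not a description of our engine’s internals. Python’s standard `heapq.nlargest` offers the same selection as a one-liner.

```python
import heapq


def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    Keeps a min-heap of at most k entries whose root is the weakest of the
    current top k, so each of the n candidates costs O(log k) rather than
    sorting all n documents.
    """
    if k <= 0:
        return []
    heap = []  # (score, doc_id) tuples; heap[0] has the lowest score kept so far
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Evict the current weakest entry and insert the better one.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


candidates = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(top_k(candidates, 2))  # [('b', 0.9), ('d', 0.7)]
```

With n candidates and k results this is O(n log k) work instead of O(n log n) for a full sort, and the heap never holds more than k entries, which matters when k is small and n is large.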
What keeps me motivated is the knowledge that the web is still 95% noise and 5% signal. Search is the thing that has the potential to cut through the noise, so we’re really fighting the good fight: taking listings from potentially obscure but highly useful sites, making them available to the people they will really matter to, and doing it in a fair and egalitarian way.
NB. Our email addresses are first name at edgeio dotcom