Search against a large, rapidly changing data set?

2 Nov 2009 - 5:16pm
DrWex
2006

I'm going to guess I'm delving into sufficiently esoteric areas that
nobody will have an answer, but we are smarter than me, so here goes:

I'm trying to improve one of our key search interfaces. The use cases
involve people making searches against a large (hundreds of thousands
of records) data set. To make matters more complicated, the data set
changes very rapidly, to the point where any set of search results we
can return may well be inaccurate or incomplete by the time it's
returned. (*)

Right now we allow unbounded searches, but truncate the result set at
an arbitrary size. Result sets are timestamped so that people know
the data were accurate as of the timestamp. My intuition and informal
user research tells me that people don't really want these large data
sets. They want more focused results.

The typical interaction patterns I know for accomplishing this are
search-within-search and faceted search. However, both patterns are
confounded by the rapid pace of data change:
- Search-within-search would either produce increasingly inaccurate
results, as each refinement ran against outdated information, or
confuse people if we re-ran the search against the updated data, since
the second result wouldn't be a true subset of the first result, but
rather an updated subset.

- Faceted search interfaces typically give people a size for their
too-large query, and then give actual results once the query
parameters have been narrowed down to the point where the result set
size is "reasonable" (for whatever definition of reasonable fits the
problem domain). In my domain, the rapid change in the data confounds
this process because the sizeof() queries are only accurate at the
moment they're performed. We might tell the person that he'd get back
100 records based on the data now, but by the time the query has run
he might get back 1000 records. Or 10,000. So it's not inherently
clear to me that faceted search would help either.
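To make both failure modes concrete, here is a small illustrative sketch. The Record type, the search() helper, and the sample data are all hypothetical; it simply re-runs a narrowed query against data that changed in between, and compares the result with the first result and with the count the user was shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: int
    region: str
    status: str


def search(data, **criteria):
    """Return the ids of records whose fields equal every value in criteria."""
    return {r.id for r in data if all(getattr(r, k) == v for k, v in criteria.items())}


# Hypothetical data set at the moment the first search runs.
live = [Record(1, "east", "open"), Record(2, "east", "closed"), Record(3, "west", "open")]
first = search(live, region="east")                             # {1, 2}
facet_count = len(search(live, region="east", status="open"))  # count shown to the user: 1

# Before the user narrows the query, record 1 is deleted and records 4 and 5 arrive.
live = [r for r in live if r.id != 1] + [Record(4, "east", "open"), Record(5, "east", "open")]
refined = search(live, region="east", status="open")           # {4, 5}

print(refined <= first)           # False: not a subset of the first result
print(len(refined), facet_count)  # 2 1: the promised count was already stale
```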

Has anyone tried anything like this or have any thoughts/insights to
share about this problem?

Best regards,
--Alan

(*) There's a different problem here of people wanting to monitor the
changes, rather than perform static searches, but that's not what this
song is about.

Comments

2 Nov 2009 - 6:08pm
David Lambert
2009

The concept of searching a rapidly changing data set, as you've
indicated, is somewhat at odds with terms like "accurate",
"static" and "complete".

The first thing I'd do is to look at places where this sort of
search is occurring right now to see whether this matches the mental
model you're shooting for. Depending on how rapid "rapid" is, you
could look at something like Google Blog Search or a Twitter hashtag
search. Either would fit the description of very possibly being
out-of-date by the time you view it.

Based on your description, it sounds like you might be talking about
a more structured search, where the results (at least) take on the
same structure across all your queries. If this is the case, then
you do indeed need to figure out whether each subsequent search means
going back to the original source data, even if it's changing.

Depending on the cost of that search, I'd maybe consider a two-phase
UI, where a user can play with search terms with a lite result set
until they're comfortable they've got the search right, and then
submit a real search that pulls a full result set.
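One possible reading of the two-phase idea, sketched under assumptions: the "lite" phase stops scanning after a handful of hits and reports only a sample plus a "there are more" flag, while the committed phase runs the full scan and timestamps it. The function names and the integer stand-in data are hypothetical.

```python
from datetime import datetime, timezone
from itertools import islice


def preview(data, predicate, limit=5):
    """Phase 1: cheap feedback while the user edits terms.

    Stops scanning after limit + 1 hits, so it returns a small sample plus a
    flag saying whether more matches exist, instead of an exact count.
    """
    hits = list(islice((r for r in data if predicate(r)), limit + 1))
    return hits[:limit], len(hits) > limit


def full_search(data, predicate):
    """Phase 2: the 'real' search, stamped with the time it ran."""
    return datetime.now(timezone.utc), [r for r in data if predicate(r)]


def is_match(n):
    return n % 7 == 0  # stand-in for the user's criteria


data = range(1000)                      # stand-in for the real records
sample, more = preview(data, is_match)  # ([0, 7, 14, 21, 28], True)
as_of, results = full_search(data, is_match)
print(len(results))                     # 143
```

Avoiding an exact count in the preview phase sidesteps the stale-count problem Alan describes for faceted search.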

Sorry if these seem a bit vague - I'm trying to grasp what you're
shooting for.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=47183

2 Nov 2009 - 7:50pm
Robert Wünsch
2009

Maybe a search isn't what you are looking for. Maybe an observer is.

A search agent collects data at a given moment. An observer keeps collecting, without end, from a certain moment onward. You could combine both and let the result grow dynamically.

To be precise: an observer doesn't search; the system notifies it automatically whenever data is created or changed. So there's a push message for you.

And maybe you shouldn't answer all the questions yourself, but let the user decide. You could shift the question of "status at a given moment or current status?" from your search system to the user. Let the user decide whether he wants to track only the incoming data or to search the data as of a given time.
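A minimal sketch of combining a one-off search with an observer, assuming the store can call back registered subscribers on every create or update. ObservableStore, observe(), and upsert() are hypothetical names, not any particular product's API.

```python
class ObservableStore:
    """Record store that pushes creations and changes to registered observers."""

    def __init__(self):
        self._records = {}
        self._observers = []  # list of (predicate, callback) pairs

    def search(self, predicate):
        """One-off search: the matching records at this moment."""
        return {rid: rec for rid, rec in self._records.items() if predicate(rec)}

    def observe(self, predicate, callback):
        """Register interest; callback(kind, record_id, record) fires on each later match."""
        self._observers.append((predicate, callback))

    def upsert(self, record_id, record):
        kind = "changed" if record_id in self._records else "created"
        self._records[record_id] = record
        for predicate, callback in self._observers:
            if predicate(record):
                callback(kind, record_id, record)


def is_east(rec):
    return rec["region"] == "east"


store = ObservableStore()
store.upsert(1, {"region": "east"})
results = store.search(is_east)  # snapshot taken at this moment
events = []


def on_match(kind, rid, rec):
    results[rid] = rec           # let the result grow dynamically
    events.append((kind, rid))


store.observe(is_east, on_match)
store.upsert(2, {"region": "west"})                      # no match, no push
store.upsert(3, {"region": "east"})                      # pushed as "created"
store.upsert(1, {"region": "east", "status": "closed"})  # pushed as "changed"
print(events)  # [('created', 3), ('changed', 1)]
```

In this sketch a record that stops matching after an update is simply not pushed; a real design would also need to decide whether to remove it from the growing result.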

2 Nov 2009 - 7:47pm
tom@tomgebauer.net
2009

Are your users entirely conscious of the incoming data after the
initial query is fired? Perhaps you could utilize some sort of queue
model, wherein the user is being updated as to how many new records
have been added since the search was performed? This would allow them
to refresh the results to display the new data. I guess you could call
that the "Twitter" model?
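The queue model could be sketched roughly as below, assuming each incoming record gets an increasing sequence number in an append-only log (a hypothetical setup). The view remembers the sequence number at search time, counts newer matches for an "N new results" indicator, and only shows them when the user refreshes.

```python
class LiveResults:
    """Results frozen at a watermark, plus a count of newer matching records."""

    def __init__(self, log, predicate):
        # Assumption: log is an append-only list of (seq, record) with seq >= 1 and increasing.
        self.log = log
        self.predicate = predicate
        self.refresh()

    def refresh(self):
        """Pull everything up to now into the visible results and move the watermark."""
        self.watermark = self.log[-1][0] if self.log else 0
        self.results = [rec for seq, rec in self.log if self.predicate(rec)]

    def pending(self):
        """How many matching records arrived after the watermark (the 'N new' badge)."""
        return sum(1 for seq, rec in self.log if seq > self.watermark and self.predicate(rec))


log = [(1, "apple"), (2, "banana")]
view = LiveResults(log, lambda s: s.startswith("a"))
log.append((3, "avocado"))
log.append((4, "cherry"))
print(view.pending())  # 1
view.refresh()
print(view.results)    # ['apple', 'avocado']
```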

Tom

3 Nov 2009 - 11:15am
Jennifer Vignone
2008

What do you mean when you use the term "accurate" to describe the results? I think I can see what you're getting at, but I would venture to say that any result set that is a true reflection of the criteria entered is "accurate". It is just that the data may have changed due to the rapid additions, subtractions, and so on. That wouldn't make the results incorrect, as the search was performed against the data available at that time.

That being said, it seems that you want to timestamp the results from the moment the user clicks "search", in order to anchor that result set in time against the changing data. From that point on, any further filtering would be against that result set. If the user wanted to re-run it against what is "current" (in other words, data that may have changed since the initial search ran), then the user could:

-- re-run the search against the most recent data
-- "refresh" the results against the most recent data
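A minimal sketch of the timestamped-snapshot approach, assuming records are plain dicts matched on exact field values. Snapshot, run_search(), refine(), and rerun() are hypothetical names. Refining filters only the frozen result set (a true subset with the same timestamp), while re-running applies the saved criteria to current data and gets a new timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    criteria: dict
    as_of: datetime
    records: tuple


def _matches(record, criteria):
    return all(record.get(k) == v for k, v in criteria.items())


def run_search(data, **criteria):
    """Search the live data and timestamp the result at the moment of the click."""
    hits = tuple(r for r in data if _matches(r, criteria))
    return Snapshot(dict(criteria), datetime.now(timezone.utc), hits)


def refine(snap, **extra):
    """Filter within the frozen result set: always a true subset, same timestamp."""
    hits = tuple(r for r in snap.records if _matches(r, extra))
    return Snapshot({**snap.criteria, **extra}, snap.as_of, hits)


def rerun(snap, data):
    """'Re-run' option: apply the saved criteria to the current data, new timestamp."""
    return run_search(data, **snap.criteria)


data = [{"id": 1, "region": "east", "status": "open"},
        {"id": 2, "region": "east", "status": "closed"}]
s1 = run_search(data, region="east")
s2 = refine(s1, status="open")
data.append({"id": 3, "region": "east", "status": "open"})  # the data changes
s3 = rerun(s2, data)
print([r["id"] for r in s2.records], [r["id"] for r in s3.records])  # [1] [1, 3]
```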

I would allow the user to save:

-- search criteria
-- any result set, which would automatically be dated (and that date perhaps un-editable so it wasn't lost). You could open each new/updated search in a new window or tab.

If they were to refresh, you may want to highlight where data had changed relative to the earlier result(s). It depends on what users want and need, what they find useful, and how much they can tolerate (too much data isn't always a good thing).
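Highlighting changes after a refresh comes down to comparing two result sets by record identity. A small sketch, assuming each record is a dict with a stable "id" field (a hypothetical schema):

```python
def diff_results(old, new, key="id"):
    """Compare two result sets by record id so a refreshed view can highlight changes."""
    old_by_id = {r[key]: r for r in old}
    new_by_id = {r[key]: r for r in new}
    return {
        "added": [k for k in new_by_id if k not in old_by_id],
        "removed": [k for k in old_by_id if k not in new_by_id],
        "changed": [k for k in new_by_id if k in old_by_id and new_by_id[k] != old_by_id[k]],
    }


earlier = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
refreshed = [{"id": 2, "price": 25}, {"id": 3, "price": 30}]
print(diff_results(earlier, refreshed))
# {'added': [3], 'removed': [1], 'changed': [2]}
```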

In regard to the size of the result set, you could, as part of the criteria, let the user bring back results ranked by how closely they match the criteria, or in chunks ("show me 25/50/75/ALL results at a time").
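One way to read "percentage of matching" plus chunking, as a sketch: score each record by the fraction of criteria it satisfies, sort best-first, and page through the ranking. The scoring rule and the function names are assumptions for illustration; chunk_size=None stands in for "ALL".

```python
def match_score(record, criteria):
    """Fraction of the criteria this record satisfies (1.0 = full match)."""
    if not criteria:
        return 1.0
    return sum(record.get(k) == v for k, v in criteria.items()) / len(criteria)


def ranked_chunks(records, criteria, chunk_size=25):
    """Yield records best-match-first in pages of chunk_size; None means 'ALL'."""
    # sorted() is stable, so equally scored records keep their original order.
    ranked = sorted(records, key=lambda r: match_score(r, criteria), reverse=True)
    if chunk_size is None:
        yield ranked
        return
    for start in range(0, len(ranked), chunk_size):
        yield ranked[start:start + chunk_size]


records = [{"id": 1, "region": "east", "status": "closed"},
           {"id": 2, "region": "east", "status": "open"},
           {"id": 3, "region": "west", "status": "closed"}]
criteria = {"region": "east", "status": "open"}
pages = [[r["id"] for r in page] for page in ranked_chunks(records, criteria, chunk_size=2)]
print(pages)  # [[2, 1], [3]]
```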

I hope this helps or moves the ideas along. I love search-related topics.
Jennifer
==========================================================

-----Original Message-----
From: discuss-bounces at lists.interactiondesigners.com [mailto:discuss-bounces at lists.interactiondesigners.com] On Behalf Of Alan Wexelblat
Sent: Monday, November 02, 2009 5:16 PM
To: list IXDA
Subject: [IxDA Discuss] Search against a large, rapidly changing data set?

________________________________________________________________
Welcome to the Interaction Design Association (IxDA)!
To post to this list ....... discuss at ixda.org
Unsubscribe ................ http://www.ixda.org/unsubscribe
List Guidelines ............ http://www.ixda.org/guidelines
List Help .................. http://www.ixda.org/help
This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.

3 Nov 2009 - 11:30am
Jennifer Vignone
2008

I am wondering whether the data changes so frequently that the app would be in a constant state of being updated.
What if users could select which criteria they wanted to be alerted about when updates arrive?
That would give them a more customized alert, which speaks to Alan's original concern about not bringing back too much data... it could perhaps target which results get re-run.

-----Original Message-----
From: sreeramen ramaswamy [mailto:sreeramen at gmail.com]
Sent: Tuesday, November 03, 2009 11:28 AM
To: Jennifer R Vignone
Cc: Alan Wexelblat; list IXDA
Subject: Re: [IxDA Discuss] Search against a large, rapidly changing data set?

If you are looking at personalizing the search and pushing content to
the user, that would be an additional help: users could be informed
when new content is added.

Maybe widgetize the content.

4 Nov 2009 - 11:00am
jaketrimble
2008

I think Tom and I share the same thought on this. Have you ever used
MS Outlook 2007's Search? It returns results to you while it is
still searching through your thousands of messages. A UI with a
subroutine that constantly searches and updates the results (or at
least lets the user know there are new results) would be useful in
this instance.
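The progressive-results behavior described above can be sketched as a generator that scans in fixed-size steps and hands each step's hits to the UI immediately. This is an illustrative assumption about the mechanism, not how Outlook actually implements its search; incremental_search() and scan_step are hypothetical.

```python
from itertools import islice


def incremental_search(records, predicate, scan_step=1000):
    """Scan in steps of scan_step records, yielding each step's hits right away.

    The UI can render these partial results (and a 'still searching' indicator)
    instead of waiting for the whole scan to finish.
    """
    it = iter(records)
    while True:
        chunk = list(islice(it, scan_step))
        if not chunk:
            return
        hits = [r for r in chunk if predicate(r)]
        if hits:
            yield hits


shown = []
for hits in incremental_search(range(10), lambda n: n % 3 == 0, scan_step=4):
    shown.extend(hits)
    print(f"{len(shown)} results so far, still searching...")
print("done:", shown)  # done: [0, 3, 6, 9]
```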
