Why I’m an Optimist: Large Scale Macro Trends

A symbol, to me, of the power of our future and our technology: I traveled thousands of miles in mere hours to snap this digital photo and capture it on a chip the size of a postage stamp.

While we move forward in technology at a furious pace, there certainly are some huge gaps, and as those gaps and verges close we will see many new things that nobody predicted or could have predicted, and we will see old things fade away. The largest scale macro trends will continue regardless of gaps and pitfalls; if one path to the future closes, a thousand others will open. It's been that way for most of our history, and that's a large scale macro trend I don't expect to falter.

In future articles I'm going to outline some gaps I've seen, and potential means to close them. Please keep in mind, however, that nobody can predict the future; you can only predict trends. Even then you are likely to get the future wrong if you look only at micro trends or individual macro trends: you cannot predict which trends will continue and which will end. You can only look at the large confluences of trends and guess at which are most likely to continue. In other words, you know it's likely that the Mississippi will make it to the Gulf of Mexico regardless of the oxbows and loops it makes along the way.

Think of the sharp trend lines and market charts that once tracked VHS and Betamax tape manufacturing and sales to get an idea of what I mean. At the dawn of the tape age, nobody could predict with certainty the micro trend of the war between the formats, or whether the macro trend of tape sales in general would continue, but it was easy to step back and see the larger scale macro trend of the generic technology: data storage media would continue to change, but storage would keep becoming less expensive, smaller in form and format, and more widely available.

Whether that tape machine was capturing and streaming back data in your Betamax deck, your VHS player, or your computer-room backup tape carousel, it was all same-same when you consider the larger macro function of the technology: capturing data for preservation and/or later playback. That was global.

The generic larger purpose of tapes and the various tape formats was to record and preserve data. The real trend wasn't between the formats or the physical shapes or the protocols: it was toward more data in less physical space for less cost. That large scale macro trend was occurring in all formats, from silicon to tape to hard drives to optical, and it continues to this day. It's also quite possible that some other technology will replace both Blu-ray and HD DVD before that format battle ever finishes.

A few years back it would have been a massive project in capital and expense to perform a one-time physical transfer of 650 gigabytes of data between two companies or vendors; right now a single person could pop into Sam's Club or Best Buy, pick up a 1 or 2 terabyte USB drive, and get that transfer done in under two hours if you eliminate the travel time. Even better, you can see storage devices becoming something a bit more than just storage devices. One example is the "Eye-Fi" card: it's specialized storage for digital cameras, but it also adds geotagging and Wi-Fi networking to your camera. It's the size of a postage stamp and about as thick as a couple of quarters.

There are cards with larger capacities, and you could put an entire K-12 education in the space of one of these postage-stamp cards if you worked at it.
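Here is a rough back-of-the-envelope sketch of why that claim is plausible. Every number in it (card size, textbook size, grade and subject counts) is an assumption of mine for illustration, not a figure from anywhere in particular.

```python
# Back-of-the-envelope check on the claim above, with invented but
# plausible assumptions about sizes.
card_gb = 32                      # a postage-stamp SD card of the era
textbook_mb = 10                  # one textbook typeset as a PDF
grades, subjects = 13, 8          # K-12, eight subjects per grade

textbooks = grades * subjects                 # 104 books
textbook_gb = textbooks * textbook_mb / 1024  # roughly 1 GB

print(f"{textbooks} textbooks take roughly {textbook_gb:.1f} GB")
print(f"leaving about {card_gb - textbook_gb:.0f} GB for audio and video lessons")
```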

So when looking at the longer-term future, to get to accurate predictions of trends you must take them to a higher functional, large scale level and look at them as very large scale macro trends. Worldwide soybean production going up is not a large scale macro trend. The large scale macro trend behind that simplex market trend is that food supplies, and therefore diets, are diversifying globally.

This large scale macro trend is the confluence of several technologies crossing verges, and no particular macro trend (in the marketing, woo-woo “we are trying to sell you something” definition of macro trends) is responsible.

Instead all are somewhat needed: better packaging, preservation, transport, free trade, the spread of diverse cultural cooking methods through the internet and television, and so on. Don't worry, however, foodies: if any of these smaller macro trends falters, something else will take its place. The large scale macro trend of more diverse diets and food supplies is not going to end anytime soon, because large scale macro trends are measured in centuries and millennia, not years and decades. They are determined only in part by demographics and desires, as marketers would tell you, but also by technology. There might be momentary fluctuations, some that last a decade or two, or even some like the Dark Ages that last centuries, but the large macro trends will continue. Other examples of large scale macro trends:

  • Worldwide capital increases
  • Our sources of energy multiply
  • Our ability to store and transfer knowledge increases
  • Life spans increase

Technology becomes more complex and more capable while becoming more accessible, and individual powers and capabilities increase with it. In my garage sits a car with more horsepower than most medieval kings could muster on a moment's notice; in my computer is a powerful media studio that can broadcast to the world over the internet; on tap at the nearest electrical outlet is more energy than was ever held by all of the tyrants in history who kept human slaves.

The particular places or times where these large scale macro trends fail are the exceptions, not the rule. Afghanistan and Sub-Saharan Africa are two places where the overall trends are broken right now, but they are the exceptions, and over time even those places will improve.

So it is that I am a confident optimist, based on the past example of our long history. Whatever pratfalls, missteps, and tumbles humanity has taken, we have always managed to dust ourselves off and carry on with the journey. As we witness one of those pratfalls in the Gulf of Mexico, one that will become the biggest environmental disaster since we started recording them, I am also confident that over time the problem can be overcome.

Signal to Noise and the Future of the Net

Tired of Wall of Mud Search returns?

As we gain more people, more content, and more raw data on the network, we have to get smarter about how we use it. Our tool-kits are going to grow in ability, but can the tools keep up with the pace?

One attempt at this is Wolfram Alpha, a tool designed to answer natural-language questions more or less intelligently with an assemblage of pertinent data on a page; from what I have seen, it is somewhat like a mash-up of data filtered by your parameters. Think of it as a Rosetta Stone for the types of factual data accessible on the internet, since it does conversions, comparisons, and charts.

After watching one demo video it does appear exciting, but I fully expect WA to run into some of the same problems that natural-language Interactive Voice Response units (NL-IVRs) have, along with the problems that Google and Wikipedia both have.

You've probably met one of these NL-IVRs over the phone: they generally go into their spiel and along the way tell you to "just ask" for what you want.

Setting aside the voice recognition faults and dictionary tuning tasks, using a natural-language dictionary for even limited applications, like directing customers between the billing department and the customer service department, takes some skill, because you can't guess every combination of words a customer might use to ask for billing or for customer service. One might say "bill," another "billing," another "pay," another the name of the product plus "collections," and so on. There's also the question of what you do when someone asks for something outside the standard menu.
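To make that concrete, here is a minimal sketch of that kind of keyword-and-synonym routing with a fallback for off-menu requests. It is purely illustrative: the department names, synonym lists, and fallback are my own assumptions, not how any particular IVR vendor actually does it.

```python
# Hypothetical sketch of "just ask" routing: match the caller's words against
# per-department synonym lists and fall back to an operator when nothing
# matches. All names and word lists here are invented for illustration.

ROUTES = {
    "billing": {"bill", "billing", "pay", "payment", "invoice", "collections"},
    "customer_service": {"help", "support", "service", "complaint", "agent"},
}

def route_request(utterance: str) -> str:
    """Return the department whose synonyms best match the caller's words,
    or 'operator' for anything outside the standard menu."""
    words = set(utterance.lower().split())
    best_dept, best_hits = "operator", 0
    for dept, synonyms in ROUTES.items():
        hits = len(words & synonyms)
        if hits > best_hits:
            best_dept, best_hits = dept, hits
    return best_dept

print(route_request("I want to pay my bill"))        # -> billing
print(route_request("can I talk to a real person"))  # -> operator (off-menu)
```

The hard part, as noted above, is not the matching itself but growing those synonym lists to cover the ways real callers actually phrase things.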

Wolfram's advantage is that it doesn't have to deal with interpreting language and dialects from sound, but on the other hand it does have its own "mispronunciations" in the form of typos and colloquialisms.

Wolfram also has a further advantage over NL-IVRs in that the entire language is out there on the network and defined, and Wolfram will have access to all of that. On the other hand, it wouldn't hurt Wolfram's chances of success to have a chat with some of the NL-IVR industry leaders to see what they struggle with in practice every day. Like some of the first natural-language sound dictionaries (the University of Oregon and their 800-number survey collection of dialects comes to mind) and the major search engines, I suspect that Wolfram will use some crowd-sourcing to iteratively tune itself.

As the net grows, millions more people are added annually, new sources of raw data come online, and the content is becoming much richer. That said, the poor signal-to-noise ratio of those additions is quickly becoming alarming. When I was a roadie it was very important to keep the signal strong at the starting point, and as you amplified it was important to keep each amp in the chain leading to the last mixer at maximum strength without distortion if you wanted clean, pure sound. (Let's forgive the distortionists and their wall-of-mud sound for the moment; it was interesting as a primitive artistic experiment, but when the day is done you want to hear the expression of the individual instruments and voices woven together cleanly, and those are the songs that will truly last.)

For a search on any specific topic you have to wade through a morass of sites, some using keywords and search engine optimization just to feed you their own page of links rather than the result you first chose. Some are not information sites at all, but splogger, retread, and disinformation sites: they have all the right words, but they are telling lies or redirecting you to nonsense and away from the original or actual content. They are the distortion and the noise in the net. How do we get to authority, how do we get to relevant searches, how do we get to trusted and genesis sources?

The major search engines have made sorting signal from noise into their business, but they are beginning to fail under the load. Merely cataloguing data, ranking through keywords, links, and page hits as authority measures, and popping that stack when a search comes in just isn't enough. Indeed, it's rather a simplex approach, as the protagonist found out in Samuel Delany's "Empire Star/Babel-17" upon meeting the simplex culture of the galactic encyclopedia for the first time. (That pair of linked novelettes also explores language, its nature, and how it shapes culture, and so is an interesting and entertaining read in its own right.)

Back to the subject at hand: the tools we have for transforming raw data into usable information are improving, and you can see that Wolfram is taking advantage of them. Whole databases have been put online, and places like Gapminder.org are working on methods to take that data and turn it into useful information in the form of tweakable charts. That is to the good, but as more data comes online, how do we get from simple line graphs to the search that pops up the right control chart for a specific question? How do we ensure that the right data set is chosen? Therein lies the rub in taking data from raw to information form. The next step in the chain is taking that information and converting it into usable intelligence, and we really haven't gotten there anywhere that I've seen yet; perhaps Wolfram is the first real reach to crack that door open.
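As a toy illustration of that raw-data-to-information step, here is a minimal sketch of turning a plain series of measurements into the makings of a control chart: a center line plus upper and lower control limits. The numbers are invented, and this is only a back-of-the-envelope example of mine, not anything Wolfram or Gapminder actually does.

```python
# Toy example: raw measurements in, control-chart information out.
# Center line is the mean; control limits are the mean plus/minus three
# standard deviations. All numbers are invented for illustration.
from statistics import mean, stdev

raw_series = [50.1, 49.8, 50.4, 50.0, 49.7, 50.6, 50.2, 49.9, 50.3, 50.0]

center = mean(raw_series)
sigma = stdev(raw_series)
upper_limit = center + 3 * sigma
lower_limit = center - 3 * sigma

out_of_control = [x for x in raw_series if not (lower_limit <= x <= upper_limit)]

print(f"center line: {center:.2f}")
print(f"control limits: {lower_limit:.2f} to {upper_limit:.2f}")
print(f"points outside the limits: {out_of_control}")
```

The arithmetic is trivial; the hard, unsolved part is the step before it, namely picking the right data set and the right chart for the question being asked.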

In the meantime we are saddled with the growing babel of search optimization as commercial interests compete with social networks to steal the search mojo. In that open environment, what tools can be used to get to factual, pertinent content and genesis sources where required? There are various means in practice, but right now they amount to measuring popularity in a few different ways, and popularity does not usually equate to trustworthy or authoritative sources. Few of us have 100 years to live, and poking through three pages of links to get to the pertinent sites is a waste of valuable time. The novelty of noodling through the net is also wearing off for the general public: they want what they want right now, and they are growing tired of distortion and wall-of-mud searches.

Both commercial and social ranking sites have created another phenomenon of the web, something I'll call "yellow searchalism" for now. The snarkier and more sensational your headline, excerpt, and tags are, the better the chance that your article will be ranked higher. It's like the yellow journalism of the past: just as the more alluring headline sold more papers, the more obnoxious or sensational tagline gets more hits.

There are also problems in communities that ding up and ding down, such as Digg and Little Green Footballs. While they have completely different political bases, each has "thought leaders" who, if they plus something up, are more likely to be followed by others plussing it up as well. Yellow searchalism and time of day also affect ratings at these sites. An article posted at the right time of day with a snarky headline is more likely to go up in rank than the same article posted off-peak with a mundane, factual headline. Each community is attracted to specific interests: you are more likely to find technology, entertainment, and humor on Digg, while LGF leans more toward news, politics, science, and technology.

To Charles Johnson's credit, Little Green Footballs is also pioneering a filter system in the form of "monitor lizards" who remove links to non-factual sources, kookspiracy or hate sources, and who also clean out some of the hysterical and hyperbolic, while Digg doesn't appear to have any similar mechanism in place.

One of the means of search ranking is a mix of several methods: number of hits, links to the page, number of times your terms appear, and similar quotes and citations. Most search engines will not divulge their full means, since that would allow you to "hack the stack." But as seen with Google bombs and search page ranking races, that's not working effectively more than half of the time. People who were once attracted to the salacious and the attractive are getting frustrated now because they aren't getting exactly what they asked for; distortion and wall-of-mud searches are going out of style.
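To show what that kind of blended ranking might look like, here is a toy sketch that folds term matches, inbound links, and hit counts into a single score. The weights, the log scaling, and the page records are all my own invention; no real search engine publishes its actual formula.

```python
# Toy blended ranking: term matches, inbound links, and hit counts folded
# into one score. Weights and page records are invented; this is not any
# real engine's formula.
import math

pages = [
    {"url": "example.org/carbon-dating", "term_matches": 14, "inbound_links": 220, "hits": 9000},
    {"url": "example.com/seo-link-farm", "term_matches": 40, "inbound_links": 15, "hits": 150000},
]

def toy_score(page, w_terms=1.0, w_links=2.0, w_hits=0.5):
    # Log-scale each signal so a keyword-stuffed link farm can't win on
    # raw volume alone.
    return (w_terms * math.log1p(page["term_matches"])
            + w_links * math.log1p(page["inbound_links"])
            + w_hits * math.log1p(page["hits"]))

for page in sorted(pages, key=toy_score, reverse=True):
    print(f"{toy_score(page):6.2f}  {page['url']}")
```

Even in this toy, the interesting decisions are the weights, and that is exactly what Google bombers and SEO shops spend their time reverse-engineering.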

We have to get better at homing in on what is truly asked for versus what's popular or heavily pimped, and some are trying to do this by tailoring results to stated individual preferences and past preferences. Some examples of this are iTunes' Genius and YouTube's "recommended for you," and other examples are in this article. The negative of "tailored for you" approaches to ranking is that they box individuals into a room of the same, where they can lose all sight of the new. When you want a fresh view of new things, coloring that view with past bias is not really a good thing, and it can stultify creativity.

The other factor that weighs heavily on the net: search engines can't tell when you are looking for empirical fact and when you are looking for entertainment or fantasy, and there are no dotted lines between information and disinformation. So when searching for the empirical you might end up at a speculative entertainment site, a political site with a bias, or somewhere else entirely. One example: if you type in "carbon dating accuracy," five of the top ten links will take you to young-earth creationist pseudo-science sites that will tell you carbon dating is bunk, when it's really a proven method.

So what means are there for trust and authority? Here are a few, some in use, some not:

  • First mention of terms: are "genesis" and original authorship really ranked or given credence by most engines?
  • Number of links back (a traditional but last-century approach to authority, which sometimes confuses popularity with authority)
  • Number of “updings” at a mix of social sites (popularity)
  • Length of time spent on page vs. length of content (authority)
  • Links from authoritative sites with authority measured in scholasticism instead of number of link backs (authority/trust)
  • Ratio of facts to data (one that I haven't a clue how to measure)
  • Think tank links (authority)
  • .edu links (authority)
  • Entertainment vs. Information: numbers of links from categories of sites (entertainment, news, sports, humor, e.g. traditional classification)
  • Past preferences of the individual
  • Filters: what's in place to stop disinformation? (authority; I haven't a clue how to do this without human watchers, who will have their own bias, a la the wiki page-reversion wars we've seen)

Now if you mix those all together and drive it with pseudo-AI in a well-mannered way, you might improve the system; a rough sketch of that kind of blending follows below. The first ones to do this well have a great chance to displace Google. Also keep in mind that with the "get your raw data online and accessible" movement well underway, similar tools will be needed for the authority of databases. And as we move to a fully rich-media world, how do you mine a video for tags? Will natural-language voice recognition be woven into search engines for audio and video content that right now relies on users and others to hand-tag it with text?
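Here is that rough sketch: a hypothetical blend of several of the trust and authority signals listed above into one composite score. Every signal name, weight, and page record is an assumption of mine for illustration; this is not how any real engine works, just a way to make the idea concrete.

```python
# Hypothetical blend of trust/authority signals into a composite score.
# Signal names, weights, and page records are invented for illustration.

SIGNAL_WEIGHTS = {
    "scholarly_links": 3.0,   # links from .edu and scholarly sources
    "dwell_ratio": 2.0,       # time on page vs. length of content
    "social_updings": 0.5,    # popularity, deliberately down-weighted
    "first_mention": 1.5,     # genesis / original authorship
    "disinfo_flags": -4.0,    # filter hits count strongly against a page
}

def composite_score(signals: dict) -> float:
    """Weighted sum of normalized (0..1) signal values for one page."""
    return sum(SIGNAL_WEIGHTS[name] * value
               for name, value in signals.items()
               if name in SIGNAL_WEIGHTS)

candidate_pages = {
    "university-lab-writeup": {"scholarly_links": 0.9, "dwell_ratio": 0.7,
                               "social_updings": 0.2, "first_mention": 0.8,
                               "disinfo_flags": 0.0},
    "snarky-viral-retread":   {"scholarly_links": 0.1, "dwell_ratio": 0.3,
                               "social_updings": 0.9, "first_mention": 0.1,
                               "disinfo_flags": 0.6},
}

for name, signals in sorted(candidate_pages.items(),
                            key=lambda kv: composite_score(kv[1]),
                            reverse=True):
    print(f"{composite_score(signals):6.2f}  {name}")
```

Notice that the scholarly and authorship signals are weighted above raw popularity, and disinformation flags count heavily against a page; how you measure those signals honestly is the open problem the list above keeps running into.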

Finally: what other means are there of classifying, codifying, and sorting the net? What are your ideas?