Spidering and linguistic analysis are just some of the tools now being used by law enforcement and intelligence researchers to trawl the web for terrorist-related information and contacts. The project needs serious computational power because of the sheer volume of web traffic that has to be examined.
"One of the tools developed by Dark Web is a technique called Writeprint, which automatically extracts thousands of multilingual, structural, and semantic features to determine who is creating 'anonymous' content online. Writeprint can look at a posting on an online bulletin board, for example, and compare it with writings found elsewhere on the Internet. By analyzing these certain features, it can determine with more than 95 percent accuracy if the author has produced other content in the past. The system can then alert analysts when the same author produces new content, as well as where on the Internet the content is being copied, linked to or discussed." (more there)
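To make the idea of comparing one posting against other writings concrete, here is a minimal stylometry sketch. It is not Writeprint's actual method, which the article does not describe in detail; it assumes a much simpler stand-in technique (character trigram frequencies compared by cosine similarity), and the function names `char_ngrams`, `cosine_similarity`, and `style_similarity` are made up for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def style_similarity(text_a, text_b, n=3):
    """Score from 0.0 to 1.0; higher means more similar character-level habits."""
    return cosine_similarity(char_ngrams(text_a, n), char_ngrams(text_b, n))
```

A real system would presumably use far more features than this (the article mentions thousands) and a trained classifier rather than a single similarity score, so treat the number this returns as a toy signal only.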
ed: and who determines who or what *is* a terrorist, so that someone winds up on some list? We have already seen serious abuses with the "no fly" list, and people affected by it have zee-ro recourse to find out why they were on it in the first place, whether or not the information is accurate, or even whether the information behind some bureaucrat's decision actually relates to them rather than to a case of mistaken identity. How far are they going to take this? Enemy of the state level? Remember this (in)famous "decided" proclamation, "you are with us or with the terrorists". How about "hate speech", will that count as terrorism? I've seen that label severely abused as well.
Technically, all the data being accessed by this new software was available to the public anyway. That said, I do see a ton of potential for abuse here. The article, being from the NSF, only talks about it being used on terrorists, but like zogger said, the definition of a terrorist can be flexed. If somehow anyone who disagrees with the president is considered a terrorist, then this software could be used to track down anonymous dissent.
It's also mighty hard to counter this kind of AI: no software exists to counter it, so the only way is to train yourself to vary your typed speech patterns.
Hmmm...I wonder how hard it would be to create a "language standardizer". A script that took a block of text and changed the style to make it more "normal". Sort of like a voicebox but for text. Anything you wanted to post "anonymous" would be run thru this script first, which would -- in essence -- transform it to syntax used by your average 10-year-old.
Get enough people using these and they'd be hard pressed to filter things out.
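For what it's worth, a crude version of that "language standardizer" is only a few lines. The sketch below is an assumption-laden toy, not a real countermeasure: the `standardize` function and the tiny `CONTRACTIONS` table are hypothetical, and it only flattens surface habits (capitalization, contractions, repeated punctuation, stretched-out words, spacing). Word choice and sentence structure, which an analyzer would also pick up on, pass straight through.

```python
import re

# Hypothetical, deliberately tiny table; a usable tool would need a much larger one.
CONTRACTIONS = {
    "i'll": "i will",
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
}


def standardize(text):
    """Flatten a few surface-level writing habits into one bland style."""
    text = text.lower()
    # Expand the contractions we know about.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Collapse runs of the same punctuation mark: "!!!" -> "!", "..." -> "."
    text = re.sub(r"([!?.])\1+", r"\1", text)
    # Collapse letters repeated three or more times: "sooo" -> "so"
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Normalize all whitespace to single spaces.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(standardize("I'll  NEVER do that!!!  Sooo  weird..."))
```

Everything you wanted to post "anonymous" would go through `standardize` first. Whether this would actually defeat a feature-rich analyzer is an open question; it removes only the most obvious tells.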
Keep in mind, the Unabomber was initially identified by his writing style, so I'll bet the powers that be have been pursuing this for decades at least.
isms for me. Somebody once did an "isms" search on my Slashdot Journal, the results were highly entertaining (in that MOST of my posts were about the "forbidden" subjects of "religion, economics, and politics").
Data Mining for the Bad Guys