 
Source: ONLamp.com
Module: csv
Purpose: Read and write comma-separated value files.
Python Version: 2.3 and later

Description:
The csv module is very useful for working with data exported from spreadsheets and databases into text files. There is no well-defined standard for the format, so the csv module uses “dialects” to support parsing with different parameters. Along with a generic reader and writer, the module includes a dialect for working with Microsoft Excel.

Limitations:
The Python 2.5 version of csv does not support Unicode data. There are also “issues with ASCII NUL characters”. Using UTF-8 or printable ASCII is recommended.

Reading:
To read data from a csv file, use the reader() function to create a reader object. The reader can be used as an iterator to process the rows of the file in order. For example:

    import csv
    import sys

    f = open(sys.argv[1], 'rt')
    try:
        reader = csv.reader(f)
        for row in reader:
            print row
    finally:
        f.close()

The first argument to reader() is the source of text lines. In this case it is a file, but any iterable is accepted (StringIO instances, lists, etc.). Other optional arguments can be given to control how the input data is parsed.

The example file “testdata.csv” was exported from NeoOffice.

    $ cat testdata.csv
    "Title 1","Title 2","Title 3"
    1,"a",08/18/07
    2,"b",08/19/07
    3,"c",08/20/07
    4,"d",08/21/07
    5,"e",08/22/07
    6,"f",08/23/07
    7,"g",08/24/07
    8,"h",08/25/07
    9,"i",08/26/07

As it is read, each row of the input data is converted to a list of strings.

    $ python csv_reader.py testdata.csv
    ['Title 1', 'Title 2', 'Title 3']
    ['1', 'a', '08/18/07']
    ['2', 'b', '08/19/07']
    ['3', 'c', '08/20/07']
    ['4', 'd', '08/21/07']
    ['5', 'e', '08/22/07']
    ['6', 'f', '08/23/07']
    ['7', 'g', '08/24/07']
    ['8', 'h', '08/25/07']
    ['9', 'i', '08/26/07']

The csv module does not convert the input automatically. If you know that certain columns have specific types, you can convert the strings yourself.
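reader() accepts any iterable and leaves type conversion to the caller, so one example can illustrate both points. The sketch below is written for Python 3, whereas the listings in this article are Python 2. It uses a list of strings copied from testdata.csv as input. The parse_row() helper is hypothetical, as is its choice of int and datetime.date for the three columns. Both are included only for illustration.

```python
import csv
from datetime import datetime

# A few lines copied from testdata.csv; any iterable of strings is accepted.
lines = [
    '"Title 1","Title 2","Title 3"',
    '1,"a",08/18/07',
    '2,"b",08/19/07',
]


def parse_row(row):
    """Hypothetical helper: turn one row of strings into typed values."""
    number, letter, date_text = row
    # Assumes month/day/two-digit-year dates, as in testdata.csv.
    return int(number), letter, datetime.strptime(date_text, '%m/%d/%y').date()


reader = csv.reader(lines)
header = next(reader)  # the first row holds the column titles
rows = [parse_row(row) for row in reader]

print(header)   # ['Title 1', 'Title 2', 'Title 3']
print(rows[0])  # (1, 'a', datetime.date(2007, 8, 18))
```

The header row is consumed with next() before the loop, so parse_row() only ever sees data rows.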
The csv module does handle line breaks embedded within strings in a row (which is why a “row” is not always the same as a “line” of input from the file).

    $ cat testlinebreak.csv
    "Title 1","Title 2","Title 3"
    1,"first line
    second line",08/18/07

    $ python csv_reader.py testlinebreak.csv
    ['Title 1', 'Title 2', 'Title 3']
    ['1', 'first line\nsecond line', '08/18/07']

Writing:
When you have data to be imported into some other application, writing CSV files is just as easy as reading them. Use the writer() function to create a writer object, then call its writerow() method once for each row to write.

    import csv
    import sys

    f = open(sys.argv[1], 'wt')
    try:
        writer = csv.writer(f)
        writer.writerow( ('Title 1', 'Title 2', 'Title 3') )
        for i in range(10):
            writer.writerow( (i+1, chr(ord('a') + i), '08/%02d/07' % (i+1)) )
    finally:
        f.close()

The output does not look exactly like the exported data used in the reader example:

    $ python csv_writer.py testout.csv
    $ cat testout.csv
    Title 1,Title 2,Title 3
    1,a,08/01/07
    2,b,08/02/07
    3,c,08/03/07
    4,d,08/04/07
    5,e,08/05/07
    6,f,08/06/07
    7,g,08/07/07
    8,h,08/08/07
    9,i,08/09/07
    10,j,08/10/07

The writer's default quoting behavior (QUOTE_MINIMAL, described below) only quotes fields that contain special characters, so the string columns are not quoted. That is easy to change by adding a quoting argument to quote non-numeric values:

    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

And now the strings are quoted:

    $ python csv_writer_quoted.py testout_quoted.csv
    $ cat testout_quoted.csv
    "Title 1","Title 2","Title 3"
    1,"a","08/01/07"
    2,"b","08/02/07"
    3,"c","08/03/07"
    4,"d","08/04/07"
    5,"e","08/05/07"
    6,"f","08/06/07"
    7,"g","08/07/07"
    8,"h","08/08/07"
    9,"i","08/09/07"
    10,"j","08/10/07"

Quoting:
There are four different quoting options, defined as constants in the csv module.

QUOTE_ALL: Quote everything, regardless of type.

QUOTE_MINIMAL: Quote fields with special characters (anything that would confuse a parser configured with the same dialect and options). This is the default.

QUOTE_NONNUMERIC: Quote all fields that are not integers or floats.
When used with the reader, input fields that are not quoted are converted to floats.

QUOTE_NONE: Do not quote anything on output. When used with the reader, quote characters are included in the field values (normally, they are treated as delimiters and stripped).

Dialects:
There are many parameters to control how the csv module parses or writes data. Rather than passing each of these parameters to the reader and writer separately, they are grouped together conveniently into a “dialect” object. Dialect classes can be registered by name, so that callers of the csv module do not need to know the parameter settings in advance. The standard library includes two dialects: excel and excel-tab. The “excel” dialect is for working with data in the default export format for Microsoft Excel, and also works with OpenOffice or NeoOffice. For details on the dialect parameters and how they are used, refer to section 9.1.2 of the standard library documentation for the csv module.

DictReader and DictWriter:
In addition to working with sequences of data, the csv module includes classes for working with rows as dictionaries. The DictReader and DictWriter classes translate rows to dictionaries. Keys for the dictionary can be passed in, or inferred from the first row in the input (when the row contains headers).

    import csv
    import sys

    f = open(sys.argv[1], 'rt')
    try:
        reader = csv.DictReader(f)
        for row in reader:
            print row
    finally:
        f.close()

The dictionary-based reader and writer are implemented as wrappers around the sequence-based classes, and use the same arguments and API.
The only difference is that rows are dictionaries instead of lists or tuples.

    $ python csv_dictreader.py testdata.csv
    {'Title 1': '1', 'Title 3': '08/18/07', 'Title 2': 'a'}
    {'Title 1': '2', 'Title 3': '08/19/07', 'Title 2': 'b'}
    {'Title 1': '3', 'Title 3': '08/20/07', 'Title 2': 'c'}
    {'Title 1': '4', 'Title 3': '08/21/07', 'Title 2': 'd'}
    {'Title 1': '5', 'Title 3': '08/22/07', 'Title 2': 'e'}
    {'Title 1': '6', 'Title 3': '08/23/07', 'Title 2': 'f'}
    {'Title 1': '7', 'Title 3': '08/24/07', 'Title 2': 'g'}
    {'Title 1': '8', 'Title 3': '08/25/07', 'Title 2': 'h'}
    {'Title 1': '9', 'Title 3': '08/26/07', 'Title 2': 'i'}

The DictWriter must be given a list of field names so it knows how the columns should be ordered in the output.

    import csv
    import sys

    f = open(sys.argv[1], 'wt')
    try:
        fieldnames = ('Title 1', 'Title 2', 'Title 3')
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        # Write a header row by mapping each field name to itself.
        headers = {}
        for n in fieldnames:
            headers[n] = n
        writer.writerow(headers)
        for i in range(10):
            writer.writerow({ 'Title 1':i+1,
                              'Title 2':chr(ord('a') + i),
                              'Title 3':'08/%02d/07' % (i+1),
                              })
    finally:
        f.close()

    $ python csv_dictwriter.py testout.csv
    $ cat testout.csv
    Title 1,Title 2,Title 3
    1,a,08/01/07
    2,b,08/02/07
    3,c,08/03/07
    4,d,08/04/07
    5,e,08/05/07
    6,f,08/06/07
    7,g,08/07/07
    8,h,08/08/07
    9,i,08/09/07
    10,j,08/10/07
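Python 2.7 and later also provide DictWriter.writeheader(), which replaces the hand-built header dictionary used above. The sketch below is written for Python 3. In place of the file named on the command line, it writes to an in-memory io.StringIO buffer, so the output can be read straight back with DictReader. When writing to a real file in Python 3, the csv documentation recommends opening it with newline=''.

```python
import csv
import io

fieldnames = ['Title 1', 'Title 2', 'Title 3']

# In-memory buffer stands in for a file opened with newline=''.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()  # writes the field names as the first row
for i in range(3):
    writer.writerow({
        'Title 1': i + 1,
        'Title 2': chr(ord('a') + i),
        'Title 3': '08/%02d/07' % (i + 1),
    })

text = buffer.getvalue()
print(text.splitlines()[:2])  # ['Title 1,Title 2,Title 3', '1,a,08/01/07']

# Read it back: DictReader takes its keys from the header row.
rows = list(csv.DictReader(io.StringIO(text)))
print(rows[0]['Title 2'])  # a
```

As with the Python 2 reader, every value comes back as a string, including the integers written to the 'Title 1' column.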
 
Source: ONLamp.com

Jesse Noller is leading a campaign to have Python developers form a network via LinkedIn.com.
He talks about it over on his blog, so check it out for the details.

According to Doug Napoleone’s comment on Jesse’s post, there is a more formal effort to set up a PyCon08 group and tie it in with the web site for the convention. I didn’t realize that LinkedIn supported groups other than “employers”. It looks like the right way to go for the community is a “networking group” (there are only a few types, and the others seem to imply a more formal organization than what we would have). Unfortunately, the groups feature is closed at the moment.

For now, following Jesse’s lead, I set up a position with the “Python community” organization and job title “None”. I had to guess at the start date. :-)

I noticed several Python-oriented groups over on Facebook, so that might be an alternative if LinkedIn doesn’t come through. Somehow LinkedIn feels more professional; maybe I just have a historical bias based on Facebook’s origins, though.
 
Source: ONLamp.com

News of University of California Santa Cruz computer scientist Luca de Alfaro’s Wikipedia trust-coloring system revived - and improved - an idea I’ve been playing with: automated reputation-management for politicians. The idea is to make the concept of honor meaningful again, by creating new social rewards and penalties for behavior that affects the rest of us. (It could, of course, also be applied to journalists, corporate leaders or other public figures.) De Alfaro’s system, now operating in demo form on a sample of a few hundred Wikipedia pages, ranks the trustworthiness of Wikipedia authors by measuring how long their contributions last without being edited. Text contributed by the author is color-coded for trustworthiness: Text on white background is trusted text; text on orange background is untrusted text. Intermediate gradations of orange indicate intermediate trust values. I think it would be useful to be able to do the same thing with politicians’ names every time they appear on the web. Here’s how I think it might be spec’d:
Our software would crawl the pages of factcheck.org, looking for the names of politicians. The software would check to see if each name appeared in the context of a correction of an untruth/exaggeration/“misstatement”. The reputation of each politician would be scored according to how many appearances his/her name made in such negative contexts. Any time the politician’s name appeared on a web page, it would be displayed in a box of the appropriate color. In this case white might not be the best choice for “trustworthy”, since the politician might not be trustworthy, just unranked. So we might go spectrum-wise from green for “honest” to red for “frequent liar”. (On a relative scale - I’m not enough of a Puritan to believe there are people who are 100% honest or 100% dishonest.) This color-coded display could be accomplished either on the client side or the server side: on the client side as a browser plug-in, or on the server side as an extension of the publisher’s content management system.
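The scoring and coloring steps can be sketched in a few lines of Python 3. This is a hypothetical illustration, not a specification. It starts from a count of flagged appearances per name. The crawler and the detection of negative contexts, which would produce those counts, are not shown. Each name is scored relative to the most-flagged one, and that score is mapped linearly from green to red. The function names, the names, and the counts are all invented placeholders.

```python
def relative_scores(flag_counts):
    """Scale counts against the most-flagged name: 0.0 = never flagged, 1.0 = most flagged."""
    worst = max(flag_counts.values(), default=0)
    if worst == 0:
        # Nobody has been flagged yet, so everyone sits at the honest end.
        return {name: 0.0 for name in flag_counts}
    return {name: count / worst for name, count in flag_counts.items()}


def score_to_color(score):
    """Interpolate linearly from green (#00ff00) at 0.0 to red (#ff0000) at 1.0."""
    score = min(max(score, 0.0), 1.0)
    red = round(255 * score)
    green = round(255 * (1.0 - score))
    return '#{:02x}{:02x}00'.format(red, green)


# Placeholder input: hypothetical names and made-up counts.
counts = {'Senator X': 0, 'Senator Y': 3, 'Senator Z': 12}
colors = {name: score_to_color(s) for name, s in relative_scores(counts).items()}
print(colors)  # {'Senator X': '#00ff00', 'Senator Y': '#40bf00', 'Senator Z': '#ff0000'}
```

Scoring against the worst offender keeps the scale relative, as the post suggests. The trade-off is that every color shifts whenever the most-flagged count changes.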
I think there would be a strong value proposition for both consumers and publishers. Imagine the impact of seeing your news presented this way:

In response to a question on why the US is in Iraq, Senator X [name in a green box] said, “….”

vs.

In response to a question on why the US is in Iraq, Senator X [name in a red box] said, “….”

And imagine the possible impact on politicians’ respect for the truth. Currently, if factcheck.org or some other organization calls you out on a fabrication, the impact is more or less safely sequestered within that organization’s limited reach. This way, the impact could spread everywhere, the way good or bad word about someone’s reputation spreads through small real-world communities.

Why use factcheck.org as opposed to open ratings? If the reputation ranking were open, I think we could count on enormous amounts of abuse by partisans, including attempts to undermine all trust in the system. The people behind factcheck.org are journalism experts, and the site is avowedly non-partisan. But it might work to make the ranking system “porous” as opposed to fully open, like the new Publish2 journalism community, or in fact like granddaddy Slashdot. People who had themselves earned a reputation for honesty could be allowed to rank the honesty of others.

There would probably be claims, especially by those with names of an embarrassing color, that factcheck.org (or any other arbiter) is not in fact non-partisan. And so consumers might choose alternative arbiters, if it came to that. But here, too, some reputations would weigh more than others, as they always have.
 
Source: ONLamp.com

I rarely use a word processor: if I can’t write plain text in a text editor (strangely optimized for editing text), something’s gone weird in my world.
I can’t always avoid word processors, however. Occasionally I really need to read (or even edit) text saved from a word processor in something other than plain text. Usually my first task is to extract the relevant text from that format into something I might actually want to use.
AbiWord is my underappreciated aid. Not only is it fast (especially at startup), but it has one stunningly useful command line option: --to=format. That’s right. From the command line, you can convert a document in whatever evil, proprietary format AbiWord supports into something useful, without splash screens or menu popups or even taking your hand off the keyboard to move a mouse.
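The --to option can also be driven from a script for batch conversions. The following is a minimal Python 3 sketch. It assumes AbiWord is installed and available on the PATH as abiword, and that txt is accepted as a target format. The helper names are hypothetical. The sketch makes no assumption about where AbiWord writes the converted file.

```python
import shutil
import subprocess


def abiword_convert_command(path, fmt='txt'):
    """Build the argument list for converting one document with AbiWord's --to option."""
    return ['abiword', '--to={}'.format(fmt), path]


def convert(path, fmt='txt'):
    """Run the conversion; raises if abiword is missing or exits with an error."""
    if shutil.which('abiword') is None:
        raise FileNotFoundError('abiword is not on the PATH')
    subprocess.run(abiword_convert_command(path, fmt), check=True)


print(abiword_convert_command('report.doc'))  # ['abiword', '--to=txt', 'report.doc']
```

Passing an argument list rather than a shell string means file names containing spaces need no quoting.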
As a word processor, it’s fairly nice too, doing exactly what I need on those rare occasions when I really, really need a word processor.
Thanks to the AbiWord developers and everyone who’s contributed to the project!