Vmaps - Python Arrays on mmap() Version 1.1
Introduction

This module provides Vmap objects, which are extremely fast on-disk arrays with facilities for shared access by multiple processes, independent of Python or operating system threads or SysV IPC limits. The primary purpose of the Vmap is to access memory provided by mmap() as an array in Python. Data may be accessed as integers, floats, Python long integers, an array of "fixed length" strings, or as a single string. There are two-dimensional variants of the numeric access types. Interfaces to the madvise(), mlock(), and munlock() system calls are also present. Vmap objects can auto-magically

To facilitate use of the ability to share memory provided by mmap(), an interface to the architecture-dependent "atomic swap" instruction is included for Sparc and Intel. This provides the critical mutex component on which nearly any shared data locking scheme can be built, and brings the ridiculous ease and power of Python to parallel problems with little overhead. The included example program shows trivial examples of using Vmaps for inter-process communication and multiple simultaneous writers of a shared data set.
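For readers without the Vmaps module at hand, the core idea (mmap()'d memory accessed as an array) can be approximated with the standard library alone. The following is a minimal sketch in modern Python 3, not the Vmaps API: the function name `mapped_int_array_demo` is made up for illustration, and `memoryview.cast` stands in for what a Vmap of type Int does.

```python
import mmap
import os
import tempfile

def mapped_int_array_demo():
    """Map a small temporary file and treat its bytes as native C ints."""
    fd, path = tempfile.mkstemp()
    try:
        # The file must be at least as large as the area we map.
        os.write(fd, b"\x00" * mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE) as mem:
            # View the mapped bytes as an array of C ints.
            with memoryview(mem) as raw, raw.cast("i") as ints:
                ints[0] = 42
                ints[1] = ints[0] + 1
                return (len(ints) * ints.itemsize, ints[0], ints[1])
    finally:
        os.close(fd)
        os.remove(path)

if __name__ == "__main__":
    print(mapped_int_array_demo())
```

The views are released before the mapping is closed because the stock mmap object refuses to close while a buffer export is still alive.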
Normally you'd expect to see something about where to ask questions, report bugs, etc. here; but first: some begging!

This module was written partly to fill the author's need, the usual genesis of open projects. The versatile form of it before you, documented and released, far surpasses the original need, in an attempt to bring joy to the persons whose questions of "is there something like this?", going back yea unto the latter 90's, are all there is to be found when searching for "python mmap atomic" and similar keywords. Those who need this need it fairly badly, and some effort has been expended to make Vmaps useful (if not necessarily optimal) for everything the author can imagine. The reason for this extra effort wasn't altruism, but rather a cold-blooded plan to raise MONEY for the non-profit Snafu Center for Cognitive Science, which will be feeding the author as soon as it has any income. Most of the select group of users who have been itching to turn Python loose on big problems on big machines should be able to afford a donation of what this software is worth to them. (Those who, like the author, are managing to work with larger systems by gosh and golly and "good lord is that the power bill?" may defer donations until they have some income.) Supporters will have preference when it comes to support in the more traditional sense.

The author can be reached via email: dragon @ snafu.freedom.org and may usually be found lurking in #lair on irc.slashnet.org.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Installing
This module has been developed with Python 2.2 on:

Minimal yet successful testing has been done on:

Anything else: good luck; reports of failures and successes are welcome (see the Support section).
The latest version of the distribution should be available from: http://snafu.freedom.org/Vmaps/.

The files contained in the distribution are:

The standard Python Distutils are used, so compiling and installing this module into your interpreter is minimally a matter of running:

    python setup.py install

You probably want the optional code which this simple method won't get you; read on. Impatient types who just ran that command should skip ahead to the Tutorial now.

System Dependencies and Build Options

The important options are to the setup.py build_ext command, where you may define a preprocessor symbol to get the atomic swap routines and thereby header locking. The module should build and run just fine without those routines; the header lock functionality (see Headers) will not be included, nor will the

For Intel:

    python setup.py build_ext -D ARCH_IA32 install

For Sparc:

    python setup.py build_ext -D ARCH_SPARC -l rt install

The "-l rt" is needed to get the sched_yield() call on Solaris, does no harm on Linux/Intel, and should not be used on FreeBSD.
When returning sequences, either lists or tuples may be generated. Lists are used by default; tuples should be perceptibly faster, depending on the use. To change what type of sequence is returned, see the NEWRSEQ and SETRSEQ defines near the top of the Vmapsmodule.c file, and also the final parameter to the CTOPY_ARRAY macros.

The following defines are also available as module constants:

The size to which the optional Vmap header (see VM_HEADER) is rounded off. Odd-size offsets can cause core dumps and segfaults when accessing 8-byte floats on the SPARC, for example; and improperly aligned data access is astoundingly slow everywhere, it seems.
Default mmap() protection mode flags used if no
Default mmap() flags used if no
Default FLAGS for a new Vmap (see Flags).
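The header rounding described among the constants above is ordinary round-up-to-a-multiple arithmetic. As a minimal sketch (the function name `round_up` and the default alignment of 8 bytes are assumptions for illustration, not values taken from the module):

```python
def round_up(nbytes, alignment=8):
    """Round nbytes up to the next multiple of alignment.

    The default of 8 is an assumed value, chosen so that 8-byte
    doubles stored after the header start on an aligned offset.
    """
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # Ceiling division written with floor division on negated values.
    return -(-nbytes // alignment) * alignment

if __name__ == "__main__":
    for size in (0, 1, 8, 13, 23):
        print(size, "->", round_up(size))
```

With a header rounded this way, the data area that follows begins on an offset at which every numeric access type of that width or smaller is properly aligned.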
This documentation, while believed to be correct, isn't complete and could be clearer. There are too many "TODO"s left in the Tutorial. The code contains no docstrings and few comments.

This is relentlessly 32-bit code. Expect horrible failures trying to use this module on a 64-bit system, if it even compiles. No attempt to make large file support work has been made.

Switching a Vmap from VM_HEADER mode and back will cause problems; don't do that. The process of initializing a Vmap header for the first time could be smoothed out, too. Generally the two modes of operation are intended for different uses.

There is less error checking than there needs to be in some places, and the errors that are returned are not always sensible or informative.

There need to be a lot more methods to operate on the data "in place". Assignments of arrays and other objects could be optimized, possibly by using the buffer interface to those objects. Making the error checking optional might give some minor speed improvements: develop with the error checking enabled and turn it off for real runs. More systems need to be supported; for donations of large-memory SMP machines based on YOUR favorite CPU towards that end, please see the Support section.

Vmaps Module Reference

The module defines one function, which creates Vmap objects, and several integer constants to be used with those objects.

newmap (

Creates a new Vmap instance. Accepts keyword arguments, plus these variables can be changed after the Vmap has been created (see Attributes). The Vmap is not necessarily ready to be accessed after creation, unless the VM_AUTOPEN flag is used, the Vmap's

Constants

The size in bytes of a memory page on this system. The
Allows data to be read from the memory area.
Allows data to be written to the mapped memory area.
Allows execution of code in the mapped memory area. I don't think it's possible to make use of this from Python.

All processes using this area will share the same memory; data written by one process is visible instantly to all other processes.
Makes the mapping "Copy on Write"; each process has a private view of the data.
Used with a file descriptor of -1 and the MAP_PRIVATE flag to acquire memory not tied to a file.
Tells the OS not to reserve a swap page for every memory page mapped. See your system's references.

Constants to be given to msync(); also see the VM_ASYNCLOSE and VM_SYNCLOSE flags and the

Sync mapped memory to (or from) disk; do not return from the msync call until it's done.
Mark memory to be sync'd ASAP; return from the call immediately.
System dependent. The Solaris man page says: "MS_INVALIDATE invalidates all cached copies of data in memory, so that further references to the pages will be obtained by the system from their backing storage locations. This operation should be used by applications that require a memory object to be in a known state." But the Linux man pages say: "MS_INVALIDATE asks to invalidate other mappings of the same file (so that they can be updated with the fresh values just written)." This may mean the same thing.

Constants to be given to madvise(). See also the mm_advflags attribute and the

If the madvise() call is not present, these constants may not be present in the module.

Normal memory management is to be performed.
Advise the system that data in this region will be accessed in a random fashion. Minimal data will be read per access.
Tells the system that the data is likely to be accessed only once, so it should try to free resources as soon as it can after that access.
Inform the system that the data will be needed soon, so that it may begin reading as soon as possible.
Inform the system that the data is no longer needed and that it may start freeing resources.
Tell the system that the data is due to be overwritten. The system may actually dispose of the data, returning 0-filled pages on subsequent requests.

A single string.
An array of strings, each
An array of Python long integers, variable
An array of 4-byte long integers.
An array of 8-byte C "long long" integers, as Python long integers.
An array of 8-byte C doubles as Python floats.
An array consisting of
As above, based on 8-byte C long long integers as Python long ints.
As above, based on 8-byte C doubles as Python floats.

A Vmap object does not call its own
If this flag is not set, the Vmap will call its own
When set, this flag causes an madvise call to be done for the whole
mapped area when it is opened (explicitly with the
If set, msync() with flags MS_ASYNC will be called whenever the Vmap object is closed. If neither this flag nor VM_SYNCLOSE is set, no msync() call is done by the Vmap object before munmap() is called, and thus data could be lost.
If set, the timestamps returned by the
For the Char and FixChar access types, controls whether element and slice deletions, and "short" assignments, will clear data (to the value specified with the fillwith attribute).
For the FixLong access type, determines whether data is stored little endian (flag set) or big endian (unset).
For the FixLong access type, if set the data is treated as signed integers; otherwise they are unsigned.
Set for Vmaps that have headers. Data access will be offset into the mmap()'d area by the size of the header, which is read from the header; this obviously leads to potential problems initializing a new Vmap... for which case and others, see the next flag.
When set, do locking of the header. Unset, header access is unlocked and not safe if the data is shared. Depends on the atomic swap primitive (see Compile Options); if that isn't present, locks always succeed and header access is not safe for shared Vmaps. See also Header.
If the VM_HEADER flag is set, and the header data in the mmap()'d area appears corrupted or a lock fails: if this flag is set, the access attempt that caused the header to be read will fail with an exception; otherwise it will try to succeed in the possible absence of necessary data. In other words, you almost always want this if you have a header.
If set, the Vmap instance's access type is reset from the (shared) header on access; otherwise we access the mmap'd area with whatever access type the Vmap instance is using. The
If set, the

Vmap Object Reference

The Vmap object is a process-local cache of the variables necessary to call mmap(), and access the resulting memory as an array of Python objects. The behavior of a Vmap depends on the

Vmap objects do not do concatenation or repeating. They respond properly to:

    x = aVmap[N]
    aVmap[N] = x
    seq = aVmap[LO:HI]
    aVmap[LO:HI] = seq

... the 'in' operator (except for FixFloat type):

    if x in aVmap:

... and item and slice deletion (clears the element(s)):

    del aVmap[N]
    del aVmap[LO:HI]
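The sequence behaviour listed above (item and slice access, `in`, and deletion that clears rather than shrinks) can be imitated over a stock mmap object. This is a sketch in modern Python 3 under stated assumptions: the class name `IntView` is hypothetical, elements are native 4-byte ints, and only contiguous slices are handled. It is not how the Vmaps C module is implemented.

```python
import mmap
import struct

class IntView:
    """Hypothetical stand-in: fixed-length array of 4-byte ints over an mmap."""

    ELSIZE = 4

    def __init__(self, nbytes):
        self._mem = mmap.mmap(-1, nbytes)   # anonymous, zero-filled mapping
        self._n = nbytes // self.ELSIZE

    def __len__(self):
        return self._n

    def _span(self, key):
        # Normalise an index or slice to a (start, stop) element range.
        if isinstance(key, slice):
            start, stop, step = key.indices(self._n)
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            return start, max(start, stop)
        if key < 0:
            key += self._n
        if not 0 <= key < self._n:
            raise IndexError("index out of range")
        return key, key + 1

    def __getitem__(self, key):
        start, stop = self._span(key)
        raw = self._mem[start * self.ELSIZE:stop * self.ELSIZE]
        vals = list(struct.unpack("=%di" % (stop - start), raw))
        return vals if isinstance(key, slice) else vals[0]

    def __setitem__(self, key, value):
        start, stop = self._span(key)
        vals = list(value) if isinstance(key, slice) else [value]
        if len(vals) != stop - start:
            raise IndexError("slice assignment is wrong size")
        packed = struct.pack("=%di" % len(vals), *vals)
        self._mem[start * self.ELSIZE:stop * self.ELSIZE] = packed

    def __delitem__(self, key):
        # Deletion clears the element(s); the array never changes length.
        start, stop = self._span(key)
        self[start:stop] = [0] * (stop - start)

    def __contains__(self, x):
        return x in self[0:self._n]

if __name__ == "__main__":
    a = IntView(64)
    a[0:3] = [1, 2, 3]
    del a[1]
    print(a[0:4], 3 in a)
```

Every read returns a fresh process-local copy and every write copies into the mapping, which mirrors the copy semantics the reference describes for sequence returns and assignments.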
The following is necessarily somewhat generalized; read every sentence with an implicit "except for the exceptions". See the Tutorial for more detailed explanations of those exceptions.

When assigning values to items or slices in a Vmap, the type of the Python value given must be appropriate for the Vmap's access type (i.e., when the Vmap is being an array of integers (type Int), feed it Python integers; when it's being floats, feed it Python floats). Values of the wrong type in item assignment will be coerced by Python where possible, and raise an error otherwise. In sequence assignments, particularly in assigning an "item" of a two-dimensional Vmap access type, a Python value of incorrect type can cause the element to be silently set to 0, without an exception being raised.

Sequence assignments can be fed anything convertible to a tuple. Normal Python access methods are used on incoming sequences, so for example array module arrays may be assigned to Vmap slices without special effort.

Sequence returns are Python lists of values of appropriate types (floats etc.). This data is a new, process-local, unshared copy of the data in the mmap()'d memory. Assignments likewise copy process-local memory into the mmap()'d area. Either lists or tuples are returned throughout; which sequence type (list or tuple) can be changed at compile time with a minor edit of the source code.

It needs noting that Vmap objects have the "buffer interface" emulated from the stock Python mmap module, but no testing has been done of that. It's expected to work, barring the cases where it doesn't (example: a Vmap with flag VM_AUTOPEN set and VM_STAYOPEN cleared will not actually be closed after use by the buffer interface routines, as there's no way to know when the user is done using them).

Read Only

1 if the Vmap has been and is still
The Vmap instance's Flags as a Python integer.
The Type of the Vmap instance, as a Python integer.
The length of the Vmap; reported from local memory which in the case of a Vmap with a header and
The size of the Vmap instance data area in bytes; == ( size-start ) - headerbytes.
The size of the Vmap header in bytes, 0 if the VM_HEADER flag is not set, overheadbytes +
The size of the internally maintained Vmap header, in bytes.

Read Always / Write If Closed

The parameters given to the

Always Read / Write

The advice given to the madvise() call by default if no argument is supplied to
When the Vmap has a header (VM_HEADER set) and the VM_HDRLOCK flag is set, accesses to data contained in the header will wait for this many iterations. See Header.
When iterating waiting for the header lock, if this attribute is non-zero, sched_yield() will be called after every unsuccessful iteration.

Fundamental Methods

open ( [
Calls mmap(). If the VM_HEADER flag is set, reads the Vmap header and adjusts the Vmap instance's view of the data accordingly (VM_HDRTYPE, VM_HDRLEN, etc). If the
If the
Calling open on an already open Vmap re-reads the header and adjusts the instance's variables as needed.

close ( )
Calls munmap. If the VM_SYNCLOSE or VM_ASYNCLOSE flags are set, the appropriate msync() call is made before unmapping.

astype ( [
Reports and optionally changes this Vmap instance's access type. If
If the VM_HEADER flag is set when this method is called, the type and element size information stored in the shared Vmap header is updated as well. The header lock will be held during that operation.

Vmap Instance Information / Manipulation

elsize ( )
Returns the number of bytes a single array element occupies.

elpage (
Returns the page number of array element ndx. This page number multiplied by the system PAGESIZE results in byte offsets suitable for feeding to the

getflag (
Return 1 if the Vmap instance has

setflag (
Set

clearflag (
Clear

times ( [
Returns a 5 tuple (
If

schyield ( )
Calls sched_yield(). No parameters and no errors.

Data Manipulation

find (
Search for data matching
When searching the 2d array types (Int2d, Long2d, Float2d), all items within an element (all columns in the row) are tested, and any which match return the element index; the particular column offset in the element which matched is not saved.
NOTE: Not actually implemented for the FixLong type.

sort ( [
Calls qsort() to re-order the elements of the array in place. 2D types are sorted by the value of the first item (first column) only, but elements are kept intact. FixChar and FixLong types don't support sort, and sorting a Char Vmap, while functional, seems like something one would rarely need. The sort can be limited to a section of the data by providing an
copyfrom (
Use memmove() to quickly copy parts of Vmaps to each other (or themselves). Both Vmaps are handled according to their flags then in effect.

raw_string ( [
Retrieves a copy of data from the mmap'd area, starting at the optional

Raw System Call Interfaces

raw_msync ( [
Calls msync(), with optional
See PAGESIZE for restrictions on the

raw_madvise ( [
Calls madvise(), with optional
The presence of the madvise() call on your system is implied by the existence of the MADV_NORMAL and other madvise() flags in the module. If those constants are not present, this method will probably raise an error. madvise() calls made per the VM_ADVOPEN flag will fail too; there the error is carefully ignored.
See PAGESIZE for restrictions on the

raw_mlock ( [
(Only root can use this.) Call mlock() on the region specified by
See PAGESIZE for restrictions on the

raw_munlock ( [
(Only root can use this.) Call munlock() on the region specified by
See PAGESIZE for restrictions on the

Header Methods

See the Headers section.

swapheader ( )
Byteswaps the Vmap header (just the internally maintained data; the user header area is not touched). No lock is attempted.

getheader ( )
Returns as a string the contents of the "user space" in the Vmap header. These are the bytes reserved with optional parameter to
The header lock is acquired and held while the data is copied.

setheader (
Sets the contents of the "user header" (allocated when the header was initialized). If
The header lock is acquired and held while the data is copied.

count_add (
Adds
The header lock is acquired and held for this operation.

count_sub (
Subtracts
The header lock is acquired and held for this operation.

count_get (
Returns the
The header lock is acquired and held for this operation.

atswap (
AVAILABLE: Int
Atomic swap of the value in array element

atswap (
AVAILABLE: Int2d
(2d version) Atomic swap of the value in array element

byteswap ( [
AVAILABLE: Int, Long, Float, Int2d, Long2d, Float2d
If the Vmap instance's access type is a 4-byte or 8-byte numeric, byteswap the data area of the Vmap. The optional parameters have the same meaning as in

resize (
Resets the size of elements in a Vmap to

setrange (
Set a range of array elements all to the same value

sumrange ( [
Return the sum of a range of elements. If

cntbndrange ( [
"Count in Bounds, of range" ... Return the count of elements greater than

minmax ( [
Returns a two item sequence (

colxchg (
AVAILABLE: Int2d, Long2d, Float2d
For each array element (row), exchanges the data at element column

colget (
AVAILABLE: Int2d, Long2d, Float2d
Direct fetch of column

System Calls

The specifics of these vary from system to system; see your system's documentation for the details that apply to you.
Creates "on demand" memory from a file (or "nowhere", see MAP_ANON); data is read from the file transparently at need, usually faster than is possible with traditional file I/O.
Marks mmap()'d memory to be written back to its associated file, or alternately flushes memory in favor of data on disk.
Gives the system advice on how a mmap()'d region should be handled for most efficient operation. Not functional in Linux 2.2; returns an error.
Relinquishes a timeslice, allowing another process on the system to use this CPU.

Vmap Headers

Vmap objects have internal support for a header area at the beginning of the mmap()'d area. The data stored there enables the

If the VM_HDRFAIL flag is set, operations that require access to the header will fail if the header isn't initialized, or is locked by another process (assuming header locking has been enabled with the VM_HDRLOCK flag). If the VM_HDRFAIL flag is not set, failed header access doesn't raise an error, and operations fall back to the Vmap instance's memory of what the header looked like the last time it was accessed.

The behavior of the locking is modified by the hlckspins and hlckyield attributes. The entire header is protected by a single mutex; whenever that lock is needed the Vmap will try hlckspins iterations before deciding the lock has failed. If the hlckyield attribute is non-zero, sched_yield() will be called on each iteration.

Vmaps which have headers access that header before almost any Python operation, to get the proper length to error check with and so on. If the VM_HDRLOCK flag is set, this could affect performance. The lock is never held very long (as in, under 100 assembler instructions), but a process may be interrupted by the operating system "whenever", which can stuff things up.

When the header is first initialized with the

SIZE

If the Vmap instance's VM_HDRTYPE flag is set; the

If the Vmap instance's VM_HDRLEN flag is set, it tries to behave like a list. The len() of a Vmap with the VM_HDRLEN flag set is not the number of elements that can be fit into the mmap()'d area as usual, but this shared COUNT number. The

The COUNT may not drop below 0, nor may it be raised above the number of elements possible for the mmap()'d area's size, figured using the access type and element sizes then in effect.
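The bounds on the shared COUNT can be made concrete with a small model. Everything about the layout here is an assumption made for illustration: a two-int header of (element size, count) at the start of an anonymous mapping, a class named `CountedMap`, and a ValueError on an out-of-range update (the text above does not say how the real module reports that case). No locking is attempted in this sketch.

```python
import mmap
import struct

# Assumed header layout for this sketch only: (element size, count).
HEADER = struct.Struct("=ii")

class CountedMap:
    """Hypothetical model of a mapping whose len() is a stored count."""

    def __init__(self, nbytes, elsize):
        self.mem = mmap.mmap(-1, nbytes)
        # Elements that fit in the area left over after the header.
        self.capacity = (nbytes - HEADER.size) // elsize
        HEADER.pack_into(self.mem, 0, elsize, 0)

    def count_get(self):
        return HEADER.unpack_from(self.mem, 0)[1]

    def count_add(self, n):
        elsize, count = HEADER.unpack_from(self.mem, 0)
        new = count + n
        # The count may not drop below 0 nor exceed what the area can hold.
        if not 0 <= new <= self.capacity:
            raise ValueError("count out of range")
        HEADER.pack_into(self.mem, 0, elsize, new)
        return new

    def __len__(self):
        return self.count_get()

if __name__ == "__main__":
    m = CountedMap(4096, 4)
    m.count_add(10)
    print(len(m), m.capacity)
```

Because the count lives in the mapped area rather than in the Python object, any other view of the same memory would see the same length, which is the point of keeping it in the header.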
Shared memory and SMP

NOTE: A full explanation of the subtleties and best practices for parallel programming on shared memory systems is far beyond the scope of this document, and your author's capabilities. Read the following as a brief introduction to the field written by a newcomer, therefore.

Sharing memory between processes on a single processor system is easy. No process will be modifying data concurrently with another, because only one is running at any given time. On multi-processor systems, however, there will be concurrent processes, and there has to be some accommodation made to keep those processes from overwriting the same data at the same time. This requires hardware support.

Read Stevens, "Unix Network Programming: vol 2, Inter Process Communication". It goes into great detail and provides implementations of the POSIX semaphore, etc. operations; but there's always a part missing: the mutexes. The implementations given always go to OS services for that, for excellent reasons; the hardware support is different on every architecture. Using the nice abstract interfaces imposes limits, though.

If you compiled the system specific code when you installed the Vmaps module, you have access to the Intel and Sparc hardware support needed for real, guaranteed serialized access of shared memory. The

Given this, building a mutex is easy:

    while 1:
        dt = amap.atswap(0,-1)   # try lock
        if dt != -1: break       # we get it?
        amap.schyield()          # no, let someone else have CPU
    # end while
    # .... operate on locked data ....
    amap.atswap(0,0)             # unlock

This code fragment illustrates using element 0 of an Int type Vmap as a mutex. A value of -1 in this element means "locked"; any other value means "unlocked" (here 0). If another process is holding the lock, the swap here will return -1, and the

If our swap returns something other than -1, we have acquired the mutex, locked the data, and other processes will be waiting for us to store a non -1 value back to signify that we are done ("unlock"). This example shows using another

    # .... operate on locked data ....
    amap[0] = 0                  # unlock

Which way is better depends on what exactly you are doing. The
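The swap-based mutex protocol described above can be tried out without the hardware primitive by simulating it. In this sketch (modern Python 3) the class `SwapCell` is hypothetical: a `threading.Lock` stands in for the CPU's atomic swap instruction, threads stand in for processes, and `time.sleep(0)` stands in for sched_yield(). It demonstrates the protocol only, not the module's performance or its real `atswap` signature.

```python
import threading
import time

LOCKED = -1

class SwapCell:
    """Simulated atomic-swap cell (assumption: a Lock emulates the instruction)."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def atswap(self, new):
        # Store the new value and return the old one as one indivisible step.
        with self._guard:
            old, self._value = self._value, new
        return old

def acquire(cell):
    # Swapping in LOCKED and getting LOCKED back means someone else holds it.
    while cell.atswap(LOCKED) == LOCKED:
        time.sleep(0)            # give up the CPU, like sched_yield()

def release(cell):
    cell.atswap(0)               # any value other than LOCKED means unlocked

def run_workers(nworkers=4, iterations=200):
    cell = SwapCell()
    shared = {"total": 0}

    def work():
        for _ in range(iterations):
            acquire(cell)
            shared["total"] += 1  # the protected read-modify-write
            release(cell)

    threads = [threading.Thread(target=work) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]

if __name__ == "__main__":
    print(run_workers())
```

The key property is that the test ("was it locked?") and the set ("it is locked now") happen in one step, so two contenders can never both see an unlocked value.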
inctest.py (Piddly Purposeless Parallel Python)

The distribution includes an example program, inctest.py. This is a simple demonstration of multiple processes modifying shared memory. The program is organized for making minor changes and seeing the effect on run time; all the interesting variables are right at the top of the file.

First, it creates an Int2d, shared Vmap to work with. Then it spawns

The children, immediately after being fork()ed, begin to generate the random array indices upon which they will be operating. When

Once all the child processes have their random numbers, all of them begin calling the Operate() function on random array elements. This function uses column 0 of an array element to lock the row, increments a random column, and unlocks the row. If the lock operation doesn't succeed immediately, a count of iterations through the "wait for lock" loop is incremented (

When a child process has done its share of the total iterations specified (

The child processes do not exit immediately, to make it easier to test the effects of msync() by the parent before and after the children close their inherited view of the shared Vmap. Changing the synchronization conditions should be easy.

Vmap Tutorial

Because of the multifaceted nature of Vmap objects, they may seem more complex than they actually are. This demonstration and walk-through should serve to detail some of the complexities and clarify the mysteries.

Firstly, fire up your Python interpreter, and import the Vmaps module:

    Python 2.2 (#1, Dec 25 2001, 05:56:47)
    >>> import Vmaps
    >>> v=Vmaps

... the "v" is to save typing "Vmaps" every time we reference the module from here on (and guess what your humble author was doing Christmas morning).
Let's play with a quick anonymous mapping first. This is memory not backed by a file to which it can be msync()'d. The

    >>> amap = v.newmap(-1,8192)
    >>> amap.open()

... this Vmap (named

    >>> len(amap)
    8192
    >>> amap[0]
    '\x00'
    >>> amap[3]='A'
    >>> amap[:5]
    '\x00\x00\x00A\x00'
    >>> amap[1:10] = ' '*9
    >>> amap[:12]
    '\x00         \x00\x00'
    >>> ' ' in amap
    1
    >>> amap.find('A')   # it was there but it isn't now
    -1
    >>> amap.find(' ')
    1

Fairly straightforward Python. Being an anonymous mapping, the initial contents are all zero bytes. The builtin Python in operator and the

    >>> amap.close()
    >>> amap[0]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    IOError: Vmap closed
    >>> amap.open()
    >>> amap[:12]
    '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

This demonstrates two features. First, we can

We can use the builtin Python del keyword (putting some pattern in first):

    >>> amap[:] = '.' * 10
    >>> amap[:12]
    '..........\x00\x00'
    >>> del amap[4]
    >>> del amap[7:9]
    >>> amap[:12]
    '....\x00..\x00\x00.\x00\x00'

Notice the slice assignment of 10 bytes to a slice of len( amap ). The Char and FixChar types are more forgiving of this treatment than the numeric types, as shall be demonstrated.
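For comparison, the first part of the Char session above can be approximated with the stock mmap module in modern Python 3. This is a sketch, not Vmaps itself: the function name `char_session` is made up, the stock module works with bytes rather than strings, and it insists that a slice assignment match the slice length exactly, unlike the forgiving Char type.

```python
import mmap

def char_session():
    """Replay the anonymous-mapping steps with the standard mmap module."""
    mem = mmap.mmap(-1, 8192)          # -1: anonymous, zero-filled memory
    out = {"length": len(mem), "initial": mem[:5]}

    mem[3:4] = b"A"                    # like amap[3]='A'
    out["after_A"] = mem[:5]

    mem[1:10] = b" " * 9               # overwrites the 'A' with spaces
    out["after_spaces"] = mem[:12]

    # Pass an explicit start so the search does not depend on the file position.
    out["find_A"] = mem.find(b"A", 0)
    out["find_space"] = mem.find(b" ", 0)

    mem.close()
    return out

if __name__ == "__main__":
    for key, value in char_session().items():
        print(key, repr(value))
```

As in the session, the search for "A" comes back -1 once the spaces have been written over it, and the first space is found at index 1.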
The FixChar type is an array of fixed length strings. This is much easier to demonstrate than to explain, but first we must change the Vmap's access type:

    >>> amap.astype(v.FixChar,23)   # returns the type code
    1

This uses

Now when we access the Vmap, it returns 23 byte strings for each element:

    >>> amap[0]
    '....\x00..\x00\x00.\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    >>> len(amap[0])
    23
    >>> amap[1]='xxxxxxxxxxxxx'
    >>> amap[0:2]   # returns a list of 2 23 byte strings
    ['....\x00..\x00\x00.\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'xxxxxxxxxxxxx\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
    >>> len(amap)   # 8192 / 23 =
    356

The len (number of elements) has changed too. Notice the pre-existing data is still there; our view of it has changed.

The string we assigned to element 1 was not a full 23 bytes. What happens to the rest of the space in that circumstance can be controlled with the VM_DOFILL flag (which is set by default), and the fillwith attribute.

    >>> amap.fillwith=67
    >>> del amap[1]   # see above, it was partly filled with 'x'
    >>> amap[0] = 'mmmm'
    >>> amap[:2]
    ['mmmmCCCCCCCCCCCCCCCCCCC', 'CCCCCCCCCCCCCCCCCCCCCCC']
    >>> amap[:3]   # (spaces added to element 3 for html)
    ['mmmmCCCCCCCCCCCCCCCCCCC', 'CCCCCCCCCCCCCCCCCCCCCCC', '\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00']
    >>> amap.clearflag(v.VM_DOFILL)
    36
    >>> amap.fillwith=0
    >>> amap[1]='xxxx'   # without VM_DOFILL this time
    >>> amap[:2]
    ['mmmmCCCCCCCCCCCCCCCCCCC', 'xxxxCCCCCCCCCCCCCCCCCCC']

The

Now to show off some of the other access types:

    >>> amap.astype(v.Int)
    8
    >>> len(amap)   # 8192 / 4 bytes per integer
    2048
    >>> amap[0]   # 'mmmm' when cast to an integer =
    1835887981
    >>> amap[100:110]   # we haven't touched these yet
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    >>> amap.setrange(1)   # this can be limited to just a portion of the array
    >>> amap[100:110]
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    >>> amap[-4:]
    [1, 1, 1, 1]

Again, our view of the existing data has changed, and ASCII interpreted as binary integers is usually not pretty. Using the

    >>> amap.astype(v.Float)
    9
    >>> amap[2]   # Float is 8 byte C double
    2.121995791459338e-314
    >>> amap.setrange(1,1,10)
    >>> amap[:12]
    [2.121995791459338e-314, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.121995791459338e-314, 2.121995791459338e-314]

Floating point (Float). Whee!

    >>> amap.astype(v.Float2d,3)   # items, not bytes
    18
    >>> len(amap)   # 8192 / 24 (3 floats per element @8 bytes)
    341
    >>> amap[:4]   # each element is a list of 3 floats
    [[2.121995791459338e-314, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.121995791459338e-314, 2.121995791459338e-314]]

This is the data as a two dimensional array. Each element (row) has 3 columns of individual 8 byte double floating point numbers.
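The odd-looking numbers in the sessions above are just the same bytes read under a different type. The struct module reproduces them; this sketch assumes a little-endian machine (as on Intel) and the function name `reinterpret` is made up for illustration.

```python
import struct

def reinterpret():
    """Show how identical bytes yield different values per access type."""
    # Four ASCII 'm' bytes read as one 4-byte integer.
    as_int = struct.unpack("<i", b"mmmm")[0]

    # Two 4-byte integers, each holding 1, read as one 8-byte double.
    as_double = struct.unpack("<d", struct.pack("<ii", 1, 1))[0]

    # Element counts for an 8192-byte area under each element size.
    counts = {
        "FixChar(23)": 8192 // 23,
        "Int": 8192 // 4,
        "Float2d(3)": 8192 // (3 * 8),
    }
    return as_int, as_double, counts

if __name__ == "__main__":
    print(reinterpret())
```

The integer comes out as 1835887981, and the double is a tiny denormal of roughly 2.12e-314, which is why elements not covered by setrange(1,1,10) show that value once the area is viewed as Float.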
FixLong Type and Unforgiving Acceptance

Finally, the FixLong type deserves a demonstration, and the promised pickiness:

    >>> amap.astype(v.FixLong, 14)   # bytes (the items here are bytes)
    3
    >>> len(amap)   # 8192 / 14 =
    585
    >>> amap[:3]   # at this point we're pseudo-random :)
    [20282409608374036906816896499712L, 4872769679114799555279886951120896L, 74352564683758538135984603136L]
    >>> amap.setflag(v.VM_LLASG)   # how about if those are signed?
    1060
    >>> amap[:3]
    [20282409608374036906816896499712L, -319527179420028073250609378099200L, 74352564683758538135984603136L]
    >>> amap[0:4] = -1L   # nah
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: Vmap assignment expected sequence

Oops... It's all messy anyway. FixLong hasn't got

    >>> amap.close()
    >>> amap.open()
    >>> amap.astype()   # no parameters just reports the current type
    3
    >>> len(amap)   # the size is recalled as well
    585
    >>> amap[0:4] = [-1L] * 4   # this works
    >>> amap[0:5]
    [-1L, -1L, -1L, -1L, 0L]
    >>> amap[0:4] = [-1L] * 6   # this doesn't
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    IndexError: Vmap slice assignment is wrong size

Oops again. The numeric types are less forgiving about the sequences they accept and the values contained therein than are the Char and FixChar types:

    >>> amap[:5]   # that failure didn't change data, others will
    [-1L, -1L, -1L, -1L, 0L]
    >>> amap[0:4] = [ -1L, -1L, 1L, 8 ]   # this int won't be coerced
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: Vmap assignment expected long
    >>> amap[0:4] = [-1L,-1L,1L,8L]   # there we go
    >>> amap[:5]
    [-1L, -1L, 1L, 8L, 0L]

Now for a couple of probably non-useful tricks (recall that these are 14 byte integers):

    >>> amap.clearflag(v.VM_LLASG)   # returns the Vmap's flags as integer
    36
    >>> amap[:5]   # as unsigned, was: [-1L, -1L, 1L, 8L, 0L]
    [5192296858534827628530496329220095L, 5192296858534827628530496329220095L, 1L, 8L, 0L]
    >>> amap.setflag(v.VM_LLALE)   # least significant == most significant now
    548
    >>> amap[:5]   # -1 is all bits on, so endianness isn't important...
    [5192296858534827628530496329220095L, 5192296858534827628530496329220095L, 20282409603651670423947251286016L, 162259276829213363391578010288128L, 0L]
    >>>

Gotta love them Python bignums. FixLong Vmaps as well as FixChar have a

    >>> amap.resize(8000)
    8000
    >>> len(amap)   # 8192 / 8000 =
    1
    >>> amap[0]   # broken for html
    2271371013423771532966636899650014224536397371723
    1670476922125503827279038503193467041246456334782
    656935715744462339438782354166906879L
    >>> amap.clearflag(v.VM_LLALE)   # we've dirtied the first 28 bytes...
    36
    >>> amap[0]   # if those are the most significant bytes (big endian)...
    831232460999333652239585333103 ...[clipped]... 1943132749824L
    >>> len(str(amap[0]))   # that's a very large number
    19266
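The signed/unsigned and endianness tricks shown for 14-byte FixLong elements come down to how a fixed run of bytes is converted to an integer. Modern Python 3 exposes the same conversions through `int.from_bytes` and `int.to_bytes`; the sketch below uses only those (the function name `fixlong_views` is made up) and does not touch the VM_LLASG or VM_LLALE flags themselves.

```python
def fixlong_views(width=14):
    """Read the same fixed-width bytes as signed/unsigned and big/little endian."""
    all_ones = b"\xff" * width               # what storing -1 leaves behind
    one_big = (1).to_bytes(width, "big")     # the value 1 stored big endian
    eight_big = (8).to_bytes(width, "big")   # the value 8 stored big endian
    return {
        "signed": int.from_bytes(all_ones, "big", signed=True),
        "unsigned": int.from_bytes(all_ones, "big", signed=False),
        # Stored big endian, then read back as if little endian.
        "one_as_little": int.from_bytes(one_big, "little"),
        "eight_as_little": int.from_bytes(eight_big, "little"),
    }

if __name__ == "__main__":
    for name, value in fixlong_views().items():
        print(name, value)
```

All bits set reads as -1 when signed and as 2**112 - 1 when unsigned, regardless of byte order; a stored 1 read with the opposite byte order becomes 2**104, and a stored 8 becomes 8 * 2**104, matching the pattern in the session.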
So far Vmaps have done nothing that the Numeric arrays can't do... Almost. The memory for the anonymous mapping used in previous examples is actually freed when the Vmap is closed, as opposed to Numeric arrays' storage, which is kept by the program for possible later use. On modern systems with swapping and virtual memory, this is usually not a big issue.

To use a file to store a Vmap, give it a real file handle, attached to a file open for read and write, and large enough to contain the entire mmap()'d area. Attempting to access data past the end of the file will cause interesting and system dependent failures.

The first thing to do is create a file that is large enough, and has some known contents:

    >>> f = open('xxxx','w+')
    >>> f.write(chr(2) * 16384)
    >>> f.seek(0)
    >>> f.read(3)
    '\x02\x02\x02'

Next we'll create a Vmap (named

    >>> bmap = v.newmap(f.fileno(), 16384, 0, v.MAP_SHARED)   # 0 is offset into file
    >>> bmap.open()   # the Vmap is the default Char type
    >>> bmap[:5]
    '\x02\x02\x02\x02\x02'

Our Vmap contains the file contents.

    >>> bmap[:10] = '.' * 9
    >>> bmap[:12]
    '.........\x02\x02\x02'
    >>> bmap.raw_msync(v.MS_SYNC)

The

TODO: ...finish this, PAGESIZE,
TODO:
TODO:
TODO:
TODO: Attributes
TODO: Autopen, stayopen

Vmap objects can use a header at the beginning of the mmap()'d area to carry information about the shape of the data in the rest of the Vmap, and optionally the

TODO: Type/size persistence
TODO: Count
TODO: User header

Source code and documentation Copyright © 2002 by Mark Lamb