Author |
Topic |
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 29 Dec 2022 : 17:43:21
|
quote: Originally posted by TheIriaeban
quote: Originally posted by questing gm
quote: Originally posted by TheIriaeban
Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.
http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/
I like where this discussion is going and would very much like to try the writing tool shared on top. But not being a computer programmer, I'm not sure how to use said tool. If you don't mind guiding a complete dummy, I've downloaded and extracted the tool, but what else do I need to install before I can run the script, and do I actually need a database infrastructure for this?
That article just shows an example. It doesn't do anything with the text read from the pdf, so there isn't anything to see after it is run. You would need something like the following line at the end (which puts an output file named pdftext.txt in your My Documents folder):
$text | Set-Content -LiteralPath (Join-Path ([Environment]::GetFolderPath("MyDocuments")) "pdftext.txt")
If you are new to PowerShell, there is an intro at the link below that does not appear to be overly complicated (the one from MS includes stuff about Visual Studio that may unnecessarily confuse people):
Honkin' big link
One note on that intro: where it mentions changing your computer's script execution policy, I would HIGHLY suggest you DO NOT use Unrestricted. It is mentioned there for completeness, and I am sure the author would agree with my recommendation (I am sure the other tech sages here would, too).
I got this example working and it seems to work ok on pdfs that have already been OCR'd. Obviously, the quality of that OCR is reflected in the text file. For files that have not been OCR'd, it doesn't understand multiple columns so you get alternating lines from each column which can make reading it quite a challenge.
Hint: since you need to download the iText DLL, once you get it on your machine, you may need to go to the file's properties and unblock it (that is a security measure in newer OSes to help protect your system). Also, I always do a virus scan on anything I download before I do anything with it. I am just paranoid that way.
Mod edit: As in the original post, shortened that URL that was stretching out the page. |
"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."
My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents
|
Edited by - Wooly Rupert on 02 Jan 2023 19:32:47 |
|
|
HighOne
Learned Scribe
216 Posts |
Posted - 29 Dec 2022 : 18:16:42
|
Just so you know, you guys are repeating a lot of work that other people have already done. |
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
HighOne
Learned Scribe
216 Posts |
Posted - 29 Dec 2022 : 19:25:07
|
quote: Originally posted by Gary Dallison
I'm not sure that it does repeat what others have done. If what I am intending to make already existed I would be using it and not making it.
The scanning and OCRing have been done by others. But these books are still under copyright, so no one is going to come into a public forum like this and say, "Hey, I digitized all the FR books and put them online! Here's the link!" |
|
|
hashimashadoo
Master of Realmslore
United Kingdom
1152 Posts |
Posted - 31 Dec 2022 : 13:18:44
|
quote: Originally posted by Gary Dallison
I'm not sure that it does repeat what others have done. If what I am intending to make already existed I would be using it and not making it.
I'm intending that this database will contain every paragraph or line or large block of text ever written about the realms. It won't paraphrase, it won't be interpreted, it will be the full text.
Each block will be tagged by any number of subjects, from geographic region, to deity, to important npcs or events.
It will be a lot of work and may never get finished, but what does that matter. The wiki sort of does this, but it doesn't include the exact text and is open to the writers' interpretation and bias. Other things are limited in what they can include because they are on the internet, so they cannot include text that is not publicly available.
My solution is a personal one so I can do what I want. When finished it may be available to others but will not be openly available to anyone as it is a research tool for those continuing to develop the realms.
If I'm wrong and something does exist that includes the exact text of everything arranged by subject, then please point me to it as it will save me 50 years of work.
Yeah, the reason we don't copy the text verbatim over at the wiki is because that would be copyright infringement and would likely result in a bunch of takedown notices. |
When life turns its back on you...sneak attack for extra damage.
Head admin of the FR wiki:
https://forgottenrealms.fandom.com/ |
|
|
Italian Archmage Karsus
Learned Scribe
126 Posts |
Posted - 02 Jan 2023 : 16:22:49
|
Let us know how it goes, Gary! You'll want some experience under your belt with this thing before the next improvement- because believe me, the first one is never enough. |
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
TBeholder
Great Reader
2428 Posts |
Posted - 02 Jan 2023 : 18:03:06
|
quote: Originally posted by Wooly Rupert
We already have the FR Wiki.
It’s a wiki. It’s on a wikifarm without decent search. And it was overrun by sneaky edition wars and assorted loons long ago. Either the same who raids the entire fandom.com, or it’s a slow zombie apocalypse. |
People never wonder How the world goes round -Helloween
And even I make no pretense Of having more than common sense -R.W.Wood
It's not good, Eric. It's a gazebo. -Ed Whitchurch |
|
|
Wooly Rupert
Master of Mischief
USA
36804 Posts |
Posted - 02 Jan 2023 : 19:28:38
|
But it's still a collaborative effort, with MANY people adding to it, and where the work of setting it up and making it accessible to everyone is already done.
While I'd prefer to see all editions embraced (like the way Wookieepedia has a "Canon" and a "Legends" entry for a lot of stuff), the FR wiki already has a lot of info there and people adding to it on a regular basis.
Maybe it's just me, but I think it's better to have one thing that everyone can access and work on, rather than many individual efforts going in a lot of different directions. Even if the one thing is less than perfect, it's still centralized.
Also, I've never had any real issue searching the FR wiki. If the info is there, I've found what I was looking for. |
Candlekeep Forums Moderator
Candlekeep - The Library of Forgotten Realms Lore http://www.candlekeep.com -- Candlekeep Forum Code of Conduct
I am the Giant Space Hamster of Ill Omen! |
Edited by - Wooly Rupert on 02 Jan 2023 19:29:10 |
|
|
Italian Archmage Karsus
Learned Scribe
126 Posts |
Posted - 02 Jan 2023 : 21:22:10
|
XD for what it's worth, Valhaeria is supposed to index freely available sources, precisely so that people who intend to write about a subject in the wiki but don't have the book can still get their hands on something.
Gary, I think over time you will need to figure out some automation in order to speed things up. Let me know if you need help on that front! Valhaeria has taught me some things lol. |
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Lord Karsus
Great Reader
USA
3741 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
zyzzyva
Acolyte
USA
21 Posts |
Posted - 08 Feb 2023 : 16:59:50
|
I've been working on something similar in MySQL, essentially a geographical bibliography to use as a DM's reference (because I'm both the type of masochist to run my campaigns open world and the type of sicko who finds data entry fun.)
I'm using a three table structure--one table of sources, one table of geographic regions, and one table of references, which notes the pages where information on a particular region is found within a particular source. Regions also include a recursive reference to their parent region, so you could search within, say, Unapproachable East, and see Aglarond, Thay, etc. and all locations within them.
I don't really have an end goal for it yet, though I may end up linking it to a map or somesuch eventually. |
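The three-table layout described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's standard library so it runs standalone (the post itself uses MySQL), the table and column names are hypothetical apart from parent_key1, which appears in a later query in this thread, and the source title and page ranges are made-up placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (
    source_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE regions (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL,
    -- recursive reference: points at another row of this same table
    parent_key1 INTEGER REFERENCES regions(region_id)
);
-- named "refs" because REFERENCES is a reserved word in SQL
CREATE TABLE refs (
    ref_id    INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES sources(source_id),
    region_id INTEGER NOT NULL REFERENCES regions(region_id),
    pages     TEXT NOT NULL
);
""")

# Placeholder rows; the title and page ranges are invented for the example.
conn.execute("INSERT INTO sources VALUES (1, 'Example Sourcebook')")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?, ?)",
    [(1, "Unapproachable East", None), (2, "Aglarond", 1), (3, "Thay", 1)],
)
conn.executemany(
    "INSERT INTO refs VALUES (?, ?, ?, ?)",
    [(1, 1, 2, "10-12"), (2, 1, 3, "40-45")],
)

# Where is there information on regions directly inside Unapproachable East?
rows = conn.execute("""
    SELECT child.region_name, s.title, f.pages
    FROM regions AS parent
    JOIN regions AS child ON child.parent_key1 = parent.region_id
    JOIN refs    AS f     ON f.region_id = child.region_id
    JOIN sources AS s     ON s.source_id = f.source_id
    WHERE parent.region_name = 'Unapproachable East'
    ORDER BY child.region_name
""").fetchall()

for region_name, title, pages in rows:
    print(f"{region_name}: {title}, pp. {pages}")
```

Keeping page locations in their own table means a source can be cited for many regions, and a region by many sources, without duplicating either row.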
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 08 Feb 2023 : 19:22:26
|
I wrote a tool years ago that would allow me to OCR text out of a screenshot that was in the clipboard. MS has pretty much replicated that with one of the tools in their PowerToys:
https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
That way, whether or not what you are looking at has been OCR'd, you can still pull out the text as you need. Hope this helps. |
"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."
My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents
|
|
|
zyzzyva
Acolyte
USA
21 Posts |
Posted - 08 Feb 2023 : 19:25:27
|
quote: Originally posted by Gary Dallison
Sounds like a good idea. I'd be interested to see how you did that recursive link. I'm a novice at this so I've just been tagging things with parent tags.
I've just been tagging them with parent tags as well (usually just one, but occasionally two when different regions overlap in non-contained ways.)
Then I'll use multiple JOIN statements in a query to display sub-regions within each region. For instance:
SELECT A1.region_id, A1.region_name,
       A2.region_id, A2.region_name,
       A3.region_id, A3.region_name,
       A4.region_id, A4.region_name
FROM FR_Biblio.FR_Regions A1
LEFT JOIN FR_Biblio.FR_Regions A2 ON (A1.region_id = A2.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A3 ON (A2.region_id = A3.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A4 ON (A3.region_id = A4.parent_key1)
WHERE A1.region_name LIKE '%great dale%'
displays subregions within The Great Dale, then settlements within those subregions, then buildings within those settlements, etc. (LEFT JOIN used to avoid losing regions that don't have their own subregions)
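The chained LEFT JOINs stop at the number of levels written into the query. One alternative, which is not what the post above uses, is a recursive common table expression that keeps following parent_key1 until it runs out of children. The sketch below assumes SQLite via Python so it runs on its own; MySQL 8.0 and later also accepts WITH RECURSIVE. The table is reduced to three columns and every row except The Great Dale is a placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE FR_Regions (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT NOT NULL,
        parent_key1 INTEGER REFERENCES FR_Regions(region_id)
    )
""")
# Placeholder hierarchy, five levels deep (one more than the four-alias query).
conn.executemany(
    "INSERT INTO FR_Regions VALUES (?, ?, ?)",
    [
        (1, "The Great Dale", None),
        (2, "Example Subregion", 1),
        (3, "Example Settlement", 2),
        (4, "Example Building", 3),
        (5, "Example Room", 4),
    ],
)

descendants = conn.execute("""
    WITH RECURSIVE subtree(region_id, region_name, depth) AS (
        -- anchor row: the region being searched for
        SELECT region_id, region_name, 0
        FROM FR_Regions
        WHERE region_name LIKE '%great dale%'
        UNION ALL
        -- recursive step: every region whose parent is already in the subtree
        SELECT child.region_id, child.region_name, subtree.depth + 1
        FROM FR_Regions AS child
        JOIN subtree ON child.parent_key1 = subtree.region_id
    )
    SELECT region_name, depth
    FROM subtree
    ORDER BY depth, region_name
""").fetchall()

for name, depth in descendants:
    print("  " * depth + name)
```

The trade-off is shape: the join version puts each level in its own pair of columns, while the recursive version returns one row per region with a depth number, which tends to be easier to indent or filter afterwards.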
|
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Delnyn
Senior Scribe
USA
955 Posts |
Posted - 22 Feb 2023 : 17:04:38
|
quote: Originally posted by zyzzyva
Then I'll use multiple JOIN statements in a query to display sub-regions within each region. For instance:
SELECT A1.region_id, A1.region_name,
       A2.region_id, A2.region_name,
       A3.region_id, A3.region_name,
       A4.region_id, A4.region_name
FROM FR_Biblio.FR_Regions A1
LEFT JOIN FR_Biblio.FR_Regions A2 ON (A1.region_id = A2.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A3 ON (A2.region_id = A3.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A4 ON (A3.region_id = A4.parent_key1)
WHERE A1.region_name LIKE '%great dale%'
Why are you joining multiple tables with the same name? |
|
|
zyzzyva
Acolyte
USA
21 Posts |
Posted - 22 Feb 2023 : 18:00:23
|
quote:
Why are you joining multiple tables with the same name?
I structured my regions table so that I can use a single table for any region, regardless of what other regions it's located inside.
Basically, each region row includes a numerical ID for the region, the region name, some additional information (type, alternative names, etc.) and several parent key columns that specify the region id(s) of regions in which that region is located.
The above query structure is basically used just to visually display child regions within their parent regions (i.e., a query with 'Waterdeep' as the topmost region will show each of the wards, then each of the buildings located within those wards, etc.)
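To make the "same table, different aliases" point concrete, here is a small self-join sketch. It assumes SQLite via Python for a standalone run, and it assumes a second parent column named parent_key2; the post only says there are several parent key columns, so that name and all the region rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE FR_Regions (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT NOT NULL,
        parent_key1 INTEGER REFERENCES FR_Regions(region_id),
        parent_key2 INTEGER REFERENCES FR_Regions(region_id)  -- assumed column
    )
""")
conn.executemany(
    "INSERT INTO FR_Regions VALUES (?, ?, ?, ?)",
    [
        (1, "Region A", None, None),
        (2, "Region B", None, None),
        (3, "Border Town", 1, 2),       # sits in both overlapping regions
        (4, "Inner Village", 1, None),  # sits in Region A only
    ],
)

# One physical table, read twice: once as "parent", once as "child".
# The aliases are what let a row be compared against other rows of the table.
pairs = conn.execute("""
    SELECT parent.region_name, child.region_name
    FROM FR_Regions AS parent
    JOIN FR_Regions AS child
      ON parent.region_id IN (child.parent_key1, child.parent_key2)
    ORDER BY parent.region_name, child.region_name
""").fetchall()

for parent_name, child_name in pairs:
    print(f"{parent_name} contains {child_name}")
```

A region with two parents shows up once under each of them, and a NULL parent key simply never matches, so top-level regions are not listed as anyone's child.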
|
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
zyzzyva
Acolyte
USA
21 Posts |
Posted - 13 Sep 2023 : 18:49:35
|
Question for anyone working on a database--if you're thinking about tracking references within video games, do you have any thoughts on how to reference where a piece of information is located?
My (very initial) thoughts are to separate references into 'locations' and 'sources'. So a 'location' will be a place that physically appears in the game, while a 'source' will be any NPC/item description/lore book that appears in the game. Of course this doesn't provide a place to record where in an NPC's dialogue tree such information might exist, but it seems a little better than just saying 'There's some information about Mulsantir in this game somewhere' and leaving it at that. Thoughts? |
|
|
Ayrik
Great Reader
Canada
7989 Posts |
Posted - 13 Sep 2023 : 19:10:13
|
quote: Originally posted by zyzzyva Question for anyone working on a database--if you're thinking about tracking references within video games, do you have any thoughts on how to reference where a piece of information is located?
A good database is supposed to do that for you.
Each record should contain fields for each datum. So when you look at a record or you search for a pattern/filter of records then you should automatically be able to see all the related data.
If your database isn't able to contain and organize all the information you need to store in it, then you should rethink the way it's structured. You may need to add more fields to contain references, sources, bibliographical or video game information, etc. There are always tradeoffs, of course - more fields means more size and more performance impact.
An alternative is to use external databases. IMDb is a good online database for tracking people who work on video games. (Although it's a useless database for tracking Realmslore in video games, lol.) |
[/Ayrik] |
|
|
zyzzyva
Acolyte
USA
21 Posts |
Posted - 13 Sep 2023 : 19:59:05
|
quote: If your database isn't able to contain and organize all the information you need to store in it, then you should rethink the way it's structured. You may need to add more fields to contain references, sources, bibliographical or video game information, etc. There are always tradeoffs, of course - more fields means more size and more performance impact.
I suppose this is the question I'm struggling with. It's one thing to have a robust system of referencing data within a specific game (I've seen some excellent databases for Disco Elysium's dialogue, for instance), but a simpler system that can be abstracted for all games (and cohabitate with citations for books, magazines, etc.) is a much more challenging prospect.
FWIW, the database I'm building only cares about locating information within a specific place, and not actually providing that information. So a citation might look like:
Location: Mulsantir
Source: NN2: MotB
Information at: [NPC name]
I'm not sure if there's anything more granular I can include here in a generalizable way, but I'm still in the conceptualization stage here, hence my question. I assume there might be resources available for more specific dialogue tags for, perhaps, the Baldur's Gate series, but I doubt I'd be able to find something similar for, say, Darkness Over Daggerford. |
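One way such a citation could be kept general enough to sit next to book and magazine references is to store the pointer as free text plus a short tag saying what kind of pointer it is. The sketch below is only an illustration of that idea: the class name, the field names, the locator_kind values, and the book example's title and page range are all hypothetical, and the game example reuses the citation from this post as written.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    location: str                 # the place the information is about
    source: str                   # book, magazine, or game
    locator_kind: str             # e.g. "pages", "npc", "item", "lore_book"
    locator: str                  # page range, NPC name, item name, ...
    detail: Optional[str] = None  # optional finer pointer, if one is known

    def format(self) -> str:
        parts = [
            f"Location: {self.location}",
            f"Source: {self.source}",
            f"Information at: {self.locator}",
        ]
        if self.detail:
            parts.append(f"Detail: {self.detail}")
        return "\n".join(parts)


# The game citation from the post, with the NPC name left as a placeholder.
game_ref = Citation("Mulsantir", "NN2: MotB", "npc", "[NPC name]")

# A book citation in the same shape; title and pages are invented.
book_ref = Citation("Mulsantir", "Example Sourcebook", "pages", "10-12")

print(game_ref.format())
print()
print(book_ref.format())
```

Because the optional detail field is free text, a game with a documented dialogue structure could carry a node name there, while a game without one just leaves it empty.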
|
|